spark git commit: [SPARK-10595] [ML] [MLLIB] [DOCS] Various ML guide cleanups

2015-09-15 Thread meng
Repository: spark Updated Branches: refs/heads/master 64c29afcb -> b921fe4dc [SPARK-10595] [ML] [MLLIB] [DOCS] Various ML guide cleanups Various ML guide cleanups. * ml-guide.md: Make it easier to access the algorithm-specific guides. * LDA user guide: EM often begins with useless topics,

spark git commit: [SPARK-7685] [ML] Apply weights to different samples in Logistic Regression

2015-09-15 Thread meng
Repository: spark Updated Branches: refs/heads/master 31a229aa7 -> be52faa7c [SPARK-7685] [ML] Apply weights to different samples in Logistic Regression In a fraud detection dataset, almost all the samples are negative while only a couple of them are positive. This type of highly imbalanced data
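The idea behind this commit is that each sample's contribution to the logistic loss is scaled by a per-sample weight, so rare positives can be up-weighted. The sketch below is a minimal, hypothetical illustration of a weighted log loss in plain Python, not Spark's actual implementation (Spark's is in Scala inside `LogisticRegression`):

```python
import math

def weighted_log_loss(labels, probs, weights):
    """Weighted logistic (log) loss: each sample's loss is scaled by its weight."""
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)

# Up-weight the rare positive sample so it is not drowned out by the negatives.
labels  = [1, 0, 0, 0]
probs   = [0.6, 0.2, 0.1, 0.3]   # model's predicted P(y=1)
weights = [3.0, 1.0, 1.0, 1.0]   # positive sample weighted 3x
loss = weighted_log_loss(labels, probs, weights)
```

With the weight on the poorly-predicted positive sample raised, the loss is larger than the unweighted loss, which is exactly the pressure that makes the optimizer pay attention to the minority class.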

spark git commit: [SPARK-10548] [SPARK-10563] [SQL] Fix concurrent SQL executions

2015-09-15 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master be52faa7c -> b6e998634 [SPARK-10548] [SPARK-10563] [SQL] Fix concurrent SQL executions *Note: this is for master branch only.* The fix for branch-1.5 is at #8721. The query execution ID is currently passed from a thread to its children,
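The bug class here is an execution ID implicitly inherited from a parent thread by its children, so concurrent queries can clobber each other's ID. A minimal, hypothetical Python sketch of the safer pattern (capture the ID explicitly and restore the previous value when done) rather than Spark's actual fix:

```python
import threading

# Hypothetical sketch: instead of letting child threads silently inherit the
# parent's execution id, set it explicitly for the duration of a query and
# restore the previous value afterwards, so concurrent runs cannot clash.
_local = threading.local()

def set_execution_id(exec_id):
    _local.exec_id = exec_id

def get_execution_id():
    return getattr(_local, "exec_id", None)

def run_with_execution_id(exec_id, body):
    previous = get_execution_id()
    set_execution_id(exec_id)
    try:
        return body()
    finally:
        set_execution_id(previous)  # restore, so nothing leaks past this query

results = []
run_with_execution_id("q1", lambda: results.append(get_execution_id()))
run_with_execution_id("q2", lambda: results.append(get_execution_id()))
```

The try/finally restore is the key point: the ID's scope is the query body, not the thread's lifetime.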

spark git commit: [SPARK-10548] [SPARK-10563] [SQL] Fix concurrent SQL executions / branch-1.5

2015-09-15 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.5 7286c2ba6 -> 997be78c3 [SPARK-10548] [SPARK-10563] [SQL] Fix concurrent SQL executions / branch-1.5 *Note: this is for branch-1.5 only* This is the same as #8710 but affects only SQL. The more general fix for SPARK-10563 is

spark git commit: [SPARK-10613] [SPARK-10624] [SQL] Reduce LocalNode tests dependency on SQLContext

2015-09-15 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 38700ea40 -> 35a19f335 [SPARK-10613] [SPARK-10624] [SQL] Reduce LocalNode tests dependency on SQLContext Instead of relying on `DataFrames` to verify our answers, we can just use simple arrays. This significantly simplifies the test

spark git commit: [SPARK-10381] Fix mixup of taskAttemptNumber & attemptId in OutputCommitCoordinator

2015-09-15 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.5 997be78c3 -> 2bbcbc659 [SPARK-10381] Fix mixup of taskAttemptNumber & attemptId in OutputCommitCoordinator When speculative execution is enabled, consider a scenario where the authorized committer of a particular output partition

spark git commit: [SPARK-9078] [SQL] Allow jdbc dialects to override the query used to check the table.

2015-09-15 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 35a19f335 -> 64c29afcb [SPARK-9078] [SQL] Allow jdbc dialects to override the query used to check the table. Current implementation uses query with a LIMIT clause to find if table already exists. This syntax works only in some database
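The commit lets a JDBC dialect supply its own table-existence query, since a `LIMIT` clause is not accepted by every database. A minimal, hypothetical sketch of that override pattern in Python (class and method names are illustrative, not Spark's Scala API):

```python
# Hypothetical sketch of the dialect pattern: the base class supplies a
# default existence-check query, and a dialect for a database that rejects
# LIMIT overrides it with a portable alternative.
class JdbcDialect:
    def get_table_exists_query(self, table: str) -> str:
        return f"SELECT 1 FROM {table} LIMIT 1"

class NoLimitDialect(JdbcDialect):
    def get_table_exists_query(self, table: str) -> str:
        # WHERE 1=0 returns no rows but still errors if the table is missing,
        # and it works on databases without LIMIT support.
        return f"SELECT * FROM {table} WHERE 1=0"

default_q = JdbcDialect().get_table_exists_query("people")
no_limit_q = NoLimitDialect().get_table_exists_query("people")
```

Either query is cheap to run; what matters is only whether it raises an error, which is how "does the table exist" gets answered.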

spark git commit: [SPARK-10381] Fix mixup of taskAttemptNumber & attemptId in OutputCommitCoordinator

2015-09-15 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 99ecfa594 -> 38700ea40 [SPARK-10381] Fix mixup of taskAttemptNumber & attemptId in OutputCommitCoordinator When speculative execution is enabled, consider a scenario where the authorized committer of a particular output partition fails
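The invariant this fix protects: for each (stage, partition), the coordinator authorizes exactly one attempt *number* to commit, so a speculative duplicate of the same task is refused. A minimal, hypothetical sketch of that bookkeeping (not Spark's actual `OutputCommitCoordinator` code):

```python
# Hypothetical sketch: key the authorization on (stage, partition) and record
# the attempt number that won; any other attempt number for the same
# partition (e.g. a speculative copy) is denied.
class OutputCommitCoordinator:
    def __init__(self):
        self._authorized = {}  # (stage, partition) -> winning attempt_number

    def can_commit(self, stage, partition, attempt_number):
        key = (stage, partition)
        if key not in self._authorized:
            self._authorized[key] = attempt_number
            return True
        return self._authorized[key] == attempt_number

coord = OutputCommitCoordinator()
first = coord.can_commit(stage=0, partition=3, attempt_number=0)
second = coord.can_commit(stage=0, partition=3, attempt_number=1)  # speculative copy
```

Mixing up the per-task attempt number with a globally unique attempt ID breaks exactly this check, which is what the commit title describes.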

spark git commit: [SPARK-10612] [SQL] Add prepare to LocalNode.

2015-09-15 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master b6e998634 -> a63cdc769 [SPARK-10612] [SQL] Add prepare to LocalNode. The idea is that we should separate the function call that does memory reservation (i.e. prepare) from the function call that consumes the input (e.g. open()), so all
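The design point is to split memory reservation (`prepare`) from input consumption (`open`), so all reservations in an operator tree can happen before any operator starts pulling rows. A minimal, hypothetical sketch of that two-phase lifecycle (illustrative only, not Spark's `LocalNode`):

```python
# Hypothetical sketch: prepare() reserves resources for the whole tree
# before any open() begins consuming input.
class LocalNode:
    def __init__(self, children=()):
        self.children = list(children)
        self.prepared = False
        self.opened = False

    def prepare(self):
        # memory reservation would happen here; no input is consumed yet
        self.prepared = True
        for child in self.children:
            child.prepare()

    def open(self):
        assert self.prepared, "prepare() must run before open()"
        self.opened = True
        for child in self.children:
            child.open()

leaf = LocalNode()
root = LocalNode(children=[leaf])
root.prepare()  # phase 1: reserve everywhere
root.open()     # phase 2: start consuming input
```

Keeping the phases separate means an under-provisioned tree can fail fast at prepare time instead of mid-query.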

spark git commit: [SPARK-10575] [SPARK CORE] Wrapped RDD.takeSample with Scope

2015-09-15 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master a63cdc769 -> 99ecfa594 [SPARK-10575] [SPARK CORE] Wrapped RDD.takeSample with Scope Remove return statements in RDD.takeSample and wrap it with `withScope` Author: vinodkc

spark git commit: Update version to 1.6.0-SNAPSHOT.

2015-09-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6503c4b5f -> 09b7e7c19 Update version to 1.6.0-SNAPSHOT. Author: Reynold Xin Closes #8350 from rxin/1.6. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: Small fixes to docs

2015-09-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master a2249359d -> 833be7331 Small fixes to docs Links now work properly + consistent use of *Spark standalone cluster* (Spark uppercase + lowercase the rest -- seems agreed in the other places in the docs). Author: Jacek Laskowski

spark git commit: Small fixes to docs

2015-09-15 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.5 d5c0361e7 -> 7286c2ba6 Small fixes to docs Links now work properly + consistent use of *Spark standalone cluster* (Spark uppercase + lowercase the rest -- seems agreed in the other places in the docs). Author: Jacek Laskowski

spark git commit: [SPARK-10598] [DOCS]

2015-09-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master 833be7331 -> 6503c4b5f [SPARK-10598] [DOCS] Comments preceding toMessage method state: "The edge partition is encoded in the lower 30 bytes of the Int, and the position is encoded in the upper 2 bytes of the Int.". References to

spark git commit: [SPARK-10491] [MLLIB] move RowMatrix.dspr to BLAS

2015-09-15 Thread meng
Repository: spark Updated Branches: refs/heads/master 09b7e7c19 -> c35fdcb7e [SPARK-10491] [MLLIB] move RowMatrix.dspr to BLAS jira: https://issues.apache.org/jira/browse/SPARK-10491 We implemented dspr with sparse vector support in `RowMatrix`. This method is also used in
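For context, `dspr` is the BLAS symmetric rank-1 update A := alpha * x * xᵀ + A, with A stored as a packed upper triangle (column by column). A minimal, hypothetical pure-Python sketch of the dense case of that operation, just to show the packed layout the routine updates (not Spark's sparse-aware implementation):

```python
# Hypothetical sketch of the BLAS dspr update A := alpha * x * x^T + A,
# where ap holds the upper triangle of A packed column by column:
# column j contributes entries A[0][j] .. A[j][j].
def dspr(n, alpha, x, ap):
    """Update packed upper-triangular ap in place; returns ap."""
    k = 0
    for j in range(n):
        for i in range(j + 1):
            ap[k] += alpha * x[i] * x[j]
            k += 1
    return ap

ap = [0.0] * 3                 # packed 2x2 upper triangle: A11, A12, A22
dspr(2, 1.0, [1.0, 2.0], ap)   # rank-1 update with x = (1, 2)
```

For x = (1, 2) the update fills in A11 = 1, A12 = 2, A22 = 4, which is x xᵀ restricted to the upper triangle.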

spark git commit: [SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py.

2015-09-15 Thread vanzin
Repository: spark Updated Branches: refs/heads/master c35fdcb7e -> 8abef21da [SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py. This change does two things: - tag a few tests and adds the mechanism in the build to be able to disable those tags, both in maven and sbt,

spark git commit: [PYSPARK] [MLLIB] [DOCS] Replaced addversion with versionadded in mllib.random

2015-09-15 Thread meng
Repository: spark Updated Branches: refs/heads/master 8abef21da -> 7ca30b505 [PYSPARK] [MLLIB] [DOCS] Replaced addversion with versionadded in mllib.random Missed this when reviewing `pyspark.mllib.random` for SPARK-10275. Author: noelsmith Closes #8773 from

spark git commit: Closes #8738 Closes #8767 Closes #2491 Closes #6795 Closes #2096 Closes #7722

2015-09-15 Thread meng
Repository: spark Updated Branches: refs/heads/master 7ca30b505 -> 0d9ab0167 Closes #8738 Closes #8767 Closes #2491 Closes #6795 Closes #2096 Closes #7722 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0d9ab016 Tree:

spark git commit: [DOCS] Small fixes to Spark on Yarn doc

2015-09-15 Thread srowen
Repository: spark Updated Branches: refs/heads/master 0d9ab0167 -> 416003b26 [DOCS] Small fixes to Spark on Yarn doc * a follow-up to 16b6d18613e150c7038c613992d80a7828413e66 as `--num-executors` flag is not supported. * links + formatting Author: Jacek Laskowski

spark git commit: Revert "[SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py."

2015-09-15 Thread vanzin
Repository: spark Updated Branches: refs/heads/master 416003b26 -> b42059d2e Revert "[SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py." This reverts commit 8abef21dac1a6538c4e4e0140323b83d804d602b. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-10475] [SQL] improve column pruning for Project on Sort

2015-09-15 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 841972e22 -> 31a229aa7 [SPARK-10475] [SQL] improve column pruning for Project on Sort Sometimes we can't push down the whole `Project` through `Sort`, but we still have a chance to push down part of it. Author: Wenchen Fan
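The partial pushdown works because the sort keys cannot be dropped below the Sort, but everything else can: push down the union of the projected columns and the sort keys, then keep the original Project on top. A minimal, hypothetical sketch of that column split (illustrative names, not Catalyst's rule):

```python
# Hypothetical sketch of partial projection pushdown through a Sort:
# below the Sort we keep the projected columns plus the sort keys
# (preserving the table's column order); the final projection stays on top.
def push_project_through_sort(project_cols, sort_keys, all_cols):
    keep = set(project_cols) | set(sort_keys)
    pushed = [c for c in all_cols if c in keep]
    return pushed, list(project_cols)

below, final = push_project_through_sort(
    project_cols=["name"], sort_keys=["age"],
    all_cols=["name", "age", "address", "salary"],
)
```

Here `address` and `salary` are pruned before the sort ever runs, which is the win: the sort shuffles narrower rows, and `age` is dropped afterwards by the top-level projection.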

spark git commit: [SPARK-10437] [SQL] Support aggregation expressions in Order By

2015-09-15 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master b42059d2e -> 841972e22 [SPARK-10437] [SQL] Support aggregation expressions in Order By JIRA: https://issues.apache.org/jira/browse/SPARK-10437 If an expression in `SortOrder` is a resolved one, such as `count(1)`, the corresponding rule
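The feature being fixed is the common SQL idiom of ordering by an aggregate expression such as `count(1)`. As a portable illustration of the semantics (using SQLite via Python's standard library, not Spark SQL itself):

```python
import sqlite3

# Order groups by the same count(1) aggregate used in the SELECT list.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (dept TEXT);
    INSERT INTO emp VALUES ('a'), ('a'), ('a'), ('b');
""")
rows = conn.execute(
    "SELECT dept, count(1) FROM emp GROUP BY dept ORDER BY count(1) DESC"
).fetchall()
```

The snippet above suggests Spark's analyzer needed a rule to resolve the aggregate expression in `SortOrder` against the aggregation below it, rather than treating it as an ordinary column reference.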