spark git commit: [SPARK-12340][SQL] fix Int overflow in the SparkPlan.executeTake, RDD.take and AsyncRDDActions.takeAsync

2016-01-06 Thread sarutak
Repository: spark Updated Branches: refs/heads/master b2467b381 -> 5d871ea43 [SPARK-12340][SQL] fix Int overflow in the SparkPlan.executeTake, RDD.take and AsyncRDDActions.takeAsync I have closed pull request https://github.com/apache/spark/pull/10487 and created this pull request to
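A minimal sketch of the overflow class of bug this commit targets (the names below are illustrative, not Spark's actual internals): when `take(n)` scales up the number of partitions to scan, multiplying large `Int` values wraps around, while promoting to `Long` first does not.

```scala
// Illustrative only: how Int arithmetic wraps for a very large take(n),
// and why widening to Long before multiplying avoids it.
val n = 2000000000            // a huge take(n)
val scaleFactor = 4           // multiplier applied when too few rows came back
val overflowed: Int = n * scaleFactor   // wraps around to a negative value
val safe: Long = n.toLong * scaleFactor // widened to Long before multiplying
```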

spark git commit: [SPARK-12578][SQL] Distinct should not be silently ignored when used in an aggregate function with OVER clause

2016-01-06 Thread rxin
Repository: spark Updated Branches: refs/heads/master d1fea4136 -> b2467b381 [SPARK-12578][SQL] Distinct should not be silently ignored when used in an aggregate function with OVER clause JIRA: https://issues.apache.org/jira/browse/SPARK-12578 Slightly update to Hive parser. We should keep

spark git commit: [SPARK-11878][SQL] Eliminate distribute by in case group by is present with exactly the same grouping expressi

2016-01-06 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 94c202c7d -> 9061e777f [SPARK-11878][SQL] Eliminate distribute by in case group by is present with exactly the same grouping expressi For queries like : select <> from table group by a distribute by a we can eliminate distribute by ;
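A toy model of the optimization described above (the case class and function are illustrative, not Catalyst's actual API): when the DISTRIBUTE BY keys are exactly the GROUP BY keys, the aggregation already shuffles by those keys, so the extra distribution step can be dropped.

```scala
// Toy plan: a grouping plus an optional explicit distribution.
case class ToyPlan(groupBy: Seq[String], distributeBy: Option[Seq[String]])

// Drop the distribution when it duplicates the grouping keys.
def eliminateRedundantDistribute(plan: ToyPlan): ToyPlan =
  if (plan.distributeBy.contains(plan.groupBy)) plan.copy(distributeBy = None)
  else plan
```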

spark git commit: [SPARK-7675][ML][PYSPARK] sparkml params type conversion

2016-01-06 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 9061e777f -> 3b29004d2 [SPARK-7675][ML][PYSPARK] sparkml params type conversion From JIRA: Currently, PySpark wrappers for spark.ml Scala classes are brittle when accepting Param types. E.g., Normalizer's "p" param cannot be set to "2"

spark git commit: [SPARK-11531][ML] SparseVector error Msg

2016-01-06 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 3b29004d2 -> 007da1a9d [SPARK-11531][ML] SparseVector error Msg PySpark SparseVector should have "Found duplicate indices" error message Author: Joshi Author: Rekha Joshi Closes #9525 from
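A minimal sketch of the kind of validation the commit describes (the helper name is illustrative, not the actual SparseVector code): fail early with an error that names the duplicated indices instead of misbehaving later.

```scala
// Reject sparse-vector indices that appear more than once, listing the offenders.
def requireUniqueIndices(indices: Array[Int]): Unit = {
  val dups = indices.groupBy(identity).collect { case (i, grp) if grp.length > 1 => i }
  require(dups.isEmpty, s"Found duplicate indices: ${dups.mkString(", ")}")
}
```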

spark git commit: [SPARK-11945][ML][PYSPARK] Add computeCost to KMeansModel for PySpark spark.ml

2016-01-06 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 007da1a9d -> 95eb65163 [SPARK-11945][ML][PYSPARK] Add computeCost to KMeansModel for PySpark spark.ml Add ```computeCost``` to ```KMeansModel``` as evaluator for PySpark spark.ml. Author: Yanbo Liang Closes #9931

spark git commit: [SPARK-11815][ML][PYSPARK] PySpark DecisionTreeClassifier & DecisionTreeRegressor should support setSeed

2016-01-06 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 95eb65163 -> 3aa348822 [SPARK-11815][ML][PYSPARK] PySpark DecisionTreeClassifier & DecisionTreeRegressor should support setSeed PySpark ```DecisionTreeClassifier``` & ```DecisionTreeRegressor``` should support ```setSeed``` like what we

[4/8] spark git commit: [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst

2016-01-06 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/ea489f14/sql/hive/src/main/antlr3/org/apache/spark/sql/parser/SparkSqlParser.g -- diff --git a/sql/hive/src/main/antlr3/org/apache/spark/sql/parser/SparkSqlParser.g

[8/8] spark git commit: [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst

2016-01-06 Thread rxin
[SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst This PR moves a major part of the new SQL parser to Catalyst. This is a prelude to start using this parser for all of our SQL parsing. The following key changes have been made: The ANTLR Parser & Supporting classes have been

[1/8] spark git commit: [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst

2016-01-06 Thread rxin
Repository: spark Updated Branches: refs/heads/master 3aa348822 -> ea489f14f http://git-wip-us.apache.org/repos/asf/spark/blob/ea489f14/sql/hive/src/test/scala/org/apache/spark/sql/hive/ErrorPositionSuite.scala -- diff --git

[7/8] spark git commit: [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst

2016-01-06 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/ea489f14/sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser/SparkSqlParser.g -- diff --git

[5/8] spark git commit: [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst

2016-01-06 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/ea489f14/sql/hive/src/main/antlr3/org/apache/spark/sql/parser/IdentifiersParser.g -- diff --git a/sql/hive/src/main/antlr3/org/apache/spark/sql/parser/IdentifiersParser.g

[3/8] spark git commit: [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst

2016-01-06 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/ea489f14/sql/hive/src/main/java/org/apache/spark/sql/parser/ParseDriver.java -- diff --git a/sql/hive/src/main/java/org/apache/spark/sql/parser/ParseDriver.java

[6/8] spark git commit: [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst

2016-01-06 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/ea489f14/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystQl.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystQl.scala

[2/8] spark git commit: [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst

2016-01-06 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/ea489f14/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala

spark git commit: [SPARK-12539][SQL] support writing bucketed table

2016-01-06 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6f7ba6409 -> 917d3fc06 [SPARK-12539][SQL] support writing bucketed table This PR adds bucket write support to Spark SQL. User can specify bucketing columns, numBuckets and sorting columns with or without partition columns. For example:

spark git commit: [SPARK-12681] [SQL] split IdentifiersParser.g into two files

2016-01-06 Thread davies
Repository: spark Updated Branches: refs/heads/master cbaea9591 -> 6f7ba6409 [SPARK-12681] [SQL] split IdentifiersParser.g into two files To avoid having a huge Java source (over 64K lines of code) that can't be compiled. cc hvanhovell Author: Davies Liu Closes #10624 from

spark git commit: [SPARK-12604][CORE] Java count(ApproxDistinct)ByKey methods return Scala Long not Java

2016-01-06 Thread rxin
Repository: spark Updated Branches: refs/heads/master 917d3fc06 -> ac56cf605 [SPARK-12604][CORE] Java count(AprroxDistinct)ByKey methods return Scala Long not Java Change Java countByKey, countApproxDistinctByKey return types to use Java Long, not Scala; update similar methods for
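A hedged sketch of the conversion this fix implies (names are illustrative, not Spark's actual code): a Scala count map holds primitive `scala.Long` values, but a Java-facing API should expose boxed `java.lang.Long`, so each value is boxed explicitly at the boundary.

```scala
// Scala-side counts use primitive Long.
val counts: Map[String, Long] = Map("a" -> 2L, "b" -> 1L)

// Box each value so a Java caller sees java.lang.Long, not scala.Long.
val javaCounts: Map[String, java.lang.Long] =
  counts.map { case (k, v) => (k, java.lang.Long.valueOf(v)) }
```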

spark git commit: [SPARK-12663][MLLIB] More informative error message in MLUtils.loadLibSVMFile

2016-01-06 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master a74d743cc -> 6b6d02be0 [SPARK-12663][MLLIB] More informative error message in MLUtils.loadLibSVMFile This PR contains 1 commit which resolves [SPARK-12663](https://issues.apache.org/jira/browse/SPARK-12663). For the record, I got a
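A sketch of what "more informative" means here (the helper name and message are illustrative, not MLUtils' exact code): when a libSVM line fails to parse, surface the line number and content in the exception rather than a bare NumberFormatException.

```scala
// Parse the leading label of a libSVM line, wrapping failures with context.
def parseLabel(line: String, lineNumber: Long): Double =
  try line.trim.split(' ').head.toDouble
  catch {
    case _: NumberFormatException =>
      throw new NumberFormatException(
        s"Could not parse label at line $lineNumber: '$line'")
  }
```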

spark git commit: [SPARK-7689] Remove TTL-based metadata cleaning in Spark 2.0

2016-01-06 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6b6d02be0 -> 8e19c7663 [SPARK-7689] Remove TTL-based metadata cleaning in Spark 2.0 This PR removes `spark.cleaner.ttl` and the associated TTL-based metadata cleaning code. Now that we have the `ContextCleaner` and a timer to trigger

spark git commit: [SPARK-12673][UI] Add missing uri prepending for job description

2016-01-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-1.6 11b901b22 -> 94af69c9b [SPARK-12673][UI] Add missing uri prepending for job description Otherwise the url will fail to proxy to the right one in YARN mode. Here is the screenshot: ![screen shot 2016-01-06 at 5 28 26

spark git commit: [SPARK-12673][UI] Add missing uri prepending for job description

2016-01-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-1.5 5e86c0cce -> f2bc02ec4 [SPARK-12673][UI] Add missing uri prepending for job description Otherwise the url will fail to proxy to the right one in YARN mode. Here is the screenshot: ![screen shot 2016-01-06 at 5 28 26

spark git commit: [SPARK-12673][UI] Add missing uri prepending for job description

2016-01-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 8e19c7663 -> 174e72cec [SPARK-12673][UI] Add missing uri prepending for job description Otherwise the url will fail to proxy to the right one in YARN mode. Here is the screenshot: ![screen shot 2016-01-06 at 5 28 26
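A sketch of the idea behind this fix (the function name is illustrative): build job-description links from the UI root instead of from "/", so a proxy prefix such as YARN's is preserved.

```scala
// Join a UI root and a relative path without doubling or dropping slashes.
def prependBaseUri(uiRoot: String, path: String): String =
  uiRoot.stripSuffix("/") + "/" + path.stripPrefix("/")
```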

spark git commit: [SPARK-12640][SQL] Add simple benchmarking utility class and add Parquet scan benchmarks.

2016-01-06 Thread rxin
Repository: spark Updated Branches: refs/heads/master ac56cf605 -> a74d743cc [SPARK-12640][SQL] Add simple benchmarking utility class and add Parquet scan benchmarks. We've run benchmarks ad hoc to

spark git commit: [SPARK-12016] [MLLIB] [PYSPARK] Wrap Word2VecModel when loading it in pyspark

2016-01-06 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-1.5 d10b9d572 -> 5e86c0cce [SPARK-12016] [MLLIB] [PYSPARK] Wrap Word2VecModel when loading it in pyspark JIRA: https://issues.apache.org/jira/browse/SPARK-12016 We should not directly use Word2VecModel in pyspark. We need to wrap it in a

spark git commit: [SPARK-12678][CORE] MapPartitionsRDD clearDependencies

2016-01-06 Thread rxin
Repository: spark Updated Branches: refs/heads/master 174e72cec -> b67385203 [SPARK-12678][CORE] MapPartitionsRDD clearDependencies MapPartitionsRDD was keeping a reference to `prev` after a call to `clearDependencies` which could lead to memory leak. Author: Guillaume Poulin
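A toy illustration of the leak pattern described above (the class is illustrative, not the real MapPartitionsRDD): the `prev` field must be nulled out in `clearDependencies`, otherwise it keeps the parent reachable after the lineage has been truncated.

```scala
// A child that holds a mutable reference to its parent.
class ToyRDD(var prev: ToyRDD) {
  // Drop the parent reference so it can be garbage collected.
  def clearDependencies(): Unit = { prev = null }
}

val parent = new ToyRDD(null)
val child = new ToyRDD(parent)
child.clearDependencies()
```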

spark git commit: [SPARK-12678][CORE] MapPartitionsRDD clearDependencies

2016-01-06 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.6 94af69c9b -> d061b8522 [SPARK-12678][CORE] MapPartitionsRDD clearDependencies MapPartitionsRDD was keeping a reference to `prev` after a call to `clearDependencies` which could lead to memory leak. Author: Guillaume Poulin

spark git commit: Revert "[SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None"

2016-01-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 bc397753c -> d4914647a Revert "[SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None" This reverts commit fcd013cf70e7890aa25a8fe3cb6c8b36bf0e1f04. Author: Yin Huai Closes #10632 from

spark git commit: [DOC] fix 'spark.memory.offHeap.enabled' default value to false

2016-01-06 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.6 34effc46c -> 47a58c799 [DOC] fix 'spark.memory.offHeap.enabled' default value to false modify 'spark.memory.offHeap.enabled' default value to false Author: zzcclp Closes #10633 from

spark git commit: [DOC] fix 'spark.memory.offHeap.enabled' default value to false

2016-01-06 Thread rxin
Repository: spark Updated Branches: refs/heads/master e5cde7ab1 -> 84e77a15d [DOC] fix 'spark.memory.offHeap.enabled' default value to false modify 'spark.memory.offHeap.enabled' default value to false Author: zzcclp Closes #10633 from

spark git commit: [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None

2016-01-06 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-1.5 7a49b6048 -> 5d2d2dd91 [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None If the initial model passed to GMM is not empty, it causes `net.razorvine.pickle.PickleException`. It can be fixed by converting

spark git commit: [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None

2016-01-06 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-1.6 c3135d021 -> 175681914 [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None If the initial model passed to GMM is not empty, it causes `net.razorvine.pickle.PickleException`. It can be fixed by converting

spark git commit: [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None

2016-01-06 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master ea489f14f -> fcd013cf7 [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None If the initial model passed to GMM is not empty, it causes `net.razorvine.pickle.PickleException`. It can be fixed by converting

spark git commit: [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None

2016-01-06 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-1.4 28adc45d5 -> bc397753c [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None If the initial model passed to GMM is not empty, it causes `net.razorvine.pickle.PickleException`. It can be fixed by converting

spark git commit: [SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming

2016-01-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-1.5 5d2d2dd91 -> 598a5c2cc [SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming Move Py4jCallbackConnectionCleaner to Streaming because the callback server starts only in StreamingContext. Author: Shixiong Zhu

spark git commit: [SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming

2016-01-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master f82ebb152 -> 1e6648d62 [SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming Move Py4jCallbackConnectionCleaner to Streaming because the callback server starts only in StreamingContext. Author: Shixiong Zhu

spark git commit: [SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming

2016-01-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-1.6 175681914 -> d821fae0e [SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming Move Py4jCallbackConnectionCleaner to Streaming because the callback server starts only in StreamingContext. Author: Shixiong Zhu

spark git commit: Revert "[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url."

2016-01-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-1.6 8f0ead3e7 -> 39b0a3480 Revert "[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url." This reverts commit 8f0ead3e79beb2c5f2731ceaa34fe1c133763386. Will merge #10618

spark git commit: Revert "[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url."

2016-01-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 19e4e9feb -> cbaea9591 Revert "[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url." This reverts commit 19e4e9febf9bb4fd69f6d7bc13a54844e4e096f1. Will merge #10618 instead.

spark git commit: Revert "[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url."

2016-01-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-1.5 fb421af08 -> d10b9d572 Revert "[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url." This reverts commit fb421af08de73e4ae6b04a576721109cae561865. Will merge #10618

spark git commit: [SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url.

2016-01-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-1.6 d821fae0e -> 8f0ead3e7 [SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url. Author: huangzhaowei Closes #10617 from SaintBacchus/SPARK-12672.

spark git commit: [SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url.

2016-01-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-1.5 598a5c2cc -> fb421af08 [SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url. Author: huangzhaowei Closes #10617 from SaintBacchus/SPARK-12672.

spark git commit: [SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url.

2016-01-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 1e6648d62 -> 19e4e9feb [SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url. Author: huangzhaowei Closes #10617 from SaintBacchus/SPARK-12672.