spark git commit: [SPARK-7606] [SQL] [PySpark] add version to Python SQL API docs

2015-05-21 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.4 e70be6987 -> b0e7c6633 [SPARK-7606] [SQL] [PySpark] add version to Python SQL API docs Add version info for public Python SQL API. cc rxin Author: Davies Liu dav...@databricks.com Closes #6295 from davies/versions and squashes the

spark git commit: [SPARK-7753] [MLLIB] Update KernelDensity API

2015-05-21 Thread meng
Repository: spark Updated Branches: refs/heads/master 8ddcb25b3 -> 947ea1cf5 [SPARK-7753] [MLLIB] Update KernelDensity API Update `KernelDensity` API to make it extensible to different kernels in the future. `bandwidth` is used instead of `standardDeviation`. The static `kernelDensity`
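
The renamed `bandwidth` parameter is the smoothing width of the kernel. As a hedged, pure-Python sketch (not MLlib's API — the function name and structure here are illustrative only), a Gaussian kernel density estimate parameterized by bandwidth looks like:

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density estimator over `samples` using a Gaussian kernel
    with the given bandwidth (the name this change adopts in place of
    `standardDeviation`)."""
    norm = 1.0 / (math.sqrt(2 * math.pi) * bandwidth * len(samples))

    def density(x):
        return norm * sum(
            math.exp(-((x - s) ** 2) / (2 * bandwidth ** 2)) for s in samples
        )

    return density

# With one sample, the density at the sample is the Gaussian peak 1/sqrt(2*pi).
d = gaussian_kde([0.0], bandwidth=1.0)
print(round(d(0.0), 6))  # 0.398942
```

Keeping `bandwidth` as the user-facing knob is what makes the API extensible: other kernels share the same parameter, while "standard deviation" is specific to the Gaussian.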

spark git commit: [SPARK-7753] [MLLIB] Update KernelDensity API

2015-05-21 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.4 b0e7c6633 -> 64762444e [SPARK-7753] [MLLIB] Update KernelDensity API Update `KernelDensity` API to make it extensible to different kernels in the future. `bandwidth` is used instead of `standardDeviation`. The static `kernelDensity`

spark git commit: [SPARK-7606] [SQL] [PySpark] add version to Python SQL API docs

2015-05-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master 04940c497 -> 8ddcb25b3 [SPARK-7606] [SQL] [PySpark] add version to Python SQL API docs Add version info for public Python SQL API. cc rxin Author: Davies Liu dav...@databricks.com Closes #6295 from davies/versions and squashes the

spark git commit: [SPARK-7745] Change asserts to requires for user input checks in Spark Streaming

2015-05-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master 947ea1cf5 -> 1ee8eb431 [SPARK-7745] Change asserts to requires for user input checks in Spark Streaming Assertions can be turned off. `require` throws an `IllegalArgumentException` which makes more sense when it's a user set variable.
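
The rationale has a direct analog outside Scala: assertion checks can be compiled or switched off, so validating user input with them silently disappears. A minimal Python sketch (the function name and message are hypothetical, chosen only to mirror a Streaming-style parameter check):

```python
def set_batch_interval(seconds):
    # An `assert seconds > 0` here would vanish under `python -O`, skipping
    # the check entirely -- the same failure mode the Spark change avoids by
    # preferring Scala's `require`, which always throws IllegalArgumentException.
    if seconds <= 0:
        raise ValueError("batch interval must be positive, got %r" % seconds)
    return seconds

try:
    set_batch_interval(-1)
except ValueError as e:
    print("rejected:", e)
```

Raising an explicit, typed exception for bad user input also gives callers something meaningful to catch, whereas a tripped assertion reads like an internal invariant failure.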

spark git commit: [SPARK-7745] Change asserts to requires for user input checks in Spark Streaming

2015-05-21 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.4 64762444e -> f08c6f319 [SPARK-7745] Change asserts to requires for user input checks in Spark Streaming Assertions can be turned off. `require` throws an `IllegalArgumentException` which makes more sense when it's a user set variable.

spark git commit: [SPARK-6416] [DOCS] RDD.fold() requires the operator to be commutative

2015-05-21 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-1.4 21b150569 -> 0df461e08 [SPARK-6416] [DOCS] RDD.fold() requires the operator to be commutative Document current limitation of rdd.fold. This does not resolve SPARK-6416 but just documents the issue. CC JoshRosen Author: Sean Owen

spark git commit: [SPARK-6416] [DOCS] RDD.fold() requires the operator to be commutative

2015-05-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master 4b7ff3092 -> 6e5340269 [SPARK-6416] [DOCS] RDD.fold() requires the operator to be commutative Document current limitation of rdd.fold. This does not resolve SPARK-6416 but just documents the issue. CC JoshRosen Author: Sean Owen
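
The documented limitation can be demonstrated without Spark. `fold` applies the operator within each partition and then across partition results, and Spark does not guarantee the order in which partition results are combined — so a non-commutative operator gives answers that depend on partitioning. A small simulation (assumption: this mimics the two-level fold, it is not Spark's code):

```python
from functools import reduce

def simulated_fold(partitions, zero, op):
    # Mimics RDD.fold: fold within each partition starting from `zero`,
    # then fold the per-partition results, again starting from `zero`.
    # The order of `partitions` stands in for the unspecified combine order.
    per_part = [reduce(op, part, zero) for part in partitions]
    return reduce(op, per_part, zero)

concat = lambda a, b: a + b  # associative but NOT commutative

forward = simulated_fold([["a", "b"], ["c", "d"]], "", concat)
shuffled = simulated_fold([["c", "d"], ["a", "b"]], "", concat)
print(forward, shuffled)  # abcd cdab -- same data, different answers
```

With a commutative operator such as addition, both orderings agree, which is exactly the property the docs now ask of `fold`'s operator.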

spark git commit: [SPARK-7394][SQL] Add Pandas style cast (astype)

2015-05-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6e5340269 -> 699906e53 [SPARK-7394][SQL] Add Pandas style cast (astype) Author: kaka1992 kaka_1...@163.com Closes #6313 from kaka1992/astype and squashes the following commits: 73dfd0b [kaka1992] [SPARK-7394] Add Pandas style cast
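
The change exposes `astype` as a Pandas-style alias for column casting. A toy sketch of the alias pattern (this `Column` class is illustrative only, not PySpark's):

```python
class Column:
    """Toy stand-in for a DataFrame column, just to show the alias pattern."""
    def __init__(self, name, dtype):
        self.name, self.dtype = name, dtype

    def cast(self, dtype):
        # Casting returns a new column expression; the original is untouched.
        return Column(self.name, dtype)

    # Pandas users reach for `astype`; binding it to the same function as
    # `cast` is the essence of this change.
    astype = cast

col = Column("age", "string").astype("int")
print(col.name, col.dtype)  # age int
```

Aliasing rather than duplicating keeps one implementation while meeting users where their muscle memory is.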

spark git commit: [SPARK-7787] [STREAMING] Fix serialization issue of SerializableAWSCredentials

2015-05-21 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.4 70d9839cf -> 21b150569 [SPARK-7787] [STREAMING] Fix serialization issue of SerializableAWSCredentials Lack of default constructor causes deserialization to fail. This occurs only when the AWS credentials are explicitly specified

spark git commit: [SPARK-7787] [STREAMING] Fix serialization issue of SerializableAWSCredentials

2015-05-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master 8730fbb47 -> 4b7ff3092 [SPARK-7787] [STREAMING] Fix serialization issue of SerializableAWSCredentials Lack of default constructor causes deserialization to fail. This occurs only when the AWS credentials are explicitly specified through

spark git commit: [SPARK-7749] [SQL] Fixes partition discovery for non-partitioned tables

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 13348e21b -> 8730fbb47 [SPARK-7749] [SQL] Fixes partition discovery for non-partitioned tables When no partition columns can be found, we should have an empty `PartitionSpec`, rather than a `PartitionSpec` with empty partition columns.

spark git commit: [SPARK-7749] [SQL] Fixes partition discovery for non-partitioned tables

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 b97a8053a -> 70d9839cf [SPARK-7749] [SQL] Fixes partition discovery for non-partitioned tables When no partition columns can be found, we should have an empty `PartitionSpec`, rather than a `PartitionSpec` with empty partition columns.

spark git commit: [SPARK-7394][SQL] Add Pandas style cast (astype)

2015-05-21 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.4 0df461e08 -> fec3041a6 [SPARK-7394][SQL] Add Pandas style cast (astype) Author: kaka1992 kaka_1...@163.com Closes #6313 from kaka1992/astype and squashes the following commits: 73dfd0b [kaka1992] [SPARK-7394] Add Pandas style cast

spark git commit: [SPARK-7478] [SQL] Added SQLContext.getOrCreate

2015-05-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master 30f3f556f -> 3d085 [SPARK-7478] [SQL] Added SQLContext.getOrCreate Having a SQLContext singleton would make it easier for applications to use a lazily instantiated single shared instance of SQLContext when needed. It would avoid
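
The "lazily instantiated single shared instance" the commit describes is the classic get-or-create singleton. A hedged pure-Python sketch of the pattern (this `Context` class is illustrative, not Spark's `SQLContext`):

```python
import threading

class Context:
    """Sketch of the getOrCreate pattern: one lazily created shared instance."""
    _instance = None
    _lock = threading.Lock()

    @classmethod
    def get_or_create(cls):
        # Double-checked locking: a cheap fast path once the instance exists,
        # and a locked slow path so concurrent first callers create only one.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance

a = Context.get_or_create()
b = Context.get_or_create()
print(a is b)  # True: every caller shares the same instance
```

This is what lets library code obtain a context without forcing the application to thread one through every call site.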

spark git commit: [SPARK-7585] [ML] [DOC] VectorIndexer user guide section

2015-05-21 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.4 e79ecc7dc -> e29b811ed [SPARK-7585] [ML] [DOC] VectorIndexer user guide section Added VectorIndexer section to ML user guide. Also added javaCategoryMaps() method and Java unit test for it. CC: mengxr Author: Joseph K. Bradley

spark git commit: [SPARK-7722] [STREAMING] Added Kinesis to style checker

2015-05-21 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.4 7e0912b1d -> 33e0e [SPARK-7722] [STREAMING] Added Kinesis to style checker Author: Tathagata Das tathagata.das1...@gmail.com Closes #6325 from tdas/SPARK-7722 and squashes the following commits: 9ab35b2 [Tathagata Das] Fixed

spark git commit: [SPARK-7722] [STREAMING] Added Kinesis to style checker

2015-05-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master cdc7c055c -> 311fab6f1 [SPARK-7722] [STREAMING] Added Kinesis to style checker Author: Tathagata Das tathagata.das1...@gmail.com Closes #6325 from tdas/SPARK-7722 and squashes the following commits: 9ab35b2 [Tathagata Das] Fixed styles in

spark git commit: [SPARK-7498] [MLLIB] add varargs back to setDefault

2015-05-21 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.4 e29b811ed -> 7e0912b1d [SPARK-7498] [MLLIB] add varargs back to setDefault We removed `varargs` due to Java compilation issues. That was a false alarm because I didn't run `build/sbt clean`. So this PR reverts the changes. jkbradley

spark git commit: [SPARK-7498] [MLLIB] add varargs back to setDefault

2015-05-21 Thread meng
Repository: spark Updated Branches: refs/heads/master 6d75ed7e5 -> cdc7c055c [SPARK-7498] [MLLIB] add varargs back to setDefault We removed `varargs` due to Java compilation issues. That was a false alarm because I didn't run `build/sbt clean`. So this PR reverts the changes. jkbradley

spark git commit: [SPARK-7763] [SPARK-7616] [SQL] Persists partition columns into metastore

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 33e0e -> 96c82515b [SPARK-7763] [SPARK-7616] [SQL] Persists partition columns into metastore Author: Yin Huai yh...@databricks.com Author: Cheng Lian l...@databricks.com Closes #6285 from liancheng/spark-7763 and squashes the

spark git commit: [SPARK-7763] [SPARK-7616] [SQL] Persists partition columns into metastore

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 311fab6f1 -> 30f3f556f [SPARK-7763] [SPARK-7616] [SQL] Persists partition columns into metastore Author: Yin Huai yh...@databricks.com Author: Cheng Lian l...@databricks.com Closes #6285 from liancheng/spark-7763 and squashes the following

spark git commit: [SPARK-7711] Add a startTime property to match the corresponding one in Scala

2015-05-21 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.4 e597692ac -> c9a80fc40 [SPARK-7711] Add a startTime property to match the corresponding one in Scala Author: Holden Karau hol...@pigscanfly.ca Closes #6275 from holdenk/SPARK-771-startTime-is-missing-from-pyspark and squashes the

spark git commit: [SPARK-7478] [SQL] Added SQLContext.getOrCreate

2015-05-21 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.4 96c82515b -> e597692ac [SPARK-7478] [SQL] Added SQLContext.getOrCreate Having a SQLContext singleton would make it easier for applications to use a lazily instantiated single shared instance of SQLContext when needed. It would avoid

spark git commit: [SPARK-7711] Add a startTime property to match the corresponding one in Scala

2015-05-21 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 3d085 -> 6b18cdc1b [SPARK-7711] Add a startTime property to match the corresponding one in Scala Author: Holden Karau hol...@pigscanfly.ca Closes #6275 from holdenk/SPARK-771-startTime-is-missing-from-pyspark and squashes the

spark git commit: [SPARK-7718] [SQL] Speed up partitioning by avoiding closure cleaning

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 c9a80fc40 -> ba04b5236 [SPARK-7718] [SQL] Speed up partitioning by avoiding closure cleaning According to yhuai we spent 6-7 seconds cleaning closures in a partitioning job that takes 12 seconds. Since we provide these closures in

spark git commit: [SPARK-7585] [ML] [DOC] VectorIndexer user guide section

2015-05-21 Thread meng
Repository: spark Updated Branches: refs/heads/master 15680aeed -> 6d75ed7e5 [SPARK-7585] [ML] [DOC] VectorIndexer user guide section Added VectorIndexer section to ML user guide. Also added javaCategoryMaps() method and Java unit test for it. CC: mengxr Author: Joseph K. Bradley

spark git commit: [SPARK-7718] [SQL] Speed up partitioning by avoiding closure cleaning

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 6b18cdc1b -> 5287eec5a [SPARK-7718] [SQL] Speed up partitioning by avoiding closure cleaning According to yhuai we spent 6-7 seconds cleaning closures in a partitioning job that takes 12 seconds. Since we provide these closures in Spark we

spark git commit: [BUILD] Always run SQL tests in master build.

2015-05-21 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.4 e4489c36d -> 2be72c99a [BUILD] Always run SQL tests in master build. Seems our master build does not run HiveCompatibilitySuite (because _RUN_SQL_TESTS is not set). This PR introduces a property `AMP_JENKINS_PRB` to differentiate a PR

spark git commit: [BUILD] Always run SQL tests in master build.

2015-05-21 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 5a3c04bb9 -> 147b6be3b [BUILD] Always run SQL tests in master build. Seems our master build does not run HiveCompatibilitySuite (because _RUN_SQL_TESTS is not set). This PR introduces a property `AMP_JENKINS_PRB` to differentiate a PR

spark git commit: [SPARK-7800] isDefined should not marked too early in putNewKey

2015-05-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master 5287eec5a -> 5a3c04bb9 [SPARK-7800] isDefined should not marked too early in putNewKey JIRA: https://issues.apache.org/jira/browse/SPARK-7800 `isDefined` is marked as true twice in `Location.putNewKey`. The first one is unnecessary and

spark git commit: [SPARK-7800] isDefined should not marked too early in putNewKey

2015-05-21 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.4 ba04b5236 -> e4489c36d [SPARK-7800] isDefined should not marked too early in putNewKey JIRA: https://issues.apache.org/jira/browse/SPARK-7800 `isDefined` is marked as true twice in `Location.putNewKey`. The first one is unnecessary

spark git commit: [SPARK-7737] [SQL] Use leaf dirs having data files to discover partitions.

2015-05-21 Thread lian
Repository: spark Updated Branches: refs/heads/master 147b6be3b -> 347b50106 [SPARK-7737] [SQL] Use leaf dirs having data files to discover partitions. https://issues.apache.org/jira/browse/SPARK-7737 cc liancheng Author: Yin Huai yh...@databricks.com Closes #6329 from yhuai/spark-7737 and

spark git commit: [SPARK-7737] [SQL] Use leaf dirs having data files to discover partitions.

2015-05-21 Thread lian
Repository: spark Updated Branches: refs/heads/branch-1.4 2be72c99a -> 11a0640db [SPARK-7737] [SQL] Use leaf dirs having data files to discover partitions. https://issues.apache.org/jira/browse/SPARK-7737 cc liancheng Author: Yin Huai yh...@databricks.com Closes #6329 from yhuai/spark-7737

spark git commit: [SPARK-7776] [STREAMING] Added shutdown hook to StreamingContext

2015-05-21 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.4 ba620d62f -> a17a5cb30 [SPARK-7776] [STREAMING] Added shutdown hook to StreamingContext Shutdown hook to stop SparkContext was added recently. This results in ugly errors when a streaming application is terminated by ctrl-C. ```

spark git commit: [DOCS] [MLLIB] Fixing broken link in MLlib Linear Methods documentation.

2015-05-21 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.4 df55a0d76 -> 2cc7907d7 [DOCS] [MLLIB] Fixing broken link in MLlib Linear Methods documentation. Just a small change: fixed a broken link in the MLlib Linear Methods documentation by removing a newline character between the link title

spark git commit: [SPARK-7783] [SQL] [PySpark] add DataFrame.rollup/cube in Python

2015-05-21 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.4 11a0640db -> ba620d62f [SPARK-7783] [SQL] [PySpark] add DataFrame.rollup/cube in Python Author: Davies Liu dav...@databricks.com Closes #6311 from davies/rollup and squashes the following commits: 0261db1 [Davies Liu] use @since

spark git commit: [SPARK-7783] [SQL] [PySpark] add DataFrame.rollup/cube in Python

2015-05-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master d68ea24d6 -> 17791a581 [SPARK-7783] [SQL] [PySpark] add DataFrame.rollup/cube in Python Author: Davies Liu dav...@databricks.com Closes #6311 from davies/rollup and squashes the following commits: 0261db1 [Davies Liu] use @since a51ca6b
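
`rollup` and `cube` are multi-dimensional aggregations defined by the grouping sets they expand to. A pure-Python sketch of just that expansion (an illustration of the semantics, not PySpark's implementation):

```python
from itertools import combinations

def rollup_sets(cols):
    # rollup(a, b) aggregates by (a, b), then (a,), then the grand total ().
    # It drops columns from the right, giving hierarchical subtotals.
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_sets(cols):
    # cube(a, b) aggregates by every subset of the columns: (a, b), (a,),
    # (b,), and the grand total () -- 2**n grouping sets for n columns.
    return [c for r in range(len(cols), -1, -1) for c in combinations(cols, r)]

print(rollup_sets(["year", "month"]))  # [('year', 'month'), ('year',), ()]
print(len(cube_sets(["year", "month"])))  # 4
```

The Python API added here mirrors the Scala `DataFrame.rollup`/`cube` methods, which produce one aggregated row group per such grouping set.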

spark git commit: [SPARK-7219] [MLLIB] Output feature attributes in HashingTF

2015-05-21 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master f5db4b416 -> 85b96372c [SPARK-7219] [MLLIB] Output feature attributes in HashingTF This PR updates `HashingTF` to output ML attributes that tell the number of features in the output column. We need to expand `UnaryTransformer` to support

spark git commit: [SPARK-7219] [MLLIB] Output feature attributes in HashingTF

2015-05-21 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-1.4 ef9336335 -> df55a0d76 [SPARK-7219] [MLLIB] Output feature attributes in HashingTF This PR updates `HashingTF` to output ML attributes that tell the number of features in the output column. We need to expand `UnaryTransformer` to

spark git commit: [SPARK-7657] [YARN] Add driver logs links in application UI, in cluster mode.

2015-05-21 Thread irashid
Repository: spark Updated Branches: refs/heads/master 85b96372c -> 956c4c910 [SPARK-7657] [YARN] Add driver logs links in application UI, in cluster mode. This PR adds the URLs to the driver logs to `SparkListenerApplicationStarted` event, which is later used by the `ExecutorsListener` to

spark git commit: [SPARK-7794] [MLLIB] update RegexTokenizer default settings

2015-05-21 Thread meng
Repository: spark Updated Branches: refs/heads/master 17791a581 -> f5db4b416 [SPARK-7794] [MLLIB] update RegexTokenizer default settings The previous default is `{gaps: false, pattern: \\p{L}+|[^\\p{L}\\s]+}`. The default pattern is hard to understand. This PR changes the default to `{gaps:

spark git commit: [SPARK-7794] [MLLIB] update RegexTokenizer default settings

2015-05-21 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.4 a17a5cb30 -> ef9336335 [SPARK-7794] [MLLIB] update RegexTokenizer default settings The previous default is `{gaps: false, pattern: \\p{L}+|[^\\p{L}\\s]+}`. The default pattern is hard to understand. This PR changes the default to

spark git commit: [SPARK-7793] [MLLIB] Use getOrElse for getting the threshold of SVM model

2015-05-21 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.4 fec3041a6 -> f6a29c72c [SPARK-7793] [MLLIB] Use getOrElse for getting the threshold of SVM model same issue and fix as in Spark-7694. Author: Shuo Xiang shuoxiang...@gmail.com Closes #6321 from coderxiang/nb and squashes the following
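
The fix replaces a bare `Option.get` on the model's optional threshold with `getOrElse`, so a cleared threshold falls back to a default instead of throwing. The Python analog of that pattern (the function name and the 0.0 default here are illustrative, not MLlib's values):

```python
def predict_label(score, threshold=None):
    # Scala's `threshold.getOrElse(default)` corresponds to this None check.
    # Reading the optional with a bare `.get` when the threshold had been
    # cleared would throw -- the class of bug the fix addresses.
    t = threshold if threshold is not None else 0.0
    return 1.0 if score > t else 0.0

print(predict_label(0.3))        # no threshold set: falls back to the default
print(predict_label(0.3, 0.5))   # explicit threshold wins
```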

spark git commit: [SQL] [TEST] udf_java_method failed due to jdk version

2015-05-21 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 4f572008f -> f6c486aa4 [SQL] [TEST] udf_java_method failed due to jdk version java.lang.Math.exp(1.0) has different result between jdk versions. so do not use createQueryTest, write a separate test for it. ``` jdk version result

spark git commit: [SPARK-7775] YARN AM negative sleep exception

2015-05-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master f6c486aa4 -> 15680aeed [SPARK-7775] YARN AM negative sleep exception ``` SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Exception in

spark git commit: [SPARK-7565] [SQL] fix MapType in JsonRDD

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master feb3a9d3f -> a25c1ab8f [SPARK-7565] [SQL] fix MapType in JsonRDD The key of Map in JsonRDD should be converted into UTF8String (also failed records), Thanks to yhuai viirya Closes #6084 Author: Davies Liu dav...@databricks.com Closes

spark git commit: [SPARK-7565] [SQL] fix MapType in JsonRDD

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 f0e421351 -> 3aa618510 [SPARK-7565] [SQL] fix MapType in JsonRDD The key of Map in JsonRDD should be converted into UTF8String (also failed records), Thanks to yhuai viirya Closes #6084 Author: Davies Liu dav...@databricks.com

spark git commit: [SPARK-7320] [SQL] [Minor] Move the testData into beforeAll()

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 1ee8eb431 -> feb3a9d3f [SPARK-7320] [SQL] [Minor] Move the testData into beforeAll() Follow up of #6340, to avoid the test report missing once it fails. Author: Cheng Hao hao.ch...@intel.com Closes #6312 from chenghao-intel/rollup_minor

spark git commit: [SPARK-7320] [SQL] [Minor] Move the testData into beforeAll()

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 f08c6f319 -> f0e421351 [SPARK-7320] [SQL] [Minor] Move the testData into beforeAll() Follow up of #6340, to avoid the test report missing once it fails. Author: Cheng Hao hao.ch...@intel.com Closes #6312 from

spark git commit: [SPARK-7752] [MLLIB] Use lowercase letters for NaiveBayes.modelType

2015-05-21 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master a25c1ab8f -> 13348e21b [SPARK-7752] [MLLIB] Use lowercase letters for NaiveBayes.modelType to be consistent with other string names in MLlib. This PR also updates the implementation to use vals instead of hardcoded strings. jkbradley

spark git commit: [SPARK-7752] [MLLIB] Use lowercase letters for NaiveBayes.modelType

2015-05-21 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-1.4 3aa618510 -> b97a8053a [SPARK-7752] [MLLIB] Use lowercase letters for NaiveBayes.modelType to be consistent with other string names in MLlib. This PR also updates the implementation to use vals instead of hardcoded strings. jkbradley