spark git commit: [SPARK-16772][PYTHON][DOCS] Fix API doc references to UDFRegistration + Update "important classes"

2016-08-05 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 b5d65b45d -> 90e046024 [SPARK-16772][PYTHON][DOCS] Fix API doc references to UDFRegistration + Update "important classes" ## Proposed Changes * Update the list of "important classes" in `pyspark.sql` to match 2.0. * Fix references to

spark git commit: [SPARK-16772][PYTHON][DOCS] Fix API doc references to UDFRegistration + Update "important classes"

2016-08-05 Thread srowen
Repository: spark Updated Branches: refs/heads/master 14dba4520 -> 2dd038861 [SPARK-16772][PYTHON][DOCS] Fix API doc references to UDFRegistration + Update "important classes" ## Proposed Changes * Update the list of "important classes" in `pyspark.sql` to match 2.0. * Fix references to `UDF
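
For readers unfamiliar with the class the docs now reference: `SparkSession.udf` returns a `UDFRegistration`, through which functions become callable from SQL. A minimal Scala sketch (the session setup and function are illustrative, not from the patch):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("udf-sketch").master("local[*]").getOrCreate()

// spark.udf is a UDFRegistration; register() exposes the function to SQL by name
spark.udf.register("plusOne", (x: Int) => x + 1)
spark.sql("SELECT plusOne(41) AS answer").show()
```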

spark git commit: [SPARK-16796][WEB UI] Mask spark.authenticate.secret on Spark environ…

2016-08-05 Thread srowen
Repository: spark Updated Branches: refs/heads/master 55d6dad6f -> 14dba4520 [SPARK-16796][WEB UI] Mask spark.authenticate.secret on Spark environ… ## What changes were proposed in this pull request? Mask `spark.authenticate.secret` on Spark environment page (Web UI). This is in addition to ht
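
A minimal sketch of the masking idea, assuming a plain key/value listing of the environment (the variable names are hypothetical, not the patch's code):

```scala
// Replace the sensitive value before it is rendered on the environment page.
val envEntries: Seq[(String, String)] = Seq(
  "spark.master" -> "local[*]",
  "spark.authenticate.secret" -> "s3cr3t")

val masked = envEntries.map {
  case (key, _) if key == "spark.authenticate.secret" => (key, "*********(redacted)")
  case other => other
}
```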

spark git commit: [SPARK-16847][SQL] Prevent potentially reading corrupt statistics on binary in Parquet vectorized reader

2016-08-05 Thread srowen
Repository: spark Updated Branches: refs/heads/master e679bc3c1 -> 55d6dad6f [SPARK-16847][SQL] Prevent potentially reading corrupt statistics on binary in Parquet vectorized reader ## What changes were proposed in this pull request? This problem was found in [PARQUET-251](https://issues.apa
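
An illustration of the underlying principle rather than Spark's actual reader code: min/max statistics may only be used to skip data when they are trustworthy, and the binary statistics affected by PARQUET-251 are not.

```scala
// Hypothetical model of a column's statistics, for this sketch only.
case class ColumnStats(min: String, max: String, trusted: Boolean)

// Pruning with untrusted statistics could silently drop matching rows,
// so an untrusted column must always be read in full.
def mayContain(stats: ColumnStats, key: String): Boolean =
  !stats.trusted || (stats.min <= key && key <= stats.max)
```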

spark git commit: [SPARK-16901] Hive settings in hive-site.xml may be overridden by Hive's default values

2016-08-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 d99d90982 -> b5d65b45d [SPARK-16901] Hive settings in hive-site.xml may be overridden by Hive's default values ## What changes were proposed in this pull request? When we create the HiveConf for metastore client, we use a Hadoop Conf a

spark git commit: [SPARK-16901] Hive settings in hive-site.xml may be overridden by Hive's default values

2016-08-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 6cbde337a -> e679bc3c1 [SPARK-16901] Hive settings in hive-site.xml may be overridden by Hive's default values ## What changes were proposed in this pull request? When we create the HiveConf for metastore client, we use a Hadoop Conf as th
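
The precedence rule the fix enforces, sketched with plain maps (the keys and values are illustrative): settings read from hive-site.xml must be applied on top of Hive's built-in defaults so that the user's values win.

```scala
val hiveDefaults = Map("hive.metastore.warehouse.dir" -> "/user/hive/warehouse")
val hiveSiteXml  = Map("hive.metastore.warehouse.dir" -> "/data/warehouse")

// Applying the hive-site.xml entries last means they override the defaults.
val effective = hiveDefaults ++ hiveSiteXml
assert(effective("hive.metastore.warehouse.dir") == "/data/warehouse")
```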

spark git commit: [SPARK-16750][FOLLOW-UP][ML] Add transformSchema for StringIndexer/VectorAssembler and fix failed tests.

2016-08-05 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 7fbac48f0 -> d99d90982 [SPARK-16750][FOLLOW-UP][ML] Add transformSchema for StringIndexer/VectorAssembler and fix failed tests. ## What changes were proposed in this pull request? This is follow-up for #14378. When we add ```transformS

spark git commit: [SPARK-16750][FOLLOW-UP][ML] Add transformSchema for StringIndexer/VectorAssembler and fix failed tests.

2016-08-05 Thread srowen
Repository: spark Updated Branches: refs/heads/master 1f96c97f2 -> 6cbde337a [SPARK-16750][FOLLOW-UP][ML] Add transformSchema for StringIndexer/VectorAssembler and fix failed tests. ## What changes were proposed in this pull request? This is follow-up for #14378. When we add ```transformSchem
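
What `transformSchema` buys a stage, in a small Scala example (the schema and column names are made up for illustration): it validates the input columns and reports the output schema without touching any data.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

val schema = StructType(Seq(
  StructField("a", DoubleType), StructField("b", DoubleType)))

val assembler = new VectorAssembler()
  .setInputCols(Array("a", "b"))
  .setOutputCol("features")

// Fails fast on a bad schema and reports the output schema, with no data needed.
println(assembler.transformSchema(schema))
```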

spark git commit: [SPARK-13238][CORE] Add ganglia dmax parameter

2016-08-05 Thread vanzin
Repository: spark Updated Branches: refs/heads/master 180fd3e0a -> 1f96c97f2 [SPARK-13238][CORE] Add ganglia dmax parameter The current Ganglia reporter doesn't set a metric expiration time (dmax), so the metrics of all finished applications remain displayed indefinitely in the Ganglia web UI. The dma
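
A hypothetical metrics.properties fragment showing where such a parameter would live (host, port, and value are placeholders): dmax is the number of seconds after which Ganglia expires a metric that has stopped updating.

```
*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.host=ganglia.example.com
*.sink.ganglia.port=8649
# Expire metrics not updated for an hour, so finished apps disappear.
*.sink.ganglia.dmax=3600
```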

[2/5] spark git commit: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

2016-08-05 Thread srowen
http://git-wip-us.apache.org/repos/asf/spark/blob/180fd3e0/examples/src/main/java/org/apache/spark/examples/JavaPageRank.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/JavaPageRank.java b/examples/src/main/j

[3/5] spark git commit: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

2016-08-05 Thread srowen
http://git-wip-us.apache.org/repos/asf/spark/blob/180fd3e0/data/mllib/sample_tree_data.csv -- diff --git a/data/mllib/sample_tree_data.csv b/data/mllib/sample_tree_data.csv deleted file mode 100644 index bc97e29..000 --- a/data

[5/5] spark git commit: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

2016-08-05 Thread srowen
[SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs ## What changes were proposed in this pull request? Improve example outputs to better reflect the functionality that is being presented. This mostly consisted of modifying what was printed at the end of the example, such as calling show() w
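
In the spirit of the change, a tiny Scala sketch (the data is made up, assuming a local session): end the example by showing the produced result, so the console output reflects what the code actually did.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val df = spark.createDataFrame(Seq((0, "spark"), (1, "flink"))).toDF("id", "word")

// Print the actual contents, untruncated, instead of just building the frame.
df.show(truncate = false)
```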

[1/5] spark git commit: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

2016-08-05 Thread srowen
Repository: spark Updated Branches: refs/heads/master 2460f03ff -> 180fd3e0a http://git-wip-us.apache.org/repos/asf/spark/blob/180fd3e0/examples/src/main/scala/org/apache/spark/examples/ml/NormalizerExample.scala -- diff --git

[4/5] spark git commit: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

2016-08-05 Thread srowen
http://git-wip-us.apache.org/repos/asf/spark/blob/180fd3e0/data/mllib/lr_data.txt -- diff --git a/data/mllib/lr_data.txt b/data/mllib/lr_data.txt deleted file mode 100644 index d4df063..000 --- a/data/mllib/lr_data.txt +++ /dev

spark git commit: [SPARK-16826][SQL] Switch to java.net.URI for parse_url()

2016-08-05 Thread srowen
Repository: spark Updated Branches: refs/heads/master 39a2b2ea7 -> 2460f03ff [SPARK-16826][SQL] Switch to java.net.URI for parse_url() ## What changes were proposed in this pull request? The java.net.URL class has a globally synchronized Hashtable, which limits the throughput of any single ex
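
The function's behavior is unchanged by the switch; assuming a SparkSession named `spark`, usage still looks like:

```scala
// Extract the host component; after this patch the parsing is backed by
// java.net.URI, avoiding java.net.URL's synchronized protocol-handler table.
spark.sql("SELECT parse_url('https://spark.apache.org/docs?x=1', 'HOST')").show()
// -> spark.apache.org
```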

spark git commit: [SPARK-16625][SQL] General data types to be mapped to Oracle

2016-08-05 Thread srowen
Repository: spark Updated Branches: refs/heads/master e02606414 -> 39a2b2ea7 [SPARK-16625][SQL] General data types to be mapped to Oracle ## What changes were proposed in this pull request? Spark will convert **BooleanType** to **BIT(1)**, **LongType** to **BIGINT**, **ByteType** to **BYTE*
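
For context, this is the kind of mapping a JDBC dialect supplies; a hedged sketch of an Oracle-style dialect (the NUMBER precisions are illustrative, not necessarily the patch's exact choices):

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcType}
import org.apache.spark.sql.types.{BooleanType, DataType, LongType}

object OracleDialectSketch extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  // Oracle has no BIT/BIGINT, so map these Catalyst types to NUMBER instead.
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case BooleanType => Some(JdbcType("NUMBER(1)", java.sql.Types.NUMERIC))
    case LongType    => Some(JdbcType("NUMBER(19)", java.sql.Types.NUMERIC))
    case _           => None
  }
}
```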

spark git commit: [MINOR] Update AccumulatorV2 doc to not mention "+=".

2016-08-05 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 b4a89c1c1 -> 7fbac48f0 [MINOR] Update AccumulatorV2 doc to not mention "+=". ## What changes were proposed in this pull request? As reported by Bryan Cutler on the mailing list, AccumulatorV2 does not have a += method, yet the document

spark git commit: [MINOR] Update AccumulatorV2 doc to not mention "+=".

2016-08-05 Thread srowen
Repository: spark Updated Branches: refs/heads/master c9f2501af -> e02606414 [MINOR] Update AccumulatorV2 doc to not mention "+=". ## What changes were proposed in this pull request? As reported by Bryan Cutler on the mailing list, AccumulatorV2 does not have a += method, yet the documentatio
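
The documented usage, assuming a SparkSession named `spark`: updates go through `add()`, since `AccumulatorV2` defines no `+=` operator.

```scala
val acc = spark.sparkContext.longAccumulator("events")
spark.sparkContext.parallelize(1 to 100).foreach(_ => acc.add(1L))
println(acc.value) // 100
```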

spark git commit: [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 integration

2016-08-05 Thread srowen
Repository: spark Updated Branches: refs/heads/master 5effc016c -> c9f2501af [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 integration ## What changes were proposed in this pull request? Doc for the Kafka 0.10 integration ## How was this patch tested? Scala code examples were taken

spark git commit: [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 integration

2016-08-05 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 dae08fb5a -> b4a89c1c1 [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 integration ## What changes were proposed in this pull request? Doc for the Kafka 0.10 integration ## How was this patch tested? Scala code examples were ta
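
The shape of the new 0.10 consumer API that the doc covers, in Scala (broker address, group id, and topic name are placeholders):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

val ssc = new StreamingContext(new SparkConf().setAppName("kafka010"), Seconds(5))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean))

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

stream.map(record => (record.key, record.value)).print()
```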

spark git commit: [SPARK-16879][SQL] unify logical plans for CREATE TABLE and CTAS

2016-08-05 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/master faaefab26 -> 5effc016c [SPARK-16879][SQL] unify logical plans for CREATE TABLE and CTAS ## What changes were proposed in this pull request? we have various logical plans for CREATE TABLE and CTAS: `CreateTableUsing`, `CreateTableUsingAsSe
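
One of the statement forms whose plans get unified, assuming a SparkSession named `spark` (table name and query are illustrative):

```scala
// CREATE TABLE ... USING ... AS SELECT (CTAS) in the DataSource syntax.
spark.sql("CREATE TABLE ints USING parquet AS SELECT id FROM range(10)")
```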

spark git commit: [SPARK-15726][SQL] Make DatasetBenchmark fairer among Dataset, DataFrame and RDD

2016-08-05 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 1fa644497 -> faaefab26 [SPARK-15726][SQL] Make DatasetBenchmark fairer among Dataset, DataFrame and RDD ## What changes were proposed in this pull request? DatasetBenchmark compares the performance of RDD, DataFrame and Dataset while run
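
The comparison's shape, as a hedged sketch assuming a SparkSession named `spark` (sizes and the per-record operation are illustrative): the same work expressed through each of the three APIs.

```scala
import spark.implicits._

val n = 1000000L
val rdd = spark.sparkContext.range(0, n).map(_ + 1) // RDD
val df  = spark.range(n).select($"id" + 1)          // DataFrame
val ds  = spark.range(n).map(_ + 1)                 // Dataset
```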