spark git commit: [MINOR][DOC] Update the building doc to use Maven 3.5.4 and Java 8 only
Repository: spark Updated Branches: refs/heads/branch-2.4 d6a02c568 -> 869242c6b [MINOR][DOC] Update the building doc to use Maven 3.5.4 and Java 8 only ## What changes were proposed in this pull request? Since we didn't test Java 9 ~ 11 up to now in the community, fix the document to describe Java 8 only. ## How was this patch tested? N/A (This is a document only change.) Closes #22781 from dongjoon-hyun/SPARK-JDK-DOC. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit fc9ba9dcc6ad47fbd05f093b94e7e1358d5f) Signed-off-by: Dongjoon Hyun Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/869242c6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/869242c6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/869242c6 Branch: refs/heads/branch-2.4 Commit: 869242c6b8008c30b7e527760df48d7cb8df4593 Parents: d6a02c5 Author: Dongjoon Hyun Authored: Fri Oct 19 23:56:40 2018 -0700 Committer: Dongjoon Hyun Committed: Fri Oct 19 23:56:53 2018 -0700 -- docs/building-spark.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/869242c6/docs/building-spark.md -- diff --git a/docs/building-spark.md b/docs/building-spark.md index 1501f0b..7b9697c 100644 --- a/docs/building-spark.md +++ b/docs/building-spark.md @@ -12,7 +12,7 @@ redirect_from: "building-with-maven.html" ## Apache Maven The Maven-based build is the build of reference for Apache Spark. -Building Spark using Maven requires Maven 3.3.9 or newer and Java 8+. +Building Spark using Maven requires Maven 3.5.4 and Java 8. Note that support for Java 7 was removed as of Spark 2.2.0. ### Setting up Maven's Memory Usage - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR][DOC] Update the building doc to use Maven 3.5.4 and Java 8 only
Repository: spark Updated Branches: refs/heads/master ed9d0aac9 -> fc9ba9dcc [MINOR][DOC] Update the building doc to use Maven 3.5.4 and Java 8 only ## What changes were proposed in this pull request? Since we didn't test Java 9 ~ 11 up to now in the community, fix the document to describe Java 8 only. ## How was this patch tested? N/A (This is a document only change.) Closes #22781 from dongjoon-hyun/SPARK-JDK-DOC. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fc9ba9dc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fc9ba9dc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fc9ba9dc Branch: refs/heads/master Commit: fc9ba9dcc6ad47fbd05f093b94e7e1358d5f Parents: ed9d0aa Author: Dongjoon Hyun Authored: Fri Oct 19 23:56:40 2018 -0700 Committer: Dongjoon Hyun Committed: Fri Oct 19 23:56:40 2018 -0700 -- docs/building-spark.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/fc9ba9dc/docs/building-spark.md -- diff --git a/docs/building-spark.md b/docs/building-spark.md index 6bcc30d..8af90db 100644 --- a/docs/building-spark.md +++ b/docs/building-spark.md @@ -12,7 +12,7 @@ redirect_from: "building-with-maven.html" ## Apache Maven The Maven-based build is the build of reference for Apache Spark. -Building Spark using Maven requires Maven 3.3.9 or newer and Java 8+. +Building Spark using Maven requires Maven 3.5.4 and Java 8. Note that support for Java 7 was removed as of Spark 2.2.0. ### Setting up Maven's Memory Usage - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-24499][SQL][DOC][FOLLOWUP] Fix some broken links
Repository: spark Updated Branches: refs/heads/master 3370865b0 -> ed9d0aac9 [SPARK-24499][SQL][DOC][FOLLOWUP] Fix some broken links ## What changes were proposed in this pull request? Fix some broken links in the new document. I have clicked through all the links. Hopefully i haven't missed any :-) ## How was this patch tested? Built using jekyll and verified the links. Closes #22772 from dilipbiswal/doc_check. Authored-by: Dilip Biswal Signed-off-by: gatorsmile Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ed9d0aac Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ed9d0aac Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ed9d0aac Branch: refs/heads/master Commit: ed9d0aac905136375444c1e00a2a9a0822b264aa Parents: 3370865 Author: Dilip Biswal Authored: Fri Oct 19 23:55:19 2018 -0700 Committer: gatorsmile Committed: Fri Oct 19 23:55:19 2018 -0700 -- docs/sql-data-sources.md | 4 ++-- docs/sql-programming-guide.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ed9d0aac/docs/sql-data-sources.md -- diff --git a/docs/sql-data-sources.md b/docs/sql-data-sources.md index aa607ec..636636a 100644 --- a/docs/sql-data-sources.md +++ b/docs/sql-data-sources.md @@ -16,8 +16,8 @@ goes into specific options that are available for the built-in data sources. * [Manually Specifying Options](sql-data-sources-load-save-functions.html#manually-specifying-options) * [Run SQL on files directly](sql-data-sources-load-save-functions.html#run-sql-on-files-directly) * [Save Modes](sql-data-sources-load-save-functions.html#save-modes) - * [Saving to Persistent Tables](sql-data-sources-load-save-functions.html#run-sql-on-files-directly) - * [Bucketing, Sorting and Partitioning](sql-data-sources-load-save-functions.html#run-sql-on-files-directly) + * [Saving to Persistent Tables](sql-data-sources-load-save-functions.html#saving-to-persistent-tables) + * [Bucketing, Sorting and Partitioning](sql-data-sources-load-save-functions.html#bucketing-sorting-and-partitioning) * [Parquet Files](sql-data-sources-parquet.html) * [Loading Data Programmatically](sql-data-sources-parquet.html#loading-data-programmatically) * [Partition Discovery](sql-data-sources-parquet.html#partition-discovery) http://git-wip-us.apache.org/repos/asf/spark/blob/ed9d0aac/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index 42b00c9..eca8915 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -22,7 +22,7 @@ Spark SQL can also be used to read data from an existing Hive installation. For configure this feature, please refer to the [Hive Tables](sql-data-sources-hive-tables.html) section. When running SQL from within another programming language the results will be returned as a [Dataset/DataFrame](#datasets-and-dataframes). You can also interact with the SQL interface using the [command-line](sql-distributed-sql-engine.html#running-the-spark-sql-cli) -or over [JDBC/ODBC](#sql-distributed-sql-engine.html#running-the-thrift-jdbcodbc-server). +or over [JDBC/ODBC](sql-distributed-sql-engine.html#running-the-thrift-jdbcodbc-server). ## Datasets and DataFrames - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-24499][SQL][DOC][FOLLOWUP] Fix some broken links
Repository: spark Updated Branches: refs/heads/branch-2.4 e3a60b0d6 -> d6a02c568 [SPARK-24499][SQL][DOC][FOLLOWUP] Fix some broken links ## What changes were proposed in this pull request? Fix some broken links in the new document. I have clicked through all the links. Hopefully i haven't missed any :-) ## How was this patch tested? Built using jekyll and verified the links. Closes #22772 from dilipbiswal/doc_check. Authored-by: Dilip Biswal Signed-off-by: gatorsmile (cherry picked from commit ed9d0aac905136375444c1e00a2a9a0822b264aa) Signed-off-by: gatorsmile Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d6a02c56 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d6a02c56 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d6a02c56 Branch: refs/heads/branch-2.4 Commit: d6a02c5687b7fd566be493010c7efa246dbe410b Parents: e3a60b0 Author: Dilip Biswal Authored: Fri Oct 19 23:55:19 2018 -0700 Committer: gatorsmile Committed: Fri Oct 19 23:55:34 2018 -0700 -- docs/sql-data-sources.md | 4 ++-- docs/sql-programming-guide.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d6a02c56/docs/sql-data-sources.md -- diff --git a/docs/sql-data-sources.md b/docs/sql-data-sources.md index aa607ec..636636a 100644 --- a/docs/sql-data-sources.md +++ b/docs/sql-data-sources.md @@ -16,8 +16,8 @@ goes into specific options that are available for the built-in data sources. * [Manually Specifying Options](sql-data-sources-load-save-functions.html#manually-specifying-options) * [Run SQL on files directly](sql-data-sources-load-save-functions.html#run-sql-on-files-directly) * [Save Modes](sql-data-sources-load-save-functions.html#save-modes) - * [Saving to Persistent Tables](sql-data-sources-load-save-functions.html#run-sql-on-files-directly) - * [Bucketing, Sorting and Partitioning](sql-data-sources-load-save-functions.html#run-sql-on-files-directly) + * [Saving to Persistent Tables](sql-data-sources-load-save-functions.html#saving-to-persistent-tables) + * [Bucketing, Sorting and Partitioning](sql-data-sources-load-save-functions.html#bucketing-sorting-and-partitioning) * [Parquet Files](sql-data-sources-parquet.html) * [Loading Data Programmatically](sql-data-sources-parquet.html#loading-data-programmatically) * [Partition Discovery](sql-data-sources-parquet.html#partition-discovery) http://git-wip-us.apache.org/repos/asf/spark/blob/d6a02c56/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index 42b00c9..eca8915 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -22,7 +22,7 @@ Spark SQL can also be used to read data from an existing Hive installation. For configure this feature, please refer to the [Hive Tables](sql-data-sources-hive-tables.html) section. When running SQL from within another programming language the results will be returned as a [Dataset/DataFrame](#datasets-and-dataframes). You can also interact with the SQL interface using the [command-line](sql-distributed-sql-engine.html#running-the-spark-sql-cli) -or over [JDBC/ODBC](#sql-distributed-sql-engine.html#running-the-thrift-jdbcodbc-server). +or over [JDBC/ODBC](sql-distributed-sql-engine.html#running-the-thrift-jdbcodbc-server). ## Datasets and DataFrames - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r30175 - in /dev/spark/2.4.1-SNAPSHOT-2018_10_19_22_02-e3a60b0-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Sat Oct 20 05:16:47 2018 New Revision: 30175 Log: Apache Spark 2.4.1-SNAPSHOT-2018_10_19_22_02-e3a60b0 docs [This commit notification would consist of 1477 parts, which exceeds the limit of 50 parts, so it was shortened to this summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r30174 - in /dev/spark/3.0.0-SNAPSHOT-2018_10_19_20_02-3370865-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Sat Oct 20 03:16:56 2018 New Revision: 30174 Log: Apache Spark 3.0.0-SNAPSHOT-2018_10_19_20_02-3370865 docs [This commit notification would consist of 1483 parts, which exceeds the limit of 50 parts, so it was shortened to this summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-25785][SQL] Add prettyNames for from_json, to_json, from_csv, and schema_of_json
Repository: spark Updated Branches: refs/heads/master 4acbda4a9 -> 3370865b0 [SPARK-25785][SQL] Add prettyNames for from_json, to_json, from_csv, and schema_of_json ## What changes were proposed in this pull request? This PR adds `prettyNames` for `from_json`, `to_json`, `from_csv`, and `schema_of_json` so that appropriate names are used. ## How was this patch tested? Unit tests Closes #22773 from HyukjinKwon/minor-prettyNames. Authored-by: hyukjinkwon Signed-off-by: hyukjinkwon Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3370865b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3370865b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3370865b Branch: refs/heads/master Commit: 3370865b0ebe9b04c6671631aee5917b41ceba9c Parents: 4acbda4 Author: hyukjinkwon Authored: Sat Oct 20 10:15:53 2018 +0800 Committer: hyukjinkwon Committed: Sat Oct 20 10:15:53 2018 +0800 -- .../catalyst/expressions/csvExpressions.scala | 2 + .../catalyst/expressions/jsonExpressions.scala | 6 +++ .../sql-tests/results/csv-functions.sql.out | 4 +- .../sql-tests/results/json-functions.sql.out| 50 ++-- .../native/stringCastAndExpressions.sql.out | 2 +- 5 files changed, 36 insertions(+), 28 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3370865b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala index a63b624..853b1ea 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala @@ -117,4 +117,6 @@ case class CsvToStructs( } override def inputTypes: Seq[AbstractDataType] = StringType :: Nil + + override def prettyName: String = "from_csv" } http://git-wip-us.apache.org/repos/asf/spark/blob/3370865b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala index 9f28483..b4815b4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala @@ -610,6 +610,8 @@ case class JsonToStructs( case _: MapType => "entries" case _ => super.sql } + + override def prettyName: String = "from_json" } /** @@ -730,6 +732,8 @@ case class StructsToJson( override def nullSafeEval(value: Any): Any = converter(value) override def inputTypes: Seq[AbstractDataType] = TypeCollection(ArrayType, StructType) :: Nil + + override def prettyName: String = "to_json" } /** @@ -774,6 +778,8 @@ case class SchemaOfJson( UTF8String.fromString(dt.catalogString) } + + override def prettyName: String = "schema_of_json" } object JsonExprUtils { http://git-wip-us.apache.org/repos/asf/spark/blob/3370865b/sql/core/src/test/resources/sql-tests/results/csv-functions.sql.out -- diff --git a/sql/core/src/test/resources/sql-tests/results/csv-functions.sql.out b/sql/core/src/test/resources/sql-tests/results/csv-functions.sql.out index 15dbe36..f19f34a 100644 --- 
a/sql/core/src/test/resources/sql-tests/results/csv-functions.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/csv-functions.sql.out @@ -5,7 +5,7 @@ -- !query 0 select from_csv('1, 3.14', 'a INT, f FLOAT') -- !query 0 schema -struct> +struct> -- !query 0 output {"a":1,"f":3.14} @@ -13,7 +13,7 @@ struct> -- !query 1 select from_csv('26/08/2015', 'time Timestamp', map('timestampFormat', 'dd/MM/')) -- !query 1 schema -struct> +struct> -- !query 1 output {"time":2015-08-26 00:00:00.0} http://git-wip-us.apache.org/repos/asf/spark/blob/3370865b/sql/core/src/test/resources/sql-tests/results/json-functions.sql.out -- diff --git a/sql/core/src/test/resources/sql-tests/results/json-functions.sql.out b/sql/core/src/test/resources/sql-tests/results/json-functions.sql.out index 77e9000..868eee8 100644 --- a/sql/core/src/test/resources/sql-tests/results/json-functions.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/json-fun
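[Editor's note] For context, a minimal Scala sketch of what the new `prettyName`s mean for users: the auto-generated column name of an unaliased call should now read `from_json(value)` / `to_json(...)` rather than the expression's internal name. The SparkSession setup and sample data below are illustrative assumptions, not part of the patch.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{from_json, struct, to_json}
import org.apache.spark.sql.types.{IntegerType, StructType}

object PrettyNameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("pretty-names").getOrCreate()
    import spark.implicits._

    val schema = new StructType().add("a", IntegerType)

    // Unaliased from_json: with this change the generated column name is the
    // pretty name ("from_json(value)") instead of the internal expression name.
    Seq("""{"a": 1}""").toDF("value")
      .select(from_json($"value", schema))
      .printSchema()

    // Same idea for to_json on a struct column.
    Seq((1, 2)).toDF("a", "b")
      .select(to_json(struct($"a", $"b")))
      .printSchema()

    spark.stop()
  }
}
```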
spark git commit: Revert "[SPARK-25764][ML][EXAMPLES] Update BisectingKMeans example to use ClusteringEvaluator"
Repository: spark Updated Branches: refs/heads/branch-2.4 432697c7b -> e3a60b0d6 Revert "[SPARK-25764][ML][EXAMPLES] Update BisectingKMeans example to use ClusteringEvaluator" This reverts commit 36307b1e4b42ce22b07e7a3fc2679c4b5e7c34c8. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e3a60b0d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e3a60b0d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e3a60b0d Branch: refs/heads/branch-2.4 Commit: e3a60b0d63eec58adbb1aabb9640b049e40be3bf Parents: 432697c Author: Wenchen Fan Authored: Sat Oct 20 09:30:12 2018 +0800 Committer: Wenchen Fan Committed: Sat Oct 20 09:30:12 2018 +0800 -- .../spark/examples/ml/JavaBisectingKMeansExample.java | 12 +++- .../src/main/python/ml/bisecting_k_means_example.py | 12 +++- .../spark/examples/ml/BisectingKMeansExample.scala | 12 +++- 3 files changed, 9 insertions(+), 27 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e3a60b0d/examples/src/main/java/org/apache/spark/examples/ml/JavaBisectingKMeansExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaBisectingKMeansExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaBisectingKMeansExample.java index f517dc3..8c82aaa 100644 --- a/examples/src/main/java/org/apache/spark/examples/ml/JavaBisectingKMeansExample.java +++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaBisectingKMeansExample.java @@ -20,7 +20,6 @@ package org.apache.spark.examples.ml; // $example on$ import org.apache.spark.ml.clustering.BisectingKMeans; import org.apache.spark.ml.clustering.BisectingKMeansModel; -import org.apache.spark.ml.evaluation.ClusteringEvaluator; import org.apache.spark.ml.linalg.Vector; import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row; @@ -51,14 +50,9 @@ public class JavaBisectingKMeansExample { BisectingKMeans bkm = new BisectingKMeans().setK(2).setSeed(1); BisectingKMeansModel model = bkm.fit(dataset); -// Make predictions -Dataset predictions = model.transform(dataset); - -// Evaluate clustering by computing Silhouette score -ClusteringEvaluator evaluator = new ClusteringEvaluator(); - -double silhouette = evaluator.evaluate(predictions); -System.out.println("Silhouette with squared euclidean distance = " + silhouette); +// Evaluate clustering. +double cost = model.computeCost(dataset); +System.out.println("Within Set Sum of Squared Errors = " + cost); // Shows the result. 
System.out.println("Cluster Centers: "); http://git-wip-us.apache.org/repos/asf/spark/blob/e3a60b0d/examples/src/main/python/ml/bisecting_k_means_example.py -- diff --git a/examples/src/main/python/ml/bisecting_k_means_example.py b/examples/src/main/python/ml/bisecting_k_means_example.py index 82adb33..7842d20 100644 --- a/examples/src/main/python/ml/bisecting_k_means_example.py +++ b/examples/src/main/python/ml/bisecting_k_means_example.py @@ -24,7 +24,6 @@ from __future__ import print_function # $example on$ from pyspark.ml.clustering import BisectingKMeans -from pyspark.ml.evaluation import ClusteringEvaluator # $example off$ from pyspark.sql import SparkSession @@ -42,14 +41,9 @@ if __name__ == "__main__": bkm = BisectingKMeans().setK(2).setSeed(1) model = bkm.fit(dataset) -# Make predictions -predictions = model.transform(dataset) - -# Evaluate clustering by computing Silhouette score -evaluator = ClusteringEvaluator() - -silhouette = evaluator.evaluate(predictions) -print("Silhouette with squared euclidean distance = " + str(silhouette)) +# Evaluate clustering. +cost = model.computeCost(dataset) +print("Within Set Sum of Squared Errors = " + str(cost)) # Shows the result. print("Cluster Centers: ") http://git-wip-us.apache.org/repos/asf/spark/blob/e3a60b0d/examples/src/main/scala/org/apache/spark/examples/ml/BisectingKMeansExample.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/ml/BisectingKMeansExample.scala b/examples/src/main/scala/org/apache/spark/examples/ml/BisectingKMeansExample.scala index 14e13df..5f8f2c9 100644 --- a/examples/src/main/scala/org/apache/spark/examples/ml/BisectingKMeansExample.scala +++ b/examples/src/main/scala/org/apache/spark/examples/ml/BisectingKMeansExample.scala @@ -21,7 +21,6 @@ package org.apache.spark.examples.ml // $example on$ import org.apache.spark.ml.clustering.BisectingKMeans -import org.apache.spark.ml.evaluation.ClusteringEvaluator // $
spark git commit: Revert "[SPARK-25764][ML][EXAMPLES] Update BisectingKMeans example to use ClusteringEvaluator"
Repository: spark Updated Branches: refs/heads/master ec96d34e7 -> 4acbda4a9 Revert "[SPARK-25764][ML][EXAMPLES] Update BisectingKMeans example to use ClusteringEvaluator" This reverts commit d0ecff28545ac81f5ba7ac06957ced65b6e3ebcd. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4acbda4a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4acbda4a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4acbda4a Branch: refs/heads/master Commit: 4acbda4a96a5d6ef9065544631a3457e8d7b1748 Parents: ec96d34 Author: Wenchen Fan Authored: Sat Oct 20 09:28:53 2018 +0800 Committer: Wenchen Fan Committed: Sat Oct 20 09:28:53 2018 +0800 -- .../spark/examples/ml/JavaBisectingKMeansExample.java | 12 +++- .../src/main/python/ml/bisecting_k_means_example.py | 12 +++- .../spark/examples/ml/BisectingKMeansExample.scala | 12 +++- 3 files changed, 9 insertions(+), 27 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4acbda4a/examples/src/main/java/org/apache/spark/examples/ml/JavaBisectingKMeansExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaBisectingKMeansExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaBisectingKMeansExample.java index f517dc3..8c82aaa 100644 --- a/examples/src/main/java/org/apache/spark/examples/ml/JavaBisectingKMeansExample.java +++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaBisectingKMeansExample.java @@ -20,7 +20,6 @@ package org.apache.spark.examples.ml; // $example on$ import org.apache.spark.ml.clustering.BisectingKMeans; import org.apache.spark.ml.clustering.BisectingKMeansModel; -import org.apache.spark.ml.evaluation.ClusteringEvaluator; import org.apache.spark.ml.linalg.Vector; import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row; @@ -51,14 +50,9 @@ public class JavaBisectingKMeansExample { BisectingKMeans bkm = new BisectingKMeans().setK(2).setSeed(1); BisectingKMeansModel model = bkm.fit(dataset); -// Make predictions -Dataset predictions = model.transform(dataset); - -// Evaluate clustering by computing Silhouette score -ClusteringEvaluator evaluator = new ClusteringEvaluator(); - -double silhouette = evaluator.evaluate(predictions); -System.out.println("Silhouette with squared euclidean distance = " + silhouette); +// Evaluate clustering. +double cost = model.computeCost(dataset); +System.out.println("Within Set Sum of Squared Errors = " + cost); // Shows the result. 
System.out.println("Cluster Centers: "); http://git-wip-us.apache.org/repos/asf/spark/blob/4acbda4a/examples/src/main/python/ml/bisecting_k_means_example.py -- diff --git a/examples/src/main/python/ml/bisecting_k_means_example.py b/examples/src/main/python/ml/bisecting_k_means_example.py index 82adb33..7842d20 100644 --- a/examples/src/main/python/ml/bisecting_k_means_example.py +++ b/examples/src/main/python/ml/bisecting_k_means_example.py @@ -24,7 +24,6 @@ from __future__ import print_function # $example on$ from pyspark.ml.clustering import BisectingKMeans -from pyspark.ml.evaluation import ClusteringEvaluator # $example off$ from pyspark.sql import SparkSession @@ -42,14 +41,9 @@ if __name__ == "__main__": bkm = BisectingKMeans().setK(2).setSeed(1) model = bkm.fit(dataset) -# Make predictions -predictions = model.transform(dataset) - -# Evaluate clustering by computing Silhouette score -evaluator = ClusteringEvaluator() - -silhouette = evaluator.evaluate(predictions) -print("Silhouette with squared euclidean distance = " + str(silhouette)) +# Evaluate clustering. +cost = model.computeCost(dataset) +print("Within Set Sum of Squared Errors = " + str(cost)) # Shows the result. print("Cluster Centers: ") http://git-wip-us.apache.org/repos/asf/spark/blob/4acbda4a/examples/src/main/scala/org/apache/spark/examples/ml/BisectingKMeansExample.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/ml/BisectingKMeansExample.scala b/examples/src/main/scala/org/apache/spark/examples/ml/BisectingKMeansExample.scala index 14e13df..5f8f2c9 100644 --- a/examples/src/main/scala/org/apache/spark/examples/ml/BisectingKMeansExample.scala +++ b/examples/src/main/scala/org/apache/spark/examples/ml/BisectingKMeansExample.scala @@ -21,7 +21,6 @@ package org.apache.spark.examples.ml // $example on$ import org.apache.spark.ml.clustering.BisectingKMeans -import org.apache.spark.ml.evaluation.ClusteringEvaluator // $example
svn commit: r30172 - in /dev/spark/3.0.0-SNAPSHOT-2018_10_19_16_03-ec96d34-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Oct 19 23:17:26 2018 New Revision: 30172 Log: Apache Spark 3.0.0-SNAPSHOT-2018_10_19_16_03-ec96d34 docs [This commit notification would consist of 1483 parts, which exceeds the limit of 50 parts, so it was shortened to this summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-25745][K8S] Improve docker-image-tool.sh script
Repository: spark Updated Branches: refs/heads/master 43717dee5 -> ec96d34e7 [SPARK-25745][K8S] Improve docker-image-tool.sh script ## What changes were proposed in this pull request? Adds error checking and handling to `docker` invocations ensuring the script terminates early in the event of any errors. This avoids subtle errors that can occur e.g. if the base image fails to build the Python/R images can end up being built from outdated base images and makes it more explicit to the user that something went wrong. Additionally the provided `Dockerfiles` assume that Spark was first built locally or is a runnable distribution however it didn't previously enforce this. The script will now check the JARs folder to ensure that Spark JARs actually exist and if not aborts early reminding the user they need to build locally first. ## How was this patch tested? - Tested with a `mvn clean` working copy and verified that the script now terminates early - Tested with bad `Dockerfiles` that fail to build to see that early termination occurred Closes #22748 from rvesse/SPARK-25745. Authored-by: Rob Vesse Signed-off-by: Marcelo Vanzin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ec96d34e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ec96d34e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ec96d34e Branch: refs/heads/master Commit: ec96d34e74148803190db8dcf9fda527eeea9255 Parents: 43717de Author: Rob Vesse Authored: Fri Oct 19 15:03:53 2018 -0700 Committer: Marcelo Vanzin Committed: Fri Oct 19 15:03:53 2018 -0700 -- bin/docker-image-tool.sh | 41 +++-- 1 file changed, 31 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ec96d34e/bin/docker-image-tool.sh -- diff --git a/bin/docker-image-tool.sh b/bin/docker-image-tool.sh index f17791a..001590a 100755 --- a/bin/docker-image-tool.sh +++ b/bin/docker-image-tool.sh @@ -44,6 +44,7 @@ function image_ref { function build { local BUILD_ARGS local IMG_PATH + local JARS if [ ! -f "$SPARK_HOME/RELEASE" ]; then # Set image build arguments accordingly if this is a source repo and not a distribution archive. @@ -53,26 +54,38 @@ function build { # the examples directory is cleaned up before generating the distribution tarball, so this # issue does not occur. IMG_PATH=resource-managers/kubernetes/docker/src/main/dockerfiles +JARS=assembly/target/scala-$SPARK_SCALA_VERSION/jars BUILD_ARGS=( ${BUILD_PARAMS} --build-arg img_path=$IMG_PATH --build-arg - spark_jars=assembly/target/scala-$SPARK_SCALA_VERSION/jars + spark_jars=$JARS --build-arg example_jars=examples/target/scala-$SPARK_SCALA_VERSION/jars --build-arg k8s_tests=resource-managers/kubernetes/integration-tests/tests ) else -# Not passed as an argument to docker, but used to validate the Spark directory. +# Not passed as arguments to docker, but used to validate the Spark directory. IMG_PATH="kubernetes/dockerfiles" +JARS=jars BUILD_ARGS=(${BUILD_PARAMS}) fi + # Verify that the Docker image content directory is present if [ ! -d "$IMG_PATH" ]; then error "Cannot find docker image. This script must be run from a runnable distribution of Apache Spark." fi + + # Verify that Spark has actually been built/is a runnable distribution + #Â i.e. the Spark JARs that the Docker files will place into the image are present + local TOTAL_JARS=$(ls $JARS/spark-* | wc -l) + TOTAL_JARS=$(( $TOTAL_JARS )) + if [ "${TOTAL_JARS}" -eq 0 ]; then +error "Cannot find Spark JARs. 
This script assumes that Apache Spark has first been built locally or this is a runnable distribution." + fi + local BINDING_BUILD_ARGS=( ${BUILD_PARAMS} --build-arg @@ -85,29 +98,37 @@ function build { docker build $NOCACHEARG "${BUILD_ARGS[@]}" \ -t $(image_ref spark) \ -f "$BASEDOCKERFILE" . - if [[ $? != 0 ]]; then -error "Failed to build Spark docker image." + if [ $? -ne 0 ]; then +error "Failed to build Spark JVM Docker image, please refer to Docker build output for details." fi docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \ -t $(image_ref spark-py) \ -f "$PYDOCKERFILE" . - if [[ $? != 0 ]]; then -error "Failed to build PySpark docker image." - fi - +if [ $? -ne 0 ]; then + error "Failed to build PySpark Docker image, please refer to Docker build output for details." +fi docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \ -t $(image_ref spark-r) \ -f "$RDOCKERFILE" . - if [[ $? != 0 ]]; then -error "Failed to build
spark git commit: Revert "[SPARK-25758][ML] Deprecate computeCost on BisectingKMeans"
Repository: spark Updated Branches: refs/heads/branch-2.4 1001d2314 -> 432697c7b Revert "[SPARK-25758][ML] Deprecate computeCost on BisectingKMeans" This reverts commit c2962546d9a5900a5628a31b83d2c4b22c3a7936. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/432697c7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/432697c7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/432697c7 Branch: refs/heads/branch-2.4 Commit: 432697c7b58785ca8439717fa748a72224cf0859 Parents: 1001d23 Author: gatorsmile Authored: Fri Oct 19 14:57:52 2018 -0700 Committer: gatorsmile Committed: Fri Oct 19 14:57:52 2018 -0700 -- .../scala/org/apache/spark/ml/clustering/BisectingKMeans.scala | 5 - python/pyspark/ml/clustering.py| 6 -- 2 files changed, 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/432697c7/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala index 2243d99..5cb16cc 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala @@ -125,13 +125,8 @@ class BisectingKMeansModel private[ml] ( /** * Computes the sum of squared distances between the input points and their corresponding cluster * centers. - * - * @deprecated This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator - * instead. You can also get the cost on the training dataset in the summary. */ @Since("2.0.0") - @deprecated("This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator " + -"instead. You can also get the cost on the training dataset in the summary.", "2.4.0") def computeCost(dataset: Dataset[_]): Double = { SchemaUtils.validateVectorCompatibleColumn(dataset.schema, getFeaturesCol) val data = DatasetUtils.columnToOldVector(dataset, getFeaturesCol) http://git-wip-us.apache.org/repos/asf/spark/blob/432697c7/python/pyspark/ml/clustering.py -- diff --git a/python/pyspark/ml/clustering.py b/python/pyspark/ml/clustering.py index 11eb124..5ef4e76 100644 --- a/python/pyspark/ml/clustering.py +++ b/python/pyspark/ml/clustering.py @@ -540,13 +540,7 @@ class BisectingKMeansModel(JavaModel, JavaMLWritable, JavaMLReadable): """ Computes the sum of squared distances between the input points and their corresponding cluster centers. - -..note:: Deprecated in 2.4.0. It will be removed in 3.0.0. Use ClusteringEvaluator instead. - You can also get the cost on the training dataset in the summary. """ -warnings.warn("Deprecated in 2.4.0. It will be removed in 3.0.0. Use ClusteringEvaluator " - "instead. You can also get the cost on the training dataset in the summary.", - DeprecationWarning) return self._call_java("computeCost", dataset) @property - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
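[Editor's note] The deprecation text being reverted pointed users at `ClusteringEvaluator`. For comparison, a minimal hedged sketch of that alternative (session setup and dataset path are assumptions, not part of this commit):

```scala
import org.apache.spark.ml.clustering.BisectingKMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.sql.SparkSession

object SilhouetteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("silhouette-sketch").getOrCreate()
    val dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")

    val model = new BisectingKMeans().setK(2).setSeed(1).fit(dataset)
    val predictions = model.transform(dataset)

    // ClusteringEvaluator's default metric is the Silhouette score with
    // squared Euclidean distance.
    val silhouette = new ClusteringEvaluator().evaluate(predictions)
    println(s"Silhouette with squared euclidean distance = $silhouette")

    spark.stop()
  }
}
```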
svn commit: r30171 - in /dev/spark/2.4.1-SNAPSHOT-2018_10_19_14_03-1001d23-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Oct 19 21:17:15 2018 New Revision: 30171 Log: Apache Spark 2.4.1-SNAPSHOT-2018_10_19_14_03-1001d23 docs [This commit notification would consist of 1477 parts, which exceeds the limit of 50 parts, so it was shortened to this summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r30168 - in /dev/spark/3.0.0-SNAPSHOT-2018_10_19_12_02-43717de-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Oct 19 19:16:51 2018 New Revision: 30168 Log: Apache Spark 3.0.0-SNAPSHOT-2018_10_19_12_02-43717de docs [This commit notification would consist of 1483 parts, which exceeds the limit of 50 parts, so it was shortened to this summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-25704][CORE] Allocate a bit less than Int.MaxValue
Repository: spark Updated Branches: refs/heads/branch-2.4 9c0c6d4d5 -> 1001d2314 [SPARK-25704][CORE] Allocate a bit less than Int.MaxValue JVMs don't you allocate arrays of length exactly Int.MaxValue, so leave a little extra room. This is necessary when reading blocks >2GB off the network (for remote reads or for cache replication). Unit tests via jenkins, ran a test with blocks over 2gb on a cluster Closes #22705 from squito/SPARK-25704. Authored-by: Imran Rashid Signed-off-by: Imran Rashid Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1001d231 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1001d231 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1001d231 Branch: refs/heads/branch-2.4 Commit: 1001d2314275c902da519725da266a23b537e33a Parents: 9c0c6d4 Author: Imran Rashid Authored: Fri Oct 19 12:52:41 2018 -0500 Committer: Imran Rashid Committed: Fri Oct 19 12:54:08 2018 -0500 -- .../org/apache/spark/storage/BlockManager.scala | 6 ++ .../apache/spark/util/io/ChunkedByteBuffer.scala| 16 +--- 2 files changed, 11 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1001d231/core/src/main/scala/org/apache/spark/storage/BlockManager.scala -- diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala index 0fe82ac..c01a453 100644 --- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala +++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala @@ -133,8 +133,6 @@ private[spark] class BlockManager( private[spark] val externalShuffleServiceEnabled = conf.get(config.SHUFFLE_SERVICE_ENABLED) - private val chunkSize = -conf.getSizeAsBytes("spark.storage.memoryMapLimitForTests", Int.MaxValue.toString).toInt private val remoteReadNioBufferConversion = conf.getBoolean("spark.network.remoteReadNioBufferConversion", false) @@ -451,7 +449,7 @@ private[spark] class BlockManager( new EncryptedBlockData(tmpFile, blockSize, conf, key).toChunkedByteBuffer(allocator) case None => -ChunkedByteBuffer.fromFile(tmpFile, conf.get(config.MEMORY_MAP_LIMIT_FOR_TESTS).toInt) +ChunkedByteBuffer.fromFile(tmpFile) } putBytes(blockId, buffer, level)(classTag) tmpFile.delete() @@ -797,7 +795,7 @@ private[spark] class BlockManager( if (remoteReadNioBufferConversion) { return Some(new ChunkedByteBuffer(data.nioByteBuffer())) } else { - return Some(ChunkedByteBuffer.fromManagedBuffer(data, chunkSize)) + return Some(ChunkedByteBuffer.fromManagedBuffer(data)) } } logDebug(s"The value of block $blockId is null") http://git-wip-us.apache.org/repos/asf/spark/blob/1001d231/core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala b/core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala index 4aa8d45..9547cb4 100644 --- a/core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala +++ b/core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala @@ -30,6 +30,7 @@ import org.apache.spark.internal.config import org.apache.spark.network.buffer.{FileSegmentManagedBuffer, ManagedBuffer} import org.apache.spark.network.util.{ByteArrayWritableChannel, LimitedInputStream} import org.apache.spark.storage.StorageUtils +import org.apache.spark.unsafe.array.ByteArrayMethods import org.apache.spark.util.Utils /** @@ -169,24 +170,25 @@ private[spark] class ChunkedByteBuffer(var chunks: 
Array[ByteBuffer]) { } -object ChunkedByteBuffer { +private[spark] object ChunkedByteBuffer { + + // TODO eliminate this method if we switch BlockManager to getting InputStreams - def fromManagedBuffer(data: ManagedBuffer, maxChunkSize: Int): ChunkedByteBuffer = { + def fromManagedBuffer(data: ManagedBuffer): ChunkedByteBuffer = { data match { case f: FileSegmentManagedBuffer => -fromFile(f.getFile, maxChunkSize, f.getOffset, f.getLength) +fromFile(f.getFile, f.getOffset, f.getLength) case other => new ChunkedByteBuffer(other.nioByteBuffer()) } } - def fromFile(file: File, maxChunkSize: Int): ChunkedByteBuffer = { -fromFile(file, maxChunkSize, 0, file.length()) + def fromFile(file: File): ChunkedByteBuffer = { +fromFile(file, 0, file.length()) } private def fromFile( file: File, - maxChunkSize: Int, offset: Long, length: Long):
spark git commit: [SPARK-25704][CORE] Allocate a bit less than Int.MaxValue
Repository: spark Updated Branches: refs/heads/master 130121711 -> 43717dee5 [SPARK-25704][CORE] Allocate a bit less than Int.MaxValue JVMs don't you allocate arrays of length exactly Int.MaxValue, so leave a little extra room. This is necessary when reading blocks >2GB off the network (for remote reads or for cache replication). Unit tests via jenkins, ran a test with blocks over 2gb on a cluster Closes #22705 from squito/SPARK-25704. Authored-by: Imran Rashid Signed-off-by: Imran Rashid Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/43717dee Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/43717dee Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/43717dee Branch: refs/heads/master Commit: 43717dee570dc41d71f0b27b8939f6297a029a02 Parents: 1301217 Author: Imran Rashid Authored: Fri Oct 19 12:52:41 2018 -0500 Committer: Imran Rashid Committed: Fri Oct 19 12:52:41 2018 -0500 -- .../org/apache/spark/storage/BlockManager.scala | 6 ++ .../apache/spark/util/io/ChunkedByteBuffer.scala| 16 +--- 2 files changed, 11 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/43717dee/core/src/main/scala/org/apache/spark/storage/BlockManager.scala -- diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala index 0fe82ac..c01a453 100644 --- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala +++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala @@ -133,8 +133,6 @@ private[spark] class BlockManager( private[spark] val externalShuffleServiceEnabled = conf.get(config.SHUFFLE_SERVICE_ENABLED) - private val chunkSize = -conf.getSizeAsBytes("spark.storage.memoryMapLimitForTests", Int.MaxValue.toString).toInt private val remoteReadNioBufferConversion = conf.getBoolean("spark.network.remoteReadNioBufferConversion", false) @@ -451,7 +449,7 @@ private[spark] class BlockManager( new EncryptedBlockData(tmpFile, blockSize, conf, key).toChunkedByteBuffer(allocator) case None => -ChunkedByteBuffer.fromFile(tmpFile, conf.get(config.MEMORY_MAP_LIMIT_FOR_TESTS).toInt) +ChunkedByteBuffer.fromFile(tmpFile) } putBytes(blockId, buffer, level)(classTag) tmpFile.delete() @@ -797,7 +795,7 @@ private[spark] class BlockManager( if (remoteReadNioBufferConversion) { return Some(new ChunkedByteBuffer(data.nioByteBuffer())) } else { - return Some(ChunkedByteBuffer.fromManagedBuffer(data, chunkSize)) + return Some(ChunkedByteBuffer.fromManagedBuffer(data)) } } logDebug(s"The value of block $blockId is null") http://git-wip-us.apache.org/repos/asf/spark/blob/43717dee/core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala b/core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala index 4aa8d45..9547cb4 100644 --- a/core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala +++ b/core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala @@ -30,6 +30,7 @@ import org.apache.spark.internal.config import org.apache.spark.network.buffer.{FileSegmentManagedBuffer, ManagedBuffer} import org.apache.spark.network.util.{ByteArrayWritableChannel, LimitedInputStream} import org.apache.spark.storage.StorageUtils +import org.apache.spark.unsafe.array.ByteArrayMethods import org.apache.spark.util.Utils /** @@ -169,24 +170,25 @@ private[spark] class ChunkedByteBuffer(var chunks: Array[ByteBuffer]) { 
} -object ChunkedByteBuffer { +private[spark] object ChunkedByteBuffer { + + // TODO eliminate this method if we switch BlockManager to getting InputStreams - def fromManagedBuffer(data: ManagedBuffer, maxChunkSize: Int): ChunkedByteBuffer = { + def fromManagedBuffer(data: ManagedBuffer): ChunkedByteBuffer = { data match { case f: FileSegmentManagedBuffer => -fromFile(f.getFile, maxChunkSize, f.getOffset, f.getLength) +fromFile(f.getFile, f.getOffset, f.getLength) case other => new ChunkedByteBuffer(other.nioByteBuffer()) } } - def fromFile(file: File, maxChunkSize: Int): ChunkedByteBuffer = { -fromFile(file, maxChunkSize, 0, file.length()) + def fromFile(file: File): ChunkedByteBuffer = { +fromFile(file, 0, file.length()) } private def fromFile( file: File, - maxChunkSize: Int, offset: Long, length: Long): ChunkedB
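[Editor's note] To make the motivation concrete: JVMs reject array allocations at (or very near) `Int.MaxValue`, so blocks larger than about 2 GB have to be split into chunks that stay a little below that cap. Below is a hedged sketch of the chunk-size arithmetic, not a copy of the patched code; the exact headroom constant is an assumption (Spark keeps it in `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH`).

```scala
object ChunkSizeSketch {
  // Illustrative cap a little below Int.MaxValue; the real value lives in
  // org.apache.spark.unsafe.array.ByteArrayMethods (assumed here).
  val MaxChunkSize: Int = Int.MaxValue - 15

  // Split a block of `totalLength` bytes into chunk sizes that each fit in a
  // single JVM byte array.
  def chunkSizes(totalLength: Long): Seq[Int] = {
    val fullChunks = (totalLength / MaxChunkSize).toInt
    val remainder = (totalLength % MaxChunkSize).toInt
    Seq.fill(fullChunks)(MaxChunkSize) ++ (if (remainder > 0) Seq(remainder) else Nil)
  }

  def main(args: Array[String]): Unit = {
    // A 5 GB remote block ends up as three chunks, each under the cap.
    println(chunkSizes(5L * 1024 * 1024 * 1024))
  }
}
```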
svn commit: r30165 - in /dev/spark/2.3.3-SNAPSHOT-2018_10_19_10_03-5cef11a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Oct 19 17:19:05 2018 New Revision: 30165 Log: Apache Spark 2.3.3-SNAPSHOT-2018_10_19_10_03-5cef11a docs [This commit notification would consist of 1443 parts, which exceeds the limit of 50 parts, so it was shortened to this summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r30164 - in /dev/spark/2.4.1-SNAPSHOT-2018_10_19_10_03-9c0c6d4-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Oct 19 17:18:18 2018 New Revision: 30164 Log: Apache Spark 2.4.1-SNAPSHOT-2018_10_19_10_03-9c0c6d4 docs [This commit notification would consist of 1478 parts, which exceeds the limit of 50 parts, so it was shortened to this summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: fix security issue of zinc (update run-tests.py)
Repository: spark Updated Branches: refs/heads/master 9ad0f6ea8 -> 130121711 fix security issue of zinc(update run-tests.py) Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/13012171 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/13012171 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/13012171 Branch: refs/heads/master Commit: 130121711c3258cc5cb6123379f2b4b419851c6e Parents: 9ad0f6e Author: Wenchen Fan Authored: Sat Oct 20 00:21:22 2018 +0800 Committer: Wenchen Fan Committed: Sat Oct 20 00:23:16 2018 +0800 -- dev/run-tests.py | 10 -- 1 file changed, 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/13012171/dev/run-tests.py -- diff --git a/dev/run-tests.py b/dev/run-tests.py index 26045ee..7ec7334 100755 --- a/dev/run-tests.py +++ b/dev/run-tests.py @@ -249,15 +249,6 @@ def get_zinc_port(): return random.randrange(3030, 4030) -def kill_zinc_on_port(zinc_port): -""" -Kill the Zinc process running on the given port, if one exists. -""" -cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill" -lsof_exe = which("lsof") -subprocess.check_call(cmd % (lsof_exe if lsof_exe else "/usr/sbin/lsof", zinc_port), shell=True) - - def exec_maven(mvn_args=()): """Will call Maven in the current directory with the list of mvn_args passed in and returns the subprocess for any further processing""" @@ -267,7 +258,6 @@ def exec_maven(mvn_args=()): zinc_flag = "-DzincPort=%s" % zinc_port flags = [os.path.join(SPARK_HOME, "build", "mvn"), "--force", zinc_flag] run_cmd(flags + mvn_args) -kill_zinc_on_port(zinc_port) def exec_sbt(sbt_args=()): - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-25269][SQL] SQL interface support specify StorageLevel when cache table
Repository: spark Updated Branches: refs/heads/master ac586bbb0 -> 9ad0f6ea8 [SPARK-25269][SQL] SQL interface support specify StorageLevel when cache table ## What changes were proposed in this pull request? SQL interface support specify `StorageLevel` when cache table. The semantic is: ```sql CACHE TABLE tableName OPTIONS('storageLevel' 'DISK_ONLY'); ``` All supported `StorageLevel` are: https://github.com/apache/spark/blob/eefdf9f9dd8afde49ad7d4e230e2735eb817ab0a/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L172-L183 ## How was this patch tested? unit tests and manual tests. manual tests configuration: ``` --executor-memory 15G --executor-cores 5 --num-executors 50 ``` Data: Input Size / Records: 1037.7 GB / 11732805788 Result: ![image](https://user-images.githubusercontent.com/5399861/47213362-56a1c980-d3cd-11e8-82e7-28d7abc5923e.png) Closes #22263 from wangyum/SPARK-25269. Authored-by: Yuming Wang Signed-off-by: Dongjoon Hyun Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9ad0f6ea Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9ad0f6ea Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9ad0f6ea Branch: refs/heads/master Commit: 9ad0f6ea89435391ec16e436bc4c4d5bf6b68493 Parents: ac586bb Author: Yuming Wang Authored: Fri Oct 19 09:15:55 2018 -0700 Committer: Dongjoon Hyun Committed: Fri Oct 19 09:15:55 2018 -0700 -- .../apache/spark/sql/catalyst/parser/SqlBase.g4 | 3 +- .../spark/sql/execution/SparkSqlParser.scala| 3 +- .../spark/sql/execution/command/cache.scala | 23 +++- .../org/apache/spark/sql/CachedTableSuite.scala | 60 .../apache/spark/sql/hive/test/TestHive.scala | 2 +- 5 files changed, 86 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9ad0f6ea/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 0569986..e2d34d1 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -162,7 +162,8 @@ statement tableIdentifier partitionSpec? describeColName? #describeTable | REFRESH TABLE tableIdentifier #refreshTable | REFRESH (STRING | .*?) #refreshResource -| CACHE LAZY? TABLE tableIdentifier (AS? query)? #cacheTable +| CACHE LAZY? TABLE tableIdentifier +(OPTIONS options=tablePropertyList)? (AS? query)? #cacheTable | UNCACHE TABLE (IF EXISTS)? tableIdentifier #uncacheTable | CLEAR CACHE #clearCache | LOAD DATA LOCAL? INPATH path=STRING OVERWRITE? 
INTO TABLE http://git-wip-us.apache.org/repos/asf/spark/blob/9ad0f6ea/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala index 4ed14d3..364efea 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala @@ -282,7 +282,8 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) { throw new ParseException(s"It is not allowed to add database prefix `$database` to " + s"the table name in CACHE TABLE AS SELECT", ctx) } -CacheTableCommand(tableIdent, query, ctx.LAZY != null) +val options = Option(ctx.options).map(visitPropertyKeyValues).getOrElse(Map.empty) +CacheTableCommand(tableIdent, query, ctx.LAZY != null, options) } /** http://git-wip-us.apache.org/repos/asf/spark/blob/9ad0f6ea/sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala index 6b00426..728604a 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala @@ -17,16 +17,21 @@ package org.apache.spark.sql.execution.command +import java.util.Locale +
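[Editor's note] A minimal Scala sketch of driving the new syntax through `spark.sql`; the table name and storage level are illustrative, with `DISK_ONLY` being one of the supported `StorageLevel` values listed in the commit message.

```scala
import org.apache.spark.sql.SparkSession

object CacheTableStorageLevelSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cache-table-options").getOrCreate()

    spark.range(0, 1000).toDF("id").createOrReplaceTempView("t")

    // New in this patch: choose the storage level instead of the default.
    spark.sql("CACHE TABLE t OPTIONS ('storageLevel' 'DISK_ONLY')")
    println(spark.catalog.isCached("t")) // CACHE TABLE is eager, so this prints true

    spark.sql("UNCACHE TABLE t")
    spark.stop()
  }
}
```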
spark git commit: fix security issue of zinc (simpler version)
Repository: spark Updated Branches: refs/heads/master ec1fafe3e -> ac586bbb0 fix security issue of zinc(simplier version) Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ac586bbb Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ac586bbb Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ac586bbb Branch: refs/heads/master Commit: ac586bbb016d70c60b1bd2ea5320fd56c3a8eead Parents: ec1fafe Author: Wenchen Fan Authored: Fri Oct 19 23:54:15 2018 +0800 Committer: Wenchen Fan Committed: Fri Oct 19 23:54:15 2018 +0800 -- build/mvn | 33 ++-- dev/create-release/release-build.sh | 6 -- 2 files changed, 10 insertions(+), 29 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ac586bbb/build/mvn -- diff --git a/build/mvn b/build/mvn index 0289ef3..b60ea64 100755 --- a/build/mvn +++ b/build/mvn @@ -139,17 +139,8 @@ if [ "$1" == "--force" ]; then shift fi -if [ "$1" == "--zinc" ]; then - echo "Using zinc for incremental compilation. Be sure you are aware of the implications of " - echo "running this server process on your machine" - USE_ZINC=1 - shift -fi - # Install the proper version of Scala, Zinc and Maven for the build -if [ -n "${USE_ZINC}" ]; then - install_zinc -fi +install_zinc install_scala install_mvn @@ -158,15 +149,13 @@ cd "${_CALLING_DIR}" # Now that zinc is ensured to be installed, check its status and, if its # not running or just installed, start it -if [ -n "${USE_ZINC}" ]; then - if [ -n "${ZINC_INSTALL_FLAG}" -o -z "`"${ZINC_BIN}" -status -port ${ZINC_PORT}`" ]; then -export ZINC_OPTS=${ZINC_OPTS:-"$_COMPILE_JVM_OPTS"} -"${ZINC_BIN}" -shutdown -port ${ZINC_PORT} -"${ZINC_BIN}" -start -port ${ZINC_PORT} -server 127.0.0.1 \ - -idle-timeout 30m \ - -scala-compiler "${SCALA_COMPILER}" \ - -scala-library "${SCALA_LIBRARY}" &>/dev/null - fi +if [ -n "${ZINC_INSTALL_FLAG}" -o -z "`"${ZINC_BIN}" -status -port ${ZINC_PORT}`" ]; then + export ZINC_OPTS=${ZINC_OPTS:-"$_COMPILE_JVM_OPTS"} + "${ZINC_BIN}" -shutdown -port ${ZINC_PORT} + "${ZINC_BIN}" -start -port ${ZINC_PORT} \ +-server 127.0.0.1 -idle-timeout 30m \ +-scala-compiler "${SCALA_COMPILER}" \ +-scala-library "${SCALA_LIBRARY}" &>/dev/null fi # Set any `mvn` options if not already present @@ -177,7 +166,5 @@ echo "Using \`mvn\` from path: $MVN_BIN" 1>&2 # Last, call the `mvn` command as usual "${MVN_BIN}" -DzincPort=${ZINC_PORT} "$@" -if [ -n "${USE_ZINC}" ]; then - # Try to shut down zinc explicitly - "${ZINC_BIN}" -shutdown -port ${ZINC_PORT} -fi +# Try to shut down zinc explicitly +"${ZINC_BIN}" -shutdown -port ${ZINC_PORT} http://git-wip-us.apache.org/repos/asf/spark/blob/ac586bbb/dev/create-release/release-build.sh -- diff --git a/dev/create-release/release-build.sh b/dev/create-release/release-build.sh index a741a3b..26e0868 100755 --- a/dev/create-release/release-build.sh +++ b/dev/create-release/release-build.sh @@ -389,9 +389,6 @@ if [[ "$1" == "publish-snapshot" ]]; then #$MVN -DzincPort=$ZINC_PORT --settings $tmp_settings \ # -DskipTests $SCALA_2_12_PROFILES $PUBLISH_PROFILES clean deploy - # Clean-up Zinc nailgun process - $LSOF -P |grep $ZINC_PORT | grep LISTEN | awk '{ print $2; }' | xargs kill - rm $tmp_settings cd .. 
exit 0 @@ -436,9 +433,6 @@ if [[ "$1" == "publish-release" ]]; then -DskipTests $PUBLISH_PROFILES $SCALA_2_12_PROFILES clean install fi - # Clean-up Zinc nailgun process - $LSOF -P |grep $ZINC_PORT | grep LISTEN | awk '{ print $2; }' | xargs kill - ./dev/change-scala-version.sh 2.11 pushd $tmp_repo/org/apache/spark - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r30163 - in /dev/spark/3.0.0-SNAPSHOT-2018_10_19_08_03-ec1fafe-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Oct 19 15:17:35 2018 New Revision: 30163 Log: Apache Spark 3.0.0-SNAPSHOT-2018_10_19_08_03-ec1fafe docs [This commit notification would consist of 1484 parts, which exceeds the limit of 50 parts, so it was shortened to this summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.4.0-rc4 [deleted] 1ff8dd424 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[1/2] spark git commit: Preparing Spark release v2.4.0-rc4
Repository: spark Updated Branches: refs/heads/branch-2.4 8926c4a62 -> 9c0c6d4d5 Preparing Spark release v2.4.0-rc4 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1ff8dd42 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1ff8dd42 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1ff8dd42 Branch: refs/heads/branch-2.4 Commit: 1ff8dd424041f8dd525d1ec97828d9663f1f10ea Parents: 8926c4a Author: Wenchen Fan Authored: Fri Oct 19 14:22:00 2018 + Committer: Wenchen Fan Committed: Fri Oct 19 14:22:00 2018 + -- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/avro/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml| 2 +- external/flume-sink/pom.xml| 2 +- external/flume/pom.xml | 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml| 2 +- external/kafka-0-10/pom.xml| 2 +- external/kafka-0-8-assembly/pom.xml| 2 +- external/kafka-0-8/pom.xml | 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml| 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 44 insertions(+), 44 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1ff8dd42/R/pkg/DESCRIPTION -- diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 714b6f1..f52d785 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.4.1 +Version: 2.4.0 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark. Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), http://git-wip-us.apache.org/repos/asf/spark/blob/1ff8dd42/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index ee0de73..63ab510 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.4.1-SNAPSHOT +2.4.0 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/1ff8dd42/common/kvstore/pom.xml -- diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index b89e0fe..b10e118 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.4.1-SNAPSHOT +2.4.0 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/1ff8dd42/common/network-common/pom.xml ---
[2/2] spark git commit: Preparing development version 2.4.1-SNAPSHOT
Preparing development version 2.4.1-SNAPSHOT Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9c0c6d4d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9c0c6d4d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9c0c6d4d Branch: refs/heads/branch-2.4 Commit: 9c0c6d4d5039267be35e564d8d6712318d557317 Parents: 1ff8dd4 Author: Wenchen Fan Authored: Fri Oct 19 14:22:04 2018 + Committer: Wenchen Fan Committed: Fri Oct 19 14:22:04 2018 + -- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/avro/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml| 2 +- external/flume-sink/pom.xml| 2 +- external/flume/pom.xml | 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml| 2 +- external/kafka-0-10/pom.xml| 2 +- external/kafka-0-8-assembly/pom.xml| 2 +- external/kafka-0-8/pom.xml | 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml| 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 44 insertions(+), 44 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9c0c6d4d/R/pkg/DESCRIPTION -- diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index f52d785..714b6f1 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.4.0 +Version: 2.4.1 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark. Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), http://git-wip-us.apache.org/repos/asf/spark/blob/9c0c6d4d/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 63ab510..ee0de73 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.4.0 +2.4.1-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/9c0c6d4d/common/kvstore/pom.xml -- diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index b10e118..b89e0fe 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.4.0 +2.4.1-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/9c0c6d4d/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.4.0-rc4 [created] 1ff8dd424 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: fix security issue of zinc
Repository: spark Updated Branches: refs/heads/branch-2.2 2e3b923e0 -> d6542fa3f fix security issue of zinc Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d6542fa3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d6542fa3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d6542fa3 Branch: refs/heads/branch-2.2 Commit: d6542fa3f02587712d26e4e191353362a4031794 Parents: 2e3b923 Author: Wenchen Fan Authored: Fri Oct 19 21:39:58 2018 +0800 Committer: Wenchen Fan Committed: Fri Oct 19 21:40:40 2018 +0800 -- build/mvn | 31 --- 1 file changed, 24 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d6542fa3/build/mvn -- diff --git a/build/mvn b/build/mvn index 1e393c3..926027e 100755 --- a/build/mvn +++ b/build/mvn @@ -130,8 +130,17 @@ if [ "$1" == "--force" ]; then shift fi +if [ "$1" == "--zinc" ]; then + echo "Using zinc for incremental compilation. Be sure you are aware of the implications of " + echo "running this server process on your machine" + USE_ZINC=1 + shift +fi + # Install the proper version of Scala, Zinc and Maven for the build -install_zinc +if [ -n "${USE_ZINC}" ]; then + install_zinc +fi install_scala install_mvn @@ -140,12 +149,15 @@ cd "${_CALLING_DIR}" # Now that zinc is ensured to be installed, check its status and, if its # not running or just installed, start it -if [ -n "${ZINC_INSTALL_FLAG}" -o -z "`"${ZINC_BIN}" -status -port ${ZINC_PORT}`" ]; then - export ZINC_OPTS=${ZINC_OPTS:-"$_COMPILE_JVM_OPTS"} - "${ZINC_BIN}" -shutdown -port ${ZINC_PORT} - "${ZINC_BIN}" -start -port ${ZINC_PORT} \ --scala-compiler "${SCALA_COMPILER}" \ --scala-library "${SCALA_LIBRARY}" &>/dev/null +if [ -n "${USE_ZINC}" ]; then + if [ -n "${ZINC_INSTALL_FLAG}" -o -z "`"${ZINC_BIN}" -status -port ${ZINC_PORT}`" ]; then +export ZINC_OPTS=${ZINC_OPTS:-"$_COMPILE_JVM_OPTS"} +"${ZINC_BIN}" -shutdown -port ${ZINC_PORT} +"${ZINC_BIN}" -start -port ${ZINC_PORT} -server 127.0.0.1 \ + -idle-timeout 30m \ + -scala-compiler "${SCALA_COMPILER}" \ + -scala-library "${SCALA_LIBRARY}" &>/dev/null + fi fi # Set any `mvn` options if not already present @@ -155,3 +167,8 @@ echo "Using \`mvn\` from path: $MVN_BIN" 1>&2 # Last, call the `mvn` command as usual ${MVN_BIN} -DzincPort=${ZINC_PORT} "$@" + +if [ -n "${USE_ZINC}" ]; then + # Try to shut down zinc explicitly + "${ZINC_BIN}" -shutdown -port ${ZINC_PORT} +fi - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: fix security issue of zinc
Repository: spark Updated Branches: refs/heads/branch-2.3 353d32804 -> 5cef11acc fix security issue of zinc Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5cef11ac Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5cef11ac Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5cef11ac Branch: refs/heads/branch-2.3 Commit: 5cef11acc0770ca49a0487d6543eb81022b7415d Parents: 353d328 Author: Wenchen Fan Authored: Fri Oct 19 21:39:58 2018 +0800 Committer: Wenchen Fan Committed: Fri Oct 19 21:39:58 2018 +0800 -- build/mvn | 31 --- 1 file changed, 24 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5cef11ac/build/mvn -- diff --git a/build/mvn b/build/mvn index efa4f93..9c0d1a7 100755 --- a/build/mvn +++ b/build/mvn @@ -130,8 +130,17 @@ if [ "$1" == "--force" ]; then shift fi +if [ "$1" == "--zinc" ]; then + echo "Using zinc for incremental compilation. Be sure you are aware of the implications of " + echo "running this server process on your machine" + USE_ZINC=1 + shift +fi + # Install the proper version of Scala, Zinc and Maven for the build -install_zinc +if [ -n "${USE_ZINC}" ]; then + install_zinc +fi install_scala install_mvn @@ -140,12 +149,15 @@ cd "${_CALLING_DIR}" # Now that zinc is ensured to be installed, check its status and, if its # not running or just installed, start it -if [ -n "${ZINC_INSTALL_FLAG}" -o -z "`"${ZINC_BIN}" -status -port ${ZINC_PORT}`" ]; then - export ZINC_OPTS=${ZINC_OPTS:-"$_COMPILE_JVM_OPTS"} - "${ZINC_BIN}" -shutdown -port ${ZINC_PORT} - "${ZINC_BIN}" -start -port ${ZINC_PORT} \ --scala-compiler "${SCALA_COMPILER}" \ --scala-library "${SCALA_LIBRARY}" &>/dev/null +if [ -n "${USE_ZINC}" ]; then + if [ -n "${ZINC_INSTALL_FLAG}" -o -z "`"${ZINC_BIN}" -status -port ${ZINC_PORT}`" ]; then +export ZINC_OPTS=${ZINC_OPTS:-"$_COMPILE_JVM_OPTS"} +"${ZINC_BIN}" -shutdown -port ${ZINC_PORT} +"${ZINC_BIN}" -start -port ${ZINC_PORT} -server 127.0.0.1 \ + -idle-timeout 30m \ + -scala-compiler "${SCALA_COMPILER}" \ + -scala-library "${SCALA_LIBRARY}" &>/dev/null + fi fi # Set any `mvn` options if not already present @@ -155,3 +167,8 @@ echo "Using \`mvn\` from path: $MVN_BIN" 1>&2 # Last, call the `mvn` command as usual ${MVN_BIN} -DzincPort=${ZINC_PORT} "$@" + +if [ -n "${USE_ZINC}" ]; then + # Try to shut down zinc explicitly + "${ZINC_BIN}" -shutdown -port ${ZINC_PORT} +fi - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: fix security issue of zinc
Repository: spark Updated Branches: refs/heads/branch-2.4 6a06b8cce -> 8926c4a62 fix security issue of zinc Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8926c4a6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8926c4a6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8926c4a6 Branch: refs/heads/branch-2.4 Commit: 8926c4a6237ab059875aa7502d6417317d58381a Parents: 6a06b8c Author: Wenchen Fan Authored: Fri Oct 19 21:34:35 2018 +0800 Committer: Wenchen Fan Committed: Fri Oct 19 21:34:35 2018 +0800 -- build/mvn | 31 --- 1 file changed, 24 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8926c4a6/build/mvn -- diff --git a/build/mvn b/build/mvn index 2487b81..0289ef3 100755 --- a/build/mvn +++ b/build/mvn @@ -139,8 +139,17 @@ if [ "$1" == "--force" ]; then shift fi +if [ "$1" == "--zinc" ]; then + echo "Using zinc for incremental compilation. Be sure you are aware of the implications of " + echo "running this server process on your machine" + USE_ZINC=1 + shift +fi + # Install the proper version of Scala, Zinc and Maven for the build -install_zinc +if [ -n "${USE_ZINC}" ]; then + install_zinc +fi install_scala install_mvn @@ -149,12 +158,15 @@ cd "${_CALLING_DIR}" # Now that zinc is ensured to be installed, check its status and, if its # not running or just installed, start it -if [ -n "${ZINC_INSTALL_FLAG}" -o -z "`"${ZINC_BIN}" -status -port ${ZINC_PORT}`" ]; then - export ZINC_OPTS=${ZINC_OPTS:-"$_COMPILE_JVM_OPTS"} - "${ZINC_BIN}" -shutdown -port ${ZINC_PORT} - "${ZINC_BIN}" -start -port ${ZINC_PORT} \ --scala-compiler "${SCALA_COMPILER}" \ --scala-library "${SCALA_LIBRARY}" &>/dev/null +if [ -n "${USE_ZINC}" ]; then + if [ -n "${ZINC_INSTALL_FLAG}" -o -z "`"${ZINC_BIN}" -status -port ${ZINC_PORT}`" ]; then +export ZINC_OPTS=${ZINC_OPTS:-"$_COMPILE_JVM_OPTS"} +"${ZINC_BIN}" -shutdown -port ${ZINC_PORT} +"${ZINC_BIN}" -start -port ${ZINC_PORT} -server 127.0.0.1 \ + -idle-timeout 30m \ + -scala-compiler "${SCALA_COMPILER}" \ + -scala-library "${SCALA_LIBRARY}" &>/dev/null + fi fi # Set any `mvn` options if not already present @@ -164,3 +176,8 @@ echo "Using \`mvn\` from path: $MVN_BIN" 1>&2 # Last, call the `mvn` command as usual "${MVN_BIN}" -DzincPort=${ZINC_PORT} "$@" + +if [ -n "${USE_ZINC}" ]; then + # Try to shut down zinc explicitly + "${ZINC_BIN}" -shutdown -port ${ZINC_PORT} +fi - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: fix security issue of zinc
Repository: spark Updated Branches: refs/heads/master f38594fc5 -> ec1fafe3e fix security issue of zinc Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ec1fafe3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ec1fafe3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ec1fafe3 Branch: refs/heads/master Commit: ec1fafe3e78a40372975dfacb2516fe24bdfa2d2 Parents: f38594f Author: Wenchen Fan Authored: Fri Oct 19 21:33:11 2018 +0800 Committer: Wenchen Fan Committed: Fri Oct 19 21:33:11 2018 +0800 -- build/mvn | 31 --- 1 file changed, 24 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ec1fafe3/build/mvn -- diff --git a/build/mvn b/build/mvn index 2487b81..0289ef3 100755 --- a/build/mvn +++ b/build/mvn @@ -139,8 +139,17 @@ if [ "$1" == "--force" ]; then shift fi +if [ "$1" == "--zinc" ]; then + echo "Using zinc for incremental compilation. Be sure you are aware of the implications of " + echo "running this server process on your machine" + USE_ZINC=1 + shift +fi + # Install the proper version of Scala, Zinc and Maven for the build -install_zinc +if [ -n "${USE_ZINC}" ]; then + install_zinc +fi install_scala install_mvn @@ -149,12 +158,15 @@ cd "${_CALLING_DIR}" # Now that zinc is ensured to be installed, check its status and, if its # not running or just installed, start it -if [ -n "${ZINC_INSTALL_FLAG}" -o -z "`"${ZINC_BIN}" -status -port ${ZINC_PORT}`" ]; then - export ZINC_OPTS=${ZINC_OPTS:-"$_COMPILE_JVM_OPTS"} - "${ZINC_BIN}" -shutdown -port ${ZINC_PORT} - "${ZINC_BIN}" -start -port ${ZINC_PORT} \ --scala-compiler "${SCALA_COMPILER}" \ --scala-library "${SCALA_LIBRARY}" &>/dev/null +if [ -n "${USE_ZINC}" ]; then + if [ -n "${ZINC_INSTALL_FLAG}" -o -z "`"${ZINC_BIN}" -status -port ${ZINC_PORT}`" ]; then +export ZINC_OPTS=${ZINC_OPTS:-"$_COMPILE_JVM_OPTS"} +"${ZINC_BIN}" -shutdown -port ${ZINC_PORT} +"${ZINC_BIN}" -start -port ${ZINC_PORT} -server 127.0.0.1 \ + -idle-timeout 30m \ + -scala-compiler "${SCALA_COMPILER}" \ + -scala-library "${SCALA_LIBRARY}" &>/dev/null + fi fi # Set any `mvn` options if not already present @@ -164,3 +176,8 @@ echo "Using \`mvn\` from path: $MVN_BIN" 1>&2 # Last, call the `mvn` command as usual "${MVN_BIN}" -DzincPort=${ZINC_PORT} "$@" + +if [ -n "${USE_ZINC}" ]; then + # Try to shut down zinc explicitly + "${ZINC_BIN}" -shutdown -port ${ZINC_PORT} +fi - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-25768][SQL] fix constant argument expecting UDAFs
Repository: spark Updated Branches: refs/heads/branch-2.3 61b301cc7 -> 353d32804 [SPARK-25768][SQL] fix constant argument expecting UDAFs ## What changes were proposed in this pull request? Without this PR some UDAFs like `GenericUDAFPercentileApprox` can throw an exception because expecting a constant parameter (object inspector) as a particular argument. The exception is thrown because `toPrettySQL` call in `ResolveAliases` analyzer rule transforms a `Literal` parameter to a `PrettyAttribute` which is then transformed to an `ObjectInspector` instead of a `ConstantObjectInspector`. The exception comes from `getEvaluator` method of `GenericUDAFPercentileApprox` that actually shouldn't be called during `toPrettySQL` transformation. The reason why it is called are the non lazy fields in `HiveUDAFFunction`. This PR makes all fields of `HiveUDAFFunction` lazy. ## How was this patch tested? added new UT Closes #22766 from peter-toth/SPARK-25768. Authored-by: Peter Toth Signed-off-by: Wenchen Fan (cherry picked from commit f38594fc561208e17af80d17acf8da362b91fca4) Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/353d3280 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/353d3280 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/353d3280 Branch: refs/heads/branch-2.3 Commit: 353d328041397762e12acf915967cafab5dcdade Parents: 61b301c Author: Peter Toth Authored: Fri Oct 19 21:17:14 2018 +0800 Committer: Wenchen Fan Committed: Fri Oct 19 21:18:36 2018 +0800 -- .../org/apache/spark/sql/hive/hiveUDFs.scala| 53 +++- .../spark/sql/hive/execution/HiveUDFSuite.scala | 14 ++ 2 files changed, 42 insertions(+), 25 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/353d3280/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala index 68af99e..4a84509 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala @@ -340,39 +340,40 @@ private[hive] case class HiveUDAFFunction( resolver.getEvaluator(parameterInfo) } - // The UDAF evaluator used to consume raw input rows and produce partial aggregation results. - @transient - private lazy val partial1ModeEvaluator = newEvaluator() + private case class HiveEvaluator( + evaluator: GenericUDAFEvaluator, + objectInspector: ObjectInspector) + // The UDAF evaluator used to consume raw input rows and produce partial aggregation results. // Hive `ObjectInspector` used to inspect partial aggregation results. @transient - private val partialResultInspector = partial1ModeEvaluator.init( -GenericUDAFEvaluator.Mode.PARTIAL1, -inputInspectors - ) + private lazy val partial1HiveEvaluator = { +val evaluator = newEvaluator() +HiveEvaluator(evaluator, evaluator.init(GenericUDAFEvaluator.Mode.PARTIAL1, inputInspectors)) + } // The UDAF evaluator used to merge partial aggregation results. 
@transient private lazy val partial2ModeEvaluator = { val evaluator = newEvaluator() -evaluator.init(GenericUDAFEvaluator.Mode.PARTIAL2, Array(partialResultInspector)) +evaluator.init(GenericUDAFEvaluator.Mode.PARTIAL2, Array(partial1HiveEvaluator.objectInspector)) evaluator } // Spark SQL data type of partial aggregation results @transient - private lazy val partialResultDataType = inspectorToDataType(partialResultInspector) + private lazy val partialResultDataType = +inspectorToDataType(partial1HiveEvaluator.objectInspector) // The UDAF evaluator used to compute the final result from a partial aggregation result objects. - @transient - private lazy val finalModeEvaluator = newEvaluator() - // Hive `ObjectInspector` used to inspect the final aggregation result object. @transient - private val returnInspector = finalModeEvaluator.init( -GenericUDAFEvaluator.Mode.FINAL, -Array(partialResultInspector) - ) + private lazy val finalHiveEvaluator = { +val evaluator = newEvaluator() +HiveEvaluator( + evaluator, + evaluator.init(GenericUDAFEvaluator.Mode.FINAL, Array(partial1HiveEvaluator.objectInspector))) + } // Wrapper functions used to wrap Spark SQL input arguments into Hive specific format. @transient @@ -381,7 +382,7 @@ private[hive] case class HiveUDAFFunction( // Unwrapper function used to unwrap final aggregation result objects returned by Hive UDAFs into // Spark SQL specific format. @transient - private lazy val resultUnwrapp
spark git commit: [SPARK-25768][SQL] fix constant argument expecting UDAFs
Repository: spark Updated Branches: refs/heads/branch-2.4 df60d9f34 -> 6a06b8cce [SPARK-25768][SQL] fix constant argument expecting UDAFs ## What changes were proposed in this pull request? Without this PR some UDAFs like `GenericUDAFPercentileApprox` can throw an exception because expecting a constant parameter (object inspector) as a particular argument. The exception is thrown because `toPrettySQL` call in `ResolveAliases` analyzer rule transforms a `Literal` parameter to a `PrettyAttribute` which is then transformed to an `ObjectInspector` instead of a `ConstantObjectInspector`. The exception comes from `getEvaluator` method of `GenericUDAFPercentileApprox` that actually shouldn't be called during `toPrettySQL` transformation. The reason why it is called are the non lazy fields in `HiveUDAFFunction`. This PR makes all fields of `HiveUDAFFunction` lazy. ## How was this patch tested? added new UT Closes #22766 from peter-toth/SPARK-25768. Authored-by: Peter Toth Signed-off-by: Wenchen Fan (cherry picked from commit f38594fc561208e17af80d17acf8da362b91fca4) Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6a06b8cc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6a06b8cc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6a06b8cc Branch: refs/heads/branch-2.4 Commit: 6a06b8ccef57017172666cd004d5eb6be994d19e Parents: df60d9f Author: Peter Toth Authored: Fri Oct 19 21:17:14 2018 +0800 Committer: Wenchen Fan Committed: Fri Oct 19 21:17:49 2018 +0800 -- .../org/apache/spark/sql/hive/hiveUDFs.scala| 53 +++- .../spark/sql/hive/execution/HiveUDFSuite.scala | 14 ++ 2 files changed, 42 insertions(+), 25 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6a06b8cc/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala index 68af99e..4a84509 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala @@ -340,39 +340,40 @@ private[hive] case class HiveUDAFFunction( resolver.getEvaluator(parameterInfo) } - // The UDAF evaluator used to consume raw input rows and produce partial aggregation results. - @transient - private lazy val partial1ModeEvaluator = newEvaluator() + private case class HiveEvaluator( + evaluator: GenericUDAFEvaluator, + objectInspector: ObjectInspector) + // The UDAF evaluator used to consume raw input rows and produce partial aggregation results. // Hive `ObjectInspector` used to inspect partial aggregation results. @transient - private val partialResultInspector = partial1ModeEvaluator.init( -GenericUDAFEvaluator.Mode.PARTIAL1, -inputInspectors - ) + private lazy val partial1HiveEvaluator = { +val evaluator = newEvaluator() +HiveEvaluator(evaluator, evaluator.init(GenericUDAFEvaluator.Mode.PARTIAL1, inputInspectors)) + } // The UDAF evaluator used to merge partial aggregation results. 
@transient private lazy val partial2ModeEvaluator = { val evaluator = newEvaluator() -evaluator.init(GenericUDAFEvaluator.Mode.PARTIAL2, Array(partialResultInspector)) +evaluator.init(GenericUDAFEvaluator.Mode.PARTIAL2, Array(partial1HiveEvaluator.objectInspector)) evaluator } // Spark SQL data type of partial aggregation results @transient - private lazy val partialResultDataType = inspectorToDataType(partialResultInspector) + private lazy val partialResultDataType = +inspectorToDataType(partial1HiveEvaluator.objectInspector) // The UDAF evaluator used to compute the final result from a partial aggregation result objects. - @transient - private lazy val finalModeEvaluator = newEvaluator() - // Hive `ObjectInspector` used to inspect the final aggregation result object. @transient - private val returnInspector = finalModeEvaluator.init( -GenericUDAFEvaluator.Mode.FINAL, -Array(partialResultInspector) - ) + private lazy val finalHiveEvaluator = { +val evaluator = newEvaluator() +HiveEvaluator( + evaluator, + evaluator.init(GenericUDAFEvaluator.Mode.FINAL, Array(partial1HiveEvaluator.objectInspector))) + } // Wrapper functions used to wrap Spark SQL input arguments into Hive specific format. @transient @@ -381,7 +382,7 @@ private[hive] case class HiveUDAFFunction( // Unwrapper function used to unwrap final aggregation result objects returned by Hive UDAFs into // Spark SQL specific format. @transient - private lazy val resultUnwrapp
spark git commit: [SPARK-25768][SQL] fix constant argument expecting UDAFs
Repository: spark Updated Branches: refs/heads/master e8167768c -> f38594fc5 [SPARK-25768][SQL] fix constant argument expecting UDAFs ## What changes were proposed in this pull request? Without this PR some UDAFs like `GenericUDAFPercentileApprox` can throw an exception because expecting a constant parameter (object inspector) as a particular argument. The exception is thrown because `toPrettySQL` call in `ResolveAliases` analyzer rule transforms a `Literal` parameter to a `PrettyAttribute` which is then transformed to an `ObjectInspector` instead of a `ConstantObjectInspector`. The exception comes from `getEvaluator` method of `GenericUDAFPercentileApprox` that actually shouldn't be called during `toPrettySQL` transformation. The reason why it is called are the non lazy fields in `HiveUDAFFunction`. This PR makes all fields of `HiveUDAFFunction` lazy. ## How was this patch tested? added new UT Closes #22766 from peter-toth/SPARK-25768. Authored-by: Peter Toth Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f38594fc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f38594fc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f38594fc Branch: refs/heads/master Commit: f38594fc561208e17af80d17acf8da362b91fca4 Parents: e816776 Author: Peter Toth Authored: Fri Oct 19 21:17:14 2018 +0800 Committer: Wenchen Fan Committed: Fri Oct 19 21:17:14 2018 +0800 -- .../org/apache/spark/sql/hive/hiveUDFs.scala| 53 +++- .../spark/sql/hive/execution/HiveUDFSuite.scala | 14 ++ 2 files changed, 42 insertions(+), 25 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f38594fc/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala index 68af99e..4a84509 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala @@ -340,39 +340,40 @@ private[hive] case class HiveUDAFFunction( resolver.getEvaluator(parameterInfo) } - // The UDAF evaluator used to consume raw input rows and produce partial aggregation results. - @transient - private lazy val partial1ModeEvaluator = newEvaluator() + private case class HiveEvaluator( + evaluator: GenericUDAFEvaluator, + objectInspector: ObjectInspector) + // The UDAF evaluator used to consume raw input rows and produce partial aggregation results. // Hive `ObjectInspector` used to inspect partial aggregation results. @transient - private val partialResultInspector = partial1ModeEvaluator.init( -GenericUDAFEvaluator.Mode.PARTIAL1, -inputInspectors - ) + private lazy val partial1HiveEvaluator = { +val evaluator = newEvaluator() +HiveEvaluator(evaluator, evaluator.init(GenericUDAFEvaluator.Mode.PARTIAL1, inputInspectors)) + } // The UDAF evaluator used to merge partial aggregation results. 
@transient private lazy val partial2ModeEvaluator = { val evaluator = newEvaluator() -evaluator.init(GenericUDAFEvaluator.Mode.PARTIAL2, Array(partialResultInspector)) +evaluator.init(GenericUDAFEvaluator.Mode.PARTIAL2, Array(partial1HiveEvaluator.objectInspector)) evaluator } // Spark SQL data type of partial aggregation results @transient - private lazy val partialResultDataType = inspectorToDataType(partialResultInspector) + private lazy val partialResultDataType = +inspectorToDataType(partial1HiveEvaluator.objectInspector) // The UDAF evaluator used to compute the final result from a partial aggregation result objects. - @transient - private lazy val finalModeEvaluator = newEvaluator() - // Hive `ObjectInspector` used to inspect the final aggregation result object. @transient - private val returnInspector = finalModeEvaluator.init( -GenericUDAFEvaluator.Mode.FINAL, -Array(partialResultInspector) - ) + private lazy val finalHiveEvaluator = { +val evaluator = newEvaluator() +HiveEvaluator( + evaluator, + evaluator.init(GenericUDAFEvaluator.Mode.FINAL, Array(partial1HiveEvaluator.objectInspector))) + } // Wrapper functions used to wrap Spark SQL input arguments into Hive specific format. @transient @@ -381,7 +382,7 @@ private[hive] case class HiveUDAFFunction( // Unwrapper function used to unwrap final aggregation result objects returned by Hive UDAFs into // Spark SQL specific format. @transient - private lazy val resultUnwrapper = unwrapperFor(returnInspector) + private lazy val resultUnwrapper = unwrapperFor(finalHiveEvaluator
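The three SPARK-25768 commits above are identical back-ports, so a single illustration covers them. The sketch below is not the unit test added in HiveUDFSuite; it assumes a Hive-enabled SparkSession named `spark`, and the registration name `hive_percentile_approx` is an arbitrary choice (picked so it does not shadow Spark's built-in percentile_approx). Hive's GenericUDAFPercentileApprox requires its percentile argument to arrive as a ConstantObjectInspector, which is exactly the kind of UDAF that tripped over the eagerly initialized evaluator fields; with the now-lazy fields in HiveUDAFFunction a query like the one below can be analyzed and run.

  // Minimal sketch, assuming Hive support is enabled on `spark`.
  spark.sql(
    "CREATE TEMPORARY FUNCTION hive_percentile_approx AS " +
      "'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox'")

  // The literal 0.5 must still be a constant when the Hive evaluator is
  // initialized; making the evaluator fields lazy keeps it that way.
  spark.sql(
    """SELECT hive_percentile_approx(CAST(value AS DOUBLE), 0.5)
      |FROM VALUES (1), (2), (3) AS t(value)""".stripMargin
  ).show()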
[1/2] spark git commit: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor signature
Repository: spark Updated Branches: refs/heads/branch-2.4 9ed2e4204 -> df60d9f34 http://git-wip-us.apache.org/repos/asf/spark/blob/df60d9f3/sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala b/sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala index 697757f..eb956c4 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala @@ -73,19 +73,24 @@ case class UserDefinedFunction protected[sql] ( */ @scala.annotation.varargs def apply(exprs: Column*): Column = { -if (inputTypes.isDefined && nullableTypes.isDefined) { - require(inputTypes.get.length == nullableTypes.get.length) +// TODO: make sure this class is only instantiated through `SparkUserDefinedFunction.create()` +// and `nullableTypes` is always set. +if (nullableTypes.isEmpty) { + nullableTypes = Some(ScalaReflection.getParameterTypeNullability(f)) +} +if (inputTypes.isDefined) { + assert(inputTypes.get.length == nullableTypes.get.length) } Column(ScalaUDF( f, dataType, exprs.map(_.expr), + nullableTypes.get, inputTypes.getOrElse(Nil), udfName = _nameOption, nullable = _nullable, - udfDeterministic = _deterministic, - nullableTypes = nullableTypes.getOrElse(Nil))) + udfDeterministic = _deterministic)) } private def copyAll(): UserDefinedFunction = { @@ -146,9 +151,14 @@ private[sql] object SparkUserDefinedFunction { def create( f: AnyRef, dataType: DataType, - inputSchemas: Option[Seq[ScalaReflection.Schema]]): UserDefinedFunction = { -val udf = new UserDefinedFunction(f, dataType, inputSchemas.map(_.map(_.dataType))) -udf.nullableTypes = inputSchemas.map(_.map(_.nullable)) + inputSchemas: Seq[Option[ScalaReflection.Schema]]): UserDefinedFunction = { +val inputTypes = if (inputSchemas.contains(None)) { + None +} else { + Some(inputSchemas.map(_.get.dataType)) +} +val udf = new UserDefinedFunction(f, dataType, inputTypes) +udf.nullableTypes = Some(inputSchemas.map(_.map(_.nullable).getOrElse(true))) udf } } http://git-wip-us.apache.org/repos/asf/spark/blob/df60d9f3/sql/core/src/main/scala/org/apache/spark/sql/functions.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala index 10b67d7..6a43ce1 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala @@ -3819,7 +3819,7 @@ object functions { (0 to 10).foreach { x => val types = (1 to x).foldRight("RT")((i, s) => {s"A$i, $s"}) val typeTags = (1 to x).map(i => s"A$i: TypeTag").foldLeft("RT: TypeTag")(_ + ", " + _) -val inputSchemas = (1 to x).foldRight("Nil")((i, s) => {s"ScalaReflection.schemaFor(typeTag[A$i]) :: $s"}) +val inputSchemas = (1 to x).foldRight("Nil")((i, s) => {s"Try(ScalaReflection.schemaFor(typeTag[A$i])).toOption :: $s"}) println(s""" |/** | * Defines a Scala closure of $x arguments as user-defined function (UDF). 
@@ -3832,7 +3832,7 @@ object functions { | */ |def udf[$typeTags](f: Function$x[$types]): UserDefinedFunction = { | val ScalaReflection.Schema(dataType, nullable) = ScalaReflection.schemaFor[RT] - | val inputSchemas = Try($inputTypes).toOption + | val inputSchemas = $inputSchemas | val udf = SparkUserDefinedFunction.create(f, dataType, inputSchemas) | if (nullable) udf else udf.asNonNullable() |}""".stripMargin) @@ -3856,7 +3856,7 @@ object functions { | */ |def udf(f: UDF$i[$extTypeArgs], returnType: DataType): UserDefinedFunction = { | val func = f$anyCast.call($anyParams) - | SparkUserDefinedFunction.create($funcCall, returnType, inputSchemas = None) + | SparkUserDefinedFunction.create($funcCall, returnType, inputSchemas = Seq.fill($i)(None)) |}""".stripMargin) } @@ -3877,7 +3877,7 @@ object functions { */ def udf[RT: TypeTag](f: Function0[RT]): UserDefinedFunction = { val ScalaReflection.Schema(dataType, nullable) = ScalaReflection.schemaFor[RT] -val inputSchemas = Try(Nil).toOption +val inputSchemas = Nil val udf = SparkUserDefinedFunction.create(f, dataType, inputSchemas) if (nullable) udf else udf.asNonNullable() } @@ -3893,7 +3893,7 @@ object functions { */ def udf[RT: TypeTag, A1: TypeTag](f: Function1[A1, RT]): UserDefinedFunction = { val ScalaReflection.Schema(dataTyp
[2/2] spark git commit: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor signature
[SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor signature ## What changes were proposed in this pull request? This is a follow-up PR for #22259. The extra field added in `ScalaUDF` with the original PR was declared optional, but should be indeed required, otherwise callers of `ScalaUDF`'s constructor could ignore this new field and cause the result to be incorrect. This PR makes the new field required and changes its name to `handleNullForInputs`. #22259 breaks the previous behavior for null-handling of primitive-type input parameters. For example, for `val f = udf({(x: Int, y: Any) => x})`, `f(null, "str")` should return `null` but would return `0` after #22259. In this PR, all UDF methods except `def udf(f: AnyRef, dataType: DataType): UserDefinedFunction` have been restored with the original behavior. The only exception is documented in the Spark SQL migration guide. In addition, now that we have this extra field indicating if a null-test should be applied on the corresponding input value, we can also make use of this flag to avoid the rule `HandleNullInputsForUDF` being applied infinitely. ## How was this patch tested? Added UT in UDFSuite Passed affected existing UTs: AnalysisSuite UDFSuite Closes #22732 from maryannxue/spark-25044-followup. Lead-authored-by: maryannxue Co-authored-by: Wenchen Fan Signed-off-by: Wenchen Fan (cherry picked from commit e8167768cfebfdb11acd8e0a06fe34ca43c14648) Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/df60d9f3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/df60d9f3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/df60d9f3 Branch: refs/heads/branch-2.4 Commit: df60d9f3469022866de2f41939a38e7e5d02dc1b Parents: 9ed2e42 Author: maryannxue Authored: Fri Oct 19 21:03:59 2018 +0800 Committer: Wenchen Fan Committed: Fri Oct 19 21:04:33 2018 +0800 -- .../spark/sql/catalyst/ScalaReflection.scala| 22 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 51 ++--- .../sql/catalyst/expressions/ScalaUDF.scala | 14 +- .../sql/catalyst/analysis/AnalysisSuite.scala | 18 +- .../catalyst/expressions/ScalaUDFSuite.scala| 9 +- .../sql/catalyst/trees/TreeNodeSuite.scala | 2 +- .../org/apache/spark/sql/UDFRegistration.scala | 218 ++- .../datasources/FileFormatDataWriter.scala | 3 +- .../sql/expressions/UserDefinedFunction.scala | 24 +- .../scala/org/apache/spark/sql/functions.scala | 54 ++--- .../scala/org/apache/spark/sql/UDFSuite.scala | 24 ++ 11 files changed, 257 insertions(+), 182 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/df60d9f3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala index 0238d57..c27180e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala @@ -19,8 +19,11 @@ package org.apache.spark.sql.catalyst import java.lang.reflect.Constructor +import scala.util.Properties + import org.apache.commons.lang3.reflect.ConstructorUtils +import org.apache.spark.internal.Logging import org.apache.spark.sql.catalyst.analysis.{GetColumnByOrdinal, UnresolvedAttribute, UnresolvedExtractValue} import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.expressions.objects._ @@ 
-879,7 +882,7 @@ object ScalaReflection extends ScalaReflection { * Support for generating catalyst schemas for scala objects. Note that unlike its companion * object, this trait able to work in both the runtime and the compile time (macro) universe. */ -trait ScalaReflection { +trait ScalaReflection extends Logging { /** The universe we work in (runtime or macro) */ val universe: scala.reflect.api.Universe @@ -933,6 +936,23 @@ trait ScalaReflection { } /** + * Returns the nullability of the input parameter types of the scala function object. + * + * Note that this only works with Scala 2.11, and the information returned may be inaccurate if + * used with a different Scala version. + */ + def getParameterTypeNullability(func: AnyRef): Seq[Boolean] = { +if (!Properties.versionString.contains("2.11")) { + logWarning(s"Scala ${Properties.versionString} cannot get type nullability correctly via " + +"reflection, thus Spark cannot add proper input null check for UDF. To avoid this " + +"problem, use the typed UDF interfaces instead.") +} +va
[1/2] spark git commit: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor signature
Repository: spark Updated Branches: refs/heads/master 6e0fc8b0f -> e8167768c http://git-wip-us.apache.org/repos/asf/spark/blob/e8167768/sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala b/sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala index 697757f..eb956c4 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala @@ -73,19 +73,24 @@ case class UserDefinedFunction protected[sql] ( */ @scala.annotation.varargs def apply(exprs: Column*): Column = { -if (inputTypes.isDefined && nullableTypes.isDefined) { - require(inputTypes.get.length == nullableTypes.get.length) +// TODO: make sure this class is only instantiated through `SparkUserDefinedFunction.create()` +// and `nullableTypes` is always set. +if (nullableTypes.isEmpty) { + nullableTypes = Some(ScalaReflection.getParameterTypeNullability(f)) +} +if (inputTypes.isDefined) { + assert(inputTypes.get.length == nullableTypes.get.length) } Column(ScalaUDF( f, dataType, exprs.map(_.expr), + nullableTypes.get, inputTypes.getOrElse(Nil), udfName = _nameOption, nullable = _nullable, - udfDeterministic = _deterministic, - nullableTypes = nullableTypes.getOrElse(Nil))) + udfDeterministic = _deterministic)) } private def copyAll(): UserDefinedFunction = { @@ -146,9 +151,14 @@ private[sql] object SparkUserDefinedFunction { def create( f: AnyRef, dataType: DataType, - inputSchemas: Option[Seq[ScalaReflection.Schema]]): UserDefinedFunction = { -val udf = new UserDefinedFunction(f, dataType, inputSchemas.map(_.map(_.dataType))) -udf.nullableTypes = inputSchemas.map(_.map(_.nullable)) + inputSchemas: Seq[Option[ScalaReflection.Schema]]): UserDefinedFunction = { +val inputTypes = if (inputSchemas.contains(None)) { + None +} else { + Some(inputSchemas.map(_.get.dataType)) +} +val udf = new UserDefinedFunction(f, dataType, inputTypes) +udf.nullableTypes = Some(inputSchemas.map(_.map(_.nullable).getOrElse(true))) udf } } http://git-wip-us.apache.org/repos/asf/spark/blob/e8167768/sql/core/src/main/scala/org/apache/spark/sql/functions.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala index 8def996..dbf1f23 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala @@ -3894,7 +3894,7 @@ object functions { (0 to 10).foreach { x => val types = (1 to x).foldRight("RT")((i, s) => {s"A$i, $s"}) val typeTags = (1 to x).map(i => s"A$i: TypeTag").foldLeft("RT: TypeTag")(_ + ", " + _) -val inputSchemas = (1 to x).foldRight("Nil")((i, s) => {s"ScalaReflection.schemaFor(typeTag[A$i]) :: $s"}) +val inputSchemas = (1 to x).foldRight("Nil")((i, s) => {s"Try(ScalaReflection.schemaFor(typeTag[A$i])).toOption :: $s"}) println(s""" |/** | * Defines a Scala closure of $x arguments as user-defined function (UDF). 
@@ -3907,7 +3907,7 @@ object functions { | */ |def udf[$typeTags](f: Function$x[$types]): UserDefinedFunction = { | val ScalaReflection.Schema(dataType, nullable) = ScalaReflection.schemaFor[RT] - | val inputSchemas = Try($inputTypes).toOption + | val inputSchemas = $inputSchemas | val udf = SparkUserDefinedFunction.create(f, dataType, inputSchemas) | if (nullable) udf else udf.asNonNullable() |}""".stripMargin) @@ -3931,7 +3931,7 @@ object functions { | */ |def udf(f: UDF$i[$extTypeArgs], returnType: DataType): UserDefinedFunction = { | val func = f$anyCast.call($anyParams) - | SparkUserDefinedFunction.create($funcCall, returnType, inputSchemas = None) + | SparkUserDefinedFunction.create($funcCall, returnType, inputSchemas = Seq.fill($i)(None)) |}""".stripMargin) } @@ -3952,7 +3952,7 @@ object functions { */ def udf[RT: TypeTag](f: Function0[RT]): UserDefinedFunction = { val ScalaReflection.Schema(dataType, nullable) = ScalaReflection.schemaFor[RT] -val inputSchemas = Try(Nil).toOption +val inputSchemas = Nil val udf = SparkUserDefinedFunction.create(f, dataType, inputSchemas) if (nullable) udf else udf.asNonNullable() } @@ -3968,7 +3968,7 @@ object functions { */ def udf[RT: TypeTag, A1: TypeTag](f: Function1[A1, RT]): UserDefinedFunction = { val ScalaReflection.Schema(dataType, n
[2/2] spark git commit: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor signature
[SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor signature ## What changes were proposed in this pull request? This is a follow-up PR for #22259. The extra field added in `ScalaUDF` with the original PR was declared optional, but should be indeed required, otherwise callers of `ScalaUDF`'s constructor could ignore this new field and cause the result to be incorrect. This PR makes the new field required and changes its name to `handleNullForInputs`. #22259 breaks the previous behavior for null-handling of primitive-type input parameters. For example, for `val f = udf({(x: Int, y: Any) => x})`, `f(null, "str")` should return `null` but would return `0` after #22259. In this PR, all UDF methods except `def udf(f: AnyRef, dataType: DataType): UserDefinedFunction` have been restored with the original behavior. The only exception is documented in the Spark SQL migration guide. In addition, now that we have this extra field indicating if a null-test should be applied on the corresponding input value, we can also make use of this flag to avoid the rule `HandleNullInputsForUDF` being applied infinitely. ## How was this patch tested? Added UT in UDFSuite Passed affected existing UTs: AnalysisSuite UDFSuite Closes #22732 from maryannxue/spark-25044-followup. Lead-authored-by: maryannxue Co-authored-by: Wenchen Fan Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e8167768 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e8167768 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e8167768 Branch: refs/heads/master Commit: e8167768cfebfdb11acd8e0a06fe34ca43c14648 Parents: 6e0fc8b Author: maryannxue Authored: Fri Oct 19 21:03:59 2018 +0800 Committer: Wenchen Fan Committed: Fri Oct 19 21:03:59 2018 +0800 -- .../spark/sql/catalyst/ScalaReflection.scala| 22 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 51 ++--- .../sql/catalyst/expressions/ScalaUDF.scala | 14 +- .../sql/catalyst/analysis/AnalysisSuite.scala | 18 +- .../catalyst/expressions/ScalaUDFSuite.scala| 9 +- .../sql/catalyst/trees/TreeNodeSuite.scala | 2 +- .../org/apache/spark/sql/UDFRegistration.scala | 218 ++- .../datasources/FileFormatDataWriter.scala | 3 +- .../sql/expressions/UserDefinedFunction.scala | 24 +- .../scala/org/apache/spark/sql/functions.scala | 54 ++--- .../scala/org/apache/spark/sql/UDFSuite.scala | 24 ++ 11 files changed, 257 insertions(+), 182 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e8167768/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala index 0238d57..c27180e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala @@ -19,8 +19,11 @@ package org.apache.spark.sql.catalyst import java.lang.reflect.Constructor +import scala.util.Properties + import org.apache.commons.lang3.reflect.ConstructorUtils +import org.apache.spark.internal.Logging import org.apache.spark.sql.catalyst.analysis.{GetColumnByOrdinal, UnresolvedAttribute, UnresolvedExtractValue} import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.expressions.objects._ @@ -879,7 +882,7 @@ object ScalaReflection extends ScalaReflection { * Support for generating catalyst 
schemas for scala objects. Note that unlike its companion * object, this trait able to work in both the runtime and the compile time (macro) universe. */ -trait ScalaReflection { +trait ScalaReflection extends Logging { /** The universe we work in (runtime or macro) */ val universe: scala.reflect.api.Universe @@ -933,6 +936,23 @@ trait ScalaReflection { } /** + * Returns the nullability of the input parameter types of the scala function object. + * + * Note that this only works with Scala 2.11, and the information returned may be inaccurate if + * used with a different Scala version. + */ + def getParameterTypeNullability(func: AnyRef): Seq[Boolean] = { +if (!Properties.versionString.contains("2.11")) { + logWarning(s"Scala ${Properties.versionString} cannot get type nullability correctly via " + +"reflection, thus Spark cannot add proper input null check for UDF. To avoid this " + +"problem, use the typed UDF interfaces instead.") +} +val methods = func.getClass.getMethods.filter(m => m.getName == "apply" && !m.isBridge) +assert(me
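To make the behaviour restored by this follow-up concrete, here is a small sketch (not the UDFSuite test itself; the column names and the `spark` session are assumptions). With the typed udf API, Spark again knows that the first parameter is a primitive Int and re-adds the input null check, so the null input below yields null rather than the primitive default 0 (the example quoted in the commit message).

  import org.apache.spark.sql.functions.{col, udf}

  // Typed UDF with a primitive first parameter, as in the commit message.
  val f = udf((x: Int, y: Any) => x)

  // One row with a nullable INT column `a` and a string column `b`.
  val df = spark.range(1).selectExpr("CAST(NULL AS INT) AS a", "'str' AS b")

  // With the restored null handling this prints null, not 0.
  df.select(f(col("a"), col("b"))).show()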
svn commit: r30157 - in /dev/spark/3.0.0-SNAPSHOT-2018_10_19_04_02-6e0fc8b-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Oct 19 11:17:07 2018 New Revision: 30157 Log: Apache Spark 3.0.0-SNAPSHOT-2018_10_19_04_02-6e0fc8b docs [This commit notification would consist of 1484 parts, which exceeds the limit of 50 parts, so it was shortened to this summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r30156 - in /dev/spark/2.4.1-SNAPSHOT-2018_10_19_02_02-9ed2e42-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Oct 19 09:17:01 2018 New Revision: 30156 Log: Apache Spark 2.4.1-SNAPSHOT-2018_10_19_02_02-9ed2e42 docs [This commit notification would consist of 1478 parts, which exceeds the limit of 50 parts, so it was shortened to this summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-25560][SQL] Allow FunctionInjection in SparkExtensions
Repository: spark Updated Branches: refs/heads/master c8f7691c6 -> 6e0fc8b0f [SPARK-25560][SQL] Allow FunctionInjection in SparkExtensions This allows an implementer of Spark Session Extensions to utilize a method "injectFunction" which will add a new function to the default Spark Session Catalogue. ## What changes were proposed in this pull request? Adds a new function to SparkSessionExtensions def injectFunction(functionDescription: FunctionDescription) Where function description is a new type type FunctionDescription = (FunctionIdentifier, FunctionBuilder) The functions are loaded in BaseSessionBuilder when the function registry does not have a parent function registry to get loaded from. ## How was this patch tested? New unit tests are added for the extension in SparkSessionExtensionSuite Closes #22576 from RussellSpitzer/SPARK-25560. Authored-by: Russell Spitzer Signed-off-by: Herman van Hovell Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6e0fc8b0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6e0fc8b0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6e0fc8b0 Branch: refs/heads/master Commit: 6e0fc8b0fc2798b6372d1101f7996f57bae8fea4 Parents: c8f7691 Author: Russell Spitzer Authored: Fri Oct 19 10:40:56 2018 +0200 Committer: Herman van Hovell Committed: Fri Oct 19 10:40:56 2018 +0200 -- .../spark/sql/SparkSessionExtensions.scala | 22 ++ .../sql/internal/BaseSessionStateBuilder.scala | 3 ++- .../spark/sql/SparkSessionExtensionSuite.scala | 24 ++-- 3 files changed, 46 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6e0fc8b0/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala b/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala index 6b02ac2..a486434 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala @@ -20,6 +20,10 @@ package org.apache.spark.sql import scala.collection.mutable import org.apache.spark.annotation.{DeveloperApi, Experimental, InterfaceStability} +import org.apache.spark.sql.catalyst.FunctionIdentifier +import org.apache.spark.sql.catalyst.analysis.FunctionRegistry +import org.apache.spark.sql.catalyst.analysis.FunctionRegistry.FunctionBuilder +import org.apache.spark.sql.catalyst.expressions.ExpressionInfo import org.apache.spark.sql.catalyst.parser.ParserInterface import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan import org.apache.spark.sql.catalyst.rules.Rule @@ -68,6 +72,7 @@ class SparkSessionExtensions { type CheckRuleBuilder = SparkSession => LogicalPlan => Unit type StrategyBuilder = SparkSession => Strategy type ParserBuilder = (SparkSession, ParserInterface) => ParserInterface + type FunctionDescription = (FunctionIdentifier, ExpressionInfo, FunctionBuilder) private[this] val resolutionRuleBuilders = mutable.Buffer.empty[RuleBuilder] @@ -171,4 +176,21 @@ class SparkSessionExtensions { def injectParser(builder: ParserBuilder): Unit = { parserBuilders += builder } + + private[this] val injectedFunctions = mutable.Buffer.empty[FunctionDescription] + + private[sql] def registerFunctions(functionRegistry: FunctionRegistry) = { +for ((name, expressionInfo, function) <- injectedFunctions) { + functionRegistry.registerFunction(name, expressionInfo, function) +} +functionRegistry + } + + /** 
+ * Injects a custom function into the [[org.apache.spark.sql.catalyst.analysis.FunctionRegistry]] + * at runtime for all sessions. + */ + def injectFunction(functionDescription: FunctionDescription): Unit = { +injectedFunctions += functionDescription + } } http://git-wip-us.apache.org/repos/asf/spark/blob/6e0fc8b0/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala index 60bba5e..f67cc32 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala @@ -95,7 +95,8 @@ abstract class BaseSessionStateBuilder( * This either gets cloned from a pre-existing version or cloned from the built-in registry. */ protected lazy val functionRegistry: FunctionRegistry =
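To make the new hook concrete, here is a minimal sketch of injecting a function at session-build time. The name `my_upper`, the local master, and the reuse of the built-in Upper expression are illustrative assumptions, not part of the patch (its own coverage is in SparkSessionExtensionSuite). Note that, as the diff shows, FunctionDescription is a triple of FunctionIdentifier, ExpressionInfo and FunctionBuilder; the summary sentence in the commit message omits the ExpressionInfo element.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.catalyst.FunctionIdentifier
  import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionInfo, Upper}

  val session = SparkSession.builder()
    .master("local[*]")
    .withExtensions { extensions =>
      // FunctionDescription = (FunctionIdentifier, ExpressionInfo, FunctionBuilder)
      extensions.injectFunction((
        FunctionIdentifier("my_upper"),
        new ExpressionInfo(classOf[Upper].getCanonicalName, "my_upper"),
        (children: Seq[Expression]) => Upper(children.head)))
    }
    .getOrCreate()

  // The injected function resolves through the default session catalog.
  session.sql("SELECT my_upper('spark')").show()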
svn commit: r30154 - in /dev/spark/3.0.0-SNAPSHOT-2018_10_19_00_02-c8f7691-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Fri Oct 19 07:17:24 2018 New Revision: 30154 Log: Apache Spark 3.0.0-SNAPSHOT-2018_10_19_00_02-c8f7691 docs [This commit notification would consist of 1484 parts, which exceeds the limit of 50 parts, so it was shortened to this summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org