spark git commit: [SPARK-17651][SPARKR] Set R package version number along with mvn
Repository: spark Updated Branches: refs/heads/branch-2.0 452e468f2 -> b111a81f2 [SPARK-17651][SPARKR] Set R package version number along with mvn This PR sets the R package version while tagging releases. Note that since R doesn't accept `-SNAPSHOT` in version number field, we remove that while setting the next version Tested manually by running locally Author: Shivaram VenkataramanCloses #15223 from shivaram/sparkr-version-change. (cherry picked from commit 7c382524a959a2bc9b3d2fca44f6f0b41aba4e3c) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b111a81f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b111a81f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b111a81f Branch: refs/heads/branch-2.0 Commit: b111a81f2a5547e2357d66db4ba2f05ce69a52a6 Parents: 452e468 Author: Shivaram Venkataraman Authored: Fri Sep 23 14:35:18 2016 -0700 Committer: Reynold Xin Committed: Fri Sep 23 14:36:01 2016 -0700 -- dev/create-release/release-tag.sh | 15 +++ 1 file changed, 15 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b111a81f/dev/create-release/release-tag.sh -- diff --git a/dev/create-release/release-tag.sh b/dev/create-release/release-tag.sh index d404939..b7e5100 100755 --- a/dev/create-release/release-tag.sh +++ b/dev/create-release/release-tag.sh @@ -60,12 +60,27 @@ git config user.email $GIT_EMAIL # Create release version $MVN versions:set -DnewVersion=$RELEASE_VERSION | grep -v "no value" # silence logs +# Set the release version in R/pkg/DESCRIPTION +sed -i".tmp1" 's/Version.*$/Version: '"$RELEASE_VERSION"'/g' R/pkg/DESCRIPTION +# Set the release version in docs +sed -i".tmp1" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$RELEASE_VERSION"'/g' docs/_config.yml +sed -i".tmp2" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: '"$RELEASE_VERSION"'/g' docs/_config.yml + git commit -a -m "Preparing Spark release $RELEASE_TAG" echo "Creating tag $RELEASE_TAG at the head of $GIT_BRANCH" git tag $RELEASE_TAG # Create next version $MVN versions:set -DnewVersion=$NEXT_VERSION | grep -v "no value" # silence logs +# Remove -SNAPSHOT before setting the R version as R expects version strings to only have numbers +R_NEXT_VERSION=`echo $NEXT_VERSION | sed 's/-SNAPSHOT//g'` +sed -i".tmp2" 's/Version.*$/Version: '"$R_NEXT_VERSION"'/g' R/pkg/DESCRIPTION + +# Update docs with next version +sed -i".tmp3" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$NEXT_VERSION"'/g' docs/_config.yml +# Use R version for short version +sed -i".tmp4" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: '"$R_NEXT_VERSION"'/g' docs/_config.yml + git commit -a -m "Preparing development version $NEXT_VERSION" # Push changes - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17651][SPARKR] Set R package version number along with mvn
Repository: spark Updated Branches: refs/heads/master 90a30f463 -> 7c382524a [SPARK-17651][SPARKR] Set R package version number along with mvn ## What changes were proposed in this pull request? This PR sets the R package version while tagging releases. Note that since R doesn't accept `-SNAPSHOT` in version number field, we remove that while setting the next version ## How was this patch tested? Tested manually by running locally Author: Shivaram VenkataramanCloses #15223 from shivaram/sparkr-version-change. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7c382524 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7c382524 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7c382524 Branch: refs/heads/master Commit: 7c382524a959a2bc9b3d2fca44f6f0b41aba4e3c Parents: 90a30f4 Author: Shivaram Venkataraman Authored: Fri Sep 23 14:35:18 2016 -0700 Committer: Reynold Xin Committed: Fri Sep 23 14:35:18 2016 -0700 -- dev/create-release/release-tag.sh | 15 +++ 1 file changed, 15 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7c382524/dev/create-release/release-tag.sh -- diff --git a/dev/create-release/release-tag.sh b/dev/create-release/release-tag.sh index d404939..b7e5100 100755 --- a/dev/create-release/release-tag.sh +++ b/dev/create-release/release-tag.sh @@ -60,12 +60,27 @@ git config user.email $GIT_EMAIL # Create release version $MVN versions:set -DnewVersion=$RELEASE_VERSION | grep -v "no value" # silence logs +# Set the release version in R/pkg/DESCRIPTION +sed -i".tmp1" 's/Version.*$/Version: '"$RELEASE_VERSION"'/g' R/pkg/DESCRIPTION +# Set the release version in docs +sed -i".tmp1" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$RELEASE_VERSION"'/g' docs/_config.yml +sed -i".tmp2" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: '"$RELEASE_VERSION"'/g' docs/_config.yml + git commit -a -m "Preparing Spark release $RELEASE_TAG" echo "Creating tag $RELEASE_TAG at the head of $GIT_BRANCH" git tag $RELEASE_TAG # Create next version $MVN versions:set -DnewVersion=$NEXT_VERSION | grep -v "no value" # silence logs +# Remove -SNAPSHOT before setting the R version as R expects version strings to only have numbers +R_NEXT_VERSION=`echo $NEXT_VERSION | sed 's/-SNAPSHOT//g'` +sed -i".tmp2" 's/Version.*$/Version: '"$R_NEXT_VERSION"'/g' R/pkg/DESCRIPTION + +# Update docs with next version +sed -i".tmp3" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$NEXT_VERSION"'/g' docs/_config.yml +# Use R version for short version +sed -i".tmp4" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: '"$R_NEXT_VERSION"'/g' docs/_config.yml + git commit -a -m "Preparing development version $NEXT_VERSION" # Push changes - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
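The release script makes this change with `sed` against R/pkg/DESCRIPTION and docs/_config.yml. As a minimal sketch of the same version-string mapping, with illustrative names only (nothing below is part of the Spark release scripts), the Maven-to-R translation looks like this:

```scala
// Minimal sketch of the version mapping the release script performs with sed.
// R's DESCRIPTION "Version" field does not accept qualifiers like "-SNAPSHOT",
// so the qualifier is dropped when writing the next development version.
object RVersionSketch {
  def toRVersion(mavenVersion: String): String =
    mavenVersion.replace("-SNAPSHOT", "")

  def descriptionLine(mavenVersion: String): String =
    s"Version: ${toRVersion(mavenVersion)}"

  def main(args: Array[String]): Unit = {
    println(descriptionLine("2.0.1"))          // Version: 2.0.1
    println(descriptionLine("2.0.2-SNAPSHOT")) // Version: 2.0.2
  }
}
```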
spark git commit: [SPARK-17635][SQL] Remove hardcode "agg_plan" in HashAggregateExec
Repository: spark Updated Branches: refs/heads/master a16619683 -> 79159a1e8 [SPARK-17635][SQL] Remove hardcode "agg_plan" in HashAggregateExec ## What changes were proposed in this pull request? "agg_plan" is hardcoded in HashAggregateExec, which is a potential issue, so this removes it. ## How was this patch tested? Existing tests. Author: Yucai Yu. Closes #15199 from yucai/agg_plan. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/79159a1e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/79159a1e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/79159a1e Branch: refs/heads/master Commit: 79159a1e87f19fb08a36857fc30b600ee7fdc52b Parents: a166196 Author: Yucai Yu Authored: Thu Sep 22 17:22:56 2016 -0700 Committer: Reynold Xin Committed: Thu Sep 22 17:22:56 2016 -0700 -- .../apache/spark/sql/execution/aggregate/HashAggregateExec.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/79159a1e/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala index 59e132d..06199ef 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala @@ -552,7 +552,7 @@ case class HashAggregateExec( } else { ctx.addMutableState(fastHashMapClassName, fastHashMapTerm, s"$fastHashMapTerm = new $fastHashMapClassName(" + -s"agg_plan.getTaskMemoryManager(), agg_plan.getEmptyAggregationBuffer());") +s"$thisPlan.getTaskMemoryManager(), $thisPlan.getEmptyAggregationBuffer());") ctx.addMutableState( "org.apache.spark.unsafe.KVIterator", iterTermForFastHashMap, "") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
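The fix interpolates whatever plan term the code generator actually handed back instead of assuming it is always named `agg_plan`. A minimal sketch of that string-building step, with hypothetical helper and variable names rather than Spark's real codegen API:

```scala
// Hypothetical sketch (not Spark's CodegenContext) of why the generated initializer
// should interpolate the plan term it was given rather than hardcode "agg_plan".
object FastHashMapInitSketch {
  def initializer(thisPlan: String, fastHashMapClassName: String, fastHashMapTerm: String): String =
    s"$fastHashMapTerm = new $fastHashMapClassName(" +
      s"$thisPlan.getTaskMemoryManager(), $thisPlan.getEmptyAggregationBuffer());"

  def main(args: Array[String]): Unit = {
    // If the surrounding codegen names the plan reference "agg_plan1", a hardcoded
    // "agg_plan" would not resolve in the generated class; the interpolated form does.
    println(initializer("agg_plan1", "agg_FastHashMap", "agg_fastHashMap"))
  }
}
```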
spark git commit: Skip building R vignettes if Spark is not built
Repository: spark Updated Branches: refs/heads/branch-2.0 b25a8e6e1 -> f14f47f07 Skip building R vignettes if Spark is not built ## What changes were proposed in this pull request? When we build the docs separately we don't have the JAR files from the Spark build in the same tree. As the SparkR vignettes need to launch a SparkContext to be built, we skip building them if JAR files don't exist ## How was this patch tested? To test this we can run the following: ``` build/mvn -DskipTests -Psparkr clean ./R/create-docs.sh ``` You should see a line `Skipping R vignettes as Spark JARs not found` at the end Author: Shivaram VenkataramanCloses #15200 from shivaram/sparkr-vignette-skip. (cherry picked from commit 9f24a17c59b1130d97efa7d313c06577f7344338) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f14f47f0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f14f47f0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f14f47f0 Branch: refs/heads/branch-2.0 Commit: f14f47f072a392df0ebe908f1c57b6eb858105b7 Parents: b25a8e6 Author: Shivaram Venkataraman Authored: Thu Sep 22 11:52:42 2016 -0700 Committer: Reynold Xin Committed: Thu Sep 22 11:54:51 2016 -0700 -- R/create-docs.sh | 25 ++--- 1 file changed, 22 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f14f47f0/R/create-docs.sh -- diff --git a/R/create-docs.sh b/R/create-docs.sh index 0dfba22..69ffc5f 100755 --- a/R/create-docs.sh +++ b/R/create-docs.sh @@ -30,6 +30,13 @@ set -e # Figure out where the script is export FWDIR="$(cd "`dirname "$0"`"; pwd)" +export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)" + +# Required for setting SPARK_SCALA_VERSION +. "${SPARK_HOME}"/bin/load-spark-env.sh + +echo "Using Scala $SPARK_SCALA_VERSION" + pushd $FWDIR # Install the package (this will also generate the Rd files) @@ -45,9 +52,21 @@ Rscript -e 'libDir <- "../../lib"; library(SparkR, lib.loc=libDir); library(knit popd -# render creates SparkR vignettes -Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)' +# Find Spark jars. +if [ -f "${SPARK_HOME}/RELEASE" ]; then + SPARK_JARS_DIR="${SPARK_HOME}/jars" +else + SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars" +fi + +# Only create vignettes if Spark JARs exist +if [ -d "$SPARK_JARS_DIR" ]; then + # render creates SparkR vignettes + Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)' -find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete + find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete +else + echo "Skipping R vignettes as Spark JARs not found in $SPARK_HOME" +fi popd - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: Skip building R vignettes if Spark is not built
Repository: spark Updated Branches: refs/heads/master 17b72d31e -> 9f24a17c5 Skip building R vignettes if Spark is not built ## What changes were proposed in this pull request? When we build the docs separately we don't have the JAR files from the Spark build in the same tree. As the SparkR vignettes need to launch a SparkContext to be built, we skip building them if JAR files don't exist ## How was this patch tested? To test this we can run the following: ``` build/mvn -DskipTests -Psparkr clean ./R/create-docs.sh ``` You should see a line `Skipping R vignettes as Spark JARs not found` at the end Author: Shivaram VenkataramanCloses #15200 from shivaram/sparkr-vignette-skip. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9f24a17c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9f24a17c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9f24a17c Branch: refs/heads/master Commit: 9f24a17c59b1130d97efa7d313c06577f7344338 Parents: 17b72d3 Author: Shivaram Venkataraman Authored: Thu Sep 22 11:52:42 2016 -0700 Committer: Reynold Xin Committed: Thu Sep 22 11:52:42 2016 -0700 -- R/create-docs.sh | 25 ++--- 1 file changed, 22 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9f24a17c/R/create-docs.sh -- diff --git a/R/create-docs.sh b/R/create-docs.sh index 0dfba22..69ffc5f 100755 --- a/R/create-docs.sh +++ b/R/create-docs.sh @@ -30,6 +30,13 @@ set -e # Figure out where the script is export FWDIR="$(cd "`dirname "$0"`"; pwd)" +export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)" + +# Required for setting SPARK_SCALA_VERSION +. "${SPARK_HOME}"/bin/load-spark-env.sh + +echo "Using Scala $SPARK_SCALA_VERSION" + pushd $FWDIR # Install the package (this will also generate the Rd files) @@ -45,9 +52,21 @@ Rscript -e 'libDir <- "../../lib"; library(SparkR, lib.loc=libDir); library(knit popd -# render creates SparkR vignettes -Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)' +# Find Spark jars. +if [ -f "${SPARK_HOME}/RELEASE" ]; then + SPARK_JARS_DIR="${SPARK_HOME}/jars" +else + SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars" +fi + +# Only create vignettes if Spark JARs exist +if [ -d "$SPARK_JARS_DIR" ]; then + # render creates SparkR vignettes + Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)' -find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete + find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete +else + echo "Skipping R vignettes as Spark JARs not found in $SPARK_HOME" +fi popd - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
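The script's gate reduces to: choose the JARs directory for a binary release (`jars/`) or a source build (`assembly/target/scala-<ver>/jars`), then render vignettes only if that directory exists. A hedged Scala restatement of that decision follows; the paths mirror the script, but nothing here is a Spark API:

```scala
import java.nio.file.{Files, Path, Paths}

// Hedged sketch of the check create-docs.sh now performs before rendering vignettes.
object VignetteGateSketch {
  def sparkJarsDir(sparkHome: String, scalaVersion: String): Option[Path] = {
    // A binary release ships jars under SPARK_HOME/jars; a source build puts them
    // under assembly/target/scala-<version>/jars.
    val candidate =
      if (Files.isRegularFile(Paths.get(sparkHome, "RELEASE"))) Paths.get(sparkHome, "jars")
      else Paths.get(sparkHome, "assembly", "target", s"scala-$scalaVersion", "jars")
    if (Files.isDirectory(candidate)) Some(candidate) else None
  }

  def main(args: Array[String]): Unit =
    sparkJarsDir("/opt/spark", "2.11") match {
      case Some(dir) => println(s"Rendering SparkR vignettes against JARs in $dir")
      case None      => println("Skipping R vignettes as Spark JARs not found")
    }
}
```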
spark git commit: Bump doc version for release 2.0.1.
Repository: spark Updated Branches: refs/heads/branch-2.0 ec377e773 -> 053b20a79 Bump doc version for release 2.0.1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/053b20a7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/053b20a7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/053b20a7 Branch: refs/heads/branch-2.0 Commit: 053b20a79c1824917c17405f30a7b91472311abe Parents: ec377e7 Author: Reynold Xin Authored: Wed Sep 21 21:06:47 2016 -0700 Committer: Reynold Xin Committed: Wed Sep 21 21:06:47 2016 -0700 -- docs/_config.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/053b20a7/docs/_config.yml -- diff --git a/docs/_config.yml b/docs/_config.yml index 3951cad..75c89bd 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -14,8 +14,8 @@ include: # These allow the documentation to be updated with newer releases # of Spark, Scala, and Mesos. -SPARK_VERSION: 2.0.0 -SPARK_VERSION_SHORT: 2.0.0 +SPARK_VERSION: 2.0.1 +SPARK_VERSION_SHORT: 2.0.1 SCALA_BINARY_VERSION: "2.11" SCALA_VERSION: "2.11.7" MESOS_VERSION: 0.21.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17494][SQL] changePrecision() on compact decimal should respect rounding mode
Repository: spark Updated Branches: refs/heads/master 3497ebe51 -> 8bde03bf9 [SPARK-17494][SQL] changePrecision() on compact decimal should respect rounding mode ## What changes were proposed in this pull request? Floor()/Ceil() of decimal is implemented using changePrecision() by passing a rounding mode, but the rounding mode is not respected when the decimal is in compact mode (could fit within a Long). This Update the changePrecision() to respect rounding mode, which could be ROUND_FLOOR, ROUND_CEIL, ROUND_HALF_UP, ROUND_HALF_EVEN. ## How was this patch tested? Added regression tests. Author: Davies LiuCloses #15154 from davies/decimal_round. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8bde03bf Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8bde03bf Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8bde03bf Branch: refs/heads/master Commit: 8bde03bf9a0896ea59ceaa699df7700351a130fb Parents: 3497ebe Author: Davies Liu Authored: Wed Sep 21 21:02:30 2016 -0700 Committer: Reynold Xin Committed: Wed Sep 21 21:02:30 2016 -0700 -- .../org/apache/spark/sql/types/Decimal.scala| 28 +--- .../apache/spark/sql/types/DecimalSuite.scala | 15 +++ 2 files changed, 39 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8bde03bf/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala index cc8175c..7085905 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala @@ -242,10 +242,30 @@ final class Decimal extends Ordered[Decimal] with Serializable { if (scale < _scale) { // Easier case: we just need to divide our scale down val diff = _scale - scale -val droppedDigits = longVal % POW_10(diff) -longVal /= POW_10(diff) -if (math.abs(droppedDigits) * 2 >= POW_10(diff)) { - longVal += (if (longVal < 0) -1L else 1L) +val pow10diff = POW_10(diff) +// % and / always round to 0 +val droppedDigits = longVal % pow10diff +longVal /= pow10diff +roundMode match { + case ROUND_FLOOR => +if (droppedDigits < 0) { + longVal += -1L +} + case ROUND_CEILING => +if (droppedDigits > 0) { + longVal += 1L +} + case ROUND_HALF_UP => +if (math.abs(droppedDigits) * 2 >= pow10diff) { + longVal += (if (droppedDigits < 0) -1L else 1L) +} + case ROUND_HALF_EVEN => +val doubled = math.abs(droppedDigits) * 2 +if (doubled > pow10diff || doubled == pow10diff && longVal % 2 != 0) { + longVal += (if (droppedDigits < 0) -1L else 1L) +} + case _ => +sys.error(s"Not supported rounding mode: $roundMode") } } else if (scale > _scale) { // We might be able to multiply longVal by a power of 10 and not overflow, but if not, http://git-wip-us.apache.org/repos/asf/spark/blob/8bde03bf/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala index a10c0e3..52d0692 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala @@ -20,6 +20,7 @@ package org.apache.spark.sql.types import org.scalatest.PrivateMethodTester import org.apache.spark.SparkFunSuite +import 
org.apache.spark.sql.types.Decimal._ class DecimalSuite extends SparkFunSuite with PrivateMethodTester { /** Check that a Decimal has the given string representation, precision and scale */ @@ -191,4 +192,18 @@ class DecimalSuite extends SparkFunSuite with PrivateMethodTester { assert(new Decimal().set(100L, 10, 0).toUnscaledLong === 100L) assert(Decimal(Long.MaxValue, 100, 0).toUnscaledLong === Long.MaxValue) } + + test("changePrecision() on compact decimal should respect rounding mode") { +Seq(ROUND_FLOOR, ROUND_CEILING, ROUND_HALF_UP, ROUND_HALF_EVEN).foreach { mode => + Seq("0.4", "0.5", "0.6", "1.0", "1.1", "1.6", "2.5", "5.5").foreach { n => +Seq("", "-").foreach {
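The heart of the change is how the digits dropped while scaling down a Long-backed decimal adjust the remaining value under each rounding mode. The sketch below restates that arithmetic in isolation; it uses `java.math.RoundingMode` as a stand-in for the rounding constants and a computed power of ten instead of the `POW_10` table, and it illustrates the same idea rather than reproducing the `Decimal` class itself.

```scala
import java.math.RoundingMode

// Standalone sketch of rounding a compact (Long-backed) decimal when dropping
// `diff` digits of scale, mirroring the logic in the diff above.
object CompactRoundingSketch {
  def scaleDown(unscaled: Long, diff: Int, mode: RoundingMode): Long = {
    val pow10diff = math.pow(10, diff).toLong // stand-in for the POW_10 lookup table
    val droppedDigits = unscaled % pow10diff  // % and / both truncate toward 0
    var result = unscaled / pow10diff
    mode match {
      case RoundingMode.FLOOR   => if (droppedDigits < 0) result -= 1L
      case RoundingMode.CEILING => if (droppedDigits > 0) result += 1L
      case RoundingMode.HALF_UP =>
        if (math.abs(droppedDigits) * 2 >= pow10diff) result += (if (droppedDigits < 0) -1L else 1L)
      case RoundingMode.HALF_EVEN =>
        val doubled = math.abs(droppedDigits) * 2
        if (doubled > pow10diff || (doubled == pow10diff && result % 2 != 0)) {
          result += (if (droppedDigits < 0) -1L else 1L)
        }
      case _ => sys.error(s"Unsupported rounding mode: $mode")
    }
    result
  }

  def main(args: Array[String]): Unit = {
    // 2.5 stored as unscaled 25 with scale 1, reduced to scale 0:
    println(scaleDown(25L, 1, RoundingMode.FLOOR))     // 2
    println(scaleDown(25L, 1, RoundingMode.CEILING))   // 3
    println(scaleDown(25L, 1, RoundingMode.HALF_UP))   // 3
    println(scaleDown(25L, 1, RoundingMode.HALF_EVEN)) // 2
  }
}
```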
spark git commit: [SPARK-17494][SQL] changePrecision() on compact decimal should respect rounding mode
Repository: spark Updated Branches: refs/heads/branch-2.0 966abd6af -> ec377e773 [SPARK-17494][SQL] changePrecision() on compact decimal should respect rounding mode ## What changes were proposed in this pull request? Floor()/Ceil() of decimal is implemented using changePrecision() by passing a rounding mode, but the rounding mode is not respected when the decimal is in compact mode (could fit within a Long). This Update the changePrecision() to respect rounding mode, which could be ROUND_FLOOR, ROUND_CEIL, ROUND_HALF_UP, ROUND_HALF_EVEN. ## How was this patch tested? Added regression tests. Author: Davies LiuCloses #15154 from davies/decimal_round. (cherry picked from commit 8bde03bf9a0896ea59ceaa699df7700351a130fb) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ec377e77 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ec377e77 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ec377e77 Branch: refs/heads/branch-2.0 Commit: ec377e77307b477d20a642edcd5ad5e26b989de6 Parents: 966abd6 Author: Davies Liu Authored: Wed Sep 21 21:02:30 2016 -0700 Committer: Reynold Xin Committed: Wed Sep 21 21:02:42 2016 -0700 -- .../org/apache/spark/sql/types/Decimal.scala| 28 +--- .../apache/spark/sql/types/DecimalSuite.scala | 15 +++ 2 files changed, 39 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ec377e77/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala index cc8175c..7085905 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala @@ -242,10 +242,30 @@ final class Decimal extends Ordered[Decimal] with Serializable { if (scale < _scale) { // Easier case: we just need to divide our scale down val diff = _scale - scale -val droppedDigits = longVal % POW_10(diff) -longVal /= POW_10(diff) -if (math.abs(droppedDigits) * 2 >= POW_10(diff)) { - longVal += (if (longVal < 0) -1L else 1L) +val pow10diff = POW_10(diff) +// % and / always round to 0 +val droppedDigits = longVal % pow10diff +longVal /= pow10diff +roundMode match { + case ROUND_FLOOR => +if (droppedDigits < 0) { + longVal += -1L +} + case ROUND_CEILING => +if (droppedDigits > 0) { + longVal += 1L +} + case ROUND_HALF_UP => +if (math.abs(droppedDigits) * 2 >= pow10diff) { + longVal += (if (droppedDigits < 0) -1L else 1L) +} + case ROUND_HALF_EVEN => +val doubled = math.abs(droppedDigits) * 2 +if (doubled > pow10diff || doubled == pow10diff && longVal % 2 != 0) { + longVal += (if (droppedDigits < 0) -1L else 1L) +} + case _ => +sys.error(s"Not supported rounding mode: $roundMode") } } else if (scale > _scale) { // We might be able to multiply longVal by a power of 10 and not overflow, but if not, http://git-wip-us.apache.org/repos/asf/spark/blob/ec377e77/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala index e1675c9..4cf329d 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala @@ -22,6 +22,7 @@ import scala.language.postfixOps import 
org.scalatest.PrivateMethodTester import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.types.Decimal._ class DecimalSuite extends SparkFunSuite with PrivateMethodTester { /** Check that a Decimal has the given string representation, precision and scale */ @@ -193,4 +194,18 @@ class DecimalSuite extends SparkFunSuite with PrivateMethodTester { assert(new Decimal().set(100L, 10, 0).toUnscaledLong === 100L) assert(Decimal(Long.MaxValue, 100, 0).toUnscaledLong === Long.MaxValue) } + + test("changePrecision() on compact decimal should respect rounding mode") { +Seq(ROUND_FLOOR, ROUND_CEILING, ROUND_HALF_UP, ROUND_HALF_EVEN).foreach {
spark git commit: [SPARK-17627] Mark Streaming Providers Experimental
Repository: spark Updated Branches: refs/heads/branch-2.0 59e6ab11a -> 966abd6af [SPARK-17627] Mark Streaming Providers Experimental All of structured streaming is experimental in its first release. We missed the annotation on two of the APIs. Author: Michael ArmbrustCloses #15188 from marmbrus/experimentalApi. (cherry picked from commit 3497ebe511fee67e66387e9e737c843a2939ce45) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/966abd6a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/966abd6a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/966abd6a Branch: refs/heads/branch-2.0 Commit: 966abd6af04b8e7b5f6446cba96f1825ca2bfcfa Parents: 59e6ab1 Author: Michael Armbrust Authored: Wed Sep 21 20:59:46 2016 -0700 Committer: Reynold Xin Committed: Wed Sep 21 20:59:52 2016 -0700 -- .../src/main/scala/org/apache/spark/sql/sources/interfaces.scala | 4 1 file changed, 4 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/966abd6a/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala b/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala index d2077a0..b84953d 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala @@ -112,8 +112,10 @@ trait SchemaRelationProvider { } /** + * ::Experimental:: * Implemented by objects that can produce a streaming [[Source]] for a specific format or system. */ +@Experimental trait StreamSourceProvider { /** Returns the name and schema of the source that can be used to continually read data. */ @@ -132,8 +134,10 @@ trait StreamSourceProvider { } /** + * ::Experimental:: * Implemented by objects that can produce a streaming [[Sink]] for a specific format or system. */ +@Experimental trait StreamSinkProvider { def createSink( sqlContext: SQLContext, - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17627] Mark Streaming Providers Experimental
Repository: spark Updated Branches: refs/heads/master 6902edab7 -> 3497ebe51 [SPARK-17627] Mark Streaming Providers Experimental All of structured streaming is experimental in its first release. We missed the annotation on two of the APIs. Author: Michael ArmbrustCloses #15188 from marmbrus/experimentalApi. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3497ebe5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3497ebe5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3497ebe5 Branch: refs/heads/master Commit: 3497ebe511fee67e66387e9e737c843a2939ce45 Parents: 6902eda Author: Michael Armbrust Authored: Wed Sep 21 20:59:46 2016 -0700 Committer: Reynold Xin Committed: Wed Sep 21 20:59:46 2016 -0700 -- .../src/main/scala/org/apache/spark/sql/sources/interfaces.scala | 4 1 file changed, 4 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3497ebe5/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala b/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala index a16d7ed..6484c78 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala @@ -112,8 +112,10 @@ trait SchemaRelationProvider { } /** + * ::Experimental:: * Implemented by objects that can produce a streaming [[Source]] for a specific format or system. */ +@Experimental trait StreamSourceProvider { /** Returns the name and schema of the source that can be used to continually read data. */ @@ -132,8 +134,10 @@ trait StreamSourceProvider { } /** + * ::Experimental:: * Implemented by objects that can produce a streaming [[Sink]] for a specific format or system. */ +@Experimental trait StreamSinkProvider { def createSink( sqlContext: SQLContext, - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
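For reference, the annotated shape being introduced is just a marker on each provider trait. The sketch below is self-contained, so it defines its own stand-in annotation and a simplified trait; Spark's actual `@Experimental` annotation and `StreamSinkProvider` signature are the ones shown in the diff above, not these.

```scala
import scala.annotation.StaticAnnotation

// Self-contained illustration of marking a public extension point as experimental.
// `Experimental` and `StreamSinkProviderSketch` are local stand-ins, not Spark types.
class Experimental extends StaticAnnotation

trait Sink {
  def addBatch(batchId: Long, data: Seq[String]): Unit
}

/**
 * ::Experimental::
 * Implemented by objects that can produce a streaming sink for a specific format or system.
 */
@Experimental
trait StreamSinkProviderSketch {
  def createSink(parameters: Map[String, String], partitionColumns: Seq[String]): Sink
}
```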
spark git commit: [MINOR][BUILD] Fix CheckStyle Error
Repository: spark Updated Branches: refs/heads/master 976f3b122 -> 1ea49916a [MINOR][BUILD] Fix CheckStyle Error ## What changes were proposed in this pull request? This PR is to fix the code style errors before 2.0.1 release. ## How was this patch tested? Manual. Before: ``` ./dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks failed at following occurrences: [ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[153] (sizes) LineLength: Line is longer than 100 characters (found 107). [ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[196] (sizes) LineLength: Line is longer than 100 characters (found 108). [ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[239] (sizes) LineLength: Line is longer than 100 characters (found 115). [ERROR] src/main/java/org/apache/spark/network/server/TransportRequestHandler.java:[119] (sizes) LineLength: Line is longer than 100 characters (found 107). [ERROR] src/main/java/org/apache/spark/network/server/TransportRequestHandler.java:[129] (sizes) LineLength: Line is longer than 100 characters (found 104). [ERROR] src/main/java/org/apache/spark/network/util/LevelDBProvider.java:[124,11] (modifier) ModifierOrder: 'static' modifier out of order with the JLS suggestions. [ERROR] src/main/java/org/apache/spark/network/util/TransportConf.java:[26] (regexp) RegexpSingleline: No trailing whitespace allowed. [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[33] (sizes) LineLength: Line is longer than 100 characters (found 110). [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[38] (sizes) LineLength: Line is longer than 100 characters (found 110). [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[43] (sizes) LineLength: Line is longer than 100 characters (found 106). [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[48] (sizes) LineLength: Line is longer than 100 characters (found 110). [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java:[0] (misc) NewlineAtEndOfFile: File does not end with a newline. [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java:[67] (sizes) LineLength: Line is longer than 100 characters (found 106). [ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[200] (regexp) RegexpSingleline: No trailing whitespace allowed. [ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[309] (regexp) RegexpSingleline: No trailing whitespace allowed. [ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[332] (regexp) RegexpSingleline: No trailing whitespace allowed. [ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[348] (regexp) RegexpSingleline: No trailing whitespace allowed. ``` After: ``` ./dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks passed. ``` Author: Weiqing YangCloses #15170 from Sherry302/fixjavastyle. 
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1ea49916 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1ea49916 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1ea49916 Branch: refs/heads/master Commit: 1ea49916acc46b0a74e5c85eef907920c5e31142 Parents: 976f3b1 Author: Weiqing Yang Authored: Tue Sep 20 21:48:25 2016 -0700 Committer: Reynold Xin Committed: Tue Sep 20 21:48:25 2016 -0700 -- .../apache/spark/network/client/TransportClient.java| 11 ++- .../spark/network/server/TransportRequestHandler.java | 7 --- .../org/apache/spark/network/util/LevelDBProvider.java | 2 +- .../org/apache/spark/network/util/TransportConf.java| 2 +- .../util/collection/unsafe/sort/PrefixComparators.java | 12 .../collection/unsafe/sort/UnsafeInMemorySorter.java| 2 +- .../collection/unsafe/sort/UnsafeSorterSpillReader.java | 4 ++-- 7 files changed, 23 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1ea49916/common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java -- diff --git a/common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java b/common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java index 600b80e..7e7d78d 100644 --- a/common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java +++
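Most of the violations listed above are mechanical: over-long lines, trailing whitespace, modifier order, and a missing newline at end of file. Purely as a toy illustration (not Checkstyle's implementation or configuration), the two most common rules here amount to checks like the following:

```scala
// Toy sketch of LineLength- and trailing-whitespace-style checks; names and the
// message format are illustrative only.
object StyleCheckSketch {
  def violations(source: String, maxLen: Int = 100): Seq[String] =
    source.split("\n", -1).toSeq.zipWithIndex.flatMap { case (line, i) =>
      val lineNo = i + 1
      val tooLong =
        if (line.length > maxLen)
          Seq(s"[ERROR] line $lineNo: Line is longer than $maxLen characters (found ${line.length}).")
        else Seq.empty[String]
      val trailing =
        if (line != line.replaceAll("\\s+$", ""))
          Seq(s"[ERROR] line $lineNo: No trailing whitespace allowed.")
        else Seq.empty[String]
      tooLong ++ trailing
    }

  def main(args: Array[String]): Unit =
    violations("val x = 1   \n" + ("y" * 120)).foreach(println)
}
```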
spark git commit: [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata
Repository: spark Updated Branches: refs/heads/master 7e418e99c -> 976f3b122 [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata ## What changes were proposed in this pull request? This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235. This is a resubmission of 15126, which was based on work by frreiss in #15067, but fixed the test case along with some typos. ## How was this patch tested? A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request. Author: petermaxleeCloses #15166 from petermaxlee/SPARK-17513-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/976f3b12 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/976f3b12 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/976f3b12 Branch: refs/heads/master Commit: 976f3b1227c1a9e0b878e010531285fdba57b6a7 Parents: 7e418e9 Author: petermaxlee Authored: Tue Sep 20 19:08:07 2016 -0700 Committer: Reynold Xin Committed: Tue Sep 20 19:08:07 2016 -0700 -- .../sql/execution/streaming/MetadataLog.scala | 1 + .../execution/streaming/StreamExecution.scala | 7 ++ .../sql/streaming/StreamingQuerySuite.scala | 24 3 files changed, 32 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/976f3b12/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala index 78d6be1..9e2604c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala @@ -24,6 +24,7 @@ package org.apache.spark.sql.execution.streaming * - Allow the user to query the latest batch id. * - Allow the user to query the metadata object of a specified batch id. * - Allow the user to query metadata objects in a range of batch ids. + * - Allow the user to remove obsolete metadata */ trait MetadataLog[T] { http://git-wip-us.apache.org/repos/asf/spark/blob/976f3b12/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala index a1aae61..220f77d 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala @@ -290,6 +290,13 @@ class StreamExecution( assert(offsetLog.add(currentBatchId, availableOffsets.toCompositeOffset(sources)), s"Concurrent update to the log. Multiple streaming jobs detected for $currentBatchId") logInfo(s"Committed offsets for batch $currentBatchId.") + + // Now that we have logged the new batch, no further processing will happen for + // the previous batch, and it is safe to discard the old metadata. + // Note that purge is exclusive, i.e. it purges everything before currentBatchId. + // NOTE: If StreamExecution implements pipeline parallelism (multiple batches in + // flight at the same time), this cleanup logic will need to change. 
+ offsetLog.purge(currentBatchId) } else { awaitBatchLock.lock() try { http://git-wip-us.apache.org/repos/asf/spark/blob/976f3b12/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala index 9d58315..88f1f18 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala @@ -125,6 +125,30 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter { ) } + testQuietly("StreamExecution metadata garbage collection") { +val inputData = MemoryStream[Int] +val mapped = inputData.toDS().map(6 / _) + +// Run 3 batches, and then assert that only 1 metadata file is left at the end +// since the first 2
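The new call relies on `purge` being exclusive: once the new batch's offsets are committed, everything strictly before `currentBatchId` can be dropped. Below is a self-contained, in-memory sketch of a log with that purge semantic; it is an assumed shape for illustration, not Spark's `MetadataLog`/`HDFSMetadataLog`.

```scala
// In-memory sketch of a metadata log with an exclusive purge(threshold) operation.
class InMemoryMetadataLogSketch[T] {
  private var batches = Map.empty[Long, T]

  def add(batchId: Long, metadata: T): Boolean =
    if (batches.contains(batchId)) false
    else { batches += (batchId -> metadata); true }

  def getLatest(): Option[(Long, T)] =
    if (batches.isEmpty) None else Some(batches.maxBy(_._1))

  /** Removes every batch whose id is strictly less than `thresholdBatchId`. */
  def purge(thresholdBatchId: Long): Unit =
    batches = batches.filter { case (id, _) => id >= thresholdBatchId }

  def size: Int = batches.size
}

object InMemoryMetadataLogSketch {
  def main(args: Array[String]): Unit = {
    val log = new InMemoryMetadataLogSketch[String]
    (0L to 2L).foreach(id => log.add(id, s"offsets-$id"))
    log.purge(2L)            // exclusive: batches 0 and 1 are removed, batch 2 remains
    println(log.size)        // 1
    println(log.getLatest()) // Some((2,offsets-2))
  }
}
```

Purging only after the new batch has been logged keeps the most recent committed batch available for recovery, which is also why the comment in the diff notes the cleanup would have to change if several batches were ever in flight at once.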
spark git commit: [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata
Repository: spark Updated Branches: refs/heads/branch-2.0 8d8e2332c -> 726f05716 [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata ## What changes were proposed in this pull request? This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235. This is a resubmission of 15126, which was based on work by frreiss in #15067, but fixed the test case along with some typos. ## How was this patch tested? A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request. Author: petermaxleeCloses #15166 from petermaxlee/SPARK-17513-2. (cherry picked from commit 976f3b1227c1a9e0b878e010531285fdba57b6a7) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/726f0571 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/726f0571 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/726f0571 Branch: refs/heads/branch-2.0 Commit: 726f05716b6c1c5021460483eedb0c8ca55d9276 Parents: 8d8e233 Author: petermaxlee Authored: Tue Sep 20 19:08:07 2016 -0700 Committer: Reynold Xin Committed: Tue Sep 20 19:08:15 2016 -0700 -- .../sql/execution/streaming/MetadataLog.scala | 1 + .../execution/streaming/StreamExecution.scala | 7 ++ .../sql/streaming/StreamingQuerySuite.scala | 24 3 files changed, 32 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/726f0571/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala index 78d6be1..9e2604c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala @@ -24,6 +24,7 @@ package org.apache.spark.sql.execution.streaming * - Allow the user to query the latest batch id. * - Allow the user to query the metadata object of a specified batch id. * - Allow the user to query metadata objects in a range of batch ids. + * - Allow the user to remove obsolete metadata */ trait MetadataLog[T] { http://git-wip-us.apache.org/repos/asf/spark/blob/726f0571/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala index 5e1e5ee..b7587f2 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala @@ -290,6 +290,13 @@ class StreamExecution( assert(offsetLog.add(currentBatchId, availableOffsets.toCompositeOffset(sources)), s"Concurrent update to the log. Multiple streaming jobs detected for $currentBatchId") logInfo(s"Committed offsets for batch $currentBatchId.") + + // Now that we have logged the new batch, no further processing will happen for + // the previous batch, and it is safe to discard the old metadata. + // Note that purge is exclusive, i.e. it purges everything before currentBatchId. 
+ // NOTE: If StreamExecution implements pipeline parallelism (multiple batches in + // flight at the same time), this cleanup logic will need to change. + offsetLog.purge(currentBatchId) } else { awaitBatchLock.lock() try { http://git-wip-us.apache.org/repos/asf/spark/blob/726f0571/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala index 9d58315..88f1f18 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala @@ -125,6 +125,30 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter { ) } + testQuietly("StreamExecution metadata garbage collection") { +val inputData = MemoryStream[Int] +val mapped =
spark git commit: [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata
Repository: spark Updated Branches: refs/heads/branch-2.0 7026eb87e -> 5456a1b4f [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata ## What changes were proposed in this pull request? This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235. This is based on work by frreiss in #15067, but fixed the test case along with some typos. ## How was this patch tested? A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request. Author: petermaxleeAuthor: frreiss Closes #15126 from petermaxlee/SPARK-17513. (cherry picked from commit be9d57fc9d8b10e4234c01c06ed43fd7dd12c07b) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5456a1b4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5456a1b4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5456a1b4 Branch: refs/heads/branch-2.0 Commit: 5456a1b4fcd85d0d7f2f1cc64e44967def0950bf Parents: 7026eb8 Author: petermaxlee Authored: Mon Sep 19 22:19:51 2016 -0700 Committer: Reynold Xin Committed: Mon Sep 19 22:19:58 2016 -0700 -- .../sql/execution/streaming/MetadataLog.scala | 1 + .../execution/streaming/StreamExecution.scala | 7 ++ .../sql/streaming/StreamingQuerySuite.scala | 24 3 files changed, 32 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5456a1b4/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala index 78d6be1..9e2604c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala @@ -24,6 +24,7 @@ package org.apache.spark.sql.execution.streaming * - Allow the user to query the latest batch id. * - Allow the user to query the metadata object of a specified batch id. * - Allow the user to query metadata objects in a range of batch ids. + * - Allow the user to remove obsolete metadata */ trait MetadataLog[T] { http://git-wip-us.apache.org/repos/asf/spark/blob/5456a1b4/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala index 5e1e5ee..b7587f2 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala @@ -290,6 +290,13 @@ class StreamExecution( assert(offsetLog.add(currentBatchId, availableOffsets.toCompositeOffset(sources)), s"Concurrent update to the log. Multiple streaming jobs detected for $currentBatchId") logInfo(s"Committed offsets for batch $currentBatchId.") + + // Now that we have logged the new batch, no further processing will happen for + // the previous batch, and it is safe to discard the old metadata. + // Note that purge is exclusive, i.e. it purges everything before currentBatchId. + // NOTE: If StreamExecution implements pipeline parallelism (multiple batches in + // flight at the same time), this cleanup logic will need to change. 
+ offsetLog.purge(currentBatchId) } else { awaitBatchLock.lock() try { http://git-wip-us.apache.org/repos/asf/spark/blob/5456a1b4/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala index 9d58315..d3e2cab 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala @@ -125,6 +125,30 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter { ) } + testQuietly("StreamExecution metadata garbage collection") { +val inputData = MemoryStream[Int] +val mapped =
spark git commit: [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata
Repository: spark Updated Branches: refs/heads/master 26145a5af -> be9d57fc9 [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata ## What changes were proposed in this pull request? This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235. This is based on work by frreiss in #15067, but fixed the test case along with some typos. ## How was this patch tested? A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request. Author: petermaxleeAuthor: frreiss Closes #15126 from petermaxlee/SPARK-17513. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/be9d57fc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/be9d57fc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/be9d57fc Branch: refs/heads/master Commit: be9d57fc9d8b10e4234c01c06ed43fd7dd12c07b Parents: 26145a5 Author: petermaxlee Authored: Mon Sep 19 22:19:51 2016 -0700 Committer: Reynold Xin Committed: Mon Sep 19 22:19:51 2016 -0700 -- .../sql/execution/streaming/MetadataLog.scala | 1 + .../execution/streaming/StreamExecution.scala | 7 ++ .../sql/streaming/StreamingQuerySuite.scala | 24 3 files changed, 32 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/be9d57fc/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala index 78d6be1..9e2604c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala @@ -24,6 +24,7 @@ package org.apache.spark.sql.execution.streaming * - Allow the user to query the latest batch id. * - Allow the user to query the metadata object of a specified batch id. * - Allow the user to query metadata objects in a range of batch ids. + * - Allow the user to remove obsolete metadata */ trait MetadataLog[T] { http://git-wip-us.apache.org/repos/asf/spark/blob/be9d57fc/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala index a1aae61..220f77d 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala @@ -290,6 +290,13 @@ class StreamExecution( assert(offsetLog.add(currentBatchId, availableOffsets.toCompositeOffset(sources)), s"Concurrent update to the log. Multiple streaming jobs detected for $currentBatchId") logInfo(s"Committed offsets for batch $currentBatchId.") + + // Now that we have logged the new batch, no further processing will happen for + // the previous batch, and it is safe to discard the old metadata. + // Note that purge is exclusive, i.e. it purges everything before currentBatchId. + // NOTE: If StreamExecution implements pipeline parallelism (multiple batches in + // flight at the same time), this cleanup logic will need to change. 
+ offsetLog.purge(currentBatchId) } else { awaitBatchLock.lock() try { http://git-wip-us.apache.org/repos/asf/spark/blob/be9d57fc/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala index 9d58315..d3e2cab 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala @@ -125,6 +125,30 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter { ) } + testQuietly("StreamExecution metadata garbage collection") { +val inputData = MemoryStream[Int] +val mapped = inputData.toDS().map(6 / _) + +// Run 3 batches, and then assert that only 1 metadata file is left at the end +// since the first 2
spark git commit: [SPARK-17571][SQL] AssertOnQuery.condition should always return Boolean value
Repository: spark Updated Branches: refs/heads/branch-2.0 151f808a1 -> 27ce39cf2 [SPARK-17571][SQL] AssertOnQuery.condition should always return Boolean value ## What changes were proposed in this pull request? AssertOnQuery has two apply constructor: one that accepts a closure that returns boolean, and another that accepts a closure that returns Unit. This is actually very confusing because developers could mistakenly think that AssertOnQuery always require a boolean return type and verifies the return result, when indeed the value of the last statement is ignored in one of the constructors. This pull request makes the two constructor consistent and always require boolean value. It will overall make the test suites more robust against developer errors. As an evidence for the confusing behavior, this change also identified a bug with an existing test case due to file system time granularity. This pull request fixes that test case as well. ## How was this patch tested? This is a test only change. Author: petermaxleeCloses #15127 from petermaxlee/SPARK-17571. (cherry picked from commit 8f0c35a4d0dd458719627be5f524792bf244d70a) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/27ce39cf Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/27ce39cf Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/27ce39cf Branch: refs/heads/branch-2.0 Commit: 27ce39cf207eba46502ed11fcbfd51bed3e68f2b Parents: 151f808 Author: petermaxlee Authored: Sun Sep 18 15:22:01 2016 -0700 Committer: Reynold Xin Committed: Sun Sep 18 15:22:08 2016 -0700 -- .../apache/spark/sql/streaming/FileStreamSourceSuite.scala| 7 +-- .../scala/org/apache/spark/sql/streaming/StreamTest.scala | 4 ++-- .../spark/sql/streaming/StreamingQueryListenerSuite.scala | 3 +++ 3 files changed, 10 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/27ce39cf/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala index 886f7be..a02a36c 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala @@ -354,7 +354,9 @@ class FileStreamSourceSuite extends FileStreamSourceTest { CheckAnswer("a", "b"), // SLeeps longer than 5ms (maxFileAge) -AssertOnQuery { _ => Thread.sleep(10); true }, +// Unfortunately since a lot of file system does not have modification time granularity +// finer grained than 1 sec, we need to use 1 sec here. 
+AssertOnQuery { _ => Thread.sleep(1000); true }, AddTextFileData("c\nd", src, tmp), CheckAnswer("a", "b", "c", "d"), @@ -363,7 +365,8 @@ class FileStreamSourceSuite extends FileStreamSourceTest { val source = streamExecution.logicalPlan.collect { case e: StreamingExecutionRelation => e.source.asInstanceOf[FileStreamSource] }.head - source.seenFiles.size == 1 + assert(source.seenFiles.size == 1) + true } ) } http://git-wip-us.apache.org/repos/asf/spark/blob/27ce39cf/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala index af2b581..6c5b170 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala @@ -188,8 +188,8 @@ trait StreamTest extends QueryTest with SharedSQLContext with Timeouts { new AssertOnQuery(condition, message) } -def apply(message: String)(condition: StreamExecution => Unit): AssertOnQuery = { - new AssertOnQuery(s => { condition(s); true }, message) +def apply(message: String)(condition: StreamExecution => Boolean): AssertOnQuery = { + new AssertOnQuery(condition, message) } } http://git-wip-us.apache.org/repos/asf/spark/blob/27ce39cf/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala -- diff --git
spark git commit: [SPARK-17571][SQL] AssertOnQuery.condition should always return Boolean value
Repository: spark Updated Branches: refs/heads/master 1dbb725db -> 8f0c35a4d [SPARK-17571][SQL] AssertOnQuery.condition should always return Boolean value ## What changes were proposed in this pull request? AssertOnQuery has two apply constructor: one that accepts a closure that returns boolean, and another that accepts a closure that returns Unit. This is actually very confusing because developers could mistakenly think that AssertOnQuery always require a boolean return type and verifies the return result, when indeed the value of the last statement is ignored in one of the constructors. This pull request makes the two constructor consistent and always require boolean value. It will overall make the test suites more robust against developer errors. As an evidence for the confusing behavior, this change also identified a bug with an existing test case due to file system time granularity. This pull request fixes that test case as well. ## How was this patch tested? This is a test only change. Author: petermaxleeCloses #15127 from petermaxlee/SPARK-17571. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8f0c35a4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8f0c35a4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8f0c35a4 Branch: refs/heads/master Commit: 8f0c35a4d0dd458719627be5f524792bf244d70a Parents: 1dbb725 Author: petermaxlee Authored: Sun Sep 18 15:22:01 2016 -0700 Committer: Reynold Xin Committed: Sun Sep 18 15:22:01 2016 -0700 -- .../apache/spark/sql/streaming/FileStreamSourceSuite.scala| 7 +-- .../scala/org/apache/spark/sql/streaming/StreamTest.scala | 4 ++-- .../spark/sql/streaming/StreamingQueryListenerSuite.scala | 3 +++ 3 files changed, 10 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8f0c35a4/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala index 886f7be..a02a36c 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala @@ -354,7 +354,9 @@ class FileStreamSourceSuite extends FileStreamSourceTest { CheckAnswer("a", "b"), // SLeeps longer than 5ms (maxFileAge) -AssertOnQuery { _ => Thread.sleep(10); true }, +// Unfortunately since a lot of file system does not have modification time granularity +// finer grained than 1 sec, we need to use 1 sec here. 
+AssertOnQuery { _ => Thread.sleep(1000); true }, AddTextFileData("c\nd", src, tmp), CheckAnswer("a", "b", "c", "d"), @@ -363,7 +365,8 @@ class FileStreamSourceSuite extends FileStreamSourceTest { val source = streamExecution.logicalPlan.collect { case e: StreamingExecutionRelation => e.source.asInstanceOf[FileStreamSource] }.head - source.seenFiles.size == 1 + assert(source.seenFiles.size == 1) + true } ) } http://git-wip-us.apache.org/repos/asf/spark/blob/8f0c35a4/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala index af2b581..6c5b170 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala @@ -188,8 +188,8 @@ trait StreamTest extends QueryTest with SharedSQLContext with Timeouts { new AssertOnQuery(condition, message) } -def apply(message: String)(condition: StreamExecution => Unit): AssertOnQuery = { - new AssertOnQuery(s => { condition(s); true }, message) +def apply(message: String)(condition: StreamExecution => Boolean): AssertOnQuery = { + new AssertOnQuery(condition, message) } } http://git-wip-us.apache.org/repos/asf/spark/blob/8f0c35a4/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala index
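A minimal, self-contained sketch of the overload pitfall described in SPARK-17571 above, using simplified stand-in types rather than the actual StreamTest code: Scala's value discarding lets a closure that ends in a Boolean satisfy a `... => Unit` parameter, so the old Unit-returning overload silently threw the result away.

class FakeExecution { val seenFiles = 2 }

case class QueryCheck(condition: FakeExecution => Boolean, message: String)

object QueryCheck {
  // Old-style overload: the closure's value is discarded and replaced by `true`.
  def apply(message: String)(condition: FakeExecution => Unit): QueryCheck =
    QueryCheck(s => { condition(s); true }, message)
}

object Demo extends App {
  // Reads like an assertion, but can never fail: the Boolean result is dropped.
  val check = QueryCheck("seenFiles should be 1") { q => q.seenFiles == 1 }
  println(check.condition(new FakeExecution))  // prints true although seenFiles == 2
}

After the change both overloads take a `StreamExecution => Boolean`, so a test either returns the condition's value or calls `assert(...)` explicitly and then returns `true`, as the fixed FileStreamSourceSuite case does.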
spark git commit: [SPARK-17561][DOCS] DataFrameWriter documentation formatting problems
Repository: spark Updated Branches: refs/heads/master dca771bec -> b9323fc93 [SPARK-17561][DOCS] DataFrameWriter documentation formatting problems ## What changes were proposed in this pull request? Fix ` / ` problems in SQL scaladoc. ## How was this patch tested? Scaladoc build and manual verification of generated HTML. Author: Sean OwenCloses #15117 from srowen/SPARK-17561. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b9323fc9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b9323fc9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b9323fc9 Branch: refs/heads/master Commit: b9323fc9381a09af510f542fd5c86473e029caf6 Parents: dca771b Author: Sean Owen Authored: Fri Sep 16 13:43:05 2016 -0700 Committer: Reynold Xin Committed: Fri Sep 16 13:43:05 2016 -0700 -- .../org/apache/spark/sql/DataFrameReader.scala | 32 + .../org/apache/spark/sql/DataFrameWriter.scala | 12 +++ .../spark/sql/streaming/DataStreamReader.scala | 38 3 files changed, 53 insertions(+), 29 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b9323fc9/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala index 93bf74d..d29d90c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala @@ -269,14 +269,15 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { * `allowBackslashEscapingAnyCharacter` (default `false`): allows accepting quoting of all * character using backslash quoting mechanism * `mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records - * during parsing. - * - * - `PERMISSIVE` : sets other fields to `null` when it meets a corrupted record, and puts - * the malformed string into a new field configured by `columnNameOfCorruptRecord`. When - * a schema is set by user, it sets `null` for extra fields. - * - `DROPMALFORMED` : ignores the whole corrupted records. - * - `FAILFAST` : throws an exception when it meets corrupted records. - * + * during parsing. + * + * `PERMISSIVE` : sets other fields to `null` when it meets a corrupted record, and puts + * the malformed string into a new field configured by `columnNameOfCorruptRecord`. When + * a schema is set by user, it sets `null` for extra fields. + * `DROPMALFORMED` : ignores the whole corrupted records. + * `FAILFAST` : throws an exception when it meets corrupted records. + * + * * `columnNameOfCorruptRecord` (default is the value specified in * `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string * created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`. @@ -395,13 +396,14 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { * `maxMalformedLogPerPartition` (default `10`): sets the maximum number of malformed rows * Spark will log for each partition. Malformed records beyond this number will be ignored. * `mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records - *during parsing. - * - *- `PERMISSIVE` : sets other fields to `null` when it meets a corrupted record. When - * a schema is set by user, it sets `null` for extra fields. - *- `DROPMALFORMED` : ignores the whole corrupted records. 
- *- `FAILFAST` : throws an exception when it meets corrupted records. - * + *during parsing. + * + * `PERMISSIVE` : sets other fields to `null` when it meets a corrupted record. When + * a schema is set by user, it sets `null` for extra fields. + * `DROPMALFORMED` : ignores the whole corrupted records. + * `FAILFAST` : throws an exception when it meets corrupted records. + * + * * * @since 2.0.0 */ http://git-wip-us.apache.org/repos/asf/spark/blob/b9323fc9/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala index c05c7a6..e137f07 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala @@ -397,7 +397,9 @@
spark git commit: [SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7.3
Repository: spark Updated Branches: refs/heads/branch-2.0 9c23f4408 -> 5ad4395e1 [SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7.3 ## What changes were proposed in this pull request? This patch bumps the Hadoop version in hadoop-2.7 profile from 2.7.2 to 2.7.3, which was recently released and contained a number of bug fixes. ## How was this patch tested? The change should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #15115 from rxin/SPARK-17558. (cherry picked from commit dca771bec6edb1cd8fc75861d364e0ba9dccf7c3) Signed-off-by: Reynold Xin <r...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5ad4395e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5ad4395e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5ad4395e Branch: refs/heads/branch-2.0 Commit: 5ad4395e1b41d5ec74785c0aef5c2f656f9db9da Parents: 9c23f44 Author: Reynold Xin <r...@databricks.com> Authored: Fri Sep 16 11:24:26 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Fri Sep 16 11:24:40 2016 -0700 -- dev/deps/spark-deps-hadoop-2.7 | 30 +++--- pom.xml| 2 +- 2 files changed, 16 insertions(+), 16 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5ad4395e/dev/deps/spark-deps-hadoop-2.7 -- diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7 index 3da0860..a61f31e 100644 --- a/dev/deps/spark-deps-hadoop-2.7 +++ b/dev/deps/spark-deps-hadoop-2.7 @@ -58,21 +58,21 @@ gson-2.2.4.jar guava-14.0.1.jar guice-3.0.jar guice-servlet-3.0.jar -hadoop-annotations-2.7.2.jar -hadoop-auth-2.7.2.jar -hadoop-client-2.7.2.jar -hadoop-common-2.7.2.jar -hadoop-hdfs-2.7.2.jar -hadoop-mapreduce-client-app-2.7.2.jar -hadoop-mapreduce-client-common-2.7.2.jar -hadoop-mapreduce-client-core-2.7.2.jar -hadoop-mapreduce-client-jobclient-2.7.2.jar -hadoop-mapreduce-client-shuffle-2.7.2.jar -hadoop-yarn-api-2.7.2.jar -hadoop-yarn-client-2.7.2.jar -hadoop-yarn-common-2.7.2.jar -hadoop-yarn-server-common-2.7.2.jar -hadoop-yarn-server-web-proxy-2.7.2.jar +hadoop-annotations-2.7.3.jar +hadoop-auth-2.7.3.jar +hadoop-client-2.7.3.jar +hadoop-common-2.7.3.jar +hadoop-hdfs-2.7.3.jar +hadoop-mapreduce-client-app-2.7.3.jar +hadoop-mapreduce-client-common-2.7.3.jar +hadoop-mapreduce-client-core-2.7.3.jar +hadoop-mapreduce-client-jobclient-2.7.3.jar +hadoop-mapreduce-client-shuffle-2.7.3.jar +hadoop-yarn-api-2.7.3.jar +hadoop-yarn-client-2.7.3.jar +hadoop-yarn-common-2.7.3.jar +hadoop-yarn-server-common-2.7.3.jar +hadoop-yarn-server-web-proxy-2.7.3.jar hk2-api-2.4.0-b34.jar hk2-locator-2.4.0-b34.jar hk2-utils-2.4.0-b34.jar http://git-wip-us.apache.org/repos/asf/spark/blob/5ad4395e/pom.xml -- diff --git a/pom.xml b/pom.xml index ee0032a..a723283 100644 --- a/pom.xml +++ b/pom.xml @@ -2511,7 +2511,7 @@ hadoop-2.7 -2.7.2 +2.7.3 0.9.3 3.4.6 2.6.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7.3
Repository: spark Updated Branches: refs/heads/master a425a37a5 -> dca771bec [SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7.3 ## What changes were proposed in this pull request? This patch bumps the Hadoop version in hadoop-2.7 profile from 2.7.2 to 2.7.3, which was recently released and contained a number of bug fixes. ## How was this patch tested? The change should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #15115 from rxin/SPARK-17558. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dca771be Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dca771be Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dca771be Branch: refs/heads/master Commit: dca771bec6edb1cd8fc75861d364e0ba9dccf7c3 Parents: a425a37 Author: Reynold Xin <r...@databricks.com> Authored: Fri Sep 16 11:24:26 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Fri Sep 16 11:24:26 2016 -0700 -- dev/deps/spark-deps-hadoop-2.7 | 30 +++--- pom.xml| 2 +- 2 files changed, 16 insertions(+), 16 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/dca771be/dev/deps/spark-deps-hadoop-2.7 -- diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7 index d464c97..6356612 100644 --- a/dev/deps/spark-deps-hadoop-2.7 +++ b/dev/deps/spark-deps-hadoop-2.7 @@ -59,21 +59,21 @@ gson-2.2.4.jar guava-14.0.1.jar guice-3.0.jar guice-servlet-3.0.jar -hadoop-annotations-2.7.2.jar -hadoop-auth-2.7.2.jar -hadoop-client-2.7.2.jar -hadoop-common-2.7.2.jar -hadoop-hdfs-2.7.2.jar -hadoop-mapreduce-client-app-2.7.2.jar -hadoop-mapreduce-client-common-2.7.2.jar -hadoop-mapreduce-client-core-2.7.2.jar -hadoop-mapreduce-client-jobclient-2.7.2.jar -hadoop-mapreduce-client-shuffle-2.7.2.jar -hadoop-yarn-api-2.7.2.jar -hadoop-yarn-client-2.7.2.jar -hadoop-yarn-common-2.7.2.jar -hadoop-yarn-server-common-2.7.2.jar -hadoop-yarn-server-web-proxy-2.7.2.jar +hadoop-annotations-2.7.3.jar +hadoop-auth-2.7.3.jar +hadoop-client-2.7.3.jar +hadoop-common-2.7.3.jar +hadoop-hdfs-2.7.3.jar +hadoop-mapreduce-client-app-2.7.3.jar +hadoop-mapreduce-client-common-2.7.3.jar +hadoop-mapreduce-client-core-2.7.3.jar +hadoop-mapreduce-client-jobclient-2.7.3.jar +hadoop-mapreduce-client-shuffle-2.7.3.jar +hadoop-yarn-api-2.7.3.jar +hadoop-yarn-client-2.7.3.jar +hadoop-yarn-common-2.7.3.jar +hadoop-yarn-server-common-2.7.3.jar +hadoop-yarn-server-web-proxy-2.7.3.jar hk2-api-2.4.0-b34.jar hk2-locator-2.4.0-b34.jar hk2-utils-2.4.0-b34.jar http://git-wip-us.apache.org/repos/asf/spark/blob/dca771be/pom.xml -- diff --git a/pom.xml b/pom.xml index ef83c18..b514173 100644 --- a/pom.xml +++ b/pom.xml @@ -2524,7 +2524,7 @@ hadoop-2.7 -2.7.2 +2.7.3 0.9.3 3.4.6 2.6.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17301][SQL] Remove unused classTag field from AtomicType base class
Repository: spark Updated Branches: refs/heads/master 736a7911c -> 48b459ddd [SPARK-17301][SQL] Remove unused classTag field from AtomicType base class There's an unused `classTag` val in the AtomicType base class which is causing unnecessary slowness in deserialization because it needs to grab ScalaReflectionLock and create a new runtime reflection mirror. Removing this unused code gives a small but measurable performance boost in SQL task deserialization. Author: Josh RosenCloses #14869 from JoshRosen/remove-unused-classtag. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/48b459dd Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/48b459dd Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/48b459dd Branch: refs/heads/master Commit: 48b459ddd58affd5519856cb6e204398b7739a2a Parents: 736a791 Author: Josh Rosen Authored: Tue Aug 30 09:58:00 2016 +0800 Committer: Reynold Xin Committed: Tue Aug 30 09:58:00 2016 +0800 -- .../org/apache/spark/sql/types/AbstractDataType.scala | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/48b459dd/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala index 65eae86..1981fd8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala @@ -17,13 +17,10 @@ package org.apache.spark.sql.types -import scala.reflect.ClassTag -import scala.reflect.runtime.universe.{runtimeMirror, TypeTag} +import scala.reflect.runtime.universe.TypeTag import org.apache.spark.annotation.DeveloperApi -import org.apache.spark.sql.catalyst.ScalaReflectionLock import org.apache.spark.sql.catalyst.expressions.Expression -import org.apache.spark.util.Utils /** * A non-concrete data type, reserved for internal uses. @@ -130,11 +127,6 @@ protected[sql] abstract class AtomicType extends DataType { private[sql] type InternalType private[sql] val tag: TypeTag[InternalType] private[sql] val ordering: Ordering[InternalType] - - @transient private[sql] val classTag = ScalaReflectionLock.synchronized { -val mirror = runtimeMirror(Utils.getSparkClassLoader) -ClassTag[InternalType](mirror.runtimeClass(tag.tpe)) - } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
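The slowness described above came from doing reflection work eagerly for every deserialized AtomicType instance. The sketch below restates that removed pattern in isolation (illustrative only, not part of the patch): each call builds a runtime reflection mirror and resolves the runtime class before wrapping it in a ClassTag, which in the original code also happened under a global reflection lock.

import scala.reflect.ClassTag
import scala.reflect.runtime.universe._

// Building a mirror and resolving the runtime class on every call is the
// per-instance cost that the patch removes by deleting the unused field.
def classTagViaReflection[T](tag: TypeTag[T], loader: ClassLoader): ClassTag[T] = {
  val mirror = runtimeMirror(loader)
  ClassTag[T](mirror.runtimeClass(tag.tpe))
}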
spark git commit: [SPARK-17301][SQL] Remove unused classTag field from AtomicType base class
Repository: spark Updated Branches: refs/heads/branch-2.0 976a43dbf -> 59032570f [SPARK-17301][SQL] Remove unused classTag field from AtomicType base class There's an unused `classTag` val in the AtomicType base class which is causing unnecessary slowness in deserialization because it needs to grab ScalaReflectionLock and create a new runtime reflection mirror. Removing this unused code gives a small but measurable performance boost in SQL task deserialization. Author: Josh RosenCloses #14869 from JoshRosen/remove-unused-classtag. (cherry picked from commit 48b459ddd58affd5519856cb6e204398b7739a2a) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/59032570 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/59032570 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/59032570 Branch: refs/heads/branch-2.0 Commit: 59032570fbd0985f758c27bdec5482221cc64af9 Parents: 976a43d Author: Josh Rosen Authored: Tue Aug 30 09:58:00 2016 +0800 Committer: Reynold Xin Committed: Tue Aug 30 09:58:11 2016 +0800 -- .../org/apache/spark/sql/types/AbstractDataType.scala | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/59032570/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala index 65eae86..1981fd8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala @@ -17,13 +17,10 @@ package org.apache.spark.sql.types -import scala.reflect.ClassTag -import scala.reflect.runtime.universe.{runtimeMirror, TypeTag} +import scala.reflect.runtime.universe.TypeTag import org.apache.spark.annotation.DeveloperApi -import org.apache.spark.sql.catalyst.ScalaReflectionLock import org.apache.spark.sql.catalyst.expressions.Expression -import org.apache.spark.util.Utils /** * A non-concrete data type, reserved for internal uses. @@ -130,11 +127,6 @@ protected[sql] abstract class AtomicType extends DataType { private[sql] type InternalType private[sql] val tag: TypeTag[InternalType] private[sql] val ordering: Ordering[InternalType] - - @transient private[sql] val classTag = ScalaReflectionLock.synchronized { -val mirror = runtimeMirror(Utils.getSparkClassLoader) -ClassTag[InternalType](mirror.runtimeClass(tag.tpe)) - } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17274][SQL] Move join optimizer rules into a separate file
Repository: spark Updated Branches: refs/heads/branch-2.0 f91614f36 -> 901ab0694 [SPARK-17274][SQL] Move join optimizer rules into a separate file ## What changes were proposed in this pull request? As part of breaking Optimizer.scala apart, this patch moves various join rules into a single file. ## How was this patch tested? This should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #14846 from rxin/SPARK-17274. (cherry picked from commit 718b6bad2d698b76be6906d51da13626e9f3890e) Signed-off-by: Reynold Xin <r...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/901ab069 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/901ab069 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/901ab069 Branch: refs/heads/branch-2.0 Commit: 901ab06949addd05be6cb85df4eb6bd2104777e8 Parents: f91614f Author: Reynold Xin <r...@databricks.com> Authored: Sat Aug 27 00:36:18 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Sat Aug 27 00:36:36 2016 -0700 -- .../sql/catalyst/optimizer/Optimizer.scala | 106 --- .../spark/sql/catalyst/optimizer/joins.scala| 134 +++ 2 files changed, 134 insertions(+), 106 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/901ab069/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 15d33c1..e743898 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -1150,112 +1150,6 @@ object PushDownPredicate extends Rule[LogicalPlan] with PredicateHelper { } /** - * Reorder the joins and push all the conditions into join, so that the bottom ones have at least - * one condition. - * - * The order of joins will not be changed if all of them already have at least one condition. - */ -object ReorderJoin extends Rule[LogicalPlan] with PredicateHelper { - - /** - * Join a list of plans together and push down the conditions into them. - * - * The joined plan are picked from left to right, prefer those has at least one join condition. - * - * @param input a list of LogicalPlans to join. - * @param conditions a list of condition for join. 
- */ - @tailrec - def createOrderedJoin(input: Seq[LogicalPlan], conditions: Seq[Expression]): LogicalPlan = { -assert(input.size >= 2) -if (input.size == 2) { - val (joinConditions, others) = conditions.partition( -e => !SubqueryExpression.hasCorrelatedSubquery(e)) - val join = Join(input(0), input(1), Inner, joinConditions.reduceLeftOption(And)) - if (others.nonEmpty) { -Filter(others.reduceLeft(And), join) - } else { -join - } -} else { - val left :: rest = input.toList - // find out the first join that have at least one join condition - val conditionalJoin = rest.find { plan => -val refs = left.outputSet ++ plan.outputSet -conditions.filterNot(canEvaluate(_, left)).filterNot(canEvaluate(_, plan)) - .exists(_.references.subsetOf(refs)) - } - // pick the next one if no condition left - val right = conditionalJoin.getOrElse(rest.head) - - val joinedRefs = left.outputSet ++ right.outputSet - val (joinConditions, others) = conditions.partition( -e => e.references.subsetOf(joinedRefs) && !SubqueryExpression.hasCorrelatedSubquery(e)) - val joined = Join(left, right, Inner, joinConditions.reduceLeftOption(And)) - - // should not have reference to same logical plan - createOrderedJoin(Seq(joined) ++ rest.filterNot(_ eq right), others) -} - } - - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case j @ ExtractFiltersAndInnerJoins(input, conditions) -if input.size > 2 && conditions.nonEmpty => - createOrderedJoin(input, conditions) - } -} - -/** - * Elimination of outer joins, if the predicates can restrict the result sets so that - * all null-supplying rows are eliminated - * - * - full outer -> inner if both sides have such predicates - * - left outer -> inner if the right side has such predicates - * - right outer -> inner if the left side has such predicates - * - full outer -> left outer if only the left side has such predicates - * - full outer -> right outer if only the right side has such predicates - * - * This rule shoul
spark git commit: [SPARK-17274][SQL] Move join optimizer rules into a separate file
Repository: spark Updated Branches: refs/heads/master 5aad4509c -> 718b6bad2 [SPARK-17274][SQL] Move join optimizer rules into a separate file ## What changes were proposed in this pull request? As part of breaking Optimizer.scala apart, this patch moves various join rules into a single file. ## How was this patch tested? This should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #14846 from rxin/SPARK-17274. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/718b6bad Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/718b6bad Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/718b6bad Branch: refs/heads/master Commit: 718b6bad2d698b76be6906d51da13626e9f3890e Parents: 5aad450 Author: Reynold Xin <r...@databricks.com> Authored: Sat Aug 27 00:36:18 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Sat Aug 27 00:36:18 2016 -0700 -- .../sql/catalyst/optimizer/Optimizer.scala | 106 --- .../spark/sql/catalyst/optimizer/joins.scala| 134 +++ 2 files changed, 134 insertions(+), 106 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/718b6bad/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 17cab18..7617d34 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -800,112 +800,6 @@ object PushDownPredicate extends Rule[LogicalPlan] with PredicateHelper { } /** - * Reorder the joins and push all the conditions into join, so that the bottom ones have at least - * one condition. - * - * The order of joins will not be changed if all of them already have at least one condition. - */ -object ReorderJoin extends Rule[LogicalPlan] with PredicateHelper { - - /** - * Join a list of plans together and push down the conditions into them. - * - * The joined plan are picked from left to right, prefer those has at least one join condition. - * - * @param input a list of LogicalPlans to join. - * @param conditions a list of condition for join. 
- */ - @tailrec - def createOrderedJoin(input: Seq[LogicalPlan], conditions: Seq[Expression]): LogicalPlan = { -assert(input.size >= 2) -if (input.size == 2) { - val (joinConditions, others) = conditions.partition( -e => !SubqueryExpression.hasCorrelatedSubquery(e)) - val join = Join(input(0), input(1), Inner, joinConditions.reduceLeftOption(And)) - if (others.nonEmpty) { -Filter(others.reduceLeft(And), join) - } else { -join - } -} else { - val left :: rest = input.toList - // find out the first join that have at least one join condition - val conditionalJoin = rest.find { plan => -val refs = left.outputSet ++ plan.outputSet -conditions.filterNot(canEvaluate(_, left)).filterNot(canEvaluate(_, plan)) - .exists(_.references.subsetOf(refs)) - } - // pick the next one if no condition left - val right = conditionalJoin.getOrElse(rest.head) - - val joinedRefs = left.outputSet ++ right.outputSet - val (joinConditions, others) = conditions.partition( -e => e.references.subsetOf(joinedRefs) && !SubqueryExpression.hasCorrelatedSubquery(e)) - val joined = Join(left, right, Inner, joinConditions.reduceLeftOption(And)) - - // should not have reference to same logical plan - createOrderedJoin(Seq(joined) ++ rest.filterNot(_ eq right), others) -} - } - - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case j @ ExtractFiltersAndInnerJoins(input, conditions) -if input.size > 2 && conditions.nonEmpty => - createOrderedJoin(input, conditions) - } -} - -/** - * Elimination of outer joins, if the predicates can restrict the result sets so that - * all null-supplying rows are eliminated - * - * - full outer -> inner if both sides have such predicates - * - left outer -> inner if the right side has such predicates - * - right outer -> inner if the left side has such predicates - * - full outer -> left outer if only the left side has such predicates - * - full outer -> right outer if only the right side has such predicates - * - * This rule should be executed before pushing down the Filter - */ -object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper { - - /
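The ReorderJoin scaladoc above picks joinable plans left to right so that every inner join carries at least one condition. A small, self-contained sketch of a query shape that benefits; the `spark` session and the table names are assumptions for illustration only:

import spark.implicits._

Seq(1, 2, 3).toDF("id").createOrReplaceTempView("a")
Seq(1, 2, 3).toDF("key").createOrReplaceTempView("b")
Seq((1, 1), (2, 2)).toDF("id", "key").createOrReplaceTempView("c")

// Written order is a, b, c, but only a-c and b-c conditions exist. ReorderJoin
// effectively joins a with c first, so no intermediate join degenerates into a
// cross product.
val q = spark.sql("SELECT * FROM a, b, c WHERE a.id = c.id AND b.key = c.key")
q.explain(true)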
spark git commit: [SPARK-17273][SQL] Move expression optimizer rules into a separate file
Repository: spark Updated Branches: refs/heads/master 0243b3287 -> 5aad4509c [SPARK-17273][SQL] Move expression optimizer rules into a separate file ## What changes were proposed in this pull request? As part of breaking Optimizer.scala apart, this patch moves various expression optimization rules into a single file. ## How was this patch tested? This should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #14845 from rxin/SPARK-17273. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5aad4509 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5aad4509 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5aad4509 Branch: refs/heads/master Commit: 5aad4509c15e131948d387157ecf56af1a705e19 Parents: 0243b32 Author: Reynold Xin <r...@databricks.com> Authored: Sat Aug 27 00:34:35 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Sat Aug 27 00:34:35 2016 -0700 -- .../sql/catalyst/optimizer/Optimizer.scala | 461 + .../sql/catalyst/optimizer/expressions.scala| 506 +++ 2 files changed, 507 insertions(+), 460 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5aad4509/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 8a50368..17cab18 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -534,176 +534,6 @@ object CollapseRepartition extends Rule[LogicalPlan] { } /** - * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition. - * For example, when the expression is just checking to see if a string starts with a given - * pattern. - */ -object LikeSimplification extends Rule[LogicalPlan] { - // if guards below protect from escapes on trailing %. - // Cases like "something\%" are not optimized, but this does not affect correctness. - private val startsWith = "([^_%]+)%".r - private val endsWith = "%([^_%]+)".r - private val startsAndEndsWith = "([^_%]+)%([^_%]+)".r - private val contains = "%([^_%]+)%".r - private val equalTo = "([^_%]*)".r - - def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions { -case Like(input, Literal(pattern, StringType)) => - pattern.toString match { -case startsWith(prefix) if !prefix.endsWith("\\") => - StartsWith(input, Literal(prefix)) -case endsWith(postfix) => - EndsWith(input, Literal(postfix)) -// 'a%a' pattern is basically same with 'a%' && '%a'. -// However, the additional `Length` condition is required to prevent 'a' match 'a%a'. -case startsAndEndsWith(prefix, postfix) if !prefix.endsWith("\\") => - And(GreaterThanOrEqual(Length(input), Literal(prefix.size + postfix.size)), -And(StartsWith(input, Literal(prefix)), EndsWith(input, Literal(postfix -case contains(infix) if !infix.endsWith("\\") => - Contains(input, Literal(infix)) -case equalTo(str) => - EqualTo(input, Literal(str)) -case _ => - Like(input, Literal.create(pattern, StringType)) - } - } -} - -/** - * Replaces [[Expression Expressions]] that can be statically evaluated with - * equivalent [[Literal]] values. This rule is more specific with - * Null value propagation from bottom to top of the expression tree. 
- */ -object NullPropagation extends Rule[LogicalPlan] { - private def nonNullLiteral(e: Expression): Boolean = e match { -case Literal(null, _) => false -case _ => true - } - - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case q: LogicalPlan => q transformExpressionsUp { - case e @ WindowExpression(Cast(Literal(0L, _), _), _) => -Cast(Literal(0L), e.dataType) - case e @ AggregateExpression(Count(exprs), _, _, _) if !exprs.exists(nonNullLiteral) => -Cast(Literal(0L), e.dataType) - case e @ IsNull(c) if !c.nullable => Literal.create(false, BooleanType) - case e @ IsNotNull(c) if !c.nullable => Literal.create(true, BooleanType) - case e @ GetArrayItem(Literal(null, _), _) => Literal.create(null, e.dataType) - case e @ GetArrayItem(_, Literal(null, _)) => Literal.create(null, e.dataType) -
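Among the rules moved here, LikeSimplification (visible in the diff above) avoids full regular-expression evaluation for simple LIKE patterns. A self-contained sketch of the same idea applied to plain strings, not the Catalyst rule itself:

// Patterns with a single leading or trailing %, or no wildcards at all, can be
// answered with startsWith / endsWith / equality instead of compiling a regex.
def simplifiedLike(input: String, pattern: String): Option[Boolean] = {
  val wildcard = Set('%', '_')
  pattern match {
    case p if p.endsWith("%") && !p.dropRight(1).exists(wildcard) =>
      Some(input.startsWith(p.dropRight(1)))   // 'abc%' -> StartsWith
    case p if p.startsWith("%") && !p.drop(1).exists(wildcard) =>
      Some(input.endsWith(p.drop(1)))          // '%abc' -> EndsWith
    case p if !p.exists(wildcard) =>
      Some(input == p)                         // 'abc'  -> EqualTo
    case _ =>
      None                                     // fall back to the regex-based path
  }
}

// simplifiedLike("abcdef", "abc%") == Some(true)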
spark git commit: [SPARK-17272][SQL] Move subquery optimizer rules into its own file
Repository: spark Updated Branches: refs/heads/master dcefac438 -> 0243b3287 [SPARK-17272][SQL] Move subquery optimizer rules into its own file ## What changes were proposed in this pull request? As part of breaking Optimizer.scala apart, this patch moves various subquery rules into a single file. ## How was this patch tested? This should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #14844 from rxin/SPARK-17272. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0243b328 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0243b328 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0243b328 Branch: refs/heads/master Commit: 0243b328736f83faea5f83d18c4d331890ed8e81 Parents: dcefac4 Author: Reynold Xin <r...@databricks.com> Authored: Sat Aug 27 00:32:57 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Sat Aug 27 00:32:57 2016 -0700 -- .../sql/catalyst/optimizer/Optimizer.scala | 323 - .../spark/sql/catalyst/optimizer/subquery.scala | 356 +++ 2 files changed, 356 insertions(+), 323 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0243b328/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index d055bc3..8a50368 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -1637,326 +1637,3 @@ object RemoveRepetitionFromGroupExpressions extends Rule[LogicalPlan] { a.copy(groupingExpressions = newGrouping) } } - -/** - * This rule rewrites predicate sub-queries into left semi/anti joins. The following predicates - * are supported: - * a. EXISTS/NOT EXISTS will be rewritten as semi/anti join, unresolved conditions in Filter - *will be pulled out as the join conditions. - * b. IN/NOT IN will be rewritten as semi/anti join, unresolved conditions in the Filter will - *be pulled out as join conditions, value = selected column will also be used as join - *condition. - */ -object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case Filter(condition, child) => - val (withSubquery, withoutSubquery) = - splitConjunctivePredicates(condition).partition(PredicateSubquery.hasPredicateSubquery) - - // Construct the pruned filter condition. - val newFilter: LogicalPlan = withoutSubquery match { -case Nil => child -case conditions => Filter(conditions.reduce(And), child) - } - - // Filter the plan by applying left semi and left anti joins. - withSubquery.foldLeft(newFilter) { -case (p, PredicateSubquery(sub, conditions, _, _)) => - val (joinCond, outerPlan) = rewriteExistentialExpr(conditions, p) - Join(outerPlan, sub, LeftSemi, joinCond) -case (p, Not(PredicateSubquery(sub, conditions, false, _))) => - val (joinCond, outerPlan) = rewriteExistentialExpr(conditions, p) - Join(outerPlan, sub, LeftAnti, joinCond) -case (p, Not(PredicateSubquery(sub, conditions, true, _))) => - // This is a NULL-aware (left) anti join (NAAJ) e.g. col NOT IN expr - // Construct the condition. A NULL in one of the conditions is regarded as a positive - // result; such a row will be filtered out by the Anti-Join operator. 
- - // Note that will almost certainly be planned as a Broadcast Nested Loop join. - // Use EXISTS if performance matters to you. - val (joinCond, outerPlan) = rewriteExistentialExpr(conditions, p) - val anyNull = splitConjunctivePredicates(joinCond.get).map(IsNull).reduceLeft(Or) - Join(outerPlan, sub, LeftAnti, Option(Or(anyNull, joinCond.get))) -case (p, predicate) => - val (newCond, inputPlan) = rewriteExistentialExpr(Seq(predicate), p) - Project(p.output, Filter(newCond.get, inputPlan)) - } - } - - /** - * Given a predicate expression and an input plan, it rewrites - * any embedded existential sub-query into an existential join. - * It returns the rewritten expression together with the updated plan. - * Currently, it does not support null-aware joins. Embedded NOT IN predicates - * are blocked in the Analyzer. - */ - private def rewriteExistentialExpr( - exprs: Seq[Expression], -
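RewritePredicateSubquery, moved in the diff above, rewrites EXISTS/IN predicates into left semi joins and NOT EXISTS/NOT IN into left anti joins. A short sketch of the equivalent join shapes written directly in the DataFrame API; the `spark` session and the column names are assumptions for illustration only:

val orders = spark.range(10).toDF("oid")
val items = spark.range(0, 10, 2).toDF("iid")

// WHERE EXISTS (SELECT 1 FROM items WHERE iid = oid)      ==> left semi join
val withMatch = orders.join(items, orders("oid") === items("iid"), "leftsemi")
// WHERE NOT EXISTS (SELECT 1 FROM items WHERE iid = oid)  ==> left anti join
val withoutMatch = orders.join(items, orders("oid") === items("iid"), "leftanti")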
spark git commit: [SPARK-17270][SQL] Move object optimization rules into its own file (branch-2.0)
Repository: spark Updated Branches: refs/heads/branch-2.0 94d52d765 -> f91614f36 [SPARK-17270][SQL] Move object optimization rules into its own file (branch-2.0) ## What changes were proposed in this pull request? As part of breaking Optimizer.scala apart, this patch moves various Dataset object optimization rules into a single file. I'm submitting separate pull requests so we can more easily merge this in branch-2.0 to simplify optimizer backports. This is https://github.com/apache/spark/pull/14839 but for branch-2.0. ## How was this patch tested? This should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #14843 from rxin/SPARK-17270-branch-2.0. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f91614f3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f91614f3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f91614f3 Branch: refs/heads/branch-2.0 Commit: f91614f36472957355fad7d69d66327807fe80c8 Parents: 94d52d7 Author: Reynold Xin <r...@databricks.com> Authored: Sat Aug 27 00:31:49 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Sat Aug 27 00:31:49 2016 -0700 -- .../sql/catalyst/optimizer/Optimizer.scala | 72 - .../spark/sql/catalyst/optimizer/objects.scala | 101 +++ 2 files changed, 101 insertions(+), 72 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f91614f3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index f3f1d21..15d33c1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -187,25 +187,6 @@ object RemoveAliasOnlyProject extends Rule[LogicalPlan] { } /** - * Removes cases where we are unnecessarily going between the object and serialized (InternalRow) - * representation of data item. For example back to back map operations. - */ -object EliminateSerialization extends Rule[LogicalPlan] { - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case d @ DeserializeToObject(_, _, s: SerializeFromObject) -if d.outputObjectType == s.inputObjectType => - // Adds an extra Project here, to preserve the output expr id of `DeserializeToObject`. - // We will remove it later in RemoveAliasOnlyProject rule. - val objAttr = -Alias(s.child.output.head, s.child.output.head.name)(exprId = d.output.head.exprId) - Project(objAttr :: Nil, s.child) -case a @ AppendColumns(_, _, _, s: SerializeFromObject) -if a.deserializer.dataType == s.inputObjectType => - AppendColumnsWithObject(a.func, s.serializer, a.serializer, s.child) - } -} - -/** * Pushes down [[LocalLimit]] beneath UNION ALL and beneath the streamed inputs of outer joins. */ object LimitPushDown extends Rule[LogicalPlan] { @@ -1583,59 +1564,6 @@ object RemoveRepetitionFromGroupExpressions extends Rule[LogicalPlan] { } /** - * Typed [[Filter]] is by default surrounded by a [[DeserializeToObject]] beneath it and a - * [[SerializeFromObject]] above it. If these serializations can't be eliminated, we should embed - * the deserializer in filter condition to save the extra serialization at last. 
- */ -object EmbedSerializerInFilter extends Rule[LogicalPlan] { - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case s @ SerializeFromObject(_, Filter(condition, d: DeserializeToObject)) - // SPARK-15632: Conceptually, filter operator should never introduce schema change. This - // optimization rule also relies on this assumption. However, Dataset typed filter operator - // does introduce schema changes in some cases. Thus, we only enable this optimization when - // - // 1. either input and output schemata are exactly the same, or - // 2. both input and output schemata are single-field schema and share the same type. - // - // The 2nd case is included because encoders for primitive types always have only a single - // field with hard-coded field name "value". - // TODO Cleans this up after fixing SPARK-15632. - if s.schema == d.child.schema || samePrimitiveType(s.schema, d.child.schema) => - - val numObjects = condition.collect { -case a: Attribute if a == d.output.head => a - }.length - - if (numObjects > 1) { -// If the filter condition references the ob
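The rules moved here target unnecessary object/InternalRow round-trips in typed Dataset pipelines. A minimal sketch of the pattern EliminateSerialization applies to, back-to-back typed operations; the `spark` session and the Person case class are assumptions for illustration only:

case class Person(name: String, age: Int)

import spark.implicits._

val people = Seq(Person("a", 30), Person("b", 40)).toDS()

// Without the rule, the output of the first map would be serialized to an
// InternalRow and immediately deserialized again to feed the second map;
// eliminating that round-trip is exactly what the rule is for.
val names = people.map(p => p.copy(age = p.age + 1)).map(_.name)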
spark git commit: [SPARK-17269][SQL] Move finish analysis optimization stage into its own file
Repository: spark Updated Branches: refs/heads/branch-2.0 9c0ac6b53 -> 94d52d765 [SPARK-17269][SQL] Move finish analysis optimization stage into its own file As part of breaking Optimizer.scala apart, this patch moves various finish analysis optimization stage rules into a single file. I'm submitting separate pull requests so we can more easily merge this in branch-2.0 to simplify optimizer backports. This should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #14838 from rxin/SPARK-17269. (cherry picked from commit dcefac438788c51d84641bfbc505efe095731a39) Signed-off-by: Reynold Xin <r...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/94d52d76 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/94d52d76 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/94d52d76 Branch: refs/heads/branch-2.0 Commit: 94d52d76569f8b0782f424cfac959a4bb75c54c0 Parents: 9c0ac6b Author: Reynold Xin <r...@databricks.com> Authored: Fri Aug 26 22:10:28 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Fri Aug 26 22:12:11 2016 -0700 -- .../analysis/RewriteDistinctAggregates.scala| 269 --- .../sql/catalyst/optimizer/Optimizer.scala | 38 --- .../optimizer/RewriteDistinctAggregates.scala | 269 +++ .../sql/catalyst/optimizer/finishAnalysis.scala | 65 + 4 files changed, 334 insertions(+), 307 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/94d52d76/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteDistinctAggregates.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteDistinctAggregates.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteDistinctAggregates.scala deleted file mode 100644 index 8afd28d..000 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteDistinctAggregates.scala +++ /dev/null @@ -1,269 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.sql.catalyst.analysis - -import org.apache.spark.sql.catalyst.expressions._ -import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, AggregateFunction, Complete} -import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Expand, LogicalPlan} -import org.apache.spark.sql.catalyst.rules.Rule -import org.apache.spark.sql.types.IntegerType - -/** - * This rule rewrites an aggregate query with distinct aggregations into an expanded double - * aggregation in which the regular aggregation expressions and every distinct clause is aggregated - * in a separate group. The results are then combined in a second aggregate. 
- * - * For example (in scala): - * {{{ - * val data = Seq( - * ("a", "ca1", "cb1", 10), - * ("a", "ca1", "cb2", 5), - * ("b", "ca1", "cb1", 13)) - * .toDF("key", "cat1", "cat2", "value") - * data.createOrReplaceTempView("data") - * - * val agg = data.groupBy($"key") - * .agg( - * countDistinct($"cat1").as("cat1_cnt"), - * countDistinct($"cat2").as("cat2_cnt"), - * sum($"value").as("total")) - * }}} - * - * This translates to the following (pseudo) logical plan: - * {{{ - * Aggregate( - *key = ['key] - *functions = [COUNT(DISTINCT 'cat1), - * COUNT(DISTINCT 'cat2), - * sum('value)] - *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total]) - * LocalTableScan [...] - * }}} - * - * This rule rewrites this logical plan to the following (pseudo) logical plan: - * {{{ - * Aggregate( - *key = ['key] - *functions = [count(if (('gid = 1)) 'cat1 else null), - *
spark git commit: [SPARK-17269][SQL] Move finish analysis optimization stage into its own file
Repository: spark Updated Branches: refs/heads/master cc0caa690 -> dcefac438 [SPARK-17269][SQL] Move finish analysis optimization stage into its own file ## What changes were proposed in this pull request? As part of breaking Optimizer.scala apart, this patch moves various finish analysis optimization stage rules into a single file. I'm submitting separate pull requests so we can more easily merge this in branch-2.0 to simplify optimizer backports. ## How was this patch tested? This should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #14838 from rxin/SPARK-17269. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dcefac43 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dcefac43 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dcefac43 Branch: refs/heads/master Commit: dcefac438788c51d84641bfbc505efe095731a39 Parents: cc0caa6 Author: Reynold Xin <r...@databricks.com> Authored: Fri Aug 26 22:10:28 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Fri Aug 26 22:10:28 2016 -0700 -- .../analysis/RewriteDistinctAggregates.scala| 269 --- .../sql/catalyst/optimizer/Optimizer.scala | 38 --- .../optimizer/RewriteDistinctAggregates.scala | 269 +++ .../sql/catalyst/optimizer/finishAnalysis.scala | 65 + 4 files changed, 334 insertions(+), 307 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/dcefac43/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteDistinctAggregates.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteDistinctAggregates.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteDistinctAggregates.scala deleted file mode 100644 index 8afd28d..000 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteDistinctAggregates.scala +++ /dev/null @@ -1,269 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.sql.catalyst.analysis - -import org.apache.spark.sql.catalyst.expressions._ -import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, AggregateFunction, Complete} -import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Expand, LogicalPlan} -import org.apache.spark.sql.catalyst.rules.Rule -import org.apache.spark.sql.types.IntegerType - -/** - * This rule rewrites an aggregate query with distinct aggregations into an expanded double - * aggregation in which the regular aggregation expressions and every distinct clause is aggregated - * in a separate group. The results are then combined in a second aggregate. 
- * - * For example (in scala): - * {{{ - * val data = Seq( - * ("a", "ca1", "cb1", 10), - * ("a", "ca1", "cb2", 5), - * ("b", "ca1", "cb1", 13)) - * .toDF("key", "cat1", "cat2", "value") - * data.createOrReplaceTempView("data") - * - * val agg = data.groupBy($"key") - * .agg( - * countDistinct($"cat1").as("cat1_cnt"), - * countDistinct($"cat2").as("cat2_cnt"), - * sum($"value").as("total")) - * }}} - * - * This translates to the following (pseudo) logical plan: - * {{{ - * Aggregate( - *key = ['key] - *functions = [COUNT(DISTINCT 'cat1), - * COUNT(DISTINCT 'cat2), - * sum('value)] - *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total]) - * LocalTableScan [...] - * }}} - * - * This rule rewrites this logical plan to the following (pseudo) logical plan: - * {{{ - * Aggregate( - *key = ['key] - *functions = [count(if (('gid = 1)) 'cat1 else null), - * count(if (('gid = 2)) 'cat2 else null), -
spark git commit: [SPARK-17235][SQL] Support purging of old logs in MetadataLog
Repository: spark Updated Branches: refs/heads/branch-2.0 52feb3fbf -> dfdfc3092 [SPARK-17235][SQL] Support purging of old logs in MetadataLog ## What changes were proposed in this pull request? This patch adds a purge interface to MetadataLog, and an implementation in HDFSMetadataLog. The purge function is currently unused, but I will use it to purge old execution and file source logs in follow-up patches. These changes are required in a production structured streaming job that runs for a long period of time. ## How was this patch tested? Added a unit test case in HDFSMetadataLogSuite. Author: petermaxleeCloses #14802 from petermaxlee/SPARK-17235. (cherry picked from commit f64a1ddd09a34d5d867ccbaba46204d75fad038d) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dfdfc309 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dfdfc309 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dfdfc309 Branch: refs/heads/branch-2.0 Commit: dfdfc3092d1b6942eb9092e28e15fa4efb6ac084 Parents: 52feb3f Author: petermaxlee Authored: Fri Aug 26 16:05:34 2016 -0700 Committer: Reynold Xin Committed: Fri Aug 26 16:05:40 2016 -0700 -- .../execution/streaming/HDFSMetadataLog.scala | 14 ++ .../sql/execution/streaming/MetadataLog.scala | 6 + .../streaming/HDFSMetadataLogSuite.scala| 27 +--- 3 files changed, 43 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/dfdfc309/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala index 2b6f76c..127ece9 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala @@ -227,6 +227,20 @@ class HDFSMetadataLog[T: ClassTag](sparkSession: SparkSession, path: String) None } + /** + * Removes all the log entry earlier than thresholdBatchId (exclusive). + */ + override def purge(thresholdBatchId: Long): Unit = { +val batchIds = fileManager.list(metadataPath, batchFilesFilter) + .map(f => pathToBatchId(f.getPath)) + +for (batchId <- batchIds if batchId < thresholdBatchId) { + val path = batchIdToPath(batchId) + fileManager.delete(path) + logTrace(s"Removed metadata log file: $path") +} + } + private def createFileManager(): FileManager = { val hadoopConf = sparkSession.sessionState.newHadoopConf() try { http://git-wip-us.apache.org/repos/asf/spark/blob/dfdfc309/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala index cc70e1d..78d6be1 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala @@ -48,4 +48,10 @@ trait MetadataLog[T] { * Return the latest batch Id and its metadata if exist. */ def getLatest(): Option[(Long, T)] + + /** + * Removes all the log entry earlier than thresholdBatchId (exclusive). + * This operation should be idempotent. 
+ */ + def purge(thresholdBatchId: Long): Unit } http://git-wip-us.apache.org/repos/asf/spark/blob/dfdfc309/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala index ab5a2d2..4259384 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala @@ -46,14 +46,14 @@ class HDFSMetadataLogSuite extends SparkFunSuite with SharedSQLContext { test("FileManager: FileContextManager") { withTempDir { temp => val path = new Path(temp.getAbsolutePath) - testManager(path, new FileContextManager(path, new Configuration)) + testFileManager(path, new
spark git commit: [SPARK-17235][SQL] Support purging of old logs in MetadataLog
Repository: spark Updated Branches: refs/heads/master a11d10f18 -> f64a1ddd0 [SPARK-17235][SQL] Support purging of old logs in MetadataLog ## What changes were proposed in this pull request? This patch adds a purge interface to MetadataLog, and an implementation in HDFSMetadataLog. The purge function is currently unused, but I will use it to purge old execution and file source logs in follow-up patches. These changes are required in a production structured streaming job that runs for a long period of time. ## How was this patch tested? Added a unit test case in HDFSMetadataLogSuite. Author: petermaxleeCloses #14802 from petermaxlee/SPARK-17235. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f64a1ddd Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f64a1ddd Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f64a1ddd Branch: refs/heads/master Commit: f64a1ddd09a34d5d867ccbaba46204d75fad038d Parents: a11d10f Author: petermaxlee Authored: Fri Aug 26 16:05:34 2016 -0700 Committer: Reynold Xin Committed: Fri Aug 26 16:05:34 2016 -0700 -- .../execution/streaming/HDFSMetadataLog.scala | 14 ++ .../sql/execution/streaming/MetadataLog.scala | 6 + .../streaming/HDFSMetadataLogSuite.scala| 27 +--- 3 files changed, 43 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f64a1ddd/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala index 2b6f76c..127ece9 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala @@ -227,6 +227,20 @@ class HDFSMetadataLog[T: ClassTag](sparkSession: SparkSession, path: String) None } + /** + * Removes all the log entry earlier than thresholdBatchId (exclusive). + */ + override def purge(thresholdBatchId: Long): Unit = { +val batchIds = fileManager.list(metadataPath, batchFilesFilter) + .map(f => pathToBatchId(f.getPath)) + +for (batchId <- batchIds if batchId < thresholdBatchId) { + val path = batchIdToPath(batchId) + fileManager.delete(path) + logTrace(s"Removed metadata log file: $path") +} + } + private def createFileManager(): FileManager = { val hadoopConf = sparkSession.sessionState.newHadoopConf() try { http://git-wip-us.apache.org/repos/asf/spark/blob/f64a1ddd/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala index cc70e1d..78d6be1 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala @@ -48,4 +48,10 @@ trait MetadataLog[T] { * Return the latest batch Id and its metadata if exist. */ def getLatest(): Option[(Long, T)] + + /** + * Removes all the log entry earlier than thresholdBatchId (exclusive). + * This operation should be idempotent. 
+ */ + def purge(thresholdBatchId: Long): Unit } http://git-wip-us.apache.org/repos/asf/spark/blob/f64a1ddd/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala index ab5a2d2..4259384 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala @@ -46,14 +46,14 @@ class HDFSMetadataLogSuite extends SparkFunSuite with SharedSQLContext { test("FileManager: FileContextManager") { withTempDir { temp => val path = new Path(temp.getAbsolutePath) - testManager(path, new FileContextManager(path, new Configuration)) + testFileManager(path, new FileContextManager(path, new Configuration)) } } test("FileManager: FileSystemManager") { withTempDir { temp => val path
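The new `purge(thresholdBatchId)` hook is exclusive of the threshold: every batch id strictly below it is deleted, while the threshold batch itself (and anything newer) survives. A minimal, hypothetical in-memory sketch of that contract — the class name and `App` harness below are illustrative only, not part of Spark:

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for a MetadataLog, just enough to show purge semantics.
class InMemoryMetadataLog[T] {
  private val batches = mutable.Map.empty[Long, T]

  def add(batchId: Long, metadata: T): Boolean = batches.put(batchId, metadata).isEmpty

  def get(batchId: Long): Option[T] = batches.get(batchId)

  def getLatest(): Option[(Long, T)] =
    if (batches.isEmpty) None else Some(batches.maxBy(_._1))

  // Same contract as the new purge API: drop every batch id strictly below the threshold.
  def purge(thresholdBatchId: Long): Unit =
    batches.keys.filter(_ < thresholdBatchId).toList.foreach(batches.remove)
}

object PurgeExample extends App {
  val log = new InMemoryMetadataLog[String]
  (0L to 4L).foreach(id => log.add(id, s"batch-$id"))
  log.purge(3L)
  assert(log.get(2L).isEmpty && log.get(3L).isDefined) // batches 0-2 removed, 3 and 4 kept
}
```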
spark git commit: [SPARK-17246][SQL] Add BigDecimal literal
Repository: spark Updated Branches: refs/heads/branch-2.0 deb6a54cf -> 52feb3fbf [SPARK-17246][SQL] Add BigDecimal literal ## What changes were proposed in this pull request? This PR adds parser support for `BigDecimal` literals. If you append the suffix `BD` to a valid number then this will be interpreted as a `BigDecimal`, for example `12.0E10BD` will interpreted into a BigDecimal with scale -9 and precision 3. This is useful in situations where you need exact values. ## How was this patch tested? Added tests to `ExpressionParserSuite`, `ExpressionSQLBuilderSuite` and `SQLQueryTestSuite`. Author: Herman van HovellCloses #14819 from hvanhovell/SPARK-17246. (cherry picked from commit a11d10f1826b578ff721c4738224eef2b3c3b9f3) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/52feb3fb Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/52feb3fb Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/52feb3fb Branch: refs/heads/branch-2.0 Commit: 52feb3fbf75a234d041703e3ac41884294ab0b64 Parents: deb6a54 Author: Herman van Hovell Authored: Fri Aug 26 13:29:22 2016 -0700 Committer: Reynold Xin Committed: Fri Aug 26 13:29:30 2016 -0700 -- .../apache/spark/sql/catalyst/parser/SqlBase.g4 | 6 + .../sql/catalyst/expressions/literals.scala | 2 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 16 - .../catalyst/parser/ExpressionParserSuite.scala | 7 ++ .../resources/sql-tests/inputs/literals.sql | 6 + .../sql-tests/results/literals.sql.out | 24 +++- .../catalyst/ExpressionSQLBuilderSuite.scala| 1 + 7 files changed, 59 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/52feb3fb/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 51f3804..ecb7c8a 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -626,6 +626,7 @@ number | MINUS? SMALLINT_LITERAL #smallIntLiteral | MINUS? TINYINT_LITERAL #tinyIntLiteral | MINUS? DOUBLE_LITERAL #doubleLiteral +| MINUS? 
BIGDECIMAL_LITERAL #bigDecimalLiteral ; nonReserved @@ -920,6 +921,11 @@ DOUBLE_LITERAL (INTEGER_VALUE | DECIMAL_VALUE | SCIENTIFIC_DECIMAL_VALUE) 'D' ; +BIGDECIMAL_LITERAL +: +(INTEGER_VALUE | DECIMAL_VALUE | SCIENTIFIC_DECIMAL_VALUE) 'BD' +; + IDENTIFIER : (LETTER | DIGIT | '_')+ ; http://git-wip-us.apache.org/repos/asf/spark/blob/52feb3fb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala index 730a7f6..41e3952 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala @@ -266,7 +266,7 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})" case _ => v + "D" } -case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})" +case (v: Decimal, t: DecimalType) => v + "BD" case (v: Int, DateType) => s"DATE '${DateTimeUtils.toJavaDate(v)}'" case (v: Long, TimestampType) => s"TIMESTAMP('${DateTimeUtils.toJavaTimestamp(v)}')" case _ => value.toString http://git-wip-us.apache.org/repos/asf/spark/blob/52feb3fb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index aec3126..0451abe 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -26,7 +26,8 @@ import org.antlr.v4.runtime.{ParserRuleContext, Token} import org.antlr.v4.runtime.tree.{ParseTree, RuleNode,
spark git commit: [SPARK-17246][SQL] Add BigDecimal literal
Repository: spark Updated Branches: refs/heads/master 8e5475be3 -> a11d10f18 [SPARK-17246][SQL] Add BigDecimal literal ## What changes were proposed in this pull request? This PR adds parser support for `BigDecimal` literals. If you append the suffix `BD` to a valid number then this will be interpreted as a `BigDecimal`, for example `12.0E10BD` will interpreted into a BigDecimal with scale -9 and precision 3. This is useful in situations where you need exact values. ## How was this patch tested? Added tests to `ExpressionParserSuite`, `ExpressionSQLBuilderSuite` and `SQLQueryTestSuite`. Author: Herman van HovellCloses #14819 from hvanhovell/SPARK-17246. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a11d10f1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a11d10f1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a11d10f1 Branch: refs/heads/master Commit: a11d10f1826b578ff721c4738224eef2b3c3b9f3 Parents: 8e5475b Author: Herman van Hovell Authored: Fri Aug 26 13:29:22 2016 -0700 Committer: Reynold Xin Committed: Fri Aug 26 13:29:22 2016 -0700 -- .../apache/spark/sql/catalyst/parser/SqlBase.g4 | 6 + .../sql/catalyst/expressions/literals.scala | 2 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 16 - .../catalyst/parser/ExpressionParserSuite.scala | 7 ++ .../resources/sql-tests/inputs/literals.sql | 6 + .../sql-tests/results/literals.sql.out | 24 +++- .../catalyst/ExpressionSQLBuilderSuite.scala| 1 + 7 files changed, 59 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a11d10f1/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index cab7c3f..a8af840 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -633,6 +633,7 @@ number | MINUS? SMALLINT_LITERAL #smallIntLiteral | MINUS? TINYINT_LITERAL #tinyIntLiteral | MINUS? DOUBLE_LITERAL #doubleLiteral +| MINUS? 
BIGDECIMAL_LITERAL #bigDecimalLiteral ; nonReserved @@ -928,6 +929,11 @@ DOUBLE_LITERAL (INTEGER_VALUE | DECIMAL_VALUE | SCIENTIFIC_DECIMAL_VALUE) 'D' ; +BIGDECIMAL_LITERAL +: +(INTEGER_VALUE | DECIMAL_VALUE | SCIENTIFIC_DECIMAL_VALUE) 'BD' +; + IDENTIFIER : (LETTER | DIGIT | '_')+ ; http://git-wip-us.apache.org/repos/asf/spark/blob/a11d10f1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala index 730a7f6..41e3952 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala @@ -266,7 +266,7 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})" case _ => v + "D" } -case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})" +case (v: Decimal, t: DecimalType) => v + "BD" case (v: Int, DateType) => s"DATE '${DateTimeUtils.toJavaDate(v)}'" case (v: Long, TimestampType) => s"TIMESTAMP('${DateTimeUtils.toJavaTimestamp(v)}')" case _ => value.toString http://git-wip-us.apache.org/repos/asf/spark/blob/a11d10f1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 8b98efc..893db93 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -26,7 +26,8 @@ import org.antlr.v4.runtime.{ParserRuleContext, Token} import org.antlr.v4.runtime.tree.{ParseTree, RuleNode, TerminalNode} import org.apache.spark.internal.Logging -import org.apache.spark.sql.catalyst.{FunctionIdentifier, InternalRow,
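A small usage sketch of the new literal suffix, assuming a local `SparkSession` named `spark`; the query and output comments are illustrative, not copied from the patch's test files:

```scala
import org.apache.spark.sql.SparkSession

object BigDecimalLiteralExample extends App {
  val spark = SparkSession.builder().master("local[2]").appName("bd-literal").getOrCreate()

  // The BD suffix makes the parser produce an exact decimal value instead of a double.
  val df = spark.sql("SELECT 12.0E10BD AS exact_value, 12.0E10 AS approx_value")
  df.printSchema() // exact_value: decimal, approx_value: double
  df.show()

  spark.stop()
}
```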
spark git commit: [SPARK-17242][DOCUMENT] Update links of external dstream projects
Repository: spark Updated Branches: refs/heads/branch-2.0 73014a2aa -> 27ed6d5dc [SPARK-17242][DOCUMENT] Update links of external dstream projects ## What changes were proposed in this pull request? Updated links of external dstream projects. ## How was this patch tested? Just document changes. Author: Shixiong ZhuCloses #14814 from zsxwing/dstream-link. (cherry picked from commit 341e0e778dff8c404b47d34ee7661b658bb91880) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/27ed6d5d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/27ed6d5d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/27ed6d5d Branch: refs/heads/branch-2.0 Commit: 27ed6d5dcd521b4ff1ebe777b03a03ba103d6e76 Parents: 73014a2 Author: Shixiong Zhu Authored: Thu Aug 25 21:08:42 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 25 21:08:48 2016 -0700 -- docs/streaming-programming-guide.md | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/27ed6d5d/docs/streaming-programming-guide.md -- diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md index 14e1744..b92ca92 100644 --- a/docs/streaming-programming-guide.md +++ b/docs/streaming-programming-guide.md @@ -656,7 +656,7 @@ methods for creating DStreams from files as input sources. Python API `fileStream` is not available in the Python API, only `textFileStream` is available. - **Streams based on Custom Receivers:** DStreams can be created with data streams received through custom receivers. See the [Custom Receiver - Guide](streaming-custom-receivers.html) and [DStream Akka](https://github.com/spark-packages/dstream-akka) for more details. + Guide](streaming-custom-receivers.html) for more details. - **Queue of RDDs as a Stream:** For testing a Spark Streaming application with test data, one can also create a DStream based on a queue of RDDs, using `streamingContext.queueStream(queueOfRDDs)`. Each RDD pushed into the queue will be treated as a batch of data in the DStream, and processed like a stream. @@ -2383,11 +2383,7 @@ additional effort may be necessary to achieve exactly-once semantics. There are - [Kafka Integration Guide](streaming-kafka-integration.html) - [Kinesis Integration Guide](streaming-kinesis-integration.html) - [Custom Receiver Guide](streaming-custom-receivers.html) -* External DStream data sources: -- [DStream MQTT](https://github.com/spark-packages/dstream-mqtt) -- [DStream Twitter](https://github.com/spark-packages/dstream-twitter) -- [DStream Akka](https://github.com/spark-packages/dstream-akka) -- [DStream ZeroMQ](https://github.com/spark-packages/dstream-zeromq) +* Third-party DStream data sources can be found in [Spark Packages](https://spark-packages.org/) * API documentation - Scala docs * [StreamingContext](api/scala/index.html#org.apache.spark.streaming.StreamingContext) and - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17242][DOCUMENT] Update links of external dstream projects
Repository: spark Updated Branches: refs/heads/master b964a172a -> 341e0e778 [SPARK-17242][DOCUMENT] Update links of external dstream projects ## What changes were proposed in this pull request? Updated links of external dstream projects. ## How was this patch tested? Just document changes. Author: Shixiong ZhuCloses #14814 from zsxwing/dstream-link. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/341e0e77 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/341e0e77 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/341e0e77 Branch: refs/heads/master Commit: 341e0e778dff8c404b47d34ee7661b658bb91880 Parents: b964a17 Author: Shixiong Zhu Authored: Thu Aug 25 21:08:42 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 25 21:08:42 2016 -0700 -- docs/streaming-programming-guide.md | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/341e0e77/docs/streaming-programming-guide.md -- diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md index df94e95..82d3647 100644 --- a/docs/streaming-programming-guide.md +++ b/docs/streaming-programming-guide.md @@ -656,7 +656,7 @@ methods for creating DStreams from files as input sources. Python API `fileStream` is not available in the Python API, only `textFileStream` is available. - **Streams based on Custom Receivers:** DStreams can be created with data streams received through custom receivers. See the [Custom Receiver - Guide](streaming-custom-receivers.html) and [DStream Akka](https://github.com/spark-packages/dstream-akka) for more details. + Guide](streaming-custom-receivers.html) for more details. - **Queue of RDDs as a Stream:** For testing a Spark Streaming application with test data, one can also create a DStream based on a queue of RDDs, using `streamingContext.queueStream(queueOfRDDs)`. Each RDD pushed into the queue will be treated as a batch of data in the DStream, and processed like a stream. @@ -2383,11 +2383,7 @@ additional effort may be necessary to achieve exactly-once semantics. There are - [Kafka Integration Guide](streaming-kafka-integration.html) - [Kinesis Integration Guide](streaming-kinesis-integration.html) - [Custom Receiver Guide](streaming-custom-receivers.html) -* External DStream data sources: -- [DStream MQTT](https://github.com/spark-packages/dstream-mqtt) -- [DStream Twitter](https://github.com/spark-packages/dstream-twitter) -- [DStream Akka](https://github.com/spark-packages/dstream-akka) -- [DStream ZeroMQ](https://github.com/spark-packages/dstream-zeromq) +* Third-party DStream data sources can be found in [Spark Packages](https://spark-packages.org/) * API documentation - Scala docs * [StreamingContext](api/scala/index.html#org.apache.spark.streaming.StreamingContext) and - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17215][SQL] Method `SQLContext.parseDataType(dataTypeString: String)` could be removed.
Repository: spark Updated Branches: refs/heads/master 4d0706d61 -> 5f02d2e5b [SPARK-17215][SQL] Method `SQLContext.parseDataType(dataTypeString: String)` could be removed. ## What changes were proposed in this pull request? Method `SQLContext.parseDataType(dataTypeString: String)` could be removed, we should use `SparkSession.parseDataType(dataTypeString: String)` instead. This require updating PySpark. ## How was this patch tested? Existing test cases. Author: jiangxingboCloses #14790 from jiangxb1987/parseDataType. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5f02d2e5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5f02d2e5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5f02d2e5 Branch: refs/heads/master Commit: 5f02d2e5b4d37f554629cbd0e488e856fffd7b6b Parents: 4d0706d Author: jiangxingbo Authored: Wed Aug 24 23:36:04 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 24 23:36:04 2016 -0700 -- python/pyspark/sql/column.py | 7 +++ python/pyspark/sql/functions.py | 6 +++--- python/pyspark/sql/readwriter.py | 4 +++- python/pyspark/sql/streaming.py | 4 +++- python/pyspark/sql/tests.py | 2 +- python/pyspark/sql/types.py | 6 +++--- .../src/main/scala/org/apache/spark/sql/SQLContext.scala | 10 -- 7 files changed, 16 insertions(+), 23 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5f02d2e5/python/pyspark/sql/column.py -- diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py index 4b99f30..8d5adc8 100644 --- a/python/pyspark/sql/column.py +++ b/python/pyspark/sql/column.py @@ -328,10 +328,9 @@ class Column(object): if isinstance(dataType, basestring): jc = self._jc.cast(dataType) elif isinstance(dataType, DataType): -from pyspark.sql import SQLContext -sc = SparkContext.getOrCreate() -ctx = SQLContext.getOrCreate(sc) -jdt = ctx._ssql_ctx.parseDataType(dataType.json()) +from pyspark.sql import SparkSession +spark = SparkSession.builder.getOrCreate() +jdt = spark._jsparkSession.parseDataType(dataType.json()) jc = self._jc.cast(jdt) else: raise TypeError("unexpected type: %s" % type(dataType)) http://git-wip-us.apache.org/repos/asf/spark/blob/5f02d2e5/python/pyspark/sql/functions.py -- diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py index 4ea83e2..89b3c07 100644 --- a/python/pyspark/sql/functions.py +++ b/python/pyspark/sql/functions.py @@ -1760,11 +1760,11 @@ class UserDefinedFunction(object): self._judf = self._create_judf(name) def _create_judf(self, name): -from pyspark.sql import SQLContext +from pyspark.sql import SparkSession sc = SparkContext.getOrCreate() wrapped_func = _wrap_function(sc, self.func, self.returnType) -ctx = SQLContext.getOrCreate(sc) -jdt = ctx._ssql_ctx.parseDataType(self.returnType.json()) +spark = SparkSession.builder.getOrCreate() +jdt = spark._jsparkSession.parseDataType(self.returnType.json()) if name is None: f = self.func name = f.__name__ if hasattr(f, '__name__') else f.__class__.__name__ http://git-wip-us.apache.org/repos/asf/spark/blob/5f02d2e5/python/pyspark/sql/readwriter.py -- diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py index 3da6f49..3d79e0c 100644 --- a/python/pyspark/sql/readwriter.py +++ b/python/pyspark/sql/readwriter.py @@ -98,9 +98,11 @@ class DataFrameReader(OptionUtils): :param schema: a :class:`pyspark.sql.types.StructType` object """ +from pyspark.sql import SparkSession if not isinstance(schema, StructType): raise TypeError("schema should be 
StructType") -jschema = self._spark._ssql_ctx.parseDataType(schema.json()) +spark = SparkSession.builder.getOrCreate() +jschema = spark._jsparkSession.parseDataType(schema.json()) self._jreader = self._jreader.schema(jschema) return self http://git-wip-us.apache.org/repos/asf/spark/blob/5f02d2e5/python/pyspark/sql/streaming.py -- diff --git a/python/pyspark/sql/streaming.py
spark git commit: [SPARK-17228][SQL] Not infer/propagate non-deterministic constraints
Repository: spark Updated Branches: refs/heads/branch-2.0 3258f27a8 -> aa57083af [SPARK-17228][SQL] Not infer/propagate non-deterministic constraints ## What changes were proposed in this pull request? Given that filters based on non-deterministic constraints shouldn't be pushed down in the query plan, unnecessarily inferring them is confusing and a source of potential bugs. This patch simplifies the inferring logic by simply ignoring them. ## How was this patch tested? Added a new test in `ConstraintPropagationSuite`. Author: Sameer AgarwalCloses #14795 from sameeragarwal/deterministic-constraints. (cherry picked from commit ac27557eb622a257abeb3e8551f06ebc72f87133) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/aa57083a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/aa57083a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/aa57083a Branch: refs/heads/branch-2.0 Commit: aa57083af4cecb595bac09e437607d7142b54913 Parents: 3258f27 Author: Sameer Agarwal Authored: Wed Aug 24 21:24:24 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 24 21:24:31 2016 -0700 -- .../spark/sql/catalyst/plans/QueryPlan.scala | 3 ++- .../plans/ConstraintPropagationSuite.scala | 17 + 2 files changed, 19 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/aa57083a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala index cf34f4b..9c60590 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala @@ -35,7 +35,8 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT .union(inferAdditionalConstraints(constraints)) .union(constructIsNotNullConstraints(constraints)) .filter(constraint => -constraint.references.nonEmpty && constraint.references.subsetOf(outputSet)) +constraint.references.nonEmpty && constraint.references.subsetOf(outputSet) && + constraint.deterministic) } /** http://git-wip-us.apache.org/repos/asf/spark/blob/aa57083a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala index 5a76969..8d6a49a 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala @@ -352,4 +352,21 @@ class ConstraintPropagationSuite extends SparkFunSuite { verifyConstraints(tr.analyze.constraints, ExpressionSet(Seq(IsNotNull(resolveColumn(tr, "b")), IsNotNull(resolveColumn(tr, "c") } + + test("not infer non-deterministic constraints") { +val tr = LocalRelation('a.int, 'b.string, 'c.int) + +verifyConstraints(tr + .where('a.attr === Rand(0)) + .analyze.constraints, + ExpressionSet(Seq(IsNotNull(resolveColumn(tr, "a") + +verifyConstraints(tr + .where('a.attr === InputFileName()) + .where('a.attr =!= 'c.attr) + .analyze.constraints, + ExpressionSet(Seq(resolveColumn(tr, "a") =!= resolveColumn(tr, "c"), +IsNotNull(resolveColumn(tr, "a")), 
+IsNotNull(resolveColumn(tr, "c") + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17228][SQL] Not infer/propagate non-deterministic constraints
Repository: spark Updated Branches: refs/heads/master 3a60be4b1 -> ac27557eb [SPARK-17228][SQL] Not infer/propagate non-deterministic constraints ## What changes were proposed in this pull request? Given that filters based on non-deterministic constraints shouldn't be pushed down in the query plan, unnecessarily inferring them is confusing and a source of potential bugs. This patch simplifies the inferring logic by simply ignoring them. ## How was this patch tested? Added a new test in `ConstraintPropagationSuite`. Author: Sameer AgarwalCloses #14795 from sameeragarwal/deterministic-constraints. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ac27557e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ac27557e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ac27557e Branch: refs/heads/master Commit: ac27557eb622a257abeb3e8551f06ebc72f87133 Parents: 3a60be4 Author: Sameer Agarwal Authored: Wed Aug 24 21:24:24 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 24 21:24:24 2016 -0700 -- .../spark/sql/catalyst/plans/QueryPlan.scala | 3 ++- .../plans/ConstraintPropagationSuite.scala | 17 + 2 files changed, 19 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ac27557e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala index 8ee31f4..0fb6e7d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala @@ -35,7 +35,8 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT .union(inferAdditionalConstraints(constraints)) .union(constructIsNotNullConstraints(constraints)) .filter(constraint => -constraint.references.nonEmpty && constraint.references.subsetOf(outputSet)) +constraint.references.nonEmpty && constraint.references.subsetOf(outputSet) && + constraint.deterministic) } /** http://git-wip-us.apache.org/repos/asf/spark/blob/ac27557e/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala index 5a76969..8d6a49a 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala @@ -352,4 +352,21 @@ class ConstraintPropagationSuite extends SparkFunSuite { verifyConstraints(tr.analyze.constraints, ExpressionSet(Seq(IsNotNull(resolveColumn(tr, "b")), IsNotNull(resolveColumn(tr, "c") } + + test("not infer non-deterministic constraints") { +val tr = LocalRelation('a.int, 'b.string, 'c.int) + +verifyConstraints(tr + .where('a.attr === Rand(0)) + .analyze.constraints, + ExpressionSet(Seq(IsNotNull(resolveColumn(tr, "a") + +verifyConstraints(tr + .where('a.attr === InputFileName()) + .where('a.attr =!= 'c.attr) + .analyze.constraints, + ExpressionSet(Seq(resolveColumn(tr, "a") =!= resolveColumn(tr, "c"), +IsNotNull(resolveColumn(tr, "a")), +IsNotNull(resolveColumn(tr, "c") + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For 
additional commands, e-mail: commits-h...@spark.apache.org
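The guard itself is a single extra conjunct. As a toy model in plain Scala (no Catalyst classes; the `Constraint` case class below is a made-up stand-in), the filter keeps a constraint only when it references known output attributes and is deterministic:

```scala
// Hypothetical stand-in for Catalyst expressions: just enough state to show the filter.
final case class Constraint(references: Set[String], deterministic: Boolean, sql: String)

object ConstraintFilterExample extends App {
  val outputSet = Set("a", "b", "c")

  val inferred = Seq(
    Constraint(Set("a"), deterministic = true, "isnotnull(a)"),
    Constraint(Set("a"), deterministic = false, "(a = rand(0))"), // dropped: non-deterministic
    Constraint(Set("a", "z"), deterministic = true, "(a = z)"))   // dropped: z is not an output

  val kept = inferred.filter { c =>
    c.references.nonEmpty && c.references.subsetOf(outputSet) && c.deterministic
  }

  kept.foreach(c => println(c.sql)) // only isnotnull(a) survives
}
```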
spark git commit: [SPARK-16216][SQL][BRANCH-2.0] Backport Read/write dateFormat/timestampFormat options for CSV and JSON
Repository: spark Updated Branches: refs/heads/branch-2.0 9f363a690 -> 3258f27a8 [SPARK-16216][SQL][BRANCH-2.0] Backport Read/write dateFormat/timestampFormat options for CSV and JSON ## What changes were proposed in this pull request? This PR backports https://github.com/apache/spark/pull/14279 to 2.0. ## How was this patch tested? Unit tests were added in `CSVSuite` and `JsonSuite`. For JSON, existing tests cover the default cases. Author: hyukjinkwonCloses #14799 from HyukjinKwon/SPARK-16216-json-csv-backport. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3258f27a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3258f27a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3258f27a Branch: refs/heads/branch-2.0 Commit: 3258f27a881dfeb5ab8bae90c338603fa4b6f9d8 Parents: 9f363a6 Author: hyukjinkwon Authored: Wed Aug 24 21:19:35 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 24 21:19:35 2016 -0700 -- python/pyspark/sql/readwriter.py| 56 +-- python/pyspark/sql/streaming.py | 30 +++- .../org/apache/spark/sql/DataFrameReader.scala | 17 +- .../org/apache/spark/sql/DataFrameWriter.scala | 12 ++ .../datasources/csv/CSVInferSchema.scala| 42 ++--- .../execution/datasources/csv/CSVOptions.scala | 15 +- .../execution/datasources/csv/CSVRelation.scala | 43 - .../datasources/json/JSONOptions.scala | 9 ++ .../datasources/json/JacksonGenerator.scala | 14 +- .../datasources/json/JacksonParser.scala| 68 .../datasources/json/JsonFileFormat.scala | 5 +- .../spark/sql/streaming/DataStreamReader.scala | 16 +- .../datasources/csv/CSVInferSchemaSuite.scala | 4 +- .../execution/datasources/csv/CSVSuite.scala| 156 ++- .../datasources/csv/CSVTypeCastSuite.scala | 17 +- .../execution/datasources/json/JsonSuite.scala | 74 - .../datasources/json/TestJsonData.scala | 6 + .../sql/sources/JsonHadoopFsRelationSuite.scala | 4 + 18 files changed, 478 insertions(+), 110 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3258f27a/python/pyspark/sql/readwriter.py -- diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py index 64de33e..3da6f49 100644 --- a/python/pyspark/sql/readwriter.py +++ b/python/pyspark/sql/readwriter.py @@ -156,7 +156,7 @@ class DataFrameReader(OptionUtils): def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None, allowComments=None, allowUnquotedFieldNames=None, allowSingleQuotes=None, allowNumericLeadingZero=None, allowBackslashEscapingAnyCharacter=None, - mode=None, columnNameOfCorruptRecord=None): + mode=None, columnNameOfCorruptRecord=None, dateFormat=None, timestampFormat=None): """ Loads a JSON file (one object per line) or an RDD of Strings storing JSON objects (one object per record) and returns the result as a :class`DataFrame`. @@ -198,6 +198,14 @@ class DataFrameReader(OptionUtils): ``spark.sql.columnNameOfCorruptRecord``. If None is set, it uses the value specified in ``spark.sql.columnNameOfCorruptRecord``. +:param dateFormat: sets the string that indicates a date format. Custom date formats + follow the formats at ``java.text.SimpleDateFormat``. This + applies to date type. If None is set, it uses the + default value value, ``-MM-dd``. +:param timestampFormat: sets the string that indicates a timestamp format. Custom date +formats follow the formats at ``java.text.SimpleDateFormat``. +This applies to timestamp type. If None is set, it uses the +default value value, ``-MM-dd'T'HH:mm:ss.SSSZZ``. 
>>> df1 = spark.read.json('python/test_support/sql/people.json') >>> df1.dtypes @@ -213,7 +221,8 @@ class DataFrameReader(OptionUtils): allowComments=allowComments, allowUnquotedFieldNames=allowUnquotedFieldNames, allowSingleQuotes=allowSingleQuotes, allowNumericLeadingZero=allowNumericLeadingZero, allowBackslashEscapingAnyCharacter=allowBackslashEscapingAnyCharacter, -mode=mode, columnNameOfCorruptRecord=columnNameOfCorruptRecord) +mode=mode, columnNameOfCorruptRecord=columnNameOfCorruptRecord,
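A usage sketch of the backported options, assuming a `SparkSession` named `spark` and a hypothetical CSV file path; the same two options apply to the JSON reader and to the corresponding writers:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object DateFormatOptionsExample extends App {
  val spark = SparkSession.builder().master("local[2]").appName("date-options").getOrCreate()

  val schema = StructType(Seq(
    StructField("id", IntegerType),
    StructField("born", DateType),
    StructField("last_seen", TimestampType)))

  val people = spark.read
    .schema(schema)
    .option("dateFormat", "dd/MM/yyyy")            // pattern used to parse DateType columns
    .option("timestampFormat", "yyyy/MM/dd HH:mm") // pattern used to parse TimestampType columns
    .csv("/tmp/people.csv")                        // hypothetical input path

  people.show()
  spark.stop()
}
```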
spark git commit: [SPARK-17186][SQL] remove catalog table type INDEX
Repository: spark Updated Branches: refs/heads/branch-2.0 a6e6a047b -> df87f161c [SPARK-17186][SQL] remove catalog table type INDEX ## What changes were proposed in this pull request? Actually Spark SQL doesn't support index, the catalog table type `INDEX` is from Hive. However, most operations in Spark SQL can't handle index table, e.g. create table, alter table, etc. Logically index table should be invisible to end users, and Hive also generates special table name for index table to avoid users accessing it directly. Hive has special SQL syntax to create/show/drop index tables. At Spark SQL side, although we can describe index table directly, but the result is unreadable, we should use the dedicated SQL syntax to do it(e.g. `SHOW INDEX ON tbl`). Spark SQL can also read index table directly, but the result is always empty.(Can hive read index table directly?) This PR remove the table type `INDEX`, to make it clear that Spark SQL doesn't support index currently. ## How was this patch tested? existing tests. Author: Wenchen FanCloses #14752 from cloud-fan/minor2. (cherry picked from commit 52fa45d62a5a0bc832442f38f9e634c5d8e29e08) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/df87f161 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/df87f161 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/df87f161 Branch: refs/heads/branch-2.0 Commit: df87f161c9e40a49235ea722f6a662a488b41c4c Parents: a6e6a04 Author: Wenchen Fan Authored: Tue Aug 23 23:46:09 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 23 23:46:17 2016 -0700 -- .../org/apache/spark/sql/catalyst/catalog/interface.scala| 1 - .../org/apache/spark/sql/execution/command/tables.scala | 8 +++- .../scala/org/apache/spark/sql/hive/MetastoreRelation.scala | 1 - .../org/apache/spark/sql/hive/client/HiveClientImpl.scala| 4 ++-- .../apache/spark/sql/hive/execution/HiveCommandSuite.scala | 2 +- 5 files changed, 6 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/df87f161/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala index 6197aca..c083cf6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala @@ -203,7 +203,6 @@ case class CatalogTableType private(name: String) object CatalogTableType { val EXTERNAL = new CatalogTableType("EXTERNAL") val MANAGED = new CatalogTableType("MANAGED") - val INDEX = new CatalogTableType("INDEX") val VIEW = new CatalogTableType("VIEW") } http://git-wip-us.apache.org/repos/asf/spark/blob/df87f161/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala index b2300b4..a5ccbcf 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala @@ -678,12 +678,11 @@ case class ShowPartitionsCommand( * Validate and throws an [[AnalysisException]] exception under the following conditions: * 1. If the table is not partitioned. * 2. 
If it is a datasource table. - * 3. If it is a view or index table. + * 3. If it is a view. */ -if (tab.tableType == VIEW || - tab.tableType == INDEX) { +if (tab.tableType == VIEW) { throw new AnalysisException( -s"SHOW PARTITIONS is not allowed on a view or index table: ${tab.qualifiedName}") +s"SHOW PARTITIONS is not allowed on a view: ${tab.qualifiedName}") } if (!DDLUtils.isTablePartitioned(tab)) { @@ -765,7 +764,6 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableComman case EXTERNAL => " EXTERNAL TABLE" case VIEW => " VIEW" case MANAGED => " TABLE" - case INDEX => reportUnsupportedError(Seq("index table")) } builder ++= s"CREATE$tableTypeString ${table.quotedString}" http://git-wip-us.apache.org/repos/asf/spark/blob/df87f161/sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala
spark git commit: [SPARK-17186][SQL] remove catalog table type INDEX
Repository: spark Updated Branches: refs/heads/master b9994ad05 -> 52fa45d62 [SPARK-17186][SQL] remove catalog table type INDEX ## What changes were proposed in this pull request? Actually Spark SQL doesn't support index, the catalog table type `INDEX` is from Hive. However, most operations in Spark SQL can't handle index table, e.g. create table, alter table, etc. Logically index table should be invisible to end users, and Hive also generates special table name for index table to avoid users accessing it directly. Hive has special SQL syntax to create/show/drop index tables. At Spark SQL side, although we can describe index table directly, but the result is unreadable, we should use the dedicated SQL syntax to do it(e.g. `SHOW INDEX ON tbl`). Spark SQL can also read index table directly, but the result is always empty.(Can hive read index table directly?) This PR remove the table type `INDEX`, to make it clear that Spark SQL doesn't support index currently. ## How was this patch tested? existing tests. Author: Wenchen FanCloses #14752 from cloud-fan/minor2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/52fa45d6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/52fa45d6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/52fa45d6 Branch: refs/heads/master Commit: 52fa45d62a5a0bc832442f38f9e634c5d8e29e08 Parents: b9994ad Author: Wenchen Fan Authored: Tue Aug 23 23:46:09 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 23 23:46:09 2016 -0700 -- .../org/apache/spark/sql/catalyst/catalog/interface.scala| 1 - .../org/apache/spark/sql/execution/command/tables.scala | 8 +++- .../scala/org/apache/spark/sql/hive/MetastoreRelation.scala | 1 - .../org/apache/spark/sql/hive/client/HiveClientImpl.scala| 4 ++-- .../apache/spark/sql/hive/execution/HiveCommandSuite.scala | 2 +- 5 files changed, 6 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/52fa45d6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala index f7762e0..83e01f9 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala @@ -200,7 +200,6 @@ case class CatalogTableType private(name: String) object CatalogTableType { val EXTERNAL = new CatalogTableType("EXTERNAL") val MANAGED = new CatalogTableType("MANAGED") - val INDEX = new CatalogTableType("INDEX") val VIEW = new CatalogTableType("VIEW") } http://git-wip-us.apache.org/repos/asf/spark/blob/52fa45d6/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala index 21544a3..b4a15b8 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala @@ -620,12 +620,11 @@ case class ShowPartitionsCommand( * Validate and throws an [[AnalysisException]] exception under the following conditions: * 1. If the table is not partitioned. * 2. If it is a datasource table. - * 3. If it is a view or index table. + * 3. If it is a view. 
*/ -if (tab.tableType == VIEW || - tab.tableType == INDEX) { +if (tab.tableType == VIEW) { throw new AnalysisException( -s"SHOW PARTITIONS is not allowed on a view or index table: ${tab.qualifiedName}") +s"SHOW PARTITIONS is not allowed on a view: ${tab.qualifiedName}") } if (tab.partitionColumnNames.isEmpty) { @@ -708,7 +707,6 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableComman case EXTERNAL => " EXTERNAL TABLE" case VIEW => " VIEW" case MANAGED => " TABLE" - case INDEX => reportUnsupportedError(Seq("index table")) } builder ++= s"CREATE$tableTypeString ${table.quotedString}" http://git-wip-us.apache.org/repos/asf/spark/blob/52fa45d6/sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala
spark git commit: [MINOR][SQL] Remove implemented functions from comments of 'HiveSessionCatalog.scala'
Repository: spark Updated Branches: refs/heads/branch-2.0 a772b4b5d -> a6e6a047b [MINOR][SQL] Remove implemented functions from comments of 'HiveSessionCatalog.scala' ## What changes were proposed in this pull request? This PR removes implemented functions from comments of `HiveSessionCatalog.scala`: `java_method`, `posexplode`, `str_to_map`. ## How was this patch tested? Manual. Author: Weiqing YangCloses #14769 from Sherry302/cleanComment. (cherry picked from commit b9994ad05628077016331e6b411fbc09017b1e63) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a6e6a047 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a6e6a047 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a6e6a047 Branch: refs/heads/branch-2.0 Commit: a6e6a047bb9215df55b009957d4c560624d886fc Parents: a772b4b Author: Weiqing Yang Authored: Tue Aug 23 23:44:45 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 23 23:45:00 2016 -0700 -- .../scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a6e6a047/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala index c59ac3d..1684e8d 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala @@ -230,10 +230,8 @@ private[sql] class HiveSessionCatalog( // List of functions we are explicitly not supporting are: // compute_stats, context_ngrams, create_union, // current_user, ewah_bitmap, ewah_bitmap_and, ewah_bitmap_empty, ewah_bitmap_or, field, - // in_file, index, java_method, - // matchpath, ngrams, noop, noopstreaming, noopwithmap, noopwithmapstreaming, - // parse_url_tuple, posexplode, reflect2, - // str_to_map, windowingtablefunction. + // in_file, index, matchpath, ngrams, noop, noopstreaming, noopwithmap, + // noopwithmapstreaming, parse_url_tuple, reflect2, windowingtablefunction. private val hiveFunctions = Seq( "hash", "histogram_numeric", - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR][SQL] Remove implemented functions from comments of 'HiveSessionCatalog.scala'
Repository: spark Updated Branches: refs/heads/master c1937dd19 -> b9994ad05 [MINOR][SQL] Remove implemented functions from comments of 'HiveSessionCatalog.scala' ## What changes were proposed in this pull request? This PR removes implemented functions from comments of `HiveSessionCatalog.scala`: `java_method`, `posexplode`, `str_to_map`. ## How was this patch tested? Manual. Author: Weiqing YangCloses #14769 from Sherry302/cleanComment. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b9994ad0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b9994ad0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b9994ad0 Branch: refs/heads/master Commit: b9994ad05628077016331e6b411fbc09017b1e63 Parents: c1937dd Author: Weiqing Yang Authored: Tue Aug 23 23:44:45 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 23 23:44:45 2016 -0700 -- .../scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b9994ad0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala index ebed9eb..ca8c734 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala @@ -230,10 +230,8 @@ private[sql] class HiveSessionCatalog( // List of functions we are explicitly not supporting are: // compute_stats, context_ngrams, create_union, // current_user, ewah_bitmap, ewah_bitmap_and, ewah_bitmap_empty, ewah_bitmap_or, field, - // in_file, index, java_method, - // matchpath, ngrams, noop, noopstreaming, noopwithmap, noopwithmapstreaming, - // parse_url_tuple, posexplode, reflect2, - // str_to_map, windowingtablefunction. + // in_file, index, matchpath, ngrams, noop, noopstreaming, noopwithmap, + // noopwithmapstreaming, parse_url_tuple, reflect2, windowingtablefunction. private val hiveFunctions = Seq( "hash", "histogram_numeric", - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader`
Repository: spark Updated Branches: refs/heads/master bf8ff833e -> c1937dd19 [SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader` ## What changes were proposed in this pull request? Jira: https://issues.apache.org/jira/browse/SPARK-16862 `BufferedInputStream` used in `UnsafeSorterSpillReader` uses the default 8k buffer to read data off disk. This PR makes it configurable to improve on disk reads. I have made the default value to be 1 MB as with that value I observed improved performance. ## How was this patch tested? I am relying on the existing unit tests. ## Performance After deploying this change to prod and setting the config to 1 mb, there was a 12% reduction in the CPU time and 19.5% reduction in CPU reservation time. Author: Tejas PatilCloses #14726 from tejasapatil/spill_buffer_2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c1937dd1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c1937dd1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c1937dd1 Branch: refs/heads/master Commit: c1937dd19a23bd096a4707656c7ba19fb5c16966 Parents: bf8ff83 Author: Tejas Patil Authored: Tue Aug 23 18:48:08 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 23 18:48:08 2016 -0700 -- .../unsafe/sort/UnsafeSorterSpillReader.java| 22 +++- 1 file changed, 21 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c1937dd1/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java -- diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java index 1d588c3..d048cf7 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java @@ -22,15 +22,21 @@ import java.io.*; import com.google.common.io.ByteStreams; import com.google.common.io.Closeables; +import org.apache.spark.SparkEnv; import org.apache.spark.serializer.SerializerManager; import org.apache.spark.storage.BlockId; import org.apache.spark.unsafe.Platform; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; /** * Reads spill files written by {@link UnsafeSorterSpillWriter} (see that class for a description * of the file format). */ public final class UnsafeSorterSpillReader extends UnsafeSorterIterator implements Closeable { + private static final Logger logger = LoggerFactory.getLogger(UnsafeSorterSpillReader.class); + private static final int DEFAULT_BUFFER_SIZE_BYTES = 1024 * 1024; // 1 MB + private static final int MAX_BUFFER_SIZE_BYTES = 16777216; // 16 mb private InputStream in; private DataInputStream din; @@ -50,7 +56,21 @@ public final class UnsafeSorterSpillReader extends UnsafeSorterIterator implemen File file, BlockId blockId) throws IOException { assert (file.length() > 0); -final BufferedInputStream bs = new BufferedInputStream(new FileInputStream(file)); +long bufferSizeBytes = +SparkEnv.get() == null ? 
+DEFAULT_BUFFER_SIZE_BYTES: + SparkEnv.get().conf().getSizeAsBytes("spark.unsafe.sorter.spill.reader.buffer.size", + DEFAULT_BUFFER_SIZE_BYTES); +if (bufferSizeBytes > MAX_BUFFER_SIZE_BYTES || bufferSizeBytes < DEFAULT_BUFFER_SIZE_BYTES) { + // fall back to a sane default value + logger.warn("Value of config \"spark.unsafe.sorter.spill.reader.buffer.size\" = {} not in " + + "allowed range [{}, {}). Falling back to default value : {} bytes", bufferSizeBytes, + DEFAULT_BUFFER_SIZE_BYTES, MAX_BUFFER_SIZE_BYTES, DEFAULT_BUFFER_SIZE_BYTES); + bufferSizeBytes = DEFAULT_BUFFER_SIZE_BYTES; +} + +final BufferedInputStream bs = +new BufferedInputStream(new FileInputStream(file), (int) bufferSizeBytes); try { this.in = serializerManager.wrapForCompression(blockId, bs); this.din = new DataInputStream(this.in); - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
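A usage sketch of the new knob (the job body is omitted). The key is an ordinary SparkConf byte-size setting, and values outside the guard's allowed window fall back to the 1 MB default with a warning:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object SpillReaderBufferExample extends App {
  val conf = new SparkConf()
    .setMaster("local[2]")
    .setAppName("spill-buffer")
    .set("spark.unsafe.sorter.spill.reader.buffer.size", "2m") // 2 MB per spill-file reader

  val spark = SparkSession.builder().config(conf).getOrCreate()
  // ... run a sort/shuffle-heavy job here; reads of spilled sort files use the larger buffer ...
  spark.stop()
}
```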
spark git commit: [SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in block manager replication
Repository: spark Updated Branches: refs/heads/master 71afeeea4 -> 8e223ea67 [SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in block manager replication ## What changes were proposed in this pull request? This is a straightforward clone of JoshRosen 's original patch. I have follow-up changes to fix block replication for repl-defined classes as well, but those appear to be flaking tests so I'm going to leave that for SPARK-17042 ## How was this patch tested? End-to-end test in ReplSuite (also more tests in DistributedSuite from the original patch). Author: Eric LiangCloses #14311 from ericl/spark-16550. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8e223ea6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8e223ea6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8e223ea6 Branch: refs/heads/master Commit: 8e223ea67acf5aa730ccf688802f17f6fc10907c Parents: 71afeee Author: Eric Liang Authored: Mon Aug 22 16:32:14 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 22 16:32:14 2016 -0700 -- .../spark/serializer/SerializerManager.scala| 14 +++- .../org/apache/spark/storage/BlockManager.scala | 13 +++- .../org/apache/spark/DistributedSuite.scala | 77 ++-- .../scala/org/apache/spark/repl/ReplSuite.scala | 14 4 files changed, 60 insertions(+), 58 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8e223ea6/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala -- diff --git a/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala b/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala index 9dc274c..07caadb 100644 --- a/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala +++ b/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala @@ -68,7 +68,7 @@ private[spark] class SerializerManager(defaultSerializer: Serializer, conf: Spar * loaded yet. */ private lazy val compressionCodec: CompressionCodec = CompressionCodec.createCodec(conf) - private def canUseKryo(ct: ClassTag[_]): Boolean = { + def canUseKryo(ct: ClassTag[_]): Boolean = { primitiveAndPrimitiveArrayClassTags.contains(ct) || ct == stringClassTag } @@ -128,8 +128,18 @@ private[spark] class SerializerManager(defaultSerializer: Serializer, conf: Spar /** Serializes into a chunked byte buffer. */ def dataSerialize[T: ClassTag](blockId: BlockId, values: Iterator[T]): ChunkedByteBuffer = { +dataSerializeWithExplicitClassTag(blockId, values, implicitly[ClassTag[T]]) + } + + /** Serializes into a chunked byte buffer. 
*/ + def dataSerializeWithExplicitClassTag( + blockId: BlockId, + values: Iterator[_], + classTag: ClassTag[_]): ChunkedByteBuffer = { val bbos = new ChunkedByteBufferOutputStream(1024 * 1024 * 4, ByteBuffer.allocate) -dataSerializeStream(blockId, bbos, values) +val byteStream = new BufferedOutputStream(bbos) +val ser = getSerializer(classTag).newInstance() +ser.serializeStream(wrapForCompression(blockId, byteStream)).writeAll(values).close() bbos.toChunkedByteBuffer } http://git-wip-us.apache.org/repos/asf/spark/blob/8e223ea6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala -- diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala index 015e71d..fe84652 100644 --- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala +++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala @@ -498,7 +498,8 @@ private[spark] class BlockManager( diskStore.getBytes(blockId) } else if (level.useMemory && memoryStore.contains(blockId)) { // The block was not found on disk, so serialize an in-memory copy: -serializerManager.dataSerialize(blockId, memoryStore.getValues(blockId).get) +serializerManager.dataSerializeWithExplicitClassTag( + blockId, memoryStore.getValues(blockId).get, info.classTag) } else { handleLocalReadFailure(blockId) } @@ -973,8 +974,16 @@ private[spark] class BlockManager( if (level.replication > 1) { val remoteStartTime = System.currentTimeMillis val bytesToReplicate = doGetLocalBytes(blockId, info) + // [SPARK-16550] Erase the typed classTag when using default serialization, since + // NettyBlockRpcServer crashes when deserializing repl-defined classes. + // TODO(ekl) remove this once the classloader issue on
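A repro-style sketch of the scenario the fix targets: caching blocks that contain user-defined classes with a replicated storage level. The example is hypothetical; replication only actually happens when more than one executor is available, so in local mode it merely exercises the code path:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

case class Event(id: Long, payload: String) // a user-defined class held inside cached blocks

object ReplicatedCacheExample extends App {
  val spark = SparkSession.builder().master("local[2]").appName("replicated-cache").getOrCreate()
  val sc = spark.sparkContext

  val events = sc.parallelize((1L to 1000L).map(i => Event(i, s"payload-$i")))
  events.persist(StorageLevel.MEMORY_ONLY_2) // the _2 suffix asks for a replica on a second node
  println(events.count())

  spark.stop()
}
```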
spark git commit: [SPARK-17162] Range does not support SQL generation
Repository: spark Updated Branches: refs/heads/branch-2.0 6dcc1a3f0 -> 01a4d69f3 [SPARK-17162] Range does not support SQL generation ## What changes were proposed in this pull request? The range operator previously didn't support SQL generation, which made it not possible to use in views. ## How was this patch tested? Unit tests. cc hvanhovell Author: Eric LiangCloses #14724 from ericl/spark-17162. (cherry picked from commit 84770b59f773f132073cd2af4204957fc2d7bf35) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/01a4d69f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/01a4d69f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/01a4d69f Branch: refs/heads/branch-2.0 Commit: 01a4d69f309a1cc8d370ce9f85e6a4f31b6db3b8 Parents: 6dcc1a3 Author: Eric Liang Authored: Mon Aug 22 15:48:35 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 22 15:48:43 2016 -0700 -- .../analysis/ResolveTableValuedFunctions.scala | 11 -- .../plans/logical/basicLogicalOperators.scala | 21 +--- .../apache/spark/sql/catalyst/SQLBuilder.scala | 3 +++ .../sql/execution/basicPhysicalOperators.scala | 2 +- .../spark/sql/execution/command/views.scala | 3 +-- sql/hive/src/test/resources/sqlgen/range.sql| 4 .../test/resources/sqlgen/range_with_splits.sql | 4 .../sql/catalyst/LogicalPlanToSQLSuite.scala| 14 - 8 files changed, 44 insertions(+), 18 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/01a4d69f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala index 7fdf7fa..6b3bb68 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala @@ -28,9 +28,6 @@ import org.apache.spark.sql.types.{DataType, IntegerType, LongType} * Rule that resolves table-valued function references. */ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { - private lazy val defaultParallelism = -SparkContext.getOrCreate(new SparkConf(false)).defaultParallelism - /** * List of argument names and their types, used to declare a function. 
*/ @@ -84,25 +81,25 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { "range" -> Map( /* range(end) */ tvf("end" -> LongType) { case Seq(end: Long) => -Range(0, end, 1, defaultParallelism) +Range(0, end, 1, None) }, /* range(start, end) */ tvf("start" -> LongType, "end" -> LongType) { case Seq(start: Long, end: Long) => -Range(start, end, 1, defaultParallelism) +Range(start, end, 1, None) }, /* range(start, end, step) */ tvf("start" -> LongType, "end" -> LongType, "step" -> LongType) { case Seq(start: Long, end: Long, step: Long) => - Range(start, end, step, defaultParallelism) + Range(start, end, step, None) }, /* range(start, end, step, numPartitions) */ tvf("start" -> LongType, "end" -> LongType, "step" -> LongType, "numPartitions" -> IntegerType) { case Seq(start: Long, end: Long, step: Long, numPartitions: Int) => - Range(start, end, step, numPartitions) + Range(start, end, step, Some(numPartitions)) }) ) http://git-wip-us.apache.org/repos/asf/spark/blob/01a4d69f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala index eb612c4..07e39b0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala @@ -422,17 +422,20 @@ case class Sort( /** Factory for constructing new `Range` nodes. */ object Range { - def apply(start: Long, end: Long, step: Long, numSlices: Int): Range = { + def apply(start: Long, end: Long, step: Long, numSlices: Option[Int]): Range = { val
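With SQL generation for `Range` in place, the `range(...)` table-valued function can back a persistent view. A sketch, assuming a Hive-enabled session (persistent views may require Hive support in this release) and a throwaway view name:

```scala
import org.apache.spark.sql.SparkSession

object RangeInViewExample extends App {
  val spark = SparkSession.builder()
    .master("local[2]")
    .appName("range-view")
    .enableHiveSupport() // assumption: persistent views need a Hive-backed catalog here
    .getOrCreate()

  spark.sql("CREATE OR REPLACE VIEW numbers AS SELECT id FROM range(0, 10, 2)")
  spark.sql("SELECT sum(id) FROM numbers").show() // 0 + 2 + 4 + 6 + 8 = 20
  spark.sql("DROP VIEW numbers")

  spark.stop()
}
```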
spark git commit: [SPARK-17162] Range does not support SQL generation
Repository: spark Updated Branches: refs/heads/master 929cb8bee -> 84770b59f [SPARK-17162] Range does not support SQL generation ## What changes were proposed in this pull request? The range operator previously didn't support SQL generation, which made it not possible to use in views. ## How was this patch tested? Unit tests. cc hvanhovell Author: Eric LiangCloses #14724 from ericl/spark-17162. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/84770b59 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/84770b59 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/84770b59 Branch: refs/heads/master Commit: 84770b59f773f132073cd2af4204957fc2d7bf35 Parents: 929cb8b Author: Eric Liang Authored: Mon Aug 22 15:48:35 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 22 15:48:35 2016 -0700 -- .../analysis/ResolveTableValuedFunctions.scala | 11 -- .../plans/logical/basicLogicalOperators.scala | 21 +--- .../apache/spark/sql/catalyst/SQLBuilder.scala | 3 +++ .../sql/execution/basicPhysicalOperators.scala | 2 +- .../spark/sql/execution/command/views.scala | 3 +-- sql/hive/src/test/resources/sqlgen/range.sql| 4 .../test/resources/sqlgen/range_with_splits.sql | 4 .../sql/catalyst/LogicalPlanToSQLSuite.scala| 14 - 8 files changed, 44 insertions(+), 18 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/84770b59/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala index 7fdf7fa..6b3bb68 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala @@ -28,9 +28,6 @@ import org.apache.spark.sql.types.{DataType, IntegerType, LongType} * Rule that resolves table-valued function references. */ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { - private lazy val defaultParallelism = -SparkContext.getOrCreate(new SparkConf(false)).defaultParallelism - /** * List of argument names and their types, used to declare a function. 
*/ @@ -84,25 +81,25 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { "range" -> Map( /* range(end) */ tvf("end" -> LongType) { case Seq(end: Long) => -Range(0, end, 1, defaultParallelism) +Range(0, end, 1, None) }, /* range(start, end) */ tvf("start" -> LongType, "end" -> LongType) { case Seq(start: Long, end: Long) => -Range(start, end, 1, defaultParallelism) +Range(start, end, 1, None) }, /* range(start, end, step) */ tvf("start" -> LongType, "end" -> LongType, "step" -> LongType) { case Seq(start: Long, end: Long, step: Long) => - Range(start, end, step, defaultParallelism) + Range(start, end, step, None) }, /* range(start, end, step, numPartitions) */ tvf("start" -> LongType, "end" -> LongType, "step" -> LongType, "numPartitions" -> IntegerType) { case Seq(start: Long, end: Long, step: Long, numPartitions: Int) => - Range(start, end, step, numPartitions) + Range(start, end, step, Some(numPartitions)) }) ) http://git-wip-us.apache.org/repos/asf/spark/blob/84770b59/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala index af1736e..010aec7 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala @@ -422,17 +422,20 @@ case class Sort( /** Factory for constructing new `Range` nodes. */ object Range { - def apply(start: Long, end: Long, step: Long, numSlices: Int): Range = { + def apply(start: Long, end: Long, step: Long, numSlices: Option[Int]): Range = { val output = StructType(StructField("id", LongType, nullable = false) :: Nil).toAttributes new Range(start, end, step,
spark git commit: [SPARK-17158][SQL] Change error message for out of range numeric literals
Repository: spark Updated Branches: refs/heads/branch-2.0 efe832200 -> 379b12729 [SPARK-17158][SQL] Change error message for out of range numeric literals ## What changes were proposed in this pull request? Modifies error message for numeric literals to Numeric literal does not fit in range [min, max] for type ## How was this patch tested? Fixed up the error messages for literals.sql in SqlQueryTestSuite and re-ran via sbt. Also fixed up error messages in ExpressionParserSuite Author: Srinath ShankarCloses #14721 from srinathshankar/sc4296. (cherry picked from commit ba1737c21aab91ff3f1a1737aa2d6b07575e36a3) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/379b1272 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/379b1272 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/379b1272 Branch: refs/heads/branch-2.0 Commit: 379b1272925e534d99ddf4e4add054284900d200 Parents: efe8322 Author: Srinath Shankar Authored: Fri Aug 19 19:54:26 2016 -0700 Committer: Reynold Xin Committed: Fri Aug 19 19:54:47 2016 -0700 -- .../spark/sql/catalyst/parser/AstBuilder.scala | 29 .../catalyst/parser/ExpressionParserSuite.scala | 9 -- .../sql-tests/results/literals.sql.out | 6 ++-- 3 files changed, 27 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/379b1272/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 0230294..aec3126 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -1273,10 +1273,17 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { } /** Create a numeric literal expression. */ - private def numericLiteral(ctx: NumberContext)(f: String => Any): Literal = withOrigin(ctx) { -val raw = ctx.getText + private def numericLiteral + (ctx: NumberContext, minValue: BigDecimal, maxValue: BigDecimal, typeName: String) + (converter: String => Any): Literal = withOrigin(ctx) { +val rawStrippedQualifier = ctx.getText.substring(0, ctx.getText.length - 1) try { - Literal(f(raw.substring(0, raw.length - 1))) + val rawBigDecimal = BigDecimal(rawStrippedQualifier) + if (rawBigDecimal < minValue || rawBigDecimal > maxValue) { +throw new ParseException(s"Numeric literal ${rawStrippedQualifier} does not " + + s"fit in range [${minValue}, ${maxValue}] for type ${typeName}", ctx) + } + Literal(converter(rawStrippedQualifier)) } catch { case e: NumberFormatException => throw new ParseException(e.getMessage, ctx) @@ -1286,29 +1293,29 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { /** * Create a Byte Literal expression. */ - override def visitTinyIntLiteral(ctx: TinyIntLiteralContext): Literal = numericLiteral(ctx) { -_.toByte + override def visitTinyIntLiteral(ctx: TinyIntLiteralContext): Literal = { +numericLiteral(ctx, Byte.MinValue, Byte.MaxValue, ByteType.simpleString)(_.toByte) } /** * Create a Short Literal expression. 
*/ - override def visitSmallIntLiteral(ctx: SmallIntLiteralContext): Literal = numericLiteral(ctx) { -_.toShort + override def visitSmallIntLiteral(ctx: SmallIntLiteralContext): Literal = { +numericLiteral(ctx, Short.MinValue, Short.MaxValue, ShortType.simpleString)(_.toShort) } /** * Create a Long Literal expression. */ - override def visitBigIntLiteral(ctx: BigIntLiteralContext): Literal = numericLiteral(ctx) { -_.toLong + override def visitBigIntLiteral(ctx: BigIntLiteralContext): Literal = { +numericLiteral(ctx, Long.MinValue, Long.MaxValue, LongType.simpleString)(_.toLong) } /** * Create a Double Literal expression. */ - override def visitDoubleLiteral(ctx: DoubleLiteralContext): Literal = numericLiteral(ctx) { -_.toDouble + override def visitDoubleLiteral(ctx: DoubleLiteralContext): Literal = { +numericLiteral(ctx, Double.MinValue, Double.MaxValue, DoubleType.simpleString)(_.toDouble) } /** http://git-wip-us.apache.org/repos/asf/spark/blob/379b1272/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala -- diff
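The reworked `numericLiteral` helper strips the type-suffix character, checks the value against the type's bounds as a `BigDecimal`, and only then converts. A standalone, REPL-style sketch of that logic (simplified: it throws a plain exception rather than Catalyst's `ParseException`):

```scala
// Simplified version of the new range check; min/max and typeName are passed in
// the same way the patched AstBuilder does for each literal visitor.
def parseTypedLiteral(raw: String, min: BigDecimal, max: BigDecimal, typeName: String): BigDecimal = {
  val stripped = raw.substring(0, raw.length - 1) // drop the qualifier, e.g. "127Y" -> "127"
  val value = BigDecimal(stripped)
  if (value < min || value > max) {
    throw new IllegalArgumentException(
      s"Numeric literal $stripped does not fit in range [$min, $max] for type $typeName")
  }
  value
}

parseTypedLiteral("127Y", BigDecimal(Byte.MinValue.toInt), BigDecimal(Byte.MaxValue.toInt), "tinyint")
// parseTypedLiteral("128Y", ...) now fails with the descriptive range message
// instead of surfacing a bare NumberFormatException.
```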
spark git commit: [SPARK-17158][SQL] Change error message for out of range numeric literals
Repository: spark Updated Branches: refs/heads/master a117afa7c -> ba1737c21 [SPARK-17158][SQL] Change error message for out of range numeric literals ## What changes were proposed in this pull request? Modifies error message for numeric literals to Numeric literal does not fit in range [min, max] for type ## How was this patch tested? Fixed up the error messages for literals.sql in SqlQueryTestSuite and re-ran via sbt. Also fixed up error messages in ExpressionParserSuite Author: Srinath ShankarCloses #14721 from srinathshankar/sc4296. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ba1737c2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ba1737c2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ba1737c2 Branch: refs/heads/master Commit: ba1737c21aab91ff3f1a1737aa2d6b07575e36a3 Parents: a117afa Author: Srinath Shankar Authored: Fri Aug 19 19:54:26 2016 -0700 Committer: Reynold Xin Committed: Fri Aug 19 19:54:26 2016 -0700 -- .../spark/sql/catalyst/parser/AstBuilder.scala | 29 .../catalyst/parser/ExpressionParserSuite.scala | 9 -- .../sql-tests/results/literals.sql.out | 6 ++-- 3 files changed, 27 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ba1737c2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 283e4d4..8b98efc 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -1278,10 +1278,17 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { } /** Create a numeric literal expression. */ - private def numericLiteral(ctx: NumberContext)(f: String => Any): Literal = withOrigin(ctx) { -val raw = ctx.getText + private def numericLiteral + (ctx: NumberContext, minValue: BigDecimal, maxValue: BigDecimal, typeName: String) + (converter: String => Any): Literal = withOrigin(ctx) { +val rawStrippedQualifier = ctx.getText.substring(0, ctx.getText.length - 1) try { - Literal(f(raw.substring(0, raw.length - 1))) + val rawBigDecimal = BigDecimal(rawStrippedQualifier) + if (rawBigDecimal < minValue || rawBigDecimal > maxValue) { +throw new ParseException(s"Numeric literal ${rawStrippedQualifier} does not " + + s"fit in range [${minValue}, ${maxValue}] for type ${typeName}", ctx) + } + Literal(converter(rawStrippedQualifier)) } catch { case e: NumberFormatException => throw new ParseException(e.getMessage, ctx) @@ -1291,29 +1298,29 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { /** * Create a Byte Literal expression. */ - override def visitTinyIntLiteral(ctx: TinyIntLiteralContext): Literal = numericLiteral(ctx) { -_.toByte + override def visitTinyIntLiteral(ctx: TinyIntLiteralContext): Literal = { +numericLiteral(ctx, Byte.MinValue, Byte.MaxValue, ByteType.simpleString)(_.toByte) } /** * Create a Short Literal expression. */ - override def visitSmallIntLiteral(ctx: SmallIntLiteralContext): Literal = numericLiteral(ctx) { -_.toShort + override def visitSmallIntLiteral(ctx: SmallIntLiteralContext): Literal = { +numericLiteral(ctx, Short.MinValue, Short.MaxValue, ShortType.simpleString)(_.toShort) } /** * Create a Long Literal expression. 
*/ - override def visitBigIntLiteral(ctx: BigIntLiteralContext): Literal = numericLiteral(ctx) { -_.toLong + override def visitBigIntLiteral(ctx: BigIntLiteralContext): Literal = { +numericLiteral(ctx, Long.MinValue, Long.MaxValue, LongType.simpleString)(_.toLong) } /** * Create a Double Literal expression. */ - override def visitDoubleLiteral(ctx: DoubleLiteralContext): Literal = numericLiteral(ctx) { -_.toDouble + override def visitDoubleLiteral(ctx: DoubleLiteralContext): Literal = { +numericLiteral(ctx, Double.MinValue, Double.MaxValue, DoubleType.simpleString)(_.toDouble) } /** http://git-wip-us.apache.org/repos/asf/spark/blob/ba1737c2/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala
spark git commit: [SPARK-17149][SQL] array.sql for testing array related functions
Repository: spark Updated Branches: refs/heads/master acac7a508 -> a117afa7c [SPARK-17149][SQL] array.sql for testing array related functions ## What changes were proposed in this pull request? This patch creates array.sql in SQLQueryTestSuite for testing array related functions, including: - indexing - array creation - size - array_contains - sort_array ## How was this patch tested? The patch itself is about adding tests. Author: petermaxleeCloses #14708 from petermaxlee/SPARK-17149. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a117afa7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a117afa7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a117afa7 Branch: refs/heads/master Commit: a117afa7c2d94f943106542ec53d74ba2b5f1058 Parents: acac7a5 Author: petermaxlee Authored: Fri Aug 19 18:14:45 2016 -0700 Committer: Reynold Xin Committed: Fri Aug 19 18:14:45 2016 -0700 -- .../catalyst/analysis/FunctionRegistry.scala| 12 +- .../test/resources/sql-tests/inputs/array.sql | 86 +++ .../resources/sql-tests/results/array.sql.out | 144 +++ .../org/apache/spark/sql/SQLQuerySuite.scala| 16 --- .../apache/spark/sql/SQLQueryTestSuite.scala| 10 ++ .../hive/execution/HiveCompatibilitySuite.scala | 4 +- .../sql/hive/execution/HiveQuerySuite.scala | 9 -- 7 files changed, 248 insertions(+), 33 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a117afa7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index c5f91c1..35fd800 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -161,7 +161,6 @@ object FunctionRegistry { val expressions: Map[String, (ExpressionInfo, FunctionBuilder)] = Map( // misc non-aggregate functions expression[Abs]("abs"), -expression[CreateArray]("array"), expression[Coalesce]("coalesce"), expression[Explode]("explode"), expression[Greatest]("greatest"), @@ -172,10 +171,6 @@ object FunctionRegistry { expression[IsNull]("isnull"), expression[IsNotNull]("isnotnull"), expression[Least]("least"), -expression[CreateMap]("map"), -expression[MapKeys]("map_keys"), -expression[MapValues]("map_values"), -expression[CreateNamedStruct]("named_struct"), expression[NaNvl]("nanvl"), expression[NullIf]("nullif"), expression[Nvl]("nvl"), @@ -184,7 +179,6 @@ object FunctionRegistry { expression[Rand]("rand"), expression[Randn]("randn"), expression[Stack]("stack"), -expression[CreateStruct]("struct"), expression[CaseWhen]("when"), // math functions @@ -354,9 +348,15 @@ object FunctionRegistry { expression[TimeWindow]("window"), // collection functions +expression[CreateArray]("array"), expression[ArrayContains]("array_contains"), +expression[CreateMap]("map"), +expression[CreateNamedStruct]("named_struct"), +expression[MapKeys]("map_keys"), +expression[MapValues]("map_values"), expression[Size]("size"), expression[SortArray]("sort_array"), +expression[CreateStruct]("struct"), // misc functions expression[AssertTrue]("assert_true"), http://git-wip-us.apache.org/repos/asf/spark/blob/a117afa7/sql/core/src/test/resources/sql-tests/inputs/array.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/array.sql 
b/sql/core/src/test/resources/sql-tests/inputs/array.sql new file mode 100644 index 000..4038a0d --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/array.sql @@ -0,0 +1,86 @@ +-- test cases for array functions + +create temporary view data as select * from values + ("one", array(11, 12, 13), array(array(111, 112, 113), array(121, 122, 123))), + ("two", array(21, 22, 23), array(array(211, 212, 213), array(221, 222, 223))) + as data(a, b, c); + +select * from data; + +-- index into array +select a, b[0], b[0] + b[1] from data; + +-- index into array of arrays +select a, c[0][0] + c[0][0 + 1] from data; + + +create temporary view primitive_arrays as select * from values ( + array(true), + array(2Y, 1Y), + array(2S, 1S), + array(2, 1), + array(2L, 1L), + array(9223372036854775809, 9223372036854775808), + array(2.0D, 1.0D),
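For readers who want to exercise the newly covered functions outside the test harness, a hedged example of equivalent queries, assuming a running `SparkSession` named `spark`:

```scala
// Mirrors the style of the new array.sql inputs: an inline-table view plus
// indexing, size, array_contains and sort_array over it.
spark.sql("""
  CREATE OR REPLACE TEMPORARY VIEW data AS SELECT * FROM VALUES
    ('one', array(11, 12, 13)),
    ('two', array(21, 22, 23))
  AS data(a, b)
""")

spark.sql("SELECT a, b[0], size(b), array_contains(b, 12), sort_array(b, false) FROM data").show()
```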
spark git commit: [SPARK-16686][SQL] Remove PushProjectThroughSample since it is handled by ColumnPruning
Repository: spark Updated Branches: refs/heads/branch-2.0 d0707c6ba -> 3276ccfac [SPARK-16686][SQL] Remove PushProjectThroughSample since it is handled by ColumnPruning We push down `Project` through `Sample` in `Optimizer` by the rule `PushProjectThroughSample`. However, if the projected columns produce new output, they will encounter whole data instead of sampled data. It will bring some inconsistency between original plan (Sample then Project) and optimized plan (Project then Sample). In the extreme case such as attached in the JIRA, if the projected column is an UDF which is supposed to not see the sampled out data, the result of UDF will be incorrect. Since the rule `ColumnPruning` already handles general `Project` pushdown. We don't need `PushProjectThroughSample` anymore. The rule `ColumnPruning` also avoids the described issue. Jenkins tests. Author: Liang-Chi HsiehCloses #14327 from viirya/fix-sample-pushdown. (cherry picked from commit 7b06a8948fc16d3c14e240fdd632b79ce1651008) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3276ccfa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3276ccfa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3276ccfa Branch: refs/heads/branch-2.0 Commit: 3276ccfac807514d5a959415bcf58d2aa6ed8fbc Parents: d0707c6 Author: Liang-Chi Hsieh Authored: Tue Jul 26 12:00:01 2016 +0800 Committer: Reynold Xin Committed: Fri Aug 19 11:18:55 2016 -0700 -- .../sql/catalyst/optimizer/Optimizer.scala | 12 -- .../catalyst/optimizer/ColumnPruningSuite.scala | 15 .../optimizer/FilterPushdownSuite.scala | 17 - .../org/apache/spark/sql/DatasetSuite.scala | 25 4 files changed, 40 insertions(+), 29 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3276ccfa/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 19d3c39..88cc0e4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -75,7 +75,6 @@ abstract class Optimizer(sessionCatalog: SessionCatalog, conf: CatalystConf) Batch("Operator Optimizations", fixedPoint, // Operator push down PushThroughSetOperations, - PushProjectThroughSample, ReorderJoin, EliminateOuterJoin, PushPredicateThroughJoin, @@ -147,17 +146,6 @@ class SimpleTestOptimizer extends Optimizer( new SimpleCatalystConf(caseSensitiveAnalysis = true)) /** - * Pushes projects down beneath Sample to enable column pruning with sampling. - */ -object PushProjectThroughSample extends Rule[LogicalPlan] { - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -// Push down projection into sample -case Project(projectList, Sample(lb, up, replace, seed, child)) => - Sample(lb, up, replace, seed, Project(projectList, child))() - } -} - -/** * Removes the Project only conducting Alias of its child node. * It is created mainly for removing extra Project added in EliminateSerialization rule, * but can also benefit other operators. 
http://git-wip-us.apache.org/repos/asf/spark/blob/3276ccfa/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala index b5664a5..589607e 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala @@ -346,5 +346,20 @@ class ColumnPruningSuite extends PlanTest { comparePlans(Optimize.execute(plan1.analyze), correctAnswer1) } + test("push project down into sample") { +val testRelation = LocalRelation('a.int, 'b.int, 'c.int) +val x = testRelation.subquery('x) + +val query1 = Sample(0.0, 0.6, false, 11L, x)().select('a) +val optimized1 = Optimize.execute(query1.analyze) +val expected1 = Sample(0.0, 0.6, false, 11L, x.select('a))() +comparePlans(optimized1, expected1.analyze) + +val query2 = Sample(0.0, 0.6, false, 11L, x)().select('a
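The correctness issue the removal avoids is easiest to see with a projection whose expression should only run on the sampled rows. A hedged illustration, assuming a local `SparkSession` named `spark`; the accumulator makes it observable how many rows the UDF actually touched:

```scala
import org.apache.spark.sql.functions.udf

val calls = spark.sparkContext.longAccumulator("udfCalls")
val tracked = udf { x: Int => calls.add(1); x }

import spark.implicits._
val df = (1 to 1000).toDF("x")

// Sample first, then project with the UDF. With the old PushProjectThroughSample
// rule the Project (and hence the UDF) could be evaluated below the Sample, i.e.
// over all 1000 rows; relying on ColumnPruning keeps the UDF above the Sample.
df.sample(withReplacement = false, fraction = 0.1, seed = 11L)
  .select(tracked($"x"))
  .collect()

println(s"UDF evaluated on ${calls.value} rows")
```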
spark git commit: HOTFIX: compilation broken due to protected ctor.
Repository: spark Updated Branches: refs/heads/branch-2.0 c180d637a -> 05b180faa HOTFIX: compilation broken due to protected ctor. (cherry picked from commit b482c09fa22c5762a355f95820e4ba3e2517fb77) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/05b180fa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/05b180fa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/05b180fa Branch: refs/heads/branch-2.0 Commit: 05b180faa4bd87498516c05d4769cc2f51d56aae Parents: c180d63 Author: Reynold Xin Authored: Thu Aug 18 19:02:32 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 18 19:03:00 2016 -0700 -- .../org/apache/spark/sql/catalyst/expressions/literals.scala | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/05b180fa/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala index 95ed68f..7040008 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala @@ -163,8 +163,7 @@ object DecimalLiteral { /** * In order to do type checking, use Literal.create() instead of constructor */ -case class Literal protected (value: Any, dataType: DataType) - extends LeafExpression with CodegenFallback { +case class Literal (value: Any, dataType: DataType) extends LeafExpression with CodegenFallback { override def foldable: Boolean = true override def nullable: Boolean = value == null
spark git commit: HOTFIX: compilation broken due to protected ctor.
Repository: spark Updated Branches: refs/heads/master f5472dda5 -> b482c09fa HOTFIX: compilation broken due to protected ctor. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b482c09f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b482c09f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b482c09f Branch: refs/heads/master Commit: b482c09fa22c5762a355f95820e4ba3e2517fb77 Parents: f5472dd Author: Reynold Xin Authored: Thu Aug 18 19:02:32 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 18 19:02:32 2016 -0700 -- .../org/apache/spark/sql/catalyst/expressions/literals.scala | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b482c09f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala index 95ed68f..7040008 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala @@ -163,8 +163,7 @@ object DecimalLiteral { /** * In order to do type checking, use Literal.create() instead of constructor */ -case class Literal protected (value: Any, dataType: DataType) - extends LeafExpression with CodegenFallback { +case class Literal (value: Any, dataType: DataType) extends LeafExpression with CodegenFallback { override def foldable: Boolean = true override def nullable: Boolean = value == null
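For context on the constructor the hotfix un-protects: the scaladoc in the touched file points callers at `Literal.create()`, which converts and type-checks the value, whereas the bare constructor trusts the caller. A small hedged example (internal Catalyst API, so signatures may shift between versions):

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.IntegerType

// Preferred: create() runs the value through Catalyst's type conversion and checking.
val checked = Literal.create(1, IntegerType)

// Possible again after this hotfix (the ctor is no longer protected), but it skips
// the conversion step, so the caller must supply a value already in internal format.
val unchecked = Literal(1, IntegerType)
```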
spark git commit: [SPARK-16947][SQL] Support type coercion and foldable expression for inline tables
Repository: spark Updated Branches: refs/heads/branch-2.0 ea684b69c -> c180d637a [SPARK-16947][SQL] Support type coercion and foldable expression for inline tables This patch improves inline table support with the following: 1. Support type coercion. 2. Support using foldable expressions. Previously only literals were supported. 3. Improve error message handling. 4. Improve test coverage. Added a new unit test suite ResolveInlineTablesSuite and a new file-based end-to-end test inline-table.sql. Author: petermaxleeCloses #14676 from petermaxlee/SPARK-16947. (cherry picked from commit f5472dda51b980a726346587257c22873ff708e3) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c180d637 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c180d637 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c180d637 Branch: refs/heads/branch-2.0 Commit: c180d637a3caca0d4e46f4980c10d1005eb453bc Parents: ea684b6 Author: petermaxlee Authored: Fri Aug 19 09:19:47 2016 +0800 Committer: Reynold Xin Committed: Thu Aug 18 18:37:40 2016 -0700 -- .../spark/sql/catalyst/analysis/Analyzer.scala | 1 + .../catalyst/analysis/ResolveInlineTables.scala | 112 ++ .../sql/catalyst/analysis/TypeCoercion.scala| 2 +- .../sql/catalyst/analysis/unresolved.scala | 26 +++- .../spark/sql/catalyst/parser/AstBuilder.scala | 41 ++ .../analysis/ResolveInlineTablesSuite.scala | 101 + .../sql/catalyst/parser/PlanParserSuite.scala | 22 +-- .../resources/sql-tests/inputs/inline-table.sql | 48 ++ .../sql-tests/results/inline-table.sql.out | 145 +++ 9 files changed, 452 insertions(+), 46 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c180d637/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index e0b8166..14e995e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -108,6 +108,7 @@ class Analyzer( GlobalAggregates :: ResolveAggregateFunctions :: TimeWindowing :: + ResolveInlineTables :: TypeCoercion.typeCoercionRules ++ extendedResolutionRules : _*), Batch("Nondeterministic", Once, http://git-wip-us.apache.org/repos/asf/spark/blob/c180d637/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala new file mode 100644 index 000..7323197 --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import scala.util.control.NonFatal + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.Cast +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.types.{StructField, StructType} + +/** + * An analyzer rule that replaces [[UnresolvedInlineTable]] with [[LocalRelation]]. + */ +object ResolveInlineTables extends Rule[LogicalPlan] { + override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case table: UnresolvedInlineTable if table.expressionsResolved => +
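A hedged example of what the improved resolution accepts, assuming a `SparkSession` named `spark`: mixed literal types are coerced to a common column type, and foldable expressions are no longer rejected.

```scala
// Type coercion: 1 (int) and 2L (bigint) are widened to a common type for column `id`.
spark.sql("SELECT * FROM VALUES (1, 'a'), (2L, 'b') AS t(id, name)").printSchema()

// Foldable expressions: constant-foldable calls are allowed, not just bare literals.
spark.sql("SELECT * FROM VALUES (1 + 1, concat('a', 'b')) AS t(x, y)").show()
```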
spark git commit: [SPARK-17069] Expose spark.range() as table-valued function in SQL
Repository: spark Updated Branches: refs/heads/branch-2.0 176af17a7 -> ea684b69c [SPARK-17069] Expose spark.range() as table-valued function in SQL This adds analyzer rules for resolving table-valued functions, and adds one builtin implementation for range(). The arguments for range() are the same as those of `spark.range()`. Unit tests. cc hvanhovell Author: Eric LiangCloses #14656 from ericl/sc-4309. (cherry picked from commit 412dba63b511474a6db3c43c8618d803e604bc6b) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ea684b69 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ea684b69 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ea684b69 Branch: refs/heads/branch-2.0 Commit: ea684b69cd6934bc093f4a5a8b0d8470e92157cd Parents: 176af17 Author: Eric Liang Authored: Thu Aug 18 13:33:55 2016 +0200 Committer: Reynold Xin Committed: Thu Aug 18 18:36:50 2016 -0700 -- .../apache/spark/sql/catalyst/parser/SqlBase.g4 | 1 + .../spark/sql/catalyst/analysis/Analyzer.scala | 1 + .../analysis/ResolveTableValuedFunctions.scala | 132 +++ .../sql/catalyst/analysis/unresolved.scala | 11 ++ .../spark/sql/catalyst/parser/AstBuilder.scala | 8 ++ .../sql/catalyst/parser/PlanParserSuite.scala | 8 +- .../sql-tests/inputs/table-valued-functions.sql | 20 +++ .../results/table-valued-functions.sql.out | 87 8 files changed, 267 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ea684b69/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index aca7282..51f3804 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -426,6 +426,7 @@ relationPrimary | '(' queryNoWith ')' sample? (AS? strictIdentifier)? #aliasedQuery | '(' relation ')' sample? (AS? strictIdentifier)? #aliasedRelation | inlineTable #inlineTableDefault2 +| identifier '(' (expression (',' expression)*)? 
')' #tableValuedFunction ; inlineTable http://git-wip-us.apache.org/repos/asf/spark/blob/ea684b69/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 57c3d9a..e0b8166 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -86,6 +86,7 @@ class Analyzer( WindowsSubstitution, EliminateUnions), Batch("Resolution", fixedPoint, + ResolveTableValuedFunctions :: ResolveRelations :: ResolveReferences :: ResolveDeserializer :: http://git-wip-us.apache.org/repos/asf/spark/blob/ea684b69/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala new file mode 100644 index 000..7fdf7fa --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language
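Usage sketch, assuming a `SparkSession` named `spark`; per the description above, the argument forms mirror `spark.range()`:

```scala
spark.sql("SELECT * FROM range(3)").show()                   // end only: ids 0, 1, 2
spark.sql("SELECT * FROM range(2, 10, 2)").show()            // start, end, step
spark.sql("SELECT sum(id) FROM range(0, 100, 1, 8)").show()  // plus explicit numPartitions
```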
spark git commit: [MINOR][DOC] Fix the descriptions for `properties` argument in the documentation for jdbc APIs
Repository: spark Updated Branches: refs/heads/branch-2.0 3e0163bee -> 68a24d3e7 [MINOR][DOC] Fix the descriptions for `properties` argument in the documenation for jdbc APIs ## What changes were proposed in this pull request? This should be credited to mvervuurt. The main purpose of this PR is - simply to include the change for the same instance in `DataFrameReader` just to match up. - just avoid duplicately verifying the PR (as I already did). The documentation for both should be the same because both assume the `properties` should be the same `dict` for the same option. ## How was this patch tested? Manually building Python documentation. This will produce the output as below: - `DataFrameReader` ![2016-08-17 11 12 00](https://cloud.githubusercontent.com/assets/6477701/17722764/b3f6568e-646f-11e6-8b75-4fb672f3f366.png) - `DataFrameWriter` ![2016-08-17 11 12 10](https://cloud.githubusercontent.com/assets/6477701/17722765/b58cb308-646f-11e6-841a-32f19800d139.png) Closes #14624 Author: hyukjinkwonAuthor: mvervuurt Closes #14677 from HyukjinKwon/typo-python. (cherry picked from commit 0f6aa8afaacdf0ceca9c2c1650ca26a5c167ae69) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/68a24d3e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/68a24d3e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/68a24d3e Branch: refs/heads/branch-2.0 Commit: 68a24d3e7aa9b40d4557652d3179b0ccb0f8624e Parents: 3e0163b Author: mvervuurt Authored: Tue Aug 16 23:12:59 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 23:13:06 2016 -0700 -- python/pyspark/sql/readwriter.py | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/68a24d3e/python/pyspark/sql/readwriter.py -- diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py index 4020bb3..64de33e 100644 --- a/python/pyspark/sql/readwriter.py +++ b/python/pyspark/sql/readwriter.py @@ -401,8 +401,9 @@ class DataFrameReader(OptionUtils): :param numPartitions: the number of partitions :param predicates: a list of expressions suitable for inclusion in WHERE clauses; each one defines one partition of the :class:`DataFrame` -:param properties: a dictionary of JDBC database connection arguments; normally, - at least a "user" and "password" property should be included +:param properties: a dictionary of JDBC database connection arguments. Normally at + least properties "user" and "password" with their corresponding values. + For example { 'user' : 'SYSTEM', 'password' : 'mypassword' } :return: a DataFrame """ if properties is None: @@ -716,9 +717,9 @@ class DataFrameWriter(OptionUtils): * ``overwrite``: Overwrite existing data. * ``ignore``: Silently ignore this operation if data already exists. * ``error`` (default case): Throw an exception if data already exists. -:param properties: JDBC database connection arguments, a list of - arbitrary string tag/value. Normally at least a - "user" and "password" property should be included. +:param properties: a dictionary of JDBC database connection arguments. Normally at + least properties "user" and "password" with their corresponding values. + For example { 'user' : 'SYSTEM', 'password' : 'mypassword' } """ if properties is None: properties = dict() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR][DOC] Fix the descriptions for `properties` argument in the documentation for jdbc APIs
Repository: spark Updated Branches: refs/heads/master f7c9ff57c -> 0f6aa8afa [MINOR][DOC] Fix the descriptions for `properties` argument in the documenation for jdbc APIs ## What changes were proposed in this pull request? This should be credited to mvervuurt. The main purpose of this PR is - simply to include the change for the same instance in `DataFrameReader` just to match up. - just avoid duplicately verifying the PR (as I already did). The documentation for both should be the same because both assume the `properties` should be the same `dict` for the same option. ## How was this patch tested? Manually building Python documentation. This will produce the output as below: - `DataFrameReader` ![2016-08-17 11 12 00](https://cloud.githubusercontent.com/assets/6477701/17722764/b3f6568e-646f-11e6-8b75-4fb672f3f366.png) - `DataFrameWriter` ![2016-08-17 11 12 10](https://cloud.githubusercontent.com/assets/6477701/17722765/b58cb308-646f-11e6-841a-32f19800d139.png) Closes #14624 Author: hyukjinkwonAuthor: mvervuurt Closes #14677 from HyukjinKwon/typo-python. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0f6aa8af Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0f6aa8af Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0f6aa8af Branch: refs/heads/master Commit: 0f6aa8afaacdf0ceca9c2c1650ca26a5c167ae69 Parents: f7c9ff5 Author: mvervuurt Authored: Tue Aug 16 23:12:59 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 23:12:59 2016 -0700 -- python/pyspark/sql/readwriter.py | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0f6aa8af/python/pyspark/sql/readwriter.py -- diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py index 4020bb3..64de33e 100644 --- a/python/pyspark/sql/readwriter.py +++ b/python/pyspark/sql/readwriter.py @@ -401,8 +401,9 @@ class DataFrameReader(OptionUtils): :param numPartitions: the number of partitions :param predicates: a list of expressions suitable for inclusion in WHERE clauses; each one defines one partition of the :class:`DataFrame` -:param properties: a dictionary of JDBC database connection arguments; normally, - at least a "user" and "password" property should be included +:param properties: a dictionary of JDBC database connection arguments. Normally at + least properties "user" and "password" with their corresponding values. + For example { 'user' : 'SYSTEM', 'password' : 'mypassword' } :return: a DataFrame """ if properties is None: @@ -716,9 +717,9 @@ class DataFrameWriter(OptionUtils): * ``overwrite``: Overwrite existing data. * ``ignore``: Silently ignore this operation if data already exists. * ``error`` (default case): Throw an exception if data already exists. -:param properties: JDBC database connection arguments, a list of - arbitrary string tag/value. Normally at least a - "user" and "password" property should be included. +:param properties: a dictionary of JDBC database connection arguments. Normally at + least properties "user" and "password" with their corresponding values. + For example { 'user' : 'SYSTEM', 'password' : 'mypassword' } """ if properties is None: properties = dict() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
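The fix above is to the PySpark docstrings; for comparison, a hedged sketch of the same idea from the Scala API, where the connection arguments travel as a `java.util.Properties` (the URL, table names, and credentials below are placeholders, and a `SparkSession` named `spark` plus a reachable JDBC endpoint are assumed):

```scala
import java.util.Properties

val props = new Properties()
props.setProperty("user", "SYSTEM")
props.setProperty("password", "mypassword")

// Read from and append to JDBC tables using the same connection properties.
val accounts = spark.read.jdbc("jdbc:postgresql://dbhost:5432/mydb", "public.accounts", props)
accounts.write.mode("append").jdbc("jdbc:postgresql://dbhost:5432/mydb", "public.accounts_copy", props)
```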
spark git commit: [SPARK-17068][SQL] Make view-usage visible during analysis
Repository: spark Updated Branches: refs/heads/master 4a2c375be -> f7c9ff57c [SPARK-17068][SQL] Make view-usage visible during analysis ## What changes were proposed in this pull request? This PR adds a field to subquery alias in order to make the usage of views in a resolved `LogicalPlan` more visible (and more understandable). For example, the following view and query: ```sql create view constants as select 1 as id union all select 1 union all select 42 select * from constants; ``` ...now yields the following analyzed plan: ``` Project [id#39] +- SubqueryAlias c, `default`.`constants` +- Project [gen_attr_0#36 AS id#39] +- SubqueryAlias gen_subquery_0 +- Union :- Union : :- Project [1 AS gen_attr_0#36] : : +- OneRowRelation$ : +- Project [1 AS gen_attr_1#37] : +- OneRowRelation$ +- Project [42 AS gen_attr_2#38] +- OneRowRelation$ ``` ## How was this patch tested? Added tests for the two code paths in `SessionCatalogSuite` (sql/core) and `HiveMetastoreCatalogSuite` (sql/hive) Author: Herman van HovellCloses #14657 from hvanhovell/SPARK-17068. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f7c9ff57 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f7c9ff57 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f7c9ff57 Branch: refs/heads/master Commit: f7c9ff57c17a950cccdc26aadf8768c899a4d572 Parents: 4a2c375 Author: Herman van Hovell Authored: Tue Aug 16 23:09:53 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 23:09:53 2016 -0700 -- .../spark/sql/catalyst/analysis/Analyzer.scala | 4 +-- .../sql/catalyst/analysis/CheckAnalysis.scala | 4 +-- .../sql/catalyst/catalog/SessionCatalog.scala | 30 +++- .../apache/spark/sql/catalyst/dsl/package.scala | 4 +-- .../sql/catalyst/expressions/subquery.scala | 8 +++--- .../sql/catalyst/optimizer/Optimizer.scala | 8 +++--- .../spark/sql/catalyst/parser/AstBuilder.scala | 4 +-- .../plans/logical/basicLogicalOperators.scala | 7 - .../sql/catalyst/analysis/AnalysisSuite.scala | 4 +-- .../catalyst/catalog/SessionCatalogSuite.scala | 19 + .../catalyst/optimizer/ColumnPruningSuite.scala | 8 +++--- .../EliminateSubqueryAliasesSuite.scala | 6 ++-- .../optimizer/JoinOptimizationSuite.scala | 8 +++--- .../sql/catalyst/parser/PlanParserSuite.scala | 2 +- .../scala/org/apache/spark/sql/Dataset.scala| 2 +- .../apache/spark/sql/catalyst/SQLBuilder.scala | 6 ++-- .../spark/sql/execution/datasources/rules.scala | 2 +- .../spark/sql/hive/HiveMetastoreCatalog.scala | 21 ++ .../spark/sql/hive/HiveSessionCatalog.scala | 4 +-- .../sql/hive/HiveMetastoreCatalogSuite.scala| 14 - 20 files changed, 94 insertions(+), 71 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f7c9ff57/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index a2a022c..bd4c191 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -138,7 +138,7 @@ class Analyzer( case u : UnresolvedRelation => val substituted = cteRelations.find(x => resolver(x._1, u.tableIdentifier.table)) .map(_._2).map { relation => - val withAlias = u.alias.map(SubqueryAlias(_, relation)) + val withAlias = u.alias.map(SubqueryAlias(_, relation, None)) withAlias.getOrElse(relation) } 
substituted.getOrElse(u) @@ -2057,7 +2057,7 @@ class Analyzer( */ object EliminateSubqueryAliases extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { -case SubqueryAlias(_, child) => child +case SubqueryAlias(_, child, _) => child } } http://git-wip-us.apache.org/repos/asf/spark/blob/f7c9ff57/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 41b7e62..e07e919 100644 ---
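A quick way to observe the new plan shape described above, assuming a `SparkSession` named `spark` with a catalog that supports persistent views:

```scala
spark.sql("CREATE OR REPLACE VIEW constants AS SELECT 1 AS id UNION ALL SELECT 1 UNION ALL SELECT 42")

// The analyzed plan now shows both the alias and the view's table identifier,
// e.g. something like: SubqueryAlias constants, `default`.`constants`
spark.sql("SELECT * FROM constants").explain(extended = true)
```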
spark git commit: [SPARK-17084][SQL] Rename ParserUtils.assert to validate
Repository: spark Updated Branches: refs/heads/branch-2.0 6cb3eab7c -> 3e0163bee [SPARK-17084][SQL] Rename ParserUtils.assert to validate ## What changes were proposed in this pull request? This PR renames `ParserUtils.assert` to `ParserUtils.validate`. This is done because this method is used to check requirements, and not to check if the program is in an invalid state. ## How was this patch tested? Simple rename. Compilation should do. Author: Herman van HovellCloses #14665 from hvanhovell/SPARK-17084. (cherry picked from commit 4a2c375be2bcd98cc7e00bea920fd6a0f68a4e14) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3e0163be Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3e0163be Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3e0163be Branch: refs/heads/branch-2.0 Commit: 3e0163bee2354258899c82ce4cc4aacafd2a802d Parents: 6cb3eab Author: Herman van Hovell Authored: Tue Aug 16 21:35:39 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 21:35:46 2016 -0700 -- .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 14 +++--- .../spark/sql/catalyst/parser/ParserUtils.scala | 4 ++-- .../apache/spark/sql/execution/SparkSqlParser.scala | 5 ++--- 3 files changed, 11 insertions(+), 12 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3e0163be/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 1a0e7ab..aee8eb1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -132,7 +132,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // Build the insert clauses. val inserts = ctx.multiInsertQueryBody.asScala.map { body => -assert(body.querySpecification.fromClause == null, +validate(body.querySpecification.fromClause == null, "Multi-Insert queries cannot have a FROM clause in their individual SELECT statements", body) @@ -591,7 +591,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // function takes X PERCENT as the input and the range of X is [0, 100], we need to // adjust the fraction. val eps = RandomSampler.roundingEpsilon - assert(fraction >= 0.0 - eps && fraction <= 1.0 + eps, + validate(fraction >= 0.0 - eps && fraction <= 1.0 + eps, s"Sampling fraction ($fraction) must be on interval [0, 1]", ctx) Sample(0.0, fraction, withReplacement = false, (math.random * 1000).toInt, query)(true) @@ -659,7 +659,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // Get the backing expressions. 
val expressions = ctx.expression.asScala.map { eCtx => val e = expression(eCtx) - assert(e.foldable, "All expressions in an inline table must be constants.", eCtx) + validate(e.foldable, "All expressions in an inline table must be constants.", eCtx) e } @@ -681,7 +681,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { val baseAttributes = structType.toAttributes.map(_.withNullability(true)) val attributes = if (ctx.identifierList != null) { val aliases = visitIdentifierList(ctx.identifierList) - assert(aliases.size == baseAttributes.size, + validate(aliases.size == baseAttributes.size, "Number of aliases must match the number of fields in an inline table.", ctx) baseAttributes.zip(aliases).map(p => p._1.withName(p._2)) } else { @@ -1089,7 +1089,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // We currently only allow foldable integers. def value: Int = { val e = expression(ctx.expression) - assert(e.resolved && e.foldable && e.dataType == IntegerType, + validate(e.resolved && e.foldable && e.dataType == IntegerType, "Frame bound value must be a constant integer.", ctx) e.eval().asInstanceOf[Int] @@ -1342,7 +1342,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { */ override def visitInterval(ctx: IntervalContext): Literal = withOrigin(ctx) { val intervals = ctx.intervalField.asScala.map(visitIntervalField) -assert(intervals.nonEmpty, "at least one time unit should be given for interval literal", ctx) +
spark git commit: [SPARK-17084][SQL] Rename ParserUtils.assert to validate
Repository: spark Updated Branches: refs/heads/master e28a8c589 -> 4a2c375be [SPARK-17084][SQL] Rename ParserUtils.assert to validate ## What changes were proposed in this pull request? This PR renames `ParserUtils.assert` to `ParserUtils.validate`. This is done because this method is used to check requirements, and not to check if the program is in an invalid state. ## How was this patch tested? Simple rename. Compilation should do. Author: Herman van HovellCloses #14665 from hvanhovell/SPARK-17084. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4a2c375b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4a2c375b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4a2c375b Branch: refs/heads/master Commit: 4a2c375be2bcd98cc7e00bea920fd6a0f68a4e14 Parents: e28a8c5 Author: Herman van Hovell Authored: Tue Aug 16 21:35:39 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 21:35:39 2016 -0700 -- .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 14 +++--- .../spark/sql/catalyst/parser/ParserUtils.scala | 4 ++-- .../apache/spark/sql/execution/SparkSqlParser.scala | 5 ++--- 3 files changed, 11 insertions(+), 12 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4a2c375b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 25c8445..09b650c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -132,7 +132,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // Build the insert clauses. val inserts = ctx.multiInsertQueryBody.asScala.map { body => -assert(body.querySpecification.fromClause == null, +validate(body.querySpecification.fromClause == null, "Multi-Insert queries cannot have a FROM clause in their individual SELECT statements", body) @@ -596,7 +596,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // function takes X PERCENT as the input and the range of X is [0, 100], we need to // adjust the fraction. val eps = RandomSampler.roundingEpsilon - assert(fraction >= 0.0 - eps && fraction <= 1.0 + eps, + validate(fraction >= 0.0 - eps && fraction <= 1.0 + eps, s"Sampling fraction ($fraction) must be on interval [0, 1]", ctx) Sample(0.0, fraction, withReplacement = false, (math.random * 1000).toInt, query)(true) @@ -664,7 +664,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // Get the backing expressions. 
val expressions = ctx.expression.asScala.map { eCtx => val e = expression(eCtx) - assert(e.foldable, "All expressions in an inline table must be constants.", eCtx) + validate(e.foldable, "All expressions in an inline table must be constants.", eCtx) e } @@ -686,7 +686,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { val baseAttributes = structType.toAttributes.map(_.withNullability(true)) val attributes = if (ctx.identifierList != null) { val aliases = visitIdentifierList(ctx.identifierList) - assert(aliases.size == baseAttributes.size, + validate(aliases.size == baseAttributes.size, "Number of aliases must match the number of fields in an inline table.", ctx) baseAttributes.zip(aliases).map(p => p._1.withName(p._2)) } else { @@ -1094,7 +1094,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // We currently only allow foldable integers. def value: Int = { val e = expression(ctx.expression) - assert(e.resolved && e.foldable && e.dataType == IntegerType, + validate(e.resolved && e.foldable && e.dataType == IntegerType, "Frame bound value must be a constant integer.", ctx) e.eval().asInstanceOf[Int] @@ -1347,7 +1347,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { */ override def visitInterval(ctx: IntervalContext): Literal = withOrigin(ctx) { val intervals = ctx.intervalField.asScala.map(visitIntervalField) -assert(intervals.nonEmpty, "at least one time unit should be given for interval literal", ctx) +validate(intervals.nonEmpty, "at least one time unit should be given for interval literal", ctx)
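The rename is purely about intent: these are checks on user-supplied SQL, not internal invariants. A hedged sketch of roughly what such a helper looks like (the real `ParserUtils.validate` throws Catalyst's `ParseException` carrying the offending parser context):

```scala
// Simplified stand-in for the renamed helper; `ctx` would be an ANTLR ParserRuleContext
// in the real code and the exception a ParseException rather than IllegalArgumentException.
def validate(condition: => Boolean, message: String, ctx: AnyRef): Unit = {
  if (!condition) throw new IllegalArgumentException(s"$message (near: $ctx)")
}

// Call sites then read as requirement checks on the query being parsed, e.g.:
// validate(fraction >= 0.0 && fraction <= 1.0,
//   s"Sampling fraction ($fraction) must be on interval [0, 1]", ctx)
```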
spark git commit: [SPARK-17089][DOCS] Remove api doc link for mapReduceTriplets operator
Repository: spark Updated Branches: refs/heads/branch-2.0 022230c20 -> 6cb3eab7c [SPARK-17089][DOCS] Remove api doc link for mapReduceTriplets operator ## What changes were proposed in this pull request? Remove the API doc link for the mapReduceTriplets operator: the operator is gone from the latest API, so users who follow the link no longer find mapReduceTriplets there, and removing the dead link is better than confusing the reader. ## How was this patch tested? Ran all the test cases ![screenshot from 2016-08-16 23-08-25](https://cloud.githubusercontent.com/assets/8075390/17709393/8cfbf75a-6406-11e6-98e6-38f7b319d833.png) Author: sandy Closes #14669 from phalodi/SPARK-17089. (cherry picked from commit e28a8c5899c48ff065e2fd3bb6b10c82b4d39c2c) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6cb3eab7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6cb3eab7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6cb3eab7 Branch: refs/heads/branch-2.0 Commit: 6cb3eab7cc49ad8b8459ddc479a900de9dea1bcf Parents: 022230c Author: sandy Authored: Tue Aug 16 12:50:55 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 12:51:02 2016 -0700 -- docs/graphx-programming-guide.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6cb3eab7/docs/graphx-programming-guide.md -- diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md index bf4b968..07b38d9 100644 --- a/docs/graphx-programming-guide.md +++ b/docs/graphx-programming-guide.md @@ -24,7 +24,6 @@ description: GraphX graph processing library guide for Spark SPARK_VERSION_SHORT [Graph.outerJoinVertices]: api/scala/index.html#org.apache.spark.graphx.Graph@outerJoinVertices[U,VD2](RDD[(VertexId,U)])((VertexId,VD,Option[U])⇒VD2)(ClassTag[U],ClassTag[VD2]):Graph[VD2,ED] [Graph.aggregateMessages]: api/scala/index.html#org.apache.spark.graphx.Graph@aggregateMessages[A]((EdgeContext[VD,ED,A])⇒Unit,(A,A)⇒A,TripletFields)(ClassTag[A]):VertexRDD[A] [EdgeContext]: api/scala/index.html#org.apache.spark.graphx.EdgeContext -[Graph.mapReduceTriplets]: api/scala/index.html#org.apache.spark.graphx.Graph@mapReduceTriplets[A](mapFunc:org.apache.spark.graphx.EdgeTriplet[VD,ED]=Iterator[(org.apache.spark.graphx.VertexId,A)],reduceFunc:(A,A)=A,activeSetOpt:Option[(org.apache.spark.graphx.VertexRDD[_],org.apache.spark.graphx.EdgeDirection)])(implicitevidence$10:scala.reflect.ClassTag[A]):org.apache.spark.graphx.VertexRDD[A] [GraphOps.collectNeighborIds]: api/scala/index.html#org.apache.spark.graphx.GraphOps@collectNeighborIds(EdgeDirection):VertexRDD[Array[VertexId]] [GraphOps.collectNeighbors]: api/scala/index.html#org.apache.spark.graphx.GraphOps@collectNeighbors(EdgeDirection):VertexRDD[Array[(VertexId,VD)]] [RDD Persistence]: programming-guide.html#rdd-persistence @@ -596,7 +595,7 @@ compute the average age of the more senior followers of each user. 
### Map Reduce Triplets Transition Guide (Legacy) In earlier versions of GraphX neighborhood aggregation was accomplished using the -[`mapReduceTriplets`][Graph.mapReduceTriplets] operator: +`mapReduceTriplets` operator: {% highlight scala %} class Graph[VD, ED] { @@ -607,7 +606,7 @@ class Graph[VD, ED] { } {% endhighlight %} -The [`mapReduceTriplets`][Graph.mapReduceTriplets] operator takes a user defined map function which +The `mapReduceTriplets` operator takes a user defined map function which is applied to each triplet and can yield *messages* which are aggregated using the user defined `reduce` function. However, we found the use of the returned iterator to be expensive and it inhibited our ability to
spark git commit: [SPARK-17089][DOCS] Remove api doc link for mapReduceTriplets operator
Repository: spark Updated Branches: refs/heads/master c34b546d6 -> e28a8c589 [SPARK-17089][DOCS] Remove api doc link for mapReduceTriplets operator ## What changes were proposed in this pull request? Remove the API doc link for the `mapReduceTriplets` operator: the operator has been removed from the latest API, so following the link no longer finds `mapReduceTriplets`, and it is better to drop the link than to confuse the user. ## How was this patch tested? Ran all the test cases ![screenshot from 2016-08-16 23-08-25](https://cloud.githubusercontent.com/assets/8075390/17709393/8cfbf75a-6406-11e6-98e6-38f7b319d833.png) Author: sandyCloses #14669 from phalodi/SPARK-17089. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e28a8c58 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e28a8c58 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e28a8c58 Branch: refs/heads/master Commit: e28a8c5899c48ff065e2fd3bb6b10c82b4d39c2c Parents: c34b546 Author: sandy Authored: Tue Aug 16 12:50:55 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 12:50:55 2016 -0700 -- docs/graphx-programming-guide.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e28a8c58/docs/graphx-programming-guide.md -- diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md index 6f738f0..58671e6 100644 --- a/docs/graphx-programming-guide.md +++ b/docs/graphx-programming-guide.md @@ -24,7 +24,6 @@ description: GraphX graph processing library guide for Spark SPARK_VERSION_SHORT [Graph.outerJoinVertices]: api/scala/index.html#org.apache.spark.graphx.Graph@outerJoinVertices[U,VD2](RDD[(VertexId,U)])((VertexId,VD,Option[U])⇒VD2)(ClassTag[U],ClassTag[VD2]):Graph[VD2,ED] [Graph.aggregateMessages]: api/scala/index.html#org.apache.spark.graphx.Graph@aggregateMessages[A]((EdgeContext[VD,ED,A])⇒Unit,(A,A)⇒A,TripletFields)(ClassTag[A]):VertexRDD[A] [EdgeContext]: api/scala/index.html#org.apache.spark.graphx.EdgeContext -[Graph.mapReduceTriplets]: api/scala/index.html#org.apache.spark.graphx.Graph@mapReduceTriplets[A](mapFunc:org.apache.spark.graphx.EdgeTriplet[VD,ED]=Iterator[(org.apache.spark.graphx.VertexId,A)],reduceFunc:(A,A)=A,activeSetOpt:Option[(org.apache.spark.graphx.VertexRDD[_],org.apache.spark.graphx.EdgeDirection)])(implicitevidence$10:scala.reflect.ClassTag[A]):org.apache.spark.graphx.VertexRDD[A] [GraphOps.collectNeighborIds]: api/scala/index.html#org.apache.spark.graphx.GraphOps@collectNeighborIds(EdgeDirection):VertexRDD[Array[VertexId]] [GraphOps.collectNeighbors]: api/scala/index.html#org.apache.spark.graphx.GraphOps@collectNeighbors(EdgeDirection):VertexRDD[Array[(VertexId,VD)]] [RDD Persistence]: programming-guide.html#rdd-persistence @@ -596,7 +595,7 @@ compute the average age of the more senior followers of each user. ### Map Reduce Triplets Transition Guide (Legacy) In earlier versions of GraphX neighborhood aggregation was accomplished using the -[`mapReduceTriplets`][Graph.mapReduceTriplets] operator: +`mapReduceTriplets` operator: {% highlight scala %} class Graph[VD, ED] { @@ -607,7 +606,7 @@ class Graph[VD, ED] { } {% endhighlight %} -The [`mapReduceTriplets`][Graph.mapReduceTriplets] operator takes a user defined map function which +The `mapReduceTriplets` operator takes a user defined map function which is applied to each triplet and can yield *messages* which are aggregated using the user defined `reduce` function.
However, we found the use of the returned iterator to be expensive and it inhibited our ability to
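Since both copies of this commit keep referring to the legacy `mapReduceTriplets` while removing its stale link, a short sketch of the documented replacement, `Graph.aggregateMessages`, may help readers of the transition guide. It mirrors the "older followers" style of example from the GraphX guide; the `olderFollowerStats` name and the `Graph[Double, Int]` shape (vertex attribute = age) are assumptions for illustration, not part of the commit.

```scala
import org.apache.spark.graphx.{Graph, TripletFields, VertexRDD}

// For each user, count the followers that are older and sum their ages.
// Assumes a graph whose vertex attribute is the user's age.
def olderFollowerStats(graph: Graph[Double, Int]): VertexRDD[(Int, Double)] =
  graph.aggregateMessages[(Int, Double)](
    // send function: runs once per edge, may emit messages to src and/or dst
    ctx => if (ctx.srcAttr > ctx.dstAttr) ctx.sendToDst((1, ctx.srcAttr)),
    // merge function: combines messages arriving at the same vertex
    (a, b) => (a._1 + b._1, a._2 + b._2),
    TripletFields.All
  )
```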
[2/2] spark git commit: [SPARK-16964][SQL] Remove private[sql] and private[spark] from sql.execution package [Backport]
[SPARK-16964][SQL] Remove private[sql] and private[spark] from sql.execution package [Backport] ## What changes were proposed in this pull request? This PR backports https://github.com/apache/spark/pull/14554 to branch-2.0. I have also changed the visibility of a few similar Hive classes. ## How was this patch tested? (Only a package visibility change) Author: Herman van HovellAuthor: Reynold Xin Closes #14652 from hvanhovell/SPARK-16964. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1c569711 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1c569711 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1c569711 Branch: refs/heads/branch-2.0 Commit: 1c56971167a0ebb3c422ccc7cc3d6904015fe2ec Parents: 237ae54 Author: Herman van Hovell Authored: Tue Aug 16 01:15:31 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 01:15:31 2016 -0700 -- .../spark/sql/execution/CacheManager.scala | 22 +- .../spark/sql/execution/ExistingRDD.scala | 18 +++ .../apache/spark/sql/execution/ExpandExec.scala | 2 +- .../spark/sql/execution/FileRelation.scala | 2 +- .../spark/sql/execution/GenerateExec.scala | 2 +- .../sql/execution/LocalTableScanExec.scala | 4 ++-- .../spark/sql/execution/RowIterator.scala | 2 +- .../spark/sql/execution/SQLExecution.scala | 2 +- .../apache/spark/sql/execution/SortExec.scala | 6 ++--- .../apache/spark/sql/execution/SparkPlan.scala | 14 ++-- .../spark/sql/execution/SparkPlanInfo.scala | 2 +- .../spark/sql/execution/SparkStrategies.scala | 6 ++--- .../sql/execution/UnsafeRowSerializer.scala | 4 ++-- .../sql/execution/WholeStageCodegenExec.scala | 2 +- .../execution/aggregate/HashAggregateExec.scala | 2 +- .../execution/aggregate/SortAggregateExec.scala | 2 +- .../spark/sql/execution/aggregate/udaf.scala| 6 ++--- .../sql/execution/basicPhysicalOperators.scala | 6 ++--- .../execution/columnar/InMemoryRelation.scala | 8 +++ .../columnar/InMemoryTableScanExec.scala| 4 ++-- .../spark/sql/execution/command/commands.scala | 4 ++-- .../datasources/DataSourceStrategy.scala| 8 +++ .../datasources/FileSourceStrategy.scala| 2 +- .../InsertIntoDataSourceCommand.scala | 2 +- .../InsertIntoHadoopFsRelationCommand.scala | 2 +- .../datasources/PartitioningUtils.scala | 24 +++- .../execution/datasources/WriterContainer.scala | 8 +++ .../sql/execution/datasources/bucket.scala | 2 +- .../execution/datasources/csv/CSVOptions.scala | 2 +- .../execution/datasources/csv/CSVParser.scala | 4 ++-- .../execution/datasources/csv/CSVRelation.scala | 4 ++-- .../datasources/fileSourceInterfaces.scala | 6 ++--- .../execution/datasources/jdbc/JDBCRDD.scala| 8 +++ .../datasources/parquet/ParquetFileFormat.scala | 17 +++--- .../datasources/parquet/ParquetFilters.scala| 2 +- .../datasources/parquet/ParquetOptions.scala| 6 ++--- .../spark/sql/execution/datasources/rules.scala | 6 ++--- .../spark/sql/execution/debug/package.scala | 2 +- .../exchange/BroadcastExchangeExec.scala| 2 +- .../exchange/ExchangeCoordinator.scala | 4 ++-- .../execution/exchange/ShuffleExchange.scala| 9 .../execution/joins/BroadcastHashJoinExec.scala | 2 +- .../joins/BroadcastNestedLoopJoinExec.scala | 2 +- .../execution/joins/CartesianProductExec.scala | 5 ++-- .../execution/joins/ShuffledHashJoinExec.scala | 2 +- .../sql/execution/joins/SortMergeJoinExec.scala | 2 +- .../spark/sql/execution/metric/SQLMetrics.scala | 10 .../execution/python/ExtractPythonUDFs.scala| 4 ++-- .../sql/execution/r/MapPartitionsRWrapper.scala | 4 ++-- 
.../sql/execution/stat/FrequentItems.scala | 4 ++-- .../sql/execution/stat/StatFunctions.scala | 8 +++ .../streaming/IncrementalExecution.scala| 2 +- .../execution/streaming/StreamExecution.scala | 19 .../execution/streaming/StreamProgress.scala| 2 +- .../execution/streaming/state/StateStore.scala | 2 +- .../streaming/state/StateStoreCoordinator.scala | 4 ++-- .../spark/sql/execution/ui/ExecutionPage.scala | 2 +- .../spark/sql/execution/ui/SQLListener.scala| 6 ++--- .../apache/spark/sql/execution/ui/SQLTab.scala | 4 ++-- .../spark/sql/execution/ui/SparkPlanGraph.scala | 6 ++--- .../apache/spark/sql/internal/SharedState.scala | 2 -- .../CreateHiveTableAsSelectCommand.scala| 1 - .../sql/hive/execution/HiveTableScanExec.scala | 2 +- .../hive/execution/ScriptTransformation.scala | 3 --- .../spark/sql/hive/orc/OrcFileFormat.scala | 2 +- 65 files
[1/2] spark git commit: [SPARK-16964][SQL] Remove private[sql] and private[spark] from sql.execution package [Backport]
Repository: spark Updated Branches: refs/heads/branch-2.0 237ae54c9 -> 1c5697116 http://git-wip-us.apache.org/repos/asf/spark/blob/1c569711/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala index af2229a..66fb5a4 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala @@ -49,10 +49,10 @@ class StreamExecution( override val id: Long, override val name: String, checkpointRoot: String, -private[sql] val logicalPlan: LogicalPlan, +val logicalPlan: LogicalPlan, val sink: Sink, val trigger: Trigger, -private[sql] val triggerClock: Clock, +val triggerClock: Clock, val outputMode: OutputMode) extends StreamingQuery with Logging { @@ -74,7 +74,7 @@ class StreamExecution( * input source. */ @volatile - private[sql] var committedOffsets = new StreamProgress + var committedOffsets = new StreamProgress /** * Tracks the offsets that are available to be processed, but have not yet be committed to the @@ -102,10 +102,10 @@ class StreamExecution( private var state: State = INITIALIZED @volatile - private[sql] var lastExecution: QueryExecution = null + var lastExecution: QueryExecution = null @volatile - private[sql] var streamDeathCause: StreamingQueryException = null + var streamDeathCause: StreamingQueryException = null /* Get the call site in the caller thread; will pass this into the micro batch thread */ private val callSite = Utils.getCallSite() @@ -115,7 +115,7 @@ class StreamExecution( * [[org.apache.spark.util.UninterruptibleThread]] to avoid potential deadlocks in using * [[HDFSMetadataLog]]. See SPARK-14131 for more details. */ - private[sql] val microBatchThread = + val microBatchThread = new UninterruptibleThread(s"stream execution thread for $name") { override def run(): Unit = { // To fix call site like "run at :0", we bridge the call site from the caller @@ -131,8 +131,7 @@ class StreamExecution( * processing is done. Thus, the Nth record in this log indicated data that is currently being * processed and the N-1th entry indicates which offsets have been durably committed to the sink. */ - private[sql] val offsetLog = -new HDFSMetadataLog[CompositeOffset](sparkSession, checkpointFile("offsets")) + val offsetLog = new HDFSMetadataLog[CompositeOffset](sparkSession, checkpointFile("offsets")) /** Whether the query is currently active or not */ override def isActive: Boolean = state == ACTIVE @@ -159,7 +158,7 @@ class StreamExecution( * Starts the execution. This returns only after the thread has started and [[QueryStarted]] event * has been posted to all the listeners. 
*/ - private[sql] def start(): Unit = { + def start(): Unit = { microBatchThread.setDaemon(true) microBatchThread.start() startLatch.await() // Wait until thread started and QueryStart event has been posted @@ -518,7 +517,7 @@ class StreamExecution( case object TERMINATED extends State } -private[sql] object StreamExecution { +object StreamExecution { private val _nextId = new AtomicLong(0) def nextId: Long = _nextId.getAndIncrement() http://git-wip-us.apache.org/repos/asf/spark/blob/1c569711/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamProgress.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamProgress.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamProgress.scala index 405a5f0..db0bd9e 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamProgress.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamProgress.scala @@ -26,7 +26,7 @@ class StreamProgress( val baseMap: immutable.Map[Source, Offset] = new immutable.HashMap[Source, Offset]) extends scala.collection.immutable.Map[Source, Offset] { - private[sql] def toCompositeOffset(source: Seq[Source]): CompositeOffset = { + def toCompositeOffset(source: Seq[Source]): CompositeOffset = { CompositeOffset(source.map(get)) } http://git-wip-us.apache.org/repos/asf/spark/blob/1c569711/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
spark git commit: Revert "[SPARK-16964][SQL] Remove private[hive] from sql.hive.execution package"
Repository: spark Updated Branches: refs/heads/branch-2.0 2e2c787bf -> 237ae54c9 Revert "[SPARK-16964][SQL] Remove private[hive] from sql.hive.execution package" This reverts commit 2e2c787bf588e129eaaadc792737fd9d2892939c. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/237ae54c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/237ae54c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/237ae54c Branch: refs/heads/branch-2.0 Commit: 237ae54c960d52b35b4bc673609aed9998c2bd45 Parents: 2e2c787 Author: Reynold XinAuthored: Tue Aug 16 01:14:53 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 01:14:53 2016 -0700 -- .../spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala | 1 + .../apache/spark/sql/hive/execution/ScriptTransformation.scala| 3 +++ .../main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala | 3 ++- 3 files changed, 6 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/237ae54c/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala index 3a8b0f1..15a5d79 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala @@ -34,6 +34,7 @@ import org.apache.spark.sql.hive.MetastoreRelation * @param ignoreIfExists allow continue working if it's already exists, otherwise * raise exception */ +private[hive] case class CreateHiveTableAsSelectCommand( tableDesc: CatalogTable, query: LogicalPlan, http://git-wip-us.apache.org/repos/asf/spark/blob/237ae54c/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala index 9747abb..dfb1251 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala @@ -51,6 +51,7 @@ import org.apache.spark.util.{CircularBuffer, RedirectThread, SerializableConfig * @param script the command that should be executed. * @param output the attributes that are produced by the script. 
*/ +private[hive] case class ScriptTransformation( input: Seq[Expression], script: String, @@ -335,6 +336,7 @@ private class ScriptTransformationWriterThread( } } +private[hive] object HiveScriptIOSchema { def apply(input: ScriptInputOutputSchema): HiveScriptIOSchema = { HiveScriptIOSchema( @@ -353,6 +355,7 @@ object HiveScriptIOSchema { /** * The wrapper class of Hive input and output schema properties */ +private[hive] case class HiveScriptIOSchema ( inputRowFormat: Seq[(String, String)], outputRowFormat: Seq[(String, String)], http://git-wip-us.apache.org/repos/asf/spark/blob/237ae54c/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala index 894c71c..a2c8092 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala @@ -47,7 +47,8 @@ import org.apache.spark.util.SerializableConfiguration * [[FileFormat]] for reading ORC files. If this is moved or renamed, please update * [[DataSource]]'s backwardCompatibilityMap. */ -class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable { +private[sql] class OrcFileFormat + extends FileFormat with DataSourceRegister with Serializable { override def shortName(): String = "orc"
spark git commit: [SPARK-16964][SQL] Remove private[hive] from sql.hive.execution package
Repository: spark Updated Branches: refs/heads/branch-2.0 45036327f -> 2e2c787bf [SPARK-16964][SQL] Remove private[hive] from sql.hive.execution package ## What changes were proposed in this pull request? This PR is a small follow-up to https://github.com/apache/spark/pull/14554. This also widens the visibility of a few (similar) Hive classes. ## How was this patch tested? No test. Only a visibility change. Author: Herman van HovellCloses #14654 from hvanhovell/SPARK-16964-hive. (cherry picked from commit 8fdc6ce400f9130399fbdd004df48b3ba95bcd6a) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2e2c787b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2e2c787b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2e2c787b Branch: refs/heads/branch-2.0 Commit: 2e2c787bf588e129eaaadc792737fd9d2892939c Parents: 4503632 Author: Herman van Hovell Authored: Tue Aug 16 01:12:27 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 01:12:33 2016 -0700 -- .../spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala | 1 - .../apache/spark/sql/hive/execution/ScriptTransformation.scala| 3 --- .../main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala | 3 +-- 3 files changed, 1 insertion(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2e2c787b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala index 15a5d79..3a8b0f1 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala @@ -34,7 +34,6 @@ import org.apache.spark.sql.hive.MetastoreRelation * @param ignoreIfExists allow continue working if it's already exists, otherwise * raise exception */ -private[hive] case class CreateHiveTableAsSelectCommand( tableDesc: CatalogTable, query: LogicalPlan, http://git-wip-us.apache.org/repos/asf/spark/blob/2e2c787b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala index dfb1251..9747abb 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala @@ -51,7 +51,6 @@ import org.apache.spark.util.{CircularBuffer, RedirectThread, SerializableConfig * @param script the command that should be executed. * @param output the attributes that are produced by the script. 
*/ -private[hive] case class ScriptTransformation( input: Seq[Expression], script: String, @@ -336,7 +335,6 @@ private class ScriptTransformationWriterThread( } } -private[hive] object HiveScriptIOSchema { def apply(input: ScriptInputOutputSchema): HiveScriptIOSchema = { HiveScriptIOSchema( @@ -355,7 +353,6 @@ object HiveScriptIOSchema { /** * The wrapper class of Hive input and output schema properties */ -private[hive] case class HiveScriptIOSchema ( inputRowFormat: Seq[(String, String)], outputRowFormat: Seq[(String, String)], http://git-wip-us.apache.org/repos/asf/spark/blob/2e2c787b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala index a2c8092..894c71c 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala @@ -47,8 +47,7 @@ import org.apache.spark.util.SerializableConfiguration * [[FileFormat]] for reading ORC files. If this is moved or renamed, please update * [[DataSource]]'s backwardCompatibilityMap. */ -private[sql] class OrcFileFormat - extends FileFormat with DataSourceRegister with Serializable { +class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable { override def
spark git commit: [SPARK-16964][SQL] Remove private[hive] from sql.hive.execution package
Repository: spark Updated Branches: refs/heads/master 7b65030e7 -> 8fdc6ce40 [SPARK-16964][SQL] Remove private[hive] from sql.hive.execution package ## What changes were proposed in this pull request? This PR is a small follow-up to https://github.com/apache/spark/pull/14554. This also widens the visibility of a few (similar) Hive classes. ## How was this patch tested? No test. Only a visibility change. Author: Herman van HovellCloses #14654 from hvanhovell/SPARK-16964-hive. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8fdc6ce4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8fdc6ce4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8fdc6ce4 Branch: refs/heads/master Commit: 8fdc6ce400f9130399fbdd004df48b3ba95bcd6a Parents: 7b65030 Author: Herman van Hovell Authored: Tue Aug 16 01:12:27 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 01:12:27 2016 -0700 -- .../spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala | 1 - .../apache/spark/sql/hive/execution/ScriptTransformation.scala| 3 --- .../main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala | 3 +-- 3 files changed, 1 insertion(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8fdc6ce4/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala index 678bf8d..6e6b1c2 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala @@ -34,7 +34,6 @@ import org.apache.spark.sql.hive.MetastoreRelation * @param ignoreIfExists allow continue working if it's already exists, otherwise * raise exception */ -private[hive] case class CreateHiveTableAsSelectCommand( tableDesc: CatalogTable, query: LogicalPlan, http://git-wip-us.apache.org/repos/asf/spark/blob/8fdc6ce4/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala index d063dd6..c553c03 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala @@ -51,7 +51,6 @@ import org.apache.spark.util.{CircularBuffer, RedirectThread, SerializableConfig * @param script the command that should be executed. * @param output the attributes that are produced by the script. 
*/ -private[hive] case class ScriptTransformation( input: Seq[Expression], script: String, @@ -338,7 +337,6 @@ private class ScriptTransformationWriterThread( } } -private[hive] object HiveScriptIOSchema { def apply(input: ScriptInputOutputSchema): HiveScriptIOSchema = { HiveScriptIOSchema( @@ -357,7 +355,6 @@ object HiveScriptIOSchema { /** * The wrapper class of Hive input and output schema properties */ -private[hive] case class HiveScriptIOSchema ( inputRowFormat: Seq[(String, String)], outputRowFormat: Seq[(String, String)], http://git-wip-us.apache.org/repos/asf/spark/blob/8fdc6ce4/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala index 1d3c466..c74d948 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala @@ -45,8 +45,7 @@ import org.apache.spark.util.SerializableConfiguration * [[FileFormat]] for reading ORC files. If this is moved or renamed, please update * [[DataSource]]'s backwardCompatibilityMap. */ -private[sql] class OrcFileFormat - extends FileFormat with DataSourceRegister with Serializable { +class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable { override def shortName(): String = "orc"
spark git commit: [SPARK-16966][SQL][CORE] App Name is a randomUUID even when "spark.app.name" exists
Repository: spark Updated Branches: refs/heads/branch-2.0 a21ecc996 -> 750f88045 [SPARK-16966][SQL][CORE] App Name is a randomUUID even when "spark.app.name" exists ## What changes were proposed in this pull request? Don't override app name specified in `SparkConf` with a random app name. Only set it if the conf has no app name even after options have been applied. See also https://github.com/apache/spark/pull/14602 This is similar to Sherry302 's original proposal in https://github.com/apache/spark/pull/14556 ## How was this patch tested? Jenkins test, with new case reproducing the bug Author: Sean OwenCloses #14630 from srowen/SPARK-16966.2. (cherry picked from commit cdaa562c9a09e2e83e6df4e84d911ce1428a7a7c) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/750f8804 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/750f8804 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/750f8804 Branch: refs/heads/branch-2.0 Commit: 750f8804540df5ad68a732f68598c4a2dbbc4761 Parents: a21ecc9 Author: Sean Owen Authored: Sat Aug 13 15:40:43 2016 -0700 Committer: Reynold Xin Committed: Sat Aug 13 15:40:59 2016 -0700 -- .../main/scala/org/apache/spark/sql/SparkSession.scala | 11 +++ .../org/apache/spark/sql/SparkSessionBuilderSuite.scala | 1 + 2 files changed, 8 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/750f8804/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala index 946d8cb..c88206c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala @@ -822,16 +822,19 @@ object SparkSession { // No active nor global default session. Create a new one. 
val sparkContext = userSuppliedContext.getOrElse { // set app name if not given - if (!options.contains("spark.app.name")) { -options += "spark.app.name" -> java.util.UUID.randomUUID().toString - } - + val randomAppName = java.util.UUID.randomUUID().toString val sparkConf = new SparkConf() options.foreach { case (k, v) => sparkConf.set(k, v) } + if (!sparkConf.contains("spark.app.name")) { +sparkConf.setAppName(randomAppName) + } val sc = SparkContext.getOrCreate(sparkConf) // maybe this is an existing SparkContext, update its SparkConf which maybe used // by SparkSession options.foreach { case (k, v) => sc.conf.set(k, v) } + if (!sc.conf.contains("spark.app.name")) { +sc.conf.setAppName(randomAppName) + } sc } session = new SparkSession(sparkContext) http://git-wip-us.apache.org/repos/asf/spark/blob/750f8804/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala index 418345b..386d13d 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala @@ -100,6 +100,7 @@ class SparkSessionBuilderSuite extends SparkFunSuite { assert(session.conf.get("key2") == "value2") assert(session.sparkContext.conf.get("key1") == "value1") assert(session.sparkContext.conf.get("key2") == "value2") +assert(session.sparkContext.conf.get("spark.app.name") == "test") session.stop() }
spark git commit: [SPARK-16966][SQL][CORE] App Name is a randomUUID even when "spark.app.name" exists
Repository: spark Updated Branches: refs/heads/master 67f025d90 -> cdaa562c9 [SPARK-16966][SQL][CORE] App Name is a randomUUID even when "spark.app.name" exists ## What changes were proposed in this pull request? Don't override app name specified in `SparkConf` with a random app name. Only set it if the conf has no app name even after options have been applied. See also https://github.com/apache/spark/pull/14602 This is similar to Sherry302 's original proposal in https://github.com/apache/spark/pull/14556 ## How was this patch tested? Jenkins test, with new case reproducing the bug Author: Sean OwenCloses #14630 from srowen/SPARK-16966.2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cdaa562c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cdaa562c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cdaa562c Branch: refs/heads/master Commit: cdaa562c9a09e2e83e6df4e84d911ce1428a7a7c Parents: 67f025d Author: Sean Owen Authored: Sat Aug 13 15:40:43 2016 -0700 Committer: Reynold Xin Committed: Sat Aug 13 15:40:43 2016 -0700 -- .../main/scala/org/apache/spark/sql/SparkSession.scala | 11 +++ .../org/apache/spark/sql/SparkSessionBuilderSuite.scala | 1 + 2 files changed, 8 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cdaa562c/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala index 2ade36d..362bf45 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala @@ -816,16 +816,19 @@ object SparkSession { // No active nor global default session. Create a new one. val sparkContext = userSuppliedContext.getOrElse { // set app name if not given - if (!options.contains("spark.app.name")) { -options += "spark.app.name" -> java.util.UUID.randomUUID().toString - } - + val randomAppName = java.util.UUID.randomUUID().toString val sparkConf = new SparkConf() options.foreach { case (k, v) => sparkConf.set(k, v) } + if (!sparkConf.contains("spark.app.name")) { +sparkConf.setAppName(randomAppName) + } val sc = SparkContext.getOrCreate(sparkConf) // maybe this is an existing SparkContext, update its SparkConf which maybe used // by SparkSession options.foreach { case (k, v) => sc.conf.set(k, v) } + if (!sc.conf.contains("spark.app.name")) { +sc.conf.setAppName(randomAppName) + } sc } session = new SparkSession(sparkContext) http://git-wip-us.apache.org/repos/asf/spark/blob/cdaa562c/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala index 418345b..386d13d 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala @@ -100,6 +100,7 @@ class SparkSessionBuilderSuite extends SparkFunSuite { assert(session.conf.get("key2") == "value2") assert(session.sparkContext.conf.get("key1") == "value1") assert(session.sparkContext.conf.get("key2") == "value2") +assert(session.sparkContext.conf.get("spark.app.name") == "test") session.stop() }
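To see the behavior the fix above (in both branches) protects, here is a minimal usage sketch: an app name supplied through the builder's options should survive into the SparkContext instead of being replaced by a random UUID, mirroring the new assertion in SparkSessionBuilderSuite. The `AppNameCheck` object, the local master, and the "my-app" name are example values, not part of the patch.

```scala
import org.apache.spark.sql.SparkSession

object AppNameCheck extends App {
  val spark = SparkSession.builder()
    .master("local[2]")
    .config("spark.app.name", "my-app") // equivalent to .appName("my-app")
    .getOrCreate()

  // With SPARK-16966 in place, the explicit name is kept rather than a random UUID.
  assert(spark.sparkContext.getConf.get("spark.app.name") == "my-app")

  spark.stop()
}
```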
spark git commit: [SPARK-17013][SQL] Parse negative numeric literals
Repository: spark Updated Branches: refs/heads/master abff92bfd -> 00e103a6e [SPARK-17013][SQL] Parse negative numeric literals ## What changes were proposed in this pull request? This patch updates the SQL parser to parse negative numeric literals as numeric literals, instead of unary minus of positive literals. This allows the parser to parse the minimal value for each data type, e.g. "-32768S". ## How was this patch tested? Updated test cases. Author: petermaxleeCloses #14608 from petermaxlee/SPARK-17013. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/00e103a6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/00e103a6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/00e103a6 Branch: refs/heads/master Commit: 00e103a6edd1a1f001a94d41dd1f7acc40a1e30f Parents: abff92b Author: petermaxlee Authored: Thu Aug 11 23:56:55 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 11 23:56:55 2016 -0700 -- .../apache/spark/sql/catalyst/parser/SqlBase.g4 | 14 +++ .../sql/catalyst/expressions/arithmetic.scala | 4 +- .../sql-tests/results/arithmetic.sql.out| 26 ++-- .../sql-tests/results/literals.sql.out | 44 ++-- .../catalyst/ExpressionSQLBuilderSuite.scala| 4 +- 5 files changed, 37 insertions(+), 55 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/00e103a6/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index ba65f2a..6122bcd 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -625,13 +625,13 @@ quotedIdentifier ; number -: DECIMAL_VALUE#decimalLiteral -| SCIENTIFIC_DECIMAL_VALUE #scientificDecimalLiteral -| INTEGER_VALUE#integerLiteral -| BIGINT_LITERAL #bigIntLiteral -| SMALLINT_LITERAL #smallIntLiteral -| TINYINT_LITERAL #tinyIntLiteral -| DOUBLE_LITERAL #doubleLiteral +: MINUS? DECIMAL_VALUE#decimalLiteral +| MINUS? SCIENTIFIC_DECIMAL_VALUE #scientificDecimalLiteral +| MINUS? INTEGER_VALUE#integerLiteral +| MINUS? BIGINT_LITERAL #bigIntLiteral +| MINUS? SMALLINT_LITERAL #smallIntLiteral +| MINUS? TINYINT_LITERAL #tinyIntLiteral +| MINUS? 
DOUBLE_LITERAL #doubleLiteral ; nonReserved http://git-wip-us.apache.org/repos/asf/spark/blob/00e103a6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala index 4aebef9..13e539a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala @@ -58,7 +58,7 @@ case class UnaryMinus(child: Expression) extends UnaryExpression } } - override def sql: String = s"(-${child.sql})" + override def sql: String = s"(- ${child.sql})" } @ExpressionDescription( @@ -76,7 +76,7 @@ case class UnaryPositive(child: Expression) protected override def nullSafeEval(input: Any): Any = input - override def sql: String = s"(+${child.sql})" + override def sql: String = s"(+ ${child.sql})" } /** http://git-wip-us.apache.org/repos/asf/spark/blob/00e103a6/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out -- diff --git a/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out b/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out index 50ea254..f2b40a0 100644 --- a/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out @@ -5,7 +5,7 @@ -- !query 0 select -100 -- !query 0 schema -struct<(-100):int> +struct<-100:int> -- !query 0 output -100 @@ -21,7 +21,7 @@ struct<230:int> -- !query 2 select -5.2 -- !query 2 schema -struct<(-5.2):decimal(2,1)> +struct<-5.2:decimal(2,1)> -- !query 2 output -5.2 @@ -37,7 +37,7 @@ struct<6.8:double> -- !query 4 select -key, +key from testdata where key = 2 -- !query 4
spark git commit: [SPARK-17013][SQL] Parse negative numeric literals
Repository: spark Updated Branches: refs/heads/branch-2.0 b4047fc21 -> bde94cd71 [SPARK-17013][SQL] Parse negative numeric literals ## What changes were proposed in this pull request? This patch updates the SQL parser to parse negative numeric literals as numeric literals, instead of unary minus of positive literals. This allows the parser to parse the minimal value for each data type, e.g. "-32768S". ## How was this patch tested? Updated test cases. Author: petermaxleeCloses #14608 from petermaxlee/SPARK-17013. (cherry picked from commit 00e103a6edd1a1f001a94d41dd1f7acc40a1e30f) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bde94cd7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bde94cd7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bde94cd7 Branch: refs/heads/branch-2.0 Commit: bde94cd71086fd348f3ba96de628d6df3f87dba5 Parents: b4047fc Author: petermaxlee Authored: Thu Aug 11 23:56:55 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 11 23:57:01 2016 -0700 -- .../apache/spark/sql/catalyst/parser/SqlBase.g4 | 14 +++ .../sql/catalyst/expressions/arithmetic.scala | 4 +- .../sql-tests/results/arithmetic.sql.out| 26 ++-- .../sql-tests/results/literals.sql.out | 44 ++-- .../catalyst/ExpressionSQLBuilderSuite.scala| 4 +- 5 files changed, 37 insertions(+), 55 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bde94cd7/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 279a1ce..aca7282 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -618,13 +618,13 @@ quotedIdentifier ; number -: DECIMAL_VALUE#decimalLiteral -| SCIENTIFIC_DECIMAL_VALUE #scientificDecimalLiteral -| INTEGER_VALUE#integerLiteral -| BIGINT_LITERAL #bigIntLiteral -| SMALLINT_LITERAL #smallIntLiteral -| TINYINT_LITERAL #tinyIntLiteral -| DOUBLE_LITERAL #doubleLiteral +: MINUS? DECIMAL_VALUE#decimalLiteral +| MINUS? SCIENTIFIC_DECIMAL_VALUE #scientificDecimalLiteral +| MINUS? INTEGER_VALUE#integerLiteral +| MINUS? BIGINT_LITERAL #bigIntLiteral +| MINUS? SMALLINT_LITERAL #smallIntLiteral +| MINUS? TINYINT_LITERAL #tinyIntLiteral +| MINUS? 
DOUBLE_LITERAL #doubleLiteral ; nonReserved http://git-wip-us.apache.org/repos/asf/spark/blob/bde94cd7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala index 7ff8795..fa459aa 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala @@ -57,7 +57,7 @@ case class UnaryMinus(child: Expression) extends UnaryExpression } } - override def sql: String = s"(-${child.sql})" + override def sql: String = s"(- ${child.sql})" } @ExpressionDescription( @@ -75,7 +75,7 @@ case class UnaryPositive(child: Expression) protected override def nullSafeEval(input: Any): Any = input - override def sql: String = s"(+${child.sql})" + override def sql: String = s"(+ ${child.sql})" } /** http://git-wip-us.apache.org/repos/asf/spark/blob/bde94cd7/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out -- diff --git a/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out b/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out index 50ea254..f2b40a0 100644 --- a/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out @@ -5,7 +5,7 @@ -- !query 0 select -100 -- !query 0 schema -struct<(-100):int> +struct<-100:int> -- !query 0 output -100 @@ -21,7 +21,7 @@ struct<230:int> -- !query 2 select -5.2 -- !query 2 schema -struct<(-5.2):decimal(2,1)> +struct<-5.2:decimal(2,1)> -- !query 2
spark git commit: [SPARK-17018][SQL] literals.sql for testing literal parsing
Repository: spark Updated Branches: refs/heads/branch-2.0 6bf20cd94 -> bc683f037 [SPARK-17018][SQL] literals.sql for testing literal parsing ## What changes were proposed in this pull request? This patch adds literals.sql for testing literal parsing end-to-end in SQL. ## How was this patch tested? The patch itself is only about adding test cases. Author: petermaxleeCloses #14598 from petermaxlee/SPARK-17018-2. (cherry picked from commit cf9367826c38e5f34ae69b409f5d09c55ed1d319) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bc683f03 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bc683f03 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bc683f03 Branch: refs/heads/branch-2.0 Commit: bc683f037d4e84f2a42eb7b1aaa9e0e4fd5f833a Parents: 6bf20cd Author: petermaxlee Authored: Thu Aug 11 13:55:10 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 11 13:55:17 2016 -0700 -- .../resources/sql-tests/inputs/literals.sql | 92 + .../sql-tests/inputs/number-format.sql | 16 - .../sql-tests/results/literals.sql.out | 374 +++ .../sql-tests/results/number-format.sql.out | 42 --- .../apache/spark/sql/SQLQueryTestSuite.scala| 14 +- 5 files changed, 476 insertions(+), 62 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bc683f03/sql/core/src/test/resources/sql-tests/inputs/literals.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/literals.sql b/sql/core/src/test/resources/sql-tests/inputs/literals.sql new file mode 100644 index 000..62f0d3d --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/literals.sql @@ -0,0 +1,92 @@ +-- Literal parsing + +-- null +select null, Null, nUll; + +-- boolean +select true, tRue, false, fALse; + +-- byte (tinyint) +select 1Y; +select 127Y, -128Y; + +-- out of range byte +select 128Y; + +-- short (smallint) +select 1S; +select 32767S, -32768S; + +-- out of range short +select 32768S; + +-- long (bigint) +select 1L, 2147483648L; +select 9223372036854775807L, -9223372036854775808L; + +-- out of range long +select 9223372036854775808L; + +-- integral parsing + +-- parse int +select 1, -1; + +-- parse int max and min value as int +select 2147483647, -2147483648; + +-- parse long max and min value as long +select 9223372036854775807, -9223372036854775808; + +-- parse as decimals (Long.MaxValue + 1, and Long.MinValue - 1) +select 9223372036854775808, -9223372036854775809; + +-- out of range decimal numbers +select 1234567890123456789012345678901234567890; +select 1234567890123456789012345678901234567890.0; + +-- double +select 1D, 1.2D, 1e10, 1.5e5, .10D, 0.10D, .1e5, .9e+2, 0.9e+2, 900e-1, 9.e+1; +select -1D, -1.2D, -1e10, -1.5e5, -.10D, -0.10D, -.1e5; +-- negative double +select .e3; +-- inf and -inf +select 1E309, -1E309; + +-- decimal parsing +select 0.3, -0.8, .5, -.18, 0., .; + +-- super large scientific notation numbers should still be valid doubles +select 123456789012345678901234567890123456789e10, 123456789012345678901234567890123456789.1e10; + +-- string +select "Hello Peter!", 'hello lee!'; +-- multi string +select 'hello' 'world', 'hello' " " 'lee'; +-- single quote within double quotes +select "hello 'peter'"; +select 'pattern%', 'no-pattern\%', 'pattern\\%', 'pattern\\\%'; +select '\'', '"', '\n', '\r', '\t', 'Z'; +-- "Hello!" 
in octals +select '\110\145\154\154\157\041'; +-- "World :)" in unicode +select '\u0057\u006F\u0072\u006C\u0064\u0020\u003A\u0029'; + +-- date +select dAte '2016-03-12'; +-- invalid date +select date 'mar 11 2016'; + +-- timestamp +select tImEstAmp '2016-03-11 20:54:00.000'; +-- invalid timestamp +select timestamp '2016-33-11 20:54:00.000'; + +-- interval +select interval 13.123456789 seconds, interval -13.123456789 second; +select interval 1 year 2 month 3 week 4 day 5 hour 6 minute 7 seconds 8 millisecond, 9 microsecond; +-- ns is not supported +select interval 10 nanoseconds; + +-- unsupported data type +select GEO '(10,-6)'; http://git-wip-us.apache.org/repos/asf/spark/blob/bc683f03/sql/core/src/test/resources/sql-tests/inputs/number-format.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/number-format.sql b/sql/core/src/test/resources/sql-tests/inputs/number-format.sql deleted file mode 100644 index a32d068..000 --- a/sql/core/src/test/resources/sql-tests/inputs/number-format.sql +++ /dev/null @@ -1,16 +0,0 @@ --- Verifies how we parse numbers - --- parse as ints -select 1, -1; - --- parse as longs (Int.MaxValue + 1, and Int.MinValue -
spark git commit: [SPARK-17018][SQL] literals.sql for testing literal parsing
Repository: spark Updated Branches: refs/heads/master acaf2a81a -> cf9367826 [SPARK-17018][SQL] literals.sql for testing literal parsing ## What changes were proposed in this pull request? This patch adds literals.sql for testing literal parsing end-to-end in SQL. ## How was this patch tested? The patch itself is only about adding test cases. Author: petermaxleeCloses #14598 from petermaxlee/SPARK-17018-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cf936782 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cf936782 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cf936782 Branch: refs/heads/master Commit: cf9367826c38e5f34ae69b409f5d09c55ed1d319 Parents: acaf2a8 Author: petermaxlee Authored: Thu Aug 11 13:55:10 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 11 13:55:10 2016 -0700 -- .../resources/sql-tests/inputs/literals.sql | 92 + .../sql-tests/inputs/number-format.sql | 16 - .../sql-tests/results/literals.sql.out | 374 +++ .../sql-tests/results/number-format.sql.out | 42 --- .../apache/spark/sql/SQLQueryTestSuite.scala| 14 +- 5 files changed, 476 insertions(+), 62 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cf936782/sql/core/src/test/resources/sql-tests/inputs/literals.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/literals.sql b/sql/core/src/test/resources/sql-tests/inputs/literals.sql new file mode 100644 index 000..62f0d3d --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/literals.sql @@ -0,0 +1,92 @@ +-- Literal parsing + +-- null +select null, Null, nUll; + +-- boolean +select true, tRue, false, fALse; + +-- byte (tinyint) +select 1Y; +select 127Y, -128Y; + +-- out of range byte +select 128Y; + +-- short (smallint) +select 1S; +select 32767S, -32768S; + +-- out of range short +select 32768S; + +-- long (bigint) +select 1L, 2147483648L; +select 9223372036854775807L, -9223372036854775808L; + +-- out of range long +select 9223372036854775808L; + +-- integral parsing + +-- parse int +select 1, -1; + +-- parse int max and min value as int +select 2147483647, -2147483648; + +-- parse long max and min value as long +select 9223372036854775807, -9223372036854775808; + +-- parse as decimals (Long.MaxValue + 1, and Long.MinValue - 1) +select 9223372036854775808, -9223372036854775809; + +-- out of range decimal numbers +select 1234567890123456789012345678901234567890; +select 1234567890123456789012345678901234567890.0; + +-- double +select 1D, 1.2D, 1e10, 1.5e5, .10D, 0.10D, .1e5, .9e+2, 0.9e+2, 900e-1, 9.e+1; +select -1D, -1.2D, -1e10, -1.5e5, -.10D, -0.10D, -.1e5; +-- negative double +select .e3; +-- inf and -inf +select 1E309, -1E309; + +-- decimal parsing +select 0.3, -0.8, .5, -.18, 0., .; + +-- super large scientific notation numbers should still be valid doubles +select 123456789012345678901234567890123456789e10, 123456789012345678901234567890123456789.1e10; + +-- string +select "Hello Peter!", 'hello lee!'; +-- multi string +select 'hello' 'world', 'hello' " " 'lee'; +-- single quote within double quotes +select "hello 'peter'"; +select 'pattern%', 'no-pattern\%', 'pattern\\%', 'pattern\\\%'; +select '\'', '"', '\n', '\r', '\t', 'Z'; +-- "Hello!" 
in octals +select '\110\145\154\154\157\041'; +-- "World :)" in unicode +select '\u0057\u006F\u0072\u006C\u0064\u0020\u003A\u0029'; + +-- date +select dAte '2016-03-12'; +-- invalid date +select date 'mar 11 2016'; + +-- timestamp +select tImEstAmp '2016-03-11 20:54:00.000'; +-- invalid timestamp +select timestamp '2016-33-11 20:54:00.000'; + +-- interval +select interval 13.123456789 seconds, interval -13.123456789 second; +select interval 1 year 2 month 3 week 4 day 5 hour 6 minute 7 seconds 8 millisecond, 9 microsecond; +-- ns is not supported +select interval 10 nanoseconds; + +-- unsupported data type +select GEO '(10,-6)'; http://git-wip-us.apache.org/repos/asf/spark/blob/cf936782/sql/core/src/test/resources/sql-tests/inputs/number-format.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/number-format.sql b/sql/core/src/test/resources/sql-tests/inputs/number-format.sql deleted file mode 100644 index a32d068..000 --- a/sql/core/src/test/resources/sql-tests/inputs/number-format.sql +++ /dev/null @@ -1,16 +0,0 @@ --- Verifies how we parse numbers - --- parse as ints -select 1, -1; - --- parse as longs (Int.MaxValue + 1, and Int.MinValue - 1) -select 2147483648, -2147483649; - --- parse long min and max value -select 9223372036854775807, -9223372036854775808; - ---
spark git commit: [SPARK-17015][SQL] group-by/order-by ordinal and arithmetic tests
Repository: spark Updated Branches: refs/heads/branch-2.0 33a213f33 -> 6bf20cd94 [SPARK-17015][SQL] group-by/order-by ordinal and arithmetic tests This patch adds three test files: 1. arithmetic.sql.out 2. order-by-ordinal.sql 3. group-by-ordinal.sql This includes https://github.com/apache/spark/pull/14594. This is a test case change. Author: petermaxleeCloses #14595 from petermaxlee/SPARK-17015. (cherry picked from commit a7b02db457d5fc663ce6a1ef01bf04689870e6b4) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6bf20cd9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6bf20cd9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6bf20cd9 Branch: refs/heads/branch-2.0 Commit: 6bf20cd9460fd27c3e1e434b1cf31a3778ec3443 Parents: 33a213f Author: petermaxlee Authored: Thu Aug 11 01:43:08 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 11 10:50:52 2016 -0700 -- .../spark/sql/catalyst/analysis/Analyzer.scala | 24 +- .../resources/sql-tests/inputs/arithmetic.sql | 26 +++ .../sql-tests/inputs/group-by-ordinal.sql | 50 + .../sql-tests/inputs/order-by-ordinal.sql | 36 +++ .../sql-tests/results/arithmetic.sql.out| 178 +++ .../sql-tests/results/group-by-ordinal.sql.out | 168 ++ .../sql-tests/results/order-by-ordinal.sql.out | 143 .../org/apache/spark/sql/SQLQuerySuite.scala| 220 --- 8 files changed, 613 insertions(+), 232 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6bf20cd9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 660f523..57c3d9a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -547,8 +547,7 @@ class Analyzer( case a: Aggregate if containsStar(a.aggregateExpressions) => if (conf.groupByOrdinal && a.groupingExpressions.exists(IntegerIndex.unapply(_).nonEmpty)) { failAnalysis( -"Group by position: star is not allowed to use in the select list " + - "when using ordinals in group by") +"Star (*) is not allowed in select list when GROUP BY ordinal position is used") } else { a.copy(aggregateExpressions = buildExpandedProjectList(a.aggregateExpressions, a.child)) } @@ -723,9 +722,9 @@ class Analyzer( if (index > 0 && index <= child.output.size) { SortOrder(child.output(index - 1), direction) } else { - throw new UnresolvedException(s, -s"Order/sort By position: $index does not exist " + -s"The Select List is indexed from 1 to ${child.output.size}") + s.failAnalysis( +s"ORDER BY position $index is not in select list " + + s"(valid range is [1, ${child.output.size}])") } case o => o } @@ -737,17 +736,18 @@ class Analyzer( if conf.groupByOrdinal && aggs.forall(_.resolved) && groups.exists(IntegerIndex.unapply(_).nonEmpty) => val newGroups = groups.map { - case IntegerIndex(index) if index > 0 && index <= aggs.size => + case ordinal @ IntegerIndex(index) if index > 0 && index <= aggs.size => aggs(index - 1) match { case e if ResolveAggregateFunctions.containsAggregate(e) => -throw new UnresolvedException(a, - s"Group by position: the '$index'th column in the select contains an " + - s"aggregate function: ${e.sql}. 
Aggregate functions are not allowed in GROUP BY") +ordinal.failAnalysis( + s"GROUP BY position $index is an aggregate function, and " + +"aggregate functions are not allowed in GROUP BY") case o => o } - case IntegerIndex(index) => -throw new UnresolvedException(a, - s"Group by position: '$index' exceeds the size of the select list '${aggs.size}'.") + case ordinal @ IntegerIndex(index) => +ordinal.failAnalysis( + s"GROUP BY position $index is not in select list " + +s"(valid range is [1, ${aggs.size}])") case o => o } Aggregate(newGroups,
spark git commit: [SPARK-17015][SQL] group-by/order-by ordinal and arithmetic tests
Repository: spark Updated Branches: refs/heads/master 0db373aaf -> a7b02db45 [SPARK-17015][SQL] group-by/order-by ordinal and arithmetic tests ## What changes were proposed in this pull request? This patch adds three test files: 1. arithmetic.sql.out 2. order-by-ordinal.sql 3. group-by-ordinal.sql This includes https://github.com/apache/spark/pull/14594. ## How was this patch tested? This is a test case change. Author: petermaxleeCloses #14595 from petermaxlee/SPARK-17015. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a7b02db4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a7b02db4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a7b02db4 Branch: refs/heads/master Commit: a7b02db457d5fc663ce6a1ef01bf04689870e6b4 Parents: 0db373a Author: petermaxlee Authored: Thu Aug 11 01:43:08 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 11 01:43:08 2016 -0700 -- .../spark/sql/catalyst/analysis/Analyzer.scala | 24 +- .../resources/sql-tests/inputs/arithmetic.sql | 26 +++ .../sql-tests/inputs/group-by-ordinal.sql | 50 + .../sql-tests/inputs/order-by-ordinal.sql | 36 +++ .../sql-tests/results/arithmetic.sql.out| 178 +++ .../sql-tests/results/group-by-ordinal.sql.out | 168 ++ .../sql-tests/results/order-by-ordinal.sql.out | 143 .../org/apache/spark/sql/SQLQuerySuite.scala| 220 --- 8 files changed, 613 insertions(+), 232 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a7b02db4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 25202b5..14a2a32 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -547,8 +547,7 @@ class Analyzer( case a: Aggregate if containsStar(a.aggregateExpressions) => if (conf.groupByOrdinal && a.groupingExpressions.exists(IntegerIndex.unapply(_).nonEmpty)) { failAnalysis( -"Group by position: star is not allowed to use in the select list " + - "when using ordinals in group by") +"Star (*) is not allowed in select list when GROUP BY ordinal position is used") } else { a.copy(aggregateExpressions = buildExpandedProjectList(a.aggregateExpressions, a.child)) } @@ -723,9 +722,9 @@ class Analyzer( if (index > 0 && index <= child.output.size) { SortOrder(child.output(index - 1), direction) } else { - throw new UnresolvedException(s, -s"Order/sort By position: $index does not exist " + -s"The Select List is indexed from 1 to ${child.output.size}") + s.failAnalysis( +s"ORDER BY position $index is not in select list " + + s"(valid range is [1, ${child.output.size}])") } case o => o } @@ -737,17 +736,18 @@ class Analyzer( if conf.groupByOrdinal && aggs.forall(_.resolved) && groups.exists(IntegerIndex.unapply(_).nonEmpty) => val newGroups = groups.map { - case IntegerIndex(index) if index > 0 && index <= aggs.size => + case ordinal @ IntegerIndex(index) if index > 0 && index <= aggs.size => aggs(index - 1) match { case e if ResolveAggregateFunctions.containsAggregate(e) => -throw new UnresolvedException(a, - s"Group by position: the '$index'th column in the select contains an " + - s"aggregate function: ${e.sql}. 
Aggregate functions are not allowed in GROUP BY") +ordinal.failAnalysis( + s"GROUP BY position $index is an aggregate function, and " + +"aggregate functions are not allowed in GROUP BY") case o => o } - case IntegerIndex(index) => -throw new UnresolvedException(a, - s"Group by position: '$index' exceeds the size of the select list '${aggs.size}'.") + case ordinal @ IntegerIndex(index) => +ordinal.failAnalysis( + s"GROUP BY position $index is not in select list " + +s"(valid range is [1, ${aggs.size}])") case o => o } Aggregate(newGroups, aggs, child)
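The reworked error messages above are easiest to see from the SQL side. A minimal sketch, assuming a local SparkSession and a throwaway view (names are illustrative, not part of the patch):

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

object GroupByOrdinalSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("group-by-ordinal").getOrCreate()
    spark.range(10).selectExpr("id % 3 AS k", "id AS v").createOrReplaceTempView("t")

    // With spark.sql.groupByOrdinal (default true), GROUP BY 1 means
    // "group by the first select-list item", i.e. k.
    spark.sql("SELECT k, sum(v) FROM t GROUP BY 1").show()

    // Position 3 is outside the two-column select list, so analysis should now fail with
    // "GROUP BY position 3 is not in select list (valid range is [1, 2])".
    try spark.sql("SELECT k, sum(v) FROM t GROUP BY 3").show()
    catch { case e: AnalysisException => println(e.getMessage) }

    spark.stop()
  }
}
```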
spark git commit: [SPARK-17010][MINOR][DOC] Wrong description in memory management document
Repository: spark Updated Branches: refs/heads/master 665e17532 -> 7a6a3c3fb [SPARK-17010][MINOR][DOC] Wrong description in memory management document ## What changes were proposed in this pull request? change the remain percent to right one. ## How was this patch tested? Manual review Author: Tao WangCloses #14591 from WangTaoTheTonic/patch-1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7a6a3c3f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7a6a3c3f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7a6a3c3f Branch: refs/heads/master Commit: 7a6a3c3fbcea889ca20beae9d4198df2fe53bd1b Parents: 665e175 Author: Tao Wang Authored: Wed Aug 10 22:30:18 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 10 22:30:18 2016 -0700 -- docs/tuning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7a6a3c3f/docs/tuning.md -- diff --git a/docs/tuning.md b/docs/tuning.md index 1ed1409..976f2eb 100644 --- a/docs/tuning.md +++ b/docs/tuning.md @@ -115,7 +115,7 @@ Although there are two relevant configurations, the typical user should not need as the default values are applicable to most workloads: * `spark.memory.fraction` expresses the size of `M` as a fraction of the (JVM heap space - 300MB) -(default 0.6). The rest of the space (25%) is reserved for user data structures, internal +(default 0.6). The rest of the space (40%) is reserved for user data structures, internal metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually large records. * `spark.memory.storageFraction` expresses the size of `R` as a fraction of `M` (default 0.5). - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17010][MINOR][DOC] Wrong description in memory management document
Repository: spark Updated Branches: refs/heads/branch-2.0 d3a30d2f0 -> 1e4013571 [SPARK-17010][MINOR][DOC] Wrong description in memory management document ## What changes were proposed in this pull request? change the remain percent to right one. ## How was this patch tested? Manual review Author: Tao WangCloses #14591 from WangTaoTheTonic/patch-1. (cherry picked from commit 7a6a3c3fbcea889ca20beae9d4198df2fe53bd1b) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1e401357 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1e401357 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1e401357 Branch: refs/heads/branch-2.0 Commit: 1e4013571b18ca337ea664838f7f8e781c8de7aa Parents: d3a30d2 Author: Tao Wang Authored: Wed Aug 10 22:30:18 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 10 22:30:25 2016 -0700 -- docs/tuning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1e401357/docs/tuning.md -- diff --git a/docs/tuning.md b/docs/tuning.md index 1ed1409..976f2eb 100644 --- a/docs/tuning.md +++ b/docs/tuning.md @@ -115,7 +115,7 @@ Although there are two relevant configurations, the typical user should not need as the default values are applicable to most workloads: * `spark.memory.fraction` expresses the size of `M` as a fraction of the (JVM heap space - 300MB) -(default 0.6). The rest of the space (25%) is reserved for user data structures, internal +(default 0.6). The rest of the space (40%) is reserved for user data structures, internal metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually large records. * `spark.memory.storageFraction` expresses the size of `R` as a fraction of `M` (default 0.5). - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
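The corrected figure is simply the complement of the default: with `spark.memory.fraction` at 0.6, the unified region `M` takes 60% of (heap - 300MB) and the remaining 40% is left for user data structures. A back-of-the-envelope sketch with an assumed 4 GB heap (illustrative arithmetic only, not Spark's actual accounting code):

```scala
object MemoryFractionSketch {
  def main(args: Array[String]): Unit = {
    val heapBytes      = 4L * 1024 * 1024 * 1024   // assumed 4 GB executor heap
    val reservedBytes  = 300L * 1024 * 1024        // fixed 300 MB reservation from the docs
    val memoryFraction = 0.6                       // spark.memory.fraction default

    val usable   = heapBytes - reservedBytes
    val unifiedM = (usable * memoryFraction).toLong // execution + storage region ("M")
    val userRest = usable - unifiedM                // the 40% the doc fix talks about

    println(f"M = ${unifiedM / 1024 / 1024}%d MB, user space = ${userRest / 1024 / 1024}%d MB " +
      f"(${userRest.toDouble / usable * 100}%.0f%% of usable heap)")
  }
}
```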
spark git commit: [SPARK-17007][SQL] Move test data files into a test-data folder
Repository: spark Updated Branches: refs/heads/master 425c7c2db -> 665e17532 [SPARK-17007][SQL] Move test data files into a test-data folder ## What changes were proposed in this pull request? This patch moves all the test data files in sql/core/src/test/resources to sql/core/src/test/resources/test-data, so we don't clutter the top level sql/core/src/test/resources. Also deleted sql/core/src/test/resources/old-repeated.parquet since it is no longer used. The change will make it easier to spot sql-tests directory. ## How was this patch tested? This is a test-only change. Author: petermaxleeCloses #14589 from petermaxlee/SPARK-17007. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/665e1753 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/665e1753 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/665e1753 Branch: refs/heads/master Commit: 665e175328130ab3eb0370cdd2a43ed5a7bed1d6 Parents: 425c7c2 Author: petermaxlee Authored: Wed Aug 10 21:26:46 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 10 21:26:46 2016 -0700 -- .../apache/spark/sql/JavaDataFrameSuite.java| 12 +++ sql/core/src/test/resources/bool.csv| 5 --- .../src/test/resources/cars-alternative.csv | 5 --- .../test/resources/cars-blank-column-name.csv | 3 -- sql/core/src/test/resources/cars-malformed.csv | 6 sql/core/src/test/resources/cars-null.csv | 6 .../test/resources/cars-unbalanced-quotes.csv | 4 --- sql/core/src/test/resources/cars.csv| 7 sql/core/src/test/resources/cars.tsv| 4 --- sql/core/src/test/resources/cars_iso-8859-1.csv | 6 sql/core/src/test/resources/comments.csv| 6 sql/core/src/test/resources/dates.csv | 4 --- .../src/test/resources/dec-in-fixed-len.parquet | Bin 460 -> 0 bytes sql/core/src/test/resources/dec-in-i32.parquet | Bin 420 -> 0 bytes sql/core/src/test/resources/dec-in-i64.parquet | Bin 437 -> 0 bytes sql/core/src/test/resources/decimal.csv | 7 .../src/test/resources/disable_comments.csv | 2 -- sql/core/src/test/resources/empty.csv | 0 .../test/resources/nested-array-struct.parquet | Bin 775 -> 0 bytes sql/core/src/test/resources/numbers.csv | 9 - .../src/test/resources/old-repeated-int.parquet | Bin 389 -> 0 bytes .../test/resources/old-repeated-message.parquet | Bin 600 -> 0 bytes .../src/test/resources/old-repeated.parquet | Bin 432 -> 0 bytes .../parquet-thrift-compat.snappy.parquet| Bin 10550 -> 0 bytes .../resources/proto-repeated-string.parquet | Bin 411 -> 0 bytes .../resources/proto-repeated-struct.parquet | Bin 608 -> 0 bytes .../proto-struct-with-array-many.parquet| Bin 802 -> 0 bytes .../resources/proto-struct-with-array.parquet | Bin 1576 -> 0 bytes sql/core/src/test/resources/simple_sparse.csv | 5 --- sql/core/src/test/resources/test-data/bool.csv | 5 +++ .../resources/test-data/cars-alternative.csv| 5 +++ .../test-data/cars-blank-column-name.csv| 3 ++ .../test/resources/test-data/cars-malformed.csv | 6 .../src/test/resources/test-data/cars-null.csv | 6 .../test-data/cars-unbalanced-quotes.csv| 4 +++ sql/core/src/test/resources/test-data/cars.csv | 7 sql/core/src/test/resources/test-data/cars.tsv | 4 +++ .../resources/test-data/cars_iso-8859-1.csv | 6 .../src/test/resources/test-data/comments.csv | 6 sql/core/src/test/resources/test-data/dates.csv | 4 +++ .../test-data/dec-in-fixed-len.parquet | Bin 0 -> 460 bytes .../test/resources/test-data/dec-in-i32.parquet | Bin 0 -> 420 bytes .../test/resources/test-data/dec-in-i64.parquet | Bin 0 -> 437 bytes .../src/test/resources/test-data/decimal.csv| 7 
.../resources/test-data/disable_comments.csv| 2 ++ sql/core/src/test/resources/test-data/empty.csv | 0 .../test-data/nested-array-struct.parquet | Bin 0 -> 775 bytes .../src/test/resources/test-data/numbers.csv| 9 + .../test-data/old-repeated-int.parquet | Bin 0 -> 389 bytes .../test-data/old-repeated-message.parquet | Bin 0 -> 600 bytes .../parquet-thrift-compat.snappy.parquet| Bin 0 -> 10550 bytes .../test-data/proto-repeated-string.parquet | Bin 0 -> 411 bytes .../test-data/proto-repeated-struct.parquet | Bin 0 -> 608 bytes .../proto-struct-with-array-many.parquet| Bin 0 -> 802 bytes .../test-data/proto-struct-with-array.parquet | Bin 0 -> 1576 bytes .../test/resources/test-data/simple_sparse.csv | 5 +++ .../text-partitioned/year=2014/data.txt | 1 + .../text-partitioned/year=2015/data.txt | 1 +
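For tests that consume these files the only visible difference is the new prefix: resources are now looked up under `test-data/` instead of at the top of the resources directory. A hypothetical lookup following the new layout (purely illustrative, not code from the patch):

```scala
import org.apache.spark.sql.SparkSession

object TestDataLookupSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("test-data").getOrCreate()

    // Resolve the relocated resource from the test classpath; before this change the
    // file lived at "cars.csv", now it is "test-data/cars.csv".
    val url = Thread.currentThread().getContextClassLoader.getResource("test-data/cars.csv")
    spark.read.csv(url.toString).show()

    spark.stop()
  }
}
```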
spark git commit: [SPARK-17008][SPARK-17009][SQL] Normalization and isolation in SQLQueryTestSuite.
Repository: spark Updated Branches: refs/heads/master ab648c000 -> 425c7c2db [SPARK-17008][SPARK-17009][SQL] Normalization and isolation in SQLQueryTestSuite. ## What changes were proposed in this pull request? This patch enhances SQLQueryTestSuite in two ways: 1. SPARK-17009: Use a new SparkSession for each test case to provide stronger isolation (e.g. config changes in one test case does not impact another). That said, we do not currently isolate catalog changes. 2. SPARK-17008: Normalize query output using sorting, inspired by HiveComparisonTest. I also ported a few new test cases over from SQLQuerySuite. ## How was this patch tested? This is a test harness update. Author: petermaxleeCloses #14590 from petermaxlee/SPARK-17008. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/425c7c2d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/425c7c2d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/425c7c2d Branch: refs/heads/master Commit: 425c7c2dbd2923094712e1215dd29272fb09cd79 Parents: ab648c0 Author: petermaxlee Authored: Wed Aug 10 21:05:32 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 10 21:05:32 2016 -0700 -- .../resources/sql-tests/inputs/datetime.sql | 4 ++ .../test/resources/sql-tests/inputs/having.sql | 15 + .../resources/sql-tests/inputs/natural-join.sql | 20 ++ .../sql-tests/results/datetime.sql.out | 10 +++ .../resources/sql-tests/results/having.sql.out | 40 .../sql-tests/results/natural-join.sql.out | 64 .../org/apache/spark/sql/SQLQuerySuite.scala| 62 --- .../apache/spark/sql/SQLQueryTestSuite.scala| 30 - 8 files changed, 180 insertions(+), 65 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/425c7c2d/sql/core/src/test/resources/sql-tests/inputs/datetime.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/datetime.sql b/sql/core/src/test/resources/sql-tests/inputs/datetime.sql new file mode 100644 index 000..3fd1c37 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/datetime.sql @@ -0,0 +1,4 @@ +-- date time functions + +-- [SPARK-16836] current_date and current_timestamp literals +select current_date = current_date(), current_timestamp = current_timestamp(); http://git-wip-us.apache.org/repos/asf/spark/blob/425c7c2d/sql/core/src/test/resources/sql-tests/inputs/having.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/having.sql b/sql/core/src/test/resources/sql-tests/inputs/having.sql new file mode 100644 index 000..364c022 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/having.sql @@ -0,0 +1,15 @@ +create temporary view hav as select * from values + ("one", 1), + ("two", 2), + ("three", 3), + ("one", 5) + as hav(k, v); + +-- having clause +SELECT k, sum(v) FROM hav GROUP BY k HAVING sum(v) > 2; + +-- having condition contains grouping column +SELECT count(k) FROM hav GROUP BY v + 1 HAVING v + 1 = 2; + +-- SPARK-11032: resolve having correctly +SELECT MIN(t.v) FROM (SELECT * FROM hav WHERE v > 0) t HAVING(COUNT(1) > 0); http://git-wip-us.apache.org/repos/asf/spark/blob/425c7c2d/sql/core/src/test/resources/sql-tests/inputs/natural-join.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/natural-join.sql b/sql/core/src/test/resources/sql-tests/inputs/natural-join.sql new file mode 100644 index 000..71a5015 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/natural-join.sql @@ -0,0 +1,20 @@ +create temporary view nt1 as select * from values + ("one", 1), + ("two", 2), + ("three", 3) + as 
nt1(k, v1); + +create temporary view nt2 as select * from values + ("one", 1), + ("two", 22), + ("one", 5) + as nt2(k, v2); + + +SELECT * FROM nt1 natural join nt2 where k = "one"; + +SELECT * FROM nt1 natural left join nt2 order by v1, v2; + +SELECT * FROM nt1 natural right join nt2 order by v1, v2; + +SELECT count(*) FROM nt1 natural full outer join nt2; http://git-wip-us.apache.org/repos/asf/spark/blob/425c7c2d/sql/core/src/test/resources/sql-tests/results/datetime.sql.out -- diff --git a/sql/core/src/test/resources/sql-tests/results/datetime.sql.out b/sql/core/src/test/resources/sql-tests/results/datetime.sql.out new file mode 100644 index 000..5174657 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/results/datetime.sql.out @@ -0,0 +1,10 @@ +-- Automatically generated by
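The isolation half of the change comes down to running each test case in its own session so that a `SET` in one `.sql` file cannot leak into the next, while the normalization half sorts the collected output before comparing it to the golden file. A rough sketch of both ideas (not the actual SQLQueryTestSuite code):

```scala
import org.apache.spark.sql.SparkSession

object IsolationAndSortingSketch {
  def main(args: Array[String]): Unit = {
    val base = SparkSession.builder().master("local[1]").appName("sql-query-test").getOrCreate()

    // newSession() shares the SparkContext and the catalog, but gets its own SQLConf and
    // temporary views -- the "config changes don't leak" property the suite relies on.
    val caseOne = base.newSession()
    caseOne.sql("SET spark.sql.shuffle.partitions=7")

    val caseTwo = base.newSession()
    println(caseTwo.conf.get("spark.sql.shuffle.partitions"))   // still the default, not 7

    // Normalization: sort the stringified rows so golden files don't depend on output order.
    val normalized = caseTwo.range(5).selectExpr("id % 2 AS k").collect().map(_.toString).sorted
    normalized.foreach(println)

    base.stop()
  }
}
```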
spark git commit: Fixed typo
Repository: spark Updated Branches: refs/heads/master 121643bc7 -> 9dc3e602d Fixed typo ## What changes were proposed in this pull request? Fixed small typo - "value ... ~~in~~ is null" ## How was this patch tested? Still compiles! Author: Michał KiełbowiczCloses #14569 from jupblb/typo-fix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9dc3e602 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9dc3e602 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9dc3e602 Branch: refs/heads/master Commit: 9dc3e602d77ccdf670f1b6648e5674066d189cc0 Parents: 121643b Author: Michał Kiełbowicz Authored: Tue Aug 9 23:01:50 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 9 23:01:50 2016 -0700 -- sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9dc3e602/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala index d83eef7..e16850e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala @@ -463,6 +463,6 @@ trait Row extends Serializable { * @throws NullPointerException when value is null. */ private def getAnyValAs[T <: AnyVal](i: Int): T = -if (isNullAt(i)) throw new NullPointerException(s"Value at index $i in null") +if (isNullAt(i)) throw new NullPointerException(s"Value at index $i is null") else getAs[T](i) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: Fixed typo
Repository: spark Updated Branches: refs/heads/branch-2.0 2d136dba4 -> 475ee3815 Fixed typo ## What changes were proposed in this pull request? Fixed small typo - "value ... ~~in~~ is null" ## How was this patch tested? Still compiles! Author: Michał KiełbowiczCloses #14569 from jupblb/typo-fix. (cherry picked from commit 9dc3e602d77ccdf670f1b6648e5674066d189cc0) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/475ee381 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/475ee381 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/475ee381 Branch: refs/heads/branch-2.0 Commit: 475ee38150ee5a234156a903e4de227954b0063e Parents: 2d136db Author: Michał Kiełbowicz Authored: Tue Aug 9 23:01:50 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 9 23:01:57 2016 -0700 -- sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/475ee381/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala index d83eef7..e16850e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala @@ -463,6 +463,6 @@ trait Row extends Serializable { * @throws NullPointerException when value is null. */ private def getAnyValAs[T <: AnyVal](i: Int): T = -if (isNullAt(i)) throw new NullPointerException(s"Value at index $i in null") +if (isNullAt(i)) throw new NullPointerException(s"Value at index $i is null") else getAs[T](i) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
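The corrected wording is what surfaces whenever a primitive getter such as `getInt` hits a NULL column, since those getters funnel through `getAnyValAs`. A small illustration, assuming a local session (the exception text shown is the one fixed above):

```scala
import org.apache.spark.sql.SparkSession

object NullPrimitiveGetterSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("row-null").getOrCreate()
    val row = spark.sql("SELECT cast(null AS int) AS v").collect().head

    // Primitive getters cannot represent NULL, so this should raise
    // java.lang.NullPointerException: "Value at index 0 is null".
    try row.getInt(0)
    catch { case e: NullPointerException => println(e.getMessage) }

    // Checking isNullAt first (or using a boxed getAs) avoids the exception entirely.
    println(row.isNullAt(0))
    spark.stop()
  }
}
```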
spark git commit: [SPARK-16563][SQL] fix spark sql thrift server FetchResults bug
Repository: spark Updated Branches: refs/heads/branch-2.0 6fc54b776 -> 601c649d0 [SPARK-16563][SQL] fix spark sql thrift server FetchResults bug ## What changes were proposed in this pull request? Add a constant iterator which point to head of result. The header will be used to reset iterator when fetch result from first row repeatedly. JIRA ticket https://issues.apache.org/jira/browse/SPARK-16563 ## How was this patch tested? This bug was found when using Cloudera HUE connecting to spark sql thrift server, currently SQL statement result can be only fetched for once. The fix was tested manually with Cloudera HUE, With this fix, HUE can fetch spark SQL results repeatedly through thrift server. Author: AliceAuthor: Alice Closes #14218 from alicegugu/SparkSQLFetchResultsBug. (cherry picked from commit e17a76efdb44837c38388a4d0e62436065cd4dc9) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/601c649d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/601c649d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/601c649d Branch: refs/heads/branch-2.0 Commit: 601c649d0134e6791f1c0e0aaa25d6aad3c541d4 Parents: 6fc54b7 Author: Alice Authored: Mon Aug 8 18:00:04 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 8 18:00:58 2016 -0700 -- .../SparkExecuteStatementOperation.scala| 12 + .../thriftserver/HiveThriftServer2Suites.scala | 48 2 files changed, 60 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/601c649d/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala -- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala index e8bcdd7..b2717ec 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala @@ -51,6 +51,7 @@ private[hive] class SparkExecuteStatementOperation( private var result: DataFrame = _ private var iter: Iterator[SparkRow] = _ + private var iterHeader: Iterator[SparkRow] = _ private var dataTypes: Array[DataType] = _ private var statementId: String = _ @@ -110,6 +111,14 @@ private[hive] class SparkExecuteStatementOperation( assertState(OperationState.FINISHED) setHasResultSet(true) val resultRowSet: RowSet = RowSetFactory.create(getResultSetSchema, getProtocolVersion) + +// Reset iter to header when fetching start from first row +if (order.equals(FetchOrientation.FETCH_FIRST)) { + val (ita, itb) = iterHeader.duplicate + iter = ita + iterHeader = itb +} + if (!iter.hasNext) { resultRowSet } else { @@ -228,6 +237,9 @@ private[hive] class SparkExecuteStatementOperation( result.collect().iterator } } + val (itra, itrb) = iter.duplicate + iterHeader = itra + iter = itrb dataTypes = result.queryExecution.analyzed.output.map(_.dataType).toArray } catch { case e: HiveSQLException => http://git-wip-us.apache.org/repos/asf/spark/blob/601c649d/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala -- diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala 
b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala index e388c2a..8f2c4fa 100644 --- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala @@ -36,6 +36,8 @@ import org.apache.hive.service.auth.PlainSaslHelper import org.apache.hive.service.cli.GetInfoType import org.apache.hive.service.cli.thrift.TCLIService.Client import org.apache.hive.service.cli.thrift.ThriftCLIServiceClient +import org.apache.hive.service.cli.FetchOrientation +import org.apache.hive.service.cli.FetchType import org.apache.thrift.protocol.TBinaryProtocol import org.apache.thrift.transport.TSocket import org.scalatest.BeforeAndAfterAll @@ -91,6 +93,52 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest { } }
spark git commit: [SPARK-16563][SQL] fix spark sql thrift server FetchResults bug
Repository: spark Updated Branches: refs/heads/master bca43cd63 -> e17a76efd [SPARK-16563][SQL] fix spark sql thrift server FetchResults bug ## What changes were proposed in this pull request? Add a constant iterator which point to head of result. The header will be used to reset iterator when fetch result from first row repeatedly. JIRA ticket https://issues.apache.org/jira/browse/SPARK-16563 ## How was this patch tested? This bug was found when using Cloudera HUE connecting to spark sql thrift server, currently SQL statement result can be only fetched for once. The fix was tested manually with Cloudera HUE, With this fix, HUE can fetch spark SQL results repeatedly through thrift server. Author: AliceAuthor: Alice Closes #14218 from alicegugu/SparkSQLFetchResultsBug. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e17a76ef Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e17a76ef Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e17a76ef Branch: refs/heads/master Commit: e17a76efdb44837c38388a4d0e62436065cd4dc9 Parents: bca43cd Author: Alice Authored: Mon Aug 8 18:00:04 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 8 18:00:04 2016 -0700 -- .../SparkExecuteStatementOperation.scala| 12 + .../thriftserver/HiveThriftServer2Suites.scala | 48 2 files changed, 60 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e17a76ef/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala -- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala index e8bcdd7..b2717ec 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala @@ -51,6 +51,7 @@ private[hive] class SparkExecuteStatementOperation( private var result: DataFrame = _ private var iter: Iterator[SparkRow] = _ + private var iterHeader: Iterator[SparkRow] = _ private var dataTypes: Array[DataType] = _ private var statementId: String = _ @@ -110,6 +111,14 @@ private[hive] class SparkExecuteStatementOperation( assertState(OperationState.FINISHED) setHasResultSet(true) val resultRowSet: RowSet = RowSetFactory.create(getResultSetSchema, getProtocolVersion) + +// Reset iter to header when fetching start from first row +if (order.equals(FetchOrientation.FETCH_FIRST)) { + val (ita, itb) = iterHeader.duplicate + iter = ita + iterHeader = itb +} + if (!iter.hasNext) { resultRowSet } else { @@ -228,6 +237,9 @@ private[hive] class SparkExecuteStatementOperation( result.collect().iterator } } + val (itra, itrb) = iter.duplicate + iterHeader = itra + iter = itrb dataTypes = result.queryExecution.analyzed.output.map(_.dataType).toArray } catch { case e: HiveSQLException => http://git-wip-us.apache.org/repos/asf/spark/blob/e17a76ef/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala -- diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala index e388c2a..8f2c4fa 100644 --- 
a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala @@ -36,6 +36,8 @@ import org.apache.hive.service.auth.PlainSaslHelper import org.apache.hive.service.cli.GetInfoType import org.apache.hive.service.cli.thrift.TCLIService.Client import org.apache.hive.service.cli.thrift.ThriftCLIServiceClient +import org.apache.hive.service.cli.FetchOrientation +import org.apache.hive.service.cli.FetchType import org.apache.thrift.protocol.TBinaryProtocol import org.apache.thrift.transport.TSocket import org.scalatest.BeforeAndAfterAll @@ -91,6 +93,52 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest { } } + test("SPARK-16563 ThriftCLIService FetchResults repeat fetching result") { +withCLIServiceClient { client => + val
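The fix rests on `Iterator.duplicate`: one copy stays pinned at the first row, the other advances for ordinary fetches, and a FETCH_FIRST request re-duplicates the pinned copy so the result can be replayed without re-running the query. A standalone sketch of that pattern in plain Scala (not the thrift-server operation itself):

```scala
import scala.collection.mutable.ArrayBuffer

object RewindableFetchSketch {
  def main(args: Array[String]): Unit = {
    // iterHeader stays at the first row; iter is the cursor handed to normal fetches.
    var (iterHeader, iter) = (1 to 5).iterator.duplicate

    def fetchNext(maxRows: Int): Seq[Int] = {
      val rows = ArrayBuffer.empty[Int]
      while (rows.size < maxRows && iter.hasNext) rows += iter.next()
      rows.toSeq
    }

    // FETCH_FIRST: re-duplicate the pinned header so the next fetch starts from row one again.
    def fetchFirst(maxRows: Int): Seq[Int] = {
      val (ita, itb) = iterHeader.duplicate
      iterHeader = ita
      iter = itb
      fetchNext(maxRows)
    }

    println(fetchNext(3))   // rows 1, 2, 3
    println(fetchNext(3))   // rows 4, 5
    println(fetchFirst(3))  // rows 1, 2, 3 again -- what HUE relies on when it re-fetches
  }
}
```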
spark git commit: Update docs to include SASL support for RPC
Repository: spark Updated Branches: refs/heads/branch-2.0 9748a2928 -> 6fc54b776 Update docs to include SASL support for RPC ## What changes were proposed in this pull request? Update docs to include SASL support for RPC Evidence: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala#L63 ## How was this patch tested? Docs change only Author: Michael GummeltCloses #14549 from mgummelt/sasl. (cherry picked from commit 53d1c7877967f03cc9c8c7e7394f380d1bbefc27) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6fc54b77 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6fc54b77 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6fc54b77 Branch: refs/heads/branch-2.0 Commit: 6fc54b776419317dc55754a76b68a5ba7eecdcf3 Parents: 9748a29 Author: Michael Gummelt Authored: Mon Aug 8 16:07:51 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 8 16:08:09 2016 -0700 -- docs/configuration.md | 7 --- docs/security.md | 3 ++- 2 files changed, 6 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6fc54b77/docs/configuration.md -- diff --git a/docs/configuration.md b/docs/configuration.md index bf10b24..8facd0e 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1204,7 +1204,7 @@ Apart from these, the following properties are also available, and may be useful false Whether to use dynamic resource allocation, which scales the number of executors registered -with this application up and down based on the workload. +with this application up and down based on the workload. For more detail, see the description here. @@ -1345,8 +1345,9 @@ Apart from these, the following properties are also available, and may be useful spark.authenticate.enableSaslEncryption false -Enable encrypted communication when authentication is enabled. This option is currently -only supported by the block transfer service. +Enable encrypted communication when authentication is +enabled. This is supported by the block transfer service and the +RPC endpoints. http://git-wip-us.apache.org/repos/asf/spark/blob/6fc54b77/docs/security.md -- diff --git a/docs/security.md b/docs/security.md index d2708a8..baadfef 100644 --- a/docs/security.md +++ b/docs/security.md @@ -27,7 +27,8 @@ If your applications are using event logging, the directory where the event logs ## Encryption -Spark supports SSL for HTTP protocols. SASL encryption is supported for the block transfer service. +Spark supports SSL for HTTP protocols. SASL encryption is supported for the block transfer service +and the RPC endpoints. Encryption is not yet supported for data stored by Spark in temporary local storage, such as shuffle files, cached data, and other application files. If encrypting this data is desired, a workaround is - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: Update docs to include SASL support for RPC
Repository: spark Updated Branches: refs/heads/master 9216901d5 -> 53d1c7877 Update docs to include SASL support for RPC ## What changes were proposed in this pull request? Update docs to include SASL support for RPC Evidence: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala#L63 ## How was this patch tested? Docs change only Author: Michael GummeltCloses #14549 from mgummelt/sasl. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/53d1c787 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/53d1c787 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/53d1c787 Branch: refs/heads/master Commit: 53d1c7877967f03cc9c8c7e7394f380d1bbefc27 Parents: 9216901 Author: Michael Gummelt Authored: Mon Aug 8 16:07:51 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 8 16:07:51 2016 -0700 -- docs/configuration.md | 7 --- docs/security.md | 3 ++- 2 files changed, 6 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/53d1c787/docs/configuration.md -- diff --git a/docs/configuration.md b/docs/configuration.md index cc6b2b6..4569bed 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1211,7 +1211,7 @@ Apart from these, the following properties are also available, and may be useful false Whether to use dynamic resource allocation, which scales the number of executors registered -with this application up and down based on the workload. +with this application up and down based on the workload. For more detail, see the description here. @@ -1352,8 +1352,9 @@ Apart from these, the following properties are also available, and may be useful spark.authenticate.enableSaslEncryption false -Enable encrypted communication when authentication is enabled. This option is currently -only supported by the block transfer service. +Enable encrypted communication when authentication is +enabled. This is supported by the block transfer service and the +RPC endpoints. http://git-wip-us.apache.org/repos/asf/spark/blob/53d1c787/docs/security.md -- diff --git a/docs/security.md b/docs/security.md index d2708a8..baadfef 100644 --- a/docs/security.md +++ b/docs/security.md @@ -27,7 +27,8 @@ If your applications are using event logging, the directory where the event logs ## Encryption -Spark supports SSL for HTTP protocols. SASL encryption is supported for the block transfer service. +Spark supports SSL for HTTP protocols. SASL encryption is supported for the block transfer service +and the RPC endpoints. Encryption is not yet supported for data stored by Spark in temporary local storage, such as shuffle files, cached data, and other application files. If encrypting this data is desired, a workaround is - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
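The two settings travel together: `spark.authenticate` has to be on before the SASL option does anything, and with this doc fix the same pair is documented to cover both the block transfer service and the RPC endpoints. A minimal sketch of wiring it up programmatically (a real deployment would put these in spark-defaults.conf and manage the secret properly; the values below are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object SaslEncryptionSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .set("spark.authenticate", "true")                        // prerequisite for SASL encryption
      .set("spark.authenticate.secret", "placeholder-secret")   // YARN generates this automatically
      .set("spark.authenticate.enableSaslEncryption", "true")   // block transfer service + RPC

    val spark = SparkSession.builder()
      .master("local[2]").appName("sasl-sketch").config(conf).getOrCreate()

    println(spark.sparkContext.getConf.get("spark.authenticate.enableSaslEncryption"))
    spark.stop()
  }
}
```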
spark git commit: [SPARK-16873][CORE] Fix SpillReader NPE when spillFile has no data
Repository: spark Updated Branches: refs/heads/branch-1.6 52d8837c6 -> d2518acc1 [SPARK-16873][CORE] Fix SpillReader NPE when spillFile has no data ## What changes were proposed in this pull request? SpillReader NPE when spillFile has no data. See follow logs: 16/07/31 20:54:04 INFO collection.ExternalSorter: spill memory to file:/data4/yarnenv/local/usercache/tesla/appcache/application_1465785263942_56138/blockmgr-db5f46c3-d7a4-4f93-8b77-565e469696fb/09/temp_shuffle_ec3ece08-4569-4197-893a-4a5dfcbbf9fa, fileSize:0.0 B 16/07/31 20:54:04 WARN memory.TaskMemoryManager: leak 164.3 MB memory from org.apache.spark.util.collection.ExternalSorter3db4b52d 16/07/31 20:54:04 ERROR executor.Executor: Managed memory leak detected; size = 190458101 bytes, TID = 2358516/07/31 20:54:04 ERROR executor.Executor: Exception in task 1013.0 in stage 18.0 (TID 23585) java.lang.NullPointerException at org.apache.spark.util.collection.ExternalSorter$SpillReader.cleanup(ExternalSorter.scala:624) at org.apache.spark.util.collection.ExternalSorter$SpillReader.nextBatchStream(ExternalSorter.scala:539) at org.apache.spark.util.collection.ExternalSorter$SpillReader.(ExternalSorter.scala:507) at org.apache.spark.util.collection.ExternalSorter$SpillableIterator.spill(ExternalSorter.scala:816) at org.apache.spark.util.collection.ExternalSorter.forceSpill(ExternalSorter.scala:251) at org.apache.spark.util.collection.Spillable.spill(Spillable.scala:109) at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:154) at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:249) at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:112) at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:346) at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:367) at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:237) at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) 16/07/31 20:54:30 INFO executor.Executor: Executor is trying to kill task 1090.1 in stage 18.0 (TID 23793) 16/07/31 20:54:30 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown ## How was this patch tested? Manual test. Author: sharkdAuthor: sharkdtu Closes #14479 from sharkdtu/master. 
(cherry picked from commit 583d91a1957f4258a64184cc6b9007588791d332) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d2518acc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d2518acc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d2518acc Branch: refs/heads/branch-1.6 Commit: d2518acc1df44b1ecb8eed20404bcc1277f358a4 Parents: 52d8837 Author: sharkd Authored: Wed Aug 3 19:20:34 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 3 19:21:16 2016 -0700 -- .../scala/org/apache/spark/util/collection/ExternalSorter.scala | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d2518acc/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala b/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala index 44b1d90..60ec1ca 100644 --- a/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala +++ b/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala @@ -592,7 +592,9 @@ private[spark] class ExternalSorter[K, V, C]( val ds = deserializeStream deserializeStream = null fileStream = null - ds.close() + if (ds != null) { +ds.close() + } // NOTE: We don't do file.delete() here because that is done in ExternalSorter.stop(). // This should also be fixed in
spark git commit: [SPARK-16873][CORE] Fix SpillReader NPE when spillFile has no data
Repository: spark Updated Branches: refs/heads/branch-2.0 bb30a3d0f -> 11854e5a1 [SPARK-16873][CORE] Fix SpillReader NPE when spillFile has no data ## What changes were proposed in this pull request? SpillReader NPE when spillFile has no data. See follow logs: 16/07/31 20:54:04 INFO collection.ExternalSorter: spill memory to file:/data4/yarnenv/local/usercache/tesla/appcache/application_1465785263942_56138/blockmgr-db5f46c3-d7a4-4f93-8b77-565e469696fb/09/temp_shuffle_ec3ece08-4569-4197-893a-4a5dfcbbf9fa, fileSize:0.0 B 16/07/31 20:54:04 WARN memory.TaskMemoryManager: leak 164.3 MB memory from org.apache.spark.util.collection.ExternalSorter3db4b52d 16/07/31 20:54:04 ERROR executor.Executor: Managed memory leak detected; size = 190458101 bytes, TID = 2358516/07/31 20:54:04 ERROR executor.Executor: Exception in task 1013.0 in stage 18.0 (TID 23585) java.lang.NullPointerException at org.apache.spark.util.collection.ExternalSorter$SpillReader.cleanup(ExternalSorter.scala:624) at org.apache.spark.util.collection.ExternalSorter$SpillReader.nextBatchStream(ExternalSorter.scala:539) at org.apache.spark.util.collection.ExternalSorter$SpillReader.(ExternalSorter.scala:507) at org.apache.spark.util.collection.ExternalSorter$SpillableIterator.spill(ExternalSorter.scala:816) at org.apache.spark.util.collection.ExternalSorter.forceSpill(ExternalSorter.scala:251) at org.apache.spark.util.collection.Spillable.spill(Spillable.scala:109) at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:154) at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:249) at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:112) at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:346) at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:367) at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:237) at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) 16/07/31 20:54:30 INFO executor.Executor: Executor is trying to kill task 1090.1 in stage 18.0 (TID 23793) 16/07/31 20:54:30 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown ## How was this patch tested? Manual test. Author: sharkdAuthor: sharkdtu Closes #14479 from sharkdtu/master. 
(cherry picked from commit 583d91a1957f4258a64184cc6b9007588791d332) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/11854e5a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/11854e5a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/11854e5a Branch: refs/heads/branch-2.0 Commit: 11854e5a1baa7682d91bfce4e8bba57566f22b3a Parents: bb30a3d Author: sharkd Authored: Wed Aug 3 19:20:34 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 3 19:20:56 2016 -0700 -- .../scala/org/apache/spark/util/collection/ExternalSorter.scala | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/11854e5a/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala b/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala index 4067ace..6ea7307 100644 --- a/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala +++ b/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala @@ -622,7 +622,9 @@ private[spark] class ExternalSorter[K, V, C]( val ds = deserializeStream deserializeStream = null fileStream = null - ds.close() + if (ds != null) { +ds.close() + } // NOTE: We don't do file.delete() here because that is done in ExternalSorter.stop(). // This should also be fixed in
spark git commit: [SPARK-16873][CORE] Fix SpillReader NPE when spillFile has no data
Repository: spark Updated Branches: refs/heads/master c5eb1df72 -> 583d91a19 [SPARK-16873][CORE] Fix SpillReader NPE when spillFile has no data ## What changes were proposed in this pull request? SpillReader NPE when spillFile has no data. See follow logs: 16/07/31 20:54:04 INFO collection.ExternalSorter: spill memory to file:/data4/yarnenv/local/usercache/tesla/appcache/application_1465785263942_56138/blockmgr-db5f46c3-d7a4-4f93-8b77-565e469696fb/09/temp_shuffle_ec3ece08-4569-4197-893a-4a5dfcbbf9fa, fileSize:0.0 B 16/07/31 20:54:04 WARN memory.TaskMemoryManager: leak 164.3 MB memory from org.apache.spark.util.collection.ExternalSorter3db4b52d 16/07/31 20:54:04 ERROR executor.Executor: Managed memory leak detected; size = 190458101 bytes, TID = 2358516/07/31 20:54:04 ERROR executor.Executor: Exception in task 1013.0 in stage 18.0 (TID 23585) java.lang.NullPointerException at org.apache.spark.util.collection.ExternalSorter$SpillReader.cleanup(ExternalSorter.scala:624) at org.apache.spark.util.collection.ExternalSorter$SpillReader.nextBatchStream(ExternalSorter.scala:539) at org.apache.spark.util.collection.ExternalSorter$SpillReader.(ExternalSorter.scala:507) at org.apache.spark.util.collection.ExternalSorter$SpillableIterator.spill(ExternalSorter.scala:816) at org.apache.spark.util.collection.ExternalSorter.forceSpill(ExternalSorter.scala:251) at org.apache.spark.util.collection.Spillable.spill(Spillable.scala:109) at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:154) at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:249) at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:112) at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:346) at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:367) at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:237) at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) 16/07/31 20:54:30 INFO executor.Executor: Executor is trying to kill task 1090.1 in stage 18.0 (TID 23793) 16/07/31 20:54:30 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown ## How was this patch tested? Manual test. Author: sharkdAuthor: sharkdtu Closes #14479 from sharkdtu/master. 
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/583d91a1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/583d91a1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/583d91a1 Branch: refs/heads/master Commit: 583d91a1957f4258a64184cc6b9007588791d332 Parents: c5eb1df Author: sharkd Authored: Wed Aug 3 19:20:34 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 3 19:20:34 2016 -0700 -- .../scala/org/apache/spark/util/collection/ExternalSorter.scala | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/583d91a1/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala b/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala index 708a007..7c98e8c 100644 --- a/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala +++ b/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala @@ -611,7 +611,9 @@ private[spark] class ExternalSorter[K, V, C]( val ds = deserializeStream deserializeStream = null fileStream = null - ds.close() + if (ds != null) { +ds.close() + } // NOTE: We don't do file.delete() here because that is done in ExternalSorter.stop(). // This should also be fixed in ExternalAppendOnlyMap. } - To unsubscribe, e-mail:
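The crash came from `cleanup()` unconditionally closing a deserialize stream that was never created because the spill file was empty; the added null check is the whole fix. A generic sketch of the same defensive pattern outside Spark (assumed names, not the ExternalSorter code):

```scala
import java.io.{BufferedInputStream, File, FileInputStream, InputStream}

// A reader whose inner stream may legitimately never be opened (e.g. a zero-byte spill file).
class MaybeEmptyReader(file: File) {
  private var fileStream: InputStream = _
  private var deserializeStream: InputStream = _

  def open(): Unit = {
    fileStream = new FileInputStream(file)
    // Mirrors the failing scenario: for an empty file no batch stream is ever built,
    // so deserializeStream stays null all the way to cleanup.
    if (file.length() > 0) deserializeStream = new BufferedInputStream(fileStream)
  }

  def cleanup(): Unit = {
    val ds = deserializeStream
    val fs = fileStream
    deserializeStream = null
    fileStream = null
    if (ds != null) {   // the guard added by the patch avoids the NullPointerException
      ds.close()        // closing the wrapper also closes the underlying file stream
    } else if (fs != null) {
      fs.close()        // nothing was deserialized, but release the file handle anyway
    }
  }
}

object MaybeEmptyReaderDemo {
  def main(args: Array[String]): Unit = {
    val emptySpill = File.createTempFile("spill", ".tmp")   // fileSize: 0.0 B, as in the log
    val reader = new MaybeEmptyReader(emptySpill)
    reader.open()
    reader.cleanup()   // completes without an NPE
    emptySpill.delete()
  }
}
```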
spark git commit: [SPARK-16858][SQL][TEST] Removal of TestHiveSharedState
Repository: spark Updated Branches: refs/heads/master e9fc0b6a8 -> b73a57060 [SPARK-16858][SQL][TEST] Removal of TestHiveSharedState ### What changes were proposed in this pull request? This PR is to remove `TestHiveSharedState`. Also, this is also associated with the Hive refractoring for removing `HiveSharedState`. ### How was this patch tested? The existing test cases Author: gatorsmileCloses #14463 from gatorsmile/removeTestHiveSharedState. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b73a5706 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b73a5706 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b73a5706 Branch: refs/heads/master Commit: b73a5706032eae7c87f7f2f8b0a72e7ee6d2e7e5 Parents: e9fc0b6 Author: gatorsmile Authored: Tue Aug 2 14:17:45 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 2 14:17:45 2016 -0700 -- .../apache/spark/sql/hive/test/TestHive.scala | 78 +--- .../spark/sql/hive/ShowCreateTableSuite.scala | 2 +- 2 files changed, 20 insertions(+), 60 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b73a5706/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala index fbacd59..cdc8d61 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala @@ -24,7 +24,6 @@ import scala.collection.JavaConverters._ import scala.collection.mutable import scala.language.implicitConversions -import org.apache.hadoop.conf.Configuration import org.apache.hadoop.hive.conf.HiveConf.ConfVars import org.apache.hadoop.hive.ql.exec.FunctionRegistry import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe @@ -40,7 +39,6 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan import org.apache.spark.sql.execution.QueryExecution import org.apache.spark.sql.execution.command.CacheTableCommand import org.apache.spark.sql.hive._ -import org.apache.spark.sql.hive.client.HiveClient import org.apache.spark.sql.internal.SQLConf import org.apache.spark.util.{ShutdownHookManager, Utils} @@ -86,8 +84,6 @@ class TestHiveContext( new TestHiveContext(sparkSession.newSession()) } - override def sharedState: TestHiveSharedState = sparkSession.sharedState - override def sessionState: TestHiveSessionState = sparkSession.sessionState def setCacheTables(c: Boolean): Unit = { @@ -112,38 +108,43 @@ class TestHiveContext( * A [[SparkSession]] used in [[TestHiveContext]]. * * @param sc SparkContext - * @param scratchDirPath scratch directory used by Hive's metastore client - * @param metastoreTemporaryConf configuration options for Hive's metastore - * @param existingSharedState optional [[TestHiveSharedState]] + * @param existingSharedState optional [[HiveSharedState]] * @param loadTestTables if true, load the test tables. They can only be loaded when running * in the JVM, i.e when calling from Python this flag has to be false. 
*/ private[hive] class TestHiveSparkSession( @transient private val sc: SparkContext, -scratchDirPath: File, -metastoreTemporaryConf: Map[String, String], -@transient private val existingSharedState: Option[TestHiveSharedState], +@transient private val existingSharedState: Option[HiveSharedState], private val loadTestTables: Boolean) extends SparkSession(sc) with Logging { self => def this(sc: SparkContext, loadTestTables: Boolean) { this( sc, - TestHiveContext.makeScratchDir(), - HiveUtils.newTemporaryConfiguration(useInMemoryDerby = false), - None, + existingSharedState = None, loadTestTables) } + { // set the metastore temporary configuration +val metastoreTempConf = HiveUtils.newTemporaryConfiguration(useInMemoryDerby = false) ++ Map( + ConfVars.METASTORE_INTEGER_JDO_PUSHDOWN.varname -> "true", + // scratch directory used by Hive's metastore client + ConfVars.SCRATCHDIR.varname -> TestHiveContext.makeScratchDir().toURI.toString, + ConfVars.METASTORE_CLIENT_CONNECT_RETRY_DELAY.varname -> "1") + +metastoreTempConf.foreach { case (k, v) => + sc.hadoopConfiguration.set(k, v) +} + } + assume(sc.conf.get(CATALOG_IMPLEMENTATION) == "hive") - // TODO: Let's remove TestHiveSharedState and TestHiveSessionState. Otherwise, + // TODO: Let's remove HiveSharedState and TestHiveSessionState. Otherwise, // we are not really testing the
spark git commit: [SPARK-16855][SQL] move Greatest and Least from conditionalExpressions.scala to arithmetic.scala
Repository: spark Updated Branches: refs/heads/master cbdff4935 -> a9beeaaae [SPARK-16855][SQL] move Greatest and Least from conditionalExpressions.scala to arithmetic.scala ## What changes were proposed in this pull request? `Greatest` and `Least` are not conditional expressions, but arithmetic expressions. ## How was this patch tested? N/A Author: Wenchen FanCloses #14460 from cloud-fan/move. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a9beeaaa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a9beeaaa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a9beeaaa Branch: refs/heads/master Commit: a9beeaaaeb52e9c940fe86a3d70801655401623c Parents: cbdff49 Author: Wenchen Fan Authored: Tue Aug 2 11:08:32 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 2 11:08:32 2016 -0700 -- .../sql/catalyst/expressions/arithmetic.scala | 121 ++ .../expressions/conditionalExpressions.scala| 122 --- .../expressions/ArithmeticExpressionSuite.scala | 107 .../ConditionalExpressionSuite.scala| 107 4 files changed, 228 insertions(+), 229 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a9beeaaa/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala index 77d40a5..4aebef9 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql.catalyst.expressions import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.util.TypeUtils import org.apache.spark.sql.types._ @@ -460,3 +461,123 @@ case class Pmod(left: Expression, right: Expression) extends BinaryArithmetic wi override def sql: String = s"$prettyName(${left.sql}, ${right.sql})" } + +/** + * A function that returns the least value of all parameters, skipping null values. + * It takes at least 2 parameters, and returns null iff all parameters are null. + */ +@ExpressionDescription( + usage = "_FUNC_(n1, ...) 
- Returns the least value of all parameters, skipping null values.") +case class Least(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = children.forall(_.nullable) + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"LEAST requires at least 2 arguments") +} else if (children.map(_.dataType).distinct.count(_ != NullType) > 1) { + TypeCheckResult.TypeCheckFailure( +s"The expressions should all have the same type," + + s" got LEAST(${children.map(_.dataType.simpleString).mkString(", ")}).") +} else { + TypeUtils.checkForOrderingExpr(dataType, "function " + prettyName) +} + } + + override def dataType: DataType = children.head.dataType + + override def eval(input: InternalRow): Any = { +children.foldLeft[Any](null)((r, c) => { + val evalc = c.eval(input) + if (evalc != null) { +if (r == null || ordering.lt(evalc, r)) evalc else r + } else { +r + } +}) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val first = evalChildren(0) +val rest = evalChildren.drop(1) +def updateEval(eval: ExprCode): String = { + s""" +${eval.code} +if (!${eval.isNull} && (${ev.isNull} || + ${ctx.genGreater(dataType, ev.value, eval.value)})) { + ${ev.isNull} = false; + ${ev.value} = ${eval.value}; +} + """ +} +ev.copy(code = s""" + ${first.code} + boolean ${ev.isNull} = ${first.isNull}; + ${ctx.javaType(dataType)} ${ev.value} = ${first.value}; + ${rest.map(updateEval).mkString("\n")}""") + } +} + +/** + * A function that returns the greatest value of all parameters, skipping null values. + * It takes at least 2 parameters, and returns null iff all parameters are null. + */ +@ExpressionDescription( + usage = "_FUNC_(n1, ...) -
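Independent of which file they live in, the two expressions behave the same from SQL: they need at least two arguments, skip nulls, and return null only when every argument is null. A quick illustration with made-up values (assuming a local session):

```scala
import org.apache.spark.sql.SparkSession

object LeastGreatestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("least-greatest").getOrCreate()

    // Nulls are skipped rather than propagated, so lo = 1 and hi = 3;
    // only the all-null call returns null.
    spark.sql(
      "SELECT least(3, null, 1) AS lo, " +
        "greatest(3, null, 1) AS hi, " +
        "least(cast(null AS int), cast(null AS int)) AS all_null"
    ).show()

    spark.stop()
  }
}
```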
spark git commit: [SPARK-16850][SQL] Improve type checking error message for greatest/least
Repository: spark Updated Branches: refs/heads/branch-2.0 a937c9ee4 -> f190bb83b [SPARK-16850][SQL] Improve type checking error message for greatest/least Greatest/least function does not have the most friendly error message for data types. This patch improves the error message to not show the Seq type, and use more human readable data types. Before: ``` org.apache.spark.sql.AnalysisException: cannot resolve 'greatest(CAST(1.0 AS DECIMAL(2,1)), "1.0")' due to data type mismatch: The expressions should all have the same type, got GREATEST (ArrayBuffer(DecimalType(2,1), StringType)).; line 1 pos 7 ``` After: ``` org.apache.spark.sql.AnalysisException: cannot resolve 'greatest(CAST(1.0 AS DECIMAL(2,1)), "1.0")' due to data type mismatch: The expressions should all have the same type, got GREATEST(decimal(2,1), string).; line 1 pos 7 ``` Manually verified the output and also added unit tests to ConditionalExpressionSuite. Author: petermaxleeCloses #14453 from petermaxlee/SPARK-16850. (cherry picked from commit a1ff72e1cce6f22249ccc4905e8cef30075beb2f) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f190bb83 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f190bb83 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f190bb83 Branch: refs/heads/branch-2.0 Commit: f190bb83beaafb65c8e6290e9ecaa61ac51e04bb Parents: a937c9e Author: petermaxlee Authored: Tue Aug 2 19:32:35 2016 +0800 Committer: Reynold Xin Committed: Tue Aug 2 10:22:18 2016 -0700 -- .../catalyst/expressions/conditionalExpressions.scala | 4 ++-- .../expressions/ConditionalExpressionSuite.scala | 13 + 2 files changed, 15 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f190bb83/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala index e97e089..5f2585f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala @@ -299,7 +299,7 @@ case class Least(children: Seq[Expression]) extends Expression { } else if (children.map(_.dataType).distinct.count(_ != NullType) > 1) { TypeCheckResult.TypeCheckFailure( s"The expressions should all have the same type," + - s" got LEAST (${children.map(_.dataType)}).") + s" got LEAST(${children.map(_.dataType.simpleString).mkString(", ")}).") } else { TypeUtils.checkForOrderingExpr(dataType, "function " + prettyName) } @@ -359,7 +359,7 @@ case class Greatest(children: Seq[Expression]) extends Expression { } else if (children.map(_.dataType).distinct.count(_ != NullType) > 1) { TypeCheckResult.TypeCheckFailure( s"The expressions should all have the same type," + - s" got GREATEST (${children.map(_.dataType)}).") + s" got GREATEST(${children.map(_.dataType.simpleString).mkString(", ")}).") } else { TypeUtils.checkForOrderingExpr(dataType, "function " + prettyName) } http://git-wip-us.apache.org/repos/asf/spark/blob/f190bb83/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala index 3c581ec..36185b8 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala @@ -21,6 +21,7 @@ import java.sql.{Date, Timestamp} import org.apache.spark.SparkFunSuite import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.TypeCheckFailure import org.apache.spark.sql.catalyst.dsl.expressions._ import org.apache.spark.sql.types._ @@ -181,6 +182,12 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper Literal(Timestamp.valueOf("2015-07-01 10:00:00", Timestamp.valueOf("2015-07-01 08:00:00"), InternalRow.empty) +// Type
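For reference, the formatting change at the heart of this patch can be reproduced outside Catalyst. The sketch below is illustrative only (it is not the analyzer's code path); it assumes nothing beyond spark-catalyst on the classpath and shows why `simpleString` yields the friendlier message:

```scala
import org.apache.spark.sql.types.{DataType, DecimalType, StringType}

object GreatestMessageDemo {
  def main(args: Array[String]): Unit = {
    val childTypes: Seq[DataType] = Seq(DecimalType(2, 1), StringType)
    // Before: the raw Seq toString leaks into the message (the collection name may vary).
    println(s"got GREATEST (${childTypes}).")
    // After: simpleString renders human-readable SQL type names.
    println(s"got GREATEST(${childTypes.map(_.simpleString).mkString(", ")}).")
  }
}
```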
spark git commit: [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals
Repository: spark Updated Branches: refs/heads/master 146001a9f -> 2330f3ecb [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals ## What changes were proposed in this pull request? In Spark 1.6 (with Hive support) we could use `CURRENT_DATE` and `CURRENT_TIMESTAMP` functions as literals (without adding braces), for example: ```SQL select /* Spark 1.6: */ current_date, /* Spark 1.6 & Spark 2.0: */ current_date() ``` This was accidentally dropped in Spark 2.0. This PR reinstates this functionality. ## How was this patch tested? Added a case to ExpressionParserSuite. Author: Herman van HovellCloses #14442 from hvanhovell/SPARK-16836. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2330f3ec Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2330f3ec Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2330f3ec Branch: refs/heads/master Commit: 2330f3ecbbd89c7eaab9cc0d06726aa743b16334 Parents: 146001a Author: Herman van Hovell Authored: Tue Aug 2 10:09:47 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 2 10:09:47 2016 -0700 -- .../org/apache/spark/sql/catalyst/parser/SqlBase.g4| 5 - .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 13 + .../sql/catalyst/parser/ExpressionParserSuite.scala| 5 + .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 11 ++- 4 files changed, 32 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2330f3ec/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 5e10462..c7d5086 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -500,6 +500,7 @@ valueExpression primaryExpression : constant #constantDefault +| name=(CURRENT_DATE | CURRENT_TIMESTAMP) #timeFunctionCall | ASTERISK #star | qualifiedName '.' ASTERISK #star | '(' expression (',' expression)+ ')' #rowConstructor @@ -660,7 +661,7 @@ nonReserved | NULL | ORDER | OUTER | TABLE | TRUE | WITH | RLIKE | AND | CASE | CAST | DISTINCT | DIV | ELSE | END | FUNCTION | INTERVAL | MACRO | OR | STRATIFY | THEN | UNBOUNDED | WHEN -| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT +| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT | CURRENT_DATE | CURRENT_TIMESTAMP ; SELECT: 'SELECT'; @@ -880,6 +881,8 @@ OPTION: 'OPTION'; ANTI: 'ANTI'; LOCAL: 'LOCAL'; INPATH: 'INPATH'; +CURRENT_DATE: 'CURRENT_DATE'; +CURRENT_TIMESTAMP: 'CURRENT_TIMESTAMP'; STRING : '\'' ( ~('\''|'\\') | ('\\' .) )* '\'' http://git-wip-us.apache.org/repos/asf/spark/blob/2330f3ec/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index f2cc8d3..679adf2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -1023,6 +1023,19 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { } /** + * Create a current timestamp/date expression. 
These are different from regular function because + * they do not require the user to specify braces when calling them. + */ + override def visitTimeFunctionCall(ctx: TimeFunctionCallContext): Expression = withOrigin(ctx) { +ctx.name.getType match { + case SqlBaseParser.CURRENT_DATE => +CurrentDate() + case SqlBaseParser.CURRENT_TIMESTAMP => +CurrentTimestamp() +} + } + + /** * Create a function database (optional) and name pair. */ protected def visitFunctionName(ctx: QualifiedNameContext): FunctionIdentifier = {
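To see the reinstated syntax end to end, a minimal sketch (it assumes a local `SparkSession`; output column names may differ across versions):

```scala
import org.apache.spark.sql.SparkSession

object CurrentDateLiteralDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("current-date-demo").getOrCreate()
    // Both the brace-less literal form and the function form now parse.
    spark.sql("SELECT current_date, current_date(), current_timestamp, current_timestamp()").show()
    spark.stop()
  }
}
```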
spark git commit: [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals
Repository: spark Updated Branches: refs/heads/branch-2.0 ef7927e8e -> a937c9ee4 [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals ## What changes were proposed in this pull request? In Spark 1.6 (with Hive support) we could use `CURRENT_DATE` and `CURRENT_TIMESTAMP` functions as literals (without adding braces), for example: ```SQL select /* Spark 1.6: */ current_date, /* Spark 1.6 & Spark 2.0: */ current_date() ``` This was accidentally dropped in Spark 2.0. This PR reinstates this functionality. ## How was this patch tested? Added a case to ExpressionParserSuite. Author: Herman van HovellCloses #14442 from hvanhovell/SPARK-16836. (cherry picked from commit 2330f3ecbbd89c7eaab9cc0d06726aa743b16334) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a937c9ee Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a937c9ee Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a937c9ee Branch: refs/heads/branch-2.0 Commit: a937c9ee44e0766194fc8ca4bce2338453112a53 Parents: ef7927e Author: Herman van Hovell Authored: Tue Aug 2 10:09:47 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 2 10:09:53 2016 -0700 -- .../org/apache/spark/sql/catalyst/parser/SqlBase.g4| 5 - .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 13 + .../sql/catalyst/parser/ExpressionParserSuite.scala| 5 + .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 11 ++- 4 files changed, 32 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a937c9ee/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 4c15f9c..de98a87 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -493,6 +493,7 @@ valueExpression primaryExpression : constant #constantDefault +| name=(CURRENT_DATE | CURRENT_TIMESTAMP) #timeFunctionCall | ASTERISK #star | qualifiedName '.' ASTERISK #star | '(' expression (',' expression)+ ')' #rowConstructor @@ -653,7 +654,7 @@ nonReserved | NULL | ORDER | OUTER | TABLE | TRUE | WITH | RLIKE | AND | CASE | CAST | DISTINCT | DIV | ELSE | END | FUNCTION | INTERVAL | MACRO | OR | STRATIFY | THEN | UNBOUNDED | WHEN -| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT +| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT | CURRENT_DATE | CURRENT_TIMESTAMP ; SELECT: 'SELECT'; @@ -873,6 +874,8 @@ OPTION: 'OPTION'; ANTI: 'ANTI'; LOCAL: 'LOCAL'; INPATH: 'INPATH'; +CURRENT_DATE: 'CURRENT_DATE'; +CURRENT_TIMESTAMP: 'CURRENT_TIMESTAMP'; STRING : '\'' ( ~('\''|'\\') | ('\\' .) 
)* '\'' http://git-wip-us.apache.org/repos/asf/spark/blob/a937c9ee/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index c7420a1..1a0e7ab 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -1023,6 +1023,19 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { } /** + * Create a current timestamp/date expression. These are different from regular function because + * they do not require the user to specify braces when calling them. + */ + override def visitTimeFunctionCall(ctx: TimeFunctionCallContext): Expression = withOrigin(ctx) { +ctx.name.getType match { + case SqlBaseParser.CURRENT_DATE => +CurrentDate() + case SqlBaseParser.CURRENT_TIMESTAMP => +CurrentTimestamp() +} + } + + /** * Create a function database (optional) and name pair. */ protected def visitFunctionName(ctx: QualifiedNameContext):
spark git commit: [SPARK-16793][SQL] Set the temporary warehouse path to sc's conf in TestHive.
Repository: spark Updated Branches: refs/heads/master 2eedc00b0 -> 5184df06b [SPARK-16793][SQL] Set the temporary warehouse path to sc'conf in TestHive. ## What changes were proposed in this pull request? With SPARK-15034, we could use the value of spark.sql.warehouse.dir to set the warehouse location. In TestHive, we can now simply set the temporary warehouse path in sc's conf, and thus, param "warehousePath" could be removed. ## How was this patch tested? exsiting testsuites. Author: jiangxingboCloses #14401 from jiangxb1987/warehousePath. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5184df06 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5184df06 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5184df06 Branch: refs/heads/master Commit: 5184df06b347f86776c8ac87415b8002a5942a35 Parents: 2eedc00 Author: jiangxingbo Authored: Mon Aug 1 23:08:06 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 1 23:08:06 2016 -0700 -- .../apache/spark/sql/hive/test/TestHive.scala | 42 +--- .../sql/hive/execution/HiveQuerySuite.scala | 2 +- .../spark/sql/sources/BucketedReadSuite.scala | 2 +- 3 files changed, 21 insertions(+), 25 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5184df06/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala index 7f89204..fbacd59 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala @@ -54,6 +54,7 @@ object TestHive .set("spark.sql.test", "") .set("spark.sql.hive.metastore.barrierPrefixes", "org.apache.spark.sql.hive.execution.PairSerDe") +.set("spark.sql.warehouse.dir", TestHiveContext.makeWarehouseDir().toURI.getPath) // SPARK-8910 .set("spark.ui.enabled", "false"))) @@ -111,7 +112,6 @@ class TestHiveContext( * A [[SparkSession]] used in [[TestHiveContext]]. * * @param sc SparkContext - * @param warehousePath path to the Hive warehouse directory * @param scratchDirPath scratch directory used by Hive's metastore client * @param metastoreTemporaryConf configuration options for Hive's metastore * @param existingSharedState optional [[TestHiveSharedState]] @@ -120,23 +120,15 @@ class TestHiveContext( */ private[hive] class TestHiveSparkSession( @transient private val sc: SparkContext, -val warehousePath: File, scratchDirPath: File, metastoreTemporaryConf: Map[String, String], @transient private val existingSharedState: Option[TestHiveSharedState], private val loadTestTables: Boolean) extends SparkSession(sc) with Logging { self => - // TODO: We need to set the temp warehouse path to sc's conf. - // Right now, In SparkSession, we will set the warehouse path to the default one - // instead of the temp one. Then, we override the setting in TestHiveSharedState - // when we creating metadataHive. This flow is not easy to follow and can introduce - // confusion when a developer is debugging an issue. We need to refactor this part - // to just set the temp warehouse path in sc's conf. 
def this(sc: SparkContext, loadTestTables: Boolean) { this( sc, - Utils.createTempDir(namePrefix = "warehouse"), TestHiveContext.makeScratchDir(), HiveUtils.newTemporaryConfiguration(useInMemoryDerby = false), None, @@ -151,16 +143,16 @@ private[hive] class TestHiveSparkSession( @transient override lazy val sharedState: TestHiveSharedState = { existingSharedState.getOrElse( - new TestHiveSharedState(sc, warehousePath, scratchDirPath, metastoreTemporaryConf)) + new TestHiveSharedState(sc, scratchDirPath, metastoreTemporaryConf)) } @transient override lazy val sessionState: TestHiveSessionState = -new TestHiveSessionState(self, warehousePath) +new TestHiveSessionState(self) override def newSession(): TestHiveSparkSession = { new TestHiveSparkSession( - sc, warehousePath, scratchDirPath, metastoreTemporaryConf, Some(sharedState), loadTestTables) + sc, scratchDirPath, metastoreTemporaryConf, Some(sharedState), loadTestTables) } private var cacheTables: Boolean = false @@ -199,6 +191,12 @@ private[hive] class TestHiveSparkSession( new File(Thread.currentThread().getContextClassLoader.getResource(path).getFile) } + def getWarehousePath(): String = { +val tempConf = new SQLConf +
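The mechanism TestHive now relies on is available to any test harness: put `spark.sql.warehouse.dir` into the conf before the session is built. A rough sketch (the temp-dir handling here is illustrative, not TestHive's exact code):

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

object TempWarehouseDemo {
  def main(args: Array[String]): Unit = {
    // Point the SQL warehouse at a throwaway directory, as TestHive now does via sc's conf.
    val warehouseDir = Files.createTempDirectory("warehouse").toUri.getPath
    val spark = SparkSession.builder()
      .master("local[1]")
      .config("spark.sql.warehouse.dir", warehouseDir)
      .getOrCreate()
    println(spark.conf.get("spark.sql.warehouse.dir"))
    spark.stop()
  }
}
```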
spark git commit: [SPARK-16818] Exchange reuse incorrectly reuses scans over different sets of partitions
Repository: spark Updated Branches: refs/heads/branch-2.0 1813bbd9b -> 5fbf5f93e [SPARK-16818] Exchange reuse incorrectly reuses scans over different sets of partitions https://github.com/apache/spark/pull/14425 rebased for branch-2.0 Author: Eric LiangCloses #14427 from ericl/spark-16818-br-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5fbf5f93 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5fbf5f93 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5fbf5f93 Branch: refs/heads/branch-2.0 Commit: 5fbf5f93ee5aa4d1aca0fa0c8fb769a085dd7b93 Parents: 1813bbd Author: Eric Liang Authored: Mon Aug 1 19:46:20 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 1 19:46:20 2016 -0700 -- .../datasources/FileSourceStrategy.scala| 2 ++ .../datasources/FileSourceStrategySuite.scala | 35 +++- 2 files changed, 36 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5fbf5f93/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala index 13a86bf..8af9562 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala @@ -202,7 +202,9 @@ private[sql] object FileSourceStrategy extends Strategy with Logging { partitions } + // These metadata values make scan plans uniquely identifiable for equality checking. val meta = Map( +"PartitionFilters" -> partitionKeyFilters.mkString("[", ", ", "]"), "Format" -> files.fileFormat.toString, "ReadSchema" -> prunedDataSchema.simpleString, PUSHED_FILTERS -> pushedDownFilters.mkString("[", ", ", "]"), http://git-wip-us.apache.org/repos/asf/spark/blob/5fbf5f93/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala index 8d8a18f..7a24f21 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala @@ -29,7 +29,7 @@ import org.apache.spark.sql._ import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionSet, PredicateHelper} import org.apache.spark.sql.catalyst.util -import org.apache.spark.sql.execution.DataSourceScanExec +import org.apache.spark.sql.execution.{DataSourceScanExec, SparkPlan} import org.apache.spark.sql.functions._ import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.sources._ @@ -407,6 +407,39 @@ class FileSourceStrategySuite extends QueryTest with SharedSQLContext with Predi } } + test("[SPARK-16818] partition pruned file scans implement sameResult correctly") { +withTempPath { path => + val tempDir = path.getCanonicalPath + spark.range(100) +.selectExpr("id", "id as b") +.write +.partitionBy("id") +.parquet(tempDir) + val df = spark.read.parquet(tempDir) + def getPlan(df: DataFrame): SparkPlan = { +df.queryExecution.executedPlan + } + assert(getPlan(df.where("id = 
2")).sameResult(getPlan(df.where("id = 2" + assert(!getPlan(df.where("id = 2")).sameResult(getPlan(df.where("id = 3" +} + } + + test("[SPARK-16818] exchange reuse respects differences in partition pruning") { +spark.conf.set("spark.sql.exchange.reuse", true) +withTempPath { path => + val tempDir = path.getCanonicalPath + spark.range(10) +.selectExpr("id % 2 as a", "id % 3 as b", "id as c") +.write +.partitionBy("a") +.parquet(tempDir) + val df = spark.read.parquet(tempDir) + val df1 = df.where("a = 0").groupBy("b").agg("c" -> "sum") + val df2 = df.where("a = 1").groupBy("b").agg("c" -> "sum") + checkAnswer(df1.join(df2, "b"), Row(0, 6, 12) :: Row(1, 4, 8) :: Row(2, 10, 5) :: Nil) +} + } + // Helpers for checking the arguments passed to the FileFormat. protected val checkPartitionSchema =
spark git commit: [SPARK-16813][SQL] Remove private[sql] and private[spark] from catalyst package
Repository: spark Updated Branches: refs/heads/branch-2.0 75dd78130 -> d357ca302 [SPARK-16813][SQL] Remove private[sql] and private[spark] from catalyst package The catalyst package is meant to be internal, and as a result it does not make sense to mark things as private[sql] or private[spark]. It simply makes debugging harder when Spark developers need to inspect the plans at runtime. This patch removes all private[sql] and private[spark] visibility modifiers in org.apache.spark.sql.catalyst. N/A - just visibility changes. Author: Reynold Xin <r...@databricks.com> Closes #14418 from rxin/SPARK-16813. (cherry picked from commit 064d91ff7342002414d3274694a8e2e37f154986) Signed-off-by: Reynold Xin <r...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d357ca30 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d357ca30 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d357ca30 Branch: refs/heads/branch-2.0 Commit: d357ca3023c84e472927380bed65b1cee33c4e03 Parents: 75dd781 Author: Reynold Xin <r...@databricks.com> Authored: Sun Jul 31 16:31:06 2016 +0800 Committer: Reynold Xin <r...@databricks.com> Committed: Sun Jul 31 11:10:07 2016 -0700 -- .../spark/sql/catalyst/CatalystTypeConverters.scala | 4 ++-- .../apache/spark/sql/catalyst/ScalaReflection.scala | 2 +- .../apache/spark/sql/catalyst/analysis/Analyzer.scala | 4 ++-- .../spark/sql/catalyst/analysis/TypeCoercion.scala| 2 +- .../spark/sql/catalyst/catalog/SessionCatalog.scala | 6 +++--- .../apache/spark/sql/catalyst/encoders/package.scala | 2 +- .../spark/sql/catalyst/expressions/Expression.scala | 2 +- .../expressions/MonotonicallyIncreasingID.scala | 2 +- .../sql/catalyst/expressions/SparkPartitionID.scala | 2 +- .../catalyst/expressions/aggregate/interfaces.scala | 14 +++--- .../spark/sql/catalyst/expressions/arithmetic.scala | 2 +- .../sql/catalyst/expressions/complexTypeCreator.scala | 4 ++-- .../catalyst/expressions/complexTypeExtractors.scala | 2 +- .../apache/spark/sql/catalyst/expressions/misc.scala | 2 +- .../spark/sql/catalyst/expressions/predicates.scala | 4 ++-- .../apache/spark/sql/catalyst/expressions/rows.scala | 2 +- .../plans/logical/basicLogicalOperators.scala | 6 +++--- .../sql/catalyst/util/AbstractScalaRowIterator.scala | 2 +- 18 files changed, 32 insertions(+), 32 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d357ca30/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala index 9cc7b2a..f542f5c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala @@ -382,7 +382,7 @@ object CatalystTypeConverters { * Typical use case would be converting a collection of rows that have the same schema. You will * call this function once to get a converter, and apply it to every row. */ - private[sql] def createToCatalystConverter(dataType: DataType): Any => Any = { + def createToCatalystConverter(dataType: DataType): Any => Any = { if (isPrimitive(dataType)) { // Although the `else` branch here is capable of handling inbound conversion of primitives, // we add some special-case handling for those types here. 
The motivation for this relates to @@ -409,7 +409,7 @@ object CatalystTypeConverters { * Typical use case would be converting a collection of rows that have the same schema. You will * call this function once to get a converter, and apply it to every row. */ - private[sql] def createToScalaConverter(dataType: DataType): Any => Any = { + def createToScalaConverter(dataType: DataType): Any => Any = { if (isPrimitive(dataType)) { identity } else { http://git-wip-us.apache.org/repos/asf/spark/blob/d357ca30/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala index 8affb03..dd36468 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala @@ -720,7 +720,7 @@ object
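One practical effect of dropping the modifiers, shown as a sketch: helpers such as the Catalyst converters can now be called directly from debugging or tooling code (they remain internal API, just no longer `private[sql]`):

```scala
import org.apache.spark.sql.catalyst.CatalystTypeConverters
import org.apache.spark.sql.types.StringType

object CatalystConverterDemo {
  def main(args: Array[String]): Unit = {
    // Round-trip a value through Catalyst's internal representation.
    val toCatalyst = CatalystTypeConverters.createToCatalystConverter(StringType)
    val toScala = CatalystTypeConverters.createToScalaConverter(StringType)

    val internal = toCatalyst("hello") // UTF8String, the internal form
    println(internal.getClass.getName)
    println(toScala(internal)) // back to a plain String
  }
}
```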
spark git commit: [SPARK-16812] Open up SparkILoop.getAddedJars
Repository: spark Updated Branches: refs/heads/branch-2.0 26da5a7fc -> 75dd78130 [SPARK-16812] Open up SparkILoop.getAddedJars ## What changes were proposed in this pull request? This patch makes SparkILoop.getAddedJars a public developer API. It is a useful function to get the list of jars added. ## How was this patch tested? N/A - this is a simple visibility change. Author: Reynold Xin <r...@databricks.com> Closes #14417 from rxin/SPARK-16812. (cherry picked from commit 7c27d075c39ebaf3e762284e2536fe7be0e3da87) Signed-off-by: Reynold Xin <r...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/75dd7813 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/75dd7813 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/75dd7813 Branch: refs/heads/branch-2.0 Commit: 75dd78130d29154a3147490c57bce6883c992469 Parents: 26da5a7 Author: Reynold Xin <r...@databricks.com> Authored: Sat Jul 30 23:05:03 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Sat Jul 30 23:05:12 2016 -0700 -- .../src/main/scala/org/apache/spark/repl/SparkILoop.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/75dd7813/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala -- diff --git a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala index 16f330a..e017aa4 100644 --- a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala +++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala @@ -1059,7 +1059,8 @@ class SparkILoop( @deprecated("Use `process` instead", "2.9.0") private def main(settings: Settings): Unit = process(settings) - private[repl] def getAddedJars(): Array[String] = { + @DeveloperApi + def getAddedJars(): Array[String] = { val conf = new SparkConf().setMaster(getMaster()) val envJars = sys.env.get("ADD_JARS") if (envJars.isDefined) { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
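A sketch of what the newly public method enables from embedding code, for the Scala 2.10 REPL module this patch touches. It assumes SparkILoop's no-argument constructor (the one the REPL's own Main uses) and that extra jars, if any, arrive via the usual `ADD_JARS` / `spark.jars` settings:

```scala
import org.apache.spark.repl.SparkILoop

object AddedJarsDemo {
  def main(args: Array[String]): Unit = {
    // Previously private[repl]; now exposed as a @DeveloperApi.
    val interp = new SparkILoop()
    interp.getAddedJars().foreach(println)
  }
}
```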
spark git commit: [SPARK-16812] Open up SparkILoop.getAddedJars
Repository: spark Updated Branches: refs/heads/master 957a8ab37 -> 7c27d075c [SPARK-16812] Open up SparkILoop.getAddedJars ## What changes were proposed in this pull request? This patch makes SparkILoop.getAddedJars a public developer API. It is a useful function to get the list of jars added. ## How was this patch tested? N/A - this is a simple visibility change. Author: Reynold Xin <r...@databricks.com> Closes #14417 from rxin/SPARK-16812. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7c27d075 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7c27d075 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7c27d075 Branch: refs/heads/master Commit: 7c27d075c39ebaf3e762284e2536fe7be0e3da87 Parents: 957a8ab Author: Reynold Xin <r...@databricks.com> Authored: Sat Jul 30 23:05:03 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Sat Jul 30 23:05:03 2016 -0700 -- .../src/main/scala/org/apache/spark/repl/SparkILoop.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7c27d075/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala -- diff --git a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala index 16f330a..e017aa4 100644 --- a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala +++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala @@ -1059,7 +1059,8 @@ class SparkILoop( @deprecated("Use `process` instead", "2.9.0") private def main(settings: Settings): Unit = process(settings) - private[repl] def getAddedJars(): Array[String] = { + @DeveloperApi + def getAddedJars(): Array[String] = { val conf = new SparkConf().setMaster(getMaster()) val envJars = sys.env.get("ADD_JARS") if (envJars.isDefined) { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16818] Exchange reuse incorrectly reuses scans over different sets of partitions
Repository: spark Updated Branches: refs/heads/master a6290e51e -> 957a8ab37 [SPARK-16818] Exchange reuse incorrectly reuses scans over different sets of partitions ## What changes were proposed in this pull request? This fixes a bug wherethe file scan operator does not take into account partition pruning in its implementation of `sameResult()`. As a result, executions may be incorrect on self-joins over the same base file relation. The patch here is minimal, but we should reconsider relying on `metadata` for implementing sameResult() in the future, as string representations may not be uniquely identifying. cc rxin ## How was this patch tested? Unit tests. Author: Eric Liang <e...@databricks.com> Closes #14425 from ericl/spark-16818. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/957a8ab3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/957a8ab3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/957a8ab3 Branch: refs/heads/master Commit: 957a8ab3743521850fb1c0106c37c5d3997b9e56 Parents: a6290e5 Author: Eric Liang <e...@databricks.com> Authored: Sat Jul 30 22:48:09 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Sat Jul 30 22:48:09 2016 -0700 -- .../datasources/FileSourceStrategy.scala| 2 ++ .../datasources/FileSourceStrategySuite.scala | 35 +++- 2 files changed, 36 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/957a8ab3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala index 32aa471..6749130 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala @@ -130,7 +130,9 @@ private[sql] object FileSourceStrategy extends Strategy with Logging { createNonBucketedReadRDD(readFile, selectedPartitions, fsRelation) } + // These metadata values make scan plans uniquely identifiable for equality checking. 
val meta = Map( +"PartitionFilters" -> partitionKeyFilters.mkString("[", ", ", "]"), "Format" -> fsRelation.fileFormat.toString, "ReadSchema" -> prunedDataSchema.simpleString, PUSHED_FILTERS -> pushedDownFilters.mkString("[", ", ", "]"), http://git-wip-us.apache.org/repos/asf/spark/blob/957a8ab3/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala index 2f551b1..1824650 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala @@ -30,7 +30,7 @@ import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.catalog.BucketSpec import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionSet, PredicateHelper} import org.apache.spark.sql.catalyst.util -import org.apache.spark.sql.execution.DataSourceScanExec +import org.apache.spark.sql.execution.{DataSourceScanExec, SparkPlan} import org.apache.spark.sql.functions._ import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.sources._ @@ -408,6 +408,39 @@ class FileSourceStrategySuite extends QueryTest with SharedSQLContext with Predi } } + test("[SPARK-16818] partition pruned file scans implement sameResult correctly") { +withTempPath { path => + val tempDir = path.getCanonicalPath + spark.range(100) +.selectExpr("id", "id as b") +.write +.partitionBy("id") +.parquet(tempDir) + val df = spark.read.parquet(tempDir) + def getPlan(df: DataFrame): SparkPlan = { +df.queryExecution.executedPlan + } + assert(getPlan(df.where("id = 2")).sameResult(getPlan(df.where("id = 2" + assert(!getPlan(df.where("id = 2")).sameResult(getPlan(df.where("id = 3" +} + } + + test("[SPARK-1681
spark git commit: [SPARK-16772][PYTHON][DOCS] Restore "datatype string" to Python API docstrings
Repository: spark Updated Branches: refs/heads/master 2c15323ad -> 2182e4322 [SPARK-16772][PYTHON][DOCS] Restore "datatype string" to Python API docstrings ## What changes were proposed in this pull request? This PR corrects [an error made in an earlier PR](https://github.com/apache/spark/pull/14393/files#r72843069). ## How was this patch tested? ```sh $ ./dev/lint-python PEP8 checks passed. rm -rf _build/* pydoc checks passed. ``` I also built the docs and confirmed that they looked good in my browser. Author: Nicholas ChammasCloses #14408 from nchammas/SPARK-16772. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2182e432 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2182e432 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2182e432 Branch: refs/heads/master Commit: 2182e4322da6ba732f99ae75dce00f76f1cdc4d9 Parents: 2c15323 Author: Nicholas Chammas Authored: Fri Jul 29 14:07:03 2016 -0700 Committer: Reynold Xin Committed: Fri Jul 29 14:07:03 2016 -0700 -- python/pyspark/sql/context.py | 10 -- python/pyspark/sql/session.py | 10 -- 2 files changed, 8 insertions(+), 12 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2182e432/python/pyspark/sql/context.py -- diff --git a/python/pyspark/sql/context.py b/python/pyspark/sql/context.py index f7009fe..4085f16 100644 --- a/python/pyspark/sql/context.py +++ b/python/pyspark/sql/context.py @@ -226,9 +226,8 @@ class SQLContext(object): from ``data``, which should be an RDD of :class:`Row`, or :class:`namedtuple`, or :class:`dict`. -When ``schema`` is :class:`pyspark.sql.types.DataType` or -:class:`pyspark.sql.types.StringType`, it must match the -real data, or an exception will be thrown at runtime. If the given schema is not +When ``schema`` is :class:`pyspark.sql.types.DataType` or a datatype string it must match +the real data, or an exception will be thrown at runtime. If the given schema is not :class:`pyspark.sql.types.StructType`, it will be wrapped into a :class:`pyspark.sql.types.StructType` as its only field, and the field name will be "value", each record will also be wrapped into a tuple, which can be converted to row later. @@ -239,8 +238,7 @@ class SQLContext(object): :param data: an RDD of any kind of SQL data representation(e.g. :class:`Row`, :class:`tuple`, ``int``, ``boolean``, etc.), or :class:`list`, or :class:`pandas.DataFrame`. -:param schema: a :class:`pyspark.sql.types.DataType` or a -:class:`pyspark.sql.types.StringType` or a list of +:param schema: a :class:`pyspark.sql.types.DataType` or a datatype string or a list of column names, default is None. The data type string format equals to :class:`pyspark.sql.types.DataType.simpleString`, except that top level struct type can omit the ``struct<>`` and atomic types use ``typeName()`` as their format, e.g. use @@ -251,7 +249,7 @@ class SQLContext(object): .. versionchanged:: 2.0 The ``schema`` parameter can be a :class:`pyspark.sql.types.DataType` or a - :class:`pyspark.sql.types.StringType` after 2.0. + datatype string after 2.0. If it's not a :class:`pyspark.sql.types.StructType`, it will be wrapped into a :class:`pyspark.sql.types.StructType` and each record will also be wrapped into a tuple. 
http://git-wip-us.apache.org/repos/asf/spark/blob/2182e432/python/pyspark/sql/session.py -- diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py index 10bd89b..2dacf48 100644 --- a/python/pyspark/sql/session.py +++ b/python/pyspark/sql/session.py @@ -414,9 +414,8 @@ class SparkSession(object): from ``data``, which should be an RDD of :class:`Row`, or :class:`namedtuple`, or :class:`dict`. -When ``schema`` is :class:`pyspark.sql.types.DataType` or -:class:`pyspark.sql.types.StringType`, it must match the -real data, or an exception will be thrown at runtime. If the given schema is not +When ``schema`` is :class:`pyspark.sql.types.DataType` or a datatype string, it must match +the real data, or an exception will be thrown at runtime. If the given schema is not :class:`pyspark.sql.types.StructType`, it will be wrapped into a :class:`pyspark.sql.types.StructType` as its only field, and the field name will be "value", each record will also be