spark git commit: [SPARK-17651][SPARKR] Set R package version number along with mvn
Repository: spark Updated Branches: refs/heads/branch-2.0 452e468f2 -> b111a81f2 [SPARK-17651][SPARKR] Set R package version number along with mvn This PR sets the R package version while tagging releases. Note that since R doesn't accept `-SNAPSHOT` in version number field, we remove that while setting the next version Tested manually by running locally Author: Shivaram VenkataramanCloses #15223 from shivaram/sparkr-version-change. (cherry picked from commit 7c382524a959a2bc9b3d2fca44f6f0b41aba4e3c) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b111a81f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b111a81f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b111a81f Branch: refs/heads/branch-2.0 Commit: b111a81f2a5547e2357d66db4ba2f05ce69a52a6 Parents: 452e468 Author: Shivaram Venkataraman Authored: Fri Sep 23 14:35:18 2016 -0700 Committer: Reynold Xin Committed: Fri Sep 23 14:36:01 2016 -0700 -- dev/create-release/release-tag.sh | 15 +++ 1 file changed, 15 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b111a81f/dev/create-release/release-tag.sh -- diff --git a/dev/create-release/release-tag.sh b/dev/create-release/release-tag.sh index d404939..b7e5100 100755 --- a/dev/create-release/release-tag.sh +++ b/dev/create-release/release-tag.sh @@ -60,12 +60,27 @@ git config user.email $GIT_EMAIL # Create release version $MVN versions:set -DnewVersion=$RELEASE_VERSION | grep -v "no value" # silence logs +# Set the release version in R/pkg/DESCRIPTION +sed -i".tmp1" 's/Version.*$/Version: '"$RELEASE_VERSION"'/g' R/pkg/DESCRIPTION +# Set the release version in docs +sed -i".tmp1" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$RELEASE_VERSION"'/g' docs/_config.yml +sed -i".tmp2" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: '"$RELEASE_VERSION"'/g' docs/_config.yml + git commit -a -m "Preparing Spark release $RELEASE_TAG" echo "Creating tag $RELEASE_TAG at the head of $GIT_BRANCH" git tag $RELEASE_TAG # Create next version $MVN versions:set -DnewVersion=$NEXT_VERSION | grep -v "no value" # silence logs +# Remove -SNAPSHOT before setting the R version as R expects version strings to only have numbers +R_NEXT_VERSION=`echo $NEXT_VERSION | sed 's/-SNAPSHOT//g'` +sed -i".tmp2" 's/Version.*$/Version: '"$R_NEXT_VERSION"'/g' R/pkg/DESCRIPTION + +# Update docs with next version +sed -i".tmp3" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$NEXT_VERSION"'/g' docs/_config.yml +# Use R version for short version +sed -i".tmp4" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: '"$R_NEXT_VERSION"'/g' docs/_config.yml + git commit -a -m "Preparing development version $NEXT_VERSION" # Push changes - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17651][SPARKR] Set R package version number along with mvn
Repository: spark Updated Branches: refs/heads/master 90a30f463 -> 7c382524a [SPARK-17651][SPARKR] Set R package version number along with mvn ## What changes were proposed in this pull request? This PR sets the R package version while tagging releases. Note that since R doesn't accept `-SNAPSHOT` in version number field, we remove that while setting the next version ## How was this patch tested? Tested manually by running locally Author: Shivaram VenkataramanCloses #15223 from shivaram/sparkr-version-change. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7c382524 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7c382524 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7c382524 Branch: refs/heads/master Commit: 7c382524a959a2bc9b3d2fca44f6f0b41aba4e3c Parents: 90a30f4 Author: Shivaram Venkataraman Authored: Fri Sep 23 14:35:18 2016 -0700 Committer: Reynold Xin Committed: Fri Sep 23 14:35:18 2016 -0700 -- dev/create-release/release-tag.sh | 15 +++ 1 file changed, 15 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7c382524/dev/create-release/release-tag.sh -- diff --git a/dev/create-release/release-tag.sh b/dev/create-release/release-tag.sh index d404939..b7e5100 100755 --- a/dev/create-release/release-tag.sh +++ b/dev/create-release/release-tag.sh @@ -60,12 +60,27 @@ git config user.email $GIT_EMAIL # Create release version $MVN versions:set -DnewVersion=$RELEASE_VERSION | grep -v "no value" # silence logs +# Set the release version in R/pkg/DESCRIPTION +sed -i".tmp1" 's/Version.*$/Version: '"$RELEASE_VERSION"'/g' R/pkg/DESCRIPTION +# Set the release version in docs +sed -i".tmp1" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$RELEASE_VERSION"'/g' docs/_config.yml +sed -i".tmp2" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: '"$RELEASE_VERSION"'/g' docs/_config.yml + git commit -a -m "Preparing Spark release $RELEASE_TAG" echo "Creating tag $RELEASE_TAG at the head of $GIT_BRANCH" git tag $RELEASE_TAG # Create next version $MVN versions:set -DnewVersion=$NEXT_VERSION | grep -v "no value" # silence logs +# Remove -SNAPSHOT before setting the R version as R expects version strings to only have numbers +R_NEXT_VERSION=`echo $NEXT_VERSION | sed 's/-SNAPSHOT//g'` +sed -i".tmp2" 's/Version.*$/Version: '"$R_NEXT_VERSION"'/g' R/pkg/DESCRIPTION + +# Update docs with next version +sed -i".tmp3" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$NEXT_VERSION"'/g' docs/_config.yml +# Use R version for short version +sed -i".tmp4" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: '"$R_NEXT_VERSION"'/g' docs/_config.yml + git commit -a -m "Preparing development version $NEXT_VERSION" # Push changes - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
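The release script makes this change with `sed` against R/pkg/DESCRIPTION and docs/_config.yml. As a minimal sketch of the same version-string mapping, with illustrative names only (nothing below is part of the Spark release scripts), the Maven-to-R translation looks like this:

```scala
// Minimal sketch of the version mapping the release script performs with sed.
// R's DESCRIPTION "Version" field does not accept qualifiers like "-SNAPSHOT",
// so the qualifier is dropped when writing the next development version.
object RVersionSketch {
  def toRVersion(mavenVersion: String): String =
    mavenVersion.replace("-SNAPSHOT", "")

  def descriptionLine(mavenVersion: String): String =
    s"Version: ${toRVersion(mavenVersion)}"

  def main(args: Array[String]): Unit = {
    println(descriptionLine("2.0.1"))          // Version: 2.0.1
    println(descriptionLine("2.0.2-SNAPSHOT")) // Version: 2.0.2
  }
}
```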
spark git commit: [SPARK-17635][SQL] Remove hardcode "agg_plan" in HashAggregateExec
Repository: spark Updated Branches: refs/heads/master a16619683 -> 79159a1e8 [SPARK-17635][SQL] Remove hardcode "agg_plan" in HashAggregateExec ## What changes were proposed in this pull request? "agg_plan" is hardcoded in HashAggregateExec, which is a potential issue, so this removes it. ## How was this patch tested? Existing tests. Author: Yucai Yu. Closes #15199 from yucai/agg_plan. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/79159a1e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/79159a1e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/79159a1e Branch: refs/heads/master Commit: 79159a1e87f19fb08a36857fc30b600ee7fdc52b Parents: a166196 Author: Yucai Yu Authored: Thu Sep 22 17:22:56 2016 -0700 Committer: Reynold Xin Committed: Thu Sep 22 17:22:56 2016 -0700 -- .../apache/spark/sql/execution/aggregate/HashAggregateExec.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/79159a1e/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala index 59e132d..06199ef 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala @@ -552,7 +552,7 @@ case class HashAggregateExec( } else { ctx.addMutableState(fastHashMapClassName, fastHashMapTerm, s"$fastHashMapTerm = new $fastHashMapClassName(" + -s"agg_plan.getTaskMemoryManager(), agg_plan.getEmptyAggregationBuffer());") +s"$thisPlan.getTaskMemoryManager(), $thisPlan.getEmptyAggregationBuffer());") ctx.addMutableState( "org.apache.spark.unsafe.KVIterator", iterTermForFastHashMap, "") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
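The fix interpolates whatever plan term the code generator actually handed back instead of assuming it is always named `agg_plan`. A minimal sketch of that string-building step, with hypothetical helper and variable names rather than Spark's real codegen API:

```scala
// Hypothetical sketch (not Spark's CodegenContext) of why the generated initializer
// should interpolate the plan term it was given rather than hardcode "agg_plan".
object FastHashMapInitSketch {
  def initializer(thisPlan: String, fastHashMapClassName: String, fastHashMapTerm: String): String =
    s"$fastHashMapTerm = new $fastHashMapClassName(" +
      s"$thisPlan.getTaskMemoryManager(), $thisPlan.getEmptyAggregationBuffer());"

  def main(args: Array[String]): Unit = {
    // If the surrounding codegen names the plan reference "agg_plan1", a hardcoded
    // "agg_plan" would not resolve in the generated class; the interpolated form does.
    println(initializer("agg_plan1", "agg_FastHashMap", "agg_fastHashMap"))
  }
}
```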
spark git commit: Skip building R vignettes if Spark is not built
Repository: spark Updated Branches: refs/heads/branch-2.0 b25a8e6e1 -> f14f47f07 Skip building R vignettes if Spark is not built ## What changes were proposed in this pull request? When we build the docs separately we don't have the JAR files from the Spark build in the same tree. As the SparkR vignettes need to launch a SparkContext to be built, we skip building them if JAR files don't exist ## How was this patch tested? To test this we can run the following: ``` build/mvn -DskipTests -Psparkr clean ./R/create-docs.sh ``` You should see a line `Skipping R vignettes as Spark JARs not found` at the end Author: Shivaram VenkataramanCloses #15200 from shivaram/sparkr-vignette-skip. (cherry picked from commit 9f24a17c59b1130d97efa7d313c06577f7344338) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f14f47f0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f14f47f0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f14f47f0 Branch: refs/heads/branch-2.0 Commit: f14f47f072a392df0ebe908f1c57b6eb858105b7 Parents: b25a8e6 Author: Shivaram Venkataraman Authored: Thu Sep 22 11:52:42 2016 -0700 Committer: Reynold Xin Committed: Thu Sep 22 11:54:51 2016 -0700 -- R/create-docs.sh | 25 ++--- 1 file changed, 22 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f14f47f0/R/create-docs.sh -- diff --git a/R/create-docs.sh b/R/create-docs.sh index 0dfba22..69ffc5f 100755 --- a/R/create-docs.sh +++ b/R/create-docs.sh @@ -30,6 +30,13 @@ set -e # Figure out where the script is export FWDIR="$(cd "`dirname "$0"`"; pwd)" +export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)" + +# Required for setting SPARK_SCALA_VERSION +. "${SPARK_HOME}"/bin/load-spark-env.sh + +echo "Using Scala $SPARK_SCALA_VERSION" + pushd $FWDIR # Install the package (this will also generate the Rd files) @@ -45,9 +52,21 @@ Rscript -e 'libDir <- "../../lib"; library(SparkR, lib.loc=libDir); library(knit popd -# render creates SparkR vignettes -Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)' +# Find Spark jars. +if [ -f "${SPARK_HOME}/RELEASE" ]; then + SPARK_JARS_DIR="${SPARK_HOME}/jars" +else + SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars" +fi + +# Only create vignettes if Spark JARs exist +if [ -d "$SPARK_JARS_DIR" ]; then + # render creates SparkR vignettes + Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)' -find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete + find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete +else + echo "Skipping R vignettes as Spark JARs not found in $SPARK_HOME" +fi popd - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: Skip building R vignettes if Spark is not built
Repository: spark Updated Branches: refs/heads/master 17b72d31e -> 9f24a17c5 Skip building R vignettes if Spark is not built ## What changes were proposed in this pull request? When we build the docs separately we don't have the JAR files from the Spark build in the same tree. As the SparkR vignettes need to launch a SparkContext to be built, we skip building them if JAR files don't exist ## How was this patch tested? To test this we can run the following: ``` build/mvn -DskipTests -Psparkr clean ./R/create-docs.sh ``` You should see a line `Skipping R vignettes as Spark JARs not found` at the end Author: Shivaram VenkataramanCloses #15200 from shivaram/sparkr-vignette-skip. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9f24a17c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9f24a17c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9f24a17c Branch: refs/heads/master Commit: 9f24a17c59b1130d97efa7d313c06577f7344338 Parents: 17b72d3 Author: Shivaram Venkataraman Authored: Thu Sep 22 11:52:42 2016 -0700 Committer: Reynold Xin Committed: Thu Sep 22 11:52:42 2016 -0700 -- R/create-docs.sh | 25 ++--- 1 file changed, 22 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9f24a17c/R/create-docs.sh -- diff --git a/R/create-docs.sh b/R/create-docs.sh index 0dfba22..69ffc5f 100755 --- a/R/create-docs.sh +++ b/R/create-docs.sh @@ -30,6 +30,13 @@ set -e # Figure out where the script is export FWDIR="$(cd "`dirname "$0"`"; pwd)" +export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)" + +# Required for setting SPARK_SCALA_VERSION +. "${SPARK_HOME}"/bin/load-spark-env.sh + +echo "Using Scala $SPARK_SCALA_VERSION" + pushd $FWDIR # Install the package (this will also generate the Rd files) @@ -45,9 +52,21 @@ Rscript -e 'libDir <- "../../lib"; library(SparkR, lib.loc=libDir); library(knit popd -# render creates SparkR vignettes -Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)' +# Find Spark jars. +if [ -f "${SPARK_HOME}/RELEASE" ]; then + SPARK_JARS_DIR="${SPARK_HOME}/jars" +else + SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars" +fi + +# Only create vignettes if Spark JARs exist +if [ -d "$SPARK_JARS_DIR" ]; then + # render creates SparkR vignettes + Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)' -find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete + find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete +else + echo "Skipping R vignettes as Spark JARs not found in $SPARK_HOME" +fi popd - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
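The script's gate reduces to: choose the JARs directory for a binary release (`jars/`) or a source build (`assembly/target/scala-<ver>/jars`), then render vignettes only if that directory exists. A hedged Scala restatement of that decision follows; the paths mirror the script, but nothing here is a Spark API:

```scala
import java.nio.file.{Files, Path, Paths}

// Hedged sketch of the check create-docs.sh now performs before rendering vignettes.
object VignetteGateSketch {
  def sparkJarsDir(sparkHome: String, scalaVersion: String): Option[Path] = {
    // A binary release ships jars under SPARK_HOME/jars; a source build puts them
    // under assembly/target/scala-<version>/jars.
    val candidate =
      if (Files.isRegularFile(Paths.get(sparkHome, "RELEASE"))) Paths.get(sparkHome, "jars")
      else Paths.get(sparkHome, "assembly", "target", s"scala-$scalaVersion", "jars")
    if (Files.isDirectory(candidate)) Some(candidate) else None
  }

  def main(args: Array[String]): Unit =
    sparkJarsDir("/opt/spark", "2.11") match {
      case Some(dir) => println(s"Rendering SparkR vignettes against JARs in $dir")
      case None      => println("Skipping R vignettes as Spark JARs not found")
    }
}
```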
spark git commit: Bump doc version for release 2.0.1.
Repository: spark Updated Branches: refs/heads/branch-2.0 ec377e773 -> 053b20a79 Bump doc version for release 2.0.1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/053b20a7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/053b20a7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/053b20a7 Branch: refs/heads/branch-2.0 Commit: 053b20a79c1824917c17405f30a7b91472311abe Parents: ec377e7 Author: Reynold Xin Authored: Wed Sep 21 21:06:47 2016 -0700 Committer: Reynold Xin Committed: Wed Sep 21 21:06:47 2016 -0700 -- docs/_config.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/053b20a7/docs/_config.yml -- diff --git a/docs/_config.yml b/docs/_config.yml index 3951cad..75c89bd 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -14,8 +14,8 @@ include: # These allow the documentation to be updated with newer releases # of Spark, Scala, and Mesos. -SPARK_VERSION: 2.0.0 -SPARK_VERSION_SHORT: 2.0.0 +SPARK_VERSION: 2.0.1 +SPARK_VERSION_SHORT: 2.0.1 SCALA_BINARY_VERSION: "2.11" SCALA_VERSION: "2.11.7" MESOS_VERSION: 0.21.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17494][SQL] changePrecision() on compact decimal should respect rounding mode
Repository: spark Updated Branches: refs/heads/master 3497ebe51 -> 8bde03bf9 [SPARK-17494][SQL] changePrecision() on compact decimal should respect rounding mode ## What changes were proposed in this pull request? Floor()/Ceil() of decimal is implemented using changePrecision() by passing a rounding mode, but the rounding mode is not respected when the decimal is in compact mode (could fit within a Long). This Update the changePrecision() to respect rounding mode, which could be ROUND_FLOOR, ROUND_CEIL, ROUND_HALF_UP, ROUND_HALF_EVEN. ## How was this patch tested? Added regression tests. Author: Davies LiuCloses #15154 from davies/decimal_round. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8bde03bf Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8bde03bf Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8bde03bf Branch: refs/heads/master Commit: 8bde03bf9a0896ea59ceaa699df7700351a130fb Parents: 3497ebe Author: Davies Liu Authored: Wed Sep 21 21:02:30 2016 -0700 Committer: Reynold Xin Committed: Wed Sep 21 21:02:30 2016 -0700 -- .../org/apache/spark/sql/types/Decimal.scala| 28 +--- .../apache/spark/sql/types/DecimalSuite.scala | 15 +++ 2 files changed, 39 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8bde03bf/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala index cc8175c..7085905 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala @@ -242,10 +242,30 @@ final class Decimal extends Ordered[Decimal] with Serializable { if (scale < _scale) { // Easier case: we just need to divide our scale down val diff = _scale - scale -val droppedDigits = longVal % POW_10(diff) -longVal /= POW_10(diff) -if (math.abs(droppedDigits) * 2 >= POW_10(diff)) { - longVal += (if (longVal < 0) -1L else 1L) +val pow10diff = POW_10(diff) +// % and / always round to 0 +val droppedDigits = longVal % pow10diff +longVal /= pow10diff +roundMode match { + case ROUND_FLOOR => +if (droppedDigits < 0) { + longVal += -1L +} + case ROUND_CEILING => +if (droppedDigits > 0) { + longVal += 1L +} + case ROUND_HALF_UP => +if (math.abs(droppedDigits) * 2 >= pow10diff) { + longVal += (if (droppedDigits < 0) -1L else 1L) +} + case ROUND_HALF_EVEN => +val doubled = math.abs(droppedDigits) * 2 +if (doubled > pow10diff || doubled == pow10diff && longVal % 2 != 0) { + longVal += (if (droppedDigits < 0) -1L else 1L) +} + case _ => +sys.error(s"Not supported rounding mode: $roundMode") } } else if (scale > _scale) { // We might be able to multiply longVal by a power of 10 and not overflow, but if not, http://git-wip-us.apache.org/repos/asf/spark/blob/8bde03bf/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala index a10c0e3..52d0692 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala @@ -20,6 +20,7 @@ package org.apache.spark.sql.types import org.scalatest.PrivateMethodTester import org.apache.spark.SparkFunSuite +import 
org.apache.spark.sql.types.Decimal._ class DecimalSuite extends SparkFunSuite with PrivateMethodTester { /** Check that a Decimal has the given string representation, precision and scale */ @@ -191,4 +192,18 @@ class DecimalSuite extends SparkFunSuite with PrivateMethodTester { assert(new Decimal().set(100L, 10, 0).toUnscaledLong === 100L) assert(Decimal(Long.MaxValue, 100, 0).toUnscaledLong === Long.MaxValue) } + + test("changePrecision() on compact decimal should respect rounding mode") { +Seq(ROUND_FLOOR, ROUND_CEILING, ROUND_HALF_UP, ROUND_HALF_EVEN).foreach { mode => + Seq("0.4", "0.5", "0.6", "1.0", "1.1", "1.6", "2.5", "5.5").foreach { n => +Seq("", "-").foreach {
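The heart of the change is how the digits dropped while scaling down a Long-backed decimal adjust the remaining value under each rounding mode. The sketch below restates that arithmetic in isolation; it uses `java.math.RoundingMode` as a stand-in for the rounding constants and a computed power of ten instead of the `POW_10` table, and it illustrates the same idea rather than reproducing the `Decimal` class itself.

```scala
import java.math.RoundingMode

// Standalone sketch of rounding a compact (Long-backed) decimal when dropping
// `diff` digits of scale, mirroring the logic in the diff above.
object CompactRoundingSketch {
  def scaleDown(unscaled: Long, diff: Int, mode: RoundingMode): Long = {
    val pow10diff = math.pow(10, diff).toLong // stand-in for the POW_10 lookup table
    val droppedDigits = unscaled % pow10diff  // % and / both truncate toward 0
    var result = unscaled / pow10diff
    mode match {
      case RoundingMode.FLOOR   => if (droppedDigits < 0) result -= 1L
      case RoundingMode.CEILING => if (droppedDigits > 0) result += 1L
      case RoundingMode.HALF_UP =>
        if (math.abs(droppedDigits) * 2 >= pow10diff) result += (if (droppedDigits < 0) -1L else 1L)
      case RoundingMode.HALF_EVEN =>
        val doubled = math.abs(droppedDigits) * 2
        if (doubled > pow10diff || (doubled == pow10diff && result % 2 != 0)) {
          result += (if (droppedDigits < 0) -1L else 1L)
        }
      case _ => sys.error(s"Unsupported rounding mode: $mode")
    }
    result
  }

  def main(args: Array[String]): Unit = {
    // 2.5 stored as unscaled 25 with scale 1, reduced to scale 0:
    println(scaleDown(25L, 1, RoundingMode.FLOOR))     // 2
    println(scaleDown(25L, 1, RoundingMode.CEILING))   // 3
    println(scaleDown(25L, 1, RoundingMode.HALF_UP))   // 3
    println(scaleDown(25L, 1, RoundingMode.HALF_EVEN)) // 2
  }
}
```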
spark git commit: [SPARK-17494][SQL] changePrecision() on compact decimal should respect rounding mode
Repository: spark Updated Branches: refs/heads/branch-2.0 966abd6af -> ec377e773 [SPARK-17494][SQL] changePrecision() on compact decimal should respect rounding mode ## What changes were proposed in this pull request? Floor()/Ceil() of decimal is implemented using changePrecision() by passing a rounding mode, but the rounding mode is not respected when the decimal is in compact mode (could fit within a Long). This Update the changePrecision() to respect rounding mode, which could be ROUND_FLOOR, ROUND_CEIL, ROUND_HALF_UP, ROUND_HALF_EVEN. ## How was this patch tested? Added regression tests. Author: Davies LiuCloses #15154 from davies/decimal_round. (cherry picked from commit 8bde03bf9a0896ea59ceaa699df7700351a130fb) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ec377e77 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ec377e77 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ec377e77 Branch: refs/heads/branch-2.0 Commit: ec377e77307b477d20a642edcd5ad5e26b989de6 Parents: 966abd6 Author: Davies Liu Authored: Wed Sep 21 21:02:30 2016 -0700 Committer: Reynold Xin Committed: Wed Sep 21 21:02:42 2016 -0700 -- .../org/apache/spark/sql/types/Decimal.scala| 28 +--- .../apache/spark/sql/types/DecimalSuite.scala | 15 +++ 2 files changed, 39 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ec377e77/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala index cc8175c..7085905 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala @@ -242,10 +242,30 @@ final class Decimal extends Ordered[Decimal] with Serializable { if (scale < _scale) { // Easier case: we just need to divide our scale down val diff = _scale - scale -val droppedDigits = longVal % POW_10(diff) -longVal /= POW_10(diff) -if (math.abs(droppedDigits) * 2 >= POW_10(diff)) { - longVal += (if (longVal < 0) -1L else 1L) +val pow10diff = POW_10(diff) +// % and / always round to 0 +val droppedDigits = longVal % pow10diff +longVal /= pow10diff +roundMode match { + case ROUND_FLOOR => +if (droppedDigits < 0) { + longVal += -1L +} + case ROUND_CEILING => +if (droppedDigits > 0) { + longVal += 1L +} + case ROUND_HALF_UP => +if (math.abs(droppedDigits) * 2 >= pow10diff) { + longVal += (if (droppedDigits < 0) -1L else 1L) +} + case ROUND_HALF_EVEN => +val doubled = math.abs(droppedDigits) * 2 +if (doubled > pow10diff || doubled == pow10diff && longVal % 2 != 0) { + longVal += (if (droppedDigits < 0) -1L else 1L) +} + case _ => +sys.error(s"Not supported rounding mode: $roundMode") } } else if (scale > _scale) { // We might be able to multiply longVal by a power of 10 and not overflow, but if not, http://git-wip-us.apache.org/repos/asf/spark/blob/ec377e77/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala index e1675c9..4cf329d 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala @@ -22,6 +22,7 @@ import scala.language.postfixOps import 
org.scalatest.PrivateMethodTester import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.types.Decimal._ class DecimalSuite extends SparkFunSuite with PrivateMethodTester { /** Check that a Decimal has the given string representation, precision and scale */ @@ -193,4 +194,18 @@ class DecimalSuite extends SparkFunSuite with PrivateMethodTester { assert(new Decimal().set(100L, 10, 0).toUnscaledLong === 100L) assert(Decimal(Long.MaxValue, 100, 0).toUnscaledLong === Long.MaxValue) } + + test("changePrecision() on compact decimal should respect rounding mode") { +Seq(ROUND_FLOOR, ROUND_CEILING, ROUND_HALF_UP, ROUND_HALF_EVEN).foreach {
spark git commit: [SPARK-17627] Mark Streaming Providers Experimental
Repository: spark Updated Branches: refs/heads/branch-2.0 59e6ab11a -> 966abd6af [SPARK-17627] Mark Streaming Providers Experimental All of structured streaming is experimental in its first release. We missed the annotation on two of the APIs. Author: Michael ArmbrustCloses #15188 from marmbrus/experimentalApi. (cherry picked from commit 3497ebe511fee67e66387e9e737c843a2939ce45) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/966abd6a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/966abd6a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/966abd6a Branch: refs/heads/branch-2.0 Commit: 966abd6af04b8e7b5f6446cba96f1825ca2bfcfa Parents: 59e6ab1 Author: Michael Armbrust Authored: Wed Sep 21 20:59:46 2016 -0700 Committer: Reynold Xin Committed: Wed Sep 21 20:59:52 2016 -0700 -- .../src/main/scala/org/apache/spark/sql/sources/interfaces.scala | 4 1 file changed, 4 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/966abd6a/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala b/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala index d2077a0..b84953d 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala @@ -112,8 +112,10 @@ trait SchemaRelationProvider { } /** + * ::Experimental:: * Implemented by objects that can produce a streaming [[Source]] for a specific format or system. */ +@Experimental trait StreamSourceProvider { /** Returns the name and schema of the source that can be used to continually read data. */ @@ -132,8 +134,10 @@ trait StreamSourceProvider { } /** + * ::Experimental:: * Implemented by objects that can produce a streaming [[Sink]] for a specific format or system. */ +@Experimental trait StreamSinkProvider { def createSink( sqlContext: SQLContext, - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17627] Mark Streaming Providers Experimental
Repository: spark Updated Branches: refs/heads/master 6902edab7 -> 3497ebe51 [SPARK-17627] Mark Streaming Providers Experimental All of structured streaming is experimental in its first release. We missed the annotation on two of the APIs. Author: Michael ArmbrustCloses #15188 from marmbrus/experimentalApi. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3497ebe5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3497ebe5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3497ebe5 Branch: refs/heads/master Commit: 3497ebe511fee67e66387e9e737c843a2939ce45 Parents: 6902eda Author: Michael Armbrust Authored: Wed Sep 21 20:59:46 2016 -0700 Committer: Reynold Xin Committed: Wed Sep 21 20:59:46 2016 -0700 -- .../src/main/scala/org/apache/spark/sql/sources/interfaces.scala | 4 1 file changed, 4 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3497ebe5/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala b/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala index a16d7ed..6484c78 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala @@ -112,8 +112,10 @@ trait SchemaRelationProvider { } /** + * ::Experimental:: * Implemented by objects that can produce a streaming [[Source]] for a specific format or system. */ +@Experimental trait StreamSourceProvider { /** Returns the name and schema of the source that can be used to continually read data. */ @@ -132,8 +134,10 @@ trait StreamSourceProvider { } /** + * ::Experimental:: * Implemented by objects that can produce a streaming [[Sink]] for a specific format or system. */ +@Experimental trait StreamSinkProvider { def createSink( sqlContext: SQLContext, - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
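For reference, the annotated shape being introduced is just a marker on each provider trait. The sketch below is self-contained, so it defines its own stand-in annotation and a simplified trait; Spark's actual `@Experimental` annotation and `StreamSinkProvider` signature are the ones shown in the diff above, not these.

```scala
import scala.annotation.StaticAnnotation

// Self-contained illustration of marking a public extension point as experimental.
// `Experimental` and `StreamSinkProviderSketch` are local stand-ins, not Spark types.
class Experimental extends StaticAnnotation

trait Sink {
  def addBatch(batchId: Long, data: Seq[String]): Unit
}

/**
 * ::Experimental::
 * Implemented by objects that can produce a streaming sink for a specific format or system.
 */
@Experimental
trait StreamSinkProviderSketch {
  def createSink(parameters: Map[String, String], partitionColumns: Seq[String]): Sink
}
```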
spark git commit: [MINOR][BUILD] Fix CheckStyle Error
Repository: spark Updated Branches: refs/heads/master 976f3b122 -> 1ea49916a [MINOR][BUILD] Fix CheckStyle Error ## What changes were proposed in this pull request? This PR is to fix the code style errors before 2.0.1 release. ## How was this patch tested? Manual. Before: ``` ./dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks failed at following occurrences: [ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[153] (sizes) LineLength: Line is longer than 100 characters (found 107). [ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[196] (sizes) LineLength: Line is longer than 100 characters (found 108). [ERROR] src/main/java/org/apache/spark/network/client/TransportClient.java:[239] (sizes) LineLength: Line is longer than 100 characters (found 115). [ERROR] src/main/java/org/apache/spark/network/server/TransportRequestHandler.java:[119] (sizes) LineLength: Line is longer than 100 characters (found 107). [ERROR] src/main/java/org/apache/spark/network/server/TransportRequestHandler.java:[129] (sizes) LineLength: Line is longer than 100 characters (found 104). [ERROR] src/main/java/org/apache/spark/network/util/LevelDBProvider.java:[124,11] (modifier) ModifierOrder: 'static' modifier out of order with the JLS suggestions. [ERROR] src/main/java/org/apache/spark/network/util/TransportConf.java:[26] (regexp) RegexpSingleline: No trailing whitespace allowed. [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[33] (sizes) LineLength: Line is longer than 100 characters (found 110). [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[38] (sizes) LineLength: Line is longer than 100 characters (found 110). [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[43] (sizes) LineLength: Line is longer than 100 characters (found 106). [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[48] (sizes) LineLength: Line is longer than 100 characters (found 110). [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java:[0] (misc) NewlineAtEndOfFile: File does not end with a newline. [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java:[67] (sizes) LineLength: Line is longer than 100 characters (found 106). [ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[200] (regexp) RegexpSingleline: No trailing whitespace allowed. [ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[309] (regexp) RegexpSingleline: No trailing whitespace allowed. [ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[332] (regexp) RegexpSingleline: No trailing whitespace allowed. [ERROR] src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:[348] (regexp) RegexpSingleline: No trailing whitespace allowed. ``` After: ``` ./dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks passed. ``` Author: Weiqing YangCloses #15170 from Sherry302/fixjavastyle. 
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1ea49916 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1ea49916 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1ea49916 Branch: refs/heads/master Commit: 1ea49916acc46b0a74e5c85eef907920c5e31142 Parents: 976f3b1 Author: Weiqing Yang Authored: Tue Sep 20 21:48:25 2016 -0700 Committer: Reynold Xin Committed: Tue Sep 20 21:48:25 2016 -0700 -- .../apache/spark/network/client/TransportClient.java| 11 ++- .../spark/network/server/TransportRequestHandler.java | 7 --- .../org/apache/spark/network/util/LevelDBProvider.java | 2 +- .../org/apache/spark/network/util/TransportConf.java| 2 +- .../util/collection/unsafe/sort/PrefixComparators.java | 12 .../collection/unsafe/sort/UnsafeInMemorySorter.java| 2 +- .../collection/unsafe/sort/UnsafeSorterSpillReader.java | 4 ++-- 7 files changed, 23 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1ea49916/common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java -- diff --git a/common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java b/common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java index 600b80e..7e7d78d 100644 --- a/common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java +++
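Most of the violations listed above are mechanical: over-long lines, trailing whitespace, modifier order, and a missing newline at end of file. Purely as a toy illustration (not Checkstyle's implementation or configuration), the two most common rules here amount to checks like the following:

```scala
// Toy sketch of LineLength- and trailing-whitespace-style checks; names and the
// message format are illustrative only.
object StyleCheckSketch {
  def violations(source: String, maxLen: Int = 100): Seq[String] =
    source.split("\n", -1).toSeq.zipWithIndex.flatMap { case (line, i) =>
      val lineNo = i + 1
      val tooLong =
        if (line.length > maxLen)
          Seq(s"[ERROR] line $lineNo: Line is longer than $maxLen characters (found ${line.length}).")
        else Seq.empty[String]
      val trailing =
        if (line != line.replaceAll("\\s+$", ""))
          Seq(s"[ERROR] line $lineNo: No trailing whitespace allowed.")
        else Seq.empty[String]
      tooLong ++ trailing
    }

  def main(args: Array[String]): Unit =
    violations("val x = 1   \n" + ("y" * 120)).foreach(println)
}
```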
spark git commit: [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata
Repository: spark Updated Branches: refs/heads/master 7e418e99c -> 976f3b122 [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata ## What changes were proposed in this pull request? This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235. This is a resubmission of 15126, which was based on work by frreiss in #15067, but fixed the test case along with some typos. ## How was this patch tested? A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request. Author: petermaxleeCloses #15166 from petermaxlee/SPARK-17513-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/976f3b12 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/976f3b12 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/976f3b12 Branch: refs/heads/master Commit: 976f3b1227c1a9e0b878e010531285fdba57b6a7 Parents: 7e418e9 Author: petermaxlee Authored: Tue Sep 20 19:08:07 2016 -0700 Committer: Reynold Xin Committed: Tue Sep 20 19:08:07 2016 -0700 -- .../sql/execution/streaming/MetadataLog.scala | 1 + .../execution/streaming/StreamExecution.scala | 7 ++ .../sql/streaming/StreamingQuerySuite.scala | 24 3 files changed, 32 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/976f3b12/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala index 78d6be1..9e2604c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala @@ -24,6 +24,7 @@ package org.apache.spark.sql.execution.streaming * - Allow the user to query the latest batch id. * - Allow the user to query the metadata object of a specified batch id. * - Allow the user to query metadata objects in a range of batch ids. + * - Allow the user to remove obsolete metadata */ trait MetadataLog[T] { http://git-wip-us.apache.org/repos/asf/spark/blob/976f3b12/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala index a1aae61..220f77d 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala @@ -290,6 +290,13 @@ class StreamExecution( assert(offsetLog.add(currentBatchId, availableOffsets.toCompositeOffset(sources)), s"Concurrent update to the log. Multiple streaming jobs detected for $currentBatchId") logInfo(s"Committed offsets for batch $currentBatchId.") + + // Now that we have logged the new batch, no further processing will happen for + // the previous batch, and it is safe to discard the old metadata. + // Note that purge is exclusive, i.e. it purges everything before currentBatchId. + // NOTE: If StreamExecution implements pipeline parallelism (multiple batches in + // flight at the same time), this cleanup logic will need to change. 
+ offsetLog.purge(currentBatchId) } else { awaitBatchLock.lock() try { http://git-wip-us.apache.org/repos/asf/spark/blob/976f3b12/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala index 9d58315..88f1f18 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala @@ -125,6 +125,30 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter { ) } + testQuietly("StreamExecution metadata garbage collection") { +val inputData = MemoryStream[Int] +val mapped = inputData.toDS().map(6 / _) + +// Run 3 batches, and then assert that only 1 metadata file is left at the end +// since the first 2
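The new call relies on `purge` being exclusive: once the new batch's offsets are committed, everything strictly before `currentBatchId` can be dropped. Below is a self-contained, in-memory sketch of a log with that purge semantic; it is an assumed shape for illustration, not Spark's `MetadataLog`/`HDFSMetadataLog`.

```scala
// In-memory sketch of a metadata log with an exclusive purge(threshold) operation.
class InMemoryMetadataLogSketch[T] {
  private var batches = Map.empty[Long, T]

  def add(batchId: Long, metadata: T): Boolean =
    if (batches.contains(batchId)) false
    else { batches += (batchId -> metadata); true }

  def getLatest(): Option[(Long, T)] =
    if (batches.isEmpty) None else Some(batches.maxBy(_._1))

  /** Removes every batch whose id is strictly less than `thresholdBatchId`. */
  def purge(thresholdBatchId: Long): Unit =
    batches = batches.filter { case (id, _) => id >= thresholdBatchId }

  def size: Int = batches.size
}

object InMemoryMetadataLogSketch {
  def main(args: Array[String]): Unit = {
    val log = new InMemoryMetadataLogSketch[String]
    (0L to 2L).foreach(id => log.add(id, s"offsets-$id"))
    log.purge(2L)            // exclusive: batches 0 and 1 are removed, batch 2 remains
    println(log.size)        // 1
    println(log.getLatest()) // Some((2,offsets-2))
  }
}
```

Purging only after the new batch has been logged keeps the most recent committed batch available for recovery, which is also why the comment in the diff notes the cleanup would have to change if several batches were ever in flight at once.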
spark git commit: [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata
Repository: spark Updated Branches: refs/heads/branch-2.0 8d8e2332c -> 726f05716 [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata ## What changes were proposed in this pull request? This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235. This is a resubmission of 15126, which was based on work by frreiss in #15067, but fixed the test case along with some typos. ## How was this patch tested? A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request. Author: petermaxleeCloses #15166 from petermaxlee/SPARK-17513-2. (cherry picked from commit 976f3b1227c1a9e0b878e010531285fdba57b6a7) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/726f0571 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/726f0571 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/726f0571 Branch: refs/heads/branch-2.0 Commit: 726f05716b6c1c5021460483eedb0c8ca55d9276 Parents: 8d8e233 Author: petermaxlee Authored: Tue Sep 20 19:08:07 2016 -0700 Committer: Reynold Xin Committed: Tue Sep 20 19:08:15 2016 -0700 -- .../sql/execution/streaming/MetadataLog.scala | 1 + .../execution/streaming/StreamExecution.scala | 7 ++ .../sql/streaming/StreamingQuerySuite.scala | 24 3 files changed, 32 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/726f0571/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala index 78d6be1..9e2604c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala @@ -24,6 +24,7 @@ package org.apache.spark.sql.execution.streaming * - Allow the user to query the latest batch id. * - Allow the user to query the metadata object of a specified batch id. * - Allow the user to query metadata objects in a range of batch ids. + * - Allow the user to remove obsolete metadata */ trait MetadataLog[T] { http://git-wip-us.apache.org/repos/asf/spark/blob/726f0571/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala index 5e1e5ee..b7587f2 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala @@ -290,6 +290,13 @@ class StreamExecution( assert(offsetLog.add(currentBatchId, availableOffsets.toCompositeOffset(sources)), s"Concurrent update to the log. Multiple streaming jobs detected for $currentBatchId") logInfo(s"Committed offsets for batch $currentBatchId.") + + // Now that we have logged the new batch, no further processing will happen for + // the previous batch, and it is safe to discard the old metadata. + // Note that purge is exclusive, i.e. it purges everything before currentBatchId. 
+ // NOTE: If StreamExecution implements pipeline parallelism (multiple batches in + // flight at the same time), this cleanup logic will need to change. + offsetLog.purge(currentBatchId) } else { awaitBatchLock.lock() try { http://git-wip-us.apache.org/repos/asf/spark/blob/726f0571/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala index 9d58315..88f1f18 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala @@ -125,6 +125,30 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter { ) } + testQuietly("StreamExecution metadata garbage collection") { +val inputData = MemoryStream[Int] +val mapped =
spark git commit: [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata
Repository: spark Updated Branches: refs/heads/branch-2.0 7026eb87e -> 5456a1b4f [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata ## What changes were proposed in this pull request? This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235. This is based on work by frreiss in #15067, but fixed the test case along with some typos. ## How was this patch tested? A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request. Author: petermaxleeAuthor: frreiss Closes #15126 from petermaxlee/SPARK-17513. (cherry picked from commit be9d57fc9d8b10e4234c01c06ed43fd7dd12c07b) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5456a1b4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5456a1b4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5456a1b4 Branch: refs/heads/branch-2.0 Commit: 5456a1b4fcd85d0d7f2f1cc64e44967def0950bf Parents: 7026eb8 Author: petermaxlee Authored: Mon Sep 19 22:19:51 2016 -0700 Committer: Reynold Xin Committed: Mon Sep 19 22:19:58 2016 -0700 -- .../sql/execution/streaming/MetadataLog.scala | 1 + .../execution/streaming/StreamExecution.scala | 7 ++ .../sql/streaming/StreamingQuerySuite.scala | 24 3 files changed, 32 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5456a1b4/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala index 78d6be1..9e2604c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala @@ -24,6 +24,7 @@ package org.apache.spark.sql.execution.streaming * - Allow the user to query the latest batch id. * - Allow the user to query the metadata object of a specified batch id. * - Allow the user to query metadata objects in a range of batch ids. + * - Allow the user to remove obsolete metadata */ trait MetadataLog[T] { http://git-wip-us.apache.org/repos/asf/spark/blob/5456a1b4/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala index 5e1e5ee..b7587f2 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala @@ -290,6 +290,13 @@ class StreamExecution( assert(offsetLog.add(currentBatchId, availableOffsets.toCompositeOffset(sources)), s"Concurrent update to the log. Multiple streaming jobs detected for $currentBatchId") logInfo(s"Committed offsets for batch $currentBatchId.") + + // Now that we have logged the new batch, no further processing will happen for + // the previous batch, and it is safe to discard the old metadata. + // Note that purge is exclusive, i.e. it purges everything before currentBatchId. + // NOTE: If StreamExecution implements pipeline parallelism (multiple batches in + // flight at the same time), this cleanup logic will need to change. 
+ offsetLog.purge(currentBatchId) } else { awaitBatchLock.lock() try { http://git-wip-us.apache.org/repos/asf/spark/blob/5456a1b4/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala index 9d58315..d3e2cab 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala @@ -125,6 +125,30 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter { ) } + testQuietly("StreamExecution metadata garbage collection") { +val inputData = MemoryStream[Int] +val mapped =
spark git commit: [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata
Repository: spark Updated Branches: refs/heads/master 26145a5af -> be9d57fc9 [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata ## What changes were proposed in this pull request? This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235. This is based on work by frreiss in #15067, but fixed the test case along with some typos. ## How was this patch tested? A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request. Author: petermaxleeAuthor: frreiss Closes #15126 from petermaxlee/SPARK-17513. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/be9d57fc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/be9d57fc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/be9d57fc Branch: refs/heads/master Commit: be9d57fc9d8b10e4234c01c06ed43fd7dd12c07b Parents: 26145a5 Author: petermaxlee Authored: Mon Sep 19 22:19:51 2016 -0700 Committer: Reynold Xin Committed: Mon Sep 19 22:19:51 2016 -0700 -- .../sql/execution/streaming/MetadataLog.scala | 1 + .../execution/streaming/StreamExecution.scala | 7 ++ .../sql/streaming/StreamingQuerySuite.scala | 24 3 files changed, 32 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/be9d57fc/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala index 78d6be1..9e2604c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala @@ -24,6 +24,7 @@ package org.apache.spark.sql.execution.streaming * - Allow the user to query the latest batch id. * - Allow the user to query the metadata object of a specified batch id. * - Allow the user to query metadata objects in a range of batch ids. + * - Allow the user to remove obsolete metadata */ trait MetadataLog[T] { http://git-wip-us.apache.org/repos/asf/spark/blob/be9d57fc/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala index a1aae61..220f77d 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala @@ -290,6 +290,13 @@ class StreamExecution( assert(offsetLog.add(currentBatchId, availableOffsets.toCompositeOffset(sources)), s"Concurrent update to the log. Multiple streaming jobs detected for $currentBatchId") logInfo(s"Committed offsets for batch $currentBatchId.") + + // Now that we have logged the new batch, no further processing will happen for + // the previous batch, and it is safe to discard the old metadata. + // Note that purge is exclusive, i.e. it purges everything before currentBatchId. + // NOTE: If StreamExecution implements pipeline parallelism (multiple batches in + // flight at the same time), this cleanup logic will need to change. 
+ offsetLog.purge(currentBatchId) } else { awaitBatchLock.lock() try { http://git-wip-us.apache.org/repos/asf/spark/blob/be9d57fc/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala index 9d58315..d3e2cab 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala @@ -125,6 +125,30 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter { ) } + testQuietly("StreamExecution metadata garbage collection") { +val inputData = MemoryStream[Int] +val mapped = inputData.toDS().map(6 / _) + +// Run 3 batches, and then assert that only 1 metadata file is left at the end +// since the first 2
spark git commit: [SPARK-17571][SQL] AssertOnQuery.condition should always return Boolean value
Repository: spark Updated Branches: refs/heads/branch-2.0 151f808a1 -> 27ce39cf2 [SPARK-17571][SQL] AssertOnQuery.condition should always return Boolean value ## What changes were proposed in this pull request? AssertOnQuery has two apply constructor: one that accepts a closure that returns boolean, and another that accepts a closure that returns Unit. This is actually very confusing because developers could mistakenly think that AssertOnQuery always require a boolean return type and verifies the return result, when indeed the value of the last statement is ignored in one of the constructors. This pull request makes the two constructor consistent and always require boolean value. It will overall make the test suites more robust against developer errors. As an evidence for the confusing behavior, this change also identified a bug with an existing test case due to file system time granularity. This pull request fixes that test case as well. ## How was this patch tested? This is a test only change. Author: petermaxleeCloses #15127 from petermaxlee/SPARK-17571. (cherry picked from commit 8f0c35a4d0dd458719627be5f524792bf244d70a) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/27ce39cf Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/27ce39cf Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/27ce39cf Branch: refs/heads/branch-2.0 Commit: 27ce39cf207eba46502ed11fcbfd51bed3e68f2b Parents: 151f808 Author: petermaxlee Authored: Sun Sep 18 15:22:01 2016 -0700 Committer: Reynold Xin Committed: Sun Sep 18 15:22:08 2016 -0700 -- .../apache/spark/sql/streaming/FileStreamSourceSuite.scala| 7 +-- .../scala/org/apache/spark/sql/streaming/StreamTest.scala | 4 ++-- .../spark/sql/streaming/StreamingQueryListenerSuite.scala | 3 +++ 3 files changed, 10 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/27ce39cf/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala index 886f7be..a02a36c 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala @@ -354,7 +354,9 @@ class FileStreamSourceSuite extends FileStreamSourceTest { CheckAnswer("a", "b"), // SLeeps longer than 5ms (maxFileAge) -AssertOnQuery { _ => Thread.sleep(10); true }, +// Unfortunately since a lot of file system does not have modification time granularity +// finer grained than 1 sec, we need to use 1 sec here. 
+AssertOnQuery { _ => Thread.sleep(1000); true }, AddTextFileData("c\nd", src, tmp), CheckAnswer("a", "b", "c", "d"), @@ -363,7 +365,8 @@ class FileStreamSourceSuite extends FileStreamSourceTest { val source = streamExecution.logicalPlan.collect { case e: StreamingExecutionRelation => e.source.asInstanceOf[FileStreamSource] }.head - source.seenFiles.size == 1 + assert(source.seenFiles.size == 1) + true } ) } http://git-wip-us.apache.org/repos/asf/spark/blob/27ce39cf/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala index af2b581..6c5b170 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala @@ -188,8 +188,8 @@ trait StreamTest extends QueryTest with SharedSQLContext with Timeouts { new AssertOnQuery(condition, message) } -def apply(message: String)(condition: StreamExecution => Unit): AssertOnQuery = { - new AssertOnQuery(s => { condition(s); true }, message) +def apply(message: String)(condition: StreamExecution => Boolean): AssertOnQuery = { + new AssertOnQuery(condition, message) } } http://git-wip-us.apache.org/repos/asf/spark/blob/27ce39cf/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala -- diff --git
spark git commit: [SPARK-17571][SQL] AssertOnQuery.condition should always return Boolean value
Repository: spark Updated Branches: refs/heads/master 1dbb725db -> 8f0c35a4d [SPARK-17571][SQL] AssertOnQuery.condition should always return Boolean value ## What changes were proposed in this pull request? AssertOnQuery has two apply constructor: one that accepts a closure that returns boolean, and another that accepts a closure that returns Unit. This is actually very confusing because developers could mistakenly think that AssertOnQuery always require a boolean return type and verifies the return result, when indeed the value of the last statement is ignored in one of the constructors. This pull request makes the two constructor consistent and always require boolean value. It will overall make the test suites more robust against developer errors. As an evidence for the confusing behavior, this change also identified a bug with an existing test case due to file system time granularity. This pull request fixes that test case as well. ## How was this patch tested? This is a test only change. Author: petermaxleeCloses #15127 from petermaxlee/SPARK-17571. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8f0c35a4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8f0c35a4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8f0c35a4 Branch: refs/heads/master Commit: 8f0c35a4d0dd458719627be5f524792bf244d70a Parents: 1dbb725 Author: petermaxlee Authored: Sun Sep 18 15:22:01 2016 -0700 Committer: Reynold Xin Committed: Sun Sep 18 15:22:01 2016 -0700 -- .../apache/spark/sql/streaming/FileStreamSourceSuite.scala| 7 +-- .../scala/org/apache/spark/sql/streaming/StreamTest.scala | 4 ++-- .../spark/sql/streaming/StreamingQueryListenerSuite.scala | 3 +++ 3 files changed, 10 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8f0c35a4/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala index 886f7be..a02a36c 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala @@ -354,7 +354,9 @@ class FileStreamSourceSuite extends FileStreamSourceTest { CheckAnswer("a", "b"), // SLeeps longer than 5ms (maxFileAge) -AssertOnQuery { _ => Thread.sleep(10); true }, +// Unfortunately since a lot of file system does not have modification time granularity +// finer grained than 1 sec, we need to use 1 sec here. 
+AssertOnQuery { _ => Thread.sleep(1000); true }, AddTextFileData("c\nd", src, tmp), CheckAnswer("a", "b", "c", "d"), @@ -363,7 +365,8 @@ class FileStreamSourceSuite extends FileStreamSourceTest { val source = streamExecution.logicalPlan.collect { case e: StreamingExecutionRelation => e.source.asInstanceOf[FileStreamSource] }.head - source.seenFiles.size == 1 + assert(source.seenFiles.size == 1) + true } ) } http://git-wip-us.apache.org/repos/asf/spark/blob/8f0c35a4/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala index af2b581..6c5b170 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala @@ -188,8 +188,8 @@ trait StreamTest extends QueryTest with SharedSQLContext with Timeouts { new AssertOnQuery(condition, message) } -def apply(message: String)(condition: StreamExecution => Unit): AssertOnQuery = { - new AssertOnQuery(s => { condition(s); true }, message) +def apply(message: String)(condition: StreamExecution => Boolean): AssertOnQuery = { + new AssertOnQuery(condition, message) } } http://git-wip-us.apache.org/repos/asf/spark/blob/8f0c35a4/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala index
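A minimal, self-contained sketch of the overload pitfall described in SPARK-17571 above, using simplified stand-in types rather than the actual StreamTest code: Scala's value discarding lets a closure that ends in a Boolean satisfy a `... => Unit` parameter, so the old Unit-returning overload silently threw the result away.

class FakeExecution { val seenFiles = 2 }

case class QueryCheck(condition: FakeExecution => Boolean, message: String)

object QueryCheck {
  // Old-style overload: the closure's value is discarded and replaced by `true`.
  def apply(message: String)(condition: FakeExecution => Unit): QueryCheck =
    QueryCheck(s => { condition(s); true }, message)
}

object Demo extends App {
  // Reads like an assertion, but can never fail: the Boolean result is dropped.
  val check = QueryCheck("seenFiles should be 1") { q => q.seenFiles == 1 }
  println(check.condition(new FakeExecution))  // prints true although seenFiles == 2
}

After the change both overloads take a `StreamExecution => Boolean`, so a test either returns the condition's value or calls `assert(...)` explicitly and then returns `true`, as the fixed FileStreamSourceSuite case does.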
spark git commit: [SPARK-17561][DOCS] DataFrameWriter documentation formatting problems
Repository: spark Updated Branches: refs/heads/master dca771bec -> b9323fc93 [SPARK-17561][DOCS] DataFrameWriter documentation formatting problems ## What changes were proposed in this pull request? Fix ` / ` problems in SQL scaladoc. ## How was this patch tested? Scaladoc build and manual verification of generated HTML. Author: Sean OwenCloses #15117 from srowen/SPARK-17561. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b9323fc9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b9323fc9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b9323fc9 Branch: refs/heads/master Commit: b9323fc9381a09af510f542fd5c86473e029caf6 Parents: dca771b Author: Sean Owen Authored: Fri Sep 16 13:43:05 2016 -0700 Committer: Reynold Xin Committed: Fri Sep 16 13:43:05 2016 -0700 -- .../org/apache/spark/sql/DataFrameReader.scala | 32 + .../org/apache/spark/sql/DataFrameWriter.scala | 12 +++ .../spark/sql/streaming/DataStreamReader.scala | 38 3 files changed, 53 insertions(+), 29 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b9323fc9/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala index 93bf74d..d29d90c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala @@ -269,14 +269,15 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { * `allowBackslashEscapingAnyCharacter` (default `false`): allows accepting quoting of all * character using backslash quoting mechanism * `mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records - * during parsing. - * - * - `PERMISSIVE` : sets other fields to `null` when it meets a corrupted record, and puts - * the malformed string into a new field configured by `columnNameOfCorruptRecord`. When - * a schema is set by user, it sets `null` for extra fields. - * - `DROPMALFORMED` : ignores the whole corrupted records. - * - `FAILFAST` : throws an exception when it meets corrupted records. - * + * during parsing. + * + * `PERMISSIVE` : sets other fields to `null` when it meets a corrupted record, and puts + * the malformed string into a new field configured by `columnNameOfCorruptRecord`. When + * a schema is set by user, it sets `null` for extra fields. + * `DROPMALFORMED` : ignores the whole corrupted records. + * `FAILFAST` : throws an exception when it meets corrupted records. + * + * * `columnNameOfCorruptRecord` (default is the value specified in * `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field having malformed string * created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`. @@ -395,13 +396,14 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { * `maxMalformedLogPerPartition` (default `10`): sets the maximum number of malformed rows * Spark will log for each partition. Malformed records beyond this number will be ignored. * `mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records - *during parsing. - * - *- `PERMISSIVE` : sets other fields to `null` when it meets a corrupted record. When - * a schema is set by user, it sets `null` for extra fields. - *- `DROPMALFORMED` : ignores the whole corrupted records. 
- *- `FAILFAST` : throws an exception when it meets corrupted records. - * + *during parsing. + * + * `PERMISSIVE` : sets other fields to `null` when it meets a corrupted record. When + * a schema is set by user, it sets `null` for extra fields. + * `DROPMALFORMED` : ignores the whole corrupted records. + * `FAILFAST` : throws an exception when it meets corrupted records. + * + * * * @since 2.0.0 */ http://git-wip-us.apache.org/repos/asf/spark/blob/b9323fc9/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala index c05c7a6..e137f07 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala @@ -397,7 +397,9 @@
spark git commit: [SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7.3
Repository: spark Updated Branches: refs/heads/branch-2.0 9c23f4408 -> 5ad4395e1 [SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7.3 ## What changes were proposed in this pull request? This patch bumps the Hadoop version in hadoop-2.7 profile from 2.7.2 to 2.7.3, which was recently released and contained a number of bug fixes. ## How was this patch tested? The change should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #15115 from rxin/SPARK-17558. (cherry picked from commit dca771bec6edb1cd8fc75861d364e0ba9dccf7c3) Signed-off-by: Reynold Xin <r...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5ad4395e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5ad4395e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5ad4395e Branch: refs/heads/branch-2.0 Commit: 5ad4395e1b41d5ec74785c0aef5c2f656f9db9da Parents: 9c23f44 Author: Reynold Xin <r...@databricks.com> Authored: Fri Sep 16 11:24:26 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Fri Sep 16 11:24:40 2016 -0700 -- dev/deps/spark-deps-hadoop-2.7 | 30 +++--- pom.xml| 2 +- 2 files changed, 16 insertions(+), 16 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5ad4395e/dev/deps/spark-deps-hadoop-2.7 -- diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7 index 3da0860..a61f31e 100644 --- a/dev/deps/spark-deps-hadoop-2.7 +++ b/dev/deps/spark-deps-hadoop-2.7 @@ -58,21 +58,21 @@ gson-2.2.4.jar guava-14.0.1.jar guice-3.0.jar guice-servlet-3.0.jar -hadoop-annotations-2.7.2.jar -hadoop-auth-2.7.2.jar -hadoop-client-2.7.2.jar -hadoop-common-2.7.2.jar -hadoop-hdfs-2.7.2.jar -hadoop-mapreduce-client-app-2.7.2.jar -hadoop-mapreduce-client-common-2.7.2.jar -hadoop-mapreduce-client-core-2.7.2.jar -hadoop-mapreduce-client-jobclient-2.7.2.jar -hadoop-mapreduce-client-shuffle-2.7.2.jar -hadoop-yarn-api-2.7.2.jar -hadoop-yarn-client-2.7.2.jar -hadoop-yarn-common-2.7.2.jar -hadoop-yarn-server-common-2.7.2.jar -hadoop-yarn-server-web-proxy-2.7.2.jar +hadoop-annotations-2.7.3.jar +hadoop-auth-2.7.3.jar +hadoop-client-2.7.3.jar +hadoop-common-2.7.3.jar +hadoop-hdfs-2.7.3.jar +hadoop-mapreduce-client-app-2.7.3.jar +hadoop-mapreduce-client-common-2.7.3.jar +hadoop-mapreduce-client-core-2.7.3.jar +hadoop-mapreduce-client-jobclient-2.7.3.jar +hadoop-mapreduce-client-shuffle-2.7.3.jar +hadoop-yarn-api-2.7.3.jar +hadoop-yarn-client-2.7.3.jar +hadoop-yarn-common-2.7.3.jar +hadoop-yarn-server-common-2.7.3.jar +hadoop-yarn-server-web-proxy-2.7.3.jar hk2-api-2.4.0-b34.jar hk2-locator-2.4.0-b34.jar hk2-utils-2.4.0-b34.jar http://git-wip-us.apache.org/repos/asf/spark/blob/5ad4395e/pom.xml -- diff --git a/pom.xml b/pom.xml index ee0032a..a723283 100644 --- a/pom.xml +++ b/pom.xml @@ -2511,7 +2511,7 @@ hadoop-2.7 -2.7.2 +2.7.3 0.9.3 3.4.6 2.6.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7.3
Repository: spark Updated Branches: refs/heads/master a425a37a5 -> dca771bec [SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7.3 ## What changes were proposed in this pull request? This patch bumps the Hadoop version in hadoop-2.7 profile from 2.7.2 to 2.7.3, which was recently released and contained a number of bug fixes. ## How was this patch tested? The change should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #15115 from rxin/SPARK-17558. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dca771be Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dca771be Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dca771be Branch: refs/heads/master Commit: dca771bec6edb1cd8fc75861d364e0ba9dccf7c3 Parents: a425a37 Author: Reynold Xin <r...@databricks.com> Authored: Fri Sep 16 11:24:26 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Fri Sep 16 11:24:26 2016 -0700 -- dev/deps/spark-deps-hadoop-2.7 | 30 +++--- pom.xml| 2 +- 2 files changed, 16 insertions(+), 16 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/dca771be/dev/deps/spark-deps-hadoop-2.7 -- diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7 index d464c97..6356612 100644 --- a/dev/deps/spark-deps-hadoop-2.7 +++ b/dev/deps/spark-deps-hadoop-2.7 @@ -59,21 +59,21 @@ gson-2.2.4.jar guava-14.0.1.jar guice-3.0.jar guice-servlet-3.0.jar -hadoop-annotations-2.7.2.jar -hadoop-auth-2.7.2.jar -hadoop-client-2.7.2.jar -hadoop-common-2.7.2.jar -hadoop-hdfs-2.7.2.jar -hadoop-mapreduce-client-app-2.7.2.jar -hadoop-mapreduce-client-common-2.7.2.jar -hadoop-mapreduce-client-core-2.7.2.jar -hadoop-mapreduce-client-jobclient-2.7.2.jar -hadoop-mapreduce-client-shuffle-2.7.2.jar -hadoop-yarn-api-2.7.2.jar -hadoop-yarn-client-2.7.2.jar -hadoop-yarn-common-2.7.2.jar -hadoop-yarn-server-common-2.7.2.jar -hadoop-yarn-server-web-proxy-2.7.2.jar +hadoop-annotations-2.7.3.jar +hadoop-auth-2.7.3.jar +hadoop-client-2.7.3.jar +hadoop-common-2.7.3.jar +hadoop-hdfs-2.7.3.jar +hadoop-mapreduce-client-app-2.7.3.jar +hadoop-mapreduce-client-common-2.7.3.jar +hadoop-mapreduce-client-core-2.7.3.jar +hadoop-mapreduce-client-jobclient-2.7.3.jar +hadoop-mapreduce-client-shuffle-2.7.3.jar +hadoop-yarn-api-2.7.3.jar +hadoop-yarn-client-2.7.3.jar +hadoop-yarn-common-2.7.3.jar +hadoop-yarn-server-common-2.7.3.jar +hadoop-yarn-server-web-proxy-2.7.3.jar hk2-api-2.4.0-b34.jar hk2-locator-2.4.0-b34.jar hk2-utils-2.4.0-b34.jar http://git-wip-us.apache.org/repos/asf/spark/blob/dca771be/pom.xml -- diff --git a/pom.xml b/pom.xml index ef83c18..b514173 100644 --- a/pom.xml +++ b/pom.xml @@ -2524,7 +2524,7 @@ hadoop-2.7 -2.7.2 +2.7.3 0.9.3 3.4.6 2.6.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17301][SQL] Remove unused classTag field from AtomicType base class
Repository: spark Updated Branches: refs/heads/master 736a7911c -> 48b459ddd [SPARK-17301][SQL] Remove unused classTag field from AtomicType base class There's an unused `classTag` val in the AtomicType base class which is causing unnecessary slowness in deserialization because it needs to grab ScalaReflectionLock and create a new runtime reflection mirror. Removing this unused code gives a small but measurable performance boost in SQL task deserialization. Author: Josh RosenCloses #14869 from JoshRosen/remove-unused-classtag. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/48b459dd Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/48b459dd Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/48b459dd Branch: refs/heads/master Commit: 48b459ddd58affd5519856cb6e204398b7739a2a Parents: 736a791 Author: Josh Rosen Authored: Tue Aug 30 09:58:00 2016 +0800 Committer: Reynold Xin Committed: Tue Aug 30 09:58:00 2016 +0800 -- .../org/apache/spark/sql/types/AbstractDataType.scala | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/48b459dd/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala index 65eae86..1981fd8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala @@ -17,13 +17,10 @@ package org.apache.spark.sql.types -import scala.reflect.ClassTag -import scala.reflect.runtime.universe.{runtimeMirror, TypeTag} +import scala.reflect.runtime.universe.TypeTag import org.apache.spark.annotation.DeveloperApi -import org.apache.spark.sql.catalyst.ScalaReflectionLock import org.apache.spark.sql.catalyst.expressions.Expression -import org.apache.spark.util.Utils /** * A non-concrete data type, reserved for internal uses. @@ -130,11 +127,6 @@ protected[sql] abstract class AtomicType extends DataType { private[sql] type InternalType private[sql] val tag: TypeTag[InternalType] private[sql] val ordering: Ordering[InternalType] - - @transient private[sql] val classTag = ScalaReflectionLock.synchronized { -val mirror = runtimeMirror(Utils.getSparkClassLoader) -ClassTag[InternalType](mirror.runtimeClass(tag.tpe)) - } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
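The slowness described above came from doing reflection work eagerly for every deserialized AtomicType instance. The sketch below restates that removed pattern in isolation (illustrative only, not part of the patch): each call builds a runtime reflection mirror and resolves the runtime class before wrapping it in a ClassTag, which in the original code also happened under a global reflection lock.

import scala.reflect.ClassTag
import scala.reflect.runtime.universe._

// Building a mirror and resolving the runtime class on every call is the
// per-instance cost that the patch removes by deleting the unused field.
def classTagViaReflection[T](tag: TypeTag[T], loader: ClassLoader): ClassTag[T] = {
  val mirror = runtimeMirror(loader)
  ClassTag[T](mirror.runtimeClass(tag.tpe))
}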
spark git commit: [SPARK-17301][SQL] Remove unused classTag field from AtomicType base class
Repository: spark Updated Branches: refs/heads/branch-2.0 976a43dbf -> 59032570f [SPARK-17301][SQL] Remove unused classTag field from AtomicType base class There's an unused `classTag` val in the AtomicType base class which is causing unnecessary slowness in deserialization because it needs to grab ScalaReflectionLock and create a new runtime reflection mirror. Removing this unused code gives a small but measurable performance boost in SQL task deserialization. Author: Josh RosenCloses #14869 from JoshRosen/remove-unused-classtag. (cherry picked from commit 48b459ddd58affd5519856cb6e204398b7739a2a) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/59032570 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/59032570 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/59032570 Branch: refs/heads/branch-2.0 Commit: 59032570fbd0985f758c27bdec5482221cc64af9 Parents: 976a43d Author: Josh Rosen Authored: Tue Aug 30 09:58:00 2016 +0800 Committer: Reynold Xin Committed: Tue Aug 30 09:58:11 2016 +0800 -- .../org/apache/spark/sql/types/AbstractDataType.scala | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/59032570/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala index 65eae86..1981fd8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala @@ -17,13 +17,10 @@ package org.apache.spark.sql.types -import scala.reflect.ClassTag -import scala.reflect.runtime.universe.{runtimeMirror, TypeTag} +import scala.reflect.runtime.universe.TypeTag import org.apache.spark.annotation.DeveloperApi -import org.apache.spark.sql.catalyst.ScalaReflectionLock import org.apache.spark.sql.catalyst.expressions.Expression -import org.apache.spark.util.Utils /** * A non-concrete data type, reserved for internal uses. @@ -130,11 +127,6 @@ protected[sql] abstract class AtomicType extends DataType { private[sql] type InternalType private[sql] val tag: TypeTag[InternalType] private[sql] val ordering: Ordering[InternalType] - - @transient private[sql] val classTag = ScalaReflectionLock.synchronized { -val mirror = runtimeMirror(Utils.getSparkClassLoader) -ClassTag[InternalType](mirror.runtimeClass(tag.tpe)) - } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17274][SQL] Move join optimizer rules into a separate file
Repository: spark Updated Branches: refs/heads/branch-2.0 f91614f36 -> 901ab0694 [SPARK-17274][SQL] Move join optimizer rules into a separate file ## What changes were proposed in this pull request? As part of breaking Optimizer.scala apart, this patch moves various join rules into a single file. ## How was this patch tested? This should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #14846 from rxin/SPARK-17274. (cherry picked from commit 718b6bad2d698b76be6906d51da13626e9f3890e) Signed-off-by: Reynold Xin <r...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/901ab069 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/901ab069 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/901ab069 Branch: refs/heads/branch-2.0 Commit: 901ab06949addd05be6cb85df4eb6bd2104777e8 Parents: f91614f Author: Reynold Xin <r...@databricks.com> Authored: Sat Aug 27 00:36:18 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Sat Aug 27 00:36:36 2016 -0700 -- .../sql/catalyst/optimizer/Optimizer.scala | 106 --- .../spark/sql/catalyst/optimizer/joins.scala| 134 +++ 2 files changed, 134 insertions(+), 106 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/901ab069/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 15d33c1..e743898 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -1150,112 +1150,6 @@ object PushDownPredicate extends Rule[LogicalPlan] with PredicateHelper { } /** - * Reorder the joins and push all the conditions into join, so that the bottom ones have at least - * one condition. - * - * The order of joins will not be changed if all of them already have at least one condition. - */ -object ReorderJoin extends Rule[LogicalPlan] with PredicateHelper { - - /** - * Join a list of plans together and push down the conditions into them. - * - * The joined plan are picked from left to right, prefer those has at least one join condition. - * - * @param input a list of LogicalPlans to join. - * @param conditions a list of condition for join. 
- */ - @tailrec - def createOrderedJoin(input: Seq[LogicalPlan], conditions: Seq[Expression]): LogicalPlan = { -assert(input.size >= 2) -if (input.size == 2) { - val (joinConditions, others) = conditions.partition( -e => !SubqueryExpression.hasCorrelatedSubquery(e)) - val join = Join(input(0), input(1), Inner, joinConditions.reduceLeftOption(And)) - if (others.nonEmpty) { -Filter(others.reduceLeft(And), join) - } else { -join - } -} else { - val left :: rest = input.toList - // find out the first join that have at least one join condition - val conditionalJoin = rest.find { plan => -val refs = left.outputSet ++ plan.outputSet -conditions.filterNot(canEvaluate(_, left)).filterNot(canEvaluate(_, plan)) - .exists(_.references.subsetOf(refs)) - } - // pick the next one if no condition left - val right = conditionalJoin.getOrElse(rest.head) - - val joinedRefs = left.outputSet ++ right.outputSet - val (joinConditions, others) = conditions.partition( -e => e.references.subsetOf(joinedRefs) && !SubqueryExpression.hasCorrelatedSubquery(e)) - val joined = Join(left, right, Inner, joinConditions.reduceLeftOption(And)) - - // should not have reference to same logical plan - createOrderedJoin(Seq(joined) ++ rest.filterNot(_ eq right), others) -} - } - - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case j @ ExtractFiltersAndInnerJoins(input, conditions) -if input.size > 2 && conditions.nonEmpty => - createOrderedJoin(input, conditions) - } -} - -/** - * Elimination of outer joins, if the predicates can restrict the result sets so that - * all null-supplying rows are eliminated - * - * - full outer -> inner if both sides have such predicates - * - left outer -> inner if the right side has such predicates - * - right outer -> inner if the left side has such predicates - * - full outer -> left outer if only the left side has such predicates - * - full outer -> right outer if only the right side has such predicates - * - * This rule shoul
spark git commit: [SPARK-17274][SQL] Move join optimizer rules into a separate file
Repository: spark Updated Branches: refs/heads/master 5aad4509c -> 718b6bad2 [SPARK-17274][SQL] Move join optimizer rules into a separate file ## What changes were proposed in this pull request? As part of breaking Optimizer.scala apart, this patch moves various join rules into a single file. ## How was this patch tested? This should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #14846 from rxin/SPARK-17274. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/718b6bad Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/718b6bad Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/718b6bad Branch: refs/heads/master Commit: 718b6bad2d698b76be6906d51da13626e9f3890e Parents: 5aad450 Author: Reynold Xin <r...@databricks.com> Authored: Sat Aug 27 00:36:18 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Sat Aug 27 00:36:18 2016 -0700 -- .../sql/catalyst/optimizer/Optimizer.scala | 106 --- .../spark/sql/catalyst/optimizer/joins.scala| 134 +++ 2 files changed, 134 insertions(+), 106 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/718b6bad/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 17cab18..7617d34 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -800,112 +800,6 @@ object PushDownPredicate extends Rule[LogicalPlan] with PredicateHelper { } /** - * Reorder the joins and push all the conditions into join, so that the bottom ones have at least - * one condition. - * - * The order of joins will not be changed if all of them already have at least one condition. - */ -object ReorderJoin extends Rule[LogicalPlan] with PredicateHelper { - - /** - * Join a list of plans together and push down the conditions into them. - * - * The joined plan are picked from left to right, prefer those has at least one join condition. - * - * @param input a list of LogicalPlans to join. - * @param conditions a list of condition for join. 
- */ - @tailrec - def createOrderedJoin(input: Seq[LogicalPlan], conditions: Seq[Expression]): LogicalPlan = { -assert(input.size >= 2) -if (input.size == 2) { - val (joinConditions, others) = conditions.partition( -e => !SubqueryExpression.hasCorrelatedSubquery(e)) - val join = Join(input(0), input(1), Inner, joinConditions.reduceLeftOption(And)) - if (others.nonEmpty) { -Filter(others.reduceLeft(And), join) - } else { -join - } -} else { - val left :: rest = input.toList - // find out the first join that have at least one join condition - val conditionalJoin = rest.find { plan => -val refs = left.outputSet ++ plan.outputSet -conditions.filterNot(canEvaluate(_, left)).filterNot(canEvaluate(_, plan)) - .exists(_.references.subsetOf(refs)) - } - // pick the next one if no condition left - val right = conditionalJoin.getOrElse(rest.head) - - val joinedRefs = left.outputSet ++ right.outputSet - val (joinConditions, others) = conditions.partition( -e => e.references.subsetOf(joinedRefs) && !SubqueryExpression.hasCorrelatedSubquery(e)) - val joined = Join(left, right, Inner, joinConditions.reduceLeftOption(And)) - - // should not have reference to same logical plan - createOrderedJoin(Seq(joined) ++ rest.filterNot(_ eq right), others) -} - } - - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case j @ ExtractFiltersAndInnerJoins(input, conditions) -if input.size > 2 && conditions.nonEmpty => - createOrderedJoin(input, conditions) - } -} - -/** - * Elimination of outer joins, if the predicates can restrict the result sets so that - * all null-supplying rows are eliminated - * - * - full outer -> inner if both sides have such predicates - * - left outer -> inner if the right side has such predicates - * - right outer -> inner if the left side has such predicates - * - full outer -> left outer if only the left side has such predicates - * - full outer -> right outer if only the right side has such predicates - * - * This rule should be executed before pushing down the Filter - */ -object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper { - - /
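The ReorderJoin scaladoc above picks joinable plans left to right so that every inner join carries at least one condition. A small, self-contained sketch of a query shape that benefits; the `spark` session and the table names are assumptions for illustration only:

import spark.implicits._

Seq(1, 2, 3).toDF("id").createOrReplaceTempView("a")
Seq(1, 2, 3).toDF("key").createOrReplaceTempView("b")
Seq((1, 1), (2, 2)).toDF("id", "key").createOrReplaceTempView("c")

// Written order is a, b, c, but only a-c and b-c conditions exist. ReorderJoin
// effectively joins a with c first, so no intermediate join degenerates into a
// cross product.
val q = spark.sql("SELECT * FROM a, b, c WHERE a.id = c.id AND b.key = c.key")
q.explain(true)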
spark git commit: [SPARK-17273][SQL] Move expression optimizer rules into a separate file
Repository: spark Updated Branches: refs/heads/master 0243b3287 -> 5aad4509c [SPARK-17273][SQL] Move expression optimizer rules into a separate file ## What changes were proposed in this pull request? As part of breaking Optimizer.scala apart, this patch moves various expression optimization rules into a single file. ## How was this patch tested? This should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #14845 from rxin/SPARK-17273. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5aad4509 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5aad4509 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5aad4509 Branch: refs/heads/master Commit: 5aad4509c15e131948d387157ecf56af1a705e19 Parents: 0243b32 Author: Reynold Xin <r...@databricks.com> Authored: Sat Aug 27 00:34:35 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Sat Aug 27 00:34:35 2016 -0700 -- .../sql/catalyst/optimizer/Optimizer.scala | 461 + .../sql/catalyst/optimizer/expressions.scala| 506 +++ 2 files changed, 507 insertions(+), 460 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5aad4509/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 8a50368..17cab18 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -534,176 +534,6 @@ object CollapseRepartition extends Rule[LogicalPlan] { } /** - * Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition. - * For example, when the expression is just checking to see if a string starts with a given - * pattern. - */ -object LikeSimplification extends Rule[LogicalPlan] { - // if guards below protect from escapes on trailing %. - // Cases like "something\%" are not optimized, but this does not affect correctness. - private val startsWith = "([^_%]+)%".r - private val endsWith = "%([^_%]+)".r - private val startsAndEndsWith = "([^_%]+)%([^_%]+)".r - private val contains = "%([^_%]+)%".r - private val equalTo = "([^_%]*)".r - - def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions { -case Like(input, Literal(pattern, StringType)) => - pattern.toString match { -case startsWith(prefix) if !prefix.endsWith("\\") => - StartsWith(input, Literal(prefix)) -case endsWith(postfix) => - EndsWith(input, Literal(postfix)) -// 'a%a' pattern is basically same with 'a%' && '%a'. -// However, the additional `Length` condition is required to prevent 'a' match 'a%a'. -case startsAndEndsWith(prefix, postfix) if !prefix.endsWith("\\") => - And(GreaterThanOrEqual(Length(input), Literal(prefix.size + postfix.size)), -And(StartsWith(input, Literal(prefix)), EndsWith(input, Literal(postfix -case contains(infix) if !infix.endsWith("\\") => - Contains(input, Literal(infix)) -case equalTo(str) => - EqualTo(input, Literal(str)) -case _ => - Like(input, Literal.create(pattern, StringType)) - } - } -} - -/** - * Replaces [[Expression Expressions]] that can be statically evaluated with - * equivalent [[Literal]] values. This rule is more specific with - * Null value propagation from bottom to top of the expression tree. 
- */ -object NullPropagation extends Rule[LogicalPlan] { - private def nonNullLiteral(e: Expression): Boolean = e match { -case Literal(null, _) => false -case _ => true - } - - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case q: LogicalPlan => q transformExpressionsUp { - case e @ WindowExpression(Cast(Literal(0L, _), _), _) => -Cast(Literal(0L), e.dataType) - case e @ AggregateExpression(Count(exprs), _, _, _) if !exprs.exists(nonNullLiteral) => -Cast(Literal(0L), e.dataType) - case e @ IsNull(c) if !c.nullable => Literal.create(false, BooleanType) - case e @ IsNotNull(c) if !c.nullable => Literal.create(true, BooleanType) - case e @ GetArrayItem(Literal(null, _), _) => Literal.create(null, e.dataType) - case e @ GetArrayItem(_, Literal(null, _)) => Literal.create(null, e.dataType) -
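Among the rules moved here, LikeSimplification (visible in the diff above) avoids full regular-expression evaluation for simple LIKE patterns. A self-contained sketch of the same idea applied to plain strings, not the Catalyst rule itself:

// Patterns with a single leading or trailing %, or no wildcards at all, can be
// answered with startsWith / endsWith / equality instead of compiling a regex.
def simplifiedLike(input: String, pattern: String): Option[Boolean] = {
  val wildcard = Set('%', '_')
  pattern match {
    case p if p.endsWith("%") && !p.dropRight(1).exists(wildcard) =>
      Some(input.startsWith(p.dropRight(1)))   // 'abc%' -> StartsWith
    case p if p.startsWith("%") && !p.drop(1).exists(wildcard) =>
      Some(input.endsWith(p.drop(1)))          // '%abc' -> EndsWith
    case p if !p.exists(wildcard) =>
      Some(input == p)                         // 'abc'  -> EqualTo
    case _ =>
      None                                     // fall back to the regex-based path
  }
}

// simplifiedLike("abcdef", "abc%") == Some(true)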
spark git commit: [SPARK-17272][SQL] Move subquery optimizer rules into its own file
Repository: spark Updated Branches: refs/heads/master dcefac438 -> 0243b3287 [SPARK-17272][SQL] Move subquery optimizer rules into its own file ## What changes were proposed in this pull request? As part of breaking Optimizer.scala apart, this patch moves various subquery rules into a single file. ## How was this patch tested? This should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #14844 from rxin/SPARK-17272. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0243b328 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0243b328 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0243b328 Branch: refs/heads/master Commit: 0243b328736f83faea5f83d18c4d331890ed8e81 Parents: dcefac4 Author: Reynold Xin <r...@databricks.com> Authored: Sat Aug 27 00:32:57 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Sat Aug 27 00:32:57 2016 -0700 -- .../sql/catalyst/optimizer/Optimizer.scala | 323 - .../spark/sql/catalyst/optimizer/subquery.scala | 356 +++ 2 files changed, 356 insertions(+), 323 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0243b328/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index d055bc3..8a50368 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -1637,326 +1637,3 @@ object RemoveRepetitionFromGroupExpressions extends Rule[LogicalPlan] { a.copy(groupingExpressions = newGrouping) } } - -/** - * This rule rewrites predicate sub-queries into left semi/anti joins. The following predicates - * are supported: - * a. EXISTS/NOT EXISTS will be rewritten as semi/anti join, unresolved conditions in Filter - *will be pulled out as the join conditions. - * b. IN/NOT IN will be rewritten as semi/anti join, unresolved conditions in the Filter will - *be pulled out as join conditions, value = selected column will also be used as join - *condition. - */ -object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case Filter(condition, child) => - val (withSubquery, withoutSubquery) = - splitConjunctivePredicates(condition).partition(PredicateSubquery.hasPredicateSubquery) - - // Construct the pruned filter condition. - val newFilter: LogicalPlan = withoutSubquery match { -case Nil => child -case conditions => Filter(conditions.reduce(And), child) - } - - // Filter the plan by applying left semi and left anti joins. - withSubquery.foldLeft(newFilter) { -case (p, PredicateSubquery(sub, conditions, _, _)) => - val (joinCond, outerPlan) = rewriteExistentialExpr(conditions, p) - Join(outerPlan, sub, LeftSemi, joinCond) -case (p, Not(PredicateSubquery(sub, conditions, false, _))) => - val (joinCond, outerPlan) = rewriteExistentialExpr(conditions, p) - Join(outerPlan, sub, LeftAnti, joinCond) -case (p, Not(PredicateSubquery(sub, conditions, true, _))) => - // This is a NULL-aware (left) anti join (NAAJ) e.g. col NOT IN expr - // Construct the condition. A NULL in one of the conditions is regarded as a positive - // result; such a row will be filtered out by the Anti-Join operator. 
- - // Note that will almost certainly be planned as a Broadcast Nested Loop join. - // Use EXISTS if performance matters to you. - val (joinCond, outerPlan) = rewriteExistentialExpr(conditions, p) - val anyNull = splitConjunctivePredicates(joinCond.get).map(IsNull).reduceLeft(Or) - Join(outerPlan, sub, LeftAnti, Option(Or(anyNull, joinCond.get))) -case (p, predicate) => - val (newCond, inputPlan) = rewriteExistentialExpr(Seq(predicate), p) - Project(p.output, Filter(newCond.get, inputPlan)) - } - } - - /** - * Given a predicate expression and an input plan, it rewrites - * any embedded existential sub-query into an existential join. - * It returns the rewritten expression together with the updated plan. - * Currently, it does not support null-aware joins. Embedded NOT IN predicates - * are blocked in the Analyzer. - */ - private def rewriteExistentialExpr( - exprs: Seq[Expression], -
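RewritePredicateSubquery, moved in the diff above, rewrites EXISTS/IN predicates into left semi joins and NOT EXISTS/NOT IN into left anti joins. A short sketch of the equivalent join shapes written directly in the DataFrame API; the `spark` session and the column names are assumptions for illustration only:

val orders = spark.range(10).toDF("oid")
val items = spark.range(0, 10, 2).toDF("iid")

// WHERE EXISTS (SELECT 1 FROM items WHERE iid = oid)      ==> left semi join
val withMatch = orders.join(items, orders("oid") === items("iid"), "leftsemi")
// WHERE NOT EXISTS (SELECT 1 FROM items WHERE iid = oid)  ==> left anti join
val withoutMatch = orders.join(items, orders("oid") === items("iid"), "leftanti")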
spark git commit: [SPARK-17270][SQL] Move object optimization rules into its own file (branch-2.0)
Repository: spark Updated Branches: refs/heads/branch-2.0 94d52d765 -> f91614f36 [SPARK-17270][SQL] Move object optimization rules into its own file (branch-2.0) ## What changes were proposed in this pull request? As part of breaking Optimizer.scala apart, this patch moves various Dataset object optimization rules into a single file. I'm submitting separate pull requests so we can more easily merge this in branch-2.0 to simplify optimizer backports. This is https://github.com/apache/spark/pull/14839 but for branch-2.0. ## How was this patch tested? This should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #14843 from rxin/SPARK-17270-branch-2.0. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f91614f3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f91614f3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f91614f3 Branch: refs/heads/branch-2.0 Commit: f91614f36472957355fad7d69d66327807fe80c8 Parents: 94d52d7 Author: Reynold Xin <r...@databricks.com> Authored: Sat Aug 27 00:31:49 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Sat Aug 27 00:31:49 2016 -0700 -- .../sql/catalyst/optimizer/Optimizer.scala | 72 - .../spark/sql/catalyst/optimizer/objects.scala | 101 +++ 2 files changed, 101 insertions(+), 72 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f91614f3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index f3f1d21..15d33c1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -187,25 +187,6 @@ object RemoveAliasOnlyProject extends Rule[LogicalPlan] { } /** - * Removes cases where we are unnecessarily going between the object and serialized (InternalRow) - * representation of data item. For example back to back map operations. - */ -object EliminateSerialization extends Rule[LogicalPlan] { - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case d @ DeserializeToObject(_, _, s: SerializeFromObject) -if d.outputObjectType == s.inputObjectType => - // Adds an extra Project here, to preserve the output expr id of `DeserializeToObject`. - // We will remove it later in RemoveAliasOnlyProject rule. - val objAttr = -Alias(s.child.output.head, s.child.output.head.name)(exprId = d.output.head.exprId) - Project(objAttr :: Nil, s.child) -case a @ AppendColumns(_, _, _, s: SerializeFromObject) -if a.deserializer.dataType == s.inputObjectType => - AppendColumnsWithObject(a.func, s.serializer, a.serializer, s.child) - } -} - -/** * Pushes down [[LocalLimit]] beneath UNION ALL and beneath the streamed inputs of outer joins. */ object LimitPushDown extends Rule[LogicalPlan] { @@ -1583,59 +1564,6 @@ object RemoveRepetitionFromGroupExpressions extends Rule[LogicalPlan] { } /** - * Typed [[Filter]] is by default surrounded by a [[DeserializeToObject]] beneath it and a - * [[SerializeFromObject]] above it. If these serializations can't be eliminated, we should embed - * the deserializer in filter condition to save the extra serialization at last. 
- */ -object EmbedSerializerInFilter extends Rule[LogicalPlan] { - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case s @ SerializeFromObject(_, Filter(condition, d: DeserializeToObject)) - // SPARK-15632: Conceptually, filter operator should never introduce schema change. This - // optimization rule also relies on this assumption. However, Dataset typed filter operator - // does introduce schema changes in some cases. Thus, we only enable this optimization when - // - // 1. either input and output schemata are exactly the same, or - // 2. both input and output schemata are single-field schema and share the same type. - // - // The 2nd case is included because encoders for primitive types always have only a single - // field with hard-coded field name "value". - // TODO Cleans this up after fixing SPARK-15632. - if s.schema == d.child.schema || samePrimitiveType(s.schema, d.child.schema) => - - val numObjects = condition.collect { -case a: Attribute if a == d.output.head => a - }.length - - if (numObjects > 1) { -// If the filter condition references the ob
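The rules moved here target unnecessary object/InternalRow round-trips in typed Dataset pipelines. A minimal sketch of the pattern EliminateSerialization applies to, back-to-back typed operations; the `spark` session and the Person case class are assumptions for illustration only:

case class Person(name: String, age: Int)

import spark.implicits._

val people = Seq(Person("a", 30), Person("b", 40)).toDS()

// Without the rule, the output of the first map would be serialized to an
// InternalRow and immediately deserialized again to feed the second map;
// eliminating that round-trip is exactly what the rule is for.
val names = people.map(p => p.copy(age = p.age + 1)).map(_.name)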
spark git commit: [SPARK-17269][SQL] Move finish analysis optimization stage into its own file
Repository: spark Updated Branches: refs/heads/branch-2.0 9c0ac6b53 -> 94d52d765 [SPARK-17269][SQL] Move finish analysis optimization stage into its own file As part of breaking Optimizer.scala apart, this patch moves various finish analysis optimization stage rules into a single file. I'm submitting separate pull requests so we can more easily merge this in branch-2.0 to simplify optimizer backports. This should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #14838 from rxin/SPARK-17269. (cherry picked from commit dcefac438788c51d84641bfbc505efe095731a39) Signed-off-by: Reynold Xin <r...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/94d52d76 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/94d52d76 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/94d52d76 Branch: refs/heads/branch-2.0 Commit: 94d52d76569f8b0782f424cfac959a4bb75c54c0 Parents: 9c0ac6b Author: Reynold Xin <r...@databricks.com> Authored: Fri Aug 26 22:10:28 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Fri Aug 26 22:12:11 2016 -0700 -- .../analysis/RewriteDistinctAggregates.scala| 269 --- .../sql/catalyst/optimizer/Optimizer.scala | 38 --- .../optimizer/RewriteDistinctAggregates.scala | 269 +++ .../sql/catalyst/optimizer/finishAnalysis.scala | 65 + 4 files changed, 334 insertions(+), 307 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/94d52d76/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteDistinctAggregates.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteDistinctAggregates.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteDistinctAggregates.scala deleted file mode 100644 index 8afd28d..000 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteDistinctAggregates.scala +++ /dev/null @@ -1,269 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.sql.catalyst.analysis - -import org.apache.spark.sql.catalyst.expressions._ -import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, AggregateFunction, Complete} -import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Expand, LogicalPlan} -import org.apache.spark.sql.catalyst.rules.Rule -import org.apache.spark.sql.types.IntegerType - -/** - * This rule rewrites an aggregate query with distinct aggregations into an expanded double - * aggregation in which the regular aggregation expressions and every distinct clause is aggregated - * in a separate group. The results are then combined in a second aggregate. 
- * - * For example (in scala): - * {{{ - * val data = Seq( - * ("a", "ca1", "cb1", 10), - * ("a", "ca1", "cb2", 5), - * ("b", "ca1", "cb1", 13)) - * .toDF("key", "cat1", "cat2", "value") - * data.createOrReplaceTempView("data") - * - * val agg = data.groupBy($"key") - * .agg( - * countDistinct($"cat1").as("cat1_cnt"), - * countDistinct($"cat2").as("cat2_cnt"), - * sum($"value").as("total")) - * }}} - * - * This translates to the following (pseudo) logical plan: - * {{{ - * Aggregate( - *key = ['key] - *functions = [COUNT(DISTINCT 'cat1), - * COUNT(DISTINCT 'cat2), - * sum('value)] - *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total]) - * LocalTableScan [...] - * }}} - * - * This rule rewrites this logical plan to the following (pseudo) logical plan: - * {{{ - * Aggregate( - *key = ['key] - *functions = [count(if (('gid = 1)) 'cat1 else null), - *
spark git commit: [SPARK-17269][SQL] Move finish analysis optimization stage into its own file
Repository: spark Updated Branches: refs/heads/master cc0caa690 -> dcefac438 [SPARK-17269][SQL] Move finish analysis optimization stage into its own file ## What changes were proposed in this pull request? As part of breaking Optimizer.scala apart, this patch moves various finish analysis optimization stage rules into a single file. I'm submitting separate pull requests so we can more easily merge this in branch-2.0 to simplify optimizer backports. ## How was this patch tested? This should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #14838 from rxin/SPARK-17269. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dcefac43 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dcefac43 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dcefac43 Branch: refs/heads/master Commit: dcefac438788c51d84641bfbc505efe095731a39 Parents: cc0caa6 Author: Reynold Xin <r...@databricks.com> Authored: Fri Aug 26 22:10:28 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Fri Aug 26 22:10:28 2016 -0700 -- .../analysis/RewriteDistinctAggregates.scala| 269 --- .../sql/catalyst/optimizer/Optimizer.scala | 38 --- .../optimizer/RewriteDistinctAggregates.scala | 269 +++ .../sql/catalyst/optimizer/finishAnalysis.scala | 65 + 4 files changed, 334 insertions(+), 307 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/dcefac43/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteDistinctAggregates.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteDistinctAggregates.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteDistinctAggregates.scala deleted file mode 100644 index 8afd28d..000 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteDistinctAggregates.scala +++ /dev/null @@ -1,269 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.sql.catalyst.analysis - -import org.apache.spark.sql.catalyst.expressions._ -import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, AggregateFunction, Complete} -import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Expand, LogicalPlan} -import org.apache.spark.sql.catalyst.rules.Rule -import org.apache.spark.sql.types.IntegerType - -/** - * This rule rewrites an aggregate query with distinct aggregations into an expanded double - * aggregation in which the regular aggregation expressions and every distinct clause is aggregated - * in a separate group. The results are then combined in a second aggregate. 
- * - * For example (in scala): - * {{{ - * val data = Seq( - * ("a", "ca1", "cb1", 10), - * ("a", "ca1", "cb2", 5), - * ("b", "ca1", "cb1", 13)) - * .toDF("key", "cat1", "cat2", "value") - * data.createOrReplaceTempView("data") - * - * val agg = data.groupBy($"key") - * .agg( - * countDistinct($"cat1").as("cat1_cnt"), - * countDistinct($"cat2").as("cat2_cnt"), - * sum($"value").as("total")) - * }}} - * - * This translates to the following (pseudo) logical plan: - * {{{ - * Aggregate( - *key = ['key] - *functions = [COUNT(DISTINCT 'cat1), - * COUNT(DISTINCT 'cat2), - * sum('value)] - *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total]) - * LocalTableScan [...] - * }}} - * - * This rule rewrites this logical plan to the following (pseudo) logical plan: - * {{{ - * Aggregate( - *key = ['key] - *functions = [count(if (('gid = 1)) 'cat1 else null), - * count(if (('gid = 2)) 'cat2 else null), -
spark git commit: [SPARK-17235][SQL] Support purging of old logs in MetadataLog
Repository: spark Updated Branches: refs/heads/branch-2.0 52feb3fbf -> dfdfc3092 [SPARK-17235][SQL] Support purging of old logs in MetadataLog ## What changes were proposed in this pull request? This patch adds a purge interface to MetadataLog, and an implementation in HDFSMetadataLog. The purge function is currently unused, but I will use it to purge old execution and file source logs in follow-up patches. These changes are required in a production structured streaming job that runs for a long period of time. ## How was this patch tested? Added a unit test case in HDFSMetadataLogSuite. Author: petermaxleeCloses #14802 from petermaxlee/SPARK-17235. (cherry picked from commit f64a1ddd09a34d5d867ccbaba46204d75fad038d) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dfdfc309 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dfdfc309 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dfdfc309 Branch: refs/heads/branch-2.0 Commit: dfdfc3092d1b6942eb9092e28e15fa4efb6ac084 Parents: 52feb3f Author: petermaxlee Authored: Fri Aug 26 16:05:34 2016 -0700 Committer: Reynold Xin Committed: Fri Aug 26 16:05:40 2016 -0700 -- .../execution/streaming/HDFSMetadataLog.scala | 14 ++ .../sql/execution/streaming/MetadataLog.scala | 6 + .../streaming/HDFSMetadataLogSuite.scala| 27 +--- 3 files changed, 43 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/dfdfc309/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala index 2b6f76c..127ece9 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala @@ -227,6 +227,20 @@ class HDFSMetadataLog[T: ClassTag](sparkSession: SparkSession, path: String) None } + /** + * Removes all the log entry earlier than thresholdBatchId (exclusive). + */ + override def purge(thresholdBatchId: Long): Unit = { +val batchIds = fileManager.list(metadataPath, batchFilesFilter) + .map(f => pathToBatchId(f.getPath)) + +for (batchId <- batchIds if batchId < thresholdBatchId) { + val path = batchIdToPath(batchId) + fileManager.delete(path) + logTrace(s"Removed metadata log file: $path") +} + } + private def createFileManager(): FileManager = { val hadoopConf = sparkSession.sessionState.newHadoopConf() try { http://git-wip-us.apache.org/repos/asf/spark/blob/dfdfc309/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala index cc70e1d..78d6be1 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala @@ -48,4 +48,10 @@ trait MetadataLog[T] { * Return the latest batch Id and its metadata if exist. */ def getLatest(): Option[(Long, T)] + + /** + * Removes all the log entry earlier than thresholdBatchId (exclusive). + * This operation should be idempotent. 
+ */ + def purge(thresholdBatchId: Long): Unit } http://git-wip-us.apache.org/repos/asf/spark/blob/dfdfc309/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala index ab5a2d2..4259384 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala @@ -46,14 +46,14 @@ class HDFSMetadataLogSuite extends SparkFunSuite with SharedSQLContext { test("FileManager: FileContextManager") { withTempDir { temp => val path = new Path(temp.getAbsolutePath) - testManager(path, new FileContextManager(path, new Configuration)) + testFileManager(path, new
spark git commit: [SPARK-17235][SQL] Support purging of old logs in MetadataLog
Repository: spark Updated Branches: refs/heads/master a11d10f18 -> f64a1ddd0 [SPARK-17235][SQL] Support purging of old logs in MetadataLog ## What changes were proposed in this pull request? This patch adds a purge interface to MetadataLog, and an implementation in HDFSMetadataLog. The purge function is currently unused, but I will use it to purge old execution and file source logs in follow-up patches. These changes are required in a production structured streaming job that runs for a long period of time. ## How was this patch tested? Added a unit test case in HDFSMetadataLogSuite. Author: petermaxleeCloses #14802 from petermaxlee/SPARK-17235. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f64a1ddd Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f64a1ddd Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f64a1ddd Branch: refs/heads/master Commit: f64a1ddd09a34d5d867ccbaba46204d75fad038d Parents: a11d10f Author: petermaxlee Authored: Fri Aug 26 16:05:34 2016 -0700 Committer: Reynold Xin Committed: Fri Aug 26 16:05:34 2016 -0700 -- .../execution/streaming/HDFSMetadataLog.scala | 14 ++ .../sql/execution/streaming/MetadataLog.scala | 6 + .../streaming/HDFSMetadataLogSuite.scala| 27 +--- 3 files changed, 43 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f64a1ddd/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala index 2b6f76c..127ece9 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala @@ -227,6 +227,20 @@ class HDFSMetadataLog[T: ClassTag](sparkSession: SparkSession, path: String) None } + /** + * Removes all the log entry earlier than thresholdBatchId (exclusive). + */ + override def purge(thresholdBatchId: Long): Unit = { +val batchIds = fileManager.list(metadataPath, batchFilesFilter) + .map(f => pathToBatchId(f.getPath)) + +for (batchId <- batchIds if batchId < thresholdBatchId) { + val path = batchIdToPath(batchId) + fileManager.delete(path) + logTrace(s"Removed metadata log file: $path") +} + } + private def createFileManager(): FileManager = { val hadoopConf = sparkSession.sessionState.newHadoopConf() try { http://git-wip-us.apache.org/repos/asf/spark/blob/f64a1ddd/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala index cc70e1d..78d6be1 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala @@ -48,4 +48,10 @@ trait MetadataLog[T] { * Return the latest batch Id and its metadata if exist. */ def getLatest(): Option[(Long, T)] + + /** + * Removes all the log entry earlier than thresholdBatchId (exclusive). + * This operation should be idempotent. 
+ */ + def purge(thresholdBatchId: Long): Unit } http://git-wip-us.apache.org/repos/asf/spark/blob/f64a1ddd/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala index ab5a2d2..4259384 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala @@ -46,14 +46,14 @@ class HDFSMetadataLogSuite extends SparkFunSuite with SharedSQLContext { test("FileManager: FileContextManager") { withTempDir { temp => val path = new Path(temp.getAbsolutePath) - testManager(path, new FileContextManager(path, new Configuration)) + testFileManager(path, new FileContextManager(path, new Configuration)) } } test("FileManager: FileSystemManager") { withTempDir { temp => val path
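The new `purge(thresholdBatchId)` hook is exclusive of the threshold: every batch id strictly below it is deleted, while the threshold batch itself (and anything newer) survives. A minimal, hypothetical in-memory sketch of that contract — the class name and `App` harness below are illustrative only, not part of Spark:

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for a MetadataLog, just enough to show purge semantics.
class InMemoryMetadataLog[T] {
  private val batches = mutable.Map.empty[Long, T]

  def add(batchId: Long, metadata: T): Boolean = batches.put(batchId, metadata).isEmpty

  def get(batchId: Long): Option[T] = batches.get(batchId)

  def getLatest(): Option[(Long, T)] =
    if (batches.isEmpty) None else Some(batches.maxBy(_._1))

  // Same contract as the new purge API: drop every batch id strictly below the threshold.
  def purge(thresholdBatchId: Long): Unit =
    batches.keys.filter(_ < thresholdBatchId).toList.foreach(batches.remove)
}

object PurgeExample extends App {
  val log = new InMemoryMetadataLog[String]
  (0L to 4L).foreach(id => log.add(id, s"batch-$id"))
  log.purge(3L)
  assert(log.get(2L).isEmpty && log.get(3L).isDefined) // batches 0-2 removed, 3 and 4 kept
}
```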
spark git commit: [SPARK-17246][SQL] Add BigDecimal literal
Repository: spark Updated Branches: refs/heads/branch-2.0 deb6a54cf -> 52feb3fbf [SPARK-17246][SQL] Add BigDecimal literal ## What changes were proposed in this pull request? This PR adds parser support for `BigDecimal` literals. If you append the suffix `BD` to a valid number then this will be interpreted as a `BigDecimal`, for example `12.0E10BD` will interpreted into a BigDecimal with scale -9 and precision 3. This is useful in situations where you need exact values. ## How was this patch tested? Added tests to `ExpressionParserSuite`, `ExpressionSQLBuilderSuite` and `SQLQueryTestSuite`. Author: Herman van HovellCloses #14819 from hvanhovell/SPARK-17246. (cherry picked from commit a11d10f1826b578ff721c4738224eef2b3c3b9f3) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/52feb3fb Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/52feb3fb Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/52feb3fb Branch: refs/heads/branch-2.0 Commit: 52feb3fbf75a234d041703e3ac41884294ab0b64 Parents: deb6a54 Author: Herman van Hovell Authored: Fri Aug 26 13:29:22 2016 -0700 Committer: Reynold Xin Committed: Fri Aug 26 13:29:30 2016 -0700 -- .../apache/spark/sql/catalyst/parser/SqlBase.g4 | 6 + .../sql/catalyst/expressions/literals.scala | 2 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 16 - .../catalyst/parser/ExpressionParserSuite.scala | 7 ++ .../resources/sql-tests/inputs/literals.sql | 6 + .../sql-tests/results/literals.sql.out | 24 +++- .../catalyst/ExpressionSQLBuilderSuite.scala| 1 + 7 files changed, 59 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/52feb3fb/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 51f3804..ecb7c8a 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -626,6 +626,7 @@ number | MINUS? SMALLINT_LITERAL #smallIntLiteral | MINUS? TINYINT_LITERAL #tinyIntLiteral | MINUS? DOUBLE_LITERAL #doubleLiteral +| MINUS? 
BIGDECIMAL_LITERAL #bigDecimalLiteral ; nonReserved @@ -920,6 +921,11 @@ DOUBLE_LITERAL (INTEGER_VALUE | DECIMAL_VALUE | SCIENTIFIC_DECIMAL_VALUE) 'D' ; +BIGDECIMAL_LITERAL +: +(INTEGER_VALUE | DECIMAL_VALUE | SCIENTIFIC_DECIMAL_VALUE) 'BD' +; + IDENTIFIER : (LETTER | DIGIT | '_')+ ; http://git-wip-us.apache.org/repos/asf/spark/blob/52feb3fb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala index 730a7f6..41e3952 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala @@ -266,7 +266,7 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})" case _ => v + "D" } -case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})" +case (v: Decimal, t: DecimalType) => v + "BD" case (v: Int, DateType) => s"DATE '${DateTimeUtils.toJavaDate(v)}'" case (v: Long, TimestampType) => s"TIMESTAMP('${DateTimeUtils.toJavaTimestamp(v)}')" case _ => value.toString http://git-wip-us.apache.org/repos/asf/spark/blob/52feb3fb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index aec3126..0451abe 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -26,7 +26,8 @@ import org.antlr.v4.runtime.{ParserRuleContext, Token} import org.antlr.v4.runtime.tree.{ParseTree, RuleNode,
spark git commit: [SPARK-17246][SQL] Add BigDecimal literal
Repository: spark Updated Branches: refs/heads/master 8e5475be3 -> a11d10f18 [SPARK-17246][SQL] Add BigDecimal literal ## What changes were proposed in this pull request? This PR adds parser support for `BigDecimal` literals. If you append the suffix `BD` to a valid number then this will be interpreted as a `BigDecimal`, for example `12.0E10BD` will interpreted into a BigDecimal with scale -9 and precision 3. This is useful in situations where you need exact values. ## How was this patch tested? Added tests to `ExpressionParserSuite`, `ExpressionSQLBuilderSuite` and `SQLQueryTestSuite`. Author: Herman van HovellCloses #14819 from hvanhovell/SPARK-17246. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a11d10f1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a11d10f1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a11d10f1 Branch: refs/heads/master Commit: a11d10f1826b578ff721c4738224eef2b3c3b9f3 Parents: 8e5475b Author: Herman van Hovell Authored: Fri Aug 26 13:29:22 2016 -0700 Committer: Reynold Xin Committed: Fri Aug 26 13:29:22 2016 -0700 -- .../apache/spark/sql/catalyst/parser/SqlBase.g4 | 6 + .../sql/catalyst/expressions/literals.scala | 2 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 16 - .../catalyst/parser/ExpressionParserSuite.scala | 7 ++ .../resources/sql-tests/inputs/literals.sql | 6 + .../sql-tests/results/literals.sql.out | 24 +++- .../catalyst/ExpressionSQLBuilderSuite.scala| 1 + 7 files changed, 59 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a11d10f1/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index cab7c3f..a8af840 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -633,6 +633,7 @@ number | MINUS? SMALLINT_LITERAL #smallIntLiteral | MINUS? TINYINT_LITERAL #tinyIntLiteral | MINUS? DOUBLE_LITERAL #doubleLiteral +| MINUS? 
BIGDECIMAL_LITERAL #bigDecimalLiteral ; nonReserved @@ -928,6 +929,11 @@ DOUBLE_LITERAL (INTEGER_VALUE | DECIMAL_VALUE | SCIENTIFIC_DECIMAL_VALUE) 'D' ; +BIGDECIMAL_LITERAL +: +(INTEGER_VALUE | DECIMAL_VALUE | SCIENTIFIC_DECIMAL_VALUE) 'BD' +; + IDENTIFIER : (LETTER | DIGIT | '_')+ ; http://git-wip-us.apache.org/repos/asf/spark/blob/a11d10f1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala index 730a7f6..41e3952 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala @@ -266,7 +266,7 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})" case _ => v + "D" } -case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})" +case (v: Decimal, t: DecimalType) => v + "BD" case (v: Int, DateType) => s"DATE '${DateTimeUtils.toJavaDate(v)}'" case (v: Long, TimestampType) => s"TIMESTAMP('${DateTimeUtils.toJavaTimestamp(v)}')" case _ => value.toString http://git-wip-us.apache.org/repos/asf/spark/blob/a11d10f1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 8b98efc..893db93 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -26,7 +26,8 @@ import org.antlr.v4.runtime.{ParserRuleContext, Token} import org.antlr.v4.runtime.tree.{ParseTree, RuleNode, TerminalNode} import org.apache.spark.internal.Logging -import org.apache.spark.sql.catalyst.{FunctionIdentifier, InternalRow,
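A small usage sketch of the new literal suffix, assuming a local `SparkSession` named `spark`; the query and output comments are illustrative, not copied from the patch's test files:

```scala
import org.apache.spark.sql.SparkSession

object BigDecimalLiteralExample extends App {
  val spark = SparkSession.builder().master("local[2]").appName("bd-literal").getOrCreate()

  // The BD suffix makes the parser produce an exact decimal value instead of a double.
  val df = spark.sql("SELECT 12.0E10BD AS exact_value, 12.0E10 AS approx_value")
  df.printSchema() // exact_value: decimal, approx_value: double
  df.show()

  spark.stop()
}
```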
spark git commit: [SPARK-17242][DOCUMENT] Update links of external dstream projects
Repository: spark Updated Branches: refs/heads/branch-2.0 73014a2aa -> 27ed6d5dc [SPARK-17242][DOCUMENT] Update links of external dstream projects ## What changes were proposed in this pull request? Updated links of external dstream projects. ## How was this patch tested? Just document changes. Author: Shixiong ZhuCloses #14814 from zsxwing/dstream-link. (cherry picked from commit 341e0e778dff8c404b47d34ee7661b658bb91880) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/27ed6d5d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/27ed6d5d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/27ed6d5d Branch: refs/heads/branch-2.0 Commit: 27ed6d5dcd521b4ff1ebe777b03a03ba103d6e76 Parents: 73014a2 Author: Shixiong Zhu Authored: Thu Aug 25 21:08:42 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 25 21:08:48 2016 -0700 -- docs/streaming-programming-guide.md | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/27ed6d5d/docs/streaming-programming-guide.md -- diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md index 14e1744..b92ca92 100644 --- a/docs/streaming-programming-guide.md +++ b/docs/streaming-programming-guide.md @@ -656,7 +656,7 @@ methods for creating DStreams from files as input sources. Python API `fileStream` is not available in the Python API, only `textFileStream` is available. - **Streams based on Custom Receivers:** DStreams can be created with data streams received through custom receivers. See the [Custom Receiver - Guide](streaming-custom-receivers.html) and [DStream Akka](https://github.com/spark-packages/dstream-akka) for more details. + Guide](streaming-custom-receivers.html) for more details. - **Queue of RDDs as a Stream:** For testing a Spark Streaming application with test data, one can also create a DStream based on a queue of RDDs, using `streamingContext.queueStream(queueOfRDDs)`. Each RDD pushed into the queue will be treated as a batch of data in the DStream, and processed like a stream. @@ -2383,11 +2383,7 @@ additional effort may be necessary to achieve exactly-once semantics. There are - [Kafka Integration Guide](streaming-kafka-integration.html) - [Kinesis Integration Guide](streaming-kinesis-integration.html) - [Custom Receiver Guide](streaming-custom-receivers.html) -* External DStream data sources: -- [DStream MQTT](https://github.com/spark-packages/dstream-mqtt) -- [DStream Twitter](https://github.com/spark-packages/dstream-twitter) -- [DStream Akka](https://github.com/spark-packages/dstream-akka) -- [DStream ZeroMQ](https://github.com/spark-packages/dstream-zeromq) +* Third-party DStream data sources can be found in [Spark Packages](https://spark-packages.org/) * API documentation - Scala docs * [StreamingContext](api/scala/index.html#org.apache.spark.streaming.StreamingContext) and - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17242][DOCUMENT] Update links of external dstream projects
Repository: spark Updated Branches: refs/heads/master b964a172a -> 341e0e778 [SPARK-17242][DOCUMENT] Update links of external dstream projects ## What changes were proposed in this pull request? Updated links of external dstream projects. ## How was this patch tested? Just document changes. Author: Shixiong ZhuCloses #14814 from zsxwing/dstream-link. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/341e0e77 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/341e0e77 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/341e0e77 Branch: refs/heads/master Commit: 341e0e778dff8c404b47d34ee7661b658bb91880 Parents: b964a17 Author: Shixiong Zhu Authored: Thu Aug 25 21:08:42 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 25 21:08:42 2016 -0700 -- docs/streaming-programming-guide.md | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/341e0e77/docs/streaming-programming-guide.md -- diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md index df94e95..82d3647 100644 --- a/docs/streaming-programming-guide.md +++ b/docs/streaming-programming-guide.md @@ -656,7 +656,7 @@ methods for creating DStreams from files as input sources. Python API `fileStream` is not available in the Python API, only `textFileStream` is available. - **Streams based on Custom Receivers:** DStreams can be created with data streams received through custom receivers. See the [Custom Receiver - Guide](streaming-custom-receivers.html) and [DStream Akka](https://github.com/spark-packages/dstream-akka) for more details. + Guide](streaming-custom-receivers.html) for more details. - **Queue of RDDs as a Stream:** For testing a Spark Streaming application with test data, one can also create a DStream based on a queue of RDDs, using `streamingContext.queueStream(queueOfRDDs)`. Each RDD pushed into the queue will be treated as a batch of data in the DStream, and processed like a stream. @@ -2383,11 +2383,7 @@ additional effort may be necessary to achieve exactly-once semantics. There are - [Kafka Integration Guide](streaming-kafka-integration.html) - [Kinesis Integration Guide](streaming-kinesis-integration.html) - [Custom Receiver Guide](streaming-custom-receivers.html) -* External DStream data sources: -- [DStream MQTT](https://github.com/spark-packages/dstream-mqtt) -- [DStream Twitter](https://github.com/spark-packages/dstream-twitter) -- [DStream Akka](https://github.com/spark-packages/dstream-akka) -- [DStream ZeroMQ](https://github.com/spark-packages/dstream-zeromq) +* Third-party DStream data sources can be found in [Spark Packages](https://spark-packages.org/) * API documentation - Scala docs * [StreamingContext](api/scala/index.html#org.apache.spark.streaming.StreamingContext) and - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17215][SQL] Method `SQLContext.parseDataType(dataTypeString: String)` could be removed.
Repository: spark Updated Branches: refs/heads/master 4d0706d61 -> 5f02d2e5b [SPARK-17215][SQL] Method `SQLContext.parseDataType(dataTypeString: String)` could be removed. ## What changes were proposed in this pull request? Method `SQLContext.parseDataType(dataTypeString: String)` could be removed, we should use `SparkSession.parseDataType(dataTypeString: String)` instead. This require updating PySpark. ## How was this patch tested? Existing test cases. Author: jiangxingboCloses #14790 from jiangxb1987/parseDataType. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5f02d2e5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5f02d2e5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5f02d2e5 Branch: refs/heads/master Commit: 5f02d2e5b4d37f554629cbd0e488e856fffd7b6b Parents: 4d0706d Author: jiangxingbo Authored: Wed Aug 24 23:36:04 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 24 23:36:04 2016 -0700 -- python/pyspark/sql/column.py | 7 +++ python/pyspark/sql/functions.py | 6 +++--- python/pyspark/sql/readwriter.py | 4 +++- python/pyspark/sql/streaming.py | 4 +++- python/pyspark/sql/tests.py | 2 +- python/pyspark/sql/types.py | 6 +++--- .../src/main/scala/org/apache/spark/sql/SQLContext.scala | 10 -- 7 files changed, 16 insertions(+), 23 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5f02d2e5/python/pyspark/sql/column.py -- diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py index 4b99f30..8d5adc8 100644 --- a/python/pyspark/sql/column.py +++ b/python/pyspark/sql/column.py @@ -328,10 +328,9 @@ class Column(object): if isinstance(dataType, basestring): jc = self._jc.cast(dataType) elif isinstance(dataType, DataType): -from pyspark.sql import SQLContext -sc = SparkContext.getOrCreate() -ctx = SQLContext.getOrCreate(sc) -jdt = ctx._ssql_ctx.parseDataType(dataType.json()) +from pyspark.sql import SparkSession +spark = SparkSession.builder.getOrCreate() +jdt = spark._jsparkSession.parseDataType(dataType.json()) jc = self._jc.cast(jdt) else: raise TypeError("unexpected type: %s" % type(dataType)) http://git-wip-us.apache.org/repos/asf/spark/blob/5f02d2e5/python/pyspark/sql/functions.py -- diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py index 4ea83e2..89b3c07 100644 --- a/python/pyspark/sql/functions.py +++ b/python/pyspark/sql/functions.py @@ -1760,11 +1760,11 @@ class UserDefinedFunction(object): self._judf = self._create_judf(name) def _create_judf(self, name): -from pyspark.sql import SQLContext +from pyspark.sql import SparkSession sc = SparkContext.getOrCreate() wrapped_func = _wrap_function(sc, self.func, self.returnType) -ctx = SQLContext.getOrCreate(sc) -jdt = ctx._ssql_ctx.parseDataType(self.returnType.json()) +spark = SparkSession.builder.getOrCreate() +jdt = spark._jsparkSession.parseDataType(self.returnType.json()) if name is None: f = self.func name = f.__name__ if hasattr(f, '__name__') else f.__class__.__name__ http://git-wip-us.apache.org/repos/asf/spark/blob/5f02d2e5/python/pyspark/sql/readwriter.py -- diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py index 3da6f49..3d79e0c 100644 --- a/python/pyspark/sql/readwriter.py +++ b/python/pyspark/sql/readwriter.py @@ -98,9 +98,11 @@ class DataFrameReader(OptionUtils): :param schema: a :class:`pyspark.sql.types.StructType` object """ +from pyspark.sql import SparkSession if not isinstance(schema, StructType): raise TypeError("schema should be 
StructType") -jschema = self._spark._ssql_ctx.parseDataType(schema.json()) +spark = SparkSession.builder.getOrCreate() +jschema = spark._jsparkSession.parseDataType(schema.json()) self._jreader = self._jreader.schema(jschema) return self http://git-wip-us.apache.org/repos/asf/spark/blob/5f02d2e5/python/pyspark/sql/streaming.py -- diff --git a/python/pyspark/sql/streaming.py
spark git commit: [SPARK-17228][SQL] Not infer/propagate non-deterministic constraints
Repository: spark Updated Branches: refs/heads/branch-2.0 3258f27a8 -> aa57083af [SPARK-17228][SQL] Not infer/propagate non-deterministic constraints ## What changes were proposed in this pull request? Given that filters based on non-deterministic constraints shouldn't be pushed down in the query plan, unnecessarily inferring them is confusing and a source of potential bugs. This patch simplifies the inferring logic by simply ignoring them. ## How was this patch tested? Added a new test in `ConstraintPropagationSuite`. Author: Sameer AgarwalCloses #14795 from sameeragarwal/deterministic-constraints. (cherry picked from commit ac27557eb622a257abeb3e8551f06ebc72f87133) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/aa57083a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/aa57083a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/aa57083a Branch: refs/heads/branch-2.0 Commit: aa57083af4cecb595bac09e437607d7142b54913 Parents: 3258f27 Author: Sameer Agarwal Authored: Wed Aug 24 21:24:24 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 24 21:24:31 2016 -0700 -- .../spark/sql/catalyst/plans/QueryPlan.scala | 3 ++- .../plans/ConstraintPropagationSuite.scala | 17 + 2 files changed, 19 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/aa57083a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala index cf34f4b..9c60590 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala @@ -35,7 +35,8 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT .union(inferAdditionalConstraints(constraints)) .union(constructIsNotNullConstraints(constraints)) .filter(constraint => -constraint.references.nonEmpty && constraint.references.subsetOf(outputSet)) +constraint.references.nonEmpty && constraint.references.subsetOf(outputSet) && + constraint.deterministic) } /** http://git-wip-us.apache.org/repos/asf/spark/blob/aa57083a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala index 5a76969..8d6a49a 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala @@ -352,4 +352,21 @@ class ConstraintPropagationSuite extends SparkFunSuite { verifyConstraints(tr.analyze.constraints, ExpressionSet(Seq(IsNotNull(resolveColumn(tr, "b")), IsNotNull(resolveColumn(tr, "c") } + + test("not infer non-deterministic constraints") { +val tr = LocalRelation('a.int, 'b.string, 'c.int) + +verifyConstraints(tr + .where('a.attr === Rand(0)) + .analyze.constraints, + ExpressionSet(Seq(IsNotNull(resolveColumn(tr, "a") + +verifyConstraints(tr + .where('a.attr === InputFileName()) + .where('a.attr =!= 'c.attr) + .analyze.constraints, + ExpressionSet(Seq(resolveColumn(tr, "a") =!= resolveColumn(tr, "c"), +IsNotNull(resolveColumn(tr, "a")), 
+IsNotNull(resolveColumn(tr, "c") + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17228][SQL] Not infer/propagate non-deterministic constraints
Repository: spark Updated Branches: refs/heads/master 3a60be4b1 -> ac27557eb [SPARK-17228][SQL] Not infer/propagate non-deterministic constraints ## What changes were proposed in this pull request? Given that filters based on non-deterministic constraints shouldn't be pushed down in the query plan, unnecessarily inferring them is confusing and a source of potential bugs. This patch simplifies the inferring logic by simply ignoring them. ## How was this patch tested? Added a new test in `ConstraintPropagationSuite`. Author: Sameer AgarwalCloses #14795 from sameeragarwal/deterministic-constraints. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ac27557e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ac27557e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ac27557e Branch: refs/heads/master Commit: ac27557eb622a257abeb3e8551f06ebc72f87133 Parents: 3a60be4 Author: Sameer Agarwal Authored: Wed Aug 24 21:24:24 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 24 21:24:24 2016 -0700 -- .../spark/sql/catalyst/plans/QueryPlan.scala | 3 ++- .../plans/ConstraintPropagationSuite.scala | 17 + 2 files changed, 19 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ac27557e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala index 8ee31f4..0fb6e7d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala @@ -35,7 +35,8 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT .union(inferAdditionalConstraints(constraints)) .union(constructIsNotNullConstraints(constraints)) .filter(constraint => -constraint.references.nonEmpty && constraint.references.subsetOf(outputSet)) +constraint.references.nonEmpty && constraint.references.subsetOf(outputSet) && + constraint.deterministic) } /** http://git-wip-us.apache.org/repos/asf/spark/blob/ac27557e/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala index 5a76969..8d6a49a 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala @@ -352,4 +352,21 @@ class ConstraintPropagationSuite extends SparkFunSuite { verifyConstraints(tr.analyze.constraints, ExpressionSet(Seq(IsNotNull(resolveColumn(tr, "b")), IsNotNull(resolveColumn(tr, "c") } + + test("not infer non-deterministic constraints") { +val tr = LocalRelation('a.int, 'b.string, 'c.int) + +verifyConstraints(tr + .where('a.attr === Rand(0)) + .analyze.constraints, + ExpressionSet(Seq(IsNotNull(resolveColumn(tr, "a") + +verifyConstraints(tr + .where('a.attr === InputFileName()) + .where('a.attr =!= 'c.attr) + .analyze.constraints, + ExpressionSet(Seq(resolveColumn(tr, "a") =!= resolveColumn(tr, "c"), +IsNotNull(resolveColumn(tr, "a")), +IsNotNull(resolveColumn(tr, "c") + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For 
additional commands, e-mail: commits-h...@spark.apache.org
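The guard itself is a single extra conjunct. As a toy model in plain Scala (no Catalyst classes; the `Constraint` case class below is a made-up stand-in), the filter keeps a constraint only when it references known output attributes and is deterministic:

```scala
// Hypothetical stand-in for Catalyst expressions: just enough state to show the filter.
final case class Constraint(references: Set[String], deterministic: Boolean, sql: String)

object ConstraintFilterExample extends App {
  val outputSet = Set("a", "b", "c")

  val inferred = Seq(
    Constraint(Set("a"), deterministic = true, "isnotnull(a)"),
    Constraint(Set("a"), deterministic = false, "(a = rand(0))"), // dropped: non-deterministic
    Constraint(Set("a", "z"), deterministic = true, "(a = z)"))   // dropped: z is not an output

  val kept = inferred.filter { c =>
    c.references.nonEmpty && c.references.subsetOf(outputSet) && c.deterministic
  }

  kept.foreach(c => println(c.sql)) // only isnotnull(a) survives
}
```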
spark git commit: [SPARK-16216][SQL][BRANCH-2.0] Backport Read/write dateFormat/timestampFormat options for CSV and JSON
Repository: spark Updated Branches: refs/heads/branch-2.0 9f363a690 -> 3258f27a8 [SPARK-16216][SQL][BRANCH-2.0] Backport Read/write dateFormat/timestampFormat options for CSV and JSON ## What changes were proposed in this pull request? This PR backports https://github.com/apache/spark/pull/14279 to 2.0. ## How was this patch tested? Unit tests were added in `CSVSuite` and `JsonSuite`. For JSON, existing tests cover the default cases. Author: hyukjinkwonCloses #14799 from HyukjinKwon/SPARK-16216-json-csv-backport. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3258f27a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3258f27a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3258f27a Branch: refs/heads/branch-2.0 Commit: 3258f27a881dfeb5ab8bae90c338603fa4b6f9d8 Parents: 9f363a6 Author: hyukjinkwon Authored: Wed Aug 24 21:19:35 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 24 21:19:35 2016 -0700 -- python/pyspark/sql/readwriter.py| 56 +-- python/pyspark/sql/streaming.py | 30 +++- .../org/apache/spark/sql/DataFrameReader.scala | 17 +- .../org/apache/spark/sql/DataFrameWriter.scala | 12 ++ .../datasources/csv/CSVInferSchema.scala| 42 ++--- .../execution/datasources/csv/CSVOptions.scala | 15 +- .../execution/datasources/csv/CSVRelation.scala | 43 - .../datasources/json/JSONOptions.scala | 9 ++ .../datasources/json/JacksonGenerator.scala | 14 +- .../datasources/json/JacksonParser.scala| 68 .../datasources/json/JsonFileFormat.scala | 5 +- .../spark/sql/streaming/DataStreamReader.scala | 16 +- .../datasources/csv/CSVInferSchemaSuite.scala | 4 +- .../execution/datasources/csv/CSVSuite.scala| 156 ++- .../datasources/csv/CSVTypeCastSuite.scala | 17 +- .../execution/datasources/json/JsonSuite.scala | 74 - .../datasources/json/TestJsonData.scala | 6 + .../sql/sources/JsonHadoopFsRelationSuite.scala | 4 + 18 files changed, 478 insertions(+), 110 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3258f27a/python/pyspark/sql/readwriter.py -- diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py index 64de33e..3da6f49 100644 --- a/python/pyspark/sql/readwriter.py +++ b/python/pyspark/sql/readwriter.py @@ -156,7 +156,7 @@ class DataFrameReader(OptionUtils): def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None, allowComments=None, allowUnquotedFieldNames=None, allowSingleQuotes=None, allowNumericLeadingZero=None, allowBackslashEscapingAnyCharacter=None, - mode=None, columnNameOfCorruptRecord=None): + mode=None, columnNameOfCorruptRecord=None, dateFormat=None, timestampFormat=None): """ Loads a JSON file (one object per line) or an RDD of Strings storing JSON objects (one object per record) and returns the result as a :class`DataFrame`. @@ -198,6 +198,14 @@ class DataFrameReader(OptionUtils): ``spark.sql.columnNameOfCorruptRecord``. If None is set, it uses the value specified in ``spark.sql.columnNameOfCorruptRecord``. +:param dateFormat: sets the string that indicates a date format. Custom date formats + follow the formats at ``java.text.SimpleDateFormat``. This + applies to date type. If None is set, it uses the + default value value, ``-MM-dd``. +:param timestampFormat: sets the string that indicates a timestamp format. Custom date +formats follow the formats at ``java.text.SimpleDateFormat``. +This applies to timestamp type. If None is set, it uses the +default value value, ``-MM-dd'T'HH:mm:ss.SSSZZ``. 
>>> df1 = spark.read.json('python/test_support/sql/people.json') >>> df1.dtypes @@ -213,7 +221,8 @@ class DataFrameReader(OptionUtils): allowComments=allowComments, allowUnquotedFieldNames=allowUnquotedFieldNames, allowSingleQuotes=allowSingleQuotes, allowNumericLeadingZero=allowNumericLeadingZero, allowBackslashEscapingAnyCharacter=allowBackslashEscapingAnyCharacter, -mode=mode, columnNameOfCorruptRecord=columnNameOfCorruptRecord) +mode=mode, columnNameOfCorruptRecord=columnNameOfCorruptRecord,
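A usage sketch of the backported options, assuming a `SparkSession` named `spark` and a hypothetical CSV file path; the same two options apply to the JSON reader and to the corresponding writers:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object DateFormatOptionsExample extends App {
  val spark = SparkSession.builder().master("local[2]").appName("date-options").getOrCreate()

  val schema = StructType(Seq(
    StructField("id", IntegerType),
    StructField("born", DateType),
    StructField("last_seen", TimestampType)))

  val people = spark.read
    .schema(schema)
    .option("dateFormat", "dd/MM/yyyy")            // pattern used to parse DateType columns
    .option("timestampFormat", "yyyy/MM/dd HH:mm") // pattern used to parse TimestampType columns
    .csv("/tmp/people.csv")                        // hypothetical input path

  people.show()
  spark.stop()
}
```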
spark git commit: [SPARK-17186][SQL] remove catalog table type INDEX
Repository: spark Updated Branches: refs/heads/branch-2.0 a6e6a047b -> df87f161c [SPARK-17186][SQL] remove catalog table type INDEX ## What changes were proposed in this pull request? Actually Spark SQL doesn't support index, the catalog table type `INDEX` is from Hive. However, most operations in Spark SQL can't handle index table, e.g. create table, alter table, etc. Logically index table should be invisible to end users, and Hive also generates special table name for index table to avoid users accessing it directly. Hive has special SQL syntax to create/show/drop index tables. At Spark SQL side, although we can describe index table directly, but the result is unreadable, we should use the dedicated SQL syntax to do it(e.g. `SHOW INDEX ON tbl`). Spark SQL can also read index table directly, but the result is always empty.(Can hive read index table directly?) This PR remove the table type `INDEX`, to make it clear that Spark SQL doesn't support index currently. ## How was this patch tested? existing tests. Author: Wenchen FanCloses #14752 from cloud-fan/minor2. (cherry picked from commit 52fa45d62a5a0bc832442f38f9e634c5d8e29e08) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/df87f161 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/df87f161 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/df87f161 Branch: refs/heads/branch-2.0 Commit: df87f161c9e40a49235ea722f6a662a488b41c4c Parents: a6e6a04 Author: Wenchen Fan Authored: Tue Aug 23 23:46:09 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 23 23:46:17 2016 -0700 -- .../org/apache/spark/sql/catalyst/catalog/interface.scala| 1 - .../org/apache/spark/sql/execution/command/tables.scala | 8 +++- .../scala/org/apache/spark/sql/hive/MetastoreRelation.scala | 1 - .../org/apache/spark/sql/hive/client/HiveClientImpl.scala| 4 ++-- .../apache/spark/sql/hive/execution/HiveCommandSuite.scala | 2 +- 5 files changed, 6 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/df87f161/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala index 6197aca..c083cf6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala @@ -203,7 +203,6 @@ case class CatalogTableType private(name: String) object CatalogTableType { val EXTERNAL = new CatalogTableType("EXTERNAL") val MANAGED = new CatalogTableType("MANAGED") - val INDEX = new CatalogTableType("INDEX") val VIEW = new CatalogTableType("VIEW") } http://git-wip-us.apache.org/repos/asf/spark/blob/df87f161/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala index b2300b4..a5ccbcf 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala @@ -678,12 +678,11 @@ case class ShowPartitionsCommand( * Validate and throws an [[AnalysisException]] exception under the following conditions: * 1. If the table is not partitioned. * 2. 
If it is a datasource table. - * 3. If it is a view or index table. + * 3. If it is a view. */ -if (tab.tableType == VIEW || - tab.tableType == INDEX) { +if (tab.tableType == VIEW) { throw new AnalysisException( -s"SHOW PARTITIONS is not allowed on a view or index table: ${tab.qualifiedName}") +s"SHOW PARTITIONS is not allowed on a view: ${tab.qualifiedName}") } if (!DDLUtils.isTablePartitioned(tab)) { @@ -765,7 +764,6 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableComman case EXTERNAL => " EXTERNAL TABLE" case VIEW => " VIEW" case MANAGED => " TABLE" - case INDEX => reportUnsupportedError(Seq("index table")) } builder ++= s"CREATE$tableTypeString ${table.quotedString}" http://git-wip-us.apache.org/repos/asf/spark/blob/df87f161/sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala
spark git commit: [SPARK-17186][SQL] remove catalog table type INDEX
Repository: spark Updated Branches: refs/heads/master b9994ad05 -> 52fa45d62 [SPARK-17186][SQL] remove catalog table type INDEX ## What changes were proposed in this pull request? Actually Spark SQL doesn't support index, the catalog table type `INDEX` is from Hive. However, most operations in Spark SQL can't handle index table, e.g. create table, alter table, etc. Logically index table should be invisible to end users, and Hive also generates special table name for index table to avoid users accessing it directly. Hive has special SQL syntax to create/show/drop index tables. At Spark SQL side, although we can describe index table directly, but the result is unreadable, we should use the dedicated SQL syntax to do it(e.g. `SHOW INDEX ON tbl`). Spark SQL can also read index table directly, but the result is always empty.(Can hive read index table directly?) This PR remove the table type `INDEX`, to make it clear that Spark SQL doesn't support index currently. ## How was this patch tested? existing tests. Author: Wenchen FanCloses #14752 from cloud-fan/minor2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/52fa45d6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/52fa45d6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/52fa45d6 Branch: refs/heads/master Commit: 52fa45d62a5a0bc832442f38f9e634c5d8e29e08 Parents: b9994ad Author: Wenchen Fan Authored: Tue Aug 23 23:46:09 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 23 23:46:09 2016 -0700 -- .../org/apache/spark/sql/catalyst/catalog/interface.scala| 1 - .../org/apache/spark/sql/execution/command/tables.scala | 8 +++- .../scala/org/apache/spark/sql/hive/MetastoreRelation.scala | 1 - .../org/apache/spark/sql/hive/client/HiveClientImpl.scala| 4 ++-- .../apache/spark/sql/hive/execution/HiveCommandSuite.scala | 2 +- 5 files changed, 6 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/52fa45d6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala index f7762e0..83e01f9 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala @@ -200,7 +200,6 @@ case class CatalogTableType private(name: String) object CatalogTableType { val EXTERNAL = new CatalogTableType("EXTERNAL") val MANAGED = new CatalogTableType("MANAGED") - val INDEX = new CatalogTableType("INDEX") val VIEW = new CatalogTableType("VIEW") } http://git-wip-us.apache.org/repos/asf/spark/blob/52fa45d6/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala index 21544a3..b4a15b8 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala @@ -620,12 +620,11 @@ case class ShowPartitionsCommand( * Validate and throws an [[AnalysisException]] exception under the following conditions: * 1. If the table is not partitioned. * 2. If it is a datasource table. - * 3. If it is a view or index table. + * 3. If it is a view. 
*/ -if (tab.tableType == VIEW || - tab.tableType == INDEX) { +if (tab.tableType == VIEW) { throw new AnalysisException( -s"SHOW PARTITIONS is not allowed on a view or index table: ${tab.qualifiedName}") +s"SHOW PARTITIONS is not allowed on a view: ${tab.qualifiedName}") } if (tab.partitionColumnNames.isEmpty) { @@ -708,7 +707,6 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableComman case EXTERNAL => " EXTERNAL TABLE" case VIEW => " VIEW" case MANAGED => " TABLE" - case INDEX => reportUnsupportedError(Seq("index table")) } builder ++= s"CREATE$tableTypeString ${table.quotedString}" http://git-wip-us.apache.org/repos/asf/spark/blob/52fa45d6/sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala
spark git commit: [MINOR][SQL] Remove implemented functions from comments of 'HiveSessionCatalog.scala'
Repository: spark Updated Branches: refs/heads/branch-2.0 a772b4b5d -> a6e6a047b [MINOR][SQL] Remove implemented functions from comments of 'HiveSessionCatalog.scala' ## What changes were proposed in this pull request? This PR removes implemented functions from comments of `HiveSessionCatalog.scala`: `java_method`, `posexplode`, `str_to_map`. ## How was this patch tested? Manual. Author: Weiqing YangCloses #14769 from Sherry302/cleanComment. (cherry picked from commit b9994ad05628077016331e6b411fbc09017b1e63) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a6e6a047 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a6e6a047 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a6e6a047 Branch: refs/heads/branch-2.0 Commit: a6e6a047bb9215df55b009957d4c560624d886fc Parents: a772b4b Author: Weiqing Yang Authored: Tue Aug 23 23:44:45 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 23 23:45:00 2016 -0700 -- .../scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a6e6a047/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala index c59ac3d..1684e8d 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala @@ -230,10 +230,8 @@ private[sql] class HiveSessionCatalog( // List of functions we are explicitly not supporting are: // compute_stats, context_ngrams, create_union, // current_user, ewah_bitmap, ewah_bitmap_and, ewah_bitmap_empty, ewah_bitmap_or, field, - // in_file, index, java_method, - // matchpath, ngrams, noop, noopstreaming, noopwithmap, noopwithmapstreaming, - // parse_url_tuple, posexplode, reflect2, - // str_to_map, windowingtablefunction. + // in_file, index, matchpath, ngrams, noop, noopstreaming, noopwithmap, + // noopwithmapstreaming, parse_url_tuple, reflect2, windowingtablefunction. private val hiveFunctions = Seq( "hash", "histogram_numeric", - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR][SQL] Remove implemented functions from comments of 'HiveSessionCatalog.scala'
Repository: spark Updated Branches: refs/heads/master c1937dd19 -> b9994ad05 [MINOR][SQL] Remove implemented functions from comments of 'HiveSessionCatalog.scala' ## What changes were proposed in this pull request? This PR removes implemented functions from comments of `HiveSessionCatalog.scala`: `java_method`, `posexplode`, `str_to_map`. ## How was this patch tested? Manual. Author: Weiqing YangCloses #14769 from Sherry302/cleanComment. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b9994ad0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b9994ad0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b9994ad0 Branch: refs/heads/master Commit: b9994ad05628077016331e6b411fbc09017b1e63 Parents: c1937dd Author: Weiqing Yang Authored: Tue Aug 23 23:44:45 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 23 23:44:45 2016 -0700 -- .../scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b9994ad0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala index ebed9eb..ca8c734 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala @@ -230,10 +230,8 @@ private[sql] class HiveSessionCatalog( // List of functions we are explicitly not supporting are: // compute_stats, context_ngrams, create_union, // current_user, ewah_bitmap, ewah_bitmap_and, ewah_bitmap_empty, ewah_bitmap_or, field, - // in_file, index, java_method, - // matchpath, ngrams, noop, noopstreaming, noopwithmap, noopwithmapstreaming, - // parse_url_tuple, posexplode, reflect2, - // str_to_map, windowingtablefunction. + // in_file, index, matchpath, ngrams, noop, noopstreaming, noopwithmap, + // noopwithmapstreaming, parse_url_tuple, reflect2, windowingtablefunction. private val hiveFunctions = Seq( "hash", "histogram_numeric", - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader`
Repository: spark Updated Branches: refs/heads/master bf8ff833e -> c1937dd19 [SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader` ## What changes were proposed in this pull request? Jira: https://issues.apache.org/jira/browse/SPARK-16862 `BufferedInputStream` used in `UnsafeSorterSpillReader` uses the default 8k buffer to read data off disk. This PR makes it configurable to improve on disk reads. I have made the default value to be 1 MB as with that value I observed improved performance. ## How was this patch tested? I am relying on the existing unit tests. ## Performance After deploying this change to prod and setting the config to 1 mb, there was a 12% reduction in the CPU time and 19.5% reduction in CPU reservation time. Author: Tejas PatilCloses #14726 from tejasapatil/spill_buffer_2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c1937dd1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c1937dd1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c1937dd1 Branch: refs/heads/master Commit: c1937dd19a23bd096a4707656c7ba19fb5c16966 Parents: bf8ff83 Author: Tejas Patil Authored: Tue Aug 23 18:48:08 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 23 18:48:08 2016 -0700 -- .../unsafe/sort/UnsafeSorterSpillReader.java| 22 +++- 1 file changed, 21 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c1937dd1/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java -- diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java index 1d588c3..d048cf7 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java @@ -22,15 +22,21 @@ import java.io.*; import com.google.common.io.ByteStreams; import com.google.common.io.Closeables; +import org.apache.spark.SparkEnv; import org.apache.spark.serializer.SerializerManager; import org.apache.spark.storage.BlockId; import org.apache.spark.unsafe.Platform; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; /** * Reads spill files written by {@link UnsafeSorterSpillWriter} (see that class for a description * of the file format). */ public final class UnsafeSorterSpillReader extends UnsafeSorterIterator implements Closeable { + private static final Logger logger = LoggerFactory.getLogger(UnsafeSorterSpillReader.class); + private static final int DEFAULT_BUFFER_SIZE_BYTES = 1024 * 1024; // 1 MB + private static final int MAX_BUFFER_SIZE_BYTES = 16777216; // 16 mb private InputStream in; private DataInputStream din; @@ -50,7 +56,21 @@ public final class UnsafeSorterSpillReader extends UnsafeSorterIterator implemen File file, BlockId blockId) throws IOException { assert (file.length() > 0); -final BufferedInputStream bs = new BufferedInputStream(new FileInputStream(file)); +long bufferSizeBytes = +SparkEnv.get() == null ? 
+DEFAULT_BUFFER_SIZE_BYTES: + SparkEnv.get().conf().getSizeAsBytes("spark.unsafe.sorter.spill.reader.buffer.size", + DEFAULT_BUFFER_SIZE_BYTES); +if (bufferSizeBytes > MAX_BUFFER_SIZE_BYTES || bufferSizeBytes < DEFAULT_BUFFER_SIZE_BYTES) { + // fall back to a sane default value + logger.warn("Value of config \"spark.unsafe.sorter.spill.reader.buffer.size\" = {} not in " + + "allowed range [{}, {}). Falling back to default value : {} bytes", bufferSizeBytes, + DEFAULT_BUFFER_SIZE_BYTES, MAX_BUFFER_SIZE_BYTES, DEFAULT_BUFFER_SIZE_BYTES); + bufferSizeBytes = DEFAULT_BUFFER_SIZE_BYTES; +} + +final BufferedInputStream bs = +new BufferedInputStream(new FileInputStream(file), (int) bufferSizeBytes); try { this.in = serializerManager.wrapForCompression(blockId, bs); this.din = new DataInputStream(this.in); - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
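A usage sketch of the new knob (the job body is omitted). The key is an ordinary SparkConf byte-size setting, and values outside the guard's allowed window fall back to the 1 MB default with a warning:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object SpillReaderBufferExample extends App {
  val conf = new SparkConf()
    .setMaster("local[2]")
    .setAppName("spill-buffer")
    .set("spark.unsafe.sorter.spill.reader.buffer.size", "2m") // 2 MB per spill-file reader

  val spark = SparkSession.builder().config(conf).getOrCreate()
  // ... run a sort/shuffle-heavy job here; reads of spilled sort files use the larger buffer ...
  spark.stop()
}
```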
spark git commit: [SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in block manager replication
Repository: spark Updated Branches: refs/heads/master 71afeeea4 -> 8e223ea67 [SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in block manager replication ## What changes were proposed in this pull request? This is a straightforward clone of JoshRosen 's original patch. I have follow-up changes to fix block replication for repl-defined classes as well, but those appear to be flaking tests so I'm going to leave that for SPARK-17042 ## How was this patch tested? End-to-end test in ReplSuite (also more tests in DistributedSuite from the original patch). Author: Eric LiangCloses #14311 from ericl/spark-16550. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8e223ea6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8e223ea6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8e223ea6 Branch: refs/heads/master Commit: 8e223ea67acf5aa730ccf688802f17f6fc10907c Parents: 71afeee Author: Eric Liang Authored: Mon Aug 22 16:32:14 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 22 16:32:14 2016 -0700 -- .../spark/serializer/SerializerManager.scala| 14 +++- .../org/apache/spark/storage/BlockManager.scala | 13 +++- .../org/apache/spark/DistributedSuite.scala | 77 ++-- .../scala/org/apache/spark/repl/ReplSuite.scala | 14 4 files changed, 60 insertions(+), 58 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8e223ea6/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala -- diff --git a/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala b/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala index 9dc274c..07caadb 100644 --- a/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala +++ b/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala @@ -68,7 +68,7 @@ private[spark] class SerializerManager(defaultSerializer: Serializer, conf: Spar * loaded yet. */ private lazy val compressionCodec: CompressionCodec = CompressionCodec.createCodec(conf) - private def canUseKryo(ct: ClassTag[_]): Boolean = { + def canUseKryo(ct: ClassTag[_]): Boolean = { primitiveAndPrimitiveArrayClassTags.contains(ct) || ct == stringClassTag } @@ -128,8 +128,18 @@ private[spark] class SerializerManager(defaultSerializer: Serializer, conf: Spar /** Serializes into a chunked byte buffer. */ def dataSerialize[T: ClassTag](blockId: BlockId, values: Iterator[T]): ChunkedByteBuffer = { +dataSerializeWithExplicitClassTag(blockId, values, implicitly[ClassTag[T]]) + } + + /** Serializes into a chunked byte buffer. 
*/ + def dataSerializeWithExplicitClassTag( + blockId: BlockId, + values: Iterator[_], + classTag: ClassTag[_]): ChunkedByteBuffer = { val bbos = new ChunkedByteBufferOutputStream(1024 * 1024 * 4, ByteBuffer.allocate) -dataSerializeStream(blockId, bbos, values) +val byteStream = new BufferedOutputStream(bbos) +val ser = getSerializer(classTag).newInstance() +ser.serializeStream(wrapForCompression(blockId, byteStream)).writeAll(values).close() bbos.toChunkedByteBuffer } http://git-wip-us.apache.org/repos/asf/spark/blob/8e223ea6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala -- diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala index 015e71d..fe84652 100644 --- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala +++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala @@ -498,7 +498,8 @@ private[spark] class BlockManager( diskStore.getBytes(blockId) } else if (level.useMemory && memoryStore.contains(blockId)) { // The block was not found on disk, so serialize an in-memory copy: -serializerManager.dataSerialize(blockId, memoryStore.getValues(blockId).get) +serializerManager.dataSerializeWithExplicitClassTag( + blockId, memoryStore.getValues(blockId).get, info.classTag) } else { handleLocalReadFailure(blockId) } @@ -973,8 +974,16 @@ private[spark] class BlockManager( if (level.replication > 1) { val remoteStartTime = System.currentTimeMillis val bytesToReplicate = doGetLocalBytes(blockId, info) + // [SPARK-16550] Erase the typed classTag when using default serialization, since + // NettyBlockRpcServer crashes when deserializing repl-defined classes. + // TODO(ekl) remove this once the classloader issue on
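A repro-style sketch of the scenario the fix targets: caching blocks that contain user-defined classes with a replicated storage level. The example is hypothetical; replication only actually happens when more than one executor is available, so in local mode it merely exercises the code path:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

case class Event(id: Long, payload: String) // a user-defined class held inside cached blocks

object ReplicatedCacheExample extends App {
  val spark = SparkSession.builder().master("local[2]").appName("replicated-cache").getOrCreate()
  val sc = spark.sparkContext

  val events = sc.parallelize((1L to 1000L).map(i => Event(i, s"payload-$i")))
  events.persist(StorageLevel.MEMORY_ONLY_2) // the _2 suffix asks for a replica on a second node
  println(events.count())

  spark.stop()
}
```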
spark git commit: [SPARK-17162] Range does not support SQL generation
Repository: spark Updated Branches: refs/heads/branch-2.0 6dcc1a3f0 -> 01a4d69f3 [SPARK-17162] Range does not support SQL generation ## What changes were proposed in this pull request? The range operator previously didn't support SQL generation, which made it not possible to use in views. ## How was this patch tested? Unit tests. cc hvanhovell Author: Eric LiangCloses #14724 from ericl/spark-17162. (cherry picked from commit 84770b59f773f132073cd2af4204957fc2d7bf35) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/01a4d69f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/01a4d69f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/01a4d69f Branch: refs/heads/branch-2.0 Commit: 01a4d69f309a1cc8d370ce9f85e6a4f31b6db3b8 Parents: 6dcc1a3 Author: Eric Liang Authored: Mon Aug 22 15:48:35 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 22 15:48:43 2016 -0700 -- .../analysis/ResolveTableValuedFunctions.scala | 11 -- .../plans/logical/basicLogicalOperators.scala | 21 +--- .../apache/spark/sql/catalyst/SQLBuilder.scala | 3 +++ .../sql/execution/basicPhysicalOperators.scala | 2 +- .../spark/sql/execution/command/views.scala | 3 +-- sql/hive/src/test/resources/sqlgen/range.sql| 4 .../test/resources/sqlgen/range_with_splits.sql | 4 .../sql/catalyst/LogicalPlanToSQLSuite.scala| 14 - 8 files changed, 44 insertions(+), 18 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/01a4d69f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala index 7fdf7fa..6b3bb68 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala @@ -28,9 +28,6 @@ import org.apache.spark.sql.types.{DataType, IntegerType, LongType} * Rule that resolves table-valued function references. */ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { - private lazy val defaultParallelism = -SparkContext.getOrCreate(new SparkConf(false)).defaultParallelism - /** * List of argument names and their types, used to declare a function. 
*/ @@ -84,25 +81,25 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { "range" -> Map( /* range(end) */ tvf("end" -> LongType) { case Seq(end: Long) => -Range(0, end, 1, defaultParallelism) +Range(0, end, 1, None) }, /* range(start, end) */ tvf("start" -> LongType, "end" -> LongType) { case Seq(start: Long, end: Long) => -Range(start, end, 1, defaultParallelism) +Range(start, end, 1, None) }, /* range(start, end, step) */ tvf("start" -> LongType, "end" -> LongType, "step" -> LongType) { case Seq(start: Long, end: Long, step: Long) => - Range(start, end, step, defaultParallelism) + Range(start, end, step, None) }, /* range(start, end, step, numPartitions) */ tvf("start" -> LongType, "end" -> LongType, "step" -> LongType, "numPartitions" -> IntegerType) { case Seq(start: Long, end: Long, step: Long, numPartitions: Int) => - Range(start, end, step, numPartitions) + Range(start, end, step, Some(numPartitions)) }) ) http://git-wip-us.apache.org/repos/asf/spark/blob/01a4d69f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala index eb612c4..07e39b0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala @@ -422,17 +422,20 @@ case class Sort( /** Factory for constructing new `Range` nodes. */ object Range { - def apply(start: Long, end: Long, step: Long, numSlices: Int): Range = { + def apply(start: Long, end: Long, step: Long, numSlices: Option[Int]): Range = { val
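With SQL generation for `Range` in place, the `range(...)` table-valued function can back a persistent view. A sketch, assuming a Hive-enabled session (persistent views may require Hive support in this release) and a throwaway view name:

```scala
import org.apache.spark.sql.SparkSession

object RangeInViewExample extends App {
  val spark = SparkSession.builder()
    .master("local[2]")
    .appName("range-view")
    .enableHiveSupport() // assumption: persistent views need a Hive-backed catalog here
    .getOrCreate()

  spark.sql("CREATE OR REPLACE VIEW numbers AS SELECT id FROM range(0, 10, 2)")
  spark.sql("SELECT sum(id) FROM numbers").show() // 0 + 2 + 4 + 6 + 8 = 20
  spark.sql("DROP VIEW numbers")

  spark.stop()
}
```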
spark git commit: [SPARK-17162] Range does not support SQL generation
Repository: spark Updated Branches: refs/heads/master 929cb8bee -> 84770b59f [SPARK-17162] Range does not support SQL generation ## What changes were proposed in this pull request? The range operator previously didn't support SQL generation, which made it not possible to use in views. ## How was this patch tested? Unit tests. cc hvanhovell Author: Eric LiangCloses #14724 from ericl/spark-17162. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/84770b59 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/84770b59 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/84770b59 Branch: refs/heads/master Commit: 84770b59f773f132073cd2af4204957fc2d7bf35 Parents: 929cb8b Author: Eric Liang Authored: Mon Aug 22 15:48:35 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 22 15:48:35 2016 -0700 -- .../analysis/ResolveTableValuedFunctions.scala | 11 -- .../plans/logical/basicLogicalOperators.scala | 21 +--- .../apache/spark/sql/catalyst/SQLBuilder.scala | 3 +++ .../sql/execution/basicPhysicalOperators.scala | 2 +- .../spark/sql/execution/command/views.scala | 3 +-- sql/hive/src/test/resources/sqlgen/range.sql| 4 .../test/resources/sqlgen/range_with_splits.sql | 4 .../sql/catalyst/LogicalPlanToSQLSuite.scala| 14 - 8 files changed, 44 insertions(+), 18 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/84770b59/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala index 7fdf7fa..6b3bb68 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala @@ -28,9 +28,6 @@ import org.apache.spark.sql.types.{DataType, IntegerType, LongType} * Rule that resolves table-valued function references. */ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { - private lazy val defaultParallelism = -SparkContext.getOrCreate(new SparkConf(false)).defaultParallelism - /** * List of argument names and their types, used to declare a function. 
*/ @@ -84,25 +81,25 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { "range" -> Map( /* range(end) */ tvf("end" -> LongType) { case Seq(end: Long) => -Range(0, end, 1, defaultParallelism) +Range(0, end, 1, None) }, /* range(start, end) */ tvf("start" -> LongType, "end" -> LongType) { case Seq(start: Long, end: Long) => -Range(start, end, 1, defaultParallelism) +Range(start, end, 1, None) }, /* range(start, end, step) */ tvf("start" -> LongType, "end" -> LongType, "step" -> LongType) { case Seq(start: Long, end: Long, step: Long) => - Range(start, end, step, defaultParallelism) + Range(start, end, step, None) }, /* range(start, end, step, numPartitions) */ tvf("start" -> LongType, "end" -> LongType, "step" -> LongType, "numPartitions" -> IntegerType) { case Seq(start: Long, end: Long, step: Long, numPartitions: Int) => - Range(start, end, step, numPartitions) + Range(start, end, step, Some(numPartitions)) }) ) http://git-wip-us.apache.org/repos/asf/spark/blob/84770b59/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala index af1736e..010aec7 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala @@ -422,17 +422,20 @@ case class Sort( /** Factory for constructing new `Range` nodes. */ object Range { - def apply(start: Long, end: Long, step: Long, numSlices: Int): Range = { + def apply(start: Long, end: Long, step: Long, numSlices: Option[Int]): Range = { val output = StructType(StructField("id", LongType, nullable = false) :: Nil).toAttributes new Range(start, end, step,
spark git commit: [SPARK-17158][SQL] Change error message for out of range numeric literals
Repository: spark Updated Branches: refs/heads/branch-2.0 efe832200 -> 379b12729 [SPARK-17158][SQL] Change error message for out of range numeric literals ## What changes were proposed in this pull request? Modifies error message for numeric literals to Numeric literal does not fit in range [min, max] for type ## How was this patch tested? Fixed up the error messages for literals.sql in SqlQueryTestSuite and re-ran via sbt. Also fixed up error messages in ExpressionParserSuite Author: Srinath ShankarCloses #14721 from srinathshankar/sc4296. (cherry picked from commit ba1737c21aab91ff3f1a1737aa2d6b07575e36a3) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/379b1272 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/379b1272 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/379b1272 Branch: refs/heads/branch-2.0 Commit: 379b1272925e534d99ddf4e4add054284900d200 Parents: efe8322 Author: Srinath Shankar Authored: Fri Aug 19 19:54:26 2016 -0700 Committer: Reynold Xin Committed: Fri Aug 19 19:54:47 2016 -0700 -- .../spark/sql/catalyst/parser/AstBuilder.scala | 29 .../catalyst/parser/ExpressionParserSuite.scala | 9 -- .../sql-tests/results/literals.sql.out | 6 ++-- 3 files changed, 27 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/379b1272/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 0230294..aec3126 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -1273,10 +1273,17 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { } /** Create a numeric literal expression. */ - private def numericLiteral(ctx: NumberContext)(f: String => Any): Literal = withOrigin(ctx) { -val raw = ctx.getText + private def numericLiteral + (ctx: NumberContext, minValue: BigDecimal, maxValue: BigDecimal, typeName: String) + (converter: String => Any): Literal = withOrigin(ctx) { +val rawStrippedQualifier = ctx.getText.substring(0, ctx.getText.length - 1) try { - Literal(f(raw.substring(0, raw.length - 1))) + val rawBigDecimal = BigDecimal(rawStrippedQualifier) + if (rawBigDecimal < minValue || rawBigDecimal > maxValue) { +throw new ParseException(s"Numeric literal ${rawStrippedQualifier} does not " + + s"fit in range [${minValue}, ${maxValue}] for type ${typeName}", ctx) + } + Literal(converter(rawStrippedQualifier)) } catch { case e: NumberFormatException => throw new ParseException(e.getMessage, ctx) @@ -1286,29 +1293,29 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { /** * Create a Byte Literal expression. */ - override def visitTinyIntLiteral(ctx: TinyIntLiteralContext): Literal = numericLiteral(ctx) { -_.toByte + override def visitTinyIntLiteral(ctx: TinyIntLiteralContext): Literal = { +numericLiteral(ctx, Byte.MinValue, Byte.MaxValue, ByteType.simpleString)(_.toByte) } /** * Create a Short Literal expression. 
*/ - override def visitSmallIntLiteral(ctx: SmallIntLiteralContext): Literal = numericLiteral(ctx) { -_.toShort + override def visitSmallIntLiteral(ctx: SmallIntLiteralContext): Literal = { +numericLiteral(ctx, Short.MinValue, Short.MaxValue, ShortType.simpleString)(_.toShort) } /** * Create a Long Literal expression. */ - override def visitBigIntLiteral(ctx: BigIntLiteralContext): Literal = numericLiteral(ctx) { -_.toLong + override def visitBigIntLiteral(ctx: BigIntLiteralContext): Literal = { +numericLiteral(ctx, Long.MinValue, Long.MaxValue, LongType.simpleString)(_.toLong) } /** * Create a Double Literal expression. */ - override def visitDoubleLiteral(ctx: DoubleLiteralContext): Literal = numericLiteral(ctx) { -_.toDouble + override def visitDoubleLiteral(ctx: DoubleLiteralContext): Literal = { +numericLiteral(ctx, Double.MinValue, Double.MaxValue, DoubleType.simpleString)(_.toDouble) } /** http://git-wip-us.apache.org/repos/asf/spark/blob/379b1272/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala -- diff
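The reworked `numericLiteral` helper strips the type-suffix character, checks the value against the type's bounds as a `BigDecimal`, and only then converts. A standalone, REPL-style sketch of that logic (simplified: it throws a plain exception rather than Catalyst's `ParseException`):

```scala
// Simplified version of the new range check; min/max and typeName are passed in
// the same way the patched AstBuilder does for each literal visitor.
def parseTypedLiteral(raw: String, min: BigDecimal, max: BigDecimal, typeName: String): BigDecimal = {
  val stripped = raw.substring(0, raw.length - 1) // drop the qualifier, e.g. "127Y" -> "127"
  val value = BigDecimal(stripped)
  if (value < min || value > max) {
    throw new IllegalArgumentException(
      s"Numeric literal $stripped does not fit in range [$min, $max] for type $typeName")
  }
  value
}

parseTypedLiteral("127Y", BigDecimal(Byte.MinValue.toInt), BigDecimal(Byte.MaxValue.toInt), "tinyint")
// parseTypedLiteral("128Y", ...) now fails with the descriptive range message
// instead of surfacing a bare NumberFormatException.
```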
spark git commit: [SPARK-17158][SQL] Change error message for out of range numeric literals
Repository: spark Updated Branches: refs/heads/master a117afa7c -> ba1737c21 [SPARK-17158][SQL] Change error message for out of range numeric literals ## What changes were proposed in this pull request? Modifies error message for numeric literals to Numeric literal does not fit in range [min, max] for type ## How was this patch tested? Fixed up the error messages for literals.sql in SqlQueryTestSuite and re-ran via sbt. Also fixed up error messages in ExpressionParserSuite Author: Srinath ShankarCloses #14721 from srinathshankar/sc4296. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ba1737c2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ba1737c2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ba1737c2 Branch: refs/heads/master Commit: ba1737c21aab91ff3f1a1737aa2d6b07575e36a3 Parents: a117afa Author: Srinath Shankar Authored: Fri Aug 19 19:54:26 2016 -0700 Committer: Reynold Xin Committed: Fri Aug 19 19:54:26 2016 -0700 -- .../spark/sql/catalyst/parser/AstBuilder.scala | 29 .../catalyst/parser/ExpressionParserSuite.scala | 9 -- .../sql-tests/results/literals.sql.out | 6 ++-- 3 files changed, 27 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ba1737c2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 283e4d4..8b98efc 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -1278,10 +1278,17 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { } /** Create a numeric literal expression. */ - private def numericLiteral(ctx: NumberContext)(f: String => Any): Literal = withOrigin(ctx) { -val raw = ctx.getText + private def numericLiteral + (ctx: NumberContext, minValue: BigDecimal, maxValue: BigDecimal, typeName: String) + (converter: String => Any): Literal = withOrigin(ctx) { +val rawStrippedQualifier = ctx.getText.substring(0, ctx.getText.length - 1) try { - Literal(f(raw.substring(0, raw.length - 1))) + val rawBigDecimal = BigDecimal(rawStrippedQualifier) + if (rawBigDecimal < minValue || rawBigDecimal > maxValue) { +throw new ParseException(s"Numeric literal ${rawStrippedQualifier} does not " + + s"fit in range [${minValue}, ${maxValue}] for type ${typeName}", ctx) + } + Literal(converter(rawStrippedQualifier)) } catch { case e: NumberFormatException => throw new ParseException(e.getMessage, ctx) @@ -1291,29 +1298,29 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { /** * Create a Byte Literal expression. */ - override def visitTinyIntLiteral(ctx: TinyIntLiteralContext): Literal = numericLiteral(ctx) { -_.toByte + override def visitTinyIntLiteral(ctx: TinyIntLiteralContext): Literal = { +numericLiteral(ctx, Byte.MinValue, Byte.MaxValue, ByteType.simpleString)(_.toByte) } /** * Create a Short Literal expression. */ - override def visitSmallIntLiteral(ctx: SmallIntLiteralContext): Literal = numericLiteral(ctx) { -_.toShort + override def visitSmallIntLiteral(ctx: SmallIntLiteralContext): Literal = { +numericLiteral(ctx, Short.MinValue, Short.MaxValue, ShortType.simpleString)(_.toShort) } /** * Create a Long Literal expression. 
*/ - override def visitBigIntLiteral(ctx: BigIntLiteralContext): Literal = numericLiteral(ctx) { -_.toLong + override def visitBigIntLiteral(ctx: BigIntLiteralContext): Literal = { +numericLiteral(ctx, Long.MinValue, Long.MaxValue, LongType.simpleString)(_.toLong) } /** * Create a Double Literal expression. */ - override def visitDoubleLiteral(ctx: DoubleLiteralContext): Literal = numericLiteral(ctx) { -_.toDouble + override def visitDoubleLiteral(ctx: DoubleLiteralContext): Literal = { +numericLiteral(ctx, Double.MinValue, Double.MaxValue, DoubleType.simpleString)(_.toDouble) } /** http://git-wip-us.apache.org/repos/asf/spark/blob/ba1737c2/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala
spark git commit: [SPARK-17149][SQL] array.sql for testing array related functions
Repository: spark Updated Branches: refs/heads/master acac7a508 -> a117afa7c [SPARK-17149][SQL] array.sql for testing array related functions ## What changes were proposed in this pull request? This patch creates array.sql in SQLQueryTestSuite for testing array related functions, including: - indexing - array creation - size - array_contains - sort_array ## How was this patch tested? The patch itself is about adding tests. Author: petermaxleeCloses #14708 from petermaxlee/SPARK-17149. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a117afa7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a117afa7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a117afa7 Branch: refs/heads/master Commit: a117afa7c2d94f943106542ec53d74ba2b5f1058 Parents: acac7a5 Author: petermaxlee Authored: Fri Aug 19 18:14:45 2016 -0700 Committer: Reynold Xin Committed: Fri Aug 19 18:14:45 2016 -0700 -- .../catalyst/analysis/FunctionRegistry.scala| 12 +- .../test/resources/sql-tests/inputs/array.sql | 86 +++ .../resources/sql-tests/results/array.sql.out | 144 +++ .../org/apache/spark/sql/SQLQuerySuite.scala| 16 --- .../apache/spark/sql/SQLQueryTestSuite.scala| 10 ++ .../hive/execution/HiveCompatibilitySuite.scala | 4 +- .../sql/hive/execution/HiveQuerySuite.scala | 9 -- 7 files changed, 248 insertions(+), 33 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a117afa7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index c5f91c1..35fd800 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -161,7 +161,6 @@ object FunctionRegistry { val expressions: Map[String, (ExpressionInfo, FunctionBuilder)] = Map( // misc non-aggregate functions expression[Abs]("abs"), -expression[CreateArray]("array"), expression[Coalesce]("coalesce"), expression[Explode]("explode"), expression[Greatest]("greatest"), @@ -172,10 +171,6 @@ object FunctionRegistry { expression[IsNull]("isnull"), expression[IsNotNull]("isnotnull"), expression[Least]("least"), -expression[CreateMap]("map"), -expression[MapKeys]("map_keys"), -expression[MapValues]("map_values"), -expression[CreateNamedStruct]("named_struct"), expression[NaNvl]("nanvl"), expression[NullIf]("nullif"), expression[Nvl]("nvl"), @@ -184,7 +179,6 @@ object FunctionRegistry { expression[Rand]("rand"), expression[Randn]("randn"), expression[Stack]("stack"), -expression[CreateStruct]("struct"), expression[CaseWhen]("when"), // math functions @@ -354,9 +348,15 @@ object FunctionRegistry { expression[TimeWindow]("window"), // collection functions +expression[CreateArray]("array"), expression[ArrayContains]("array_contains"), +expression[CreateMap]("map"), +expression[CreateNamedStruct]("named_struct"), +expression[MapKeys]("map_keys"), +expression[MapValues]("map_values"), expression[Size]("size"), expression[SortArray]("sort_array"), +expression[CreateStruct]("struct"), // misc functions expression[AssertTrue]("assert_true"), http://git-wip-us.apache.org/repos/asf/spark/blob/a117afa7/sql/core/src/test/resources/sql-tests/inputs/array.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/array.sql 
b/sql/core/src/test/resources/sql-tests/inputs/array.sql new file mode 100644 index 000..4038a0d --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/array.sql @@ -0,0 +1,86 @@ +-- test cases for array functions + +create temporary view data as select * from values + ("one", array(11, 12, 13), array(array(111, 112, 113), array(121, 122, 123))), + ("two", array(21, 22, 23), array(array(211, 212, 213), array(221, 222, 223))) + as data(a, b, c); + +select * from data; + +-- index into array +select a, b[0], b[0] + b[1] from data; + +-- index into array of arrays +select a, c[0][0] + c[0][0 + 1] from data; + + +create temporary view primitive_arrays as select * from values ( + array(true), + array(2Y, 1Y), + array(2S, 1S), + array(2, 1), + array(2L, 1L), + array(9223372036854775809, 9223372036854775808), + array(2.0D, 1.0D),
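For readers who want to exercise the newly covered functions outside the test harness, a hedged example of equivalent queries, assuming a running `SparkSession` named `spark`:

```scala
// Mirrors the style of the new array.sql inputs: an inline-table view plus
// indexing, size, array_contains and sort_array over it.
spark.sql("""
  CREATE OR REPLACE TEMPORARY VIEW data AS SELECT * FROM VALUES
    ('one', array(11, 12, 13)),
    ('two', array(21, 22, 23))
  AS data(a, b)
""")

spark.sql("SELECT a, b[0], size(b), array_contains(b, 12), sort_array(b, false) FROM data").show()
```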
spark git commit: [SPARK-16686][SQL] Remove PushProjectThroughSample since it is handled by ColumnPruning
Repository: spark Updated Branches: refs/heads/branch-2.0 d0707c6ba -> 3276ccfac [SPARK-16686][SQL] Remove PushProjectThroughSample since it is handled by ColumnPruning We push down `Project` through `Sample` in `Optimizer` by the rule `PushProjectThroughSample`. However, if the projected columns produce new output, they will encounter whole data instead of sampled data. It will bring some inconsistency between original plan (Sample then Project) and optimized plan (Project then Sample). In the extreme case such as attached in the JIRA, if the projected column is an UDF which is supposed to not see the sampled out data, the result of UDF will be incorrect. Since the rule `ColumnPruning` already handles general `Project` pushdown. We don't need `PushProjectThroughSample` anymore. The rule `ColumnPruning` also avoids the described issue. Jenkins tests. Author: Liang-Chi HsiehCloses #14327 from viirya/fix-sample-pushdown. (cherry picked from commit 7b06a8948fc16d3c14e240fdd632b79ce1651008) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3276ccfa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3276ccfa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3276ccfa Branch: refs/heads/branch-2.0 Commit: 3276ccfac807514d5a959415bcf58d2aa6ed8fbc Parents: d0707c6 Author: Liang-Chi Hsieh Authored: Tue Jul 26 12:00:01 2016 +0800 Committer: Reynold Xin Committed: Fri Aug 19 11:18:55 2016 -0700 -- .../sql/catalyst/optimizer/Optimizer.scala | 12 -- .../catalyst/optimizer/ColumnPruningSuite.scala | 15 .../optimizer/FilterPushdownSuite.scala | 17 - .../org/apache/spark/sql/DatasetSuite.scala | 25 4 files changed, 40 insertions(+), 29 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3276ccfa/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 19d3c39..88cc0e4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -75,7 +75,6 @@ abstract class Optimizer(sessionCatalog: SessionCatalog, conf: CatalystConf) Batch("Operator Optimizations", fixedPoint, // Operator push down PushThroughSetOperations, - PushProjectThroughSample, ReorderJoin, EliminateOuterJoin, PushPredicateThroughJoin, @@ -147,17 +146,6 @@ class SimpleTestOptimizer extends Optimizer( new SimpleCatalystConf(caseSensitiveAnalysis = true)) /** - * Pushes projects down beneath Sample to enable column pruning with sampling. - */ -object PushProjectThroughSample extends Rule[LogicalPlan] { - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -// Push down projection into sample -case Project(projectList, Sample(lb, up, replace, seed, child)) => - Sample(lb, up, replace, seed, Project(projectList, child))() - } -} - -/** * Removes the Project only conducting Alias of its child node. * It is created mainly for removing extra Project added in EliminateSerialization rule, * but can also benefit other operators. 
http://git-wip-us.apache.org/repos/asf/spark/blob/3276ccfa/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala index b5664a5..589607e 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala @@ -346,5 +346,20 @@ class ColumnPruningSuite extends PlanTest { comparePlans(Optimize.execute(plan1.analyze), correctAnswer1) } + test("push project down into sample") { +val testRelation = LocalRelation('a.int, 'b.int, 'c.int) +val x = testRelation.subquery('x) + +val query1 = Sample(0.0, 0.6, false, 11L, x)().select('a) +val optimized1 = Optimize.execute(query1.analyze) +val expected1 = Sample(0.0, 0.6, false, 11L, x.select('a))() +comparePlans(optimized1, expected1.analyze) + +val query2 = Sample(0.0, 0.6, false, 11L, x)().select('a
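The correctness issue the removal avoids is easiest to see with a projection whose expression should only run on the sampled rows. A hedged illustration, assuming a local `SparkSession` named `spark`; the accumulator makes it observable how many rows the UDF actually touched:

```scala
import org.apache.spark.sql.functions.udf

val calls = spark.sparkContext.longAccumulator("udfCalls")
val tracked = udf { x: Int => calls.add(1); x }

import spark.implicits._
val df = (1 to 1000).toDF("x")

// Sample first, then project with the UDF. With the old PushProjectThroughSample
// rule the Project (and hence the UDF) could be evaluated below the Sample, i.e.
// over all 1000 rows; relying on ColumnPruning keeps the UDF above the Sample.
df.sample(withReplacement = false, fraction = 0.1, seed = 11L)
  .select(tracked($"x"))
  .collect()

println(s"UDF evaluated on ${calls.value} rows")
```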
spark git commit: HOTFIX: compilation broken due to protected ctor.
Repository: spark Updated Branches: refs/heads/branch-2.0 c180d637a -> 05b180faa HOTFIX: compilation broken due to protected ctor. (cherry picked from commit b482c09fa22c5762a355f95820e4ba3e2517fb77) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/05b180fa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/05b180fa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/05b180fa Branch: refs/heads/branch-2.0 Commit: 05b180faa4bd87498516c05d4769cc2f51d56aae Parents: c180d63 Author: Reynold Xin Authored: Thu Aug 18 19:02:32 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 18 19:03:00 2016 -0700 -- .../org/apache/spark/sql/catalyst/expressions/literals.scala | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/05b180fa/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala index 95ed68f..7040008 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala @@ -163,8 +163,7 @@ object DecimalLiteral { /** * In order to do type checking, use Literal.create() instead of constructor */ -case class Literal protected (value: Any, dataType: DataType) - extends LeafExpression with CodegenFallback { +case class Literal (value: Any, dataType: DataType) extends LeafExpression with CodegenFallback { override def foldable: Boolean = true override def nullable: Boolean = value == null
spark git commit: HOTFIX: compilation broken due to protected ctor.
Repository: spark Updated Branches: refs/heads/master f5472dda5 -> b482c09fa HOTFIX: compilation broken due to protected ctor. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b482c09f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b482c09f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b482c09f Branch: refs/heads/master Commit: b482c09fa22c5762a355f95820e4ba3e2517fb77 Parents: f5472dd Author: Reynold Xin Authored: Thu Aug 18 19:02:32 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 18 19:02:32 2016 -0700 -- .../org/apache/spark/sql/catalyst/expressions/literals.scala | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b482c09f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala index 95ed68f..7040008 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala @@ -163,8 +163,7 @@ object DecimalLiteral { /** * In order to do type checking, use Literal.create() instead of constructor */ -case class Literal protected (value: Any, dataType: DataType) - extends LeafExpression with CodegenFallback { +case class Literal (value: Any, dataType: DataType) extends LeafExpression with CodegenFallback { override def foldable: Boolean = true override def nullable: Boolean = value == null
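For context on the constructor the hotfix un-protects: the scaladoc in the touched file points callers at `Literal.create()`, which converts and type-checks the value, whereas the bare constructor trusts the caller. A small hedged example (internal Catalyst API, so signatures may shift between versions):

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.IntegerType

// Preferred: create() runs the value through Catalyst's type conversion and checking.
val checked = Literal.create(1, IntegerType)

// Possible again after this hotfix (the ctor is no longer protected), but it skips
// the conversion step, so the caller must supply a value already in internal format.
val unchecked = Literal(1, IntegerType)
```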
spark git commit: [SPARK-16947][SQL] Support type coercion and foldable expression for inline tables
Repository: spark Updated Branches: refs/heads/branch-2.0 ea684b69c -> c180d637a [SPARK-16947][SQL] Support type coercion and foldable expression for inline tables This patch improves inline table support with the following: 1. Support type coercion. 2. Support using foldable expressions. Previously only literals were supported. 3. Improve error message handling. 4. Improve test coverage. Added a new unit test suite ResolveInlineTablesSuite and a new file-based end-to-end test inline-table.sql. Author: petermaxleeCloses #14676 from petermaxlee/SPARK-16947. (cherry picked from commit f5472dda51b980a726346587257c22873ff708e3) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c180d637 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c180d637 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c180d637 Branch: refs/heads/branch-2.0 Commit: c180d637a3caca0d4e46f4980c10d1005eb453bc Parents: ea684b6 Author: petermaxlee Authored: Fri Aug 19 09:19:47 2016 +0800 Committer: Reynold Xin Committed: Thu Aug 18 18:37:40 2016 -0700 -- .../spark/sql/catalyst/analysis/Analyzer.scala | 1 + .../catalyst/analysis/ResolveInlineTables.scala | 112 ++ .../sql/catalyst/analysis/TypeCoercion.scala| 2 +- .../sql/catalyst/analysis/unresolved.scala | 26 +++- .../spark/sql/catalyst/parser/AstBuilder.scala | 41 ++ .../analysis/ResolveInlineTablesSuite.scala | 101 + .../sql/catalyst/parser/PlanParserSuite.scala | 22 +-- .../resources/sql-tests/inputs/inline-table.sql | 48 ++ .../sql-tests/results/inline-table.sql.out | 145 +++ 9 files changed, 452 insertions(+), 46 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c180d637/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index e0b8166..14e995e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -108,6 +108,7 @@ class Analyzer( GlobalAggregates :: ResolveAggregateFunctions :: TimeWindowing :: + ResolveInlineTables :: TypeCoercion.typeCoercionRules ++ extendedResolutionRules : _*), Batch("Nondeterministic", Once, http://git-wip-us.apache.org/repos/asf/spark/blob/c180d637/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala new file mode 100644 index 000..7323197 --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import scala.util.control.NonFatal + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.Cast +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.types.{StructField, StructType} + +/** + * An analyzer rule that replaces [[UnresolvedInlineTable]] with [[LocalRelation]]. + */ +object ResolveInlineTables extends Rule[LogicalPlan] { + override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case table: UnresolvedInlineTable if table.expressionsResolved => +
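A hedged example of what the improved resolution accepts, assuming a `SparkSession` named `spark`: mixed literal types are coerced to a common column type, and foldable expressions are no longer rejected.

```scala
// Type coercion: 1 (int) and 2L (bigint) are widened to a common type for column `id`.
spark.sql("SELECT * FROM VALUES (1, 'a'), (2L, 'b') AS t(id, name)").printSchema()

// Foldable expressions: constant-foldable calls are allowed, not just bare literals.
spark.sql("SELECT * FROM VALUES (1 + 1, concat('a', 'b')) AS t(x, y)").show()
```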
spark git commit: [SPARK-17069] Expose spark.range() as table-valued function in SQL
Repository: spark Updated Branches: refs/heads/branch-2.0 176af17a7 -> ea684b69c [SPARK-17069] Expose spark.range() as table-valued function in SQL This adds analyzer rules for resolving table-valued functions, and adds one builtin implementation for range(). The arguments for range() are the same as those of `spark.range()`. Unit tests. cc hvanhovell Author: Eric LiangCloses #14656 from ericl/sc-4309. (cherry picked from commit 412dba63b511474a6db3c43c8618d803e604bc6b) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ea684b69 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ea684b69 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ea684b69 Branch: refs/heads/branch-2.0 Commit: ea684b69cd6934bc093f4a5a8b0d8470e92157cd Parents: 176af17 Author: Eric Liang Authored: Thu Aug 18 13:33:55 2016 +0200 Committer: Reynold Xin Committed: Thu Aug 18 18:36:50 2016 -0700 -- .../apache/spark/sql/catalyst/parser/SqlBase.g4 | 1 + .../spark/sql/catalyst/analysis/Analyzer.scala | 1 + .../analysis/ResolveTableValuedFunctions.scala | 132 +++ .../sql/catalyst/analysis/unresolved.scala | 11 ++ .../spark/sql/catalyst/parser/AstBuilder.scala | 8 ++ .../sql/catalyst/parser/PlanParserSuite.scala | 8 +- .../sql-tests/inputs/table-valued-functions.sql | 20 +++ .../results/table-valued-functions.sql.out | 87 8 files changed, 267 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ea684b69/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index aca7282..51f3804 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -426,6 +426,7 @@ relationPrimary | '(' queryNoWith ')' sample? (AS? strictIdentifier)? #aliasedQuery | '(' relation ')' sample? (AS? strictIdentifier)? #aliasedRelation | inlineTable #inlineTableDefault2 +| identifier '(' (expression (',' expression)*)? 
')' #tableValuedFunction ; inlineTable http://git-wip-us.apache.org/repos/asf/spark/blob/ea684b69/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 57c3d9a..e0b8166 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -86,6 +86,7 @@ class Analyzer( WindowsSubstitution, EliminateUnions), Batch("Resolution", fixedPoint, + ResolveTableValuedFunctions :: ResolveRelations :: ResolveReferences :: ResolveDeserializer :: http://git-wip-us.apache.org/repos/asf/spark/blob/ea684b69/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala new file mode 100644 index 000..7fdf7fa --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language
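Usage sketch, assuming a `SparkSession` named `spark`; per the description above, the argument forms mirror `spark.range()`:

```scala
spark.sql("SELECT * FROM range(3)").show()                   // end only: ids 0, 1, 2
spark.sql("SELECT * FROM range(2, 10, 2)").show()            // start, end, step
spark.sql("SELECT sum(id) FROM range(0, 100, 1, 8)").show()  // plus explicit numPartitions
```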
spark git commit: [MINOR][DOC] Fix the descriptions for `properties` argument in the documentation for jdbc APIs
Repository: spark Updated Branches: refs/heads/branch-2.0 3e0163bee -> 68a24d3e7 [MINOR][DOC] Fix the descriptions for `properties` argument in the documenation for jdbc APIs ## What changes were proposed in this pull request? This should be credited to mvervuurt. The main purpose of this PR is - simply to include the change for the same instance in `DataFrameReader` just to match up. - just avoid duplicately verifying the PR (as I already did). The documentation for both should be the same because both assume the `properties` should be the same `dict` for the same option. ## How was this patch tested? Manually building Python documentation. This will produce the output as below: - `DataFrameReader` ![2016-08-17 11 12 00](https://cloud.githubusercontent.com/assets/6477701/17722764/b3f6568e-646f-11e6-8b75-4fb672f3f366.png) - `DataFrameWriter` ![2016-08-17 11 12 10](https://cloud.githubusercontent.com/assets/6477701/17722765/b58cb308-646f-11e6-841a-32f19800d139.png) Closes #14624 Author: hyukjinkwonAuthor: mvervuurt Closes #14677 from HyukjinKwon/typo-python. (cherry picked from commit 0f6aa8afaacdf0ceca9c2c1650ca26a5c167ae69) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/68a24d3e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/68a24d3e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/68a24d3e Branch: refs/heads/branch-2.0 Commit: 68a24d3e7aa9b40d4557652d3179b0ccb0f8624e Parents: 3e0163b Author: mvervuurt Authored: Tue Aug 16 23:12:59 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 23:13:06 2016 -0700 -- python/pyspark/sql/readwriter.py | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/68a24d3e/python/pyspark/sql/readwriter.py -- diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py index 4020bb3..64de33e 100644 --- a/python/pyspark/sql/readwriter.py +++ b/python/pyspark/sql/readwriter.py @@ -401,8 +401,9 @@ class DataFrameReader(OptionUtils): :param numPartitions: the number of partitions :param predicates: a list of expressions suitable for inclusion in WHERE clauses; each one defines one partition of the :class:`DataFrame` -:param properties: a dictionary of JDBC database connection arguments; normally, - at least a "user" and "password" property should be included +:param properties: a dictionary of JDBC database connection arguments. Normally at + least properties "user" and "password" with their corresponding values. + For example { 'user' : 'SYSTEM', 'password' : 'mypassword' } :return: a DataFrame """ if properties is None: @@ -716,9 +717,9 @@ class DataFrameWriter(OptionUtils): * ``overwrite``: Overwrite existing data. * ``ignore``: Silently ignore this operation if data already exists. * ``error`` (default case): Throw an exception if data already exists. -:param properties: JDBC database connection arguments, a list of - arbitrary string tag/value. Normally at least a - "user" and "password" property should be included. +:param properties: a dictionary of JDBC database connection arguments. Normally at + least properties "user" and "password" with their corresponding values. + For example { 'user' : 'SYSTEM', 'password' : 'mypassword' } """ if properties is None: properties = dict() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR][DOC] Fix the descriptions for `properties` argument in the documentation for jdbc APIs
Repository: spark Updated Branches: refs/heads/master f7c9ff57c -> 0f6aa8afa [MINOR][DOC] Fix the descriptions for `properties` argument in the documenation for jdbc APIs ## What changes were proposed in this pull request? This should be credited to mvervuurt. The main purpose of this PR is - simply to include the change for the same instance in `DataFrameReader` just to match up. - just avoid duplicately verifying the PR (as I already did). The documentation for both should be the same because both assume the `properties` should be the same `dict` for the same option. ## How was this patch tested? Manually building Python documentation. This will produce the output as below: - `DataFrameReader` ![2016-08-17 11 12 00](https://cloud.githubusercontent.com/assets/6477701/17722764/b3f6568e-646f-11e6-8b75-4fb672f3f366.png) - `DataFrameWriter` ![2016-08-17 11 12 10](https://cloud.githubusercontent.com/assets/6477701/17722765/b58cb308-646f-11e6-841a-32f19800d139.png) Closes #14624 Author: hyukjinkwonAuthor: mvervuurt Closes #14677 from HyukjinKwon/typo-python. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0f6aa8af Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0f6aa8af Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0f6aa8af Branch: refs/heads/master Commit: 0f6aa8afaacdf0ceca9c2c1650ca26a5c167ae69 Parents: f7c9ff5 Author: mvervuurt Authored: Tue Aug 16 23:12:59 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 23:12:59 2016 -0700 -- python/pyspark/sql/readwriter.py | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0f6aa8af/python/pyspark/sql/readwriter.py -- diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py index 4020bb3..64de33e 100644 --- a/python/pyspark/sql/readwriter.py +++ b/python/pyspark/sql/readwriter.py @@ -401,8 +401,9 @@ class DataFrameReader(OptionUtils): :param numPartitions: the number of partitions :param predicates: a list of expressions suitable for inclusion in WHERE clauses; each one defines one partition of the :class:`DataFrame` -:param properties: a dictionary of JDBC database connection arguments; normally, - at least a "user" and "password" property should be included +:param properties: a dictionary of JDBC database connection arguments. Normally at + least properties "user" and "password" with their corresponding values. + For example { 'user' : 'SYSTEM', 'password' : 'mypassword' } :return: a DataFrame """ if properties is None: @@ -716,9 +717,9 @@ class DataFrameWriter(OptionUtils): * ``overwrite``: Overwrite existing data. * ``ignore``: Silently ignore this operation if data already exists. * ``error`` (default case): Throw an exception if data already exists. -:param properties: JDBC database connection arguments, a list of - arbitrary string tag/value. Normally at least a - "user" and "password" property should be included. +:param properties: a dictionary of JDBC database connection arguments. Normally at + least properties "user" and "password" with their corresponding values. + For example { 'user' : 'SYSTEM', 'password' : 'mypassword' } """ if properties is None: properties = dict() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
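The fix above is to the PySpark docstrings; for comparison, a hedged sketch of the same idea from the Scala API, where the connection arguments travel as a `java.util.Properties` (the URL, table names, and credentials below are placeholders, and a `SparkSession` named `spark` plus a reachable JDBC endpoint are assumed):

```scala
import java.util.Properties

val props = new Properties()
props.setProperty("user", "SYSTEM")
props.setProperty("password", "mypassword")

// Read from and append to JDBC tables using the same connection properties.
val accounts = spark.read.jdbc("jdbc:postgresql://dbhost:5432/mydb", "public.accounts", props)
accounts.write.mode("append").jdbc("jdbc:postgresql://dbhost:5432/mydb", "public.accounts_copy", props)
```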
spark git commit: [SPARK-17068][SQL] Make view-usage visible during analysis
Repository: spark Updated Branches: refs/heads/master 4a2c375be -> f7c9ff57c [SPARK-17068][SQL] Make view-usage visible during analysis ## What changes were proposed in this pull request? This PR adds a field to subquery alias in order to make the usage of views in a resolved `LogicalPlan` more visible (and more understandable). For example, the following view and query: ```sql create view constants as select 1 as id union all select 1 union all select 42 select * from constants; ``` ...now yields the following analyzed plan: ``` Project [id#39] +- SubqueryAlias c, `default`.`constants` +- Project [gen_attr_0#36 AS id#39] +- SubqueryAlias gen_subquery_0 +- Union :- Union : :- Project [1 AS gen_attr_0#36] : : +- OneRowRelation$ : +- Project [1 AS gen_attr_1#37] : +- OneRowRelation$ +- Project [42 AS gen_attr_2#38] +- OneRowRelation$ ``` ## How was this patch tested? Added tests for the two code paths in `SessionCatalogSuite` (sql/core) and `HiveMetastoreCatalogSuite` (sql/hive) Author: Herman van HovellCloses #14657 from hvanhovell/SPARK-17068. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f7c9ff57 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f7c9ff57 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f7c9ff57 Branch: refs/heads/master Commit: f7c9ff57c17a950cccdc26aadf8768c899a4d572 Parents: 4a2c375 Author: Herman van Hovell Authored: Tue Aug 16 23:09:53 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 23:09:53 2016 -0700 -- .../spark/sql/catalyst/analysis/Analyzer.scala | 4 +-- .../sql/catalyst/analysis/CheckAnalysis.scala | 4 +-- .../sql/catalyst/catalog/SessionCatalog.scala | 30 +++- .../apache/spark/sql/catalyst/dsl/package.scala | 4 +-- .../sql/catalyst/expressions/subquery.scala | 8 +++--- .../sql/catalyst/optimizer/Optimizer.scala | 8 +++--- .../spark/sql/catalyst/parser/AstBuilder.scala | 4 +-- .../plans/logical/basicLogicalOperators.scala | 7 - .../sql/catalyst/analysis/AnalysisSuite.scala | 4 +-- .../catalyst/catalog/SessionCatalogSuite.scala | 19 + .../catalyst/optimizer/ColumnPruningSuite.scala | 8 +++--- .../EliminateSubqueryAliasesSuite.scala | 6 ++-- .../optimizer/JoinOptimizationSuite.scala | 8 +++--- .../sql/catalyst/parser/PlanParserSuite.scala | 2 +- .../scala/org/apache/spark/sql/Dataset.scala| 2 +- .../apache/spark/sql/catalyst/SQLBuilder.scala | 6 ++-- .../spark/sql/execution/datasources/rules.scala | 2 +- .../spark/sql/hive/HiveMetastoreCatalog.scala | 21 ++ .../spark/sql/hive/HiveSessionCatalog.scala | 4 +-- .../sql/hive/HiveMetastoreCatalogSuite.scala| 14 - 20 files changed, 94 insertions(+), 71 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f7c9ff57/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index a2a022c..bd4c191 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -138,7 +138,7 @@ class Analyzer( case u : UnresolvedRelation => val substituted = cteRelations.find(x => resolver(x._1, u.tableIdentifier.table)) .map(_._2).map { relation => - val withAlias = u.alias.map(SubqueryAlias(_, relation)) + val withAlias = u.alias.map(SubqueryAlias(_, relation, None)) withAlias.getOrElse(relation) } 
substituted.getOrElse(u) @@ -2057,7 +2057,7 @@ class Analyzer( */ object EliminateSubqueryAliases extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { -case SubqueryAlias(_, child) => child +case SubqueryAlias(_, child, _) => child } } http://git-wip-us.apache.org/repos/asf/spark/blob/f7c9ff57/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 41b7e62..e07e919 100644 ---
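A quick way to observe the new plan shape described above, assuming a `SparkSession` named `spark` with a catalog that supports persistent views:

```scala
spark.sql("CREATE OR REPLACE VIEW constants AS SELECT 1 AS id UNION ALL SELECT 1 UNION ALL SELECT 42")

// The analyzed plan now shows both the alias and the view's table identifier,
// e.g. something like: SubqueryAlias constants, `default`.`constants`
spark.sql("SELECT * FROM constants").explain(extended = true)
```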
spark git commit: [SPARK-17084][SQL] Rename ParserUtils.assert to validate
Repository: spark Updated Branches: refs/heads/branch-2.0 6cb3eab7c -> 3e0163bee [SPARK-17084][SQL] Rename ParserUtils.assert to validate ## What changes were proposed in this pull request? This PR renames `ParserUtils.assert` to `ParserUtils.validate`. This is done because this method is used to check requirements, and not to check if the program is in an invalid state. ## How was this patch tested? Simple rename. Compilation should do. Author: Herman van HovellCloses #14665 from hvanhovell/SPARK-17084. (cherry picked from commit 4a2c375be2bcd98cc7e00bea920fd6a0f68a4e14) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3e0163be Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3e0163be Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3e0163be Branch: refs/heads/branch-2.0 Commit: 3e0163bee2354258899c82ce4cc4aacafd2a802d Parents: 6cb3eab Author: Herman van Hovell Authored: Tue Aug 16 21:35:39 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 21:35:46 2016 -0700 -- .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 14 +++--- .../spark/sql/catalyst/parser/ParserUtils.scala | 4 ++-- .../apache/spark/sql/execution/SparkSqlParser.scala | 5 ++--- 3 files changed, 11 insertions(+), 12 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3e0163be/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 1a0e7ab..aee8eb1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -132,7 +132,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // Build the insert clauses. val inserts = ctx.multiInsertQueryBody.asScala.map { body => -assert(body.querySpecification.fromClause == null, +validate(body.querySpecification.fromClause == null, "Multi-Insert queries cannot have a FROM clause in their individual SELECT statements", body) @@ -591,7 +591,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // function takes X PERCENT as the input and the range of X is [0, 100], we need to // adjust the fraction. val eps = RandomSampler.roundingEpsilon - assert(fraction >= 0.0 - eps && fraction <= 1.0 + eps, + validate(fraction >= 0.0 - eps && fraction <= 1.0 + eps, s"Sampling fraction ($fraction) must be on interval [0, 1]", ctx) Sample(0.0, fraction, withReplacement = false, (math.random * 1000).toInt, query)(true) @@ -659,7 +659,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // Get the backing expressions. 
val expressions = ctx.expression.asScala.map { eCtx => val e = expression(eCtx) - assert(e.foldable, "All expressions in an inline table must be constants.", eCtx) + validate(e.foldable, "All expressions in an inline table must be constants.", eCtx) e } @@ -681,7 +681,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { val baseAttributes = structType.toAttributes.map(_.withNullability(true)) val attributes = if (ctx.identifierList != null) { val aliases = visitIdentifierList(ctx.identifierList) - assert(aliases.size == baseAttributes.size, + validate(aliases.size == baseAttributes.size, "Number of aliases must match the number of fields in an inline table.", ctx) baseAttributes.zip(aliases).map(p => p._1.withName(p._2)) } else { @@ -1089,7 +1089,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // We currently only allow foldable integers. def value: Int = { val e = expression(ctx.expression) - assert(e.resolved && e.foldable && e.dataType == IntegerType, + validate(e.resolved && e.foldable && e.dataType == IntegerType, "Frame bound value must be a constant integer.", ctx) e.eval().asInstanceOf[Int] @@ -1342,7 +1342,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { */ override def visitInterval(ctx: IntervalContext): Literal = withOrigin(ctx) { val intervals = ctx.intervalField.asScala.map(visitIntervalField) -assert(intervals.nonEmpty, "at least one time unit should be given for interval literal", ctx) +
spark git commit: [SPARK-17084][SQL] Rename ParserUtils.assert to validate
Repository: spark Updated Branches: refs/heads/master e28a8c589 -> 4a2c375be [SPARK-17084][SQL] Rename ParserUtils.assert to validate ## What changes were proposed in this pull request? This PR renames `ParserUtils.assert` to `ParserUtils.validate`. This is done because this method is used to check requirements, and not to check if the program is in an invalid state. ## How was this patch tested? Simple rename. Compilation should do. Author: Herman van HovellCloses #14665 from hvanhovell/SPARK-17084. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4a2c375b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4a2c375b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4a2c375b Branch: refs/heads/master Commit: 4a2c375be2bcd98cc7e00bea920fd6a0f68a4e14 Parents: e28a8c5 Author: Herman van Hovell Authored: Tue Aug 16 21:35:39 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 21:35:39 2016 -0700 -- .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 14 +++--- .../spark/sql/catalyst/parser/ParserUtils.scala | 4 ++-- .../apache/spark/sql/execution/SparkSqlParser.scala | 5 ++--- 3 files changed, 11 insertions(+), 12 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4a2c375b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 25c8445..09b650c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -132,7 +132,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // Build the insert clauses. val inserts = ctx.multiInsertQueryBody.asScala.map { body => -assert(body.querySpecification.fromClause == null, +validate(body.querySpecification.fromClause == null, "Multi-Insert queries cannot have a FROM clause in their individual SELECT statements", body) @@ -596,7 +596,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // function takes X PERCENT as the input and the range of X is [0, 100], we need to // adjust the fraction. val eps = RandomSampler.roundingEpsilon - assert(fraction >= 0.0 - eps && fraction <= 1.0 + eps, + validate(fraction >= 0.0 - eps && fraction <= 1.0 + eps, s"Sampling fraction ($fraction) must be on interval [0, 1]", ctx) Sample(0.0, fraction, withReplacement = false, (math.random * 1000).toInt, query)(true) @@ -664,7 +664,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // Get the backing expressions. 
val expressions = ctx.expression.asScala.map { eCtx => val e = expression(eCtx) - assert(e.foldable, "All expressions in an inline table must be constants.", eCtx) + validate(e.foldable, "All expressions in an inline table must be constants.", eCtx) e } @@ -686,7 +686,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { val baseAttributes = structType.toAttributes.map(_.withNullability(true)) val attributes = if (ctx.identifierList != null) { val aliases = visitIdentifierList(ctx.identifierList) - assert(aliases.size == baseAttributes.size, + validate(aliases.size == baseAttributes.size, "Number of aliases must match the number of fields in an inline table.", ctx) baseAttributes.zip(aliases).map(p => p._1.withName(p._2)) } else { @@ -1094,7 +1094,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // We currently only allow foldable integers. def value: Int = { val e = expression(ctx.expression) - assert(e.resolved && e.foldable && e.dataType == IntegerType, + validate(e.resolved && e.foldable && e.dataType == IntegerType, "Frame bound value must be a constant integer.", ctx) e.eval().asInstanceOf[Int] @@ -1347,7 +1347,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { */ override def visitInterval(ctx: IntervalContext): Literal = withOrigin(ctx) { val intervals = ctx.intervalField.asScala.map(visitIntervalField) -assert(intervals.nonEmpty, "at least one time unit should be given for interval literal", ctx) +validate(intervals.nonEmpty, "at least one time unit should be given for interval literal", ctx)
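The rename is purely about intent: these are checks on user-supplied SQL, not internal invariants. A hedged sketch of roughly what such a helper looks like (the real `ParserUtils.validate` throws Catalyst's `ParseException` carrying the offending parser context):

```scala
// Simplified stand-in for the renamed helper; `ctx` would be an ANTLR ParserRuleContext
// in the real code and the exception a ParseException rather than IllegalArgumentException.
def validate(condition: => Boolean, message: String, ctx: AnyRef): Unit = {
  if (!condition) throw new IllegalArgumentException(s"$message (near: $ctx)")
}

// Call sites then read as requirement checks on the query being parsed, e.g.:
// validate(fraction >= 0.0 && fraction <= 1.0,
//   s"Sampling fraction ($fraction) must be on interval [0, 1]", ctx)
```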
spark git commit: [SPARK-17089][DOCS] Remove api doc link for mapReduceTriplets operator
Repository: spark Updated Branches: refs/heads/branch-2.0 022230c20 -> 6cb3eab7c [SPARK-17089][DOCS] Remove api doc link for mapReduceTriplets operator ## What changes were proposed in this pull request? Remove the API doc link for the mapReduceTriplets operator: the operator is gone from the latest API, so users who follow the link no longer find mapReduceTriplets there, and removing the dead link is better than confusing the reader. ## How was this patch tested? Ran all the test cases ![screenshot from 2016-08-16 23-08-25](https://cloud.githubusercontent.com/assets/8075390/17709393/8cfbf75a-6406-11e6-98e6-38f7b319d833.png) Author: sandy Closes #14669 from phalodi/SPARK-17089. (cherry picked from commit e28a8c5899c48ff065e2fd3bb6b10c82b4d39c2c) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6cb3eab7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6cb3eab7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6cb3eab7 Branch: refs/heads/branch-2.0 Commit: 6cb3eab7cc49ad8b8459ddc479a900de9dea1bcf Parents: 022230c Author: sandy Authored: Tue Aug 16 12:50:55 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 12:51:02 2016 -0700 -- docs/graphx-programming-guide.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6cb3eab7/docs/graphx-programming-guide.md -- diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md index bf4b968..07b38d9 100644 --- a/docs/graphx-programming-guide.md +++ b/docs/graphx-programming-guide.md @@ -24,7 +24,6 @@ description: GraphX graph processing library guide for Spark SPARK_VERSION_SHORT [Graph.outerJoinVertices]: api/scala/index.html#org.apache.spark.graphx.Graph@outerJoinVertices[U,VD2](RDD[(VertexId,U)])((VertexId,VD,Option[U])⇒VD2)(ClassTag[U],ClassTag[VD2]):Graph[VD2,ED] [Graph.aggregateMessages]: api/scala/index.html#org.apache.spark.graphx.Graph@aggregateMessages[A]((EdgeContext[VD,ED,A])⇒Unit,(A,A)⇒A,TripletFields)(ClassTag[A]):VertexRDD[A] [EdgeContext]: api/scala/index.html#org.apache.spark.graphx.EdgeContext -[Graph.mapReduceTriplets]: api/scala/index.html#org.apache.spark.graphx.Graph@mapReduceTriplets[A](mapFunc:org.apache.spark.graphx.EdgeTriplet[VD,ED]=Iterator[(org.apache.spark.graphx.VertexId,A)],reduceFunc:(A,A)=A,activeSetOpt:Option[(org.apache.spark.graphx.VertexRDD[_],org.apache.spark.graphx.EdgeDirection)])(implicitevidence$10:scala.reflect.ClassTag[A]):org.apache.spark.graphx.VertexRDD[A] [GraphOps.collectNeighborIds]: api/scala/index.html#org.apache.spark.graphx.GraphOps@collectNeighborIds(EdgeDirection):VertexRDD[Array[VertexId]] [GraphOps.collectNeighbors]: api/scala/index.html#org.apache.spark.graphx.GraphOps@collectNeighbors(EdgeDirection):VertexRDD[Array[(VertexId,VD)]] [RDD Persistence]: programming-guide.html#rdd-persistence @@ -596,7 +595,7 @@ compute the average age of the more senior followers of each user. 
### Map Reduce Triplets Transition Guide (Legacy) In earlier versions of GraphX neighborhood aggregation was accomplished using the -[`mapReduceTriplets`][Graph.mapReduceTriplets] operator: +`mapReduceTriplets` operator: {% highlight scala %} class Graph[VD, ED] { @@ -607,7 +606,7 @@ class Graph[VD, ED] { } {% endhighlight %} -The [`mapReduceTriplets`][Graph.mapReduceTriplets] operator takes a user defined map function which +The `mapReduceTriplets` operator takes a user defined map function which is applied to each triplet and can yield *messages* which are aggregated using the user defined `reduce` function. However, we found the use of the returned iterator to be expensive and it inhibited our ability to
spark git commit: [SPARK-17089][DOCS] Remove api doc link for mapReduceTriplets operator
Repository: spark Updated Branches: refs/heads/master c34b546d6 -> e28a8c589 [SPARK-17089][DOCS] Remove api doc link for mapReduceTriplets operator ## What changes were proposed in this pull request? Remove the API doc link for the `mapReduceTriplets` operator: the operator has been removed from the latest API, so following the link no longer finds `mapReduceTriplets`, and it is better to drop the link than to confuse the user. ## How was this patch tested? Ran all the test cases ![screenshot from 2016-08-16 23-08-25](https://cloud.githubusercontent.com/assets/8075390/17709393/8cfbf75a-6406-11e6-98e6-38f7b319d833.png) Author: sandyCloses #14669 from phalodi/SPARK-17089. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e28a8c58 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e28a8c58 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e28a8c58 Branch: refs/heads/master Commit: e28a8c5899c48ff065e2fd3bb6b10c82b4d39c2c Parents: c34b546 Author: sandy Authored: Tue Aug 16 12:50:55 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 12:50:55 2016 -0700 -- docs/graphx-programming-guide.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e28a8c58/docs/graphx-programming-guide.md -- diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md index 6f738f0..58671e6 100644 --- a/docs/graphx-programming-guide.md +++ b/docs/graphx-programming-guide.md @@ -24,7 +24,6 @@ description: GraphX graph processing library guide for Spark SPARK_VERSION_SHORT [Graph.outerJoinVertices]: api/scala/index.html#org.apache.spark.graphx.Graph@outerJoinVertices[U,VD2](RDD[(VertexId,U)])((VertexId,VD,Option[U])⇒VD2)(ClassTag[U],ClassTag[VD2]):Graph[VD2,ED] [Graph.aggregateMessages]: api/scala/index.html#org.apache.spark.graphx.Graph@aggregateMessages[A]((EdgeContext[VD,ED,A])⇒Unit,(A,A)⇒A,TripletFields)(ClassTag[A]):VertexRDD[A] [EdgeContext]: api/scala/index.html#org.apache.spark.graphx.EdgeContext -[Graph.mapReduceTriplets]: api/scala/index.html#org.apache.spark.graphx.Graph@mapReduceTriplets[A](mapFunc:org.apache.spark.graphx.EdgeTriplet[VD,ED]=Iterator[(org.apache.spark.graphx.VertexId,A)],reduceFunc:(A,A)=A,activeSetOpt:Option[(org.apache.spark.graphx.VertexRDD[_],org.apache.spark.graphx.EdgeDirection)])(implicitevidence$10:scala.reflect.ClassTag[A]):org.apache.spark.graphx.VertexRDD[A] [GraphOps.collectNeighborIds]: api/scala/index.html#org.apache.spark.graphx.GraphOps@collectNeighborIds(EdgeDirection):VertexRDD[Array[VertexId]] [GraphOps.collectNeighbors]: api/scala/index.html#org.apache.spark.graphx.GraphOps@collectNeighbors(EdgeDirection):VertexRDD[Array[(VertexId,VD)]] [RDD Persistence]: programming-guide.html#rdd-persistence @@ -596,7 +595,7 @@ compute the average age of the more senior followers of each user. ### Map Reduce Triplets Transition Guide (Legacy) In earlier versions of GraphX neighborhood aggregation was accomplished using the -[`mapReduceTriplets`][Graph.mapReduceTriplets] operator: +`mapReduceTriplets` operator: {% highlight scala %} class Graph[VD, ED] { @@ -607,7 +606,7 @@ class Graph[VD, ED] { } {% endhighlight %} -The [`mapReduceTriplets`][Graph.mapReduceTriplets] operator takes a user defined map function which +The `mapReduceTriplets` operator takes a user defined map function which is applied to each triplet and can yield *messages* which are aggregated using the user defined `reduce` function.
However, we found the use of the returned iterator to be expensive and it inhibited our ability to
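Since both copies of this commit keep referring to the legacy `mapReduceTriplets` while removing its stale link, a short sketch of the documented replacement, `Graph.aggregateMessages`, may help readers of the transition guide. It mirrors the "older followers" style of example from the GraphX guide; the `olderFollowerStats` name and the `Graph[Double, Int]` shape (vertex attribute = age) are assumptions for illustration, not part of the commit.

```scala
import org.apache.spark.graphx.{Graph, TripletFields, VertexRDD}

// For each user, count the followers that are older and sum their ages.
// Assumes a graph whose vertex attribute is the user's age.
def olderFollowerStats(graph: Graph[Double, Int]): VertexRDD[(Int, Double)] =
  graph.aggregateMessages[(Int, Double)](
    // send function: runs once per edge, may emit messages to src and/or dst
    ctx => if (ctx.srcAttr > ctx.dstAttr) ctx.sendToDst((1, ctx.srcAttr)),
    // merge function: combines messages arriving at the same vertex
    (a, b) => (a._1 + b._1, a._2 + b._2),
    TripletFields.All
  )
```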
[2/2] spark git commit: [SPARK-16964][SQL] Remove private[sql] and private[spark] from sql.execution package [Backport]
[SPARK-16964][SQL] Remove private[sql] and private[spark] from sql.execution package [Backport] ## What changes were proposed in this pull request? This PR backports https://github.com/apache/spark/pull/14554 to branch-2.0. I have also changed the visibility of a few similar Hive classes. ## How was this patch tested? (Only a package visibility change) Author: Herman van HovellAuthor: Reynold Xin Closes #14652 from hvanhovell/SPARK-16964. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1c569711 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1c569711 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1c569711 Branch: refs/heads/branch-2.0 Commit: 1c56971167a0ebb3c422ccc7cc3d6904015fe2ec Parents: 237ae54 Author: Herman van Hovell Authored: Tue Aug 16 01:15:31 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 01:15:31 2016 -0700 -- .../spark/sql/execution/CacheManager.scala | 22 +- .../spark/sql/execution/ExistingRDD.scala | 18 +++ .../apache/spark/sql/execution/ExpandExec.scala | 2 +- .../spark/sql/execution/FileRelation.scala | 2 +- .../spark/sql/execution/GenerateExec.scala | 2 +- .../sql/execution/LocalTableScanExec.scala | 4 ++-- .../spark/sql/execution/RowIterator.scala | 2 +- .../spark/sql/execution/SQLExecution.scala | 2 +- .../apache/spark/sql/execution/SortExec.scala | 6 ++--- .../apache/spark/sql/execution/SparkPlan.scala | 14 ++-- .../spark/sql/execution/SparkPlanInfo.scala | 2 +- .../spark/sql/execution/SparkStrategies.scala | 6 ++--- .../sql/execution/UnsafeRowSerializer.scala | 4 ++-- .../sql/execution/WholeStageCodegenExec.scala | 2 +- .../execution/aggregate/HashAggregateExec.scala | 2 +- .../execution/aggregate/SortAggregateExec.scala | 2 +- .../spark/sql/execution/aggregate/udaf.scala| 6 ++--- .../sql/execution/basicPhysicalOperators.scala | 6 ++--- .../execution/columnar/InMemoryRelation.scala | 8 +++ .../columnar/InMemoryTableScanExec.scala| 4 ++-- .../spark/sql/execution/command/commands.scala | 4 ++-- .../datasources/DataSourceStrategy.scala| 8 +++ .../datasources/FileSourceStrategy.scala| 2 +- .../InsertIntoDataSourceCommand.scala | 2 +- .../InsertIntoHadoopFsRelationCommand.scala | 2 +- .../datasources/PartitioningUtils.scala | 24 +++- .../execution/datasources/WriterContainer.scala | 8 +++ .../sql/execution/datasources/bucket.scala | 2 +- .../execution/datasources/csv/CSVOptions.scala | 2 +- .../execution/datasources/csv/CSVParser.scala | 4 ++-- .../execution/datasources/csv/CSVRelation.scala | 4 ++-- .../datasources/fileSourceInterfaces.scala | 6 ++--- .../execution/datasources/jdbc/JDBCRDD.scala| 8 +++ .../datasources/parquet/ParquetFileFormat.scala | 17 +++--- .../datasources/parquet/ParquetFilters.scala| 2 +- .../datasources/parquet/ParquetOptions.scala| 6 ++--- .../spark/sql/execution/datasources/rules.scala | 6 ++--- .../spark/sql/execution/debug/package.scala | 2 +- .../exchange/BroadcastExchangeExec.scala| 2 +- .../exchange/ExchangeCoordinator.scala | 4 ++-- .../execution/exchange/ShuffleExchange.scala| 9 .../execution/joins/BroadcastHashJoinExec.scala | 2 +- .../joins/BroadcastNestedLoopJoinExec.scala | 2 +- .../execution/joins/CartesianProductExec.scala | 5 ++-- .../execution/joins/ShuffledHashJoinExec.scala | 2 +- .../sql/execution/joins/SortMergeJoinExec.scala | 2 +- .../spark/sql/execution/metric/SQLMetrics.scala | 10 .../execution/python/ExtractPythonUDFs.scala| 4 ++-- .../sql/execution/r/MapPartitionsRWrapper.scala | 4 ++-- 
.../sql/execution/stat/FrequentItems.scala | 4 ++-- .../sql/execution/stat/StatFunctions.scala | 8 +++ .../streaming/IncrementalExecution.scala| 2 +- .../execution/streaming/StreamExecution.scala | 19 .../execution/streaming/StreamProgress.scala| 2 +- .../execution/streaming/state/StateStore.scala | 2 +- .../streaming/state/StateStoreCoordinator.scala | 4 ++-- .../spark/sql/execution/ui/ExecutionPage.scala | 2 +- .../spark/sql/execution/ui/SQLListener.scala| 6 ++--- .../apache/spark/sql/execution/ui/SQLTab.scala | 4 ++-- .../spark/sql/execution/ui/SparkPlanGraph.scala | 6 ++--- .../apache/spark/sql/internal/SharedState.scala | 2 -- .../CreateHiveTableAsSelectCommand.scala| 1 - .../sql/hive/execution/HiveTableScanExec.scala | 2 +- .../hive/execution/ScriptTransformation.scala | 3 --- .../spark/sql/hive/orc/OrcFileFormat.scala | 2 +- 65 files
[1/2] spark git commit: [SPARK-16964][SQL] Remove private[sql] and private[spark] from sql.execution package [Backport]
Repository: spark Updated Branches: refs/heads/branch-2.0 237ae54c9 -> 1c5697116 http://git-wip-us.apache.org/repos/asf/spark/blob/1c569711/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala index af2229a..66fb5a4 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala @@ -49,10 +49,10 @@ class StreamExecution( override val id: Long, override val name: String, checkpointRoot: String, -private[sql] val logicalPlan: LogicalPlan, +val logicalPlan: LogicalPlan, val sink: Sink, val trigger: Trigger, -private[sql] val triggerClock: Clock, +val triggerClock: Clock, val outputMode: OutputMode) extends StreamingQuery with Logging { @@ -74,7 +74,7 @@ class StreamExecution( * input source. */ @volatile - private[sql] var committedOffsets = new StreamProgress + var committedOffsets = new StreamProgress /** * Tracks the offsets that are available to be processed, but have not yet be committed to the @@ -102,10 +102,10 @@ class StreamExecution( private var state: State = INITIALIZED @volatile - private[sql] var lastExecution: QueryExecution = null + var lastExecution: QueryExecution = null @volatile - private[sql] var streamDeathCause: StreamingQueryException = null + var streamDeathCause: StreamingQueryException = null /* Get the call site in the caller thread; will pass this into the micro batch thread */ private val callSite = Utils.getCallSite() @@ -115,7 +115,7 @@ class StreamExecution( * [[org.apache.spark.util.UninterruptibleThread]] to avoid potential deadlocks in using * [[HDFSMetadataLog]]. See SPARK-14131 for more details. */ - private[sql] val microBatchThread = + val microBatchThread = new UninterruptibleThread(s"stream execution thread for $name") { override def run(): Unit = { // To fix call site like "run at :0", we bridge the call site from the caller @@ -131,8 +131,7 @@ class StreamExecution( * processing is done. Thus, the Nth record in this log indicated data that is currently being * processed and the N-1th entry indicates which offsets have been durably committed to the sink. */ - private[sql] val offsetLog = -new HDFSMetadataLog[CompositeOffset](sparkSession, checkpointFile("offsets")) + val offsetLog = new HDFSMetadataLog[CompositeOffset](sparkSession, checkpointFile("offsets")) /** Whether the query is currently active or not */ override def isActive: Boolean = state == ACTIVE @@ -159,7 +158,7 @@ class StreamExecution( * Starts the execution. This returns only after the thread has started and [[QueryStarted]] event * has been posted to all the listeners. 
*/ - private[sql] def start(): Unit = { + def start(): Unit = { microBatchThread.setDaemon(true) microBatchThread.start() startLatch.await() // Wait until thread started and QueryStart event has been posted @@ -518,7 +517,7 @@ class StreamExecution( case object TERMINATED extends State } -private[sql] object StreamExecution { +object StreamExecution { private val _nextId = new AtomicLong(0) def nextId: Long = _nextId.getAndIncrement() http://git-wip-us.apache.org/repos/asf/spark/blob/1c569711/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamProgress.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamProgress.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamProgress.scala index 405a5f0..db0bd9e 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamProgress.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamProgress.scala @@ -26,7 +26,7 @@ class StreamProgress( val baseMap: immutable.Map[Source, Offset] = new immutable.HashMap[Source, Offset]) extends scala.collection.immutable.Map[Source, Offset] { - private[sql] def toCompositeOffset(source: Seq[Source]): CompositeOffset = { + def toCompositeOffset(source: Seq[Source]): CompositeOffset = { CompositeOffset(source.map(get)) } http://git-wip-us.apache.org/repos/asf/spark/blob/1c569711/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
spark git commit: Revert "[SPARK-16964][SQL] Remove private[hive] from sql.hive.execution package"
Repository: spark Updated Branches: refs/heads/branch-2.0 2e2c787bf -> 237ae54c9 Revert "[SPARK-16964][SQL] Remove private[hive] from sql.hive.execution package" This reverts commit 2e2c787bf588e129eaaadc792737fd9d2892939c. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/237ae54c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/237ae54c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/237ae54c Branch: refs/heads/branch-2.0 Commit: 237ae54c960d52b35b4bc673609aed9998c2bd45 Parents: 2e2c787 Author: Reynold XinAuthored: Tue Aug 16 01:14:53 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 01:14:53 2016 -0700 -- .../spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala | 1 + .../apache/spark/sql/hive/execution/ScriptTransformation.scala| 3 +++ .../main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala | 3 ++- 3 files changed, 6 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/237ae54c/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala index 3a8b0f1..15a5d79 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala @@ -34,6 +34,7 @@ import org.apache.spark.sql.hive.MetastoreRelation * @param ignoreIfExists allow continue working if it's already exists, otherwise * raise exception */ +private[hive] case class CreateHiveTableAsSelectCommand( tableDesc: CatalogTable, query: LogicalPlan, http://git-wip-us.apache.org/repos/asf/spark/blob/237ae54c/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala index 9747abb..dfb1251 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala @@ -51,6 +51,7 @@ import org.apache.spark.util.{CircularBuffer, RedirectThread, SerializableConfig * @param script the command that should be executed. * @param output the attributes that are produced by the script. 
*/ +private[hive] case class ScriptTransformation( input: Seq[Expression], script: String, @@ -335,6 +336,7 @@ private class ScriptTransformationWriterThread( } } +private[hive] object HiveScriptIOSchema { def apply(input: ScriptInputOutputSchema): HiveScriptIOSchema = { HiveScriptIOSchema( @@ -353,6 +355,7 @@ object HiveScriptIOSchema { /** * The wrapper class of Hive input and output schema properties */ +private[hive] case class HiveScriptIOSchema ( inputRowFormat: Seq[(String, String)], outputRowFormat: Seq[(String, String)], http://git-wip-us.apache.org/repos/asf/spark/blob/237ae54c/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala index 894c71c..a2c8092 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala @@ -47,7 +47,8 @@ import org.apache.spark.util.SerializableConfiguration * [[FileFormat]] for reading ORC files. If this is moved or renamed, please update * [[DataSource]]'s backwardCompatibilityMap. */ -class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable { +private[sql] class OrcFileFormat + extends FileFormat with DataSourceRegister with Serializable { override def shortName(): String = "orc"
spark git commit: [SPARK-16964][SQL] Remove private[hive] from sql.hive.execution package
Repository: spark Updated Branches: refs/heads/branch-2.0 45036327f -> 2e2c787bf [SPARK-16964][SQL] Remove private[hive] from sql.hive.execution package ## What changes were proposed in this pull request? This PR is a small follow-up to https://github.com/apache/spark/pull/14554. This also widens the visibility of a few (similar) Hive classes. ## How was this patch tested? No test. Only a visibility change. Author: Herman van HovellCloses #14654 from hvanhovell/SPARK-16964-hive. (cherry picked from commit 8fdc6ce400f9130399fbdd004df48b3ba95bcd6a) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2e2c787b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2e2c787b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2e2c787b Branch: refs/heads/branch-2.0 Commit: 2e2c787bf588e129eaaadc792737fd9d2892939c Parents: 4503632 Author: Herman van Hovell Authored: Tue Aug 16 01:12:27 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 01:12:33 2016 -0700 -- .../spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala | 1 - .../apache/spark/sql/hive/execution/ScriptTransformation.scala| 3 --- .../main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala | 3 +-- 3 files changed, 1 insertion(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2e2c787b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala index 15a5d79..3a8b0f1 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala @@ -34,7 +34,6 @@ import org.apache.spark.sql.hive.MetastoreRelation * @param ignoreIfExists allow continue working if it's already exists, otherwise * raise exception */ -private[hive] case class CreateHiveTableAsSelectCommand( tableDesc: CatalogTable, query: LogicalPlan, http://git-wip-us.apache.org/repos/asf/spark/blob/2e2c787b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala index dfb1251..9747abb 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala @@ -51,7 +51,6 @@ import org.apache.spark.util.{CircularBuffer, RedirectThread, SerializableConfig * @param script the command that should be executed. * @param output the attributes that are produced by the script. 
*/ -private[hive] case class ScriptTransformation( input: Seq[Expression], script: String, @@ -336,7 +335,6 @@ private class ScriptTransformationWriterThread( } } -private[hive] object HiveScriptIOSchema { def apply(input: ScriptInputOutputSchema): HiveScriptIOSchema = { HiveScriptIOSchema( @@ -355,7 +353,6 @@ object HiveScriptIOSchema { /** * The wrapper class of Hive input and output schema properties */ -private[hive] case class HiveScriptIOSchema ( inputRowFormat: Seq[(String, String)], outputRowFormat: Seq[(String, String)], http://git-wip-us.apache.org/repos/asf/spark/blob/2e2c787b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala index a2c8092..894c71c 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala @@ -47,8 +47,7 @@ import org.apache.spark.util.SerializableConfiguration * [[FileFormat]] for reading ORC files. If this is moved or renamed, please update * [[DataSource]]'s backwardCompatibilityMap. */ -private[sql] class OrcFileFormat - extends FileFormat with DataSourceRegister with Serializable { +class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable { override def
spark git commit: [SPARK-16964][SQL] Remove private[hive] from sql.hive.execution package
Repository: spark Updated Branches: refs/heads/master 7b65030e7 -> 8fdc6ce40 [SPARK-16964][SQL] Remove private[hive] from sql.hive.execution package ## What changes were proposed in this pull request? This PR is a small follow-up to https://github.com/apache/spark/pull/14554. This also widens the visibility of a few (similar) Hive classes. ## How was this patch tested? No test. Only a visibility change. Author: Herman van HovellCloses #14654 from hvanhovell/SPARK-16964-hive. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8fdc6ce4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8fdc6ce4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8fdc6ce4 Branch: refs/heads/master Commit: 8fdc6ce400f9130399fbdd004df48b3ba95bcd6a Parents: 7b65030 Author: Herman van Hovell Authored: Tue Aug 16 01:12:27 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 16 01:12:27 2016 -0700 -- .../spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala | 1 - .../apache/spark/sql/hive/execution/ScriptTransformation.scala| 3 --- .../main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala | 3 +-- 3 files changed, 1 insertion(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8fdc6ce4/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala index 678bf8d..6e6b1c2 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala @@ -34,7 +34,6 @@ import org.apache.spark.sql.hive.MetastoreRelation * @param ignoreIfExists allow continue working if it's already exists, otherwise * raise exception */ -private[hive] case class CreateHiveTableAsSelectCommand( tableDesc: CatalogTable, query: LogicalPlan, http://git-wip-us.apache.org/repos/asf/spark/blob/8fdc6ce4/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala index d063dd6..c553c03 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala @@ -51,7 +51,6 @@ import org.apache.spark.util.{CircularBuffer, RedirectThread, SerializableConfig * @param script the command that should be executed. * @param output the attributes that are produced by the script. 
*/ -private[hive] case class ScriptTransformation( input: Seq[Expression], script: String, @@ -338,7 +337,6 @@ private class ScriptTransformationWriterThread( } } -private[hive] object HiveScriptIOSchema { def apply(input: ScriptInputOutputSchema): HiveScriptIOSchema = { HiveScriptIOSchema( @@ -357,7 +355,6 @@ object HiveScriptIOSchema { /** * The wrapper class of Hive input and output schema properties */ -private[hive] case class HiveScriptIOSchema ( inputRowFormat: Seq[(String, String)], outputRowFormat: Seq[(String, String)], http://git-wip-us.apache.org/repos/asf/spark/blob/8fdc6ce4/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala index 1d3c466..c74d948 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala @@ -45,8 +45,7 @@ import org.apache.spark.util.SerializableConfiguration * [[FileFormat]] for reading ORC files. If this is moved or renamed, please update * [[DataSource]]'s backwardCompatibilityMap. */ -private[sql] class OrcFileFormat - extends FileFormat with DataSourceRegister with Serializable { +class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable { override def shortName(): String = "orc"
spark git commit: [SPARK-16966][SQL][CORE] App Name is a randomUUID even when "spark.app.name" exists
Repository: spark Updated Branches: refs/heads/branch-2.0 a21ecc996 -> 750f88045 [SPARK-16966][SQL][CORE] App Name is a randomUUID even when "spark.app.name" exists ## What changes were proposed in this pull request? Don't override app name specified in `SparkConf` with a random app name. Only set it if the conf has no app name even after options have been applied. See also https://github.com/apache/spark/pull/14602 This is similar to Sherry302 's original proposal in https://github.com/apache/spark/pull/14556 ## How was this patch tested? Jenkins test, with new case reproducing the bug Author: Sean OwenCloses #14630 from srowen/SPARK-16966.2. (cherry picked from commit cdaa562c9a09e2e83e6df4e84d911ce1428a7a7c) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/750f8804 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/750f8804 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/750f8804 Branch: refs/heads/branch-2.0 Commit: 750f8804540df5ad68a732f68598c4a2dbbc4761 Parents: a21ecc9 Author: Sean Owen Authored: Sat Aug 13 15:40:43 2016 -0700 Committer: Reynold Xin Committed: Sat Aug 13 15:40:59 2016 -0700 -- .../main/scala/org/apache/spark/sql/SparkSession.scala | 11 +++ .../org/apache/spark/sql/SparkSessionBuilderSuite.scala | 1 + 2 files changed, 8 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/750f8804/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala index 946d8cb..c88206c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala @@ -822,16 +822,19 @@ object SparkSession { // No active nor global default session. Create a new one. 
val sparkContext = userSuppliedContext.getOrElse { // set app name if not given - if (!options.contains("spark.app.name")) { -options += "spark.app.name" -> java.util.UUID.randomUUID().toString - } - + val randomAppName = java.util.UUID.randomUUID().toString val sparkConf = new SparkConf() options.foreach { case (k, v) => sparkConf.set(k, v) } + if (!sparkConf.contains("spark.app.name")) { +sparkConf.setAppName(randomAppName) + } val sc = SparkContext.getOrCreate(sparkConf) // maybe this is an existing SparkContext, update its SparkConf which maybe used // by SparkSession options.foreach { case (k, v) => sc.conf.set(k, v) } + if (!sc.conf.contains("spark.app.name")) { +sc.conf.setAppName(randomAppName) + } sc } session = new SparkSession(sparkContext) http://git-wip-us.apache.org/repos/asf/spark/blob/750f8804/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala index 418345b..386d13d 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala @@ -100,6 +100,7 @@ class SparkSessionBuilderSuite extends SparkFunSuite { assert(session.conf.get("key2") == "value2") assert(session.sparkContext.conf.get("key1") == "value1") assert(session.sparkContext.conf.get("key2") == "value2") +assert(session.sparkContext.conf.get("spark.app.name") == "test") session.stop() }
spark git commit: [SPARK-16966][SQL][CORE] App Name is a randomUUID even when "spark.app.name" exists
Repository: spark Updated Branches: refs/heads/master 67f025d90 -> cdaa562c9 [SPARK-16966][SQL][CORE] App Name is a randomUUID even when "spark.app.name" exists ## What changes were proposed in this pull request? Don't override app name specified in `SparkConf` with a random app name. Only set it if the conf has no app name even after options have been applied. See also https://github.com/apache/spark/pull/14602 This is similar to Sherry302 's original proposal in https://github.com/apache/spark/pull/14556 ## How was this patch tested? Jenkins test, with new case reproducing the bug Author: Sean OwenCloses #14630 from srowen/SPARK-16966.2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cdaa562c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cdaa562c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cdaa562c Branch: refs/heads/master Commit: cdaa562c9a09e2e83e6df4e84d911ce1428a7a7c Parents: 67f025d Author: Sean Owen Authored: Sat Aug 13 15:40:43 2016 -0700 Committer: Reynold Xin Committed: Sat Aug 13 15:40:43 2016 -0700 -- .../main/scala/org/apache/spark/sql/SparkSession.scala | 11 +++ .../org/apache/spark/sql/SparkSessionBuilderSuite.scala | 1 + 2 files changed, 8 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cdaa562c/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala index 2ade36d..362bf45 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala @@ -816,16 +816,19 @@ object SparkSession { // No active nor global default session. Create a new one. val sparkContext = userSuppliedContext.getOrElse { // set app name if not given - if (!options.contains("spark.app.name")) { -options += "spark.app.name" -> java.util.UUID.randomUUID().toString - } - + val randomAppName = java.util.UUID.randomUUID().toString val sparkConf = new SparkConf() options.foreach { case (k, v) => sparkConf.set(k, v) } + if (!sparkConf.contains("spark.app.name")) { +sparkConf.setAppName(randomAppName) + } val sc = SparkContext.getOrCreate(sparkConf) // maybe this is an existing SparkContext, update its SparkConf which maybe used // by SparkSession options.foreach { case (k, v) => sc.conf.set(k, v) } + if (!sc.conf.contains("spark.app.name")) { +sc.conf.setAppName(randomAppName) + } sc } session = new SparkSession(sparkContext) http://git-wip-us.apache.org/repos/asf/spark/blob/cdaa562c/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala index 418345b..386d13d 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala @@ -100,6 +100,7 @@ class SparkSessionBuilderSuite extends SparkFunSuite { assert(session.conf.get("key2") == "value2") assert(session.sparkContext.conf.get("key1") == "value1") assert(session.sparkContext.conf.get("key2") == "value2") +assert(session.sparkContext.conf.get("spark.app.name") == "test") session.stop() }
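To see the behavior the fix above (in both branches) protects, here is a minimal usage sketch: an app name supplied through the builder's options should survive into the SparkContext instead of being replaced by a random UUID, mirroring the new assertion in SparkSessionBuilderSuite. The `AppNameCheck` object, the local master, and the "my-app" name are example values, not part of the patch.

```scala
import org.apache.spark.sql.SparkSession

object AppNameCheck extends App {
  val spark = SparkSession.builder()
    .master("local[2]")
    .config("spark.app.name", "my-app") // equivalent to .appName("my-app")
    .getOrCreate()

  // With SPARK-16966 in place, the explicit name is kept rather than a random UUID.
  assert(spark.sparkContext.getConf.get("spark.app.name") == "my-app")

  spark.stop()
}
```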
spark git commit: [SPARK-17013][SQL] Parse negative numeric literals
Repository: spark Updated Branches: refs/heads/master abff92bfd -> 00e103a6e [SPARK-17013][SQL] Parse negative numeric literals ## What changes were proposed in this pull request? This patch updates the SQL parser to parse negative numeric literals as numeric literals, instead of unary minus of positive literals. This allows the parser to parse the minimal value for each data type, e.g. "-32768S". ## How was this patch tested? Updated test cases. Author: petermaxleeCloses #14608 from petermaxlee/SPARK-17013. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/00e103a6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/00e103a6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/00e103a6 Branch: refs/heads/master Commit: 00e103a6edd1a1f001a94d41dd1f7acc40a1e30f Parents: abff92b Author: petermaxlee Authored: Thu Aug 11 23:56:55 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 11 23:56:55 2016 -0700 -- .../apache/spark/sql/catalyst/parser/SqlBase.g4 | 14 +++ .../sql/catalyst/expressions/arithmetic.scala | 4 +- .../sql-tests/results/arithmetic.sql.out| 26 ++-- .../sql-tests/results/literals.sql.out | 44 ++-- .../catalyst/ExpressionSQLBuilderSuite.scala| 4 +- 5 files changed, 37 insertions(+), 55 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/00e103a6/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index ba65f2a..6122bcd 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -625,13 +625,13 @@ quotedIdentifier ; number -: DECIMAL_VALUE#decimalLiteral -| SCIENTIFIC_DECIMAL_VALUE #scientificDecimalLiteral -| INTEGER_VALUE#integerLiteral -| BIGINT_LITERAL #bigIntLiteral -| SMALLINT_LITERAL #smallIntLiteral -| TINYINT_LITERAL #tinyIntLiteral -| DOUBLE_LITERAL #doubleLiteral +: MINUS? DECIMAL_VALUE#decimalLiteral +| MINUS? SCIENTIFIC_DECIMAL_VALUE #scientificDecimalLiteral +| MINUS? INTEGER_VALUE#integerLiteral +| MINUS? BIGINT_LITERAL #bigIntLiteral +| MINUS? SMALLINT_LITERAL #smallIntLiteral +| MINUS? TINYINT_LITERAL #tinyIntLiteral +| MINUS? 
DOUBLE_LITERAL #doubleLiteral ; nonReserved http://git-wip-us.apache.org/repos/asf/spark/blob/00e103a6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala index 4aebef9..13e539a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala @@ -58,7 +58,7 @@ case class UnaryMinus(child: Expression) extends UnaryExpression } } - override def sql: String = s"(-${child.sql})" + override def sql: String = s"(- ${child.sql})" } @ExpressionDescription( @@ -76,7 +76,7 @@ case class UnaryPositive(child: Expression) protected override def nullSafeEval(input: Any): Any = input - override def sql: String = s"(+${child.sql})" + override def sql: String = s"(+ ${child.sql})" } /** http://git-wip-us.apache.org/repos/asf/spark/blob/00e103a6/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out -- diff --git a/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out b/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out index 50ea254..f2b40a0 100644 --- a/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out @@ -5,7 +5,7 @@ -- !query 0 select -100 -- !query 0 schema -struct<(-100):int> +struct<-100:int> -- !query 0 output -100 @@ -21,7 +21,7 @@ struct<230:int> -- !query 2 select -5.2 -- !query 2 schema -struct<(-5.2):decimal(2,1)> +struct<-5.2:decimal(2,1)> -- !query 2 output -5.2 @@ -37,7 +37,7 @@ struct<6.8:double> -- !query 4 select -key, +key from testdata where key = 2 -- !query 4
spark git commit: [SPARK-17013][SQL] Parse negative numeric literals
Repository: spark Updated Branches: refs/heads/branch-2.0 b4047fc21 -> bde94cd71 [SPARK-17013][SQL] Parse negative numeric literals ## What changes were proposed in this pull request? This patch updates the SQL parser to parse negative numeric literals as numeric literals, instead of unary minus of positive literals. This allows the parser to parse the minimal value for each data type, e.g. "-32768S". ## How was this patch tested? Updated test cases. Author: petermaxleeCloses #14608 from petermaxlee/SPARK-17013. (cherry picked from commit 00e103a6edd1a1f001a94d41dd1f7acc40a1e30f) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bde94cd7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bde94cd7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bde94cd7 Branch: refs/heads/branch-2.0 Commit: bde94cd71086fd348f3ba96de628d6df3f87dba5 Parents: b4047fc Author: petermaxlee Authored: Thu Aug 11 23:56:55 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 11 23:57:01 2016 -0700 -- .../apache/spark/sql/catalyst/parser/SqlBase.g4 | 14 +++ .../sql/catalyst/expressions/arithmetic.scala | 4 +- .../sql-tests/results/arithmetic.sql.out| 26 ++-- .../sql-tests/results/literals.sql.out | 44 ++-- .../catalyst/ExpressionSQLBuilderSuite.scala| 4 +- 5 files changed, 37 insertions(+), 55 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bde94cd7/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 279a1ce..aca7282 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -618,13 +618,13 @@ quotedIdentifier ; number -: DECIMAL_VALUE#decimalLiteral -| SCIENTIFIC_DECIMAL_VALUE #scientificDecimalLiteral -| INTEGER_VALUE#integerLiteral -| BIGINT_LITERAL #bigIntLiteral -| SMALLINT_LITERAL #smallIntLiteral -| TINYINT_LITERAL #tinyIntLiteral -| DOUBLE_LITERAL #doubleLiteral +: MINUS? DECIMAL_VALUE#decimalLiteral +| MINUS? SCIENTIFIC_DECIMAL_VALUE #scientificDecimalLiteral +| MINUS? INTEGER_VALUE#integerLiteral +| MINUS? BIGINT_LITERAL #bigIntLiteral +| MINUS? SMALLINT_LITERAL #smallIntLiteral +| MINUS? TINYINT_LITERAL #tinyIntLiteral +| MINUS? 
DOUBLE_LITERAL #doubleLiteral ; nonReserved http://git-wip-us.apache.org/repos/asf/spark/blob/bde94cd7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala index 7ff8795..fa459aa 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala @@ -57,7 +57,7 @@ case class UnaryMinus(child: Expression) extends UnaryExpression } } - override def sql: String = s"(-${child.sql})" + override def sql: String = s"(- ${child.sql})" } @ExpressionDescription( @@ -75,7 +75,7 @@ case class UnaryPositive(child: Expression) protected override def nullSafeEval(input: Any): Any = input - override def sql: String = s"(+${child.sql})" + override def sql: String = s"(+ ${child.sql})" } /** http://git-wip-us.apache.org/repos/asf/spark/blob/bde94cd7/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out -- diff --git a/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out b/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out index 50ea254..f2b40a0 100644 --- a/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out @@ -5,7 +5,7 @@ -- !query 0 select -100 -- !query 0 schema -struct<(-100):int> +struct<-100:int> -- !query 0 output -100 @@ -21,7 +21,7 @@ struct<230:int> -- !query 2 select -5.2 -- !query 2 schema -struct<(-5.2):decimal(2,1)> +struct<-5.2:decimal(2,1)> -- !query 2
spark git commit: [SPARK-17018][SQL] literals.sql for testing literal parsing
Repository: spark Updated Branches: refs/heads/branch-2.0 6bf20cd94 -> bc683f037 [SPARK-17018][SQL] literals.sql for testing literal parsing ## What changes were proposed in this pull request? This patch adds literals.sql for testing literal parsing end-to-end in SQL. ## How was this patch tested? The patch itself is only about adding test cases. Author: petermaxleeCloses #14598 from petermaxlee/SPARK-17018-2. (cherry picked from commit cf9367826c38e5f34ae69b409f5d09c55ed1d319) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bc683f03 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bc683f03 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bc683f03 Branch: refs/heads/branch-2.0 Commit: bc683f037d4e84f2a42eb7b1aaa9e0e4fd5f833a Parents: 6bf20cd Author: petermaxlee Authored: Thu Aug 11 13:55:10 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 11 13:55:17 2016 -0700 -- .../resources/sql-tests/inputs/literals.sql | 92 + .../sql-tests/inputs/number-format.sql | 16 - .../sql-tests/results/literals.sql.out | 374 +++ .../sql-tests/results/number-format.sql.out | 42 --- .../apache/spark/sql/SQLQueryTestSuite.scala| 14 +- 5 files changed, 476 insertions(+), 62 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bc683f03/sql/core/src/test/resources/sql-tests/inputs/literals.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/literals.sql b/sql/core/src/test/resources/sql-tests/inputs/literals.sql new file mode 100644 index 000..62f0d3d --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/literals.sql @@ -0,0 +1,92 @@ +-- Literal parsing + +-- null +select null, Null, nUll; + +-- boolean +select true, tRue, false, fALse; + +-- byte (tinyint) +select 1Y; +select 127Y, -128Y; + +-- out of range byte +select 128Y; + +-- short (smallint) +select 1S; +select 32767S, -32768S; + +-- out of range short +select 32768S; + +-- long (bigint) +select 1L, 2147483648L; +select 9223372036854775807L, -9223372036854775808L; + +-- out of range long +select 9223372036854775808L; + +-- integral parsing + +-- parse int +select 1, -1; + +-- parse int max and min value as int +select 2147483647, -2147483648; + +-- parse long max and min value as long +select 9223372036854775807, -9223372036854775808; + +-- parse as decimals (Long.MaxValue + 1, and Long.MinValue - 1) +select 9223372036854775808, -9223372036854775809; + +-- out of range decimal numbers +select 1234567890123456789012345678901234567890; +select 1234567890123456789012345678901234567890.0; + +-- double +select 1D, 1.2D, 1e10, 1.5e5, .10D, 0.10D, .1e5, .9e+2, 0.9e+2, 900e-1, 9.e+1; +select -1D, -1.2D, -1e10, -1.5e5, -.10D, -0.10D, -.1e5; +-- negative double +select .e3; +-- inf and -inf +select 1E309, -1E309; + +-- decimal parsing +select 0.3, -0.8, .5, -.18, 0., .; + +-- super large scientific notation numbers should still be valid doubles +select 123456789012345678901234567890123456789e10, 123456789012345678901234567890123456789.1e10; + +-- string +select "Hello Peter!", 'hello lee!'; +-- multi string +select 'hello' 'world', 'hello' " " 'lee'; +-- single quote within double quotes +select "hello 'peter'"; +select 'pattern%', 'no-pattern\%', 'pattern\\%', 'pattern\\\%'; +select '\'', '"', '\n', '\r', '\t', 'Z'; +-- "Hello!" 
in octals +select '\110\145\154\154\157\041'; +-- "World :)" in unicode +select '\u0057\u006F\u0072\u006C\u0064\u0020\u003A\u0029'; + +-- date +select dAte '2016-03-12'; +-- invalid date +select date 'mar 11 2016'; + +-- timestamp +select tImEstAmp '2016-03-11 20:54:00.000'; +-- invalid timestamp +select timestamp '2016-33-11 20:54:00.000'; + +-- interval +select interval 13.123456789 seconds, interval -13.123456789 second; +select interval 1 year 2 month 3 week 4 day 5 hour 6 minute 7 seconds 8 millisecond, 9 microsecond; +-- ns is not supported +select interval 10 nanoseconds; + +-- unsupported data type +select GEO '(10,-6)'; http://git-wip-us.apache.org/repos/asf/spark/blob/bc683f03/sql/core/src/test/resources/sql-tests/inputs/number-format.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/number-format.sql b/sql/core/src/test/resources/sql-tests/inputs/number-format.sql deleted file mode 100644 index a32d068..000 --- a/sql/core/src/test/resources/sql-tests/inputs/number-format.sql +++ /dev/null @@ -1,16 +0,0 @@ --- Verifies how we parse numbers - --- parse as ints -select 1, -1; - --- parse as longs (Int.MaxValue + 1, and Int.MinValue -
spark git commit: [SPARK-17018][SQL] literals.sql for testing literal parsing
Repository: spark Updated Branches: refs/heads/master acaf2a81a -> cf9367826 [SPARK-17018][SQL] literals.sql for testing literal parsing ## What changes were proposed in this pull request? This patch adds literals.sql for testing literal parsing end-to-end in SQL. ## How was this patch tested? The patch itself is only about adding test cases. Author: petermaxleeCloses #14598 from petermaxlee/SPARK-17018-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cf936782 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cf936782 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cf936782 Branch: refs/heads/master Commit: cf9367826c38e5f34ae69b409f5d09c55ed1d319 Parents: acaf2a8 Author: petermaxlee Authored: Thu Aug 11 13:55:10 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 11 13:55:10 2016 -0700 -- .../resources/sql-tests/inputs/literals.sql | 92 + .../sql-tests/inputs/number-format.sql | 16 - .../sql-tests/results/literals.sql.out | 374 +++ .../sql-tests/results/number-format.sql.out | 42 --- .../apache/spark/sql/SQLQueryTestSuite.scala| 14 +- 5 files changed, 476 insertions(+), 62 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cf936782/sql/core/src/test/resources/sql-tests/inputs/literals.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/literals.sql b/sql/core/src/test/resources/sql-tests/inputs/literals.sql new file mode 100644 index 000..62f0d3d --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/literals.sql @@ -0,0 +1,92 @@ +-- Literal parsing + +-- null +select null, Null, nUll; + +-- boolean +select true, tRue, false, fALse; + +-- byte (tinyint) +select 1Y; +select 127Y, -128Y; + +-- out of range byte +select 128Y; + +-- short (smallint) +select 1S; +select 32767S, -32768S; + +-- out of range short +select 32768S; + +-- long (bigint) +select 1L, 2147483648L; +select 9223372036854775807L, -9223372036854775808L; + +-- out of range long +select 9223372036854775808L; + +-- integral parsing + +-- parse int +select 1, -1; + +-- parse int max and min value as int +select 2147483647, -2147483648; + +-- parse long max and min value as long +select 9223372036854775807, -9223372036854775808; + +-- parse as decimals (Long.MaxValue + 1, and Long.MinValue - 1) +select 9223372036854775808, -9223372036854775809; + +-- out of range decimal numbers +select 1234567890123456789012345678901234567890; +select 1234567890123456789012345678901234567890.0; + +-- double +select 1D, 1.2D, 1e10, 1.5e5, .10D, 0.10D, .1e5, .9e+2, 0.9e+2, 900e-1, 9.e+1; +select -1D, -1.2D, -1e10, -1.5e5, -.10D, -0.10D, -.1e5; +-- negative double +select .e3; +-- inf and -inf +select 1E309, -1E309; + +-- decimal parsing +select 0.3, -0.8, .5, -.18, 0., .; + +-- super large scientific notation numbers should still be valid doubles +select 123456789012345678901234567890123456789e10, 123456789012345678901234567890123456789.1e10; + +-- string +select "Hello Peter!", 'hello lee!'; +-- multi string +select 'hello' 'world', 'hello' " " 'lee'; +-- single quote within double quotes +select "hello 'peter'"; +select 'pattern%', 'no-pattern\%', 'pattern\\%', 'pattern\\\%'; +select '\'', '"', '\n', '\r', '\t', 'Z'; +-- "Hello!" 
in octals +select '\110\145\154\154\157\041'; +-- "World :)" in unicode +select '\u0057\u006F\u0072\u006C\u0064\u0020\u003A\u0029'; + +-- date +select dAte '2016-03-12'; +-- invalid date +select date 'mar 11 2016'; + +-- timestamp +select tImEstAmp '2016-03-11 20:54:00.000'; +-- invalid timestamp +select timestamp '2016-33-11 20:54:00.000'; + +-- interval +select interval 13.123456789 seconds, interval -13.123456789 second; +select interval 1 year 2 month 3 week 4 day 5 hour 6 minute 7 seconds 8 millisecond, 9 microsecond; +-- ns is not supported +select interval 10 nanoseconds; + +-- unsupported data type +select GEO '(10,-6)'; http://git-wip-us.apache.org/repos/asf/spark/blob/cf936782/sql/core/src/test/resources/sql-tests/inputs/number-format.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/number-format.sql b/sql/core/src/test/resources/sql-tests/inputs/number-format.sql deleted file mode 100644 index a32d068..000 --- a/sql/core/src/test/resources/sql-tests/inputs/number-format.sql +++ /dev/null @@ -1,16 +0,0 @@ --- Verifies how we parse numbers - --- parse as ints -select 1, -1; - --- parse as longs (Int.MaxValue + 1, and Int.MinValue - 1) -select 2147483648, -2147483649; - --- parse long min and max value -select 9223372036854775807, -9223372036854775808; - ---
spark git commit: [SPARK-17015][SQL] group-by/order-by ordinal and arithmetic tests
Repository: spark Updated Branches: refs/heads/branch-2.0 33a213f33 -> 6bf20cd94 [SPARK-17015][SQL] group-by/order-by ordinal and arithmetic tests This patch adds three test files: 1. arithmetic.sql.out 2. order-by-ordinal.sql 3. group-by-ordinal.sql This includes https://github.com/apache/spark/pull/14594. This is a test case change. Author: petermaxleeCloses #14595 from petermaxlee/SPARK-17015. (cherry picked from commit a7b02db457d5fc663ce6a1ef01bf04689870e6b4) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6bf20cd9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6bf20cd9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6bf20cd9 Branch: refs/heads/branch-2.0 Commit: 6bf20cd9460fd27c3e1e434b1cf31a3778ec3443 Parents: 33a213f Author: petermaxlee Authored: Thu Aug 11 01:43:08 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 11 10:50:52 2016 -0700 -- .../spark/sql/catalyst/analysis/Analyzer.scala | 24 +- .../resources/sql-tests/inputs/arithmetic.sql | 26 +++ .../sql-tests/inputs/group-by-ordinal.sql | 50 + .../sql-tests/inputs/order-by-ordinal.sql | 36 +++ .../sql-tests/results/arithmetic.sql.out| 178 +++ .../sql-tests/results/group-by-ordinal.sql.out | 168 ++ .../sql-tests/results/order-by-ordinal.sql.out | 143 .../org/apache/spark/sql/SQLQuerySuite.scala| 220 --- 8 files changed, 613 insertions(+), 232 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6bf20cd9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 660f523..57c3d9a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -547,8 +547,7 @@ class Analyzer( case a: Aggregate if containsStar(a.aggregateExpressions) => if (conf.groupByOrdinal && a.groupingExpressions.exists(IntegerIndex.unapply(_).nonEmpty)) { failAnalysis( -"Group by position: star is not allowed to use in the select list " + - "when using ordinals in group by") +"Star (*) is not allowed in select list when GROUP BY ordinal position is used") } else { a.copy(aggregateExpressions = buildExpandedProjectList(a.aggregateExpressions, a.child)) } @@ -723,9 +722,9 @@ class Analyzer( if (index > 0 && index <= child.output.size) { SortOrder(child.output(index - 1), direction) } else { - throw new UnresolvedException(s, -s"Order/sort By position: $index does not exist " + -s"The Select List is indexed from 1 to ${child.output.size}") + s.failAnalysis( +s"ORDER BY position $index is not in select list " + + s"(valid range is [1, ${child.output.size}])") } case o => o } @@ -737,17 +736,18 @@ class Analyzer( if conf.groupByOrdinal && aggs.forall(_.resolved) && groups.exists(IntegerIndex.unapply(_).nonEmpty) => val newGroups = groups.map { - case IntegerIndex(index) if index > 0 && index <= aggs.size => + case ordinal @ IntegerIndex(index) if index > 0 && index <= aggs.size => aggs(index - 1) match { case e if ResolveAggregateFunctions.containsAggregate(e) => -throw new UnresolvedException(a, - s"Group by position: the '$index'th column in the select contains an " + - s"aggregate function: ${e.sql}. 
Aggregate functions are not allowed in GROUP BY") +ordinal.failAnalysis( + s"GROUP BY position $index is an aggregate function, and " + +"aggregate functions are not allowed in GROUP BY") case o => o } - case IntegerIndex(index) => -throw new UnresolvedException(a, - s"Group by position: '$index' exceeds the size of the select list '${aggs.size}'.") + case ordinal @ IntegerIndex(index) => +ordinal.failAnalysis( + s"GROUP BY position $index is not in select list " + +s"(valid range is [1, ${aggs.size}])") case o => o } Aggregate(newGroups,
spark git commit: [SPARK-17015][SQL] group-by/order-by ordinal and arithmetic tests
Repository: spark Updated Branches: refs/heads/master 0db373aaf -> a7b02db45 [SPARK-17015][SQL] group-by/order-by ordinal and arithmetic tests ## What changes were proposed in this pull request? This patch adds three test files: 1. arithmetic.sql.out 2. order-by-ordinal.sql 3. group-by-ordinal.sql This includes https://github.com/apache/spark/pull/14594. ## How was this patch tested? This is a test case change. Author: petermaxleeCloses #14595 from petermaxlee/SPARK-17015. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a7b02db4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a7b02db4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a7b02db4 Branch: refs/heads/master Commit: a7b02db457d5fc663ce6a1ef01bf04689870e6b4 Parents: 0db373a Author: petermaxlee Authored: Thu Aug 11 01:43:08 2016 -0700 Committer: Reynold Xin Committed: Thu Aug 11 01:43:08 2016 -0700 -- .../spark/sql/catalyst/analysis/Analyzer.scala | 24 +- .../resources/sql-tests/inputs/arithmetic.sql | 26 +++ .../sql-tests/inputs/group-by-ordinal.sql | 50 + .../sql-tests/inputs/order-by-ordinal.sql | 36 +++ .../sql-tests/results/arithmetic.sql.out| 178 +++ .../sql-tests/results/group-by-ordinal.sql.out | 168 ++ .../sql-tests/results/order-by-ordinal.sql.out | 143 .../org/apache/spark/sql/SQLQuerySuite.scala| 220 --- 8 files changed, 613 insertions(+), 232 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a7b02db4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 25202b5..14a2a32 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -547,8 +547,7 @@ class Analyzer( case a: Aggregate if containsStar(a.aggregateExpressions) => if (conf.groupByOrdinal && a.groupingExpressions.exists(IntegerIndex.unapply(_).nonEmpty)) { failAnalysis( -"Group by position: star is not allowed to use in the select list " + - "when using ordinals in group by") +"Star (*) is not allowed in select list when GROUP BY ordinal position is used") } else { a.copy(aggregateExpressions = buildExpandedProjectList(a.aggregateExpressions, a.child)) } @@ -723,9 +722,9 @@ class Analyzer( if (index > 0 && index <= child.output.size) { SortOrder(child.output(index - 1), direction) } else { - throw new UnresolvedException(s, -s"Order/sort By position: $index does not exist " + -s"The Select List is indexed from 1 to ${child.output.size}") + s.failAnalysis( +s"ORDER BY position $index is not in select list " + + s"(valid range is [1, ${child.output.size}])") } case o => o } @@ -737,17 +736,18 @@ class Analyzer( if conf.groupByOrdinal && aggs.forall(_.resolved) && groups.exists(IntegerIndex.unapply(_).nonEmpty) => val newGroups = groups.map { - case IntegerIndex(index) if index > 0 && index <= aggs.size => + case ordinal @ IntegerIndex(index) if index > 0 && index <= aggs.size => aggs(index - 1) match { case e if ResolveAggregateFunctions.containsAggregate(e) => -throw new UnresolvedException(a, - s"Group by position: the '$index'th column in the select contains an " + - s"aggregate function: ${e.sql}. 
Aggregate functions are not allowed in GROUP BY") +ordinal.failAnalysis( + s"GROUP BY position $index is an aggregate function, and " + +"aggregate functions are not allowed in GROUP BY") case o => o } - case IntegerIndex(index) => -throw new UnresolvedException(a, - s"Group by position: '$index' exceeds the size of the select list '${aggs.size}'.") + case ordinal @ IntegerIndex(index) => +ordinal.failAnalysis( + s"GROUP BY position $index is not in select list " + +s"(valid range is [1, ${aggs.size}])") case o => o } Aggregate(newGroups, aggs, child)
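The reworked error messages above are easiest to see from the SQL side. A minimal sketch, assuming a local SparkSession and a throwaway view (names are illustrative, not part of the patch):

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

object GroupByOrdinalSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("group-by-ordinal").getOrCreate()
    spark.range(10).selectExpr("id % 3 AS k", "id AS v").createOrReplaceTempView("t")

    // With spark.sql.groupByOrdinal (default true), GROUP BY 1 means
    // "group by the first select-list item", i.e. k.
    spark.sql("SELECT k, sum(v) FROM t GROUP BY 1").show()

    // Position 3 is outside the two-column select list, so analysis should now fail with
    // "GROUP BY position 3 is not in select list (valid range is [1, 2])".
    try spark.sql("SELECT k, sum(v) FROM t GROUP BY 3").show()
    catch { case e: AnalysisException => println(e.getMessage) }

    spark.stop()
  }
}
```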
spark git commit: [SPARK-17010][MINOR][DOC] Wrong description in memory management document
Repository: spark Updated Branches: refs/heads/master 665e17532 -> 7a6a3c3fb [SPARK-17010][MINOR][DOC] Wrong description in memory management document ## What changes were proposed in this pull request? change the remain percent to right one. ## How was this patch tested? Manual review Author: Tao WangCloses #14591 from WangTaoTheTonic/patch-1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7a6a3c3f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7a6a3c3f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7a6a3c3f Branch: refs/heads/master Commit: 7a6a3c3fbcea889ca20beae9d4198df2fe53bd1b Parents: 665e175 Author: Tao Wang Authored: Wed Aug 10 22:30:18 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 10 22:30:18 2016 -0700 -- docs/tuning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7a6a3c3f/docs/tuning.md -- diff --git a/docs/tuning.md b/docs/tuning.md index 1ed1409..976f2eb 100644 --- a/docs/tuning.md +++ b/docs/tuning.md @@ -115,7 +115,7 @@ Although there are two relevant configurations, the typical user should not need as the default values are applicable to most workloads: * `spark.memory.fraction` expresses the size of `M` as a fraction of the (JVM heap space - 300MB) -(default 0.6). The rest of the space (25%) is reserved for user data structures, internal +(default 0.6). The rest of the space (40%) is reserved for user data structures, internal metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually large records. * `spark.memory.storageFraction` expresses the size of `R` as a fraction of `M` (default 0.5). - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17010][MINOR][DOC] Wrong description in memory management document
Repository: spark Updated Branches: refs/heads/branch-2.0 d3a30d2f0 -> 1e4013571 [SPARK-17010][MINOR][DOC] Wrong description in memory management document ## What changes were proposed in this pull request? change the remain percent to right one. ## How was this patch tested? Manual review Author: Tao WangCloses #14591 from WangTaoTheTonic/patch-1. (cherry picked from commit 7a6a3c3fbcea889ca20beae9d4198df2fe53bd1b) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1e401357 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1e401357 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1e401357 Branch: refs/heads/branch-2.0 Commit: 1e4013571b18ca337ea664838f7f8e781c8de7aa Parents: d3a30d2 Author: Tao Wang Authored: Wed Aug 10 22:30:18 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 10 22:30:25 2016 -0700 -- docs/tuning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1e401357/docs/tuning.md -- diff --git a/docs/tuning.md b/docs/tuning.md index 1ed1409..976f2eb 100644 --- a/docs/tuning.md +++ b/docs/tuning.md @@ -115,7 +115,7 @@ Although there are two relevant configurations, the typical user should not need as the default values are applicable to most workloads: * `spark.memory.fraction` expresses the size of `M` as a fraction of the (JVM heap space - 300MB) -(default 0.6). The rest of the space (25%) is reserved for user data structures, internal +(default 0.6). The rest of the space (40%) is reserved for user data structures, internal metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually large records. * `spark.memory.storageFraction` expresses the size of `R` as a fraction of `M` (default 0.5). - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
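The corrected figure is simply the complement of the default: with `spark.memory.fraction` at 0.6, the unified region `M` takes 60% of (heap - 300MB) and the remaining 40% is left for user data structures. A back-of-the-envelope sketch with an assumed 4 GB heap (illustrative arithmetic only, not Spark's actual accounting code):

```scala
object MemoryFractionSketch {
  def main(args: Array[String]): Unit = {
    val heapBytes      = 4L * 1024 * 1024 * 1024   // assumed 4 GB executor heap
    val reservedBytes  = 300L * 1024 * 1024        // fixed 300 MB reservation from the docs
    val memoryFraction = 0.6                       // spark.memory.fraction default

    val usable   = heapBytes - reservedBytes
    val unifiedM = (usable * memoryFraction).toLong // execution + storage region ("M")
    val userRest = usable - unifiedM                // the 40% the doc fix talks about

    println(f"M = ${unifiedM / 1024 / 1024}%d MB, user space = ${userRest / 1024 / 1024}%d MB " +
      f"(${userRest.toDouble / usable * 100}%.0f%% of usable heap)")
  }
}
```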
spark git commit: [SPARK-17007][SQL] Move test data files into a test-data folder
Repository: spark Updated Branches: refs/heads/master 425c7c2db -> 665e17532 [SPARK-17007][SQL] Move test data files into a test-data folder ## What changes were proposed in this pull request? This patch moves all the test data files in sql/core/src/test/resources to sql/core/src/test/resources/test-data, so we don't clutter the top level sql/core/src/test/resources. Also deleted sql/core/src/test/resources/old-repeated.parquet since it is no longer used. The change will make it easier to spot sql-tests directory. ## How was this patch tested? This is a test-only change. Author: petermaxleeCloses #14589 from petermaxlee/SPARK-17007. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/665e1753 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/665e1753 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/665e1753 Branch: refs/heads/master Commit: 665e175328130ab3eb0370cdd2a43ed5a7bed1d6 Parents: 425c7c2 Author: petermaxlee Authored: Wed Aug 10 21:26:46 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 10 21:26:46 2016 -0700 -- .../apache/spark/sql/JavaDataFrameSuite.java| 12 +++ sql/core/src/test/resources/bool.csv| 5 --- .../src/test/resources/cars-alternative.csv | 5 --- .../test/resources/cars-blank-column-name.csv | 3 -- sql/core/src/test/resources/cars-malformed.csv | 6 sql/core/src/test/resources/cars-null.csv | 6 .../test/resources/cars-unbalanced-quotes.csv | 4 --- sql/core/src/test/resources/cars.csv| 7 sql/core/src/test/resources/cars.tsv| 4 --- sql/core/src/test/resources/cars_iso-8859-1.csv | 6 sql/core/src/test/resources/comments.csv| 6 sql/core/src/test/resources/dates.csv | 4 --- .../src/test/resources/dec-in-fixed-len.parquet | Bin 460 -> 0 bytes sql/core/src/test/resources/dec-in-i32.parquet | Bin 420 -> 0 bytes sql/core/src/test/resources/dec-in-i64.parquet | Bin 437 -> 0 bytes sql/core/src/test/resources/decimal.csv | 7 .../src/test/resources/disable_comments.csv | 2 -- sql/core/src/test/resources/empty.csv | 0 .../test/resources/nested-array-struct.parquet | Bin 775 -> 0 bytes sql/core/src/test/resources/numbers.csv | 9 - .../src/test/resources/old-repeated-int.parquet | Bin 389 -> 0 bytes .../test/resources/old-repeated-message.parquet | Bin 600 -> 0 bytes .../src/test/resources/old-repeated.parquet | Bin 432 -> 0 bytes .../parquet-thrift-compat.snappy.parquet| Bin 10550 -> 0 bytes .../resources/proto-repeated-string.parquet | Bin 411 -> 0 bytes .../resources/proto-repeated-struct.parquet | Bin 608 -> 0 bytes .../proto-struct-with-array-many.parquet| Bin 802 -> 0 bytes .../resources/proto-struct-with-array.parquet | Bin 1576 -> 0 bytes sql/core/src/test/resources/simple_sparse.csv | 5 --- sql/core/src/test/resources/test-data/bool.csv | 5 +++ .../resources/test-data/cars-alternative.csv| 5 +++ .../test-data/cars-blank-column-name.csv| 3 ++ .../test/resources/test-data/cars-malformed.csv | 6 .../src/test/resources/test-data/cars-null.csv | 6 .../test-data/cars-unbalanced-quotes.csv| 4 +++ sql/core/src/test/resources/test-data/cars.csv | 7 sql/core/src/test/resources/test-data/cars.tsv | 4 +++ .../resources/test-data/cars_iso-8859-1.csv | 6 .../src/test/resources/test-data/comments.csv | 6 sql/core/src/test/resources/test-data/dates.csv | 4 +++ .../test-data/dec-in-fixed-len.parquet | Bin 0 -> 460 bytes .../test/resources/test-data/dec-in-i32.parquet | Bin 0 -> 420 bytes .../test/resources/test-data/dec-in-i64.parquet | Bin 0 -> 437 bytes .../src/test/resources/test-data/decimal.csv| 7 
.../resources/test-data/disable_comments.csv| 2 ++ sql/core/src/test/resources/test-data/empty.csv | 0 .../test-data/nested-array-struct.parquet | Bin 0 -> 775 bytes .../src/test/resources/test-data/numbers.csv| 9 + .../test-data/old-repeated-int.parquet | Bin 0 -> 389 bytes .../test-data/old-repeated-message.parquet | Bin 0 -> 600 bytes .../parquet-thrift-compat.snappy.parquet| Bin 0 -> 10550 bytes .../test-data/proto-repeated-string.parquet | Bin 0 -> 411 bytes .../test-data/proto-repeated-struct.parquet | Bin 0 -> 608 bytes .../proto-struct-with-array-many.parquet| Bin 0 -> 802 bytes .../test-data/proto-struct-with-array.parquet | Bin 0 -> 1576 bytes .../test/resources/test-data/simple_sparse.csv | 5 +++ .../text-partitioned/year=2014/data.txt | 1 + .../text-partitioned/year=2015/data.txt | 1 +
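For tests that consume these files the only visible difference is the new prefix: resources are now looked up under `test-data/` instead of at the top of the resources directory. A hypothetical lookup following the new layout (purely illustrative, not code from the patch):

```scala
import org.apache.spark.sql.SparkSession

object TestDataLookupSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("test-data").getOrCreate()

    // Resolve the relocated resource from the test classpath; before this change the
    // file lived at "cars.csv", now it is "test-data/cars.csv".
    val url = Thread.currentThread().getContextClassLoader.getResource("test-data/cars.csv")
    spark.read.csv(url.toString).show()

    spark.stop()
  }
}
```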
spark git commit: [SPARK-17008][SPARK-17009][SQL] Normalization and isolation in SQLQueryTestSuite.
Repository: spark Updated Branches: refs/heads/master ab648c000 -> 425c7c2db [SPARK-17008][SPARK-17009][SQL] Normalization and isolation in SQLQueryTestSuite. ## What changes were proposed in this pull request? This patch enhances SQLQueryTestSuite in two ways: 1. SPARK-17009: Use a new SparkSession for each test case to provide stronger isolation (e.g. config changes in one test case does not impact another). That said, we do not currently isolate catalog changes. 2. SPARK-17008: Normalize query output using sorting, inspired by HiveComparisonTest. I also ported a few new test cases over from SQLQuerySuite. ## How was this patch tested? This is a test harness update. Author: petermaxleeCloses #14590 from petermaxlee/SPARK-17008. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/425c7c2d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/425c7c2d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/425c7c2d Branch: refs/heads/master Commit: 425c7c2dbd2923094712e1215dd29272fb09cd79 Parents: ab648c0 Author: petermaxlee Authored: Wed Aug 10 21:05:32 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 10 21:05:32 2016 -0700 -- .../resources/sql-tests/inputs/datetime.sql | 4 ++ .../test/resources/sql-tests/inputs/having.sql | 15 + .../resources/sql-tests/inputs/natural-join.sql | 20 ++ .../sql-tests/results/datetime.sql.out | 10 +++ .../resources/sql-tests/results/having.sql.out | 40 .../sql-tests/results/natural-join.sql.out | 64 .../org/apache/spark/sql/SQLQuerySuite.scala| 62 --- .../apache/spark/sql/SQLQueryTestSuite.scala| 30 - 8 files changed, 180 insertions(+), 65 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/425c7c2d/sql/core/src/test/resources/sql-tests/inputs/datetime.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/datetime.sql b/sql/core/src/test/resources/sql-tests/inputs/datetime.sql new file mode 100644 index 000..3fd1c37 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/datetime.sql @@ -0,0 +1,4 @@ +-- date time functions + +-- [SPARK-16836] current_date and current_timestamp literals +select current_date = current_date(), current_timestamp = current_timestamp(); http://git-wip-us.apache.org/repos/asf/spark/blob/425c7c2d/sql/core/src/test/resources/sql-tests/inputs/having.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/having.sql b/sql/core/src/test/resources/sql-tests/inputs/having.sql new file mode 100644 index 000..364c022 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/having.sql @@ -0,0 +1,15 @@ +create temporary view hav as select * from values + ("one", 1), + ("two", 2), + ("three", 3), + ("one", 5) + as hav(k, v); + +-- having clause +SELECT k, sum(v) FROM hav GROUP BY k HAVING sum(v) > 2; + +-- having condition contains grouping column +SELECT count(k) FROM hav GROUP BY v + 1 HAVING v + 1 = 2; + +-- SPARK-11032: resolve having correctly +SELECT MIN(t.v) FROM (SELECT * FROM hav WHERE v > 0) t HAVING(COUNT(1) > 0); http://git-wip-us.apache.org/repos/asf/spark/blob/425c7c2d/sql/core/src/test/resources/sql-tests/inputs/natural-join.sql -- diff --git a/sql/core/src/test/resources/sql-tests/inputs/natural-join.sql b/sql/core/src/test/resources/sql-tests/inputs/natural-join.sql new file mode 100644 index 000..71a5015 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/natural-join.sql @@ -0,0 +1,20 @@ +create temporary view nt1 as select * from values + ("one", 1), + ("two", 2), + ("three", 3) + as 
nt1(k, v1); + +create temporary view nt2 as select * from values + ("one", 1), + ("two", 22), + ("one", 5) + as nt2(k, v2); + + +SELECT * FROM nt1 natural join nt2 where k = "one"; + +SELECT * FROM nt1 natural left join nt2 order by v1, v2; + +SELECT * FROM nt1 natural right join nt2 order by v1, v2; + +SELECT count(*) FROM nt1 natural full outer join nt2; http://git-wip-us.apache.org/repos/asf/spark/blob/425c7c2d/sql/core/src/test/resources/sql-tests/results/datetime.sql.out -- diff --git a/sql/core/src/test/resources/sql-tests/results/datetime.sql.out b/sql/core/src/test/resources/sql-tests/results/datetime.sql.out new file mode 100644 index 000..5174657 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/results/datetime.sql.out @@ -0,0 +1,10 @@ +-- Automatically generated by
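The isolation half of the change comes down to running each test case in its own session so that a `SET` in one `.sql` file cannot leak into the next, while the normalization half sorts the collected output before comparing it to the golden file. A rough sketch of both ideas (not the actual SQLQueryTestSuite code):

```scala
import org.apache.spark.sql.SparkSession

object IsolationAndSortingSketch {
  def main(args: Array[String]): Unit = {
    val base = SparkSession.builder().master("local[1]").appName("sql-query-test").getOrCreate()

    // newSession() shares the SparkContext and the catalog, but gets its own SQLConf and
    // temporary views -- the "config changes don't leak" property the suite relies on.
    val caseOne = base.newSession()
    caseOne.sql("SET spark.sql.shuffle.partitions=7")

    val caseTwo = base.newSession()
    println(caseTwo.conf.get("spark.sql.shuffle.partitions"))   // still the default, not 7

    // Normalization: sort the stringified rows so golden files don't depend on output order.
    val normalized = caseTwo.range(5).selectExpr("id % 2 AS k").collect().map(_.toString).sorted
    normalized.foreach(println)

    base.stop()
  }
}
```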
spark git commit: Fixed typo
Repository: spark Updated Branches: refs/heads/master 121643bc7 -> 9dc3e602d Fixed typo ## What changes were proposed in this pull request? Fixed small typo - "value ... ~~in~~ is null" ## How was this patch tested? Still compiles! Author: Michał KiełbowiczCloses #14569 from jupblb/typo-fix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9dc3e602 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9dc3e602 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9dc3e602 Branch: refs/heads/master Commit: 9dc3e602d77ccdf670f1b6648e5674066d189cc0 Parents: 121643b Author: Michał Kiełbowicz Authored: Tue Aug 9 23:01:50 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 9 23:01:50 2016 -0700 -- sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9dc3e602/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala index d83eef7..e16850e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala @@ -463,6 +463,6 @@ trait Row extends Serializable { * @throws NullPointerException when value is null. */ private def getAnyValAs[T <: AnyVal](i: Int): T = -if (isNullAt(i)) throw new NullPointerException(s"Value at index $i in null") +if (isNullAt(i)) throw new NullPointerException(s"Value at index $i is null") else getAs[T](i) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: Fixed typo
Repository: spark Updated Branches: refs/heads/branch-2.0 2d136dba4 -> 475ee3815 Fixed typo ## What changes were proposed in this pull request? Fixed small typo - "value ... ~~in~~ is null" ## How was this patch tested? Still compiles! Author: Michał KiełbowiczCloses #14569 from jupblb/typo-fix. (cherry picked from commit 9dc3e602d77ccdf670f1b6648e5674066d189cc0) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/475ee381 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/475ee381 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/475ee381 Branch: refs/heads/branch-2.0 Commit: 475ee38150ee5a234156a903e4de227954b0063e Parents: 2d136db Author: Michał Kiełbowicz Authored: Tue Aug 9 23:01:50 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 9 23:01:57 2016 -0700 -- sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/475ee381/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala index d83eef7..e16850e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala @@ -463,6 +463,6 @@ trait Row extends Serializable { * @throws NullPointerException when value is null. */ private def getAnyValAs[T <: AnyVal](i: Int): T = -if (isNullAt(i)) throw new NullPointerException(s"Value at index $i in null") +if (isNullAt(i)) throw new NullPointerException(s"Value at index $i is null") else getAs[T](i) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
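The corrected wording is what surfaces whenever a primitive getter such as `getInt` hits a NULL column, since those getters funnel through `getAnyValAs`. A small illustration, assuming a local session (the exception text shown is the one fixed above):

```scala
import org.apache.spark.sql.SparkSession

object NullPrimitiveGetterSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("row-null").getOrCreate()
    val row = spark.sql("SELECT cast(null AS int) AS v").collect().head

    // Primitive getters cannot represent NULL, so this should raise
    // java.lang.NullPointerException: "Value at index 0 is null".
    try row.getInt(0)
    catch { case e: NullPointerException => println(e.getMessage) }

    // Checking isNullAt first (or using a boxed getAs) avoids the exception entirely.
    println(row.isNullAt(0))
    spark.stop()
  }
}
```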
spark git commit: [SPARK-16563][SQL] fix spark sql thrift server FetchResults bug
Repository: spark Updated Branches: refs/heads/branch-2.0 6fc54b776 -> 601c649d0 [SPARK-16563][SQL] fix spark sql thrift server FetchResults bug ## What changes were proposed in this pull request? Add a constant iterator which point to head of result. The header will be used to reset iterator when fetch result from first row repeatedly. JIRA ticket https://issues.apache.org/jira/browse/SPARK-16563 ## How was this patch tested? This bug was found when using Cloudera HUE connecting to spark sql thrift server, currently SQL statement result can be only fetched for once. The fix was tested manually with Cloudera HUE, With this fix, HUE can fetch spark SQL results repeatedly through thrift server. Author: AliceAuthor: Alice Closes #14218 from alicegugu/SparkSQLFetchResultsBug. (cherry picked from commit e17a76efdb44837c38388a4d0e62436065cd4dc9) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/601c649d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/601c649d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/601c649d Branch: refs/heads/branch-2.0 Commit: 601c649d0134e6791f1c0e0aaa25d6aad3c541d4 Parents: 6fc54b7 Author: Alice Authored: Mon Aug 8 18:00:04 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 8 18:00:58 2016 -0700 -- .../SparkExecuteStatementOperation.scala| 12 + .../thriftserver/HiveThriftServer2Suites.scala | 48 2 files changed, 60 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/601c649d/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala -- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala index e8bcdd7..b2717ec 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala @@ -51,6 +51,7 @@ private[hive] class SparkExecuteStatementOperation( private var result: DataFrame = _ private var iter: Iterator[SparkRow] = _ + private var iterHeader: Iterator[SparkRow] = _ private var dataTypes: Array[DataType] = _ private var statementId: String = _ @@ -110,6 +111,14 @@ private[hive] class SparkExecuteStatementOperation( assertState(OperationState.FINISHED) setHasResultSet(true) val resultRowSet: RowSet = RowSetFactory.create(getResultSetSchema, getProtocolVersion) + +// Reset iter to header when fetching start from first row +if (order.equals(FetchOrientation.FETCH_FIRST)) { + val (ita, itb) = iterHeader.duplicate + iter = ita + iterHeader = itb +} + if (!iter.hasNext) { resultRowSet } else { @@ -228,6 +237,9 @@ private[hive] class SparkExecuteStatementOperation( result.collect().iterator } } + val (itra, itrb) = iter.duplicate + iterHeader = itra + iter = itrb dataTypes = result.queryExecution.analyzed.output.map(_.dataType).toArray } catch { case e: HiveSQLException => http://git-wip-us.apache.org/repos/asf/spark/blob/601c649d/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala -- diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala 
b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala index e388c2a..8f2c4fa 100644 --- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala @@ -36,6 +36,8 @@ import org.apache.hive.service.auth.PlainSaslHelper import org.apache.hive.service.cli.GetInfoType import org.apache.hive.service.cli.thrift.TCLIService.Client import org.apache.hive.service.cli.thrift.ThriftCLIServiceClient +import org.apache.hive.service.cli.FetchOrientation +import org.apache.hive.service.cli.FetchType import org.apache.thrift.protocol.TBinaryProtocol import org.apache.thrift.transport.TSocket import org.scalatest.BeforeAndAfterAll @@ -91,6 +93,52 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest { } }
spark git commit: [SPARK-16563][SQL] fix spark sql thrift server FetchResults bug
Repository: spark Updated Branches: refs/heads/master bca43cd63 -> e17a76efd [SPARK-16563][SQL] fix spark sql thrift server FetchResults bug ## What changes were proposed in this pull request? Add a constant iterator which point to head of result. The header will be used to reset iterator when fetch result from first row repeatedly. JIRA ticket https://issues.apache.org/jira/browse/SPARK-16563 ## How was this patch tested? This bug was found when using Cloudera HUE connecting to spark sql thrift server, currently SQL statement result can be only fetched for once. The fix was tested manually with Cloudera HUE, With this fix, HUE can fetch spark SQL results repeatedly through thrift server. Author: AliceAuthor: Alice Closes #14218 from alicegugu/SparkSQLFetchResultsBug. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e17a76ef Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e17a76ef Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e17a76ef Branch: refs/heads/master Commit: e17a76efdb44837c38388a4d0e62436065cd4dc9 Parents: bca43cd Author: Alice Authored: Mon Aug 8 18:00:04 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 8 18:00:04 2016 -0700 -- .../SparkExecuteStatementOperation.scala| 12 + .../thriftserver/HiveThriftServer2Suites.scala | 48 2 files changed, 60 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e17a76ef/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala -- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala index e8bcdd7..b2717ec 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala @@ -51,6 +51,7 @@ private[hive] class SparkExecuteStatementOperation( private var result: DataFrame = _ private var iter: Iterator[SparkRow] = _ + private var iterHeader: Iterator[SparkRow] = _ private var dataTypes: Array[DataType] = _ private var statementId: String = _ @@ -110,6 +111,14 @@ private[hive] class SparkExecuteStatementOperation( assertState(OperationState.FINISHED) setHasResultSet(true) val resultRowSet: RowSet = RowSetFactory.create(getResultSetSchema, getProtocolVersion) + +// Reset iter to header when fetching start from first row +if (order.equals(FetchOrientation.FETCH_FIRST)) { + val (ita, itb) = iterHeader.duplicate + iter = ita + iterHeader = itb +} + if (!iter.hasNext) { resultRowSet } else { @@ -228,6 +237,9 @@ private[hive] class SparkExecuteStatementOperation( result.collect().iterator } } + val (itra, itrb) = iter.duplicate + iterHeader = itra + iter = itrb dataTypes = result.queryExecution.analyzed.output.map(_.dataType).toArray } catch { case e: HiveSQLException => http://git-wip-us.apache.org/repos/asf/spark/blob/e17a76ef/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala -- diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala index e388c2a..8f2c4fa 100644 --- 
a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala @@ -36,6 +36,8 @@ import org.apache.hive.service.auth.PlainSaslHelper import org.apache.hive.service.cli.GetInfoType import org.apache.hive.service.cli.thrift.TCLIService.Client import org.apache.hive.service.cli.thrift.ThriftCLIServiceClient +import org.apache.hive.service.cli.FetchOrientation +import org.apache.hive.service.cli.FetchType import org.apache.thrift.protocol.TBinaryProtocol import org.apache.thrift.transport.TSocket import org.scalatest.BeforeAndAfterAll @@ -91,6 +93,52 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest { } } + test("SPARK-16563 ThriftCLIService FetchResults repeat fetching result") { +withCLIServiceClient { client => + val
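The fix rests on `Iterator.duplicate`: one copy stays pinned at the first row, the other advances for ordinary fetches, and a FETCH_FIRST request re-duplicates the pinned copy so the result can be replayed without re-running the query. A standalone sketch of that pattern in plain Scala (not the thrift-server operation itself):

```scala
import scala.collection.mutable.ArrayBuffer

object RewindableFetchSketch {
  def main(args: Array[String]): Unit = {
    // iterHeader stays at the first row; iter is the cursor handed to normal fetches.
    var (iterHeader, iter) = (1 to 5).iterator.duplicate

    def fetchNext(maxRows: Int): Seq[Int] = {
      val rows = ArrayBuffer.empty[Int]
      while (rows.size < maxRows && iter.hasNext) rows += iter.next()
      rows.toSeq
    }

    // FETCH_FIRST: re-duplicate the pinned header so the next fetch starts from row one again.
    def fetchFirst(maxRows: Int): Seq[Int] = {
      val (ita, itb) = iterHeader.duplicate
      iterHeader = ita
      iter = itb
      fetchNext(maxRows)
    }

    println(fetchNext(3))   // rows 1, 2, 3
    println(fetchNext(3))   // rows 4, 5
    println(fetchFirst(3))  // rows 1, 2, 3 again -- what HUE relies on when it re-fetches
  }
}
```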
spark git commit: Update docs to include SASL support for RPC
Repository: spark Updated Branches: refs/heads/branch-2.0 9748a2928 -> 6fc54b776 Update docs to include SASL support for RPC ## What changes were proposed in this pull request? Update docs to include SASL support for RPC Evidence: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala#L63 ## How was this patch tested? Docs change only Author: Michael GummeltCloses #14549 from mgummelt/sasl. (cherry picked from commit 53d1c7877967f03cc9c8c7e7394f380d1bbefc27) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6fc54b77 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6fc54b77 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6fc54b77 Branch: refs/heads/branch-2.0 Commit: 6fc54b776419317dc55754a76b68a5ba7eecdcf3 Parents: 9748a29 Author: Michael Gummelt Authored: Mon Aug 8 16:07:51 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 8 16:08:09 2016 -0700 -- docs/configuration.md | 7 --- docs/security.md | 3 ++- 2 files changed, 6 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6fc54b77/docs/configuration.md -- diff --git a/docs/configuration.md b/docs/configuration.md index bf10b24..8facd0e 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1204,7 +1204,7 @@ Apart from these, the following properties are also available, and may be useful false Whether to use dynamic resource allocation, which scales the number of executors registered -with this application up and down based on the workload. +with this application up and down based on the workload. For more detail, see the description here. @@ -1345,8 +1345,9 @@ Apart from these, the following properties are also available, and may be useful spark.authenticate.enableSaslEncryption false -Enable encrypted communication when authentication is enabled. This option is currently -only supported by the block transfer service. +Enable encrypted communication when authentication is +enabled. This is supported by the block transfer service and the +RPC endpoints. http://git-wip-us.apache.org/repos/asf/spark/blob/6fc54b77/docs/security.md -- diff --git a/docs/security.md b/docs/security.md index d2708a8..baadfef 100644 --- a/docs/security.md +++ b/docs/security.md @@ -27,7 +27,8 @@ If your applications are using event logging, the directory where the event logs ## Encryption -Spark supports SSL for HTTP protocols. SASL encryption is supported for the block transfer service. +Spark supports SSL for HTTP protocols. SASL encryption is supported for the block transfer service +and the RPC endpoints. Encryption is not yet supported for data stored by Spark in temporary local storage, such as shuffle files, cached data, and other application files. If encrypting this data is desired, a workaround is - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: Update docs to include SASL support for RPC
Repository: spark Updated Branches: refs/heads/master 9216901d5 -> 53d1c7877 Update docs to include SASL support for RPC ## What changes were proposed in this pull request? Update docs to include SASL support for RPC Evidence: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala#L63 ## How was this patch tested? Docs change only Author: Michael GummeltCloses #14549 from mgummelt/sasl. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/53d1c787 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/53d1c787 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/53d1c787 Branch: refs/heads/master Commit: 53d1c7877967f03cc9c8c7e7394f380d1bbefc27 Parents: 9216901 Author: Michael Gummelt Authored: Mon Aug 8 16:07:51 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 8 16:07:51 2016 -0700 -- docs/configuration.md | 7 --- docs/security.md | 3 ++- 2 files changed, 6 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/53d1c787/docs/configuration.md -- diff --git a/docs/configuration.md b/docs/configuration.md index cc6b2b6..4569bed 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1211,7 +1211,7 @@ Apart from these, the following properties are also available, and may be useful false Whether to use dynamic resource allocation, which scales the number of executors registered -with this application up and down based on the workload. +with this application up and down based on the workload. For more detail, see the description here. @@ -1352,8 +1352,9 @@ Apart from these, the following properties are also available, and may be useful spark.authenticate.enableSaslEncryption false -Enable encrypted communication when authentication is enabled. This option is currently -only supported by the block transfer service. +Enable encrypted communication when authentication is +enabled. This is supported by the block transfer service and the +RPC endpoints. http://git-wip-us.apache.org/repos/asf/spark/blob/53d1c787/docs/security.md -- diff --git a/docs/security.md b/docs/security.md index d2708a8..baadfef 100644 --- a/docs/security.md +++ b/docs/security.md @@ -27,7 +27,8 @@ If your applications are using event logging, the directory where the event logs ## Encryption -Spark supports SSL for HTTP protocols. SASL encryption is supported for the block transfer service. +Spark supports SSL for HTTP protocols. SASL encryption is supported for the block transfer service +and the RPC endpoints. Encryption is not yet supported for data stored by Spark in temporary local storage, such as shuffle files, cached data, and other application files. If encrypting this data is desired, a workaround is - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
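The two settings travel together: `spark.authenticate` has to be on before the SASL option does anything, and with this doc fix the same pair is documented to cover both the block transfer service and the RPC endpoints. A minimal sketch of wiring it up programmatically (a real deployment would put these in spark-defaults.conf and manage the secret properly; the values below are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object SaslEncryptionSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .set("spark.authenticate", "true")                        // prerequisite for SASL encryption
      .set("spark.authenticate.secret", "placeholder-secret")   // YARN generates this automatically
      .set("spark.authenticate.enableSaslEncryption", "true")   // block transfer service + RPC

    val spark = SparkSession.builder()
      .master("local[2]").appName("sasl-sketch").config(conf).getOrCreate()

    println(spark.sparkContext.getConf.get("spark.authenticate.enableSaslEncryption"))
    spark.stop()
  }
}
```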
spark git commit: [SPARK-16873][CORE] Fix SpillReader NPE when spillFile has no data
Repository: spark Updated Branches: refs/heads/branch-1.6 52d8837c6 -> d2518acc1 [SPARK-16873][CORE] Fix SpillReader NPE when spillFile has no data ## What changes were proposed in this pull request? SpillReader NPE when spillFile has no data. See follow logs: 16/07/31 20:54:04 INFO collection.ExternalSorter: spill memory to file:/data4/yarnenv/local/usercache/tesla/appcache/application_1465785263942_56138/blockmgr-db5f46c3-d7a4-4f93-8b77-565e469696fb/09/temp_shuffle_ec3ece08-4569-4197-893a-4a5dfcbbf9fa, fileSize:0.0 B 16/07/31 20:54:04 WARN memory.TaskMemoryManager: leak 164.3 MB memory from org.apache.spark.util.collection.ExternalSorter3db4b52d 16/07/31 20:54:04 ERROR executor.Executor: Managed memory leak detected; size = 190458101 bytes, TID = 2358516/07/31 20:54:04 ERROR executor.Executor: Exception in task 1013.0 in stage 18.0 (TID 23585) java.lang.NullPointerException at org.apache.spark.util.collection.ExternalSorter$SpillReader.cleanup(ExternalSorter.scala:624) at org.apache.spark.util.collection.ExternalSorter$SpillReader.nextBatchStream(ExternalSorter.scala:539) at org.apache.spark.util.collection.ExternalSorter$SpillReader.(ExternalSorter.scala:507) at org.apache.spark.util.collection.ExternalSorter$SpillableIterator.spill(ExternalSorter.scala:816) at org.apache.spark.util.collection.ExternalSorter.forceSpill(ExternalSorter.scala:251) at org.apache.spark.util.collection.Spillable.spill(Spillable.scala:109) at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:154) at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:249) at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:112) at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:346) at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:367) at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:237) at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) 16/07/31 20:54:30 INFO executor.Executor: Executor is trying to kill task 1090.1 in stage 18.0 (TID 23793) 16/07/31 20:54:30 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown ## How was this patch tested? Manual test. Author: sharkdAuthor: sharkdtu Closes #14479 from sharkdtu/master. 
(cherry picked from commit 583d91a1957f4258a64184cc6b9007588791d332) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d2518acc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d2518acc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d2518acc Branch: refs/heads/branch-1.6 Commit: d2518acc1df44b1ecb8eed20404bcc1277f358a4 Parents: 52d8837 Author: sharkd Authored: Wed Aug 3 19:20:34 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 3 19:21:16 2016 -0700 -- .../scala/org/apache/spark/util/collection/ExternalSorter.scala | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d2518acc/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala b/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala index 44b1d90..60ec1ca 100644 --- a/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala +++ b/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala @@ -592,7 +592,9 @@ private[spark] class ExternalSorter[K, V, C]( val ds = deserializeStream deserializeStream = null fileStream = null - ds.close() + if (ds != null) { +ds.close() + } // NOTE: We don't do file.delete() here because that is done in ExternalSorter.stop(). // This should also be fixed in
spark git commit: [SPARK-16873][CORE] Fix SpillReader NPE when spillFile has no data
Repository: spark Updated Branches: refs/heads/branch-2.0 bb30a3d0f -> 11854e5a1 [SPARK-16873][CORE] Fix SpillReader NPE when spillFile has no data ## What changes were proposed in this pull request? SpillReader NPE when spillFile has no data. See follow logs: 16/07/31 20:54:04 INFO collection.ExternalSorter: spill memory to file:/data4/yarnenv/local/usercache/tesla/appcache/application_1465785263942_56138/blockmgr-db5f46c3-d7a4-4f93-8b77-565e469696fb/09/temp_shuffle_ec3ece08-4569-4197-893a-4a5dfcbbf9fa, fileSize:0.0 B 16/07/31 20:54:04 WARN memory.TaskMemoryManager: leak 164.3 MB memory from org.apache.spark.util.collection.ExternalSorter3db4b52d 16/07/31 20:54:04 ERROR executor.Executor: Managed memory leak detected; size = 190458101 bytes, TID = 2358516/07/31 20:54:04 ERROR executor.Executor: Exception in task 1013.0 in stage 18.0 (TID 23585) java.lang.NullPointerException at org.apache.spark.util.collection.ExternalSorter$SpillReader.cleanup(ExternalSorter.scala:624) at org.apache.spark.util.collection.ExternalSorter$SpillReader.nextBatchStream(ExternalSorter.scala:539) at org.apache.spark.util.collection.ExternalSorter$SpillReader.(ExternalSorter.scala:507) at org.apache.spark.util.collection.ExternalSorter$SpillableIterator.spill(ExternalSorter.scala:816) at org.apache.spark.util.collection.ExternalSorter.forceSpill(ExternalSorter.scala:251) at org.apache.spark.util.collection.Spillable.spill(Spillable.scala:109) at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:154) at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:249) at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:112) at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:346) at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:367) at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:237) at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) 16/07/31 20:54:30 INFO executor.Executor: Executor is trying to kill task 1090.1 in stage 18.0 (TID 23793) 16/07/31 20:54:30 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown ## How was this patch tested? Manual test. Author: sharkdAuthor: sharkdtu Closes #14479 from sharkdtu/master. 
(cherry picked from commit 583d91a1957f4258a64184cc6b9007588791d332) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/11854e5a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/11854e5a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/11854e5a Branch: refs/heads/branch-2.0 Commit: 11854e5a1baa7682d91bfce4e8bba57566f22b3a Parents: bb30a3d Author: sharkd Authored: Wed Aug 3 19:20:34 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 3 19:20:56 2016 -0700 -- .../scala/org/apache/spark/util/collection/ExternalSorter.scala | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/11854e5a/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala b/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala index 4067ace..6ea7307 100644 --- a/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala +++ b/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala @@ -622,7 +622,9 @@ private[spark] class ExternalSorter[K, V, C]( val ds = deserializeStream deserializeStream = null fileStream = null - ds.close() + if (ds != null) { +ds.close() + } // NOTE: We don't do file.delete() here because that is done in ExternalSorter.stop(). // This should also be fixed in
spark git commit: [SPARK-16873][CORE] Fix SpillReader NPE when spillFile has no data
Repository: spark Updated Branches: refs/heads/master c5eb1df72 -> 583d91a19 [SPARK-16873][CORE] Fix SpillReader NPE when spillFile has no data ## What changes were proposed in this pull request? SpillReader NPE when spillFile has no data. See follow logs: 16/07/31 20:54:04 INFO collection.ExternalSorter: spill memory to file:/data4/yarnenv/local/usercache/tesla/appcache/application_1465785263942_56138/blockmgr-db5f46c3-d7a4-4f93-8b77-565e469696fb/09/temp_shuffle_ec3ece08-4569-4197-893a-4a5dfcbbf9fa, fileSize:0.0 B 16/07/31 20:54:04 WARN memory.TaskMemoryManager: leak 164.3 MB memory from org.apache.spark.util.collection.ExternalSorter3db4b52d 16/07/31 20:54:04 ERROR executor.Executor: Managed memory leak detected; size = 190458101 bytes, TID = 2358516/07/31 20:54:04 ERROR executor.Executor: Exception in task 1013.0 in stage 18.0 (TID 23585) java.lang.NullPointerException at org.apache.spark.util.collection.ExternalSorter$SpillReader.cleanup(ExternalSorter.scala:624) at org.apache.spark.util.collection.ExternalSorter$SpillReader.nextBatchStream(ExternalSorter.scala:539) at org.apache.spark.util.collection.ExternalSorter$SpillReader.(ExternalSorter.scala:507) at org.apache.spark.util.collection.ExternalSorter$SpillableIterator.spill(ExternalSorter.scala:816) at org.apache.spark.util.collection.ExternalSorter.forceSpill(ExternalSorter.scala:251) at org.apache.spark.util.collection.Spillable.spill(Spillable.scala:109) at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:154) at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:249) at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:112) at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:346) at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:367) at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:237) at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) 16/07/31 20:54:30 INFO executor.Executor: Executor is trying to kill task 1090.1 in stage 18.0 (TID 23793) 16/07/31 20:54:30 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown ## How was this patch tested? Manual test. Author: sharkdAuthor: sharkdtu Closes #14479 from sharkdtu/master. 
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/583d91a1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/583d91a1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/583d91a1 Branch: refs/heads/master Commit: 583d91a1957f4258a64184cc6b9007588791d332 Parents: c5eb1df Author: sharkd Authored: Wed Aug 3 19:20:34 2016 -0700 Committer: Reynold Xin Committed: Wed Aug 3 19:20:34 2016 -0700 -- .../scala/org/apache/spark/util/collection/ExternalSorter.scala | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/583d91a1/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala b/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala index 708a007..7c98e8c 100644 --- a/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala +++ b/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala @@ -611,7 +611,9 @@ private[spark] class ExternalSorter[K, V, C]( val ds = deserializeStream deserializeStream = null fileStream = null - ds.close() + if (ds != null) { +ds.close() + } // NOTE: We don't do file.delete() here because that is done in ExternalSorter.stop(). // This should also be fixed in ExternalAppendOnlyMap. } - To unsubscribe, e-mail:
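The crash came from `cleanup()` unconditionally closing a deserialize stream that was never created because the spill file was empty; the added null check is the whole fix. A generic sketch of the same defensive pattern outside Spark (assumed names, not the ExternalSorter code):

```scala
import java.io.{BufferedInputStream, File, FileInputStream, InputStream}

// A reader whose inner stream may legitimately never be opened (e.g. a zero-byte spill file).
class MaybeEmptyReader(file: File) {
  private var fileStream: InputStream = _
  private var deserializeStream: InputStream = _

  def open(): Unit = {
    fileStream = new FileInputStream(file)
    // Mirrors the failing scenario: for an empty file no batch stream is ever built,
    // so deserializeStream stays null all the way to cleanup.
    if (file.length() > 0) deserializeStream = new BufferedInputStream(fileStream)
  }

  def cleanup(): Unit = {
    val ds = deserializeStream
    val fs = fileStream
    deserializeStream = null
    fileStream = null
    if (ds != null) {   // the guard added by the patch avoids the NullPointerException
      ds.close()        // closing the wrapper also closes the underlying file stream
    } else if (fs != null) {
      fs.close()        // nothing was deserialized, but release the file handle anyway
    }
  }
}

object MaybeEmptyReaderDemo {
  def main(args: Array[String]): Unit = {
    val emptySpill = File.createTempFile("spill", ".tmp")   // fileSize: 0.0 B, as in the log
    val reader = new MaybeEmptyReader(emptySpill)
    reader.open()
    reader.cleanup()   // completes without an NPE
    emptySpill.delete()
  }
}
```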
spark git commit: [SPARK-16858][SQL][TEST] Removal of TestHiveSharedState
Repository: spark Updated Branches: refs/heads/master e9fc0b6a8 -> b73a57060 [SPARK-16858][SQL][TEST] Removal of TestHiveSharedState ### What changes were proposed in this pull request? This PR is to remove `TestHiveSharedState`. Also, this is also associated with the Hive refractoring for removing `HiveSharedState`. ### How was this patch tested? The existing test cases Author: gatorsmileCloses #14463 from gatorsmile/removeTestHiveSharedState. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b73a5706 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b73a5706 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b73a5706 Branch: refs/heads/master Commit: b73a5706032eae7c87f7f2f8b0a72e7ee6d2e7e5 Parents: e9fc0b6 Author: gatorsmile Authored: Tue Aug 2 14:17:45 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 2 14:17:45 2016 -0700 -- .../apache/spark/sql/hive/test/TestHive.scala | 78 +--- .../spark/sql/hive/ShowCreateTableSuite.scala | 2 +- 2 files changed, 20 insertions(+), 60 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b73a5706/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala index fbacd59..cdc8d61 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala @@ -24,7 +24,6 @@ import scala.collection.JavaConverters._ import scala.collection.mutable import scala.language.implicitConversions -import org.apache.hadoop.conf.Configuration import org.apache.hadoop.hive.conf.HiveConf.ConfVars import org.apache.hadoop.hive.ql.exec.FunctionRegistry import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe @@ -40,7 +39,6 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan import org.apache.spark.sql.execution.QueryExecution import org.apache.spark.sql.execution.command.CacheTableCommand import org.apache.spark.sql.hive._ -import org.apache.spark.sql.hive.client.HiveClient import org.apache.spark.sql.internal.SQLConf import org.apache.spark.util.{ShutdownHookManager, Utils} @@ -86,8 +84,6 @@ class TestHiveContext( new TestHiveContext(sparkSession.newSession()) } - override def sharedState: TestHiveSharedState = sparkSession.sharedState - override def sessionState: TestHiveSessionState = sparkSession.sessionState def setCacheTables(c: Boolean): Unit = { @@ -112,38 +108,43 @@ class TestHiveContext( * A [[SparkSession]] used in [[TestHiveContext]]. * * @param sc SparkContext - * @param scratchDirPath scratch directory used by Hive's metastore client - * @param metastoreTemporaryConf configuration options for Hive's metastore - * @param existingSharedState optional [[TestHiveSharedState]] + * @param existingSharedState optional [[HiveSharedState]] * @param loadTestTables if true, load the test tables. They can only be loaded when running * in the JVM, i.e when calling from Python this flag has to be false. 
*/ private[hive] class TestHiveSparkSession( @transient private val sc: SparkContext, -scratchDirPath: File, -metastoreTemporaryConf: Map[String, String], -@transient private val existingSharedState: Option[TestHiveSharedState], +@transient private val existingSharedState: Option[HiveSharedState], private val loadTestTables: Boolean) extends SparkSession(sc) with Logging { self => def this(sc: SparkContext, loadTestTables: Boolean) { this( sc, - TestHiveContext.makeScratchDir(), - HiveUtils.newTemporaryConfiguration(useInMemoryDerby = false), - None, + existingSharedState = None, loadTestTables) } + { // set the metastore temporary configuration +val metastoreTempConf = HiveUtils.newTemporaryConfiguration(useInMemoryDerby = false) ++ Map( + ConfVars.METASTORE_INTEGER_JDO_PUSHDOWN.varname -> "true", + // scratch directory used by Hive's metastore client + ConfVars.SCRATCHDIR.varname -> TestHiveContext.makeScratchDir().toURI.toString, + ConfVars.METASTORE_CLIENT_CONNECT_RETRY_DELAY.varname -> "1") + +metastoreTempConf.foreach { case (k, v) => + sc.hadoopConfiguration.set(k, v) +} + } + assume(sc.conf.get(CATALOG_IMPLEMENTATION) == "hive") - // TODO: Let's remove TestHiveSharedState and TestHiveSessionState. Otherwise, + // TODO: Let's remove HiveSharedState and TestHiveSessionState. Otherwise, // we are not really testing the
spark git commit: [SPARK-16855][SQL] move Greatest and Least from conditionalExpressions.scala to arithmetic.scala
Repository: spark Updated Branches: refs/heads/master cbdff4935 -> a9beeaaae [SPARK-16855][SQL] move Greatest and Least from conditionalExpressions.scala to arithmetic.scala ## What changes were proposed in this pull request? `Greatest` and `Least` are not conditional expressions, but arithmetic expressions. ## How was this patch tested? N/A Author: Wenchen FanCloses #14460 from cloud-fan/move. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a9beeaaa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a9beeaaa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a9beeaaa Branch: refs/heads/master Commit: a9beeaaaeb52e9c940fe86a3d70801655401623c Parents: cbdff49 Author: Wenchen Fan Authored: Tue Aug 2 11:08:32 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 2 11:08:32 2016 -0700 -- .../sql/catalyst/expressions/arithmetic.scala | 121 ++ .../expressions/conditionalExpressions.scala| 122 --- .../expressions/ArithmeticExpressionSuite.scala | 107 .../ConditionalExpressionSuite.scala| 107 4 files changed, 228 insertions(+), 229 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a9beeaaa/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala index 77d40a5..4aebef9 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql.catalyst.expressions import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.util.TypeUtils import org.apache.spark.sql.types._ @@ -460,3 +461,123 @@ case class Pmod(left: Expression, right: Expression) extends BinaryArithmetic wi override def sql: String = s"$prettyName(${left.sql}, ${right.sql})" } + +/** + * A function that returns the least value of all parameters, skipping null values. + * It takes at least 2 parameters, and returns null iff all parameters are null. + */ +@ExpressionDescription( + usage = "_FUNC_(n1, ...) 
- Returns the least value of all parameters, skipping null values.") +case class Least(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = children.forall(_.nullable) + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"LEAST requires at least 2 arguments") +} else if (children.map(_.dataType).distinct.count(_ != NullType) > 1) { + TypeCheckResult.TypeCheckFailure( +s"The expressions should all have the same type," + + s" got LEAST(${children.map(_.dataType.simpleString).mkString(", ")}).") +} else { + TypeUtils.checkForOrderingExpr(dataType, "function " + prettyName) +} + } + + override def dataType: DataType = children.head.dataType + + override def eval(input: InternalRow): Any = { +children.foldLeft[Any](null)((r, c) => { + val evalc = c.eval(input) + if (evalc != null) { +if (r == null || ordering.lt(evalc, r)) evalc else r + } else { +r + } +}) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val first = evalChildren(0) +val rest = evalChildren.drop(1) +def updateEval(eval: ExprCode): String = { + s""" +${eval.code} +if (!${eval.isNull} && (${ev.isNull} || + ${ctx.genGreater(dataType, ev.value, eval.value)})) { + ${ev.isNull} = false; + ${ev.value} = ${eval.value}; +} + """ +} +ev.copy(code = s""" + ${first.code} + boolean ${ev.isNull} = ${first.isNull}; + ${ctx.javaType(dataType)} ${ev.value} = ${first.value}; + ${rest.map(updateEval).mkString("\n")}""") + } +} + +/** + * A function that returns the greatest value of all parameters, skipping null values. + * It takes at least 2 parameters, and returns null iff all parameters are null. + */ +@ExpressionDescription( + usage = "_FUNC_(n1, ...) -
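Independent of which file they live in, the two expressions behave the same from SQL: they need at least two arguments, skip nulls, and return null only when every argument is null. A quick illustration with made-up values (assuming a local session):

```scala
import org.apache.spark.sql.SparkSession

object LeastGreatestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("least-greatest").getOrCreate()

    // Nulls are skipped rather than propagated, so lo = 1 and hi = 3;
    // only the all-null call returns null.
    spark.sql(
      "SELECT least(3, null, 1) AS lo, " +
        "greatest(3, null, 1) AS hi, " +
        "least(cast(null AS int), cast(null AS int)) AS all_null"
    ).show()

    spark.stop()
  }
}
```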
spark git commit: [SPARK-16850][SQL] Improve type checking error message for greatest/least
Repository: spark Updated Branches: refs/heads/branch-2.0 a937c9ee4 -> f190bb83b [SPARK-16850][SQL] Improve type checking error message for greatest/least Greatest/least function does not have the most friendly error message for data types. This patch improves the error message to not show the Seq type, and use more human readable data types. Before: ``` org.apache.spark.sql.AnalysisException: cannot resolve 'greatest(CAST(1.0 AS DECIMAL(2,1)), "1.0")' due to data type mismatch: The expressions should all have the same type, got GREATEST (ArrayBuffer(DecimalType(2,1), StringType)).; line 1 pos 7 ``` After: ``` org.apache.spark.sql.AnalysisException: cannot resolve 'greatest(CAST(1.0 AS DECIMAL(2,1)), "1.0")' due to data type mismatch: The expressions should all have the same type, got GREATEST(decimal(2,1), string).; line 1 pos 7 ``` Manually verified the output and also added unit tests to ConditionalExpressionSuite. Author: petermaxleeCloses #14453 from petermaxlee/SPARK-16850. (cherry picked from commit a1ff72e1cce6f22249ccc4905e8cef30075beb2f) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f190bb83 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f190bb83 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f190bb83 Branch: refs/heads/branch-2.0 Commit: f190bb83beaafb65c8e6290e9ecaa61ac51e04bb Parents: a937c9e Author: petermaxlee Authored: Tue Aug 2 19:32:35 2016 +0800 Committer: Reynold Xin Committed: Tue Aug 2 10:22:18 2016 -0700 -- .../catalyst/expressions/conditionalExpressions.scala | 4 ++-- .../expressions/ConditionalExpressionSuite.scala | 13 + 2 files changed, 15 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f190bb83/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala index e97e089..5f2585f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala @@ -299,7 +299,7 @@ case class Least(children: Seq[Expression]) extends Expression { } else if (children.map(_.dataType).distinct.count(_ != NullType) > 1) { TypeCheckResult.TypeCheckFailure( s"The expressions should all have the same type," + - s" got LEAST (${children.map(_.dataType)}).") + s" got LEAST(${children.map(_.dataType.simpleString).mkString(", ")}).") } else { TypeUtils.checkForOrderingExpr(dataType, "function " + prettyName) } @@ -359,7 +359,7 @@ case class Greatest(children: Seq[Expression]) extends Expression { } else if (children.map(_.dataType).distinct.count(_ != NullType) > 1) { TypeCheckResult.TypeCheckFailure( s"The expressions should all have the same type," + - s" got GREATEST (${children.map(_.dataType)}).") + s" got GREATEST(${children.map(_.dataType.simpleString).mkString(", ")}).") } else { TypeUtils.checkForOrderingExpr(dataType, "function " + prettyName) } http://git-wip-us.apache.org/repos/asf/spark/blob/f190bb83/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala index 3c581ec..36185b8 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala @@ -21,6 +21,7 @@ import java.sql.{Date, Timestamp} import org.apache.spark.SparkFunSuite import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.TypeCheckFailure import org.apache.spark.sql.catalyst.dsl.expressions._ import org.apache.spark.sql.types._ @@ -181,6 +182,12 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper Literal(Timestamp.valueOf("2015-07-01 10:00:00", Timestamp.valueOf("2015-07-01 08:00:00"), InternalRow.empty) +// Type
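For reference, the formatting change at the heart of this patch can be reproduced outside Catalyst. The sketch below is illustrative only (it is not the analyzer's code path); it assumes nothing beyond spark-catalyst on the classpath and shows why `simpleString` yields the friendlier message:

```scala
import org.apache.spark.sql.types.{DataType, DecimalType, StringType}

object GreatestMessageDemo {
  def main(args: Array[String]): Unit = {
    val childTypes: Seq[DataType] = Seq(DecimalType(2, 1), StringType)
    // Before: the raw Seq toString leaks into the message (the collection name may vary).
    println(s"got GREATEST (${childTypes}).")
    // After: simpleString renders human-readable SQL type names.
    println(s"got GREATEST(${childTypes.map(_.simpleString).mkString(", ")}).")
  }
}
```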
spark git commit: [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals
Repository: spark Updated Branches: refs/heads/master 146001a9f -> 2330f3ecb [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals ## What changes were proposed in this pull request? In Spark 1.6 (with Hive support) we could use `CURRENT_DATE` and `CURRENT_TIMESTAMP` functions as literals (without adding braces), for example: ```SQL select /* Spark 1.6: */ current_date, /* Spark 1.6 & Spark 2.0: */ current_date() ``` This was accidentally dropped in Spark 2.0. This PR reinstates this functionality. ## How was this patch tested? Added a case to ExpressionParserSuite. Author: Herman van HovellCloses #14442 from hvanhovell/SPARK-16836. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2330f3ec Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2330f3ec Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2330f3ec Branch: refs/heads/master Commit: 2330f3ecbbd89c7eaab9cc0d06726aa743b16334 Parents: 146001a Author: Herman van Hovell Authored: Tue Aug 2 10:09:47 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 2 10:09:47 2016 -0700 -- .../org/apache/spark/sql/catalyst/parser/SqlBase.g4| 5 - .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 13 + .../sql/catalyst/parser/ExpressionParserSuite.scala| 5 + .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 11 ++- 4 files changed, 32 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2330f3ec/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 5e10462..c7d5086 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -500,6 +500,7 @@ valueExpression primaryExpression : constant #constantDefault +| name=(CURRENT_DATE | CURRENT_TIMESTAMP) #timeFunctionCall | ASTERISK #star | qualifiedName '.' ASTERISK #star | '(' expression (',' expression)+ ')' #rowConstructor @@ -660,7 +661,7 @@ nonReserved | NULL | ORDER | OUTER | TABLE | TRUE | WITH | RLIKE | AND | CASE | CAST | DISTINCT | DIV | ELSE | END | FUNCTION | INTERVAL | MACRO | OR | STRATIFY | THEN | UNBOUNDED | WHEN -| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT +| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT | CURRENT_DATE | CURRENT_TIMESTAMP ; SELECT: 'SELECT'; @@ -880,6 +881,8 @@ OPTION: 'OPTION'; ANTI: 'ANTI'; LOCAL: 'LOCAL'; INPATH: 'INPATH'; +CURRENT_DATE: 'CURRENT_DATE'; +CURRENT_TIMESTAMP: 'CURRENT_TIMESTAMP'; STRING : '\'' ( ~('\''|'\\') | ('\\' .) )* '\'' http://git-wip-us.apache.org/repos/asf/spark/blob/2330f3ec/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index f2cc8d3..679adf2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -1023,6 +1023,19 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { } /** + * Create a current timestamp/date expression. 
These are different from regular function because + * they do not require the user to specify braces when calling them. + */ + override def visitTimeFunctionCall(ctx: TimeFunctionCallContext): Expression = withOrigin(ctx) { +ctx.name.getType match { + case SqlBaseParser.CURRENT_DATE => +CurrentDate() + case SqlBaseParser.CURRENT_TIMESTAMP => +CurrentTimestamp() +} + } + + /** * Create a function database (optional) and name pair. */ protected def visitFunctionName(ctx: QualifiedNameContext): FunctionIdentifier = {
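To see the reinstated syntax end to end, a minimal sketch (it assumes a local `SparkSession`; output column names may differ across versions):

```scala
import org.apache.spark.sql.SparkSession

object CurrentDateLiteralDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("current-date-demo").getOrCreate()
    // Both the brace-less literal form and the function form now parse.
    spark.sql("SELECT current_date, current_date(), current_timestamp, current_timestamp()").show()
    spark.stop()
  }
}
```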
spark git commit: [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals
Repository: spark Updated Branches: refs/heads/branch-2.0 ef7927e8e -> a937c9ee4 [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals ## What changes were proposed in this pull request? In Spark 1.6 (with Hive support) we could use `CURRENT_DATE` and `CURRENT_TIMESTAMP` functions as literals (without adding braces), for example: ```SQL select /* Spark 1.6: */ current_date, /* Spark 1.6 & Spark 2.0: */ current_date() ``` This was accidentally dropped in Spark 2.0. This PR reinstates this functionality. ## How was this patch tested? Added a case to ExpressionParserSuite. Author: Herman van HovellCloses #14442 from hvanhovell/SPARK-16836. (cherry picked from commit 2330f3ecbbd89c7eaab9cc0d06726aa743b16334) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a937c9ee Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a937c9ee Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a937c9ee Branch: refs/heads/branch-2.0 Commit: a937c9ee44e0766194fc8ca4bce2338453112a53 Parents: ef7927e Author: Herman van Hovell Authored: Tue Aug 2 10:09:47 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 2 10:09:53 2016 -0700 -- .../org/apache/spark/sql/catalyst/parser/SqlBase.g4| 5 - .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 13 + .../sql/catalyst/parser/ExpressionParserSuite.scala| 5 + .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 11 ++- 4 files changed, 32 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a937c9ee/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 4c15f9c..de98a87 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -493,6 +493,7 @@ valueExpression primaryExpression : constant #constantDefault +| name=(CURRENT_DATE | CURRENT_TIMESTAMP) #timeFunctionCall | ASTERISK #star | qualifiedName '.' ASTERISK #star | '(' expression (',' expression)+ ')' #rowConstructor @@ -653,7 +654,7 @@ nonReserved | NULL | ORDER | OUTER | TABLE | TRUE | WITH | RLIKE | AND | CASE | CAST | DISTINCT | DIV | ELSE | END | FUNCTION | INTERVAL | MACRO | OR | STRATIFY | THEN | UNBOUNDED | WHEN -| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT +| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT | CURRENT_DATE | CURRENT_TIMESTAMP ; SELECT: 'SELECT'; @@ -873,6 +874,8 @@ OPTION: 'OPTION'; ANTI: 'ANTI'; LOCAL: 'LOCAL'; INPATH: 'INPATH'; +CURRENT_DATE: 'CURRENT_DATE'; +CURRENT_TIMESTAMP: 'CURRENT_TIMESTAMP'; STRING : '\'' ( ~('\''|'\\') | ('\\' .) 
)* '\'' http://git-wip-us.apache.org/repos/asf/spark/blob/a937c9ee/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index c7420a1..1a0e7ab 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -1023,6 +1023,19 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { } /** + * Create a current timestamp/date expression. These are different from regular function because + * they do not require the user to specify braces when calling them. + */ + override def visitTimeFunctionCall(ctx: TimeFunctionCallContext): Expression = withOrigin(ctx) { +ctx.name.getType match { + case SqlBaseParser.CURRENT_DATE => +CurrentDate() + case SqlBaseParser.CURRENT_TIMESTAMP => +CurrentTimestamp() +} + } + + /** * Create a function database (optional) and name pair. */ protected def visitFunctionName(ctx: QualifiedNameContext):
spark git commit: [SPARK-16793][SQL] Set the temporary warehouse path to sc's conf in TestHive.
Repository: spark Updated Branches: refs/heads/master 2eedc00b0 -> 5184df06b [SPARK-16793][SQL] Set the temporary warehouse path to sc'conf in TestHive. ## What changes were proposed in this pull request? With SPARK-15034, we could use the value of spark.sql.warehouse.dir to set the warehouse location. In TestHive, we can now simply set the temporary warehouse path in sc's conf, and thus, param "warehousePath" could be removed. ## How was this patch tested? exsiting testsuites. Author: jiangxingboCloses #14401 from jiangxb1987/warehousePath. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5184df06 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5184df06 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5184df06 Branch: refs/heads/master Commit: 5184df06b347f86776c8ac87415b8002a5942a35 Parents: 2eedc00 Author: jiangxingbo Authored: Mon Aug 1 23:08:06 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 1 23:08:06 2016 -0700 -- .../apache/spark/sql/hive/test/TestHive.scala | 42 +--- .../sql/hive/execution/HiveQuerySuite.scala | 2 +- .../spark/sql/sources/BucketedReadSuite.scala | 2 +- 3 files changed, 21 insertions(+), 25 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5184df06/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala index 7f89204..fbacd59 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala @@ -54,6 +54,7 @@ object TestHive .set("spark.sql.test", "") .set("spark.sql.hive.metastore.barrierPrefixes", "org.apache.spark.sql.hive.execution.PairSerDe") +.set("spark.sql.warehouse.dir", TestHiveContext.makeWarehouseDir().toURI.getPath) // SPARK-8910 .set("spark.ui.enabled", "false"))) @@ -111,7 +112,6 @@ class TestHiveContext( * A [[SparkSession]] used in [[TestHiveContext]]. * * @param sc SparkContext - * @param warehousePath path to the Hive warehouse directory * @param scratchDirPath scratch directory used by Hive's metastore client * @param metastoreTemporaryConf configuration options for Hive's metastore * @param existingSharedState optional [[TestHiveSharedState]] @@ -120,23 +120,15 @@ class TestHiveContext( */ private[hive] class TestHiveSparkSession( @transient private val sc: SparkContext, -val warehousePath: File, scratchDirPath: File, metastoreTemporaryConf: Map[String, String], @transient private val existingSharedState: Option[TestHiveSharedState], private val loadTestTables: Boolean) extends SparkSession(sc) with Logging { self => - // TODO: We need to set the temp warehouse path to sc's conf. - // Right now, In SparkSession, we will set the warehouse path to the default one - // instead of the temp one. Then, we override the setting in TestHiveSharedState - // when we creating metadataHive. This flow is not easy to follow and can introduce - // confusion when a developer is debugging an issue. We need to refactor this part - // to just set the temp warehouse path in sc's conf. 
def this(sc: SparkContext, loadTestTables: Boolean) { this( sc, - Utils.createTempDir(namePrefix = "warehouse"), TestHiveContext.makeScratchDir(), HiveUtils.newTemporaryConfiguration(useInMemoryDerby = false), None, @@ -151,16 +143,16 @@ private[hive] class TestHiveSparkSession( @transient override lazy val sharedState: TestHiveSharedState = { existingSharedState.getOrElse( - new TestHiveSharedState(sc, warehousePath, scratchDirPath, metastoreTemporaryConf)) + new TestHiveSharedState(sc, scratchDirPath, metastoreTemporaryConf)) } @transient override lazy val sessionState: TestHiveSessionState = -new TestHiveSessionState(self, warehousePath) +new TestHiveSessionState(self) override def newSession(): TestHiveSparkSession = { new TestHiveSparkSession( - sc, warehousePath, scratchDirPath, metastoreTemporaryConf, Some(sharedState), loadTestTables) + sc, scratchDirPath, metastoreTemporaryConf, Some(sharedState), loadTestTables) } private var cacheTables: Boolean = false @@ -199,6 +191,12 @@ private[hive] class TestHiveSparkSession( new File(Thread.currentThread().getContextClassLoader.getResource(path).getFile) } + def getWarehousePath(): String = { +val tempConf = new SQLConf +
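The mechanism TestHive now relies on is available to any test harness: put `spark.sql.warehouse.dir` into the conf before the session is built. A rough sketch (the temp-dir handling here is illustrative, not TestHive's exact code):

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

object TempWarehouseDemo {
  def main(args: Array[String]): Unit = {
    // Point the SQL warehouse at a throwaway directory, as TestHive now does via sc's conf.
    val warehouseDir = Files.createTempDirectory("warehouse").toUri.getPath
    val spark = SparkSession.builder()
      .master("local[1]")
      .config("spark.sql.warehouse.dir", warehouseDir)
      .getOrCreate()
    println(spark.conf.get("spark.sql.warehouse.dir"))
    spark.stop()
  }
}
```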
spark git commit: [SPARK-16818] Exchange reuse incorrectly reuses scans over different sets of partitions
Repository: spark Updated Branches: refs/heads/branch-2.0 1813bbd9b -> 5fbf5f93e [SPARK-16818] Exchange reuse incorrectly reuses scans over different sets of partitions https://github.com/apache/spark/pull/14425 rebased for branch-2.0 Author: Eric LiangCloses #14427 from ericl/spark-16818-br-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5fbf5f93 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5fbf5f93 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5fbf5f93 Branch: refs/heads/branch-2.0 Commit: 5fbf5f93ee5aa4d1aca0fa0c8fb769a085dd7b93 Parents: 1813bbd Author: Eric Liang Authored: Mon Aug 1 19:46:20 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 1 19:46:20 2016 -0700 -- .../datasources/FileSourceStrategy.scala| 2 ++ .../datasources/FileSourceStrategySuite.scala | 35 +++- 2 files changed, 36 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5fbf5f93/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala index 13a86bf..8af9562 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala @@ -202,7 +202,9 @@ private[sql] object FileSourceStrategy extends Strategy with Logging { partitions } + // These metadata values make scan plans uniquely identifiable for equality checking. val meta = Map( +"PartitionFilters" -> partitionKeyFilters.mkString("[", ", ", "]"), "Format" -> files.fileFormat.toString, "ReadSchema" -> prunedDataSchema.simpleString, PUSHED_FILTERS -> pushedDownFilters.mkString("[", ", ", "]"), http://git-wip-us.apache.org/repos/asf/spark/blob/5fbf5f93/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala index 8d8a18f..7a24f21 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala @@ -29,7 +29,7 @@ import org.apache.spark.sql._ import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionSet, PredicateHelper} import org.apache.spark.sql.catalyst.util -import org.apache.spark.sql.execution.DataSourceScanExec +import org.apache.spark.sql.execution.{DataSourceScanExec, SparkPlan} import org.apache.spark.sql.functions._ import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.sources._ @@ -407,6 +407,39 @@ class FileSourceStrategySuite extends QueryTest with SharedSQLContext with Predi } } + test("[SPARK-16818] partition pruned file scans implement sameResult correctly") { +withTempPath { path => + val tempDir = path.getCanonicalPath + spark.range(100) +.selectExpr("id", "id as b") +.write +.partitionBy("id") +.parquet(tempDir) + val df = spark.read.parquet(tempDir) + def getPlan(df: DataFrame): SparkPlan = { +df.queryExecution.executedPlan + } + assert(getPlan(df.where("id = 
2")).sameResult(getPlan(df.where("id = 2" + assert(!getPlan(df.where("id = 2")).sameResult(getPlan(df.where("id = 3" +} + } + + test("[SPARK-16818] exchange reuse respects differences in partition pruning") { +spark.conf.set("spark.sql.exchange.reuse", true) +withTempPath { path => + val tempDir = path.getCanonicalPath + spark.range(10) +.selectExpr("id % 2 as a", "id % 3 as b", "id as c") +.write +.partitionBy("a") +.parquet(tempDir) + val df = spark.read.parquet(tempDir) + val df1 = df.where("a = 0").groupBy("b").agg("c" -> "sum") + val df2 = df.where("a = 1").groupBy("b").agg("c" -> "sum") + checkAnswer(df1.join(df2, "b"), Row(0, 6, 12) :: Row(1, 4, 8) :: Row(2, 10, 5) :: Nil) +} + } + // Helpers for checking the arguments passed to the FileFormat. protected val checkPartitionSchema =
spark git commit: [SPARK-16813][SQL] Remove private[sql] and private[spark] from catalyst package
Repository: spark Updated Branches: refs/heads/branch-2.0 75dd78130 -> d357ca302 [SPARK-16813][SQL] Remove private[sql] and private[spark] from catalyst package The catalyst package is meant to be internal, and as a result it does not make sense to mark things as private[sql] or private[spark]. It simply makes debugging harder when Spark developers need to inspect the plans at runtime. This patch removes all private[sql] and private[spark] visibility modifiers in org.apache.spark.sql.catalyst. N/A - just visibility changes. Author: Reynold Xin <r...@databricks.com> Closes #14418 from rxin/SPARK-16813. (cherry picked from commit 064d91ff7342002414d3274694a8e2e37f154986) Signed-off-by: Reynold Xin <r...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d357ca30 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d357ca30 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d357ca30 Branch: refs/heads/branch-2.0 Commit: d357ca3023c84e472927380bed65b1cee33c4e03 Parents: 75dd781 Author: Reynold Xin <r...@databricks.com> Authored: Sun Jul 31 16:31:06 2016 +0800 Committer: Reynold Xin <r...@databricks.com> Committed: Sun Jul 31 11:10:07 2016 -0700 -- .../spark/sql/catalyst/CatalystTypeConverters.scala | 4 ++-- .../apache/spark/sql/catalyst/ScalaReflection.scala | 2 +- .../apache/spark/sql/catalyst/analysis/Analyzer.scala | 4 ++-- .../spark/sql/catalyst/analysis/TypeCoercion.scala| 2 +- .../spark/sql/catalyst/catalog/SessionCatalog.scala | 6 +++--- .../apache/spark/sql/catalyst/encoders/package.scala | 2 +- .../spark/sql/catalyst/expressions/Expression.scala | 2 +- .../expressions/MonotonicallyIncreasingID.scala | 2 +- .../sql/catalyst/expressions/SparkPartitionID.scala | 2 +- .../catalyst/expressions/aggregate/interfaces.scala | 14 +++--- .../spark/sql/catalyst/expressions/arithmetic.scala | 2 +- .../sql/catalyst/expressions/complexTypeCreator.scala | 4 ++-- .../catalyst/expressions/complexTypeExtractors.scala | 2 +- .../apache/spark/sql/catalyst/expressions/misc.scala | 2 +- .../spark/sql/catalyst/expressions/predicates.scala | 4 ++-- .../apache/spark/sql/catalyst/expressions/rows.scala | 2 +- .../plans/logical/basicLogicalOperators.scala | 6 +++--- .../sql/catalyst/util/AbstractScalaRowIterator.scala | 2 +- 18 files changed, 32 insertions(+), 32 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d357ca30/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala index 9cc7b2a..f542f5c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala @@ -382,7 +382,7 @@ object CatalystTypeConverters { * Typical use case would be converting a collection of rows that have the same schema. You will * call this function once to get a converter, and apply it to every row. */ - private[sql] def createToCatalystConverter(dataType: DataType): Any => Any = { + def createToCatalystConverter(dataType: DataType): Any => Any = { if (isPrimitive(dataType)) { // Although the `else` branch here is capable of handling inbound conversion of primitives, // we add some special-case handling for those types here. 
The motivation for this relates to @@ -409,7 +409,7 @@ object CatalystTypeConverters { * Typical use case would be converting a collection of rows that have the same schema. You will * call this function once to get a converter, and apply it to every row. */ - private[sql] def createToScalaConverter(dataType: DataType): Any => Any = { + def createToScalaConverter(dataType: DataType): Any => Any = { if (isPrimitive(dataType)) { identity } else { http://git-wip-us.apache.org/repos/asf/spark/blob/d357ca30/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala index 8affb03..dd36468 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala @@ -720,7 +720,7 @@ object
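One practical effect of dropping the modifiers, shown as a sketch: helpers such as the Catalyst converters can now be called directly from debugging or tooling code (they remain internal API, just no longer `private[sql]`):

```scala
import org.apache.spark.sql.catalyst.CatalystTypeConverters
import org.apache.spark.sql.types.StringType

object CatalystConverterDemo {
  def main(args: Array[String]): Unit = {
    // Round-trip a value through Catalyst's internal representation.
    val toCatalyst = CatalystTypeConverters.createToCatalystConverter(StringType)
    val toScala = CatalystTypeConverters.createToScalaConverter(StringType)

    val internal = toCatalyst("hello") // UTF8String, the internal form
    println(internal.getClass.getName)
    println(toScala(internal)) // back to a plain String
  }
}
```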
spark git commit: [SPARK-16812] Open up SparkILoop.getAddedJars
Repository: spark Updated Branches: refs/heads/branch-2.0 26da5a7fc -> 75dd78130 [SPARK-16812] Open up SparkILoop.getAddedJars ## What changes were proposed in this pull request? This patch makes SparkILoop.getAddedJars a public developer API. It is a useful function to get the list of jars added. ## How was this patch tested? N/A - this is a simple visibility change. Author: Reynold Xin <r...@databricks.com> Closes #14417 from rxin/SPARK-16812. (cherry picked from commit 7c27d075c39ebaf3e762284e2536fe7be0e3da87) Signed-off-by: Reynold Xin <r...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/75dd7813 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/75dd7813 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/75dd7813 Branch: refs/heads/branch-2.0 Commit: 75dd78130d29154a3147490c57bce6883c992469 Parents: 26da5a7 Author: Reynold Xin <r...@databricks.com> Authored: Sat Jul 30 23:05:03 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Sat Jul 30 23:05:12 2016 -0700 -- .../src/main/scala/org/apache/spark/repl/SparkILoop.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/75dd7813/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala -- diff --git a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala index 16f330a..e017aa4 100644 --- a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala +++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala @@ -1059,7 +1059,8 @@ class SparkILoop( @deprecated("Use `process` instead", "2.9.0") private def main(settings: Settings): Unit = process(settings) - private[repl] def getAddedJars(): Array[String] = { + @DeveloperApi + def getAddedJars(): Array[String] = { val conf = new SparkConf().setMaster(getMaster()) val envJars = sys.env.get("ADD_JARS") if (envJars.isDefined) { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
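A sketch of what the newly public method enables from embedding code, for the Scala 2.10 REPL module this patch touches. It assumes SparkILoop's no-argument constructor (the one the REPL's own Main uses) and that extra jars, if any, arrive via the usual `ADD_JARS` / `spark.jars` settings:

```scala
import org.apache.spark.repl.SparkILoop

object AddedJarsDemo {
  def main(args: Array[String]): Unit = {
    // Previously private[repl]; now exposed as a @DeveloperApi.
    val interp = new SparkILoop()
    interp.getAddedJars().foreach(println)
  }
}
```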
spark git commit: [SPARK-16812] Open up SparkILoop.getAddedJars
Repository: spark Updated Branches: refs/heads/master 957a8ab37 -> 7c27d075c [SPARK-16812] Open up SparkILoop.getAddedJars ## What changes were proposed in this pull request? This patch makes SparkILoop.getAddedJars a public developer API. It is a useful function to get the list of jars added. ## How was this patch tested? N/A - this is a simple visibility change. Author: Reynold Xin <r...@databricks.com> Closes #14417 from rxin/SPARK-16812. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7c27d075 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7c27d075 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7c27d075 Branch: refs/heads/master Commit: 7c27d075c39ebaf3e762284e2536fe7be0e3da87 Parents: 957a8ab Author: Reynold Xin <r...@databricks.com> Authored: Sat Jul 30 23:05:03 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Sat Jul 30 23:05:03 2016 -0700 -- .../src/main/scala/org/apache/spark/repl/SparkILoop.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7c27d075/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala -- diff --git a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala index 16f330a..e017aa4 100644 --- a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala +++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala @@ -1059,7 +1059,8 @@ class SparkILoop( @deprecated("Use `process` instead", "2.9.0") private def main(settings: Settings): Unit = process(settings) - private[repl] def getAddedJars(): Array[String] = { + @DeveloperApi + def getAddedJars(): Array[String] = { val conf = new SparkConf().setMaster(getMaster()) val envJars = sys.env.get("ADD_JARS") if (envJars.isDefined) { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16818] Exchange reuse incorrectly reuses scans over different sets of partitions
Repository: spark Updated Branches: refs/heads/master a6290e51e -> 957a8ab37 [SPARK-16818] Exchange reuse incorrectly reuses scans over different sets of partitions ## What changes were proposed in this pull request? This fixes a bug wherethe file scan operator does not take into account partition pruning in its implementation of `sameResult()`. As a result, executions may be incorrect on self-joins over the same base file relation. The patch here is minimal, but we should reconsider relying on `metadata` for implementing sameResult() in the future, as string representations may not be uniquely identifying. cc rxin ## How was this patch tested? Unit tests. Author: Eric Liang <e...@databricks.com> Closes #14425 from ericl/spark-16818. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/957a8ab3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/957a8ab3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/957a8ab3 Branch: refs/heads/master Commit: 957a8ab3743521850fb1c0106c37c5d3997b9e56 Parents: a6290e5 Author: Eric Liang <e...@databricks.com> Authored: Sat Jul 30 22:48:09 2016 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Sat Jul 30 22:48:09 2016 -0700 -- .../datasources/FileSourceStrategy.scala| 2 ++ .../datasources/FileSourceStrategySuite.scala | 35 +++- 2 files changed, 36 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/957a8ab3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala index 32aa471..6749130 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala @@ -130,7 +130,9 @@ private[sql] object FileSourceStrategy extends Strategy with Logging { createNonBucketedReadRDD(readFile, selectedPartitions, fsRelation) } + // These metadata values make scan plans uniquely identifiable for equality checking. 
val meta = Map( +"PartitionFilters" -> partitionKeyFilters.mkString("[", ", ", "]"), "Format" -> fsRelation.fileFormat.toString, "ReadSchema" -> prunedDataSchema.simpleString, PUSHED_FILTERS -> pushedDownFilters.mkString("[", ", ", "]"), http://git-wip-us.apache.org/repos/asf/spark/blob/957a8ab3/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala index 2f551b1..1824650 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala @@ -30,7 +30,7 @@ import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.catalog.BucketSpec import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionSet, PredicateHelper} import org.apache.spark.sql.catalyst.util -import org.apache.spark.sql.execution.DataSourceScanExec +import org.apache.spark.sql.execution.{DataSourceScanExec, SparkPlan} import org.apache.spark.sql.functions._ import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.sources._ @@ -408,6 +408,39 @@ class FileSourceStrategySuite extends QueryTest with SharedSQLContext with Predi } } + test("[SPARK-16818] partition pruned file scans implement sameResult correctly") { +withTempPath { path => + val tempDir = path.getCanonicalPath + spark.range(100) +.selectExpr("id", "id as b") +.write +.partitionBy("id") +.parquet(tempDir) + val df = spark.read.parquet(tempDir) + def getPlan(df: DataFrame): SparkPlan = { +df.queryExecution.executedPlan + } + assert(getPlan(df.where("id = 2")).sameResult(getPlan(df.where("id = 2" + assert(!getPlan(df.where("id = 2")).sameResult(getPlan(df.where("id = 3" +} + } + + test("[SPARK-1681
spark git commit: [SPARK-16772][PYTHON][DOCS] Restore "datatype string" to Python API docstrings
Repository: spark Updated Branches: refs/heads/master 2c15323ad -> 2182e4322 [SPARK-16772][PYTHON][DOCS] Restore "datatype string" to Python API docstrings ## What changes were proposed in this pull request? This PR corrects [an error made in an earlier PR](https://github.com/apache/spark/pull/14393/files#r72843069). ## How was this patch tested? ```sh $ ./dev/lint-python PEP8 checks passed. rm -rf _build/* pydoc checks passed. ``` I also built the docs and confirmed that they looked good in my browser. Author: Nicholas ChammasCloses #14408 from nchammas/SPARK-16772. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2182e432 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2182e432 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2182e432 Branch: refs/heads/master Commit: 2182e4322da6ba732f99ae75dce00f76f1cdc4d9 Parents: 2c15323 Author: Nicholas Chammas Authored: Fri Jul 29 14:07:03 2016 -0700 Committer: Reynold Xin Committed: Fri Jul 29 14:07:03 2016 -0700 -- python/pyspark/sql/context.py | 10 -- python/pyspark/sql/session.py | 10 -- 2 files changed, 8 insertions(+), 12 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2182e432/python/pyspark/sql/context.py -- diff --git a/python/pyspark/sql/context.py b/python/pyspark/sql/context.py index f7009fe..4085f16 100644 --- a/python/pyspark/sql/context.py +++ b/python/pyspark/sql/context.py @@ -226,9 +226,8 @@ class SQLContext(object): from ``data``, which should be an RDD of :class:`Row`, or :class:`namedtuple`, or :class:`dict`. -When ``schema`` is :class:`pyspark.sql.types.DataType` or -:class:`pyspark.sql.types.StringType`, it must match the -real data, or an exception will be thrown at runtime. If the given schema is not +When ``schema`` is :class:`pyspark.sql.types.DataType` or a datatype string it must match +the real data, or an exception will be thrown at runtime. If the given schema is not :class:`pyspark.sql.types.StructType`, it will be wrapped into a :class:`pyspark.sql.types.StructType` as its only field, and the field name will be "value", each record will also be wrapped into a tuple, which can be converted to row later. @@ -239,8 +238,7 @@ class SQLContext(object): :param data: an RDD of any kind of SQL data representation(e.g. :class:`Row`, :class:`tuple`, ``int``, ``boolean``, etc.), or :class:`list`, or :class:`pandas.DataFrame`. -:param schema: a :class:`pyspark.sql.types.DataType` or a -:class:`pyspark.sql.types.StringType` or a list of +:param schema: a :class:`pyspark.sql.types.DataType` or a datatype string or a list of column names, default is None. The data type string format equals to :class:`pyspark.sql.types.DataType.simpleString`, except that top level struct type can omit the ``struct<>`` and atomic types use ``typeName()`` as their format, e.g. use @@ -251,7 +249,7 @@ class SQLContext(object): .. versionchanged:: 2.0 The ``schema`` parameter can be a :class:`pyspark.sql.types.DataType` or a - :class:`pyspark.sql.types.StringType` after 2.0. + datatype string after 2.0. If it's not a :class:`pyspark.sql.types.StructType`, it will be wrapped into a :class:`pyspark.sql.types.StructType` and each record will also be wrapped into a tuple. 
http://git-wip-us.apache.org/repos/asf/spark/blob/2182e432/python/pyspark/sql/session.py -- diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py index 10bd89b..2dacf48 100644 --- a/python/pyspark/sql/session.py +++ b/python/pyspark/sql/session.py @@ -414,9 +414,8 @@ class SparkSession(object): from ``data``, which should be an RDD of :class:`Row`, or :class:`namedtuple`, or :class:`dict`. -When ``schema`` is :class:`pyspark.sql.types.DataType` or -:class:`pyspark.sql.types.StringType`, it must match the -real data, or an exception will be thrown at runtime. If the given schema is not +When ``schema`` is :class:`pyspark.sql.types.DataType` or a datatype string, it must match +the real data, or an exception will be thrown at runtime. If the given schema is not :class:`pyspark.sql.types.StructType`, it will be wrapped into a :class:`pyspark.sql.types.StructType` as its only field, and the field name will be "value", each record will also be