spark git commit: [SPARKR][MINOR] Update R DESCRIPTION file
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 eaea1c86b -> d16f9a0b7

[SPARKR][MINOR] Update R DESCRIPTION file

## What changes were proposed in this pull request?

Update DESCRIPTION

## How was this patch tested?

Run install and CRAN tests

Author: Felix Cheung

Closes #14764 from felixcheung/rpackagedescription.

(cherry picked from commit d2b3d3e63e1a9217de6ef507c350308017664a62)
Signed-off-by: Xiangrui Meng

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d16f9a0b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d16f9a0b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d16f9a0b

Branch: refs/heads/branch-2.0
Commit: d16f9a0b7c464728d7b11899740908e23820a797
Parents: eaea1c8
Author: Felix Cheung
Authored: Mon Aug 22 20:15:03 2016 -0700
Committer: Xiangrui Meng
Committed: Mon Aug 22 20:15:14 2016 -0700

--
 R/pkg/DESCRIPTION | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/d16f9a0b/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index d81f1a3..e5afed2 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -3,10 +3,15 @@ Type: Package
 Title: R Frontend for Apache Spark
 Version: 2.0.0
 Date: 2016-07-07
-Author: The Apache Software Foundation
-Maintainer: Shivaram Venkataraman
-Xiangrui Meng
-Felix Cheung
+Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
+email = "shiva...@cs.berkeley.edu"),
+             person("Xiangrui", "Meng", role = "aut",
+email = "m...@databricks.com"),
+             person("Felix", "Cheung", role = "aut",
+email = "felixche...@apache.org"),
+             person(family = "The Apache Software Foundation", role = c("aut", "cph")))
+URL: http://www.apache.org/ http://spark.apache.org/
+BugReports: https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12315420=12325400=4
 Depends:
 R (>= 3.0),
 methods
spark git commit: [SPARK-17182][SQL] Mark Collect as non-deterministic
Repository: spark Updated Branches: refs/heads/branch-2.0 225898961 -> eaea1c86b [SPARK-17182][SQL] Mark Collect as non-deterministic ## What changes were proposed in this pull request? This PR marks the abstract class `Collect` as non-deterministic since the results of `CollectList` and `CollectSet` depend on the actual order of input rows. ## How was this patch tested? Existing test cases should be enough. Author: Cheng LianCloses #14749 from liancheng/spark-17182-non-deterministic-collect. (cherry picked from commit 2cdd92a7cd6f85186c846635b422b977bdafbcdd) Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/eaea1c86 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/eaea1c86 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/eaea1c86 Branch: refs/heads/branch-2.0 Commit: eaea1c86b897d302107a9b6833a27a2b24ca31a0 Parents: 2258989 Author: Cheng Lian Authored: Tue Aug 23 09:11:47 2016 +0800 Committer: Wenchen Fan Committed: Tue Aug 23 09:14:47 2016 +0800 -- .../spark/sql/catalyst/expressions/aggregate/collect.scala | 4 1 file changed, 4 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/eaea1c86/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala index ac2cefa..896ff61 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala @@ -54,6 +54,10 @@ abstract class Collect extends ImperativeAggregate { override def inputAggBufferAttributes: Seq[AttributeReference] = Nil + // Both `CollectList` and `CollectSet` are non-deterministic since their results depend on the + // actual order of input rows. + override def deterministic: Boolean = false + protected[this] val buffer: Growable[Any] with Iterable[Any] override def initialize(b: MutableRow): Unit = { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17182][SQL] Mark Collect as non-deterministic
Repository: spark Updated Branches: refs/heads/master 920806ab2 -> 2cdd92a7c [SPARK-17182][SQL] Mark Collect as non-deterministic ## What changes were proposed in this pull request? This PR marks the abstract class `Collect` as non-deterministic since the results of `CollectList` and `CollectSet` depend on the actual order of input rows. ## How was this patch tested? Existing test cases should be enough. Author: Cheng LianCloses #14749 from liancheng/spark-17182-non-deterministic-collect. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2cdd92a7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2cdd92a7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2cdd92a7 Branch: refs/heads/master Commit: 2cdd92a7cd6f85186c846635b422b977bdafbcdd Parents: 920806a Author: Cheng Lian Authored: Tue Aug 23 09:11:47 2016 +0800 Committer: Wenchen Fan Committed: Tue Aug 23 09:11:47 2016 +0800 -- .../spark/sql/catalyst/expressions/aggregate/collect.scala | 4 1 file changed, 4 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2cdd92a7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala index ac2cefa..896ff61 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala @@ -54,6 +54,10 @@ abstract class Collect extends ImperativeAggregate { override def inputAggBufferAttributes: Seq[AttributeReference] = Nil + // Both `CollectList` and `CollectSet` are non-deterministic since their results depend on the + // actual order of input rows. + override def deterministic: Boolean = false + protected[this] val buffer: Growable[Any] with Iterable[Any] override def initialize(b: MutableRow): Unit = { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
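To see why `Collect` must be marked non-deterministic, note that the arrays produced by `collect_list` and `collect_set` reflect whatever row order happens to reach the aggregate. The sketch below is illustrative only (not part of the patch); it assumes a `SparkSession` named `spark`, and the variable names are made up:

```scala
import org.apache.spark.sql.functions.collect_list

val df = spark.range(0, 100).toDF("id")

// Two logically identical queries can produce differently ordered arrays, because the
// element order depends on how partitions are scanned and shuffled -- hence
// `deterministic = false` on the Collect aggregates.
val a = df.repartition(8).groupBy().agg(collect_list("id")).first()
val b = df.repartition(3).groupBy().agg(collect_list("id")).first()
// `a` and `b` contain the same elements, but possibly in different orders.
```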
spark git commit: [SPARK-16577][SPARKR] Add CRAN documentation checks to run-tests.sh
Repository: spark Updated Branches: refs/heads/branch-2.0 ff2f87380 -> 225898961 [SPARK-16577][SPARKR] Add CRAN documentation checks to run-tests.sh ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? This change adds CRAN documentation checks to be run as a part of `R/run-tests.sh` . As this script is also used by Jenkins this means that we will get documentation checks on every PR going forward. (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Author: Shivaram VenkataramanCloses #14759 from shivaram/sparkr-cran-jenkins. (cherry picked from commit 920806ab272ba58a369072a5eeb89df5e9b470a6) Signed-off-by: Shivaram Venkataraman Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/22589896 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/22589896 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/22589896 Branch: refs/heads/branch-2.0 Commit: 225898961bc4bc71d56f33c027adbb2d0929ae5a Parents: ff2f873 Author: Shivaram Venkataraman Authored: Mon Aug 22 17:09:32 2016 -0700 Committer: Shivaram Venkataraman Committed: Mon Aug 22 17:09:44 2016 -0700 -- R/check-cran.sh | 18 +++--- R/run-tests.sh | 27 --- 2 files changed, 39 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/22589896/R/check-cran.sh -- diff --git a/R/check-cran.sh b/R/check-cran.sh index 5c90fd0..bb33146 100755 --- a/R/check-cran.sh +++ b/R/check-cran.sh @@ -43,10 +43,22 @@ $FWDIR/create-docs.sh "$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg # Run check as-cran. -# TODO(shivaram): Remove the skip tests once we figure out the install mechanism - VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'` -"$R_SCRIPT_PATH/"R CMD check --as-cran SparkR_"$VERSION".tar.gz +CRAN_CHECK_OPTIONS="--as-cran" + +if [ -n "$NO_TESTS" ] +then + CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-tests" +fi + +if [ -n "$NO_MANUAL" ] +then + CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual" +fi + +echo "Running CRAN check with $CRAN_CHECK_OPTIONS options" + +"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz popd > /dev/null http://git-wip-us.apache.org/repos/asf/spark/blob/22589896/R/run-tests.sh -- diff --git a/R/run-tests.sh b/R/run-tests.sh index 9dcf0ac..1a1e8ab 100755 --- a/R/run-tests.sh +++ b/R/run-tests.sh @@ -26,6 +26,17 @@ rm -f $LOGFILE SPARK_TESTING=1 $FWDIR/../bin/spark-submit --driver-java-options "-Dlog4j.configuration=file:$FWDIR/log4j.properties" --conf spark.hadoop.fs.default.name="file:///" $FWDIR/pkg/tests/run-all.R 2>&1 | tee -a $LOGFILE FAILED=$((PIPESTATUS[0]||$FAILED)) +# Also run the documentation tests for CRAN +CRAN_CHECK_LOG_FILE=$FWDIR/cran-check.out +rm -f $CRAN_CHECK_LOG_FILE + +NO_TESTS=1 NO_MANUAL=1 $FWDIR/check-cran.sh 2>&1 | tee -a $CRAN_CHECK_LOG_FILE +FAILED=$((PIPESTATUS[0]||$FAILED)) + +NUM_CRAN_WARNING="$(grep -c WARNING$ $CRAN_CHECK_LOG_FILE)" +NUM_CRAN_ERROR="$(grep -c ERROR$ $CRAN_CHECK_LOG_FILE)" +NUM_CRAN_NOTES="$(grep -c NOTE$ $CRAN_CHECK_LOG_FILE)" + if [[ $FAILED != 0 ]]; then cat $LOGFILE echo -en "\033[31m" # Red @@ -33,7 +44,17 @@ if [[ $FAILED != 0 ]]; then echo -en "\033[0m" # No color exit -1 else -echo -en "\033[32m" # Green -echo "Tests passed." 
-echo -en "\033[0m" # No color +# We have 2 existing NOTEs for new maintainer, attach() +# We have one more NOTE in Jenkins due to "No repository set" +if [[ $NUM_CRAN_WARNING != 0 || $NUM_CRAN_ERROR != 0 || $NUM_CRAN_NOTES -gt 3 ]]; then + cat $CRAN_CHECK_LOG_FILE + echo -en "\033[31m" # Red + echo "Had CRAN check errors; see logs." + echo -en "\033[0m" # No color + exit -1 +else + echo -en "\033[32m" # Green + echo "Tests passed." + echo -en "\033[0m" # No color +fi fi - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16577][SPARKR] Add CRAN documentation checks to run-tests.sh
Repository: spark Updated Branches: refs/heads/master 37f0ab70d -> 920806ab2 [SPARK-16577][SPARKR] Add CRAN documentation checks to run-tests.sh ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? This change adds CRAN documentation checks to be run as a part of `R/run-tests.sh` . As this script is also used by Jenkins this means that we will get documentation checks on every PR going forward. (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Author: Shivaram VenkataramanCloses #14759 from shivaram/sparkr-cran-jenkins. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/920806ab Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/920806ab Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/920806ab Branch: refs/heads/master Commit: 920806ab272ba58a369072a5eeb89df5e9b470a6 Parents: 37f0ab7 Author: Shivaram Venkataraman Authored: Mon Aug 22 17:09:32 2016 -0700 Committer: Shivaram Venkataraman Committed: Mon Aug 22 17:09:32 2016 -0700 -- R/check-cran.sh | 18 +++--- R/run-tests.sh | 27 --- 2 files changed, 39 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/920806ab/R/check-cran.sh -- diff --git a/R/check-cran.sh b/R/check-cran.sh index 5c90fd0..bb33146 100755 --- a/R/check-cran.sh +++ b/R/check-cran.sh @@ -43,10 +43,22 @@ $FWDIR/create-docs.sh "$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg # Run check as-cran. -# TODO(shivaram): Remove the skip tests once we figure out the install mechanism - VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'` -"$R_SCRIPT_PATH/"R CMD check --as-cran SparkR_"$VERSION".tar.gz +CRAN_CHECK_OPTIONS="--as-cran" + +if [ -n "$NO_TESTS" ] +then + CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-tests" +fi + +if [ -n "$NO_MANUAL" ] +then + CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual" +fi + +echo "Running CRAN check with $CRAN_CHECK_OPTIONS options" + +"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz popd > /dev/null http://git-wip-us.apache.org/repos/asf/spark/blob/920806ab/R/run-tests.sh -- diff --git a/R/run-tests.sh b/R/run-tests.sh index 9dcf0ac..1a1e8ab 100755 --- a/R/run-tests.sh +++ b/R/run-tests.sh @@ -26,6 +26,17 @@ rm -f $LOGFILE SPARK_TESTING=1 $FWDIR/../bin/spark-submit --driver-java-options "-Dlog4j.configuration=file:$FWDIR/log4j.properties" --conf spark.hadoop.fs.default.name="file:///" $FWDIR/pkg/tests/run-all.R 2>&1 | tee -a $LOGFILE FAILED=$((PIPESTATUS[0]||$FAILED)) +# Also run the documentation tests for CRAN +CRAN_CHECK_LOG_FILE=$FWDIR/cran-check.out +rm -f $CRAN_CHECK_LOG_FILE + +NO_TESTS=1 NO_MANUAL=1 $FWDIR/check-cran.sh 2>&1 | tee -a $CRAN_CHECK_LOG_FILE +FAILED=$((PIPESTATUS[0]||$FAILED)) + +NUM_CRAN_WARNING="$(grep -c WARNING$ $CRAN_CHECK_LOG_FILE)" +NUM_CRAN_ERROR="$(grep -c ERROR$ $CRAN_CHECK_LOG_FILE)" +NUM_CRAN_NOTES="$(grep -c NOTE$ $CRAN_CHECK_LOG_FILE)" + if [[ $FAILED != 0 ]]; then cat $LOGFILE echo -en "\033[31m" # Red @@ -33,7 +44,17 @@ if [[ $FAILED != 0 ]]; then echo -en "\033[0m" # No color exit -1 else -echo -en "\033[32m" # Green -echo "Tests passed." 
-echo -en "\033[0m" # No color +# We have 2 existing NOTEs for new maintainer, attach() +# We have one more NOTE in Jenkins due to "No repository set" +if [[ $NUM_CRAN_WARNING != 0 || $NUM_CRAN_ERROR != 0 || $NUM_CRAN_NOTES -gt 3 ]]; then + cat $CRAN_CHECK_LOG_FILE + echo -en "\033[31m" # Red + echo "Had CRAN check errors; see logs." + echo -en "\033[0m" # No color + exit -1 +else + echo -en "\033[32m" # Green + echo "Tests passed." + echo -en "\033[0m" # No color +fi fi - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17090][FOLLOW-UP][ML] Add expert param support to SharedParamsCodeGen
Repository: spark Updated Branches: refs/heads/master 6d93f9e02 -> 37f0ab70d [SPARK-17090][FOLLOW-UP][ML] Add expert param support to SharedParamsCodeGen ## What changes were proposed in this pull request? Add expert param support to SharedParamsCodeGen where aggregationDepth a expert param is added. Author: hqzizaniaCloses #14738 from hqzizania/SPARK-17090-minor. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/37f0ab70 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/37f0ab70 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/37f0ab70 Branch: refs/heads/master Commit: 37f0ab70d25802b609317bc93421d2fe3ee9db6e Parents: 6d93f9e Author: hqzizania Authored: Mon Aug 22 17:09:08 2016 -0700 Committer: Yanbo Liang Committed: Mon Aug 22 17:09:08 2016 -0700 -- .../spark/ml/param/shared/SharedParamsCodeGen.scala | 14 ++ .../apache/spark/ml/param/shared/sharedParams.scala | 4 ++-- 2 files changed, 12 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/37f0ab70/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala index 0f48a16..480b03d 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala @@ -80,7 +80,7 @@ private[shared] object SharedParamsCodeGen { ParamDesc[String]("solver", "the solver algorithm for optimization. If this is not set or " + "empty, default value is 'auto'", Some("\"auto\"")), ParamDesc[Int]("aggregationDepth", "suggested depth for treeAggregate (>= 2)", Some("2"), -isValid = "ParamValidators.gtEq(2)")) +isValid = "ParamValidators.gtEq(2)", isExpertParam = true)) val code = genSharedParams(params) val file = "src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala" @@ -95,7 +95,8 @@ private[shared] object SharedParamsCodeGen { doc: String, defaultValueStr: Option[String] = None, isValid: String = "", - finalMethods: Boolean = true) { + finalMethods: Boolean = true, + isExpertParam: Boolean = false) { require(name.matches("[a-z][a-zA-Z0-9]*"), s"Param name $name is invalid.") require(doc.nonEmpty) // TODO: more rigorous on doc @@ -153,6 +154,11 @@ private[shared] object SharedParamsCodeGen { } else { "" } +val groupStr = if (param.isExpertParam) { + Array("expertParam", "expertGetParam") +} else { + Array("param", "getParam") +} val methodStr = if (param.finalMethods) { "final def" } else { @@ -167,11 +173,11 @@ private[shared] object SharedParamsCodeGen { | | /** | * Param for $doc. 
- | * @group param + | * @group ${groupStr(0)} | */ | final val $name: $Param = new $Param(this, "$name", "$doc"$isValid) |$setDefault - | /** @group getParam */ + | /** @group ${groupStr(1)} */ | $methodStr get$Name: $T = $$($name) |} |""".stripMargin http://git-wip-us.apache.org/repos/asf/spark/blob/37f0ab70/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala b/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala index 6803772..9125d9e 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala @@ -397,13 +397,13 @@ private[ml] trait HasAggregationDepth extends Params { /** * Param for suggested depth for treeAggregate (>= 2). - * @group param + * @group expertParam */ final val aggregationDepth: IntParam = new IntParam(this, "aggregationDepth", "suggested depth for treeAggregate (>= 2)", ParamValidators.gtEq(2)) setDefault(aggregationDepth, 2) - /** @group getParam */ + /** @group expertGetParam */ final def getAggregationDepth: Int = $(aggregationDepth) } // scalastyle:on - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17144][SQL] Removal of useless CreateHiveTableAsSelectLogicalPlan
Repository: spark Updated Branches: refs/heads/master 8e223ea67 -> 6d93f9e02 [SPARK-17144][SQL] Removal of useless CreateHiveTableAsSelectLogicalPlan ## What changes were proposed in this pull request? `CreateHiveTableAsSelectLogicalPlan` is a dead code after refactoring. ## How was this patch tested? N/A Author: gatorsmileCloses #14707 from gatorsmile/removeCreateHiveTable. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6d93f9e0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6d93f9e0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6d93f9e0 Branch: refs/heads/master Commit: 6d93f9e0236aa61e39a1abfb0f7f7c558fb7d5d5 Parents: 8e223ea Author: gatorsmile Authored: Tue Aug 23 08:03:08 2016 +0800 Committer: Wenchen Fan Committed: Tue Aug 23 08:03:08 2016 +0800 -- .../spark/sql/execution/command/tables.scala | 19 +-- 1 file changed, 1 insertion(+), 18 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6d93f9e0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala index af2b5ff..21544a3 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala @@ -33,28 +33,11 @@ import org.apache.spark.sql.catalyst.catalog.{BucketSpec, CatalogTable, CatalogT import org.apache.spark.sql.catalyst.catalog.CatalogTableType._ import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference} -import org.apache.spark.sql.catalyst.plans.logical.{Command, LogicalPlan, UnaryNode} import org.apache.spark.sql.catalyst.util.quoteIdentifier -import org.apache.spark.sql.execution.datasources.{PartitioningUtils} +import org.apache.spark.sql.execution.datasources.PartitioningUtils import org.apache.spark.sql.types._ import org.apache.spark.util.Utils -case class CreateHiveTableAsSelectLogicalPlan( -tableDesc: CatalogTable, -child: LogicalPlan, -allowExisting: Boolean) extends UnaryNode with Command { - - override def output: Seq[Attribute] = Seq.empty[Attribute] - - override lazy val resolved: Boolean = -tableDesc.identifier.database.isDefined && - tableDesc.schema.nonEmpty && - tableDesc.storage.serde.isDefined && - tableDesc.storage.inputFormat.isDefined && - tableDesc.storage.outputFormat.isDefined && - childrenResolved -} - /** * A command to create a table with the same definition of the given existing table. * - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in block manager replication
Repository: spark Updated Branches: refs/heads/master 71afeeea4 -> 8e223ea67 [SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in block manager replication ## What changes were proposed in this pull request? This is a straightforward clone of JoshRosen 's original patch. I have follow-up changes to fix block replication for repl-defined classes as well, but those appear to be flaking tests so I'm going to leave that for SPARK-17042 ## How was this patch tested? End-to-end test in ReplSuite (also more tests in DistributedSuite from the original patch). Author: Eric LiangCloses #14311 from ericl/spark-16550. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8e223ea6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8e223ea6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8e223ea6 Branch: refs/heads/master Commit: 8e223ea67acf5aa730ccf688802f17f6fc10907c Parents: 71afeee Author: Eric Liang Authored: Mon Aug 22 16:32:14 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 22 16:32:14 2016 -0700 -- .../spark/serializer/SerializerManager.scala| 14 +++- .../org/apache/spark/storage/BlockManager.scala | 13 +++- .../org/apache/spark/DistributedSuite.scala | 77 ++-- .../scala/org/apache/spark/repl/ReplSuite.scala | 14 4 files changed, 60 insertions(+), 58 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8e223ea6/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala -- diff --git a/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala b/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala index 9dc274c..07caadb 100644 --- a/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala +++ b/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala @@ -68,7 +68,7 @@ private[spark] class SerializerManager(defaultSerializer: Serializer, conf: Spar * loaded yet. */ private lazy val compressionCodec: CompressionCodec = CompressionCodec.createCodec(conf) - private def canUseKryo(ct: ClassTag[_]): Boolean = { + def canUseKryo(ct: ClassTag[_]): Boolean = { primitiveAndPrimitiveArrayClassTags.contains(ct) || ct == stringClassTag } @@ -128,8 +128,18 @@ private[spark] class SerializerManager(defaultSerializer: Serializer, conf: Spar /** Serializes into a chunked byte buffer. */ def dataSerialize[T: ClassTag](blockId: BlockId, values: Iterator[T]): ChunkedByteBuffer = { +dataSerializeWithExplicitClassTag(blockId, values, implicitly[ClassTag[T]]) + } + + /** Serializes into a chunked byte buffer. 
*/ + def dataSerializeWithExplicitClassTag( + blockId: BlockId, + values: Iterator[_], + classTag: ClassTag[_]): ChunkedByteBuffer = { val bbos = new ChunkedByteBufferOutputStream(1024 * 1024 * 4, ByteBuffer.allocate) -dataSerializeStream(blockId, bbos, values) +val byteStream = new BufferedOutputStream(bbos) +val ser = getSerializer(classTag).newInstance() +ser.serializeStream(wrapForCompression(blockId, byteStream)).writeAll(values).close() bbos.toChunkedByteBuffer } http://git-wip-us.apache.org/repos/asf/spark/blob/8e223ea6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala -- diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala index 015e71d..fe84652 100644 --- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala +++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala @@ -498,7 +498,8 @@ private[spark] class BlockManager( diskStore.getBytes(blockId) } else if (level.useMemory && memoryStore.contains(blockId)) { // The block was not found on disk, so serialize an in-memory copy: -serializerManager.dataSerialize(blockId, memoryStore.getValues(blockId).get) +serializerManager.dataSerializeWithExplicitClassTag( + blockId, memoryStore.getValues(blockId).get, info.classTag) } else { handleLocalReadFailure(blockId) } @@ -973,8 +974,16 @@ private[spark] class BlockManager( if (level.replication > 1) { val remoteStartTime = System.currentTimeMillis val bytesToReplicate = doGetLocalBytes(blockId, info) + // [SPARK-16550] Erase the typed classTag when using default serialization, since + // NettyBlockRpcServer crashes when deserializing repl-defined classes. + // TODO(ekl) remove this once the classloader issue on
spark git commit: [SPARK-16508][SPARKR] doc updates and more CRAN check fixes
Repository: spark Updated Branches: refs/heads/branch-2.0 01a4d69f3 -> b65b041af [SPARK-16508][SPARKR] doc updates and more CRAN check fixes replace ``` ` ``` in code doc with `\code{thing}` remove added `...` for drop(DataFrame) fix remaining CRAN check warnings create doc with knitr junyangq Author: Felix CheungCloses #14734 from felixcheung/rdoccleanup. (cherry picked from commit 71afeeea4ec8e67edc95b5d504c557c88a2598b9) Signed-off-by: Shivaram Venkataraman Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b65b041a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b65b041a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b65b041a Branch: refs/heads/branch-2.0 Commit: b65b041af8b64413c7d460d4ea110b2044d6f36e Parents: 01a4d69 Author: Felix Cheung Authored: Mon Aug 22 15:53:10 2016 -0700 Committer: Shivaram Venkataraman Committed: Mon Aug 22 16:17:18 2016 -0700 -- R/pkg/NAMESPACE | 6 - R/pkg/R/DataFrame.R | 69 +++ R/pkg/R/RDD.R| 10 +++ R/pkg/R/SQLContext.R | 30 ++--- R/pkg/R/WindowSpec.R | 23 R/pkg/R/column.R | 2 +- R/pkg/R/functions.R | 36 - R/pkg/R/generics.R | 14 +- R/pkg/R/group.R | 1 + R/pkg/R/mllib.R | 5 ++-- R/pkg/R/pairRDD.R| 6 ++--- R/pkg/R/stats.R | 14 +- 12 files changed, 110 insertions(+), 106 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b65b041a/R/pkg/NAMESPACE -- diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE index aaab92f..cdb8834 100644 --- a/R/pkg/NAMESPACE +++ b/R/pkg/NAMESPACE @@ -1,5 +1,9 @@ # Imports from base R -importFrom(methods, setGeneric, setMethod, setOldClass) +# Do not include stats:: "rpois", "runif" - causes error at runtime +importFrom("methods", "setGeneric", "setMethod", "setOldClass") +importFrom("methods", "is", "new", "signature", "show") +importFrom("stats", "gaussian", "setNames") +importFrom("utils", "download.file", "packageVersion", "untar") # Disable native libraries till we figure out how to package it # See SPARKR-7839 http://git-wip-us.apache.org/repos/asf/spark/blob/b65b041a/R/pkg/R/DataFrame.R -- diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R index 0266939..f8a05c6 100644 --- a/R/pkg/R/DataFrame.R +++ b/R/pkg/R/DataFrame.R @@ -150,7 +150,7 @@ setMethod("explain", #' isLocal #' -#' Returns True if the `collect` and `take` methods can be run locally +#' Returns True if the \code{collect} and \code{take} methods can be run locally #' (without any Spark executors). #' #' @param x A SparkDataFrame @@ -635,10 +635,10 @@ setMethod("unpersist", #' The following options for repartition are possible: #' \itemize{ #' \item{1.} {Return a new SparkDataFrame partitioned by -#' the given columns into `numPartitions`.} -#' \item{2.} {Return a new SparkDataFrame that has exactly `numPartitions`.} +#' the given columns into \code{numPartitions}.} +#' \item{2.} {Return a new SparkDataFrame that has exactly \code{numPartitions}.} #' \item{3.} {Return a new SparkDataFrame partitioned by the given column(s), -#' using `spark.sql.shuffle.partitions` as number of partitions.} +#' using \code{spark.sql.shuffle.partitions} as number of partitions.} #'} #' @param x a SparkDataFrame. #' @param numPartitions the number of partitions to use. @@ -1125,9 +1125,8 @@ setMethod("take", #' Head #' -#' Return the first NUM rows of a SparkDataFrame as a R data.frame. If NUM is NULL, -#' then head() returns the first 6 rows in keeping with the current data.frame -#' convention in R. +#' Return the first \code{num} rows of a SparkDataFrame as a R data.frame. 
If \code{num} is not +#' specified, then head() returns the first 6 rows as with R data.frame. #' #' @param x a SparkDataFrame. #' @param num the number of rows to return. Default is 6. @@ -1399,11 +1398,11 @@ setMethod("dapplyCollect", #' #' @param cols grouping columns. #' @param func a function to be applied to each group partition specified by grouping -#' column of the SparkDataFrame. The function `func` takes as argument +#' column of the SparkDataFrame. The function \code{func} takes as argument #' a key - grouping columns and a data frame - a local R data.frame. -#' The output of `func` is a local R data.frame. +#' The output of \code{func} is a local R data.frame. #'
spark git commit: [SPARK-16508][SPARKR] doc updates and more CRAN check fixes
Repository: spark Updated Branches: refs/heads/master 84770b59f -> 71afeeea4 [SPARK-16508][SPARKR] doc updates and more CRAN check fixes ## What changes were proposed in this pull request? replace ``` ` ``` in code doc with `\code{thing}` remove added `...` for drop(DataFrame) fix remaining CRAN check warnings ## How was this patch tested? create doc with knitr junyangq Author: Felix CheungCloses #14734 from felixcheung/rdoccleanup. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/71afeeea Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/71afeeea Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/71afeeea Branch: refs/heads/master Commit: 71afeeea4ec8e67edc95b5d504c557c88a2598b9 Parents: 84770b5 Author: Felix Cheung Authored: Mon Aug 22 15:53:10 2016 -0700 Committer: Felix Cheung Committed: Mon Aug 22 15:53:10 2016 -0700 -- R/pkg/NAMESPACE | 6 +++- R/pkg/R/DataFrame.R | 71 +++ R/pkg/R/RDD.R| 10 +++ R/pkg/R/SQLContext.R | 30 ++-- R/pkg/R/WindowSpec.R | 23 +++ R/pkg/R/column.R | 2 +- R/pkg/R/functions.R | 36 R/pkg/R/generics.R | 15 +- R/pkg/R/group.R | 1 + R/pkg/R/mllib.R | 19 +++-- R/pkg/R/pairRDD.R| 6 ++-- R/pkg/R/stats.R | 14 +- 12 files changed, 119 insertions(+), 114 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/71afeeea/R/pkg/NAMESPACE -- diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE index e1b87b2..7090576 100644 --- a/R/pkg/NAMESPACE +++ b/R/pkg/NAMESPACE @@ -1,5 +1,9 @@ # Imports from base R -importFrom(methods, setGeneric, setMethod, setOldClass) +# Do not include stats:: "rpois", "runif" - causes error at runtime +importFrom("methods", "setGeneric", "setMethod", "setOldClass") +importFrom("methods", "is", "new", "signature", "show") +importFrom("stats", "gaussian", "setNames") +importFrom("utils", "download.file", "packageVersion", "untar") # Disable native libraries till we figure out how to package it # See SPARKR-7839 http://git-wip-us.apache.org/repos/asf/spark/blob/71afeeea/R/pkg/R/DataFrame.R -- diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R index 540dc31..52a6628 100644 --- a/R/pkg/R/DataFrame.R +++ b/R/pkg/R/DataFrame.R @@ -150,7 +150,7 @@ setMethod("explain", #' isLocal #' -#' Returns True if the `collect` and `take` methods can be run locally +#' Returns True if the \code{collect} and \code{take} methods can be run locally #' (without any Spark executors). #' #' @param x A SparkDataFrame @@ -182,7 +182,7 @@ setMethod("isLocal", #' @param numRows the number of rows to print. Defaults to 20. #' @param truncate whether truncate long strings. If \code{TRUE}, strings more than #' 20 characters will be truncated. However, if set greater than zero, -#' truncates strings longer than `truncate` characters and all cells +#' truncates strings longer than \code{truncate} characters and all cells #' will be aligned right. #' @param ... further arguments to be passed to or from other methods. 
#' @family SparkDataFrame functions @@ -642,10 +642,10 @@ setMethod("unpersist", #' The following options for repartition are possible: #' \itemize{ #' \item{1.} {Return a new SparkDataFrame partitioned by -#' the given columns into `numPartitions`.} -#' \item{2.} {Return a new SparkDataFrame that has exactly `numPartitions`.} +#' the given columns into \code{numPartitions}.} +#' \item{2.} {Return a new SparkDataFrame that has exactly \code{numPartitions}.} #' \item{3.} {Return a new SparkDataFrame partitioned by the given column(s), -#' using `spark.sql.shuffle.partitions` as number of partitions.} +#' using \code{spark.sql.shuffle.partitions} as number of partitions.} #'} #' @param x a SparkDataFrame. #' @param numPartitions the number of partitions to use. @@ -1132,9 +1132,8 @@ setMethod("take", #' Head #' -#' Return the first NUM rows of a SparkDataFrame as a R data.frame. If NUM is NULL, -#' then head() returns the first 6 rows in keeping with the current data.frame -#' convention in R. +#' Return the first \code{num} rows of a SparkDataFrame as a R data.frame. If \code{num} is not +#' specified, then head() returns the first 6 rows as with R data.frame. #' #' @param x a SparkDataFrame. #' @param num the number of rows to return. Default is 6. @@ -1406,11 +1405,11
spark git commit: [SPARK-17162] Range does not support SQL generation
Repository: spark Updated Branches: refs/heads/branch-2.0 6dcc1a3f0 -> 01a4d69f3 [SPARK-17162] Range does not support SQL generation ## What changes were proposed in this pull request? The range operator previously didn't support SQL generation, which made it not possible to use in views. ## How was this patch tested? Unit tests. cc hvanhovell Author: Eric LiangCloses #14724 from ericl/spark-17162. (cherry picked from commit 84770b59f773f132073cd2af4204957fc2d7bf35) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/01a4d69f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/01a4d69f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/01a4d69f Branch: refs/heads/branch-2.0 Commit: 01a4d69f309a1cc8d370ce9f85e6a4f31b6db3b8 Parents: 6dcc1a3 Author: Eric Liang Authored: Mon Aug 22 15:48:35 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 22 15:48:43 2016 -0700 -- .../analysis/ResolveTableValuedFunctions.scala | 11 -- .../plans/logical/basicLogicalOperators.scala | 21 +--- .../apache/spark/sql/catalyst/SQLBuilder.scala | 3 +++ .../sql/execution/basicPhysicalOperators.scala | 2 +- .../spark/sql/execution/command/views.scala | 3 +-- sql/hive/src/test/resources/sqlgen/range.sql| 4 .../test/resources/sqlgen/range_with_splits.sql | 4 .../sql/catalyst/LogicalPlanToSQLSuite.scala| 14 - 8 files changed, 44 insertions(+), 18 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/01a4d69f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala index 7fdf7fa..6b3bb68 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala @@ -28,9 +28,6 @@ import org.apache.spark.sql.types.{DataType, IntegerType, LongType} * Rule that resolves table-valued function references. */ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { - private lazy val defaultParallelism = -SparkContext.getOrCreate(new SparkConf(false)).defaultParallelism - /** * List of argument names and their types, used to declare a function. 
*/ @@ -84,25 +81,25 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { "range" -> Map( /* range(end) */ tvf("end" -> LongType) { case Seq(end: Long) => -Range(0, end, 1, defaultParallelism) +Range(0, end, 1, None) }, /* range(start, end) */ tvf("start" -> LongType, "end" -> LongType) { case Seq(start: Long, end: Long) => -Range(start, end, 1, defaultParallelism) +Range(start, end, 1, None) }, /* range(start, end, step) */ tvf("start" -> LongType, "end" -> LongType, "step" -> LongType) { case Seq(start: Long, end: Long, step: Long) => - Range(start, end, step, defaultParallelism) + Range(start, end, step, None) }, /* range(start, end, step, numPartitions) */ tvf("start" -> LongType, "end" -> LongType, "step" -> LongType, "numPartitions" -> IntegerType) { case Seq(start: Long, end: Long, step: Long, numPartitions: Int) => - Range(start, end, step, numPartitions) + Range(start, end, step, Some(numPartitions)) }) ) http://git-wip-us.apache.org/repos/asf/spark/blob/01a4d69f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala index eb612c4..07e39b0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala @@ -422,17 +422,20 @@ case class Sort( /** Factory for constructing new `Range` nodes. */ object Range { - def apply(start: Long, end: Long, step: Long, numSlices: Int): Range = { + def apply(start: Long, end: Long, step: Long, numSlices: Option[Int]): Range = { val
spark git commit: [SPARK-17162] Range does not support SQL generation
Repository: spark Updated Branches: refs/heads/master 929cb8bee -> 84770b59f [SPARK-17162] Range does not support SQL generation ## What changes were proposed in this pull request? The range operator previously didn't support SQL generation, which made it not possible to use in views. ## How was this patch tested? Unit tests. cc hvanhovell Author: Eric LiangCloses #14724 from ericl/spark-17162. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/84770b59 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/84770b59 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/84770b59 Branch: refs/heads/master Commit: 84770b59f773f132073cd2af4204957fc2d7bf35 Parents: 929cb8b Author: Eric Liang Authored: Mon Aug 22 15:48:35 2016 -0700 Committer: Reynold Xin Committed: Mon Aug 22 15:48:35 2016 -0700 -- .../analysis/ResolveTableValuedFunctions.scala | 11 -- .../plans/logical/basicLogicalOperators.scala | 21 +--- .../apache/spark/sql/catalyst/SQLBuilder.scala | 3 +++ .../sql/execution/basicPhysicalOperators.scala | 2 +- .../spark/sql/execution/command/views.scala | 3 +-- sql/hive/src/test/resources/sqlgen/range.sql| 4 .../test/resources/sqlgen/range_with_splits.sql | 4 .../sql/catalyst/LogicalPlanToSQLSuite.scala| 14 - 8 files changed, 44 insertions(+), 18 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/84770b59/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala index 7fdf7fa..6b3bb68 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala @@ -28,9 +28,6 @@ import org.apache.spark.sql.types.{DataType, IntegerType, LongType} * Rule that resolves table-valued function references. */ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { - private lazy val defaultParallelism = -SparkContext.getOrCreate(new SparkConf(false)).defaultParallelism - /** * List of argument names and their types, used to declare a function. 
*/ @@ -84,25 +81,25 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { "range" -> Map( /* range(end) */ tvf("end" -> LongType) { case Seq(end: Long) => -Range(0, end, 1, defaultParallelism) +Range(0, end, 1, None) }, /* range(start, end) */ tvf("start" -> LongType, "end" -> LongType) { case Seq(start: Long, end: Long) => -Range(start, end, 1, defaultParallelism) +Range(start, end, 1, None) }, /* range(start, end, step) */ tvf("start" -> LongType, "end" -> LongType, "step" -> LongType) { case Seq(start: Long, end: Long, step: Long) => - Range(start, end, step, defaultParallelism) + Range(start, end, step, None) }, /* range(start, end, step, numPartitions) */ tvf("start" -> LongType, "end" -> LongType, "step" -> LongType, "numPartitions" -> IntegerType) { case Seq(start: Long, end: Long, step: Long, numPartitions: Int) => - Range(start, end, step, numPartitions) + Range(start, end, step, Some(numPartitions)) }) ) http://git-wip-us.apache.org/repos/asf/spark/blob/84770b59/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala index af1736e..010aec7 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala @@ -422,17 +422,20 @@ case class Sort( /** Factory for constructing new `Range` nodes. */ object Range { - def apply(start: Long, end: Long, step: Long, numSlices: Int): Range = { + def apply(start: Long, end: Long, step: Long, numSlices: Option[Int]): Range = { val output = StructType(StructField("id", LongType, nullable = false) :: Nil).toAttributes new Range(start, end, step,
spark git commit: [MINOR][SQL] Fix some typos in comments and test hints
Repository: spark Updated Branches: refs/heads/master 6f3cd36f9 -> 929cb8bee [MINOR][SQL] Fix some typos in comments and test hints ## What changes were proposed in this pull request? Fix some typos in comments and test hints ## How was this patch tested? N/A. Author: Sean ZhongCloses #14755 from clockfly/fix_minor_typo. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/929cb8be Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/929cb8be Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/929cb8be Branch: refs/heads/master Commit: 929cb8beed9b7014231580cc002853236a5337d6 Parents: 6f3cd36 Author: Sean Zhong Authored: Mon Aug 22 13:31:38 2016 -0700 Committer: Yin Huai Committed: Mon Aug 22 13:31:38 2016 -0700 -- .../org/apache/spark/sql/execution/UnsafeKVExternalSorter.java | 2 +- .../sql/execution/aggregate/TungstenAggregationIterator.scala | 6 +++--- sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala | 6 +++--- 3 files changed, 7 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/929cb8be/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java b/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java index eb105bd..0d51dc9 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java @@ -99,7 +99,7 @@ public final class UnsafeKVExternalSorter { // The array will be used to do in-place sort, which require half of the space to be empty. assert(map.numKeys() <= map.getArray().size() / 2); // During spilling, the array in map will not be used, so we can borrow that and use it - // as the underline array for in-memory sorter (it's always large enough). + // as the underlying array for in-memory sorter (it's always large enough). // Since we will not grow the array, it's fine to pass `null` as consumer. final UnsafeInMemorySorter inMemSorter = new UnsafeInMemorySorter( null, taskMemoryManager, recordComparator, prefixComparator, map.getArray(), http://git-wip-us.apache.org/repos/asf/spark/blob/929cb8be/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala index 4b8adf5..4e072a9 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala @@ -32,9 +32,9 @@ import org.apache.spark.unsafe.KVIterator * An iterator used to evaluate aggregate functions. It operates on [[UnsafeRow]]s. * * This iterator first uses hash-based aggregation to process input rows. It uses - * a hash map to store groups and their corresponding aggregation buffers. If we - * this map cannot allocate memory from memory manager, it spill the map into disk - * and create a new one. After processed all the input, then merge all the spills + * a hash map to store groups and their corresponding aggregation buffers. If + * this map cannot allocate memory from memory manager, it spills the map into disk + * and creates a new one. 
After processed all the input, then merge all the spills * together using external sorter, and do sort-based aggregation. * * The process has the following step: http://git-wip-us.apache.org/repos/asf/spark/blob/929cb8be/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala b/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala index 484e438..c7af402 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala @@ -358,11 +358,11 @@ abstract class QueryTest extends PlanTest { */ def assertEmptyMissingInput(query: Dataset[_]): Unit = { assert(query.queryExecution.analyzed.missingInput.isEmpty, - s"The analyzed logical plan has missing inputs: ${query.queryExecution.analyzed}") + s"The analyzed logical plan has missing
spark git commit: [SPARKR][MINOR] Add Xiangrui and Felix to maintainers
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 94eff0875 -> 6dcc1a3f0

[SPARKR][MINOR] Add Xiangrui and Felix to maintainers

## What changes were proposed in this pull request?

This change adds Xiangrui Meng and Felix Cheung to the maintainers field in the package description.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Author: Shivaram Venkataraman

Closes #14758 from shivaram/sparkr-maintainers.

(cherry picked from commit 6f3cd36f93c11265449fdce3323e139fec8ab22d)
Signed-off-by: Shivaram Venkataraman

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6dcc1a3f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6dcc1a3f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6dcc1a3f

Branch: refs/heads/branch-2.0
Commit: 6dcc1a3f0cc8f2ed71f7bb6b1493852a58259d2f
Parents: 94eff08
Author: Shivaram Venkataraman
Authored: Mon Aug 22 12:53:52 2016 -0700
Committer: Shivaram Venkataraman
Committed: Mon Aug 22 12:54:03 2016 -0700

--
 R/pkg/DESCRIPTION | 2 ++
 1 file changed, 2 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/6dcc1a3f/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 357ab00..d81f1a3 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -5,6 +5,8 @@ Version: 2.0.0
 Date: 2016-07-07
 Author: The Apache Software Foundation
 Maintainer: Shivaram Venkataraman
+Xiangrui Meng
+Felix Cheung
 Depends:
 R (>= 3.0),
 methods
spark git commit: [SPARKR][MINOR] Add Xiangrui and Felix to maintainers
Repository: spark
Updated Branches:
  refs/heads/master 0583ecda1 -> 6f3cd36f9

[SPARKR][MINOR] Add Xiangrui and Felix to maintainers

## What changes were proposed in this pull request?

This change adds Xiangrui Meng and Felix Cheung to the maintainers field in the package description.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Author: Shivaram Venkataraman

Closes #14758 from shivaram/sparkr-maintainers.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6f3cd36f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6f3cd36f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6f3cd36f

Branch: refs/heads/master
Commit: 6f3cd36f93c11265449fdce3323e139fec8ab22d
Parents: 0583ecd
Author: Shivaram Venkataraman
Authored: Mon Aug 22 12:53:52 2016 -0700
Committer: Shivaram Venkataraman
Committed: Mon Aug 22 12:53:52 2016 -0700

--
 R/pkg/DESCRIPTION | 2 ++
 1 file changed, 2 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/6f3cd36f/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 357ab00..d81f1a3 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -5,6 +5,8 @@ Version: 2.0.0
 Date: 2016-07-07
 Author: The Apache Software Foundation
 Maintainer: Shivaram Venkataraman
+Xiangrui Meng
+Felix Cheung
 Depends:
 R (>= 3.0),
 methods
spark git commit: [SPARK-17173][SPARKR] R MLlib refactor, cleanup, reformat, fix deprecation in test
Repository: spark Updated Branches: refs/heads/master 342278c09 -> 0583ecda1 [SPARK-17173][SPARKR] R MLlib refactor, cleanup, reformat, fix deprecation in test ## What changes were proposed in this pull request? refactor, cleanup, reformat, fix deprecation in test ## How was this patch tested? unit tests, manual tests Author: Felix CheungCloses #14735 from felixcheung/rmllibutil. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0583ecda Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0583ecda Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0583ecda Branch: refs/heads/master Commit: 0583ecda1b63a7e3f126c3276059e4f99548a741 Parents: 342278c Author: Felix Cheung Authored: Mon Aug 22 12:27:33 2016 -0700 Committer: Felix Cheung Committed: Mon Aug 22 12:27:33 2016 -0700 -- R/pkg/R/mllib.R| 205 R/pkg/inst/tests/testthat/test_mllib.R | 10 +- 2 files changed, 98 insertions(+), 117 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0583ecda/R/pkg/R/mllib.R -- diff --git a/R/pkg/R/mllib.R b/R/pkg/R/mllib.R index 9a53c80..b36fbce 100644 --- a/R/pkg/R/mllib.R +++ b/R/pkg/R/mllib.R @@ -88,9 +88,9 @@ setClass("ALSModel", representation(jobj = "jobj")) #' @rdname write.ml #' @name write.ml #' @export -#' @seealso \link{spark.glm}, \link{glm}, \link{spark.gaussianMixture} -#' @seealso \link{spark.als}, \link{spark.kmeans}, \link{spark.lda}, \link{spark.naiveBayes} -#' @seealso \link{spark.survreg}, \link{spark.isoreg} +#' @seealso \link{spark.glm}, \link{glm}, +#' @seealso \link{spark.als}, \link{spark.gaussianMixture}, \link{spark.isoreg}, \link{spark.kmeans}, +#' @seealso \link{spark.lda}, \link{spark.naiveBayes}, \link{spark.survreg}, #' @seealso \link{read.ml} NULL @@ -101,11 +101,22 @@ NULL #' @rdname predict #' @name predict #' @export -#' @seealso \link{spark.glm}, \link{glm}, \link{spark.gaussianMixture} -#' @seealso \link{spark.als}, \link{spark.kmeans}, \link{spark.naiveBayes}, \link{spark.survreg} -#' @seealso \link{spark.isoreg} +#' @seealso \link{spark.glm}, \link{glm}, +#' @seealso \link{spark.als}, \link{spark.gaussianMixture}, \link{spark.isoreg}, \link{spark.kmeans}, +#' @seealso \link{spark.naiveBayes}, \link{spark.survreg}, NULL +write_internal <- function(object, path, overwrite = FALSE) { + writer <- callJMethod(object@jobj, "write") + if (overwrite) { +writer <- callJMethod(writer, "overwrite") + } + invisible(callJMethod(writer, "save", path)) +} + +predict_internal <- function(object, newData) { + dataFrame(callJMethod(object@jobj, "transform", newData@sdf)) +} #' Generalized Linear Models #' @@ -173,7 +184,7 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"), jobj <- callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper", "fit", formula, data@sdf, family$family, family$link, tol, as.integer(maxIter), as.character(weightCol)) -return(new("GeneralizedLinearRegressionModel", jobj = jobj)) +new("GeneralizedLinearRegressionModel", jobj = jobj) }) #' Generalized Linear Models (R-compliant) @@ -219,7 +230,7 @@ setMethod("glm", signature(formula = "formula", family = "ANY", data = "SparkDat #' @export #' @note summary(GeneralizedLinearRegressionModel) since 2.0.0 setMethod("summary", signature(object = "GeneralizedLinearRegressionModel"), - function(object, ...) 
{ + function(object) { jobj <- object@jobj is.loaded <- callJMethod(jobj, "isLoaded") features <- callJMethod(jobj, "rFeatures") @@ -245,7 +256,7 @@ setMethod("summary", signature(object = "GeneralizedLinearRegressionModel"), deviance = deviance, df.null = df.null, df.residual = df.residual, aic = aic, iter = iter, family = family, is.loaded = is.loaded) class(ans) <- "summary.GeneralizedLinearRegressionModel" -return(ans) +ans }) # Prints the summary of GeneralizedLinearRegressionModel @@ -275,8 +286,7 @@ print.summary.GeneralizedLinearRegressionModel <- function(x, ...) { " on", format(unlist(x[c("df.null", "df.residual")])), " degrees of freedom\n"), 1L, paste, collapse = " "), sep = "") cat("AIC: ", format(x$aic, digits = 4L), "\n\n", -"Number of Fisher Scoring iterations: ", x$iter, "\n", sep = "") - cat("\n") +"Number of Fisher Scoring iterations: ", x$iter, "\n\n", sep = "") invisible(x) } @@ -291,7
spark git commit: [SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6
Repository: spark Updated Branches: refs/heads/branch-2.0 79195982a -> 94eff0875 [SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6 ## What changes were proposed in this pull request? Collect GC discussion in one section, and documenting findings about G1 GC heap region size. ## How was this patch tested? Jekyll doc build Author: Sean OwenCloses #14732 from srowen/SPARK-16320. (cherry picked from commit 342278c09cf6e79ed4f63422988a6bbd1e7d8a91) Signed-off-by: Yin Huai Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/94eff087 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/94eff087 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/94eff087 Branch: refs/heads/branch-2.0 Commit: 94eff08757cee70c5b31fff7095bbb1e6ebc7ecf Parents: 7919598 Author: Sean Owen Authored: Mon Aug 22 11:15:53 2016 -0700 Committer: Yin Huai Committed: Mon Aug 22 11:16:11 2016 -0700 -- docs/tuning.md | 36 +--- 1 file changed, 17 insertions(+), 19 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/94eff087/docs/tuning.md -- diff --git a/docs/tuning.md b/docs/tuning.md index 976f2eb..cbf3721 100644 --- a/docs/tuning.md +++ b/docs/tuning.md @@ -122,21 +122,8 @@ large records. `R` is the storage space within `M` where cached blocks immune to being evicted by execution. The value of `spark.memory.fraction` should be set in order to fit this amount of heap space -comfortably within the JVM's old or "tenured" generation. Otherwise, when much of this space is -used for caching and execution, the tenured generation will be full, which causes the JVM to -significantly increase time spent in garbage collection. See -https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/sizing.html;>Java GC sizing documentation -for more information. - -The tenured generation size is controlled by the JVM's `NewRatio` parameter, which defaults to 2, -meaning that the tenured generation is 2 times the size of the new generation (the rest of the heap). -So, by default, the tenured generation occupies 2/3 or about 0.66 of the heap. A value of -0.6 for `spark.memory.fraction` keeps storage and execution memory within the old generation with -room to spare. If `spark.memory.fraction` is increased to, say, 0.8, then `NewRatio` may have to -increase to 6 or more. - -`NewRatio` is set as a JVM flag for executors, which means adding -`spark.executor.extraJavaOptions=-XX:NewRatio=x` to a Spark job's configuration. +comfortably within the JVM's old or "tenured" generation. See the discussion of advanced GC +tuning below for details. ## Determining Memory Consumption @@ -217,14 +204,22 @@ temporary objects created during task execution. Some steps which may be useful * Check if there are too many garbage collections by collecting GC stats. If a full GC is invoked multiple times for before a task completes, it means that there isn't enough memory available for executing tasks. -* In the GC stats that are printed, if the OldGen is close to being full, reduce the amount of - memory used for caching by lowering `spark.memory.storageFraction`; it is better to cache fewer - objects than to slow down task execution! - * If there are too many minor collections but not many major GCs, allocating more memory for Eden would help. You can set the size of the Eden to be an over-estimate of how much memory each task will need. 
If the size of Eden is determined to be `E`, then you can set the size of the Young generation using the option `-Xmn=4/3*E`. (The scaling up by 4/3 is to account for space used by survivor regions as well.) + +* In the GC stats that are printed, if the OldGen is close to being full, reduce the amount of + memory used for caching by lowering `spark.memory.fraction`; it is better to cache fewer + objects than to slow down task execution. Alternatively, consider decreasing the size of + the Young generation. This means lowering `-Xmn` if you've set it as above. If not, try changing the + value of the JVM's `NewRatio` parameter. Many JVMs default this to 2, meaning that the Old generation + occupies 2/3 of the heap. It should be large enough such that this fraction exceeds `spark.memory.fraction`. + +* Try the G1GC garbage collector with `-XX:+UseG1GC`. It can improve performance in some situations where + garbage collection is a bottleneck. Note that with large executor heap sizes, it may be important to + increase the [G1 region size](https://blogs.oracle.com/g1gc/entry/g1_gc_tuning_a_case) + with `-XX:G1HeapRegionSize` * As an
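As a concrete illustration of the advice above, here is one way the discussed flags could be passed to executors. This is a minimal sketch with illustrative values only — the region size and the GC-logging flags are examples, not recommendations from the patch:

```scala
import org.apache.spark.SparkConf

// Illustrative only: enable G1, enlarge the G1 region size for a large executor heap,
// and turn on GC logging so the GC stats discussed above can be collected.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions",
    "-XX:+UseG1GC -XX:G1HeapRegionSize=16m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
```

The same setting can also be supplied on the command line via `--conf spark.executor.extraJavaOptions=...` when submitting the job.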
spark git commit: [SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6
Repository: spark Updated Branches: refs/heads/master 209e1b3c0 -> 342278c09 [SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6 ## What changes were proposed in this pull request? Collect GC discussion in one section, and documenting findings about G1 GC heap region size. ## How was this patch tested? Jekyll doc build Author: Sean OwenCloses #14732 from srowen/SPARK-16320. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/342278c0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/342278c0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/342278c0 Branch: refs/heads/master Commit: 342278c09cf6e79ed4f63422988a6bbd1e7d8a91 Parents: 209e1b3 Author: Sean Owen Authored: Mon Aug 22 11:15:53 2016 -0700 Committer: Yin Huai Committed: Mon Aug 22 11:15:53 2016 -0700 -- docs/tuning.md | 36 +--- 1 file changed, 17 insertions(+), 19 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/342278c0/docs/tuning.md -- diff --git a/docs/tuning.md b/docs/tuning.md index 976f2eb..cbf3721 100644 --- a/docs/tuning.md +++ b/docs/tuning.md @@ -122,21 +122,8 @@ large records. `R` is the storage space within `M` where cached blocks immune to being evicted by execution. The value of `spark.memory.fraction` should be set in order to fit this amount of heap space -comfortably within the JVM's old or "tenured" generation. Otherwise, when much of this space is -used for caching and execution, the tenured generation will be full, which causes the JVM to -significantly increase time spent in garbage collection. See -https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/sizing.html;>Java GC sizing documentation -for more information. - -The tenured generation size is controlled by the JVM's `NewRatio` parameter, which defaults to 2, -meaning that the tenured generation is 2 times the size of the new generation (the rest of the heap). -So, by default, the tenured generation occupies 2/3 or about 0.66 of the heap. A value of -0.6 for `spark.memory.fraction` keeps storage and execution memory within the old generation with -room to spare. If `spark.memory.fraction` is increased to, say, 0.8, then `NewRatio` may have to -increase to 6 or more. - -`NewRatio` is set as a JVM flag for executors, which means adding -`spark.executor.extraJavaOptions=-XX:NewRatio=x` to a Spark job's configuration. +comfortably within the JVM's old or "tenured" generation. See the discussion of advanced GC +tuning below for details. ## Determining Memory Consumption @@ -217,14 +204,22 @@ temporary objects created during task execution. Some steps which may be useful * Check if there are too many garbage collections by collecting GC stats. If a full GC is invoked multiple times for before a task completes, it means that there isn't enough memory available for executing tasks. -* In the GC stats that are printed, if the OldGen is close to being full, reduce the amount of - memory used for caching by lowering `spark.memory.storageFraction`; it is better to cache fewer - objects than to slow down task execution! - * If there are too many minor collections but not many major GCs, allocating more memory for Eden would help. You can set the size of the Eden to be an over-estimate of how much memory each task will need. If the size of Eden is determined to be `E`, then you can set the size of the Young generation using the option `-Xmn=4/3*E`. (The scaling up by 4/3 is to account for space used by survivor regions as well.) 
+ +* In the GC stats that are printed, if the OldGen is close to being full, reduce the amount of + memory used for caching by lowering `spark.memory.fraction`; it is better to cache fewer + objects than to slow down task execution. Alternatively, consider decreasing the size of + the Young generation. This means lowering `-Xmn` if you've set it as above. If not, try changing the + value of the JVM's `NewRatio` parameter. Many JVMs default this to 2, meaning that the Old generation + occupies 2/3 of the heap. It should be large enough such that this fraction exceeds `spark.memory.fraction`. + +* Try the G1GC garbage collector with `-XX:+UseG1GC`. It can improve performance in some situations where + garbage collection is a bottleneck. Note that with large executor heap sizes, it may be important to + increase the [G1 region size](https://blogs.oracle.com/g1gc/entry/g1_gc_tuning_a_case) + with `-XX:G1HeapRegionSize` * As an example, if your task is reading data from HDFS, the amount of memory used by the task can be estimated using the size
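To make the Eden-based rule concrete, a small sketch of the arithmetic with purely hypothetical numbers — the per-task estimate and concurrent task count below are assumptions, not values from the commit:

```scala
import org.apache.spark.SparkConf

// Hypothetical sizing: suppose each task needs roughly 512 MB in Eden and
// 4 tasks run concurrently per executor, so E is about 2048 MB.
val estimatedEdenMb = 4 * 512
// Scale by 4/3 to leave room for the survivor regions, as the doc suggests.
val youngGenMb = estimatedEdenMb * 4 / 3

// Pass the resulting -Xmn flag to executors.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions", s"-Xmn${youngGenMb}m")
```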
spark git commit: [SPARKR][MINOR] Fix Cache Folder Path in Windows
Repository: spark Updated Branches: refs/heads/branch-2.0 2add45fab -> 79195982a [SPARKR][MINOR] Fix Cache Folder Path in Windows ## What changes were proposed in this pull request? This PR tries to fix the scheme of local cache folder in Windows. The name of the environment variable should be `LOCALAPPDATA` rather than `%LOCALAPPDATA%`. ## How was this patch tested? Manual test in Windows 7. Author: Junyang QianCloses #14743 from junyangq/SPARKR-FixWindowsInstall. (cherry picked from commit 209e1b3c0683a9106428e269e5041980b6cc327f) Signed-off-by: Shivaram Venkataraman Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/79195982 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/79195982 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/79195982 Branch: refs/heads/branch-2.0 Commit: 79195982a4c6f8b1a3e02069dea00049cc806574 Parents: 2add45f Author: Junyang Qian Authored: Mon Aug 22 10:03:48 2016 -0700 Committer: Shivaram Venkataraman Committed: Mon Aug 22 10:03:59 2016 -0700 -- R/pkg/R/install.R | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/79195982/R/pkg/R/install.R -- diff --git a/R/pkg/R/install.R b/R/pkg/R/install.R index 987bac7..ff81e86 100644 --- a/R/pkg/R/install.R +++ b/R/pkg/R/install.R @@ -212,7 +212,7 @@ hadoop_version_name <- function(hadoopVersion) { # adapt to Spark context spark_cache_path <- function() { if (.Platform$OS.type == "windows") { -winAppPath <- Sys.getenv("%LOCALAPPDATA%", unset = NA) +winAppPath <- Sys.getenv("LOCALAPPDATA", unset = NA) if (is.na(winAppPath)) { msg <- paste("%LOCALAPPDATA% not found.", "Please define the environment variable", - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
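The fix boils down to reading the variable by its bare name rather than the `%LOCALAPPDATA%` form that only cmd.exe expands. A rough Scala sketch of the same lookup-with-fallback pattern — the message and path handling here are illustrative, not the SparkR implementation:

```scala
// Look up the variable by its bare name; the %...% syntax is only meaningful to cmd.exe.
val winAppPath: Option[String] = sys.env.get("LOCALAPPDATA")

winAppPath match {
  case Some(dir) =>
    println(s"Would place the Spark download cache under: $dir")  // actual subdirectory layout omitted
  case None =>
    println("LOCALAPPDATA not found. Please define the environment variable " +
      "or choose another cache directory.")
}
```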
spark git commit: [SPARKR][MINOR] Fix Cache Folder Path in Windows
Repository: spark Updated Branches: refs/heads/master b264cbb16 -> 209e1b3c0 [SPARKR][MINOR] Fix Cache Folder Path in Windows ## What changes were proposed in this pull request? This PR tries to fix the scheme of local cache folder in Windows. The name of the environment variable should be `LOCALAPPDATA` rather than `%LOCALAPPDATA%`. ## How was this patch tested? Manual test in Windows 7. Author: Junyang QianCloses #14743 from junyangq/SPARKR-FixWindowsInstall. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/209e1b3c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/209e1b3c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/209e1b3c Branch: refs/heads/master Commit: 209e1b3c0683a9106428e269e5041980b6cc327f Parents: b264cbb Author: Junyang Qian Authored: Mon Aug 22 10:03:48 2016 -0700 Committer: Shivaram Venkataraman Committed: Mon Aug 22 10:03:48 2016 -0700 -- R/pkg/R/install.R | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/209e1b3c/R/pkg/R/install.R -- diff --git a/R/pkg/R/install.R b/R/pkg/R/install.R index 987bac7..ff81e86 100644 --- a/R/pkg/R/install.R +++ b/R/pkg/R/install.R @@ -212,7 +212,7 @@ hadoop_version_name <- function(hadoopVersion) { # adapt to Spark context spark_cache_path <- function() { if (.Platform$OS.type == "windows") { -winAppPath <- Sys.getenv("%LOCALAPPDATA%", unset = NA) +winAppPath <- Sys.getenv("LOCALAPPDATA", unset = NA) if (is.na(winAppPath)) { msg <- paste("%LOCALAPPDATA% not found.", "Please define the environment variable", - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15113][PYSPARK][ML] Add missing num features num classes
Repository: spark Updated Branches: refs/heads/master bd9655063 -> b264cbb16 [SPARK-15113][PYSPARK][ML] Add missing num features num classes ## What changes were proposed in this pull request? Add missing `numFeatures` and `numClasses` to the wrapped Java models in PySpark ML pipelines. Also tag `DecisionTreeClassificationModel` as Expiremental to match Scala doc. ## How was this patch tested? Extended doctests Author: Holden KarauCloses #12889 from holdenk/SPARK-15113-add-missing-numFeatures-numClasses. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b264cbb1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b264cbb1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b264cbb1 Branch: refs/heads/master Commit: b264cbb16fb97116e630fb593adf5898a5a0e8fa Parents: bd96550 Author: Holden Karau Authored: Mon Aug 22 12:21:22 2016 +0200 Committer: Nick Pentreath Committed: Mon Aug 22 12:21:22 2016 +0200 -- .../GeneralizedLinearRegression.scala | 2 ++ python/pyspark/ml/classification.py | 37 python/pyspark/ml/regression.py | 22 +--- python/pyspark/ml/util.py | 16 + 4 files changed, 66 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b264cbb1/mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala b/mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala index 2bdc09e..1d4dfd1 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala @@ -788,6 +788,8 @@ class GeneralizedLinearRegressionModel private[ml] ( @Since("2.0.0") override def write: MLWriter = new GeneralizedLinearRegressionModel.GeneralizedLinearRegressionModelWriter(this) + + override val numFeatures: Int = coefficients.size } @Since("2.0.0") http://git-wip-us.apache.org/repos/asf/spark/blob/b264cbb1/python/pyspark/ml/classification.py -- diff --git a/python/pyspark/ml/classification.py b/python/pyspark/ml/classification.py index 6468007..33ada27 100644 --- a/python/pyspark/ml/classification.py +++ b/python/pyspark/ml/classification.py @@ -44,6 +44,23 @@ __all__ = ['LogisticRegression', 'LogisticRegressionModel', @inherit_doc +class JavaClassificationModel(JavaPredictionModel): +""" +(Private) Java Model produced by a ``Classifier``. +Classes are indexed {0, 1, ..., numClasses - 1}. +To be mixed in with class:`pyspark.ml.JavaModel` +""" + +@property +@since("2.1.0") +def numClasses(self): +""" +Number of classes (values which the label can take). +""" +return self._call_java("numClasses") + + +@inherit_doc class LogisticRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter, HasRegParam, HasTol, HasProbabilityCol, HasRawPredictionCol, HasElasticNetParam, HasFitIntercept, HasStandardization, HasThresholds, @@ -212,7 +229,7 @@ class LogisticRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredicti " threshold (%g) and thresholds (equivalent to %g)" % (t2, t)) -class LogisticRegressionModel(JavaModel, JavaMLWritable, JavaMLReadable): +class LogisticRegressionModel(JavaModel, JavaClassificationModel, JavaMLWritable, JavaMLReadable): """ Model fitted by LogisticRegression. 
@@ -522,6 +539,10 @@ class DecisionTreeClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPred 1 >>> model.featureImportances SparseVector(1, {0: 1.0}) +>>> model.numFeatures +1 +>>> model.numClasses +2 >>> print(model.toDebugString) DecisionTreeClassificationModel (uid=...) of depth 1 with 3 nodes... >>> test0 = spark.createDataFrame([(Vectors.dense(-1.0),)], ["features"]) @@ -595,7 +616,8 @@ class DecisionTreeClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPred @inherit_doc -class DecisionTreeClassificationModel(DecisionTreeModel, JavaMLWritable, JavaMLReadable): +class DecisionTreeClassificationModel(DecisionTreeModel, JavaClassificationModel, JavaMLWritable, + JavaMLReadable): """ Model fitted by DecisionTreeClassifier. @@ -722,7 +744,8 @@ class RandomForestClassifier(JavaEstimator,
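The new Python properties simply call into the underlying JVM models (`self._call_java("numClasses")`), where these attributes are already defined for classifiers. A short Scala sketch of the equivalent calls, assuming a SparkSession `spark` is in scope (the toy data mirrors the doctest above):

```scala
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.linalg.Vectors

val training = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(1.0)),
  (0.0, Vectors.dense(-1.0))
)).toDF("label", "features")

val model = new DecisionTreeClassifier().fit(training)
println(model.numFeatures)   // 1
println(model.numClasses)    // 2
```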
spark git commit: [SPARK-17085][STREAMING][DOCUMENTATION AND ACTUAL CODE DIFFERS - UNSUPPORTED OPERATIONS]
Repository: spark Updated Branches: refs/heads/master 8d35a6f68 -> bd9655063 [SPARK-17085][STREAMING][DOCUMENTATION AND ACTUAL CODE DIFFERS - UNSUPPORTED OPERATIONS] Changes in Spark Structured Streaming doc in this link https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html#unsupported-operations Author: JagadeesanCloses #14715 from jagadeesanas2/SPARK-17085. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bd965506 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bd965506 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bd965506 Branch: refs/heads/master Commit: bd9655063bdba8836b4ec96ed115e5653e246b65 Parents: 8d35a6f Author: Jagadeesan Authored: Mon Aug 22 09:30:31 2016 +0100 Committer: Sean Owen Committed: Mon Aug 22 09:30:31 2016 +0100 -- docs/structured-streaming-programming-guide.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bd965506/docs/structured-streaming-programming-guide.md -- diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index e2c881b..226ff74 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -726,9 +726,9 @@ However, note that all of the operations applicable on static DataFrames/Dataset + Full outer join with a streaming Dataset is not supported -+ Left outer join with a streaming Dataset on the left is not supported ++ Left outer join with a streaming Dataset on the right is not supported -+ Right outer join with a streaming Dataset on the right is not supported ++ Right outer join with a streaming Dataset on the left is not supported - Any kind of joins between two streaming Datasets are not yet supported. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
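To illustrate the corrected wording, a hedged Scala sketch of the supported versus unsupported join shapes — the socket source, data, and column names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("StreamStaticJoins").getOrCreate()

// Static side: a small lookup table (illustrative data).
val staticDf = spark.createDataFrame(Seq(("a", 1), ("b", 2))).toDF("value", "count")

// Streaming side: the socket source yields a single string column named "value".
val streamingDf = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// Supported: left outer join with the streaming Dataset on the LEFT and a static Dataset on the right.
val joined = streamingDf.join(staticDf, Seq("value"), "left_outer")

// Not supported (the case the doc now describes correctly): streaming Dataset on the RIGHT
// of a left outer join -- the analyzer rejects it.
// staticDf.join(streamingDf, Seq("value"), "left_outer")
```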
spark git commit: [SPARK-17085][STREAMING][DOCUMENTATION AND ACTUAL CODE DIFFERS - UNSUPPORTED OPERATIONS]
Repository: spark Updated Branches: refs/heads/branch-2.0 49cc44de3 -> 2add45fab [SPARK-17085][STREAMING][DOCUMENTATION AND ACTUAL CODE DIFFERS - UNSUPPORTED OPERATIONS] Changes in Spark Structured Streaming doc in this link https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html#unsupported-operations Author: JagadeesanCloses #14715 from jagadeesanas2/SPARK-17085. (cherry picked from commit bd9655063bdba8836b4ec96ed115e5653e246b65) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2add45fa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2add45fa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2add45fa Branch: refs/heads/branch-2.0 Commit: 2add45fabeb0ea4f7b17b5bc4910161370e72627 Parents: 49cc44d Author: Jagadeesan Authored: Mon Aug 22 09:30:31 2016 +0100 Committer: Sean Owen Committed: Mon Aug 22 09:30:41 2016 +0100 -- docs/structured-streaming-programming-guide.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2add45fa/docs/structured-streaming-programming-guide.md -- diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index 811e8c4..a2f1ee2 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -726,9 +726,9 @@ However, note that all of the operations applicable on static DataFrames/Dataset + Full outer join with a streaming Dataset is not supported -+ Left outer join with a streaming Dataset on the left is not supported ++ Left outer join with a streaming Dataset on the right is not supported -+ Right outer join with a streaming Dataset on the right is not supported ++ Right outer join with a streaming Dataset on the left is not supported - Any kind of joins between two streaming Datasets are not yet supported. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17115][SQL] decrease the threshold when split expressions
Repository: spark Updated Branches: refs/heads/branch-2.0 e62b29f29 -> 49cc44de3 [SPARK-17115][SQL] decrease the threshold when split expressions ## What changes were proposed in this pull request? In 2.0, we change the threshold of splitting expressions from 16K to 64K, which cause very bad performance on wide table, because the generated method can't be JIT compiled by default (above the limit of 8K bytecode). This PR will decrease it to 1K, based on the benchmark results for a wide table with 400 columns of LongType. It also fix a bug around splitting expression in whole-stage codegen (it should not split them). ## How was this patch tested? Added benchmark suite. Author: Davies LiuCloses #14692 from davies/split_exprs. (cherry picked from commit 8d35a6f68d6d733212674491cbf31bed73fada0f) Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/49cc44de Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/49cc44de Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/49cc44de Branch: refs/heads/branch-2.0 Commit: 49cc44de3ad5495b2690633791941aa00a62b553 Parents: e62b29f Author: Davies Liu Authored: Mon Aug 22 16:16:03 2016 +0800 Committer: Wenchen Fan Committed: Mon Aug 22 16:16:36 2016 +0800 -- .../expressions/codegen/CodeGenerator.scala | 9 ++-- .../execution/aggregate/HashAggregateExec.scala | 2 - .../benchmark/BenchmarkWideTable.scala | 53 3 files changed, 59 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/49cc44de/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala index 16fb1f6..4bd9ee0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala @@ -584,15 +584,18 @@ class CodegenContext { * @param expressions the codes to evaluate expressions. */ def splitExpressions(row: String, expressions: Seq[String]): String = { -if (row == null) { +if (row == null || currentVars != null) { // Cannot split these expressions because they are not created from a row object. return expressions.mkString("\n") } val blocks = new ArrayBuffer[String]() val blockBuilder = new StringBuilder() for (code <- expressions) { - // We can't know how many byte code will be generated, so use the number of bytes as limit - if (blockBuilder.length > 64 * 1000) { + // We can't know how many bytecode will be generated, so use the length of source code + // as metric. A method should not go beyond 8K, otherwise it will not be JITted, should + // also not be too small, or it will have many function calls (for wide table), see the + // results in BenchmarkWideTable. 
+ if (blockBuilder.length > 1024) { blocks.append(blockBuilder.toString()) blockBuilder.clear() } http://git-wip-us.apache.org/repos/asf/spark/blob/49cc44de/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala index cfc47ab..bd7efa6 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala @@ -603,8 +603,6 @@ case class HashAggregateExec( // create grouping key ctx.currentVars = input -// make sure that the generated code will not be splitted as multiple functions -ctx.INPUT_ROW = null val unsafeRowKeyCode = GenerateUnsafeProjection.createCode( ctx, groupingExpressions.map(e => BindReferences.bindReference[Expression](e, child.output))) val vectorizedRowKeys = ctx.generateExpressions( http://git-wip-us.apache.org/repos/asf/spark/blob/49cc44de/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BenchmarkWideTable.scala -- diff --git
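The essence of the change is the chunking loop in `splitExpressions`: generated snippets are grouped into blocks whose combined source length stays under roughly 1 KB, so each emitted method remains well below the ~8K-bytecode JIT limit. A standalone sketch of that idea (not the actual `CodegenContext` code):

```scala
import scala.collection.mutable.ArrayBuffer

// Group per-expression code strings into blocks below a source-length threshold;
// each block would become one generated method.
def splitIntoBlocks(expressions: Seq[String], threshold: Int = 1024): Seq[String] = {
  val blocks = ArrayBuffer.empty[String]
  val blockBuilder = new StringBuilder
  for (code <- expressions) {
    // Source length is used as a rough proxy, since the bytecode size of the
    // generated method cannot be known here.
    if (blockBuilder.length > threshold) {
      blocks += blockBuilder.toString()
      blockBuilder.clear()
    }
    blockBuilder.append(code).append("\n")
  }
  blocks += blockBuilder.toString()
  blocks.toSeq
}
```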
spark git commit: [SPARK-17115][SQL] decrease the threshold when split expressions
Repository: spark Updated Branches: refs/heads/master 4b6c2cbcb -> 8d35a6f68 [SPARK-17115][SQL] decrease the threshold when split expressions ## What changes were proposed in this pull request? In 2.0, we change the threshold of splitting expressions from 16K to 64K, which cause very bad performance on wide table, because the generated method can't be JIT compiled by default (above the limit of 8K bytecode). This PR will decrease it to 1K, based on the benchmark results for a wide table with 400 columns of LongType. It also fix a bug around splitting expression in whole-stage codegen (it should not split them). ## How was this patch tested? Added benchmark suite. Author: Davies LiuCloses #14692 from davies/split_exprs. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8d35a6f6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8d35a6f6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8d35a6f6 Branch: refs/heads/master Commit: 8d35a6f68d6d733212674491cbf31bed73fada0f Parents: 4b6c2cb Author: Davies Liu Authored: Mon Aug 22 16:16:03 2016 +0800 Committer: Wenchen Fan Committed: Mon Aug 22 16:16:03 2016 +0800 -- .../expressions/codegen/CodeGenerator.scala | 9 ++-- .../execution/aggregate/HashAggregateExec.scala | 2 - .../benchmark/BenchmarkWideTable.scala | 53 3 files changed, 59 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8d35a6f6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala index 16fb1f6..4bd9ee0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala @@ -584,15 +584,18 @@ class CodegenContext { * @param expressions the codes to evaluate expressions. */ def splitExpressions(row: String, expressions: Seq[String]): String = { -if (row == null) { +if (row == null || currentVars != null) { // Cannot split these expressions because they are not created from a row object. return expressions.mkString("\n") } val blocks = new ArrayBuffer[String]() val blockBuilder = new StringBuilder() for (code <- expressions) { - // We can't know how many byte code will be generated, so use the number of bytes as limit - if (blockBuilder.length > 64 * 1000) { + // We can't know how many bytecode will be generated, so use the length of source code + // as metric. A method should not go beyond 8K, otherwise it will not be JITted, should + // also not be too small, or it will have many function calls (for wide table), see the + // results in BenchmarkWideTable. 
+ if (blockBuilder.length > 1024) { blocks.append(blockBuilder.toString()) blockBuilder.clear() } http://git-wip-us.apache.org/repos/asf/spark/blob/8d35a6f6/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala index cfc47ab..bd7efa6 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala @@ -603,8 +603,6 @@ case class HashAggregateExec( // create grouping key ctx.currentVars = input -// make sure that the generated code will not be splitted as multiple functions -ctx.INPUT_ROW = null val unsafeRowKeyCode = GenerateUnsafeProjection.createCode( ctx, groupingExpressions.map(e => BindReferences.bindReference[Expression](e, child.output))) val vectorizedRowKeys = ctx.generateExpressions( http://git-wip-us.apache.org/repos/asf/spark/blob/8d35a6f6/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BenchmarkWideTable.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BenchmarkWideTable.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BenchmarkWideTable.scala new file mode
spark git commit: [SPARK-16968] Document additional options in jdbc Writer
Repository: spark Updated Branches: refs/heads/master 083de00cb -> 4b6c2cbcb [SPARK-16968] Document additional options in jdbc Writer ## What changes were proposed in this pull request? This documents the JDBC writer options that were added in a previous PR. ## How was this patch tested? Unit tests were added in the previous PR. Author: GraceHCloses #14683 from GraceH/jdbc_options. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4b6c2cbc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4b6c2cbc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4b6c2cbc Branch: refs/heads/master Commit: 4b6c2cbcb109c7cef6087bae32d87cc3ddb69cf9 Parents: 083de00 Author: GraceH Authored: Mon Aug 22 09:03:46 2016 +0100 Committer: Sean Owen Committed: Mon Aug 22 09:03:46 2016 +0100 -- docs/sql-programming-guide.md | 14 ++ 1 file changed, 14 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4b6c2cbc/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index c89286d..28cc88c 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -1058,6 +1058,20 @@ the Data Sources API. The following options are supported: The JDBC fetch size, which determines how many rows to fetch per round trip. This can help performance on JDBC drivers which default to low fetch size (eg. Oracle with 10 rows). + + +truncate + + This is a JDBC writer related option. When SaveMode.Overwrite is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g. indices) from being removed. However, it will not work in some cases, such as when the new data has a different schema. It defaults to false. + + + + +createTableOptions + + This is a JDBC writer related option. If specified, this option allows setting of database-specific table and partition options when creating a table. For example: CREATE TABLE t (name string) ENGINE=InnoDB. + + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
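A short usage sketch of the two documented options. The connection string, credentials, table name, and the DataFrame `df` are placeholders, not part of the commit:

```scala
import java.util.Properties
import org.apache.spark.sql.SaveMode

val url = "jdbc:mysql://localhost:3306/test"        // placeholder connection string
val props = new Properties()
props.setProperty("user", "username")
props.setProperty("password", "password")

// `df` is an assumed DataFrame already in scope.
df.write
  .mode(SaveMode.Overwrite)
  .option("truncate", "true")                        // truncate the existing table instead of drop + recreate
  .option("createTableOptions", "ENGINE=InnoDB")     // appended if Spark has to create the table
  .jdbc(url, "t", props)
```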
spark git commit: [SPARK-17127] Make unaligned access in unsafe available for AArch64
Repository: spark Updated Branches: refs/heads/master b2074b664 -> 083de00cb [SPARK-17127] Make unaligned access in unsafe available for AArch64 ## What changes were proposed in this pull request? Since Spark 2.0.0, when MemoryMode.OFF_HEAP is set, Spark checks whether the architecture supports unaligned access; if the check fails, an exception is raised. AArch64 also supports unaligned access, but currently only i386, x86, amd64, and x86_64 are included, so this change adds aarch64 to the check. ## How was this patch tested? Unit test suite Author: RichaelCloses #14700 from yimuxi/zym_change_unsafe. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/083de00c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/083de00c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/083de00c Branch: refs/heads/master Commit: 083de00cb608a7414aae99a639825482bebfea8a Parents: b2074b6 Author: Richael Authored: Mon Aug 22 09:01:50 2016 +0100 Committer: Sean Owen Committed: Mon Aug 22 09:01:50 2016 +0100 -- common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/083de00c/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java -- diff --git a/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java b/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java index a2ee45c..c892b9c 100644 --- a/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java +++ b/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java @@ -55,7 +55,7 @@ public final class Platform { // We at least know x86 and x64 support unaligned access. String arch = System.getProperty("os.arch", ""); //noinspection DynamicRegexReplaceableByCompiledPattern - _unaligned = arch.matches("^(i[3-6]86|x86(_64)?|x64|amd64)$"); + _unaligned = arch.matches("^(i[3-6]86|x86(_64)?|x64|amd64|aarch64)$"); } unaligned = _unaligned; } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
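A quick way to see the effect of the one-line change is to run the extended pattern against a few `os.arch` strings. A small sketch — the pattern is copied from the patched line, the sample strings are illustrative:

```scala
val unalignedArchPattern = "^(i[3-6]86|x86(_64)?|x64|amd64|aarch64)$"

Seq("x86_64", "amd64", "i686", "aarch64", "ppc64le").foreach { arch =>
  println(f"$arch%-8s supportsUnaligned=${arch.matches(unalignedArchPattern)}")
}
// aarch64 now matches, so the OFF_HEAP unaligned-access check passes on ARM64.
```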