spark git commit: [SPARKR][MINOR] Update R DESCRIPTION file

2016-08-22 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 eaea1c86b -> d16f9a0b7


[SPARKR][MINOR] Update R DESCRIPTION file

## What changes were proposed in this pull request?

Update DESCRIPTION

## How was this patch tested?

Run install and CRAN tests

Author: Felix Cheung 

Closes #14764 from felixcheung/rpackagedescription.

(cherry picked from commit d2b3d3e63e1a9217de6ef507c350308017664a62)
Signed-off-by: Xiangrui Meng 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d16f9a0b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d16f9a0b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d16f9a0b

Branch: refs/heads/branch-2.0
Commit: d16f9a0b7c464728d7b11899740908e23820a797
Parents: eaea1c8
Author: Felix Cheung 
Authored: Mon Aug 22 20:15:03 2016 -0700
Committer: Xiangrui Meng 
Committed: Mon Aug 22 20:15:14 2016 -0700

--
 R/pkg/DESCRIPTION | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d16f9a0b/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index d81f1a3..e5afed2 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -3,10 +3,15 @@ Type: Package
 Title: R Frontend for Apache Spark
 Version: 2.0.0
 Date: 2016-07-07
-Author: The Apache Software Foundation
-Maintainer: Shivaram Venkataraman 
-Xiangrui Meng 
-Felix Cheung 
+Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
+email = "shiva...@cs.berkeley.edu"),
+ person("Xiangrui", "Meng", role = "aut",
+email = "m...@databricks.com"),
+ person("Felix", "Cheung", role = "aut",
+email = "felixche...@apache.org"),
+ person(family = "The Apache Software Foundation", role = c("aut", 
"cph")))
+URL: http://www.apache.org/ http://spark.apache.org/
+BugReports: 
https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12315420=12325400=4
 Depends:
 R (>= 3.0),
 methods


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-17182][SQL] Mark Collect as non-deterministic

2016-08-22 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 225898961 -> eaea1c86b


[SPARK-17182][SQL] Mark Collect as non-deterministic

## What changes were proposed in this pull request?

This PR marks the abstract class `Collect` as non-deterministic since the 
results of `CollectList` and `CollectSet` depend on the actual order of input 
rows.
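
To illustrate the order dependence, here is a minimal local sketch (not part of
the patch; the session setup and column names are only for demonstration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_list

object CollectOrderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[4]").appName("collect-order").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("a", 2), ("a", 3), ("a", 4)).toDF("k", "v")

    // After a multi-partition shuffle the order in which rows reach the aggregation
    // buffer is not fixed, so the array built by collect_list can differ between runs;
    // marking Collect as non-deterministic keeps the optimizer from assuming otherwise.
    df.repartition(4, $"v")
      .groupBy("k")
      .agg(collect_list($"v").as("vs"))
      .show(truncate = false)

    spark.stop()
  }
}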

## How was this patch tested?

Existing test cases should be enough.

Author: Cheng Lian 

Closes #14749 from liancheng/spark-17182-non-deterministic-collect.

(cherry picked from commit 2cdd92a7cd6f85186c846635b422b977bdafbcdd)
Signed-off-by: Wenchen Fan 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/eaea1c86
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/eaea1c86
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/eaea1c86

Branch: refs/heads/branch-2.0
Commit: eaea1c86b897d302107a9b6833a27a2b24ca31a0
Parents: 2258989
Author: Cheng Lian 
Authored: Tue Aug 23 09:11:47 2016 +0800
Committer: Wenchen Fan 
Committed: Tue Aug 23 09:14:47 2016 +0800

--
 .../spark/sql/catalyst/expressions/aggregate/collect.scala   | 4 
 1 file changed, 4 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/eaea1c86/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
index ac2cefa..896ff61 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
@@ -54,6 +54,10 @@ abstract class Collect extends ImperativeAggregate {
 
   override def inputAggBufferAttributes: Seq[AttributeReference] = Nil
 
+  // Both `CollectList` and `CollectSet` are non-deterministic since their 
results depend on the
+  // actual order of input rows.
+  override def deterministic: Boolean = false
+
   protected[this] val buffer: Growable[Any] with Iterable[Any]
 
   override def initialize(b: MutableRow): Unit = {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-17182][SQL] Mark Collect as non-deterministic

2016-08-22 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/master 920806ab2 -> 2cdd92a7c


[SPARK-17182][SQL] Mark Collect as non-deterministic

## What changes were proposed in this pull request?

This PR marks the abstract class `Collect` as non-deterministic since the 
results of `CollectList` and `CollectSet` depend on the actual order of input 
rows.

## How was this patch tested?

Existing test cases should be enough.

Author: Cheng Lian 

Closes #14749 from liancheng/spark-17182-non-deterministic-collect.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2cdd92a7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2cdd92a7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2cdd92a7

Branch: refs/heads/master
Commit: 2cdd92a7cd6f85186c846635b422b977bdafbcdd
Parents: 920806a
Author: Cheng Lian 
Authored: Tue Aug 23 09:11:47 2016 +0800
Committer: Wenchen Fan 
Committed: Tue Aug 23 09:11:47 2016 +0800

--
 .../spark/sql/catalyst/expressions/aggregate/collect.scala   | 4 
 1 file changed, 4 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2cdd92a7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
index ac2cefa..896ff61 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
@@ -54,6 +54,10 @@ abstract class Collect extends ImperativeAggregate {
 
   override def inputAggBufferAttributes: Seq[AttributeReference] = Nil
 
+  // Both `CollectList` and `CollectSet` are non-deterministic since their 
results depend on the
+  // actual order of input rows.
+  override def deterministic: Boolean = false
+
   protected[this] val buffer: Growable[Any] with Iterable[Any]
 
   override def initialize(b: MutableRow): Unit = {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-16577][SPARKR] Add CRAN documentation checks to run-tests.sh

2016-08-22 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 ff2f87380 -> 225898961


[SPARK-16577][SPARKR] Add CRAN documentation checks to run-tests.sh

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

This change adds CRAN documentation checks to be run as part of
`R/run-tests.sh`. As this script is also used by Jenkins, this means that we
will get documentation checks on every PR going forward.

(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Author: Shivaram Venkataraman 

Closes #14759 from shivaram/sparkr-cran-jenkins.

(cherry picked from commit 920806ab272ba58a369072a5eeb89df5e9b470a6)
Signed-off-by: Shivaram Venkataraman 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/22589896
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/22589896
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/22589896

Branch: refs/heads/branch-2.0
Commit: 225898961bc4bc71d56f33c027adbb2d0929ae5a
Parents: ff2f873
Author: Shivaram Venkataraman 
Authored: Mon Aug 22 17:09:32 2016 -0700
Committer: Shivaram Venkataraman 
Committed: Mon Aug 22 17:09:44 2016 -0700

--
 R/check-cran.sh | 18 +++---
 R/run-tests.sh  | 27 ---
 2 files changed, 39 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/22589896/R/check-cran.sh
--
diff --git a/R/check-cran.sh b/R/check-cran.sh
index 5c90fd0..bb33146 100755
--- a/R/check-cran.sh
+++ b/R/check-cran.sh
@@ -43,10 +43,22 @@ $FWDIR/create-docs.sh
 "$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
 
 # Run check as-cran.
-# TODO(shivaram): Remove the skip tests once we figure out the install 
mechanism
-
 VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'`
 
-"$R_SCRIPT_PATH/"R CMD check --as-cran SparkR_"$VERSION".tar.gz
+CRAN_CHECK_OPTIONS="--as-cran"
+
+if [ -n "$NO_TESTS" ]
+then
+  CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-tests"
+fi
+
+if [ -n "$NO_MANUAL" ]
+then
+  CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual"
+fi
+
+echo "Running CRAN check with $CRAN_CHECK_OPTIONS options"
+
+"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
 
 popd > /dev/null

http://git-wip-us.apache.org/repos/asf/spark/blob/22589896/R/run-tests.sh
--
diff --git a/R/run-tests.sh b/R/run-tests.sh
index 9dcf0ac..1a1e8ab 100755
--- a/R/run-tests.sh
+++ b/R/run-tests.sh
@@ -26,6 +26,17 @@ rm -f $LOGFILE
 SPARK_TESTING=1 $FWDIR/../bin/spark-submit --driver-java-options 
"-Dlog4j.configuration=file:$FWDIR/log4j.properties" --conf 
spark.hadoop.fs.default.name="file:///" $FWDIR/pkg/tests/run-all.R 2>&1 | tee 
-a $LOGFILE
 FAILED=$((PIPESTATUS[0]||$FAILED))
 
+# Also run the documentation tests for CRAN
+CRAN_CHECK_LOG_FILE=$FWDIR/cran-check.out
+rm -f $CRAN_CHECK_LOG_FILE
+
+NO_TESTS=1 NO_MANUAL=1 $FWDIR/check-cran.sh 2>&1 | tee -a $CRAN_CHECK_LOG_FILE
+FAILED=$((PIPESTATUS[0]||$FAILED))
+
+NUM_CRAN_WARNING="$(grep -c WARNING$ $CRAN_CHECK_LOG_FILE)"
+NUM_CRAN_ERROR="$(grep -c ERROR$ $CRAN_CHECK_LOG_FILE)"
+NUM_CRAN_NOTES="$(grep -c NOTE$ $CRAN_CHECK_LOG_FILE)"
+
 if [[ $FAILED != 0 ]]; then
 cat $LOGFILE
 echo -en "\033[31m"  # Red
@@ -33,7 +44,17 @@ if [[ $FAILED != 0 ]]; then
 echo -en "\033[0m"  # No color
 exit -1
 else
-echo -en "\033[32m"  # Green
-echo "Tests passed."
-echo -en "\033[0m"  # No color
+# We have 2 existing NOTEs for new maintainer, attach()
+# We have one more NOTE in Jenkins due to "No repository set"
+if [[ $NUM_CRAN_WARNING != 0 || $NUM_CRAN_ERROR != 0 || $NUM_CRAN_NOTES 
-gt 3 ]]; then
+  cat $CRAN_CHECK_LOG_FILE
+  echo -en "\033[31m"  # Red
+  echo "Had CRAN check errors; see logs."
+  echo -en "\033[0m"  # No color
+  exit -1
+else
+  echo -en "\033[32m"  # Green
+  echo "Tests passed."
+  echo -en "\033[0m"  # No color
+fi
 fi


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-16577][SPARKR] Add CRAN documentation checks to run-tests.sh

2016-08-22 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/master 37f0ab70d -> 920806ab2


[SPARK-16577][SPARKR] Add CRAN documentation checks to run-tests.sh

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

This change adds CRAN documentation checks to be run as part of
`R/run-tests.sh`. As this script is also used by Jenkins, this means that we
will get documentation checks on every PR going forward.

(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Author: Shivaram Venkataraman 

Closes #14759 from shivaram/sparkr-cran-jenkins.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/920806ab
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/920806ab
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/920806ab

Branch: refs/heads/master
Commit: 920806ab272ba58a369072a5eeb89df5e9b470a6
Parents: 37f0ab7
Author: Shivaram Venkataraman 
Authored: Mon Aug 22 17:09:32 2016 -0700
Committer: Shivaram Venkataraman 
Committed: Mon Aug 22 17:09:32 2016 -0700

--
 R/check-cran.sh | 18 +++---
 R/run-tests.sh  | 27 ---
 2 files changed, 39 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/920806ab/R/check-cran.sh
--
diff --git a/R/check-cran.sh b/R/check-cran.sh
index 5c90fd0..bb33146 100755
--- a/R/check-cran.sh
+++ b/R/check-cran.sh
@@ -43,10 +43,22 @@ $FWDIR/create-docs.sh
 "$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
 
 # Run check as-cran.
-# TODO(shivaram): Remove the skip tests once we figure out the install 
mechanism
-
 VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'`
 
-"$R_SCRIPT_PATH/"R CMD check --as-cran SparkR_"$VERSION".tar.gz
+CRAN_CHECK_OPTIONS="--as-cran"
+
+if [ -n "$NO_TESTS" ]
+then
+  CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-tests"
+fi
+
+if [ -n "$NO_MANUAL" ]
+then
+  CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual"
+fi
+
+echo "Running CRAN check with $CRAN_CHECK_OPTIONS options"
+
+"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
 
 popd > /dev/null

http://git-wip-us.apache.org/repos/asf/spark/blob/920806ab/R/run-tests.sh
--
diff --git a/R/run-tests.sh b/R/run-tests.sh
index 9dcf0ac..1a1e8ab 100755
--- a/R/run-tests.sh
+++ b/R/run-tests.sh
@@ -26,6 +26,17 @@ rm -f $LOGFILE
 SPARK_TESTING=1 $FWDIR/../bin/spark-submit --driver-java-options 
"-Dlog4j.configuration=file:$FWDIR/log4j.properties" --conf 
spark.hadoop.fs.default.name="file:///" $FWDIR/pkg/tests/run-all.R 2>&1 | tee 
-a $LOGFILE
 FAILED=$((PIPESTATUS[0]||$FAILED))
 
+# Also run the documentation tests for CRAN
+CRAN_CHECK_LOG_FILE=$FWDIR/cran-check.out
+rm -f $CRAN_CHECK_LOG_FILE
+
+NO_TESTS=1 NO_MANUAL=1 $FWDIR/check-cran.sh 2>&1 | tee -a $CRAN_CHECK_LOG_FILE
+FAILED=$((PIPESTATUS[0]||$FAILED))
+
+NUM_CRAN_WARNING="$(grep -c WARNING$ $CRAN_CHECK_LOG_FILE)"
+NUM_CRAN_ERROR="$(grep -c ERROR$ $CRAN_CHECK_LOG_FILE)"
+NUM_CRAN_NOTES="$(grep -c NOTE$ $CRAN_CHECK_LOG_FILE)"
+
 if [[ $FAILED != 0 ]]; then
 cat $LOGFILE
 echo -en "\033[31m"  # Red
@@ -33,7 +44,17 @@ if [[ $FAILED != 0 ]]; then
 echo -en "\033[0m"  # No color
 exit -1
 else
-echo -en "\033[32m"  # Green
-echo "Tests passed."
-echo -en "\033[0m"  # No color
+# We have 2 existing NOTEs for new maintainer, attach()
+# We have one more NOTE in Jenkins due to "No repository set"
+if [[ $NUM_CRAN_WARNING != 0 || $NUM_CRAN_ERROR != 0 || $NUM_CRAN_NOTES 
-gt 3 ]]; then
+  cat $CRAN_CHECK_LOG_FILE
+  echo -en "\033[31m"  # Red
+  echo "Had CRAN check errors; see logs."
+  echo -en "\033[0m"  # No color
+  exit -1
+else
+  echo -en "\033[32m"  # Green
+  echo "Tests passed."
+  echo -en "\033[0m"  # No color
+fi
 fi


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-17090][FOLLOW-UP][ML] Add expert param support to SharedParamsCodeGen

2016-08-22 Thread yliang
Repository: spark
Updated Branches:
  refs/heads/master 6d93f9e02 -> 37f0ab70d


[SPARK-17090][FOLLOW-UP][ML] Add expert param support to SharedParamsCodeGen

## What changes were proposed in this pull request?

Add expert param support to SharedParamsCodeGen, where aggregationDepth is added
as an expert param.

Author: hqzizania 

Closes #14738 from hqzizania/SPARK-17090-minor.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/37f0ab70
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/37f0ab70
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/37f0ab70

Branch: refs/heads/master
Commit: 37f0ab70d25802b609317bc93421d2fe3ee9db6e
Parents: 6d93f9e
Author: hqzizania 
Authored: Mon Aug 22 17:09:08 2016 -0700
Committer: Yanbo Liang 
Committed: Mon Aug 22 17:09:08 2016 -0700

--
 .../spark/ml/param/shared/SharedParamsCodeGen.scala   | 14 ++
 .../apache/spark/ml/param/shared/sharedParams.scala   |  4 ++--
 2 files changed, 12 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/37f0ab70/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
 
b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
index 0f48a16..480b03d 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
@@ -80,7 +80,7 @@ private[shared] object SharedParamsCodeGen {
   ParamDesc[String]("solver", "the solver algorithm for optimization. If 
this is not set or " +
 "empty, default value is 'auto'", Some("\"auto\"")),
   ParamDesc[Int]("aggregationDepth", "suggested depth for treeAggregate 
(>= 2)", Some("2"),
-isValid = "ParamValidators.gtEq(2)"))
+isValid = "ParamValidators.gtEq(2)", isExpertParam = true))
 
 val code = genSharedParams(params)
 val file = 
"src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala"
@@ -95,7 +95,8 @@ private[shared] object SharedParamsCodeGen {
   doc: String,
   defaultValueStr: Option[String] = None,
   isValid: String = "",
-  finalMethods: Boolean = true) {
+  finalMethods: Boolean = true,
+  isExpertParam: Boolean = false) {
 
 require(name.matches("[a-z][a-zA-Z0-9]*"), s"Param name $name is invalid.")
 require(doc.nonEmpty) // TODO: more rigorous on doc
@@ -153,6 +154,11 @@ private[shared] object SharedParamsCodeGen {
 } else {
   ""
 }
+val groupStr = if (param.isExpertParam) {
+  Array("expertParam", "expertGetParam")
+} else {
+  Array("param", "getParam")
+}
 val methodStr = if (param.finalMethods) {
   "final def"
 } else {
@@ -167,11 +173,11 @@ private[shared] object SharedParamsCodeGen {
   |
   |  /**
   |   * Param for $doc.
-  |   * @group param
+  |   * @group ${groupStr(0)}
   |   */
   |  final val $name: $Param = new $Param(this, "$name", "$doc"$isValid)
   |$setDefault
-  |  /** @group getParam */
+  |  /** @group ${groupStr(1)} */
   |  $methodStr get$Name: $T = $$($name)
   |}
   |""".stripMargin

http://git-wip-us.apache.org/repos/asf/spark/blob/37f0ab70/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala 
b/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
index 6803772..9125d9e 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
@@ -397,13 +397,13 @@ private[ml] trait HasAggregationDepth extends Params {
 
   /**
* Param for suggested depth for treeAggregate (>= 2).
-   * @group param
+   * @group expertParam
*/
   final val aggregationDepth: IntParam = new IntParam(this, 
"aggregationDepth", "suggested depth for treeAggregate (>= 2)", 
ParamValidators.gtEq(2))
 
   setDefault(aggregationDepth, 2)
 
-  /** @group getParam */
+  /** @group expertGetParam */
   final def getAggregationDepth: Int = $(aggregationDepth)
 }
 // scalastyle:on


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-17144][SQL] Removal of useless CreateHiveTableAsSelectLogicalPlan

2016-08-22 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/master 8e223ea67 -> 6d93f9e02


[SPARK-17144][SQL] Removal of useless CreateHiveTableAsSelectLogicalPlan

## What changes were proposed in this pull request?
`CreateHiveTableAsSelectLogicalPlan` is dead code after refactoring.

## How was this patch tested?
N/A

Author: gatorsmile 

Closes #14707 from gatorsmile/removeCreateHiveTable.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6d93f9e0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6d93f9e0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6d93f9e0

Branch: refs/heads/master
Commit: 6d93f9e0236aa61e39a1abfb0f7f7c558fb7d5d5
Parents: 8e223ea
Author: gatorsmile 
Authored: Tue Aug 23 08:03:08 2016 +0800
Committer: Wenchen Fan 
Committed: Tue Aug 23 08:03:08 2016 +0800

--
 .../spark/sql/execution/command/tables.scala | 19 +--
 1 file changed, 1 insertion(+), 18 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6d93f9e0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
index af2b5ff..21544a3 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
@@ -33,28 +33,11 @@ import org.apache.spark.sql.catalyst.catalog.{BucketSpec, 
CatalogTable, CatalogT
 import org.apache.spark.sql.catalyst.catalog.CatalogTableType._
 import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
 import org.apache.spark.sql.catalyst.expressions.{Attribute, 
AttributeReference}
-import org.apache.spark.sql.catalyst.plans.logical.{Command, LogicalPlan, 
UnaryNode}
 import org.apache.spark.sql.catalyst.util.quoteIdentifier
-import org.apache.spark.sql.execution.datasources.{PartitioningUtils}
+import org.apache.spark.sql.execution.datasources.PartitioningUtils
 import org.apache.spark.sql.types._
 import org.apache.spark.util.Utils
 
-case class CreateHiveTableAsSelectLogicalPlan(
-tableDesc: CatalogTable,
-child: LogicalPlan,
-allowExisting: Boolean) extends UnaryNode with Command {
-
-  override def output: Seq[Attribute] = Seq.empty[Attribute]
-
-  override lazy val resolved: Boolean =
-tableDesc.identifier.database.isDefined &&
-  tableDesc.schema.nonEmpty &&
-  tableDesc.storage.serde.isDefined &&
-  tableDesc.storage.inputFormat.isDefined &&
-  tableDesc.storage.outputFormat.isDefined &&
-  childrenResolved
-}
-
 /**
  * A command to create a table with the same definition of the given existing 
table.
  *


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in block manager replication

2016-08-22 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 71afeeea4 -> 8e223ea67


[SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in block 
manager replication

## What changes were proposed in this pull request?

This is a straightforward clone of JoshRosen's original patch. I have
follow-up changes to fix block replication for repl-defined classes as well,
but those appear to be flaking tests, so I'm going to leave that for SPARK-17042.
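
As a hedged sketch of the idea only (the helper and names below are hypothetical,
not the patch's actual code): when a block's element type is not on the Kryo fast
path, replication falls back to an erased class tag so the remote end never has to
resolve a repl-defined class.

import scala.reflect.ClassTag

// Hypothetical helper for illustration only: keep the concrete tag when the serializer
// manager's Kryo fast path (primitives, primitive arrays, strings) applies, otherwise
// erase it to Any so default serialization is used on both ends of the replication.
object ReplicationClassTagSketch {
  def pickClassTag(canUseKryo: ClassTag[_] => Boolean, tag: ClassTag[_]): ClassTag[_] =
    if (canUseKryo(tag)) tag else scala.reflect.classTag[Any]

  def main(args: Array[String]): Unit = {
    // Stand-in predicate; the real check is SerializerManager.canUseKryo, made public here.
    val canUseKryo: ClassTag[_] => Boolean =
      t => t == ClassTag.Int || t == ClassTag(classOf[String])
    println(pickClassTag(canUseKryo, ClassTag.Int))                 // kept as Int
    println(pickClassTag(canUseKryo, ClassTag(classOf[List[Int]]))) // erased to Any
  }
}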

## How was this patch tested?

End-to-end test in ReplSuite (also more tests in DistributedSuite from the 
original patch).

Author: Eric Liang 

Closes #14311 from ericl/spark-16550.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8e223ea6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8e223ea6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8e223ea6

Branch: refs/heads/master
Commit: 8e223ea67acf5aa730ccf688802f17f6fc10907c
Parents: 71afeee
Author: Eric Liang 
Authored: Mon Aug 22 16:32:14 2016 -0700
Committer: Reynold Xin 
Committed: Mon Aug 22 16:32:14 2016 -0700

--
 .../spark/serializer/SerializerManager.scala| 14 +++-
 .../org/apache/spark/storage/BlockManager.scala | 13 +++-
 .../org/apache/spark/DistributedSuite.scala | 77 ++--
 .../scala/org/apache/spark/repl/ReplSuite.scala | 14 
 4 files changed, 60 insertions(+), 58 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8e223ea6/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala 
b/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala
index 9dc274c..07caadb 100644
--- a/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala
@@ -68,7 +68,7 @@ private[spark] class SerializerManager(defaultSerializer: 
Serializer, conf: Spar
* loaded yet. */
   private lazy val compressionCodec: CompressionCodec = 
CompressionCodec.createCodec(conf)
 
-  private def canUseKryo(ct: ClassTag[_]): Boolean = {
+  def canUseKryo(ct: ClassTag[_]): Boolean = {
 primitiveAndPrimitiveArrayClassTags.contains(ct) || ct == stringClassTag
   }
 
@@ -128,8 +128,18 @@ private[spark] class SerializerManager(defaultSerializer: 
Serializer, conf: Spar
 
   /** Serializes into a chunked byte buffer. */
   def dataSerialize[T: ClassTag](blockId: BlockId, values: Iterator[T]): 
ChunkedByteBuffer = {
+dataSerializeWithExplicitClassTag(blockId, values, implicitly[ClassTag[T]])
+  }
+
+  /** Serializes into a chunked byte buffer. */
+  def dataSerializeWithExplicitClassTag(
+  blockId: BlockId,
+  values: Iterator[_],
+  classTag: ClassTag[_]): ChunkedByteBuffer = {
 val bbos = new ChunkedByteBufferOutputStream(1024 * 1024 * 4, 
ByteBuffer.allocate)
-dataSerializeStream(blockId, bbos, values)
+val byteStream = new BufferedOutputStream(bbos)
+val ser = getSerializer(classTag).newInstance()
+ser.serializeStream(wrapForCompression(blockId, 
byteStream)).writeAll(values).close()
 bbos.toChunkedByteBuffer
   }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/8e223ea6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
--
diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
index 015e71d..fe84652 100644
--- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
+++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
@@ -498,7 +498,8 @@ private[spark] class BlockManager(
 diskStore.getBytes(blockId)
   } else if (level.useMemory && memoryStore.contains(blockId)) {
 // The block was not found on disk, so serialize an in-memory copy:
-serializerManager.dataSerialize(blockId, 
memoryStore.getValues(blockId).get)
+serializerManager.dataSerializeWithExplicitClassTag(
+  blockId, memoryStore.getValues(blockId).get, info.classTag)
   } else {
 handleLocalReadFailure(blockId)
   }
@@ -973,8 +974,16 @@ private[spark] class BlockManager(
 if (level.replication > 1) {
   val remoteStartTime = System.currentTimeMillis
   val bytesToReplicate = doGetLocalBytes(blockId, info)
+  // [SPARK-16550] Erase the typed classTag when using default 
serialization, since
+  // NettyBlockRpcServer crashes when deserializing repl-defined 
classes.
+  // TODO(ekl) remove this once the classloader issue on 

spark git commit: [SPARK-16508][SPARKR] doc updates and more CRAN check fixes

2016-08-22 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 01a4d69f3 -> b65b041af


[SPARK-16508][SPARKR] doc updates and more CRAN check fixes

replace ``` ` ``` in code doc with `\code{thing}`
remove added `...` for drop(DataFrame)
fix remaining CRAN check warnings

create doc with knitr

junyangq

Author: Felix Cheung 

Closes #14734 from felixcheung/rdoccleanup.

(cherry picked from commit 71afeeea4ec8e67edc95b5d504c557c88a2598b9)
Signed-off-by: Shivaram Venkataraman 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b65b041a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b65b041a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b65b041a

Branch: refs/heads/branch-2.0
Commit: b65b041af8b64413c7d460d4ea110b2044d6f36e
Parents: 01a4d69
Author: Felix Cheung 
Authored: Mon Aug 22 15:53:10 2016 -0700
Committer: Shivaram Venkataraman 
Committed: Mon Aug 22 16:17:18 2016 -0700

--
 R/pkg/NAMESPACE  |  6 -
 R/pkg/R/DataFrame.R  | 69 +++
 R/pkg/R/RDD.R| 10 +++
 R/pkg/R/SQLContext.R | 30 ++---
 R/pkg/R/WindowSpec.R | 23 
 R/pkg/R/column.R |  2 +-
 R/pkg/R/functions.R  | 36 -
 R/pkg/R/generics.R   | 14 +-
 R/pkg/R/group.R  |  1 +
 R/pkg/R/mllib.R  |  5 ++--
 R/pkg/R/pairRDD.R|  6 ++---
 R/pkg/R/stats.R  | 14 +-
 12 files changed, 110 insertions(+), 106 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b65b041a/R/pkg/NAMESPACE
--
diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index aaab92f..cdb8834 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -1,5 +1,9 @@
 # Imports from base R
-importFrom(methods, setGeneric, setMethod, setOldClass)
+# Do not include stats:: "rpois", "runif" - causes error at runtime
+importFrom("methods", "setGeneric", "setMethod", "setOldClass")
+importFrom("methods", "is", "new", "signature", "show")
+importFrom("stats", "gaussian", "setNames")
+importFrom("utils", "download.file", "packageVersion", "untar")
 
 # Disable native libraries till we figure out how to package it
 # See SPARKR-7839

http://git-wip-us.apache.org/repos/asf/spark/blob/b65b041a/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 0266939..f8a05c6 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -150,7 +150,7 @@ setMethod("explain",
 
 #' isLocal
 #'
-#' Returns True if the `collect` and `take` methods can be run locally
+#' Returns True if the \code{collect} and \code{take} methods can be run 
locally
 #' (without any Spark executors).
 #'
 #' @param x A SparkDataFrame
@@ -635,10 +635,10 @@ setMethod("unpersist",
 #' The following options for repartition are possible:
 #' \itemize{
 #'  \item{1.} {Return a new SparkDataFrame partitioned by
-#'  the given columns into `numPartitions`.}
-#'  \item{2.} {Return a new SparkDataFrame that has exactly `numPartitions`.}
+#'  the given columns into \code{numPartitions}.}
+#'  \item{2.} {Return a new SparkDataFrame that has exactly 
\code{numPartitions}.}
 #'  \item{3.} {Return a new SparkDataFrame partitioned by the given column(s),
-#'  using `spark.sql.shuffle.partitions` as number of 
partitions.}
+#'  using \code{spark.sql.shuffle.partitions} as number of 
partitions.}
 #'}
 #' @param x a SparkDataFrame.
 #' @param numPartitions the number of partitions to use.
@@ -1125,9 +1125,8 @@ setMethod("take",
 
 #' Head
 #'
-#' Return the first NUM rows of a SparkDataFrame as a R data.frame. If NUM is 
NULL,
-#' then head() returns the first 6 rows in keeping with the current data.frame
-#' convention in R.
+#' Return the first \code{num} rows of a SparkDataFrame as a R data.frame. If 
\code{num} is not
+#' specified, then head() returns the first 6 rows as with R data.frame.
 #'
 #' @param x a SparkDataFrame.
 #' @param num the number of rows to return. Default is 6.
@@ -1399,11 +1398,11 @@ setMethod("dapplyCollect",
 #'
 #' @param cols grouping columns.
 #' @param func a function to be applied to each group partition specified by 
grouping
-#' column of the SparkDataFrame. The function `func` takes as 
argument
+#' column of the SparkDataFrame. The function \code{func} takes as 
argument
 #' a key - grouping columns and a data frame - a local R 
data.frame.
-#' The output of `func` is a local R data.frame.
+#' The output of \code{func} is a local R data.frame.
 #' 

spark git commit: [SPARK-16508][SPARKR] doc updates and more CRAN check fixes

2016-08-22 Thread felixcheung
Repository: spark
Updated Branches:
  refs/heads/master 84770b59f -> 71afeeea4


[SPARK-16508][SPARKR] doc updates and more CRAN check fixes

## What changes were proposed in this pull request?

replace ``` ` ``` in code doc with `\code{thing}`
remove added `...` for drop(DataFrame)
fix remaining CRAN check warnings

## How was this patch tested?

create doc with knitr

junyangq

Author: Felix Cheung 

Closes #14734 from felixcheung/rdoccleanup.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/71afeeea
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/71afeeea
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/71afeeea

Branch: refs/heads/master
Commit: 71afeeea4ec8e67edc95b5d504c557c88a2598b9
Parents: 84770b5
Author: Felix Cheung 
Authored: Mon Aug 22 15:53:10 2016 -0700
Committer: Felix Cheung 
Committed: Mon Aug 22 15:53:10 2016 -0700

--
 R/pkg/NAMESPACE  |  6 +++-
 R/pkg/R/DataFrame.R  | 71 +++
 R/pkg/R/RDD.R| 10 +++
 R/pkg/R/SQLContext.R | 30 ++--
 R/pkg/R/WindowSpec.R | 23 +++
 R/pkg/R/column.R |  2 +-
 R/pkg/R/functions.R  | 36 
 R/pkg/R/generics.R   | 15 +-
 R/pkg/R/group.R  |  1 +
 R/pkg/R/mllib.R  | 19 +++--
 R/pkg/R/pairRDD.R|  6 ++--
 R/pkg/R/stats.R  | 14 +-
 12 files changed, 119 insertions(+), 114 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/71afeeea/R/pkg/NAMESPACE
--
diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index e1b87b2..7090576 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -1,5 +1,9 @@
 # Imports from base R
-importFrom(methods, setGeneric, setMethod, setOldClass)
+# Do not include stats:: "rpois", "runif" - causes error at runtime
+importFrom("methods", "setGeneric", "setMethod", "setOldClass")
+importFrom("methods", "is", "new", "signature", "show")
+importFrom("stats", "gaussian", "setNames")
+importFrom("utils", "download.file", "packageVersion", "untar")
 
 # Disable native libraries till we figure out how to package it
 # See SPARKR-7839

http://git-wip-us.apache.org/repos/asf/spark/blob/71afeeea/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 540dc31..52a6628 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -150,7 +150,7 @@ setMethod("explain",
 
 #' isLocal
 #'
-#' Returns True if the `collect` and `take` methods can be run locally
+#' Returns True if the \code{collect} and \code{take} methods can be run 
locally
 #' (without any Spark executors).
 #'
 #' @param x A SparkDataFrame
@@ -182,7 +182,7 @@ setMethod("isLocal",
 #' @param numRows the number of rows to print. Defaults to 20.
 #' @param truncate whether truncate long strings. If \code{TRUE}, strings more 
than
 #' 20 characters will be truncated. However, if set greater 
than zero,
-#' truncates strings longer than `truncate` characters and all 
cells
+#' truncates strings longer than \code{truncate} characters 
and all cells
 #' will be aligned right.
 #' @param ... further arguments to be passed to or from other methods.
 #' @family SparkDataFrame functions
@@ -642,10 +642,10 @@ setMethod("unpersist",
 #' The following options for repartition are possible:
 #' \itemize{
 #'  \item{1.} {Return a new SparkDataFrame partitioned by
-#'  the given columns into `numPartitions`.}
-#'  \item{2.} {Return a new SparkDataFrame that has exactly `numPartitions`.}
+#'  the given columns into \code{numPartitions}.}
+#'  \item{2.} {Return a new SparkDataFrame that has exactly 
\code{numPartitions}.}
 #'  \item{3.} {Return a new SparkDataFrame partitioned by the given column(s),
-#'  using `spark.sql.shuffle.partitions` as number of 
partitions.}
+#'  using \code{spark.sql.shuffle.partitions} as number of 
partitions.}
 #'}
 #' @param x a SparkDataFrame.
 #' @param numPartitions the number of partitions to use.
@@ -1132,9 +1132,8 @@ setMethod("take",
 
 #' Head
 #'
-#' Return the first NUM rows of a SparkDataFrame as a R data.frame. If NUM is 
NULL,
-#' then head() returns the first 6 rows in keeping with the current data.frame
-#' convention in R.
+#' Return the first \code{num} rows of a SparkDataFrame as a R data.frame. If 
\code{num} is not
+#' specified, then head() returns the first 6 rows as with R data.frame.
 #'
 #' @param x a SparkDataFrame.
 #' @param num the number of rows to return. Default is 6.
@@ -1406,11 +1405,11 

spark git commit: [SPARK-17162] Range does not support SQL generation

2016-08-22 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 6dcc1a3f0 -> 01a4d69f3


[SPARK-17162] Range does not support SQL generation

## What changes were proposed in this pull request?

The range operator previously didn't support SQL generation, which made it
impossible to use in views.
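
A minimal sketch of the view use case this unblocks, assuming a Hive-enabled local
session (persistent CREATE VIEW is what exercises SQL generation here); the object
and view names are illustrative:

import org.apache.spark.sql.SparkSession

object RangeViewSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("range-view-sketch")
      .enableHiveSupport() // persistent views store generated SQL text for the plan
      .getOrCreate()

    // The view definition requires the analyzed Range plan to be convertible back
    // into SQL text; that conversion is what this patch adds.
    spark.sql("CREATE VIEW even_ids AS SELECT id FROM range(0, 10, 2)")
    spark.sql("SELECT count(*) AS n FROM even_ids").show()

    spark.stop()
  }
}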

## How was this patch tested?

Unit tests.

cc hvanhovell

Author: Eric Liang 

Closes #14724 from ericl/spark-17162.

(cherry picked from commit 84770b59f773f132073cd2af4204957fc2d7bf35)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/01a4d69f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/01a4d69f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/01a4d69f

Branch: refs/heads/branch-2.0
Commit: 01a4d69f309a1cc8d370ce9f85e6a4f31b6db3b8
Parents: 6dcc1a3
Author: Eric Liang 
Authored: Mon Aug 22 15:48:35 2016 -0700
Committer: Reynold Xin 
Committed: Mon Aug 22 15:48:43 2016 -0700

--
 .../analysis/ResolveTableValuedFunctions.scala  | 11 --
 .../plans/logical/basicLogicalOperators.scala   | 21 +---
 .../apache/spark/sql/catalyst/SQLBuilder.scala  |  3 +++
 .../sql/execution/basicPhysicalOperators.scala  |  2 +-
 .../spark/sql/execution/command/views.scala |  3 +--
 sql/hive/src/test/resources/sqlgen/range.sql|  4 
 .../test/resources/sqlgen/range_with_splits.sql |  4 
 .../sql/catalyst/LogicalPlanToSQLSuite.scala| 14 -
 8 files changed, 44 insertions(+), 18 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/01a4d69f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
index 7fdf7fa..6b3bb68 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
@@ -28,9 +28,6 @@ import org.apache.spark.sql.types.{DataType, IntegerType, 
LongType}
  * Rule that resolves table-valued function references.
  */
 object ResolveTableValuedFunctions extends Rule[LogicalPlan] {
-  private lazy val defaultParallelism =
-SparkContext.getOrCreate(new SparkConf(false)).defaultParallelism
-
   /**
* List of argument names and their types, used to declare a function.
*/
@@ -84,25 +81,25 @@ object ResolveTableValuedFunctions extends 
Rule[LogicalPlan] {
 "range" -> Map(
   /* range(end) */
   tvf("end" -> LongType) { case Seq(end: Long) =>
-Range(0, end, 1, defaultParallelism)
+Range(0, end, 1, None)
   },
 
   /* range(start, end) */
   tvf("start" -> LongType, "end" -> LongType) { case Seq(start: Long, end: 
Long) =>
-Range(start, end, 1, defaultParallelism)
+Range(start, end, 1, None)
   },
 
   /* range(start, end, step) */
   tvf("start" -> LongType, "end" -> LongType, "step" -> LongType) {
 case Seq(start: Long, end: Long, step: Long) =>
-  Range(start, end, step, defaultParallelism)
+  Range(start, end, step, None)
   },
 
   /* range(start, end, step, numPartitions) */
   tvf("start" -> LongType, "end" -> LongType, "step" -> LongType,
   "numPartitions" -> IntegerType) {
 case Seq(start: Long, end: Long, step: Long, numPartitions: Int) =>
-  Range(start, end, step, numPartitions)
+  Range(start, end, step, Some(numPartitions))
   })
   )
 

http://git-wip-us.apache.org/repos/asf/spark/blob/01a4d69f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
index eb612c4..07e39b0 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
@@ -422,17 +422,20 @@ case class Sort(
 
 /** Factory for constructing new `Range` nodes. */
 object Range {
-  def apply(start: Long, end: Long, step: Long, numSlices: Int): Range = {
+  def apply(start: Long, end: Long, step: Long, numSlices: Option[Int]): Range 
= {
 val 

spark git commit: [SPARK-17162] Range does not support SQL generation

2016-08-22 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 929cb8bee -> 84770b59f


[SPARK-17162] Range does not support SQL generation

## What changes were proposed in this pull request?

The range operator previously didn't support SQL generation, which made it
impossible to use in views.

## How was this patch tested?

Unit tests.

cc hvanhovell

Author: Eric Liang 

Closes #14724 from ericl/spark-17162.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/84770b59
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/84770b59
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/84770b59

Branch: refs/heads/master
Commit: 84770b59f773f132073cd2af4204957fc2d7bf35
Parents: 929cb8b
Author: Eric Liang 
Authored: Mon Aug 22 15:48:35 2016 -0700
Committer: Reynold Xin 
Committed: Mon Aug 22 15:48:35 2016 -0700

--
 .../analysis/ResolveTableValuedFunctions.scala  | 11 --
 .../plans/logical/basicLogicalOperators.scala   | 21 +---
 .../apache/spark/sql/catalyst/SQLBuilder.scala  |  3 +++
 .../sql/execution/basicPhysicalOperators.scala  |  2 +-
 .../spark/sql/execution/command/views.scala |  3 +--
 sql/hive/src/test/resources/sqlgen/range.sql|  4 
 .../test/resources/sqlgen/range_with_splits.sql |  4 
 .../sql/catalyst/LogicalPlanToSQLSuite.scala| 14 -
 8 files changed, 44 insertions(+), 18 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/84770b59/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
index 7fdf7fa..6b3bb68 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
@@ -28,9 +28,6 @@ import org.apache.spark.sql.types.{DataType, IntegerType, 
LongType}
  * Rule that resolves table-valued function references.
  */
 object ResolveTableValuedFunctions extends Rule[LogicalPlan] {
-  private lazy val defaultParallelism =
-SparkContext.getOrCreate(new SparkConf(false)).defaultParallelism
-
   /**
* List of argument names and their types, used to declare a function.
*/
@@ -84,25 +81,25 @@ object ResolveTableValuedFunctions extends 
Rule[LogicalPlan] {
 "range" -> Map(
   /* range(end) */
   tvf("end" -> LongType) { case Seq(end: Long) =>
-Range(0, end, 1, defaultParallelism)
+Range(0, end, 1, None)
   },
 
   /* range(start, end) */
   tvf("start" -> LongType, "end" -> LongType) { case Seq(start: Long, end: 
Long) =>
-Range(start, end, 1, defaultParallelism)
+Range(start, end, 1, None)
   },
 
   /* range(start, end, step) */
   tvf("start" -> LongType, "end" -> LongType, "step" -> LongType) {
 case Seq(start: Long, end: Long, step: Long) =>
-  Range(start, end, step, defaultParallelism)
+  Range(start, end, step, None)
   },
 
   /* range(start, end, step, numPartitions) */
   tvf("start" -> LongType, "end" -> LongType, "step" -> LongType,
   "numPartitions" -> IntegerType) {
 case Seq(start: Long, end: Long, step: Long, numPartitions: Int) =>
-  Range(start, end, step, numPartitions)
+  Range(start, end, step, Some(numPartitions))
   })
   )
 

http://git-wip-us.apache.org/repos/asf/spark/blob/84770b59/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
index af1736e..010aec7 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
@@ -422,17 +422,20 @@ case class Sort(
 
 /** Factory for constructing new `Range` nodes. */
 object Range {
-  def apply(start: Long, end: Long, step: Long, numSlices: Int): Range = {
+  def apply(start: Long, end: Long, step: Long, numSlices: Option[Int]): Range 
= {
 val output = StructType(StructField("id", LongType, nullable = false) :: 
Nil).toAttributes
 new Range(start, end, step, 

spark git commit: [MINOR][SQL] Fix some typos in comments and test hints

2016-08-22 Thread yhuai
Repository: spark
Updated Branches:
  refs/heads/master 6f3cd36f9 -> 929cb8bee


[MINOR][SQL] Fix some typos in comments and test hints

## What changes were proposed in this pull request?

Fix some typos in comments and test hints

## How was this patch tested?

N/A.

Author: Sean Zhong 

Closes #14755 from clockfly/fix_minor_typo.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/929cb8be
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/929cb8be
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/929cb8be

Branch: refs/heads/master
Commit: 929cb8beed9b7014231580cc002853236a5337d6
Parents: 6f3cd36
Author: Sean Zhong 
Authored: Mon Aug 22 13:31:38 2016 -0700
Committer: Yin Huai 
Committed: Mon Aug 22 13:31:38 2016 -0700

--
 .../org/apache/spark/sql/execution/UnsafeKVExternalSorter.java | 2 +-
 .../sql/execution/aggregate/TungstenAggregationIterator.scala  | 6 +++---
 sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala   | 6 +++---
 3 files changed, 7 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/929cb8be/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java
--
diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java
 
b/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java
index eb105bd..0d51dc9 100644
--- 
a/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java
+++ 
b/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java
@@ -99,7 +99,7 @@ public final class UnsafeKVExternalSorter {
   // The array will be used to do in-place sort, which require half of the 
space to be empty.
   assert(map.numKeys() <= map.getArray().size() / 2);
   // During spilling, the array in map will not be used, so we can borrow 
that and use it
-  // as the underline array for in-memory sorter (it's always large 
enough).
+  // as the underlying array for in-memory sorter (it's always large 
enough).
   // Since we will not grow the array, it's fine to pass `null` as 
consumer.
   final UnsafeInMemorySorter inMemSorter = new UnsafeInMemorySorter(
 null, taskMemoryManager, recordComparator, prefixComparator, 
map.getArray(),

http://git-wip-us.apache.org/repos/asf/spark/blob/929cb8be/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala
index 4b8adf5..4e072a9 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala
@@ -32,9 +32,9 @@ import org.apache.spark.unsafe.KVIterator
  * An iterator used to evaluate aggregate functions. It operates on 
[[UnsafeRow]]s.
  *
  * This iterator first uses hash-based aggregation to process input rows. It 
uses
- * a hash map to store groups and their corresponding aggregation buffers. If 
we
- * this map cannot allocate memory from memory manager, it spill the map into 
disk
- * and create a new one. After processed all the input, then merge all the 
spills
+ * a hash map to store groups and their corresponding aggregation buffers. If
+ * this map cannot allocate memory from memory manager, it spills the map into 
disk
+ * and creates a new one. After processed all the input, then merge all the 
spills
  * together using external sorter, and do sort-based aggregation.
  *
  * The process has the following step:

http://git-wip-us.apache.org/repos/asf/spark/blob/929cb8be/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala
index 484e438..c7af402 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala
@@ -358,11 +358,11 @@ abstract class QueryTest extends PlanTest {
*/
   def assertEmptyMissingInput(query: Dataset[_]): Unit = {
 assert(query.queryExecution.analyzed.missingInput.isEmpty,
-  s"The analyzed logical plan has missing inputs: 
${query.queryExecution.analyzed}")
+  s"The analyzed logical plan has missing 

spark git commit: [SPARKR][MINOR] Add Xiangrui and Felix to maintainers

2016-08-22 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 94eff0875 -> 6dcc1a3f0


[SPARKR][MINOR] Add Xiangrui and Felix to maintainers

## What changes were proposed in this pull request?

This change adds Xiangrui Meng and Felix Cheung to the maintainers field in the 
package description.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, 
manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Author: Shivaram Venkataraman 

Closes #14758 from shivaram/sparkr-maintainers.

(cherry picked from commit 6f3cd36f93c11265449fdce3323e139fec8ab22d)
Signed-off-by: Shivaram Venkataraman 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6dcc1a3f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6dcc1a3f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6dcc1a3f

Branch: refs/heads/branch-2.0
Commit: 6dcc1a3f0cc8f2ed71f7bb6b1493852a58259d2f
Parents: 94eff08
Author: Shivaram Venkataraman 
Authored: Mon Aug 22 12:53:52 2016 -0700
Committer: Shivaram Venkataraman 
Committed: Mon Aug 22 12:54:03 2016 -0700

--
 R/pkg/DESCRIPTION | 2 ++
 1 file changed, 2 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6dcc1a3f/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 357ab00..d81f1a3 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -5,6 +5,8 @@ Version: 2.0.0
 Date: 2016-07-07
 Author: The Apache Software Foundation
 Maintainer: Shivaram Venkataraman 
+Xiangrui Meng 
+Felix Cheung 
 Depends:
 R (>= 3.0),
 methods


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARKR][MINOR] Add Xiangrui and Felix to maintainers

2016-08-22 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/master 0583ecda1 -> 6f3cd36f9


[SPARKR][MINOR] Add Xiangrui and Felix to maintainers

## What changes were proposed in this pull request?

This change adds Xiangrui Meng and Felix Cheung to the maintainers field in the 
package description.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, 
manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Author: Shivaram Venkataraman 

Closes #14758 from shivaram/sparkr-maintainers.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6f3cd36f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6f3cd36f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6f3cd36f

Branch: refs/heads/master
Commit: 6f3cd36f93c11265449fdce3323e139fec8ab22d
Parents: 0583ecd
Author: Shivaram Venkataraman 
Authored: Mon Aug 22 12:53:52 2016 -0700
Committer: Shivaram Venkataraman 
Committed: Mon Aug 22 12:53:52 2016 -0700

--
 R/pkg/DESCRIPTION | 2 ++
 1 file changed, 2 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6f3cd36f/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 357ab00..d81f1a3 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -5,6 +5,8 @@ Version: 2.0.0
 Date: 2016-07-07
 Author: The Apache Software Foundation
 Maintainer: Shivaram Venkataraman 
+Xiangrui Meng 
+Felix Cheung 
 Depends:
 R (>= 3.0),
 methods


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-17173][SPARKR] R MLlib refactor, cleanup, reformat, fix deprecation in test

2016-08-22 Thread felixcheung
Repository: spark
Updated Branches:
  refs/heads/master 342278c09 -> 0583ecda1


[SPARK-17173][SPARKR] R MLlib refactor, cleanup, reformat, fix deprecation in 
test

## What changes were proposed in this pull request?

refactor, cleanup, reformat, fix deprecation in test

## How was this patch tested?

unit tests, manual tests

Author: Felix Cheung 

Closes #14735 from felixcheung/rmllibutil.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0583ecda
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0583ecda
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0583ecda

Branch: refs/heads/master
Commit: 0583ecda1b63a7e3f126c3276059e4f99548a741
Parents: 342278c
Author: Felix Cheung 
Authored: Mon Aug 22 12:27:33 2016 -0700
Committer: Felix Cheung 
Committed: Mon Aug 22 12:27:33 2016 -0700

--
 R/pkg/R/mllib.R| 205 
 R/pkg/inst/tests/testthat/test_mllib.R |  10 +-
 2 files changed, 98 insertions(+), 117 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0583ecda/R/pkg/R/mllib.R
--
diff --git a/R/pkg/R/mllib.R b/R/pkg/R/mllib.R
index 9a53c80..b36fbce 100644
--- a/R/pkg/R/mllib.R
+++ b/R/pkg/R/mllib.R
@@ -88,9 +88,9 @@ setClass("ALSModel", representation(jobj = "jobj"))
 #' @rdname write.ml
 #' @name write.ml
 #' @export
-#' @seealso \link{spark.glm}, \link{glm}, \link{spark.gaussianMixture}
-#' @seealso \link{spark.als}, \link{spark.kmeans}, \link{spark.lda}, 
\link{spark.naiveBayes}
-#' @seealso \link{spark.survreg}, \link{spark.isoreg}
+#' @seealso \link{spark.glm}, \link{glm},
+#' @seealso \link{spark.als}, \link{spark.gaussianMixture}, 
\link{spark.isoreg}, \link{spark.kmeans},
+#' @seealso \link{spark.lda}, \link{spark.naiveBayes}, \link{spark.survreg},
 #' @seealso \link{read.ml}
 NULL
 
@@ -101,11 +101,22 @@ NULL
 #' @rdname predict
 #' @name predict
 #' @export
-#' @seealso \link{spark.glm}, \link{glm}, \link{spark.gaussianMixture}
-#' @seealso \link{spark.als}, \link{spark.kmeans}, \link{spark.naiveBayes}, 
\link{spark.survreg}
-#' @seealso \link{spark.isoreg}
+#' @seealso \link{spark.glm}, \link{glm},
+#' @seealso \link{spark.als}, \link{spark.gaussianMixture}, 
\link{spark.isoreg}, \link{spark.kmeans},
+#' @seealso \link{spark.naiveBayes}, \link{spark.survreg},
 NULL
 
+write_internal <- function(object, path, overwrite = FALSE) {
+  writer <- callJMethod(object@jobj, "write")
+  if (overwrite) {
+writer <- callJMethod(writer, "overwrite")
+  }
+  invisible(callJMethod(writer, "save", path))
+}
+
+predict_internal <- function(object, newData) {
+  dataFrame(callJMethod(object@jobj, "transform", newData@sdf))
+}
 
 #' Generalized Linear Models
 #'
@@ -173,7 +184,7 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", 
formula = "formula"),
 jobj <- 
callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper",
 "fit", formula, data@sdf, family$family, 
family$link,
 tol, as.integer(maxIter), 
as.character(weightCol))
-return(new("GeneralizedLinearRegressionModel", jobj = jobj))
+new("GeneralizedLinearRegressionModel", jobj = jobj)
   })
 
 #' Generalized Linear Models (R-compliant)
@@ -219,7 +230,7 @@ setMethod("glm", signature(formula = "formula", family = 
"ANY", data = "SparkDat
 #' @export
 #' @note summary(GeneralizedLinearRegressionModel) since 2.0.0
 setMethod("summary", signature(object = "GeneralizedLinearRegressionModel"),
-  function(object, ...) {
+  function(object) {
 jobj <- object@jobj
 is.loaded <- callJMethod(jobj, "isLoaded")
 features <- callJMethod(jobj, "rFeatures")
@@ -245,7 +256,7 @@ setMethod("summary", signature(object = 
"GeneralizedLinearRegressionModel"),
 deviance = deviance, df.null = df.null, df.residual = 
df.residual,
 aic = aic, iter = iter, family = family, is.loaded = 
is.loaded)
 class(ans) <- "summary.GeneralizedLinearRegressionModel"
-return(ans)
+ans
   })
 
 #  Prints the summary of GeneralizedLinearRegressionModel
@@ -275,8 +286,7 @@ print.summary.GeneralizedLinearRegressionModel <- 
function(x, ...) {
 " on", format(unlist(x[c("df.null", "df.residual")])), " degrees of 
freedom\n"),
 1L, paste, collapse = " "), sep = "")
   cat("AIC: ", format(x$aic, digits = 4L), "\n\n",
-"Number of Fisher Scoring iterations: ", x$iter, "\n", sep = "")
-  cat("\n")
+"Number of Fisher Scoring iterations: ", x$iter, "\n\n", sep = "")
   invisible(x)
   }
 
@@ -291,7 

spark git commit: [SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6

2016-08-22 Thread yhuai
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 79195982a -> 94eff0875


[SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6

## What changes were proposed in this pull request?

Collect the GC discussion in one section, and document findings about the G1 GC heap region size.
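
As a rough sketch of applying the settings this section discusses (illustrative values only, not recommendations from the patch; the right numbers depend on the workload and executor heap size):

```scala
import org.apache.spark.SparkConf

// Illustrative values only; tune per workload and executor heap size.
val conf = new SparkConf()
  .set("spark.memory.fraction", "0.6")          // keep execution + storage within the tenured generation
  .set("spark.executor.extraJavaOptions",
       "-XX:+UseG1GC -XX:G1HeapRegionSize=16m") // try G1, with a larger region size for big executor heaps
```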

## How was this patch tested?

Jekyll doc build

Author: Sean Owen 

Closes #14732 from srowen/SPARK-16320.

(cherry picked from commit 342278c09cf6e79ed4f63422988a6bbd1e7d8a91)
Signed-off-by: Yin Huai 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/94eff087
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/94eff087
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/94eff087

Branch: refs/heads/branch-2.0
Commit: 94eff08757cee70c5b31fff7095bbb1e6ebc7ecf
Parents: 7919598
Author: Sean Owen 
Authored: Mon Aug 22 11:15:53 2016 -0700
Committer: Yin Huai 
Committed: Mon Aug 22 11:16:11 2016 -0700

--
 docs/tuning.md | 36 +---
 1 file changed, 17 insertions(+), 19 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/94eff087/docs/tuning.md
--
diff --git a/docs/tuning.md b/docs/tuning.md
index 976f2eb..cbf3721 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -122,21 +122,8 @@ large records.
 `R` is the storage space within `M` where cached blocks immune to being 
evicted by execution.
 
 The value of `spark.memory.fraction` should be set in order to fit this amount 
of heap space
-comfortably within the JVM's old or "tenured" generation. Otherwise, when much 
of this space is
-used for caching and execution, the tenured generation will be full, which 
causes the JVM to
-significantly increase time spent in garbage collection. See
-<a href="https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/sizing.html">Java GC sizing documentation</a>
-for more information.
-
-The tenured generation size is controlled by the JVM's `NewRatio` parameter, 
which defaults to 2,
-meaning that the tenured generation is 2 times the size of the new generation 
(the rest of the heap).
-So, by default, the tenured generation occupies 2/3 or about 0.66 of the heap. 
A value of
-0.6 for `spark.memory.fraction` keeps storage and execution memory within the 
old generation with
-room to spare. If `spark.memory.fraction` is increased to, say, 0.8, then 
`NewRatio` may have to
-increase to 6 or more.
-
-`NewRatio` is set as a JVM flag for executors, which means adding
-`spark.executor.extraJavaOptions=-XX:NewRatio=x` to a Spark job's 
configuration.
+comfortably within the JVM's old or "tenured" generation. See the discussion 
of advanced GC
+tuning below for details.
 
 ## Determining Memory Consumption
 
@@ -217,14 +204,22 @@ temporary objects created during task execution. Some 
steps which may be useful
 * Check if there are too many garbage collections by collecting GC stats. If a 
full GC is invoked multiple times for
   before a task completes, it means that there isn't enough memory available 
for executing tasks.
 
-* In the GC stats that are printed, if the OldGen is close to being full, 
reduce the amount of
-  memory used for caching by lowering `spark.memory.storageFraction`; it is 
better to cache fewer
-  objects than to slow down task execution!
-
 * If there are too many minor collections but not many major GCs, allocating 
more memory for Eden would help. You
   can set the size of the Eden to be an over-estimate of how much memory each 
task will need. If the size of Eden
   is determined to be `E`, then you can set the size of the Young generation 
using the option `-Xmn=4/3*E`. (The scaling
   up by 4/3 is to account for space used by survivor regions as well.)
+  
+* In the GC stats that are printed, if the OldGen is close to being full, 
reduce the amount of
+  memory used for caching by lowering `spark.memory.fraction`; it is better to 
cache fewer
+  objects than to slow down task execution. Alternatively, consider decreasing 
the size of
+  the Young generation. This means lowering `-Xmn` if you've set it as above. 
If not, try changing the 
+  value of the JVM's `NewRatio` parameter. Many JVMs default this to 2, 
meaning that the Old generation 
+  occupies 2/3 of the heap. It should be large enough such that this fraction 
exceeds `spark.memory.fraction`.
+  
+* Try the G1GC garbage collector with `-XX:+UseG1GC`. It can improve 
performance in some situations where
+  garbage collection is a bottleneck. Note that with large executor heap 
sizes, it may be important to
+  increase the [G1 region 
size](https://blogs.oracle.com/g1gc/entry/g1_gc_tuning_a_case) 
+  with `-XX:G1HeapRegionSize`
 
 * As an 

spark git commit: [SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6

2016-08-22 Thread yhuai
Repository: spark
Updated Branches:
  refs/heads/master 209e1b3c0 -> 342278c09


[SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6

## What changes were proposed in this pull request?

Collect the GC discussion in one section, and document findings about the G1 GC heap region size.

## How was this patch tested?

Jekyll doc build

Author: Sean Owen 

Closes #14732 from srowen/SPARK-16320.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/342278c0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/342278c0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/342278c0

Branch: refs/heads/master
Commit: 342278c09cf6e79ed4f63422988a6bbd1e7d8a91
Parents: 209e1b3
Author: Sean Owen 
Authored: Mon Aug 22 11:15:53 2016 -0700
Committer: Yin Huai 
Committed: Mon Aug 22 11:15:53 2016 -0700

--
 docs/tuning.md | 36 +---
 1 file changed, 17 insertions(+), 19 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/342278c0/docs/tuning.md
--
diff --git a/docs/tuning.md b/docs/tuning.md
index 976f2eb..cbf3721 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -122,21 +122,8 @@ large records.
 `R` is the storage space within `M` where cached blocks immune to being 
evicted by execution.
 
 The value of `spark.memory.fraction` should be set in order to fit this amount 
of heap space
-comfortably within the JVM's old or "tenured" generation. Otherwise, when much 
of this space is
-used for caching and execution, the tenured generation will be full, which 
causes the JVM to
-significantly increase time spent in garbage collection. See
-<a href="https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/sizing.html">Java GC sizing documentation</a>
-for more information.
-
-The tenured generation size is controlled by the JVM's `NewRatio` parameter, 
which defaults to 2,
-meaning that the tenured generation is 2 times the size of the new generation 
(the rest of the heap).
-So, by default, the tenured generation occupies 2/3 or about 0.66 of the heap. 
A value of
-0.6 for `spark.memory.fraction` keeps storage and execution memory within the 
old generation with
-room to spare. If `spark.memory.fraction` is increased to, say, 0.8, then 
`NewRatio` may have to
-increase to 6 or more.
-
-`NewRatio` is set as a JVM flag for executors, which means adding
-`spark.executor.extraJavaOptions=-XX:NewRatio=x` to a Spark job's 
configuration.
+comfortably within the JVM's old or "tenured" generation. See the discussion 
of advanced GC
+tuning below for details.
 
 ## Determining Memory Consumption
 
@@ -217,14 +204,22 @@ temporary objects created during task execution. Some 
steps which may be useful
 * Check if there are too many garbage collections by collecting GC stats. If a 
full GC is invoked multiple times for
   before a task completes, it means that there isn't enough memory available 
for executing tasks.
 
-* In the GC stats that are printed, if the OldGen is close to being full, 
reduce the amount of
-  memory used for caching by lowering `spark.memory.storageFraction`; it is 
better to cache fewer
-  objects than to slow down task execution!
-
 * If there are too many minor collections but not many major GCs, allocating 
more memory for Eden would help. You
   can set the size of the Eden to be an over-estimate of how much memory each 
task will need. If the size of Eden
   is determined to be `E`, then you can set the size of the Young generation 
using the option `-Xmn=4/3*E`. (The scaling
   up by 4/3 is to account for space used by survivor regions as well.)
+  
+* In the GC stats that are printed, if the OldGen is close to being full, 
reduce the amount of
+  memory used for caching by lowering `spark.memory.fraction`; it is better to 
cache fewer
+  objects than to slow down task execution. Alternatively, consider decreasing 
the size of
+  the Young generation. This means lowering `-Xmn` if you've set it as above. 
If not, try changing the 
+  value of the JVM's `NewRatio` parameter. Many JVMs default this to 2, 
meaning that the Old generation 
+  occupies 2/3 of the heap. It should be large enough such that this fraction 
exceeds `spark.memory.fraction`.
+  
+* Try the G1GC garbage collector with `-XX:+UseG1GC`. It can improve 
performance in some situations where
+  garbage collection is a bottleneck. Note that with large executor heap 
sizes, it may be important to
+  increase the [G1 region 
size](https://blogs.oracle.com/g1gc/entry/g1_gc_tuning_a_case) 
+  with `-XX:G1HeapRegionSize`
 
 * As an example, if your task is reading data from HDFS, the amount of memory 
used by the task can be estimated using
   the size 

spark git commit: [SPARKR][MINOR] Fix Cache Folder Path in Windows

2016-08-22 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 2add45fab -> 79195982a


[SPARKR][MINOR] Fix Cache Folder Path in Windows

## What changes were proposed in this pull request?

This PR fixes how the local cache folder is located on Windows. The name of the environment variable should be `LOCALAPPDATA` rather than `%LOCALAPPDATA%`.
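
The same principle holds for any environment-variable API: the lookup uses the bare name, while `%LOCALAPPDATA%` is only cmd.exe expansion syntax. A minimal Scala illustration (not the SparkR code itself):

```scala
// The variable is named LOCALAPPDATA; "%LOCALAPPDATA%" is cmd.exe expansion syntax, not part of its name.
val cacheRoot = sys.env.get("LOCALAPPDATA")    // Some("C:\\Users\\<user>\\AppData\\Local") on Windows
val wrong     = sys.env.get("%LOCALAPPDATA%")  // None: nothing is literally named "%LOCALAPPDATA%"
```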

## How was this patch tested?

Manual test in Windows 7.

Author: Junyang Qian 

Closes #14743 from junyangq/SPARKR-FixWindowsInstall.

(cherry picked from commit 209e1b3c0683a9106428e269e5041980b6cc327f)
Signed-off-by: Shivaram Venkataraman 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/79195982
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/79195982
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/79195982

Branch: refs/heads/branch-2.0
Commit: 79195982a4c6f8b1a3e02069dea00049cc806574
Parents: 2add45f
Author: Junyang Qian 
Authored: Mon Aug 22 10:03:48 2016 -0700
Committer: Shivaram Venkataraman 
Committed: Mon Aug 22 10:03:59 2016 -0700

--
 R/pkg/R/install.R | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/79195982/R/pkg/R/install.R
--
diff --git a/R/pkg/R/install.R b/R/pkg/R/install.R
index 987bac7..ff81e86 100644
--- a/R/pkg/R/install.R
+++ b/R/pkg/R/install.R
@@ -212,7 +212,7 @@ hadoop_version_name <- function(hadoopVersion) {
 # adapt to Spark context
 spark_cache_path <- function() {
   if (.Platform$OS.type == "windows") {
-winAppPath <- Sys.getenv("%LOCALAPPDATA%", unset = NA)
+winAppPath <- Sys.getenv("LOCALAPPDATA", unset = NA)
 if (is.na(winAppPath)) {
   msg <- paste("%LOCALAPPDATA% not found.",
"Please define the environment variable",





spark git commit: [SPARKR][MINOR] Fix Cache Folder Path in Windows

2016-08-22 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/master b264cbb16 -> 209e1b3c0


[SPARKR][MINOR] Fix Cache Folder Path in Windows

## What changes were proposed in this pull request?

This PR fixes how the local cache folder is located on Windows. The name of the environment variable should be `LOCALAPPDATA` rather than `%LOCALAPPDATA%`.

## How was this patch tested?

Manual test in Windows 7.

Author: Junyang Qian 

Closes #14743 from junyangq/SPARKR-FixWindowsInstall.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/209e1b3c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/209e1b3c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/209e1b3c

Branch: refs/heads/master
Commit: 209e1b3c0683a9106428e269e5041980b6cc327f
Parents: b264cbb
Author: Junyang Qian 
Authored: Mon Aug 22 10:03:48 2016 -0700
Committer: Shivaram Venkataraman 
Committed: Mon Aug 22 10:03:48 2016 -0700

--
 R/pkg/R/install.R | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/209e1b3c/R/pkg/R/install.R
--
diff --git a/R/pkg/R/install.R b/R/pkg/R/install.R
index 987bac7..ff81e86 100644
--- a/R/pkg/R/install.R
+++ b/R/pkg/R/install.R
@@ -212,7 +212,7 @@ hadoop_version_name <- function(hadoopVersion) {
 # adapt to Spark context
 spark_cache_path <- function() {
   if (.Platform$OS.type == "windows") {
-winAppPath <- Sys.getenv("%LOCALAPPDATA%", unset = NA)
+winAppPath <- Sys.getenv("LOCALAPPDATA", unset = NA)
 if (is.na(winAppPath)) {
   msg <- paste("%LOCALAPPDATA% not found.",
"Please define the environment variable",





spark git commit: [SPARK-15113][PYSPARK][ML] Add missing num features num classes

2016-08-22 Thread mlnick
Repository: spark
Updated Branches:
  refs/heads/master bd9655063 -> b264cbb16


[SPARK-15113][PYSPARK][ML] Add missing num features num classes

## What changes were proposed in this pull request?

Add the missing `numFeatures` and `numClasses` properties to the wrapped Java models in PySpark ML pipelines. Also tag `DecisionTreeClassificationModel` as Experimental to match the Scala doc.
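
For reference, the underlying Scala models already expose these values; a minimal Scala sketch (toy data, local session assumed) of what the newly wrapped properties return:

```scala
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("num-features-sketch").getOrCreate()

// Toy training data: one feature, two classes.
val train = spark.createDataFrame(Seq(
  (0.0, Vectors.dense(-1.0)),
  (1.0, Vectors.dense(1.0))
)).toDF("label", "features")

val model = new DecisionTreeClassifier().fit(train)

// These are the values the new PySpark properties surface.
println(s"numFeatures = ${model.numFeatures}, numClasses = ${model.numClasses}")  // 1 and 2 here
```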

## How was this patch tested?

Extended doctests

Author: Holden Karau 

Closes #12889 from holdenk/SPARK-15113-add-missing-numFeatures-numClasses.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b264cbb1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b264cbb1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b264cbb1

Branch: refs/heads/master
Commit: b264cbb16fb97116e630fb593adf5898a5a0e8fa
Parents: bd96550
Author: Holden Karau 
Authored: Mon Aug 22 12:21:22 2016 +0200
Committer: Nick Pentreath 
Committed: Mon Aug 22 12:21:22 2016 +0200

--
 .../GeneralizedLinearRegression.scala   |  2 ++
 python/pyspark/ml/classification.py | 37 
 python/pyspark/ml/regression.py | 22 +---
 python/pyspark/ml/util.py   | 16 +
 4 files changed, 66 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b264cbb1/mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
 
b/mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
index 2bdc09e..1d4dfd1 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
@@ -788,6 +788,8 @@ class GeneralizedLinearRegressionModel private[ml] (
   @Since("2.0.0")
   override def write: MLWriter =
 new 
GeneralizedLinearRegressionModel.GeneralizedLinearRegressionModelWriter(this)
+
+  override val numFeatures: Int = coefficients.size
 }
 
 @Since("2.0.0")

http://git-wip-us.apache.org/repos/asf/spark/blob/b264cbb1/python/pyspark/ml/classification.py
--
diff --git a/python/pyspark/ml/classification.py 
b/python/pyspark/ml/classification.py
index 6468007..33ada27 100644
--- a/python/pyspark/ml/classification.py
+++ b/python/pyspark/ml/classification.py
@@ -44,6 +44,23 @@ __all__ = ['LogisticRegression', 'LogisticRegressionModel',
 
 
 @inherit_doc
+class JavaClassificationModel(JavaPredictionModel):
+"""
+(Private) Java Model produced by a ``Classifier``.
+Classes are indexed {0, 1, ..., numClasses - 1}.
+To be mixed in with class:`pyspark.ml.JavaModel`
+"""
+
+@property
+@since("2.1.0")
+def numClasses(self):
+"""
+Number of classes (values which the label can take).
+"""
+return self._call_java("numClasses")
+
+
+@inherit_doc
 class LogisticRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, 
HasPredictionCol, HasMaxIter,
  HasRegParam, HasTol, HasProbabilityCol, 
HasRawPredictionCol,
  HasElasticNetParam, HasFitIntercept, 
HasStandardization, HasThresholds,
@@ -212,7 +229,7 @@ class LogisticRegression(JavaEstimator, HasFeaturesCol, 
HasLabelCol, HasPredicti
  " threshold (%g) and thresholds (equivalent 
to %g)" % (t2, t))
 
 
-class LogisticRegressionModel(JavaModel, JavaMLWritable, JavaMLReadable):
+class LogisticRegressionModel(JavaModel, JavaClassificationModel, 
JavaMLWritable, JavaMLReadable):
 """
 Model fitted by LogisticRegression.
 
@@ -522,6 +539,10 @@ class DecisionTreeClassifier(JavaEstimator, 
HasFeaturesCol, HasLabelCol, HasPred
 1
 >>> model.featureImportances
 SparseVector(1, {0: 1.0})
+>>> model.numFeatures
+1
+>>> model.numClasses
+2
 >>> print(model.toDebugString)
 DecisionTreeClassificationModel (uid=...) of depth 1 with 3 nodes...
 >>> test0 = spark.createDataFrame([(Vectors.dense(-1.0),)], ["features"])
@@ -595,7 +616,8 @@ class DecisionTreeClassifier(JavaEstimator, HasFeaturesCol, 
HasLabelCol, HasPred
 
 
 @inherit_doc
-class DecisionTreeClassificationModel(DecisionTreeModel, JavaMLWritable, 
JavaMLReadable):
+class DecisionTreeClassificationModel(DecisionTreeModel, 
JavaClassificationModel, JavaMLWritable,
+  JavaMLReadable):
 """
 Model fitted by DecisionTreeClassifier.
 
@@ -722,7 +744,8 @@ class RandomForestClassifier(JavaEstimator, 

spark git commit: [SPARK-17085][STREAMING][DOCUMENTATION AND ACTUAL CODE DIFFERS - UNSUPPORTED OPERATIONS]

2016-08-22 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 8d35a6f68 -> bd9655063


[SPARK-17085][STREAMING][DOCUMENTATION AND ACTUAL CODE DIFFERS - UNSUPPORTED 
OPERATIONS]

Changes in the Spark Structured Streaming doc at this link:
https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html#unsupported-operations

Author: Jagadeesan 

Closes #14715 from jagadeesanas2/SPARK-17085.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bd965506
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bd965506
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bd965506

Branch: refs/heads/master
Commit: bd9655063bdba8836b4ec96ed115e5653e246b65
Parents: 8d35a6f
Author: Jagadeesan 
Authored: Mon Aug 22 09:30:31 2016 +0100
Committer: Sean Owen 
Committed: Mon Aug 22 09:30:31 2016 +0100

--
 docs/structured-streaming-programming-guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/bd965506/docs/structured-streaming-programming-guide.md
--
diff --git a/docs/structured-streaming-programming-guide.md 
b/docs/structured-streaming-programming-guide.md
index e2c881b..226ff74 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -726,9 +726,9 @@ However, note that all of the operations applicable on 
static DataFrames/Dataset
 
 + Full outer join with a streaming Dataset is not supported
 
-+ Left outer join with a streaming Dataset on the left is not supported
++ Left outer join with a streaming Dataset on the right is not supported
 
-+ Right outer join with a streaming Dataset on the right is not supported
++ Right outer join with a streaming Dataset on the left is not supported
 
 - Any kind of joins between two streaming Datasets are not yet supported.
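
A sketch of the corrected rules in code (local session assumed; the socket source and column names are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("stream-join-sketch").getOrCreate()

val staticDf = spark.createDataFrame(Seq(("a", 1), ("b", 2))).toDF("value", "n")
val streamingDf = spark.readStream
  .format("socket").option("host", "localhost").option("port", "9999")
  .load()  // streaming DataFrame with a string "value" column

// Supported: the side preserved by the outer join is the streaming side.
val leftOk  = streamingDf.join(staticDf, Seq("value"), "left_outer")
val rightOk = staticDf.join(streamingDf, Seq("value"), "right_outer")

// Not supported (rejected when the streaming query is started):
//   staticDf.join(streamingDf, Seq("value"), "left_outer")   // streaming Dataset on the right
//   streamingDf.join(staticDf, Seq("value"), "right_outer")  // streaming Dataset on the left
```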
 





spark git commit: [SPARK-17085][STREAMING][DOCUMENTATION AND ACTUAL CODE DIFFERS - UNSUPPORTED OPERATIONS]

2016-08-22 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 49cc44de3 -> 2add45fab


[SPARK-17085][STREAMING][DOCUMENTATION AND ACTUAL CODE DIFFERS - UNSUPPORTED 
OPERATIONS]

Changes in the Spark Structured Streaming doc at this link:
https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html#unsupported-operations

Author: Jagadeesan 

Closes #14715 from jagadeesanas2/SPARK-17085.

(cherry picked from commit bd9655063bdba8836b4ec96ed115e5653e246b65)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2add45fa
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2add45fa
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2add45fa

Branch: refs/heads/branch-2.0
Commit: 2add45fabeb0ea4f7b17b5bc4910161370e72627
Parents: 49cc44d
Author: Jagadeesan 
Authored: Mon Aug 22 09:30:31 2016 +0100
Committer: Sean Owen 
Committed: Mon Aug 22 09:30:41 2016 +0100

--
 docs/structured-streaming-programming-guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2add45fa/docs/structured-streaming-programming-guide.md
--
diff --git a/docs/structured-streaming-programming-guide.md 
b/docs/structured-streaming-programming-guide.md
index 811e8c4..a2f1ee2 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -726,9 +726,9 @@ However, note that all of the operations applicable on 
static DataFrames/Dataset
 
 + Full outer join with a streaming Dataset is not supported
 
-+ Left outer join with a streaming Dataset on the left is not supported
++ Left outer join with a streaming Dataset on the right is not supported
 
-+ Right outer join with a streaming Dataset on the right is not supported
++ Right outer join with a streaming Dataset on the left is not supported
 
 - Any kind of joins between two streaming Datasets are not yet supported.
 





spark git commit: [SPARK-17115][SQL] decrease the threshold when split expressions

2016-08-22 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 e62b29f29 -> 49cc44de3


[SPARK-17115][SQL] decrease the threshold when split expressions

## What changes were proposed in this pull request?

In 2.0, we changed the threshold for splitting expressions from 16K to 64K, which causes very bad performance on wide tables because the generated methods can't be JIT-compiled by default (they exceed the 8K bytecode limit).

This PR decreases it to 1K, based on benchmark results for a wide table with 400 columns of LongType.

It also fixes a bug around splitting expressions in whole-stage codegen (they should not be split there).

## How was this patch tested?

Added benchmark suite.
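
For context, a rough sketch of the kind of wide-table workload such a benchmark exercises (assumed shape of ~400 Long columns; this is not the actual BenchmarkWideTable code):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("wide-table-sketch").getOrCreate()

val numCols = 400
val wide = spark.range(1 << 20)
  .select((0 until numCols).map(i => (col("id") + i).as(s"c$i")): _*)

// Projecting hundreds of expressions is where splitting the generated code (and the JIT limit) matters.
wide.select(wide.columns.map(c => (col(c) + 1).as(c)): _*).count()
```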

Author: Davies Liu 

Closes #14692 from davies/split_exprs.

(cherry picked from commit 8d35a6f68d6d733212674491cbf31bed73fada0f)
Signed-off-by: Wenchen Fan 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/49cc44de
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/49cc44de
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/49cc44de

Branch: refs/heads/branch-2.0
Commit: 49cc44de3ad5495b2690633791941aa00a62b553
Parents: e62b29f
Author: Davies Liu 
Authored: Mon Aug 22 16:16:03 2016 +0800
Committer: Wenchen Fan 
Committed: Mon Aug 22 16:16:36 2016 +0800

--
 .../expressions/codegen/CodeGenerator.scala |  9 ++--
 .../execution/aggregate/HashAggregateExec.scala |  2 -
 .../benchmark/BenchmarkWideTable.scala  | 53 
 3 files changed, 59 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/49cc44de/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
index 16fb1f6..4bd9ee0 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
@@ -584,15 +584,18 @@ class CodegenContext {
* @param expressions the codes to evaluate expressions.
*/
   def splitExpressions(row: String, expressions: Seq[String]): String = {
-if (row == null) {
+if (row == null || currentVars != null) {
   // Cannot split these expressions because they are not created from a 
row object.
   return expressions.mkString("\n")
 }
 val blocks = new ArrayBuffer[String]()
 val blockBuilder = new StringBuilder()
 for (code <- expressions) {
-  // We can't know how many byte code will be generated, so use the number 
of bytes as limit
-  if (blockBuilder.length > 64 * 1000) {
+  // We can't know how many bytecode will be generated, so use the length 
of source code
+  // as metric. A method should not go beyond 8K, otherwise it will not be 
JITted, should
+  // also not be too small, or it will have many function calls (for wide 
table), see the
+  // results in BenchmarkWideTable.
+  if (blockBuilder.length > 1024) {
 blocks.append(blockBuilder.toString())
 blockBuilder.clear()
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/49cc44de/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
index cfc47ab..bd7efa6 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
@@ -603,8 +603,6 @@ case class HashAggregateExec(
 
 // create grouping key
 ctx.currentVars = input
-// make sure that the generated code will not be splitted as multiple 
functions
-ctx.INPUT_ROW = null
 val unsafeRowKeyCode = GenerateUnsafeProjection.createCode(
   ctx, groupingExpressions.map(e => 
BindReferences.bindReference[Expression](e, child.output)))
 val vectorizedRowKeys = ctx.generateExpressions(

http://git-wip-us.apache.org/repos/asf/spark/blob/49cc44de/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BenchmarkWideTable.scala
--
diff --git 

spark git commit: [SPARK-17115][SQL] decrease the threshold when split expressions

2016-08-22 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/master 4b6c2cbcb -> 8d35a6f68


[SPARK-17115][SQL] decrease the threshold when split expressions

## What changes were proposed in this pull request?

In 2.0, we changed the threshold for splitting expressions from 16K to 64K, which causes very bad performance on wide tables because the generated methods can't be JIT-compiled by default (they exceed the 8K bytecode limit).

This PR decreases it to 1K, based on benchmark results for a wide table with 400 columns of LongType.

It also fixes a bug around splitting expressions in whole-stage codegen (they should not be split there).

## How was this patch tested?

Added benchmark suite.

Author: Davies Liu 

Closes #14692 from davies/split_exprs.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8d35a6f6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8d35a6f6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8d35a6f6

Branch: refs/heads/master
Commit: 8d35a6f68d6d733212674491cbf31bed73fada0f
Parents: 4b6c2cb
Author: Davies Liu 
Authored: Mon Aug 22 16:16:03 2016 +0800
Committer: Wenchen Fan 
Committed: Mon Aug 22 16:16:03 2016 +0800

--
 .../expressions/codegen/CodeGenerator.scala |  9 ++--
 .../execution/aggregate/HashAggregateExec.scala |  2 -
 .../benchmark/BenchmarkWideTable.scala  | 53 
 3 files changed, 59 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8d35a6f6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
index 16fb1f6..4bd9ee0 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
@@ -584,15 +584,18 @@ class CodegenContext {
* @param expressions the codes to evaluate expressions.
*/
   def splitExpressions(row: String, expressions: Seq[String]): String = {
-if (row == null) {
+if (row == null || currentVars != null) {
   // Cannot split these expressions because they are not created from a 
row object.
   return expressions.mkString("\n")
 }
 val blocks = new ArrayBuffer[String]()
 val blockBuilder = new StringBuilder()
 for (code <- expressions) {
-  // We can't know how many byte code will be generated, so use the number 
of bytes as limit
-  if (blockBuilder.length > 64 * 1000) {
+  // We can't know how many bytecode will be generated, so use the length 
of source code
+  // as metric. A method should not go beyond 8K, otherwise it will not be 
JITted, should
+  // also not be too small, or it will have many function calls (for wide 
table), see the
+  // results in BenchmarkWideTable.
+  if (blockBuilder.length > 1024) {
 blocks.append(blockBuilder.toString())
 blockBuilder.clear()
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/8d35a6f6/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
index cfc47ab..bd7efa6 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
@@ -603,8 +603,6 @@ case class HashAggregateExec(
 
 // create grouping key
 ctx.currentVars = input
-// make sure that the generated code will not be splitted as multiple 
functions
-ctx.INPUT_ROW = null
 val unsafeRowKeyCode = GenerateUnsafeProjection.createCode(
   ctx, groupingExpressions.map(e => 
BindReferences.bindReference[Expression](e, child.output)))
 val vectorizedRowKeys = ctx.generateExpressions(

http://git-wip-us.apache.org/repos/asf/spark/blob/8d35a6f6/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BenchmarkWideTable.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BenchmarkWideTable.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BenchmarkWideTable.scala
new file mode 

spark git commit: [SPARK-16968] Document additional options in jdbc Writer

2016-08-22 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 083de00cb -> 4b6c2cbcb


[SPARK-16968] Document additional options in jdbc Writer

## What changes were proposed in this pull request?

This adds documentation for the previously introduced JDBC writer options `truncate` and `createTableOptions`.
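
For context, a sketch of how these writer options are passed (the URL, table name, and credentials are placeholders; a reachable database and its JDBC driver are assumed):

```scala
import java.util.Properties
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("jdbc-options-sketch").getOrCreate()
val df = spark.range(10).selectExpr("id", "cast(id as string) as name")

val props = new Properties()
props.setProperty("user", "test")
props.setProperty("password", "secret")

df.write
  .mode(SaveMode.Overwrite)
  .option("truncate", "true")                     // truncate the existing table instead of dropping it
  .option("createTableOptions", "ENGINE=InnoDB")  // appended when the table has to be created
  .jdbc("jdbc:mysql://localhost:3306/test", "people", props)
```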

## How was this patch tested?

Unit tests were added in the previous PR.

Author: GraceH 

Closes #14683 from GraceH/jdbc_options.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4b6c2cbc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4b6c2cbc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4b6c2cbc

Branch: refs/heads/master
Commit: 4b6c2cbcb109c7cef6087bae32d87cc3ddb69cf9
Parents: 083de00
Author: GraceH 
Authored: Mon Aug 22 09:03:46 2016 +0100
Committer: Sean Owen 
Committed: Mon Aug 22 09:03:46 2016 +0100

--
 docs/sql-programming-guide.md | 14 ++
 1 file changed, 14 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4b6c2cbc/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index c89286d..28cc88c 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1058,6 +1058,20 @@ the Data Sources API. The following options are 
supported:
   The JDBC fetch size, which determines how many rows to fetch per round 
trip. This can help performance on JDBC drivers which default to low fetch size 
(eg. Oracle with 10 rows).
 
   
+  <tr>
+    <td><code>truncate</code></td>
+    <td>
+     This is a JDBC writer related option. When <code>SaveMode.Overwrite</code> is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g. indices) from being removed. However, it will not work in some cases, such as when the new data has a different schema. It defaults to false.
+   </td>
+  </tr>
+
+  <tr>
+    <td><code>createTableOptions</code></td>
+    <td>
+     This is a JDBC writer related option. If specified, this option allows setting of database-specific table and partition options when creating a table. For example: <code>CREATE TABLE t (name string) ENGINE=InnoDB.</code>
+   </td>
+  </tr>
 
 





spark git commit: [SPARK-17127] Make unaligned access in unsafe available for AArch64

2016-08-22 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master b2074b664 -> 083de00cb


[SPARK-17127] Make unaligned access in unsafe available for AArch64

## What changes were proposed in this pull request?

From Spark version 2.0.0, when MemoryMode.OFF_HEAP is set, the code checks whether the architecture supports unaligned memory access, and raises an exception if the check fails.

We know that AArch64 also supports unaligned access, but currently only i386, x86, amd64, and x86_64 are included in the check.

I think we should include aarch64 when performing the check.
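
A minimal sketch of the check in question, mirroring the one-line change to `Platform.java`:

```scala
// Same pattern as the patched Platform.java, with aarch64 added.
val arch = System.getProperty("os.arch", "")
val unalignedSupported = arch.matches("^(i[3-6]86|x86(_64)?|x64|amd64|aarch64)$")
println(s"os.arch=$arch, unaligned access assumed supported: $unalignedSupported")
```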

## How was this patch tested?

Unit test suite

Author: Richael 

Closes #14700 from yimuxi/zym_change_unsafe.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/083de00c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/083de00c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/083de00c

Branch: refs/heads/master
Commit: 083de00cb608a7414aae99a639825482bebfea8a
Parents: b2074b6
Author: Richael 
Authored: Mon Aug 22 09:01:50 2016 +0100
Committer: Sean Owen 
Committed: Mon Aug 22 09:01:50 2016 +0100

--
 common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/083de00c/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java
--
diff --git a/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java 
b/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java
index a2ee45c..c892b9c 100644
--- a/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java
+++ b/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java
@@ -55,7 +55,7 @@ public final class Platform {
   // We at least know x86 and x64 support unaligned access.
   String arch = System.getProperty("os.arch", "");
   //noinspection DynamicRegexReplaceableByCompiledPattern
-  _unaligned = arch.matches("^(i[3-6]86|x86(_64)?|x64|amd64)$");
+  _unaligned = arch.matches("^(i[3-6]86|x86(_64)?|x64|amd64|aarch64)$");
 }
 unaligned = _unaligned;
   }

