spark git commit: [SPARK-23815][CORE] Spark writer dynamic partition overwrite mode may fail to write output on multi level partition
Repository: spark
Updated Branches: refs/heads/branch-2.3 2995b79d6 -> dfdf1bb9b

[SPARK-23815][CORE] Spark writer dynamic partition overwrite mode may fail to write output on multi level partition

## What changes were proposed in this pull request?

Spark introduced a new writer mode to overwrite only the related partitions in SPARK-20236. While using this feature in our production cluster, we found a bug when writing multi-level partitions on HDFS. A simple test case to reproduce this issue:

    val df = Seq(("1","2","3")).toDF("col1", "col2", "col3")
    df.write.partitionBy("col1","col2").mode("overwrite").save("/my/hdfs/location")

If the HDFS location "/my/hdfs/location" does not exist, there will be no output.

This seems to be caused by the job-commit change that SPARK-20236 made in HadoopMapReduceCommitProtocol. During job commit, the output has been written into the staging dir /my/hdfs/location/.spark-staging.xxx/col1=1/col2=2, and the code then calls fs.rename to rename /my/hdfs/location/.spark-staging.xxx/col1=1/col2=2 to /my/hdfs/location/col1=1/col2=2. However, in our case the operation fails on HDFS because /my/hdfs/location/col1=1 does not exist: HDFS rename cannot create more than one level of missing parent directories. This does not happen in the new unit test added with SPARK-20236, which uses the local file system.

We are proposing a fix. When cleaning the current partition dir /my/hdfs/location/col1=1/col2=2 before the rename op, if the delete op fails (because /my/hdfs/location/col1=1/col2=2 may not exist), we call mkdirs to create the parent dir /my/hdfs/location/col1=1 (if it does not exist) so the following rename op can succeed.

Reference: in the official Hadoop FileSystem documentation (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html), the rename operation has the precondition "dest must be root, or have a parent that exists".

## How was this patch tested?

We have tested this patch on our production cluster and it fixed the problem.

Author: Fangshi Li

Closes #20931 from fangshil/master.
(cherry picked from commit 4b07036799b01894826b47c73142fe282c607a57)
Signed-off-by: Wenchen Fan

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dfdf1bb9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dfdf1bb9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dfdf1bb9

Branch: refs/heads/branch-2.3
Commit: dfdf1bb9be19bd31e398f97310391b391fabfcfd
Parents: 2995b79
Author: Fangshi Li
Authored: Fri Apr 13 13:46:34 2018 +0800
Committer: Wenchen Fan
Committed: Fri Apr 13 13:47:31 2018 +0800

--
 .../internal/io/HadoopMapReduceCommitProtocol.scala | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/dfdf1bb9/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala

diff --git a/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala b/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
index 6d20ef1..3e60c50 100644
--- a/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
@@ -186,7 +186,17 @@ class HadoopMapReduceCommitProtocol(
       logDebug(s"Clean up default partition directories for overwriting: $partitionPaths")
       for (part <- partitionPaths) {
         val finalPartPath = new Path(path, part)
-        fs.delete(finalPartPath, true)
+        if (!fs.delete(finalPartPath, true) && !fs.exists(finalPartPath.getParent)) {
+          // According to the official hadoop FileSystem API spec, delete op should assume
+          // the destination is no longer present regardless of return value, thus we do not
+          // need to double check if finalPartPath exists before rename.
+          // Also in our case, based on the spec, delete returns false only when finalPartPath
+          // does not exist. When this happens, we need to take action if the parent of
+          // finalPartPath also does not exist (e.g. the scenario described in SPARK-23815),
+          // because the FileSystem API spec on the rename op says the rename dest
+          // (finalPartPath) must have a parent that exists, otherwise we may get an
+          // unexpected result on the rename.
+          fs.mkdirs(finalPartPath.getParent)
+        }
         fs.rename(new Path(stagingDir, part), finalPartPath)
       }
     }
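To make the failure mode concrete, here is a minimal standalone sketch of the delete/mkdirs/rename sequence the patch relies on. The object name, `main` wrapper, and paths are hypothetical illustrations rather than Spark code; only `FileSystem.delete/exists/mkdirs/rename` are the real Hadoop API calls:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Hypothetical sketch of the commit-time pattern established by this patch.
    object RenameWithParentCheck {
      def main(args: Array[String]): Unit = {
        val fs = FileSystem.get(new Configuration())
        // Paths mirror the SPARK-23815 scenario; ".spark-staging.xxx" stands in
        // for the real staging directory name.
        val stagingPart = new Path("/my/hdfs/location/.spark-staging.xxx/col1=1/col2=2")
        val finalPartPath = new Path("/my/hdfs/location/col1=1/col2=2")

        // Per the FileSystem spec, delete returns false here only if finalPartPath did
        // not exist. In that case its parent may be missing too, which violates rename's
        // precondition that "dest must be root, or have a parent that exists" -- so
        // create the parent before renaming.
        if (!fs.delete(finalPartPath, true) && !fs.exists(finalPartPath.getParent)) {
          fs.mkdirs(finalPartPath.getParent)
        }
        fs.rename(stagingPart, finalPartPath)
      }
    }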
spark git commit: [SPARK-23815][CORE] Spark writer dynamic partition overwrite mode may fail to write output on multi level partition
Repository: spark
Updated Branches: refs/heads/master 1018be44d -> 4b0703679

[SPARK-23815][CORE] Spark writer dynamic partition overwrite mode may fail to write output on multi level partition

## What changes were proposed in this pull request?

Spark introduced a new writer mode to overwrite only the related partitions in SPARK-20236. While using this feature in our production cluster, we found a bug when writing multi-level partitions on HDFS. A simple test case to reproduce this issue:

    val df = Seq(("1","2","3")).toDF("col1", "col2", "col3")
    df.write.partitionBy("col1","col2").mode("overwrite").save("/my/hdfs/location")

If the HDFS location "/my/hdfs/location" does not exist, there will be no output.

This seems to be caused by the job-commit change that SPARK-20236 made in HadoopMapReduceCommitProtocol. During job commit, the output has been written into the staging dir /my/hdfs/location/.spark-staging.xxx/col1=1/col2=2, and the code then calls fs.rename to rename /my/hdfs/location/.spark-staging.xxx/col1=1/col2=2 to /my/hdfs/location/col1=1/col2=2. However, in our case the operation fails on HDFS because /my/hdfs/location/col1=1 does not exist: HDFS rename cannot create more than one level of missing parent directories. This does not happen in the new unit test added with SPARK-20236, which uses the local file system.

We are proposing a fix. When cleaning the current partition dir /my/hdfs/location/col1=1/col2=2 before the rename op, if the delete op fails (because /my/hdfs/location/col1=1/col2=2 may not exist), we call mkdirs to create the parent dir /my/hdfs/location/col1=1 (if it does not exist) so the following rename op can succeed.

Reference: in the official Hadoop FileSystem documentation (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html), the rename operation has the precondition "dest must be root, or have a parent that exists".

## How was this patch tested?

We have tested this patch on our production cluster and it fixed the problem.

Author: Fangshi Li

Closes #20931 from fangshil/master.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4b070367
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4b070367
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4b070367

Branch: refs/heads/master
Commit: 4b07036799b01894826b47c73142fe282c607a57
Parents: 1018be4
Author: Fangshi Li
Authored: Fri Apr 13 13:46:34 2018 +0800
Committer: Wenchen Fan
Committed: Fri Apr 13 13:46:34 2018 +0800

--
 .../internal/io/HadoopMapReduceCommitProtocol.scala | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/4b070367/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala

diff --git a/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala b/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
index 6d20ef1..3e60c50 100644
--- a/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
@@ -186,7 +186,17 @@ class HadoopMapReduceCommitProtocol(
       logDebug(s"Clean up default partition directories for overwriting: $partitionPaths")
       for (part <- partitionPaths) {
         val finalPartPath = new Path(path, part)
-        fs.delete(finalPartPath, true)
+        if (!fs.delete(finalPartPath, true) && !fs.exists(finalPartPath.getParent)) {
+          // According to the official hadoop FileSystem API spec, delete op should assume
+          // the destination is no longer present regardless of return value, thus we do not
+          // need to double check if finalPartPath exists before rename.
+          // Also in our case, based on the spec, delete returns false only when finalPartPath
+          // does not exist. When this happens, we need to take action if the parent of
+          // finalPartPath also does not exist (e.g. the scenario described in SPARK-23815),
+          // because the FileSystem API spec on the rename op says the rename dest
+          // (finalPartPath) must have a parent that exists, otherwise we may get an
+          // unexpected result on the rename.
+          fs.mkdirs(finalPartPath.getParent)
+        }
         fs.rename(new Path(stagingDir, part), finalPartPath)
       }
     }
spark git commit: [SPARK-23971] Should not leak Spark sessions across test suites
Repository: spark
Updated Branches: refs/heads/master ab7b961a4 -> 1018be44d

[SPARK-23971] Should not leak Spark sessions across test suites

## What changes were proposed in this pull request?

Many suites currently leak Spark sessions (sometimes with stopped SparkContexts) via the thread-local active Spark session and default Spark session. We should attempt to clean these up and detect when this happens, to improve the reproducibility of tests.

## How was this patch tested?

Existing tests.

Author: Eric Liang

Closes #21058 from ericl/clear-session.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1018be44
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1018be44
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1018be44

Branch: refs/heads/master
Commit: 1018be44d6c52cf18e14d84160850063f0e60a1d
Parents: ab7b961
Author: Eric Liang
Authored: Thu Apr 12 22:30:59 2018 -0700
Committer: gatorsmile
Committed: Thu Apr 12 22:30:59 2018 -0700

--
 .../org/apache/spark/SharedSparkSession.java    |  9 ++--
 .../org/apache/spark/sql/SparkSession.scala     | 23 ++--
 .../apache/spark/sql/SessionStateSuite.scala    |  2 ++
 .../spark/sql/test/SharedSparkSession.scala     | 22 ++-
 4 files changed, 47 insertions(+), 9 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/1018be44/mllib/src/test/java/org/apache/spark/SharedSparkSession.java

diff --git a/mllib/src/test/java/org/apache/spark/SharedSparkSession.java b/mllib/src/test/java/org/apache/spark/SharedSparkSession.java
index 4377987..35a2509 100644
--- a/mllib/src/test/java/org/apache/spark/SharedSparkSession.java
+++ b/mllib/src/test/java/org/apache/spark/SharedSparkSession.java
@@ -42,7 +42,12 @@ public abstract class SharedSparkSession implements Serializable {

   @After
   public void tearDown() {
-    spark.stop();
-    spark = null;
+    try {
+      spark.stop();
+      spark = null;
+    } finally {
+      SparkSession.clearDefaultSession();
+      SparkSession.clearActiveSession();
+    }
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/1018be44/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
index b107492..c502e58 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
@@ -44,7 +44,7 @@ import org.apache.spark.sql.sources.BaseRelation
 import org.apache.spark.sql.streaming._
 import org.apache.spark.sql.types.{DataType, StructType}
 import org.apache.spark.sql.util.ExecutionListenerManager
-import org.apache.spark.util.Utils
+import org.apache.spark.util.{CallSite, Utils}

 /**
@@ -81,6 +81,9 @@ class SparkSession private(
     @transient private[sql] val extensions: SparkSessionExtensions)
   extends Serializable with Closeable with Logging { self =>

+  // The call site where this SparkSession was constructed.
+  private val creationSite: CallSite = Utils.getCallSite()
+
   private[sql] def this(sc: SparkContext) {
     this(sc, None, None, new SparkSessionExtensions)
   }

@@ -763,7 +766,7 @@ class SparkSession private(

 @InterfaceStability.Stable
-object SparkSession {
+object SparkSession extends Logging {

   /**
    * Builder for [[SparkSession]].
@@ -1090,4 +1093,20 @@ object SparkSession {
       }
     }
   }
+
+  private[spark] def cleanupAnyExistingSession(): Unit = {
+    val session = getActiveSession.orElse(getDefaultSession)
+    if (session.isDefined) {
+      logWarning(
+        s"""An existing Spark session exists as the active or default session.
+           |This probably means another suite leaked it. Attempting to stop it before continuing.
+           |This existing Spark session was created at:
+           |
+           |${session.get.creationSite.longForm}
+           |
+         """.stripMargin)
+      session.get.stop()
+      SparkSession.clearActiveSession()
+      SparkSession.clearDefaultSession()
+    }
+  }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/1018be44/sql/core/src/test/scala/org/apache/spark/sql/SessionStateSuite.scala

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SessionStateSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SessionStateSuite.scala
index 4efae4c..7d13660 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SessionStateSuite.scala
+++
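For suites outside Spark itself, a minimal sketch of the same teardown hygiene (the suite name and ScalaTest wiring are hypothetical; `clearActiveSession`/`clearDefaultSession` are the real public API used by this patch):

    import org.apache.spark.sql.SparkSession
    import org.scalatest.{BeforeAndAfterAll, FunSuite}

    // Hypothetical suite showing the stop-then-clear pattern this patch applies.
    class NoLeakExampleSuite extends FunSuite with BeforeAndAfterAll {

      @transient private var spark: SparkSession = _

      override def beforeAll(): Unit = {
        super.beforeAll()
        spark = SparkSession.builder().master("local[2]").appName("no-leak").getOrCreate()
      }

      override def afterAll(): Unit = {
        try {
          if (spark != null) spark.stop()
          spark = null
        } finally {
          // Clear the thread-local active session and the default session so the
          // next suite starts from a clean slate even if stop() throws.
          SparkSession.clearActiveSession()
          SparkSession.clearDefaultSession()
          super.afterAll()
        }
      }

      test("session is usable within this suite") {
        assert(spark.range(10).count() === 10)
      }
    }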
svn commit: r26318 - in /dev/spark/2.3.1-SNAPSHOT-2018_04_12_22_02-2995b79-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Fri Apr 13 05:17:16 2018
New Revision: 26318

Log:
Apache Spark 2.3.1-SNAPSHOT-2018_04_12_22_02-2995b79 docs

[This commit notification would consist of 1443 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-23748][SS] Fix SS continuous process doesn't support SubqueryAlias issue
Repository: spark
Updated Branches: refs/heads/branch-2.3 908c681c6 -> 2995b79d6

[SPARK-23748][SS] Fix SS continuous process doesn't support SubqueryAlias issue

## What changes were proposed in this pull request?

Currently SS continuous processing doesn't support queries on a temp table or `df.as("xxx")`; SS will throw an exception that the LogicalPlan is not supported, with details described [here](https://issues.apache.org/jira/browse/SPARK-23748). So here we propose to add this support.

## How was this patch tested?

New UT.

Author: jerryshao

Closes #21017 from jerryshao/SPARK-23748.

(cherry picked from commit 14291b061b9b40eadbf4ed442f9a5021b8e09597)
Signed-off-by: Tathagata Das

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2995b79d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2995b79d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2995b79d

Branch: refs/heads/branch-2.3
Commit: 2995b79d6a78bf632aa4c1c99bebfc213fb31c54
Parents: 908c681
Author: jerryshao
Authored: Thu Apr 12 20:00:25 2018 -0700
Committer: Tathagata Das
Committed: Thu Apr 12 20:00:40 2018 -0700

--
 .../analysis/UnsupportedOperationChecker.scala  |  2 +-
 .../streaming/continuous/ContinuousSuite.scala  | 19 +++
 2 files changed, 20 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/2995b79d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
index b55043c..ff9d6d7 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
@@ -345,7 +345,7 @@ object UnsupportedOperationChecker {
     plan.foreachUp { implicit subPlan =>
       subPlan match {
         case (_: Project | _: Filter | _: MapElements | _: MapPartitions |
-          _: DeserializeToObject | _: SerializeFromObject) =>
+          _: DeserializeToObject | _: SerializeFromObject | _: SubqueryAlias) =>
         case node if node.nodeName == "StreamingRelationV2" =>
         case node =>
           throwError(s"Continuous processing does not support ${node.nodeName} operations.")

http://git-wip-us.apache.org/repos/asf/spark/blob/2995b79d/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala
index 4b4ed82..95406b3 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala
@@ -174,6 +174,25 @@ class ContinuousSuite extends ContinuousSuiteBase {
         "Continuous processing does not support current time operations."))
   }

+  test("subquery alias") {
+    val df = spark.readStream
+      .format("rate")
+      .option("numPartitions", "5")
+      .option("rowsPerSecond", "5")
+      .load()
+      .createOrReplaceTempView("rate")
+    val test = spark.sql("select value from rate where value > 5")
+
+    testStream(test, useV2Sink = true)(
+      StartStream(longContinuousTrigger),
+      AwaitEpoch(0),
+      Execute(waitForRateSourceTriggers(_, 2)),
+      IncrementEpoch(),
+      Execute(waitForRateSourceTriggers(_, 4)),
+      IncrementEpoch(),
+      CheckAnswerRowsContains(scala.Range(6, 20).map(Row(_))))
+  }
+
   test("repeatedly restart") {
     val df = spark.readStream
       .format("rate")
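For context, a minimal sketch of the kind of query that hit the unsupported-operation error before this fix (checkpoint path and sink choice are hypothetical; registering a temp view, like calling `df.as("xxx")`, introduces a `SubqueryAlias` node into the analyzed plan):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.Trigger

    object ContinuousSubqueryAliasRepro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[4]")
          .appName("SPARK-23748-repro")
          .getOrCreate()

        val rate = spark.readStream
          .format("rate")
          .option("rowsPerSecond", "5")
          .load()

        // Registering a temp view (or calling .as("xxx")) adds a SubqueryAlias to the
        // plan; before this patch the continuous-mode checker rejected that node.
        rate.createOrReplaceTempView("rate")
        val filtered = spark.sql("select value from rate where value > 5")

        val query = filtered.writeStream
          .format("console")
          .option("checkpointLocation", "/tmp/spark-23748-checkpoint") // hypothetical path
          .trigger(Trigger.Continuous("1 second"))
          .start()
        query.awaitTermination()
      }
    }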
spark git commit: [SPARK-23748][SS] Fix SS continuous process doesn't support SubqueryAlias issue
Repository: spark
Updated Branches: refs/heads/master 682002b6d -> 14291b061

[SPARK-23748][SS] Fix SS continuous process doesn't support SubqueryAlias issue

## What changes were proposed in this pull request?

Currently SS continuous processing doesn't support queries on a temp table or `df.as("xxx")`; SS will throw an exception that the LogicalPlan is not supported, with details described [here](https://issues.apache.org/jira/browse/SPARK-23748). So here we propose to add this support.

## How was this patch tested?

New UT.

Author: jerryshao

Closes #21017 from jerryshao/SPARK-23748.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/14291b06
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/14291b06
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/14291b06

Branch: refs/heads/master
Commit: 14291b061b9b40eadbf4ed442f9a5021b8e09597
Parents: 682002b
Author: jerryshao
Authored: Thu Apr 12 20:00:25 2018 -0700
Committer: Tathagata Das
Committed: Thu Apr 12 20:00:25 2018 -0700

--
 .../analysis/UnsupportedOperationChecker.scala  |  2 +-
 .../streaming/continuous/ContinuousSuite.scala  | 19 +++
 2 files changed, 20 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/14291b06/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
index b55043c..ff9d6d7 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
@@ -345,7 +345,7 @@ object UnsupportedOperationChecker {
     plan.foreachUp { implicit subPlan =>
       subPlan match {
         case (_: Project | _: Filter | _: MapElements | _: MapPartitions |
-          _: DeserializeToObject | _: SerializeFromObject) =>
+          _: DeserializeToObject | _: SerializeFromObject | _: SubqueryAlias) =>
         case node if node.nodeName == "StreamingRelationV2" =>
         case node =>
           throwError(s"Continuous processing does not support ${node.nodeName} operations.")

http://git-wip-us.apache.org/repos/asf/spark/blob/14291b06/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala
index f5884b9..ef74efe 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala
@@ -171,6 +171,25 @@ class ContinuousSuite extends ContinuousSuiteBase {
         "Continuous processing does not support current time operations."))
   }

+  test("subquery alias") {
+    val df = spark.readStream
+      .format("rate")
+      .option("numPartitions", "5")
+      .option("rowsPerSecond", "5")
+      .load()
+      .createOrReplaceTempView("rate")
+    val test = spark.sql("select value from rate where value > 5")
+
+    testStream(test, useV2Sink = true)(
+      StartStream(longContinuousTrigger),
+      AwaitEpoch(0),
+      Execute(waitForRateSourceTriggers(_, 2)),
+      IncrementEpoch(),
+      Execute(waitForRateSourceTriggers(_, 4)),
+      IncrementEpoch(),
+      CheckAnswerRowsContains(scala.Range(6, 20).map(Row(_))))
+  }
+
   test("repeatedly restart") {
     val df = spark.readStream
       .format("rate")
spark git commit: [SPARK-23867][SCHEDULER] use droppedCount in logWarning
Repository: spark
Updated Branches: refs/heads/branch-2.3 571269519 -> 908c681c6

[SPARK-23867][SCHEDULER] use droppedCount in logWarning

## What changes were proposed in this pull request?

Get the count of dropped events for output in the log message.

## How was this patch tested?

The fix is pretty trivial, but `./dev/run-tests` was run successfully.

vanzin cloud-fan The contribution is my original work and I license the work to the project under the project's open source license.

Author: Patrick Pisciuneri

Closes #20977 from phpisciuneri/fix-log-warning.

(cherry picked from commit 682002b6da844ed11324ee5ff4d00fc0294c0b31)
Signed-off-by: Wenchen Fan

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/908c681c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/908c681c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/908c681c

Branch: refs/heads/branch-2.3
Commit: 908c681c6786ef0d772a43508285cb8891fc524a
Parents: 5712695
Author: Patrick Pisciuneri
Authored: Fri Apr 13 09:45:27 2018 +0800
Committer: Wenchen Fan
Committed: Fri Apr 13 09:45:45 2018 +0800

--
 .../main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/908c681c/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala

diff --git a/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala b/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala
index 7e14938..c1fedd6 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala
@@ -166,7 +166,7 @@ private class AsyncEventQueue(val name: String, conf: SparkConf, metrics: LiveLi
       val prevLastReportTimestamp = lastReportTimestamp
       lastReportTimestamp = System.currentTimeMillis()
       val previous = new java.util.Date(prevLastReportTimestamp)
-      logWarning(s"Dropped $droppedEvents events from $name since $previous.")
+      logWarning(s"Dropped $droppedCount events from $name since $previous.")
     }
   }
 }
spark git commit: [SPARK-23867][SCHEDULER] use droppedCount in logWarning
Repository: spark
Updated Branches: refs/heads/master 0f93b91a7 -> 682002b6d

[SPARK-23867][SCHEDULER] use droppedCount in logWarning

## What changes were proposed in this pull request?

Get the count of dropped events for output in the log message.

## How was this patch tested?

The fix is pretty trivial, but `./dev/run-tests` was run successfully.

vanzin cloud-fan The contribution is my original work and I license the work to the project under the project's open source license.

Author: Patrick Pisciuneri

Closes #20977 from phpisciuneri/fix-log-warning.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/682002b6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/682002b6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/682002b6

Branch: refs/heads/master
Commit: 682002b6da844ed11324ee5ff4d00fc0294c0b31
Parents: 0f93b91
Author: Patrick Pisciuneri
Authored: Fri Apr 13 09:45:27 2018 +0800
Committer: Wenchen Fan
Committed: Fri Apr 13 09:45:27 2018 +0800

--
 .../main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/682002b6/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala

diff --git a/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala b/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala
index 7e14938..c1fedd6 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala
@@ -166,7 +166,7 @@ private class AsyncEventQueue(val name: String, conf: SparkConf, metrics: LiveLi
       val prevLastReportTimestamp = lastReportTimestamp
       lastReportTimestamp = System.currentTimeMillis()
       val previous = new java.util.Date(prevLastReportTimestamp)
-      logWarning(s"Dropped $droppedEvents events from $name since $previous.")
+      logWarning(s"Dropped $droppedCount events from $name since $previous.")
     }
   }
 }
spark git commit: [SPARK-23751][FOLLOW-UP] fix build for scala-2.12
Repository: spark
Updated Branches: refs/heads/master 0b19122d4 -> 0f93b91a7

[SPARK-23751][FOLLOW-UP] fix build for scala-2.12

## What changes were proposed in this pull request?

Fix the build for scala-2.12.

## How was this patch tested?

Manual.

Author: WeichenXu

Closes #21051 from WeichenXu123/fix_build212.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0f93b91a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0f93b91a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0f93b91a

Branch: refs/heads/master
Commit: 0f93b91a71444a1a938acfd8ea2191c54fb0187c
Parents: 0b19122
Author: WeichenXu
Authored: Thu Apr 12 15:47:42 2018 -0600
Committer: Joseph K. Bradley
Committed: Thu Apr 12 15:47:42 2018 -0600

--
 .../scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/0f93b91a/mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala

diff --git a/mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala b/mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala
index af8ff64..adf8145 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala
@@ -85,7 +85,7 @@ object KolmogorovSmirnovTest {
       dataset: Dataset[_],
       sampleCol: String,
       cdf: Function[java.lang.Double, java.lang.Double]): DataFrame = {
-    test(dataset, sampleCol, (x: Double) => cdf.call(x))
+    test(dataset, sampleCol, (x: Double) => cdf.call(x).toDouble)
   }

   /**
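A self-contained sketch of the boxing issue behind this one-character fix (the `JFunction` trait below is a hypothetical stand-in for the Java functional interface in the real API): `cdf.call(x)` yields a boxed `java.lang.Double`, and the explicit `.toDouble` pins the closure's result type to the primitive `Double`, so it typechecks the same way under Scala 2.12's changed lambda/SAM handling as it did under 2.11:

    // Hypothetical stand-in for the Java functional interface used by the API.
    trait JFunction[T, R] { def call(t: T): R }

    object BoxedDoubleSketch {
      def main(args: Array[String]): Unit = {
        val cdf = new JFunction[java.lang.Double, java.lang.Double] {
          override def call(x: java.lang.Double): java.lang.Double = x * 0.5
        }

        // cdf.call(x) is a boxed java.lang.Double; .toDouble converts it to the
        // primitive scala.Double, so `f` is unambiguously typed Double => Double.
        val f: Double => Double = (x: Double) => cdf.call(x).toDouble
        println(f(2.0)) // 1.0
      }
    }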
[1/2] spark-website git commit: Update text/wording to more "modern" Spark and more consistent.
Repository: spark-website
Updated Branches: refs/heads/asf-site 91b561749 -> 658467248

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/strata-exercises-now-available-online.html

diff --git a/site/news/strata-exercises-now-available-online.html b/site/news/strata-exercises-now-available-online.html
index 916f242..4f250a3 100644
--- a/site/news/strata-exercises-now-available-online.html
+++ b/site/news/strata-exercises-now-available-online.html
@@ -66,7 +66,7 @@

-  Lightning-fast cluster computing
+  Lightning-fast unified analytics engine

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/submit-talks-to-spark-summit-2014.html

diff --git a/site/news/submit-talks-to-spark-summit-2014.html b/site/news/submit-talks-to-spark-summit-2014.html
index 4f43c23..18f2642 100644
--- a/site/news/submit-talks-to-spark-summit-2014.html
+++ b/site/news/submit-talks-to-spark-summit-2014.html
@@ -66,7 +66,7 @@

-  Lightning-fast cluster computing
+  Lightning-fast unified analytics engine

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/submit-talks-to-spark-summit-2016.html

diff --git a/site/news/submit-talks-to-spark-summit-2016.html b/site/news/submit-talks-to-spark-summit-2016.html
index 3163bab..3766932 100644
--- a/site/news/submit-talks-to-spark-summit-2016.html
+++ b/site/news/submit-talks-to-spark-summit-2016.html
@@ -66,7 +66,7 @@

-  Lightning-fast cluster computing
+  Lightning-fast unified analytics engine

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/submit-talks-to-spark-summit-east-2016.html

diff --git a/site/news/submit-talks-to-spark-summit-east-2016.html b/site/news/submit-talks-to-spark-summit-east-2016.html
index 1984db7..b4a51a7 100644
--- a/site/news/submit-talks-to-spark-summit-east-2016.html
+++ b/site/news/submit-talks-to-spark-summit-east-2016.html
@@ -66,7 +66,7 @@

-  Lightning-fast cluster computing
+  Lightning-fast unified analytics engine

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/submit-talks-to-spark-summit-eu-2016.html

diff --git a/site/news/submit-talks-to-spark-summit-eu-2016.html b/site/news/submit-talks-to-spark-summit-eu-2016.html
index 8e33a17..940bc6f 100644
--- a/site/news/submit-talks-to-spark-summit-eu-2016.html
+++ b/site/news/submit-talks-to-spark-summit-eu-2016.html
@@ -66,7 +66,7 @@

-  Lightning-fast cluster computing
+  Lightning-fast unified analytics engine

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/two-weeks-to-spark-summit-2014.html

diff --git a/site/news/two-weeks-to-spark-summit-2014.html b/site/news/two-weeks-to-spark-summit-2014.html
index 3863298..d4e993a 100644
--- a/site/news/two-weeks-to-spark-summit-2014.html
+++ b/site/news/two-weeks-to-spark-summit-2014.html
@@ -66,7 +66,7 @@

-  Lightning-fast cluster computing
+  Lightning-fast unified analytics engine

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/video-from-first-spark-development-meetup.html

diff --git a/site/news/video-from-first-spark-development-meetup.html b/site/news/video-from-first-spark-development-meetup.html
index 2be7f50..04151a8 100644
--- a/site/news/video-from-first-spark-development-meetup.html
+++ b/site/news/video-from-first-spark-development-meetup.html
@@ -66,7 +66,7 @@

-  Lightning-fast cluster computing
+  Lightning-fast unified analytics engine

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/powered-by.html

diff --git a/site/powered-by.html b/site/powered-by.html
index 3449782..b303df0 100644
--- a/site/powered-by.html
+++ b/site/powered-by.html
@@ -66,7 +66,7 @@

-  Lightning-fast cluster computing
+  Lightning-fast unified analytics engine

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/release-process.html

diff --git a/site/release-process.html b/site/release-process.html
index
[2/2] spark-website git commit: Update text/wording to more "modern" Spark and more consistent.
Update text/wording to more "modern" Spark and more consistent.

1. Use DataFrame examples.
2. Reduce explicit comparison with MapReduce, since the topic does not really come up.
3. More focus on analytics rather than "cluster compute".
4. Update committer affiliation.
5. Make it more clear Spark runs in diverse environments (especially on the MLlib page).

There is a lot more that needs to be done that I don't have time for today, e.g. referring to Structured Streaming.

Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/65846724
Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/65846724
Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/65846724

Branch: refs/heads/asf-site
Commit: 658467248b278b109bc3d2594b0ef08ff0c727cb
Parents: 91b5617
Author: Reynold Xin
Authored: Thu Apr 12 12:56:05 2018 -0700
Committer: Reynold Xin
Committed: Thu Apr 12 12:56:05 2018 -0700

--
 _layouts/global.html                            |  2 +-
 committers.md                                   | 22 +-
 index.md                                        | 34 +--
 mllib/index.md                                  | 18 +-
 site/committers.html                            | 24 +-
 site/community.html                             |  2 +-
 site/contributing.html                          |  2 +-
 site/developer-tools.html                       |  2 +-
 site/documentation.html                         |  2 +-
 site/downloads.html                             |  2 +-
 site/examples.html                              |  2 +-
 site/faq.html                                   |  2 +-
 site/history.html                               |  2 +-
 site/improvement-proposals.html                 |  2 +-
 site/index.html                                 | 36 +--
 site/mailing-lists.html                         |  4 +-
 site/mllib/index.html                           | 18 +-
 site/news/amp-camp-2013-registration-ope.html   |  2 +-
 site/news/announcing-the-first-spark-summit.html |  2 +-
 site/news/fourth-spark-screencast-published.html |  2 +-
 site/news/index.html                            |  2 +-
 site/news/nsdi-paper.html                       |  2 +-
 site/news/one-month-to-spark-summit-2015.html   |  2 +-
 site/news/proposals-open-for-spark-summit-east.html |  2 +-
 site/news/registration-open-for-spark-summit-east.html |  2 +-
 site/news/run-spark-and-shark-on-amazon-emr.html |  2 +-
 site/news/spark-0-6-1-and-0-5-2-released.html   |  2 +-
 site/news/spark-0-6-2-released.html             |  2 +-
 site/news/spark-0-7-0-released.html             |  2 +-
 site/news/spark-0-7-2-released.html             |  2 +-
 site/news/spark-0-7-3-released.html             |  2 +-
 site/news/spark-0-8-0-released.html             |  2 +-
 site/news/spark-0-8-1-released.html             |  2 +-
 site/news/spark-0-9-0-released.html             |  2 +-
 site/news/spark-0-9-1-released.html             |  2 +-
 site/news/spark-0-9-2-released.html             |  2 +-
 site/news/spark-1-0-0-released.html             |  2 +-
 site/news/spark-1-0-1-released.html             |  2 +-
 site/news/spark-1-0-2-released.html             |  2 +-
 site/news/spark-1-1-0-released.html             |  2 +-
 site/news/spark-1-1-1-released.html             |  2 +-
 site/news/spark-1-2-0-released.html             |  2 +-
 site/news/spark-1-2-1-released.html             |  2 +-
 site/news/spark-1-2-2-released.html             |  2 +-
 site/news/spark-1-3-0-released.html             |  2 +-
 site/news/spark-1-4-0-released.html             |  2 +-
 site/news/spark-1-4-1-released.html             |  2 +-
 site/news/spark-1-5-0-released.html             |  2 +-
 site/news/spark-1-5-1-released.html             |  2 +-
 site/news/spark-1-5-2-released.html             |  2 +-
 site/news/spark-1-6-0-released.html             |  2 +-
 site/news/spark-1-6-1-released.html             |  2 +-
 site/news/spark-1-6-2-released.html             |  2 +-
 site/news/spark-1-6-3-released.html             |  2 +-
 site/news/spark-2-0-0-released.html             |  2 +-
 site/news/spark-2-0-1-released.html             |  2 +-
 site/news/spark-2-0-2-released.html             |  2 +-
 site/news/spark-2-1-0-released.html             |  2 +-
 site/news/spark-2-1-1-released.html             |  2 +-
 site/news/spark-2-1-2-released.html             |  2 +-
 site/news/spark-2-2-0-released.html             |  2 +-
 site/news/spark-2-2-1-released.html             |  2 +-
 site/news/spark-2-3-0-released.html             |  2 +-
 site/news/spark-2.0.0-preview.html              |  2 +-
 .../spark-accepted-into-apache-incubator.html   |  2 +-
 site/news/spark-and-shark-in-the-news.html      |  2 +-
 site/news/spark-becomes-tlp.html                |  2 +-
svn commit: r26307 - in /dev/spark/2.4.0-SNAPSHOT-2018_04_12_08_02-0b19122-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Thu Apr 12 15:24:31 2018
New Revision: 26307

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_04_12_08_02-0b19122 docs

[This commit notification would consist of 1458 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-23762][SQL] UTF8StringBuffer uses MemoryBlock
Repository: spark
Updated Branches: refs/heads/master 6a2289ecf -> 0b19122d4

[SPARK-23762][SQL] UTF8StringBuffer uses MemoryBlock

## What changes were proposed in this pull request?

This PR tries to use `MemoryBlock` in `UTF8StringBuilder`. In general, there are two advantages to using `MemoryBlock`:

1. Cleaner API calls than using a Java array or `PlatformMemory`.
2. Better runtime performance of memory access than going through `Object`.

## How was this patch tested?

Added `UTF8StringBufferSuite`.

Author: Kazuaki Ishizaki

Closes #20871 from kiszk/SPARK-23762.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0b19122d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0b19122d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0b19122d

Branch: refs/heads/master
Commit: 0b19122d434e39eb117ccc3174a0688c9c874d48
Parents: 6a2289e
Author: Kazuaki Ishizaki
Authored: Thu Apr 12 22:21:30 2018 +0800
Committer: Wenchen Fan
Committed: Thu Apr 12 22:21:30 2018 +0800

--
 .../expressions/codegen/UTF8StringBuilder.java  | 35 +++-
 .../codegen/UTF8StringBuilderSuite.scala        | 42 
 2 files changed, 56 insertions(+), 21 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/0b19122d/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UTF8StringBuilder.java

diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UTF8StringBuilder.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UTF8StringBuilder.java
index f0f66ba..f8000d7 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UTF8StringBuilder.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UTF8StringBuilder.java
@@ -19,6 +19,8 @@ package org.apache.spark.sql.catalyst.expressions.codegen;

 import org.apache.spark.unsafe.Platform;
 import org.apache.spark.unsafe.array.ByteArrayMethods;
+import org.apache.spark.unsafe.memory.ByteArrayMemoryBlock;
+import org.apache.spark.unsafe.memory.MemoryBlock;
 import org.apache.spark.unsafe.types.UTF8String;

 /**
@@ -29,43 +31,34 @@ public class UTF8StringBuilder {

   private static final int ARRAY_MAX = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH;

-  private byte[] buffer;
-  private int cursor = Platform.BYTE_ARRAY_OFFSET;
+  private ByteArrayMemoryBlock buffer;
+  private int length = 0;

   public UTF8StringBuilder() {
     // Since initial buffer size is 16 in `StringBuilder`, we set the same size here
-    this.buffer = new byte[16];
+    this.buffer = new ByteArrayMemoryBlock(16);
   }

   // Grows the buffer by at least `neededSize`
   private void grow(int neededSize) {
-    if (neededSize > ARRAY_MAX - totalSize()) {
+    if (neededSize > ARRAY_MAX - length) {
       throw new UnsupportedOperationException(
         "Cannot grow internal buffer by size " + neededSize + " because the size after growing " +
           "exceeds size limitation " + ARRAY_MAX);
     }
-    final int length = totalSize() + neededSize;
-    if (buffer.length < length) {
-      int newLength = length < ARRAY_MAX / 2 ? length * 2 : ARRAY_MAX;
-      final byte[] tmp = new byte[newLength];
-      Platform.copyMemory(
-        buffer,
-        Platform.BYTE_ARRAY_OFFSET,
-        tmp,
-        Platform.BYTE_ARRAY_OFFSET,
-        totalSize());
+    final int requestedSize = length + neededSize;
+    if (buffer.size() < requestedSize) {
+      int newLength = requestedSize < ARRAY_MAX / 2 ? requestedSize * 2 : ARRAY_MAX;
+      final ByteArrayMemoryBlock tmp = new ByteArrayMemoryBlock(newLength);
+      MemoryBlock.copyMemory(buffer, tmp, length);
       buffer = tmp;
     }
   }

-  private int totalSize() {
-    return cursor - Platform.BYTE_ARRAY_OFFSET;
-  }
-
   public void append(UTF8String value) {
     grow(value.numBytes());
-    value.writeToMemory(buffer, cursor);
-    cursor += value.numBytes();
+    value.writeToMemory(buffer.getByteArray(), length + Platform.BYTE_ARRAY_OFFSET);
+    length += value.numBytes();
   }

   public void append(String value) {
@@ -73,6 +66,6 @@ public class UTF8StringBuilder {
   }

   public UTF8String build() {
-    return UTF8String.fromBytes(buffer, 0, totalSize());
+    return UTF8String.fromBytes(buffer.getByteArray(), 0, length);
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/0b19122d/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/UTF8StringBuilderSuite.scala

diff --git
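To see what the builder is doing at a high level, here is a simplified, self-contained sketch of the same grow-by-doubling append logic using a plain byte array (the class name is a hypothetical stand-in; the real builder stores bytes in a `ByteArrayMemoryBlock`, as the diff above shows):

    import java.nio.charset.StandardCharsets
    import java.util.Arrays

    // Simplified stand-in for the grow-by-doubling logic in UTF8StringBuilder.
    class GrowableUtf8Buffer {
      private val ArrayMax = Integer.MAX_VALUE - 15
      private var buffer = new Array[Byte](16) // same initial size as StringBuilder
      private var length = 0

      // Grow the backing array by at least `neededSize`, doubling when possible.
      private def grow(neededSize: Int): Unit = {
        if (neededSize > ArrayMax - length) {
          throw new UnsupportedOperationException(
            "Cannot grow buffer: size would exceed limitation " + ArrayMax)
        }
        val requestedSize = length + neededSize
        if (buffer.length < requestedSize) {
          val newLength = if (requestedSize < ArrayMax / 2) requestedSize * 2 else ArrayMax
          buffer = Arrays.copyOf(buffer, newLength)
        }
      }

      def append(value: String): Unit = {
        val bytes = value.getBytes(StandardCharsets.UTF_8)
        grow(bytes.length)
        System.arraycopy(bytes, 0, buffer, length, bytes.length)
        length += bytes.length
      }

      def build(): String = new String(buffer, 0, length, StandardCharsets.UTF_8)
    }

    object GrowableUtf8Buffer {
      def main(args: Array[String]): Unit = {
        val buf = new GrowableUtf8Buffer
        buf.append("Hello, ")
        buf.append("world")
        println(buf.build()) // Hello, world
      }
    }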
svn commit: r26305 - in /dev/spark/2.4.0-SNAPSHOT-2018_04_12_04_02-6a2289e-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Thu Apr 12 11:23:41 2018
New Revision: 26305

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_04_12_04_02-6a2289e docs

[This commit notification would consist of 1458 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r26303 - in /dev/spark/2.3.1-SNAPSHOT-2018_04_12_02_01-5712695-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Thu Apr 12 09:15:55 2018
New Revision: 26303

Log:
Apache Spark 2.3.1-SNAPSHOT-2018_04_12_02_01-5712695 docs

[This commit notification would consist of 1443 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-23962][SQL][TEST] Fix race in currentExecutionIds().
Repository: spark
Updated Branches: refs/heads/master e904dfaf0 -> 6a2289ecf

[SPARK-23962][SQL][TEST] Fix race in currentExecutionIds().

SQLMetricsTestUtils.currentExecutionIds() was racing with the listener bus, which led to some flaky tests. We should wait until the listener bus is empty.

I tested by adding some Thread.sleep()s in SQLAppStatusListener, which reproduced the exceptions I saw on Jenkins. With this change, they went away.

Author: Imran Rashid

Closes #21041 from squito/SPARK-23962.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6a2289ec
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6a2289ec
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6a2289ec

Branch: refs/heads/master
Commit: 6a2289ecf020a99cd9b3bcea7da5e78fb4e0303a
Parents: e904dfa
Author: Imran Rashid
Authored: Thu Apr 12 15:58:04 2018 +0800
Committer: Wenchen Fan
Committed: Thu Apr 12 15:58:04 2018 +0800

--
 .../org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala | 1 +
 1 file changed, 1 insertion(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/6a2289ec/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala
index 534d8bb..dcc540f 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala
@@ -34,6 +34,7 @@ trait SQLMetricsTestUtils extends SQLTestUtils {
   import testImplicits._

   protected def currentExecutionIds(): Set[Long] = {
+    spark.sparkContext.listenerBus.waitUntilEmpty(10000)
     statusStore.executionsList.map(_.executionId).toSet
   }
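The general pattern, as a short hedged sketch (the helper name is hypothetical; note that `SparkContext.listenerBus` is `private[spark]`, so code like this must live under the `org.apache.spark` package, as the SQL test utils do):

    package org.apache.spark.sql.test

    import org.apache.spark.sql.SparkSession

    // Hypothetical helper: drain the async listener bus before reading any state
    // that listeners populate, so assertions do not race with in-flight events.
    object ListenerDrain {
      def afterListenerEvents[T](spark: SparkSession)(read: => T): T = {
        // Block until every queued event has been delivered (up to 10 seconds).
        spark.sparkContext.listenerBus.waitUntilEmpty(10000)
        read
      }
    }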
spark git commit: [SPARK-23962][SQL][TEST] Fix race in currentExecutionIds().
Repository: spark
Updated Branches: refs/heads/branch-2.3 03a4dfd69 -> 571269519

[SPARK-23962][SQL][TEST] Fix race in currentExecutionIds().

SQLMetricsTestUtils.currentExecutionIds() was racing with the listener bus, which led to some flaky tests. We should wait until the listener bus is empty.

I tested by adding some Thread.sleep()s in SQLAppStatusListener, which reproduced the exceptions I saw on Jenkins. With this change, they went away.

Author: Imran Rashid

Closes #21041 from squito/SPARK-23962.

(cherry picked from commit 6a2289ecf020a99cd9b3bcea7da5e78fb4e0303a)
Signed-off-by: Wenchen Fan

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/57126951
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/57126951
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/57126951

Branch: refs/heads/branch-2.3
Commit: 571269519a49e0651575de18de81b788b6548afd
Parents: 03a4dfd
Author: Imran Rashid
Authored: Thu Apr 12 15:58:04 2018 +0800
Committer: Wenchen Fan
Committed: Thu Apr 12 15:58:36 2018 +0800

--
 .../org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala | 1 +
 1 file changed, 1 insertion(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/57126951/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala
index 122d287..e22bbb6 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala
@@ -34,6 +34,7 @@ trait SQLMetricsTestUtils extends SQLTestUtils {
   import testImplicits._

   protected def currentExecutionIds(): Set[Long] = {
+    spark.sparkContext.listenerBus.waitUntilEmpty(10000)
     statusStore.executionsList.map(_.executionId).toSet
   }