date:20180116

[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...

2018-01-16 Thread ajbozarth

Github user ajbozarth commented on the issue:

https://github.com/apache/spark/pull/20203
  
From a code pov the UI change looks fine, but could you upload a few 
screenshots of the change? Also the UI simply says if the exec is blacklisted 
for the whole app or just a stage, but doesn't specify which stage. Is knowing 
the stage which blacklisted the node important? If so we should try to raise 
that to the UI as well.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20253
  
**[Test build #86208 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86208/testReport)**
 for PR 20253 at commit 
[`408f4ee`](https://github.com/apache/spark/commit/408f4ee7c05f60ac510794a5b1d41420905444c6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20019: [SPARK-22361][SQL][TEST] Add unit test for Window Frames

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20019
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20019: [SPARK-22361][SQL][TEST] Add unit test for Window Frames

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20019
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86191/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20243: [SPARK-23052][SS] Migrate ConsoleSink to data sou...

2018-01-16 Thread jose-torres

Github user jose-torres commented on a diff in the pull request:

https://github.com/apache/spark/pull/20243#discussion_r161908335
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala 
---
@@ -17,58 +17,36 @@
 
 package org.apache.spark.sql.execution.streaming
 
-import org.apache.spark.internal.Logging
-import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
-import org.apache.spark.sql.execution.SQLExecution
-import org.apache.spark.sql.sources.{BaseRelation, 
CreatableRelationProvider, DataSourceRegister, StreamSinkProvider}
-import org.apache.spark.sql.streaming.OutputMode
-import org.apache.spark.sql.types.StructType
-
-class ConsoleSink(options: Map[String, String]) extends Sink with Logging {
-  // Number of rows to display, by default 20 rows
-  private val numRowsToShow = 
options.get("numRows").map(_.toInt).getOrElse(20)
-
-  // Truncate the displayed data if it is too long, by default it is true
-  private val isTruncated = 
options.get("truncate").map(_.toBoolean).getOrElse(true)
+import java.util.Optional
 
-  // Track the batch id
-  private var lastBatchId = -1L
-
-  override def addBatch(batchId: Long, data: DataFrame): Unit = 
synchronized {
-val batchIdStr = if (batchId <= lastBatchId) {
-  s"Rerun batch: $batchId"
-} else {
-  lastBatchId = batchId
-  s"Batch: $batchId"
-}
-
-// scalastyle:off println
-println("---")
-println(batchIdStr)
-println("---")
-// scalastyle:off println
-data.sparkSession.createDataFrame(
-  data.sparkSession.sparkContext.parallelize(data.collect()), 
data.schema)
-  .show(numRowsToShow, isTruncated)
-  }
+import scala.collection.JavaConverters._
 
-  override def toString(): String = s"ConsoleSink[numRows=$numRowsToShow, 
truncate=$isTruncated]"
-}
+import org.apache.spark.sql._
+import org.apache.spark.sql.execution.streaming.sources.ConsoleWriter
+import org.apache.spark.sql.sources.{BaseRelation, 
CreatableRelationProvider, DataSourceRegister}
+import org.apache.spark.sql.sources.v2.{DataSourceV2, DataSourceV2Options}
+import org.apache.spark.sql.sources.v2.streaming.MicroBatchWriteSupport
+import org.apache.spark.sql.sources.v2.writer.DataSourceV2Writer
+import org.apache.spark.sql.streaming.OutputMode
+import org.apache.spark.sql.types.StructType
 
 case class ConsoleRelation(override val sqlContext: SQLContext, data: 
DataFrame)
   extends BaseRelation {
   override def schema: StructType = data.schema
 }
 
-class ConsoleSinkProvider extends StreamSinkProvider
+class ConsoleSinkProvider extends DataSourceV2
+  with MicroBatchWriteSupport
   with DataSourceRegister
   with CreatableRelationProvider {
-  def createSink(
-  sqlContext: SQLContext,
-  parameters: Map[String, String],
-  partitionColumns: Seq[String],
-  outputMode: OutputMode): Sink = {
-new ConsoleSink(parameters)
+
+  override def createMicroBatchWriter(
+  queryId: String,
+  epochId: Long,
+  schema: StructType,
+  mode: OutputMode,
+  options: DataSourceV2Options): Optional[DataSourceV2Writer] = {
+Optional.of(new ConsoleWriter(epochId, schema, 
options.asMap.asScala.toMap))
   }
 
   def createRelation(
--- End diff --

I assume so. I'm not familiar with it, but it's not on the streaming source 
codepath.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20019: [SPARK-22361][SQL][TEST] Add unit test for Window Frames

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20019
  
**[Test build #86191 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86191/testReport)**
 for PR 20019 at commit 
[`d1e2454`](https://github.com/apache/spark/commit/d1e24542da403e5417a663c675d73a2f95ac77ef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for all OpenCv...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20168
  
**[Test build #86207 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86207/testReport)**
 for PR 20168 at commit 
[`5a632f5`](https://github.com/apache/spark/commit/5a632f5f60afd2e8c225703532d17ed9e56e47f7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19247: [Spark-21996][SQL] read files with space in name ...

2018-01-16 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/19247#discussion_r161907322
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
 ---
@@ -408,6 +420,18 @@ class FileStreamSourceSuite extends 
FileStreamSourceTest {
 }
   }
 
+  test("SPARK-21996 read from text files -- file name has space") {
--- End diff --

Not. I meant it's not an issue of file formats. There are not some special 
codes in file stream source. If there should be any tests for such issue, they 
should be inside file format tests.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20274: [SPARK-20120][SQL][FOLLOW-UP] Better way to support spar...

2018-01-16 Thread jiangxb1987

Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20274
  
How is the silent mode broken? Could you elaborate on that?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19247: [Spark-21996][SQL] read files with space in name ...

2018-01-16 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19247#discussion_r161906360
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
 ---
@@ -408,6 +420,18 @@ class FileStreamSourceSuite extends 
FileStreamSourceTest {
 }
   }
 
+  test("SPARK-21996 read from text files -- file name has space") {
--- End diff --

yes, for this PR it is, but it would be great if we can ensure that all the 
data sources have the same behavior... Maybe we can do this is another PR if 
you think it is better


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20280
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86205/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20280
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20280
  
**[Test build #86205 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86205/testReport)**
 for PR 20280 at commit 
[`a7d3396`](https://github.com/apache/spark/commit/a7d339624d3ddf80af63fd3710fdc1e0742ecc6c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20266: [SPARK-23072][SQL][TEST] Add a Unicode schema test for f...

2018-01-16 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20266
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20279: [SPARK-23092][SQL] Migrate MemoryStream to DataSourceV2 ...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20279
  
**[Test build #86206 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86206/testReport)**
 for PR 20279 at commit 
[`a81c2ec`](https://github.com/apache/spark/commit/a81c2ecdafd54a2c5bfb07c6f1f53546eaa96c7c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregation func...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19872
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86188/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregation func...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19872
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregation func...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19872
  
**[Test build #86188 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86188/testReport)**
 for PR 19872 at commit 
[`b6b935c`](https://github.com/apache/spark/commit/b6b935cf120b229e9df4d276847312302a116b26).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for all...

2018-01-16 Thread tomasatdatabricks

Github user tomasatdatabricks commented on a diff in the pull request:

https://github.com/apache/spark/pull/20168#discussion_r161904121
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -1843,6 +1844,28 @@ def tearDown(self):
 
 class ImageReaderTest(SparkSessionTestCase):
 
+def test_ocv_types(self):
+ocvList = ImageSchema.ocvTypes
+self.assertEqual("Undefined", ocvList[0].name)
+self.assertEqual(-1, ocvList[0].mode)
+self.assertEqual("N/A", ocvList[0].dataType)
+for x in ocvList:
+self.assertEqual(x, ImageSchema.ocvTypeByName(x.name))
+self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))
+
+def test_conversions(self):
+s = np.random.RandomState(seed=987)
+ary_src = s.rand(4, 10, 10)
--- End diff --

Yes, that was the intention, good catch.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for all...

2018-01-16 Thread tomasatdatabricks

Github user tomasatdatabricks commented on a diff in the pull request:

https://github.com/apache/spark/pull/20168#discussion_r161904044
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala ---
@@ -83,7 +83,8 @@ class ImageSchemaSuite extends SparkFunSuite with 
MLlibTestSparkContext {
 val bytes20 = getData(row).slice(0, 20)
 
 val (expectedMode, expectedBytes) = firstBytes20(filename)
--- End diff --

Yes, good catch. The name is definitely misleading as it is now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregation func...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19872
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86189/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregation func...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19872
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for all...

2018-01-16 Thread tomasatdatabricks

Github user tomasatdatabricks commented on a diff in the pull request:

https://github.com/apache/spark/pull/20168#discussion_r161903575
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala 
---
@@ -37,20 +37,67 @@ import org.apache.spark.sql.types._
 @Since("2.3.0")
 object ImageSchema {
 
-  val undefinedImageType = "Undefined"
+  /**
+   * OpenCv type representation
+   *
+   * @param mode ordinal for the type
+   * @param dataType open cv data type
+   * @param nChannels number of color channels
+   */
+  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
+def name: String = if (mode == -1) { "Undefined" } else { 
s"CV_$dataType" + s"C$nChannels" }
+override def toString: String = s"OpenCvType(mode = $mode, name = 
$name)"
+  }
 
   /**
-   * (Scala-specific) OpenCV type mapping supported
+   * Return the supported OpenCvType with matching name or raise error if 
there is no matching type.
+   *
+   * @param name: name of existing OpenCvType
+   * @return OpenCvType that matches the given name
*/
-  val ocvTypes: Map[String, Int] = Map(
-undefinedImageType -> -1,
-"CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
-  )
+  def ocvTypeByName(name: String): OpenCvType = {
--- End diff --

I prefer the ocvTypeByName unless there is a consensus otherwise. Find 
suggests slightly different usage patterns in my opinion and so does get, to 
some extent, + get would not be consistent with the rest of the python api. I 
don't feel particularly strong about this so if more people think it should be 
change I'd be happy to change it .


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregation func...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19872
  
**[Test build #86189 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86189/testReport)**
 for PR 19872 at commit 
[`9824bbd`](https://github.com/apache/spark/commit/9824bbd6ca7c85cd493e5e7eef0db15bbaf1ad95).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20253: [SPARK-22908][SS] Roll forward continuous process...

2018-01-16 Thread tdas

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20253#discussion_r161901973
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousTest.scala
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import scala.collection.mutable
+
+import org.apache.spark.SparkContext
+import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd, 
SparkListenerTaskStart}
+import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation
+import org.apache.spark.sql.execution.streaming.StreamExecution
+import 
org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution
+import org.apache.spark.sql.streaming.Trigger
+import org.apache.spark.sql.test.TestSparkSession
+
+// Trait to configure StreamTest for kafka continuous execution tests.
+trait KafkaContinuousTest extends KafkaSourceTest {
+  override val defaultTrigger = Trigger.Continuous(1000)
+  override val defaultUseV2Sink = true
+
+  // We need more than the default local[2] to be able to schedule all 
partitions simultaneously.
+  override protected def createSparkSession = new TestSparkSession(
+new SparkContext(
+  "local[10]",
+  "continuous-stream-test-sql-context",
+  sparkConf.set("spark.sql.testkey", "true")))
+
+  // In addition to setting the partitions in Kafka, we have to wait until 
the query has
+  // reconfigured to the new count so the test framework can hook in 
properly.
+  override protected def setTopicPartitions(
+  topic: String, newCount: Int, query: StreamExecution) = {
+testUtils.addPartitions(topic, newCount)
+eventually(timeout(streamingTimeout)) {
+  assert(
+query.lastExecution.logical.collectFirst {
+  case DataSourceV2Relation(_, r: KafkaContinuousReader) => r
+}.exists(_.knownPartitions.size == newCount),
+s"query never reconfigured to $newCount partitions")
+}
+  }
+
+  // Continuous processing tasks end asynchronously, so test that they 
actually end.
+  private val tasksEndedListener = new SparkListener() {
+val activeTaskIds = mutable.Set[Long]()
+
+override def onTaskStart(start: SparkListenerTaskStart): Unit = {
+  activeTaskIds.add(start.taskInfo.taskId)
+}
+
+override def onTaskEnd(end: SparkListenerTaskEnd): Unit = {
+  activeTaskIds.remove(end.taskInfo.taskId)
+}
+  }
+  override def beforeEach(): Unit = {
+spark.sparkContext.addSparkListener(tasksEndedListener)
+  }
+
+  override def afterEach(): Unit = {
+eventually(timeout(streamingTimeout)) {
+  assert(tasksEndedListener.activeTaskIds.isEmpty)
+}
+spark.sparkContext.removeSparkListener(tasksEndedListener)
--- End diff --

call super.afterEach()


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20253: [SPARK-22908][SS] Roll forward continuous process...

2018-01-16 Thread tdas

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20253#discussion_r161901930
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousTest.scala
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import scala.collection.mutable
+
+import org.apache.spark.SparkContext
+import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd, 
SparkListenerTaskStart}
+import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation
+import org.apache.spark.sql.execution.streaming.StreamExecution
+import 
org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution
+import org.apache.spark.sql.streaming.Trigger
+import org.apache.spark.sql.test.TestSparkSession
+
+// Trait to configure StreamTest for kafka continuous execution tests.
+trait KafkaContinuousTest extends KafkaSourceTest {
+  override val defaultTrigger = Trigger.Continuous(1000)
+  override val defaultUseV2Sink = true
+
+  // We need more than the default local[2] to be able to schedule all 
partitions simultaneously.
+  override protected def createSparkSession = new TestSparkSession(
+new SparkContext(
+  "local[10]",
+  "continuous-stream-test-sql-context",
+  sparkConf.set("spark.sql.testkey", "true")))
+
+  // In addition to setting the partitions in Kafka, we have to wait until 
the query has
+  // reconfigured to the new count so the test framework can hook in 
properly.
+  override protected def setTopicPartitions(
+  topic: String, newCount: Int, query: StreamExecution) = {
+testUtils.addPartitions(topic, newCount)
+eventually(timeout(streamingTimeout)) {
+  assert(
+query.lastExecution.logical.collectFirst {
+  case DataSourceV2Relation(_, r: KafkaContinuousReader) => r
+}.exists(_.knownPartitions.size == newCount),
+s"query never reconfigured to $newCount partitions")
+}
+  }
+
+  // Continuous processing tasks end asynchronously, so test that they 
actually end.
+  private val tasksEndedListener = new SparkListener() {
+val activeTaskIds = mutable.Set[Long]()
+
+override def onTaskStart(start: SparkListenerTaskStart): Unit = {
+  activeTaskIds.add(start.taskInfo.taskId)
+}
+
+override def onTaskEnd(end: SparkListenerTaskEnd): Unit = {
+  activeTaskIds.remove(end.taskInfo.taskId)
+}
+  }
+  override def beforeEach(): Unit = {
+spark.sparkContext.addSparkListener(tasksEndedListener)
--- End diff --

you are not calling super.beforeEach. This may have unforeseen 
circumstances.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20253: [SPARK-22908][SS] Roll forward continuous process...

2018-01-16 Thread tdas

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20253#discussion_r161901653
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousTest.scala
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import scala.collection.mutable
+
+import org.apache.spark.SparkContext
+import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd, 
SparkListenerTaskStart}
+import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation
+import org.apache.spark.sql.execution.streaming.StreamExecution
+import 
org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution
+import org.apache.spark.sql.streaming.Trigger
+import org.apache.spark.sql.test.TestSparkSession
+
+// Trait to configure StreamTest for kafka continuous execution tests.
+trait KafkaContinuousTest extends KafkaSourceTest {
+  override val defaultTrigger = Trigger.Continuous(1000)
+  override val defaultUseV2Sink = true
+
+  // We need more than the default local[2] to be able to schedule all 
partitions simultaneously.
+  override protected def createSparkSession = new TestSparkSession(
+new SparkContext(
+  "local[10]",
+  "continuous-stream-test-sql-context",
+  sparkConf.set("spark.sql.testkey", "true")))
+
+  // In addition to setting the partitions in Kafka, we have to wait until 
the query has
+  // reconfigured to the new count so the test framework can hook in 
properly.
+  override protected def setTopicPartitions(
+  topic: String, newCount: Int, query: StreamExecution) = {
+testUtils.addPartitions(topic, newCount)
+eventually(timeout(streamingTimeout)) {
+  assert(
+query.lastExecution.logical.collectFirst {
+  case DataSourceV2Relation(_, r: KafkaContinuousReader) => r
+}.exists(_.knownPartitions.size == newCount),
+s"query never reconfigured to $newCount partitions")
+}
+  }
+
+  // Continuous processing tasks end asynchronously, so test that they 
actually end.
+  private val tasksEndedListener = new SparkListener() {
+val activeTaskIds = mutable.Set[Long]()
--- End diff --

This is not a synchronized set. We are accessing this from multiple threads 
(listener thread is adding/removing, and test thread is checking if its empty). 
So this is an incorrect test. 
In fact, this can be made simpler using just an AtomicInteger.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20253: [SPARK-22908][SS] Roll forward continuous process...

2018-01-16 Thread tdas

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20253#discussion_r161901683
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousTest.scala
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import scala.collection.mutable
+
+import org.apache.spark.SparkContext
+import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd, 
SparkListenerTaskStart}
+import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation
+import org.apache.spark.sql.execution.streaming.StreamExecution
+import 
org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution
+import org.apache.spark.sql.streaming.Trigger
+import org.apache.spark.sql.test.TestSparkSession
+
+// Trait to configure StreamTest for kafka continuous execution tests.
+trait KafkaContinuousTest extends KafkaSourceTest {
+  override val defaultTrigger = Trigger.Continuous(1000)
+  override val defaultUseV2Sink = true
+
+  // We need more than the default local[2] to be able to schedule all 
partitions simultaneously.
+  override protected def createSparkSession = new TestSparkSession(
+new SparkContext(
+  "local[10]",
+  "continuous-stream-test-sql-context",
+  sparkConf.set("spark.sql.testkey", "true")))
+
+  // In addition to setting the partitions in Kafka, we have to wait until 
the query has
+  // reconfigured to the new count so the test framework can hook in 
properly.
+  override protected def setTopicPartitions(
+  topic: String, newCount: Int, query: StreamExecution) = {
+testUtils.addPartitions(topic, newCount)
+eventually(timeout(streamingTimeout)) {
+  assert(
+query.lastExecution.logical.collectFirst {
+  case DataSourceV2Relation(_, r: KafkaContinuousReader) => r
+}.exists(_.knownPartitions.size == newCount),
+s"query never reconfigured to $newCount partitions")
+}
+  }
+
+  // Continuous processing tasks end asynchronously, so test that they 
actually end.
+  private val tasksEndedListener = new SparkListener() {
+val activeTaskIds = mutable.Set[Long]()
+
+override def onTaskStart(start: SparkListenerTaskStart): Unit = {
+  activeTaskIds.add(start.taskInfo.taskId)
+}
+
+override def onTaskEnd(end: SparkListenerTaskEnd): Unit = {
+  activeTaskIds.remove(end.taskInfo.taskId)
+}
+  }
+  override def beforeEach(): Unit = {
--- End diff --

add an empty line.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregation func...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19872
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86187/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregation func...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19872
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregation func...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19872
  
**[Test build #86187 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86187/testReport)**
 for PR 19872 at commit 
[`9fbf012`](https://github.com/apache/spark/commit/9fbf01275159fb7b16cf11687510746d174a7e1f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20280
  
**[Test build #86205 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86205/testReport)**
 for PR 20280 at commit 
[`a7d3396`](https://github.com/apache/spark/commit/a7d339624d3ddf80af63fd3710fdc1e0742ecc6c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20280
  
**[Test build #86204 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86204/testReport)**
 for PR 20280 at commit 
[`2192e49`](https://github.com/apache/spark/commit/2192e494eb8a8f8d3d42b20fdb2b3c681f6bdcb5).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20280
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86204/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20280
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20282: [SPARK-23093][SS] Don't change run id when reconfiguring...

2018-01-16 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/20282
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20280
  
**[Test build #86204 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86204/testReport)**
 for PR 20280 at commit 
[`2192e49`](https://github.com/apache/spark/commit/2192e494eb8a8f8d3d42b20fdb2b3c681f6bdcb5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20282: [SPARK-23093][SS] Don't change run id when reconfiguring...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20282
  
**[Test build #86203 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86203/testReport)**
 for PR 20282 at commit 
[`6bc61b0`](https://github.com/apache/spark/commit/6bc61b024c489d6fd7c1dd1b8a5478bbd6a4fa7e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20266: [SPARK-23072][SQL][TEST] Add a Unicode schema test for f...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20266
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20266: [SPARK-23072][SQL][TEST] Add a Unicode schema test for f...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20266
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86184/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20266: [SPARK-23072][SQL][TEST] Add a Unicode schema test for f...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20266
  
**[Test build #86184 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86184/testReport)**
 for PR 20266 at commit 
[`8fec65b`](https://github.com/apache/spark/commit/8fec65b163b32e7592b21b4a6c19c69352f41919).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19247: [Spark-21996][SQL] read files with space in name ...

2018-01-16 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/19247#discussion_r161893771
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
 ---
@@ -408,6 +420,18 @@ class FileStreamSourceSuite extends 
FileStreamSourceTest {
 }
   }
 
+  test("SPARK-21996 read from text files -- file name has space") {
--- End diff --

this test should be enough. The issue is in file stream source.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20091: [SPARK-22465][FOLLOWUP] Update the number of partitions ...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20091
  
**[Test build #86202 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86202/testReport)**
 for PR 20091 at commit 
[`62088ca`](https://github.com/apache/spark/commit/62088ca998e90281ce8681f748f26ba6fbcc1471).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20138: [SPARK-20664][core] Delete stale application data from S...

2018-01-16 Thread jiangxb1987

Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20138
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19247: [Spark-21996][SQL] read files with space in name ...

2018-01-16 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/19247#discussion_r161892343
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
 ---
@@ -86,6 +86,18 @@ abstract class FileStreamSourceTest
 }
   }
 
+  case class AddTextFileDataWithSpaceInFileName(content: String, src: 
File, tmp: File)
--- End diff --

I would suggest that adding a new parameter to AddTextFileData rather than 
introducing a new class, such as
```  case class AddTextFileData(content: String, src: File, tmp: File, 
tempFilePrefix: String = "text")
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20138: [SPARK-20664][core] Delete stale application data from S...

2018-01-16 Thread squito

Github user squito commented on the issue:

https://github.com/apache/spark/pull/20138
  
lgtm


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20283: [SPARK-23095][SQL] Decorrelation of scalar subquery fail...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20283
  
**[Test build #86201 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86201/testReport)**
 for PR 20283 at commit 
[`5fa80de`](https://github.com/apache/spark/commit/5fa80dec8e2aee07b5c04f7ad01abaccae3b6aff).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20283: [SPARK-23095][SQL] Decorrelation of scalar subque...

2018-01-16 Thread dilipbiswal

GitHub user dilipbiswal opened a pull request:

https://github.com/apache/spark/pull/20283

[SPARK-23095][SQL] Decorrelation of scalar subquery fails with 
java.util.NoSuchElementException

## What changes were proposed in this pull request?
The following SQL involving scalar correlated query returns a map exception.
``` SQL
SELECT t1a
FROM   t1
WHERE  t1a = (SELECT   count(*)
  FROM t2
  WHEREt2c = t1c
  HAVING   count(*) >= 1)
```
``` SQL
key not found: ExprId(278,786682bb-41f9-4bd5-a397-928272cc8e4e)
java.util.NoSuchElementException: key not found: 
ExprId(278,786682bb-41f9-4bd5-a397-928272cc8e4e)
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:59)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:59)
at 
org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery$.org$apache$spark$sql$catalyst$optimizer$RewriteCorrelatedScalarSubquery$$evalSubqueryOnZeroTups(subquery.scala:378)
at 
org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery$$anonfun$org$apache$spark$sql$catalyst$optimizer$RewriteCorrelatedScalarSubquery$$constructLeftJoins$1.apply(subquery.scala:430)
at 
org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery$$anonfun$org$apache$spark$sql$catalyst$optimizer$RewriteCorrelatedScalarSubquery$$constructLeftJoins$1.apply(subquery.scala:426)
```

In this case, after evaluating the HAVING clause "count(*) > 1" statically
against the binding of aggregtation result on empty input, we determine
that this query will not have a the count bug. We should simply return
the evalSubqueryOnZeroTups with empty value.
(Please fill in changes proposed in this fix)

## How was this patch tested?
A new test was added in the Subquery bucket.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dilipbiswal/spark scalar-count-defect

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20283.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20283


commit 5fa80dec8e2aee07b5c04f7ad01abaccae3b6aff
Author: Dilip Biswal 
Date:   2017-09-21T23:12:22Z

[SPARK-23095] Decorrelation of scalar subquery fails with 
java.util.NoSuchElementException.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20023: [SPARK-22036][SQL] Decimal multiplication with high prec...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20023
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...

2018-01-16 Thread squito

Github user squito commented on the issue:

https://github.com/apache/spark/pull/20203
  
@ajbozarth maybe you have some thoughts on the UI, and whether it makes 
sense to put anything on the executors page?

@CodingCat you also often have good UI suggestions :)

thanks


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20023: [SPARK-22036][SQL] Decimal multiplication with high prec...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20023
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86182/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20265
  
**[Test build #86200 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86200/testReport)**
 for PR 20265 at commit 
[`87af693`](https://github.com/apache/spark/commit/87af693a82f9591a256c55a5eca65041f330a225).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20023: [SPARK-22036][SQL] Decimal multiplication with high prec...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20023
  
**[Test build #86182 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86182/testReport)**
 for PR 20023 at commit 
[`090659f`](https://github.com/apache/spark/commit/090659fe5f2471462ada0d54c0c855d9fe4aba7e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-16 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20265
  
I updated the PR (except one RowGroupSize/OrcStripeSize part).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20203: [SPARK-22577] [core] executor page blacklist stat...

2018-01-16 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/20203#discussion_r161884194
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetBlacklist.scala ---
@@ -36,7 +36,9 @@ import org.apache.spark.util.Clock
  * [[TaskSetManager]] this class is designed only to be called from code 
with a lock on the
  * TaskScheduler (e.g. its event handlers). It should not be called from 
other threads.
  */
-private[scheduler] class TaskSetBlacklist(val conf: SparkConf, val 
stageId: Int, val clock: Clock)
+private[scheduler] class TaskSetBlacklist(private val listenerBus: 
LiveListenerBus,
+  val conf: SparkConf, val 
stageId: Int,
--- End diff --

style: if its multiline, each param on its own line, double-indented


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20203: [SPARK-22577] [core] executor page blacklist stat...

2018-01-16 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/20203#discussion_r161885726
  
--- Diff: 
core/src/test/resources/HistoryServerExpectations/stage_blacklisting_for_stage_expectation.json
 ---
@@ -0,0 +1,639 @@
+{
--- End diff --

nit: "stage" twice in the filename is confusing, how about just 
"blacklisting_for_stage_expectation.json"


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20203: [SPARK-22577] [core] executor page blacklist stat...

2018-01-16 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/20203#discussion_r161884916
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetBlacklist.scala ---
@@ -128,13 +130,17 @@ private[scheduler] class TaskSetBlacklist(val conf: 
SparkConf, val stageId: Int,
 }
 
 // Check if enough tasks have failed on the executor to blacklist it 
for the entire stage.
-if (execFailures.numUniqueTasksWithFailures >= 
MAX_FAILURES_PER_EXEC_STAGE) {
+val numFailures = execFailures.numUniqueTasksWithFailures
+if (numFailures >= MAX_FAILURES_PER_EXEC_STAGE) {
   if (blacklistedExecs.add(exec)) {
 logInfo(s"Blacklisting executor ${exec} for stage $stageId")
 // This executor has been pushed into the blacklist for this 
stage.  Let's check if it
 // pushes the whole node into the blacklist.
 val blacklistedExecutorsOnNode =
   execsWithFailuresOnNode.filter(blacklistedExecs.contains(_))
+val now = clock.getTimeMillis()
+listenerBus.post(
+  SparkListenerExecutorBlacklistedForStage(now, exec, numFailures, 
stageId, stageAttemptId))
 if (blacklistedExecutorsOnNode.size >= 
MAX_FAILED_EXEC_PER_NODE_STAGE) {
   if (blacklistedNodes.add(host)) {
 logInfo(s"Blacklisting ${host} for stage $stageId")
--- End diff --

if we're going to do this for executors, we should do it for nodes too.  In 
the UI, you'd just show for each executor that it was blacklisted for the 
stage, I dont think you would need to distinguish whether it was blacklisted 
b/c of the entire node, or just the one executor was blacklisted.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20203: [SPARK-22577] [core] executor page blacklist stat...

2018-01-16 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/20203#discussion_r161885207
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
@@ -211,6 +211,11 @@ private[spark] class AppStatusListener(
 updateBlackListStatus(event.executorId, true)
   }
 
+  override def onExecutorBlacklistedForStage(
+event: SparkListenerExecutorBlacklistedForStage): Unit = {
--- End diff --

double-indent this line (4 spaces)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20279: [SPARK-23092][SQL] Migrate MemoryStream to DataSourceV2 ...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20279
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86192/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20279: [SPARK-23092][SQL] Migrate MemoryStream to DataSourceV2 ...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20279
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20279: [SPARK-23092][SQL] Migrate MemoryStream to DataSourceV2 ...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20279
  
**[Test build #86192 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86192/testReport)**
 for PR 20279 at commit 
[`7a0b564`](https://github.com/apache/spark/commit/7a0b564bd0c74525ebcea55b31f9658b1c2f0e12).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class OneHotEncoderEstimator(JavaEstimator, HasInputCols, 
HasOutputCols, HasHandleInvalid,`
  * `class OneHotEncoderModel(JavaModel, JavaMLReadable, JavaMLWritable):`
  * `class HasOutputCols(Params):`
  * `class DataSourceRDDPartition[T : ClassTag](val index: Int, val 
readTask: ReadTask[T])`
  * `class DataSourceRDD[T: ClassTag](`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20138: [SPARK-20664][core] Delete stale application data from S...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20138
  
**[Test build #86199 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86199/testReport)**
 for PR 20138 at commit 
[`5bad2af`](https://github.com/apache/spark/commit/5bad2afa009f63f05c7cc8cc82099bda9030f180).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20282: [SPARK-23093][SS] Don't change run id when reconf...

2018-01-16 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/20282#discussion_r161884338
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
 ---
@@ -210,15 +214,17 @@ class ContinuousExecution(
   lastExecution.executedPlan // Force the lazy generation of execution 
plan
 }
 
-sparkSession.sparkContext.setLocalProperty(
+sparkSessionForQuery.sparkContext.setLocalProperty(
   ContinuousExecution.START_EPOCH_KEY, currentBatchId.toString)
-sparkSession.sparkContext.setLocalProperty(
-  ContinuousExecution.RUN_ID_KEY, runId.toString)
+val epochCoordinatorId = UUID.randomUUID.toString
--- End diff --

could you add `run_id + random_uuid` so that it's easy to tell which query 
this epoch coordinator belongs to?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20229: [SPARK-23045][ML][SparkR] Update RFormula to use ...

2018-01-16 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20229


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20229: [SPARK-23045][ML][SparkR] Update RFormula to use OneHotE...

2018-01-16 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/20229
  
This should go into branch-2.3 to fix the deprecation warning from the old 
OneHotEncoder, so I'll merge it with master and branch-2.3.  Thanks @MrBago  
and everyone who reviewed!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20282: [SPARK-23093][SS] Don't change run id when reconfiguring...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20282
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86198/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20282: [SPARK-23093][SS] Don't change run id when reconfiguring...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20282
  
**[Test build #86198 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86198/testReport)**
 for PR 20282 at commit 
[`8092a4d`](https://github.com/apache/spark/commit/8092a4dfe07640ee473d76e4e88cd7e10f371be6).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20282: [SPARK-23093][SS] Don't change run id when reconfiguring...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20282
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20282: [SPARK-23093][SS] Don't change run id when reconfiguring...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20282
  
**[Test build #86198 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86198/testReport)**
 for PR 20282 at commit 
[`8092a4d`](https://github.com/apache/spark/commit/8092a4dfe07640ee473d76e4e88cd7e10f371be6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20138: [SPARK-20664][core] Delete stale application data...

2018-01-16 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20138#discussion_r161882635
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
 ---
@@ -663,6 +665,95 @@ class FsHistoryProviderSuite extends SparkFunSuite 
with BeforeAndAfter with Matc
 freshUI.get.ui.store.job(0)
   }
 
+  test("clean up stale app information") {
+val storeDir = Utils.createTempDir()
+val conf = createTestConf().set(LOCAL_STORE_DIR, 
storeDir.getAbsolutePath())
+val provider = spy(new FsHistoryProvider(conf))
+val appId = "new1"
+
+// Write logs for two app attempts.
+doReturn(1L).when(provider).getNewLastScanTime()
+val attempt1 = newLogFile(appId, Some("1"), inProgress = false)
+writeFile(attempt1, true, None,
+  SparkListenerApplicationStart(appId, Some(appId), 1L, "test", 
Some("1")),
+  SparkListenerJobStart(0, 1L, Nil, null),
+  SparkListenerApplicationEnd(5L)
+  )
+val attempt2 = newLogFile(appId, Some("2"), inProgress = false)
+writeFile(attempt2, true, None,
+  SparkListenerApplicationStart(appId, Some(appId), 1L, "test", 
Some("2")),
+  SparkListenerJobStart(0, 1L, Nil, null),
+  SparkListenerApplicationEnd(5L)
+  )
+updateAndCheck(provider) { list =>
+  assert(list.size === 1)
+  assert(list(0).id === appId)
+  assert(list(0).attempts.size === 2)
+}
+
+// Load the app's UI.
+val ui = provider.getAppUI(appId, Some("1"))
+assert(ui.isDefined)
+
+// Delete the underlying log file for attempt 1 and rescan. The UI 
should go away, but since
+// attempt 2 still exists, listing data should be there.
+doReturn(2L).when(provider).getNewLastScanTime()
+attempt1.delete()
+updateAndCheck(provider) { list =>
+  assert(list.size === 1)
+  assert(list(0).id === appId)
+  assert(list(0).attempts.size === 1)
+}
+assert(!ui.get.valid)
+assert(provider.getAppUI(appId, None) === None)
+
+// Delete the second attempt's log file. Now everything should go away.
+doReturn(3L).when(provider).getNewLastScanTime()
+attempt2.delete()
+updateAndCheck(provider) { list =>
+  assert(list.isEmpty)
+}
+  }
+
+  test("SPARK-21571: clean up removes invalid history files") {
+val clock = new ManualClock(TimeUnit.DAYS.toMillis(120))
--- End diff --

This line:

val maxTime = clock.getTimeMillis() - conf.get(MAX_LOG_AGE_S) * 1000

Without that `maxTime` would be negative and that seems to be triggering a 
bug somewhere else. I need to take a look at exactly what's happening there, 
but it seems unrelated to this change.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20266: [SPARK-23072][SQL][TEST] Add a Unicode schema test for f...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20266
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86183/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20266: [SPARK-23072][SQL][TEST] Add a Unicode schema test for f...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20266
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20266: [SPARK-23072][SQL][TEST] Add a Unicode schema test for f...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20266
  
**[Test build #86183 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86183/testReport)**
 for PR 20266 at commit 
[`fb708b7`](https://github.com/apache/spark/commit/fb708b70fe4bb5d29ef55ace7fc0aae61e831c03).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20282: [SPARK-23093][SS] Don't change run id when reconf...

2018-01-16 Thread jose-torres

GitHub user jose-torres opened a pull request:

https://github.com/apache/spark/pull/20282

[SPARK-23093][SS] Don't change run id when reconfiguring a continuous 
processing query.

## What changes were proposed in this pull request?

Keep the run ID static, using a different ID for the epoch coordinator to 
avoid cross-execution message contamination.

## How was this patch tested?

existing unit tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jose-torres/spark fix-runid

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20282.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20282


commit a7c51813d36bdb09b98e119cd50ae9e8afcaca98
Author: Jose Torres 
Date:   2018-01-16T20:30:00Z

don't change run id when reconfiguring




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20138: [SPARK-20664][core] Delete stale application data...

2018-01-16 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20138#discussion_r161880321
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -405,49 +404,70 @@ private[history] class FsHistoryProvider(conf: 
SparkConf, clock: Clock)
 try {
   val newLastScanTime = getNewLastScanTime()
   logDebug(s"Scanning $logDir with lastScanTime==$lastScanTime")
-  // scan for modified applications, replay and merge them
-  val logInfos = Option(fs.listStatus(new 
Path(logDir))).map(_.toSeq).getOrElse(Nil)
+
+  val updated = Option(fs.listStatus(new 
Path(logDir))).map(_.toSeq).getOrElse(Nil)
 .filter { entry =>
   !entry.isDirectory() &&
 // FsHistoryProvider generates a hidden file which can't be 
read.  Accidentally
 // reading a garbage file is safe, but we would log an error 
which can be scary to
 // the end-user.
 !entry.getPath().getName().startsWith(".") &&
-SparkHadoopUtil.get.checkAccessPermission(entry, 
FsAction.READ) &&
-recordedFileSize(entry.getPath()) < entry.getLen()
+SparkHadoopUtil.get.checkAccessPermission(entry, FsAction.READ)
+}
+.filter { entry =>
+  try {
+val info = listing.read(classOf[LogInfo], 
entry.getPath().toString())
+if (info.fileSize < entry.getLen()) {
+  // Log size has changed, it should be parsed.
+  true
+} else {
+  // If the SHS view has a valid application, update the time 
the file was last seen so
+  // that the entry is not deleted from the SHS listing.
+  if (info.appId.isDefined) {
+listing.write(info.copy(lastProcessed = newLastScanTime))
+  }
+  false
+}
+  } catch {
+case _: NoSuchElementException =>
+  // If the file is currently not being tracked by the SHS, 
add an entry for it and try
+  // to parse it. This will allow the cleaner code to detect 
the file as stale later on
+  // if it was not possible to parse it.
+  listing.write(LogInfo(entry.getPath().toString(), 
newLastScanTime, None, None,
+entry.getLen()))
+  entry.getLen() > 0
+  }
 }
 .sortWith { case (entry1, entry2) =>
   entry1.getModificationTime() > entry2.getModificationTime()
 }
 
-  if (logInfos.nonEmpty) {
-logDebug(s"New/updated attempts found: ${logInfos.size} 
${logInfos.map(_.getPath)}")
+  if (updated.nonEmpty) {
+logDebug(s"New/updated attempts found: ${updated.size} 
${updated.map(_.getPath)}")
   }
 
-  var tasks = mutable.ListBuffer[Future[_]]()
-
-  try {
-for (file <- logInfos) {
-  tasks += replayExecutor.submit(new Runnable {
-override def run(): Unit = mergeApplicationListing(file)
+  val tasks = updated.map { entry =>
+try {
+  replayExecutor.submit(new Runnable {
+override def run(): Unit = mergeApplicationListing(entry, 
newLastScanTime)
   })
+} catch {
+  // let the iteration over logInfos break, since an exception on
--- End diff --

Actually `RejectedExecutionException` shouldn't ever be thrown here. The 
executor doesn't have a bounded queue, and it's very unlikely you'll ever 
submit `Integer.MAX_VALUE` tasks here.

The code didn't use to catch any exception here (it was added along with 
the comment in a531fe1). Catching the exception doesn't do any harm, I just 
don't think this code will ever trigger.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20236: [SPARK-23044] Error handling for jira assignment

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20236
  
**[Test build #86197 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86197/testReport)**
 for PR 20236 at commit 
[`626dd47`](https://github.com/apache/spark/commit/626dd47f7d59fca01e9fceb1e6455405ce57a6f5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregation func...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19872
  
**[Test build #86196 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86196/testReport)**
 for PR 19872 at commit 
[`46db380`](https://github.com/apache/spark/commit/46db3802561088d1683681ea24f457c39226bdc5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20280
  
**[Test build #86193 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86193/testReport)**
 for PR 20280 at commit 
[`315b8de`](https://github.com/apache/spark/commit/315b8de0fb3e7277b895b98769e52da7aaae32d6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20280
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86193/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20280
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20281: [SPARK-23089][STS] Recreate session log directory if it ...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20281
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20281: [SPARK-23089][STS] Recreate session log directory if it ...

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20281
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86195/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20281: [SPARK-23089][STS] Recreate session log directory if it ...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20281
  
**[Test build #86195 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86195/testReport)**
 for PR 20281 at commit 
[`7fc1e00`](https://github.com/apache/spark/commit/7fc1e007a0d97ee5751117f73195a42759554e4c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread BryanCutler

Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20280
  
@MrBago @HyukjinKwon I think the above behavior of the `Row` class is a 
little screwy, but at least this fixes it to be more consistent.  I'm not sure 
if there is a way to rectify the two different uses without breaking one way or 
the other.  Also to note, using kwargs the performance will likely be really 
poor because it must find the index for each field and this should maybe be 
discouraged.  cc @holdenk 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread BryanCutler

Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20280
  
After looking into this, it seems like the behavior of the `Row` class is 
as follows:

If a `Row` is made from kwargs, then the order of the fields can not be 
relied upon and whenever accessing data, it must be done like a dict with the 
field name.  When this is the case, the order of the supplied schema doesn't 
matter but the field name must be a subset of what is in each row.

If a `Row` is made from generating a custom class, like `TestRow = 
Row("key", "value")` then `row = TestRow('a', 1)`, then the position of each 
element is what is important and data is accessed by position in the tuple.  
The supplied schema for this must match the types of the rows exactly, however 
field names are not important and can be changed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20281: [SPARK-23089][STS] Recreate session log directory if it ...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20281
  
**[Test build #86195 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86195/testReport)**
 for PR 20281 at commit 
[`7fc1e00`](https://github.com/apache/spark/commit/7fc1e007a0d97ee5751117f73195a42759554e4c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20281: [SPARK-23089][STS] Recreate session log directory...

2018-01-16 Thread mgaido91

GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/20281

[SPARK-23089][STS] Recreate session log directory if it doesn't exist

## What changes were proposed in this pull request?

When creating a session directory, Thrift should create the parent 
directory (i.e. /tmp/base_session_log_dir) if it is not present. It is common 
that many tools delete empty directories, so the directory may be deleted. This 
can cause the session log to be disabled.

This was fixed in HIVE-12262: this PR brings it in Spark too.

## How was this patch tested?

manual tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-23089

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20281.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20281






---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20259: [SPARK-23066][WEB-UI] Master Page increase master...

2018-01-16 Thread CodingCat

Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/20259#discussion_r161866373
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala 
---
@@ -179,6 +181,7 @@ private[deploy] class Master(
 }
 persistenceEngine = persistenceEngine_
 leaderElectionAgent = leaderElectionAgent_
+startupTime = System.currentTimeMillis()
--- End diff --

well, my question is what's the difference for a master which just started 
10 mins ago and 1 hour ago in terms of stability?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19893: [SPARK-16139][TEST] Add logging functionality for...

2018-01-16 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19893


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20236: [SPARK-23044] Error handling for jira assignment

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20236
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20236: [SPARK-23044] Error handling for jira assignment

2018-01-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20236
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86194/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20236: [SPARK-23044] Error handling for jira assignment

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20236
  
**[Test build #86194 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86194/testReport)**
 for PR 20236 at commit 
[`4ab1202`](https://github.com/apache/spark/commit/4ab1202b5cde156df9746c688f15008309f482f9).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20259: [SPARK-23066][WEB-UI] Master Page increase master...

2018-01-16 Thread CodingCat

Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/20259#discussion_r161865952
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala 
---
@@ -179,6 +181,7 @@ private[deploy] class Master(
 }
 persistenceEngine = persistenceEngine_
 leaderElectionAgent = leaderElectionAgent_
+startupTime = System.currentTimeMillis()
--- End diff --

I think @guoxiaolongzte refers to how long has it been since last restart 
of master by "startupTime".


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20280
  
**[Test build #86193 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86193/testReport)**
 for PR 20280 at commit 
[`315b8de`](https://github.com/apache/spark/commit/315b8de0fb3e7277b895b98769e52da7aaae32d6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20236: [SPARK-23044] Error handling for jira assignment

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20236
  
**[Test build #86194 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86194/testReport)**
 for PR 20236 at commit 
[`4ab1202`](https://github.com/apache/spark/commit/4ab1202b5cde156df9746c688f15008309f482f9).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to ...

2018-01-16 Thread BryanCutler

Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/20280#discussion_r161865195
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -2306,18 +2306,20 @@ def test_toDF_with_schema_string(self):
 self.assertEqual(df.schema.simpleString(), 
"struct")
 self.assertEqual(df.collect(), [Row(key=str(i), value=str(i)) for 
i in range(100)])
 
-# field names can differ.
-df = rdd.toDF(" a: int, b: string ")
--- End diff --

This test was flawed because it only worked because ("a", "b") is in the 
same alphabetical order as ("key", "value").  If it was ("key", "aaa") then it 
would fail. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...

2018-01-16 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19893
  
Merging to master.

It would be nice to file a separate bug to eventually look at how to do 
this on the spark-hive module (or maybe it's just not worth the effort).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to ...

2018-01-16 Thread BryanCutler

GitHub user BryanCutler opened a pull request:

https://github.com/apache/spark/pull/20280

[SPARK-22232][PYTHON][SQL] Fixed Row pickling to include __from_dict__ flag

## What changes were proposed in this pull request?

When a `Row` object is created using kwargs, the order of the keywords can 
not be relied upon  (except for Python 3.5 that uses an OrderedDict).  The 
fields are sorted in the constructor and a flag `__from_dict__` is set to 
indicate that this object was created from kwargs so that other areas in Spark 
can access row data using field names instead of by position.  This change 
includes the `__from_dict__` flag only when pickling a Row that was made from 
kwargs so that the behavior is preserved if the Row becomes pickled.

## How was this patch tested?

Fixed existing tests that relied on fields and schema being in the same 
alphabetical order.  Added new test to create `Row` from positional arguments 
where order matters.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BryanCutler/spark 
pyspark-Row-serialize-SPARK-22232

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20280.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20280






---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20279: [SPARK-23092][SQL] Migrate MemoryStream to DataSourceV2 ...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20279
  
**[Test build #86192 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86192/testReport)**
 for PR 20279 at commit 
[`7a0b564`](https://github.com/apache/spark/commit/7a0b564bd0c74525ebcea55b31f9658b1c2f0e12).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20279: [SPARK-23092][SQL] Migrate MemoryStream to DataSourceV2 ...

2018-01-16 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20279
  
**[Test build #86190 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86190/testReport)**
 for PR 20279 at commit 
[`50a541b`](https://github.com/apache/spark/commit/50a541b5890f328a655a7ef1fca4f8480b9a35f0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

< 1 2 3 4 5 6 7 >

301 - 400 of 644 matches

Mail list logo