[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20601
  
**[Test build #87487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87487/testReport)** for PR 20601 at commit [`22179e8`](https://github.com/apache/spark/commit/22179e84f6cf601be18e9b060246c54bd0cede8d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20511
  
**[Test build #87483 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87483/testReport)** for PR 20511 at commit [`b42fd4d`](https://github.com/apache/spark/commit/b42fd4d4584277aaab925e3d6ed1125f474439e7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20623: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

2018-02-15 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/20623
  
@squito The k8s test passed message is misleading :( The test is still 
running. Hope we don't break 2.3 build :)


---




[GitHub] spark pull request #20623: [SPARK-23413][UI] Fix sorting tasks by Host / Exe...

2018-02-15 Thread squito
Github user squito closed the pull request at:

https://github.com/apache/spark/pull/20623


---




[GitHub] spark issue #20620: [SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL ...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20620
  
**[Test build #87494 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87494/testReport)** for PR 20620 at commit [`bd46d1c`](https://github.com/apache/spark/commit/bd46d1cb63e7a04e0236f7b1bf70b46fb55f3ea4).


---




[GitHub] spark issue #20619: [SPARK-23390][SQL] Register task completion listerners f...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20619
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20619: [SPARK-23390][SQL] Register task completion listerners f...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20619
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87482/
Test PASSed.


---




[GitHub] spark issue #20619: [SPARK-23390][SQL] Register task completion listerners f...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20619
  
**[Test build #87482 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87482/testReport)** for PR 20619 at commit [`43f809f`](https://github.com/apache/spark/commit/43f809fd2ff619c901e05bc062ab70aa65371a46).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20057
  
**[Test build #87493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87493/testReport)** for PR 20057 at commit [`6c0d3df`](https://github.com/apache/spark/commit/6c0d3dfd415e5630dbb02ce65c6adf3db419bdec).


---




[GitHub] spark issue #20623: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20623
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/925/
Test PASSed.


---




[GitHub] spark issue #20623: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20623
  
**[Test build #87492 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87492/testReport)** for PR 20623 at commit [`f7a2282`](https://github.com/apache/spark/commit/f7a22827694a3aa92e8a7dd20195e2895e86880a).


---




[GitHub] spark issue #20623: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20623
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-15 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
Tests are failing on a Spark Streaming test. I think that's probably because 
of the age of this PR; I will rebase to pick up the changes that were 
merged into master since I opened it.


---




[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

2018-02-15 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20601
  
ack I merged to master but screwed up on 2.3 -- fixing that here: 
https://github.com/apache/spark/pull/20623


---




[GitHub] spark pull request #20623: [SPARK-23413][UI] Fix sorting tasks by Host / Exe...

2018-02-15 Thread squito
GitHub user squito opened a pull request:

https://github.com/apache/spark/pull/20623

[SPARK-23413][UI] Fix sorting tasks by Host / Executor ID at the Stage page

## What changes were proposed in this pull request?

Fixes the exception thrown when sorting tasks by Host / Executor ID:
```
java.lang.IllegalArgumentException: Invalid sort column: Host
at org.apache.spark.ui.jobs.ApiHelper$.indexName(StagePage.scala:1017)
at org.apache.spark.ui.jobs.TaskDataSource.sliceData(StagePage.scala:694)
at org.apache.spark.ui.PagedDataSource.pageData(PagedTable.scala:61)
at org.apache.spark.ui.PagedTable$class.table(PagedTable.scala:96)
at org.apache.spark.ui.jobs.TaskPagedTable.table(StagePage.scala:708)
at org.apache.spark.ui.jobs.StagePage.liftedTree1$1(StagePage.scala:293)
at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:282)
at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
```

It also refactors the code to avoid similar problems, introducing a 
constant for each header name and reusing those constants when identifying the 
corresponding sort index.
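
The idea behind this refactoring can be sketched as follows. This is only an illustration in Python, not the Scala code in `StagePage.scala`; the constant names, the index names, and the `index_name` helper are hypothetical. Defining each header once and using the same constant both for the rendered table and for the sort-index lookup means a header cannot be renamed in one place only:

```python
# Hypothetical header constants, each defined exactly once.
HEADER_HOST = "Host"
HEADER_EXECUTOR = "Executor ID"
HEADER_DURATION = "Duration"

# Headers rendered in the table, built from the constants.
TABLE_HEADERS = [HEADER_EXECUTOR, HEADER_HOST, HEADER_DURATION]

# Sort-column lookup keyed by the same constants (index names are made up).
COLUMN_TO_INDEX = {
    HEADER_EXECUTOR: "exec_id",
    HEADER_HOST: "host",
    HEADER_DURATION: "duration",
}


def index_name(sort_column):
    """Map a table header to its sort index, rejecting unknown columns."""
    try:
        return COLUMN_TO_INDEX[sort_column]
    except KeyError:
        raise ValueError("Invalid sort column: %s" % sort_column)
```

With this layout, every header that can be rendered also has a sort index, which is the property the original bug violated for Host and Executor ID.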

## How was this patch tested?

Manually:

![screen shot 2018-02-13 at 18 57 10](https://user-images.githubusercontent.com/2017933/36166532-1cfdf3b8-10f3-11e8-8d32-5fcaad2af214.png)

Author: “attilapiros” 

Closes #20601 from attilapiros/SPARK-23413.

(cherry picked from commit 1dc2c1d5e85c5f404f470aeb44c1f3c22786bdea)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/squito/spark fix_backport

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20623.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20623


commit f7a22827694a3aa92e8a7dd20195e2895e86880a
Author: “attilapiros” 
Date:   2018-02-15T19:51:24Z

[SPARK-23413][UI] Fix sorting tasks by Host / Executor ID at the Stage page





---




[GitHub] spark pull request #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source...

2018-02-15 Thread jose-torres
Github user jose-torres commented on a diff in the pull request:

https://github.com/apache/spark/pull/20554#discussion_r168558972
  
--- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala ---
@@ -112,14 +112,18 @@ abstract class KafkaSourceTest extends StreamTest with SharedSQLContext {
 query.nonEmpty,
 "Cannot add data when there is no query for finding the active kafka source")
 
-  val sources = query.get.logicalPlan.collect {
-case StreamingExecutionRelation(source: KafkaSource, _) => source
-  } ++ (query.get.lastExecution match {
-case null => Seq()
-case e => e.logical.collect {
-  case DataSourceV2Relation(_, reader: KafkaContinuousReader) => reader
-}
-  })
+  val sources = {
+query.get.logicalPlan.collect {
+  case StreamingExecutionRelation(source: KafkaSource, _) => source
+  case StreamingExecutionRelation(source: KafkaMicroBatchReader, _) => source
+} ++ (query.get.lastExecution match {
+  case null => Seq()
+  case e => e.logical.collect {
+case DataSourceV2Relation(_, reader: KafkaContinuousReader) => reader
+  }
+})
+  }.distinct
--- End diff --

Is the distinct for the self join test?


---




[GitHub] spark pull request #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source...

2018-02-15 Thread jose-torres
Github user jose-torres commented on a diff in the pull request:

https://github.com/apache/spark/pull/20554#discussion_r168559060
  
--- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala ---
@@ -303,94 +302,75 @@ class KafkaMicroBatchSourceSuite extends KafkaSourceSuiteBase {
 )
   }
 
-  testWithUninterruptibleThread(
--- End diff --

+1


---




[GitHub] spark pull request #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source...

2018-02-15 Thread jose-torres
Github user jose-torres commented on a diff in the pull request:

https://github.com/apache/spark/pull/20554#discussion_r168591005
  
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchReader.scala ---
@@ -0,0 +1,410 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.{util => ju}
+import java.io._
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+
+import org.apache.commons.io.IOUtils
+import org.apache.kafka.common.TopicPartition
+
+import org.apache.spark.SparkEnv
+import org.apache.spark.internal.Logging
+import org.apache.spark.scheduler.ExecutorCacheTaskLocation
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.execution.streaming.{HDFSMetadataLog, SerializedOffset}
+import org.apache.spark.sql.kafka010.KafkaSourceProvider.{INSTRUCTION_FOR_FAIL_ON_DATA_LOSS_FALSE, INSTRUCTION_FOR_FAIL_ON_DATA_LOSS_TRUE}
+import org.apache.spark.sql.sources.v2.DataSourceOptions
+import org.apache.spark.sql.sources.v2.reader.{DataReader, DataReaderFactory}
+import org.apache.spark.sql.sources.v2.reader.streaming.{MicroBatchReader, Offset}
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.util.UninterruptibleThread
+
+/**
+ * A [[MicroBatchReader]] that reads data from Kafka.
+ *
+ * The [[KafkaSourceOffset]] is the custom [[Offset]] defined for this source that contains
+ * a map of TopicPartition -> offset. Note that this offset is 1 + (available offset). For
+ * example if the last record in a Kafka topic "t", partition 2 is offset 5, then
+ * KafkaSourceOffset will contain TopicPartition("t", 2) -> 6. This is done keep it consistent
+ * with the semantics of `KafkaConsumer.position()`.
+ *
+ * Zero data lost is not guaranteed when topics are deleted. If zero data lost is critical, the user
+ * must make sure all messages in a topic have been processed when deleting a topic.
+ *
+ * There is a known issue caused by KAFKA-1894: the query using Kafka maybe cannot be stopped.
+ * To avoid this issue, you should make sure stopping the query before stopping the Kafka brokers
+ * and not use wrong broker addresses.
+ */
+private[kafka010] class KafkaMicroBatchReader(
+kafkaOffsetReader: KafkaOffsetReader,
+executorKafkaParams: ju.Map[String, Object],
+options: DataSourceOptions,
+metadataPath: String,
+startingOffsets: KafkaOffsetRangeLimit,
+failOnDataLoss: Boolean)
+  extends MicroBatchReader with Logging {
+
+  type PartitionOffsetMap = Map[TopicPartition, Long]
+
+  private var startPartitionOffsets: PartitionOffsetMap = _
+  private var endPartitionOffsets: PartitionOffsetMap = _
+
+  private val pollTimeoutMs = options.getLong(
+"kafkaConsumer.pollTimeoutMs",
+SparkEnv.get.conf.getTimeAsMs("spark.network.timeout", "120s"))
+
+  private val maxOffsetsPerTrigger =
+Option(options.get("maxOffsetsPerTrigger").orElse(null)).map(_.toLong)
+
+  /**
+   * Lazily initialize `initialPartitionOffsets` to make sure that `KafkaConsumer.poll` is only
+   * called in StreamExecutionThread. Otherwise, interrupting a thread while running
+   * `KafkaConsumer.poll` may hang forever (KAFKA-1894).
+   */
+  private lazy val initialPartitionOffsets = getOrCreateInitialPartitionOffsets()
+
+  override def setOffsetRange(start: ju.Optional[Offset], end: ju.Optional[Offset]): Unit = {
+// Make sure initialPartitionOffsets is initialized
+initialPartitionOffsets
+
+startPartitionOffsets = Option(start.orElse(null))
+.map(_.asInstanceOf[KafkaSourceOffset].partitionToOffsets)
+.getOrElse(initialPartitionOffsets)
+
+endPartitionOffsets = Option(end.orElse(null))
+.map(_.asInstanceOf[KafkaSourceOffset].partitionToOffsets)
+.g

[GitHub] spark pull request #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source...

2018-02-15 Thread jose-torres
Github user jose-torres commented on a diff in the pull request:

https://github.com/apache/spark/pull/20554#discussion_r168558562
  
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala ---
@@ -306,7 +307,7 @@ private[kafka010] class KafkaSource(
 kafkaReader.close()
   }
 
-  override def toString(): String = s"KafkaSource[$kafkaReader]"
+  override def toString(): String = s"KafkaSourceV1[$kafkaReader]"
--- End diff --

good catch


---




[GitHub] spark pull request #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source...

2018-02-15 Thread jose-torres
Github user jose-torres commented on a diff in the pull request:

https://github.com/apache/spark/pull/20554#discussion_r167127585
  
--- Diff: external/kafka-0-10-sql/src/test/resources/kafka-source-initial-offset-version-2.1.0.bin ---
@@ -1 +1 @@
-2{"kafka-initial-offset-2-1-0":{"2":0,"1":0,"0":0}}
\ No newline at end of file
+2{"kafka-initial-offset-2-1-0":{"2":2,"1":1,"0":0}}
--- End diff --

Why does this need to be modified? The point of this file IIUC is to ensure 
that compatibility is maintained with offsets logged in old versions, so I 
worry something's wrong if we need to update it.


---




[GitHub] spark pull request #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Exe...

2018-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20601


---




[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20622
  
**[Test build #87491 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87491/testReport)** for PR 20622 at commit [`3d8acd2`](https://github.com/apache/spark/commit/3d8acd2974d11a790ab9cd9338673bba18d683ac).


---




[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

2018-02-15 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20601
  
Everything that might have been affected by this change has passed; the failures are 
known flaky tests:

https://issues.apache.org/jira/browse/SPARK-23369

https://issues.apache.org/jira/browse/SPARK-23390

merging to master / 2.3


---




[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...

2018-02-15 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/20622
  
retest this please


---




[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...

2018-02-15 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/20622
  
StreamingOuterJoinSuite failure is a known flakiness issue.


---




[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20622
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87486/
Test FAILed.


---




[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20622
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20622
  
**[Test build #87486 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87486/testReport)** for PR 20622 at commit [`3d8acd2`](https://github.com/apache/spark/commit/3d8acd2974d11a790ab9cd9338673bba18d683ac).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20057
  
**[Test build #87484 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87484/testReport)** for PR 20057 at commit [`3a7dda4`](https://github.com/apache/spark/commit/3a7dda4a0df8ef684d8fb803a98434c170953f4c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20620: [SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL ...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20620
  
**[Test build #87489 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87489/testReport)** for PR 20620 at commit [`152fec4`](https://github.com/apache/spark/commit/152fec431218161e538c377a6cb82753100dc70b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20620: [SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL ...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20620
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87489/
Test FAILed.


---




[GitHub] spark issue #20620: [SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL ...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20620
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20057
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20057
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87484/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20601
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20601
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87481/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20601
  
**[Test build #87481 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87481/testReport)** for PR 20601 at commit [`22179e8`](https://github.com/apache/spark/commit/22179e84f6cf601be18e9b060246c54bd0cede8d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-15 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/20594
  
Success!  Merged to branch-2.3 too.


---




[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20295
  
**[Test build #87490 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87490/testReport)** for PR 20295 at commit [`9ed3779`](https://github.com/apache/spark/commit/9ed3779b665c90e5bb25bc6636997a4b080c3d34).


---




[GitHub] spark pull request #20567: [SPARK-23380][PYTHON] Make toPandas fallback to n...

2018-02-15 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20567#discussion_r168579785
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1941,12 +1941,24 @@ def toPandas(self):
 timezone = None
 
 if self.sql_ctx.getConf("spark.sql.execution.arrow.enabled", "false").lower() == "true":
+should_fallback = False
 try:
-from pyspark.sql.types import _check_dataframe_convert_date, \
-_check_dataframe_localize_timestamps
+from pyspark.sql.types import to_arrow_schema
 from pyspark.sql.utils import require_minimum_pyarrow_version
-import pyarrow
 require_minimum_pyarrow_version()
+# Check if its schema is convertible in Arrow format.
+to_arrow_schema(self.schema)
+except Exception as e:
--- End diff --

Do we want to catch more specific exceptions here? i.e. `TypeError` and 
`ImportError`?
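
A minimal sketch of what the narrower handling could look like, assuming `ImportError` covers a missing or too-old pyarrow and `TypeError` covers a schema that cannot be converted. The helper `check_arrow_usable` and its two callable arguments are hypothetical stand-ins for the checks in the diff, not PySpark API:

```python
import warnings


def check_arrow_usable(require_version, convert_schema):
    """Return True if the Arrow path can be used, False to fall back.

    Both arguments are hypothetical stand-ins: `require_version` is assumed
    to raise ImportError, `convert_schema` is assumed to raise TypeError.
    """
    try:
        require_version()
        convert_schema()
    except (ImportError, TypeError) as e:
        # Only the two anticipated failure modes trigger the fallback;
        # any other exception propagates to the caller.
        warnings.warn("Arrow optimization disabled: %s" % e)
        return False
    return True
```

Compared with `except Exception`, an unexpected error (for example a bug in the schema conversion itself) would surface instead of silently switching to the non-Arrow path.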


---




[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20295
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20295
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/924/
Test PASSed.


---




[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-02-15 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/20295
  
retest this please


---




[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...

2018-02-15 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20622
  
Thanks for pinging me, @jose-torres! Unfortunately, I don't yet know the 
structured streaming codebase well enough to give feedback. Thanks anyway for 
looking at it!


---




[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...

2018-02-15 Thread misutoth
Github user misutoth commented on a diff in the pull request:

https://github.com/apache/spark/pull/20618#discussion_r168578156
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1313,131 +1313,178 @@ object functions {
   
//
 
   /**
-   * Computes the cosine inverse of the given value; the returned angle is in the range
-   * 0.0 through pi.
+   * @param e the value whose arc cosine is to be returned
+   * @return  cosine inverse of the given value in the range of 0.0 through pi,
--- End diff --

I am not sure what you mean by _above_. Do you mean reverting this part of 
the change?

How about simply `@return the angle whose cosine is 'e'` and referring to 
`java.lang.Math` for further details?
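
For reference, a small illustration of the contract being documented: the arc cosine is the angle whose cosine is `e`, and it lies in the range 0.0 through pi. The sketch uses Python's `math.acos` as a stand-in for `java.lang.Math.acos`, on the assumption that both follow the usual definition.

```python
import math

# acos returns the angle (in radians) whose cosine is the argument,
# always within the closed range [0, pi].
for e in (1.0, 0.0, -1.0):
    angle = math.acos(e)
    assert 0.0 <= angle <= math.pi
    print(e, angle)
```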


---




[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

2018-02-15 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/20601
  
LGTM


---




[GitHub] spark issue #20615: [SPARK-23430][WebUI]ApiHelper.COLUMN_TO_INDEX should mat...

2018-02-15 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/20615
  
Great! Closing my PR.


---




[GitHub] spark pull request #20615: [SPARK-23430][WebUI]ApiHelper.COLUMN_TO_INDEX sho...

2018-02-15 Thread zsxwing
Github user zsxwing closed the pull request at:

https://github.com/apache/spark/pull/20615


---




[GitHub] spark issue #20424: [Spark-23240][python] Better error message when extraneo...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20424
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87479/
Test PASSed.


---




[GitHub] spark issue #20424: [Spark-23240][python] Better error message when extraneo...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20424
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20424: [Spark-23240][python] Better error message when extraneo...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20424
  
**[Test build #87479 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87479/testReport)**
 for PR 20424 at commit 
[`eceb24e`](https://github.com/apache/spark/commit/eceb24e61798f9e5da0ed3c4dfb94d677d08b10e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20620: [SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL ...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20620
  
**[Test build #87489 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87489/testReport)**
 for PR 20620 at commit 
[`152fec4`](https://github.com/apache/spark/commit/152fec431218161e538c377a6cb82753100dc70b).


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20621
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/923/
Test PASSed.


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20621
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20621
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/922/
Test PASSed.


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20621
  
**[Test build #87488 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87488/testReport)**
 for PR 20621 at commit 
[`6b56408`](https://github.com/apache/spark/commit/6b5640833a2d45986a0cf6074d7211a8ba9d2b3e).


---




[GitHub] spark issue #20620: [SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL ...

2018-02-15 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20620
  
ok to test


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20621
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20612: [SPARK-23424][SQL]Add codegenStageId in comment

2018-02-15 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20612
  
LGTM. Let's wait for one more LGTM from @gatorsmile / @cloud-fan .


---




[GitHub] spark issue #18672: [kafka010] Fix: On first run DirectKafkaInputDStream was...

2018-02-15 Thread oliviertoupin
Github user oliviertoupin commented on the issue:

https://github.com/apache/spark/pull/18672
  
My PR predates #19431, but essentially fixes the same issue. Since #19431 has 
more traction and is more thorough, I'll close this one and comment on the new 
one if necessary.


---




[GitHub] spark pull request #18672: [kafka010] Fix: On first run DirectKafkaInputDStr...

2018-02-15 Thread oliviertoupin
Github user oliviertoupin closed the pull request at:

https://github.com/apache/spark/pull/18672


---




[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20295
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20295
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87480/
Test FAILed.


---




[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20295
  
**[Test build #87480 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87480/testReport)**
 for PR 20295 at commit 
[`9ed3779`](https://github.com/apache/spark/commit/9ed3779b665c90e5bb25bc6636997a4b080c3d34).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-15 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/20594
  
Well, I succeeded in merging this with master, but the merge script isn't 
working for branch-2.3. I'll wait to see if the read-only repo syncs and fixes 
the issue.


---




[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20601
  
**[Test build #87487 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87487/testReport)**
 for PR 20601 at commit 
[`22179e8`](https://github.com/apache/spark/commit/22179e84f6cf601be18e9b060246c54bd0cede8d).


---




[GitHub] spark pull request #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

2018-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20594


---




[GitHub] spark issue #20617: [MINOR][SQL] Fix an error message about inserting into b...

2018-02-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20617
  
Thank you as always, @gatorsmile, @HyukjinKwon, and @mgaido91.


---




[GitHub] spark pull request #20619: [SPARK-23390][SQL] Register task completion liste...

2018-02-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20619#discussion_r168557092
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
@@ -414,16 +417,16 @@ class ParquetFileFormat
 } else {
   new ParquetRecordReader[UnsafeRow](new ParquetReadSupport(convertTz))
 }
+val recordReaderIterator = new RecordReaderIterator(reader)
+// Register a task completion lister before `initalization`.
+taskContext.foreach(_.addTaskCompletionListener(_ => recordReaderIterator.close()))
 reader.initialize(split, hadoopAttemptContext)
-reader
+recordReaderIterator
   }
 
-  val iter = new RecordReaderIterator(parquetReader)
-  taskContext.foreach(_.addTaskCompletionListener(_ => iter.close()))
--- End diff --

According to the reported leakage, this is too late.
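
A minimal sketch of the ordering concern, written in Python rather than Scala so it can be run standalone. `Reader`, `TaskContext`, `open_reader`, and `run_task` are hypothetical stand-ins, not Spark classes; the point is only that the cleanup callback is registered before the step that may fail.

```python
class Reader:
    """Hypothetical reader whose initialize() may fail."""

    def __init__(self, fail_on_init=False):
        self.fail_on_init = fail_on_init
        self.closed = False

    def initialize(self):
        if self.fail_on_init:
            raise IOError("initialization failed")

    def close(self):
        self.closed = True


class TaskContext:
    """Hypothetical stand-in that runs listeners when the task ends."""

    def __init__(self):
        self._listeners = []

    def add_completion_listener(self, fn):
        self._listeners.append(fn)

    def mark_completed(self):
        for fn in self._listeners:
            fn()


def open_reader(ctx, reader):
    # Register the cleanup first, so it is in place even if initialize() raises.
    ctx.add_completion_listener(reader.close)
    reader.initialize()
    return reader


def run_task(reader):
    ctx = TaskContext()
    try:
        open_reader(ctx, reader)
    except IOError:
        pass  # the task fails, but completion listeners still run below
    finally:
        ctx.mark_completed()
    return reader.closed
```

If the listener were registered only after `initialize()` returned, a failure during initialization would leave the reader without any registered cleanup, which matches the leak described in the comment.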


---




[GitHub] spark issue #20620: [SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL ...

2018-02-15 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/20620
  
@jose-torres @tdas @zsxwing could you take a look at this please?


---




[GitHub] spark pull request #20619: [SPARK-23390][SQL] Register task completion liste...

2018-02-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20619#discussion_r168556797
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
@@ -414,16 +417,16 @@ class ParquetFileFormat
 } else {
   new ParquetRecordReader[UnsafeRow](new ParquetReadSupport(convertTz))
 }
+val recordReaderIterator = new RecordReaderIterator(reader)
+// Register a task completion lister before `initalization`.
+taskContext.foreach(_.addTaskCompletionListener(_ => recordReaderIterator.close()))
 reader.initialize(split, hadoopAttemptContext)
-reader
+recordReaderIterator
   }
 
-  val iter = new RecordReaderIterator(parquetReader)
-  taskContext.foreach(_.addTaskCompletionListener(_ => iter.close()))
 
   // UnsafeRowParquetRecordReader appends the columns internally to avoid another copy.
-  if (parquetReader.isInstanceOf[VectorizedParquetRecordReader] &&
-  enableVectorizedReader) {
+  if (enableVectorizedReader) {
--- End diff --

Yep. It looks possible. I'll update together after getting more reviews. 
Thanks, @kiszk .


---




[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

2018-02-15 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20601
  
ah, flaky tests. retest this please


---




[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20601
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87478/
Test FAILed.


---




[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20601
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20601
  
**[Test build #87478 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87478/testReport)**
 for PR 20601 at commit 
[`22179e8`](https://github.com/apache/spark/commit/22179e84f6cf601be18e9b060246c54bd0cede8d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18672: [kafka010] Fix: On first run DirectKafkaInputDStream was...

2018-02-15 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/18672
  
Looks like a shortcut to me as well.
There is already a PR for backpressure:
https://github.com/apache/spark/pull/19431

Could you explain what exactly you would like to achieve here?



---




[GitHub] spark issue #20396: [SPARK-23217][ML] Add cosine distance measure to Cluster...

2018-02-15 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/20396
  
I think you can just add a follow-up


---




[GitHub] spark issue #20619: [SPARK-23390][SQL] Register task completion listerners f...

2018-02-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20619
  
cc @ala @michal-databricks @cloud-fan 


---




[GitHub] spark pull request #20617: [MINOR][SQL] Fix an error message about inserting...

2018-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20617


---




[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-15 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/20594
  
Merging with master and branch-2.3


---




[GitHub] spark issue #20396: [SPARK-23217][ML] Add cosine distance measure to Cluster...

2018-02-15 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20396
  
@srowen sorry, this also needs to add the parameter to the Python API. 
Given our discussion on JIRA, what should I do? Create a follow-up PR or a new 
ticket on JIRA?
Thanks.


---




[GitHub] spark issue #20617: [MINOR][SQL] Fix an error message about inserting into b...

2018-02-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20617
  
Thanks! Merged to master/2.3


---




[GitHub] spark pull request #20619: [SPARK-23390][SQL] Register task completion liste...

2018-02-15 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20619#discussion_r168552292
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
@@ -414,16 +417,16 @@ class ParquetFileFormat
 } else {
   new ParquetRecordReader[UnsafeRow](new ParquetReadSupport(convertTz))
 }
+val recordReaderIterator = new RecordReaderIterator(reader)
+// Register a task completion lister before `initalization`.
+taskContext.foreach(_.addTaskCompletionListener(_ => recordReaderIterator.close()))
 reader.initialize(split, hadoopAttemptContext)
-reader
+recordReaderIterator
   }
 
-  val iter = new RecordReaderIterator(parquetReader)
-  taskContext.foreach(_.addTaskCompletionListener(_ => iter.close()))
 
   // UnsafeRowParquetRecordReader appends the columns internally to avoid another copy.
-  if (parquetReader.isInstanceOf[VectorizedParquetRecordReader] &&
-  enableVectorizedReader) {
+  if (enableVectorizedReader) {
--- End diff --

Would it be possible to merge this if-statement into the above if-statement?


---




[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20622
  
**[Test build #87486 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87486/testReport)**
 for PR 20622 at commit 
[`3d8acd2`](https://github.com/apache/spark/commit/3d8acd2974d11a790ab9cd9338673bba18d683ac).


---




[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...

2018-02-15 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/20622
  
cc @mgaido91 - this should completely resolve the other symptom you posted 
in SPARK-23416

cc @zsxwing @tdas 


---




[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20622
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20622: [SPARK-23441][SS] Remove queryExecutionThread.int...

2018-02-15 Thread jose-torres
GitHub user jose-torres opened a pull request:

https://github.com/apache/spark/pull/20622

[SPARK-23441][SS] Remove queryExecutionThread.interrupt() from 
ContinuousExecution

## What changes were proposed in this pull request?

Remove queryExecutionThread.interrupt() from ContinuousExecution. As 
detailed in the JIRA, interrupting the thread is only relevant in the 
microbatch case; for continuous processing the query execution can quickly 
clean itself up without it.

## How was this patch tested?

existing tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jose-torres/spark SPARK-23441

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20622.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20622


commit 3d8acd2974d11a790ab9cd9338673bba18d683ac
Author: Jose Torres 
Date:   2018-02-15T17:27:09Z

remove interrupt




---




[GitHub] spark issue #20617: [MINOR][SQL] Fix an error message about inserting into b...

2018-02-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20617
  
Thank you for the review, @mgaido91.


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20057
  
@danielvdende @Fokko We definitely want to help the community replace Sqoop 
with Spark SQL. However, `truncate` is only used when users use 
`SaveMode.Overwrite` to write the external JDBC tables. In this 
specific scenario, Spark will truncate an existing table instead of dropping 
and recreating it. 
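
To make the distinction concrete, here is a deliberately simplified model of the two overwrite strategies. `overwrite_statements` is a hypothetical helper, not Spark code, and the SQL strings are illustrative only; the real writer takes dialect-specific behavior into account.

```python
def overwrite_statements(table, columns_ddl, truncate=False):
    """Hypothetical model of overwriting an existing JDBC table.

    With truncate=True the table definition is kept and only rows are
    removed; otherwise the table is dropped and recreated.
    """
    if truncate:
        return ["TRUNCATE TABLE %s" % table]
    return [
        "DROP TABLE %s" % table,
        "CREATE TABLE %s (%s)" % (table, columns_ddl),
    ]


print(overwrite_statements("events", "id INT, ts TIMESTAMP", truncate=True))
print(overwrite_statements("events", "id INT, ts TIMESTAMP"))
```

In PySpark this roughly corresponds to writing with `mode("overwrite")` and the JDBC `truncate` option set to `"true"`. Truncating keeps the existing table definition in place, whereas dropping and recreating does not.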

Could you show me the key missing features that are available in Sqoop but 
not in Spark SQL JDBC connectors?


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87485/
Test FAILed.


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20621
  
**[Test build #87485 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87485/testReport)**
 for PR 20621 at commit 
[`2f05ab8`](https://github.com/apache/spark/commit/2f05ab8e82b0940e84cbe407abe49f72cddeef11).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20621
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20621
  
**[Test build #87485 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87485/testReport)**
 for PR 20621 at commit 
[`2f05ab8`](https://github.com/apache/spark/commit/2f05ab8e82b0940e84cbe407abe49f72cddeef11).


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20621
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/921/
Test PASSed.


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20621
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20617: [MINOR][SQL] Fix an error message about inserting into b...

2018-02-15 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20617
  
LGTM


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-15 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20621
  
cc @cloud-fan @HyukjinKwon @viirya 


---




[GitHub] spark pull request #20621: [SPARK-23436][SQL] Infer partition as Date only i...

2018-02-15 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/20621

[SPARK-23436][SQL] Infer partition as Date only if it can be casted to Date

## What changes were proposed in this pull request?

Before the patch, Spark could infer as Date a partition value which cannot 
be cast to Date (this can happen when there are extra characters after a 
valid date, like `2018-02-15AAA`).

When this happens and the input format has metadata which defines the schema 
of the table, `null` is returned as the value of the partition column, 
because the `cast` operator used in 
`PartitioningAwareFileIndex.inferPartitioning` is unable to convert the value.

With this PR, partition inference checks that a value can be cast to Date 
or Timestamp before inferring that data type for it.
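
The idea can be sketched outside Spark as follows. This is an illustration only: `infer_partition_type` is a hypothetical helper, and Python's `datetime.strptime` stands in for Spark's `cast`, so the exact set of accepted strings may differ.

```python
from datetime import datetime


def infer_partition_type(raw):
    """Return "date" only if the whole string parses as a date, else "string"."""
    try:
        # strptime rejects trailing characters, so "2018-02-15AAA" fails here.
        datetime.strptime(raw, "%Y-%m-%d")
    except ValueError:
        return "string"
    return "date"


print(infer_partition_type("2018-02-15"))
print(infer_partition_type("2018-02-15AAA"))
```

Falling back to a string type for values that do not convert keeps the original partition value instead of turning it into `null`.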

## How was this patch tested?

added UT


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-23436

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20621.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20621


commit 2f05ab8e82b0940e84cbe407abe49f72cddeef11
Author: Marco Gaido 
Date:   2018-02-15T16:59:20Z

[SPARK-23436][SQL] Infer partition as Date only if it can be casted to Date




---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20057
  
**[Test build #87484 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87484/testReport)**
 for PR 20057 at commit 
[`3a7dda4`](https://github.com/apache/spark/commit/3a7dda4a0df8ef684d8fb803a98434c170953f4c).


---




[GitHub] spark issue #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20511
  
**[Test build #87483 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87483/testReport)**
 for PR 20511 at commit 
[`b42fd4d`](https://github.com/apache/spark/commit/b42fd4d4584277aaab925e3d6ed1125f474439e7).


---



