[GitHub] [spark] SparkQA commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-17 Thread GitBox


SparkQA commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-660433006


   **[Test build #126096 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126096/testReport)**
 for PR 29014 at commit 
[`4a18813`](https://github.com/apache/spark/commit/4a188134a09b0ca9f6d8ee4c758ecdde7b237651).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] agrawaldevesh commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-17 Thread GitBox


agrawaldevesh commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-660432463


   @holdenk It would be great if you could review this PR. It stems from your 
suggestion in https://github.com/apache/spark/pull/29014#issuecomment-654420917 
to plumb an "isWorkerLost" flag into the executor decommission message. It does 
not introduce any other semantic changes.
   
   I am keeping this as a separate PR instead of folding it into 
https://github.com/apache/spark/pull/29014 so that the latter stays easy to 
revert. This PR (#29032) should not introduce any semantic changes and should 
therefore be less risky.
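
   For readers of the archive, a minimal sketch of what "plumbing a flag" into a 
message can look like. Every name here (`DecommissionNotice`, `describe`) is 
hypothetical and chosen only for illustration; this is not the PR's actual API, 
and what a receiver does with the flag is outside the sketch.

```scala
// Hypothetical message type: the executor decommission notice carries an
// extra boolean saying whether the worker is going away as well.
final case class DecommissionNotice(executorId: String, isWorkerLost: Boolean)

object DecommissionSketch {
  // The receiver can now branch on the flag instead of inferring it.
  def describe(notice: DecommissionNotice): String =
    if (notice.isWorkerLost) {
      s"executor ${notice.executorId} decommissioned (worker also lost)"
    } else {
      s"executor ${notice.executorId} decommissioned (worker stays)"
    }

  def main(args: Array[String]): Unit = {
    println(describe(DecommissionNotice("exec-1", isWorkerLost = true)))
    println(describe(DecommissionNotice("exec-2", isWorkerLost = false)))
  }
}
```

   Because the flag only travels with the message, adding it does not by itself 
change behaviour, which matches the "no semantic changes" intent stated above.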






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-660432238










[GitHub] [spark] AmplabJenkins commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-660432238










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-660431827


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126090/
   Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


SparkQA removed a comment on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-660420607


   **[Test build #126090 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126090/testReport)**
 for PR 29085 at commit 
[`5c049b5`](https://github.com/apache/spark/commit/5c049b50e81bac45634b9581efff2bc0cc0917b2).






[GitHub] [spark] AmplabJenkins commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-660431825










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-660431825


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


SparkQA commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-660431781


   **[Test build #126090 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126090/testReport)**
 for PR 29085 at commit 
[`5c049b5`](https://github.com/apache/spark/commit/5c049b50e81bac45634b9581efff2bc0cc0917b2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29079:
URL: https://github.com/apache/spark/pull/29079#issuecomment-660429662










[GitHub] [spark] AmplabJenkins commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #29079:
URL: https://github.com/apache/spark/pull/29079#issuecomment-660429662










[GitHub] [spark] SparkQA commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-17 Thread GitBox


SparkQA commented on pull request #29079:
URL: https://github.com/apache/spark/pull/29079#issuecomment-660429520


   **[Test build #126095 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126095/testReport)**
 for PR 29079 at commit 
[`d620940`](https://github.com/apache/spark/commit/d6209407731bbed2602c1d6a05c7c50982561faf).






[GitHub] [spark] viirya commented on pull request #29133: [SPARK-32253][INFRA] Show errors only for the sbt tests of github actions

2020-07-17 Thread GitBox


viirya commented on pull request #29133:
URL: https://github.com/apache/spark/pull/29133#issuecomment-660429568


   I saw the screenshot taken after this change. Can we also add a screenshot 
taken before the change?






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


AngersZhuuuu commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r456751726



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
##
@@ -56,10 +65,85 @@ trait BaseScriptTransformationExec extends UnaryExecNode {
     }
   }
 
-  def processIterator(
+  protected def initProc: (OutputStream, Process, InputStream, CircularBuffer) = {
+    val cmd = List("/bin/bash", "-c", script)
+    val builder = new ProcessBuilder(cmd.asJava)
+
+    val proc = builder.start()
+    val inputStream = proc.getInputStream
+    val outputStream = proc.getOutputStream
+    val errorStream = proc.getErrorStream
+
+    // In order to avoid deadlocks, we need to consume the error output of the child process.
+    // To avoid issues caused by large error output, we use a circular buffer to limit the amount
+    // of error output that we retain. See SPARK-7862 for more discussion of the deadlock / hang
+    // that motivates this.
+    val stderrBuffer = new CircularBuffer(2048)
+    new RedirectThread(
+      errorStream,
+      stderrBuffer,
+      s"Thread-${this.getClass.getSimpleName}-STDERR-Consumer").start()
+    (outputStream, proc, inputStream, stderrBuffer)
+  }
+
+  protected def processIterator(
       inputIterator: Iterator[InternalRow],
       hadoopConf: Configuration): Iterator[InternalRow]
 
+  protected def createOutputIteratorWithoutSerde(
+      writerThread: BaseScriptTransformationWriterThread,
+      inputStream: InputStream,
+      proc: Process,
+      stderrBuffer: CircularBuffer): Iterator[InternalRow] = {
+    new Iterator[InternalRow] {
+      var curLine: String = null
+      val reader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8))
+
+      val processRowWithoutSerde = if (!ioschema.schemaLess) {
+        prevLine: String =>
+          new GenericInternalRow(
+            prevLine.split(ioschema.outputRowFormatMap("TOK_TABLEROWFORMATFIELD"))
+              .zip(fieldWriters)
+              .map { case (data, writer) => writer(data) })
+      } else {
+        prevLine: String =>
+          new GenericInternalRow(
+            prevLine.split(ioschema.outputRowFormatMap("TOK_TABLEROWFORMATFIELD"), 2)
+              .map(CatalystTypeConverters.convertToCatalyst))
+      }
+

Review comment:
   @maropu I changed this to support schema-less mode.
   
   In the test case I chose not to use SQL, since the Hive SerDe does not 
handle schema-less mode well in the way Spark expects.
   ```
   [info] - SPARK-25990: TRANSFORM should handle schema less correctly *** 
FAILED *** (360 milliseconds)
   [info]   Results do not match for Spark plan:
   [info]HiveScriptTransformation [a#86, b#87, c#88, d#89, e#90], python 
/Users/angerszhu/Documents/project/AngersZhu/spark/sql/core/target/test-classes/test_script.py,
 [key#96, value#97], 
ScriptTransformationIOSchema(List(),List(),Some(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe),Some(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe),List((field.delim,
)),List((field.delim,   
)),Some(org.apache.hadoop.hive.ql.exec.TextRecordReader),Some(org.apache.hadoop.hive.ql.exec.TextRecordWriter),true)
   [info]   +- Project [_1#75 AS a#86, _2#76 AS b#87, _3#77 AS c#88, _4#78 AS 
d#89, _5#79 AS e#90]
   [info]  +- LocalTableScan [_1#75, _2#76, _3#77, _4#78, _5#79]
   [info]
   [info]
   [info]== Results ==
   [info]!== Expected Answer - 3 ==== 
Actual Answer - 3 ==
   [info]   ![1,1   1.0 1.001969-12-31 
16:00:00.001]   [1,1]
   [info]   ![2,2   2.0 2.001969-12-31 
16:00:00.002]   [2,2]
   [info]   ![3,3   3.0 3.001969-12-31 
16:00:00.003]   [3,3] (SparkPlanTest.scala:96)
   [info]   org.scalatest.exceptions.TestFailedException:
   [info]   at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
   [info]   at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
   [info]   at 
org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
   [i
   ```
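
   To illustrate the difference between the two branches in the diff above, 
here is a minimal sketch. It assumes a tab delimiter, and the helper names 
(`splitWithSchema`, `splitSchemaLess`) are hypothetical and not part of the PR. 
Without a limit, `String.split` yields one field per column; with a limit of 2, 
as in the schema-less branch, everything after the first delimiter stays in the 
second field.

```scala
object SchemaLessSplitSketch {
  // Schema mode: one output field per delimited column.
  def splitWithSchema(line: String, delim: String): Array[String] =
    line.split(delim)

  // Schema-less mode: at most two fields, a key and the rest of the line.
  def splitSchemaLess(line: String, delim: String): Array[String] =
    line.split(delim, 2)

  def main(args: Array[String]): Unit = {
    val line = "1\t1\t1.0"
    // Three separate fields.
    println(splitWithSchema(line, "\t").mkString("[", " | ", "]"))
    // Two fields; the second still contains a tab.
    println(splitSchemaLess(line, "\t").mkString("[", " | ", "]"))
  }
}
```

   Note that `String.split` treats its first argument as a regular expression, 
so a delimiter containing regex metacharacters would need quoting in a sketch 
like this.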








[GitHub] [spark] c21 commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-17 Thread GitBox


c21 commented on pull request #29079:
URL: https://github.com/apache/spark/pull/29079#issuecomment-660428967


   retest this please






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


AngersZhuuuu commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r456751690



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SparkScriptTransformationSuite.scala
##
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.sql.{Date, Timestamp}
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.functions.struct
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.CalendarInterval
+
+class SparkScriptTransformationSuite extends BaseScriptTransformationSuite {
+
+  import spark.implicits._
+
+  override def scriptType: String = "SPARK"
+
+  test("SPARK-32106: SparkScriptTransformExec should handle different data types correctly") {

Review comment:
   > This test should be placed in `BaseScriptTransformationSuite`?
   
   Yea, moved








[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


AngersZhuuuu commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r456751677



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationExec.scala
##
@@ -172,19 +237,17 @@ case class HiveScriptTransformationExec(
         if (!hasNext) {
           throw new NoSuchElementException
         }
-        if (outputSerde == null) {
+        nextRow()
+      }
+
+      val nextRow: () => InternalRow = if (outputSerde == null) {

Review comment:
   > hm... could we write it like this here? 
[maropu@f3e05c6](https://github.com/maropu/spark/commit/f3e05c6e1ea1e195ff2cbc9e3aa70c45cf9cc79f)
   
   Changed
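
   As a rough illustration of the pattern in the hunk above (choosing the 
row-producing function once instead of branching on every `next()` call), here 
is a minimal sketch. `RowSource`, `useSerde`, and the string rows are 
hypothetical stand-ins, not the PR's types.

```scala
object NextRowSketch {
  // Hypothetical iterator: the row-producing function is picked once at
  // construction time, so next() does not re-check the mode per row.
  final class RowSource(lines: Iterator[String], useSerde: Boolean)
      extends Iterator[String] {

    private val nextRow: () => String =
      if (!useSerde) {
        () => lines.next().toUpperCase // stand-in for the no-serde path
      } else {
        () => "serde:" + lines.next() // stand-in for the serde path
      }

    override def hasNext: Boolean = lines.hasNext

    override def next(): String = {
      if (!hasNext) {
        throw new NoSuchElementException
      }
      nextRow()
    }
  }

  def main(args: Array[String]): Unit = {
    println(new RowSource(Iterator("a", "b"), useSerde = false).toList)
    println(new RowSource(Iterator("a", "b"), useSerde = true).toList)
  }
}
```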








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-660428786










[GitHub] [spark] AmplabJenkins commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-660428786










[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


AngersZhuuuu commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r456751646



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala
##
@@ -0,0 +1,296 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import java.sql.{Date, Timestamp}
+
+import org.scalatest.Assertions._
+import org.scalatest.BeforeAndAfterEach
+import org.scalatest.exceptions.TestFailedException
+
+import org.apache.spark.{SparkException, TaskContext, TestUtils}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.Column
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, Expression}
+import org.apache.spark.sql.catalyst.plans.physical.Partitioning
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.CalendarInterval
+
+abstract class BaseScriptTransformationSuite extends SparkPlanTest with SQLTestUtils
+  with BeforeAndAfterEach {
+  import testImplicits._
+  import ScriptTransformationIOSchema._
+
+  protected val uncaughtExceptionHandler = new TestUncaughtExceptionHandler
+
+  private var defaultUncaughtExceptionHandler: Thread.UncaughtExceptionHandler = _
+
+  protected override def beforeAll(): Unit = {
+    super.beforeAll()
+    defaultUncaughtExceptionHandler = Thread.getDefaultUncaughtExceptionHandler
+    Thread.setDefaultUncaughtExceptionHandler(uncaughtExceptionHandler)
+  }
+
+  protected override def afterAll(): Unit = {
+    super.afterAll()
+    Thread.setDefaultUncaughtExceptionHandler(defaultUncaughtExceptionHandler)
+  }
+
+  override protected def afterEach(): Unit = {
+    super.afterEach()
+    uncaughtExceptionHandler.cleanStatus()
+  }
+
+  def isHive23OrSpark: Boolean
+
+  def createScriptTransformationExec(
+      input: Seq[Expression],
+      script: String,
+      output: Seq[Attribute],
+      child: SparkPlan,
+      ioschema: ScriptTransformationIOSchema): BaseScriptTransformationExec
+
+  test("cat without SerDe") {
+    assume(TestUtils.testCommandAvailable("/bin/bash"))
+
+    val rowsDf = Seq("a", "b", "c").map(Tuple1.apply).toDF("a")
+    checkAnswer(
+      rowsDf,
+      (child: SparkPlan) => createScriptTransformationExec(
+        input = Seq(rowsDf.col("a").expr),
+        script = "cat",
+        output = Seq(AttributeReference("a", StringType)()),
+        child = child,
+        ioschema = defaultIOSchema
+      ),
+      rowsDf.collect())
+    assert(uncaughtExceptionHandler.exception.isEmpty)
+  }
+
+  test("script transformation should not swallow errors from upstream operators (no serde)") {
+    assume(TestUtils.testCommandAvailable("/bin/bash"))
+
+    val rowsDf = Seq("a", "b", "c").map(Tuple1.apply).toDF("a")
+    val e = intercept[TestFailedException] {
+      checkAnswer(
+        rowsDf,
+        (child: SparkPlan) => createScriptTransformationExec(
+          input = Seq(rowsDf.col("a").expr),
+          script = "cat",
+          output = Seq(AttributeReference("a", StringType)()),
+          child = ExceptionInjectingOperator(child),
+          ioschema = defaultIOSchema
+        ),
+        rowsDf.collect())
+    }
+    assert(e.getMessage().contains("intentional exception"))
+    // Before SPARK-25158, uncaughtExceptionHandler will catch IllegalArgumentException
+    assert(uncaughtExceptionHandler.exception.isEmpty)
+  }
+
+  test("SPARK-25990: TRANSFORM should handle different data types correctly") {
+    assume(TestUtils.testCommandAvailable("python"))
+    val scriptFilePath = getTestResourcePath("test_script.py")
+
+    withTempView("v") {
+      val df = Seq(
+        (1, "1", 1.0, BigDecimal(1.0), new Timestamp(1)),
+        (2, "2", 2.0, BigDecimal(2.0), new Timestamp(2)),
+        (3, "3", 3.0, BigDecimal(3.0), new Timestamp(3))
+      ).toDF("a", "b", "c", "d", "e") // Note column d's data type is Decimal(38, 18)
+      df.createTempView("v")
+
+      val query = sql(
+        s"""
+           |SELECT
+           |TRANSFORM(a, b, c, d, e)
+   

[GitHub] [spark] SparkQA commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


SparkQA commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-660428628


   **[Test build #126094 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126094/testReport)**
 for PR 29085 at commit 
[`6811721`](https://github.com/apache/spark/commit/6811721cb77c9f05d51ce1d6e269400265117bf3).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29107:
URL: https://github.com/apache/spark/pull/29107#issuecomment-660427858










[GitHub] [spark] AmplabJenkins commented on pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #29107:
URL: https://github.com/apache/spark/pull/29107#issuecomment-660427858










[GitHub] [spark] SparkQA commented on pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-17 Thread GitBox


SparkQA commented on pull request #29107:
URL: https://github.com/apache/spark/pull/29107#issuecomment-660427670


   **[Test build #126093 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126093/testReport)**
 for PR 29107 at commit 
[`c23898e`](https://github.com/apache/spark/commit/c23898ef1b120ea9e5d1659cdb76502214dce97b).






[GitHub] [spark] viirya commented on pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-17 Thread GitBox


viirya commented on pull request #29107:
URL: https://github.com/apache/spark/pull/29107#issuecomment-660426922


   retest this please






[GitHub] [spark] AmplabJenkins commented on pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #29130:
URL: https://github.com/apache/spark/pull/29130#issuecomment-660425624










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29130:
URL: https://github.com/apache/spark/pull/29130#issuecomment-660425624










[GitHub] [spark] SparkQA removed a comment on pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-17 Thread GitBox


SparkQA removed a comment on pull request #29130:
URL: https://github.com/apache/spark/pull/29130#issuecomment-660389881


   **[Test build #126081 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126081/testReport)**
 for PR 29130 at commit 
[`f9479b6`](https://github.com/apache/spark/commit/f9479b6eea893663e7f8a6c92918101099c33ef5).






[GitHub] [spark] SparkQA commented on pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-17 Thread GitBox


SparkQA commented on pull request #29130:
URL: https://github.com/apache/spark/pull/29130#issuecomment-660425484


   **[Test build #126081 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126081/testReport)**
 for PR 29130 at commit 
[`f9479b6`](https://github.com/apache/spark/commit/f9479b6eea893663e7f8a6c92918101099c33ef5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `trait ShuffledJoin extends BaseJoinExec `






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-660425265










[GitHub] [spark] AmplabJenkins commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-660425265










[GitHub] [spark] SparkQA commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


SparkQA commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-660425094


   **[Test build #126092 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126092/testReport)**
 for PR 29085 at commit 
[`04684a8`](https://github.com/apache/spark/commit/04684a89f2080dddb7b39ad5564eb42f7b0d00b9).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-660424078


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126085/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-660424076


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-660424076










[GitHub] [spark] SparkQA removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-17 Thread GitBox


SparkQA removed a comment on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-660410501


   **[Test build #126085 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126085/testReport)**
 for PR 29014 at commit 
[`9f41504`](https://github.com/apache/spark/commit/9f415044673def2022b5c813205be2f2ed399045).






[GitHub] [spark] SparkQA commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-17 Thread GitBox


SparkQA commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-660423931


   **[Test build #126085 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126085/testReport)**
 for PR 29014 at commit 
[`9f41504`](https://github.com/apache/spark/commit/9f415044673def2022b5c813205be2f2ed399045).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class ExecutorDecommissionInfo(message: String, isHostDecommissioned: Boolean)`






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29107:
URL: https://github.com/apache/spark/pull/29107#issuecomment-660423538


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126077/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29107:
URL: https://github.com/apache/spark/pull/29107#issuecomment-660423535


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #29107:
URL: https://github.com/apache/spark/pull/29107#issuecomment-660423535










[GitHub] [spark] SparkQA removed a comment on pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-17 Thread GitBox


SparkQA removed a comment on pull request #29107:
URL: https://github.com/apache/spark/pull/29107#issuecomment-660381500


   **[Test build #126077 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126077/testReport)**
 for PR 29107 at commit 
[`c23898e`](https://github.com/apache/spark/commit/c23898ef1b120ea9e5d1659cdb76502214dce97b).






[GitHub] [spark] SparkQA commented on pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-17 Thread GitBox


SparkQA commented on pull request #29107:
URL: https://github.com/apache/spark/pull/29107#issuecomment-660423434


   **[Test build #126077 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126077/testReport)**
 for PR 29107 at commit 
[`c23898e`](https://github.com/apache/spark/commit/c23898ef1b120ea9e5d1659cdb76502214dce97b).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


SparkQA commented on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660422979


   **[Test build #126091 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126091/testReport)**
 for PR 28676 at commit 
[`9caeecd`](https://github.com/apache/spark/commit/9caeecddaa07ef825b73835a3666502df468f881).






[GitHub] [spark] AngersZhuuuu commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


AngersZhuuuu commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-660422452


   > As @cloud-fan [suggested 
above](https://github.com/apache/spark/pull/29085#discussion_r454299734), you 
need to move the two tests into `sql/core`; could you check my PR for your 
branch? [AngersZh#5](https://github.com/AngersZh/spark/pull/5)
   
   Thanks a lot. I tried a lot but couldn't resolve the conflict. Now I 
understand the inheritance relationship more clearly.






[GitHub] [spark] AmplabJenkins commented on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660422415










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660422415










[GitHub] [spark] imback82 commented on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


imback82 commented on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660422348


   retest this please






[GitHub] [spark] StefanXiepj commented on pull request #29144: [SPARK-15118][SQL] Spark SQLConf should support loading hive properties in hive-site.xml

2020-07-17 Thread GitBox


StefanXiepj commented on pull request #29144:
URL: https://github.com/apache/spark/pull/29144#issuecomment-660421270


   > #24489 has addressed your major concern. For the other conf, we can 
discuss it case by case. In general, we are not respecting Hive conf.
   
   @gatorsmile This PR will be closed. Thanks very much for taking the time to 
review.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-660420722










[GitHub] [spark] AmplabJenkins commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-660420722










[GitHub] [spark] gatorsmile commented on pull request #29144: [SPARK-15118][SQL] Spark SQLConf should support loading hive properties in hive-site.xml

2020-07-17 Thread GitBox


gatorsmile commented on pull request #29144:
URL: https://github.com/apache/spark/pull/29144#issuecomment-660420728


   https://github.com/apache/spark/pull/24489 has addressed your major concern. 
For the other conf, we can discuss it case by case. In general, we are not 
respecting Hive conf. 






[GitHub] [spark] SparkQA commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox


SparkQA commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-660420607


   **[Test build #126090 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126090/testReport)**
 for PR 29085 at commit 
[`5c049b5`](https://github.com/apache/spark/commit/5c049b50e81bac45634b9581efff2bc0cc0917b2).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660419631


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126087/
   Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


SparkQA removed a comment on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660415855


   **[Test build #126087 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126087/testReport)**
 for PR 28676 at commit 
[`9caeecd`](https://github.com/apache/spark/commit/9caeecddaa07ef825b73835a3666502df468f881).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660419628


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-17 Thread GitBox


SparkQA commented on pull request #29135:
URL: https://github.com/apache/spark/pull/29135#issuecomment-660419768


   **[Test build #126089 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126089/testReport)**
 for PR 29135 at commit 
[`ba7c3a4`](https://github.com/apache/spark/commit/ba7c3a4d3ad5827f03c09356f93079732284e29d).






[GitHub] [spark] AmplabJenkins commented on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660419628










[GitHub] [spark] SparkQA commented on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


SparkQA commented on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660419604


   **[Test build #126087 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126087/testReport)**
 for PR 28676 at commit 
[`9caeecd`](https://github.com/apache/spark/commit/9caeecddaa07ef825b73835a3666502df468f881).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29150: [SPARK-32353][TEST] Update docker/spark-test and clean up unused stuff

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29150:
URL: https://github.com/apache/spark/pull/29150#issuecomment-660280141


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29135:
URL: https://github.com/apache/spark/pull/29135#issuecomment-660418988










[GitHub] [spark] AmplabJenkins commented on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #29135:
URL: https://github.com/apache/spark/pull/29135#issuecomment-660418988










[GitHub] [spark] SparkQA commented on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-17 Thread GitBox


SparkQA commented on pull request #29135:
URL: https://github.com/apache/spark/pull/29135#issuecomment-660418864


   **[Test build #126088 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126088/testReport)**
 for PR 29135 at commit 
[`7159582`](https://github.com/apache/spark/commit/715958229de1f9193b18f8955ac6064403046b15).






[GitHub] [spark] viirya commented on a change in pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-17 Thread GitBox


viirya commented on a change in pull request #29107:
URL: https://github.com/apache/spark/pull/29107#discussion_r456742673



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
##
@@ -337,6 +337,10 @@ trait CheckAnalysis extends PredicateHelper {
 
   case Tail(limitExpr, _) => checkLimitLikeClause("tail", limitExpr)
 
+  case Union(_, byName, allowMissingCol) if byName || allowMissingCol =>
+    failAnalysis("Union should not be with true `byName` or " +
+      "`allowMissingCol` flags after analysis phase.")

Review comment:
   And yes, this also prevents an unexpected bug during analysis.

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -3676,3 +3678,63 @@ object UpdateOuterReferences extends Rule[LogicalPlan] {
 }
   }
 }
+
+/**
+ * Resolves different children of Union to a common set of columns. Note that this must be
+ * run before `TypeCoercion`, because `TypeCoercion` should be run on correctly resolved
+ * column by name.
+ */
+object ResolveUnion extends Rule[LogicalPlan] {
+  private def unionTwoSides(
+      left: LogicalPlan,
+      right: LogicalPlan,
+      allowMissingCol: Boolean): LogicalPlan = {
+    val resolver = SQLConf.get.resolver
+    val leftOutputAttrs = left.output
+    val rightOutputAttrs = right.output
+
+    // Builds a project list for `right` based on `left` output names
+    val rightProjectList = leftOutputAttrs.map { lattr =>
+      rightOutputAttrs.find { rattr => resolver(lattr.name, rattr.name) }.getOrElse {
+        if (allowMissingCol) {
+          Alias(Literal(null, lattr.dataType), lattr.name)()
+        } else {
+          throw new AnalysisException(
+            s"""Cannot resolve column name "${lattr.name}" among """ +
+              s"""(${rightOutputAttrs.map(_.name).mkString(", ")})""")
+        }
+      }
+    }
+
+    // Delegates failure checks to `CheckAnalysis`
+    val notFoundAttrs = rightOutputAttrs.diff(rightProjectList)
+    val rightChild = Project(rightProjectList ++ notFoundAttrs, right)
+
+    // Builds a project for `logicalPlan` based on `right` output names, if allowing
+    // missing columns.
+    val leftChild = if (allowMissingCol) {
+      val missingAttrs = notFoundAttrs.map { attr =>
+        Alias(Literal(null, attr.dataType), attr.name)()
+      }
+      if (missingAttrs.nonEmpty) {
+        Project(leftOutputAttrs ++ missingAttrs, left)
+      } else {
+        left
+      }
+    } else {
+      left
+    }
+    Union(leftChild, rightChild)
+  }
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp {
+    case e if !e.childrenResolved => e
+
+    case Union(children, byName, allowMissingCol)
+      if byName =>

Review comment:
   ok








[GitHub] [spark] viirya commented on a change in pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-17 Thread GitBox


viirya commented on a change in pull request #29107:
URL: https://github.com/apache/spark/pull/29107#discussion_r456742634



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
##
@@ -337,6 +337,10 @@ trait CheckAnalysis extends PredicateHelper {
 
   case Tail(limitExpr, _) => checkLimitLikeClause("tail", limitExpr)
 
+  case Union(_, byName, allowMissingCol) if byName || allowMissingCol =>
+    failAnalysis("Union should not be with true `byName` or " +
+      "`allowMissingCol` flags after analysis phase.")

Review comment:
   Usually not. This mainly prevents us from accidentally creating a Union with
`byName` or `allowMissingCol` set after the `ResolveUnion` rule has run.
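
A guard of this kind can be illustrated outside Spark. The following is a minimal sketch, assuming a toy plan type: `GuardPlan`, `GuardRelation`, `GuardUnion` and `PostAnalysisCheck` are hypothetical names for illustration and are not Spark's classes. It only shows the idea that a `Union` still carrying either flag after analysis is reported as an error.

```scala
// Toy plan model for illustration only; these are not Spark's classes.
sealed trait GuardPlan { def children: Seq[GuardPlan] }

final case class GuardRelation(name: String) extends GuardPlan {
  val children: Seq[GuardPlan] = Nil
}

final case class GuardUnion(
    children: Seq[GuardPlan],
    byName: Boolean = false,
    allowMissingCol: Boolean = false) extends GuardPlan

object PostAnalysisCheck {
  // Returns Left(message) for the first union that still carries a by-name
  // flag, meaning the resolution step never rewrote it; otherwise Right(()).
  def check(plan: GuardPlan): Either[String, Unit] = plan match {
    case GuardUnion(_, byName, allowMissingCol) if byName || allowMissingCol =>
      Left("Union should not be with true `byName` or " +
        "`allowMissingCol` flags after analysis phase.")
    case other =>
      // Recurse into the children and stop at the first failure.
      other.children.foldLeft[Either[String, Unit]](Right(())) { (acc, child) =>
        acc.flatMap(_ => check(child))
      }
  }
}
```

With this sketch, a plain positional union passes the check, while a nested union that still has `byName = true` is rejected. That matches the intent described above: the error signals a rule-ordering bug rather than a user mistake.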








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660415996










[GitHub] [spark] AmplabJenkins commented on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660415996










[GitHub] [spark] SparkQA commented on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


SparkQA commented on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660415855


   **[Test build #126087 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126087/testReport)**
 for PR 28676 at commit 
[`9caeecd`](https://github.com/apache/spark/commit/9caeecddaa07ef825b73835a3666502df468f881).






[GitHub] [spark] imback82 commented on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


imback82 commented on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660415514


   retest this please






[GitHub] [spark] maropu commented on a change in pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-17 Thread GitBox


maropu commented on a change in pull request #29107:
URL: https://github.com/apache/spark/pull/29107#discussion_r456741460



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
##
@@ -337,6 +337,10 @@ trait CheckAnalysis extends PredicateHelper {
 
   case Tail(limitExpr, _) => checkLimitLikeClause("tail", limitExpr)
 
+  case Union(_, byName, allowMissingCol) if byName || allowMissingCol =>
+    failAnalysis("Union should not be with true `byName` or " +
+      "`allowMissingCol` flags after analysis phase.")

Review comment:
   Just a question: can users ever see this error message, or would it only
appear in the case of an analyzer bug?








[GitHub] [spark] holdenk commented on pull request #29151: [SPARK-29909][BUILD] Use python3 in build scripts

2020-07-17 Thread GitBox


holdenk commented on pull request #29151:
URL: https://github.com/apache/spark/pull/29151#issuecomment-660413587


   The pip packaging failure is interesting. Let me know if you want me to
   take a deeper look at our packaging tests.
   
   On Fri, Jul 17, 2020 at 7:00 PM UCB AMPLab  wrote:
   
   > Test FAILed.
   > Refer to this link for build results (access rights to CI server needed):
   > https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126079/
   > Test FAILed.
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29143: [SPARK-32344][SQL] Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29143:
URL: https://github.com/apache/spark/pull/29143#issuecomment-660413372










[GitHub] [spark] AmplabJenkins commented on pull request #29143: [SPARK-32344][SQL] Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #29143:
URL: https://github.com/apache/spark/pull/29143#issuecomment-660413372










[GitHub] [spark] SparkQA commented on pull request #29143: [SPARK-32344][SQL] Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates

2020-07-17 Thread GitBox


SparkQA commented on pull request #29143:
URL: https://github.com/apache/spark/pull/29143#issuecomment-660413170


   **[Test build #126086 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126086/testReport)**
 for PR 29143 at commit 
[`66bf522`](https://github.com/apache/spark/commit/66bf5224a81a26e59f1a2dad497d1db8e84f6788).






[GitHub] [spark] agrawaldevesh commented on a change in pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-17 Thread GitBox


agrawaldevesh commented on a change in pull request #29014:
URL: https://github.com/apache/spark/pull/29014#discussion_r456736612



##
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##
@@ -1767,8 +1767,13 @@ private[spark] class DAGScheduler(
 
   // TODO: mark the executor as failed only if there were lots of fetch failures on it
   if (bmAddress != null) {
-    val hostToUnregisterOutputs = if (env.blockManager.externalShuffleServiceEnabled &&
-      unRegisterOutputOnHostOnFetchFailure) {
+    val externalShuffleServiceEnabled = env.blockManager.externalShuffleServiceEnabled
+    val isHostDecommissioned = taskScheduler
+      .getExecutorDecommissionInfo(bmAddress.executorId)
+      .exists(_.isHostDecommissioned)

Review comment:
   `isShuffleLost` is not very applicable here. That method was too narrow in
scope, so I inlined it. It strictly means that shuffle is lost for this
executor.
   
   In this context, we already know that shuffle is lost for the executor; we
are simply trying to determine whether it is also lost for the entire host. I
updated the code to reflect the logic and intent better.
   
   I will create a follow-up Jira under the master ticket to track changing
this logic when "Local Fetch" is merged in.
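
The host-level decision described here can be sketched in isolation. This is a hedged illustration, not the code in the PR: the quoted hunk stops before the `if` expression, so the way the two conditions are combined below is an assumption, and `FetchFailureSketch`, `DecommissionInfo` and the parameter names are hypothetical stand-ins.

```scala
object FetchFailureSketch {
  // Hypothetical stand-in for the decommission info attached to an executor.
  final case class DecommissionInfo(isHostDecommissioned: Boolean)

  // Returns Some(host) when shuffle outputs should be unregistered for the
  // whole host, or None when only the failing executor's outputs are affected.
  // Assumption: either a decommissioned host, or the external shuffle service
  // together with the unregister-on-host setting, widens the scope to the host.
  def hostToUnregisterOutputs(
      host: String,
      decommissionInfo: Option[DecommissionInfo],
      externalShuffleServiceEnabled: Boolean,
      unregisterOutputOnHostOnFetchFailure: Boolean): Option[String] = {
    val isHostDecommissioned = decommissionInfo.exists(_.isHostDecommissioned)
    val lostForWholeHost =
      isHostDecommissioned ||
        (externalShuffleServiceEnabled && unregisterOutputOnHostOnFetchFailure)
    if (lostForWholeHost) Some(host) else None
  }
}
```

Modeling the decommission info as an `Option` mirrors the `.exists(_.isHostDecommissioned)` call in the diff: an executor with no decommission info is treated as not host-decommissioned.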








[GitHub] [spark] maropu commented on a change in pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-17 Thread GitBox


maropu commented on a change in pull request #29107:
URL: https://github.com/apache/spark/pull/29107#discussion_r456740147



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -3676,3 +3678,63 @@ object UpdateOuterReferences extends Rule[LogicalPlan] {
 }
   }
 }
+
+/**
+ * Resolves different children of Union to a common set of columns. Note that this must be
+ * run before `TypeCoercion`, because `TypeCoercion` should be run on correctly resolved
+ * column by name.
+ */
+object ResolveUnion extends Rule[LogicalPlan] {
+  private def unionTwoSides(
+      left: LogicalPlan,
+      right: LogicalPlan,
+      allowMissingCol: Boolean): LogicalPlan = {
+    val resolver = SQLConf.get.resolver
+    val leftOutputAttrs = left.output
+    val rightOutputAttrs = right.output
+
+    // Builds a project list for `right` based on `left` output names
+    val rightProjectList = leftOutputAttrs.map { lattr =>
+      rightOutputAttrs.find { rattr => resolver(lattr.name, rattr.name) }.getOrElse {
+        if (allowMissingCol) {
+          Alias(Literal(null, lattr.dataType), lattr.name)()
+        } else {
+          throw new AnalysisException(
+            s"""Cannot resolve column name "${lattr.name}" among """ +
+              s"""(${rightOutputAttrs.map(_.name).mkString(", ")})""")
+        }
+      }
+    }
+
+    // Delegates failure checks to `CheckAnalysis`
+    val notFoundAttrs = rightOutputAttrs.diff(rightProjectList)
+    val rightChild = Project(rightProjectList ++ notFoundAttrs, right)
+
+    // Builds a project for `logicalPlan` based on `right` output names, if allowing
+    // missing columns.
+    val leftChild = if (allowMissingCol) {
+      val missingAttrs = notFoundAttrs.map { attr =>
+        Alias(Literal(null, attr.dataType), attr.name)()
+      }
+      if (missingAttrs.nonEmpty) {
+        Project(leftOutputAttrs ++ missingAttrs, left)
+      } else {
+        left
+      }
+    } else {
+      left
+    }
+    Union(leftChild, rightChild)
+  }
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp {
+    case e if !e.childrenResolved => e
+
+    case Union(children, byName, allowMissingCol)
+      if byName =>

Review comment:
   nit: `case Union(children, byName, allowMissingCol) if byName =>`?
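
For readers following the quoted `unionTwoSides`, the column alignment it performs can be modeled with plain strings. This is a simplified sketch under stated assumptions: columns are reduced to names, a `None` entry stands for a null-filled column, and name matching is hard-coded as case-insensitive in place of the configurable resolver. `UnionByNameSketch` and `align` are hypothetical names. Unlike the quoted code, which delegates the column-count mismatch to `CheckAnalysis`, the sketch reports it directly.

```scala
object UnionByNameSketch {
  // One entry per output column: Some(sourceColumn) or None for "fill with null".
  type Projection = Seq[Option[String]]

  // Case-insensitive matching, standing in for a configurable resolver.
  private def sameName(a: String, b: String): Boolean = a.equalsIgnoreCase(b)

  // Returns (leftProjection, rightProjection) in a shared column order:
  // all left columns first, followed by columns that exist only on the right.
  def align(
      left: Seq[String],
      right: Seq[String],
      allowMissingCol: Boolean): Either[String, (Projection, Projection)] = {
    val missingOnRight = left.filterNot(l => right.exists(sameName(l, _)))
    val rightOnly = right.filterNot(r => left.exists(sameName(_, r)))

    if (!allowMissingCol && missingOnRight.nonEmpty) {
      Left(s"""Cannot resolve column name "${missingOnRight.head}" among """ +
        s"""(${right.mkString(", ")})""")
    } else if (!allowMissingCol && rightOnly.nonEmpty) {
      // The quoted code leaves this case to a later analysis check.
      Left(s"Right side has extra columns: ${rightOnly.mkString(", ")}")
    } else {
      // Reorder the right side to follow the left side's column order.
      val rightForLeftCols: Projection = left.map(l => right.find(sameName(l, _)))
      val rightProj: Projection = rightForLeftCols ++ rightOnly.map(Some(_))
      // Pad the left side with nulls for columns found only on the right.
      val leftProj: Projection = left.map(Some(_)) ++ rightOnly.map(_ => None)
      Right((leftProj, rightProj))
    }
  }
}
```

For example, aligning `Seq("id", "a")` with `Seq("id", "b")` while allowing missing columns yields three output columns, with the left side null-filled for `b` and the right side null-filled for `a`.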








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-660412892










[GitHub] [spark] StefanXiepj commented on pull request #29144: [SPARK-15118][SQL] Spark SQLConf should support loading hive properties in hive-site.xml

2020-07-17 Thread GitBox


StefanXiepj commented on pull request #29144:
URL: https://github.com/apache/spark/pull/29144#issuecomment-660412941


   Hi @maropu,
   
   Could you please take a look at this and give some advice? Thx.
   
   Jeff.r






[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-660412892










[GitHub] [spark] SparkQA removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-17 Thread GitBox


SparkQA removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-660381558


   **[Test build #126078 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126078/testReport)**
 for PR 28708 at commit 
[`16b7376`](https://github.com/apache/spark/commit/16b7376f39cf8ba27fae898a5dd58ba6e64f38f9).






[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-17 Thread GitBox


SparkQA commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-660412762


   **[Test build #126078 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126078/testReport)**
 for PR 28708 at commit 
[`16b7376`](https://github.com/apache/spark/commit/16b7376f39cf8ba27fae898a5dd58ba6e64f38f9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-660411447


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126080/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-660411445


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-660411445










[GitHub] [spark] SparkQA removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-17 Thread GitBox


SparkQA removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-660384743


   **[Test build #126080 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126080/testReport)**
 for PR 28708 at commit 
[`8494bdd`](https://github.com/apache/spark/commit/8494bdd94285c7cc5a41e151da920710be7f4671).






[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-17 Thread GitBox


SparkQA commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-660411307


   **[Test build #126080 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126080/testReport)**
 for PR 28708 at commit 
[`8494bdd`](https://github.com/apache/spark/commit/8494bdd94285c7cc5a41e151da920710be7f4671).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-660410674










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660410624


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126075/
   Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-660410674










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660410623


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660410623










[GitHub] [spark] SparkQA removed a comment on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


SparkQA removed a comment on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660337252


   **[Test build #126075 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126075/testReport)**
 for PR 28676 at commit 
[`9caeecd`](https://github.com/apache/spark/commit/9caeecddaa07ef825b73835a3666502df468f881).






[GitHub] [spark] SparkQA commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-17 Thread GitBox


SparkQA commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-660410501


   **[Test build #126085 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126085/testReport)** for PR 29014 at commit [`9f41504`](https://github.com/apache/spark/commit/9f415044673def2022b5c813205be2f2ed399045).






[GitHub] [spark] SparkQA commented on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox


SparkQA commented on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-660410438


   **[Test build #126075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126075/testReport)** for PR 28676 at commit [`9caeecd`](https://github.com/apache/spark/commit/9caeecddaa07ef825b73835a3666502df468f881).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29135:
URL: https://github.com/apache/spark/pull/29135#issuecomment-660406839










[GitHub] [spark] AmplabJenkins commented on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-17 Thread GitBox


AmplabJenkins commented on pull request #29135:
URL: https://github.com/apache/spark/pull/29135#issuecomment-660406839










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29151: [SPARK-29909][BUILD] Use python3 in build scripts

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29151:
URL: https://github.com/apache/spark/pull/29151#issuecomment-660406373


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126079/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-17 Thread GitBox


SparkQA commented on pull request #29135:
URL: https://github.com/apache/spark/pull/29135#issuecomment-660406621


   **[Test build #126084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126084/testReport)** for PR 29135 at commit [`2253499`](https://github.com/apache/spark/commit/2253499de077b26aa77c647af3c514abfba4483e).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29151: [SPARK-29909][BUILD] Use python3 in build scripts

2020-07-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29151:
URL: https://github.com/apache/spark/pull/29151#issuecomment-660406371


   Merged build finished. Test FAILed.





