[GitHub] [spark] sathyaprakashg edited a comment on pull request #28703: SPARK-29897 Add implicit cast for SubtractTimestamps

2020-06-07 Thread GitBox


sathyaprakashg edited a comment on pull request #28703:
URL: https://github.com/apache/spark/pull/28703#issuecomment-640384737


   @bart-samwel  If you are referring to the two new test statements I added, both have a timestamp in both the left and right expressions and return an interval. So we have only one type rule, which is
   `timestamp - timestamp returns interval`
   In the `SubtractTimestamps` case class we can see that the input data types for both the left and right expressions are timestamp:
   `override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType, TimestampType)`
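   As a plain-Python analogue (an illustration, not Spark itself) of the `timestamp - timestamp returns interval` rule: subtracting two `datetime` values yields a `timedelta`, the interval-like type.

   ```python
   from datetime import datetime

   # Subtracting two timestamps yields an interval-like value, mirroring
   # Spark's SubtractTimestamps rule: timestamp - timestamp -> interval.
   start = datetime(2020, 6, 7, 12, 0, 0)
   end = datetime(2020, 6, 7, 12, 30, 0)

   interval = end - start
   print(interval)  # 0:30:00
   ```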



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sathyaprakashg commented on pull request #28703: SPARK-29897 Add implicit cast for SubtractTimestamps

2020-06-07 Thread GitBox


sathyaprakashg commented on pull request #28703:
URL: https://github.com/apache/spark/pull/28703#issuecomment-640384737


   @bart-samwel  If you are referring to the two new test statements I added, both have a timestamp in both the left and right expressions and return an interval. So we have only one type rule, which is
   timestamp - timestamp returns interval






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28749: [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28749:
URL: https://github.com/apache/spark/pull/28749#issuecomment-640381851










[GitHub] [spark] SparkQA commented on pull request #28749: [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling

2020-06-07 Thread GitBox


SparkQA commented on pull request #28749:
URL: https://github.com/apache/spark/pull/28749#issuecomment-640383499


   **[Test build #123615 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123615/testReport)**
 for PR 28749 at commit 
[`87113bc`](https://github.com/apache/spark/commit/87113bc38da7f2d3de8501b8745b6def4f33a6e3).






[GitHub] [spark] AmplabJenkins commented on pull request #28749: [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling

2020-06-07 Thread GitBox


AmplabJenkins commented on pull request #28749:
URL: https://github.com/apache/spark/pull/28749#issuecomment-640381851










[GitHub] [spark] HyukjinKwon opened a new pull request #28749: [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling

2020-06-07 Thread GitBox


HyukjinKwon opened a new pull request #28749:
URL: https://github.com/apache/spark/pull/28749


   ### What changes were proposed in this pull request?
   
   This PR proposes to use the existing util `org.apache.spark.util.Utils.exceptionString` for the same code at:
   
   ```
   jwriter = jvm.java.io.StringWriter()
   e.printStackTrace(jvm.java.io.PrintWriter(jwriter))
   stacktrace = jwriter.toString()
   ```
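   For illustration only, a pure-Python sketch of what `Utils.exceptionString` provides (rendering an exception and its stack trace as a single string) without a round trip through Py4J; the helper name `exception_string` here is hypothetical:

   ```python
   import traceback

   def exception_string(exc: BaseException) -> str:
       # Render the exception and its stack trace as one string, analogous to
       # org.apache.spark.util.Utils.exceptionString on the JVM side.
       return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))

   try:
       raise ValueError("boom")
   except ValueError as e:
       s = exception_string(e)

   print("ValueError: boom" in s)  # True
   ```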
   
   ### Why are the changes needed?
   
   To deduplicate code. Plus, less communication between the JVM and Py4J.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Manually tested.
   






[GitHub] [spark] HyukjinKwon commented on pull request #28749: [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling

2020-06-07 Thread GitBox


HyukjinKwon commented on pull request #28749:
URL: https://github.com/apache/spark/pull/28749#issuecomment-640381524


   @ueshin, can you take a quick look when you're available?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28748: [WIP][SPARK-30119][WEBUI]Support pagination for streaming tab

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28748:
URL: https://github.com/apache/spark/pull/28748#issuecomment-640376942










[GitHub] [spark] AmplabJenkins commented on pull request #28748: [WIP][SPARK-30119][WEBUI]Support pagination for streaming tab

2020-06-07 Thread GitBox


AmplabJenkins commented on pull request #28748:
URL: https://github.com/apache/spark/pull/28748#issuecomment-640376942










[GitHub] [spark] SparkQA commented on pull request #28748: [WIP][SPARK-30119][WEBUI]Support pagination for streaming tab

2020-06-07 Thread GitBox


SparkQA commented on pull request #28748:
URL: https://github.com/apache/spark/pull/28748#issuecomment-640376563


   **[Test build #123614 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123614/testReport)**
 for PR 28748 at commit 
[`9a6e5d9`](https://github.com/apache/spark/commit/9a6e5d9f199109fbaa073c7c728cbe1d99830060).






[GitHub] [spark] iRakson commented on pull request #28748: [WIP][SPARK-30119][WEBUI]Support pagination for streaming tab

2020-06-07 Thread GitBox


iRakson commented on pull request #28748:
URL: https://github.com/apache/spark/pull/28748#issuecomment-640375459


   cc @sarutak Kindly take a look.






[GitHub] [spark] iRakson opened a new pull request #28748: [WIP][SPARK-30119][WEBUI]Support pagination for streaming tab

2020-06-07 Thread GitBox


iRakson opened a new pull request #28748:
URL: https://github.com/apache/spark/pull/28748


   
   
   ### What changes were proposed in this pull request?
   #28747 reverted #28439 due to a flaky test case. This PR fixes the flaky test and adds pagination support.
   
   The WIP tag is added just to verify that this PR works fine.
   
   
   
   ### Why are the changes needed?
   To support pagination for the streaming tab.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. The streaming tab tables will now be paginated.
   
   
   
   ### How was this patch tested?
   Manually.
   






[GitHub] [spark] Ngone51 commented on pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode

2020-06-07 Thread GitBox


Ngone51 commented on pull request #28258:
URL: https://github.com/apache/spark/pull/28258#issuecomment-640365869


   LGTM. I also tested manually with `spark.standalone.submit.waitAppCompletion` on/off and the exceptional case of the Master exiting. Everything looks fine!






[GitHub] [spark] gerashegalov commented on a change in pull request #28746: [SPARK-31922][CORE] Fix "RpcEnv already stopped" error when exit spark-shell with local-cluster mode

2020-06-07 Thread GitBox


gerashegalov commented on a change in pull request #28746:
URL: https://github.com/apache/spark/pull/28746#discussion_r436452604



##
File path: core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala
##
@@ -74,6 +74,10 @@ class LocalSparkCluster(
 
   def stop(): Unit = {
 logInfo("Shutting down local Spark cluster.")
+// SPARK-31922: wait one more second before shutting down rpcEnvs of 
master and worker,
+// in order to let the cluster have time to handle the 
`UnregisterApplication` message.
+// Otherwise, we could hit "RpcEnv already stopped" error.
+Thread.sleep(1000)
 // Stop the workers before the master so they don't get upset that it 
disconnected
 workerRpcEnvs.foreach(_.shutdown())

Review comment:
   You may be right about this, but it contradicts the [scaladoc](https://github.com/apache/spark/blob/264b0f36cedacd9a22b45a3e14b2186230432be6/core/src/main/scala/org/apache/spark/rpc/RpcEnv.scala#L119).








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640357727










[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640357727










[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640357312


   **[Test build #123613 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123613/testReport)**
 for PR 28593 at commit 
[`3fd6d02`](https://github.com/apache/spark/commit/3fd6d02d66aa2abfbe80450366a1d25a332e66ee).






[GitHub] [spark] Ngone51 commented on a change in pull request #28746: [SPARK-31922][CORE] Fix "RpcEnv already stopped" error when exit spark-shell with local-cluster mode

2020-06-07 Thread GitBox


Ngone51 commented on a change in pull request #28746:
URL: https://github.com/apache/spark/pull/28746#discussion_r436446619



##
File path: 
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
##
@@ -557,7 +557,7 @@ class CoarseGrainedSchedulerBackend(scheduler: 
TaskSchedulerImpl, val rpcEnv: Rp
 }
   }
 
-  override def reviveOffers(): Unit = {
+  override def reviveOffers(): Unit = Utils.tryLogNonFatalError {

Review comment:
   This change fixes the failure of the test `org.apache.spark.launcher.LauncherBackendSuite.standalone/client: launcher handle`. After sleeping one more second, the application launched by the `SparkLauncher` now has a chance to submit tasks to the TaskScheduler and call `reviveOffers` on the SchedulerBackend. At the same time, the `SparkLauncher` will ask the application to stop. Therefore, the SchedulerBackend could already be stopped when it receives `ReviveOffers` messages, which would fail the entire application in the end.
   
   So, I use `Utils.tryLogNonFatalError` to fix it, and I think this should be fine since we've already used it at:
   
   https://github.com/apache/spark/blob/c560428fe0113f17362bae2b369910049914696f/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L137-L139
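   As a rough Python sketch of the `Utils.tryLogNonFatalError` pattern (run a block, log non-fatal exceptions instead of propagating them) — an illustration under assumed naming, not Spark's actual implementation:

   ```python
   import logging

   def try_log_non_fatal_error(block):
       # Execute the block; swallow and log non-fatal exceptions so that a
       # late message (e.g. ReviveOffers arriving after stop) cannot fail
       # the whole application.
       try:
           block()
       except Exception:
           logging.exception("Uncaught non-fatal exception")

   def revive_offers():
       raise RuntimeError("SchedulerBackend already stopped")

   try_log_non_fatal_error(revive_offers)  # logged, not raised
   print("still running")
   ```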
   









[GitHub] [spark] AmplabJenkins removed a comment on pull request #28746: [SPARK-31922][CORE] Fix "RpcEnv already stopped" error when exit spark-shell with local-cluster mode

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28746:
URL: https://github.com/apache/spark/pull/28746#issuecomment-640349737










[GitHub] [spark] AmplabJenkins commented on pull request #28746: [SPARK-31922][CORE] Fix "RpcEnv already stopped" error when exit spark-shell with local-cluster mode

2020-06-07 Thread GitBox


AmplabJenkins commented on pull request #28746:
URL: https://github.com/apache/spark/pull/28746#issuecomment-640349737










[GitHub] [spark] SparkQA commented on pull request #28746: [SPARK-31922][CORE] Fix "RpcEnv already stopped" error when exit spark-shell with local-cluster mode

2020-06-07 Thread GitBox


SparkQA commented on pull request #28746:
URL: https://github.com/apache/spark/pull/28746#issuecomment-640349410


   **[Test build #123612 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123612/testReport)**
 for PR 28746 at commit 
[`eba978e`](https://github.com/apache/spark/commit/eba978eb0b76fa7ed1a4ebdf268666949f8bcf64).






[GitHub] [spark] Ngone51 commented on a change in pull request #28746: [SPARK-31922][CORE] Fix "RpcEnv already stopped" error when exit spark-shell with local-cluster mode

2020-06-07 Thread GitBox


Ngone51 commented on a change in pull request #28746:
URL: https://github.com/apache/spark/pull/28746#discussion_r436445018



##
File path: core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala
##
@@ -74,6 +74,10 @@ class LocalSparkCluster(
 
   def stop(): Unit = {
 logInfo("Shutting down local Spark cluster.")
+// SPARK-31922: wait one more second before shutting down rpcEnvs of 
master and worker,
+// in order to let the cluster have time to handle the 
`UnregisterApplication` message.
+// Otherwise, we could hit "RpcEnv already stopped" error.
+Thread.sleep(1000)
 // Stop the workers before the master so they don't get upset that it 
disconnected
 workerRpcEnvs.foreach(_.shutdown())

Review comment:
   It's not really necessary, since shutdown is performed synchronously; the worker will close the connection to the master first. And `awaitTermination()` doesn't make sure everything stops, only the `Dispatcher`.








[GitHub] [spark] wangyum commented on a change in pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition

2020-06-07 Thread GitBox


wangyum commented on a change in pull request #28642:
URL: https://github.com/apache/spark/pull/28642#discussion_r436442434



##
File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
##
@@ -1039,7 +1039,7 @@ class JoinSuite extends QueryTest with SharedSparkSession 
with AdaptiveSparkPlan
 val pythonEvals = collect(joinNode.get) {
   case p: BatchEvalPythonExec => p
 }
-assert(pythonEvals.size == 2)
+assert(pythonEvals.size == 4)

Review comment:
   @HyukjinKwon I'm not sure whether this change can still optimize the Python UDFs. What do you think?








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28737: [SPARK-31913][SQL]: Fix StackOverflowError in FileScanRDD

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28737:
URL: https://github.com/apache/spark/pull/28737#issuecomment-640344367










[GitHub] [spark] AmplabJenkins commented on pull request #28737: [SPARK-31913][SQL]: Fix StackOverflowError in FileScanRDD

2020-06-07 Thread GitBox


AmplabJenkins commented on pull request #28737:
URL: https://github.com/apache/spark/pull/28737#issuecomment-640344367










[GitHub] [spark] SparkQA commented on pull request #28737: [SPARK-31913][SQL]: Fix StackOverflowError in FileScanRDD

2020-06-07 Thread GitBox


SparkQA commented on pull request #28737:
URL: https://github.com/apache/spark/pull/28737#issuecomment-640343993


   **[Test build #123611 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123611/testReport)**
 for PR 28737 at commit 
[`1e17fd0`](https://github.com/apache/spark/commit/1e17fd0c05849308b68481ecb609a15e19ee962e).






[GitHub] [spark] uncleGen commented on pull request #28737: [SPARK-31913][SQL]: Fix StackOverflowError in FileScanRDD

2020-06-07 Thread GitBox


uncleGen commented on pull request #28737:
URL: https://github.com/apache/spark/pull/28737#issuecomment-640342988


   cc @cloud-fan @xuanyuanking 






[GitHub] [spark] iRakson commented on a change in pull request #28439: [SPARK-30119][WEBUI] Add Pagination Support to Streaming Page

2020-06-07 Thread GitBox


iRakson commented on a change in pull request #28439:
URL: https://github.com/apache/spark/pull/28439#discussion_r436440515



##
File path: 
streaming/src/test/scala/org/apache/spark/streaming/UISeleniumSuite.scala
##
@@ -125,24 +125,47 @@ class UISeleniumSuite
 
 // Check batch tables
 val h4Text = findAll(cssSelector("h4")).map(_.text).toSeq
-h4Text.exists(_.matches("Active Batches \\(\\d+\\)")) should be (true)
+h4Text.exists(_.matches("Running Batches \\(\\d+\\)")) should be (true)

Review comment:
   This is causing all the failures. I will remove these tests and raise the PR again.
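   The `matches` check in the test above is a full-string regex match, so renaming the heading from "Active Batches" to "Running Batches" makes the old pattern fail. A hedged Python illustration of the same behavior:

   ```python
   import re

   # Full-string pattern used by the test (Scala's String.matches behaves
   # like Python's re.fullmatch).
   pattern = r"Running Batches \(\d+\)"

   # The renamed heading matches the updated pattern...
   print(re.fullmatch(pattern, "Running Batches (3)") is not None)  # True
   # ...while the old heading no longer does.
   print(re.fullmatch(pattern, "Active Batches (3)") is not None)   # False
   ```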








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640341115


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123609/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27986: [SPARK-31220][SQL] repartition obeys initialPartitionNum when adaptiveExecutionEnabled

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #27986:
URL: https://github.com/apache/spark/pull/27986#issuecomment-640341165










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640341106


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


SparkQA removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640310014


   **[Test build #123609 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123609/testReport)**
 for PR 28593 at commit 
[`ec2cf54`](https://github.com/apache/spark/commit/ec2cf54b6d566dd7afcd65753374c1dd5dc8d47f).






[GitHub] [spark] AmplabJenkins commented on pull request #27986: [SPARK-31220][SQL] repartition obeys initialPartitionNum when adaptiveExecutionEnabled

2020-06-07 Thread GitBox


AmplabJenkins commented on pull request #27986:
URL: https://github.com/apache/spark/pull/27986#issuecomment-640341165










[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640341106










[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640340563


   **[Test build #123609 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123609/testReport)**
 for PR 28593 at commit 
[`ec2cf54`](https://github.com/apache/spark/commit/ec2cf54b6d566dd7afcd65753374c1dd5dc8d47f).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #27986: [SPARK-31220][SQL] repartition obeys initialPartitionNum when adaptiveExecutionEnabled

2020-06-07 Thread GitBox


SparkQA commented on pull request #27986:
URL: https://github.com/apache/spark/pull/27986#issuecomment-640339926


   **[Test build #123610 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123610/testReport)**
 for PR 27986 at commit 
[`1e6ed30`](https://github.com/apache/spark/commit/1e6ed30f12d4a3ed50f647a3b9b848a2e5b547b8).






[GitHub] [spark] viirya commented on a change in pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs

2020-06-07 Thread GitBox


viirya commented on a change in pull request #28745:
URL: https://github.com/apache/spark/pull/28745#discussion_r436436434



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
##
@@ -608,10 +608,14 @@ abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
 execution.MapPartitionsInRWithArrowExec(
   f, p, b, is, ot, planLater(child)) :: Nil
   case logical.FlatMapGroupsInPandas(grouping, func, output, child) =>
-execution.python.FlatMapGroupsInPandasExec(grouping, func, output, 
planLater(child)) :: Nil
-  case logical.FlatMapCoGroupsInPandas(leftGroup, rightGroup, func, 
output, left, right) =>
+val groupingExprs = grouping.map(NamedExpression.fromExpression)
+execution.python.FlatMapGroupsInPandasExec(
+  groupingExprs, func, output, planLater(child)) :: Nil
+  case logical.FlatMapCoGroupsInPandas(leftExprs, rightExprs, func, 
output, left, right) =>
+val leftAttrs = leftExprs.map(NamedExpression.fromExpression)
+val rightAttrs = rightExprs.map(NamedExpression.fromExpression)
 execution.python.FlatMapCoGroupsInPandasExec(
-  leftGroup, rightGroup, func, output, planLater(left), 
planLater(right)) :: Nil
+  leftAttrs, rightAttrs, func, output, planLater(left), 
planLater(right)) :: Nil

Review comment:
   leftNamedExprs/rightNamedExprs or leftGroupingExprs/rightGroupingExprs? 
They are not attributes actually.

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/PandasGroupUtils.scala
##
@@ -59,65 +59,65 @@ private[python] object PandasGroupUtils {
*/
   def groupAndProject(
   input: Iterator[InternalRow],
-  groupingAttributes: Seq[Attribute],
+  groupingExprs: Seq[NamedExpression],
   inputSchema: Seq[Attribute],
-  dedupSchema: Seq[Attribute]): Iterator[(InternalRow, 
Iterator[InternalRow])] = {
-val groupedIter = GroupedIterator(input, groupingAttributes, inputSchema)
+  dedupSchema: Seq[NamedExpression]): Iterator[(InternalRow, 
Iterator[InternalRow])] = {
+val groupedIter = GroupedIterator(input, groupingExprs, inputSchema)
 val dedupProj = UnsafeProjection.create(dedupSchema, inputSchema)
 groupedIter.map {
   case (k, groupedRowIter) => (k, groupedRowIter.map(dedupProj))
 }
   }
 
   /**
-   * Returns a the deduplicated attributes of the spark plan and the arg 
offsets of the
+   * Returns a the deduplicated named expressions of the spark plan and the 
arg offsets of the
* keys and values.
*
-   * The deduplicated attributes are needed because the spark plan may contain 
an attribute
-   * twice; once in the key and once in the value.  For any such attribute we 
need to
+   * The deduplicated expressions are needed because the spark plan may 
contain an expression
+   * twice; once in the key and once in the value.  For any such expression we 
need to
* deduplicate.
*
-   * The arg offsets are used to distinguish grouping grouping attributes and 
data attributes
+   * The arg offsets are used to distinguish grouping expressions and data 
expressions
* as following:
*
* argOffsets[0] is the length of the argOffsets array
*
-   * argOffsets[1] is the length of grouping attribute
-   * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping 
attributes
+   * argOffsets[1] is the length of grouping expression
+   * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping 
expressions
*
-   * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data attributes
+   * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data expressions
*/
   def resolveArgOffsets(
-child: SparkPlan, groupingAttributes: Seq[Attribute]): (Seq[Attribute], 
Array[Int]) = {
+  dataExprs: Seq[NamedExpression], groupingExprs: Seq[NamedExpression])
+: (Seq[NamedExpression], Array[Int]) = {
 
-val dataAttributes = child.output.drop(groupingAttributes.length)
-val groupingIndicesInData = groupingAttributes.map { attribute =>
-  dataAttributes.indexWhere(attribute.semanticEquals)
+val groupingIndicesInData = groupingExprs.map { expression =>
+  dataExprs.indexWhere(expression.semanticEquals)
 }

Review comment:
   ok, looks good after re-checking. 

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapCoGroupsInPandasExec.scala
##
@@ -60,42 +60,51 @@ case class FlatMapCoGroupsInPandasExec(
   private val pythonRunnerConf = ArrowUtils.getPythonRunnerConfMap(conf)
   private val pandasFunction = func.asInstanceOf[PythonUDF].func
   private val chainedFunc = Seq(ChainedPythonFunctions(Seq(pandasFunction)))
+  private val inputExprs =
+func.asInstanceOf[PythonUDF].children.map(_.asInstanceOf[NamedExpression])
+  private val leftExprs =
+left.output.filter(e => inputExprs.exists(_.semanticEquals(e)))
+  private val 

[GitHub] [spark] turboFei removed a comment on pull request #26339: [SPARK-27194][SPARK-29302][SQL] For dynamic partition overwrite operation, fix speculation task conflict issue and FileAlreadyExists

2020-06-07 Thread GitBox


turboFei removed a comment on pull request #26339:
URL: https://github.com/apache/spark/pull/26339#issuecomment-632923671


   I have sent an email to that thread, thanks a lot @Ngone51 






[GitHub] [spark] sarutak edited a comment on pull request #28439: [SPARK-30119][WEBUI] Add Pagination Support to Streaming Page

2020-06-07 Thread GitBox


sarutak edited a comment on pull request #28439:
URL: https://github.com/apache/spark/pull/28439#issuecomment-640322160


   The suite passes on my laptop with both sbt and Maven, so the failing suite 
may just be flaky.






[GitHub] [spark] uncleGen commented on pull request #28737: [SPARK-31913][SQL]: Fix StackOverflowError in FileScanRDD

2020-06-07 Thread GitBox


uncleGen commented on pull request #28737:
URL: https://github.com/apache/spark/pull/28737#issuecomment-640320765


   Pending on fixing the UT failure.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs

2020-06-07 Thread GitBox


HyukjinKwon commented on a change in pull request #28745:
URL: https://github.com/apache/spark/pull/28745#discussion_r436425345



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/PandasGroupUtils.scala
##
@@ -59,65 +59,65 @@ private[python] object PandasGroupUtils {
*/
   def groupAndProject(
   input: Iterator[InternalRow],
-  groupingAttributes: Seq[Attribute],
+  groupingExprs: Seq[NamedExpression],
   inputSchema: Seq[Attribute],
-  dedupSchema: Seq[Attribute]): Iterator[(InternalRow, 
Iterator[InternalRow])] = {
-val groupedIter = GroupedIterator(input, groupingAttributes, inputSchema)
+  dedupSchema: Seq[NamedExpression]): Iterator[(InternalRow, 
Iterator[InternalRow])] = {
+val groupedIter = GroupedIterator(input, groupingExprs, inputSchema)
 val dedupProj = UnsafeProjection.create(dedupSchema, inputSchema)
 groupedIter.map {
   case (k, groupedRowIter) => (k, groupedRowIter.map(dedupProj))
 }
   }
 
   /**
-   * Returns a the deduplicated attributes of the spark plan and the arg 
offsets of the
+   * Returns a the deduplicated named expressions of the spark plan and the 
arg offsets of the
* keys and values.
*
-   * The deduplicated attributes are needed because the spark plan may contain 
an attribute
-   * twice; once in the key and once in the value.  For any such attribute we 
need to
+   * The deduplicated expressions are needed because the spark plan may 
contain an expression
+   * twice; once in the key and once in the value.  For any such expression we 
need to
* deduplicate.
*
-   * The arg offsets are used to distinguish grouping grouping attributes and 
data attributes
+   * The arg offsets are used to distinguish grouping expressions and data 
expressions
* as following:
*
* argOffsets[0] is the length of the argOffsets array
*
-   * argOffsets[1] is the length of grouping attribute
-   * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping 
attributes
+   * argOffsets[1] is the length of grouping expression
+   * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping 
expressions
*
-   * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data attributes
+   * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data expressions
*/
   def resolveArgOffsets(
-child: SparkPlan, groupingAttributes: Seq[Attribute]): (Seq[Attribute], 
Array[Int]) = {
+  dataExprs: Seq[NamedExpression], groupingExprs: Seq[NamedExpression])
+: (Seq[NamedExpression], Array[Int]) = {
 
-val dataAttributes = child.output.drop(groupingAttributes.length)
-val groupingIndicesInData = groupingAttributes.map { attribute =>
-  dataAttributes.indexWhere(attribute.semanticEquals)
+val groupingIndicesInData = groupingExprs.map { expression =>
+  dataExprs.indexWhere(expression.semanticEquals)
 }

Review comment:
   Just to be doubly sure, the `column + 1` case is tested at 
https://github.com/apache/spark/blob/ab0890bdb18dcd0441f6082afbe4c84219611e87/python/pyspark/sql/tests/test_pandas_cogrouped_map.py#L161-L174
   I know because it failed while I was preparing the fix :D.
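   The argOffsets layout documented in the quoted diff can be sketched as a 
plain-Python round trip. This is a simplified sketch with hypothetical helper 
names, not Spark's actual encoding: the header stores the array length and the 
number of grouping offsets, followed by the grouping offsets and then the data 
offsets.

   ```python
   def encode_arg_offsets(grouping_offsets, data_offsets):
       # arr[0] holds the total array length, arr[1] the number of
       # grouping offsets; grouping offsets come next, then data offsets.
       arr = [0, len(grouping_offsets)] + list(grouping_offsets) + list(data_offsets)
       arr[0] = len(arr)
       return arr

   def decode_arg_offsets(arr):
       # Recover the two offset groups from the header fields.
       n_grouping = arr[1]
       grouping = arr[2:2 + n_grouping]
       data = arr[2 + n_grouping:arr[0]]
       return grouping, data
   ```

   For example, `encode_arg_offsets([0, 1], [2, 3, 4])` yields 
`[7, 2, 0, 1, 2, 3, 4]`, and `decode_arg_offsets` recovers the two lists.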








[GitHub] [spark] HyukjinKwon commented on a change in pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs

2020-06-07 Thread GitBox


HyukjinKwon commented on a change in pull request #28745:
URL: https://github.com/apache/spark/pull/28745#discussion_r436425399



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/PandasGroupUtils.scala
##
@@ -59,65 +59,65 @@ private[python] object PandasGroupUtils {
*/
   def groupAndProject(
   input: Iterator[InternalRow],
-  groupingAttributes: Seq[Attribute],
+  groupingExprs: Seq[NamedExpression],
   inputSchema: Seq[Attribute],
-  dedupSchema: Seq[Attribute]): Iterator[(InternalRow, 
Iterator[InternalRow])] = {
-val groupedIter = GroupedIterator(input, groupingAttributes, inputSchema)
+  dedupSchema: Seq[NamedExpression]): Iterator[(InternalRow, 
Iterator[InternalRow])] = {
+val groupedIter = GroupedIterator(input, groupingExprs, inputSchema)
 val dedupProj = UnsafeProjection.create(dedupSchema, inputSchema)
 groupedIter.map {
   case (k, groupedRowIter) => (k, groupedRowIter.map(dedupProj))
 }
   }
 
   /**
-   * Returns a the deduplicated attributes of the spark plan and the arg 
offsets of the
+   * Returns a the deduplicated named expressions of the spark plan and the 
arg offsets of the
* keys and values.
*
-   * The deduplicated attributes are needed because the spark plan may contain 
an attribute
-   * twice; once in the key and once in the value.  For any such attribute we 
need to
+   * The deduplicated expressions are needed because the spark plan may 
contain an expression
+   * twice; once in the key and once in the value.  For any such expression we 
need to
* deduplicate.
*
-   * The arg offsets are used to distinguish grouping grouping attributes and 
data attributes
+   * The arg offsets are used to distinguish grouping expressions and data 
expressions
* as following:
*
* argOffsets[0] is the length of the argOffsets array
*
-   * argOffsets[1] is the length of grouping attribute
-   * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping 
attributes
+   * argOffsets[1] is the length of grouping expression
+   * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping 
expressions
*
-   * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data attributes
+   * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data expressions
*/
   def resolveArgOffsets(
-child: SparkPlan, groupingAttributes: Seq[Attribute]): (Seq[Attribute], 
Array[Int]) = {
+  dataExprs: Seq[NamedExpression], groupingExprs: Seq[NamedExpression])
+: (Seq[NamedExpression], Array[Int]) = {
 
-val dataAttributes = child.output.drop(groupingAttributes.length)
-val groupingIndicesInData = groupingAttributes.map { attribute =>
-  dataAttributes.indexWhere(attribute.semanticEquals)
+val groupingIndicesInData = groupingExprs.map { expression =>
+  dataExprs.indexWhere(expression.semanticEquals)
 }

Review comment:
   Just to be doubly sure, the `column + 1` case is tested at 
https://github.com/apache/spark/blob/ab0890bdb18dcd0441f6082afbe4c84219611e87/python/pyspark/sql/tests/test_pandas_cogrouped_map.py#L161-L174
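   The `groupingIndicesInData` lookup in the quoted diff — find each grouping 
expression among the data expressions, or mark it missing — can be sketched in 
plain Python. Ordinary equality stands in for `semanticEquals`, and the 
function name is only illustrative:

   ```python
   def grouping_indices_in_data(grouping_exprs, data_exprs):
       # For each grouping expression, return its index among the data
       # expressions, or -1 if it does not appear there (and so must be
       # kept separately in the deduplicated schema).
       indices = []
       for g in grouping_exprs:
           for i, d in enumerate(data_exprs):
               if g == d:  # stand-in for semanticEquals
                   indices.append(i)
                   break
           else:
               indices.append(-1)
       return indices
   ```

   A grouping expression like `a + 1` that is not itself a data column would 
get index `-1` here, which matches why the `column + 1` test case above matters.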








[GitHub] [spark] HyukjinKwon commented on a change in pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs

2020-06-07 Thread GitBox


HyukjinKwon commented on a change in pull request #28745:
URL: https://github.com/apache/spark/pull/28745#discussion_r436424768



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
##
@@ -23,14 +23,18 @@ import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
 import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.catalyst.plans.logical.EventTimeWatermark
-import org.apache.spark.sql.catalyst.util.quoteIdentifier
+import org.apache.spark.sql.catalyst.util.{quoteIdentifier, toPrettySQL}
 import org.apache.spark.sql.types._
 
 object NamedExpression {
   private val curId = new java.util.concurrent.atomic.AtomicLong()
   private[expressions] val jvmId = UUID.randomUUID()
   def newExprId: ExprId = ExprId(curId.getAndIncrement(), jvmId)
   def unapply(expr: NamedExpression): Option[(String, DataType)] = 
Some((expr.name, expr.dataType))
+  def fromExpression(expr: Expression): NamedExpression = expr match {
+case ne: NamedExpression => ne
+case _: Expression => Alias(expr, toPrettySQL(expr))()
+  }

Review comment:
   Yeah, let me take a look separately in a separate JIRA.
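   The `fromExpression` helper quoted in the diff simply passes named 
expressions through and wraps anything else in an alias built from its pretty 
form. A rough Python analogue of that pattern (class and function names here 
are hypothetical, not Spark APIs):

   ```python
   class NamedExpr:
       """Stand-in for Spark's NamedExpression."""
       def __init__(self, name):
           self.name = name

   class Alias(NamedExpr):
       """Stand-in for Spark's Alias: a named wrapper around a child expression."""
       def __init__(self, child, name):
           super().__init__(name)
           self.child = child

   def from_expression(expr):
       # Return the expression as-is if it is already named,
       # otherwise wrap it in an alias derived from its pretty form.
       if isinstance(expr, NamedExpr):
           return expr
       return Alias(expr, f"({expr})")
   ```

   So an unnamed expression like `col + 1` comes back as an alias named after 
its printed form, while attributes keep their identity.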








[GitHub] [spark] HyukjinKwon commented on a change in pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs

2020-06-07 Thread GitBox


HyukjinKwon commented on a change in pull request #28745:
URL: https://github.com/apache/spark/pull/28745#discussion_r436424483



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/PandasGroupUtils.scala
##
@@ -59,65 +59,65 @@ private[python] object PandasGroupUtils {
*/
   def groupAndProject(
   input: Iterator[InternalRow],
-  groupingAttributes: Seq[Attribute],
+  groupingExprs: Seq[NamedExpression],
   inputSchema: Seq[Attribute],
-  dedupSchema: Seq[Attribute]): Iterator[(InternalRow, 
Iterator[InternalRow])] = {
-val groupedIter = GroupedIterator(input, groupingAttributes, inputSchema)
+  dedupSchema: Seq[NamedExpression]): Iterator[(InternalRow, 
Iterator[InternalRow])] = {
+val groupedIter = GroupedIterator(input, groupingExprs, inputSchema)
 val dedupProj = UnsafeProjection.create(dedupSchema, inputSchema)
 groupedIter.map {
   case (k, groupedRowIter) => (k, groupedRowIter.map(dedupProj))
 }
   }
 
   /**
-   * Returns a the deduplicated attributes of the spark plan and the arg 
offsets of the
+   * Returns a the deduplicated named expressions of the spark plan and the 
arg offsets of the
* keys and values.
*
-   * The deduplicated attributes are needed because the spark plan may contain 
an attribute
-   * twice; once in the key and once in the value.  For any such attribute we 
need to
+   * The deduplicated expressions are needed because the spark plan may 
contain an expression
+   * twice; once in the key and once in the value.  For any such expression we 
need to
* deduplicate.
*
-   * The arg offsets are used to distinguish grouping grouping attributes and 
data attributes
+   * The arg offsets are used to distinguish grouping expressions and data 
expressions
* as following:
*
* argOffsets[0] is the length of the argOffsets array
*
-   * argOffsets[1] is the length of grouping attribute
-   * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping 
attributes
+   * argOffsets[1] is the length of grouping expression
+   * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping 
expressions
*
-   * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data attributes
+   * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data expressions
*/
   def resolveArgOffsets(
-child: SparkPlan, groupingAttributes: Seq[Attribute]): (Seq[Attribute], 
Array[Int]) = {
+  dataExprs: Seq[NamedExpression], groupingExprs: Seq[NamedExpression])
+: (Seq[NamedExpression], Array[Int]) = {
 
-val dataAttributes = child.output.drop(groupingAttributes.length)
-val groupingIndicesInData = groupingAttributes.map { attribute =>
-  dataAttributes.indexWhere(attribute.semanticEquals)
+val groupingIndicesInData = groupingExprs.map { expression =>
+  dataExprs.indexWhere(expression.semanticEquals)
 }

Review comment:
   Actually the `groupingExprs` will be projected at 
https://github.com/apache/spark/pull/28745/files/2800eb238465547074498eae762199a53efc4277#diff-e7c34a6080e15837af82863db34fb1c4R66
 for the input iterator before the actual execution.
   
   The `groupingExprs` were already dropped in this code without this fix 
https://github.com/apache/spark/pull/28745/files/2800eb238465547074498eae762199a53efc4277#diff-e7c34a6080e15837af82863db34fb1c4L93
   
   I believe there's no difference virtually in the execution path here.
   
   For analysis,
   
   With this change: `FlatMapGroupsInPandasExec` keeps `groupingExprs` as 
expressions, for example `column + 1`. The attribute `column` inside 
`column + 1` is properly resolved, and the expression then becomes an alias 
that is projected later during execution.
   
   Without this change: the `Project`'s output contains the grouping expression 
`column + 1` as a separate attribute reference (whereas the current fix keeps 
it as an expression). `FlatMapGroupsInPandasExec` holds that attribute 
reference as a grouping expression, and this grouping expression is used to 
project later.








[GitHub] [spark] gerashegalov commented on a change in pull request #28746: [SPARK-31922][CORE] Fix "RpcEnv already stopped" error when exit spark-shell with local-cluster mode

2020-06-07 Thread GitBox


gerashegalov commented on a change in pull request #28746:
URL: https://github.com/apache/spark/pull/28746#discussion_r436423623



##
File path: core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala
##
@@ -74,6 +74,10 @@ class LocalSparkCluster(
 
   def stop(): Unit = {
 logInfo("Shutting down local Spark cluster.")
+// SPARK-31922: wait one more second before shutting down rpcEnvs of 
master and worker,
+// in order to let the cluster have time to handle the 
`UnregisterApplication` message.
+// Otherwise, we could hit "RpcEnv already stopped" error.
+Thread.sleep(1000)
 // Stop the workers before the master so they don't get upset that it 
disconnected
 workerRpcEnvs.foreach(_.shutdown())

Review comment:
   Additionally, there might be a mismatch between the comment
   >  // Stop the workers before the master so they don't get upset that it 
disconnected
   
   and the implementation: the code does not actually wait for the workers to 
stop before stopping the master. We could rewrite it like:
   ```
   Seq(workerRpcEnvs, masterRpcEnvs).foreach { rpcEnvArr =>
 rpcEnvArr.foreach(rpcEnv => Utils.tryLog {
   rpcEnv.shutdown()
   rpcEnv.awaitTermination()
 })
 rpcEnvArr.clear()
   }
   ```
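   The ordering this enforces — fully shut down and await each worker env 
before touching the master envs, while swallowing per-env failures the way 
`Utils.tryLog` would — can be sketched in Python with hypothetical env objects:

   ```python
   def stop_cluster(worker_envs, master_envs, log=print):
       # Process worker envs first, then master envs; each env is shut
       # down and awaited before moving on, and errors are logged rather
       # than propagated so one bad env cannot block the rest.
       for group in (worker_envs, master_envs):
           for env in group:
               try:
                   env.shutdown()
                   env.await_termination()
               except Exception as e:
                   log(f"ignoring shutdown error: {e}")
           group.clear()
   ```

   Because `awaitTermination` runs inside the loop, every worker is fully 
stopped before the first master env is touched, which is what the original 
comment promises.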








[GitHub] [spark] sarutak commented on pull request #28439: [SPARK-30119][WEBUI] Add Pagination Support to Streaming Page

2020-06-07 Thread GitBox


sarutak commented on pull request #28439:
URL: https://github.com/apache/spark/pull/28439#issuecomment-640312280


   @iRakson 
   This PR passed the PR builder's tests but doesn't pass the QA builds, so 
I've reverted it.
   
   
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-hive-1.2/lastCompletedBuild/testReport/org.apache.spark.streaming/UISeleniumSuite/attaching_and_detaching_a_Streaming_tab/
   
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-hive-2.3/lastCompletedBuild/testReport/org.apache.spark.streaming/UISeleniumSuite/attaching_and_detaching_a_Streaming_tab/
   
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11/lastCompletedBuild/testReport/org.apache.spark.streaming/UISeleniumSuite/attaching_and_detaching_a_Streaming_tab/
   
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-hive-2.3/lastCompletedBuild/testReport/
   
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-hive-2.3-jdk-11/lastCompletedBuild/testReport/org.apache.spark.streaming/UISeleniumSuite/attaching_and_detaching_a_Streaming_tab/
   
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7-hive-1.2/lastCompletedBuild/testReport/org.apache.spark.streaming/UISeleniumSuite/attaching_and_detaching_a_Streaming_tab/
   
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/lastCompletedBuild/testReport/
   
   Could you confirm them?
   






[GitHub] [spark] sarutak closed pull request #28747: Revert "[SPARK-30119][WEBUI] Add Pagination Support to Streaming Page"

2020-06-07 Thread GitBox


sarutak closed pull request #28747:
URL: https://github.com/apache/spark/pull/28747


   






[GitHub] [spark] sarutak opened a new pull request #28747: Revert "[SPARK-30119][WEBUI] Add Pagination Support to Streaming Page"

2020-06-07 Thread GitBox


sarutak opened a new pull request #28747:
URL: https://github.com/apache/spark/pull/28747


   This PR reverts #28439 because that PR breaks the QA build.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640310268










[GitHub] [spark] koertkuipers commented on pull request #27986: [SPARK-31220][SQL] repartition obeys initialPartitionNum when adaptiveExecutionEnabled

2020-06-07 Thread GitBox


koertkuipers commented on pull request #27986:
URL: https://github.com/apache/spark/pull/27986#issuecomment-640310420


   Adaptive execution estimates the number of partitions for a shuffle using `spark.sql.adaptive.shuffle.targetPostShuffleInputSize` as its target size per shuffled partition. I was surprised to find, however, that it does not do this for a `DataFrame.repartition(...)`. I don't understand why, since under the hood that is also just a shuffle, no different from a `DataFrame.groupBy`.
   Will this pull request fix this issue? From looking at the code I can't tell; it doesn't look like it to me.
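   To make the target-size mechanism concrete, here is a minimal sketch (illustrative only; the names and logic are mine, not Spark's actual implementation) of how adaptive execution can coalesce contiguous shuffle partitions so each merged partition stays near a target byte size:

```python
def coalesce_partitions(partition_sizes, target_bytes):
    """Greedily merge adjacent shuffle partitions until adding the next
    one would push the merged size over target_bytes, then start a new
    merged partition. A sketch of the targetPostShuffleInputSize idea,
    not Spark code."""
    merged, current, acc = [], [], 0
    for idx, size in enumerate(partition_sizes):
        if current and acc + size > target_bytes:
            merged.append(current)   # close the current merged partition
            current, acc = [], 0
        current.append(idx)          # map-side partition index
        acc += size
    if current:
        merged.append(current)
    return merged

# Four 10-byte partitions with a 25-byte target collapse into two
# merged partitions of two map-side partitions each.
print(coalesce_partitions([10, 10, 10, 10], 25))  # [[0, 1], [2, 3]]
```

   A plain `repartition(n)` pins the partition count up front, which is why it bypasses this estimate.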






[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640310268










[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640310014


   **[Test build #123609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123609/testReport)** for PR 28593 at commit [`ec2cf54`](https://github.com/apache/spark/commit/ec2cf54b6d566dd7afcd65753374c1dd5dc8d47f).






[GitHub] [spark] github-actions[bot] commented on pull request #27632: [SPARK-30872][SQL] Constraints inferred from inferred attributes

2020-06-07 Thread GitBox


github-actions[bot] commented on pull request #27632:
URL: https://github.com/apache/spark/pull/27632#issuecomment-640302609


   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!






[GitHub] [spark] github-actions[bot] commented on pull request #26193: [SPARK-25065][k8s] Allow setting up correct logging configuration on driver and executor.

2020-06-07 Thread GitBox


github-actions[bot] commented on pull request #26193:
URL: https://github.com/apache/spark/pull/26193#issuecomment-640302620


   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!






[GitHub] [spark] github-actions[bot] commented on pull request #27683: [SPARK-30917][SQL]: The behaviour of UnaryMinus should not depend on SQLConf.get

2020-06-07 Thread GitBox


github-actions[bot] commented on pull request #27683:
URL: https://github.com/apache/spark/pull/27683#issuecomment-640302605


   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!






[GitHub] [spark] github-actions[bot] commented on pull request #26727: [SPARK-30087][CORE] Enhanced implementation of JmxSink on RMI remote calls

2020-06-07 Thread GitBox


github-actions[bot] commented on pull request #26727:
URL: https://github.com/apache/spark/pull/26727#issuecomment-640302618


   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!






[GitHub] [spark] github-actions[bot] commented on pull request #27629: [SPARK-28067][SQL]Fix incorrect results during aggregate sum for decimal overflow by throwing exception

2020-06-07 Thread GitBox


github-actions[bot] commented on pull request #27629:
URL: https://github.com/apache/spark/pull/27629#issuecomment-640302613


   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!






[GitHub] [spark] github-actions[bot] closed pull request #25398: [SPARK-28659][SQL] Use data source if convertible in insert overwrite directory

2020-06-07 Thread GitBox


github-actions[bot] closed pull request #25398:
URL: https://github.com/apache/spark/pull/25398


   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-640289256










[GitHub] [spark] AmplabJenkins commented on pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-06-07 Thread GitBox


AmplabJenkins commented on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-640289256










[GitHub] [spark] SparkQA removed a comment on pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-06-07 Thread GitBox


SparkQA removed a comment on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-640256844


   **[Test build #123608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123608/testReport)** for PR 27066 at commit [`ab36504`](https://github.com/apache/spark/commit/ab36504c885d7b2a3a18c02addb7f88456f200f9).






[GitHub] [spark] SparkQA commented on pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-06-07 Thread GitBox


SparkQA commented on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-640289036


   **[Test build #123608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123608/testReport)** for PR 27066 at commit [`ab36504`](https://github.com/apache/spark/commit/ab36504c885d7b2a3a18c02addb7f88456f200f9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28642:
URL: https://github.com/apache/spark/pull/28642#issuecomment-640272817










[GitHub] [spark] AmplabJenkins commented on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition

2020-06-07 Thread GitBox


AmplabJenkins commented on pull request #28642:
URL: https://github.com/apache/spark/pull/28642#issuecomment-640272817










[GitHub] [spark] SparkQA removed a comment on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition

2020-06-07 Thread GitBox


SparkQA removed a comment on pull request #28642:
URL: https://github.com/apache/spark/pull/28642#issuecomment-640238082


   **[Test build #123607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123607/testReport)** for PR 28642 at commit [`65cd324`](https://github.com/apache/spark/commit/65cd324093fac15357fb0ca9bae7c524b40c).






[GitHub] [spark] SparkQA commented on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition

2020-06-07 Thread GitBox


SparkQA commented on pull request #28642:
URL: https://github.com/apache/spark/pull/28642#issuecomment-640272578


   **[Test build #123607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123607/testReport)** for PR 28642 at commit [`65cd324`](https://github.com/apache/spark/commit/65cd324093fac15357fb0ca9bae7c524b40c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-06-07 Thread GitBox


SparkQA commented on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-640256844


   **[Test build #123608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123608/testReport)** for PR 27066 at commit [`ab36504`](https://github.com/apache/spark/commit/ab36504c885d7b2a3a18c02addb7f88456f200f9).






[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-07 Thread GitBox


siknezevic commented on a change in pull request #27246:
URL: https://github.com/apache/spark/pull/27246#discussion_r436385980



##
File path: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java
##
@@ -47,55 +47,48 @@
   private int numRecords;
   private int numRecordsRemaining;
 
-  private byte[] arr = new byte[1024 * 1024];
+  private byte[] arr = new byte[1024];

Review comment:
   Does this look good? Perhaps you have some suggestion.
   
   private[spark] val UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE_RATIO =
     ConfigBuilder("spark.unsafe.sorter.spill.reader.buffer.size.ratio")
       .doc("The multiplication ratio is the parameter that controls the initial read buffer " +
         "size. The multiplication ratio value range is from 1 to 1024. This parameter increases " +
         "the initial read buffer size in 1KB increments. It will result in the initial buffer " +
         "size in the range from 1KB to 1MB. The read buffer size is dynamically adjusted " +
         "afterward based on data length read from the spilled file.")
       .intConf
       .checkValue(v => 1 <= v && v <= DEFAULT_BUFFER_SIZE_RATIO,
         s"The value must be in allowed range [1, ${DEFAULT_BUFFER_SIZE_RATIO}].")
       .createWithDefault(DEFAULT_BUFFER_SIZE_RATIO)
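   For clarity, the semantics of the ratio described above can be sketched in plain Python (an illustrative model of the quoted config description only, not Spark code; the function name is mine):

```python
DEFAULT_BUFFER_SIZE_RATIO = 1024  # upper bound of the ratio, per the doc text above

def initial_read_buffer_bytes(ratio):
    """Map the spill-reader buffer-size ratio to an initial read buffer
    size: the ratio counts 1 KB increments, so valid ratios 1..1024
    yield buffers from 1 KB up to 1 MB (the previous hard-coded size)."""
    if not (1 <= ratio <= DEFAULT_BUFFER_SIZE_RATIO):
        raise ValueError(
            f"The value must be in allowed range [1, {DEFAULT_BUFFER_SIZE_RATIO}].")
    return ratio * 1024

print(initial_read_buffer_bytes(1))     # 1024 -> the new default 1 KB buffer
print(initial_read_buffer_bytes(1024))  # 1048576 -> the old 1 MB buffer
```

   The reader would then grow the buffer dynamically from this starting size as longer records are read from the spill file.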
   








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640250211


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640250217


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123606/
   Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640250211










[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640250078


   **[Test build #123606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123606/testReport)** for PR 28593 at commit [`eeb0a61`](https://github.com/apache/spark/commit/eeb0a61498556056aed9f94a7e9c864bd23e6ce6).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


SparkQA removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640234241


   **[Test build #123606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123606/testReport)** for PR 28593 at commit [`eeb0a61`](https://github.com/apache/spark/commit/eeb0a61498556056aed9f94a7e9c864bd23e6ce6).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28490:
URL: https://github.com/apache/spark/pull/28490#issuecomment-640249398










[GitHub] [spark] AmplabJenkins commented on pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId

2020-06-07 Thread GitBox


AmplabJenkins commented on pull request #28490:
URL: https://github.com/apache/spark/pull/28490#issuecomment-640249398










[GitHub] [spark] SparkQA removed a comment on pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId

2020-06-07 Thread GitBox


SparkQA removed a comment on pull request #28490:
URL: https://github.com/apache/spark/pull/28490#issuecomment-640213213


   **[Test build #123605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123605/testReport)** for PR 28490 at commit [`0af3166`](https://github.com/apache/spark/commit/0af316675f376472d6deab40c82401a55a765e20).






[GitHub] [spark] SparkQA commented on pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId

2020-06-07 Thread GitBox


SparkQA commented on pull request #28490:
URL: https://github.com/apache/spark/pull/28490#issuecomment-640249149


   **[Test build #123605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123605/testReport)** for PR 28490 at commit [`0af3166`](https://github.com/apache/spark/commit/0af316675f376472d6deab40c82401a55a765e20).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] viirya commented on a change in pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs

2020-06-07 Thread GitBox


viirya commented on a change in pull request #28745:
URL: https://github.com/apache/spark/pull/28745#discussion_r436380321



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/PandasGroupUtils.scala
##
@@ -59,65 +59,65 @@ private[python] object PandasGroupUtils {
    */
   def groupAndProject(
       input: Iterator[InternalRow],
-      groupingAttributes: Seq[Attribute],
+      groupingExprs: Seq[NamedExpression],
       inputSchema: Seq[Attribute],
-      dedupSchema: Seq[Attribute]): Iterator[(InternalRow, Iterator[InternalRow])] = {
-    val groupedIter = GroupedIterator(input, groupingAttributes, inputSchema)
+      dedupSchema: Seq[NamedExpression]): Iterator[(InternalRow, Iterator[InternalRow])] = {
+    val groupedIter = GroupedIterator(input, groupingExprs, inputSchema)
     val dedupProj = UnsafeProjection.create(dedupSchema, inputSchema)
     groupedIter.map {
       case (k, groupedRowIter) => (k, groupedRowIter.map(dedupProj))
     }
   }
 
   /**
-   * Returns a the deduplicated attributes of the spark plan and the arg offsets of the
+   * Returns a the deduplicated named expressions of the spark plan and the arg offsets of the
    * keys and values.
    *
-   * The deduplicated attributes are needed because the spark plan may contain an attribute
-   * twice; once in the key and once in the value.  For any such attribute we need to
+   * The deduplicated expressions are needed because the spark plan may contain an expression
+   * twice; once in the key and once in the value.  For any such expression we need to
    * deduplicate.
    *
-   * The arg offsets are used to distinguish grouping grouping attributes and data attributes
+   * The arg offsets are used to distinguish grouping expressions and data expressions
    * as following:
    *
   * argOffsets[0] is the length of the argOffsets array
   *
-   * argOffsets[1] is the length of grouping expression
+   * argOffsets[1] is the length of grouping expression
+   * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping expressions
    *
-   * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data attributes
+   * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data expressions
    */
   def resolveArgOffsets(
-      child: SparkPlan, groupingAttributes: Seq[Attribute]): (Seq[Attribute], Array[Int]) = {
+      dataExprs: Seq[NamedExpression], groupingExprs: Seq[NamedExpression])
+    : (Seq[NamedExpression], Array[Int]) = {
 
-    val dataAttributes = child.output.drop(groupingAttributes.length)
-    val groupingIndicesInData = groupingAttributes.map { attribute =>
-      dataAttributes.indexWhere(attribute.semanticEquals)
+    val groupingIndicesInData = groupingExprs.map { expression =>
+      dataExprs.indexWhere(expression.semanticEquals)
     }

Review comment:
   I feel this is not precisely correct in all cases. It seems `dataExprs` are the inputs to the Python UDFs. Is it possible that `groupingExprs` are not just the child's outputs but expressions like `column + 1`?
   
   In `RelationalGroupedDataset`, we previously added a projection to put these grouping expressions alongside the original child's outputs. Now we don't have it. So can we always find a semantically equal expr in `dataExprs` for a grouping expression? `dataExprs` are the input expressions of the left/right plan for `FlatMapCoGroupsInPandasExec`, so I guess we cannot find `column + 1` in them.
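   The concern can be illustrated with a toy model (plain Python; string equality stands in for `Expression.semanticEquals`, and the function name is hypothetical):

```python
def resolve_grouping_indices(data_exprs, grouping_exprs):
    """For each grouping expression, look for a semantically equal
    expression among the data expressions; -1 means no match was found,
    which is the problematic case for derived grouping keys."""
    return [next((i for i, d in enumerate(data_exprs) if d == g), -1)
            for g in grouping_exprs]

# A grouping key that is a plain column of the child's output is found...
print(resolve_grouping_indices(["a", "b", "c"], ["b"]))      # [1]
# ...but a derived key like "a + 1" has no match among the data expressions.
print(resolve_grouping_indices(["a", "b", "c"], ["a + 1"]))  # [-1]
```

   In the second case the lookup yields -1, so unless the plan still projects the derived grouping expression into its output, the offset resolution cannot point at it.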
   
   








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28642:
URL: https://github.com/apache/spark/pull/28642#issuecomment-640238262










[GitHub] [spark] AmplabJenkins commented on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition

2020-06-07 Thread GitBox


AmplabJenkins commented on pull request #28642:
URL: https://github.com/apache/spark/pull/28642#issuecomment-640238262










[GitHub] [spark] SparkQA commented on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition

2020-06-07 Thread GitBox


SparkQA commented on pull request #28642:
URL: https://github.com/apache/spark/pull/28642#issuecomment-640238082


   **[Test build #123607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123607/testReport)** for PR 28642 at commit [`65cd324`](https://github.com/apache/spark/commit/65cd324093fac15357fb0ca9bae7c524b40c).






[GitHub] [spark] manuzhang commented on pull request #28669: [SPARK-31864][SQL] Adjust AQE skew join trigger condition

2020-06-07 Thread GitBox


manuzhang commented on pull request #28669:
URL: https://github.com/apache/spark/pull/28669#issuecomment-640235282


   @cloud-fan @maryannxue @JkSelf I'm seeing a case where partitions 
[0,0,0,...,13GB] were coalesced to [13GB], and a SortMergeJoin on them took 17 
min. With coalescing disabled, OptimizeSkewedJoin would split the partitions 
into [0,0,0,..., 256MB, 256MB,...,256MB] and the join took only 38s. WDYT?
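   For reference, the arithmetic behind those numbers can be sketched as follows (a hypothetical helper, not Spark's actual `OptimizeSkewedJoin` implementation): splitting the 13GB partition at a 256MB target yields 52 slices, which is why skipping coalescing helps here:

```scala
// Hedged sketch of skew-split arithmetic; the real OptimizeSkewedJoin rule
// works on shuffle map-output sizes, this only models the slice count.
object SkewSplitSketch {
  val MB: Long = 1024L * 1024L

  // Number of slices a skewed partition of `size` bytes is divided into,
  // given a target slice size (e.g. 256MB).
  def numSlices(size: Long, targetSize: Long): Int =
    math.max(1, math.ceil(size.toDouble / targetSize).toInt)

  def main(args: Array[String]): Unit = {
    val skewed = 13L * 1024 * MB          // the 13GB partition from the comment
    println(numSlices(skewed, 256 * MB))  // 52 slices of ~256MB each
  }
}
```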






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-64023










[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-64023










[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640234241


   **[Test build #123606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123606/testReport)** for PR 28593 at commit [`eeb0a61`](https://github.com/apache/spark/commit/eeb0a61498556056aed9f94a7e9c864bd23e6ce6).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640227146


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123604/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640227143


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


SparkQA removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640212130


   **[Test build #123604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123604/testReport)** for PR 28593 at commit [`7ad82da`](https://github.com/apache/spark/commit/7ad82da701b10f17af2a1ba764fc8afc2a11ff7b).






[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640227143










[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-07 Thread GitBox


SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-640227004


   **[Test build #123604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123604/testReport)** for PR 28593 at commit [`7ad82da`](https://github.com/apache/spark/commit/7ad82da701b10f17af2a1ba764fc8afc2a11ff7b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId

2020-06-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28490:
URL: https://github.com/apache/spark/pull/28490#issuecomment-640217133










[GitHub] [spark] AmplabJenkins commented on pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId

2020-06-07 Thread GitBox


AmplabJenkins commented on pull request #28490:
URL: https://github.com/apache/spark/pull/28490#issuecomment-640217133










[GitHub] [spark] SparkQA removed a comment on pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId

2020-06-07 Thread GitBox


SparkQA removed a comment on pull request #28490:
URL: https://github.com/apache/spark/pull/28490#issuecomment-640179013


   **[Test build #123603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123603/testReport)** for PR 28490 at commit [`1ee0542`](https://github.com/apache/spark/commit/1ee0542e20eea131ff27e4114e3547d32191a6a2).






[GitHub] [spark] TJX2014 commented on a change in pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs

2020-06-07 Thread GitBox


TJX2014 commented on a change in pull request #28745:
URL: https://github.com/apache/spark/pull/28745#discussion_r436361472



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
##
@@ -23,14 +23,18 @@ import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
 import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.catalyst.plans.logical.EventTimeWatermark
-import org.apache.spark.sql.catalyst.util.quoteIdentifier
+import org.apache.spark.sql.catalyst.util.{quoteIdentifier, toPrettySQL}
 import org.apache.spark.sql.types._
 
 object NamedExpression {
   private val curId = new java.util.concurrent.atomic.AtomicLong()
   private[expressions] val jvmId = UUID.randomUUID()
   def newExprId: ExprId = ExprId(curId.getAndIncrement(), jvmId)
   def unapply(expr: NamedExpression): Option[(String, DataType)] = 
Some((expr.name, expr.dataType))
+  def fromExpression(expr: Expression): NamedExpression = expr match {
+case ne: NamedExpression => ne
+case _: Expression => Alias(expr, toPrettySQL(expr))()
+  }

Review comment:
   I find that `org.apache.spark.sql.Dataset#groupBy(cols: Column*)` is invoked 
through py4j instead of `groupBy(col1: String, cols: String*)`. Is it possible 
to change the parameters sent from the Python side so that only `groupBy(col1: 
String, cols: String*)` is invoked? That may also be helpful to this JIRA :-)
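   For clarity, the new `fromExpression` helper in the diff can be sketched with toy types (stand-ins for Catalyst's `NamedExpression`/`Alias`/`toPrettySQL`, not the real API): named expressions pass through unchanged, while anything else is wrapped in an alias named after its pretty-printed SQL:

```scala
// Toy model of NamedExpression.fromExpression; TAttr/TAdd/TAlias are
// illustrative stand-ins, not Catalyst classes.
sealed trait TExpr { def sql: String }
sealed trait TNamed extends TExpr { def name: String }
case class TAttr(name: String) extends TNamed { def sql: String = name }
case class TAdd(l: TExpr, r: Int) extends TExpr { def sql: String = s"(${l.sql} + $r)" }
case class TAlias(child: TExpr, name: String) extends TNamed { def sql: String = child.sql }

object FromExpressionSketch {
  // Mirrors the diff: a NamedExpression is returned as-is; any other
  // expression gets an Alias derived from its pretty SQL.
  def fromExpression(e: TExpr): TNamed = e match {
    case ne: TNamed => ne
    case other      => TAlias(other, other.sql)
  }

  def main(args: Array[String]): Unit = {
    println(fromExpression(TAttr("a")).name)          // a
    println(fromExpression(TAdd(TAttr("a"), 1)).name) // (a + 1)
  }
}
```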








[GitHub] [spark] SparkQA commented on pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId

2020-06-07 Thread GitBox


SparkQA commented on pull request #28490:
URL: https://github.com/apache/spark/pull/28490#issuecomment-640216881


   **[Test build #123603 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123603/testReport)**
 for PR 28490 at commit 
[`1ee0542`](https://github.com/apache/spark/commit/1ee0542e20eea131ff27e4114e3547d32191a6a2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.









[GitHub] [spark] AngersZhuuuu commented on a change in pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId

2020-06-07 Thread GitBox


AngersZhuuuu commented on a change in pull request #28490:
URL: https://github.com/apache/spark/pull/28490#discussion_r436360282



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1481,7 +1486,35 @@ class Analyzer(
 
   case q: LogicalPlan =>
 logTrace(s"Attempting to resolve 
${q.simpleString(SQLConf.get.maxToStringFields)}")

Review comment:
   > w/ some code cleanup;
   
   Done







