[GitHub] [spark] sathyaprakashg edited a comment on pull request #28703: SPARK-29897 Add implicit cast for SubtractTimestamps
sathyaprakashg edited a comment on pull request #28703:
URL: https://github.com/apache/spark/pull/28703#issuecomment-640384737

@bart-samwel If you are referring to the two new test statements I added, both have a timestamp in both the left and right expressions and return an interval. So we have only one type, which is `timestamp - timestamp returns interval`.

In the `SubtractTimestamps` case class we can see that the input data types for both the left and right expressions are timestamp:

`override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType, TimestampType)`
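To make the type discussion concrete, here is a minimal sketch of an expression of that shape. It is a hypothetical simplified stand-in, not the actual Spark source: the class name and evaluation are illustrative, while the `inputTypes` line mirrors the one quoted above.

```scala
import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Expression, ImplicitCastInputTypes}
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
import org.apache.spark.sql.types.{AbstractDataType, CalendarIntervalType, DataType, TimestampType}
import org.apache.spark.unsafe.types.CalendarInterval

// Hypothetical simplified expression: only one type signature exists,
// timestamp - timestamp returns interval.
case class SubtractTimestampsSketch(left: Expression, right: Expression)
  extends BinaryExpression with ImplicitCastInputTypes with CodegenFallback {

  // Both the left and right expressions are expected to be timestamps; the
  // analyzer may insert implicit casts to satisfy this.
  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType, TimestampType)

  override def dataType: DataType = CalendarIntervalType

  // Timestamps are internally microseconds since the epoch, so the difference
  // is carried in the interval's microsecond field.
  override def nullSafeEval(end: Any, start: Any): Any =
    new CalendarInterval(0, 0, end.asInstanceOf[Long] - start.asInstanceOf[Long])
}
```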
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28749: [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling
AmplabJenkins removed a comment on pull request #28749: URL: https://github.com/apache/spark/pull/28749#issuecomment-640381851
[GitHub] [spark] SparkQA commented on pull request #28749: [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling
SparkQA commented on pull request #28749: URL: https://github.com/apache/spark/pull/28749#issuecomment-640383499 **[Test build #123615 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123615/testReport)** for PR 28749 at commit [`87113bc`](https://github.com/apache/spark/commit/87113bc38da7f2d3de8501b8745b6def4f33a6e3).
[GitHub] [spark] AmplabJenkins commented on pull request #28749: [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling
AmplabJenkins commented on pull request #28749: URL: https://github.com/apache/spark/pull/28749#issuecomment-640381851
[GitHub] [spark] HyukjinKwon opened a new pull request #28749: [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling
HyukjinKwon opened a new pull request #28749:
URL: https://github.com/apache/spark/pull/28749

### What changes were proposed in this pull request?

This PR proposes to use the existing util `org.apache.spark.util.Utils.exceptionString` for the same code at:

```
jwriter = jvm.java.io.StringWriter()
e.printStackTrace(jvm.java.io.PrintWriter(jwriter))
stacktrace = jwriter.toString()
```

### Why are the changes needed?

To deduplicate code. Plus, it means less communication between the JVM and Py4J.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tested.
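For context, `Utils.exceptionString` amounts to roughly the following (a hedged reconstruction rather than the verbatim Spark source), which is why the Python side can fetch a full stack trace in a single JVM round trip instead of three:

```scala
import java.io.{PrintWriter, StringWriter}

object ExceptionStringSketch {
  // Roughly what org.apache.spark.util.Utils.exceptionString does: render a
  // Throwable, including its cause chain, into a single String.
  def exceptionString(e: Throwable): String = {
    if (e == null) {
      ""
    } else {
      // printStackTrace (rather than getStackTrace) is used so that the
      // cause chain is included in the output.
      val stringWriter = new StringWriter()
      e.printStackTrace(new PrintWriter(stringWriter))
      stringWriter.toString
    }
  }
}
```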
[GitHub] [spark] HyukjinKwon commented on pull request #28749: [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling
HyukjinKwon commented on pull request #28749: URL: https://github.com/apache/spark/pull/28749#issuecomment-640381524 @ueshin, can you take a quick look when you're available?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28748: [WIP][SPARK-30119][WEBUI]Support pagination for streaming tab
AmplabJenkins removed a comment on pull request #28748: URL: https://github.com/apache/spark/pull/28748#issuecomment-640376942
[GitHub] [spark] AmplabJenkins commented on pull request #28748: [WIP][SPARK-30119][WEBUI]Support pagination for streaming tab
AmplabJenkins commented on pull request #28748: URL: https://github.com/apache/spark/pull/28748#issuecomment-640376942
[GitHub] [spark] SparkQA commented on pull request #28748: [WIP][SPARK-30119][WEBUI]Support pagination for streaming tab
SparkQA commented on pull request #28748: URL: https://github.com/apache/spark/pull/28748#issuecomment-640376563 **[Test build #123614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123614/testReport)** for PR 28748 at commit [`9a6e5d9`](https://github.com/apache/spark/commit/9a6e5d9f199109fbaa073c7c728cbe1d99830060).
[GitHub] [spark] iRakson commented on pull request #28748: [WIP][SPARK-30119][WEBUI]Support pagination for streaming tab
iRakson commented on pull request #28748: URL: https://github.com/apache/spark/pull/28748#issuecomment-640375459 cc @sarutak Kindly take a look.
[GitHub] [spark] iRakson opened a new pull request #28748: [WIP][SPARK-30119][WEBUI]Support pagination for streaming tab
iRakson opened a new pull request #28748:
URL: https://github.com/apache/spark/pull/28748

### What changes were proposed in this pull request?

#28747 reverted #28439 due to a flaky test case. This PR fixes the flaky test and adds pagination support. The WIP tag is added just to check whether this PR works as expected.

### Why are the changes needed?

To support pagination for the streaming tab.

### Does this PR introduce _any_ user-facing change?

Yes. Streaming tab tables will now be paginated.

### How was this patch tested?

Manually.
[GitHub] [spark] Ngone51 commented on pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode
Ngone51 commented on pull request #28258: URL: https://github.com/apache/spark/pull/28258#issuecomment-640365869 LGTM. I also tested manually with `spark.standalone.submit.waitAppCompletion` on/off and the exceptional case of the Master exiting. Everything looks fine!
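A hedged usage sketch of the flag being tested; the master URL is a placeholder and the surrounding configuration is illustrative, not taken from the PR:

```scala
import org.apache.spark.SparkConf

object WaitAppCompletionConf {
  // Hypothetical configuration; "spark://master-host:7077" is a placeholder.
  val conf: SparkConf = new SparkConf()
    .setMaster("spark://master-host:7077")                     // standalone master
    .set("spark.submit.deployMode", "cluster")                 // cluster deploy mode
    .set("spark.standalone.submit.waitAppCompletion", "true")  // block until the app finishes
}
```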
[GitHub] [spark] gerashegalov commented on a change in pull request #28746: [SPARK-31922][CORE] Fix "RpcEnv already stopped" error when exit spark-shell with local-cluster mode
gerashegalov commented on a change in pull request #28746:
URL: https://github.com/apache/spark/pull/28746#discussion_r436452604

## File path: core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala

```
@@ -74,6 +74,10 @@ class LocalSparkCluster(
   def stop(): Unit = {
     logInfo("Shutting down local Spark cluster.")
+    // SPARK-31922: wait one more second before shutting down rpcEnvs of master and worker,
+    // in order to let the cluster have time to handle the `UnregisterApplication` message.
+    // Otherwise, we could hit "RpcEnv already stopped" error.
+    Thread.sleep(1000)
     // Stop the workers before the master so they don't get upset that it disconnected
     workerRpcEnvs.foreach(_.shutdown())
```

Review comment: You may be right about this, but it contradicts the [scaladoc](https://github.com/apache/spark/blob/264b0f36cedacd9a22b45a3e14b2186230432be6/core/src/main/scala/org/apache/spark/rpc/RpcEnv.scala#L119).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640357727
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640357727
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640357312 **[Test build #123613 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123613/testReport)** for PR 28593 at commit [`3fd6d02`](https://github.com/apache/spark/commit/3fd6d02d66aa2abfbe80450366a1d25a332e66ee).
[GitHub] [spark] Ngone51 commented on a change in pull request #28746: [SPARK-31922][CORE] Fix "RpcEnv already stopped" error when exit spark-shell with local-cluster mode
Ngone51 commented on a change in pull request #28746:
URL: https://github.com/apache/spark/pull/28746#discussion_r436446619

## File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala

```
@@ -557,7 +557,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
     }
   }

-  override def reviveOffers(): Unit = {
+  override def reviveOffers(): Unit = Utils.tryLogNonFatalError {
```

Review comment: This change fixes the failure of the test `org.apache.spark.launcher.LauncherBackendSuite.standalone/client: launcher handle`. After sleeping one more second, the application launched by the `SparkLauncher` now has a chance to submit tasks to the TaskScheduler and call `reviveOffers` on the SchedulerBackend. At the same time, the `SparkLauncher` will ask the application to stop. Therefore, the SchedulerBackend could already have been stopped when it receives `ReviveOffers` messages, which would fail the entire application at the end. So I use `Utils.tryLogNonFatalError` to fix it, and I think this should be fine since we've already used it at: https://github.com/apache/spark/blob/c560428fe0113f17362bae2b369910049914696f/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L137-L139
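A minimal runnable sketch of the pattern described in the comment. The failing send is a stand-in for the real endpoint call, and the sketch lives under the `org.apache.spark` package only because `Utils` is `private[spark]`:

```scala
package org.apache.spark.sketch

import org.apache.spark.util.Utils

object TryLogNonFatalErrorDemo {
  // Stand-in for an endpoint send that throws once the RpcEnv is stopped.
  private def sendReviveOffers(): Unit =
    throw new IllegalStateException("RpcEnv already stopped")

  def main(args: Array[String]): Unit = {
    // Without the wrapper, the exception would propagate and could fail the
    // whole application; tryLogNonFatalError logs it and returns normally.
    Utils.tryLogNonFatalError {
      sendReviveOffers()
    }
    println("still running after a failed ReviveOffers send")
  }
}
```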
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28746: [SPARK-31922][CORE] Fix "RpcEnv already stopped" error when exit spark-shell with local-cluster mode
AmplabJenkins removed a comment on pull request #28746: URL: https://github.com/apache/spark/pull/28746#issuecomment-640349737
[GitHub] [spark] AmplabJenkins commented on pull request #28746: [SPARK-31922][CORE] Fix "RpcEnv already stopped" error when exit spark-shell with local-cluster mode
AmplabJenkins commented on pull request #28746: URL: https://github.com/apache/spark/pull/28746#issuecomment-640349737
[GitHub] [spark] SparkQA commented on pull request #28746: [SPARK-31922][CORE] Fix "RpcEnv already stopped" error when exit spark-shell with local-cluster mode
SparkQA commented on pull request #28746: URL: https://github.com/apache/spark/pull/28746#issuecomment-640349410 **[Test build #123612 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123612/testReport)** for PR 28746 at commit [`eba978e`](https://github.com/apache/spark/commit/eba978eb0b76fa7ed1a4ebdf268666949f8bcf64).
[GitHub] [spark] Ngone51 commented on a change in pull request #28746: [SPARK-31922][CORE] Fix "RpcEnv already stopped" error when exit spark-shell with local-cluster mode
Ngone51 commented on a change in pull request #28746:
URL: https://github.com/apache/spark/pull/28746#discussion_r436445018

## File path: core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala

```
@@ -74,6 +74,10 @@ class LocalSparkCluster(
   def stop(): Unit = {
     logInfo("Shutting down local Spark cluster.")
+    // SPARK-31922: wait one more second before shutting down rpcEnvs of master and worker,
+    // in order to let the cluster have time to handle the `UnregisterApplication` message.
+    // Otherwise, we could hit "RpcEnv already stopped" error.
+    Thread.sleep(1000)
     // Stop the workers before the master so they don't get upset that it disconnected
     workerRpcEnvs.foreach(_.shutdown())
```

Review comment: It's not really necessary, since shutdown is performed synchronously; the worker will therefore close its connection to the master first. And `awaitTermination()` doesn't make sure everything stops, only the `Dispatcher`.
[GitHub] [spark] wangyum commented on a change in pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition
wangyum commented on a change in pull request #28642:
URL: https://github.com/apache/spark/pull/28642#discussion_r436442434

## File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala

```
@@ -1039,7 +1039,7 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
         val pythonEvals = collect(joinNode.get) {
           case p: BatchEvalPythonExec => p
         }
-        assert(pythonEvals.size == 2)
+        assert(pythonEvals.size == 4)
```

Review comment: @HyukjinKwon I'm not sure if this change can still optimize the Python UDF?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28737: [SPARK-31913][SQL]: Fix StackOverflowError in FileScanRDD
AmplabJenkins removed a comment on pull request #28737: URL: https://github.com/apache/spark/pull/28737#issuecomment-640344367
[GitHub] [spark] AmplabJenkins commented on pull request #28737: [SPARK-31913][SQL]: Fix StackOverflowError in FileScanRDD
AmplabJenkins commented on pull request #28737: URL: https://github.com/apache/spark/pull/28737#issuecomment-640344367
[GitHub] [spark] SparkQA commented on pull request #28737: [SPARK-31913][SQL]: Fix StackOverflowError in FileScanRDD
SparkQA commented on pull request #28737: URL: https://github.com/apache/spark/pull/28737#issuecomment-640343993 **[Test build #123611 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123611/testReport)** for PR 28737 at commit [`1e17fd0`](https://github.com/apache/spark/commit/1e17fd0c05849308b68481ecb609a15e19ee962e).
[GitHub] [spark] uncleGen commented on pull request #28737: [SPARK-31913][SQL]: Fix StackOverflowError in FileScanRDD
uncleGen commented on pull request #28737: URL: https://github.com/apache/spark/pull/28737#issuecomment-640342988 cc @cloud-fan @xuanyuanking
[GitHub] [spark] iRakson commented on a change in pull request #28439: [SPARK-30119][WEBUI] Add Pagination Support to Streaming Page
iRakson commented on a change in pull request #28439:
URL: https://github.com/apache/spark/pull/28439#discussion_r436440515

## File path: streaming/src/test/scala/org/apache/spark/streaming/UISeleniumSuite.scala

```
@@ -125,24 +125,47 @@ class UISeleniumSuite
         // Check batch tables
         val h4Text = findAll(cssSelector("h4")).map(_.text).toSeq
-        h4Text.exists(_.matches("Active Batches \\(\\d+\\)")) should be (true)
+        h4Text.exists(_.matches("Running Batches \\(\\d+\\)")) should be (true)
```

Review comment: This is causing all the failures. I will remove these tests and raise the PR again.
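For illustration, a standalone sketch of why the renamed heading breaks the old assertion (the heading strings below are examples, not scraped from the real page): `String.matches` anchors the pattern to the whole string, so once the page says "Running Batches" the old "Active Batches" pattern matches nothing.

```scala
object HeadingMatchDemo {
  def main(args: Array[String]): Unit = {
    // Example h4 headings as they might be scraped from the streaming page.
    val h4Text = Seq("Running Batches (2)", "Completed Batches (10)")

    // The old pattern no longer matches any heading after the rename...
    assert(!h4Text.exists(_.matches("Active Batches \\(\\d+\\)")))

    // ...so the suite's assertion has to use the new heading.
    assert(h4Text.exists(_.matches("Running Batches \\(\\d+\\)")))

    println("heading assertions match the renamed table")
  }
}
```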
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640341115 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123609/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27986: [SPARK-31220][SQL] repartition obeys initialPartitionNum when adaptiveExecutionEnabled
AmplabJenkins removed a comment on pull request #27986: URL: https://github.com/apache/spark/pull/27986#issuecomment-640341165
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640341106 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640310014 **[Test build #123609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123609/testReport)** for PR 28593 at commit [`ec2cf54`](https://github.com/apache/spark/commit/ec2cf54b6d566dd7afcd65753374c1dd5dc8d47f).
[GitHub] [spark] AmplabJenkins commented on pull request #27986: [SPARK-31220][SQL] repartition obeys initialPartitionNum when adaptiveExecutionEnabled
AmplabJenkins commented on pull request #27986: URL: https://github.com/apache/spark/pull/27986#issuecomment-640341165
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640341106
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640340563 **[Test build #123609 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123609/testReport)** for PR 28593 at commit [`ec2cf54`](https://github.com/apache/spark/commit/ec2cf54b6d566dd7afcd65753374c1dd5dc8d47f).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #27986: [SPARK-31220][SQL] repartition obeys initialPartitionNum when adaptiveExecutionEnabled
SparkQA commented on pull request #27986: URL: https://github.com/apache/spark/pull/27986#issuecomment-640339926 **[Test build #123610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123610/testReport)** for PR 27986 at commit [`1e6ed30`](https://github.com/apache/spark/commit/1e6ed30f12d4a3ed50f647a3b9b848a2e5b547b8).
[GitHub] [spark] viirya commented on a change in pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs
viirya commented on a change in pull request #28745:
URL: https://github.com/apache/spark/pull/28745#discussion_r436436434

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala

```
@@ -608,10 +608,14 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         execution.MapPartitionsInRWithArrowExec(
           f, p, b, is, ot, planLater(child)) :: Nil
       case logical.FlatMapGroupsInPandas(grouping, func, output, child) =>
-        execution.python.FlatMapGroupsInPandasExec(grouping, func, output, planLater(child)) :: Nil
-      case logical.FlatMapCoGroupsInPandas(leftGroup, rightGroup, func, output, left, right) =>
+        val groupingExprs = grouping.map(NamedExpression.fromExpression)
+        execution.python.FlatMapGroupsInPandasExec(
+          groupingExprs, func, output, planLater(child)) :: Nil
+      case logical.FlatMapCoGroupsInPandas(leftExprs, rightExprs, func, output, left, right) =>
+        val leftAttrs = leftExprs.map(NamedExpression.fromExpression)
+        val rightAttrs = rightExprs.map(NamedExpression.fromExpression)
         execution.python.FlatMapCoGroupsInPandasExec(
-          leftGroup, rightGroup, func, output, planLater(left), planLater(right)) :: Nil
+          leftAttrs, rightAttrs, func, output, planLater(left), planLater(right)) :: Nil
```

Review comment: leftNamedExprs/rightNamedExprs or leftGroupingExprs/rightGroupingExprs? They are not actually attributes.

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/PandasGroupUtils.scala

```
@@ -59,65 +59,65 @@ private[python] object PandasGroupUtils {
    */
   def groupAndProject(
       input: Iterator[InternalRow],
-      groupingAttributes: Seq[Attribute],
+      groupingExprs: Seq[NamedExpression],
       inputSchema: Seq[Attribute],
-      dedupSchema: Seq[Attribute]): Iterator[(InternalRow, Iterator[InternalRow])] = {
-    val groupedIter = GroupedIterator(input, groupingAttributes, inputSchema)
+      dedupSchema: Seq[NamedExpression]): Iterator[(InternalRow, Iterator[InternalRow])] = {
+    val groupedIter = GroupedIterator(input, groupingExprs, inputSchema)
     val dedupProj = UnsafeProjection.create(dedupSchema, inputSchema)
     groupedIter.map {
       case (k, groupedRowIter) => (k, groupedRowIter.map(dedupProj))
     }
   }

   /**
-   * Returns a the deduplicated attributes of the spark plan and the arg offsets of the
+   * Returns a the deduplicated named expressions of the spark plan and the arg offsets of the
    * keys and values.
    *
-   * The deduplicated attributes are needed because the spark plan may contain an attribute
-   * twice; once in the key and once in the value. For any such attribute we need to
+   * The deduplicated expressions are needed because the spark plan may contain an expression
+   * twice; once in the key and once in the value. For any such expression we need to
    * deduplicate.
    *
-   * The arg offsets are used to distinguish grouping grouping attributes and data attributes
+   * The arg offsets are used to distinguish grouping expressions and data expressions
    * as following:
    *
    * argOffsets[0] is the length of the argOffsets array
    *
-   * argOffsets[1] is the length of grouping attribute
-   * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping attributes
+   * argOffsets[1] is the length of grouping expression
+   * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping expressions
    *
-   * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data attributes
+   * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data expressions
    */
   def resolveArgOffsets(
-      child: SparkPlan, groupingAttributes: Seq[Attribute]): (Seq[Attribute], Array[Int]) = {
+      dataExprs: Seq[NamedExpression], groupingExprs: Seq[NamedExpression])
+    : (Seq[NamedExpression], Array[Int]) = {

-    val dataAttributes = child.output.drop(groupingAttributes.length)
-    val groupingIndicesInData = groupingAttributes.map { attribute =>
-      dataAttributes.indexWhere(attribute.semanticEquals)
+    val groupingIndicesInData = groupingExprs.map { expression =>
+      dataExprs.indexWhere(expression.semanticEquals)
     }
```

Review comment: OK, looks good after re-checking.

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapCoGroupsInPandasExec.scala

```
@@ -60,42 +60,51 @@ case class FlatMapCoGroupsInPandasExec(
   private val pythonRunnerConf = ArrowUtils.getPythonRunnerConfMap(conf)
   private val pandasFunction = func.asInstanceOf[PythonUDF].func
   private val chainedFunc = Seq(ChainedPythonFunctions(Seq(pandasFunction)))
+  private val inputExprs =
+    func.asInstanceOf[PythonUDF].children.map(_.asInstanceOf[NamedExpression])
+  private val leftExprs =
+    left.output.filter(e => inputExprs.exists(_.semanticEquals(e)))
+  private val
```
[GitHub] [spark] turboFei removed a comment on pull request #26339: [SPARK-27194][SPARK-29302][SQL] For dynamic partition overwrite operation, fix speculation task conflict issue and FileAlreadyExists
turboFei removed a comment on pull request #26339: URL: https://github.com/apache/spark/pull/26339#issuecomment-632923671 Have sent an email to that email thread; thanks a lot, @Ngone51
[GitHub] [spark] sarutak edited a comment on pull request #28439: [SPARK-30119][WEBUI] Add Pagination Support to Streaming Page
sarutak edited a comment on pull request #28439: URL: https://github.com/apache/spark/pull/28439#issuecomment-640322160 The suite passes on my laptop with both sbt and Maven, so the suite can be flaky.
[GitHub] [spark] uncleGen commented on pull request #28737: [SPARK-31913][SQL]: Fix StackOverflowError in FileScanRDD
uncleGen commented on pull request #28737: URL: https://github.com/apache/spark/pull/28737#issuecomment-640320765 Pending on fixing the UT failure.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs
HyukjinKwon commented on a change in pull request #28745:
URL: https://github.com/apache/spark/pull/28745#discussion_r436425345

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/PandasGroupUtils.scala

```
@@ -59,65 +59,65 @@ private[python] object PandasGroupUtils {
    */
   def groupAndProject(
       input: Iterator[InternalRow],
-      groupingAttributes: Seq[Attribute],
+      groupingExprs: Seq[NamedExpression],
       inputSchema: Seq[Attribute],
-      dedupSchema: Seq[Attribute]): Iterator[(InternalRow, Iterator[InternalRow])] = {
-    val groupedIter = GroupedIterator(input, groupingAttributes, inputSchema)
+      dedupSchema: Seq[NamedExpression]): Iterator[(InternalRow, Iterator[InternalRow])] = {
+    val groupedIter = GroupedIterator(input, groupingExprs, inputSchema)
     val dedupProj = UnsafeProjection.create(dedupSchema, inputSchema)
     groupedIter.map {
       case (k, groupedRowIter) => (k, groupedRowIter.map(dedupProj))
     }
   }

   /**
-   * Returns a the deduplicated attributes of the spark plan and the arg offsets of the
+   * Returns a the deduplicated named expressions of the spark plan and the arg offsets of the
    * keys and values.
    *
-   * The deduplicated attributes are needed because the spark plan may contain an attribute
-   * twice; once in the key and once in the value. For any such attribute we need to
+   * The deduplicated expressions are needed because the spark plan may contain an expression
+   * twice; once in the key and once in the value. For any such expression we need to
    * deduplicate.
    *
-   * The arg offsets are used to distinguish grouping grouping attributes and data attributes
+   * The arg offsets are used to distinguish grouping expressions and data expressions
    * as following:
    *
    * argOffsets[0] is the length of the argOffsets array
    *
-   * argOffsets[1] is the length of grouping attribute
-   * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping attributes
+   * argOffsets[1] is the length of grouping expression
+   * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping expressions
    *
-   * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data attributes
+   * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data expressions
    */
   def resolveArgOffsets(
-      child: SparkPlan, groupingAttributes: Seq[Attribute]): (Seq[Attribute], Array[Int]) = {
+      dataExprs: Seq[NamedExpression], groupingExprs: Seq[NamedExpression])
+    : (Seq[NamedExpression], Array[Int]) = {

-    val dataAttributes = child.output.drop(groupingAttributes.length)
-    val groupingIndicesInData = groupingAttributes.map { attribute =>
-      dataAttributes.indexWhere(attribute.semanticEquals)
+    val groupingIndicesInData = groupingExprs.map { expression =>
+      dataExprs.indexWhere(expression.semanticEquals)
     }
```

Review comment: Just to be doubly sure: the `column + 1` case is being tested at https://github.com/apache/spark/blob/ab0890bdb18dcd0441f6082afbe4c84219611e87/python/pyspark/sql/tests/test_pandas_cogrouped_map.py#L161-L174. I know because it failed while I was preparing the fix :D.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs
HyukjinKwon commented on a change in pull request #28745:
URL: https://github.com/apache/spark/pull/28745#discussion_r436424768

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala

```
@@ -23,14 +23,18 @@ import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
 import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.catalyst.plans.logical.EventTimeWatermark
-import org.apache.spark.sql.catalyst.util.quoteIdentifier
+import org.apache.spark.sql.catalyst.util.{quoteIdentifier, toPrettySQL}
 import org.apache.spark.sql.types._

 object NamedExpression {
   private val curId = new java.util.concurrent.atomic.AtomicLong()
   private[expressions] val jvmId = UUID.randomUUID()
   def newExprId: ExprId = ExprId(curId.getAndIncrement(), jvmId)
   def unapply(expr: NamedExpression): Option[(String, DataType)] = Some((expr.name, expr.dataType))
+  def fromExpression(expr: Expression): NamedExpression = expr match {
+    case ne: NamedExpression => ne
+    case _: Expression => Alias(expr, toPrettySQL(expr))()
+  }
```

Review comment: Yeah, let me take a look separately with a separate JIRA.
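A small illustration of the helper quoted above. This sketch assumes the `fromExpression` method from this diff is available, and uses the catalyst DSL to build example expressions: a named expression comes back unchanged, while a computed expression such as `id + 1` is wrapped in an `Alias` named by its pretty SQL.

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.expressions.{Alias, NamedExpression}

object FromExpressionDemo {
  def main(args: Array[String]): Unit = {
    val id = 'id.int // an AttributeReference, which is already a NamedExpression

    // A named expression is returned unchanged...
    assert(NamedExpression.fromExpression(id) eq id)

    // ...while a computed grouping key is wrapped in an Alias so that it can
    // appear in a projection later.
    NamedExpression.fromExpression(id + 1) match {
      case a: Alias => println(s"aliased as: ${a.name}")
      case other    => println(s"unexpected: $other")
    }
  }
}
```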
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs
HyukjinKwon commented on a change in pull request #28745:
URL: https://github.com/apache/spark/pull/28745#discussion_r436424483

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/PandasGroupUtils.scala

```
@@ -59,65 +59,65 @@ private[python] object PandasGroupUtils {
    */
   def groupAndProject(
       input: Iterator[InternalRow],
-      groupingAttributes: Seq[Attribute],
+      groupingExprs: Seq[NamedExpression],
       inputSchema: Seq[Attribute],
-      dedupSchema: Seq[Attribute]): Iterator[(InternalRow, Iterator[InternalRow])] = {
-    val groupedIter = GroupedIterator(input, groupingAttributes, inputSchema)
+      dedupSchema: Seq[NamedExpression]): Iterator[(InternalRow, Iterator[InternalRow])] = {
+    val groupedIter = GroupedIterator(input, groupingExprs, inputSchema)
     val dedupProj = UnsafeProjection.create(dedupSchema, inputSchema)
     groupedIter.map {
       case (k, groupedRowIter) => (k, groupedRowIter.map(dedupProj))
     }
   }

   /**
-   * Returns a the deduplicated attributes of the spark plan and the arg offsets of the
+   * Returns a the deduplicated named expressions of the spark plan and the arg offsets of the
    * keys and values.
    *
-   * The deduplicated attributes are needed because the spark plan may contain an attribute
-   * twice; once in the key and once in the value. For any such attribute we need to
+   * The deduplicated expressions are needed because the spark plan may contain an expression
+   * twice; once in the key and once in the value. For any such expression we need to
    * deduplicate.
    *
-   * The arg offsets are used to distinguish grouping grouping attributes and data attributes
+   * The arg offsets are used to distinguish grouping expressions and data expressions
    * as following:
    *
    * argOffsets[0] is the length of the argOffsets array
    *
-   * argOffsets[1] is the length of grouping attribute
-   * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping attributes
+   * argOffsets[1] is the length of grouping expression
+   * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping expressions
    *
-   * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data attributes
+   * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data expressions
    */
   def resolveArgOffsets(
-      child: SparkPlan, groupingAttributes: Seq[Attribute]): (Seq[Attribute], Array[Int]) = {
+      dataExprs: Seq[NamedExpression], groupingExprs: Seq[NamedExpression])
+    : (Seq[NamedExpression], Array[Int]) = {

-    val dataAttributes = child.output.drop(groupingAttributes.length)
-    val groupingIndicesInData = groupingAttributes.map { attribute =>
-      dataAttributes.indexWhere(attribute.semanticEquals)
+    val groupingIndicesInData = groupingExprs.map { expression =>
+      dataExprs.indexWhere(expression.semanticEquals)
     }
```

Review comment: Actually, the `groupingExprs` will be projected at https://github.com/apache/spark/pull/28745/files/2800eb238465547074498eae762199a53efc4277#diff-e7c34a6080e15837af82863db34fb1c4R66 for the input iterator before the actual execution. The `groupingExprs` were already dropped in this code without this fix: https://github.com/apache/spark/pull/28745/files/2800eb238465547074498eae762199a53efc4277#diff-e7c34a6080e15837af82863db34fb1c4L93 I believe there's virtually no difference in the execution path here.

For analysis, with this change: `groupingExprs` at `FlatMapGroupsInPandasExec` is, for example, `column + 1`. The attribute `column` inside `column + 1` will be properly resolved, and then it becomes an alias to project later during execution.

Without this change: `Project`'s output contains the grouping expression `column + 1` as a separate attribute reference (whereas the current fix keeps it as an expression). `FlatMapGroupsInPandasExec` contains the attribute reference as a grouping expression, and this grouping attribute will be used to project later.
[GitHub] [spark] gerashegalov commented on a change in pull request #28746: [SPARK-31922][CORE] Fix "RpcEnv already stopped" error when exit spark-shell with local-cluster mode
gerashegalov commented on a change in pull request #28746:
URL: https://github.com/apache/spark/pull/28746#discussion_r436423623

## File path: core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala

```
@@ -74,6 +74,10 @@ class LocalSparkCluster(
   def stop(): Unit = {
     logInfo("Shutting down local Spark cluster.")
+    // SPARK-31922: wait one more second before shutting down rpcEnvs of master and worker,
+    // in order to let the cluster have time to handle the `UnregisterApplication` message.
+    // Otherwise, we could hit "RpcEnv already stopped" error.
+    Thread.sleep(1000)
     // Stop the workers before the master so they don't get upset that it disconnected
     workerRpcEnvs.foreach(_.shutdown())
```

Review comment: Additionally, there might be a problem with the comment

> // Stop the workers before the master so they don't get upset that it disconnected

and the implementation: the code does not really wait for the workers to stop before the master. We could rewrite it like:

```
Seq(workerRpcEnvs, masterRpcEnvs).foreach { rpcEnvArr =>
  rpcEnvArr.foreach(rpcEnv => Utils.tryLog {
    rpcEnv.shutdown()
    rpcEnv.awaitTermination()
  })
  rpcEnvArr.clear()
}
```
[GitHub] [spark] sarutak commented on pull request #28439: [SPARK-30119][WEBUI] Add Pagination Support to Streaming Page
sarutak commented on pull request #28439: URL: https://github.com/apache/spark/pull/28439#issuecomment-640312280

@iRakson This PR passed the PR builder's tests but fails in the QA builds, so I've reverted it.

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-hive-1.2/lastCompletedBuild/testReport/org.apache.spark.streaming/UISeleniumSuite/attaching_and_detaching_a_Streaming_tab/
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-hive-2.3/lastCompletedBuild/testReport/org.apache.spark.streaming/UISeleniumSuite/attaching_and_detaching_a_Streaming_tab/
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11/lastCompletedBuild/testReport/org.apache.spark.streaming/UISeleniumSuite/attaching_and_detaching_a_Streaming_tab/
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-hive-2.3/lastCompletedBuild/testReport/
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-hive-2.3-jdk-11/lastCompletedBuild/testReport/org.apache.spark.streaming/UISeleniumSuite/attaching_and_detaching_a_Streaming_tab/
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7-hive-1.2/lastCompletedBuild/testReport/org.apache.spark.streaming/UISeleniumSuite/attaching_and_detaching_a_Streaming_tab/
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/lastCompletedBuild/testReport/

Could you confirm them?
[GitHub] [spark] sarutak closed pull request #28747: Revert "[SPARK-30119][WEBUI] Add Pagination Support to Streaming Page"
sarutak closed pull request #28747: URL: https://github.com/apache/spark/pull/28747
[GitHub] [spark] sarutak opened a new pull request #28747: Revert "[SPARK-30119][WEBUI] Add Pagination Support to Streaming Page"
sarutak opened a new pull request #28747: URL: https://github.com/apache/spark/pull/28747 This PR reverts #28439 because that PR breaks the QA build.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640310268
[GitHub] [spark] koertkuipers commented on pull request #27986: [SPARK-31220][SQL] repartition obeys initialPartitionNum when adaptiveExecutionEnabled
koertkuipers commented on pull request #27986: URL: https://github.com/apache/spark/pull/27986#issuecomment-640310420 Adaptive execution estimates the number of partitions for a shuffle using `spark.sql.adaptive.shuffle.targetPostShuffleInputSize` as its target size per shuffled partition. I was surprised to find, however, that it does not do this for `DataFrame.repartition(...)`. I don't understand why, since under the hood it's also just a shuffle, no different from a `DataFrame.groupBy`. Will this pull request fix this issue? From looking at the code I can't tell; it doesn't look like it to me.
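For illustration, a minimal sketch of the behaviour described above, assuming a running `SparkSession` named `spark` and an existing DataFrame `df` (both hypothetical), using the config mentioned in the comment:

```
// AQE estimates the post-shuffle partition count for aggregation shuffles
// from the target size per shuffled partition:
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.shuffle.targetPostShuffleInputSize", "64m")
df.groupBy("key").count()   // partition count estimated by AQE

// ...but, per the observation above, the shuffle from an explicit repartition
// reportedly does not get the same estimation:
df.repartition(df("key"))
```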
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640310268
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640310014 **[Test build #123609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123609/testReport)** for PR 28593 at commit [`ec2cf54`](https://github.com/apache/spark/commit/ec2cf54b6d566dd7afcd65753374c1dd5dc8d47f).
[GitHub] [spark] github-actions[bot] commented on pull request #27632: [SPARK-30872][SQL] Constraints inferred from inferred attributes
github-actions[bot] commented on pull request #27632: URL: https://github.com/apache/spark/pull/27632#issuecomment-640302609 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] commented on pull request #26193: [SPARK-25065][k8s] Allow setting up correct logging configuration on driver and executor.
github-actions[bot] commented on pull request #26193: URL: https://github.com/apache/spark/pull/26193#issuecomment-640302620 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] commented on pull request #27683: [SPARK-30917][SQL]: The behaviour of UnaryMinus should not depend on SQLConf.get
github-actions[bot] commented on pull request #27683: URL: https://github.com/apache/spark/pull/27683#issuecomment-640302605 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] commented on pull request #26727: [SPARK-30087][CORE] Enhanced implementation of JmxSink on RMI remote calls
github-actions[bot] commented on pull request #26727: URL: https://github.com/apache/spark/pull/26727#issuecomment-640302618 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] commented on pull request #27629: [SPARK-28067][SQL]Fix incorrect results during aggregate sum for decimal overflow by throwing exception
github-actions[bot] commented on pull request #27629: URL: https://github.com/apache/spark/pull/27629#issuecomment-640302613 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] closed pull request #25398: [SPARK-28659][SQL] Use data source if convertible in insert overwrite directory
github-actions[bot] closed pull request #25398: URL: https://github.com/apache/spark/pull/25398
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27066: [SPARK-31317][SQL] Add withField method to Column
AmplabJenkins removed a comment on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-640289256
[GitHub] [spark] AmplabJenkins commented on pull request #27066: [SPARK-31317][SQL] Add withField method to Column
AmplabJenkins commented on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-640289256
[GitHub] [spark] SparkQA removed a comment on pull request #27066: [SPARK-31317][SQL] Add withField method to Column
SparkQA removed a comment on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-640256844 **[Test build #123608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123608/testReport)** for PR 27066 at commit [`ab36504`](https://github.com/apache/spark/commit/ab36504c885d7b2a3a18c02addb7f88456f200f9).
[GitHub] [spark] SparkQA commented on pull request #27066: [SPARK-31317][SQL] Add withField method to Column
SparkQA commented on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-640289036 **[Test build #123608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123608/testReport)** for PR 27066 at commit [`ab36504`](https://github.com/apache/spark/commit/ab36504c885d7b2a3a18c02addb7f88456f200f9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition
AmplabJenkins removed a comment on pull request #28642: URL: https://github.com/apache/spark/pull/28642#issuecomment-640272817
[GitHub] [spark] AmplabJenkins commented on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition
AmplabJenkins commented on pull request #28642: URL: https://github.com/apache/spark/pull/28642#issuecomment-640272817
[GitHub] [spark] SparkQA removed a comment on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition
SparkQA removed a comment on pull request #28642: URL: https://github.com/apache/spark/pull/28642#issuecomment-640238082 **[Test build #123607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123607/testReport)** for PR 28642 at commit [`65cd324`](https://github.com/apache/spark/commit/65cd324093fac15357fb0ca9bae7c524b40c).
[GitHub] [spark] SparkQA commented on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition
SparkQA commented on pull request #28642: URL: https://github.com/apache/spark/pull/28642#issuecomment-640272578 **[Test build #123607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123607/testReport)** for PR 28642 at commit [`65cd324`](https://github.com/apache/spark/commit/65cd324093fac15357fb0ca9bae7c524b40c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #27066: [SPARK-31317][SQL] Add withField method to Column
SparkQA commented on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-640256844 **[Test build #123608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123608/testReport)** for PR 27066 at commit [`ab36504`](https://github.com/apache/spark/commit/ab36504c885d7b2a3a18c02addb7f88456f200f9).
[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
siknezevic commented on a change in pull request #27246: URL: https://github.com/apache/spark/pull/27246#discussion_r436385980

## File path: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java
## @@ -47,55 +47,48 @@

```
  private int numRecords;
  private int numRecordsRemaining;

- private byte[] arr = new byte[1024 * 1024];
+ private byte[] arr = new byte[1024];
```

Review comment: Does this look good? Perhaps you have some suggestions.

```
private[spark] val UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE_RATIO =
  ConfigBuilder("spark.unsafe.sorter.spill.reader.buffer.size.ratio")
    .doc("The multiplication ratio is the parameter that controls the initial read buffer " +
      "size. The multiplication ratio value range is from 1 to 1024. This parameter increases " +
      "the initial read buffer size in 1KB increments. It will result in the initial buffer " +
      "size in the range from 1KB to 1MB. The read buffer size is dynamically adjusted " +
      "afterward based on data length read from the spilled file.")
    .intConf
    .checkValue(v => 1 <= v && v <= DEFAULT_BUFFER_SIZE_RATIO,
      s"The value must be in allowed range [1, ${DEFAULT_BUFFER_SIZE_RATIO}].")
    .createWithDefault(DEFAULT_BUFFER_SIZE_RATIO)
```
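If the proposed config lands under that name, it would presumably be set like any other Spark property; a sketch (the value 64 is arbitrary and, per the doc text above, would yield a 64KB initial buffer):

```
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // 64 * 1KB increments => 64KB initial spill-reader buffer
  .set("spark.unsafe.sorter.spill.reader.buffer.size.ratio", "64")
```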
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640250211 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640250217 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123606/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640250211
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640250078 **[Test build #123606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123606/testReport)** for PR 28593 at commit [`eeb0a61`](https://github.com/apache/spark/commit/eeb0a61498556056aed9f94a7e9c864bd23e6ce6).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640234241 **[Test build #123606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123606/testReport)** for PR 28593 at commit [`eeb0a61`](https://github.com/apache/spark/commit/eeb0a61498556056aed9f94a7e9c864bd23e6ce6).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId
AmplabJenkins removed a comment on pull request #28490: URL: https://github.com/apache/spark/pull/28490#issuecomment-640249398
[GitHub] [spark] AmplabJenkins commented on pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId
AmplabJenkins commented on pull request #28490: URL: https://github.com/apache/spark/pull/28490#issuecomment-640249398
[GitHub] [spark] SparkQA removed a comment on pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId
SparkQA removed a comment on pull request #28490: URL: https://github.com/apache/spark/pull/28490#issuecomment-640213213 **[Test build #123605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123605/testReport)** for PR 28490 at commit [`0af3166`](https://github.com/apache/spark/commit/0af316675f376472d6deab40c82401a55a765e20).
[GitHub] [spark] SparkQA commented on pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId
SparkQA commented on pull request #28490: URL: https://github.com/apache/spark/pull/28490#issuecomment-640249149 **[Test build #123605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123605/testReport)** for PR 28490 at commit [`0af3166`](https://github.com/apache/spark/commit/0af316675f376472d6deab40c82401a55a765e20).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] viirya commented on a change in pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs
viirya commented on a change in pull request #28745: URL: https://github.com/apache/spark/pull/28745#discussion_r436380321

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/PandasGroupUtils.scala
## @@ -59,65 +59,65 @@ private[python] object PandasGroupUtils {

```
   */
  def groupAndProject(
      input: Iterator[InternalRow],
-     groupingAttributes: Seq[Attribute],
+     groupingExprs: Seq[NamedExpression],
      inputSchema: Seq[Attribute],
-     dedupSchema: Seq[Attribute]): Iterator[(InternalRow, Iterator[InternalRow])] = {
-   val groupedIter = GroupedIterator(input, groupingAttributes, inputSchema)
+     dedupSchema: Seq[NamedExpression]): Iterator[(InternalRow, Iterator[InternalRow])] = {
+   val groupedIter = GroupedIterator(input, groupingExprs, inputSchema)
    val dedupProj = UnsafeProjection.create(dedupSchema, inputSchema)
    groupedIter.map {
      case (k, groupedRowIter) => (k, groupedRowIter.map(dedupProj))
    }
  }

  /**
-  * Returns a the deduplicated attributes of the spark plan and the arg offsets of the
+  * Returns a the deduplicated named expressions of the spark plan and the arg offsets of the
   * keys and values.
   *
-  * The deduplicated attributes are needed because the spark plan may contain an attribute
-  * twice; once in the key and once in the value. For any such attribute we need to
+  * The deduplicated expressions are needed because the spark plan may contain an expression
+  * twice; once in the key and once in the value. For any such expression we need to
   * deduplicate.
   *
-  * The arg offsets are used to distinguish grouping grouping attributes and data attributes
+  * The arg offsets are used to distinguish grouping expressions and data expressions
   * as following:
   *
   * argOffsets[0] is the length of the argOffsets array
   *
-  * argOffsets[1] is the length of grouping attribute
-  * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping attributes
+  * argOffsets[1] is the length of grouping expression
+  * argOffsets[2 .. argOffsets[0]+2] is the arg offsets for grouping expressions
   *
-  * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data attributes
+  * argOffsets[argOffsets[0]+2 .. ] is the arg offsets for data expressions
   */
  def resolveArgOffsets(
-     child: SparkPlan, groupingAttributes: Seq[Attribute]): (Seq[Attribute], Array[Int]) = {
+     dataExprs: Seq[NamedExpression], groupingExprs: Seq[NamedExpression])
+   : (Seq[NamedExpression], Array[Int]) = {

-   val dataAttributes = child.output.drop(groupingAttributes.length)
-   val groupingIndicesInData = groupingAttributes.map { attribute =>
-     dataAttributes.indexWhere(attribute.semanticEquals)
+   val groupingIndicesInData = groupingExprs.map { expression =>
+     dataExprs.indexWhere(expression.semanticEquals)
    }
```

Review comment: I feel this is not precisely correct in all cases. It seems `dataExprs` are the inputs to the Python UDFs. Is it possible that `groupingExprs` are not just the child's outputs but expressions like `column + 1`? In `RelationalGroupedDataset`, we previously added a projection to put these grouping expressions alongside the original child's outputs. Now we don't have it. So can we always find a semantically equal expression in `dataExprs` for a grouping expression? `dataExprs` are the input expressions of the left/right plan for `FlatMapCoGroupsInPandasExec`, so I guess we cannot find `column + 1` in them.
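A small Scala sketch of the lookup being questioned, using Catalyst's expression DSL; the names `column`, `dataExprs`, and `grouping` are illustrative:

```
import org.apache.spark.sql.catalyst.dsl.expressions._

val column    = 'column.int   // an AttributeReference, i.e. a plain child output
val dataExprs = Seq(column)   // the inputs available to the Python UDF
val grouping  = column + 1    // a grouping expression that is not a plain column

// No semantically equal expression exists among the child outputs, so
// indexWhere returns -1 -- the FlatMapCoGroupsInPandasExec case raised above:
dataExprs.indexWhere(grouping.semanticEquals)   // => -1
```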
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition
AmplabJenkins removed a comment on pull request #28642: URL: https://github.com/apache/spark/pull/28642#issuecomment-640238262
[GitHub] [spark] AmplabJenkins commented on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition
AmplabJenkins commented on pull request #28642: URL: https://github.com/apache/spark/pull/28642#issuecomment-640238262
[GitHub] [spark] SparkQA commented on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition
SparkQA commented on pull request #28642: URL: https://github.com/apache/spark/pull/28642#issuecomment-640238082 **[Test build #123607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123607/testReport)** for PR 28642 at commit [`65cd324`](https://github.com/apache/spark/commit/65cd324093fac15357fb0ca9bae7c524b40c).
[GitHub] [spark] manuzhang commented on pull request #28669: [SPARK-31864][SQL] Adjust AQE skew join trigger condition
manuzhang commented on pull request #28669: URL: https://github.com/apache/spark/pull/28669#issuecomment-640235282 @cloud-fan @maryannxue @JkSelf I'm seeing a case where partitions [0,0,0,...,13GB] were coalesced to [13GB], and the SortMergeJoin took 17 min. With coalescing disabled, the partitions would instead be split by OptimizeSkewedJoin into [0,0,0,..., 256MB, 256MB,...,256MB], and the join took only 38s. WDYT?
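For context, a hedged sketch of the two configurations being compared (Spark 3.0 config names; the session `spark` and the values are illustrative):

```
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

// With coalescing on, the skewed partitions were merged into a single [13GB]
// partition; turning it off let OptimizeSkewedJoin split the skewed side
// into ~256MB chunks instead:
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "false")
```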
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-64023
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-64023
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640234241 **[Test build #123606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123606/testReport)** for PR 28593 at commit [`eeb0a61`](https://github.com/apache/spark/commit/eeb0a61498556056aed9f94a7e9c864bd23e6ce6).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640227146 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123604/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640227143 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640212130 **[Test build #123604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123604/testReport)** for PR 28593 at commit [`7ad82da`](https://github.com/apache/spark/commit/7ad82da701b10f17af2a1ba764fc8afc2a11ff7b).
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640227143
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-640227004 **[Test build #123604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123604/testReport)** for PR 28593 at commit [`7ad82da`](https://github.com/apache/spark/commit/7ad82da701b10f17af2a1ba764fc8afc2a11ff7b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId
AmplabJenkins removed a comment on pull request #28490: URL: https://github.com/apache/spark/pull/28490#issuecomment-640217133
[GitHub] [spark] AmplabJenkins commented on pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId
AmplabJenkins commented on pull request #28490: URL: https://github.com/apache/spark/pull/28490#issuecomment-640217133
[GitHub] [spark] SparkQA removed a comment on pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId
SparkQA removed a comment on pull request #28490: URL: https://github.com/apache/spark/pull/28490#issuecomment-640179013 **[Test build #123603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123603/testReport)** for PR 28490 at commit [`1ee0542`](https://github.com/apache/spark/commit/1ee0542e20eea131ff27e4114e3547d32191a6a2).
[GitHub] [spark] TJX2014 commented on a change in pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs
TJX2014 commented on a change in pull request #28745: URL: https://github.com/apache/spark/pull/28745#discussion_r436361472

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
## @@ -23,14 +23,18 @@

```
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
 import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.catalyst.plans.logical.EventTimeWatermark
-import org.apache.spark.sql.catalyst.util.quoteIdentifier
+import org.apache.spark.sql.catalyst.util.{quoteIdentifier, toPrettySQL}
 import org.apache.spark.sql.types._

 object NamedExpression {
   private val curId = new java.util.concurrent.atomic.AtomicLong()
   private[expressions] val jvmId = UUID.randomUUID()
   def newExprId: ExprId = ExprId(curId.getAndIncrement(), jvmId)
   def unapply(expr: NamedExpression): Option[(String, DataType)] = Some((expr.name, expr.dataType))
+  def fromExpression(expr: Expression): NamedExpression = expr match {
+    case ne: NamedExpression => ne
+    case _: Expression => Alias(expr, toPrettySQL(expr))()
+  }
```

Review comment: I find `org.apache.spark.sql.Dataset#groupBy(cols: Column*)` is invoked through py4j instead of `groupBy(col1: String, cols: String*)`. Is it possible to change the param sent from the Python side so that only `groupBy(col1: String, cols: String*)` is invoked? That may also be helpful for this JIRA :-)
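For reference, a sketch of how the `fromExpression` helper added in this diff would behave, assuming the patch is applied (uses Catalyst's expression DSL; the alias name shown is approximate):

```
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.expressions.NamedExpression

val a = 'a.int                          // already a NamedExpression
NamedExpression.fromExpression(a)       // returned unchanged
NamedExpression.fromExpression(a + 1)   // wrapped: roughly Alias(a + 1, "(a + 1)")()
```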
[GitHub] [spark] SparkQA commented on pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId
SparkQA commented on pull request #28490: URL: https://github.com/apache/spark/pull/28490#issuecomment-640216881 **[Test build #123603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123603/testReport)** for PR 28490 at commit [`1ee0542`](https://github.com/apache/spark/commit/1ee0542e20eea131ff27e4114e3547d32191a6a2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId
AngersZhuuuu commented on a change in pull request #28490: URL: https://github.com/apache/spark/pull/28490#discussion_r436360282

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
## @@ -1481,7 +1486,35 @@ class Analyzer(

```
      case q: LogicalPlan =>
        logTrace(s"Attempting to resolve ${q.simpleString(SQLConf.get.maxToStringFields)}")
```

Review comment:
> w/ some code cleanup;

Done