[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-07 Thread GitBox


AngersZhuuuu commented on a change in pull request #30145:
URL: https://github.com/apache/spark/pull/30145#discussion_r609329246



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1788,16 +1788,30 @@ class Analyzer(override val catalogManager: CatalogManager)
       // Replace the index with the corresponding expression in aggregateExpressions. The index is
       // a 1-base position of aggregateExpressions, which is output columns (select expression)
       case Aggregate(groups, aggs, child) if aggs.forall(_.resolved) &&
-        groups.exists(_.isInstanceOf[UnresolvedOrdinal]) =>
-        val newGroups = groups.map {
-          case u @ UnresolvedOrdinal(index) if index > 0 && index <= aggs.size =>
-            aggs(index - 1)
-          case ordinal @ UnresolvedOrdinal(index) =>
-            throw QueryCompilationErrors.groupByPositionRangeError(index, aggs.size, ordinal)
-          case o => o
-        }
+        groups.exists(containUnresolvedOrdinal) =>
+        val newGroups = groups.map((resolveGroupByExpressionOrdinal(_, aggs)))
         Aggregate(newGroups, aggs, child)
     }
+
+    private def containUnresolvedOrdinal(e: Expression): Boolean = e match {
+      case _: UnresolvedOrdinal => true
+      case gs: BaseGroupingSets => gs.children.exists(containUnresolvedOrdinal)
+      case _ => false
+    }
+
+    private def resolveGroupByExpressionOrdinal(
+        expr: Expression,
+        aggs: Seq[Expression]): Expression = expr match {
+      case ordinal @ UnresolvedOrdinal(index) =>
+        if (index > 0 && index <= aggs.size) {
+          aggs(index - 1)
+        } else {
+          throw QueryCompilationErrors.groupByPositionRangeError(index, aggs.size, ordinal)

Review comment:
   > Could you add tests for this code path?
   
   Yea
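
   For readers following the thread, here is a minimal, self-contained sketch of the ordinal-resolution idea in the hunk above. It is not Spark's code: `Expr`, `Column`, `Ordinal`, `GroupingSets`, and `OrdinalResolution` are hypothetical stand-ins for Catalyst's `Expression`, `UnresolvedOrdinal`, and `BaseGroupingSets`, and the error message is invented. The quoted hunk is cut off before any grouping-sets case in `resolveGroupByExpressionOrdinal`, so the recursive case below is an assumption that mirrors `containUnresolvedOrdinal`.

```scala
// Simplified stand-ins for Catalyst expressions; these are NOT Spark's real classes.
sealed trait Expr
final case class Column(name: String) extends Expr
final case class Ordinal(index: Int) extends Expr                // models UnresolvedOrdinal
final case class GroupingSets(children: Seq[Expr]) extends Expr  // models BaseGroupingSets

object OrdinalResolution {
  // True if the expression is an ordinal, or a grouping construct that contains one.
  def containsOrdinal(e: Expr): Boolean = e match {
    case _: Ordinal             => true
    case GroupingSets(children) => children.exists(containsOrdinal)
    case _                      => false
  }

  // Replace each 1-based ordinal with the select-list expression at that position.
  def resolve(e: Expr, aggs: Seq[Expr]): Expr = e match {
    case Ordinal(i) if i > 0 && i <= aggs.size =>
      aggs(i - 1)
    case Ordinal(i) =>
      // Hypothetical message; Spark raises its own error via QueryCompilationErrors.
      throw new IllegalArgumentException(
        s"GROUP BY position $i is out of range [1, ${aggs.size}]")
    case GroupingSets(children) =>
      // Assumed recursive case: descend into CUBE/ROLLUP/GROUPING SETS children.
      GroupingSets(children.map(resolve(_, aggs)))
    case other =>
      other
  }
}
```

   With `aggs = Seq(Column("a"), Column("b"))`, resolving `GroupingSets(Seq(Ordinal(1), Ordinal(2)))` in this model yields `GroupingSets(Seq(Column("a"), Column("b")))`, while `Ordinal(3)` takes the out-of-range branch, which is the code path the reviewer asked to have tested.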

##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1788,16 +1788,30 @@ class Analyzer(override val catalogManager: CatalogManager)
       // Replace the index with the corresponding expression in aggregateExpressions. The index is
       // a 1-base position of aggregateExpressions, which is output columns (select expression)
       case Aggregate(groups, aggs, child) if aggs.forall(_.resolved) &&
-        groups.exists(_.isInstanceOf[UnresolvedOrdinal]) =>
-        val newGroups = groups.map {
-          case u @ UnresolvedOrdinal(index) if index > 0 && index <= aggs.size =>
-            aggs(index - 1)
-          case ordinal @ UnresolvedOrdinal(index) =>
-            throw QueryCompilationErrors.groupByPositionRangeError(index, aggs.size, ordinal)
-          case o => o
-        }
+        groups.exists(containUnresolvedOrdinal) =>
+        val newGroups = groups.map((resolveGroupByExpressionOrdinal(_, aggs)))

Review comment:
   > `((resolveGroupByExpressionOrdinal(_, aggs)))` -> 
`(resolveGroupByExpressionOrdinal(_, aggs))`
   
   Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate

2021-04-07 Thread GitBox


SparkQA commented on pull request #32054:
URL: https://github.com/apache/spark/pull/32054#issuecomment-815466094


   **[Test build #137040 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137040/testReport)**
 for PR 32054 at commit 
[`34c7dd6`](https://github.com/apache/spark/commit/34c7dd645942105cbbc5bf5cd08ba19b53d3b0aa).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate

2021-04-07 Thread GitBox


SparkQA commented on pull request #32054:
URL: https://github.com/apache/spark/pull/32054#issuecomment-815462255









[GitHub] [spark] SparkQA commented on pull request #32082: [SPARK-34981][SQL] Implement V2 function resolution and evaluation

2021-04-07 Thread GitBox


SparkQA commented on pull request #32082:
URL: https://github.com/apache/spark/pull/32082#issuecomment-815460314


   **[Test build #137041 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137041/testReport)**
 for PR 32082 at commit 
[`c522276`](https://github.com/apache/spark/commit/c522276bf1f051af6200c17bdf51a4aa0f565b0a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-04-07 Thread GitBox


SparkQA commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-815459515


   **[Test build #137047 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137047/testReport)**
 for PR 31517 at commit 
[`f61b041`](https://github.com/apache/spark/commit/f61b0410491f6cdc75bdf51dfc13857a6cd5b65a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815457683


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41632/
   





[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

2021-04-07 Thread GitBox


SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815457653









[GitHub] [spark] SparkQA commented on pull request #32086: [DO-NOT-MERGE] Increase the number of retries

2021-04-07 Thread GitBox


SparkQA commented on pull request #32086:
URL: https://github.com/apache/spark/pull/32086#issuecomment-815455277









[GitHub] [spark] AmplabJenkins commented on pull request #32086: [DO-NOT-MERGE] Increase the number of retries

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #32086:
URL: https://github.com/apache/spark/pull/32086#issuecomment-815455305


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41631/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #32032:
URL: https://github.com/apache/spark/pull/32032#issuecomment-815454994


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137046/
   





[GitHub] [spark] SparkQA commented on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.

2021-04-07 Thread GitBox


SparkQA commented on pull request #32032:
URL: https://github.com/apache/spark/pull/32032#issuecomment-815454110


   **[Test build #137046 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137046/testReport)**
 for PR 32032 at commit 
[`b78cfdb`](https://github.com/apache/spark/commit/b78cfdb896d8ae0d6da8e96631749ad47fa35623).
* This patch passes all tests.
* This patch **does not merge cleanly**.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

2021-04-07 Thread GitBox


SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815453184


   **[Test build #137058 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137058/testReport)**
 for PR 31666 at commit 
[`b1bf28d`](https://github.com/apache/spark/commit/b1bf28d9ca413cf2dee2957e5d6d8eeb4c9a4f6e).





[GitHub] [spark] SparkQA commented on pull request #32086: [DO-NOT-MERGE] Increase the number of retries

2021-04-07 Thread GitBox


SparkQA commented on pull request #32086:
URL: https://github.com/apache/spark/pull/32086#issuecomment-815452977


   **[Test build #137057 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137057/testReport)**
 for PR 32086 at commit 
[`d274e25`](https://github.com/apache/spark/commit/d274e2501425b1b63161a78d7dbfe70c9fdef4c4).





[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815451838


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137053/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32083: [WIP][SPARK-34886][PYTHON] Port/integrate Koalas DataFrame unit test into PySpark

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #32083:
URL: https://github.com/apache/spark/pull/32083#issuecomment-815451836


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41627/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32085: [SPARK-34970][3.0][SQL][SERCURITY] Redact map-type options in the output of explain()

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #32085:
URL: https://github.com/apache/spark/pull/32085#issuecomment-815451839


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41630/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815451835


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41626/
   





[GitHub] [spark] SparkQA commented on pull request #32085: [SPARK-34970][3.0][SQL][SERCURITY] Redact map-type options in the output of explain()

2021-04-07 Thread GitBox


SparkQA commented on pull request #32085:
URL: https://github.com/apache/spark/pull/32085#issuecomment-815443094


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41630/
   





[GitHub] [spark] SparkQA commented on pull request #32085: [SPARK-34970][3.0][SQL][SERCURITY] Redact map-type options in the output of explain()

2021-04-07 Thread GitBox


SparkQA commented on pull request #32085:
URL: https://github.com/apache/spark/pull/32085#issuecomment-815441227


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41630/
   





[GitHub] [spark] KarlManong closed pull request #32080: [SPARK-34674] Spark app on k8s doesn't terminate without call to sparkContext.stop() method

2021-04-07 Thread GitBox


KarlManong closed pull request #32080:
URL: https://github.com/apache/spark/pull/32080


   





[GitHub] [spark] KarlManong commented on pull request #32080: [SPARK-34674] Spark app on k8s doesn't terminate without call to sparkContext.stop() method

2021-04-07 Thread GitBox


KarlManong commented on pull request #32080:
URL: https://github.com/apache/spark/pull/32080#issuecomment-815440922


   > BTW, this doesn't look like 
[SPARK-34674](https://issues.apache.org/jira/browse/SPARK-34674) because the 
case in [SPARK-34674](https://issues.apache.org/jira/browse/SPARK-34674) ends 
without throwing `Exception`. I'd like to recommend to have a separate JIRA 
issue if your case is considering unhandled `Exception` from your apps. It 
could be a general `spark-submit` issue for all resource managers like Mesos.
   
   Yes, my mistake.





[GitHub] [spark] KarlManong commented on a change in pull request #32080: [SPARK-34674] Spark app on k8s doesn't terminate without call to sparkContext.stop() method

2021-04-07 Thread GitBox


KarlManong commented on a change in pull request #32080:
URL: https://github.com/apache/spark/pull/32080#discussion_r609291585



##
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##
@@ -1031,6 +1031,8 @@ object SparkSubmit extends CommandLineUtils with Logging {
 } catch {
   case e: SparkUserAppException =>
 exitFn(e.exitCode)
+  case _: Throwable => 
+exitFn(1)

Review comment:
   Yes, you are right.







[GitHub] [spark] SparkQA commented on pull request #32083: [WIP][SPARK-34886][PYTHON] Port/integrate Koalas DataFrame unit test into PySpark

2021-04-07 Thread GitBox


SparkQA commented on pull request #32083:
URL: https://github.com/apache/spark/pull/32083#issuecomment-815438615


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41627/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

2021-04-07 Thread GitBox


SparkQA removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815419786


   **[Test build #137053 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137053/testReport)**
 for PR 32074 at commit 
[`b4cb445`](https://github.com/apache/spark/commit/b4cb445706daad51876ce36d6d80c50d295bac2b).





[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union

2021-04-07 Thread GitBox


SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815437826









[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

2021-04-07 Thread GitBox


SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815437760


   **[Test build #137053 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137053/testReport)**
 for PR 32074 at commit 
[`b4cb445`](https://github.com/apache/spark/commit/b4cb445706daad51876ce36d6d80c50d295bac2b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause

2021-04-07 Thread GitBox


AmplabJenkins removed a comment on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-815435047


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41629/
   





[GitHub] [spark] AmplabJenkins commented on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-815435047


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41629/
   





[GitHub] [spark] SparkQA commented on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause

2021-04-07 Thread GitBox


SparkQA commented on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-815435025









[GitHub] [spark] SparkQA commented on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.

2021-04-07 Thread GitBox


SparkQA commented on pull request #32032:
URL: https://github.com/apache/spark/pull/32032#issuecomment-815433365


   **[Test build #137056 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137056/testReport)**
 for PR 32032 at commit 
[`594981a`](https://github.com/apache/spark/commit/594981a643a2e3c183b56a44f00c52ad4267c517).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics

2021-04-07 Thread GitBox


AmplabJenkins removed a comment on pull request #30144:
URL: https://github.com/apache/spark/pull/30144#issuecomment-815432956


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41628/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.

2021-04-07 Thread GitBox


AmplabJenkins removed a comment on pull request #32032:
URL: https://github.com/apache/spark/pull/32032#issuecomment-815418245


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41621/
   





[GitHub] [spark] AmplabJenkins commented on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #30144:
URL: https://github.com/apache/spark/pull/30144#issuecomment-815432956


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41628/
   





[GitHub] [spark] SparkQA commented on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics

2021-04-07 Thread GitBox


SparkQA commented on pull request #30144:
URL: https://github.com/apache/spark/pull/30144#issuecomment-815432933









[GitHub] [spark] SparkQA commented on pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate

2021-04-07 Thread GitBox


SparkQA commented on pull request #32054:
URL: https://github.com/apache/spark/pull/32054#issuecomment-815432387


   **[Test build #137055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137055/testReport)** for PR 32054 at commit [`711f408`](https://github.com/apache/spark/commit/711f408eb1c72c3ae3956689e19474e3e8e5a45b).





[GitHub] [spark] SparkQA commented on pull request #32086: [DO-NOT-MERGE] Increase the number of retries

2021-04-07 Thread GitBox


SparkQA commented on pull request #32086:
URL: https://github.com/apache/spark/pull/32086#issuecomment-815432339


   **[Test build #137054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137054/testReport)** for PR 32086 at commit [`1deb43a`](https://github.com/apache/spark/commit/1deb43a4dfa9656ea18c87796d5a230e88b4d0de).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-04-07 Thread GitBox


AmplabJenkins removed a comment on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-815431356


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41625/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-815431356


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41625/
   





[GitHub] [spark] allisonwang-db commented on a change in pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate

2021-04-07 Thread GitBox


allisonwang-db commented on a change in pull request #32054:
URL: https://github.com/apache/spark/pull/32054#discussion_r609152666



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala
##
@@ -1765,4 +1765,35 @@ class SubquerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
   }
 }
   }
+
+  test("SPARK-34946: correlated scalar subquery in grouping expressions only") {

Review comment:
   Agree. And it would also be easier to check for all analysis errors if 
we have the tests in one test suite.







[GitHub] [spark] allisonwang-db commented on a change in pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate

2021-04-07 Thread GitBox


allisonwang-db commented on a change in pull request #32054:
URL: https://github.com/apache/spark/pull/32054#discussion_r609265541



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
##
@@ -305,6 +305,12 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
 s"nor is it an aggregate function. " +
 "Add to group by or wrap in first() (or first_value) if you don't care " +
 "which value you get.")
+  case s: ScalarSubquery
+  if s.children.nonEmpty && !groupingExprs.exists(_.semanticEquals(s)) =>
+failAnalysis(s"Correlated scalar subquery '${s.sql}' is neither " +
+  s"present in the group by, nor in an aggregate function. Add it to group by " +
+  s"using ordinal position or wrap it in first() (or first_value) if you don't " +
+  s"care which value you get.")

Review comment:
   Sounds good. Thanks for letting me know!







[GitHub] [spark] HyukjinKwon opened a new pull request #32086: [DO-NOT-MERGE] Increase the number of retries

2021-04-07 Thread GitBox


HyukjinKwon opened a new pull request #32086:
URL: https://github.com/apache/spark/pull/32086


   ### What changes were proposed in this pull request?
   
   See if it makes the situation better.
   
   ### Why are the changes needed?
   
   ### Does this PR introduce _any_ user-facing change?
   
   ### How was this patch tested?
   
   





[GitHub] [spark] SparkQA commented on pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-04-07 Thread GitBox


SparkQA commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-815421455









[GitHub] [spark] HyukjinKwon commented on pull request #24559: [SPARK-27658][SQL] Add FunctionCatalog API

2021-04-07 Thread GitBox


HyukjinKwon commented on pull request #24559:
URL: https://github.com/apache/spark/pull/24559#issuecomment-815420455


    





[GitHub] [spark] HyukjinKwon commented on pull request #32052: [SPARK-34955][SQL] ADD JAR command cannot add jar files which contains whitespaces in the path

2021-04-07 Thread GitBox


HyukjinKwon commented on pull request #32052:
URL: https://github.com/apache/spark/pull/32052#issuecomment-815420143


   I agree with this change. Thanks @sarutak and @dongjoon-hyun 





[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

2021-04-07 Thread GitBox


SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815419786


   **[Test build #137053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137053/testReport)** for PR 32074 at commit [`b4cb445`](https://github.com/apache/spark/commit/b4cb445706daad51876ce36d6d80c50d295bac2b).





[GitHub] [spark] AmplabJenkins commented on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #32032:
URL: https://github.com/apache/spark/pull/32032#issuecomment-815418245


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41621/
   





[GitHub] [spark] SparkQA commented on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.

2021-04-07 Thread GitBox


SparkQA commented on pull request #32032:
URL: https://github.com/apache/spark/pull/32032#issuecomment-815418216


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41621/
   





[GitHub] [spark] HyukjinKwon commented on pull request #32078: [SPARK-34762][BUILD][FOLLOWUP] Remove the workaround for SPARK-34762

2021-04-07 Thread GitBox


HyukjinKwon commented on pull request #32078:
URL: https://github.com/apache/spark/pull/32078#issuecomment-815416885


   Thanks guys!





[GitHub] [spark] dongjoon-hyun commented on pull request #32079: [SPARK-34970][SQL][SERCURITY][3.1] Redact map-type options in the output of explain()

2021-04-07 Thread GitBox


dongjoon-hyun commented on pull request #32079:
URL: https://github.com/apache/spark/pull/32079#issuecomment-815416368


   Thank you!





[GitHub] [spark] SparkQA commented on pull request #32085: [SPARK-34970][3.0][SQL][SERCURITY] Redact map-type options in the output of explain()

2021-04-07 Thread GitBox


SparkQA commented on pull request #32085:
URL: https://github.com/apache/spark/pull/32085#issuecomment-815415610


   **[Test build #137052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137052/testReport)** for PR 32085 at commit [`f69b7e2`](https://github.com/apache/spark/commit/f69b7e291f188962b72f5411bc9aeed0c7dcfddf).





[GitHub] [spark] SparkQA commented on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.

2021-04-07 Thread GitBox


SparkQA commented on pull request #32032:
URL: https://github.com/apache/spark/pull/32032#issuecomment-815415395


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41621/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics

2021-04-07 Thread GitBox


AmplabJenkins removed a comment on pull request #30144:
URL: https://github.com/apache/spark/pull/30144#issuecomment-815398553


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41624/
   





[GitHub] [spark] AmplabJenkins commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-815415003


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41623/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-07 Thread GitBox


AmplabJenkins removed a comment on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-815415003


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41623/
   





[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-07 Thread GitBox


SparkQA commented on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-815414975









[GitHub] [spark] SparkQA commented on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause

2021-04-07 Thread GitBox


SparkQA commented on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-815414900


   **[Test build #137051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137051/testReport)** for PR 29087 at commit [`1278705`](https://github.com/apache/spark/commit/12787053aec9d015506d5c59c58e91dd23d5bb82).





[GitHub] [spark] SparkQA commented on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics

2021-04-07 Thread GitBox


SparkQA commented on pull request #30144:
URL: https://github.com/apache/spark/pull/30144#issuecomment-815414821


   **[Test build #137050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137050/testReport)** for PR 30144 at commit [`f67242d`](https://github.com/apache/spark/commit/f67242dd67d07ea7fbbe998943fcfe9461c6ada5).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32037: [SPARK-34944][SQL][TESTS] Replace bigint with int for web_returns and store_returns in TPCDS tests to employ correct data type

2021-04-07 Thread GitBox


AmplabJenkins removed a comment on pull request #32037:
URL: https://github.com/apache/spark/pull/32037#issuecomment-815414560


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41620/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32083: [WIP][SPARK-34886][PYTHON] Port/integrate Koalas DataFrame unit test into PySpark

2021-04-07 Thread GitBox


AmplabJenkins removed a comment on pull request #32083:
URL: https://github.com/apache/spark/pull/32083#issuecomment-815396462


   Can one of the admins verify this patch?





[GitHub] [spark] gengliangwang commented on pull request #32085: [SPARK-34970][3.0][SQL][SERCURITY] Redact map-type options in the output of explain()

2021-04-07 Thread GitBox


gengliangwang commented on pull request #32085:
URL: https://github.com/apache/spark/pull/32085#issuecomment-815414590


   This is to backport https://github.com/apache/spark/pull/32066 to branch-3.0





[GitHub] [spark] AmplabJenkins commented on pull request #32037: [SPARK-34944][SQL][TESTS] Replace bigint with int for web_returns and store_returns in TPCDS tests to employ correct data type

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #32037:
URL: https://github.com/apache/spark/pull/32037#issuecomment-815414560


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41620/
   





[GitHub] [spark] SparkQA commented on pull request #32037: [SPARK-34944][SQL][TESTS] Replace bigint with int for web_returns and store_returns in TPCDS tests to employ correct data type

2021-04-07 Thread GitBox


SparkQA commented on pull request #32037:
URL: https://github.com/apache/spark/pull/32037#issuecomment-815414539









[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union

2021-04-07 Thread GitBox


SparkQA commented on pull request #32084:
URL: https://github.com/apache/spark/pull/32084#issuecomment-815414403


   **[Test build #137048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137048/testReport)** for PR 32084 at commit [`e4bae2b`](https://github.com/apache/spark/commit/e4bae2bf77fa088d41edd02e9155a108fab36c34).





[GitHub] [spark] SparkQA commented on pull request #32083: [WIP][SPARK-34886][PYTHON] Port/integrate Koalas DataFrame unit test into PySpark

2021-04-07 Thread GitBox


SparkQA commented on pull request #32083:
URL: https://github.com/apache/spark/pull/32083#issuecomment-815414458


   **[Test build #137049 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137049/testReport)** for PR 32083 at commit [`3b924c0`](https://github.com/apache/spark/commit/3b924c01cc2e329ede64725a4aca9ffd1f37f44e).





[GitHub] [spark] gengliangwang opened a new pull request #32085: [SPARK-34970][SQL][SERCURITY] Redact map-type options in the output of explain()

2021-04-07 Thread GitBox


gengliangwang opened a new pull request #32085:
URL: https://github.com/apache/spark/pull/32085


   
   
   ### What changes were proposed in this pull request?
   
   The `explain()` method prints the arguments of tree nodes in logical/physical plans. The arguments could contain a map-type option that contains sensitive data.
   We should redact map-type options in the output of `explain()`. Otherwise, we will see sensitive data in the explain output or the Spark UI.
   
![image](https://user-images.githubusercontent.com/1097932/113719178-326ffb00-96a2-11eb-8a2c-28fca3e72941.png)
   
   
   ### Why are the changes needed?
   
   Data security.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, redact the map-type options in the output of `explain()`
   
   ### How was this patch tested?
   
   Unit tests
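
   The redaction idea described above can be illustrated with a minimal, self-contained sketch. It does not use Spark's actual redaction code path: the object name `RedactMapOptionsSketch`, the key pattern, and the placeholder string are all assumptions made for illustration only.

```scala
import scala.util.matching.Regex

// Illustrative sketch only; not the code changed in this PR.
object RedactMapOptionsSketch {
  // Hypothetical pattern for keys treated as sensitive (an assumption).
  val sensitiveKey: Regex = "(?i)secret|password|token".r

  // Hypothetical replacement text for redacted values (an assumption).
  val placeholder: String = "*********(redacted)"

  // Replace the value of every entry whose key matches the pattern,
  // leaving all other entries unchanged.
  def redact(options: Map[String, String]): Map[String, String] =
    options.map { case (key, value) =>
      if (sensitiveKey.findFirstIn(key).isDefined) key -> placeholder
      else key -> value
    }

  def main(args: Array[String]): Unit = {
    val options = Map(
      "url" -> "jdbc:postgresql://host/db",
      "password" -> "hunter2")
    // The password value is replaced before the map is rendered.
    println(redact(options))
  }
}
```

   Applying the redaction before the options map is converted to a string means the sensitive value never reaches the rendered plan text, which is the property the PR description asks for.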





[GitHub] [spark] AmplabJenkins removed a comment on pull request #30334: [SPARK-33411][SQL] Cardinality estimation of union, sort and range operator

2021-04-07 Thread GitBox


AmplabJenkins removed a comment on pull request #30334:
URL: https://github.com/apache/spark/pull/30334#issuecomment-815413565


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41622/
   





[GitHub] [spark] SparkQA commented on pull request #30334: [SPARK-33411][SQL] Cardinality estimation of union, sort and range operator

2021-04-07 Thread GitBox


SparkQA commented on pull request #30334:
URL: https://github.com/apache/spark/pull/30334#issuecomment-815413557


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41622/
   





[GitHub] [spark] AmplabJenkins commented on pull request #30334: [SPARK-33411][SQL] Cardinality estimation of union, sort and range operator

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #30334:
URL: https://github.com/apache/spark/pull/30334#issuecomment-815413565


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41622/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

2021-04-07 Thread GitBox


AmplabJenkins removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815413212


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137035/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32082: [SPARK-34981][SQL] Implement V2 function resolution and evaluation

2021-04-07 Thread GitBox


AmplabJenkins removed a comment on pull request #32082:
URL: https://github.com/apache/spark/pull/32082#issuecomment-815413210


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137039/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32030: [WIP] Improve the performance of mapChildren and withNewChildren methods

2021-04-07 Thread GitBox


AmplabJenkins removed a comment on pull request #32030:
URL: https://github.com/apache/spark/pull/32030#issuecomment-815413211


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137038/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32030: [WIP] Improve the performance of mapChildren and withNewChildren methods

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #32030:
URL: https://github.com/apache/spark/pull/32030#issuecomment-815413211


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137038/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32082: [SPARK-34981][SQL] Implement V2 function resolution and evaluation

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #32082:
URL: https://github.com/apache/spark/pull/32082#issuecomment-815413210


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137039/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815413212


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137035/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32030: [WIP] Improve the performance of mapChildren and withNewChildren methods

2021-04-07 Thread GitBox


SparkQA removed a comment on pull request #32030:
URL: https://github.com/apache/spark/pull/32030#issuecomment-815353635


   **[Test build #137038 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137038/testReport)**
 for PR 32030 at commit 
[`1d42fb7`](https://github.com/apache/spark/commit/1d42fb7024b44f3f3debbacf64e19e1c6f61d47a).





[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

2021-04-07 Thread GitBox


SparkQA removed a comment on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815299424


   **[Test build #137035 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137035/testReport)**
 for PR 31666 at commit 
[`c7c3df6`](https://github.com/apache/spark/commit/c7c3df63592ad31c61e5a4c684a8d9aa3906f5e5).





[GitHub] [spark] SparkQA removed a comment on pull request #32082: [SPARK-34981][SQL] Implement V2 function resolution and evaluation

2021-04-07 Thread GitBox


SparkQA removed a comment on pull request #32082:
URL: https://github.com/apache/spark/pull/32082#issuecomment-815355595


   **[Test build #137039 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137039/testReport)**
 for PR 32082 at commit 
[`1ae5520`](https://github.com/apache/spark/commit/1ae552090d91a2390eb6a8fb2d4908e264ea4771).





[GitHub] [spark] SparkQA commented on pull request #32082: [SPARK-34981][SQL] Implement V2 function resolution and evaluation

2021-04-07 Thread GitBox


SparkQA commented on pull request #32082:
URL: https://github.com/apache/spark/pull/32082#issuecomment-815409002


   **[Test build #137039 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137039/testReport)**
 for PR 32082 at commit 
[`1ae5520`](https://github.com/apache/spark/commit/1ae552090d91a2390eb6a8fb2d4908e264ea4771).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] ulysses-you opened a new pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union

2021-04-07 Thread GitBox


ulysses-you opened a new pull request #32084:
URL: https://github.com/apache/spark/pull/32084


   
   
   ### What changes were proposed in this pull request?
   
   Coalesce children if plan is `Union`.
   
   ### Why are the changes needed?
   
   The rule `CoalesceShufflePartitions` can only coalesce partitions if
   * every leaf node is a ShuffleQueryStage
   * all shuffles have the same partition number
   
   A `Union` can break these assumptions. Consider a plan such as:
   ```
   Union
     HashAggregate
       ShuffleQueryStage
     FileScan
   ```
   `CoalesceShufflePartitions` cannot optimize it, and the resulting partition 
number would be `shuffle partition + FileScan partition`, which can be quite large.
   
   It's better to support partial optimization under `Union`.
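
To make the partition arithmetic concrete, here is a minimal sketch in plain Scala. It assumes that a union's output partition count is the sum of its children's counts, as the description above states. The `Plan` types, `outputPartitions`, `coalesceShuffles`, and the fixed `target` are hypothetical stand-ins for illustration, not Spark's physical plan classes or the actual rule.

```scala
// Toy model only: these case classes are hypothetical stand-ins, not Spark's plan nodes.
object UnionCoalesceSketch {
  sealed trait Plan
  final case class ShuffleStage(partitions: Int) extends Plan
  final case class FileScan(partitions: Int) extends Plan
  final case class HashAggregate(child: Plan) extends Plan
  final case class Union(children: Seq[Plan]) extends Plan

  // A union concatenates its children's partitions, so the counts add up.
  def outputPartitions(plan: Plan): Int = plan match {
    case ShuffleStage(n)      => n
    case FileScan(n)          => n
    case HashAggregate(child) => outputPartitions(child)
    case Union(children)      => children.map(outputPartitions).sum
  }

  // Coalesce only the shuffle stages and leave scans untouched.
  // A fixed `target` is an assumption made here to keep the sketch small;
  // the real rule picks partition counts from shuffle statistics.
  def coalesceShuffles(plan: Plan, target: Int): Plan = plan match {
    case ShuffleStage(n)      => ShuffleStage(math.min(n, target))
    case HashAggregate(child) => HashAggregate(coalesceShuffles(child, target))
    case Union(children)      => Union(children.map(coalesceShuffles(_, target)))
    case scan: FileScan       => scan
  }

  def main(args: Array[String]): Unit = {
    val plan = Union(Seq(HashAggregate(ShuffleStage(200)), FileScan(50)))
    println(outputPartitions(plan))                       // 250
    println(outputPartitions(coalesceShuffles(plan, 10))) // 60
  }
}
```

In this toy plan only the shuffle branch shrinks, which is the "partial" optimization the description asks for.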
   
   ### Does this PR introduce _any_ user-facing change?
   
   Probably yes, the resulting partition number might change.
   
   ### How was this patch tested?
   
   Add test.





[GitHub] [spark] SparkQA commented on pull request #32030: [WIP] Improve the performance of mapChildren and withNewChildren methods

2021-04-07 Thread GitBox


SparkQA commented on pull request #32030:
URL: https://github.com/apache/spark/pull/32030#issuecomment-815406859


   **[Test build #137038 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137038/testReport)**
 for PR 32030 at commit 
[`1d42fb7`](https://github.com/apache/spark/commit/1d42fb7024b44f3f3debbacf64e19e1c6f61d47a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

2021-04-07 Thread GitBox


SparkQA commented on pull request #31666:
URL: https://github.com/apache/spark/pull/31666#issuecomment-815405904


   **[Test build #137035 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137035/testReport)**
 for PR 31666 at commit 
[`c7c3df6`](https://github.com/apache/spark/commit/c7c3df63592ad31c61e5a4c684a8d9aa3906f5e5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] maropu commented on a change in pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-07 Thread GitBox


maropu commented on a change in pull request #30145:
URL: https://github.com/apache/spark/pull/30145#discussion_r609224510



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1788,16 +1788,30 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   // Replace the index with the corresponding expression in 
aggregateExpressions. The index is
   // a 1-base position of aggregateExpressions, which is output columns 
(select expression)
   case Aggregate(groups, aggs, child) if aggs.forall(_.resolved) &&
-groups.exists(_.isInstanceOf[UnresolvedOrdinal]) =>
-val newGroups = groups.map {
-  case u @ UnresolvedOrdinal(index) if index > 0 && index <= aggs.size 
=>
-aggs(index - 1)
-  case ordinal @ UnresolvedOrdinal(index) =>
-throw QueryCompilationErrors.groupByPositionRangeError(index, 
aggs.size, ordinal)
-  case o => o
-}
+groups.exists(containUnresolvedOrdinal) =>
+val newGroups = groups.map((resolveGroupByExpressionOrdinal(_, aggs)))
 Aggregate(newGroups, aggs, child)
 }
+
+private def containUnresolvedOrdinal(e: Expression): Boolean = e match {
+  case _: UnresolvedOrdinal => true
+  case gs: BaseGroupingSets => gs.children.exists(containUnresolvedOrdinal)
+  case _ => false
+}
+
+private def resolveGroupByExpressionOrdinal(
+expr: Expression,
+aggs: Seq[Expression]): Expression = expr match {
+  case ordinal @ UnresolvedOrdinal(index) =>
+if (index > 0 && index <= aggs.size) {
+  aggs(index - 1)
+} else {
+  throw QueryCompilationErrors.groupByPositionRangeError(index, 
aggs.size, ordinal)

Review comment:
   Could you add tests for this code path?
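
As a rough illustration of what such a test would exercise, here is a self-contained sketch of the range check in plain Scala. The `Expr` types below are hypothetical stand-ins, not Catalyst's classes; the exception type and message are placeholders for `QueryCompilationErrors.groupByPositionRangeError`; and the recursion into grouping sets inside `resolveOrdinal` is an assumption, since the quoted diff is cut off before that point.

```scala
// Hypothetical stand-ins for Catalyst expressions; not the real Spark classes.
object GroupByOrdinalSketch {
  sealed trait Expr
  final case class Column(name: String) extends Expr
  final case class UnresolvedOrdinal(index: Int) extends Expr
  final case class GroupingSets(children: Seq[Expr]) extends Expr

  // Mirrors containUnresolvedOrdinal from the diff: look inside grouping sets too.
  def containsUnresolvedOrdinal(e: Expr): Boolean = e match {
    case _: UnresolvedOrdinal   => true
    case GroupingSets(children) => children.exists(containsUnresolvedOrdinal)
    case _                      => false
  }

  // Ordinals are 1-based positions in the select list.
  def resolveOrdinal(expr: Expr, aggs: Seq[Expr]): Expr = expr match {
    case UnresolvedOrdinal(index) =>
      if (index > 0 && index <= aggs.size) {
        aggs(index - 1)
      } else {
        // Placeholder error; the real code throws a query compilation error.
        throw new IllegalArgumentException(
          s"GROUP BY position $index is not in select list (valid range is [1, ${aggs.size}])")
      }
    // Assumption: ordinals nested in grouping sets are resolved recursively.
    case GroupingSets(children) => GroupingSets(children.map(resolveOrdinal(_, aggs)))
    case other                  => other
  }

  def main(args: Array[String]): Unit = {
    val aggs = Seq(Column("a"), Column("b"))
    val groups = Seq(GroupingSets(Seq(UnresolvedOrdinal(1), UnresolvedOrdinal(2))))
    println(groups.map(resolveOrdinal(_, aggs)))
    // The out-of-range branch is the code path the review asks to cover.
    println(scala.util.Try(resolveOrdinal(UnresolvedOrdinal(3), aggs)).isFailure)
  }
}
```

A test for the error branch would pass an ordinal of 0 or one larger than the select list, both at the top level and nested inside a grouping set.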







[GitHub] [spark] maropu commented on a change in pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-07 Thread GitBox


maropu commented on a change in pull request #30145:
URL: https://github.com/apache/spark/pull/30145#discussion_r609223977



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1788,16 +1788,30 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   // Replace the index with the corresponding expression in 
aggregateExpressions. The index is
   // a 1-base position of aggregateExpressions, which is output columns 
(select expression)
   case Aggregate(groups, aggs, child) if aggs.forall(_.resolved) &&
-groups.exists(_.isInstanceOf[UnresolvedOrdinal]) =>
-val newGroups = groups.map {
-  case u @ UnresolvedOrdinal(index) if index > 0 && index <= aggs.size 
=>
-aggs(index - 1)
-  case ordinal @ UnresolvedOrdinal(index) =>
-throw QueryCompilationErrors.groupByPositionRangeError(index, 
aggs.size, ordinal)
-  case o => o
-}
+groups.exists(containUnresolvedOrdinal) =>
+val newGroups = groups.map((resolveGroupByExpressionOrdinal(_, aggs)))

Review comment:
   `((resolveGroupByExpressionOrdinal(_, aggs)))` -> 
`(resolveGroupByExpressionOrdinal(_, aggs))`
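
For reference, a small sketch of why the extra parentheses are redundant: with Scala's placeholder syntax, both forms should expand to the same anonymous function. `addOffset` and the two wrapper methods are hypothetical helpers used only for illustration.

```scala
object RedundantParensSketch {
  // Hypothetical two-argument helper standing in for resolveGroupByExpressionOrdinal.
  def addOffset(value: Int, offset: Int): Int = value + offset

  // Doubled parentheses, as in the original diff.
  def withExtraParens(values: Seq[Int]): Seq[Int] = values.map((addOffset(_, 10)))

  // Single parentheses, as the review suggests.
  def withoutExtraParens(values: Seq[Int]): Seq[Int] = values.map(addOffset(_, 10))

  def main(args: Array[String]): Unit = {
    val values = Seq(1, 2, 3)
    println(withExtraParens(values) == withoutExtraParens(values))
  }
}
```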







[GitHub] [spark] gengliangwang commented on pull request #32079: [SPARK-34970][SQL][SERCURITY][3.1] Redact map-type options in the output of explain()

2021-04-07 Thread GitBox


gengliangwang commented on pull request #32079:
URL: https://github.com/apache/spark/pull/32079#issuecomment-815402168


   @dongjoon-hyun Sure, I will backport to 3.0 and update the JIRA





[GitHub] [spark] ueshin commented on pull request #32083: [WIP][SPARK-34886][PYTHON] Port/integrate Koalas DataFrame unit test into PySpark

2021-04-07 Thread GitBox


ueshin commented on pull request #32083:
URL: https://github.com/apache/spark/pull/32083#issuecomment-815401615


   ok to test.





[GitHub] [spark] maropu commented on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause

2021-04-07 Thread GitBox


maropu commented on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-815400618


   retest this please





[GitHub] [spark] AmplabJenkins commented on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #30144:
URL: https://github.com/apache/spark/pull/30144#issuecomment-815398553


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41624/
   





[GitHub] [spark] SparkQA commented on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics

2021-04-07 Thread GitBox


SparkQA commented on pull request #30144:
URL: https://github.com/apache/spark/pull/30144#issuecomment-815398540


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41624/
   





[GitHub] [spark] SparkQA commented on pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-04-07 Thread GitBox


SparkQA commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-815398569


   **[Test build #137047 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137047/testReport)**
 for PR 31517 at commit 
[`f61b041`](https://github.com/apache/spark/commit/f61b0410491f6cdc75bdf51dfc13857a6cd5b65a).





[GitHub] [spark] SparkQA removed a comment on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics

2021-04-07 Thread GitBox


SparkQA removed a comment on pull request #30144:
URL: https://github.com/apache/spark/pull/30144#issuecomment-815395423


   **[Test build #137045 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137045/testReport)**
 for PR 30144 at commit 
[`3d6e2c4`](https://github.com/apache/spark/commit/3d6e2c475747c5c3eb981aee8faa504b5af7df59).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics

2021-04-07 Thread GitBox


AmplabJenkins removed a comment on pull request #30144:
URL: https://github.com/apache/spark/pull/30144#issuecomment-815397425


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137045/
   





[GitHub] [spark] AmplabJenkins commented on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #30144:
URL: https://github.com/apache/spark/pull/30144#issuecomment-815397425


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137045/
   





[GitHub] [spark] SparkQA commented on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics

2021-04-07 Thread GitBox


SparkQA commented on pull request #30144:
URL: https://github.com/apache/spark/pull/30144#issuecomment-815397407


   **[Test build #137045 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137045/testReport)**
 for PR 30144 at commit 
[`3d6e2c4`](https://github.com/apache/spark/commit/3d6e2c475747c5c3eb981aee8faa504b5af7df59).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class SQLProcessor(object):`
 * `trait FunctionRegistryBase[T] `
 * `trait SimpleFunctionRegistryBase[T] extends FunctionRegistryBase[T] 
with Logging `
 * `trait EmptyFunctionRegistryBase[T] extends FunctionRegistryBase[T] `
 * `trait FunctionRegistry extends FunctionRegistryBase[Expression] `
 * `trait TableFunctionRegistry extends FunctionRegistryBase[LogicalPlan] `
 * `class NoSuchFunctionException(`
 * `case class ResolveTableValuedFunctions(catalog: SessionCatalog) extends 
Rule[LogicalPlan] `
 * `abstract class QuaternaryExpression extends Expression with 
QuaternaryLike[Expression] `
 * `abstract class Covariance(val left: Expression, val right: Expression, 
nullOnDivideByZero: Boolean)`
 * `trait BaseGroupingSets extends Expression with CodegenFallback `
 * `case class Cube(`
 * `trait SimpleHigherOrderFunction extends HigherOrderFunction with 
BinaryLike[Expression] `
 * `trait QuaternaryLike[T <: TreeNode[T]] `
 * `trait DataWritingCommand extends UnaryCommand `
 * `trait RunnableCommand extends Command `
 * `trait BaseCacheTableExec extends LeafV2CommandExec `
 * `sealed trait V1FallbackWriters extends LeafV2CommandExec with 
SupportsV1Write `





[GitHub] [spark] AmplabJenkins commented on pull request #32083: [WIP] [SPARK-34886] Port/integrate Koalas DataFrame unit test into PySpark

2021-04-07 Thread GitBox


AmplabJenkins commented on pull request #32083:
URL: https://github.com/apache/spark/pull/32083#issuecomment-815396462


   Can one of the admins verify this patch?





[GitHub] [spark] SparkQA commented on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.

2021-04-07 Thread GitBox


SparkQA commented on pull request #32032:
URL: https://github.com/apache/spark/pull/32032#issuecomment-815396136


   **[Test build #137046 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137046/testReport)**
 for PR 32032 at commit 
[`b78cfdb`](https://github.com/apache/spark/commit/b78cfdb896d8ae0d6da8e96631749ad47fa35623).





[GitHub] [spark] imback82 commented on a change in pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.

2021-04-07 Thread GitBox


imback82 commented on a change in pull request #32032:
URL: https://github.com/apache/spark/pull/32032#discussion_r609213767



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CacheTableExec.scala
##
@@ -94,19 +94,19 @@ case class CacheTableAsSelectExec(
   override lazy val relationName: String = tempViewName
 
   override lazy val planToCache: LogicalPlan = {
-Dataset.ofRows(sparkSession,
-  CreateViewCommand(
-name = TableIdentifier(tempViewName),
-userSpecifiedColumns = Nil,
-comment = None,
-properties = Map.empty,
-originalText = Some(originalText),
-child = query,
-allowExisting = false,
-replace = false,
-viewType = LocalTempView
-  )
-)
+CreateViewCommand(
+  name = TableIdentifier(tempViewName),
+  userSpecifiedColumns = Nil,
+  comment = None,
+  properties = Map.empty,
+  originalText = Some(originalText),
+  plan = query,
+  allowExisting = false,
+  replace = false,
+  viewType = LocalTempView,
+  isAnalyzed = true
+).run(sparkSession)

Review comment:
   We can just call `run` now that the `plan` is analyzed; no need to go 
through `Dataset.ofRows`.







[GitHub] [spark] xinrong-databricks opened a new pull request #32083: [WIP] [SPARK-34886] Port/integrate Koalas DataFrame unit test into PySpark

2021-04-07 Thread GitBox


xinrong-databricks opened a new pull request #32083:
URL: https://github.com/apache/spark/pull/32083


   ### What changes were proposed in this pull request?
   Now that we merged the Koalas main code into the PySpark code base (#32036), 
we should port the Koalas DataFrame unit test to PySpark.
   
   ### Why are the changes needed?
   Currently, the pandas-on-Spark modules are not tested at all. We should 
enable the DataFrame unit test first.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Enable the DataFrame unit test.
   





[GitHub] [spark] yaooqinn commented on a change in pull request #32037: [SPARK-34944][SQL][TESTS] Replace bigint with int for web_returns and store_returns in TPCDS tests to employ correct data type

2021-04-07 Thread GitBox


yaooqinn commented on a change in pull request #32037:
URL: https://github.com/apache/spark/pull/32037#discussion_r609213392



##
File path: sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala
##
@@ -21,6 +21,49 @@ import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSparkSession
 
+
+/**
+ * Base trait for TPC-DS related tests.
+ *
+ * Datatype mapping for TPC-DS and Spark SQL, see more at:
+ *   http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.9.0.pdf
+ *
+ *|---|---|
+ *|TPC-DS |  Spark  SQL   |
+ *|---|---|
+ *|  Identifier   |  INT  |
+ *|---|---|
+ *|Integer|  INT  |

Review comment:
   > After this PR, the field schemas are now consistent with those DDLs in 
the `tpcds.sql` from tpc-ds tool kit, see 
https://gist.github.com/yaooqinn/b9978a77bbf4f871a95d6a9103019907
   
   removed, according to this line







[GitHub] [spark] SparkQA commented on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics

2021-04-07 Thread GitBox


SparkQA commented on pull request #30144:
URL: https://github.com/apache/spark/pull/30144#issuecomment-815395423


   **[Test build #137045 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137045/testReport)**
 for PR 30144 at commit 
[`3d6e2c4`](https://github.com/apache/spark/commit/3d6e2c475747c5c3eb981aee8faa504b5af7df59).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.

2021-04-07 Thread GitBox


AmplabJenkins removed a comment on pull request #32032:
URL: https://github.com/apache/spark/pull/32032#issuecomment-814862152


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136992/
   




