[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
AngersZhuuuu commented on a change in pull request #30145:
URL: https://github.com/apache/spark/pull/30145#discussion_r609329246

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ##

@@ -1788,16 +1788,30 @@ class Analyzer(override val catalogManager: CatalogManager)
       // Replace the index with the corresponding expression in aggregateExpressions. The index is
       // a 1-base position of aggregateExpressions, which is output columns (select expression)
       case Aggregate(groups, aggs, child) if aggs.forall(_.resolved) &&
-        groups.exists(_.isInstanceOf[UnresolvedOrdinal]) =>
-        val newGroups = groups.map {
-          case u @ UnresolvedOrdinal(index) if index > 0 && index <= aggs.size =>
-            aggs(index - 1)
-          case ordinal @ UnresolvedOrdinal(index) =>
-            throw QueryCompilationErrors.groupByPositionRangeError(index, aggs.size, ordinal)
-          case o => o
-        }
+        groups.exists(containUnresolvedOrdinal) =>
+        val newGroups = groups.map((resolveGroupByExpressionOrdinal(_, aggs)))
         Aggregate(newGroups, aggs, child)
     }
+
+    private def containUnresolvedOrdinal(e: Expression): Boolean = e match {
+      case _: UnresolvedOrdinal => true
+      case gs: BaseGroupingSets => gs.children.exists(containUnresolvedOrdinal)
+      case _ => false
+    }
+
+    private def resolveGroupByExpressionOrdinal(
+        expr: Expression,
+        aggs: Seq[Expression]): Expression = expr match {
+      case ordinal @ UnresolvedOrdinal(index) =>
+        if (index > 0 && index <= aggs.size) {
+          aggs(index - 1)
+        } else {
+          throw QueryCompilationErrors.groupByPositionRangeError(index, aggs.size, ordinal)

Review comment:
> Could you add tests for this code path?

Yea

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ##

@@ -1788,16 +1788,30 @@ class Analyzer(override val catalogManager: CatalogManager)
       // Replace the index with the corresponding expression in aggregateExpressions. The index is
       // a 1-base position of aggregateExpressions, which is output columns (select expression)
       case Aggregate(groups, aggs, child) if aggs.forall(_.resolved) &&
-        groups.exists(_.isInstanceOf[UnresolvedOrdinal]) =>
-        val newGroups = groups.map {
-          case u @ UnresolvedOrdinal(index) if index > 0 && index <= aggs.size =>
-            aggs(index - 1)
-          case ordinal @ UnresolvedOrdinal(index) =>
-            throw QueryCompilationErrors.groupByPositionRangeError(index, aggs.size, ordinal)
-          case o => o
-        }
+        groups.exists(containUnresolvedOrdinal) =>
+        val newGroups = groups.map((resolveGroupByExpressionOrdinal(_, aggs)))

Review comment:
> `((resolveGroupByExpressionOrdinal(_, aggs)))` -> `(resolveGroupByExpressionOrdinal(_, aggs))`

Done

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
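The ordinal resolution discussed in the review above can be sketched in isolation. This is a minimal, self-contained illustration and not Spark's code: `Expr`, `Column`, `Ordinal`, `GroupingSets`, and `OrdinalSketch` are hypothetical stand-ins for Catalyst's expression classes, and the recursion into grouping sets is an assumption, since the quoted diff is cut off before that case. The error message is also made up for the example.

```scala
// Toy expression model. These types are illustrative stand-ins, not Catalyst classes.
sealed trait Expr
final case class Column(name: String) extends Expr
final case class Ordinal(index: Int) extends Expr
// Stand-in for CUBE / ROLLUP / GROUPING SETS, which wrap child expressions.
final case class GroupingSets(children: Seq[Expr]) extends Expr

object OrdinalSketch {
  // True if the expression is an ordinal or a grouping set that directly contains one.
  def containsOrdinal(e: Expr): Boolean = e match {
    case _: Ordinal             => true
    case GroupingSets(children) => children.exists(containsOrdinal)
    case _                      => false
  }

  // Replace a 1-based ordinal with the matching select-list expression.
  def resolveOrdinal(e: Expr, aggs: Seq[Expr]): Expr = e match {
    case Ordinal(i) if i > 0 && i <= aggs.size =>
      aggs(i - 1)
    case Ordinal(i) =>
      // Hypothetical message; Spark raises its own error type here.
      throw new IllegalArgumentException(
        s"GROUP BY position $i is outside the range [1, ${aggs.size}]")
    case GroupingSets(children) =>
      // Assumed behaviour: resolve ordinals nested inside the grouping set.
      GroupingSets(children.map(resolveOrdinal(_, aggs)))
    case other =>
      other
  }
}
```

With a select list of `Seq(Column("a"), Column("b"))`, `Ordinal(2)` resolves to `Column("b")`, and an ordinal of 3 takes the out-of-range path that the reviewer asked to have covered by tests.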
[GitHub] [spark] SparkQA commented on pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate
SparkQA commented on pull request #32054: URL: https://github.com/apache/spark/pull/32054#issuecomment-815466094 **[Test build #137040 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137040/testReport)** for PR 32054 at commit [`34c7dd6`](https://github.com/apache/spark/commit/34c7dd645942105cbbc5bf5cd08ba19b53d3b0aa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate
SparkQA commented on pull request #32054: URL: https://github.com/apache/spark/pull/32054#issuecomment-815462255
[GitHub] [spark] SparkQA commented on pull request #32082: [SPARK-34981][SQL] Implement V2 function resolution and evaluation
SparkQA commented on pull request #32082: URL: https://github.com/apache/spark/pull/32082#issuecomment-815460314 **[Test build #137041 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137041/testReport)** for PR 32082 at commit [`c522276`](https://github.com/apache/spark/commit/c522276bf1f051af6200c17bdf51a4aa0f565b0a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache
SparkQA commented on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-815459515 **[Test build #137047 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137047/testReport)** for PR 31517 at commit [`f61b041`](https://github.com/apache/spark/commit/f61b0410491f6cdc75bdf51dfc13857a6cd5b65a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted
AmplabJenkins commented on pull request #32074: URL: https://github.com/apache/spark/pull/32074#issuecomment-815457683 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41632/
[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted
SparkQA commented on pull request #32074: URL: https://github.com/apache/spark/pull/32074#issuecomment-815457653
[GitHub] [spark] SparkQA commented on pull request #32086: [DO-NOT-MERGE] Increase the number of retries
SparkQA commented on pull request #32086: URL: https://github.com/apache/spark/pull/32086#issuecomment-815455277
[GitHub] [spark] AmplabJenkins commented on pull request #32086: [DO-NOT-MERGE] Increase the number of retries
AmplabJenkins commented on pull request #32086: URL: https://github.com/apache/spark/pull/32086#issuecomment-815455305 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41631/
[GitHub] [spark] AmplabJenkins commented on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.
AmplabJenkins commented on pull request #32032: URL: https://github.com/apache/spark/pull/32032#issuecomment-815454994 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137046/
[GitHub] [spark] SparkQA commented on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.
SparkQA commented on pull request #32032: URL: https://github.com/apache/spark/pull/32032#issuecomment-815454110 **[Test build #137046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137046/testReport)** for PR 32032 at commit [`b78cfdb`](https://github.com/apache/spark/commit/b78cfdb896d8ae0d6da8e96631749ad47fa35623). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN
SparkQA commented on pull request #31666: URL: https://github.com/apache/spark/pull/31666#issuecomment-815453184 **[Test build #137058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137058/testReport)** for PR 31666 at commit [`b1bf28d`](https://github.com/apache/spark/commit/b1bf28d9ca413cf2dee2957e5d6d8eeb4c9a4f6e).
[GitHub] [spark] SparkQA commented on pull request #32086: [DO-NOT-MERGE] Increase the number of retries
SparkQA commented on pull request #32086: URL: https://github.com/apache/spark/pull/32086#issuecomment-815452977 **[Test build #137057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137057/testReport)** for PR 32086 at commit [`d274e25`](https://github.com/apache/spark/commit/d274e2501425b1b63161a78d7dbfe70c9fdef4c4).
[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted
AmplabJenkins commented on pull request #32074: URL: https://github.com/apache/spark/pull/32074#issuecomment-815451838 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137053/
[GitHub] [spark] AmplabJenkins commented on pull request #32083: [WIP][SPARK-34886][PYTHON] Port/integrate Koalas DataFrame unit test into PySpark
AmplabJenkins commented on pull request #32083: URL: https://github.com/apache/spark/pull/32083#issuecomment-815451836 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41627/
[GitHub] [spark] AmplabJenkins commented on pull request #32085: [SPARK-34970][3.0][SQL][SERCURITY] Redact map-type options in the output of explain()
AmplabJenkins commented on pull request #32085: URL: https://github.com/apache/spark/pull/32085#issuecomment-815451839 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41630/
[GitHub] [spark] AmplabJenkins commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union
AmplabJenkins commented on pull request #32084: URL: https://github.com/apache/spark/pull/32084#issuecomment-815451835 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41626/
[GitHub] [spark] SparkQA commented on pull request #32085: [SPARK-34970][3.0][SQL][SERCURITY] Redact map-type options in the output of explain()
SparkQA commented on pull request #32085: URL: https://github.com/apache/spark/pull/32085#issuecomment-815443094 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41630/
[GitHub] [spark] SparkQA commented on pull request #32085: [SPARK-34970][3.0][SQL][SERCURITY] Redact map-type options in the output of explain()
SparkQA commented on pull request #32085: URL: https://github.com/apache/spark/pull/32085#issuecomment-815441227 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41630/
[GitHub] [spark] KarlManong closed pull request #32080: [SPARK-34674] Spark app on k8s doesn't terminate without call to sparkContext.stop() method
KarlManong closed pull request #32080: URL: https://github.com/apache/spark/pull/32080
[GitHub] [spark] KarlManong commented on pull request #32080: [SPARK-34674] Spark app on k8s doesn't terminate without call to sparkContext.stop() method
KarlManong commented on pull request #32080: URL: https://github.com/apache/spark/pull/32080#issuecomment-815440922 > BTW, this doesn't look like [SPARK-34674](https://issues.apache.org/jira/browse/SPARK-34674) because the case in [SPARK-34674](https://issues.apache.org/jira/browse/SPARK-34674) ends without throwing `Exception`. I'd like to recommend to have a separate JIRA issue if your case is considering unhandled `Exception` from your apps. It could be a general `spark-submit` issue for all resource managers like Mesos. Yes, my mistake.
[GitHub] [spark] KarlManong commented on a change in pull request #32080: [SPARK-34674] Spark app on k8s doesn't terminate without call to sparkContext.stop() method
KarlManong commented on a change in pull request #32080:
URL: https://github.com/apache/spark/pull/32080#discussion_r609291585

## File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ##

@@ -1031,6 +1031,8 @@ object SparkSubmit extends CommandLineUtils with Logging {
     } catch {
       case e: SparkUserAppException =>
         exitFn(e.exitCode)
+      case _: Throwable =>
+        exitFn(1)

Review comment:
Yes, you are right.
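For readers unfamiliar with the pattern in the hunk above, the following is a small standalone sketch of mapping a failure to a process exit code. It is not Spark's code, and the pull request was closed, so it should not be read as Spark's behaviour. `ExitCodeSketch`, `UserAppException`, and `exitCodeFor` are hypothetical names; the sketch also swaps the diff's `case _: Throwable` for `scala.util.control.NonFatal`, which leaves fatal errors such as `OutOfMemoryError` uncaught.

```scala
import scala.util.control.NonFatal

object ExitCodeSketch {
  // Hypothetical stand-in for an exception that carries an application exit code.
  final case class UserAppException(exitCode: Int)
      extends Exception(s"user application exited with $exitCode")

  // Runs `body` and returns the exit code a launcher could pass to its exit function.
  def exitCodeFor(body: => Unit): Int =
    try {
      body
      0 // normal completion
    } catch {
      case e: UserAppException => e.exitCode // code chosen by the application
      case NonFatal(_)         => 1          // any other non-fatal failure
    }
}
```

The more specific case is listed first so that an application-supplied exit code is not masked by the generic fallback.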
[GitHub] [spark] SparkQA commented on pull request #32083: [WIP][SPARK-34886][PYTHON] Port/integrate Koalas DataFrame unit test into PySpark
SparkQA commented on pull request #32083: URL: https://github.com/apache/spark/pull/32083#issuecomment-815438615 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41627/
[GitHub] [spark] SparkQA removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted
SparkQA removed a comment on pull request #32074: URL: https://github.com/apache/spark/pull/32074#issuecomment-815419786 **[Test build #137053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137053/testReport)** for PR 32074 at commit [`b4cb445`](https://github.com/apache/spark/commit/b4cb445706daad51876ce36d6d80c50d295bac2b).
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union
SparkQA commented on pull request #32084: URL: https://github.com/apache/spark/pull/32084#issuecomment-815437826
[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted
SparkQA commented on pull request #32074: URL: https://github.com/apache/spark/pull/32074#issuecomment-815437760 **[Test build #137053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137053/testReport)** for PR 32074 at commit [`b4cb445`](https://github.com/apache/spark/commit/b4cb445706daad51876ce36d6d80c50d295bac2b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause
AmplabJenkins removed a comment on pull request #29087: URL: https://github.com/apache/spark/pull/29087#issuecomment-815435047 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41629/
[GitHub] [spark] AmplabJenkins commented on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause
AmplabJenkins commented on pull request #29087: URL: https://github.com/apache/spark/pull/29087#issuecomment-815435047 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41629/
[GitHub] [spark] SparkQA commented on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause
SparkQA commented on pull request #29087: URL: https://github.com/apache/spark/pull/29087#issuecomment-815435025
[GitHub] [spark] SparkQA commented on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.
SparkQA commented on pull request #32032: URL: https://github.com/apache/spark/pull/32032#issuecomment-815433365 **[Test build #137056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137056/testReport)** for PR 32032 at commit [`594981a`](https://github.com/apache/spark/commit/594981a643a2e3c183b56a44f00c52ad4267c517).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics
AmplabJenkins removed a comment on pull request #30144: URL: https://github.com/apache/spark/pull/30144#issuecomment-815432956 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41628/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.
AmplabJenkins removed a comment on pull request #32032: URL: https://github.com/apache/spark/pull/32032#issuecomment-815418245 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41621/
[GitHub] [spark] AmplabJenkins commented on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics
AmplabJenkins commented on pull request #30144: URL: https://github.com/apache/spark/pull/30144#issuecomment-815432956 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41628/
[GitHub] [spark] SparkQA commented on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics
SparkQA commented on pull request #30144: URL: https://github.com/apache/spark/pull/30144#issuecomment-815432933
[GitHub] [spark] SparkQA commented on pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate
SparkQA commented on pull request #32054: URL: https://github.com/apache/spark/pull/32054#issuecomment-815432387 **[Test build #137055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137055/testReport)** for PR 32054 at commit [`711f408`](https://github.com/apache/spark/commit/711f408eb1c72c3ae3956689e19474e3e8e5a45b).
[GitHub] [spark] SparkQA commented on pull request #32086: [DO-NOT-MERGE] Increase the number of retries
SparkQA commented on pull request #32086: URL: https://github.com/apache/spark/pull/32086#issuecomment-815432339 **[Test build #137054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137054/testReport)** for PR 32086 at commit [`1deb43a`](https://github.com/apache/spark/commit/1deb43a4dfa9656ea18c87796d5a230e88b4d0de).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache
AmplabJenkins removed a comment on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-815431356 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41625/
[GitHub] [spark] AmplabJenkins commented on pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache
AmplabJenkins commented on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-815431356 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41625/
[GitHub] [spark] allisonwang-db commented on a change in pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate
allisonwang-db commented on a change in pull request #32054:
URL: https://github.com/apache/spark/pull/32054#discussion_r609152666

## File path: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala
@@ -1765,4 +1765,35 @@ class SubquerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
       }
     }
   }
+
+  test("SPARK-34946: correlated scalar subquery in grouping expressions only") {

Review comment:
Agree. It would also be easier to check for all analysis errors if we have the tests in one test suite.
[GitHub] [spark] allisonwang-db commented on a change in pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate
allisonwang-db commented on a change in pull request #32054:
URL: https://github.com/apache/spark/pull/32054#discussion_r609265541

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -305,6 +305,12 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
                   s"nor is it an aggregate function. " +
                   "Add to group by or wrap in first() (or first_value) if you don't care " +
                   "which value you get.")
+              case s: ScalarSubquery
+                  if s.children.nonEmpty && !groupingExprs.exists(_.semanticEquals(s)) =>
+                failAnalysis(s"Correlated scalar subquery '${s.sql}' is neither " +
+                  s"present in the group by, nor in an aggregate function. Add it to group by " +
+                  s"using ordinal position or wrap it in first() (or first_value) if you don't " +
+                  s"care which value you get.")

Review comment:
Sounds good. Thanks for letting me know!
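The guard in the diff above can be read as: a scalar subquery that has outer references (non-empty `children`) must also appear among the grouping expressions, otherwise analysis fails. The following is a minimal sketch of that rule using toy stand-in types. `Expr`, `Column`, the simplified `ScalarSubquery`, and `CorrelatedSubqueryCheck` are hypothetical and are not Spark's Catalyst classes; plain case-class equality is assumed here in place of `semanticEquals`, and the message is shortened.

```scala
// Toy stand-ins for Catalyst expressions (assumptions, NOT Spark's classes).
sealed trait Expr { def sql: String }

final case class Column(name: String) extends Expr {
  def sql: String = name
}

// `outerRefs` plays the role of `children` in the diff: non-empty means correlated.
final case class ScalarSubquery(query: String, outerRefs: Seq[Expr]) extends Expr {
  def sql: String = s"scalarsubquery($query)"
}

object CorrelatedSubqueryCheck {
  // Returns Some(message) when a correlated scalar subquery in the select list
  // is not one of the grouping expressions; None when the expression is fine.
  def check(selectExpr: Expr, groupingExprs: Seq[Expr]): Option[String] =
    selectExpr match {
      case s: ScalarSubquery if s.outerRefs.nonEmpty && !groupingExprs.contains(s) =>
        Some(
          s"Correlated scalar subquery '${s.sql}' is neither present in the group by, " +
            "nor in an aggregate function.")
      case _ => None
    }
}
```

Under these assumptions, a correlated subquery grouped only by its outer column is rejected, while the same subquery listed as a grouping expression, or an uncorrelated subquery, passes.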
[GitHub] [spark] HyukjinKwon opened a new pull request #32086: [DO-NOT-MERGE] Increase the number of retries
HyukjinKwon opened a new pull request #32086:
URL: https://github.com/apache/spark/pull/32086

### What changes were proposed in this pull request?
See if it makes the situation better.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [spark] SparkQA commented on pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache
SparkQA commented on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-815421455
[GitHub] [spark] HyukjinKwon commented on pull request #24559: [SPARK-27658][SQL] Add FunctionCatalog API
HyukjinKwon commented on pull request #24559: URL: https://github.com/apache/spark/pull/24559#issuecomment-815420455
[GitHub] [spark] HyukjinKwon commented on pull request #32052: [SPARK-34955][SQL] ADD JAR command cannot add jar files which contains whitespaces in the path
HyukjinKwon commented on pull request #32052: URL: https://github.com/apache/spark/pull/32052#issuecomment-815420143 I agree with this change. Thanks @sarutak and @dongjoon-hyun
[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted
SparkQA commented on pull request #32074: URL: https://github.com/apache/spark/pull/32074#issuecomment-815419786 **[Test build #137053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137053/testReport)** for PR 32074 at commit [`b4cb445`](https://github.com/apache/spark/commit/b4cb445706daad51876ce36d6d80c50d295bac2b).
[GitHub] [spark] AmplabJenkins commented on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.
AmplabJenkins commented on pull request #32032: URL: https://github.com/apache/spark/pull/32032#issuecomment-815418245 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41621/
[GitHub] [spark] SparkQA commented on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.
SparkQA commented on pull request #32032: URL: https://github.com/apache/spark/pull/32032#issuecomment-815418216 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41621/
[GitHub] [spark] HyukjinKwon commented on pull request #32078: [SPARK-34762][BUILD][FOLLOWUP] Remove the workaround for SPARK-34762
HyukjinKwon commented on pull request #32078: URL: https://github.com/apache/spark/pull/32078#issuecomment-815416885 Thanks guys!
[GitHub] [spark] dongjoon-hyun commented on pull request #32079: [SPARK-34970][SQL][SERCURITY][3.1] Redact map-type options in the output of explain()
dongjoon-hyun commented on pull request #32079: URL: https://github.com/apache/spark/pull/32079#issuecomment-815416368 Thank you!
[GitHub] [spark] SparkQA commented on pull request #32085: [SPARK-34970][3.0][SQL][SERCURITY] Redact map-type options in the output of explain()
SparkQA commented on pull request #32085: URL: https://github.com/apache/spark/pull/32085#issuecomment-815415610 **[Test build #137052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137052/testReport)** for PR 32085 at commit [`f69b7e2`](https://github.com/apache/spark/commit/f69b7e291f188962b72f5411bc9aeed0c7dcfddf).
[GitHub] [spark] SparkQA commented on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.
SparkQA commented on pull request #32032: URL: https://github.com/apache/spark/pull/32032#issuecomment-815415395 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41621/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics
AmplabJenkins removed a comment on pull request #30144: URL: https://github.com/apache/spark/pull/30144#issuecomment-815398553 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41624/
[GitHub] [spark] AmplabJenkins commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
AmplabJenkins commented on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-815415003 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41623/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
AmplabJenkins removed a comment on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-815415003 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41623/
[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
SparkQA commented on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-815414975
[GitHub] [spark] SparkQA commented on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause
SparkQA commented on pull request #29087: URL: https://github.com/apache/spark/pull/29087#issuecomment-815414900 **[Test build #137051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137051/testReport)** for PR 29087 at commit [`1278705`](https://github.com/apache/spark/commit/12787053aec9d015506d5c59c58e91dd23d5bb82).
[GitHub] [spark] SparkQA commented on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics
SparkQA commented on pull request #30144: URL: https://github.com/apache/spark/pull/30144#issuecomment-815414821 **[Test build #137050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137050/testReport)** for PR 30144 at commit [`f67242d`](https://github.com/apache/spark/commit/f67242dd67d07ea7fbbe998943fcfe9461c6ada5).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32037: [SPARK-34944][SQL][TESTS] Replace bigint with int for web_returns and store_returns in TPCDS tests to employ correct data type
AmplabJenkins removed a comment on pull request #32037: URL: https://github.com/apache/spark/pull/32037#issuecomment-815414560 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41620/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32083: [WIP][SPARK-34886][PYTHON] Port/integrate Koalas DataFrame unit test into PySpark
AmplabJenkins removed a comment on pull request #32083: URL: https://github.com/apache/spark/pull/32083#issuecomment-815396462 Can one of the admins verify this patch?
[GitHub] [spark] gengliangwang commented on pull request #32085: [SPARK-34970][3.0][SQL][SERCURITY] Redact map-type options in the output of explain()
gengliangwang commented on pull request #32085: URL: https://github.com/apache/spark/pull/32085#issuecomment-815414590 This is to backport https://github.com/apache/spark/pull/32066 to branch-3.0
[GitHub] [spark] AmplabJenkins commented on pull request #32037: [SPARK-34944][SQL][TESTS] Replace bigint with int for web_returns and store_returns in TPCDS tests to employ correct data type
AmplabJenkins commented on pull request #32037: URL: https://github.com/apache/spark/pull/32037#issuecomment-815414560 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41620/
[GitHub] [spark] SparkQA commented on pull request #32037: [SPARK-34944][SQL][TESTS] Replace bigint with int for web_returns and store_returns in TPCDS tests to employ correct data type
SparkQA commented on pull request #32037: URL: https://github.com/apache/spark/pull/32037#issuecomment-815414539
[GitHub] [spark] SparkQA commented on pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union
SparkQA commented on pull request #32084: URL: https://github.com/apache/spark/pull/32084#issuecomment-815414403 **[Test build #137048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137048/testReport)** for PR 32084 at commit [`e4bae2b`](https://github.com/apache/spark/commit/e4bae2bf77fa088d41edd02e9155a108fab36c34).
[GitHub] [spark] SparkQA commented on pull request #32083: [WIP][SPARK-34886][PYTHON] Port/integrate Koalas DataFrame unit test into PySpark
SparkQA commented on pull request #32083: URL: https://github.com/apache/spark/pull/32083#issuecomment-815414458 **[Test build #137049 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137049/testReport)** for PR 32083 at commit [`3b924c0`](https://github.com/apache/spark/commit/3b924c01cc2e329ede64725a4aca9ffd1f37f44e).
[GitHub] [spark] gengliangwang opened a new pull request #32085: [SPARK-34970][SQL][SERCURITY] Redact map-type options in the output of explain()
gengliangwang opened a new pull request #32085:
URL: https://github.com/apache/spark/pull/32085

### What changes were proposed in this pull request?
The `explain()` method prints the arguments of tree nodes in logical/physical plans. The arguments could contain a map-type option that holds sensitive data. We should redact map-type options in the output of `explain()`. Otherwise, sensitive data will be visible in the explain output or the Spark UI.
![image](https://user-images.githubusercontent.com/1097932/113719178-326ffb00-96a2-11eb-8a2c-28fca3e72941.png)

### Why are the changes needed?
Data security.

### Does this PR introduce _any_ user-facing change?
Yes, redact the map-type options in the output of `explain()`

### How was this patch tested?
Unit tests
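As a rough illustration of the idea in the description above, the sketch below redacts the values of sensitive-looking keys in an options map before the map is rendered into a plan-node string. It is not the patch itself: `RedactSketch`, `redactOptions`, `describe`, the key pattern, and the replacement text are all assumptions chosen for the example, and no Spark classes are used.

```scala
import scala.util.matching.Regex

object RedactSketch {
  // Assumed placeholder text and key pattern, for illustration only.
  val Replacement: String = "*********(redacted)"
  val SensitiveKey: Regex = "(?i)secret|password|token".r

  // Replace the value of every entry whose key matches the sensitive pattern.
  def redactOptions(options: Map[String, String]): Map[String, String] =
    options.map { case (k, v) =>
      if (SensitiveKey.findFirstIn(k).isDefined) k -> Replacement else k -> v
    }

  // Render a node name with its (redacted) options, sorted by key so the
  // output is deterministic.
  def describe(nodeName: String, options: Map[String, String]): String = {
    val rendered = redactOptions(options).toSeq
      .sortBy(_._1)
      .map { case (k, v) => s"$k=$v" }
      .mkString("[", ", ", "]")
    s"$nodeName $rendered"
  }
}
```

With these assumptions, `RedactSketch.describe("Relation", Map("url" -> "jdbc:x", "password" -> "hunter2"))` renders the `url` entry unchanged and hides the `password` value. Redacting at render time, rather than mutating the stored options, keeps the real values available to the data source while hiding them from printed plans.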
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30334: [SPARK-33411][SQL] Cardinality estimation of union, sort and range operator
AmplabJenkins removed a comment on pull request #30334: URL: https://github.com/apache/spark/pull/30334#issuecomment-815413565 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41622/
[GitHub] [spark] SparkQA commented on pull request #30334: [SPARK-33411][SQL] Cardinality estimation of union, sort and range operator
SparkQA commented on pull request #30334: URL: https://github.com/apache/spark/pull/30334#issuecomment-815413557 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41622/
[GitHub] [spark] AmplabJenkins commented on pull request #30334: [SPARK-33411][SQL] Cardinality estimation of union, sort and range operator
AmplabJenkins commented on pull request #30334: URL: https://github.com/apache/spark/pull/30334#issuecomment-815413565 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41622/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN
AmplabJenkins removed a comment on pull request #31666: URL: https://github.com/apache/spark/pull/31666#issuecomment-815413212 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137035/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32082: [SPARK-34981][SQL] Implement V2 function resolution and evaluation
AmplabJenkins removed a comment on pull request #32082: URL: https://github.com/apache/spark/pull/32082#issuecomment-815413210 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137039/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32030: [WIP] Improve the performance of mapChildren and withNewChildren methods
AmplabJenkins removed a comment on pull request #32030: URL: https://github.com/apache/spark/pull/32030#issuecomment-815413211 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137038/
[GitHub] [spark] AmplabJenkins commented on pull request #32030: [WIP] Improve the performance of mapChildren and withNewChildren methods
AmplabJenkins commented on pull request #32030: URL: https://github.com/apache/spark/pull/32030#issuecomment-815413211 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137038/
[GitHub] [spark] AmplabJenkins commented on pull request #32082: [SPARK-34981][SQL] Implement V2 function resolution and evaluation
AmplabJenkins commented on pull request #32082: URL: https://github.com/apache/spark/pull/32082#issuecomment-815413210 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137039/
[GitHub] [spark] AmplabJenkins commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN
AmplabJenkins commented on pull request #31666: URL: https://github.com/apache/spark/pull/31666#issuecomment-815413212 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137035/
[GitHub] [spark] SparkQA removed a comment on pull request #32030: [WIP] Improve the performance of mapChildren and withNewChildren methods
SparkQA removed a comment on pull request #32030: URL: https://github.com/apache/spark/pull/32030#issuecomment-815353635 **[Test build #137038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137038/testReport)** for PR 32030 at commit [`1d42fb7`](https://github.com/apache/spark/commit/1d42fb7024b44f3f3debbacf64e19e1c6f61d47a).
[GitHub] [spark] SparkQA removed a comment on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN
SparkQA removed a comment on pull request #31666: URL: https://github.com/apache/spark/pull/31666#issuecomment-815299424 **[Test build #137035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137035/testReport)** for PR 31666 at commit [`c7c3df6`](https://github.com/apache/spark/commit/c7c3df63592ad31c61e5a4c684a8d9aa3906f5e5).
[GitHub] [spark] SparkQA removed a comment on pull request #32082: [SPARK-34981][SQL] Implement V2 function resolution and evaluation
SparkQA removed a comment on pull request #32082: URL: https://github.com/apache/spark/pull/32082#issuecomment-815355595 **[Test build #137039 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137039/testReport)** for PR 32082 at commit [`1ae5520`](https://github.com/apache/spark/commit/1ae552090d91a2390eb6a8fb2d4908e264ea4771).
[GitHub] [spark] SparkQA commented on pull request #32082: [SPARK-34981][SQL] Implement V2 function resolution and evaluation
SparkQA commented on pull request #32082: URL: https://github.com/apache/spark/pull/32082#issuecomment-815409002 **[Test build #137039 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137039/testReport)** for PR 32082 at commit [`1ae5520`](https://github.com/apache/spark/commit/1ae552090d91a2390eb6a8fb2d4908e264ea4771). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] ulysses-you opened a new pull request #32084: [SPARK-34980][SQL] Support coalesce partition through union
ulysses-you opened a new pull request #32084: URL: https://github.com/apache/spark/pull/32084
### What changes were proposed in this pull request?
Coalesce the children of the plan if it is a `Union`.
### Why are the changes needed?
The rule `CoalesceShufflePartitions` can only coalesce partitions if
* every leaf node is a ShuffleQueryStage
* all shuffles have the same partition number

A `Union` can break these assumptions. Suppose we have a plan like this:
```
Union
  HashAggregate
    ShuffleQueryStage
  FileScan
```
`CoalesceShufflePartitions` cannot optimize it, and the resulting partition number would be `shuffle partition + FileScan partition`, which can be quite large. It is better to support partial optimization with `Union`.
### Does this PR introduce _any_ user-facing change?
Probably yes, the resulting partition number might change.
### How was this patch tested?
Add test.
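The idea in the description above can be illustrated with a small, self-contained sketch. It does not use Spark's classes: `Plan`, `ShuffleStage`, `FileScan`, `Node`, `Union`, `CoalesceSketch` and the fixed `target` partition count are all hypothetical stand-ins, and the real rule picks partition counts from shuffle statistics rather than a constant. The sketch only shows why checking each `Union` child on its own lets the shuffle side be coalesced even though the plan as a whole fails the two conditions.

```scala
// Hypothetical stand-ins for physical plan nodes; these are NOT Spark classes.
sealed trait Plan
final case class ShuffleStage(partitions: Int) extends Plan
final case class FileScan(partitions: Int) extends Plan
final case class Node(children: Seq[Plan]) extends Plan   // e.g. a HashAggregate
final case class Union(children: Seq[Plan]) extends Plan

object CoalesceSketch {
  // Collect the leaf nodes of a plan.
  def leaves(p: Plan): Seq[Plan] = p match {
    case Node(cs)  => cs.flatMap(leaves)
    case Union(cs) => cs.flatMap(leaves)
    case leaf      => Seq(leaf)
  }

  // The two conditions from the description: every leaf is a shuffle stage
  // and all shuffle stages have the same partition number.
  def canCoalesce(p: Plan): Boolean = {
    val ls = leaves(p)
    val counts = ls.collect { case ShuffleStage(n) => n }
    ls.nonEmpty && counts.size == ls.size && counts.distinct.size == 1
  }

  // Rewrite every shuffle stage in the subtree to the target partition number.
  private def setPartitions(p: Plan, target: Int): Plan = p match {
    case ShuffleStage(_) => ShuffleStage(target)
    case Node(cs)        => Node(cs.map(setPartitions(_, target)))
    case Union(cs)       => Union(cs.map(setPartitions(_, target)))
    case other           => other
  }

  // Partial optimization: handle each child of a Union independently.
  def coalesce(p: Plan, target: Int): Plan = p match {
    case Union(cs)                   => Union(cs.map(coalesce(_, target)))
    case other if canCoalesce(other) => setPartitions(other, target)
    case other                       => other
  }
}
```

With the example plan from the description, `CoalesceSketch.canCoalesce` is false for the whole `Union`, yet `CoalesceSketch.coalesce` still rewrites the shuffle stage under the aggregate and leaves the file scan untouched.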
[GitHub] [spark] SparkQA commented on pull request #32030: [WIP] Improve the performance of mapChildren and withNewChildren methods
SparkQA commented on pull request #32030: URL: https://github.com/apache/spark/pull/32030#issuecomment-815406859 **[Test build #137038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137038/testReport)** for PR 32030 at commit [`1d42fb7`](https://github.com/apache/spark/commit/1d42fb7024b44f3f3debbacf64e19e1c6f61d47a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #31666: [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN
SparkQA commented on pull request #31666: URL: https://github.com/apache/spark/pull/31666#issuecomment-815405904 **[Test build #137035 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137035/testReport)** for PR 31666 at commit [`c7c3df6`](https://github.com/apache/spark/commit/c7c3df63592ad31c61e5a4c684a8d9aa3906f5e5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] maropu commented on a change in pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
maropu commented on a change in pull request #30145: URL: https://github.com/apache/spark/pull/30145#discussion_r609224510
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ##
@@ -1788,16 +1788,30 @@ class Analyzer(override val catalogManager: CatalogManager)
       // Replace the index with the corresponding expression in aggregateExpressions. The index is
       // a 1-base position of aggregateExpressions, which is output columns (select expression)
       case Aggregate(groups, aggs, child) if aggs.forall(_.resolved) &&
-        groups.exists(_.isInstanceOf[UnresolvedOrdinal]) =>
-        val newGroups = groups.map {
-          case u @ UnresolvedOrdinal(index) if index > 0 && index <= aggs.size =>
-            aggs(index - 1)
-          case ordinal @ UnresolvedOrdinal(index) =>
-            throw QueryCompilationErrors.groupByPositionRangeError(index, aggs.size, ordinal)
-          case o => o
-        }
+        groups.exists(containUnresolvedOrdinal) =>
+        val newGroups = groups.map((resolveGroupByExpressionOrdinal(_, aggs)))
         Aggregate(newGroups, aggs, child)
     }
+
+    private def containUnresolvedOrdinal(e: Expression): Boolean = e match {
+      case _: UnresolvedOrdinal => true
+      case gs: BaseGroupingSets => gs.children.exists(containUnresolvedOrdinal)
+      case _ => false
+    }
+
+    private def resolveGroupByExpressionOrdinal(
+        expr: Expression,
+        aggs: Seq[Expression]): Expression = expr match {
+      case ordinal @ UnresolvedOrdinal(index) =>
+        if (index > 0 && index <= aggs.size) {
+          aggs(index - 1)
+        } else {
+          throw QueryCompilationErrors.groupByPositionRangeError(index, aggs.size, ordinal)
Review comment: Could you add tests for this code path?
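To make the code path under review easier to follow, here is a minimal, self-contained sketch of ordinal resolution, including the out-of-range branch that the comment asks to have tested. It does not use Catalyst: `Expr`, `Column`, `Ordinal`, `GroupingSets` and `GroupByOrdinalSketch` are hypothetical stand-ins for `Expression`, a resolved select expression, `UnresolvedOrdinal` and `BaseGroupingSets`. The recursion into grouping-set children in `resolve` is an assumption, since the quoted diff is cut off before that case, and the exception type and message are placeholders for `QueryCompilationErrors.groupByPositionRangeError`.

```scala
// Hypothetical stand-ins for Catalyst expressions; these are NOT Spark classes.
sealed trait Expr
final case class Column(name: String) extends Expr
final case class Ordinal(index: Int) extends Expr
final case class GroupingSets(children: Seq[Expr]) extends Expr

object GroupByOrdinalSketch {
  // Mirrors containUnresolvedOrdinal: look inside grouping sets as well.
  def containsOrdinal(e: Expr): Boolean = e match {
    case _: Ordinal             => true
    case GroupingSets(children) => children.exists(containsOrdinal)
    case _                      => false
  }

  // Replace a 1-based ordinal with the select expression at that position.
  def resolve(e: Expr, aggs: Seq[Expr]): Expr = e match {
    case Ordinal(i) if i > 0 && i <= aggs.size =>
      aggs(i - 1)
    case Ordinal(i) =>
      // Placeholder for the range error raised in the real rule.
      throw new IllegalArgumentException(
        s"GROUP BY position $i is not in the select list (valid range is [1, ${aggs.size}])")
    case GroupingSets(children) =>
      // Assumed behaviour: resolve ordinals nested in the grouping set.
      GroupingSets(children.map(resolve(_, aggs)))
    case other =>
      other
  }
}
```

A test for the error branch could then pass an ordinal of `0` or one larger than the select list and check that resolution fails, alongside the in-range and nested cases.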
[GitHub] [spark] maropu commented on a change in pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
maropu commented on a change in pull request #30145: URL: https://github.com/apache/spark/pull/30145#discussion_r609223977
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ##
@@ -1788,16 +1788,30 @@ class Analyzer(override val catalogManager: CatalogManager)
       // Replace the index with the corresponding expression in aggregateExpressions. The index is
       // a 1-base position of aggregateExpressions, which is output columns (select expression)
       case Aggregate(groups, aggs, child) if aggs.forall(_.resolved) &&
-        groups.exists(_.isInstanceOf[UnresolvedOrdinal]) =>
-        val newGroups = groups.map {
-          case u @ UnresolvedOrdinal(index) if index > 0 && index <= aggs.size =>
-            aggs(index - 1)
-          case ordinal @ UnresolvedOrdinal(index) =>
-            throw QueryCompilationErrors.groupByPositionRangeError(index, aggs.size, ordinal)
-          case o => o
-        }
+        groups.exists(containUnresolvedOrdinal) =>
+        val newGroups = groups.map((resolveGroupByExpressionOrdinal(_, aggs)))
Review comment: `((resolveGroupByExpressionOrdinal(_, aggs)))` -> `(resolveGroupByExpressionOrdinal(_, aggs))`
[GitHub] [spark] gengliangwang commented on pull request #32079: [SPARK-34970][SQL][SERCURITY][3.1] Redact map-type options in the output of explain()
gengliangwang commented on pull request #32079: URL: https://github.com/apache/spark/pull/32079#issuecomment-815402168 @dongjoon-hyun Sure, I will backport to 3.0 and update the JIRA
[GitHub] [spark] ueshin commented on pull request #32083: [WIP][SPARK-34886][PYTHON] Port/integrate Koalas DataFrame unit test into PySpark
ueshin commented on pull request #32083: URL: https://github.com/apache/spark/pull/32083#issuecomment-815401615 ok to test.
[GitHub] [spark] maropu commented on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause
maropu commented on pull request #29087: URL: https://github.com/apache/spark/pull/29087#issuecomment-815400618 retest this please
[GitHub] [spark] AmplabJenkins commented on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics
AmplabJenkins commented on pull request #30144: URL: https://github.com/apache/spark/pull/30144#issuecomment-815398553 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41624/
[GitHub] [spark] SparkQA commented on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics
SparkQA commented on pull request #30144: URL: https://github.com/apache/spark/pull/30144#issuecomment-815398540 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41624/
[GitHub] [spark] SparkQA commented on pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache
SparkQA commented on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-815398569 **[Test build #137047 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137047/testReport)** for PR 31517 at commit [`f61b041`](https://github.com/apache/spark/commit/f61b0410491f6cdc75bdf51dfc13857a6cd5b65a).
[GitHub] [spark] SparkQA removed a comment on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics
SparkQA removed a comment on pull request #30144: URL: https://github.com/apache/spark/pull/30144#issuecomment-815395423 **[Test build #137045 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137045/testReport)** for PR 30144 at commit [`3d6e2c4`](https://github.com/apache/spark/commit/3d6e2c475747c5c3eb981aee8faa504b5af7df59).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics
AmplabJenkins removed a comment on pull request #30144: URL: https://github.com/apache/spark/pull/30144#issuecomment-815397425 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137045/
[GitHub] [spark] AmplabJenkins commented on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics
AmplabJenkins commented on pull request #30144: URL: https://github.com/apache/spark/pull/30144#issuecomment-815397425 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137045/
[GitHub] [spark] SparkQA commented on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics
SparkQA commented on pull request #30144: URL: https://github.com/apache/spark/pull/30144#issuecomment-815397407
**[Test build #137045 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137045/testReport)** for PR 30144 at commit [`3d6e2c4`](https://github.com/apache/spark/commit/3d6e2c475747c5c3eb981aee8faa504b5af7df59).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class SQLProcessor(object):`
  * `trait FunctionRegistryBase[T]`
  * `trait SimpleFunctionRegistryBase[T] extends FunctionRegistryBase[T] with Logging`
  * `trait EmptyFunctionRegistryBase[T] extends FunctionRegistryBase[T]`
  * `trait FunctionRegistry extends FunctionRegistryBase[Expression]`
  * `trait TableFunctionRegistry extends FunctionRegistryBase[LogicalPlan]`
  * `class NoSuchFunctionException(`
  * `case class ResolveTableValuedFunctions(catalog: SessionCatalog) extends Rule[LogicalPlan]`
  * `abstract class QuaternaryExpression extends Expression with QuaternaryLike[Expression]`
  * `abstract class Covariance(val left: Expression, val right: Expression, nullOnDivideByZero: Boolean)`
  * `trait BaseGroupingSets extends Expression with CodegenFallback`
  * `case class Cube(`
  * `trait SimpleHigherOrderFunction extends HigherOrderFunction with BinaryLike[Expression]`
  * `trait QuaternaryLike[T <: TreeNode[T]]`
  * `trait DataWritingCommand extends UnaryCommand`
  * `trait RunnableCommand extends Command`
  * `trait BaseCacheTableExec extends LeafV2CommandExec`
  * `sealed trait V1FallbackWriters extends LeafV2CommandExec with SupportsV1Write`
[GitHub] [spark] AmplabJenkins commented on pull request #32083: [WIP] [SPARK-34886] Port/integrate Koalas DataFrame unit test into PySpark
AmplabJenkins commented on pull request #32083: URL: https://github.com/apache/spark/pull/32083#issuecomment-815396462 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.
SparkQA commented on pull request #32032: URL: https://github.com/apache/spark/pull/32032#issuecomment-815396136 **[Test build #137046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137046/testReport)** for PR 32032 at commit [`b78cfdb`](https://github.com/apache/spark/commit/b78cfdb896d8ae0d6da8e96631749ad47fa35623).
[GitHub] [spark] imback82 commented on a change in pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.
imback82 commented on a change in pull request #32032: URL: https://github.com/apache/spark/pull/32032#discussion_r609213767
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CacheTableExec.scala ##
@@ -94,19 +94,19 @@ case class CacheTableAsSelectExec(
   override lazy val relationName: String = tempViewName
   override lazy val planToCache: LogicalPlan = {
-    Dataset.ofRows(sparkSession,
-      CreateViewCommand(
-        name = TableIdentifier(tempViewName),
-        userSpecifiedColumns = Nil,
-        comment = None,
-        properties = Map.empty,
-        originalText = Some(originalText),
-        child = query,
-        allowExisting = false,
-        replace = false,
-        viewType = LocalTempView
-      )
-    )
+    CreateViewCommand(
+      name = TableIdentifier(tempViewName),
+      userSpecifiedColumns = Nil,
+      comment = None,
+      properties = Map.empty,
+      originalText = Some(originalText),
+      plan = query,
+      allowExisting = false,
+      replace = false,
+      viewType = LocalTempView,
+      isAnalyzed = true
+    ).run(sparkSession)
Review comment: We can just call `run` now that the `plan` is analyzed; there is no need to go through `Dataset.ofRows`.
[GitHub] [spark] xinrong-databricks opened a new pull request #32083: [WIP] [SPARK-34886] Port/integrate Koalas DataFrame unit test into PySpark
xinrong-databricks opened a new pull request #32083: URL: https://github.com/apache/spark/pull/32083
### What changes were proposed in this pull request?
Now that we merged the Koalas main code into the PySpark code base (#32036), we should port the Koalas DataFrame unit test to PySpark.
### Why are the changes needed?
Currently, the pandas-on-Spark modules are not tested at all. We should enable the DataFrame unit test first.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Enable the DataFrame unit test.
[GitHub] [spark] yaooqinn commented on a change in pull request #32037: [SPARK-34944][SQL][TESTS] Replace bigint with int for web_returns and store_returns in TPCDS tests to employ correct data type
yaooqinn commented on a change in pull request #32037: URL: https://github.com/apache/spark/pull/32037#discussion_r609213392
## File path: sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala ##
@@ -21,6 +21,49 @@ import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSparkSession
+
+/**
+ * Base trait for TPC-DS related tests.
+ *
+ * Datatype mapping for TPC-DS and Spark SQL, see more at:
+ * http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.9.0.pdf
+ *
+ *|---|---|
+ *|TPC-DS | Spark SQL |
+ *|---|---|
+ *| Identifier | INT |
+ *|---|---|
+ *|Integer| INT |
Review comment:
> After this PR, the field schemas are now consistent with those DDLs in the `tpcds.sql` from tpc-ds tool kit, see https://gist.github.com/yaooqinn/b9978a77bbf4f871a95d6a9103019907

removed, according to this line
[GitHub] [spark] SparkQA commented on pull request #30144: [SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics
SparkQA commented on pull request #30144: URL: https://github.com/apache/spark/pull/30144#issuecomment-815395423 **[Test build #137045 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137045/testReport)** for PR 30144 at commit [`3d6e2c4`](https://github.com/apache/spark/commit/3d6e2c475747c5c3eb981aee8faa504b5af7df59).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.
AmplabJenkins removed a comment on pull request #32032: URL: https://github.com/apache/spark/pull/32032#issuecomment-814862152 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136992/