[GitHub] [spark] chitralverma commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
chitralverma commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510762780 Will this also handle the issues with array types? In the golden files, the array types also change after conversion to string. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base
AmplabJenkins removed a comment on issue #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base URL: https://github.com/apache/spark/pull/25098#issuecomment-510762337 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base
AmplabJenkins removed a comment on issue #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base URL: https://github.com/apache/spark/pull/25098#issuecomment-510762340 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107576/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
SparkQA commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510762424 **[Test build #107580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107580/testReport)** for PR 25130 at commit [`103c673`](https://github.com/apache/spark/commit/103c673186edf89c2999fa7e5d6547387d3d923f).
[GitHub] [spark] wangyum commented on issue #25131: [SPARK-28361] [SQL] [TEST] Test equality of generated code with id in class name
wangyum commented on issue #25131: [SPARK-28361] [SQL] [TEST] Test equality of generated code with id in class name URL: https://github.com/apache/spark/pull/25131#issuecomment-510781800 retest this please
[GitHub] [spark] xianyinxin commented on issue #25058: [SPARK-21067][SQL] Fix Thrift Server - CTAS fail with Unable to move source
xianyinxin commented on issue #25058: [SPARK-21067][SQL] Fix Thrift Server - CTAS fail with Unable to move source URL: https://github.com/apache/spark/pull/25058#issuecomment-510781623 @jerryshao I don't know if I understand the problem correctly. Several months ago I submitted a patch under that JIRA, based on spark-2.3.2. My understanding is that the working thread is the one that created the FS, and this FS client is closed when the working thread closes. If we let the main thread create the FS, the problem could be resolved.
[GitHub] [spark] jerryshao commented on issue #25058: [SPARK-21067][SQL] Fix Thrift Server - CTAS fail with Unable to move source
jerryshao commented on issue #25058: [SPARK-21067][SQL] Fix Thrift Server - CTAS fail with Unable to move source URL: https://github.com/apache/spark/pull/25058#issuecomment-510790332 Yes. If you carefully track which thread creates the FS and guarantee that the same thread closes it, this problem could be solved, but that is a little hard to track.
[GitHub] [spark] ketank-new commented on issue #24861: [SPARK-26985][CORE] Fix "access only some column of the all of columns " for big endian architecture BUG
ketank-new commented on issue #24861: [SPARK-26985][CORE] Fix "access only some column of the all of columns " for big endian architecture BUG URL: https://github.com/apache/spark/pull/24861#issuecomment-510809275 @srowen: I was just preparing a PR against branch 2.4, as you asked. After updating my master, I noticed that my s390x-related changes are already reflected in master, so I wanted to recheck with you: 1) Is there anything remaining on my side to make sure my s390x changes get tagged to a version? 2) Do I still need to raise a PR against Spark branch 2.4?
[GitHub] [spark] AmplabJenkins commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
AmplabJenkins commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510813986 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
AmplabJenkins commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510813994 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107582/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
SparkQA commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510813891 **[Test build #107582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107582/testReport)** for PR 25130 at commit [`103c673`](https://github.com/apache/spark/commit/103c673186edf89c2999fa7e5d6547387d3d923f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
SparkQA removed a comment on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510772163 **[Test build #107582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107582/testReport)** for PR 25130 at commit [`103c673`](https://github.com/apache/spark/commit/103c673186edf89c2999fa7e5d6547387d3d923f).
[GitHub] [spark] SparkQA commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
SparkQA commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510823080 **[Test build #107589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107589/testReport)** for PR 25130 at commit [`103c673`](https://github.com/apache/spark/commit/103c673186edf89c2999fa7e5d6547387d3d923f).
[GitHub] [spark] HyukjinKwon commented on issue #25125: [SPARK-28357][CORE][TEST] Fix Flaky Test - FileAppenderSuite.rollingfile appender - size-based rolling compressed
HyukjinKwon commented on issue #25125: [SPARK-28357][CORE][TEST] Fix Flaky Test - FileAppenderSuite.rollingfile appender - size-based rolling compressed URL: https://github.com/apache/spark/pull/25125#issuecomment-510823121 Merged to master, branch-2.4 and branch-2.3.
[GitHub] [spark] cloud-fan commented on a change in pull request #25040: [SPARK-28238][SQL] Implement DESCRIBE TABLE for Data Source V2 Tables.
cloud-fan commented on a change in pull request #25040: [SPARK-28238][SQL] Implement DESCRIBE TABLE for Data Source V2 Tables. URL: https://github.com/apache/spark/pull/25040#discussion_r302897660
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/DescribeTableSchemas.scala ##
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans
+
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.types.{MetadataBuilder, StringType, StructField, StructType}
+
+private[sql] object DescribeTableSchemas {
+  val DESCRIBE_TABLE_ATTRIBUTES = Seq(
Review comment: We shouldn't define attributes in an object. An `AttributeReference` is assigned a unique ID when it is created, and in general we should create new attributes whenever we create a new logical plan. For example, if you do `df1 = sql("desc table t1"); df2 = sql("desc table ");`, `df1.join(df2)` would fail.
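The concern in the review comment above can be sketched outside Spark. The following is a plain-Java simulation, not Spark code: `Attr`, `sharedOutput`, `freshOutput`, and `hasConflictingIds` are hypothetical names, and the column names are only illustrative. It assumes nothing beyond what the comment states, namely that each attribute gets a unique ID at creation time, so attributes held in a singleton are shared by every plan that uses them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class AttributeIdSketch {
    // Hypothetical stand-in for an attribute that receives a unique ID when created.
    static final class Attr {
        private static final AtomicLong NEXT_ID = new AtomicLong();
        final String name;
        final long id;

        Attr(String name) {
            this.name = name;
            this.id = NEXT_ID.getAndIncrement();
        }
    }

    // Created once and reused by every "plan", like a val inside an object.
    private static final List<Attr> SHARED =
        Arrays.asList(new Attr("col_name"), new Attr("data_type"));

    // Every call returns the same attribute instances, hence the same IDs.
    public static List<Attr> sharedOutput() {
        return SHARED;
    }

    // Every call creates new attributes, hence new IDs.
    public static List<Attr> freshOutput() {
        return Arrays.asList(new Attr("col_name"), new Attr("data_type"));
    }

    // True when the two sides expose at least one attribute with the same ID.
    public static boolean hasConflictingIds(List<Attr> left, List<Attr> right) {
        for (Attr l : left) {
            for (Attr r : right) {
                if (l.id == r.id) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasConflictingIds(sharedOutput(), sharedOutput())); // true
        System.out.println(hasConflictingIds(freshOutput(), freshOutput()));   // false
    }
}
```

In this sketch, two plans built from the shared sequence cannot be told apart by attribute ID, which is the situation the reviewer expects to break a join. Building the attributes per plan avoids the collision.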
[GitHub] [spark] SparkQA removed a comment on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
SparkQA removed a comment on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#issuecomment-510799899 **[Test build #107586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107586/testReport)** for PR 25047 at commit [`0153024`](https://github.com/apache/spark/commit/0153024c1eee44af9128275bf62f6ff8613afdfb).
[GitHub] [spark] SparkQA commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
SparkQA commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#issuecomment-510845556 **[Test build #107586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107586/testReport)** for PR 25047 at commit [`0153024`](https://github.com/apache/spark/commit/0153024c1eee44af9128275bf62f6ff8613afdfb). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base
AmplabJenkins removed a comment on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base URL: https://github.com/apache/spark/pull/25122#issuecomment-510850732 Merged build finished. Test PASSed.
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-510849683 @squito @srowen @dongjoon-hyun By having a handler (as mentioned in the ticket by HenryYu) without running shutdown hooks, we could also solve: https://issues.apache.org/jira/browse/SPARK-27812
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-510849683 @squito @srowen @dongjoon-hyun By having a handler (as mentioned in the ticket by HenryYu) without running the shutdown hook, we could also solve: https://issues.apache.org/jira/browse/SPARK-27812
[GitHub] [spark] AmplabJenkins removed a comment on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base
AmplabJenkins removed a comment on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base URL: https://github.com/apache/spark/pull/25122#issuecomment-510850738 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12719/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25133: [SPARK-28365][Python][TEST] Set default locale for StopWordsRemover tests to prevent invalid locale error during test
AmplabJenkins commented on issue #25133: [SPARK-28365][Python][TEST] Set default locale for StopWordsRemover tests to prevent invalid locale error during test URL: https://github.com/apache/spark/pull/25133#issuecomment-510850660 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12718/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base
AmplabJenkins commented on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base URL: https://github.com/apache/spark/pull/25122#issuecomment-510850738 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12719/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base
AmplabJenkins commented on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base URL: https://github.com/apache/spark/pull/25122#issuecomment-510850732 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25133: [SPARK-28365][Python][TEST] Set default locale for StopWordsRemover tests to prevent invalid locale error during test
AmplabJenkins commented on issue #25133: [SPARK-28365][Python][TEST] Set default locale for StopWordsRemover tests to prevent invalid locale error during test URL: https://github.com/apache/spark/pull/25133#issuecomment-510850655 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun closed pull request #24860: [SPARK-28034][SQL][TEST] Port with.sql
dongjoon-hyun closed pull request #24860: [SPARK-28034][SQL][TEST] Port with.sql URL: https://github.com/apache/spark/pull/24860
[GitHub] [spark] AmplabJenkins commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making them no-op
AmplabJenkins commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making them no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510760043 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25041: [SPARK-28133][SQL] Add acosh/asinh/atanh functions to SQL
dongjoon-hyun commented on a change in pull request #25041: [SPARK-28133][SQL] Add acosh/asinh/atanh functions to SQL URL: https://github.com/apache/spark/pull/25041#discussion_r302839725
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ##
@@ -557,6 +581,32 @@ case class Sin(child: Expression) extends UnaryMathExpression(math.sin, "SIN")
   """)
 case class Sinh(child: Expression) extends UnaryMathExpression(math.sinh, "SINH")
+@ExpressionDescription(
+  usage = """
+    _FUNC_(expr) - Returns inverse hyperbolic sine of `expr`.
+  """,
+  arguments = """
+    Arguments:
+      * expr - hyperbolic angle
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(0);
+       0.0
+  """,
+  since = "3.0.0")
+case class Asinh(child: Expression)
+  extends UnaryMathExpression((x: Double) => x match {
+    case Double.NegativeInfinity => Double.NegativeInfinity
+    case _ => math.log(x + math.sqrt(x * x + 1.0)) }, "ASINH") {
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    defineCodeGen(ctx, ev, c =>
+      s"""${ev.value} = \"$c\" == \"Double.NEGATIVE_INFINITY\" ? """ +
Review comment: @Tonix517, this will fail as follows.
```
spark-sql> CREATE TABLE i AS SELECT double('-Infinity') a;
spark-sql> SELECT asinh(a) FROM i;
NaN
```
The `codegen` part is difficult for new developers. You need to manually check the Java code generated by your PR. For example, the following is the code generated by this PR.
```
/* 035 */ project_value_0 = project_value_0 = "inputadapter_value_0" == "Double.NEGATIVE_INFINITY" ? java.lang.Double.NEGATIVE_INFINITY : java.lang.Math.log(inputadapter_value_0 + java.lang.Math.sqrt(inputadapter_value_0 * inputadapter_value_0 + 1.0));;
```
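The `NaN` reported in the review comment above can be reproduced in plain Java without Spark. This is a minimal sketch: `AsinhSketch`, `asGenerated`, and `guarded` are hypothetical names, and `guarded` is only one possible correction, not necessarily the fix adopted in the PR. `asGenerated` copies the shape of the quoted generated expression, in which two different string literals are compared, so the condition is never true and negative infinity falls through to the logarithm.

```java
final class AsinhSketch {
    // Same shape as the generated expression quoted in the review: the condition
    // compares two distinct string literals, so it is always false.
    public static double asGenerated(double x) {
        return "inputadapter_value_0" == "Double.NEGATIVE_INFINITY"
            ? Double.NEGATIVE_INFINITY
            : Math.log(x + Math.sqrt(x * x + 1.0));
    }

    // Hypothetical correction: test the numeric value instead of quoted text.
    public static double guarded(double x) {
        return x == Double.NEGATIVE_INFINITY
            ? Double.NEGATIVE_INFINITY
            : Math.log(x + Math.sqrt(x * x + 1.0));
    }

    public static void main(String[] args) {
        double negInf = Double.NEGATIVE_INFINITY;
        // -Infinity + sqrt(+Infinity) is NaN, and log(NaN) is NaN.
        System.out.println(asGenerated(negInf)); // NaN
        System.out.println(guarded(negInf));     // -Infinity
        System.out.println(guarded(0.0));        // 0.0
    }
}
```

The sketch suggests why the interpreted Scala path and the generated Java path can disagree: the Scala `match` inspects the value, while the generated code inspects the text of the variable name.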
[GitHub] [spark] AmplabJenkins commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making them no-op
AmplabJenkins commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making them no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510760046 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12705/ Test PASSed.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making them no-op
HyukjinKwon commented on a change in pull request #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making them no-op URL: https://github.com/apache/spark/pull/25130#discussion_r302839644
## File path: sql/core/src/test/resources/sql-tests/results/udf/pgSQL/udf-case.sql.out ##
@@ -184,34 +184,34 @@ struct
 -- !query 19
 SELECT CASE WHEN CAST(udf(1=0) AS boolean) THEN 1/0 WHEN 1=1 THEN 1 ELSE 2/0 END
 -- !query 19 schema
-struct
+struct
 -- !query 19 output
 1.0
 -- !query 20
 SELECT CASE 1 WHEN 0 THEN 1/udf(0) WHEN 1 THEN 1 ELSE 2/0 END
 -- !query 20 schema
-struct
+struct
 -- !query 20 output
 1.0
 -- !query 21
 SELECT CASE WHEN i > 100 THEN udf(1/0) ELSE udf(0) END FROM case_tbl
 -- !query 21 schema
-struct 100) THEN udf((cast(1 as double) / cast(0 as double))) ELSE udf(0) END:string>
+struct 100) THEN CAST(udf(cast((cast(1 as double) / cast(0 as double)) as string)) AS DOUBLE) ELSE CAST(CAST(udf(cast(0 as string)) AS INT) AS DOUBLE) END:double>
 -- !query 21 output
-0
-0
-0
-0
+0.0
+0.0
+0.0
+0.0
Review comment: It's closer to the original output: https://github.com/apache/spark/blob/fe3e34dda68fd54212df1dd01b8acb9a9bc6a0ad/sql/core/src/test/resources/sql-tests/results/pgSQL/case.sql.out#L205-L208
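The change from `0` to `0.0` in query 21 follows from the cast chain visible in the new schema, `CAST(CAST(udf(cast(0 as string)) AS INT) AS DOUBLE)`. The sketch below imitates that chain in plain Java. It assumes, based only on the PR title and the schema above, that the test UDF behaves as an identity function on strings; `NoOpUdfSketch`, `udf`, `stringResult`, and `roundTripped` are hypothetical names, not the actual test helpers.

```java
final class NoOpUdfSketch {
    // Hypothetical identity UDF that accepts and returns a string.
    public static String udf(String s) {
        return s;
    }

    // Old shape: the string returned by the UDF is the result, so 0 prints as "0".
    public static String stringResult(int v) {
        return udf(String.valueOf(v));
    }

    // New shape: cast to string, apply the UDF, cast back to INT, then widen to
    // DOUBLE to match the type of the other CASE branch.
    public static double roundTripped(int v) {
        return (double) Integer.parseInt(udf(String.valueOf(v)));
    }

    public static void main(String[] args) {
        System.out.println(stringResult(0)); // 0
        System.out.println(roundTripped(0)); // 0.0
    }
}
```

Because the value comes back with its original type, the result is typed and printed like the output of the query without the UDF, which is the sense in which the reviewer calls the new golden file closer to the original.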
[GitHub] [spark] SparkQA commented on issue #25131: [SPARK-28361] [SQL] [TEST] Test equality of generated code with id in class name
SparkQA commented on issue #25131: [SPARK-28361] [SQL] [TEST] Test equality of generated code with id in class name URL: https://github.com/apache/spark/pull/25131#issuecomment-510766188 **[Test build #107581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107581/testReport)** for PR 25131 at commit [`ba91966`](https://github.com/apache/spark/commit/ba919665cca5e5118cf0eb90e305d7d4ad12e2d6).
[GitHub] [spark] gatorsmile opened a new pull request #25131: [SPARK-28361] [SQL] [TEST] Test equality of generated code with id in class name
gatorsmile opened a new pull request #25131: [SPARK-28361] [SQL] [TEST] Test equality of generated code with id in class name URL: https://github.com/apache/spark/pull/25131 ## What changes were proposed in this pull request? A code gen test in WholeStageCodeGenSuite was flaky because it used the codegen metrics class to test if the generated code for equivalent plans was identical under a particular flag. This patch switches the test to compare the generated code directly. ## How was this patch tested? N/A
[GitHub] [spark] gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks when executors are lost
gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks when executors are lost URL: https://github.com/apache/spark/pull/24462#issuecomment-510766013 @yifeih Thank you, I understand now. But can your approach (making `MapStatus` able to contain an empty location so that map stage tasks are not resubmitted) deal with this condition: the returned `MapStatus` contains a valid location, but we still don't want the Driver to unregister this shuffle output when executors are lost (perhaps because the map output is also backed up in DFS)? In other words, what the Driver decides to do when invalidating an executor (what this PR works on) and how the Executors tell the Driver the `MapStatus` (with or without a location) are two different things.
[GitHub] [spark] SparkQA commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
SparkQA commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#issuecomment-510799899 **[Test build #107586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107586/testReport)** for PR 25047 at commit [`0153024`](https://github.com/apache/spark/commit/0153024c1eee44af9128275bf62f6ff8613afdfb).
[GitHub] [spark] cloud-fan commented on a change in pull request #25040: [SPARK-28238][SQL] Implement DESCRIBE TABLE for Data Source V2 Tables.
cloud-fan commented on a change in pull request #25040: [SPARK-28238][SQL] Implement DESCRIBE TABLE for Data Source V2 Tables. URL: https://github.com/apache/spark/pull/25040#discussion_r302911446 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/DescribeTableSchemas.scala ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.plans + +import org.apache.spark.sql.catalyst.expressions.AttributeReference +import org.apache.spark.sql.types.{MetadataBuilder, StringType, StructField, StructType} + +private[sql] object DescribeTableSchemas { + val DESCRIBE_TABLE_ATTRIBUTES = Seq( Review comment: or, we can follow `DescribeCommandBase`: create attributes in an abstract class instead of object.
[GitHub] [spark] SparkQA commented on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
SparkQA commented on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses URL: https://github.com/apache/spark/pull/25029#issuecomment-510835068 **[Test build #107590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107590/testReport)** for PR 25029 at commit [`7d9d96f`](https://github.com/apache/spark/commit/7d9d96f6f6e6e41eb6d25d03e4dc9a9fd93728ad).
[GitHub] [spark] SparkQA commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
SparkQA commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510839195 **[Test build #107591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107591/testReport)** for PR 25130 at commit [`a363c23`](https://github.com/apache/spark/commit/a363c23f7d254ffde61b2550403105241ea7afec).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
AmplabJenkins removed a comment on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510838615 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
AmplabJenkins removed a comment on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510838621 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12717/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25131: [SPARK-28361] [SQL] [TEST] Test equality of generated code with id in class name
AmplabJenkins removed a comment on issue #25131: [SPARK-28361] [SQL] [TEST] Test equality of generated code with id in class name URL: https://github.com/apache/spark/pull/25131#issuecomment-510847623 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25131: [SPARK-28361] [SQL] [TEST] Test equality of generated code with id in class name
AmplabJenkins removed a comment on issue #25131: [SPARK-28361] [SQL] [TEST] Test equality of generated code with id in class name URL: https://github.com/apache/spark/pull/25131#issuecomment-510847628 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107584/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
SparkQA removed a comment on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner URL: https://github.com/apache/spark/pull/25111#issuecomment-510721370 **[Test build #107573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107573/testReport)** for PR 25111 at commit [`58ff049`](https://github.com/apache/spark/commit/58ff049323265c31b9951f5e4b851eaeb28aed67).
[GitHub] [spark] AmplabJenkins commented on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
AmplabJenkins commented on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner URL: https://github.com/apache/spark/pull/25111#issuecomment-510758840 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107573/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
AmplabJenkins removed a comment on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner URL: https://github.com/apache/spark/pull/25111#issuecomment-510758836 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
AmplabJenkins commented on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner URL: https://github.com/apache/spark/pull/25111#issuecomment-510758836 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
AmplabJenkins removed a comment on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner URL: https://github.com/apache/spark/pull/25111#issuecomment-510758840 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107573/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
SparkQA commented on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner URL: https://github.com/apache/spark/pull/25111#issuecomment-510758452 **[Test build #107573 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107573/testReport)** for PR 25111 at commit [`58ff049`](https://github.com/apache/spark/commit/58ff049323265c31b9951f5e4b851eaeb28aed67). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] chitralverma commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
chitralverma commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510764646 sure! thanks
[GitHub] [spark] HyukjinKwon edited a comment on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
HyukjinKwon edited a comment on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510763658 The array issue will still stand, but I think this can address most of our cases. I would like to avoid adding all combinations of Python / Scalar UDFs for tests that mainly target plans. Let's work around the array ones in those tests specifically. That set of tests should really target plans specifically. I will comment in that PR.
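As a rough analogy for why array columns are harder than scalars in this discussion: once an array has been rendered as a string, there is no simple cast that restores it. The pure-Python sketch below only illustrates that asymmetry; it does not use Spark or model its cast rules, and `round_trip` is a hypothetical helper.

```python
def round_trip(value, original_type):
    """Render a value as a string, then try to rebuild it with its original type."""
    return original_type(str(value))


# A scalar survives the string round trip.
print(round_trip(3, int) == 3)

# A list does not: list("[1, 1]") splits the text into single characters.
print(round_trip([1, 1], list) == [1, 1])
```

This is consistent with the suggestion above to work around the array cases in the affected tests rather than in the shared UDF wrapper.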
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base URL: https://github.com/apache/spark/pull/25122#discussion_r302843863 ## File path: sql/core/src/test/resources/sql-tests/results/udf/udf-pivot.sql.out ## @@ -0,0 +1,494 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 32 + + +-- !query 0 +create temporary view courseSales as select * from values + ("dotNET", 2012, 1), + ("Java", 2012, 2), + ("dotNET", 2012, 5000), + ("dotNET", 2013, 48000), + ("Java", 2013, 3) + as courseSales(course, year, earnings) +-- !query 0 schema +struct<> +-- !query 0 output + + + +-- !query 1 +create temporary view years as select * from values + (2012, 1), + (2013, 2) + as years(y, s) +-- !query 1 schema +struct<> +-- !query 1 output + + + +-- !query 2 +create temporary view yearsWithComplexTypes as select * from values + (2012, array(1, 1), map('1', 1), struct(1, 'a')), + (2013, array(2, 2), map('2', 2), struct(2, 'b')) + as yearsWithComplexTypes(y, a, m, s) +-- !query 2 schema +struct<> +-- !query 2 output + + + +-- !query 3 +SELECT * FROM ( + SELECT udf(year), course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)) + FOR course IN ('dotNET', 'Java') +) +-- !query 3 schema +struct +-- !query 3 output +2012 15000 2 +2013 48000 3 + + +-- !query 4 +SELECT * FROM courseSales +PIVOT ( + udf(sum(earnings)) + FOR year IN (2012, 2013) +) +-- !query 4 schema +struct +-- !query 4 output +Java 2 3 +dotNET 15000 48000 + + +-- !query 5 +SELECT * FROM ( + SELECT year, course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)), udf(avg(earnings)) + FOR course IN ('dotNET', 'Java') +) +-- !query 5 schema +struct +-- !query 5 output +2012 15000 7500.0 2 2.0 +2013 48000 48000.0 3 3.0 + + +-- !query 6 +SELECT * FROM ( + SELECT udf(course) as course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)) + FOR course IN ('dotNET', 'Java') +) +-- !query 6 schema 
+struct +-- !query 6 output +63000 5 + + +-- !query 7 +SELECT * FROM ( + SELECT year, course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)), udf(min(year)) + FOR course IN ('dotNET', 'Java') +) +-- !query 7 schema +struct +-- !query 7 output +63000 20125 2012 + + +-- !query 8 +SELECT * FROM ( + SELECT course, year, earnings, udf(s) as s + FROM courseSales + JOIN years ON year = y +) +PIVOT ( + udf(sum(earnings)) + FOR s IN (1, 2) +) +-- !query 8 schema +struct +-- !query 8 output +Java 20122 nan +Java 2013nan 3 +dotNET 201215000 nan +dotNET 2013nan 48000 + + +-- !query 9 +SELECT * FROM ( + SELECT course, year, earnings, s + FROM courseSales + JOIN years ON year = y +) +PIVOT ( + udf(sum(earnings)), udf(min(s)) + FOR course IN ('dotNET', 'Java') +) +-- !query 9 schema +struct +-- !query 9 output +2012 15000 1 2 1 +2013 48000 2 3 2 + + +-- !query 10 +SELECT * FROM ( + SELECT course, year, earnings, s + FROM courseSales + JOIN years ON year = y +) +PIVOT ( + udf(sum(earnings * s)) + FOR course IN ('dotNET', 'Java') +) +-- !query 10 schema +struct +-- !query 10 output +2012 15000 2 +2013 96000 6 + + +-- !query 11 +SELECT 2012_s, 2013_s, 2012_a, 2013_a, c FROM ( + SELECT year y, course c, earnings e FROM courseSales +) +PIVOT ( + udf(sum(e)) s, udf(avg(e)) a + FOR y IN (2012, 2013) +) +-- !query 11 schema +struct<2012_s:string,2013_s:string,2012_a:string,2013_a:string,c:string> +-- !query 11 output +15000 48000 7500.0 48000.0 dotNET +2 3 2.0 3.0 Java + + +-- !query 12 +SELECT firstYear_s, secondYear_s, firstYear_a, secondYear_a, c FROM ( + SELECT year y, course c, earnings e FROM courseSales +) +PIVOT ( + udf(sum(e)) s, udf(avg(e)) a + FOR y IN (2012 as firstYear, 2013 secondYear) +) +-- !query 12 schema +struct +-- !query 12 output +15000 48000 7500.0 48000.0 dotNET +2 3 2.0 3.0 Java + + +-- !query 13 +SELECT * FROM courseSales +PIVOT ( + udf(abs(earnings)) + FOR year IN (2012, 2013) +) +-- !query 13 schema +struct<> +-- !query 13 output 
+org.apache.spark.sql.AnalysisException +Aggregate expression required for pivot, but 'coursesales.`earnings`' did not appear in any aggregate function.; + + +-- !query 14 +SELECT * FROM ( + SELECT year, course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)), year + FOR course IN ('dotNET', 'Java') +) +-- !query 14 schema +struct<> +-- !query 14 output +org.apache.spark.sql.AnalysisException +Aggregate expression required for pivot, but '__auto_generated_subquery_name.`year`' did not appear in any aggregate function.; + + +-- !query 15 +SELECT * FROM ( + SELECT course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)) + FOR year IN (2012, 2013) +) +-- !query 15 schema +struct<> +--
[GitHub] [spark] HyukjinKwon commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
HyukjinKwon commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510771931 retest this please
[GitHub] [spark] HyukjinKwon commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
HyukjinKwon commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510774972 retest this please
[GitHub] [spark] chitralverma commented on a change in pull request #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base
chitralverma commented on a change in pull request #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base URL: https://github.com/apache/spark/pull/25122#discussion_r302865222 ## File path: sql/core/src/test/resources/sql-tests/results/udf/udf-pivot.sql.out ## @@ -0,0 +1,494 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 32 + + +-- !query 0 +create temporary view courseSales as select * from values + ("dotNET", 2012, 1), + ("Java", 2012, 2), + ("dotNET", 2012, 5000), + ("dotNET", 2013, 48000), + ("Java", 2013, 3) + as courseSales(course, year, earnings) +-- !query 0 schema +struct<> +-- !query 0 output + + + +-- !query 1 +create temporary view years as select * from values + (2012, 1), + (2013, 2) + as years(y, s) +-- !query 1 schema +struct<> +-- !query 1 output + + + +-- !query 2 +create temporary view yearsWithComplexTypes as select * from values + (2012, array(1, 1), map('1', 1), struct(1, 'a')), + (2013, array(2, 2), map('2', 2), struct(2, 'b')) + as yearsWithComplexTypes(y, a, m, s) +-- !query 2 schema +struct<> +-- !query 2 output + + + +-- !query 3 +SELECT * FROM ( + SELECT udf(year), course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)) + FOR course IN ('dotNET', 'Java') +) +-- !query 3 schema +struct +-- !query 3 output +2012 15000 2 +2013 48000 3 + + +-- !query 4 +SELECT * FROM courseSales +PIVOT ( + udf(sum(earnings)) + FOR year IN (2012, 2013) +) +-- !query 4 schema +struct +-- !query 4 output +Java 2 3 +dotNET 15000 48000 + + +-- !query 5 +SELECT * FROM ( + SELECT year, course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)), udf(avg(earnings)) + FOR course IN ('dotNET', 'Java') +) +-- !query 5 schema +struct +-- !query 5 output +2012 15000 7500.0 2 2.0 +2013 48000 48000.0 3 3.0 + + +-- !query 6 +SELECT * FROM ( + SELECT udf(course) as course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)) + FOR course IN ('dotNET', 'Java') +) +-- !query 6 schema 
+struct +-- !query 6 output +63000 5 + + +-- !query 7 +SELECT * FROM ( + SELECT year, course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)), udf(min(year)) + FOR course IN ('dotNET', 'Java') +) +-- !query 7 schema +struct +-- !query 7 output +63000 20125 2012 + + +-- !query 8 +SELECT * FROM ( + SELECT course, year, earnings, udf(s) as s + FROM courseSales + JOIN years ON year = y +) +PIVOT ( + udf(sum(earnings)) + FOR s IN (1, 2) +) +-- !query 8 schema +struct +-- !query 8 output +Java 20122 nan +Java 2013nan 3 +dotNET 201215000 nan +dotNET 2013nan 48000 + + +-- !query 9 +SELECT * FROM ( + SELECT course, year, earnings, s + FROM courseSales + JOIN years ON year = y +) +PIVOT ( + udf(sum(earnings)), udf(min(s)) + FOR course IN ('dotNET', 'Java') +) +-- !query 9 schema +struct +-- !query 9 output +2012 15000 1 2 1 +2013 48000 2 3 2 + + +-- !query 10 +SELECT * FROM ( + SELECT course, year, earnings, s + FROM courseSales + JOIN years ON year = y +) +PIVOT ( + udf(sum(earnings * s)) + FOR course IN ('dotNET', 'Java') +) +-- !query 10 schema +struct +-- !query 10 output +2012 15000 2 +2013 96000 6 + + +-- !query 11 +SELECT 2012_s, 2013_s, 2012_a, 2013_a, c FROM ( + SELECT year y, course c, earnings e FROM courseSales +) +PIVOT ( + udf(sum(e)) s, udf(avg(e)) a + FOR y IN (2012, 2013) +) +-- !query 11 schema +struct<2012_s:string,2013_s:string,2012_a:string,2013_a:string,c:string> +-- !query 11 output +15000 48000 7500.0 48000.0 dotNET +2 3 2.0 3.0 Java + + +-- !query 12 +SELECT firstYear_s, secondYear_s, firstYear_a, secondYear_a, c FROM ( + SELECT year y, course c, earnings e FROM courseSales +) +PIVOT ( + udf(sum(e)) s, udf(avg(e)) a + FOR y IN (2012 as firstYear, 2013 secondYear) +) +-- !query 12 schema +struct +-- !query 12 output +15000 48000 7500.0 48000.0 dotNET +2 3 2.0 3.0 Java + + +-- !query 13 +SELECT * FROM courseSales +PIVOT ( + udf(abs(earnings)) + FOR year IN (2012, 2013) +) +-- !query 13 schema +struct<> +-- !query 13 output 
+org.apache.spark.sql.AnalysisException +Aggregate expression required for pivot, but 'coursesales.`earnings`' did not appear in any aggregate function.; + + +-- !query 14 +SELECT * FROM ( + SELECT year, course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)), year + FOR course IN ('dotNET', 'Java') +) +-- !query 14 schema +struct<> +-- !query 14 output +org.apache.spark.sql.AnalysisException +Aggregate expression required for pivot, but '__auto_generated_subquery_name.`year`' did not appear in any aggregate function.; + + +-- !query 15 +SELECT * FROM ( + SELECT course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)) + FOR year IN (2012, 2013) +) +-- !query 15 schema +struct<> +--
[GitHub] [spark] AmplabJenkins removed a comment on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
AmplabJenkins removed a comment on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510822368 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12715/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
AmplabJenkins commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510822368 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12715/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
AmplabJenkins commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510822359 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
AmplabJenkins removed a comment on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510822359 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #25040: [SPARK-28238][SQL] Implement DESCRIBE TABLE for Data Source V2 Tables.
cloud-fan commented on a change in pull request #25040: [SPARK-28238][SQL] Implement DESCRIBE TABLE for Data Source V2 Tables.
URL: https://github.com/apache/spark/pull/25040#discussion_r302917683

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeTableExec.scala ##
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalog.v2.{Identifier, TableCatalog}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.encoders.RowEncoder
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, GenericRowWithSchema}
+import org.apache.spark.sql.catalyst.plans.DescribeTableSchemas
+import org.apache.spark.sql.execution.LeafExecNode
+import org.apache.spark.sql.sources.v2.Table
+
+case class DescribeTableExec(
+    catalog: TableCatalog,
+    ident: Identifier,
+    isExtended: Boolean) extends LeafExecNode {
+
+  import DescribeTableExec._
+
+  override def output: Seq[AttributeReference] = DescribeTableSchemas.DESCRIBE_TABLE_ATTRIBUTES
+
+  override protected def doExecute(): RDD[InternalRow] = {
+    val rows = new ArrayBuffer[InternalRow]()
+    if (catalog.tableExists(ident)) {
+      val table = catalog.loadTable(ident)
+      addSchema(rows, table)
+
+      if (isExtended) {
+        addPartitioning(rows, table)
+        addProperties(rows, table)
+      }
+
+    } else {
+      rows += toCatalystRow(s"Table $ident does not exist.", "", "")
+    }
+    sparkContext.parallelize(rows)
+  }
+
+  private def addSchema(rows: ArrayBuffer[InternalRow], table: Table): Unit = {
+    rows ++= table.schema.map{ column =>
+      toCatalystRow(
+        column.name, column.dataType.simpleString, column.getComment().getOrElse(""))
+    }
+  }
+
+  private def addPartitioning(rows: ArrayBuffer[InternalRow], table: Table): Unit = {
+    rows += EMPTY_ROW
+    rows += toCatalystRow(" Partitioning", "", "")
+    rows += toCatalystRow("--", "", "")
+    if (table.partitioning.isEmpty) {
+      rows += toCatalystRow("Not partitioned", "", "")
+    } else {
+      rows ++= table.partitioning.zipWithIndex.map {
+        case (transform, index) => toCatalystRow(s"Part $index", transform.describe(), "")
+      }
+    }
+  }
+
+  private def addProperties(rows: ArrayBuffer[InternalRow], table: Table): Unit = {
+    rows += EMPTY_ROW
+    rows += toCatalystRow(" Table Property", " Value", "")
+    rows += toCatalystRow("", "---", "")
+    rows ++= table.properties.asScala.toList.sortBy(_._1).map {
+      case (key, value) => toCatalystRow(key, value, "")
+    }
+  }
+}
+
+private object DescribeTableExec {
+  private val ENCODER = RowEncoder(DescribeTableSchemas.DESCRIBE_TABLE_SCHEMA)
+  private val EMPTY_ROW = toCatalystRow("", "", "")
+
+  private def toCatalystRow(strs: String*): InternalRow = {
+    ENCODER.resolveAndBind().toRow(

Review comment:
The encoder only needs to call `resolveAndBind` once.
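A minimal sketch of what the review comment suggests, written against a toy stand-in rather than Spark's `RowEncoder`. `ToyEncoder`, its call counter, and `DescribeSketch` are hypothetical names that exist only for this illustration; the point is just to resolve the encoder once, when the object is initialised, and to reuse the resolved instance for every row instead of calling `resolveAndBind()` inside `toCatalystRow`.

```scala
// Toy stand-in for an encoder whose resolveAndBind() is assumed to be expensive.
// This is NOT Spark's RowEncoder; it only mimics the call shape used in the diff.
final class ToyEncoder(val resolved: Boolean) {
  def resolveAndBind(): ToyEncoder = {
    ToyEncoder.resolveCalls += 1 // count how often the expensive step runs
    new ToyEncoder(resolved = true)
  }

  def toRow(values: Seq[String]): Vector[String] = {
    require(resolved, "encoder must be resolved before use")
    values.toVector
  }
}

object ToyEncoder {
  var resolveCalls = 0
}

object DescribeSketch {
  // Resolve once at initialisation, then reuse the resolved encoder for each row.
  private val ENCODER = new ToyEncoder(resolved = false).resolveAndBind()

  def toCatalystRow(strs: String*): Vector[String] = ENCODER.toRow(strs)
}
```

With this shape, building any number of rows triggers a single `resolveAndBind()` call; in the diff as posted, each `toCatalystRow` call would resolve again.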
[GitHub] [spark] peter-toth commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
peter-toth commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
URL: https://github.com/apache/spark/pull/25029#discussion_r302921436

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala ##
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.analysis
+
+import org.apache.spark.sql.catalyst.expressions.SubqueryExpression
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, With}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Analyze WITH nodes and substitute child plan with CTE definitions.
+ */
+object CTESubstitution extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = if (SQLConf.get.legacyCTESubstitutionEnabled) {
+    legacyTraverseAndSubstituteCTE(plan)
+  } else {
+    traverseAndSubstituteCTE(plan, false)
+  }
+
+  private def legacyTraverseAndSubstituteCTE(plan: LogicalPlan): LogicalPlan = {
+    plan.resolveOperatorsUp {
+      case With(child, relations) =>
+        // substitute CTE expressions right-to-left to resolve references to previous CTEs:
+        // with a as (select * from t), b as (select * from a) select * from b
+        relations.foldRight(child) {
+          case ((cteName, ctePlan), currentPlan) => substituteCTE(currentPlan, cteName, ctePlan)
+        }
+    }
+  }
+
+  /**
+   * Traverse the plan and expression nodes as a tree and replace matching references to CTE
+   * definitions.
+   * - If the rule encounters a WITH node then it substitutes the child of the node with CTE
+   *   definitions of the node right-to-left order as a definition can reference to a previous
+   *   one.
+   *   For example the following query is valid:
+   *   WITH
+   *     t AS (SELECT 1),
+   *     t2 AS (SELECT * FROM t)
+   *   SELECT * FROM t2
+   * - If a CTE definition contains an inner WITH node then substitution of inner should take
+   *   precedence because it can shadow an outer CTE definition.
+   *   For example the following query should return 2:
+   *   WITH
+   *     t AS (SELECT 1),
+   *     t2 AS (
+   *       WITH t AS (SELECT 2)
+   *       SELECT * FROM t
+   *     )
+   *   SELECT * FROM t2
+   * - If a CTE definition contains a subquery that contains an inner WITH node then substitution
+   *   of inner should take precedence because it can shadow an outer CTE definition.
+   *   For example the following query should return 2:
+   *   WITH t AS (SELECT 1 AS c)
+   *   SELECT max(c) FROM (
+   *     WITH t AS (SELECT 2 AS c)
+   *     SELECT * FROM t
+   *   )
+   * - If a CTE definition contains a subquery expression that contains an inner WITH node then
+   *   substitution of inner should take precedence because it can shadow an outer CTE
+   *   definition.
+   *   For example the following query should return 2:
+   *   WITH t AS (SELECT 1)
+   *   SELECT (
+   *     WITH t AS (SELECT 2)
+   *     SELECT * FROM t
+   *   )
+   * @param plan the plan to be traversed
+   * @param inTraverse whether the current traverse is called from another traverse, only in this
+   *                   case name collision can occur
+   * @return the plan where CTE substitution is applied
+   */
+  private def traverseAndSubstituteCTE(plan: LogicalPlan, inTraverse: Boolean): LogicalPlan = {
+    plan.resolveOperatorsUp {
+      case With(child: LogicalPlan, relations) =>
+        // child might contain an inner CTE that has priority so traverse and substitute inner CTEs
+        // in child first
+        val traversedChild: LogicalPlan = child transformExpressions {
+          case e: SubqueryExpression => e.withNewPlan(traverseAndSubstituteCTE(e.plan, true))
+        }
+
+        // Substitute CTE definitions from last to first as a CTE definition can reference a
+        // previous one
+        relations.foldRight(traversedChild) {
+          case ((cteName, ctePlan), currentPlan) =>
+            // A CTE definition might contain an inner CTE that has priority so traverse and
+            // substitute ctePlan
+            // A
[GitHub] [spark] AmplabJenkins removed a comment on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
AmplabJenkins removed a comment on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses URL: https://github.com/apache/spark/pull/25029#issuecomment-510834507 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12716/ Test PASSed.
[GitHub] [spark] peter-toth commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
peter-toth commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
URL: https://github.com/apache/spark/pull/25029#discussion_r302921019

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala ##
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.analysis
+
+import org.apache.spark.sql.catalyst.expressions.SubqueryExpression
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, With}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Analyze WITH nodes and substitute child plan with CTE definitions.
+ */
+object CTESubstitution extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = if (SQLConf.get.legacyCTESubstitutionEnabled) {
+    legacyTraverseAndSubstituteCTE(plan)
+  } else {
+    traverseAndSubstituteCTE(plan, false)
+  }
+
+  private def legacyTraverseAndSubstituteCTE(plan: LogicalPlan): LogicalPlan = {
+    plan.resolveOperatorsUp {
+      case With(child, relations) =>
+        // substitute CTE expressions right-to-left to resolve references to previous CTEs:
+        // with a as (select * from t), b as (select * from a) select * from b
+        relations.foldRight(child) {
+          case ((cteName, ctePlan), currentPlan) => substituteCTE(currentPlan, cteName, ctePlan)
+        }
+    }
+  }
+
+  /**
+   * Traverse the plan and expression nodes as a tree and replace matching references to CTE
+   * definitions.
+   * - If the rule encounters a WITH node then it substitutes the child of the node with CTE
+   *   definitions of the node right-to-left order as a definition can reference to a previous
+   *   one.
+   *   For example the following query is valid:
+   *   WITH
+   *     t AS (SELECT 1),
+   *     t2 AS (SELECT * FROM t)
+   *   SELECT * FROM t2
+   * - If a CTE definition contains an inner WITH node then substitution of inner should take
+   *   precedence because it can shadow an outer CTE definition.
+   *   For example the following query should return 2:
+   *   WITH
+   *     t AS (SELECT 1),
+   *     t2 AS (
+   *       WITH t AS (SELECT 2)
+   *       SELECT * FROM t
+   *     )
+   *   SELECT * FROM t2
+   * - If a CTE definition contains a subquery that contains an inner WITH node then substitution
+   *   of inner should take precedence because it can shadow an outer CTE definition.
+   *   For example the following query should return 2:
+   *   WITH t AS (SELECT 1 AS c)
+   *   SELECT max(c) FROM (
+   *     WITH t AS (SELECT 2 AS c)
+   *     SELECT * FROM t
+   *   )
+   * - If a CTE definition contains a subquery expression that contains an inner WITH node then
+   *   substitution of inner should take precedence because it can shadow an outer CTE
+   *   definition.
+   *   For example the following query should return 2:
+   *   WITH t AS (SELECT 1)
+   *   SELECT (
+   *     WITH t AS (SELECT 2)
+   *     SELECT * FROM t
+   *   )
+   * @param plan the plan to be traversed
+   * @param inTraverse whether the current traverse is called from another traverse, only in this
+   *                   case name collision can occur
+   * @return the plan where CTE substitution is applied
+   */
+  private def traverseAndSubstituteCTE(plan: LogicalPlan, inTraverse: Boolean): LogicalPlan = {
+    plan.resolveOperatorsUp {
+      case With(child: LogicalPlan, relations) =>
+        // child might contain an inner CTE that has priority so traverse and substitute inner CTEs
+        // in child first
+        val traversedChild: LogicalPlan = child transformExpressions {
+          case e: SubqueryExpression => e.withNewPlan(traverseAndSubstituteCTE(e.plan, true))
+        }
+
+        // Substitute CTE definitions from last to first as a CTE definition can reference a
+        // previous one
+        relations.foldRight(traversedChild) {
+          case ((cteName, ctePlan), currentPlan) =>
+            // A CTE definition might contain an inner CTE that has priority so traverse and

Review comment:
fixed
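To illustrate why the quoted rule folds over the CTE definitions right-to-left, here is a small self-contained sketch on a toy plan type. `Plan`, `Ref`, `Lit`, `Project`, `CteSketch`, and `substitute` are hypothetical stand-ins, not Spark's `LogicalPlan` or `substituteCTE`, and the sketch deliberately ignores the nested-WITH shadowing that the pull request is about. It only shows that a later definition (`t2`) must be expanded before an earlier one (`t`) so that the reference to `t` inside `t2` is still resolved.

```scala
// Toy plan model (assumption for illustration only, not Spark's LogicalPlan).
sealed trait Plan
final case class Ref(name: String) extends Plan   // unresolved reference to a CTE name
final case class Lit(value: Int) extends Plan     // stands in for "SELECT <value>"
final case class Project(child: Plan) extends Plan // stands in for "SELECT * FROM <child>"

object CteSketch {
  // Replace every reference to `name` in `plan` with `definition`.
  def substitute(plan: Plan, name: String, definition: Plan): Plan = plan match {
    case Ref(n) if n == name => definition
    case Project(child)      => Project(substitute(child, name, definition))
    case other               => other
  }

  // Right-to-left: the last definition is substituted first, so references it
  // introduces to earlier definitions are resolved by the later fold steps.
  def substituteRightToLeft(child: Plan, relations: Seq[(String, Plan)]): Plan =
    relations.foldRight(child) {
      case ((name, definition), current) => substitute(current, name, definition)
    }

  // Left-to-right, shown only for contrast: `t` is substituted before `t2` is
  // expanded, so the reference to `t` inside `t2` is left unresolved.
  def substituteLeftToRight(child: Plan, relations: Seq[(String, Plan)]): Plan =
    relations.foldLeft(child) {
      case (current, (name, definition)) => substitute(current, name, definition)
    }
}
```

For the first example in the doc comment (`WITH t AS (SELECT 1), t2 AS (SELECT * FROM t) SELECT * FROM t2`), the right-to-left fold ends with no `Ref` nodes left, while the left-to-right fold leaves `Ref("t")` in the result.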
[GitHub] [spark] peter-toth commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
peter-toth commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
URL: https://github.com/apache/spark/pull/25029#discussion_r302921057

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala ##
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.analysis
+
+import org.apache.spark.sql.catalyst.expressions.SubqueryExpression
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, With}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Analyze WITH nodes and substitute child plan with CTE definitions.
+ */
+object CTESubstitution extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = if (SQLConf.get.legacyCTESubstitutionEnabled) {
+    legacyTraverseAndSubstituteCTE(plan)
+  } else {
+    traverseAndSubstituteCTE(plan, false)
+  }
+
+  private def legacyTraverseAndSubstituteCTE(plan: LogicalPlan): LogicalPlan = {
+    plan.resolveOperatorsUp {
+      case With(child, relations) =>
+        // substitute CTE expressions right-to-left to resolve references to previous CTEs:
+        // with a as (select * from t), b as (select * from a) select * from b
+        relations.foldRight(child) {
+          case ((cteName, ctePlan), currentPlan) => substituteCTE(currentPlan, cteName, ctePlan)
+        }
+    }
+  }
+
+  /**
+   * Traverse the plan and expression nodes as a tree and replace matching references to CTE
+   * definitions.
+   * - If the rule encounters a WITH node then it substitutes the child of the node with CTE
+   *   definitions of the node right-to-left order as a definition can reference to a previous
+   *   one.
+   *   For example the following query is valid:
+   *   WITH
+   *     t AS (SELECT 1),
+   *     t2 AS (SELECT * FROM t)
+   *   SELECT * FROM t2
+   * - If a CTE definition contains an inner WITH node then substitution of inner should take
+   *   precedence because it can shadow an outer CTE definition.
+   *   For example the following query should return 2:
+   *   WITH
+   *     t AS (SELECT 1),
+   *     t2 AS (
+   *       WITH t AS (SELECT 2)
+   *       SELECT * FROM t
+   *     )
+   *   SELECT * FROM t2
+   * - If a CTE definition contains a subquery that contains an inner WITH node then substitution
+   *   of inner should take precedence because it can shadow an outer CTE definition.
+   *   For example the following query should return 2:
+   *   WITH t AS (SELECT 1 AS c)
+   *   SELECT max(c) FROM (
+   *     WITH t AS (SELECT 2 AS c)
+   *     SELECT * FROM t
+   *   )
+   * - If a CTE definition contains a subquery expression that contains an inner WITH node then
+   *   substitution of inner should take precedence because it can shadow an outer CTE
+   *   definition.
+   *   For example the following query should return 2:
+   *   WITH t AS (SELECT 1)
+   *   SELECT (
+   *     WITH t AS (SELECT 2)
+   *     SELECT * FROM t
+   *   )
+   * @param plan the plan to be traversed
+   * @param inTraverse whether the current traverse is called from another traverse, only in this
+   *                   case name collision can occur
+   * @return the plan where CTE substitution is applied
+   */
+  private def traverseAndSubstituteCTE(plan: LogicalPlan, inTraverse: Boolean): LogicalPlan = {
+    plan.resolveOperatorsUp {
+      case With(child: LogicalPlan, relations) =>
+        // child might contain an inner CTE that has priority so traverse and substitute inner CTEs
+        // in child first
+        val traversedChild: LogicalPlan = child transformExpressions {
+          case e: SubqueryExpression => e.withNewPlan(traverseAndSubstituteCTE(e.plan, true))
+        }
+
+        // Substitute CTE definitions from last to first as a CTE definition can reference a
+        // previous one
+        relations.foldRight(traversedChild) {
+          case ((cteName, ctePlan), currentPlan) =>
+            // A CTE definition might contain an inner CTE that has priority so traverse and
+            // substitute ctePlan

Review comment:
[GitHub] [spark] AmplabJenkins commented on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
AmplabJenkins commented on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses URL: https://github.com/apache/spark/pull/25029#issuecomment-510834504 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
AmplabJenkins removed a comment on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses URL: https://github.com/apache/spark/pull/25029#issuecomment-510834504 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
AmplabJenkins commented on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses URL: https://github.com/apache/spark/pull/25029#issuecomment-510834507 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12716/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25133: [SPARK-28365][Python][TEST] Set default locale for StopWordsRemover tests to prevent invalid locale error during test
AmplabJenkins removed a comment on issue #25133: [SPARK-28365][Python][TEST] Set default locale for StopWordsRemover tests to prevent invalid locale error during test URL: https://github.com/apache/spark/pull/25133#issuecomment-510850660 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12718/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25133: [SPARK-28365][Python][TEST] Set default locale for StopWordsRemover tests to prevent invalid locale error during test
AmplabJenkins removed a comment on issue #25133: [SPARK-28365][Python][TEST] Set default locale for StopWordsRemover tests to prevent invalid locale error during test URL: https://github.com/apache/spark/pull/25133#issuecomment-510850655 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on issue #25112: [SPARK-28347][K8S] Add gcompat to spark k8s Dockerfile
dongjoon-hyun commented on issue #25112: [SPARK-28347][K8S] Add gcompat to spark k8s Dockerfile URL: https://github.com/apache/spark/pull/25112#issuecomment-510758158 @vanzin . Can we have SPARK-26995 in `branch-2.4`, too?
[GitHub] [spark] SparkQA commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
SparkQA commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510772163 **[Test build #107582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107582/testReport)** for PR 25130 at commit [`103c673`](https://github.com/apache/spark/commit/103c673186edf89c2999fa7e5d6547387d3d923f).
[GitHub] [spark] SparkQA commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
SparkQA commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510776996 **[Test build #107583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107583/testReport)** for PR 25130 at commit [`103c673`](https://github.com/apache/spark/commit/103c673186edf89c2999fa7e5d6547387d3d923f).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
AmplabJenkins removed a comment on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510776387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12709/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
AmplabJenkins removed a comment on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510776376 Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
HyukjinKwon commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510783841 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #25131: [SPARK-28361] [SQL] [TEST] Test equality of generated code with id in class name
AmplabJenkins removed a comment on issue #25131: [SPARK-28361] [SQL] [TEST] Test equality of generated code with id in class name URL: https://github.com/apache/spark/pull/25131#issuecomment-510783667 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12710/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
AmplabJenkins removed a comment on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#issuecomment-510799243 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
AmplabJenkins commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#issuecomment-510799243 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
AmplabJenkins commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#issuecomment-510799249 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12712/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
AmplabJenkins removed a comment on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#issuecomment-510799249 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12712/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
AmplabJenkins removed a comment on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510819706 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107583/ Test FAILed.
[GitHub] [spark] HyukjinKwon commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
HyukjinKwon commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510820451 retest this please
[GitHub] [spark] SparkQA commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
SparkQA commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510820521 **[Test build #107588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107588/testReport)** for PR 25130 at commit [`103c673`](https://github.com/apache/spark/commit/103c673186edf89c2999fa7e5d6547387d3d923f).
[GitHub] [spark] cloud-fan commented on a change in pull request #25040: [SPARK-28238][SQL] Implement DESCRIBE TABLE for Data Source V2 Tables.
cloud-fan commented on a change in pull request #25040: [SPARK-28238][SQL] Implement DESCRIBE TABLE for Data Source V2 Tables.
URL: https://github.com/apache/spark/pull/25040#discussion_r302911446

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/DescribeTableSchemas.scala ##

@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans
+
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.types.{MetadataBuilder, StringType, StructField, StructType}
+
+private[sql] object DescribeTableSchemas {
+  val DESCRIBE_TABLE_ATTRIBUTES = Seq(

Review comment:
or, we can follow `DescribeCommandBase`: create attributes in an abstract class instead of an object.
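For readers skimming the archive, the two shapes contrasted in the review comment above can be sketched roughly as follows. This is only an illustration under assumptions: `Attr` is a hypothetical stand-in for Spark's `AttributeReference`, every `*Sketch` name is invented, and the column names are placeholders because the quoted diff is cut off before the actual attribute list.

```scala
// Hypothetical stand-in for Spark's AttributeReference; not the real class.
final case class Attr(name: String, dataType: String)

// Placeholder columns (an assumption): the quoted diff ends at `Seq(`,
// so the real attribute list is not visible in this message.
object SketchColumns {
  def make(): Seq[Attr] =
    Seq(Attr("col_name", "string"), Attr("data_type", "string"), Attr("comment", "string"))
}

// Shape 1: the attributes live on a standalone object, as in the diff.
object DescribeSchemasSketch {
  val attributes: Seq[Attr] = SketchColumns.make()
}

// Shape 2: the attributes are a member of an abstract base class,
// which is the alternative the reviewer suggests.
abstract class DescribeBaseSketch {
  val output: Seq[Attr] = SketchColumns.make()
}

// Concrete commands inherit `output` instead of referring to a shared object.
final class DescribeV1Sketch extends DescribeBaseSketch
final class DescribeV2Sketch extends DescribeBaseSketch
```

With the second shape, `new DescribeV2Sketch().output` is available on every subclass without any reference to a separate holder object.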
[GitHub] [spark] AmplabJenkins removed a comment on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
AmplabJenkins removed a comment on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510825521 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107585/ Test FAILed.
[GitHub] [spark] peter-toth commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
peter-toth commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
URL: https://github.com/apache/spark/pull/25029#discussion_r302920838

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala ##

@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.analysis
+
+import org.apache.spark.sql.catalyst.expressions.SubqueryExpression
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, With}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Analyze WITH nodes and substitute child plan with CTE definitions.
+ */
+object CTESubstitution extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = if (SQLConf.get.legacyCTESubstitutionEnabled) {
+    legacyTraverseAndSubstituteCTE(plan)
+  } else {
+    traverseAndSubstituteCTE(plan, false)
+  }
+
+  private def legacyTraverseAndSubstituteCTE(plan: LogicalPlan): LogicalPlan = {
+    plan.resolveOperatorsUp {
+      case With(child, relations) =>
+        // substitute CTE expressions right-to-left to resolve references to previous CTEs:
+        // with a as (select * from t), b as (select * from a) select * from b
+        relations.foldRight(child) {
+          case ((cteName, ctePlan), currentPlan) => substituteCTE(currentPlan, cteName, ctePlan)
+        }
+    }
+  }
+
+  /**
+   * Traverse the plan and expression nodes as a tree and replace matching references to CTE
+   * definitions.
+   * - If the rule encounters a WITH node then it substitutes the child of the node with CTE
+   *   definitions of the node right-to-left order as a definition can reference to a previous
+   *   one.
+   *   For example the following query is valid:
+   *   WITH
+   *     t AS (SELECT 1),
+   *     t2 AS (SELECT * FROM t)
+   *   SELECT * FROM t2
+   * - If a CTE definition contains an inner WITH node then substitution of inner should take
+   *   precedence because it can shadow an outer CTE definition.
+   *   For example the following query should return 2:
+   *   WITH
+   *     t AS (SELECT 1),
+   *     t2 AS (
+   *       WITH t AS (SELECT 2)
+   *       SELECT * FROM t
+   *     )
+   *   SELECT * FROM t2
+   * - If a CTE definition contains a subquery that contains an inner WITH node then substitution
+   *   of inner should take precedence because it can shadow an outer CTE definition.
+   *   For example the following query should return 2:
+   *   WITH t AS (SELECT 1 AS c)
+   *   SELECT max(c) FROM (
+   *     WITH t AS (SELECT 2 AS c)
+   *     SELECT * FROM t
+   *   )
+   * - If a CTE definition contains a subquery expression that contains an inner WITH node then
+   *   substitution of inner should take precedence because it can shadow an outer CTE
+   *   definition.
+   *   For example the following query should return 2:
+   *   WITH t AS (SELECT 1)
+   *   SELECT (
+   *     WITH t AS (SELECT 2)
+   *     SELECT * FROM t
+   *   )
+   * @param plan the plan to be traversed

Review comment:
fixed
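The shadowing rule described in the Scaladoc above (an inner `WITH` takes precedence over an outer definition of the same name) can be tried out on a toy model. The sketch below is an assumption-laden illustration, not the code from the PR: `Plan`, `Literal`, `TableRef`, `Project`, `WithCTE` and `CteSketch` are hypothetical types that only mimic the shape of Spark's logical plans, and only the first two bullets of the Scaladoc are modelled.

```scala
// Toy plan model; hypothetical types, not Spark's LogicalPlan classes.
sealed trait Plan
case class Literal(value: Int) extends Plan                 // SELECT <value>
case class TableRef(name: String) extends Plan              // unresolved reference to <name>
case class Project(child: Plan) extends Plan                // SELECT * FROM <child>
case class WithCTE(child: Plan, ctes: Seq[(String, Plan)]) extends Plan

object CteSketch {
  // Replace references to `name` with `definition`.
  // Assumes `plan` no longer contains WithCTE nodes (they were resolved first).
  def substitute(plan: Plan, name: String, definition: Plan): Plan = plan match {
    case TableRef(n) if n == name => definition
    case Project(child)           => Project(substitute(child, name, definition))
    case other                    => other
  }

  def resolve(plan: Plan): Plan = plan match {
    case WithCTE(child, ctes) =>
      // Walk the definitions in declaration order so a later one can use an earlier one.
      val resolvedDefs = ctes.foldLeft(Seq.empty[(String, Plan)]) {
        case (done, (name, definition)) =>
          // Resolve inner WITH nodes first: an inner `t` shadows an outer `t`.
          val inner = resolve(definition)
          // Whatever is still unresolved may refer to earlier outer definitions.
          val withOuter = done.foldRight(inner) { case ((n, d), p) => substitute(p, n, d) }
          done :+ (name -> withOuter)
      }
      resolvedDefs.foldRight(resolve(child)) { case ((n, d), p) => substitute(p, n, d) }
    case Project(child) => Project(resolve(child))
    case other          => other
  }

  // Evaluate a fully resolved toy plan to the single value it selects.
  def eval(plan: Plan): Int = plan match {
    case Literal(v)     => v
    case Project(child) => eval(child)
    case other          => sys.error(s"unresolved plan: $other")
  }

  // WITH t AS (SELECT 1), t2 AS (SELECT * FROM t) SELECT * FROM t2
  val chained: Plan = WithCTE(
    Project(TableRef("t2")),
    Seq("t" -> Literal(1), "t2" -> Project(TableRef("t"))))

  // WITH t AS (SELECT 1), t2 AS (WITH t AS (SELECT 2) SELECT * FROM t) SELECT * FROM t2
  val nested: Plan = WithCTE(
    Project(TableRef("t2")),
    Seq(
      "t"  -> Literal(1),
      "t2" -> WithCTE(Project(TableRef("t")), Seq("t" -> Literal(2)))))
}
```

Under these assumptions, `CteSketch.eval(CteSketch.resolve(CteSketch.nested))` evaluates to 2, matching the second example in the Scaladoc, while the `chained` plan evaluates to 1.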
[GitHub] [spark] peter-toth commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
peter-toth commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
URL: https://github.com/apache/spark/pull/25029#discussion_r302920913

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##

@@ -2322,6 +2328,8 @@ class SQLConf extends Serializable with Logging {
   def defaultV2Catalog: Option[String] = getConf(DEFAULT_V2_CATALOG)
 
+  def legacyCTESubstitutionEnabled: Boolean = getConf(LEGACY_CTE_SUBSTITUTION_ENABLED)

Review comment:
fixed
[GitHub] [spark] SparkQA commented on issue #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base
SparkQA commented on issue #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base URL: https://github.com/apache/spark/pull/25098#issuecomment-510761892 **[Test build #107576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107576/testReport)** for PR 25098 at commit [`125a504`](https://github.com/apache/spark/commit/125a504059123dc4274c07796f81645a716e090f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making them no-op
AmplabJenkins commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making them no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510761916 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12706/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making them no-op
AmplabJenkins commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making them no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510761914 Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making them no-op
HyukjinKwon commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making them no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510761498 Sure.
[GitHub] [spark] dongjoon-hyun commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making them no-op
dongjoon-hyun commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making them no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510761463 Please test with `maven`, too.
[GitHub] [spark] HyukjinKwon edited a comment on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making them no-op
HyukjinKwon edited a comment on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making them no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510760863 FYI @skonto, @imback82, @huaxingao, @vinodkc, @manuzhang, @chitralverma After this fix, we won't have to worry about those mismatches anymore and can just insert `udf`s with virtually no notable restrictions, such as the `CAST` or `upper` workarounds. After this fix, we can get rid of all those workarounds, if there are any in your PRs.
[GitHub] [spark] peter-toth commented on issue #25118: [SPARK-27878][SQL] Support ARRAY(subquery) expressions
peter-toth commented on issue #25118: [SPARK-27878][SQL] Support ARRAY(subquery) expressions URL: https://github.com/apache/spark/pull/25118#issuecomment-510769573 The only system I found that supports it is BigQuery: https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#creating-arrays-from-subqueries
[GitHub] [spark] AmplabJenkins commented on issue #25131: [SPARK-28361] [SQL] [TEST] Test equality of generated code with id in class name
AmplabJenkins commented on issue #25131: [SPARK-28361] [SQL] [TEST] Test equality of generated code with id in class name URL: https://github.com/apache/spark/pull/25131#issuecomment-510773251 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107581/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
AmplabJenkins commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510773316 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
AmplabJenkins commented on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510773322 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107579/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
SparkQA removed a comment on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510760558 **[Test build #107579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107579/testReport)** for PR 25130 at commit [`1791fe6`](https://github.com/apache/spark/commit/1791fe668fa4db276915b442acc1a760cf96d358).
[GitHub] [spark] SparkQA removed a comment on issue #25128: [SPARK-28270][test-maven][FOLLOW-UP][SQL][PYTHON][TESTS] Avoid cast input of UDF as double in the failed test in udf-aggregate_part1.sql
SparkQA removed a comment on issue #25128: [SPARK-28270][test-maven][FOLLOW-UP][SQL][PYTHON][TESTS] Avoid cast input of UDF as double in the failed test in udf-aggregate_part1.sql URL: https://github.com/apache/spark/pull/25128#issuecomment-510718472 **[Test build #107572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107572/testReport)** for PR 25128 at commit [`78d636d`](https://github.com/apache/spark/commit/78d636d49efc416ac50367bb751b8819f263316b).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
AmplabJenkins removed a comment on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510773316 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
AmplabJenkins removed a comment on issue #25130: [SPARK-28359][test-maven][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-510773208 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25131: [SPARK-28361] [SQL] [TEST] Test equality of generated code with id in class name
AmplabJenkins removed a comment on issue #25131: [SPARK-28361] [SQL] [TEST] Test equality of generated code with id in class name URL: https://github.com/apache/spark/pull/25131#issuecomment-510773248 Merged build finished. Test FAILed.