[GitHub] [spark] SparkQA commented on pull request #32446: [SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client
SparkQA commented on pull request #32446: URL: https://github.com/apache/spark/pull/32446#issuecomment-834089554 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42758/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming
SparkQA commented on pull request #32464: URL: https://github.com/apache/spark/pull/32464#issuecomment-834085008
[GitHub] [spark] HyukjinKwon commented on pull request #32446: [SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client
HyukjinKwon commented on pull request #32446: URL: https://github.com/apache/spark/pull/32446#issuecomment-834083871 Looks pretty good to me too!
[GitHub] [spark] gengliangwang commented on a change in pull request #32451: [SPARK-35144][SQL] Migrate to transformWithPruning for object rules
gengliangwang commented on a change in pull request #32451: URL: https://github.com/apache/spark/pull/32451#discussion_r627939566 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala ## @@ -207,7 +211,8 @@ object ObjectSerializerPruning extends Rule[LogicalPlan] { } } - def apply(plan: LogicalPlan): LogicalPlan = plan transform { + def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning( +_.containsAllPatterns(OBJECT_CONSUMER, PROJECT), ruleId) { Review comment: why do we need `OBJECT_CONSUMER` here?
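The idea behind `transformWithPruning(_.containsAllPatterns(...))` is that each node caches the set of patterns present anywhere in its subtree, so a rule can skip whole subtrees that cannot match. The following is a minimal Java sketch of that idea, not Catalyst's implementation: `PatternBit`, `Node` and `PruningDemo` are hypothetical names, and the `ruleId`-based skipping in the real API is omitted.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins for Catalyst tree-pattern bits.
enum PatternBit { OBJECT_CONSUMER, PROJECT, FILTER }

final class Node {
    final String name;
    final PatternBit own;              // pattern contributed by this node, or null
    final List<Node> children;
    final BitSet bits = new BitSet();  // union of patterns in this whole subtree

    Node(String name, PatternBit own, List<Node> children) {
        this.name = name;
        this.own = own;
        this.children = children;
        if (own != null) bits.set(own.ordinal());
        for (Node c : children) bits.or(c.bits);  // computed bottom-up at construction
    }

    boolean containsAllPatterns(PatternBit... required) {
        for (PatternBit p : required) {
            if (!bits.get(p.ordinal())) return false;
        }
        return true;
    }

    // Top-down transform that never descends into a subtree failing `cond`.
    Node transformWithPruning(Predicate<Node> cond, UnaryOperator<Node> rule) {
        if (!cond.test(this)) return this;  // the whole subtree is skipped
        Node self = rule.apply(this);
        List<Node> newChildren = new ArrayList<>();
        for (Node c : self.children) newChildren.add(c.transformWithPruning(cond, rule));
        return new Node(self.name, self.own, newChildren);
    }
}

final class PruningDemo {
    // Returns how many nodes the rule was actually applied to.
    static int countVisits() {
        Node consumer = new Node("consumer", PatternBit.OBJECT_CONSUMER, List.of());
        Node project = new Node("project", PatternBit.PROJECT, List.of(consumer));
        Node leaf = new Node("leaf", null, List.of());
        Node filter2 = new Node("filter2", PatternBit.FILTER, List.of(leaf));
        Node root = new Node("root", PatternBit.FILTER, List.of(project, filter2));

        AtomicInteger visits = new AtomicInteger();
        root.transformWithPruning(
            n -> n.containsAllPatterns(PatternBit.OBJECT_CONSUMER, PatternBit.PROJECT),
            n -> { visits.incrementAndGet(); return n; });
        return visits.get();
    }
}
```

In this toy tree the rule runs on only two of the five nodes (`root` and `project`): the `consumer` leaf lacks `PROJECT`, and the `filter2` subtree lacks both bits. Requiring an extra pattern therefore narrows which subtrees are visited, which is the trade-off the review question is probing.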
[GitHub] [spark] yaooqinn commented on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased
yaooqinn commented on pull request #32465: URL: https://github.com/apache/spark/pull/32465#issuecomment-834079593
> is this a regression?

This is not a regression, but Hive supports it.

> It seems to me that this should fail, as it's similar to `sql("select a b from values(1) t(a)").repartitionBy("a")`

For resolving attributes, shouldn't the `distribute by` and `cluster by` clauses behave the same as `sort by` and `group by`?
[GitHub] [spark] advancedxy commented on a change in pull request #32450: [SPARK-35282][SQL] Support AQE side shuffled hash join formula
advancedxy commented on a change in pull request #32450: URL: https://github.com/apache/spark/pull/32450#discussion_r627935120 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ## @@ -49,12 +50,15 @@ object Statistics { * @param attributeStats Statistics for Attributes. * @param isRuntime Whether the statistics is inferred from query stage runtime statistics during * adaptive query execution. + * @param mapOutputStatistics the map output statistics from query stage runtime statistics during + *adaptive query execution. */ case class Statistics( sizeInBytes: BigInt, rowCount: Option[BigInt] = None, attributeStats: AttributeMap[ColumnStat] = AttributeMap(Nil), -isRuntime: Boolean = false) { +isRuntime: Boolean = false, +mapOutputStatistics: Option[MapOutputStatistics] = None) { Review comment: FYI, we took another approach to support SHJ in AQE. We added a rule in `AdaptiveSparkPlanExec` to convert SMJ to SHJ according to shuffle stats, which requires no changes in `Statistics.scala` because the statistics are already available in `ShuffleStageInfo`. With this approach, an SMJ can also be converted to an SHJ where applicable, even if `PREFER_SORTMERGE` is set. cc @Liulietong cc @luuliietong
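Both approaches in this thread boil down to deciding, from runtime shuffle statistics, whether each reducer's build side is small enough to hold in a local hash map. The sketch below shows one plausible criterion under that assumption; it is not the rule from either PR. `JoinSelectionSketch`, `choose` and the `maxLocalHashMapBytes` threshold are hypothetical, and the per-partition byte array stands in for the sizes that map output statistics report per reduce partition.

```java
import java.util.Arrays;

final class JoinSelectionSketch {
    enum Strategy { SORT_MERGE, SHUFFLED_HASH }

    // buildSideBytesByPartition: per-reduce-partition sizes of the build side,
    // as observed after the shuffle stage has materialized (hypothetical input).
    static Strategy choose(long[] buildSideBytesByPartition, long maxLocalHashMapBytes) {
        // A shuffled hash join builds one hash map per partition, so the largest
        // partition is the one that has to fit under the threshold.
        long largest = Arrays.stream(buildSideBytesByPartition).max().orElse(0L);
        return largest <= maxLocalHashMapBytes ? Strategy.SHUFFLED_HASH : Strategy.SORT_MERGE;
    }
}
```

Keying the decision on the largest partition rather than the total size reflects that skew, not overall volume, is what breaks a per-partition hash map.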
[GitHub] [spark] sunchao commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction
sunchao commented on a change in pull request #32407: URL: https://github.com/apache/spark/pull/32407#discussion_r627933663 ## File path: sql/core/src/test/scala/org/apache/spark/sql/connector/functions/FunctionBenchmark.scala ## @@ -0,0 +1,146 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.connector.functions + +import test.org.apache.spark.sql.connector.catalog.functions.JavaLongAdd +import test.org.apache.spark.sql.connector.catalog.functions.JavaLongAdd.{JavaLongAddDefault, JavaLongAddMagic, JavaLongAddStaticMagic} + +import org.apache.spark.benchmark.Benchmark +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode._ +import org.apache.spark.sql.connector.catalog.{Identifier, InMemoryCatalog} +import org.apache.spark.sql.connector.catalog.functions.{BoundFunction, ScalarFunction, UnboundFunction} +import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{DataType, LongType, StructType} + +/** + * Benchmark to measure DataSourceV2 UDF performance + * {{{ + * To run this benchmark: + * 1. 
without sbt: + * bin/spark-submit --class + *--jars , + * 2. build/sbt "sql/test:runMain " + * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain " + * Results will be written to "benchmarks/FunctionBenchmark-results.txt". + * }}} + * '''NOTE''': to update the result of this benchmark, please use Github benchmark action: + * https://spark.apache.org/developer-tools.html#github-workflow-benchmarks + */ +object FunctionBenchmark extends SqlBasedBenchmark { Review comment: Sounds good. I'll change it.
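The benchmark compares a function that overrides the default `produceResult` with ones that implement a magic method. The `ScalarFunction` javadoc in this PR attributes part of the difference to Java boxing. The simplified sketch below shows where that boxing comes from. It is an illustration only: `SimpleScalarFunction`, `BoxedLongAdd` and `PrimitiveLongAdd` are hypothetical, and an `Object[]` stands in for Spark's `InternalRow`.

```java
// Hypothetical, simplified stand-in for a scalar function interface.
interface SimpleScalarFunction {
    // Like the documented default, this throws unless an implementation overrides it.
    default Object produceResult(Object[] row) {
        throw new UnsupportedOperationException("override produceResult or define a magic method");
    }
}

// Row-based path: arguments arrive boxed and the result is boxed again.
final class BoxedLongAdd implements SimpleScalarFunction {
    @Override
    public Object produceResult(Object[] row) {
        return (Long) row[0] + (Long) row[1];
    }
}

// Magic-method style: primitive parameters and return type, so no boxing on the call path.
final class PrimitiveLongAdd implements SimpleScalarFunction {
    public long invoke(long left, long right) {
        return left + right;
    }
}
```

`PrimitiveLongAdd` does not override `produceResult`, so calling it throws. This mirrors the documented contract that an implementation must supply one of the two entry points.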
[GitHub] [spark] sunchao commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction
sunchao commented on a change in pull request #32407: URL: https://github.com/apache/spark/pull/32407#discussion_r627933496 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ScalarFunction.java ## @@ -29,33 +29,62 @@ * * The JVM type of result values produced by this function must be the type used by Spark's * InternalRow API for the {@link DataType SQL data type} returned by {@link #resultType()}. + * The mapping between {@link DataType} and the corresponding JVM type is defined below. * * IMPORTANT: the default implementation of {@link #produceResult} throws - * {@link UnsupportedOperationException}. Users can choose to override this method, or implement - * a "magic method" with name {@link #MAGIC_METHOD_NAME} which takes individual parameters - * instead of a {@link InternalRow}. The magic method will be loaded by Spark through Java - * reflection and will also provide better performance in general, due to optimizations such as - * codegen, removal of Java boxing, etc. - * + * {@link UnsupportedOperationException}. Users must choose to either override this method, or + * implement a magic method with name {@link #MAGIC_METHOD_NAME}, which takes individual parameters + * instead of a {@link InternalRow}. The magic method approach is generally recommended because it + * provides better performance over the default {@link #produceResult}, due to optimizations such + * as whole-stage codegen, elimination of Java boxing, etc. + * + * In addition, for stateless Java functions, users can optionally define the + * {@link #MAGIC_METHOD_NAME} as a static method, which further avoids certain runtime costs such + * as nullness check on the method receiver, potential Java dynamic dispatch, etc. Review comment: For a non-static method, `Invoke` needs to check whether the method receiver is null and only invokes the method if it is not. For a static method this check is unnecessary:
```scala
val code = obj.code + code"""
  boolean ${ev.isNull} = true;
  $javaType ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
  if (!${obj.isNull}) {  // <- check if receiver is null
    $argCode
    ${ev.isNull} = $resultIsNull;
    if (!${ev.isNull}) {
      $evaluate
    }
  }
"""
```
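The null-guard argument can be reproduced in plain Java. A static method has no receiver, so there is nothing to null-check, while an instance method must be guarded the same way the generated `if (!${obj.isNull})` block above guards it. The sketch uses reflection only to keep the example small. As far as we understand, Spark's generated code calls the method directly instead. `StatelessLongAdd`, `StatefulLongAdd` and `MagicMethodCaller` are hypothetical, and the method name `"invoke"` is assumed here to match `MAGIC_METHOD_NAME`.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class StatelessLongAdd {
    // Static magic method: can be called without any receiver object.
    public static long invoke(long left, long right) {
        return left + right;
    }
}

final class StatefulLongAdd {
    private final long offset;

    StatefulLongAdd(long offset) {
        this.offset = offset;
    }

    // Instance magic method: needs a non-null receiver.
    public long invoke(long left, long right) {
        return left + right + offset;
    }
}

final class MagicMethodCaller {
    // Calls invoke(long, long) on `cls`. Returns null (a SQL NULL result) when an
    // instance method has no receiver, mirroring the guard in the generated code.
    static Long call(Class<?> cls, Object receiver, long a, long b) {
        try {
            Method m = cls.getMethod("invoke", long.class, long.class);
            if (Modifier.isStatic(m.getModifiers())) {
                return (Long) m.invoke(null, a, b);  // no receiver, so no null check
            }
            if (receiver == null) {
                return null;                         // receiver null check
            }
            return (Long) m.invoke(receiver, a, b);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The static path skips the receiver branch entirely. This is the saving the javadoc calls avoiding the "nullness check on the method receiver".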
[GitHub] [spark] SparkQA removed a comment on pull request #32461: [SPARK-35146][SQL] Migrate to transformWithPruning or resolveWithPruning for rules in finishAnalysis.scala
SparkQA removed a comment on pull request #32461: URL: https://github.com/apache/spark/pull/32461#issuecomment-833982664 **[Test build #138221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138221/testReport)** for PR 32461 at commit [`6fc4523`](https://github.com/apache/spark/commit/6fc4523a4505ae4e8b5f8036d00f042988c2bb5c).
[GitHub] [spark] sunchao commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction
sunchao commented on a change in pull request #32407: URL: https://github.com/apache/spark/pull/32407#discussion_r627932159 ## File path: sql/core/src/test/scala/org/apache/spark/sql/connector/functions/FunctionBenchmark.scala ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.connector.functions + +import test.org.apache.spark.sql.connector.catalog.functions.JavaLongAdd +import test.org.apache.spark.sql.connector.catalog.functions.JavaLongAdd.{JavaLongAddDefault, JavaLongAddMagic, JavaLongAddStaticMagic} + +import org.apache.spark.benchmark.Benchmark +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode._ +import org.apache.spark.sql.connector.catalog.{Identifier, InMemoryCatalog} +import org.apache.spark.sql.connector.catalog.functions.{BoundFunction, ScalarFunction, UnboundFunction} +import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{DataType, LongType, StructType} + +/** + * Benchmark to measure DataSourceV2 UDF performance + * {{{ + * To run this benchmark: + * 1. 
without sbt: + * bin/spark-submit --class + *--jars , + * 2. build/sbt "sql/test:runMain " + * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain " + * Results will be written to "benchmarks/FunctionBenchmark-results.txt". + * }}} + * '''NOTE''': to update the result of this benchmark, please use Github benchmark action: + * https://spark.apache.org/developer-tools.html#github-workflow-benchmarks + */ +object FunctionBenchmark extends SqlBasedBenchmark { + val catalogName: String = "benchmark_catalog" + + override def runBenchmarkSuite(mainArgs: Array[String]): Unit = { +val N = 500L * 1000 * 1000 +Seq(true, false).foreach { codegenEnabled => + Seq(true, false).foreach { resultNullable => +scalarFunctionBenchmark(N, codegenEnabled = codegenEnabled, + resultNullable = resultNullable) + } +} + } + + private def scalarFunctionBenchmark( + N: Long, + codegenEnabled: Boolean, + resultNullable: Boolean): Unit = { +withSQLConf(s"spark.sql.catalog.$catalogName" -> classOf[InMemoryCatalog].getName) { + createFunction("java_long_add_default", +new JavaLongAdd(new JavaLongAddDefault(resultNullable))) + createFunction("java_long_add_magic", new JavaLongAdd(new JavaLongAddMagic(resultNullable))) + createFunction("java_long_add_static_magic", +new JavaLongAdd(new JavaLongAddStaticMagic(resultNullable))) + createFunction("scala_long_add_default", +LongAddUnbound(new LongAddWithProduceResult(resultNullable))) + createFunction("scala_long_add_magic", LongAddUnbound(new LongAddWithMagic(resultNullable))) + + val codeGenFactoryMode = if (codegenEnabled) FALLBACK else NO_CODEGEN + withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> codegenEnabled.toString, + SQLConf.CODEGEN_FACTORY_MODE.key -> codeGenFactoryMode.toString) { +val name = s"scalar function (long + long) -> long, result_nullable = $resultNullable " + +s"codegen = $codegenEnabled" +val benchmark = new Benchmark(name, N, output = output) +benchmark.addCase(s"with native_long_add", numIters = 
3) { _ => + spark.range(N).selectExpr("id + id").noop() +} +Seq("java_long_add_default", "java_long_add_magic", "java_long_add_static_magic", +"scala_long_add_default", "scala_long_add_magic").foreach { functionName => + benchmark.addCase(s"with $functionName", numIters = 3) { _ => Review comment: Will remove
[GitHub] [spark] sunchao commented on pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction
sunchao commented on pull request #32407: URL: https://github.com/apache/spark/pull/32407#issuecomment-834075095
> BTW, could you rebase this PR to the master branch, @sunchao ? There was a bug causing TPCDS UT failure in master branch and it's fixed a few hours ago.

Thanks @dongjoon-hyun. Will do.
[GitHub] [spark] SparkQA commented on pull request #32461: [SPARK-35146][SQL] Migrate to transformWithPruning or resolveWithPruning for rules in finishAnalysis.scala
SparkQA commented on pull request #32461: URL: https://github.com/apache/spark/pull/32461#issuecomment-834074852 **[Test build #138221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138221/testReport)** for PR 32461 at commit [`6fc4523`](https://github.com/apache/spark/commit/6fc4523a4505ae4e8b5f8036d00f042988c2bb5c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] sunchao commented on pull request #32446: [SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client
sunchao commented on pull request #32446: URL: https://github.com/apache/spark/pull/32446#issuecomment-834074281 Yes, tests passed on my own fork. The issue was that the method `getWithFastCheck` was only introduced in Hive 2.1.0, so it caused a strange class loader issue with earlier Hive versions.
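A method that exists only in newer Hive releases is normally reached through a version-specific shim, so that older versions never reference it. The sketch below shows only the version-gated selection. `GetHiveShim`, the two shim classes and `ShimSelector` are hypothetical. Returning a method name stands in for actually calling Hive, and the 2.1 cutoff is the one stated in the comment above.

```java
// Hypothetical shim abstraction; not Spark's actual HiveShim API.
interface GetHiveShim {
    String getHiveMethodName();
}

// Used for Hive versions that predate getWithFastCheck.
final class LegacyGetHiveShim implements GetHiveShim {
    @Override
    public String getHiveMethodName() {
        return "get";
    }
}

// Used only where getWithFastCheck is known to exist.
final class FastCheckGetHiveShim implements GetHiveShim {
    @Override
    public String getHiveMethodName() {
        return "getWithFastCheck";
    }
}

final class ShimSelector {
    // True if (major, minor) is at least (reqMajor, reqMinor).
    static boolean atLeast(int major, int minor, int reqMajor, int reqMinor) {
        return major > reqMajor || (major == reqMajor && minor >= reqMinor);
    }

    static GetHiveShim forVersion(int major, int minor) {
        return atLeast(major, minor, 2, 1) ? new FastCheckGetHiveShim() : new LegacyGetHiveShim();
    }
}
```

Keeping the newer call inside a class that is instantiated only for 2.1+ is what stops older client versions from ever resolving a method they do not have.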
[GitHub] [spark] sunchao commented on a change in pull request #32446: [SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client
sunchao commented on a change in pull request #32446: URL: https://github.com/apache/spark/pull/32446#discussion_r627930998 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ## @@ -303,7 +303,7 @@ private[hive] class HiveClientImpl( // with the side-effect of Hive.get(conf) to avoid using out-of-date HiveConf. // See discussion in https://github.com/apache/spark/pull/16826/files#r104606859 // for more details. -Hive.get(conf) +shim.getHive(conf) Review comment: Yeah, it shouldn't. Both functions do the same thing in this case: they update the `Hive` object's config with the provided `conf`.
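The "side effect" discussed here is that obtaining the cached client with a conf also refreshes that client's config. The sketch below is a hypothetical model of only that behavior, as described in the reply: a thread-local cached client whose `get(conf)` returns the same instance but swaps in the new conf. It is not Hive's implementation, which may also rebuild the client in some cases. `CachedClient` and a `Map` standing in for `HiveConf` are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a thread-local client whose get(conf) has a side effect.
final class CachedClient {
    private static final ThreadLocal<CachedClient> CURRENT = new ThreadLocal<>();
    private Map<String, String> conf;

    private CachedClient(Map<String, String> conf) {
        this.conf = new HashMap<>(conf);
    }

    static CachedClient get(Map<String, String> conf) {
        CachedClient c = CURRENT.get();
        if (c == null) {
            c = new CachedClient(conf);
            CURRENT.set(c);
        } else {
            c.conf = new HashMap<>(conf);  // side effect: cached client now sees the new conf
        }
        return c;
    }

    String confValue(String key) {
        return conf.get(key);
    }
}
```

Code that calls `get(conf)` only for this side effect keeps working as long as the replacement call also refreshes the cached client's config. That is the property the reviewer asked to confirm.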
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32446: [SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client
dongjoon-hyun commented on a change in pull request #32446: URL: https://github.com/apache/spark/pull/32446#discussion_r627930318 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ## @@ -303,7 +303,7 @@ private[hive] class HiveClientImpl( // with the side-effect of Hive.get(conf) to avoid using out-of-date HiveConf. // See discussion in https://github.com/apache/spark/pull/16826/files#r104606859 // for more details. -Hive.get(conf) +shim.getHive(conf) Review comment: Here, line 303 claims that we need `the side-effect of Hive.get(conf)`. Could you confirm that `shim.getHive(conf)` doesn't break the side-effect assumption?
[GitHub] [spark] imback82 commented on a change in pull request #32447: [SPARK-34701][SQL][FOLLOW-UP] Children/innerChildren should be mutually exclusive for AnalysisOnlyCommand
imback82 commented on a change in pull request #32447: URL: https://github.com/apache/spark/pull/32447#discussion_r627930278 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala ## @@ -46,5 +47,6 @@ trait AnalysisOnlyCommand extends Command { val isAnalyzed: Boolean def childrenToAnalyze: Seq[LogicalPlan] override final def children: Seq[LogicalPlan] = if (isAnalyzed) Nil else childrenToAnalyze + override def innerChildren: Seq[QueryPlan[_]] = if (isAnalyzed) childrenToAnalyze else Nil Review comment: We can also improve EXPLAIN for physical plans in a future PR if needed.
[GitHub] [spark] dongjoon-hyun commented on pull request #32446: [SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client
dongjoon-hyun commented on pull request #32446: URL: https://github.com/apache/spark/pull/32446#issuecomment-834072391 Is this ready for review again?
[GitHub] [spark] SparkQA commented on pull request #32446: [SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client
SparkQA commented on pull request #32446: URL: https://github.com/apache/spark/pull/32446#issuecomment-834071285 **[Test build #138236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138236/testReport)** for PR 32446 at commit [`88697a4`](https://github.com/apache/spark/commit/88697a43ba63963a1951f8d99a697fab4ca5692f).
[GitHub] [spark] imback82 commented on a change in pull request #32447: [SPARK-34701][SQL][FOLLOW-UP] Children/innerChildren should be mutually exclusive for AnalysisOnlyCommand
imback82 commented on a change in pull request #32447: URL: https://github.com/apache/spark/pull/32447#discussion_r627928708 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala ## @@ -46,5 +47,6 @@ trait AnalysisOnlyCommand extends Command { val isAnalyzed: Boolean def childrenToAnalyze: Seq[LogicalPlan] override final def children: Seq[LogicalPlan] = if (isAnalyzed) Nil else childrenToAnalyze + override def innerChildren: Seq[QueryPlan[_]] = if (isAnalyzed) childrenToAnalyze else Nil Review comment: There is a change, but I think it's for the better.

Before:
```
== Parsed Logical Plan ==
'CacheTableAsSelect tempTable, SELECT key FROM testData, false, false
+- 'Project ['key]
   +- 'UnresolvedRelation [testData], [], false

== Analyzed Logical Plan ==
CacheTableAsSelect tempTable, Project [key#13], SELECT key FROM testData, false, true

== Optimized Logical Plan ==
CacheTableAsSelect tempTable, Project [key#13], SELECT key FROM testData, false, true

== Physical Plan ==
CacheTableAsSelect tempTable, Project [key#13], SELECT key FROM testData, false
```

New:
```
== Parsed Logical Plan ==
'CacheTableAsSelect tempTable, SELECT key FROM testData, false, false
+- 'Project ['key]
   +- 'UnresolvedRelation [testData], [], false

== Analyzed Logical Plan ==
CacheTableAsSelect tempTable, SELECT key FROM testData, false, true
+- Project [key#13]
   +- SubqueryAlias testdata
      +- View (`testData`, [key#13,value#14])
         +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).value, true, false) AS value#14]
            +- ExternalRDD [obj#12]

== Optimized Logical Plan ==
CacheTableAsSelect tempTable, SELECT key FROM testData, false, true
+- Project [key#13]
   +- SubqueryAlias testdata
      +- View (`testData`, [key#13,value#14])
         +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).value, true, false) AS value#14]
            +- ExternalRDD [obj#12]

== Physical Plan ==
CacheTableAsSelect tempTable, Project [key#13], SELECT key FROM testData, false
```

WDYT?
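The change makes `children` and `innerChildren` mutually exclusive: before analysis the query is a regular child, so rules can reach it, and after analysis it becomes an inner child. Because tree rendering walks both lists, the analyzed plan still shows the query, which is why it now appears under the analyzed and optimized plans. A minimal Java model of that contract follows. `PlanNode` and `AnalysisOnlyCommandSketch` are hypothetical and much simpler than Spark's `QueryPlan`.

```java
import java.util.List;

// Hypothetical simplified plan node; not Spark's QueryPlan API.
class PlanNode {
    final String label;

    PlanNode(String label) {
        this.label = label;
    }

    List<PlanNode> children() {
        return List.of();
    }

    List<PlanNode> innerChildren() {
        return List.of();
    }

    // Renders children and inner children alike, using "+- " tree markers.
    String treeString() {
        StringBuilder sb = new StringBuilder();
        render(sb, 0);
        return sb.toString();
    }

    private void render(StringBuilder sb, int depth) {
        if (depth > 0) sb.append("   ".repeat(depth - 1)).append("+- ");
        sb.append(label).append('\n');
        for (PlanNode c : children()) c.render(sb, depth + 1);
        for (PlanNode c : innerChildren()) c.render(sb, depth + 1);
    }
}

final class AnalysisOnlyCommandSketch extends PlanNode {
    final boolean isAnalyzed;
    final List<PlanNode> childrenToAnalyze;

    AnalysisOnlyCommandSketch(String label, boolean isAnalyzed, List<PlanNode> childrenToAnalyze) {
        super(label);
        this.isAnalyzed = isAnalyzed;
        this.childrenToAnalyze = childrenToAnalyze;
    }

    // Visible to tree transformations only until analysis is done.
    @Override
    List<PlanNode> children() {
        return isAnalyzed ? List.of() : childrenToAnalyze;
    }

    // After analysis the same plans are still rendered, but as inner children.
    @Override
    List<PlanNode> innerChildren() {
        return isAnalyzed ? childrenToAnalyze : List.of();
    }
}
```

For an analyzed command wrapping `Project [key]`, `treeString()` yields `CacheTableAsSelect` followed by `+- Project [key]`, while `children()` is empty, so later rules do not descend into the query.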
[GitHub] [spark] cloud-fan commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction
cloud-fan commented on a change in pull request #32407: URL: https://github.com/apache/spark/pull/32407#discussion_r627928025 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ScalarFunction.java ## @@ -29,33 +29,62 @@ * * The JVM type of result values produced by this function must be the type used by Spark's * InternalRow API for the {@link DataType SQL data type} returned by {@link #resultType()}. + * The mapping between {@link DataType} and the corresponding JVM type is defined below. * * IMPORTANT: the default implementation of {@link #produceResult} throws - * {@link UnsupportedOperationException}. Users can choose to override this method, or implement - * a "magic method" with name {@link #MAGIC_METHOD_NAME} which takes individual parameters - * instead of a {@link InternalRow}. The magic method will be loaded by Spark through Java - * reflection and will also provide better performance in general, due to optimizations such as - * codegen, removal of Java boxing, etc. - * + * {@link UnsupportedOperationException}. Users must choose to either override this method, or + * implement a magic method with name {@link #MAGIC_METHOD_NAME}, which takes individual parameters + * instead of a {@link InternalRow}. The magic method approach is generally recommended because it + * provides better performance over the default {@link #produceResult}, due to optimizations such + * as whole-stage codegen, elimination of Java boxing, etc. + * + * In addition, for stateless Java functions, users can optionally define the + * {@link #MAGIC_METHOD_NAME} as a static method, which further avoids certain runtime costs such + * as nullness check on the method receiver, potential Java dynamic dispatch, etc. Review comment: Hmm, how does a static method help with the null check?
[GitHub] [spark] cloud-fan commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction
cloud-fan commented on a change in pull request #32407: URL: https://github.com/apache/spark/pull/32407#discussion_r627927694
## File path: sql/core/src/test/scala/org/apache/spark/sql/connector/functions/FunctionBenchmark.scala ##
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.functions
+
+import test.org.apache.spark.sql.connector.catalog.functions.JavaLongAdd
+import test.org.apache.spark.sql.connector.catalog.functions.JavaLongAdd.{JavaLongAddDefault, JavaLongAddMagic, JavaLongAddStaticMagic}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode._
+import org.apache.spark.sql.connector.catalog.{Identifier, InMemoryCatalog}
+import org.apache.spark.sql.connector.catalog.functions.{BoundFunction, ScalarFunction, UnboundFunction}
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{DataType, LongType, StructType}
+
+/**
+ * Benchmark to measure DataSourceV2 UDF performance
+ * {{{
+ *   To run this benchmark:
+ *   1. without sbt:
+ *      bin/spark-submit --class
+ *        --jars , 
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
+ *      Results will be written to "benchmarks/FunctionBenchmark-results.txt".
+ * }}}
+ * '''NOTE''': to update the result of this benchmark, please use Github benchmark action:
+ *   https://spark.apache.org/developer-tools.html#github-workflow-benchmarks
+ */
+object FunctionBenchmark extends SqlBasedBenchmark {
+  val catalogName: String = "benchmark_catalog"
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    val N = 500L * 1000 * 1000
+    Seq(true, false).foreach { codegenEnabled =>
+      Seq(true, false).foreach { resultNullable =>
+        scalarFunctionBenchmark(N, codegenEnabled = codegenEnabled,
+          resultNullable = resultNullable)
+      }
+    }
+  }
+
+  private def scalarFunctionBenchmark(
+      N: Long,
+      codegenEnabled: Boolean,
+      resultNullable: Boolean): Unit = {
+    withSQLConf(s"spark.sql.catalog.$catalogName" -> classOf[InMemoryCatalog].getName) {
+      createFunction("java_long_add_default",
+        new JavaLongAdd(new JavaLongAddDefault(resultNullable)))
+      createFunction("java_long_add_magic", new JavaLongAdd(new JavaLongAddMagic(resultNullable)))
+      createFunction("java_long_add_static_magic",
+        new JavaLongAdd(new JavaLongAddStaticMagic(resultNullable)))
+      createFunction("scala_long_add_default",
+        LongAddUnbound(new LongAddWithProduceResult(resultNullable)))
+      createFunction("scala_long_add_magic", LongAddUnbound(new LongAddWithMagic(resultNullable)))
+
+      val codeGenFactoryMode = if (codegenEnabled) FALLBACK else NO_CODEGEN
+      withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> codegenEnabled.toString,
+          SQLConf.CODEGEN_FACTORY_MODE.key -> codeGenFactoryMode.toString) {
+        val name = s"scalar function (long + long) -> long, result_nullable = $resultNullable " +
+            s"codegen = $codegenEnabled"
+        val benchmark = new Benchmark(name, N, output = output)
+        benchmark.addCase(s"with native_long_add", numIters = 3) { _ =>
+          spark.range(N).selectExpr("id + id").noop()
+        }
+        Seq("java_long_add_default", "java_long_add_magic", "java_long_add_static_magic",
+            "scala_long_add_default", "scala_long_add_magic").foreach { functionName =>
+          benchmark.addCase(s"with $functionName", numIters = 3) { _ =>

Review comment: nit: the `with` seems useless in the case name.
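Each case above runs `numIters = 3` iterations over N = 500M rows, and the results file reports Best/Avg/Stdev times plus Rate, Per Row and Relative columns. The Java sketch below is a hypothetical helper (`BenchStats` is not a Spark class) showing how such columns can be derived from per-iteration times. The formulas are assumptions: rate and per-row cost from the best iteration, relative speed as the baseline's best time divided by this case's best time, and a sample standard deviation. They are consistent with the published numbers, e.g. a best time of 13448 ms over 500M rows gives about 37.2 M rows/s and 26.9 ns/row.

```java
import java.util.Arrays;

// Hypothetical helper: summary columns for one benchmark case, derived from
// per-iteration wall-clock times. The formulas are inferred from the published
// results, not copied from Spark's Benchmark class.
final class BenchStats {
    final double bestMs;
    final double avgMs;
    final double stdevMs;

    BenchStats(long[] iterationMs) {
        if (iterationMs.length == 0) {
            throw new IllegalArgumentException("need at least one iteration");
        }
        this.bestMs = Arrays.stream(iterationMs).min().getAsLong();
        this.avgMs = Arrays.stream(iterationMs).average().getAsDouble();
        double sumSq = 0;
        for (long t : iterationMs) {
            sumSq += (t - avgMs) * (t - avgMs);
        }
        // Assumed: sample standard deviation (n - 1); a single iteration has no spread.
        this.stdevMs = iterationMs.length > 1 ? Math.sqrt(sumSq / (iterationMs.length - 1)) : 0.0;
    }

    /** Millions of rows per second, based on the best iteration. */
    double rateMPerSec(long rows) {
        return rows / (bestMs / 1000.0) / 1_000_000.0;
    }

    /** Nanoseconds spent per row in the best iteration. */
    double perRowNs(long rows) {
        return bestMs * 1_000_000.0 / rows;
    }

    /** Speed relative to a baseline case; values below 1.0 mean slower than the baseline. */
    double relativeTo(BenchStats baseline) {
        return baseline.bestMs / bestMs;
    }
}
```

Using the best iteration rather than the average for the derived columns keeps one slow warm-up run from skewing the rate.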
[GitHub] [spark] cloud-fan commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction
cloud-fan commented on a change in pull request #32407: URL: https://github.com/apache/spark/pull/32407#discussion_r627927440
## File path: sql/core/src/test/scala/org/apache/spark/sql/connector/functions/FunctionBenchmark.scala ##
@@ -0,0 +1,146 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.functions
+
+import test.org.apache.spark.sql.connector.catalog.functions.JavaLongAdd
+import test.org.apache.spark.sql.connector.catalog.functions.JavaLongAdd.{JavaLongAddDefault, JavaLongAddMagic, JavaLongAddStaticMagic}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode._
+import org.apache.spark.sql.connector.catalog.{Identifier, InMemoryCatalog}
+import org.apache.spark.sql.connector.catalog.functions.{BoundFunction, ScalarFunction, UnboundFunction}
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{DataType, LongType, StructType}
+
+/**
+ * Benchmark to measure DataSourceV2 UDF performance
+ * {{{
+ *   To run this benchmark:
+ *   1. without sbt:
+ *      bin/spark-submit --class
+ *        --jars , 
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
+ *      Results will be written to "benchmarks/FunctionBenchmark-results.txt".
+ * }}}
+ * '''NOTE''': to update the result of this benchmark, please use Github benchmark action:
+ *   https://spark.apache.org/developer-tools.html#github-workflow-benchmarks
+ */
+object FunctionBenchmark extends SqlBasedBenchmark {

Review comment: how about `V2FunctionBenchmark`?
[GitHub] [spark] SparkQA commented on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming
SparkQA commented on pull request #32464: URL: https://github.com/apache/spark/pull/32464#issuecomment-834068621 **[Test build #138235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138235/testReport)** for PR 32464 at commit [`ce9d446`](https://github.com/apache/spark/commit/ce9d4469ac2d05b5c02cfe2940220ef14088bb37).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased
AmplabJenkins removed a comment on pull request #32465: URL: https://github.com/apache/spark/pull/32465#issuecomment-834068232
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32459: [SPARK-26164][SQL][FOLLOWUP] WriteTaskStatsTracker should know which file the row is written to
AmplabJenkins removed a comment on pull request #32459: URL: https://github.com/apache/spark/pull/32459#issuecomment-834068238 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42755/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32457: [SPARK-35329][SQL] Split generated switch code into pieces in ExpandExec
AmplabJenkins removed a comment on pull request #32457: URL: https://github.com/apache/spark/pull/32457#issuecomment-834068235 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138220/
[GitHub] [spark] AmplabJenkins commented on pull request #32459: [SPARK-26164][SQL][FOLLOWUP] WriteTaskStatsTracker should know which file the row is written to
AmplabJenkins commented on pull request #32459: URL: https://github.com/apache/spark/pull/32459#issuecomment-834068238 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42755/
[GitHub] [spark] AmplabJenkins commented on pull request #32457: [SPARK-35329][SQL] Split generated switch code into pieces in ExpandExec
AmplabJenkins commented on pull request #32457: URL: https://github.com/apache/spark/pull/32457#issuecomment-834068235 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138220/
[GitHub] [spark] AmplabJenkins commented on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased
AmplabJenkins commented on pull request #32465: URL: https://github.com/apache/spark/pull/32465#issuecomment-834068232
[GitHub] [spark] SparkQA commented on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased
SparkQA commented on pull request #32465: URL: https://github.com/apache/spark/pull/32465#issuecomment-834067906
[GitHub] [spark] cloud-fan commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction
cloud-fan commented on a change in pull request #32407: URL: https://github.com/apache/spark/pull/32407#discussion_r627926417
## File path: sql/core/benchmarks/FunctionBenchmark-jdk11-results.txt ##
@@ -0,0 +1,44 @@
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1046-azure
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+scalar function (long + long) -> long, result_nullable = true codegen = true:    Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
+---------------------------------------------------------------------------------------------------------------------------------------------------------
+with native_long_add                                                                     13448         13658        337       37.2         26.9      1.0X
+with java_long_add_default                                                              110416        111415       1142        4.5        220.8      0.1X
+with java_long_add_magic                                                                 17072         17128         50       29.3         34.1      0.8X
+with java_long_add_static_magic                                                          15912         16121        189       31.4         31.8      0.8X
+with scala_long_add_default                                                             114506        114714        342        4.4        229.0      0.1X
+with scala_long_add_magic                                                                16589         16858        457       30.1         33.2      0.8X
+
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1046-azure
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+scalar function (long + long) -> long, result_nullable = false codegen = true:   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
+---------------------------------------------------------------------------------------------------------------------------------------------------------
+with native_long_add                                                                     14448         14633        274       34.6         28.9      1.0X
+with java_long_add_default                                                               68122         68223        129        7.3        136.2      0.2X
+with java_long_add_magic                                                                 16724         16792         93       29.9         33.4      0.9X
+with java_long_add_static_magic                                                          14704         14761         95       34.0         29.4      1.0X

Review comment: wow this is on par with the native one!
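The results show the `*_magic` variants landing close to `native_long_add` while the `*_default` variants are several times slower. Comparing the calling conventions helps explain the gap. In Spark, the default path goes through `produceResult(InternalRow)`, and the "magic" method is named `invoke`. The Java sketch below is a simplified, hypothetical model of the three variants; none of these types are Spark's `ScalarFunction` API:

* The default path receives its arguments boxed in a generic row and boxes its result.
* An instance `invoke(long, long)` takes primitives but needs an object to call on.
* A static `invoke` needs neither.

The reflective `callInvoke` helper only illustrates choosing between a static and an instance call. It is not how Spark's generated code performs the call.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Simplified, hypothetical stand-ins for the three calling conventions compared
// in the benchmark; these are not Spark's ScalarFunction classes.

/** Default path: arguments arrive boxed in a generic row, and the result is boxed too. */
interface RowFunction {
    Object produceResult(Object[] row);
}

final class LongAddDefault implements RowFunction {
    @Override
    public Object produceResult(Object[] row) {
        // Unboxes both arguments and boxes the sum again, once per row.
        return (Long) row[0] + (Long) row[1];
    }
}

/** Instance "magic" method: primitive arguments, but needs an instance to call on. */
final class LongAddMagic {
    public long invoke(long left, long right) {
        return left + right;
    }
}

/** Static "magic" method: primitive arguments and no instance at all. */
final class LongAddStaticMagic {
    public static long invoke(long left, long right) {
        return left + right;
    }
}

final class MagicDispatch {
    /** Looks up invoke(long, long) and calls it without a receiver when it is static. */
    static long callInvoke(Class<?> cls, Object instanceOrNull, long a, long b) {
        try {
            Method m = cls.getMethod("invoke", long.class, long.class);
            Object receiver = Modifier.isStatic(m.getModifiers()) ? null : instanceOrNull;
            return (Long) m.invoke(receiver, a, b);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable invoke(long, long) on " + cls, e);
        }
    }
}
```

All three variants compute the same sum. Only the per-row boxing and the need for a receiver object differ.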
[GitHub] [spark] cloud-fan commented on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased
cloud-fan commented on pull request #32465: URL: https://github.com/apache/spark/pull/32465#issuecomment-834065842 is this a regression? It seems to me that this should fail, as it's similar to `sql("select a b from values(1) t(a)").repartitionBy("a")`
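The comment's reasoning is that an alias replaces the column name in the projection's output. After `select a b ...`, only `b` is visible to the operator above the projection, so repartitioning by `a` has nothing to resolve against. A toy Java model of that name lookup follows. `AliasResolution` is hypothetical and far simpler than Catalyst's analyzer: it has no qualifiers, no expression IDs, and no case-insensitivity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, not Catalyst): a projection exposes only its output
// names, and a parent operator such as a repartition resolves columns against them.
final class AliasResolution {
    /** Output names of `SELECT src AS alias, ...`, given an ordered alias -> source map. */
    static List<String> projectOutput(Map<String, String> aliasToSource) {
        // Once aliased, the source column names are no longer part of the output.
        return new ArrayList<>(aliasToSource.keySet());
    }

    /** Resolves a column name against a child's output, failing in the style of an analyzer error. */
    static String resolve(List<String> childOutput, String name) {
        if (!childOutput.contains(name)) {
            throw new IllegalArgumentException(
                "cannot resolve '" + name + "' given input columns: " + childOutput);
        }
        return name;
    }
}
```

In this model, resolving `b` against the output of `select a AS b` succeeds, and resolving `a` throws.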
[GitHub] [spark] SparkQA commented on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased
SparkQA commented on pull request #32465: URL: https://github.com/apache/spark/pull/32465#issuecomment-834065404
[GitHub] [spark] SparkQA commented on pull request #32459: [SPARK-26164][SQL][FOLLOWUP] WriteTaskStatsTracker should know which file the row is written to
SparkQA commented on pull request #32459: URL: https://github.com/apache/spark/pull/32459#issuecomment-834063671 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42755/
[GitHub] [spark] kiszk edited a comment on pull request #32457: [SPARK-35329][SQL] Split generated switch code into pieces in ExpandExec
kiszk edited a comment on pull request #32457: URL: https://github.com/apache/spark/pull/32457#issuecomment-834059937 This code seems to generate # of methods that is equal to # of columns. Am I correct? Does this splitting cause no performance degradation? If there is a possibility, it would be good to introduce a threshold.
[GitHub] [spark] kiszk commented on pull request #32457: [SPARK-35329][SQL] Split generated switch code into pieces in ExpandExec
kiszk commented on pull request #32457: URL: https://github.com/apache/spark/pull/32457#issuecomment-834059937 This code seems to generate # of methods that is equals to # of columns. Am I correct? Does this splitting cause no performance degradation? If there is a possibility, it would be good to introduce a threshold.
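The threshold suggested in the comment could look roughly like the following Java sketch of a code generator. Below the threshold, every case body stays inline in one `switch`. Above it, each case body moves into its own helper method. `SwitchSplitter`, `expandCase<i>` and `projectionId` are hypothetical names. Splitting per case (rather than per column) is also an assumption, not necessarily how the PR splits the code.

```java
import java.util.List;

// Hypothetical generator (not Spark's ExpandExec codegen): emits a switch over a
// projection id and, only when there are more cases than `splitThreshold`,
// moves each case body into its own private helper method.
final class SwitchSplitter {
    static String generate(List<String> caseBodies, int splitThreshold) {
        boolean split = caseBodies.size() > splitThreshold;
        StringBuilder helpers = new StringBuilder();
        StringBuilder sw = new StringBuilder("switch (projectionId) {\n");
        for (int i = 0; i < caseBodies.size(); i++) {
            sw.append("  case ").append(i).append(":\n");
            if (split) {
                String name = "expandCase" + i;
                helpers.append("private void ").append(name).append("() {\n  ")
                       .append(caseBodies.get(i)).append("\n}\n");
                sw.append("    ").append(name).append("();\n");
            } else {
                sw.append("    ").append(caseBodies.get(i)).append('\n');
            }
            sw.append("    break;\n");
        }
        sw.append("}\n");
        return helpers.toString() + sw;
    }

    /** Counts generated helper methods in a code string. */
    static int countHelpers(String code) {
        int count = 0;
        for (int idx = code.indexOf("private void "); idx >= 0;
             idx = code.indexOf("private void ", idx + 1)) {
            count++;
        }
        return count;
    }
}
```

With three case bodies, a threshold of 8 keeps everything in one method, while a threshold of 2 produces three helpers. The trade-off is fewer, larger methods, which may exceed JIT size limits, versus extra call overhead per row.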
[GitHub] [spark] sigmod commented on pull request #32463: [SPARK-35147][SQL] Migrate to resolveWithPruning for two command rules
sigmod commented on pull request #32463: URL: https://github.com/apache/spark/pull/32463#issuecomment-834055715 @hvanhovell @gengliangwang @dbaliafroozeh @maryannxue this PR is ready for review.
[GitHub] [spark] c21 commented on pull request #32430: [SPARK-35133][SQL] Explain codegen works with AQE
c21 commented on pull request #32430: URL: https://github.com/apache/spark/pull/32430#issuecomment-834054871 Thank you all for review!
[GitHub] [spark] SparkQA removed a comment on pull request #32457: [SPARK-35329][SQL] Split generated switch code into pieces in ExpandExec
SparkQA removed a comment on pull request #32457: URL: https://github.com/apache/spark/pull/32457#issuecomment-833962887 **[Test build #138220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138220/testReport)** for PR 32457 at commit [`cb182b8`](https://github.com/apache/spark/commit/cb182b888439d3efe1e46aa0aa44fb1ede96ff8f).
[GitHub] [spark] SparkQA commented on pull request #32457: [SPARK-35329][SQL] Split generated switch code into pieces in ExpandExec
SparkQA commented on pull request #32457: URL: https://github.com/apache/spark/pull/32457#issuecomment-834053230 **[Test build #138220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138220/testReport)** for PR 32457 at commit [`cb182b8`](https://github.com/apache/spark/commit/cb182b888439d3efe1e46aa0aa44fb1ede96ff8f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming
AmplabJenkins removed a comment on pull request #32464: URL: https://github.com/apache/spark/pull/32464#issuecomment-834051212 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42754/
[GitHub] [spark] AmplabJenkins commented on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming
AmplabJenkins commented on pull request #32464: URL: https://github.com/apache/spark/pull/32464#issuecomment-834051212 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42754/
[GitHub] [spark] SparkQA commented on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming
SparkQA commented on pull request #32464: URL: https://github.com/apache/spark/pull/32464#issuecomment-834051208 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42754/
[GitHub] [spark] yaooqinn commented on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased
yaooqinn commented on pull request #32465: URL: https://github.com/apache/spark/pull/32465#issuecomment-834050531 cc @cloud-fan @maropu @HyukjinKwon thanks for reviewing
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming
AmplabJenkins removed a comment on pull request #32464: URL: https://github.com/apache/spark/pull/32464#issuecomment-834048734 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138232/
[GitHub] [spark] SparkQA removed a comment on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming
SparkQA removed a comment on pull request #32464: URL: https://github.com/apache/spark/pull/32464#issuecomment-834046598 **[Test build #138232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138232/testReport)** for PR 32464 at commit [`23c7f91`](https://github.com/apache/spark/commit/23c7f9183372148a110ae538f9d80bfc5c3b09b2).
[GitHub] [spark] SparkQA commented on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased
SparkQA commented on pull request #32465: URL: https://github.com/apache/spark/pull/32465#issuecomment-834048943 **[Test build #138234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138234/testReport)** for PR 32465 at commit [`03ed3a5`](https://github.com/apache/spark/commit/03ed3a5a665adecd7a49d22242506ed1df96aa0f).
[GitHub] [spark] AmplabJenkins commented on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming
AmplabJenkins commented on pull request #32464: URL: https://github.com/apache/spark/pull/32464#issuecomment-834048734 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138232/
[GitHub] [spark] SparkQA commented on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming
SparkQA commented on pull request #32464: URL: https://github.com/apache/spark/pull/32464#issuecomment-834048710 **[Test build #138232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138232/testReport)** for PR 32464 at commit [`23c7f91`](https://github.com/apache/spark/commit/23c7f9183372148a110ae538f9d80bfc5c3b09b2). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32413: [SPARK-35288][SQL] StaticInvoke should find the method without exact argument classes match
AmplabJenkins removed a comment on pull request #32413: URL: https://github.com/apache/spark/pull/32413#issuecomment-834048271 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42752/
[GitHub] [spark] AmplabJenkins commented on pull request #32413: [SPARK-35288][SQL] StaticInvoke should find the method without exact argument classes match
AmplabJenkins commented on pull request #32413: URL: https://github.com/apache/spark/pull/32413#issuecomment-834048271 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42752/
[GitHub] [spark] SparkQA commented on pull request #32413: [SPARK-35288][SQL] StaticInvoke should find the method without exact argument classes match
SparkQA commented on pull request #32413: URL: https://github.com/apache/spark/pull/32413#issuecomment-834048248
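The title of PR #32413 contrasts exact and non-exact argument-class matching, and plain Java reflection shows the difference. `Class.getMethod` only finds a method whose declared parameter classes equal the ones passed in, so looking up `twice(Number)` with `Integer.class` fails. A lookup that accepts assignable parameter types finds it. This is a hypothetical sketch (`MethodLookup` and `Functions` are made-up names), not Spark's `StaticInvoke` implementation. It also ignores primitive/boxed conversions, which `isAssignableFrom` does not handle.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;

// Hypothetical illustration with plain Java reflection (not Spark's StaticInvoke):
// accept any static method whose parameter types are supertypes of the argument classes.
final class MethodLookup {
    static Method findStaticMethod(Class<?> cls, String name, Class<?>... argClasses) {
        for (Method m : cls.getMethods()) {
            if (!m.getName().equals(name) || !Modifier.isStatic(m.getModifiers())) {
                continue;
            }
            Class<?>[] params = m.getParameterTypes();
            if (params.length != argClasses.length) {
                continue;
            }
            boolean ok = true;
            for (int i = 0; i < params.length; i++) {
                // Note: no primitive/boxed handling, e.g. long vs Long.
                if (!params[i].isAssignableFrom(argClasses[i])) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return m;
            }
        }
        throw new IllegalArgumentException(
            "no static method " + name + Arrays.toString(argClasses) + " on " + cls);
    }
}

/** Hypothetical target class: the method is declared with a supertype parameter. */
final class Functions {
    public static long twice(Number n) {
        return 2 * n.longValue();
    }
}
```

An exact `getMethod("twice", Integer.class)` throws `NoSuchMethodException`, while `findStaticMethod(Functions.class, "twice", Integer.class)` returns `twice(Number)`.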
[GitHub] [spark] JkSelf commented on pull request #31756: [SPARK-34637] [SQL] Support DPP + AQE when the broadcast exchange can be reused
JkSelf commented on pull request #31756: URL: https://github.com/apache/spark/pull/31756#issuecomment-834046989 @tgravescs This PR mainly addresses the limitations of [PR#31258](https://github.com/apache/spark/pull/31258). With [PR#31258](https://github.com/apache/spark/pull/31258), DPP + AQE is supported only when the broadcast exchange on the build side is executed first, so that the probe side can reuse the build side's exchange in the DPP subquery; otherwise DPP is not applied under AQE. This approach mainly consists of two steps. 1. In the `PlanAdaptiveDynamicPruningFilters` rule, check whether the broadcast exchange can be reused; if so, insert the DPP subquery filter on the probe side. 2. Create an `AdaptiveSparkPlanExec` with the broadcast exchange, so that the existing reuse logic can reuse the broadcast exchange in the `AdaptiveSparkPlanExec` plan.
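The two steps can be modeled in a few lines of Java. In this hypothetical sketch, `BroadcastReuse`, the string plan keys and the filter text are all made up and are not Spark classes. Broadcast exchanges are registered under a canonical string of their child plan, standing in for canonicalized plans. A pruning filter is planned on the probe side only when the build side's broadcast is already registered, and the filter refers to that exchange instead of creating a new one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of broadcast reuse for dynamic partition pruning.
final class BroadcastReuse {
    // Canonical plan string -> id of the broadcast exchange that computes it.
    private final Map<String, String> broadcastsByCanonicalPlan = new HashMap<>();

    /** Registers a broadcast; returns the existing exchange if an equivalent plan is already registered. */
    String registerBroadcast(String canonicalPlan, String exchangeId) {
        return broadcastsByCanonicalPlan.computeIfAbsent(canonicalPlan, k -> exchangeId);
    }

    /** Step 1: plan a pruning filter only when the build side's broadcast can be reused. */
    Optional<String> planPruningFilter(String buildSideCanonicalPlan, String probeKey) {
        String exchange = broadcastsByCanonicalPlan.get(buildSideCanonicalPlan);
        if (exchange == null) {
            // Nothing to reuse: in this model, no pruning filter is inserted.
            return Optional.empty();
        }
        // Step 2: the pruning subquery reads from the already-registered exchange.
        return Optional.of(probeKey + " IN (subquery reusing " + exchange + ")");
    }
}
```

Registering an equivalent plan a second time returns the first exchange, which mirrors the reuse the comment relies on.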
[GitHub] [spark] SparkQA commented on pull request #32459: [SPARK-26164][SQL][FOLLOWUP] WriteTaskStatsTracker should know which file the row is written to
SparkQA commented on pull request #32459: URL: https://github.com/apache/spark/pull/32459#issuecomment-834046652 **[Test build #138233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138233/testReport)** for PR 32459 at commit [`8e9f6cb`](https://github.com/apache/spark/commit/8e9f6cb8d5b19792fc408c7b9fe9bcc77a4a56d7).
[GitHub] [spark] SparkQA commented on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased
SparkQA commented on pull request #32465: URL: https://github.com/apache/spark/pull/32465#issuecomment-834046574 **[Test build #138231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138231/testReport)** for PR 32465 at commit [`0c711e3`](https://github.com/apache/spark/commit/0c711e3a081dc644c3a2d3c47207046eb4457ee1).
[GitHub] [spark] SparkQA commented on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming
SparkQA commented on pull request #32464: URL: https://github.com/apache/spark/pull/32464#issuecomment-834046598 **[Test build #138232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138232/testReport)** for PR 32464 at commit [`23c7f91`](https://github.com/apache/spark/commit/23c7f9183372148a110ae538f9d80bfc5c3b09b2).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32442: [SPARK-35283][SQL] Support query some DDL with CTES
AmplabJenkins removed a comment on pull request #32442: URL: https://github.com/apache/spark/pull/32442#issuecomment-834046371 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42751/
[GitHub] [spark] AmplabJenkins commented on pull request #32442: [SPARK-35283][SQL] Support query some DDL with CTES
AmplabJenkins commented on pull request #32442: URL: https://github.com/apache/spark/pull/32442#issuecomment-834046371 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42751/
[GitHub] [spark] cloud-fan commented on a change in pull request #32447: [SPARK-34701][SQL][FOLLOW-UP] Children/innerChildren should be mutually exclusive for AnalysisOnlyCommand
cloud-fan commented on a change in pull request #32447: URL: https://github.com/apache/spark/pull/32447#discussion_r627908066 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala ## @@ -46,5 +47,6 @@ trait AnalysisOnlyCommand extends Command { val isAnalyzed: Boolean def childrenToAnalyze: Seq[LogicalPlan] override final def children: Seq[LogicalPlan] = if (isAnalyzed) Nil else childrenToAnalyze + override def innerChildren: Seq[QueryPlan[_]] = if (isAnalyzed) childrenToAnalyze else Nil Review comment: Does it have real impact in EXPLAIN?
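The diff makes `children` and `innerChildren` mutually exclusive, switching on `isAnalyzed`. Below is a toy model of that invariant. `ToyPlan`, `ToyLeaf`, `ToyAnalysisOnlyCommand`, and `ToyCacheTable` are hypothetical stand-ins for Spark's `LogicalPlan`, `AnalysisOnlyCommand`, and a concrete command. Spark's `TreeNode.innerChildren` is documented as the nodes shown as an inner nested tree of a node, which is why the EXPLAIN question comes up.

```scala
// Toy stand-ins for Spark's plan types (hypothetical, simplified).
trait ToyPlan {
  def children: Seq[ToyPlan]
  def innerChildren: Seq[ToyPlan] = Nil
}

final case class ToyLeaf(name: String) extends ToyPlan {
  def children: Seq[ToyPlan] = Nil
}

trait ToyAnalysisOnlyCommand extends ToyPlan {
  val isAnalyzed: Boolean
  def childrenToAnalyze: Seq[ToyPlan]
  // Before analysis the plans are real children, so analyzer rules visit them.
  override final def children: Seq[ToyPlan] = if (isAnalyzed) Nil else childrenToAnalyze
  // After analysis they are reported only as inner children, so a plan is
  // never exposed through both accessors at the same time.
  override def innerChildren: Seq[ToyPlan] = if (isAnalyzed) childrenToAnalyze else Nil
}

final case class ToyCacheTable(child: ToyPlan, isAnalyzed: Boolean) extends ToyAnalysisOnlyCommand {
  def childrenToAnalyze: Seq[ToyPlan] = Seq(child)
}
```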
[GitHub] [spark] HeartSaVioR commented on pull request #32361: [SPARK-35240][SS] Use CheckpointFileManager for checkpoint file manipulation
HeartSaVioR commented on pull request #32361: URL: https://github.com/apache/spark/pull/32361#issuecomment-834045784 > We can further refine the CheckpointFileManager interface, as it knows the checkpoint location and all its APIs can simply accept relative paths. Sounds like a nice improvement; once the checkpoint file manager is initialized with the checkpoint root dir, callers shouldn't need to figure out the full path of the destination. Every target should be inside the checkpoint root dir, except temp files the checkpoint file manager creates "internally".
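The refinement discussed above, where the manager knows the checkpoint root and callers pass relative paths, could look roughly like the following sketch. `RelativeCheckpointPaths` is a hypothetical helper and is not part of Spark's `CheckpointFileManager`. It uses `java.nio.file.Path` as a stand-in for Hadoop's `Path` so that it runs without Hadoop, and it enforces the "every target stays inside the root" rule from the comment.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper, not Spark's CheckpointFileManager API. java.nio.file.Path
// stands in for Hadoop's Path so the sketch runs without Hadoop.
final class RelativeCheckpointPaths(rootDir: String) {
  private val root: Path = Paths.get(rootDir).toAbsolutePath.normalize()

  // Callers pass paths relative to the checkpoint root. The helper builds the
  // full destination and rejects anything that would land outside the root.
  def resolve(relative: String): Path = {
    val candidate = Paths.get(relative)
    require(!candidate.isAbsolute, s"expected a relative path, got: $relative")
    val resolved = root.resolve(candidate).normalize()
    require(resolved.startsWith(root), s"path escapes the checkpoint root: $relative")
    resolved
  }
}
```

For example, with root `/tmp/ckpt`, `resolve("offsets/0")` yields `/tmp/ckpt/offsets/0`, while `resolve("../escape")` fails the `require` check. Temp files the manager creates internally would bypass this helper.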
[GitHub] [spark] viirya commented on pull request #32361: [SPARK-35240][SS] Use CheckpointFileManager for checkpoint file manipulation
viirya commented on pull request #32361: URL: https://github.com/apache/spark/pull/32361#issuecomment-834045474 Thanks @HeartSaVioR!
[GitHub] [spark] viirya commented on pull request #32413: [SPARK-35288][SQL] StaticInvoke should find the method without exact argument classes match
viirya commented on pull request #32413: URL: https://github.com/apache/spark/pull/32413#issuecomment-834045359 Thanks @cloud-fan @dongjoon-hyun. I will merge once CI passes.
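The title of PR #32413 says `StaticInvoke` should find the method without requiring an exact match on argument classes. Here is a minimal sketch of that idea using plain Java reflection. `LooseMethodLookup` is a hypothetical name, and this is not the code in the PR. `Class.getMethod` needs the exact parameter types; the sketch instead accepts any public static method whose parameter types are assignable from the argument classes.

```scala
import java.lang.reflect.{Method, Modifier}

// Hypothetical helper illustrating the PR title's idea; not the code in #32413.
object LooseMethodLookup {
  // Unlike Class.getMethod(name, argClasses: _*), which requires the exact
  // parameter types, accept any public static method whose parameters are
  // assignable from the given argument classes. Returns the first match found.
  def findStaticMethod(cls: Class[_], name: String, argClasses: Seq[Class[_]]): Option[Method] =
    cls.getMethods.find { m =>
      m.getName == name &&
        Modifier.isStatic(m.getModifiers) &&
        m.getParameterCount == argClasses.length &&
        m.getParameterTypes.zip(argClasses).forall { case (param, arg) => param.isAssignableFrom(arg) }
    }
}
```

For example, `String` has no `valueOf(Integer)` overload, so `getMethod("valueOf", classOf[Integer])` throws, while the loose lookup finds `valueOf(Object)`. `Class.getMethods` returns methods in no particular order. A real implementation would therefore need a deterministic rule for choosing among several compatible overloads.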
[GitHub] [spark] dongjoon-hyun commented on pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction
dongjoon-hyun commented on pull request #32407: URL: https://github.com/apache/spark/pull/32407#issuecomment-834045380 BTW, could you rebase this PR onto the master branch, @sunchao? There was a bug causing a TPCDS UT failure in the master branch, and it was fixed a few hours ago.
[GitHub] [spark] yaooqinn opened a new pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased
yaooqinn opened a new pull request #32465: URL: https://github.com/apache/spark/pull/32465
### What changes were proposed in this pull request?
This PR makes the following case work:
```sql
select a b from values(1) t(a) distribute by a;
```
```logtalk
== Parsed Logical Plan ==
'RepartitionByExpression ['a]
+- 'Project ['a AS b#42]
   +- 'SubqueryAlias t
      +- 'UnresolvedInlineTable [a], [List(1)]

== Analyzed Logical Plan ==
org.apache.spark.sql.AnalysisException: cannot resolve 'a' given input columns: [b]; line 1 pos 62;
'RepartitionByExpression ['a]
+- Project [a#48 AS b#42]
   +- SubqueryAlias t
      +- LocalRelation [a#48]
```
### Why are the changes needed?
Bug fix.
### Does this PR introduce _any_ user-facing change?
Yes, the original attributes can be used in `distribute by` / `cluster by` and in hints like `/*+ REPARTITION(3, c) */`.
### How was this patch tested?
New tests.
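To illustrate the user-facing behavior, here is a toy resolution model. When `distribute by a` cannot be resolved against the Project's output `[b]`, the model falls back to the alias that carries the input column `a`. `ToyProject` and `ToyDistributeByResolution` are hypothetical. This is one plausible strategy, not a description of how the PR changes Spark's analyzer.

```scala
// Hypothetical toy model, not Spark's analyzer: a Project is a list of
// (outputName, sourceColumn) pairs, so `select a b` is ToyProject(Seq("b" -> "a")).
final case class ToyProject(aliases: Seq[(String, String)]) {
  def output: Seq[String] = aliases.map(_._1)
}

object ToyDistributeByResolution {
  // Resolve against the Project's output first. Otherwise look for an alias
  // whose source is the requested input column and distribute by that alias.
  def resolve(column: String, project: ToyProject): Either[String, String] =
    if (project.output.contains(column)) Right(column)
    else {
      val viaAlias = project.aliases.collectFirst {
        case (alias, source) if source == column => alias
      }
      viaAlias.toRight(
        s"cannot resolve '$column' given input columns: ${project.output.mkString("[", ", ", "]")}")
    }
}
```

Distributing by `b` is equivalent to distributing by `a` here, because `b` is only a rename of `a`.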
[GitHub] [spark] dongjoon-hyun commented on pull request #32462: [SPARK-34795][SPARK-35192][SPARK-35293][SQL][TESTS][3.1] Adds a new job in GitHub Actions to check the output of TPC-DS queries
dongjoon-hyun commented on pull request #32462: URL: https://github.com/apache/spark/pull/32462#issuecomment-834042835 We are not going to bring in SPARK-35327, right? If you want SPARK-35327 too, let's hold off on this PR.
[GitHub] [spark] SparkQA commented on pull request #32442: [SPARK-35283][SQL] Support query some DDL with CTES
SparkQA commented on pull request #32442: URL: https://github.com/apache/spark/pull/32442#issuecomment-834042788 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42751/
[GitHub] [spark] beliefer opened a new pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming
beliefer opened a new pull request #32464: URL: https://github.com/apache/spark/pull/32464
### What changes were proposed in this pull request?
This PR groups exception messages in `sql/core/src/main/scala/org/apache/spark/sql/streaming`.
### Why are the changes needed?
It will largely help with the standardization of error messages and their maintenance.
### Does this PR introduce _any_ user-facing change?
No. Error messages remain unchanged.
### How was this patch tested?
No new tests. All existing tests must pass, to make sure the change doesn't break any existing behavior.
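Grouping exception messages means that call sites ask a central object for the exception instead of formatting messages inline, so the wording lives in one place. Here is a hypothetical sketch of that pattern. `ToyStreamingErrors`, `ToyTrigger`, and the message text are invented for illustration; they are not the objects or messages in the PR.

```scala
// Hypothetical sketch of the "grouped error messages" pattern; these names
// and messages are invented and are not the ones in the PR.
object ToyStreamingErrors {
  def negativeTriggerIntervalError(intervalMs: Long): Throwable =
    new IllegalArgumentException(s"Trigger interval must be non-negative, got $intervalMs ms")
}

object ToyTrigger {
  // The call site no longer formats the message itself.
  def processingTime(intervalMs: Long): Long = {
    if (intervalMs < 0) throw ToyStreamingErrors.negativeTriggerIntervalError(intervalMs)
    intervalMs
  }
}
```

Because the message text does not change when it moves into the central object, behavior stays the same. That matches the PR's "no user-facing change" claim.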
[GitHub] [spark] dongjoon-hyun closed pull request #32430: [SPARK-35133][SQL] Explain codegen works with AQE
dongjoon-hyun closed pull request #32430: URL: https://github.com/apache/spark/pull/32430
[GitHub] [spark] dongjoon-hyun commented on pull request #32430: [SPARK-35133][SQL] Explain codegen works with AQE
dongjoon-hyun commented on pull request #32430: URL: https://github.com/apache/spark/pull/32430#issuecomment-834041404 It seems that there is some delay in GitHub Actions. I checked that it has already passed. https://user-images.githubusercontent.com/9700541/117394778-9eb16a80-aeab-11eb-8e75-e5aee9c93ba7.png Thank you, @c21 and all. Merged to master.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32430: [SPARK-35133][SQL] Explain codegen works with AQE
dongjoon-hyun commented on a change in pull request #32430: URL: https://github.com/apache/spark/pull/32430#discussion_r627903213 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/debug/DebuggingSuite.scala ## @@ -24,14 +24,13 @@ import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.expressions.Attribute import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext import org.apache.spark.sql.execution.{CodegenSupport, LeafExecNode, WholeStageCodegenExec} -import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite +import org.apache.spark.sql.execution.adaptive.{DisableAdaptiveExecutionSuite, EnableAdaptiveExecutionSuite} import org.apache.spark.sql.functions._ import org.apache.spark.sql.test.SharedSparkSession import org.apache.spark.sql.test.SQLTestData.TestData import org.apache.spark.sql.types.StructType -// Disable AQE because the WholeStageCodegenExec is added when running QueryStageExec Review comment: Thanks!
[GitHub] [spark] dongjoon-hyun commented on pull request #32435: [SPARK-35306][MLLIB][TESTS] Add benchmark results for BLASBenchmark created by GitHub Actions machines
dongjoon-hyun commented on pull request #32435: URL: https://github.com/apache/spark/pull/32435#issuecomment-834040182 Thank you, @byungsoo-oh and @HyukjinKwon!
[GitHub] [spark] sigmod commented on pull request #32461: [SPARK-35146][SQL] Migrate to transformWithPruning or resolveWithPruning for rules in finishAnalysis.scala
sigmod commented on pull request #32461: URL: https://github.com/apache/spark/pull/32461#issuecomment-834036401 @hvanhovell @gengliangwang @dbaliafroozeh @maryannxue this PR is ready for review.
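Migrating to `transformWithPruning` relies on each tree node caching which patterns occur in its subtree, so a rule can skip subtrees that cannot contain anything it rewrites. The toy model below uses string patterns and hypothetical `Node`/`PruningTransform` types. Spark's real API works on tree-pattern bits, so treat this only as a sketch of the pruning idea. The example rule, which replaces a `CurrentDate` node with a literal, is loosely in the spirit of the rules in `finishAnalysis.scala`.

```scala
// Toy model of pattern-based pruning; not Spark's TreeNode/TreePatternBits API.
final case class Node(pattern: String, children: Seq[Node] = Nil) {
  // Patterns present anywhere in this subtree, computed once per node.
  lazy val subtreePatterns: Set[String] =
    children.foldLeft(Set(pattern))((acc, child) => acc ++ child.subtreePatterns)
}

object PruningTransform {
  // Post-order transform that skips any subtree whose cached patterns fail
  // `cond`. Returns the new tree and how many nodes the rule was tried on.
  def transformWithPruning(node: Node, cond: Set[String] => Boolean)(
      rule: PartialFunction[Node, Node]): (Node, Int) =
    if (!cond(node.subtreePatterns)) (node, 0)
    else {
      val results = node.children.map(child => transformWithPruning(child, cond)(rule))
      val rebuilt = node.copy(children = results.map(_._1))
      (rule.applyOrElse(rebuilt, (n: Node) => n), 1 + results.map(_._2).sum)
    }
}
```

For a plan `Project(CurrentDate, Join(Relation, Relation))` with the condition "subtree contains `CurrentDate`", the `Join` subtree is skipped entirely. The rule is therefore tried on only 2 of the 5 nodes.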
[GitHub] [spark] viirya commented on pull request #32413: [SPARK-35288][SQL] StaticInvoke should find the method without exact argument classes match
viirya commented on pull request #32413: URL: https://github.com/apache/spark/pull/32413#issuecomment-834034759 Added the shared method.
[GitHub] [spark] SparkQA commented on pull request #32413: [SPARK-35288][SQL] StaticInvoke should find the method without exact argument classes match
SparkQA commented on pull request #32413: URL: https://github.com/apache/spark/pull/32413#issuecomment-834032204 **[Test build #138230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138230/testReport)** for PR 32413 at commit [`0ec8117`](https://github.com/apache/spark/commit/0ec8117aaae0708b19e817c61c780eff6af37cce).
[GitHub] [spark] SparkQA commented on pull request #32442: [SPARK-35283][SQL] Support query some DDL with CTES
SparkQA commented on pull request #32442: URL: https://github.com/apache/spark/pull/32442#issuecomment-834026504 **[Test build #138229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138229/testReport)** for PR 32442 at commit [`4f8b782`](https://github.com/apache/spark/commit/4f8b7828a3448120e0d1fd2daeb9e8d3ab1a67eb).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32462: [SPARK-34795][SPARK-35192][SPARK-35293][SQL][TESTS][3.1] Adds a new job in GitHub Actions to check the output of TPC-DS queries
AmplabJenkins removed a comment on pull request #32462: URL: https://github.com/apache/spark/pull/32462#issuecomment-834025819 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42747/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32454: [SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause flaky test results
AmplabJenkins removed a comment on pull request #32454: URL: https://github.com/apache/spark/pull/32454#issuecomment-834025816 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42748/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32463: [WIP][SPARK-35147][SQL] Migrate to resolveWithPruning for two command rules
AmplabJenkins removed a comment on pull request #32463: URL: https://github.com/apache/spark/pull/32463#issuecomment-834025820 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42746/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32455: [WIP][SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4(latest)
AmplabJenkins removed a comment on pull request #32455: URL: https://github.com/apache/spark/pull/32455#issuecomment-834025817 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138219/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction
AmplabJenkins removed a comment on pull request #32407: URL: https://github.com/apache/spark/pull/32407#issuecomment-834025821 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42750/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32430: [SPARK-35133][SQL] Explain codegen works with AQE
AmplabJenkins removed a comment on pull request #32430: URL: https://github.com/apache/spark/pull/32430#issuecomment-834025815 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42749/
[GitHub] [spark] AmplabJenkins commented on pull request #32462: [SPARK-34795][SPARK-35192][SPARK-35293][SQL][TESTS][3.1] Adds a new job in GitHub Actions to check the output of TPC-DS queries
AmplabJenkins commented on pull request #32462: URL: https://github.com/apache/spark/pull/32462#issuecomment-834025819 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42747/
[GitHub] [spark] AmplabJenkins commented on pull request #32454: [SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause flaky test results
AmplabJenkins commented on pull request #32454: URL: https://github.com/apache/spark/pull/32454#issuecomment-834025816 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42748/
[GitHub] [spark] AmplabJenkins commented on pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction
AmplabJenkins commented on pull request #32407: URL: https://github.com/apache/spark/pull/32407#issuecomment-834025821 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42750/
[GitHub] [spark] AmplabJenkins commented on pull request #32463: [WIP][SPARK-35147][SQL] Migrate to resolveWithPruning for two command rules
AmplabJenkins commented on pull request #32463: URL: https://github.com/apache/spark/pull/32463#issuecomment-834025820 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42746/
[GitHub] [spark] AmplabJenkins commented on pull request #32455: [WIP][SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4(latest)
AmplabJenkins commented on pull request #32455: URL: https://github.com/apache/spark/pull/32455#issuecomment-834025817 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138219/
[GitHub] [spark] AmplabJenkins commented on pull request #32430: [SPARK-35133][SQL] Explain codegen works with AQE
AmplabJenkins commented on pull request #32430: URL: https://github.com/apache/spark/pull/32430#issuecomment-834025815 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42749/
[GitHub] [spark] SparkQA commented on pull request #32454: [SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause flaky test results
SparkQA commented on pull request #32454: URL: https://github.com/apache/spark/pull/32454#issuecomment-834020513
[GitHub] [spark] SparkQA commented on pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction
SparkQA commented on pull request #32407: URL: https://github.com/apache/spark/pull/32407#issuecomment-834020284
[GitHub] [spark] SparkQA commented on pull request #32430: [SPARK-35133][SQL] Explain codegen works with AQE
SparkQA commented on pull request #32430: URL: https://github.com/apache/spark/pull/32430#issuecomment-834020134
[GitHub] [spark] SparkQA commented on pull request #32463: [WIP][SPARK-35147][SQL] Migrate to resolveWithPruning for two command rules
SparkQA commented on pull request #32463: URL: https://github.com/apache/spark/pull/32463#issuecomment-834019899 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42746/
[GitHub] [spark] SparkQA commented on pull request #32462: [SPARK-34795][SPARK-35192][SPARK-35293][SQL][TESTS][3.1] Adds a new job in GitHub Actions to check the output of TPC-DS queries
SparkQA commented on pull request #32462: URL: https://github.com/apache/spark/pull/32462#issuecomment-834019686
[GitHub] [spark] SparkQA commented on pull request #32463: [WIP][SPARK-35147][SQL] Migrate to resolveWithPruning for two command rules
SparkQA commented on pull request #32463: URL: https://github.com/apache/spark/pull/32463#issuecomment-834018398 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42746/
[GitHub] [spark] beliefer commented on pull request #27432: [SPARK-28325][SQL]Support ANSI SQL: SIMILAR TO ... ESCAPE syntax
beliefer commented on pull request #27432: URL: https://github.com/apache/spark/pull/27432#issuecomment-834013656 ping @cloud-fan
[GitHub] [spark] SparkQA removed a comment on pull request #32455: [WIP][SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4(latest)
SparkQA removed a comment on pull request #32455: URL: https://github.com/apache/spark/pull/32455#issuecomment-833957330 **[Test build #138219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138219/testReport)** for PR 32455 at commit [`8a13cfb`](https://github.com/apache/spark/commit/8a13cfbcd57b7e93e0009c6b93d784184a880761).
[GitHub] [spark] SparkQA commented on pull request #32455: [WIP][SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4(latest)
SparkQA commented on pull request #32455: URL: https://github.com/apache/spark/pull/32455#issuecomment-834010898 **[Test build #138219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138219/testReport)** for PR 32455 at commit [`8a13cfb`](https://github.com/apache/spark/commit/8a13cfbcd57b7e93e0009c6b93d784184a880761). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] beliefer commented on pull request #32377: [SPARK-35021][SQL] Group exception messages in connector/catalog
beliefer commented on pull request #32377: URL: https://github.com/apache/spark/pull/32377#issuecomment-834008831 cc @cloud-fan
[GitHub] [spark] HyukjinKwon closed pull request #32435: [SPARK-35306][MLLIB][TESTS] Add benchmark results for BLASBenchmark created by GitHub Actions machines
HyukjinKwon closed pull request #32435: URL: https://github.com/apache/spark/pull/32435
[GitHub] [spark] HyukjinKwon commented on pull request #32435: [SPARK-35306][MLLIB][TESTS] Add benchmark results for BLASBenchmark created by GitHub Actions machines
HyukjinKwon commented on pull request #32435: URL: https://github.com/apache/spark/pull/32435#issuecomment-834005292 Merged to master.
[GitHub] [spark] viirya commented on a change in pull request #32431: [SPARK-35173][SQL][PYTHON] Add multiple columns adding support
viirya commented on a change in pull request #32431: URL: https://github.com/apache/spark/pull/32431#discussion_r627875037 ## File path: python/pyspark/sql/dataframe.py ## @@ -2423,6 +2423,38 @@ def freqItems(self, cols, support=None): support = 0.01 return DataFrame(self._jdf.stat().freqItems(_to_seq(self._sc, cols), support), self.sql_ctx) +def withColumns(self, colsMap): +""" +Returns a new :class:`DataFrame` by adding multiple columns or replacing the +existing columns that has the same name. + +The colsMap is a map of column name and column, the column must only refer to attribute +supplied by this Dataset. It is an error to add columns that refers to some other Dataset. Review comment: refers -> refer
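The docstring under review describes `withColumns(self, colsMap)` as adding several columns at once, with any existing column of the same name being replaced. The snippet below is a toy, pure-Python model of those semantics on a list of dicts. It is not PySpark code. The function `with_columns`, the row/table representation, the case-sensitive name matching, and the choice to evaluate every expression against the original row (as a single projection would) are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
# Hypothetical stand-in for a Column: a function of one input row.
ColumnExpr = Callable[[Row], Any]


def with_columns(rows: List[Row], cols_map: Dict[str, ColumnExpr]) -> List[Row]:
    """Toy model (not PySpark) of adding or replacing several columns in one step.

    Every expression is evaluated against the *original* row, so a column
    added in this call cannot see another column added in the same call.
    Name matching is case-sensitive here, a simplification.
    """
    result: List[Row] = []
    for row in rows:
        new_row: Row = {}
        # Existing columns keep their position; same-named ones are replaced.
        for name, value in row.items():
            new_row[name] = cols_map[name](row) if name in cols_map else value
        # Columns not already present are appended in the map's order.
        for name, expr in cols_map.items():
            if name not in row:
                new_row[name] = expr(row)
        result.append(new_row)
    return result


people = [{"name": "Alice", "age": 2}, {"name": "Bob", "age": 5}]
out = with_columns(people, {
    "age": lambda r: r["age"] + 1,   # replaces the existing "age" column
    "age2": lambda r: r["age"] * 2,  # computed from the original age, not age + 1
})
print(out)
# [{'name': 'Alice', 'age': 3, 'age2': 4}, {'name': 'Bob', 'age': 6, 'age2': 10}]
```

Because each expression sees only the input row, `age2` is derived from the original `age`. An expression that looks up a key the row does not have fails with `KeyError`, loosely mirroring the docstring's rule that the columns must refer only to attributes supplied by this Dataset.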