[GitHub] [spark] SparkQA commented on pull request #32446: [SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client

2021-05-06 Thread GitBox


SparkQA commented on pull request #32446:
URL: https://github.com/apache/spark/pull/32446#issuecomment-834089554


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42758/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming

2021-05-06 Thread GitBox


SparkQA commented on pull request #32464:
URL: https://github.com/apache/spark/pull/32464#issuecomment-834085008









[GitHub] [spark] HyukjinKwon commented on pull request #32446: [SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client

2021-05-06 Thread GitBox


HyukjinKwon commented on pull request #32446:
URL: https://github.com/apache/spark/pull/32446#issuecomment-834083871


   Looks pretty good to me too!





[GitHub] [spark] gengliangwang commented on a change in pull request #32451: [SPARK-35144][SQL] Migrate to transformWithPruning for object rules

2021-05-06 Thread GitBox


gengliangwang commented on a change in pull request #32451:
URL: https://github.com/apache/spark/pull/32451#discussion_r627939566



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala
##
@@ -207,7 +211,8 @@ object ObjectSerializerPruning extends Rule[LogicalPlan] {
 }
   }
 
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
+_.containsAllPatterns(OBJECT_CONSUMER, PROJECT), ruleId) {

Review comment:
   why do we need `OBJECT_CONSUMER` here?
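
   For context on the question, here is a minimal sketch of how an "all patterns" condition prunes a traversal. It is a simplified model, not Catalyst's implementation: the class name, the integer pattern ids, and the `BitSet` representation are assumptions made for illustration. A subtree that contains only `PROJECT` is skipped entirely, so adding `OBJECT_CONSUMER` to the condition is only safe if every case in the rule really needs an object consumer.

```java
import java.util.BitSet;

// Hypothetical model of pattern-based pruning; not Catalyst's implementation.
final class PatternPruningSketch {
    // Illustrative pattern ids, chosen for this sketch only.
    static final int OBJECT_CONSUMER = 0;
    static final int PROJECT = 1;

    // Builds the set of pattern bits that a subtree is assumed to carry.
    static BitSet patternsOf(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    // True only if every requested pattern is present in the subtree's bits.
    static boolean containsAllPatterns(BitSet subtree, int... ids) {
        for (int id : ids) {
            if (!subtree.get(id)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BitSet projectOnly = patternsOf(PROJECT);
        BitSet both = patternsOf(OBJECT_CONSUMER, PROJECT);
        // A subtree with only PROJECT is pruned; one with both patterns is visited.
        System.out.println(containsAllPatterns(projectOnly, OBJECT_CONSUMER, PROJECT));
        System.out.println(containsAllPatterns(both, OBJECT_CONSUMER, PROJECT));
    }
}
```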







[GitHub] [spark] yaooqinn commented on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased

2021-05-06 Thread GitBox


yaooqinn commented on pull request #32465:
URL: https://github.com/apache/spark/pull/32465#issuecomment-834079593


   > is this a regression?
   
   This is not a regression, but Hive supports it.
   
   > It seems to me that this should fail, as it's similar to `sql("select a b 
from values(1) t(a)").repartitionBy("a")`
   
   When resolving attributes, shouldn't the `distribute by` and `cluster by` 
clauses behave the same as `sort by` and `group by`?
   
   





[GitHub] [spark] advancedxy commented on a change in pull request #32450: [SPARK-35282][SQL] Support AQE side shuffled hash join formula

2021-05-06 Thread GitBox


advancedxy commented on a change in pull request #32450:
URL: https://github.com/apache/spark/pull/32450#discussion_r627935120



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala
##
@@ -49,12 +50,15 @@ object Statistics {
  * @param attributeStats Statistics for Attributes.
  * @param isRuntime Whether the statistics is inferred from query stage 
runtime statistics during
  *  adaptive query execution.
+ * @param mapOutputStatistics the map output statistics from query stage 
runtime statistics during
+ *adaptive query execution.
  */
 case class Statistics(
 sizeInBytes: BigInt,
 rowCount: Option[BigInt] = None,
 attributeStats: AttributeMap[ColumnStat] = AttributeMap(Nil),
-isRuntime: Boolean = false) {
+isRuntime: Boolean = false,
+mapOutputStatistics: Option[MapOutputStatistics] = None) {

Review comment:
   FYI, we took another approach to support SHJ in AQE. We added a rule in 
`AdaptiveSparkPlanExec` to convert SMJ to SHJ according to shuffle stats, which 
requires no changes in `Statistics.scala` because the statistics are already 
available in `ShuffleStageInfo`.
   
   The SMJ could also be converted to SHJ if applicable even if 
`PREFER_SORTMERGE` is set. cc @Liulietong
   
   cc @luuliietong
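
   As a rough illustration of deciding from shuffle statistics alone, the sketch below checks per-partition byte sizes against a limit. The rule mentioned above is not shown in this thread, so the class name, the method name, the threshold, and the criterion itself (every partition under a byte limit) are assumptions, not the actual implementation.

```java
// Hypothetical decision helper; the names and the criterion are assumptions.
final class ShuffleStatsJoinSketch {
    // Returns true when every shuffle partition of the build side is at most
    // maxBytesPerPartition, i.e. small enough for a per-partition hash map
    // under the assumed limit.
    static boolean canUseShuffledHashJoin(long[] bytesByPartition, long maxBytesPerPartition) {
        for (long bytes : bytesByPartition) {
            if (bytes > maxBytesPerPartition) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long limit = 64L;
        System.out.println(canUseShuffledHashJoin(new long[] {10L, 20L}, limit));
        System.out.println(canUseShuffledHashJoin(new long[] {10L, 128L}, limit));
    }
}
```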







[GitHub] [spark] sunchao commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-06 Thread GitBox


sunchao commented on a change in pull request #32407:
URL: https://github.com/apache/spark/pull/32407#discussion_r627933663



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/connector/functions/FunctionBenchmark.scala
##
@@ -0,0 +1,146 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.functions
+
+import test.org.apache.spark.sql.connector.catalog.functions.JavaLongAdd
+import 
test.org.apache.spark.sql.connector.catalog.functions.JavaLongAdd.{JavaLongAddDefault,
 JavaLongAddMagic, JavaLongAddStaticMagic}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode._
+import org.apache.spark.sql.connector.catalog.{Identifier, InMemoryCatalog}
+import org.apache.spark.sql.connector.catalog.functions.{BoundFunction, 
ScalarFunction, UnboundFunction}
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{DataType, LongType, StructType}
+
+/**
+ * Benchmark to measure DataSourceV2 UDF performance
+ * {{{
+ *   To run this benchmark:
+ *   1. without sbt:
+ *  bin/spark-submit --class 
+ *--jars , 
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/test:runMain "
+ *  Results will be written to "benchmarks/FunctionBenchmark-results.txt".
+ * }}}
+ * '''NOTE''': to update the result of this benchmark, please use Github 
benchmark action:
+ *   https://spark.apache.org/developer-tools.html#github-workflow-benchmarks
+ */
+object FunctionBenchmark extends SqlBasedBenchmark {

Review comment:
   Sounds good. I'll change it.







[GitHub] [spark] sunchao commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-06 Thread GitBox


sunchao commented on a change in pull request #32407:
URL: https://github.com/apache/spark/pull/32407#discussion_r627933496



##
File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ScalarFunction.java
##
@@ -29,33 +29,62 @@
  * 
  * The JVM type of result values produced by this function must be the type 
used by Spark's
  * InternalRow API for the {@link DataType SQL data type} returned by {@link 
#resultType()}.
+ * The mapping between {@link DataType} and the corresponding JVM type is 
defined below.
  * 
  * IMPORTANT: the default implementation of {@link #produceResult} 
throws
- * {@link UnsupportedOperationException}. Users can choose to override this 
method, or implement
- * a "magic method" with name {@link #MAGIC_METHOD_NAME} which takes 
individual parameters
- * instead of a {@link InternalRow}. The magic method will be loaded by Spark 
through Java
- * reflection and will also provide better performance in general, due to 
optimizations such as
- * codegen, removal of Java boxing, etc.
- *
+ * {@link UnsupportedOperationException}. Users must choose to either override 
this method, or
+ * implement a magic method with name {@link #MAGIC_METHOD_NAME}, which takes 
individual parameters
+ * instead of a {@link InternalRow}. The magic method approach is generally 
recommended because it
+ * provides better performance over the default {@link #produceResult}, due to 
optimizations such
+ * as whole-stage codegen, elimination of Java boxing, etc.
+ * 
+ * In addition, for stateless Java functions, users can optionally define the
+ * {@link #MAGIC_METHOD_NAME} as a static method, which further avoids certain 
runtime costs such
+ * as nullness check on the method receiver, potential Java dynamic dispatch, 
etc.

Review comment:
   For a non-static method, `Invoke` needs to check whether the method receiver 
is null and only invokes the method if it is not; for a static method this 
check is unnecessary.
   
   ```scala
   val code = obj.code + code"""
 boolean ${ev.isNull} = true;
 $javaType ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
 if (!${obj.isNull}) { // <-- check if receiver is null
   $argCode
   ${ev.isNull} = $resultIsNull;
   if (!${ev.isNull}) {
 $evaluate
   }
 }
"""
   ```
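
   The same difference can be shown in plain Java. This is only a sketch: `LongAddInstance`, `LongAddStatic`, and `MagicMethodSketch` are hypothetical stand-ins, not Spark classes, and the hand-written `callInstance` merely mirrors the shape of the generated code above.

```java
// Hypothetical stand-ins; none of these names are Spark APIs.
final class LongAddInstance {
    // Instance-style method: it needs a receiver object to be called on.
    long invoke(long left, long right) {
        return left + right;
    }
}

final class LongAddStatic {
    // Static-style method: there is no receiver, so nothing to null-check.
    static long invoke(long left, long right) {
        return left + right;
    }
}

final class MagicMethodSketch {
    // Mirrors the generated code: the result stays null unless the receiver
    // is non-null, so every call pays for the receiver check.
    static Long callInstance(LongAddInstance receiver, long left, long right) {
        if (receiver == null) {
            return null;
        }
        return receiver.invoke(left, right);
    }

    // The static path calls the method directly, with no receiver check.
    static Long callStatic(long left, long right) {
        return LongAddStatic.invoke(left, right);
    }

    public static void main(String[] args) {
        System.out.println(callInstance(new LongAddInstance(), 1L, 2L));
        System.out.println(callInstance(null, 1L, 2L));
        System.out.println(callStatic(1L, 2L));
    }
}
```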







[GitHub] [spark] SparkQA removed a comment on pull request #32461: [SPARK-35146][SQL] Migrate to transformWithPruning or resolveWithPruning for rules in finishAnalysis.scala

2021-05-06 Thread GitBox


SparkQA removed a comment on pull request #32461:
URL: https://github.com/apache/spark/pull/32461#issuecomment-833982664


   **[Test build #138221 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138221/testReport)**
 for PR 32461 at commit 
[`6fc4523`](https://github.com/apache/spark/commit/6fc4523a4505ae4e8b5f8036d00f042988c2bb5c).





[GitHub] [spark] sunchao commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-06 Thread GitBox


sunchao commented on a change in pull request #32407:
URL: https://github.com/apache/spark/pull/32407#discussion_r627932159



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/connector/functions/FunctionBenchmark.scala
##
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.functions
+
+import test.org.apache.spark.sql.connector.catalog.functions.JavaLongAdd
+import 
test.org.apache.spark.sql.connector.catalog.functions.JavaLongAdd.{JavaLongAddDefault,
 JavaLongAddMagic, JavaLongAddStaticMagic}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode._
+import org.apache.spark.sql.connector.catalog.{Identifier, InMemoryCatalog}
+import org.apache.spark.sql.connector.catalog.functions.{BoundFunction, 
ScalarFunction, UnboundFunction}
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{DataType, LongType, StructType}
+
+/**
+ * Benchmark to measure DataSourceV2 UDF performance
+ * {{{
+ *   To run this benchmark:
+ *   1. without sbt:
+ *  bin/spark-submit --class 
+ *--jars , 
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/test:runMain "
+ *  Results will be written to "benchmarks/FunctionBenchmark-results.txt".
+ * }}}
+ * '''NOTE''': to update the result of this benchmark, please use Github 
benchmark action:
+ *   https://spark.apache.org/developer-tools.html#github-workflow-benchmarks
+ */
+object FunctionBenchmark extends SqlBasedBenchmark {
+  val catalogName: String = "benchmark_catalog"
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+val N = 500L * 1000 * 1000
+Seq(true, false).foreach { codegenEnabled =>
+  Seq(true, false).foreach { resultNullable =>
+scalarFunctionBenchmark(N, codegenEnabled = codegenEnabled,
+  resultNullable = resultNullable)
+  }
+}
+  }
+
+  private def scalarFunctionBenchmark(
+  N: Long,
+  codegenEnabled: Boolean,
+  resultNullable: Boolean): Unit = {
+withSQLConf(s"spark.sql.catalog.$catalogName" -> 
classOf[InMemoryCatalog].getName) {
+  createFunction("java_long_add_default",
+new JavaLongAdd(new JavaLongAddDefault(resultNullable)))
+  createFunction("java_long_add_magic", new JavaLongAdd(new 
JavaLongAddMagic(resultNullable)))
+  createFunction("java_long_add_static_magic",
+new JavaLongAdd(new JavaLongAddStaticMagic(resultNullable)))
+  createFunction("scala_long_add_default",
+LongAddUnbound(new LongAddWithProduceResult(resultNullable)))
+  createFunction("scala_long_add_magic", LongAddUnbound(new 
LongAddWithMagic(resultNullable)))
+
+  val codeGenFactoryMode = if (codegenEnabled) FALLBACK else NO_CODEGEN
+  withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> 
codegenEnabled.toString,
+  SQLConf.CODEGEN_FACTORY_MODE.key -> codeGenFactoryMode.toString) {
+val name = s"scalar function (long + long) -> long, result_nullable = 
$resultNullable " +
+s"codegen = $codegenEnabled"
+val benchmark = new Benchmark(name, N, output = output)
+benchmark.addCase(s"with native_long_add", numIters = 3) { _ =>
+  spark.range(N).selectExpr("id + id").noop()
+}
+Seq("java_long_add_default", "java_long_add_magic", 
"java_long_add_static_magic",
+"scala_long_add_default", "scala_long_add_magic").foreach { 
functionName =>
+  benchmark.addCase(s"with $functionName", numIters = 3) { _ =>

Review comment:
   Will remove





[GitHub] [spark] sunchao commented on pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-06 Thread GitBox


sunchao commented on pull request #32407:
URL: https://github.com/apache/spark/pull/32407#issuecomment-834075095


   > BTW, could you rebase this PR to the master branch, @sunchao ? There was a 
bug causing TPCDS UT failure in master branch and it's fixed a few hours ago.
   
   Thanks @dongjoon-hyun . Will do.





[GitHub] [spark] SparkQA commented on pull request #32461: [SPARK-35146][SQL] Migrate to transformWithPruning or resolveWithPruning for rules in finishAnalysis.scala

2021-05-06 Thread GitBox


SparkQA commented on pull request #32461:
URL: https://github.com/apache/spark/pull/32461#issuecomment-834074852


   **[Test build #138221 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138221/testReport)**
 for PR 32461 at commit 
[`6fc4523`](https://github.com/apache/spark/commit/6fc4523a4505ae4e8b5f8036d00f042988c2bb5c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] sunchao commented on pull request #32446: [SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client

2021-05-06 Thread GitBox


sunchao commented on pull request #32446:
URL: https://github.com/apache/spark/pull/32446#issuecomment-834074281


   Yes, tests passed on my own fork. The issue was that the method 
`getWithFastCheck` was only introduced in Hive 2.1.0, so it caused some weird 
class loader issue.
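
   One generic way to cope with a method that exists only in newer library versions is to look it up reflectively and fall back when it is missing. The sketch below illustrates that idea only; it is not how this PR resolves the issue. `OldClientSketch`, `NewClientSketch`, and `VersionShimSketch` are hypothetical classes standing in for two versions of a client library.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for an older client version without the newer method.
final class OldClientSketch {
    public static String get() {
        return "get";
    }
}

// Hypothetical stand-in for a newer client version that adds getWithFastCheck.
final class NewClientSketch {
    public static String get() {
        return "get";
    }

    public static String getWithFastCheck() {
        return "getWithFastCheck";
    }
}

final class VersionShimSketch {
    // Prefers getWithFastCheck when the class declares it, otherwise uses get.
    static String obtain(Class<?> clientClass) {
        try {
            Method method;
            try {
                method = clientClass.getMethod("getWithFastCheck");
            } catch (NoSuchMethodException e) {
                method = clientClass.getMethod("get");
            }
            // Both methods are static, so the receiver argument is null.
            return (String) method.invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(obtain(OldClientSketch.class));
        System.out.println(obtain(NewClientSketch.class));
    }
}
```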





[GitHub] [spark] sunchao commented on a change in pull request #32446: [SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client

2021-05-06 Thread GitBox


sunchao commented on a change in pull request #32446:
URL: https://github.com/apache/spark/pull/32446#discussion_r627930998



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
##
@@ -303,7 +303,7 @@ private[hive] class HiveClientImpl(
 // with the side-effect of Hive.get(conf) to avoid using out-of-date 
HiveConf.
 // See discussion in 
https://github.com/apache/spark/pull/16826/files#r104606859
 // for more details.
-Hive.get(conf)
+shim.getHive(conf)

Review comment:
   Yeah, it shouldn't: both functions do the same thing in this case, updating 
the `Hive` object's config with the provided `conf`.
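
   A minimal sketch of the kind of side effect being discussed, assuming a thread-local holder whose `get(conf)` refreshes the stored configuration on every call. `ClientHolderSketch` and its methods are hypothetical and do not reproduce Hive's actual `Hive.get` logic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder; it only models "get(conf) also updates the config".
final class ClientHolderSketch {
    private static final ThreadLocal<ClientHolderSketch> CURRENT = new ThreadLocal<>();

    private Map<String, String> conf = new HashMap<>();

    // Returns the per-thread instance, creating it if needed, and replaces
    // its configuration with a copy of the one passed in (the side effect).
    static ClientHolderSketch get(Map<String, String> newConf) {
        ClientHolderSketch client = CURRENT.get();
        if (client == null) {
            client = new ClientHolderSketch();
            CURRENT.set(client);
        }
        client.conf = new HashMap<>(newConf);
        return client;
    }

    // Returns the instance without touching its configuration.
    static ClientHolderSketch current() {
        return CURRENT.get();
    }

    String confValue(String key) {
        return conf.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("k", "old");
        get(conf);
        conf.put("k", "new");
        get(conf);
        System.out.println(current().confValue("k"));
    }
}
```

   Any replacement for such a call has to keep the refresh step, otherwise later code would read an out-of-date configuration.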







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32446: [SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client

2021-05-06 Thread GitBox


dongjoon-hyun commented on a change in pull request #32446:
URL: https://github.com/apache/spark/pull/32446#discussion_r627930318



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
##
@@ -303,7 +303,7 @@ private[hive] class HiveClientImpl(
 // with the side-effect of Hive.get(conf) to avoid using out-of-date 
HiveConf.
 // See discussion in 
https://github.com/apache/spark/pull/16826/files#r104606859
 // for more details.
-Hive.get(conf)
+shim.getHive(conf)

Review comment:
   Here, line 303 claims that we need `the side-effect of Hive.get(conf)`.
   Could you confirm that `shim.getHive(conf)` doesn't break the side-effect 
assumption?







[GitHub] [spark] imback82 commented on a change in pull request #32447: [SPARK-34701][SQL][FOLLOW-UP] Children/innerChildren should be mutually exclusive for AnalysisOnlyCommand

2021-05-06 Thread GitBox


imback82 commented on a change in pull request #32447:
URL: https://github.com/apache/spark/pull/32447#discussion_r627930278



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala
##
@@ -46,5 +47,6 @@ trait AnalysisOnlyCommand extends Command {
   val isAnalyzed: Boolean
   def childrenToAnalyze: Seq[LogicalPlan]
   override final def children: Seq[LogicalPlan] = if (isAnalyzed) Nil else 
childrenToAnalyze
+  override def innerChildren: Seq[QueryPlan[_]] = if (isAnalyzed) 
childrenToAnalyze else Nil

Review comment:
   And we can improve the EXPLAIN for physical plans as well in a future PR 
if needed.







[GitHub] [spark] dongjoon-hyun commented on pull request #32446: [SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client

2021-05-06 Thread GitBox


dongjoon-hyun commented on pull request #32446:
URL: https://github.com/apache/spark/pull/32446#issuecomment-834072391


   Is this ready for review again?





[GitHub] [spark] SparkQA commented on pull request #32446: [SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client

2021-05-06 Thread GitBox


SparkQA commented on pull request #32446:
URL: https://github.com/apache/spark/pull/32446#issuecomment-834071285


   **[Test build #138236 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138236/testReport)**
 for PR 32446 at commit 
[`88697a4`](https://github.com/apache/spark/commit/88697a43ba63963a1951f8d99a697fab4ca5692f).





[GitHub] [spark] imback82 commented on a change in pull request #32447: [SPARK-34701][SQL][FOLLOW-UP] Children/innerChildren should be mutually exclusive for AnalysisOnlyCommand

2021-05-06 Thread GitBox


imback82 commented on a change in pull request #32447:
URL: https://github.com/apache/spark/pull/32447#discussion_r627928708



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala
##
@@ -46,5 +47,6 @@ trait AnalysisOnlyCommand extends Command {
   val isAnalyzed: Boolean
   def childrenToAnalyze: Seq[LogicalPlan]
   override final def children: Seq[LogicalPlan] = if (isAnalyzed) Nil else 
childrenToAnalyze
+  override def innerChildren: Seq[QueryPlan[_]] = if (isAnalyzed) 
childrenToAnalyze else Nil

Review comment:
   There is a change, but I think it's for the better:
   Before:
   ```
   == Parsed Logical Plan ==
   'CacheTableAsSelect tempTable, SELECT key FROM testData, false, false
   +- 'Project ['key]
  +- 'UnresolvedRelation [testData], [], false
   
   == Analyzed Logical Plan ==
   CacheTableAsSelect tempTable, Project [key#13], SELECT key FROM testData, 
false, true
   
   == Optimized Logical Plan ==
   CacheTableAsSelect tempTable, Project [key#13], SELECT key FROM testData, 
false, true
   
   == Physical Plan ==
   CacheTableAsSelect tempTable, Project [key#13], SELECT key FROM testData, 
false
   ```
   
   New:
   ```
   == Parsed Logical Plan ==
   'CacheTableAsSelect tempTable, SELECT key FROM testData, false, false
   +- 'Project ['key]
  +- 'UnresolvedRelation [testData], [], false
   
   == Analyzed Logical Plan ==
   CacheTableAsSelect tempTable, SELECT key FROM testData, false, true
  +- Project [key#13]
 +- SubqueryAlias testdata
+- View (`testData`, [key#13,value#14])
   +- SerializeFromObject [knownnotnull(assertnotnull(input[0, 
org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13, 
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
fromString, knownnotnull(assertnotnull(input[0, 
org.apache.spark.sql.test.SQLTestData$TestData, true])).value, true, false) AS 
value#14]
  +- ExternalRDD [obj#12]
   
   == Optimized Logical Plan ==
   CacheTableAsSelect tempTable, SELECT key FROM testData, false, true
  +- Project [key#13]
 +- SubqueryAlias testdata
+- View (`testData`, [key#13,value#14])
   +- SerializeFromObject [knownnotnull(assertnotnull(input[0, 
org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13, 
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
fromString, knownnotnull(assertnotnull(input[0, 
org.apache.spark.sql.test.SQLTestData$TestData, true])).value, true, false) AS 
value#14]
  +- ExternalRDD [obj#12]
   
   == Physical Plan ==
   CacheTableAsSelect tempTable, Project [key#13], SELECT key FROM testData, 
false
   ```
   WDYT?
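
   The mutual exclusion in the diff can be modelled in a few lines. This is a simplified sketch, not the Catalyst trait: `AnalysisOnlyNodeSketch` is a hypothetical class and plan nodes are reduced to strings.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the children/innerChildren split; not Catalyst code.
final class AnalysisOnlyNodeSketch {
    private final boolean analyzed;
    private final List<String> childrenToAnalyze;

    AnalysisOnlyNodeSketch(boolean analyzed, List<String> childrenToAnalyze) {
        this.analyzed = analyzed;
        this.childrenToAnalyze = childrenToAnalyze;
    }

    // Before analysis the plans are exposed as regular children.
    List<String> children() {
        return analyzed ? Collections.<String>emptyList() : childrenToAnalyze;
    }

    // After analysis the same plans are exposed only as inner children,
    // so exactly one of the two lists is non-empty at any time.
    List<String> innerChildren() {
        return analyzed ? childrenToAnalyze : Collections.<String>emptyList();
    }

    public static void main(String[] args) {
        List<String> plans = Arrays.asList("Project [key]");
        AnalysisOnlyNodeSketch parsed = new AnalysisOnlyNodeSketch(false, plans);
        AnalysisOnlyNodeSketch analyzedNode = new AnalysisOnlyNodeSketch(true, plans);
        System.out.println(parsed.children() + " / " + parsed.innerChildren());
        System.out.println(analyzedNode.children() + " / " + analyzedNode.innerChildren());
    }
}
```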







[GitHub] [spark] cloud-fan commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-06 Thread GitBox


cloud-fan commented on a change in pull request #32407:
URL: https://github.com/apache/spark/pull/32407#discussion_r627928025



##
File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ScalarFunction.java
##
@@ -29,33 +29,62 @@
  * 
  * The JVM type of result values produced by this function must be the type 
used by Spark's
  * InternalRow API for the {@link DataType SQL data type} returned by {@link 
#resultType()}.
+ * The mapping between {@link DataType} and the corresponding JVM type is 
defined below.
  * 
  * IMPORTANT: the default implementation of {@link #produceResult} 
throws
- * {@link UnsupportedOperationException}. Users can choose to override this 
method, or implement
- * a "magic method" with name {@link #MAGIC_METHOD_NAME} which takes 
individual parameters
- * instead of a {@link InternalRow}. The magic method will be loaded by Spark 
through Java
- * reflection and will also provide better performance in general, due to 
optimizations such as
- * codegen, removal of Java boxing, etc.
- *
+ * {@link UnsupportedOperationException}. Users must choose to either override 
this method, or
+ * implement a magic method with name {@link #MAGIC_METHOD_NAME}, which takes 
individual parameters
+ * instead of a {@link InternalRow}. The magic method approach is generally 
recommended because it
+ * provides better performance over the default {@link #produceResult}, due to 
optimizations such
+ * as whole-stage codegen, elimination of Java boxing, etc.
+ * 
+ * In addition, for stateless Java functions, users can optionally define the
+ * {@link #MAGIC_METHOD_NAME} as a static method, which further avoids certain 
runtime costs such
+ * as nullness check on the method receiver, potential Java dynamic dispatch, 
etc.

Review comment:
   Hmm, how does a static method help with the null check?







[GitHub] [spark] cloud-fan commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-06 Thread GitBox


cloud-fan commented on a change in pull request #32407:
URL: https://github.com/apache/spark/pull/32407#discussion_r627927694



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/connector/functions/FunctionBenchmark.scala
##
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.functions
+
+import test.org.apache.spark.sql.connector.catalog.functions.JavaLongAdd
+import test.org.apache.spark.sql.connector.catalog.functions.JavaLongAdd.{JavaLongAddDefault, JavaLongAddMagic, JavaLongAddStaticMagic}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode._
+import org.apache.spark.sql.connector.catalog.{Identifier, InMemoryCatalog}
+import org.apache.spark.sql.connector.catalog.functions.{BoundFunction, ScalarFunction, UnboundFunction}
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{DataType, LongType, StructType}
+
+/**
+ * Benchmark to measure DataSourceV2 UDF performance
+ * {{{
+ *   To run this benchmark:
+ *   1. without sbt:
+ *  bin/spark-submit --class 
+ *--jars , 
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
+ *  Results will be written to "benchmarks/FunctionBenchmark-results.txt".
+ * }}}
+ * '''NOTE''': to update the result of this benchmark, please use Github benchmark action:
+ *   https://spark.apache.org/developer-tools.html#github-workflow-benchmarks
+ */
+object FunctionBenchmark extends SqlBasedBenchmark {
+  val catalogName: String = "benchmark_catalog"
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    val N = 500L * 1000 * 1000
+    Seq(true, false).foreach { codegenEnabled =>
+      Seq(true, false).foreach { resultNullable =>
+        scalarFunctionBenchmark(N, codegenEnabled = codegenEnabled,
+          resultNullable = resultNullable)
+      }
+    }
+  }
+
+  private def scalarFunctionBenchmark(
+      N: Long,
+      codegenEnabled: Boolean,
+      resultNullable: Boolean): Unit = {
+    withSQLConf(s"spark.sql.catalog.$catalogName" -> classOf[InMemoryCatalog].getName) {
+      createFunction("java_long_add_default",
+        new JavaLongAdd(new JavaLongAddDefault(resultNullable)))
+      createFunction("java_long_add_magic", new JavaLongAdd(new JavaLongAddMagic(resultNullable)))
+      createFunction("java_long_add_static_magic",
+        new JavaLongAdd(new JavaLongAddStaticMagic(resultNullable)))
+      createFunction("scala_long_add_default",
+        LongAddUnbound(new LongAddWithProduceResult(resultNullable)))
+      createFunction("scala_long_add_magic", LongAddUnbound(new LongAddWithMagic(resultNullable)))
+
+      val codeGenFactoryMode = if (codegenEnabled) FALLBACK else NO_CODEGEN
+      withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> codegenEnabled.toString,
+          SQLConf.CODEGEN_FACTORY_MODE.key -> codeGenFactoryMode.toString) {
+        val name = s"scalar function (long + long) -> long, result_nullable = $resultNullable " +
+            s"codegen = $codegenEnabled"
+        val benchmark = new Benchmark(name, N, output = output)
+        benchmark.addCase(s"with native_long_add", numIters = 3) { _ =>
+          spark.range(N).selectExpr("id + id").noop()
+        }
+        Seq("java_long_add_default", "java_long_add_magic", "java_long_add_static_magic",
+            "scala_long_add_default", "scala_long_add_magic").foreach { functionName =>
+          benchmark.addCase(s"with $functionName", numIters = 3) { _ =>

Review comment:
   nit: the `with` seems useless in the case name.





[GitHub] [spark] cloud-fan commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-06 Thread GitBox


cloud-fan commented on a change in pull request #32407:
URL: https://github.com/apache/spark/pull/32407#discussion_r627927440



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/connector/functions/FunctionBenchmark.scala
##
@@ -0,0 +1,146 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.functions
+
+import test.org.apache.spark.sql.connector.catalog.functions.JavaLongAdd
+import test.org.apache.spark.sql.connector.catalog.functions.JavaLongAdd.{JavaLongAddDefault, JavaLongAddMagic, JavaLongAddStaticMagic}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode._
+import org.apache.spark.sql.connector.catalog.{Identifier, InMemoryCatalog}
+import org.apache.spark.sql.connector.catalog.functions.{BoundFunction, ScalarFunction, UnboundFunction}
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{DataType, LongType, StructType}
+
+/**
+ * Benchmark to measure DataSourceV2 UDF performance
+ * {{{
+ *   To run this benchmark:
+ *   1. without sbt:
+ *  bin/spark-submit --class 
+ *--jars , 
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
+ *  Results will be written to "benchmarks/FunctionBenchmark-results.txt".
+ * }}}
+ * '''NOTE''': to update the result of this benchmark, please use Github benchmark action:
+ *   https://spark.apache.org/developer-tools.html#github-workflow-benchmarks
+ */
+object FunctionBenchmark extends SqlBasedBenchmark {

Review comment:
   how about `V2FunctionBenchmark`?







[GitHub] [spark] SparkQA commented on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming

2021-05-06 Thread GitBox


SparkQA commented on pull request #32464:
URL: https://github.com/apache/spark/pull/32464#issuecomment-834068621


   **[Test build #138235 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138235/testReport)**
 for PR 32464 at commit 
[`ce9d446`](https://github.com/apache/spark/commit/ce9d4469ac2d05b5c02cfe2940220ef14088bb37).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased

2021-05-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32465:
URL: https://github.com/apache/spark/pull/32465#issuecomment-834068232









[GitHub] [spark] AmplabJenkins removed a comment on pull request #32459: [SPARK-26164][SQL][FOLLOWUP] WriteTaskStatsTracker should know which file the row is written to

2021-05-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32459:
URL: https://github.com/apache/spark/pull/32459#issuecomment-834068238


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42755/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32457: [SPARK-35329][SQL] Split generated switch code into pieces in ExpandExec

2021-05-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32457:
URL: https://github.com/apache/spark/pull/32457#issuecomment-834068235


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138220/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32459: [SPARK-26164][SQL][FOLLOWUP] WriteTaskStatsTracker should know which file the row is written to

2021-05-06 Thread GitBox


AmplabJenkins commented on pull request #32459:
URL: https://github.com/apache/spark/pull/32459#issuecomment-834068238


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42755/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32457: [SPARK-35329][SQL] Split generated switch code into pieces in ExpandExec

2021-05-06 Thread GitBox


AmplabJenkins commented on pull request #32457:
URL: https://github.com/apache/spark/pull/32457#issuecomment-834068235


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138220/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased

2021-05-06 Thread GitBox


AmplabJenkins commented on pull request #32465:
URL: https://github.com/apache/spark/pull/32465#issuecomment-834068232









[GitHub] [spark] SparkQA commented on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased

2021-05-06 Thread GitBox


SparkQA commented on pull request #32465:
URL: https://github.com/apache/spark/pull/32465#issuecomment-834067906









[GitHub] [spark] cloud-fan commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-06 Thread GitBox


cloud-fan commented on a change in pull request #32407:
URL: https://github.com/apache/spark/pull/32407#discussion_r627926417



##
File path: sql/core/benchmarks/FunctionBenchmark-jdk11-results.txt
##
@@ -0,0 +1,44 @@
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1046-azure
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+scalar function (long + long) -> long, result_nullable = true codegen = true:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------------------
+with native_long_add                                                                    13448          13658         337         37.2          26.9       1.0X
+with java_long_add_default                                                             110416         111415        1142          4.5         220.8       0.1X
+with java_long_add_magic                                                                17072          17128          50         29.3          34.1       0.8X
+with java_long_add_static_magic                                                         15912          16121         189         31.4          31.8       0.8X
+with scala_long_add_default                                                            114506         114714         342          4.4         229.0       0.1X
+with scala_long_add_magic                                                               16589          16858         457         30.1          33.2       0.8X
+
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1046-azure
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+scalar function (long + long) -> long, result_nullable = false codegen = true:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------------------
+with native_long_add                                                                    14448          14633         274         34.6          28.9       1.0X
+with java_long_add_default                                                              68122          68223         129          7.3         136.2       0.2X
+with java_long_add_magic                                                                16724          16792          93         29.9          33.4       0.9X
+with java_long_add_static_magic                                                         14704          14761          95         34.0          29.4       1.0X

Review comment:
   wow this is on par with the native one!
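
For reference, the Rate, Per Row, and Relative columns can be re-derived from the Best Time column. The sketch below assumes N = 500L * 1000 * 1000 rows (the value set in `runBenchmarkSuite`) and that Relative is the first case's best time divided by the current case's best time. These formulas are inferred from the numbers in the table, not taken from the `Benchmark` source, and `BenchmarkColumns` is a hypothetical helper.

```java
import java.util.Locale;

// Hypothetical helper; the formulas are inferred from the results table above.
final class BenchmarkColumns {
    // Nanoseconds spent per row, from the best wall-clock time in milliseconds.
    static double perRowNs(long bestMs, long rows) {
        return bestMs * 1_000_000.0 / rows;
    }

    // Throughput in millions of rows per second.
    static double rateMillionsPerSec(long bestMs, long rows) {
        return (rows / 1_000_000.0) / (bestMs / 1000.0);
    }

    // Speed relative to the baseline case (values below 1.0 are slower).
    static double relative(long baselineBestMs, long bestMs) {
        return (double) baselineBestMs / bestMs;
    }

    public static void main(String[] args) {
        long rows = 500L * 1000 * 1000;
        // Best times (ms) from the result_nullable = false, codegen = true table.
        String[] names = {
            "native_long_add", "java_long_add_default",
            "java_long_add_magic", "java_long_add_static_magic"
        };
        long[] bestMs = {14448, 68122, 16724, 14704};
        for (int i = 0; i < names.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                "%-28s %6.1f M/s %7.1f ns/row %4.1fX",
                names[i],
                rateMillionsPerSec(bestMs[i], rows),
                perRowNs(bestMs[i], rows),
                relative(bestMs[0], bestMs[i])));
        }
    }
}
```

Under these assumptions the static magic case differs from the native one by roughly half a nanosecond per row, which is why both round to 1.0X.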







[GitHub] [spark] cloud-fan commented on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased

2021-05-06 Thread GitBox


cloud-fan commented on pull request #32465:
URL: https://github.com/apache/spark/pull/32465#issuecomment-834065842


   Is this a regression? It seems to me that this should fail, as it's similar to `sql("select a b from values(1) t(a)").repartitionBy("a")`.





[GitHub] [spark] SparkQA commented on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased

2021-05-06 Thread GitBox


SparkQA commented on pull request #32465:
URL: https://github.com/apache/spark/pull/32465#issuecomment-834065404









[GitHub] [spark] SparkQA commented on pull request #32459: [SPARK-26164][SQL][FOLLOWUP] WriteTaskStatsTracker should know which file the row is written to

2021-05-06 Thread GitBox


SparkQA commented on pull request #32459:
URL: https://github.com/apache/spark/pull/32459#issuecomment-834063671


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42755/
   





[GitHub] [spark] kiszk edited a comment on pull request #32457: [SPARK-35329][SQL] Split generated switch code into pieces in ExpandExec

2021-05-06 Thread GitBox


kiszk edited a comment on pull request #32457:
URL: https://github.com/apache/spark/pull/32457#issuecomment-834059937


   This code seems to generate as many methods as there are columns. Am I correct?   
   Could this splitting cause performance degradation? If so, it would be good to introduce a threshold.
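
As a rough illustration of what such a threshold could look like (a sketch only, not the PR's code): the generator below inlines the case bodies while their count stays at or below a limit, and moves each body into its own method otherwise. `SwitchSplitSketch`, `SPLIT_THRESHOLD`, and the `case_N` method naming are hypothetical, and the value 3 is arbitrary.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical code-string generator; it only illustrates threshold-based splitting.
final class SwitchSplitSketch {
    // Arbitrary limit chosen for the example; the thread proposes no concrete value.
    static final int SPLIT_THRESHOLD = 3;

    // Emits a switch over the case bodies, splitting them into methods only
    // when there are more cases than SPLIT_THRESHOLD.
    static String generate(List<String> caseBodies) {
        boolean split = caseBodies.size() > SPLIT_THRESHOLD;
        StringBuilder methods = new StringBuilder();
        StringBuilder sw = new StringBuilder("switch (i) {\n");
        for (int i = 0; i < caseBodies.size(); i++) {
            String body = caseBodies.get(i);
            if (split) {
                // Move the body into its own method and call it from the case.
                methods.append("private void case_").append(i)
                    .append("() { ").append(body).append(" }\n");
                body = "case_" + i + "();";
            }
            sw.append("  case ").append(i).append(": ").append(body).append(" break;\n");
        }
        return methods + sw.append("}\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(generate(Arrays.asList("a();", "b();")));
        System.out.print(generate(Arrays.asList("a();", "b();", "c();", "d();")));
    }
}
```

With two cases the bodies stay inline in the switch; with four, each body becomes a separate method, so small inputs do not pay the extra call.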





[GitHub] [spark] kiszk commented on pull request #32457: [SPARK-35329][SQL] Split generated switch code into pieces in ExpandExec

2021-05-06 Thread GitBox


kiszk commented on pull request #32457:
URL: https://github.com/apache/spark/pull/32457#issuecomment-834059937


   This code seems to generate as many methods as there are columns. Am I correct?   
   Could this splitting cause performance degradation? If so, it would be good to introduce a threshold.





[GitHub] [spark] sigmod commented on pull request #32463: [SPARK-35147][SQL] Migrate to resolveWithPruning for two command rules

2021-05-06 Thread GitBox


sigmod commented on pull request #32463:
URL: https://github.com/apache/spark/pull/32463#issuecomment-834055715


   @hvanhovell @gengliangwang @dbaliafroozeh @maryannxue this PR is ready for review.





[GitHub] [spark] c21 commented on pull request #32430: [SPARK-35133][SQL] Explain codegen works with AQE

2021-05-06 Thread GitBox


c21 commented on pull request #32430:
URL: https://github.com/apache/spark/pull/32430#issuecomment-834054871


   Thank you all for the review!





[GitHub] [spark] SparkQA removed a comment on pull request #32457: [SPARK-35329][SQL] Split generated switch code into pieces in ExpandExec

2021-05-06 Thread GitBox


SparkQA removed a comment on pull request #32457:
URL: https://github.com/apache/spark/pull/32457#issuecomment-833962887


   **[Test build #138220 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138220/testReport)**
 for PR 32457 at commit 
[`cb182b8`](https://github.com/apache/spark/commit/cb182b888439d3efe1e46aa0aa44fb1ede96ff8f).





[GitHub] [spark] SparkQA commented on pull request #32457: [SPARK-35329][SQL] Split generated switch code into pieces in ExpandExec

2021-05-06 Thread GitBox


SparkQA commented on pull request #32457:
URL: https://github.com/apache/spark/pull/32457#issuecomment-834053230


   **[Test build #138220 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138220/testReport)**
 for PR 32457 at commit 
[`cb182b8`](https://github.com/apache/spark/commit/cb182b888439d3efe1e46aa0aa44fb1ede96ff8f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming

2021-05-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32464:
URL: https://github.com/apache/spark/pull/32464#issuecomment-834051212


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42754/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming

2021-05-06 Thread GitBox


AmplabJenkins commented on pull request #32464:
URL: https://github.com/apache/spark/pull/32464#issuecomment-834051212


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42754/
   





[GitHub] [spark] SparkQA commented on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming

2021-05-06 Thread GitBox


SparkQA commented on pull request #32464:
URL: https://github.com/apache/spark/pull/32464#issuecomment-834051208


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42754/
   





[GitHub] [spark] yaooqinn commented on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased

2021-05-06 Thread GitBox


yaooqinn commented on pull request #32465:
URL: https://github.com/apache/spark/pull/32465#issuecomment-834050531


   cc @cloud-fan @maropu @HyukjinKwon thanks for reviewing





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming

2021-05-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32464:
URL: https://github.com/apache/spark/pull/32464#issuecomment-834048734


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138232/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming

2021-05-06 Thread GitBox


SparkQA removed a comment on pull request #32464:
URL: https://github.com/apache/spark/pull/32464#issuecomment-834046598


   **[Test build #138232 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138232/testReport)**
 for PR 32464 at commit 
[`23c7f91`](https://github.com/apache/spark/commit/23c7f9183372148a110ae538f9d80bfc5c3b09b2).





[GitHub] [spark] SparkQA commented on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased

2021-05-06 Thread GitBox


SparkQA commented on pull request #32465:
URL: https://github.com/apache/spark/pull/32465#issuecomment-834048943


   **[Test build #138234 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138234/testReport)**
 for PR 32465 at commit 
[`03ed3a5`](https://github.com/apache/spark/commit/03ed3a5a665adecd7a49d22242506ed1df96aa0f).





[GitHub] [spark] AmplabJenkins commented on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming

2021-05-06 Thread GitBox


AmplabJenkins commented on pull request #32464:
URL: https://github.com/apache/spark/pull/32464#issuecomment-834048734


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138232/
   





[GitHub] [spark] SparkQA commented on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming

2021-05-06 Thread GitBox


SparkQA commented on pull request #32464:
URL: https://github.com/apache/spark/pull/32464#issuecomment-834048710


   **[Test build #138232 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138232/testReport)**
 for PR 32464 at commit 
[`23c7f91`](https://github.com/apache/spark/commit/23c7f9183372148a110ae538f9d80bfc5c3b09b2).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32413: [SPARK-35288][SQL] StaticInvoke should find the method without exact argument classes match

2021-05-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32413:
URL: https://github.com/apache/spark/pull/32413#issuecomment-834048271


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42752/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32413: [SPARK-35288][SQL] StaticInvoke should find the method without exact argument classes match

2021-05-06 Thread GitBox


AmplabJenkins commented on pull request #32413:
URL: https://github.com/apache/spark/pull/32413#issuecomment-834048271


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42752/
   





[GitHub] [spark] SparkQA commented on pull request #32413: [SPARK-35288][SQL] StaticInvoke should find the method without exact argument classes match

2021-05-06 Thread GitBox


SparkQA commented on pull request #32413:
URL: https://github.com/apache/spark/pull/32413#issuecomment-834048248









[GitHub] [spark] JkSelf commented on pull request #31756: [SPARK-34637] [SQL] Support DPP + AQE when the broadcast exchange can be reused

2021-05-06 Thread GitBox


JkSelf commented on pull request #31756:
URL: https://github.com/apache/spark/pull/31756#issuecomment-834046989


   @tgravescs 
   This PR mainly addresses the limitations of [PR#31258](https://github.com/apache/spark/pull/31258). DPP + AQE as supported in [PR#31258](https://github.com/apache/spark/pull/31258) works only when the broadcast exchange on the build side can be executed first. The probe side can then reuse the build side's exchange in the DPP subquery; otherwise DPP is not supported in AQE.
   
   This approach mainly consists of two steps.
   1. In the `PlanAdaptiveDynamicPruningFilters` rule, check whether the broadcast exchange can be reused; if so, insert the DPP subquery filter on the probe side.
   2. Create an `AdaptiveSparkPlanExec` with the broadcast exchange, so that the existing reuse logic can reuse the broadcast exchange in the `AdaptiveSparkPlanExec` plan.
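
As a rough illustration of the decision in step 1, the toy model below adds a pruning filter to the probe side only when the join's build side is a broadcast exchange over the same plan the pruning subquery would need. Every name here (`Node`, `Scan`, `BroadcastExchange`, `PruningFilter`, `BroadcastJoin`, `ToyPruningRule`) is hypothetical and none of it is Spark's API; the creation of an `AdaptiveSparkPlanExec` in step 2 is not modelled.

```scala
// Toy plan nodes; illustrative stand-ins, not Spark's physical operators.
sealed trait Node
final case class Scan(table: String) extends Node
final case class BroadcastExchange(child: Node) extends Node
final case class PruningFilter(reused: BroadcastExchange, child: Node) extends Node
final case class BroadcastJoin(build: Node, probe: Node) extends Node

object ToyPruningRule {
  // Adds a pruning filter on the probe side only when the build side is a
  // broadcast exchange over the same plan the pruning subquery would build.
  def apply(join: BroadcastJoin, pruningBuildPlan: Node): BroadcastJoin =
    join.build match {
      case exchange @ BroadcastExchange(child) if child == pruningBuildPlan =>
        // The filter refers to the existing exchange instead of planning a new one.
        join.copy(probe = PruningFilter(exchange, join.probe))
      case _ =>
        join // no reusable exchange, so the plan is left without dynamic pruning
    }
}
```

In this sketch the fallback branch simply returns the join unchanged, which corresponds to the "otherwise DPP will not be supported" case described above.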
   
   
   





[GitHub] [spark] SparkQA commented on pull request #32459: [SPARK-26164][SQL][FOLLOWUP] WriteTaskStatsTracker should know which file the row is written to

2021-05-06 Thread GitBox


SparkQA commented on pull request #32459:
URL: https://github.com/apache/spark/pull/32459#issuecomment-834046652


   **[Test build #138233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138233/testReport)** for PR 32459 at commit [`8e9f6cb`](https://github.com/apache/spark/commit/8e9f6cb8d5b19792fc408c7b9fe9bcc77a4a56d7).





[GitHub] [spark] SparkQA commented on pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased

2021-05-06 Thread GitBox


SparkQA commented on pull request #32465:
URL: https://github.com/apache/spark/pull/32465#issuecomment-834046574


   **[Test build #138231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138231/testReport)** for PR 32465 at commit [`0c711e3`](https://github.com/apache/spark/commit/0c711e3a081dc644c3a2d3c47207046eb4457ee1).





[GitHub] [spark] SparkQA commented on pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming

2021-05-06 Thread GitBox


SparkQA commented on pull request #32464:
URL: https://github.com/apache/spark/pull/32464#issuecomment-834046598


   **[Test build #138232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138232/testReport)** for PR 32464 at commit [`23c7f91`](https://github.com/apache/spark/commit/23c7f9183372148a110ae538f9d80bfc5c3b09b2).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32442: [SPARK-35283][SQL] Support query some DDL with CTES

2021-05-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32442:
URL: https://github.com/apache/spark/pull/32442#issuecomment-834046371


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42751/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32442: [SPARK-35283][SQL] Support query some DDL with CTES

2021-05-06 Thread GitBox


AmplabJenkins commented on pull request #32442:
URL: https://github.com/apache/spark/pull/32442#issuecomment-834046371


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42751/
   





[GitHub] [spark] cloud-fan commented on a change in pull request #32447: [SPARK-34701][SQL][FOLLOW-UP] Children/innerChildren should be mutually exclusive for AnalysisOnlyCommand

2021-05-06 Thread GitBox


cloud-fan commented on a change in pull request #32447:
URL: https://github.com/apache/spark/pull/32447#discussion_r627908066



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala
##
@@ -46,5 +47,6 @@ trait AnalysisOnlyCommand extends Command {
   val isAnalyzed: Boolean
   def childrenToAnalyze: Seq[LogicalPlan]
   override final def children: Seq[LogicalPlan] = if (isAnalyzed) Nil else childrenToAnalyze
+  override def innerChildren: Seq[QueryPlan[_]] = if (isAnalyzed) childrenToAnalyze else Nil

Review comment:
   Does it have real impact in EXPLAIN?
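
For readers following the hunk, the invariant it sets up can be modelled outside Spark. The sketch below uses toy stand-ins (`Plan`, `Leaf`, `ToyCommand` are hypothetical, not Catalyst classes) and copies only the `children` / `innerChildren` logic from the diff; it does not show how `EXPLAIN` renders either list.

```scala
// Toy stand-ins for Catalyst's plan classes; only the two overrides mirror the diff.
trait Plan {
  def children: Seq[Plan]
  def innerChildren: Seq[Plan] = Nil
}

final case class Leaf(name: String) extends Plan {
  override def children: Seq[Plan] = Nil
}

trait AnalysisOnlyCommand extends Plan {
  def isAnalyzed: Boolean
  def childrenToAnalyze: Seq[Plan]
  // Before analysis the plans are regular children; afterwards they are hidden from them.
  override final def children: Seq[Plan] = if (isAnalyzed) Nil else childrenToAnalyze
  // After analysis the same plans are exposed as inner children only.
  override def innerChildren: Seq[Plan] = if (isAnalyzed) childrenToAnalyze else Nil
}

// Hypothetical concrete command used only to exercise the trait.
final case class ToyCommand(isAnalyzed: Boolean, childrenToAnalyze: Seq[Plan])
  extends AnalysisOnlyCommand
```

With this model, flipping `isAnalyzed` moves the same plans from `children` to `innerChildren`, so the two lists are never both non-empty.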







[GitHub] [spark] HeartSaVioR commented on pull request #32361: [SPARK-35240][SS] Use CheckpointFileManager for checkpoint file manipulation

2021-05-06 Thread GitBox


HeartSaVioR commented on pull request #32361:
URL: https://github.com/apache/spark/pull/32361#issuecomment-834045784


   > We can further refine the CheckpointFileManager interface, as it knows the 
checkpoint location and all its APIs can simply accept relative paths.
   
   Sounds like a nice improvement; once the checkpoint file manager is initialized with the checkpoint root dir, callers shouldn't need to figure out the full destination path. Every target should be inside the checkpoint root dir, except the temp files that the checkpoint file manager creates "internally".
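
A minimal sketch of that idea, assuming a helper that is constructed with the checkpoint root and accepts only relative paths. `RootedPathResolver` is a hypothetical name and not part of the `CheckpointFileManager` interface; it uses plain `java.nio.file` rather than Hadoop paths, and it does not model the internally created temp files.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper; not Spark's CheckpointFileManager API.
final class RootedPathResolver(root: Path) {
  private val normalizedRoot: Path = root.toAbsolutePath.normalize()

  /** Resolves a relative path against the root and rejects anything that escapes it. */
  def resolve(relative: String): Path = {
    val candidate = Paths.get(relative)
    require(!candidate.isAbsolute, s"expected a relative path, got: $relative")
    val resolved = normalizedRoot.resolve(candidate).normalize()
    require(resolved.startsWith(normalizedRoot), s"path escapes the root: $relative")
    resolved
  }
}
```

Callers would then pass something like `"offsets/0"` and never build the full destination themselves.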





[GitHub] [spark] viirya commented on pull request #32361: [SPARK-35240][SS] Use CheckpointFileManager for checkpoint file manipulation

2021-05-06 Thread GitBox


viirya commented on pull request #32361:
URL: https://github.com/apache/spark/pull/32361#issuecomment-834045474


   Thanks @HeartSaVioR!





[GitHub] [spark] viirya commented on pull request #32413: [SPARK-35288][SQL] StaticInvoke should find the method without exact argument classes match

2021-05-06 Thread GitBox


viirya commented on pull request #32413:
URL: https://github.com/apache/spark/pull/32413#issuecomment-834045359


   Thanks @cloud-fan @dongjoon-hyun. I will merge once CI passes.





[GitHub] [spark] dongjoon-hyun commented on pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-06 Thread GitBox


dongjoon-hyun commented on pull request #32407:
URL: https://github.com/apache/spark/pull/32407#issuecomment-834045380


   BTW, could you rebase this PR onto the master branch, @sunchao? There was a bug causing a TPCDS UT failure in the master branch, and it was fixed a few hours ago.





[GitHub] [spark] yaooqinn opened a new pull request #32465: [SPARK-35331][SQL] Attributes become unknown in RepartitionByExpression after aliased

2021-05-06 Thread GitBox


yaooqinn opened a new pull request #32465:
URL: https://github.com/apache/spark/pull/32465


   
   
   
   ### What changes were proposed in this pull request?
   
   
   This PR makes the case below work.
   
   ```sql
   select a b from values(1) t(a) distribute by a;
   ```
   
   ```logtalk
   == Parsed Logical Plan ==
   'RepartitionByExpression ['a]
   +- 'Project ['a AS b#42]
      +- 'SubqueryAlias t
         +- 'UnresolvedInlineTable [a], [List(1)]
   
   == Analyzed Logical Plan ==
   org.apache.spark.sql.AnalysisException: cannot resolve 'a' given input columns: [b]; line 1 pos 62;
   'RepartitionByExpression ['a]
   +- Project [a#48 AS b#42]
      +- SubqueryAlias t
         +- LocalRelation [a#48]
   ```
   ### Why are the changes needed?
   
   
   bugfix
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   Yes. The original attributes can be used in `distribute by` / `cluster by` and in hints like `/*+ REPARTITION(3, c) */`.
   
   ### How was this patch tested?
   
   
   new tests





[GitHub] [spark] dongjoon-hyun commented on pull request #32462: [SPARK-34795][SPARK-35192][SPARK-35293][SQL][TESTS][3.1] Adds a new job in GitHub Actions to check the output of TPC-DS queries

2021-05-06 Thread GitBox


dongjoon-hyun commented on pull request #32462:
URL: https://github.com/apache/spark/pull/32462#issuecomment-834042835


   We are not going to bring SPARK-35327, right? If you want SPARK-35327 too, 
let's hold on this PR.





[GitHub] [spark] SparkQA commented on pull request #32442: [SPARK-35283][SQL] Support query some DDL with CTES

2021-05-06 Thread GitBox


SparkQA commented on pull request #32442:
URL: https://github.com/apache/spark/pull/32442#issuecomment-834042788


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42751/
   





[GitHub] [spark] beliefer opened a new pull request #32464: [SPARK-35062][SQL] Group exception messages in sql/streaming

2021-05-06 Thread GitBox


beliefer opened a new pull request #32464:
URL: https://github.com/apache/spark/pull/32464


   ### What changes were proposed in this pull request?
   This PR groups exception messages in `sql/core/src/main/scala/org/apache/spark/sql/streaming`.
   
   
   ### Why are the changes needed?
   It will largely help with the standardization of error messages and their maintenance.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No. Error messages remain unchanged.
   
   
   ### How was this patch tested?
   No new tests - pass all original tests to make sure it doesn't break any 
existing behavior.
   





[GitHub] [spark] dongjoon-hyun closed pull request #32430: [SPARK-35133][SQL] Explain codegen works with AQE

2021-05-06 Thread GitBox


dongjoon-hyun closed pull request #32430:
URL: https://github.com/apache/spark/pull/32430


   





[GitHub] [spark] dongjoon-hyun commented on pull request #32430: [SPARK-35133][SQL] Explain codegen works with AQE

2021-05-06 Thread GitBox


dongjoon-hyun commented on pull request #32430:
URL: https://github.com/apache/spark/pull/32430#issuecomment-834041404


   It seems that there is some delay in GitHub Actions. I checked that it has already passed.
   https://user-images.githubusercontent.com/9700541/117394778-9eb16a80-aeab-11eb-8e75-e5aee9c93ba7.png
   
   Thank you, @c21 and all. Merged to master.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32430: [SPARK-35133][SQL] Explain codegen works with AQE

2021-05-06 Thread GitBox


dongjoon-hyun commented on a change in pull request #32430:
URL: https://github.com/apache/spark/pull/32430#discussion_r627903213



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/debug/DebuggingSuite.scala
##
@@ -24,14 +24,13 @@ import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.Attribute
 import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
 import org.apache.spark.sql.execution.{CodegenSupport, LeafExecNode, WholeStageCodegenExec}
-import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite
+import org.apache.spark.sql.execution.adaptive.{DisableAdaptiveExecutionSuite, EnableAdaptiveExecutionSuite}
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.test.SharedSparkSession
 import org.apache.spark.sql.test.SQLTestData.TestData
 import org.apache.spark.sql.types.StructType
 
-// Disable AQE because the WholeStageCodegenExec is added when running QueryStageExec

Review comment:
   Thanks!







[GitHub] [spark] dongjoon-hyun commented on pull request #32435: [SPARK-35306][MLLIB][TESTS] Add benchmark results for BLASBenchmark created by GitHub Actions machines

2021-05-06 Thread GitBox


dongjoon-hyun commented on pull request #32435:
URL: https://github.com/apache/spark/pull/32435#issuecomment-834040182


   Thank you, @byungsoo-oh and @HyukjinKwon !





[GitHub] [spark] sigmod commented on pull request #32461: [SPARK-35146][SQL] Migrate to transformWithPruning or resolveWithPruning for rules in finishAnalysis.scala

2021-05-06 Thread GitBox


sigmod commented on pull request #32461:
URL: https://github.com/apache/spark/pull/32461#issuecomment-834036401


   @hvanhovell @gengliangwang @dbaliafroozeh @maryannxue this PR is ready for 
review. 





[GitHub] [spark] viirya commented on pull request #32413: [SPARK-35288][SQL] StaticInvoke should find the method without exact argument classes match

2021-05-06 Thread GitBox


viirya commented on pull request #32413:
URL: https://github.com/apache/spark/pull/32413#issuecomment-834034759


   Added the shared method.





[GitHub] [spark] SparkQA commented on pull request #32413: [SPARK-35288][SQL] StaticInvoke should find the method without exact argument classes match

2021-05-06 Thread GitBox


SparkQA commented on pull request #32413:
URL: https://github.com/apache/spark/pull/32413#issuecomment-834032204


   **[Test build #138230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138230/testReport)** for PR 32413 at commit [`0ec8117`](https://github.com/apache/spark/commit/0ec8117aaae0708b19e817c61c780eff6af37cce).





[GitHub] [spark] SparkQA commented on pull request #32442: [SPARK-35283][SQL] Support query some DDL with CTES

2021-05-06 Thread GitBox


SparkQA commented on pull request #32442:
URL: https://github.com/apache/spark/pull/32442#issuecomment-834026504


   **[Test build #138229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138229/testReport)** for PR 32442 at commit [`4f8b782`](https://github.com/apache/spark/commit/4f8b7828a3448120e0d1fd2daeb9e8d3ab1a67eb).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32462: [SPARK-34795][SPARK-35192][SPARK-35293][SQL][TESTS][3.1] Adds a new job in GitHub Actions to check the output of TPC-DS queries

2021-05-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32462:
URL: https://github.com/apache/spark/pull/32462#issuecomment-834025819


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42747/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32454: [SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause flaky test results

2021-05-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32454:
URL: https://github.com/apache/spark/pull/32454#issuecomment-834025816


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42748/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32463: [WIP][SPARK-35147][SQL] Migrate to resolveWithPruning for two command rules

2021-05-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32463:
URL: https://github.com/apache/spark/pull/32463#issuecomment-834025820


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42746/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32455: [WIP][SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4(latest)

2021-05-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32455:
URL: https://github.com/apache/spark/pull/32455#issuecomment-834025817


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138219/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32407:
URL: https://github.com/apache/spark/pull/32407#issuecomment-834025821


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42750/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32430: [SPARK-35133][SQL] Explain codegen works with AQE

2021-05-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32430:
URL: https://github.com/apache/spark/pull/32430#issuecomment-834025815


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42749/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32462: [SPARK-34795][SPARK-35192][SPARK-35293][SQL][TESTS][3.1] Adds a new job in GitHub Actions to check the output of TPC-DS queries

2021-05-06 Thread GitBox


AmplabJenkins commented on pull request #32462:
URL: https://github.com/apache/spark/pull/32462#issuecomment-834025819


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42747/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32454: [SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause flaky test results

2021-05-06 Thread GitBox


AmplabJenkins commented on pull request #32454:
URL: https://github.com/apache/spark/pull/32454#issuecomment-834025816


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42748/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-06 Thread GitBox


AmplabJenkins commented on pull request #32407:
URL: https://github.com/apache/spark/pull/32407#issuecomment-834025821


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42750/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32463: [WIP][SPARK-35147][SQL] Migrate to resolveWithPruning for two command rules

2021-05-06 Thread GitBox


AmplabJenkins commented on pull request #32463:
URL: https://github.com/apache/spark/pull/32463#issuecomment-834025820


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42746/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32455: [WIP][SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4(latest)

2021-05-06 Thread GitBox


AmplabJenkins commented on pull request #32455:
URL: https://github.com/apache/spark/pull/32455#issuecomment-834025817


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138219/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32430: [SPARK-35133][SQL] Explain codegen works with AQE

2021-05-06 Thread GitBox


AmplabJenkins commented on pull request #32430:
URL: https://github.com/apache/spark/pull/32430#issuecomment-834025815


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42749/
   





[GitHub] [spark] SparkQA commented on pull request #32454: [SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause flaky test results

2021-05-06 Thread GitBox


SparkQA commented on pull request #32454:
URL: https://github.com/apache/spark/pull/32454#issuecomment-834020513









[GitHub] [spark] SparkQA commented on pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-06 Thread GitBox


SparkQA commented on pull request #32407:
URL: https://github.com/apache/spark/pull/32407#issuecomment-834020284









[GitHub] [spark] SparkQA commented on pull request #32430: [SPARK-35133][SQL] Explain codegen works with AQE

2021-05-06 Thread GitBox


SparkQA commented on pull request #32430:
URL: https://github.com/apache/spark/pull/32430#issuecomment-834020134









[GitHub] [spark] SparkQA commented on pull request #32463: [WIP][SPARK-35147][SQL] Migrate to resolveWithPruning for two command rules

2021-05-06 Thread GitBox


SparkQA commented on pull request #32463:
URL: https://github.com/apache/spark/pull/32463#issuecomment-834019899


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42746/
   





[GitHub] [spark] SparkQA commented on pull request #32462: [SPARK-34795][SPARK-35192][SPARK-35293][SQL][TESTS][3.1] Adds a new job in GitHub Actions to check the output of TPC-DS queries

2021-05-06 Thread GitBox


SparkQA commented on pull request #32462:
URL: https://github.com/apache/spark/pull/32462#issuecomment-834019686









[GitHub] [spark] SparkQA commented on pull request #32463: [WIP][SPARK-35147][SQL] Migrate to resolveWithPruning for two command rules

2021-05-06 Thread GitBox


SparkQA commented on pull request #32463:
URL: https://github.com/apache/spark/pull/32463#issuecomment-834018398


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42746/
   





[GitHub] [spark] beliefer commented on pull request #27432: [SPARK-28325][SQL]Support ANSI SQL: SIMILAR TO ... ESCAPE syntax

2021-05-06 Thread GitBox


beliefer commented on pull request #27432:
URL: https://github.com/apache/spark/pull/27432#issuecomment-834013656


   ping @cloud-fan 





[GitHub] [spark] SparkQA removed a comment on pull request #32455: [WIP][SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4(latest)

2021-05-06 Thread GitBox


SparkQA removed a comment on pull request #32455:
URL: https://github.com/apache/spark/pull/32455#issuecomment-833957330


   **[Test build #138219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138219/testReport)** for PR 32455 at commit [`8a13cfb`](https://github.com/apache/spark/commit/8a13cfbcd57b7e93e0009c6b93d784184a880761).





[GitHub] [spark] SparkQA commented on pull request #32455: [WIP][SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4(latest)

2021-05-06 Thread GitBox


SparkQA commented on pull request #32455:
URL: https://github.com/apache/spark/pull/32455#issuecomment-834010898


   **[Test build #138219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138219/testReport)** for PR 32455 at commit [`8a13cfb`](https://github.com/apache/spark/commit/8a13cfbcd57b7e93e0009c6b93d784184a880761).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] beliefer commented on pull request #32377: [SPARK-35021][SQL] Group exception messages in connector/catalog

2021-05-06 Thread GitBox


beliefer commented on pull request #32377:
URL: https://github.com/apache/spark/pull/32377#issuecomment-834008831


   cc @cloud-fan 





[GitHub] [spark] HyukjinKwon closed pull request #32435: [SPARK-35306][MLLIB][TESTS] Add benchmark results for BLASBenchmark created by GitHub Actions machines

2021-05-06 Thread GitBox


HyukjinKwon closed pull request #32435:
URL: https://github.com/apache/spark/pull/32435


   





[GitHub] [spark] HyukjinKwon commented on pull request #32435: [SPARK-35306][MLLIB][TESTS] Add benchmark results for BLASBenchmark created by GitHub Actions machines

2021-05-06 Thread GitBox


HyukjinKwon commented on pull request #32435:
URL: https://github.com/apache/spark/pull/32435#issuecomment-834005292


   Merged to master.





[GitHub] [spark] viirya commented on a change in pull request #32431: [SPARK-35173][SQL][PYTHON] Add multiple columns adding support

2021-05-06 Thread GitBox


viirya commented on a change in pull request #32431:
URL: https://github.com/apache/spark/pull/32431#discussion_r627875037



##
File path: python/pyspark/sql/dataframe.py
##
@@ -2423,6 +2423,38 @@ def freqItems(self, cols, support=None):
             support = 0.01
         return DataFrame(self._jdf.stat().freqItems(_to_seq(self._sc, cols), support), self.sql_ctx)
 
+    def withColumns(self, colsMap):
+        """
+        Returns a new :class:`DataFrame` by adding multiple columns or replacing the
+        existing columns that has the same name.
+
+        The colsMap is a map of column name and column, the column must only refer to attribute
+        supplied by this Dataset. It is an error to add columns that refers to some other Dataset.

Review comment:
   refers -> refer
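
   For readers of this thread, the add-or-replace behaviour that the quoted docstring describes can be modelled in plain Python. This is only an illustrative sketch and not the PR's implementation: `with_columns`, `Row`, and the lambda-based column expressions are hypothetical stand-ins, and no Spark API is used.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a DataFrame row: column name -> value.
Row = Dict[str, Any]


def with_columns(rows: List[Row], cols_map: Dict[str, Callable[[Row], Any]]) -> List[Row]:
    """Return new rows with each named column added, or replaced if the name exists."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for name, compute in cols_map.items():
            # Each expression reads from the original row only, mirroring the
            # rule that a column may refer only to attributes of this dataset.
            new_row[name] = compute(row)
        result.append(new_row)
    return result


rows = [{"age": 2, "name": "Alice"}, {"age": 5, "name": "Bob"}]
out = with_columns(
    rows,
    {
        "age2": lambda r: r["age"] + 2,  # new column: added
        "age": lambda r: r["age"] * 10,  # existing column: replaced
    },
)
print(out)
```

   Because every expression is evaluated against the original row, `age2` is derived from the old `age` even though `age` is replaced in the same call.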






