[GitHub] [spark] cloud-fan commented on a change in pull request #32885: [SPARK-35742][SQL] Expression.semanticEquals should be symmetrical

GitBox Mon, 14 Jun 2021 07:51:27 -0700


cloud-fan commented on a change in pull request #32885:
URL: https://github.com/apache/spark/pull/32885#discussion_r651019687




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
##########
@@ -73,7 +73,9 @@ case class HashAggregateExec(
   // This is for testing. We force TungstenAggregationIterator to fall back to the unsafe row hash
   // map and/or the sort-based aggregation once it has processed a given number of input rows.
   private val testFallbackStartsAt: Option[(Int, Int)] = {
-    sqlContext.getConf("spark.sql.TungstenAggregate.testFallbackStartsAt", null) match {
+    Option(sqlContext).map { sc =>

Review comment:
       This is a hidden bug. A `SubqueryExpression` can be sent to the executor side, where it is used to build a `Projection` and is put into `EquivalentExpressions`, which calls `canonicalized`.
   
   This means Spark may serialize `HashAggregateExec` and send it to the executor side, where `sqlContext` is null.
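A minimal, self-contained sketch of the null-on-executor behavior described above, assuming the driver-only context is held in a `@transient` field. `DriverContext`, `PlanNode`, `TransientNullDemo`, and the key `test.fallbackStartsAt` are hypothetical stand-ins, not Spark classes: a Java-serialization round trip plays the role of shipping the plan to an executor, and `Option(...)` guards the lookup in the same spirit as the patched `Option(sqlContext).map { sc => ... }`.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a driver-only handle such as `sqlContext`.
// It is deliberately NOT Serializable.
final class DriverContext(settings: Map[String, String]) {
  def getConf(key: String, default: String): String = settings.getOrElse(key, default)
}

// Hypothetical stand-in for a plan node that gets shipped to executors.
// The @transient field is skipped by Java serialization, so it is null
// in the deserialized copy.
final class PlanNode(@transient val ctx: DriverContext) extends Serializable {
  // Null-safe lookup: Option(null) is None, so no NullPointerException.
  def fallbackStartsAt: Option[String] =
    Option(ctx).flatMap(c => Option(c.getConf("test.fallbackStartsAt", null)))
}

object TransientNullDemo {
  // Java-serialization round trip, mimicking sending an object to an executor.
  def roundTrip[T <: Serializable](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val onDriver = new PlanNode(new DriverContext(Map("test.fallbackStartsAt" -> "2,3")))
    val onExecutor = roundTrip(onDriver)
    println(onDriver.fallbackStartsAt)   // Some(2,3)
    println(onExecutor.ctx == null)      // true
    println(onExecutor.fallbackStartsAt) // None
  }
}
```

Without the `Option(ctx)` guard, calling `ctx.getConf(...)` on the deserialized copy would throw a `NullPointerException`, which is the failure mode the diff avoids.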
   
   It has stayed hidden for a long time because `ScalarSubquery` didn't implement `canonicalized`, so the bug was never triggered. However, this also means the `semanticHash` of `ScalarSubquery` is wrong.
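To illustrate why a missing `canonicalized` override makes `semanticHash` wrong, here is a simplified toy model, assuming (as we understand Catalyst's `Expression`) that `semanticHash` and `semanticEquals` are both derived from `canonicalized`; the `deterministic` check is omitted. `Expr`, `ToySubquery`, `ToyCanonicalSubquery`, and `SemanticHashDemo` are hypothetical classes, not Spark's: a cosmetic `exprId` that is never normalized away leaks into the hash and equality, so two semantically identical subqueries are treated as different.

```scala
// Toy model only: these are NOT Spark's Catalyst classes.
trait Expr {
  // Default: an expression is its own canonical form (nothing is normalized).
  def canonicalized: Expr = this
  // Both semantic operations are derived from the canonical form.
  final def semanticHash: Int = canonicalized.hashCode
  final def semanticEquals(other: Expr): Boolean = canonicalized == other.canonicalized
}

// Does not override canonicalized, so the cosmetic exprId leaks into
// semanticHash and semanticEquals.
final case class ToySubquery(plan: String, exprId: Long) extends Expr

// Overrides canonicalized to normalize the cosmetic exprId.
final case class ToyCanonicalSubquery(plan: String, exprId: Long) extends Expr {
  override def canonicalized: Expr = copy(exprId = 0L)
}

object SemanticHashDemo {
  def main(args: Array[String]): Unit = {
    val plan = "SELECT max(x) FROM t"
    println(ToySubquery(plan, 1L).semanticEquals(ToySubquery(plan, 2L)))                   // false
    println(ToyCanonicalSubquery(plan, 1L).semanticEquals(ToyCanonicalSubquery(plan, 2L))) // true
    println(ToyCanonicalSubquery(plan, 1L).semanticHash ==
      ToyCanonicalSubquery(plan, 2L).semanticHash)                                         // true
  }
}
```

In this model, any cache keyed on `semanticHash` (such as a common-subexpression table) would miss the match between the two `ToySubquery` instances, even though they compute the same thing.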
   
   I think it only affects common subquery elimination, so it shouldn't be a serious bug.
   



