cloud-fan commented on a change in pull request #32885:
URL: https://github.com/apache/spark/pull/32885#discussion_r651019687
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
##########
@@ -73,7 +73,9 @@ case class HashAggregateExec(
// This is for testing. We force TungstenAggregationIterator to fall back to the unsafe row hash
// map and/or the sort-based aggregation once it has processed a given number of input rows.
private val testFallbackStartsAt: Option[(Int, Int)] = {
- sqlContext.getConf("spark.sql.TungstenAggregate.testFallbackStartsAt", null) match {
+ Option(sqlContext).map { sc =>
Review comment:
This is a hidden bug. A `SubqueryExpression` is sent to the executor side,
where it is used to build a `Projection` and is put into
`EquivalentExpressions`, which needs to call `canonicalized`.
This means Spark may serialize `HashAggregateExec` and send it to the executor
side, where `sqlContext` is expected to be null.
The bug stayed hidden for a long time because `ScalarSubquery` didn't implement
`canonicalized`, so it was never triggered. However, that also means
`semanticHash` is wrong.
I think it only affects common subquery elimination, so it shouldn't be a
serious bug.
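
To illustrate the failure mode, here is a minimal, self-contained sketch. It is
not Spark code: `FakeContext`, `ActiveContext`, `FakePlan`, `UnsafeNode`,
`SafeNode` and the `test.fallbackStartsAt` key are hypothetical stand-ins, and
the "executor side" is modelled only by a null context (no serialization is
involved). It shows why a field initializer that dereferences the context
fails when the node is constructed without one, and how wrapping the context
in `Option(...)` avoids that.

```scala
import scala.util.Try

// Hypothetical stand-in for a driver-only context such as SQLContext.
class FakeContext(conf: Map[String, String]) {
  def getConf(key: String, default: String): String = conf.getOrElse(key, default)
}

// Hypothetical holder for the active context; null models the executor side.
object ActiveContext {
  @volatile var current: FakeContext = null
}

// Hypothetical base class: every node captures the active context on construction.
abstract class FakePlan {
  val ctx: FakeContext = ActiveContext.current
}

object Keys {
  val FallbackStartsAt = "test.fallbackStartsAt" // made-up config key
}

// Dereferences ctx directly: throws NullPointerException when ctx is null.
case class UnsafeNode(name: String) extends FakePlan {
  val fallbackStartsAt: Option[Int] =
    Option(ctx.getConf(Keys.FallbackStartsAt, null)).map(_.toInt)
}

// Guards the context first: yields None when ctx is null or the key is unset.
case class SafeNode(name: String) extends FakePlan {
  val fallbackStartsAt: Option[Int] =
    Option(ctx).flatMap(c => Option(c.getConf(Keys.FallbackStartsAt, null))).map(_.toInt)
}

object NullGuardSketch {
  def main(args: Array[String]): Unit = {
    // "Executor side": no active context.
    ActiveContext.current = null
    println(s"unsafe construction failed: ${Try(UnsafeNode("agg")).isFailure}")
    println(s"safe node value: ${SafeNode("agg").fallbackStartsAt}")

    // "Driver side": context present with the test setting.
    ActiveContext.current = new FakeContext(Map(Keys.FallbackStartsAt -> "3"))
    println(s"safe node value: ${SafeNode("agg").fallbackStartsAt}")
  }
}
```

The guarded version treats a missing context the same as an unset config,
which seems acceptable here because the setting is only used for testing.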