karuppayya commented on code in PR #52213:
URL: https://github.com/apache/spark/pull/52213#discussion_r2389429529
##########
sql/core/src/test/scala/org/apache/spark/sql/InjectRuntimeFilterSuite.scala:
##########
@@ -205,6 +206,9 @@ class InjectRuntimeFilterSuite extends QueryTest with SQLTestUtils with SharedSp
sql("analyze table bf5part compute statistics for columns a5, b5, c5, d5, e5, f5")
sql("analyze table bf5filtered compute statistics for columns a5, b5, c5, d5, e5, f5")
+ // Tests depend on intermediate results that would otherwise be cleaned up when
Review Comment:
I think I found the root cause.
When AQE's
[AQEPropagateEmptyRelation](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala#L43)
rule detects that a relation returns zero rows, it re-optimizes the query (as
part of adaptive execution) by replacing the entire join operation with an
EmptyRelation. This causes the main query to terminate early during
re-optimization.
Secondary Issue: Subqueries (specifically those generating bloom filters in
our case) continue running asynchronously in separate threads, unaware that the
main query has already completed. This creates a race condition where:
1. Main query terminates from empty relation optimization
2. Shuffle cleanup occurs as part of the main query execution end event
3. Subqueries, which are still running, attempt to access shuffle data
that has already been cleaned up
4. Subqueries fail with FetchFailedException or similar errors
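
To make the interleaving above concrete, here is a minimal sketch in plain Scala (no Spark involved). Everything in it is a hypothetical stand-in: `shuffleStore` plays the role of shuffle storage, `MissingShuffleException` plays the role of the fetch failure, and a latch forces the unlucky ordering so that it reproduces every time. It only illustrates the race, not Spark's actual code paths.

```scala
import java.util.concurrent.CountDownLatch
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object ShuffleCleanupRace {
  // Hypothetical stand-in for shuffle storage: shuffle id -> shuffle blocks.
  private val shuffleStore = TrieMap.empty[Int, Seq[Int]]

  // Hypothetical stand-in for the fetch failure a real subquery would hit.
  final class MissingShuffleException(id: Int)
    extends RuntimeException(s"shuffle $id has already been cleaned up")

  private def fetch(id: Int): Seq[Int] =
    shuffleStore.getOrElse(id, throw new MissingShuffleException(id))

  def run(): Try[Int] = {
    shuffleStore.put(0, Seq(1, 2, 3))
    val cleanedUp = new CountDownLatch(1)

    // Steps 3-4: the subquery runs on another thread and reads the shuffle late.
    val subquery = Future {
      cleanedUp.await() // force the read to happen after cleanup
      fetch(0).sum
    }

    // Steps 1-2: the main query ends early and its end event triggers cleanup.
    shuffleStore.remove(0)
    cleanedUp.countDown()

    // The subquery's failure surfaces here, wrapped in a Failure.
    Try(Await.result(subquery, 5.seconds))
  }

  def main(args: Array[String]): Unit = println(run())
}
```

In this sketch the subquery always loses the race because of the latch; in the real suite the ordering depends on thread timing.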
In the case of InjectRuntimeFilterSuite:
1. Filter conditions on relations returned zero rows
2. Join operations were replaced with EmptyRelation by AQE
3. Bloom filter subqueries continued executing asynchronously
4. When FetchFailedException occurred, the SparkContext stopped
5. This caused cascading failures in subsequent tests
_Immediate workaround_: Fixed the test data so that the filter conditions
return at least one row.
_Solution_: Terminate all subqueries (if any) when the main query
ends.
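
A rough sketch of the shape this could take, again in plain Scala rather than Spark internals. `QueryScope`, `submitSubquery`, and `endQuery` are hypothetical names invented for illustration; this is not the change proposed in the PR and not an existing Spark API. The idea is only that subqueries are tracked per query and cancelled before any cleanup runs.

```scala
import java.util.concurrent.{Callable, CountDownLatch, Executors, Future => JFuture}
import scala.collection.mutable.ArrayBuffer

// Hypothetical tracker: remembers the subquery tasks started for one query.
final class QueryScope {
  private val pool = Executors.newCachedThreadPool()
  private val subqueries = ArrayBuffer.empty[JFuture[_]]

  // Start a subquery on a separate thread and remember its handle.
  def submitSubquery[T](body: => T): JFuture[T] = synchronized {
    val handle = pool.submit(new Callable[T] { def call(): T = body })
    subqueries += handle
    handle
  }

  // Called when the main query ends: cancel subqueries first, then clean up.
  def endQuery(cleanup: () => Unit): Unit = synchronized {
    subqueries.foreach(_.cancel(true)) // interrupt tasks that are still running
    pool.shutdownNow()
    cleanup()
  }
}

object CancelSubqueriesDemo {
  def run(): Boolean = {
    val scope = new QueryScope
    val started = new CountDownLatch(1)
    val subquery = scope.submitSubquery {
      started.countDown()
      Thread.sleep(60000) // stands in for a long-running bloom filter build
      42
    }
    started.await()           // make sure the subquery is actually running
    scope.endQuery(() => ())  // main query ends; no-op cleanup for the demo
    subquery.isCancelled
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Because the subquery is cancelled before `cleanup` is invoked, it never gets the chance to read state that cleanup removes.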
cc: @cloud-fan @dongjoon-hyun
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]