karuppayya commented on code in PR #52213:
URL: https://github.com/apache/spark/pull/52213#discussion_r2389429529


##########
sql/core/src/test/scala/org/apache/spark/sql/InjectRuntimeFilterSuite.scala:
##########
@@ -205,6 +206,9 @@ class InjectRuntimeFilterSuite extends QueryTest with 
SQLTestUtils with SharedSp
     sql("analyze table bf5part compute statistics for columns a5, b5, c5, d5, 
e5, f5")
     sql("analyze table bf5filtered compute statistics for columns a5, b5, c5, 
d5, e5, f5")
 
+    // Tests depend on intermediate results that would otherwise be cleaned up 
when

Review Comment:
   I think I found the root cause.
   
   When AQE's 
[AQEPropagateEmptyRelation](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala#L43)
 rule detects that a relation returns zero rows, it re-optimizes the query (as 
part of adaptive execution) and replaces the entire join operation with an 
EmptyRelation. As a result, the main query terminates early during 
re-optimization.
   
   Secondary issue: subqueries (in our case, the ones generating bloom filters) 
continue running asynchronously in separate threads, unaware that the main 
query has already completed. This creates a race condition where:
   
   1. Main query terminates from empty relation optimization
   2. Shuffle cleanup occurs as part of the main query execution end event
   3. Subqueries, which are still running, attempt to access shuffle data 
that has already been cleaned up
   4. Subqueries fail with FetchFailedException or similar errors
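   
   The four steps above can be reproduced in a toy model. This is only a sketch using the Scala standard library, not Spark code: `ShuffleRaceModel`, `shuffleStore`, and the local `FetchFailed` class are hypothetical stand-ins, and a latch forces the unlucky interleaving so that the outcome is deterministic.

```scala
import java.util.concurrent.{ConcurrentHashMap, CountDownLatch}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

// Toy model only: none of these names are Spark APIs.
object ShuffleRaceModel {
  // Hypothetical stand-in for a fetch failure.
  final class FetchFailed(msg: String) extends RuntimeException(msg)

  /** Returns the result of the subquery's fetch after the main query has ended. */
  def run(): Try[Seq[Int]] = {
    val shuffleStore = new ConcurrentHashMap[Int, Seq[Int]]()
    shuffleStore.put(0, Seq(1, 2, 3)) // shuffle output the subquery depends on
    val mainQueryEnded = new CountDownLatch(1)

    // Step 3: the subquery runs on another thread and fetches late.
    val subquery = Future {
      mainQueryEnded.await() // force the fetch to happen after cleanup
      Option(shuffleStore.get(0))
        .getOrElse(throw new FetchFailed("shuffle 0 already cleaned up"))
    }

    // Steps 1 and 2: the main query ends early and its end event removes shuffle data.
    shuffleStore.clear()
    mainQueryEnded.countDown()

    // Step 4: the subquery fails because its input is gone.
    Try(Await.result(subquery, 10.seconds))
  }
}
```

   In this model `ShuffleRaceModel.run()` always yields a `Failure` wrapping `FetchFailed`; in a real run the interleaving depends on timing, which is why the failure would be intermittent.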
   
   In the case of InjectRuntimeFilterSuite:
   1. Filter conditions on relations returned zero rows
   2. Join operations were replaced with EmptyRelation by AQE
   3. Bloom filter subqueries continued executing asynchronously
   4. When FetchFailedException occurred, the SparkContext stopped
   5. This caused cascading failures in subsequent tests
   
   _Immediate workaround_: Fixed the test data so that the filter conditions 
return at least one row.
   _Solution_: Terminate all subqueries (if any) when the main query ends.
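   
   A minimal sketch of the ordering that the proposed solution implies: stop the subqueries first, then clean up shuffle data. It uses only the Scala standard library and cooperative cancellation through a flag; `CancelBeforeCleanupModel` and everything inside it are hypothetical names, and how Spark would actually cancel subquery jobs is not shown here.

```scala
import java.util.concurrent.{ConcurrentHashMap, CountDownLatch}
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Toy model only: none of these names are Spark APIs.
object CancelBeforeCleanupModel {
  sealed trait Outcome
  case object Cancelled extends Outcome
  final case class Fetched(rows: Seq[Int]) extends Outcome

  def run(): Outcome = {
    val shuffleStore = new ConcurrentHashMap[Int, Seq[Int]]()
    shuffleStore.put(0, Seq(1, 2, 3))
    val cancelled = new AtomicBoolean(false)
    val proceed = new CountDownLatch(1)

    // The subquery checks the cancellation flag before touching shuffle data.
    val subquery: Future[Outcome] = Future {
      proceed.await()
      if (cancelled.get()) Cancelled
      else Fetched(shuffleStore.get(0))
    }

    // Main query ends: signal cancellation, wait for the subquery to stop,
    // and only then remove the shuffle data.
    cancelled.set(true)
    proceed.countDown()
    val outcome = Await.result(subquery, 10.seconds)
    shuffleStore.clear()
    outcome
  }
}
```

   Here the subquery always ends as `Cancelled` and never reads from the cleaned-up store, because cleanup happens only after the subquery has finished.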
   
   cc: @cloud-fan @dongjoon-hyun 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

