ahshahid commented on PR #49153:
URL: https://github.com/apache/spark/pull/49153#issuecomment-2537106508

   Previous comments:
   
   Do you happen to know why we didn't catch the following? Do you think you 
can provide test coverage for that?
   
   If these classes implemented equals and hashCode taking the pushed runtime 
filters into account, we would see that TPCDS Q14b, which should ideally 
reuse the exchange containing the Union, does not do so because of multiple 
bugs that surface in AQE.
   
   @dongjoon-hyun I think the exchange-reuse issue went uncaught for a mix of 
reasons:
   
   1. Spark is not tested against any concrete DataSourceV2 implementation 
(such as Iceberg).
   2. The simulation of a DataSourceV2 implementation using InMemoryTableScan 
is buggy: its equals / hashCode do not take pushed runtime filters into 
account. As a result, an exchange-reuse bug would not be caught (i.e. a 
mismatch between cached exchange plans would go undetected, giving a false 
assurance of reuse).
   3. If I am not wrong, the TPCDS tests are run using Hive as the data 
source, and I am not sure whether it supports pushdown of runtime filters.
   4. The bug in AQE only shows up in TPCDS if the tables are partitioned and 
the equi-join involves a partitioning column. I am not sure whether the 
TPCDS tests currently use partitioned tables.
   
   Yes, I have been able to reproduce the issue using InMemoryTableScans as 
the DataSourceV2 implementation for the TPCDS tests. I will check in a 
prototype test that reproduces the bug using q14b; if needed, all the other 
TPCDS queries can be run as well.
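
To make the equals / hashCode point above concrete, here is a minimal sketch in Python rather than Spark code. It only illustrates the mechanism and is not how Spark implements exchange reuse. All names (ScanIgnoringFilters, ScanWithFilters, count_reused) are hypothetical, Python's `__eq__` / `__hash__` stand in for equals / hashCode, and the filter strings are made up. A dictionary keyed by the scan plays the role of the reuse lookup.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class ScanIgnoringFilters:
    """Hypothetical test scan: runtime filters are excluded from __eq__/__hash__."""
    table: str
    # compare=False removes this field from both equality and hashing.
    runtime_filters: FrozenSet[str] = field(default=frozenset(), compare=False)


@dataclass(frozen=True)
class ScanWithFilters:
    """Hypothetical scan whose equality also covers the pushed runtime filters."""
    table: str
    runtime_filters: FrozenSet[str] = frozenset()


def count_reused(scans):
    """Count scans that match an earlier scan in a reuse cache keyed by equality."""
    cache = {}
    reused = 0
    for scan in scans:
        if scan in cache:
            reused += 1          # treated as "same plan": would be reused
        else:
            cache[scan] = scan   # first occurrence: becomes the cached plan
    return reused


# Two scans of the same table with different (made-up) runtime filters.
filters_a = frozenset({"k IN (1, 2)"})
filters_b = frozenset({"k IN (3)"})

# Equality ignores the filters, so the lookup reports a match anyway.
lenient = count_reused([
    ScanIgnoringFilters("t", filters_a),
    ScanIgnoringFilters("t", filters_b),
])

# Equality includes the filters, so the mismatch becomes visible.
strict = count_reused([
    ScanWithFilters("t", filters_a),
    ScanWithFilters("t", filters_b),
])

print("matches when filters are ignored:", lenient)
print("matches when filters are compared:", strict)
```

With the lenient class, a test that only checks "was a match found" passes even though the two plans differ, which is the false assurance described above. With the strict class, the same check fails and exposes the mismatch.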
   
   Though I ought to point out that while running my test I also hit the issue 
of computeStats being called twice (which throws an error only in testing). I 
have not debugged that yet, and I am not sure whether the assertion that 
computeStats is called only once is maintainable.
   
   
   Added a test reproducing the bug in PR https://github.com/apache/spark/pull/43824


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

