ahshahid commented on PR #49153: URL: https://github.com/apache/spark/pull/49153#issuecomment-2537106508
Previous comments:

> Do you happen to know why we didn't catch the following? Do you think you can provide a test coverage for that?
>
> If these classes implement equals and hashCode taking into account the pushed runtime filters, we would see that TPCDS Q14b, which should ideally be reusing the exchange containing Union, is not doing so due to multiple bugs which surface in AQE.

@dongjoon-hyun I think the exchange reuse issue was not caught for a mix of reasons:

- Spark is not tested with any concrete DataSourceV2 implementation (like Iceberg).
- The simulation of a DataSourceV2 implementation using InMemoryTableScan is buggy because its equals / hashCode do not take pushed runtime filters into account. As a result, any exchange reuse bug would not be caught (i.e., a mismatch between cached exchange plans would not be detected, giving a false assurance of reuse).
- If I am not wrong, the TPCDS tests are run using Hive as the data source, and I am not sure whether it supports pushdown of runtime filters.
- The bug in AQE only shows up in TPCDS if the tables are partitioned and the equi-join involves the partitioning column. I am not sure whether the various TPCDS tests currently use partitioned tables.

Yes, I have been able to reproduce the issue using InMemoryTableScans as the DataSourceV2 implementation for the TPCDS tests. I will check in a prototype test that reproduces the bug using q14b; if needed, all other TPCDS queries can be run as well.

I ought to point out that while running my test I also hit the issue of computeStats being called twice (which throws an error only in testing). I have not debugged that yet, and I am not sure whether the assertion that computeStats occurs only once is maintainable.

Added a bug test in PR https://github.com/apache/spark/pull/43824

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
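To illustrate the equals / hashCode point above, here is a minimal toy model in Python. It is not Spark code: the class names, the `reuse_exchanges` helper, the table name, and the filter string are all hypothetical, and the dictionary only stands in for a reuse cache keyed by plan equality. It sketches how a scan whose equality ignores pushed runtime filters can be matched against a cached plan that carries different filters, so a test built on such a scan would report reuse without noticing the mismatch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BuggyScan:
    """Hypothetical scan whose equality ignores pushed runtime filters."""
    table: str
    # compare=False excludes this field from the generated __eq__ and __hash__.
    runtime_filters: tuple = field(default=(), compare=False)


@dataclass(frozen=True)
class FixedScan:
    """Hypothetical scan whose equality includes pushed runtime filters."""
    table: str
    runtime_filters: tuple = ()


def reuse_exchanges(scans):
    """Replace each scan with an equal, previously seen scan (toy reuse cache)."""
    cache = {}
    return [cache.setdefault(scan, scan) for scan in scans]


# Two scans of the same (hypothetical) table, only one with a runtime filter.
filtered = BuggyScan("store_sales", ("ss_item_sk IN (subquery)",))
unfiltered = BuggyScan("store_sales")
buggy_result = reuse_exchanges([filtered, unfiltered])

# The same pair with filter-aware equality.
fixed_result = reuse_exchanges([
    FixedScan("store_sales", ("ss_item_sk IN (subquery)",)),
    FixedScan("store_sales"),
])

print(buggy_result[1] is filtered)           # unfiltered scan was swapped for the filtered one
print(fixed_result[1] is fixed_result[0])    # no swap when filters take part in equality
```

In the buggy variant the second scan is silently replaced by a cached scan with different filters, which is the kind of false match that filter-aware equality rules out.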
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
