yadavay-amzn opened a new pull request, #56079:
URL: https://github.com/apache/spark/pull/56079

   ### What changes were proposed in this pull request?
   
   Remove the test-mode assertion in `DataSourceV2Relation.computeStats()` that 
throws when stats are requested before scan pushdown has been applied.
   
   ### Why are the changes needed?
   
   The `operatorOptimizationBatch` (containing `PushDownLeftSemiAntiJoin`) runs 
before `earlyScanPushDownRules` in the optimizer. When 
`PushDownLeftSemiAntiJoin` evaluates whether a join can be planned as 
broadcast, it calls `plan.stats` on DSv2 relations that have not yet had 
pushdown applied. The test-mode assertion throws `SparkException`, crashing the 
query.
   
   Repro: any LEFT SEMI or LEFT ANTI join over an Aggregate on a DSv2 table 
triggers this path.
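
   The failure mode described above can be sketched with a small toy model. This is only an illustration under stated assumptions, not Spark's actual code: `ToyRelation`, `Statistics`, `computeStatsBefore`, `computeStatsAfter`, `canBroadcast`, and the `isTesting` flag are hypothetical stand-ins, and the exception type is simplified to `IllegalStateException`.

```scala
import scala.util.Try

// Toy model only: every type and method here is a hypothetical stand-in,
// not one of Spark's classes.
object StatsBeforePushdownSketch {
  final case class Statistics(sizeInBytes: BigInt)

  // Stand-in for a DSv2 relation that has not yet had scan pushdown applied.
  final case class ToyRelation(name: String, fallbackSizeInBytes: BigInt) {

    // Before the change: in test mode, requesting stats pre-pushdown throws.
    def computeStatsBefore(isTesting: Boolean): Statistics =
      if (isTesting) {
        throw new IllegalStateException("BUG: computeStats called before pushdown")
      } else {
        Statistics(fallbackSizeInBytes)
      }

    // After the change: the same fallback stats are returned in both modes.
    def computeStatsAfter(isTesting: Boolean): Statistics =
      Statistics(fallbackSizeInBytes)
  }

  // Stand-in for an optimizer rule that reads stats to decide whether a join
  // side is small enough to broadcast.
  def canBroadcast(stats: Statistics, thresholdBytes: BigInt): Boolean =
    stats.sizeInBytes <= thresholdBytes

  def main(args: Array[String]): Unit = {
    val rel = ToyRelation("t", BigInt(1024))
    val threshold = BigInt(10L * 1024 * 1024)

    // The rule runs before pushdown, so in test mode the old path fails.
    val before = Try(canBroadcast(rel.computeStatsBefore(isTesting = true), threshold))
    println(s"old path failed in test mode: ${before.isFailure}")

    // The new path answers from fallback stats instead of throwing.
    val after = canBroadcast(rel.computeStatsAfter(isTesting = true), threshold)
    println(s"new path broadcast decision: $after")
  }
}
```

   In the sketch, the rule that reads stats is unchanged between the two variants. Only the relation's behavior in test mode differs, which mirrors the scope of this PR.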
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. Queries with LEFT SEMI/ANTI joins on DSv2 tables no longer crash in 
test mode. In production mode the assertion was already inactive, so those 
queries already ran, but with potentially inflated stats estimates 
(pre-pushdown size). The method now returns fallback stats from the catalog 
in both modes.
   
   ### How was this patch tested?
   
   Added a test in `DataSourceV2SQLSuiteV2Filter` that exercises a LEFT SEMI 
join over an Aggregate on a DSv2 table and verifies the query results via 
`checkAnswer`.
   
   - Without fix: `SparkException: [INTERNAL_ERROR] BUG: computeStats called 
before pushdown`
   - With fix: correct results returned
   - Full DSv2 test suites pass (456 tests, no regressions)
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Yes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

