holdenk opened a new pull request, #46143: URL: https://github.com/apache/spark/pull/46143
### What changes were proposed in this pull request?

Changes the filter pushdown optimizer to not push a filter down past a projection of the same element when we reasonably expect that computing that element is likely to be expensive. This is a more complex alternative to https://github.com/apache/spark/pull/45802, which also moves parts of projections down so that the filters can move further down. This introduces an "expectedCost" mechanism, which we may or may not want. Previous filter-ordering work used filter pushdowns as an approximation of expression cost, but here we need more granularity. As an alternative, we could introduce a boolean "expensive" flag rather than a numeric cost. Another alternative would be checking whether the predicate can be "converted" as a proxy for cheapness.

### Future Work / What else remains to do?

Right now, if a condition is expensive and it references something in the projection, we don't push down. We could probably do better and gate this on whether the referenced projection element is expensive, rather than the condition itself. We could do this as a follow-up item or as part of this PR.

### Why are the changes needed?

Currently Spark may double-compute expensive operations (like JSON parsing, UDF evaluation, etc.) as a result of filter pushdown past projections.

### Does this PR introduce _any_ user-facing change?

This SQL optimizer change may impact some user queries; results should be the same and hopefully a little faster.

### How was this patch tested?

New tests were added to the FilterPushDownSuite, and the initial problem of double evaluation was confirmed with a GitHub gist.

### Was this patch authored or co-authored using generative AI tooling?

No

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
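To make the idea concrete, here is a minimal sketch (plain Python, not Spark's actual Catalyst code) of the double-evaluation hazard and a hypothetical "expectedCost" gate. The names `expected_cost`, `should_push_past_projection`, the tuple-based expression encoding, and the cost table are illustrative assumptions, not Spark's real API: pushing a filter on an aliased column below the projection substitutes the alias's defining expression into the predicate, so an expensive expression like JSON parsing would be evaluated once in the pushed-down filter and again in the projection.

```python
# Hypothetical tuning knob; Spark's real mechanism may differ.
EXPENSIVE_COST_THRESHOLD = 100

# Per-operator cost estimates; anything unlisted is assumed cheap.
EXPR_COSTS = {"from_json": 1000, "udf": 500, "add": 1, "col": 0}

def expected_cost(expr):
    """Recursively estimate the cost of an expression tree.

    An expression is a (op, *children) tuple, e.g. ("from_json", ("col", "raw")).
    """
    op = expr[0]
    return EXPR_COSTS.get(op, 1) + sum(
        expected_cost(child) for child in expr[1:] if isinstance(child, tuple)
    )

def should_push_past_projection(filter_refs, projection):
    """Push a filter below a projection only if every projected alias the
    filter references is cheap to recompute; otherwise pushdown would
    substitute, and so re-evaluate, the expensive defining expression."""
    for name in filter_refs:
        aliased = projection.get(name)
        if aliased is not None and expected_cost(aliased) > EXPENSIVE_COST_THRESHOLD:
            return False
    return True

# Projection: SELECT from_json(raw) AS parsed, x + 1 AS y
projection = {
    "parsed": ("from_json", ("col", "raw")),
    "y": ("add", ("col", "x")),
}

# A filter on the expensive alias stays above the projection...
assert not should_push_past_projection({"parsed"}, projection)
# ...while a filter on a cheap alias can still be pushed down.
assert should_push_past_projection({"y"}, projection)
```

The follow-up item described above corresponds to gating on `expected_cost` of the referenced projection element (as this sketch does) rather than on the cost of the whole filter condition.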
