disliketd commented on PR #53263: URL: https://github.com/apache/spark/pull/53263#issuecomment-3612935127
I have concerns about this approach. The motivating use case relies on parsing the string output of `SHOW PARTITIONS` to drive query logic, which is an anti-pattern compared with a standard scalar subquery (`WHERE col = (SELECT MAX(col) ...)`). Furthermore, blindly treating every `CommandResult` node as selective (`hasSelectivePredicate = true`) seems risky: if the command returns all partitions, we incur the dynamic partition pruning (DPP) overhead without any pruning benefit. We shouldn't modify core optimizer heuristics to support a fragile query pattern.
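For reference, here is a minimal sketch of the scalar-subquery pattern. It uses Python's built-in `sqlite3` as a stand-in for Spark SQL, and the `events` table and its `dt` partition column are hypothetical. The "latest partition" value is computed inside the query itself instead of being scraped from `SHOW PARTITIONS` output. SQLite has no partitions or DPP, so this only illustrates the query shape, not Spark's optimizer behavior.

```python
import sqlite3


def latest_partition_rows(conn):
    # Scalar subquery: the inner SELECT yields a single value (the newest dt),
    # so the outer filter is a plain equality predicate on the partition column.
    # No string parsing of command output is involved.
    return conn.execute(
        "SELECT id, dt FROM events "
        "WHERE dt = (SELECT MAX(dt) FROM events) "
        "ORDER BY id"
    ).fetchall()


# Hypothetical table: ISO-8601 date strings sort lexicographically,
# so MAX(dt) picks the most recent partition value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-02"), (3, "2024-01-02")],
)
print(latest_partition_rows(conn))  # [(2, '2024-01-02'), (3, '2024-01-02')]
```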
