imback82 commented on issue #25015: [SPARK-28217][SQL] Allow a pluggable statistics plan visitor for a logical plan.
URL: https://github.com/apache/spark/pull/25015#issuecomment-507980634

> What's the use case here? How does one use this without having fields to store stats?

Today, cost/stats calculation in Catalyst is hard-coded and difficult to extend or customize (i.e., it supports only the "size in bytes" and "basic stats" plan visitors). Cost/stats estimation has been known to be a hard problem for decades, and people have tried numerous approaches in both the literature and practice. Indeed, some of our own customers have requested the flexibility to plug in their own cost/stats calculation mechanisms.

This PR provides an extension point where a user can plug in a custom statistics plan visitor that estimates or calculates stats/costs differently from the built-in ones without, of course, disrupting the existing use cases.
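
To make the idea of a pluggable visitor concrete, here is a minimal, language-neutral sketch in Python. It is not the Catalyst API and not the code in this PR: `Plan`, `Scan`, `Filter`, `StatsVisitor`, `SelectivityAwareVisitor`, and `estimate_size` are all hypothetical names invented for illustration. It only shows the general shape: a default visitor computes a size estimate, and a caller may supply a different visitor without changing the default path.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Plan:
    """Toy logical plan node (hypothetical; not a Catalyst class)."""
    children: List["Plan"] = field(default_factory=list)


@dataclass
class Scan(Plan):
    size_in_bytes: int = 0


@dataclass
class Filter(Plan):
    selectivity: float = 1.0


class StatsVisitor:
    """Default visitor: estimates only a size in bytes."""

    def visit(self, plan: Plan) -> int:
        if isinstance(plan, Scan):
            return plan.size_in_bytes
        if isinstance(plan, Filter):
            # The default ignores selectivity and passes the child size through.
            return self.visit(plan.children[0])
        raise TypeError(f"unsupported plan node: {type(plan).__name__}")


class SelectivityAwareVisitor(StatsVisitor):
    """Custom visitor: scales a filter's estimate by its selectivity."""

    def visit(self, plan: Plan) -> int:
        if isinstance(plan, Filter):
            return int(self.visit(plan.children[0]) * plan.selectivity)
        return super().visit(plan)


def estimate_size(plan: Plan, visitor: Optional[StatsVisitor] = None) -> int:
    """Extension point: use the supplied visitor, else fall back to the default."""
    return (visitor or StatsVisitor()).visit(plan)


plan = Filter(children=[Scan(size_in_bytes=1000)], selectivity=0.25)
default_estimate = estimate_size(plan)
custom_estimate = estimate_size(plan, SelectivityAwareVisitor())
```

Because the visitor is an optional argument, callers that pass nothing get the same estimate as before, which is the sense in which a plug-in point need not disrupt existing use cases.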
