peter-toth commented on pull request #32298: URL: https://github.com/apache/spark/pull/32298#issuecomment-1076099398
> * since we've already had `WithCTE` and `CTERef`, we probably don't want to re-invent a similar wheel for a narrower case (i.e., shared evaluation of subqueries). I suspect the additional performance gain is marginal, compared to the gain of shared scan and aggregate computation. I agree that `WithCTE` is similar to `CommonSubqueries` as they are both root nodes and both host common definitions. But, I'm a bit reluctant to accept that we should handle common scalar subqueries as CTEs as we know that `CTERef` are invented for 1 special purpose and always do an extra shuffle. I think the difference between a CTE, that can return many-many rows and a subquery, that always returns max. one value, is huge. I don't see why we should force using `CTERef` if we can easily give a better alternative. > * after this PR, subquery related rules also may have to think about `SubqueryReference` and `CommonSubqueries`; I'm not sure about this. `CommonSubqueries` does nothing, just like `WithCTE`. The point of `SubqueryReference` is exactly to move away the (merged) subquery from an arbitrary node to a common place to avoid any rule changing it at its original place. Once the reference is inserted, subsequent rules should handle it as they handle a literal and no longer as a subquery. > * what would happen if a `WithCTE` co-exists with `CommonSubqueries`; Actually this is a good question and this PR doesn't have any test to cover that rare scenario. As we agree that both are definition hosting root nodes, how about combining them into one node (`CommonDefinitions`?) that can host CTEs and scalar subqueries as well? > * `ColumnPruning` may also need to consider `SubqueryReference` and `CommonSubqueries`. I'm not sure I get this. Why `ColumnPruning` should consider these new nodes? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
