peter-toth commented on pull request #32298:
URL: https://github.com/apache/spark/pull/32298#issuecomment-1076099398


   > * since we've already had `WithCTE` and `CTERef`,  we probably don't want 
to re-invent a similar wheel for a narrower case (i.e., shared evaluation of 
subqueries). I suspect the additional performance gain is marginal, compared to 
the gain of shared scan and aggregate computation.
   
   I agree that `WithCTE` is similar to `CommonSubqueries` as they are both 
root nodes and both host common definitions.
   But, I'm a bit reluctant to accept that we should handle common scalar 
subqueries as CTEs as we know that `CTERef` are invented for 1 special purpose 
and always do an extra shuffle. I think the difference between a CTE, that can 
return many-many rows and a subquery, that always returns max. one value, is 
huge. I don't see why we should force using `CTERef` if we can easily give a 
better alternative. 
   
   > * after this PR, subquery related rules also may have to think about 
`SubqueryReference` and `CommonSubqueries`;
   
   I'm not sure about this. `CommonSubqueries` does nothing, just like 
`WithCTE`. The point of `SubqueryReference` is exactly to move away the 
(merged) subquery from an arbitrary node to a common place to avoid any rule 
changing it at its original place. Once the reference is inserted, subsequent 
rules should handle it as they handle a literal and no longer as a subquery.
   
   > * what would happen if a `WithCTE` co-exists with `CommonSubqueries`;
   
   Actually this is a good question and this PR doesn't have any test to cover 
that rare scenario. As we agree that both are definition hosting root nodes, 
how about combining them into one node (`CommonDefinitions`?) that can host 
CTEs and scalar subqueries as well?
   
   > * `ColumnPruning` may also need to consider `SubqueryReference` and 
`CommonSubqueries`.
   
   I'm not sure I get this. Why `ColumnPruning` should consider these new nodes?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to