sigmod commented on pull request #32298: URL: https://github.com/apache/spark/pull/32298#issuecomment-1076603246
Thanks, @peter-toth!

> I have a follow-up PR to support merging different filter predicates with OR,
> I just didn't want to make this PR more complex
> This PR wants to deal with that scope only.

Definitely. I was just saying `CTERelationRef` can be used in more general cases beyond non-deterministic CTE definitions. Let's not expand the scope of this PR.

> @sigmod, how about doing this kind of transformation?

It looks good to me. IIUC, you want to wrap the columns in a struct so that you can execute it as a scalar subquery?

> and adding a flag to cte CTERelationDef that it hosts a scalar query

Sounds good to me. Will you add an "optimization" rule that adds such an "annotation" by looking at the plan holistically, e.g., by checking that all consumers of a CTE simply pull out a field value?

I'm thinking of the following scenario for future improvements:
- a non-subquery plan subtree can share the plan structure with scalar subqueries too
- in this case, the CTE is reused by both subqueries and ordinary plan subtrees

We might also want to make sure MergeSubqueries does not prevent such reuse opportunities down the road.

> + changing WithCTEStrategy a bit to avoid extra shuffles in those cases as
> ReuseExchangeAndSubquery can insert ReusedSubqueryExec nodes (no need to insert ReusedExchangeExec).

Will you rewrite the physical plan to change the consumer subqueries to GetStructField?
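
To check that I understand the struct-wrapping idea, here is a toy model in plain Python. It is not Spark code and does not reflect the PR's implementation; every name in it (`scan`, `merged_subquery`, `get_struct_field`, and so on) is hypothetical. It only illustrates the intent as I read it: two scalar subqueries over the same input are merged into one subquery that returns a struct, and each consumer then pulls out its own field.

```python
# Toy model only, not Spark code. All names here are hypothetical.

rows = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]

# Mutable counter so we can observe how many times the input is read.
counter = {"scans": 0}


def scan(table):
    """Pretend table scan that records each read of the data."""
    counter["scans"] += 1
    return list(table)


# Before: two independent scalar subqueries, each scanning the input.
def sum_a(table):
    return sum(r["a"] for r in scan(table))


def max_b(table):
    return max(r["b"] for r in scan(table))


# After: one merged scalar subquery whose single result is a struct
# (modeled as a dict) holding both aggregates.
def merged_subquery(table):
    data = scan(table)
    return {
        "sum_a": sum(r["a"] for r in data),
        "max_b": max(r["b"] for r in data),
    }


def get_struct_field(struct, name):
    """Stand-in for extracting one field from the struct result."""
    return struct[name]


def run_separately(table):
    return sum_a(table), max_b(table)


def run_merged(table):
    result = merged_subquery(table)  # evaluated once, reused by both consumers
    return get_struct_field(result, "sum_a"), get_struct_field(result, "max_b")
```

In this model both paths return the same pair of values, but the merged path reads the input once instead of twice, which is the kind of reuse I'd hope the real rewrite preserves.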
