peter-toth edited a comment on pull request #32298:
URL: https://github.com/apache/spark/pull/32298#issuecomment-1075510596


   > Since we already have [WithCTE](https://github.com/apache/spark/blob/efe43306fcab18f076f755c81c0406ebc1a5fee9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L703-L710) and [CTERelationRef](https://github.com/apache/spark/blob/efe43306fcab18f076f755c81c0406ebc1a5fee9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L678-L686), the rewrite looks similar to what you want to achieve, without needing to add yet another Logical/Exec node?
   
   The `WithCTE` and `CTERelationRef` nodes that remain in the logical plan
(because their CTEs were not inlined) seem to serve only one purpose: handling
queries with multiple references to non-deterministic CTEs. That is why they
are planned with an extra shuffle exchange in
[WithCTEStrategy](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L681-L706).
That extra exchange is needed for `ReuseExchangeAndSubquery` to kick in and
ensure that the CTE is executed only once.
   However, I think the extra shuffle could degrade performance for scalar
subqueries (CTEs that return only one row).
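
   To make the motivation concrete, here is a toy model in plain Python, not Spark code. Every name in it is hypothetical, and it only mimics the semantics: a non-deterministic single-row CTE that is referenced twice gives inconsistent values if its body is evaluated per reference, and consistent values if it is evaluated once and shared.

```python
import random


def non_deterministic_cte(rng):
    """Hypothetical stand-in for a single-row CTE such as `SELECT rand() AS r`."""
    return rng.random()


def run_inlined(rng):
    # Inlining substitutes the CTE body at each reference,
    # so the body is evaluated once per reference.
    left = non_deterministic_cte(rng)
    right = non_deterministic_cte(rng)
    return left, right


def run_shared(rng):
    # Evaluating the CTE once and reusing the result models the
    # "executed only once" guarantee discussed above.
    value = non_deterministic_cte(rng)
    return value, value


# Seeded generators keep the demonstration reproducible.
inlined = run_inlined(random.Random(0))
shared = run_shared(random.Random(0))

print("inlined references agree:", inlined[0] == inlined[1])
print("shared references agree:", shared[0] == shared[1])
```

   The sketch says nothing about cost. It only shows why the result has to be shared; whether a shuffle exchange is the cheapest way to share a one-row result is the open question here.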


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


