peter-toth edited a comment on pull request #32298: URL: https://github.com/apache/spark/pull/32298#issuecomment-1075510596
> Since we already have [WithCTE](https://github.com/apache/spark/blob/efe43306fcab18f076f755c81c0406ebc1a5fee9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L703-L710) and [CTERelationRef](https://github.com/apache/spark/blob/efe43306fcab18f076f755c81c0406ebc1a5fee9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L678-L686), the rewrite looks similar to what you want to achieve, while do not need to add yet-another Logical/Exec node?

The `WithCTE` and `CTERelationRef` nodes that remain in the logical plan (because their CTEs were not inlined) appear to serve only one purpose: handling queries with multiple references to non-deterministic CTEs. That's why they are planned with an extra shuffle exchange in [WithCTEStrategy](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L681-L706). That extra exchange is needed for `ReuseExchangeAndSubquery` to kick in and ensure that the CTE is executed only once. But I think an extra shuffle could mean a performance degradation in the case of scalar subqueries (CTEs returning only one row).
