cloud-fan commented on PR #49352: URL: https://github.com/apache/spark/pull/49352#issuecomment-2569620223
@peter-toth I find this "reference lineage" approach to be a clearer design, because:

1. When we optimize out a CTE relation and the nested `WithCTE` inside it is self-contained, we don't need to do anything. We could treat the CTE references in the main query of the nested `WithCTE` as references of this CTE relation, so that optimizing out this CTE relation triggers a ref count update for the nested `WithCTE` and brings all of its ref counts to 0, but that doesn't help with anything.
2. When we optimize out a CTE relation and the nested `WithCTE` inside it does reference certain relations in the current `WithCTE`, then we do need to update the ref counts, but only for relations that are defined by the current `WithCTE`.

So a duplicated `WithCTE` (with conflicting IDs) is not really an issue, because such duplicates are always self-contained (they come from views or DataFrame queries).
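The two cases above can be illustrated with a small toy model. This is only a sketch in Python, not Spark's Catalyst code: `Ref`, the simplified `WithCTE` class, `count_refs`, `initial_ref_counts`, and `drop_relation` are all hypothetical names, and treating a nested definition with a conflicting ID as shadowing the outer one is an assumption of the sketch.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Ref:
    """Toy stand-in for a reference to a CTE relation (hypothetical)."""
    cte_id: int


@dataclass
class WithCTE:
    """Toy stand-in for a WithCTE node (hypothetical, heavily simplified)."""
    defs: dict  # cte_id -> list of plan nodes forming the definition body
    main: list  # plan nodes of the main query


def count_refs(nodes, owned_ids):
    """Count references to `owned_ids`, descending into nested WithCTE nodes."""
    counts = Counter()
    for node in nodes:
        if isinstance(node, Ref):
            if node.cte_id in owned_ids:
                counts[node.cte_id] += 1
        elif isinstance(node, WithCTE):
            # Assumption: IDs redefined by the nested node shadow the outer ones,
            # so a self-contained duplicate contributes nothing.
            visible = owned_ids - set(node.defs)
            for body in node.defs.values():
                counts += count_refs(body, visible)
            counts += count_refs(node.main, visible)
    return counts


def initial_ref_counts(with_cte):
    """Ref counts for the relations defined by `with_cte` itself."""
    owned = set(with_cte.defs)
    counts = count_refs(with_cte.main, owned)
    for body in with_cte.defs.values():
        counts += count_refs(body, owned)
    return counts


def drop_relation(with_cte, ref_counts, cte_id):
    """Remove a relation and release only the references it held to sibling relations."""
    body = with_cte.defs.pop(cte_id)
    ref_counts.subtract(count_refs(body, set(with_cte.defs)))
    ref_counts.pop(cte_id, None)


outer = WithCTE(
    defs={
        1: [],
        # Case 2: the nested WithCTE reaches out to relation 1 of the outer WithCTE.
        2: [WithCTE(defs={10: [Ref(1)]}, main=[Ref(10), Ref(1)])],
        # Case 1: a self-contained duplicate whose ID 1 conflicts with the outer one.
        3: [WithCTE(defs={1: []}, main=[Ref(1)])],
    },
    main=[Ref(1), Ref(3)],
)

ref_counts = initial_ref_counts(outer)
self_contained = count_refs(outer.defs[3], set(outer.defs))
drop_relation(outer, ref_counts, 2)  # relation 2 is unreferenced, so it is optimized out
```

In this model, dropping relation 2 lowers only the count of relation 1, which the outer `WithCTE` defines, while the self-contained duplicate inside relation 3 never touches the outer counts despite reusing ID 1.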
