Re: [PR] [SPARK-50722][SQL] Trace the lineage of CTE references [spark]

via GitHub Fri, 03 Jan 2025 10:05:27 -0800


cloud-fan commented on PR #49352:
URL: https://github.com/apache/spark/pull/49352#issuecomment-2569620223


   @peter-toth I find this "reference lineage" approach a clearer design, because:
   1. When we optimize out a CTE relation and the nested `WithCTE` inside it 
is self-contained, we don't need to do anything. We could treat the CTE 
references in the main query of the nested `WithCTE` as references of this CTE 
relation, so that optimizing out this CTE relation triggers a ref count update 
for the nested `WithCTE` and brings its ref counts to 0, but that doesn't help 
with anything.
   2. When we optimize out a CTE relation and the nested `WithCTE` inside it 
does reference certain relations in the current `WithCTE`, we do need to 
update the ref counts, but only for relations that are defined by the current 
`WithCTE`.
   
   So duplicated `WithCTE` nodes (with conflicting IDs) are not really an issue, 
because they are always self-contained (they come from views or DataFrame 
queries).
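
   The two cases above can be sketched with a toy model. This is only an 
illustration in plain Python, not Spark's Catalyst code: `CTEDef`, `WithCTE`, 
`escaping_refs`, `body_refs`, `ref_counts` and `drop_relation` are all 
hypothetical names, and it assumes a reference resolves to the innermost 
`WithCTE` that defines its ID.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CTEDef:
    """Hypothetical stand-in for a CTE relation definition."""
    cte_id: int
    direct_refs: List[int] = field(default_factory=list)   # refs made directly by the body
    nested: List["WithCTE"] = field(default_factory=list)  # WithCTE nodes inside the body


@dataclass
class WithCTE:
    """Hypothetical stand-in for a WithCTE node: definitions plus a main query."""
    defs: List[CTEDef]
    main_refs: List[int] = field(default_factory=list)


def escaping_refs(node: WithCTE) -> List[int]:
    """References inside `node` that its own definitions do not resolve."""
    own = {d.cte_id for d in node.defs}
    refs = list(node.main_refs)
    for d in node.defs:
        refs.extend(body_refs(d))
    return [r for r in refs if r not in own]


def body_refs(d: CTEDef) -> List[int]:
    """Lineage of one definition: direct refs plus refs escaping nested WithCTE."""
    refs = list(d.direct_refs)
    for inner in d.nested:
        refs.extend(escaping_refs(inner))
    return refs


def ref_counts(node: WithCTE) -> Dict[int, int]:
    """Count references to the relations defined by `node` only."""
    counts = {d.cte_id: 0 for d in node.defs}
    all_refs = list(node.main_refs)
    for d in node.defs:
        all_refs.extend(body_refs(d))
    for r in all_refs:
        if r in counts:
            counts[r] += 1
    return counts


def drop_relation(node: WithCTE, counts: Dict[int, int], cte_id: int) -> None:
    """Optimize out one relation and update counts for the current WithCTE only."""
    dropped = next(d for d in node.defs if d.cte_id == cte_id)
    for r in body_refs(dropped):
        if r in counts:  # ignore IDs not defined by the current WithCTE
            counts[r] -= 1
    counts.pop(cte_id)


# Case 1: relation 1 contains a self-contained WithCTE that reuses ID 0.
self_contained = WithCTE(defs=[CTEDef(0)], main_refs=[0])
# Case 2: relation 2 contains a WithCTE that also references outer relation 0.
references_outer = WithCTE(defs=[CTEDef(10)], main_refs=[10, 0])

outer = WithCTE(
    defs=[
        CTEDef(0),
        CTEDef(1, nested=[self_contained]),
        CTEDef(2, nested=[references_outer]),
    ],
    main_refs=[0],
)

counts = ref_counts(outer)       # {0: 2, 1: 0, 2: 0}
drop_relation(outer, counts, 1)  # case 1: nothing else changes
after_case_1 = dict(counts)      # {0: 2, 2: 0}
drop_relation(outer, counts, 2)  # case 2: relation 0 loses one reference
after_case_2 = dict(counts)      # {0: 1}
```

   In this sketch the conflicting ID 0 inside the self-contained `WithCTE` 
never reaches the outer counts, because only references that escape a nested 
`WithCTE` are considered part of the enclosing relation's lineage.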


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

