nebojsa-db commented on PR #47180:
URL: https://github.com/apache/spark/pull/47180#issuecomment-2206686086

   > I'm not a big fan of this approach, as this duplicates the handling of 
IDENTIFIER clauses in `CTESubstitution`.
   > 
   > IMO, the root cause is we special-case CTE resolution and run 
`CTESubstitution` as an individual batch at the very beginning. The ideal 
solution is to look up CTE relations together with the normal table lookup.
   > 
   > My idea: let's split CTE resolution into two steps:
   > 
   > 1. Identify the available CTE relations for each `UnresolvedRelation`. 
Given the position of `UnresolvedRelation`, the available CTE relations can be 
very different (e.g. in the main query, in the CTE relations, in nested CTE, 
etc.). Then we wrap `UnresolvedRelation` with a new node `WithCTERelations` to 
hold available CTE relations.
   > 2. In the analyzer main batch, we wait for the IDENTIFIER clause to be 
handled, then unwrap `WithCTERelations` by looking up CTE relations and 
resolving `UnresolvedRelation`. If the lookup fails, restore to 
`UnresolvedRelation` so that normal table lookup rule can handle it later.
   
   @cloud-fan Please take a look at the pushed changes. I've created a rough 
draft that should work with your approach (if I understood it correctly). I 
don't have a deep understanding of all the possible uses of CTEs, so I'm not 
sure whether changing the order of these few rules could cause major issues.
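
   For reference, here is a minimal sketch of how the two steps quoted above 
could fit together. It uses toy plan nodes, not the real Catalyst classes: 
`IdentifierClause`, `CteRef`, `annotate`, `evalIdentifiers` and `unwrap` are 
hypothetical names made up for illustration, and `WithCTERelations` is only the 
wrapper node suggested in the quote. It is not the code in this PR.

```scala
object CteSketch {
  // Toy stand-ins for plan nodes; these are NOT the real Catalyst classes.
  sealed trait Plan
  case class UnresolvedRelation(name: String) extends Plan
  // Hypothetical: IDENTIFIER(...) whose string expression is not evaluated yet.
  case class IdentifierClause(parts: Seq[String]) extends Plan
  // Wrapper from the proposal: the CTE definitions visible at this position.
  case class WithCTERelations(child: Plan, ctes: Map[String, Plan]) extends Plan
  // Hypothetical marker for a relation that resolved to a CTE.
  case class CteRef(name: String) extends Plan
  case class With(defs: Seq[(String, Plan)], body: Plan) extends Plan
  case class Project(child: Plan) extends Plan

  // Apply f to the direct children of a node; leaves are returned unchanged.
  private def mapChildren(p: Plan)(f: Plan => Plan): Plan = p match {
    case Project(c) => Project(f(c))
    case WithCTERelations(c, ctes) => WithCTERelations(f(c), ctes)
    case With(defs, body) => With(defs.map { case (n, d) => (n, f(d)) }, f(body))
    case leaf => leaf
  }

  // Step 1 (early batch): record which CTEs each relation can see.
  def annotate(p: Plan, scope: Map[String, Plan] = Map.empty): Plan = p match {
    case With(defs, body) =>
      // A definition sees only earlier CTEs; the body sees all of them.
      val start = (Vector.empty[(String, Plan)], scope)
      val (annotated, full) = defs.foldLeft(start) {
        case ((acc, visible), (name, d)) =>
          val a = annotate(d, visible)
          (acc :+ (name -> a), visible + (name -> a))
      }
      With(annotated, annotate(body, full))
    case Project(c) => Project(annotate(c, scope))
    case r: UnresolvedRelation => WithCTERelations(r, scope)
    case i: IdentifierClause => WithCTERelations(i, scope)
    case other => other
  }

  // Main batch: evaluate IDENTIFIER clauses into plain unresolved relations.
  def evalIdentifiers(p: Plan): Plan = p match {
    case IdentifierClause(parts) => UnresolvedRelation(parts.mkString)
    case other => mapChildren(other)(evalIdentifiers)
  }

  // Step 2 (main batch): unwrap once the child is a plain UnresolvedRelation.
  // On a miss, restore the UnresolvedRelation for the normal table lookup.
  // A wrapper around an unevaluated IdentifierClause is left in place.
  def unwrap(p: Plan): Plan = p match {
    case WithCTERelations(UnresolvedRelation(name), ctes) =>
      if (ctes.contains(name)) CteRef(name) else UnresolvedRelation(name)
    case other => mapChildren(other)(unwrap)
  }

  def main(args: Array[String]): Unit = {
    // WITH t1 AS (SELECT ... FROM base) SELECT ... FROM IDENTIFIER('t' || '1')
    val plan = With(
      Seq("t1" -> Project(UnresolvedRelation("base"))),
      Project(IdentifierClause(Seq("t", "1"))))
    println(unwrap(evalIdentifiers(annotate(plan))))
  }
}
```

   In this toy model, running `unwrap` before `evalIdentifiers` leaves the 
wrapper around the IDENTIFIER clause untouched, which is the "wait for the 
IDENTIFIER clause to be handled" behaviour from step 2; `base` is not a CTE, so 
it falls back to a plain `UnresolvedRelation`.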


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

