nebojsa-db commented on PR #47180: URL: https://github.com/apache/spark/pull/47180#issuecomment-2206686086
> I'm not a big fan of this approach, as this duplicates the handling of IDENTIFIER clauses in `CTESubstitution`.
>
> IMO, the root cause is we special-case CTE resolution and run `CTESubstitution` as an individual batch at the very beginning. The ideal solution is to look up CTE relations together with the normal table lookup.
>
> My idea: let's split CTE resolution into two steps:
>
> 1. identify the available CTE relations for each `UnresolvedRelation`. Given the position of `UnresolvedRelation`, the available CTE relations can be very different (e.g. in the main query, in the CTE relations, in nested CTE, etc.). Then we wrap `UnresolvedRelation` with a new node `WithCTERelations` to hold available CTE relations.
> 2. In the analyzer main batch, we wait for the IDENTIFIER clause to be handled, then unwrap `WithCTERelations` by looking up CTE relations and resolving `UnresolvedRelation`. If the lookup fails, restore to `UnresolvedRelation` so that normal table lookup rule can handle it later.

@cloud-fan Please take a look at the pushed changes now. I've created a rough draft that should work with your approach (if I understood it correctly). I don't have a deep understanding of all the possible uses of CTEs, so I'm not sure whether changing the order of these few rules could cause any major issues.
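
To make sure I understood the two-step idea, here is a minimal toy model of it in plain Python. It is only a sketch and not Spark code: the classes borrow the names `UnresolvedRelation` and `WithCTERelations` from the discussion, while `CTERelationRef` as modelled here, `wrap`, `unwrap`, and the use of `None` to mean "name still hidden behind an IDENTIFIER clause" are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class UnresolvedRelation:
    # None models a name that is not known yet because the
    # IDENTIFIER clause has not been handled (illustrative assumption).
    name: Optional[str]


@dataclass
class CTERelationRef:
    # Toy stand-in for a resolved reference to a CTE definition.
    name: str
    definition: str


@dataclass
class WithCTERelations:
    # Wrapper holding the CTE relations visible at this position in the plan.
    child: UnresolvedRelation
    cte_relations: Dict[str, str]


def wrap(relation: UnresolvedRelation, visible: Dict[str, str]) -> WithCTERelations:
    """Step 1: record which CTE relations are visible where the relation appears."""
    return WithCTERelations(relation, dict(visible))


def unwrap(node: WithCTERelations) -> Union[WithCTERelations, CTERelationRef, UnresolvedRelation]:
    """Step 2: resolve against the recorded CTEs once the name is known."""
    name = node.child.name
    if name is None:
        # IDENTIFIER clause not handled yet: leave the wrapper in place and retry later.
        return node
    if name in node.cte_relations:
        return CTERelationRef(name, node.cte_relations[name])
    # Lookup failed: restore the plain relation so normal table lookup can handle it.
    return node.child


ctes = {"t": "SELECT 1"}
print(unwrap(wrap(UnresolvedRelation("t"), ctes)))
print(unwrap(wrap(UnresolvedRelation("other"), ctes)))
```

The point of the wrapper in this sketch is that the set of visible CTEs is captured early, when the position in the query is known, while the actual lookup is deferred until the relation name is available.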
