peter-toth commented on code in PR #43614:
URL: https://github.com/apache/spark/pull/43614#discussion_r1390209748


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:
##########
@@ -156,7 +171,15 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog with QueryErrorsB
       // If a CTE relation is never used, it will disappear after inline. Here 
we explicitly check
       // analysis for it, to make sure the entire query plan is valid.
       try {
-        if (refCount == 0) checkAnalysis0(relation.child)

Review Comment:
   Sorry, for the late reply. (I was travelling this week.)
   
   The reason why I would suggest the `cleanCTEMap()` approach is because you 
might be checking a CTE 2 times now. Not in this new code, as `visited` map 
makes sure to check a CTE only once, but later when we check the inlined plan 
it might contain a CTE that was referenced from an other CTE with ref count = 0 
(so the CTE is checked in the new code) and also from a third CTE with valid 
references (so the CTE is checked in the inlined plan again).
   
   As for checking the leafs first, checking the CTEs in id order guarantees 
that. This is because how we do CTE resolution (assign ids to defs) in 
`CTESubstitution`. `cleanCTEMap()` also uses this property to quickly clear 
useless transitive reference counts.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to