mikhailnik-db commented on code in PR #53527:
URL: https://github.com/apache/spark/pull/53527#discussion_r2631300496


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -3239,15 +3246,21 @@ class Analyzer(
         p
 
       // The star will be expanded differently if we insert `Generate` under `Project` too early.
-      case p @ Project(projectList, child) if !projectList.exists(_.exists(_.isInstanceOf[Star])) =>
+      // We also wait for all functions and generators to be resolved to ensure left-to-right
+      // generator ordering.
+      case p @ Project(projectList, child)
+          if !projectList.exists(_.exists(_.isInstanceOf[Star])) &&
+             !hasUnresolvedGeneratorOrFunction(projectList) =>
+        var hasSeenGenerator = false
         val (resolvedGenerator, newProjectList) = projectList
           .map(trimNonTopLevelAliases)
           .foldLeft((None: Option[Generate], Nil: Seq[NamedExpression])) { (res, e) =>
             e match {
               // If there are more than one generator, we only rewrite the first one and wait for
               // the next analyzer iteration to rewrite the next one.
-              case AliasedGenerator(generator, names, outer) if res._1.isEmpty &&
+              case AliasedGenerator(generator, names, outer) if !hasSeenGenerator &&

Review Comment:
   This changes the semantics slightly:
   * `res._1.isEmpty` means that we have not taken this branch yet and have
     not extracted anything.
   * `hasSeenGenerator` checks almost the same thing, but also makes sure we
     did not skip any generator earlier in the list (e.g., because it did not
     match `AliasedGenerator(generator, names, outer)`).
   
   In other words, I made this condition stricter: now, even if we see
   something like `UnresolvedAlias(Generator)` in the project list, we record
   it in `hasSeenGenerator`. Therefore, instead of extracting the next
   generator (to the right), we do nothing and wait for the leftmost one to be
   resolved first.
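
   The difference between the two guards can be illustrated with a toy model.
   This is only a sketch under stated assumptions: `Expr`, `Plain`,
   `ResolvedGen`, `UnresolvedGen`, `extractOld` and `extractNew` are
   hypothetical stand-ins, not Catalyst classes or the code in this PR, and
   the way the flag is set here is an assumption about the intent described
   above.

```scala
// Toy stand-ins for project-list entries; these are NOT Spark classes.
sealed trait Expr
case class Plain(name: String) extends Expr
// A generator that already matches the extraction pattern
// (plays the role of `AliasedGenerator(...)`).
case class ResolvedGen(name: String) extends Expr
// A generator that does not match the pattern yet
// (plays the role of something like `UnresolvedAlias(Generator)`).
case class UnresolvedGen(name: String) extends Expr

object GeneratorOrderingSketch {
  // Old guard: "nothing has been extracted so far" (like `res._1.isEmpty`).
  // A generator that does not match the pattern is silently skipped.
  def extractOld(items: Seq[Expr]): Option[String] =
    items.foldLeft(None: Option[String]) {
      case (None, ResolvedGen(n)) => Some(n)
      case (acc, _)               => acc
    }

  // New guard: "no generator of any kind has been seen so far".
  // Assumption: the flag is set for every generator, matched or not.
  def extractNew(items: Seq[Expr]): Option[String] = {
    var hasSeenGenerator = false
    items.foldLeft(None: Option[String]) { (acc, e) =>
      e match {
        case ResolvedGen(n) if !hasSeenGenerator =>
          hasSeenGenerator = true
          Some(n)
        case ResolvedGen(_) | UnresolvedGen(_) =>
          // Remember the generator, but extract nothing to its right.
          hasSeenGenerator = true
          acc
        case _ => acc
      }
    }
  }
}
```

   For `Seq(UnresolvedGen("a"), Plain("x"), ResolvedGen("b"))`, `extractOld`
   picks `b` (the right generator), while `extractNew` returns `None` and so
   leaves the list alone until `a` is resolved. When the leftmost generator
   already matches, both versions pick it.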



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
