ulysses-you commented on code in PR #37239:
URL: https://github.com/apache/spark/pull/37239#discussion_r926182509


##########
sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala:
##########
@@ -1440,4 +1440,18 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
       }
     }
   }
+
+  test("SPARK-39825: Fix PushDownLeftSemiAntiJoin push through project") {
+    // before fix, it would throw:
+    //   java.lang.RuntimeException: Max iterations (100) reached for batch Operator Optimization
+    //   before Inferring Filters, please set 'spark.sql.optimizer.maxIterations' to a larger value
+    withTable("t") {
+      Seq((1, DoubleData(1, "a"))).toDF("c", "nested")
+        .write
+        .saveAsTable("t")
+      spark.sql("select c, nested.id from t")
+        .join(Seq(1).toDF("c"), Seq("c"), "left_semi")
+        .collect()

Review Comment:
   Consider this plan:
   ```
   Project [i#1314, v#1572]
   +- Join LeftSemi, (i#1314 = i#1579)
      :- Project [i#1314, data#1315.v AS v#1572]
      :  +- Relation spark_catalog.default.tbl[i#1314,data#1315,v#1316] parquet
      +- Project [value#1576 AS i#1579]
         +- LocalRelation [value#1576]
   ```
   
   The optimizer workflow:
   1. PushDownLeftSemiAntiJoin pushes the join through the Project.
   2. ColumnPruning (via NestedColumnAliasing) generates a new alias for the nested column and adds a Project below the join.
   3. CollapseProject and RemoveNoopOperators clean up the now-unnecessary Project, so the plan returns to where it started.
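
   The three steps above form a cycle. A minimal Scala sketch of that cycle follows, using a hypothetical toy model: `RewriteCycleSketch`, `ToyPlan`, the `Stage` values and the three step functions are illustrative names, not Catalyst classes or rules. The toy plan only tracks its shape and the expression ID of the alias generated for the nested field, which is enough to show that one pass restores the starting shape but not the starting alias ID.

   ```scala
   // Hypothetical toy model of the rewrite cycle; none of these names are Spark APIs.
   object RewriteCycleSketch {
     sealed trait Stage
     case object Initial extends Stage               // Project above the LeftSemi join
     case object JoinPushedDown extends Stage        // after the join is pushed through the Project
     case object ProjectAddedBelowJoin extends Stage // after nested column aliasing adds a Project

     // A plan is reduced to its stage and the expr ID of the alias for the nested field.
     final case class ToyPlan(stage: Stage, nestedAliasId: Long)

     private var lastId = 2000L
     private def newExprId(): Long = { lastId += 1; lastId }

     // Step 1 (toy): push the join through the Project; the alias is kept as-is.
     val pushDownJoin: ToyPlan => ToyPlan = p =>
       if (p.stage == Initial) p.copy(stage = JoinPushedDown) else p

     // Step 2 (toy): nested column aliasing adds a Project below the join with a *fresh* alias.
     val aliasNestedColumn: ToyPlan => ToyPlan = p =>
       if (p.stage == JoinPushedDown) ToyPlan(ProjectAddedBelowJoin, newExprId()) else p

     // Step 3 (toy): collapsing and removing redundant Projects restores the initial shape.
     val collapseProjects: ToyPlan => ToyPlan = p =>
       if (p.stage == ProjectAddedBelowJoin) p.copy(stage = Initial) else p

     // One pass of the toy batch applies the three steps in order.
     val onePass: ToyPlan => ToyPlan = pushDownJoin andThen aliasNestedColumn andThen collapseProjects
   }
   ```

   Because `ToyPlan` equality includes `nestedAliasId`, `onePass(start)` ends in the same `Initial` stage as `start` yet is not equal to it, which mirrors the situation described in this comment.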
   
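   This cycle also happens for non-nested columns, but there the plan after a full pass of the batch is identical to the plan before it, so the batch is treated as having had no effect (a fixed point is reached) and the issue does not get worse.
   For nested column pruning, a new alias is created on every pass, so after applying the whole batch the plan differs from the previous one because the alias has a different expression ID. The batch therefore never reaches a fixed point and keeps running until it hits the max iterations limit.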
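
   To see why only the nested case keeps iterating, here is a minimal sketch of a fixed-point driver, loosely modeled on the way Catalyst's RuleExecutor re-runs a batch until the plan stops changing. It is not Spark's code: `FixedPointSketch`, `ToyPlan`, `execute`, `nonNestedCase` and `nestedCase` are hypothetical names, and the cap of 100 simply mirrors the limit in the error message quoted in the test.

   ```scala
   // Simplified fixed-point driver in the spirit of Spark's RuleExecutor; not its actual code.
   object FixedPointSketch {
     // Hypothetical stand-in for a plan: its shape plus the expr ID of a generated alias.
     final case class ToyPlan(shape: String, aliasId: Long)

     final case class RunResult(plan: ToyPlan, iterations: Int, reachedFixedPoint: Boolean)

     // Re-apply `batch` until the plan stops changing or `maxIterations` passes have run.
     def execute(start: ToyPlan, maxIterations: Int)(batch: ToyPlan => ToyPlan): RunResult = {
       var current = start
       var iterations = 0
       var fixedPoint = false
       while (!fixedPoint && iterations < maxIterations) {
         val next = batch(current)
         iterations += 1
         fixedPoint = next == current // structural equality: a new alias ID counts as a change
         current = next
       }
       RunResult(current, iterations, fixedPoint)
     }

     val start: ToyPlan = ToyPlan("Project / Join LeftSemi / Project / Relation", 1572L)

     // Non-nested column: the rewrite cycle reproduces exactly the same plan.
     def nonNestedCase(): RunResult = execute(start, maxIterations = 100)(identity)

     // Nested column: same shape after every pass, but with a freshly allocated alias ID.
     def nestedCase(): RunResult = {
       var lastId = 2000L
       execute(start, maxIterations = 100) { p =>
         lastId += 1
         p.copy(aliasId = lastId)
       }
     }
   }
   ```

   `nonNestedCase()` stops after a single pass, while `nestedCase()` runs all 100 passes without converging, because each pass produces a plan that differs from the previous one only in its alias ID.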



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
