ulysses-you commented on code in PR #37239:
URL: https://github.com/apache/spark/pull/37239#discussion_r926182509
##########
sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala:
##########
@@ -1440,4 +1440,18 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
}
}
}
+
+ test("SPARK-39825: Fix PushDownLeftSemiAntiJoin push through project") {
+ // before fix, it would throw:
+ // java.lang.RuntimeException: Max iterations (100) reached for batch Operator Optimization
+ // before Inferring Filters, please set 'spark.sql.optimizer.maxIterations' to a larger value
+ withTable("t") {
+ Seq((1, DoubleData(1, "a"))).toDF("c", "nested")
+ .write
+ .saveAsTable("t")
+ spark.sql("select c, nested.id from t")
+ .join(Seq(1).toDF("c"), Seq("c"), "left_semi")
+ .collect()
Review Comment:
For a plan:
```sql
Project [i#1314, v#1572]
+- Join LeftSemi, (i#1314 = i#1579)
:- Project [i#1314, data#1315.v AS v#1572]
: +- Relation spark_catalog.default.tbl[i#1314,data#1315,v#1316] parquet
+- Project [value#1576 AS i#1579]
+- LocalRelation [value#1576]
```
The Optimizer workflow:
1. PushDownLeftSemiAntiJoin pushes the join through the project
2. ColumnPruning (via NestedColumnAliasing) generates a new alias for the nested column and adds a project below the join
3. CollapseProject and RemoveNoopOperators clean up the unnecessary projects, so the plan goes back to its original shape

This issue also exists for non-nested columns, but in that case the batch is marked as not effective, so the issue does not get worse. For nested column pruning we create a new alias on every pass, so after applying the whole batch the plan differs from the previous one due to the different alias expr id, and the fixed-point check never succeeds.
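The non-convergence can be sketched outside Spark. The following is a minimal model (hypothetical names, not Catalyst's actual API) of a fixed-point driver whose batch mints a fresh expression ID on every pass, so structural equality with the previous plan never holds and the driver always hits the iteration cap:

```scala
// Minimal sketch of the fixed-point problem, assuming a toy Plan/Alias model.
object FixedPointSketch {
  // Stand-ins for Catalyst's Alias and a logical plan holding aliases.
  final case class Alias(child: String, exprId: Long)
  final case class Plan(aliases: List[Alias])

  private var nextId = 0L
  private def freshId(): Long = { nextId += 1; nextId }

  // One pass of the (simplified) batch: push the join, prune nested columns,
  // collapse projects -- which ends up re-creating the alias with a new ID.
  def runBatch(plan: Plan): Plan =
    Plan(plan.aliases.map(a => a.copy(exprId = freshId())))

  // The optimizer's fixed-point driver: stop when the plan stops changing.
  // Equality is structural and includes exprId, so it never fires here.
  def optimize(plan: Plan, maxIterations: Int): (Plan, Int) = {
    var current = plan
    var iters = 0
    while (iters < maxIterations) {
      val next = runBatch(current)
      iters += 1
      if (next == current) return (next, iters)
      current = next
    }
    (current, iters) // cap reached: the "Max iterations (100)" situation
  }
}
```

Running `optimize` with a cap of 100 consumes all 100 iterations, mirroring why the batch never converges once alias expr ids change on each pass.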
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]