LuciferYang commented on a change in pull request #29434:
URL: https://github.com/apache/spark/pull/29434#discussion_r474070475
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/StarJoinCostBasedReorderSuite.scala
##########
@@ -329,7 +329,7 @@ class StarJoinCostBasedReorderSuite extends PlanTest with StatsEstimationTestBas
//
// Number of generated plans: 46 (vs. 82)
val query =
- d1.join(t3).join(t4).join(f1).join(d2).join(t5).join(t6).join(d3).join(t1).join(t2)
+ d1.join(t3).join(t4).join(f1).join(d3).join(d2).join(t5).join(t6).join(t1).join(t2)
Review comment:
@cloud-fan @srowen I verified the tests described above. The candidate input set is `Seq(d1, t3, t4, f1, d2, t5, t6, d3, t1, t2).permutations`, a total of 3628800 (10!) input orders.
We define the originally expected optimization result A as:
```scala
f1.join(d3, Inner, Some(nameToAttr("f1_fk3") === nameToAttr("d3_pk")))
  .join(d1, Inner, Some(nameToAttr("f1_fk1") === nameToAttr("d1_pk")))
  .join(d2, Inner, Some(nameToAttr("f1_fk2") === nameToAttr("d2_pk")))
  .join(t4.join(t3, Inner, Some(nameToAttr("t3_c2") === nameToAttr("t4_c2"))), Inner,
    Some(nameToAttr("d1_c2") === nameToAttr("t3_c1")))
  .join(t2.join(t1, Inner, Some(nameToAttr("t1_c2") === nameToAttr("t2_c2"))), Inner,
    Some(nameToAttr("d3_c2") === nameToAttr("t1_c1")))
  .join(t5.join(t6, Inner, Some(nameToAttr("t5_c2") === nameToAttr("t6_c2"))), Inner,
    Some(nameToAttr("d2_c2") === nameToAttr("t5_c1")))
```
and the other possible optimization result B as:
```scala
f1.join(d3, Inner, Some(nameToAttr("f1_fk3") === nameToAttr("d3_pk")))
  .join(d2, Inner, Some(nameToAttr("f1_fk2") === nameToAttr("d2_pk")))
  .join(d1, Inner, Some(nameToAttr("f1_fk1") === nameToAttr("d1_pk")))
  .join(t4.join(t3, Inner, Some(nameToAttr("t3_c2") === nameToAttr("t4_c2"))), Inner,
    Some(nameToAttr("d1_c2") === nameToAttr("t3_c1")))
  .join(t2.join(t1, Inner, Some(nameToAttr("t1_c2") === nameToAttr("t2_c2"))), Inner,
    Some(nameToAttr("d3_c2") === nameToAttr("t1_c1")))
  .join(t5.join(t6, Inner, Some(nameToAttr("t5_c2") === nameToAttr("t6_c2"))), Inner,
    Some(nameToAttr("d2_c2") === nameToAttr("t5_c1")))
```
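To make the counting procedure concrete, here is a minimal sketch of such a harness in plain Scala. It is an illustration under assumptions, not the code that produced the numbers below: `PermutationTally`, `classify`, and `tally` are hypothetical names, and `classify` is a toy rule (is `d1` listed before `d2` in the input?) standing in for running the cost-based join reorder on each input order and checking whether the output matches result A or result B.

```scala
import scala.collection.mutable

object PermutationTally {
  // The ten relations from the candidate input set, as plain labels.
  val relations: Seq[String] =
    Seq("d1", "t3", "t4", "f1", "d2", "t5", "t6", "d3", "t1", "t2")

  // Hypothetical stand-in for "optimize this join order and report whether
  // the plan equals result A or result B". The real check needs Spark's
  // optimizer; this toy rule only looks at the relative order of d1 and d2.
  def classify(order: Seq[String]): String =
    if (order.indexOf("d1") < order.indexOf("d2")) "A" else "B"

  // Visit every input order once and count how often each outcome occurs.
  def tally(items: Seq[String], f: Seq[String] => String): Map[String, Long] = {
    val counts = mutable.Map.empty[String, Long]
    items.permutations.foreach { p =>
      val key = f(p)
      counts.update(key, counts.getOrElse(key, 0L) + 1L)
    }
    counts.toMap
  }

  def main(args: Array[String]): Unit = {
    val counts = tally(relations, classify)
    println(s"total = ${counts.values.sum}, by outcome = $counts")
  }
}
```

Replacing `classify` with a call into the actual optimizer turns this into a full sweep; the enumeration and counting stay the same.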
Some test results are as follows:
- Scala 2.12 with `HashMap`: 1813600 results were result A and 1814400 were result B
- Scala 2.12 with `LinkedHashMap`: 1814400 results were result A and 1813600 were result B

I will report the Scala 2.13 results later.
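One plausible reason the outcome depends on both the map type and the input order is that equal-cost candidate plans are resolved by whichever one iteration visits first. That mechanism is an assumption here, not something confirmed against the optimizer source. The sketch below shows the effect in isolation: `pickFirstCheapest` (a hypothetical helper) keeps the first minimal-cost entry it sees, so with a `LinkedHashMap` the winner follows insertion order, while with a `HashMap` it generally follows the keys' hash layout instead.

```scala
import scala.collection.mutable

object TieBreakSketch {
  // Hypothetical helper: returns the key of the first entry with the lowest
  // cost, in the map's iteration order. A later entry only replaces the
  // current best if it is strictly cheaper, so ties go to the first visited.
  def pickFirstCheapest(costs: collection.Map[String, Double]): String = {
    var best: Option[(String, Double)] = None
    for ((plan, cost) <- costs) {
      if (best.forall { case (_, bestCost) => cost < bestCost }) best = Some((plan, cost))
    }
    best.map(_._1).getOrElse(throw new IllegalArgumentException("no candidates"))
  }

  def main(args: Array[String]): Unit = {
    // Two candidates with identical (made-up) cost, inserted in both orders.
    val aFirst = mutable.LinkedHashMap("planA" -> 10.0, "planB" -> 10.0)
    val bFirst = mutable.LinkedHashMap("planB" -> 10.0, "planA" -> 10.0)
    // LinkedHashMap iterates in insertion order, so the winner flips.
    println(s"LinkedHashMap: ${pickFirstCheapest(aFirst)} vs ${pickFirstCheapest(bFirst)}")

    // HashMap iteration order comes from hashing rather than insertion order.
    val h1 = mutable.HashMap("planA" -> 10.0, "planB" -> 10.0)
    val h2 = mutable.HashMap("planB" -> 10.0, "planA" -> 10.0)
    println(s"HashMap: ${pickFirstCheapest(h1)} vs ${pickFirstCheapest(h2)}")
  }
}
```

If ties are in fact broken this way, switching map types only changes which input orders land on A versus B, which would be consistent with both runs splitting roughly in half.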
----------------------------------------------------------------
This is an automated message from the Apache Git Service.