LuciferYang commented on a change in pull request #29434:
URL: https://github.com/apache/spark/pull/29434#discussion_r474070475



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/StarJoinCostBasedReorderSuite.scala
##########
@@ -329,7 +329,7 @@ class StarJoinCostBasedReorderSuite extends PlanTest with StatsEstimationTestBas
     //
     // Number of generated plans: 46 (vs. 82)
     val query =
-      d1.join(t3).join(t4).join(f1).join(d2).join(t5).join(t6).join(d3).join(t1).join(t2)
+      d1.join(t3).join(t4).join(f1).join(d3).join(d2).join(t5).join(t6).join(t1).join(t2)

Review comment:
   @cloud-fan @srowen I verified the test above. The candidate input set is `Seq(d1, t3, t4, f1, d2, t5, t6, d3, t1, t2).permutations`, a total of 3628000 inputs in different orders.
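
   As a rough illustration of this kind of check (not the actual test harness), the sketch below enumerates every ordering of a small input set and counts which outcome each ordering produces. `PermutationTally`, `firstCheapest`, and the cost function are hypothetical stand-ins for the real optimizer and its statistics; the only assumption is that ties between equal-cost inputs are broken by input order.

   ```scala
   object PermutationTally {
     // Hypothetical stand-in for the optimizer: among equal-cost inputs,
     // keep the first one encountered, so the winner depends on input order.
     def firstCheapest(inputs: Seq[String], cost: String => Int): String =
       inputs.reduceLeft((best, next) => if (cost(next) < cost(best)) next else best)

     // Run the stand-in on every ordering of `inputs` and count each outcome.
     def tally(inputs: Seq[String], cost: String => Int): Map[String, Int] =
       inputs.permutations.toSeq
         .groupBy(order => firstCheapest(order, cost))
         .map { case (winner, orders) => winner -> orders.size }

     def main(args: Array[String]): Unit = {
       // Illustrative names only; the costs are made up for the example.
       val tables = Seq("f1", "d1", "d2", "d3")
       val cost: String => Int = name => if (name.startsWith("d")) 1 else 10
       println(tables.permutations.size) // 4 distinct elements give 24 orderings
       tally(tables, cost).toSeq.sortBy(_._1).foreach(println)
     }
   }
   ```

   With three equal-cost inputs, each of them wins in a third of the orderings, which is the kind of split that shows up when a tie is resolved by iteration order rather than by cost.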
   
   We define the original expected optimization result, A, as
   
   ```
     f1.join(d3, Inner, Some(nameToAttr("f1_fk3") === nameToAttr("d3_pk")))
      .join(d1, Inner, Some(nameToAttr("f1_fk1") === nameToAttr("d1_pk")))
      .join(d2, Inner, Some(nameToAttr("f1_fk2") === nameToAttr("d2_pk")))
      .join(t4.join(t3, Inner, Some(nameToAttr("t3_c2") === nameToAttr("t4_c2"))), Inner,
        Some(nameToAttr("d1_c2") === nameToAttr("t3_c1")))
      .join(t2.join(t1, Inner, Some(nameToAttr("t1_c2") === nameToAttr("t2_c2"))), Inner,
        Some(nameToAttr("d3_c2") === nameToAttr("t1_c1")))
      .join(t5.join(t6, Inner, Some(nameToAttr("t5_c2") === nameToAttr("t6_c2"))), Inner,
        Some(nameToAttr("d2_c2") === nameToAttr("t5_c1")))
   ```
   
   and define the other optimization result, B, which differs only in joining `d2` before `d1`, as
   
   ```
     f1.join(d3, Inner, Some(nameToAttr("f1_fk3") === nameToAttr("d3_pk")))
      .join(d2, Inner, Some(nameToAttr("f1_fk2") === nameToAttr("d2_pk")))
      .join(d1, Inner, Some(nameToAttr("f1_fk1") === nameToAttr("d1_pk")))
      .join(t4.join(t3, Inner, Some(nameToAttr("t3_c2") === nameToAttr("t4_c2"))), Inner,
        Some(nameToAttr("d1_c2") === nameToAttr("t3_c1")))
      .join(t2.join(t1, Inner, Some(nameToAttr("t1_c2") === nameToAttr("t2_c2"))), Inner,
        Some(nameToAttr("d3_c2") === nameToAttr("t1_c1")))
      .join(t5.join(t6, Inner, Some(nameToAttr("t5_c2") === nameToAttr("t6_c2"))), Inner,
        Some(nameToAttr("d2_c2") === nameToAttr("t5_c1")))
   ```
   
   Some test results are as follows:
   
   - Scala 2.12 with HashMap: 1813600 inputs produced candidate A, 1814400 inputs produced candidate B
   
   - Scala 2.12 with LinkedHashMap: 1814400 inputs produced candidate A, 1813600 inputs produced candidate B
   
   I will report the test results for Scala 2.13 later.
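
   For context on why the map type can matter, here is a minimal sketch of the iteration-order difference between the two collections. It assumes the `scala.collection.mutable` variants; `MapOrderDemo` and its helpers are hypothetical names and are not taken from the optimizer code.

   ```scala
   import scala.collection.mutable

   object MapOrderDemo {
     // Insert the keys in the given order, then report the map's iteration order.
     // LinkedHashMap iterates in insertion order.
     def linkedOrder(keys: Seq[String]): List[String] = {
       val m = mutable.LinkedHashMap.empty[String, Int]
       keys.zipWithIndex.foreach { case (k, i) => m(k) = i }
       m.keys.toList
     }

     // HashMap makes no ordering guarantee: the result depends on the hash
     // layout and may differ between Scala versions.
     def hashOrder(keys: Seq[String]): List[String] = {
       val m = mutable.HashMap.empty[String, Int]
       keys.zipWithIndex.foreach { case (k, i) => m(k) = i }
       m.keys.toList
     }

     def main(args: Array[String]): Unit = {
       val keys = Seq("d1", "t3", "t4", "f1", "d2", "t5", "t6", "d3", "t1", "t2")
       println(linkedOrder(keys)) // same order as `keys`
       println(hashOrder(keys))   // same elements, unspecified order
     }
   }
   ```

   If two candidate plans have equal cost and the first one iterated is kept, the choice between A and B would follow the map's iteration order, which would be consistent with the swapped counts above.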




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
