LuciferYang commented on pull request #29638:
URL: https://github.com/apache/spark/pull/29638#issuecomment-686890566


   > Hmm, why this is needed? Firstly I thought CostBasedJoinReorder will 
produce non-deterministic for same query. But I looked at the JIRA description, 
seems for different input, the rule will produce different output. Doesn't it 
sound reasonable? Different input causes different output.
   
   @viirya Sorry, I didn't describe it clearly. There are actually 2 
problems we found in SPARK-32526:
   
   1. With the same Scala version, different input causes different output, as I 
described in SPARK-32687. For example:
   
   ```
   d1.join(t3).join(t4).join(f1).join(d3).join(d2)
     .where((nameToAttr("d1_c2") === nameToAttr("t3_c1")) &&
             (nameToAttr("t3_c2") === nameToAttr("t4_c2")) &&
             (nameToAttr("d1_pk") === nameToAttr("f1_fk1")) &&
             (nameToAttr("f1_fk2") === nameToAttr("d2_pk")) &&
             (nameToAttr("f1_fk3") === nameToAttr("d3_pk")))
   ```
   
   and 
   
   ```
   d1.join(t3).join(f1).join(d2).join(t4).join(d3)
     .where((nameToAttr("d1_c2") === nameToAttr("t3_c1")) &&
             (nameToAttr("t3_c2") === nameToAttr("t4_c2")) &&
             (nameToAttr("d1_pk") === nameToAttr("f1_fk1")) &&
             (nameToAttr("f1_fk2") === nameToAttr("d2_pk")) &&
             (nameToAttr("f1_fk3") === nameToAttr("d3_pk")))
   ```
   have different optimization results. I think this is acceptable if the 
candidates have the same cost, but @cloud-fan has a different view in 
https://github.com/apache/spark/pull/29434, and I'm not sure I understand it 
correctly.
   
   
   2. With different Scala versions (2.12 vs 2.13), the same input may cause 
different output. For example:
   
   ```
   d1.join(t3).join(t4).join(f1).join(d2).join(t5).join(t6).join(d3).join(t1).join(t2)
     .where((nameToAttr("d1_c2") === nameToAttr("t3_c1")) &&
             (nameToAttr("t3_c2") === nameToAttr("t4_c2")) &&
             (nameToAttr("d1_pk") === nameToAttr("f1_fk1")) &&
             (nameToAttr("f1_fk2") === nameToAttr("d2_pk")) &&
             (nameToAttr("d2_c2") === nameToAttr("t5_c1")) &&
             (nameToAttr("t5_c2") === nameToAttr("t6_c2")) &&
             (nameToAttr("f1_fk3") === nameToAttr("d3_pk")) &&
             (nameToAttr("d3_c2") === nameToAttr("t1_c1")) &&
             (nameToAttr("t1_c2") === nameToAttr("t2_c2")))
   ```
   produces different optimization results in Scala 2.12 and Scala 2.13. If everyone 
thinks that `different input causes different output` is reasonable, I will 
close this PR first. But we may still need to resolve problem 2, so I will describe 
it in a separate JIRA and try to fix it.
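
   To make the equal-cost case from problem 1 concrete, here is a minimal sketch. It is not Spark's `CostBasedJoinReorder` code: `Candidate`, `pickFirstCheapest` and `pickDeterministic` are hypothetical names used only for illustration. It shows that a "keep the first cheapest" selection follows the order in which candidates are enumerated, while a secondary sort key removes that dependence. Whether enumeration order is also what differs between Scala 2.12 and 2.13 in problem 2 is an assumption here, not something I have verified.

   ```scala
   // Hypothetical stand-in for a candidate join plan (not Spark's internal class).
   final case class Candidate(joinOrder: Seq[String], cost: BigInt)

   object TieBreakSketch {
     // Keeps the first candidate with minimal cost, so a tie goes to
     // whichever candidate happens to be enumerated first.
     def pickFirstCheapest(cands: Seq[Candidate]): Candidate =
       cands.reduceLeft((best, c) => if (c.cost < best.cost) c else best)

     // Breaks cost ties on a stable key (the join order rendered as a string),
     // so the result does not depend on enumeration order.
     def pickDeterministic(cands: Seq[Candidate]): Candidate =
       cands.minBy(c => (c.cost, c.joinOrder.mkString(",")))
   }
   ```

   With two candidates of equal cost, `pickFirstCheapest` returns a different plan when the sequence is reversed, while `pickDeterministic` returns the same plan either way.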

