LuciferYang edited a comment on pull request #29638:
URL: https://github.com/apache/spark/pull/29638#issuecomment-686890566


   > Hmm, why this is needed? Firstly I thought CostBasedJoinReorder will 
produce non-deterministic for same query. But I looked at the JIRA description, 
seems for different input, the rule will produce different output. Doesn't it 
sound reasonable? Different input causes different output.
   
   @viirya Sorry, I didn't describe it clearly. We actually found two problems 
in SPARK-32526:
   
   1. With the same Scala version, different input (the same joins written in a 
different order) produces different output, as I described in SPARK-32687. For example:
   
   ```
   d1.join(t3).join(t4).join(f1).join(d3).join(d2)
     .where((nameToAttr("d1_c2") === nameToAttr("t3_c1")) &&
             (nameToAttr("t3_c2") === nameToAttr("t4_c2")) &&
             (nameToAttr("d1_pk") === nameToAttr("f1_fk1")) &&
             (nameToAttr("f1_fk2") === nameToAttr("d2_pk")) &&
             (nameToAttr("f1_fk3") === nameToAttr("d3_pk")))
   ```
   
   and 
   
   ```
   d1.join(t3).join(f1).join(d2).join(t4).join(d3)
     .where((nameToAttr("d1_c2") === nameToAttr("t3_c1")) &&
             (nameToAttr("t3_c2") === nameToAttr("t4_c2")) &&
             (nameToAttr("d1_pk") === nameToAttr("f1_fk1")) &&
             (nameToAttr("f1_fk2") === nameToAttr("d2_pk")) &&
             (nameToAttr("f1_fk3") === nameToAttr("d3_pk")))
   ```
   produce different optimization results. I think this is acceptable if the 
candidates have the same cost, but @cloud-fan may have a different view in 
https://github.com/apache/spark/pull/29434; I'm not sure I understood it 
correctly.
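   
   As a rough illustration of the equal-cost case (a hypothetical sketch, not 
the actual `CostBasedJoinReorder` code; `Candidate`, `pickFirstCheapest` and 
`pickStable` are made-up names): if the cheapest candidate is chosen by a plain 
cost comparison, ties are resolved by enumeration order, so reordering the input 
can change the winner. Adding a stable secondary key removes that dependence.
   
   ```scala
   // Hypothetical model of a join-order candidate; not Spark's own plan class.
   final case class Candidate(joinOrder: Seq[String], cost: BigInt)

   object TieBreakSketch {
     // Keeps the first candidate seen among equal-cost ones,
     // so the result depends on the order in which candidates are enumerated.
     def pickFirstCheapest(cs: Seq[Candidate]): Candidate =
       cs.reduceLeft((best, c) => if (c.cost < best.cost) c else best)

     // Breaks ties on a stable key (the rendered join order),
     // so the result is the same for any enumeration order.
     def pickStable(cs: Seq[Candidate]): Candidate =
       cs.minBy(c => (c.cost, c.joinOrder.mkString(",")))
   }
   ```
   
   Whether such a tie-break is desirable is exactly the open question here; the 
sketch only shows why equal-cost candidates make the output order-sensitive.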
   
   
   2. With different Scala versions (2.12 vs 2.13), the same input may produce 
different output. For example:
   
   ```
   d1.join(t3).join(t4).join(f1).join(d2).join(t5).join(t6).join(d3).join(t1).join(t2)
     .where((nameToAttr("d1_c2") === nameToAttr("t3_c1")) &&
             (nameToAttr("t3_c2") === nameToAttr("t4_c2")) &&
             (nameToAttr("d1_pk") === nameToAttr("f1_fk1")) &&
             (nameToAttr("f1_fk2") === nameToAttr("d2_pk")) &&
             (nameToAttr("d2_c2") === nameToAttr("t5_c1")) &&
             (nameToAttr("t5_c2") === nameToAttr("t6_c2")) &&
             (nameToAttr("f1_fk3") === nameToAttr("d3_pk")) &&
             (nameToAttr("d3_c2") === nameToAttr("t1_c1")) &&
             (nameToAttr("t1_c2") === nameToAttr("t2_c2")))
   ```
   produces different optimization results in Scala 2.12 and Scala 2.13. This PR 
can also fix this problem. If everyone thinks that `different input causes 
different output` is reasonable, I will close this PR first. But we may still 
need to resolve problem 2, so I will file a separate JIRA describing it and try 
to fix it there.
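   
   One possible explanation for problem 2, which is only an assumption and is 
not confirmed above, is that candidates are enumerated from a hash-based 
collection, whose iteration order is unspecified and can differ between Scala 
2.12 and 2.13. Under that assumption, sorting by a stable key before enumerating 
would make the order identical on both versions. `StableIterationSketch` and 
`stableOrder` are hypothetical names:
   
   ```scala
   object StableIterationSketch {
     // Set iteration order is unspecified, so impose an explicit order
     // (here: lexicographic by name) before enumerating the items.
     def stableOrder(items: Set[String]): Seq[String] =
       items.toSeq.sorted
   }
   ```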


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


