tanelk commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496131422



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging 
{
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the 
join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, 
because when building
+    // For the higher level k, we only need to search from lev to lev - k, 
because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered 
input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       I pondered about this a bit more. I think this is a bit more general 
explanaiton:
   
   After the first round of optimisation the plans are ordered: ABCD....
   Lets ignore the brackets to keep it general.
   On the second run we just have to redraw the brackets without changing the 
order.
   
   We know that the result is as left-deep as possible. 
   
   So, if we would be iterating from the smaller ones, then we could have (K) 
and (LM), that we want to join, but because it has to be left-deep, we must 
order it as ((LM)K).
   
   But, if we start from the bigger ones, then we hit (KL) and (M) first and 
joining these will not reorder the letters.
   
   For stability I think, that there are at least 4 factors, that must match 
with each others:
   How do we join: left-deep, right-deep....
   The iteration order in one "level": left-to-right, right-to-left.
   The order of picking levels: bigger first, smaller first.
   Which one we keep, when joins have equal cost: first, last...
   
   There might be more. Changing the order of picking levels gets us to 
stability in one change.
   There are most definitely more ways to achieve stability, but it seems, that 
it would require more than one change.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to