[GitHub] [spark] tanelk commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

GitBox Tue, 29 Sep 2020 04:38:41 -0700


tanelk commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496647197




##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging 
{
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the 
join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, 
because when building
+    // For the higher level k, we only need to search from lev to lev - k, 
because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered 
input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       > Yea, that's a design and why do we need to keep the same 
`semanticHash` values between different JVMs?
   
   It does not allow us sort by it and then compare results in the 
`PlanStbilitySuite`. 

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging 
{
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the 
join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, 
because when building
+    // For the higher level k, we only need to search from lev to lev - k, 
because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered 
input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       > Yea, that's a design and why do we need to keep the same 
`semanticHash` values between different JVMs?
   
   It does not allow us sort by it and then compare results in the 
`PlanStabilitySuite`. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] tanelk commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Reply via email to