LuciferYang opened a new pull request #29638:
URL: https://github.com/apache/spark/pull/29638


   ### What changes were proposed in this pull request?
   The optimization result of `CostBasedJoinReorder` is currently non-deterministic: 
it is affected by the input order when more than one candidate plan has the same 
cost.
   
   This PR makes the result as deterministic as possible. The main changes are 
as follows:
   
   - Sort the items used to produce candidate plans in descending order of 
`rowCount` and `sizeInBytes`, so that the order of the input items is as 
deterministic as possible.
   
   - Use `LinkedHashMap` instead of `HashMap` so that items are iterated in the 
same order in which they were inserted.
   
   - Add a new test case to `StarJoinCostBasedReorderSuite` to verify that all input 
permutations produce the same optimization result.
   
   - Regenerate the golden files used by `TPCDSV2_7_PlanStabilityWithStatsSuite`, 
`TPCDSV1_4_PlanStabilityWithStatsSuite` and 
`TPCDSModifiedPlanStabilityWithStatsSuite`, which are affected by this sorting behavior.
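
   The first two changes can be illustrated with a minimal, self-contained Scala sketch. It is not the code from this PR: `Item`, `sortItems`, `buildLevel` and `iterationOrder` are hypothetical names, and the statistics are made-up values. It only shows that sorting by `rowCount` and then `sizeInBytes` in descending order, and storing the result in a `LinkedHashMap`, gives the same iteration order for every permutation of the input, assuming no two items tie on both statistics.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a join item and its statistics (not a Spark class).
final case class Item(name: String, rowCount: BigInt, sizeInBytes: BigInt)

object DeterministicOrderSketch {
  // Sort in descending order by rowCount, breaking ties by sizeInBytes.
  def sortItems(items: Seq[Item]): Seq[Item] =
    items.sortBy(i => (i.rowCount, i.sizeInBytes))(
      Ordering.Tuple2(Ordering[BigInt].reverse, Ordering[BigInt].reverse))

  // mutable.LinkedHashMap iterates in insertion order, whereas
  // mutable.HashMap gives no ordering guarantee.
  def buildLevel(items: Seq[Item]): mutable.LinkedHashMap[Set[Int], Item] = {
    val level = mutable.LinkedHashMap.empty[Set[Int], Item]
    sortItems(items).zipWithIndex.foreach { case (item, id) =>
      level.put(Set(id), item)
    }
    level
  }

  // The order in which a consumer of the map would see the items.
  def iterationOrder(items: Seq[Item]): Seq[String] =
    buildLevel(items).values.map(_.name).toSeq
}
```

   In this sketch, calling `iterationOrder` on each element of `items.permutations` returns the same sequence of names. Items that tie on both statistics keep their relative input order, because `sortBy` is a stable sort, which is why the determinism holds only "as much as possible".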
   
   ### Why are the changes needed?
   Make `CostBasedJoinReorder` produce an optimization result that is independent of 
the input order.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   
   - Add a new test case named `Test 7: SPARK-32687 Verify all input 
permutations have the same optimization result`
   
   - Scala 2.12: Pass Jenkins or GitHub Actions
   
   - Scala 2.13: `mvn test -pl sql/catalyst -Pscala-2.13` passes
   
   

