LuciferYang opened a new pull request #29638: URL: https://github.com/apache/spark/pull/29638
### What changes were proposed in this pull request?

The optimization result of `CostBasedJoinReorder` is currently non-deterministic: when more than one candidate plan has the same cost, the result depends on the input order. This PR makes the result as deterministic as possible. The main changes are:

- Sort the items used to produce candidate plans in descending order by `rowCount` and `sizeInBytes`, so that the order of the input items is as deterministic as possible.
- Use `LinkedHashMap` instead of `HashMap` so that items are iterated in the same order in which they were inserted.
- Add a new test case to `StarJoinCostBasedReorderSuite` to verify that all input permutations produce the same optimization result.
- Regenerate the golden files used by `TPCDSV2_7_PlanStabilityWithStatsSuite`, `TPCDSV1_4_PlanStabilityWithStatsSuite` and `TPCDSModifiedPlanStabilityWithStatsSuite`, which are affected by the new sort order.

### Why are the changes needed?

To make `CostBasedJoinReorder` produce an optimization result that is independent of the input order.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Added a new test case named `Test 7: SPARK-32687 Verify all input permutations have the same optimization result`.
- Scala 2.12: passes Jenkins or GitHub Actions.
- Scala 2.13: `mvn test -pl sql/catalyst -Pscala-2.13` passes.
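
The idea behind the sorting and `LinkedHashMap` changes can be illustrated with a small standalone sketch. This is not the code from the PR: `Item`, `canonicalOrder` and `buildIndex` are hypothetical names invented for this example, and only the Scala standard library is used. It sorts the inputs in descending order by `rowCount` and then `sizeInBytes`, stores them in a `mutable.LinkedHashMap`, and checks that every permutation of the input yields the same iteration order.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a join input with statistics; not a Spark class.
final case class Item(name: String, rowCount: BigInt, sizeInBytes: BigInt)

object DeterministicOrderingSketch {
  // Descending by rowCount, then descending by sizeInBytes.
  // sortBy is stable, so items with identical statistics keep their input
  // order; that residual dependence is why the result is only
  // "as deterministic as possible".
  def canonicalOrder(items: Seq[Item]): Seq[Item] =
    items.sortBy(i => (-i.rowCount, -i.sizeInBytes))

  // LinkedHashMap iterates in insertion order, so inserting the items in
  // canonical order gives the same iteration order for any input order.
  def buildIndex(items: Seq[Item]): mutable.LinkedHashMap[String, Item] = {
    val index = mutable.LinkedHashMap.empty[String, Item]
    canonicalOrder(items).foreach(i => index.put(i.name, i))
    index
  }

  def main(args: Array[String]): Unit = {
    val items = Seq(
      Item("t1", 100, 800),
      Item("t2", 100, 400),
      Item("t3", 5000, 40000)
    )
    // Collect the key order produced by every permutation of the input.
    val distinctOrders =
      items.permutations.map(p => buildIndex(p).keys.toList).toSet
    println(s"distinct iteration orders: ${distinctOrders.size}")
  }
}
```

With distinct statistics, as in this example, all six permutations collapse to a single iteration order. With a plain `mutable.HashMap` the iteration order would follow the hash layout, not the insertion order, so the sorting step alone would not carry through to iteration.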
