Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17286#discussion_r106179640
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
@@ -128,38 +128,43 @@ case class CostBasedJoinReorder(conf: CatalystConf) extends Rule[LogicalPlan] wi
object JoinReorderDP extends PredicateHelper {
def search(
- conf: CatalystConf,
+ conf: SQLConf,
items: Seq[LogicalPlan],
conditions: Set[Expression],
topOutput: AttributeSet): Option[LogicalPlan] = {
// Level i maintains all found plans for i + 1 items.
// Create the initial plans: each plan is a single item with zero cost.
- val itemIndex = items.zipWithIndex
+ val itemIndex = items.zipWithIndex.map(_.swap).toMap
val foundPlans = mutable.Buffer[JoinPlanMap](itemIndex.map {
- case (item, id) => Set(id) -> JoinPlan(Set(id), item, Set(), Cost(0, 0))
- }.toMap)
+ case (id, item) => Set(id) -> JoinPlan(Set(id), item, Set(), Cost(0, 0))
+ })
- for (lev <- 1 until items.length) {
+ // Build plans for next levels until the last level has only one plan. This plan contains
+ // all items that can be joined, so there's no need to continue.
+ while (foundPlans.size < items.length && foundPlans.last.size > 1) {
// Build plans for the next level.
foundPlans += searchLevel(foundPlans, conf, conditions, topOutput)
}
- val plansLastLevel = foundPlans(items.length - 1)
- if (plansLastLevel.isEmpty) {
- // Failed to find a plan, fall back to the original plan
- None
- } else {
- // There must be only one plan at the last level, which contains all items.
- assert(plansLastLevel.size == 1 && plansLastLevel.head._1.size == items.length)
- Some(plansLastLevel.head._2.plan)
+ // Find the best plan
+ assert(foundPlans.last.size <= 1)
--- End diff --
Hmm, you have a good point here. We may have several disconnected item sets,
e.g. {AB} and {CD}, or a more complex case: {ABCD}, {EFG}, {LM}... These cases
need to be dealt with.
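
To make the disconnected case concrete, here is a rough, Spark-free sketch of how such item sets could be detected before running the DP search. It is not code from this PR: `DisconnectedItemSets` and `components` are hypothetical names, items are plain Int ids, and each join condition is assumed to be reduced to the pair of item ids it references.

```scala
// Hypothetical sketch, not part of CostBasedJoinReorder: items are Int ids and
// every join condition is modelled as an undirected edge between two item ids.
object DisconnectedItemSets {

  /** Groups item ids into the connected components of the join graph. */
  def components(itemIds: Set[Int], edges: Set[(Int, Int)]): Set[Set[Int]] = {
    // Adjacency map: for each item, the items it shares a join condition with.
    val neighbors: Map[Int, Set[Int]] =
      edges.toSeq
        .flatMap { case (a, b) => Seq(a -> b, b -> a) }
        .groupBy(_._1)
        .map { case (k, vs) => k -> vs.map(_._2).toSet }

    // Breadth-first expansion: collect everything reachable from the frontier.
    @scala.annotation.tailrec
    def expand(frontier: Set[Int], seen: Set[Int]): Set[Int] =
      if (frontier.isEmpty) seen
      else {
        val next = frontier.flatMap(id => neighbors.getOrElse(id, Set.empty[Int])) -- seen
        expand(next, seen ++ next)
      }

    // Pick an unvisited item, take its whole component, repeat on the rest.
    @scala.annotation.tailrec
    def loop(remaining: Set[Int], acc: Set[Set[Int]]): Set[Set[Int]] =
      if (remaining.isEmpty) acc
      else {
        val start = remaining.head
        val comp = expand(Set(start), Set(start))
        loop(remaining -- comp, acc + comp)
      }

    loop(itemIds, Set.empty)
  }
}
```

With A, B, C, D mapped to 0, 1, 2, 3 and conditions only on A-B and C-D, `DisconnectedItemSets.components(Set(0, 1, 2, 3), Set((0, 1), (2, 3)))` returns two groups, {0, 1} and {2, 3}. If more than one group comes back, no single plan can contain all items unless cartesian products are allowed, so each group would presumably have to be reordered on its own.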