[ https://issues.apache.org/jira/browse/SPARK-33935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Takeshi Yamamuro resolved SPARK-33935. -------------------------------------- Fix Version/s: 3.2.0 3.1.0 Assignee: Tanel Kiis Resolution: Fixed Resolved by https://github.com/apache/spark/pull/30965 > Fix CBOs cost function > ----------------------- > > Key: SPARK-33935 > URL: https://issues.apache.org/jira/browse/SPARK-33935 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.2.0 > Reporter: Tanel Kiis > Assignee: Tanel Kiis > Priority: Major > Fix For: 3.1.0, 3.2.0 > > > The parameter spark.sql.cbo.joinReorder.card.weight is decumented as: > {code:title=spark.sql.cbo.joinReorder.card.weight} > The weight of cardinality (number of rows) for plan cost comparison in join > reorder: rows * weight + size * (1 - weight). > {code} > But in the implementation the formula is a bit different: > {code:title=Current implementation} > def betterThan(other: JoinPlan, conf: SQLConf): Boolean = { > if (other.planCost.card == 0 || other.planCost.size == 0) { > false > } else { > val relativeRows = BigDecimal(this.planCost.card) / > BigDecimal(other.planCost.card) > val relativeSize = BigDecimal(this.planCost.size) / > BigDecimal(other.planCost.size) > relativeRows * conf.joinReorderCardWeight + > relativeSize * (1 - conf.joinReorderCardWeight) < 1 > } > } > {code} > This change has an unfortunate consequence: > given two plans A and B, both A betterThan B and B betterThan A might give > the same results. This happes when one has many rows with small sizes and > other has few rows with large sizes. > A example values, that have this fenomen with the default weight value (0.7): > A.card = 500, B.card = 300 > A.size = 30, B.size = 80 > Both A betterThan B and B betterThan A would have score above 1 and would > return false. > A new implementation is proposed, that matches the documentation: > {code:title=Proposed implementation} > def betterThan(other: JoinPlan, conf: SQLConf): Boolean = { > val oldCost = BigDecimal(this.planCost.card) * > conf.joinReorderCardWeight + > BigDecimal(this.planCost.size) * (1 - conf.joinReorderCardWeight) > val newCost = BigDecimal(other.planCost.card) * > conf.joinReorderCardWeight + > BigDecimal(other.planCost.size) * (1 - conf.joinReorderCardWeight) > newCost < oldCost > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org