[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

wzhfy Tue, 19 Dec 2017 01:19:21 -0800

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19594#discussion_r157699245
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala
 ---
    @@ -191,8 +191,19 @@ case class JoinEstimation(join: Join) extends Logging {
           val rInterval = ValueInterval(rightKeyStat.min, rightKeyStat.max, 
rightKey.dataType)
           if (ValueInterval.isIntersected(lInterval, rInterval)) {
             val (newMin, newMax) = ValueInterval.intersect(lInterval, 
rInterval, leftKey.dataType)
    -        val (card, joinStat) = computeByNdv(leftKey, rightKey, newMin, 
newMax)
    -        keyStatsAfterJoin += (leftKey -> joinStat, rightKey -> joinStat)
    +        val (card, joinStat) = (leftKeyStat.histogram, 
rightKeyStat.histogram) match {
    +          case (Some(l: Histogram), Some(r: Histogram)) =>
    +            computeByEquiHeightHistogram(leftKey, rightKey, l, r, newMin, 
newMax)
    +          case _ =>
    +            computeByNdv(leftKey, rightKey, newMin, newMax)
    +        }
    +        keyStatsAfterJoin += (
    +          // Histograms are propagated as unchanged. During future 
estimation, they should be
    +          // truncated by the updated max/min. In this way, only pointers 
of the histograms are
    +          // propagated and thus reduce memory consumption.
    +          leftKey -> joinStat.copy(histogram = leftKeyStat.histogram),
    +          rightKey -> joinStat.copy(histogram = rightKeyStat.histogram)
    --- End diff --
    
    I put it here because `computeByEquiHeightHistogram` returns a single 
stats, here we keep the histogram for leftKey and rightKey respectively.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

Reply via email to