GitHub user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19594#discussion_r157331840
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala
---
@@ -114,4 +115,183 @@ object EstimationUtils {
}
}
+ /**
+ * Returns overlapped ranges between two histograms, in the given value range [newMin, newMax].
+ */
+ def getOverlappedRanges(
+ leftHistogram: Histogram,
+ rightHistogram: Histogram,
+ newMin: Double,
+ newMax: Double): Seq[OverlappedRange] = {
+ val overlappedRanges = new ArrayBuffer[OverlappedRange]()
+ // Only bins whose range intersect [newMin, newMax] have join possibility.
+ val leftBins = leftHistogram.bins
+ .filter(b => b.lo <= newMax && b.hi >= newMin)
+ val rightBins = rightHistogram.bins
+ .filter(b => b.lo <= newMax && b.hi >= newMin)
+
+ leftBins.foreach { lb =>
+ rightBins.foreach { rb =>
--- End diff --
We only collect an `OverlappedRange` when the [left part and right part
intersect](https://github.com/apache/spark/pull/19594/files#diff-56eed9f23127c954d9add0f6c5c93820R237),
and that decision depends on some intermediate computation, so it isn't very
convenient to express it as a guard. The `yield` form therefore doesn't seem
well suited to this case.
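A minimal sketch of the nested-`foreach` pattern discussed above, assuming hypothetical, simplified `Bin` and `Overlap` classes in place of the histogram bins and `OverlappedRange` from the diff (the real classes carry more fields, and the real intersection logic may differ). It shows the intersection bounds being computed first, with a range collected only afterwards when they turn out to be non-empty:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins; not Spark's actual classes.
final case class Bin(lo: Double, hi: Double)
final case class Overlap(lo: Double, hi: Double, left: Bin, right: Bin)

object OverlapSketch {
  def overlappedRanges(
      leftBins: Seq[Bin],
      rightBins: Seq[Bin],
      newMin: Double,
      newMax: Double): Seq[Overlap] = {
    val result = new ArrayBuffer[Overlap]()
    // Keep only bins that can intersect [newMin, newMax].
    val ls = leftBins.filter(b => b.lo <= newMax && b.hi >= newMin)
    val rs = rightBins.filter(b => b.lo <= newMax && b.hi >= newMin)
    ls.foreach { lb =>
      rs.foreach { rb =>
        // Compute the intersection of both bins, clamped to [newMin, newMax]...
        val lo = math.max(math.max(lb.lo, rb.lo), newMin)
        val hi = math.min(math.min(lb.hi, rb.hi), newMax)
        // ...and only then decide whether there is anything to collect.
        if (lo <= hi) {
          result += Overlap(lo, hi, lb, rb)
        }
      }
    }
    result.toSeq
  }
}
```

With left bins `[0, 10]` and `[10, 20]`, right bins `[5, 15]` and `[30, 40]`, and the range `[0, 20]`, this collects the overlaps `[5, 10]` and `[10, 15]`; the `[30, 40]` bin is dropped by the initial filter.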