GitHub user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143684083
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala ---
    @@ -58,7 +58,7 @@ class QuantileSummariesSuite extends SparkFunSuite {
         if (data.nonEmpty) {
           val approx = summary.query(quant).get
           // The rank of the approximation.
    -      val rank = data.count(_ < approx) // has to be <, not <= to be exact
    +      val rank = data.count(_ <= approx)
    --- End diff --
    
    @wzhfy that formula is asymmetric, which feels wrong; it may happen to fix this case, but it could fail some other case in the future. It would be a little more principled to round the average.
    
    Yeah, I know that [1,2,2,2,2,2,2,2,3] can't happen in this test; I'm just illustrating a general point.
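
    To make the asymmetry concrete, here is a minimal sketch. The helper names (`lowerRank`, `upperRank`, `midRank`) are hypothetical and not part of `QuantileSummariesSuite`, and reading "round the average" as rounding the mean of the two ranks is an assumption about one way to do it, not the actual change in this PR.

```scala
// Illustrative only: hypothetical helpers, not code from the PR or the test suite.
object RankSketch {
  // Count of elements strictly below x (the rank used before this diff).
  def lowerRank(data: Seq[Double], x: Double): Int = data.count(_ < x)

  // Count of elements at or below x (the rank used after this diff).
  def upperRank(data: Seq[Double], x: Double): Int = data.count(_ <= x)

  // Assumed symmetric alternative: round the average of the two ranks.
  def midRank(data: Seq[Double], x: Double): Long =
    math.round((lowerRank(data, x) + upperRank(data, x)) / 2.0)

  def main(args: Array[String]): Unit = {
    val data = Seq[Double](1, 2, 2, 2, 2, 2, 2, 2, 3)
    println(lowerRank(data, 2.0)) // 1
    println(upperRank(data, 2.0)) // 8
    println(midRank(data, 2.0))   // 5
  }
}
```

    With heavy duplication, `<` and `<=` give ranks at opposite ends of the run of equal values (1 vs. 8 here), so either one alone is biased toward one side; the rounded average lands in the middle.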
    
    Hm, what's the case where the quantile is between 39 and 40? Is the input 0-99 in that case? I don't see a test for the 40% quantile, so I'm wondering whether we really do have a problem or are misunderstanding the failure.

