GitHub user juliuszsompolski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21133#discussion_r184656896
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala ---
    @@ -279,4 +282,11 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSQLContext {
           checkAnswer(query, expected)
         }
       }
    +
    +  test("SPARK-24013: unneeded compress can cause performance issues with sorted input") {
    +    failAfter(30 seconds) {
    +      checkAnswer(sql("select approx_percentile(id, array(0.1)) from range(10000000)"),
    +        Row(Array(999160)))
    --- End diff --
    
    nit:
    Given the approximate nature of the algorithm, couldn't asserting on the exact answer become flaky after small changes in code or config? (e.g. a different split of the range into tasks, with a different merging of the partial aggregates producing a slightly different result)
    Maybe just asserting on `collect().length == 1` would do?
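    A middle ground between pinning `999160` and only checking the row count would be to assert that the result lies within the documented error bound. A minimal sketch, assuming Spark's stated semantics that `1.0 / accuracy` is the relative (rank) error and the default `accuracy` of 10000; `PercentileTolerance` and `withinBound` are hypothetical helpers, not existing Spark or test-suite APIs:

```scala
object PercentileTolerance {
  // Hypothetical helper (not part of Spark): checks that an approximate
  // percentile over the values 0 until n stays within the rank-error bound,
  // assuming the relative error is 1.0 / accuracy as the Spark docs state.
  def withinBound(result: Long, n: Long, percentage: Double, accuracy: Int = 10000): Boolean = {
    // For range(n) every value equals its rank, so the exact answer is about percentage * n.
    val expected = percentage * n
    // Allowed rank error is n / accuracy; +1 absorbs off-by-one rank conventions.
    val maxError = n.toDouble / accuracy + 1
    math.abs(result - expected) <= maxError
  }

  def main(args: Array[String]): Unit = {
    // The value currently pinned in the test is 840 ranks from 1,000,000, inside the bound of ~1000.
    println(withinBound(999160L, 10000000L, 0.1))
    // A result 10,000 ranks off would be rejected.
    println(withinBound(990000L, 10000000L, 0.1))
  }
}
```

    In the suite this would replace the exact `Row(Array(999160))` comparison with a check on the collected value, so repartitioning or merge-order changes that stay within the bound would not fail the test.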
