[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

srowen Thu, 05 Oct 2017 10:17:46 -0700

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143000448
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1038,8 +1038,8 @@ def summary(self, *statistics):
             |   mean|               3.5| null|
             | stddev|2.1213203435596424| null|
             |    min|                 2|Alice|
    -        |    25%|                 5| null|
    -        |    50%|                 5| null|
    +        |    25%|                 2| null|
    --- End diff --
    
    Although this looks like a big change, the test data set has only two data 
elements, with values 2 and 5, so either value is about equally valid as an 
answer. It's probably more logical for the 25th percentile to be 2 if the 75th 
percentile is 5.
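
    As a rough illustration of why 2 is a sensible 25th percentile for this 
data, here is a minimal sketch using the textbook nearest-rank definition. The 
helper name `nearest_rank_percentile` is hypothetical, and this is an 
assumption made for illustration only; it is not the algorithm that 
percentile_approx actually uses.

```python
import math


def nearest_rank_percentile(values, p):
    """Return the nearest-rank percentile of `values` for 0 < p <= 1.

    Hypothetical helper for illustration; NOT Spark's percentile_approx.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ordered = sorted(values)
    # Nearest-rank: take the smallest 1-based rank covering at least p of the data.
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    data = [2, 5]  # the two values in the test data set discussed above
    print(nearest_rank_percentile(data, 0.25))  # prints 2
    print(nearest_rank_percentile(data, 0.75))  # prints 5
```

    With only two elements, any percentile has to land on one of the two 
values under this definition, so small differences in convention can move a 
result from 2 to 5 or back.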



