GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19438#discussion_r143000448
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1038,8 +1038,8 @@ def summary(self, *statistics):
| mean| 3.5| null|
| stddev|2.1213203435596424| null|
| min| 2|Alice|
- | 25%| 5| null|
- | 50%| 5| null|
+ | 25%| 2| null|
--- End diff --
Although this looks like a big change, the test data set has only two data
elements, with values 2 and 5, so either value is an equally valid answer for
these percentiles. It's probably more logical for the 25th percentile to be 2
if the 75th percentile is 5.
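
To illustrate why both answers are defensible, here is a minimal pure-Python sketch of three common percentile conventions applied to the two-element data set. The `percentile` helper and its `method` names are hypothetical and written only for this illustration; this is not how Spark computes the values in `summary()`.

```python
import math


def percentile(values, p, method="linear"):
    """Return the p-th quantile (0.0 <= p <= 1.0) of values.

    Hypothetical helper for illustration only; not Spark's algorithm.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0.0 and 1.0")

    ordered = sorted(values)
    # Fractional position of the quantile within the sorted data.
    pos = p * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)

    if method == "lower":
        return ordered[lo]
    if method == "higher":
        return ordered[hi]
    if method == "linear":
        # Interpolate between the two neighbouring elements.
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])
    raise ValueError("unknown method: %s" % method)


data = [2, 5]
for method in ("lower", "higher", "linear"):
    print(method, [percentile(data, p, method) for p in (0.25, 0.5, 0.75)])
```

With only the values 2 and 5, the "lower" convention returns 2 for every interior percentile and the "higher" convention returns 5, so a 25% value of either 2 or 5 can be justified. Whatever convention is used, the results should be non-decreasing in `p`, which is why 2 at 25% sits more naturally alongside 5 at 75%.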