Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143000448

    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1038,8 +1038,8 @@ def summary(self, *statistics):
             |   mean|               3.5| null|
             | stddev|2.1213203435596424| null|
             |    min|                 2|Alice|
    -        |    25%|                 5| null|
    -        |    50%|                 5| null|
    +        |    25%|                 2| null|
    --- End diff --

    Although this looks like a big change, the test data set has only two elements, with values 2 and 5, so either value is about equally valid as a quartile. It's probably more logical for the 25th percentile to be 2 if the 75th percentile is 5.
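    To illustrate why both outputs are defensible, here is a minimal sketch in plain Python comparing two common quantile conventions on the two-element data set [2, 5]. The helper names `nearest_rank` and `higher` are hypothetical, and this is not a claim about how Spark computes its approximate percentiles; it only shows that the answer depends on the convention chosen.

```python
import math

def nearest_rank(values, q):
    """Nearest-rank convention: the element at 1-based rank ceil(q * n)."""
    s = sorted(values)
    rank = max(1, math.ceil(q * len(s)))
    return s[rank - 1]

def higher(values, q):
    """'Higher' convention: the element at 0-based index ceil(q * (n - 1))."""
    s = sorted(values)
    return s[math.ceil(q * (len(s) - 1))]

ages = [2, 5]  # the two values in the docstring's test data
for q in (0.25, 0.50, 0.75):
    print(f"{int(q * 100)}%: nearest_rank={nearest_rank(ages, q)}, higher={higher(ages, q)}")
```

    Under the nearest-rank convention the 25% and 50% values are 2 and the 75% value is 5, while the "higher" convention returns 5 for all three. With only two data points, neither is wrong.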