On 12/26/19 10:31 AM, David Mertz wrote:
This came up in discussion here before, maybe a year ago.  The decision then was not to change the implementation, but that seemed like a mistake to me (and the discussion was about broader things).

Anyway, I propose that the obviously broken version of `statistics.median()` be replaced with a better implementation.

Python 3.8.0 (default, Nov  6 2019, 21:49:08)
>>> import numpy as np
>>> import pandas as pd
>>> import statistics
>>> nan = float('nan')
>>> items1 = [nan, 1, 2, 3, 4]
>>> items2 = [1, 2, 3, 4, nan]
>>> statistics.median(items1)
2
>>> statistics.median(items2)
3
>>> np.median(items1)
nan
>>> np.median(items2)
nan
>>> pd.Series(items1).median()
2.5
>>> pd.Series(items2).median()
2.5

The NumPy and Pandas answers are both "reasonable" under slightly different philosophies of how to handle bad values. I think raising an exception for NaNs would also be reasonable enough.

The one thing that is NOT reasonable is returning different answers for median depending on the order of the elements.

Getting garbage answers for garbage input isn't THAT unreasonable. One could argue that detecting common garbage input and rejecting it (perhaps with an exception) would make more sense.
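
For anyone wondering where the order dependence comes from, here is a minimal sketch. It assumes the median of an odd-length list is simply the middle element after sorting; `middle_after_sort` is a made-up helper for illustration, not the code in the statistics module.

```python
nan = float("nan")

# Every ordering comparison involving NaN is False, so a comparison sort
# never finds a reason to move the NaN or the elements around it.
nan_comparisons = (nan < 1, nan > 1, nan == nan)  # all three are False

def middle_after_sort(items):
    """Hypothetical helper: the middle element of the sorted data (odd length)."""
    ordered = sorted(items)
    return ordered[len(ordered) // 2]

items1 = [nan, 1, 2, 3, 4]
items2 = [1, 2, 3, 4, nan]

# Both lists already look "sorted" to sorted(), so the NaN stays where it
# was and the middle element differs between the two.
result1 = middle_after_sort(items1)  # 2, as in the session above
result2 = middle_after_sort(items2)  # 3, as in the session above
```

The two inputs hold the same values, but the NaN shifts which element ends up in the middle slot.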


Note that the statistics module documentation hints at the issue: median appears to require that the data be orderable, and NaN isn't orderable. Since the statistics module seems to be designed to handle types other than float, detecting NaN values is extra expensive, so I think it can be excused for not checking.
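
To make the cost concrete, here is a rough sketch of what a checking wrapper might look like. The names `median_nan_aware` and `nan_policy` are hypothetical and are not part of the statistics module. The check assumes the element type follows the usual convention that a NaN compares unequal to itself, which avoids assuming the data are floats, and it costs one extra pass over the data.

```python
import statistics

def median_nan_aware(data, nan_policy="raise"):
    """Hypothetical wrapper around statistics.median with explicit NaN handling.

    nan_policy is an assumed parameter with three choices:
      "raise"     - reject data containing NaN with ValueError
      "propagate" - return NaN if any NaN is present (the NumPy-style answer)
      "omit"      - drop NaNs before taking the median (the pandas-style answer)
    """
    values = list(data)
    # A NaN is assumed to be the only value that compares unequal to itself,
    # so this test does not require the elements to be floats.
    is_nan = [x != x for x in values]
    if any(is_nan):
        if nan_policy == "raise":
            raise ValueError("median is undefined for data containing NaN")
        if nan_policy == "propagate":
            return float("nan")
        if nan_policy == "omit":
            values = [x for x, bad in zip(values, is_nan) if not bad]
        else:
            raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    return statistics.median(values)
```

With any of the three policies the answer no longer depends on where the NaN sits in the input, which is the property the original complaint asks for.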

--
Richard Damon
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OPPWCFJ7UDHXL5WEXXADXMRQFTJHEFPX/