[... apologies if this is dup, got a bounce ...]

> [David Mertz <me...@gnosis.cx>]
>> I have to say though that the existing behavior of `statistics.median[_low|_high|]`
>> is SURPRISING if not outright wrong. It is the behavior in existing Python,
>> but it is very strange.
>>
>> The implementation simply does whatever `sorted()` does, which is an
>> implementation detail. In particular, NaN's being neither less than nor
>> greater than any floating point number, just stay where they are during
>> sorting.
>
> I expect you inferred that from staring at a handful of examples, but
> it's illusion. Python's sort uses only __lt__ comparisons, and if
> those don't implement a total ordering then _nothing_ is defined about
> sort's result (beyond that it's some permutation of the original
> list).

Thanks Tim for clarifying. Is it even the case that sorts are STABLE in the
face of non-total orderings under __lt__? A couple quick examples don't
refute that, but what I tried was not very thorough, nor did I think much
about TimSort itself.

> So, certainly, if you want median to be predictable in the presence of
> NaNs, sort's behavior in the presence of NaNs can't be relied on in
> any respect.

Playing with Tim's examples, this suggests that statistics.median() is
simply outright WRONG. I can think of absolutely no way to characterize
these as reasonable results:

Python 3.7.1 | packaged by conda-forge | (default, Nov 13 2018, 09:50:42)
In [4]: statistics.median([9, 9, 9, nan, 1, 2, 3, 4, 5])
Out[4]: 1
In [5]: statistics.median([9, 9, 9, nan, 1, 2, 3, 4])
Out[5]: nan
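
For concreteness, here is a minimal sketch of one way to get a predictable
answer: strip the NaNs before the data ever reaches sorted(). The helper name
median_ignoring_nans is hypothetical, and "ignore NaNs" is only one possible
policy (raising an error or returning NaN are equally defensible); nothing
here is what the statistics module itself does.

```python
import math
import statistics

nan = float("nan")

# NaN is neither less than nor greater than any float, so __lt__ stops
# being a total ordering as soon as a NaN is in the data.
print(nan < 1.0, 1.0 < nan)  # False False


def median_ignoring_nans(data):
    """Hypothetical helper: drop NaNs, then defer to statistics.median().

    If every value is NaN, the cleaned list is empty and
    statistics.median() raises StatisticsError.
    """
    cleaned = [x for x in data if not math.isnan(x)]
    return statistics.median(cleaned)


print(median_ignoring_nans([9, 9, 9, nan, 1, 2, 3, 4, 5]))  # 4.5
print(median_ignoring_nans([9, 9, 9, nan, 1, 2, 3, 4]))     # 4
```

With the NaN removed, the remaining values are totally ordered, so the result
no longer depends on where the NaN happened to sit in the input.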

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/