Thanks, David, for laying the proposal out clearly: +1 to the whole thing.
-CHB

On Sun, Dec 29, 2019 at 4:06 PM David Mertz <me...@gnosis.cx> wrote:

> Several points:
>
> * NaN as missing-value is widely used outside the Python standard
> library. One could argue, somewhat reasonably, that Pandas and NumPy and
> PyTorch misinterpret the IEEE-754 intention here, but this is EVERYWHERE in
> numeric/scientific Python. We could DOCUMENT that None is a better
> placeholder for *missing* but we shouldn't be obnoxious to millions of
> users of stuff outside stdlib.
>
> * sorted() is WAY too low-level to add this logic to, and numeric types
> with NaNs are much too special for the generic sorting. That said, we DO
> NOT NEED IT. list.sort() and sorted() and friends already take a key
> parameter. This lets the appropriate tool (i.e. the statistics module, and
> other things) develop a total_order() key function to match the IEEE
> suggested ordering. There is absolutely no reason or need to change
> sorted() to accommodate this.
>
> * Yes, obviously I made the subject line about statistics.median(), but
> the xtile() functions have all the same concerns, and live in the same
> module.
>
> * For quiet NaNs, it really is easy to get them innocently. E.g.:
>
> def my_results(it):
>     for x in it:
>         x_1 = func1_with_asymptotes(x)
>         x_2 = func2_with_asymptotes(x)
>         result = x_1 / x_2
>         yield result
>
> median = statistics.median(my_results(my_iter))
>
> That's perfectly reasonable code that will SOMETIMES wind up with qNaNs in
> the collection of values... but that USUALLY will not.
>
> * There is absolutely no need to lose any efficiency by making the
> statistics functions more friendly. All we need is an optional parameter
> whose spelling I've suggested as `on_nan` (but bikeshed freely). Under at
> least one value of that parameter, we can keep EXACTLY the current
> implementation, with all its warts and virtues as-is. Maybe a spelling for
> that option could be 'unsafe' or 'fast'?
>
> * Another option can be 'ignore' (maybe 'skip', but 'ignore' is more
> Pandas-like) which is simply:
>
> def median(it, on_nan=DEFAULT):
>     if on_nan == 'unsafe':
>         ... do all the current stuff ...
>     elif on_nan == "ignore":
>         return median((x for x in it if not is_nan(x)), on_nan='unsafe')
>     elif on_nan == "ieee_total_order":
>         ... something with sorted(it, key=total_order) ...
>
> Yes, this requires agreeing on the right implementation of is_nan(), with
> several plausible versions proposed in this thread.
>
> * With the 'raise' and 'poison' ('propagate'?) options, the implementation
> would be more like this:
>
> items = []
> for x in it:
>     if is_nan(x):
>         if on_nan == 'raise':
>             raise ValueError('No median exists of collections with NaNs')
>         elif on_nan == 'poison':
>             return float('nan')
>     else:
>         items.append(x)
> return median(items, on_nan='unsafe')
>
>
> I think that's everything, really. Nothing gets any slower, all use cases
> are accommodated.
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons. Intellectual property is
> to the 21st century what the slave trade was to the 16th.
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/V5JTTRAXFAQSWCE3LY3JOZITGS5LG3GB/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

--
Christopher Barker, PhD

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
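For anyone who wants to experiment, the `on_nan` dispatch described in the quoted message could be sketched roughly as below. This is only an illustration under several assumptions, not a proposed stdlib implementation: `on_nan` is made a required keyword because no default has been agreed on; `is_nan()` here only recognises float NaNs, which is just one of the plausible definitions from the thread; `total_order()` compares raw bit patterns and so only approximates the IEEE ordering for values that convert cleanly to float; and `_median_of_sorted()` is a hypothetical helper introduced for the example.

```python
import math
import statistics
import struct


def is_nan(x):
    # Assumption: only float NaNs count. Decimal or other numeric types
    # would need their own handling.
    return isinstance(x, float) and math.isnan(x)


def total_order(x):
    """Sort key approximating the IEEE 754 totalOrder predicate for floats."""
    # Reinterpret the 64-bit float as an unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", float(x)))
    if bits >> 63:
        # Sign bit set: invert every bit so larger magnitudes sort earlier.
        return bits ^ 0xFFFFFFFFFFFFFFFF
    # Sign bit clear: set the top bit so these sort after all negatives.
    return bits | (1 << 63)


def _median_of_sorted(items):
    # Hypothetical helper: median of a list that is already ordered.
    n = len(items)
    if n == 0:
        raise statistics.StatisticsError("no median for empty data")
    mid = n // 2
    if n % 2:
        return items[mid]
    return (items[mid - 1] + items[mid]) / 2


def median(data, *, on_nan):
    if on_nan == "unsafe":
        # Exactly the current behaviour, with no NaN checks at all.
        return statistics.median(data)
    items = list(data)
    if on_nan == "ieee_total_order":
        return _median_of_sorted(sorted(items, key=total_order))
    if on_nan == "ignore":
        return statistics.median(x for x in items if not is_nan(x))
    has_nan = any(is_nan(x) for x in items)
    if on_nan == "raise":
        if has_nan:
            raise ValueError("No median exists of collections with NaNs")
        return statistics.median(items)
    if on_nan == "poison":
        return float("nan") if has_nan else statistics.median(items)
    raise ValueError(f"unknown on_nan option: {on_nan!r}")
```

With this sketch, `median([3.0, float('nan'), 1.0, 2.0], on_nan='ignore')` drops the NaN before computing the median, while `on_nan='poison'` returns NaN for the same input. The 'unsafe' branch passes the data straight to `statistics.median()`, so that path costs nothing extra.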
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/IZOHKMJWWY4ZDTRKYNEKR4RPGDSWW73M/