Thanks, David, for laying the proposal out so clearly.

+1 to the whole thing.

-CHB


On Sun, Dec 29, 2019 at 4:06 PM David Mertz <me...@gnosis.cx> wrote:

> Several points:
>
> * NaN as missing-value is widely used outside the Python standard
> library.  One could argue, somewhat reasonably, that Pandas and NumPy and
> PyTorch misinterpret the IEEE-754 intention here, but this is EVERYWHERE in
> numeric/scientific Python.  We could DOCUMENT that None is a better
> placeholder for *missing* but we shouldn't be obnoxious to millions of
> users of stuff outside stdlib.
>
> * sorted() is WAY too low-level to add this logic to, and numeric types
> with NaNs are much too special for the generic sorting.  That said, we DO
> NOT NEED IT.  list.sort() and sorted() and friends already take a key
> parameter.  This lets the appropriate tool—i.e. the statistics module, and
> other things—develop a total_order() key function to match the IEEE
> suggested ordering.  There is absolutely no reason or need to change
> sorted() to accommodate this.
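
A minimal sketch of what such a key function might look like, assuming every value can be converted to a Python float (IEEE 754 binary64). The name total_order comes from the proposal above; the bit-twiddling approach is just one way to approximate the IEEE totalOrder predicate, not an agreed implementation:

```python
import math
import struct

def total_order(x):
    """Sort key approximating IEEE 754 totalOrder for binary64 values."""
    # Reinterpret the 8 bytes of the double as a signed 64-bit integer.
    (bits,) = struct.unpack(">q", struct.pack(">d", float(x)))
    # Values with the sign bit set come out negative. Flipping the other
    # 63 bits reverses their order, so larger magnitudes sort first.
    if bits < 0:
        bits ^= 0x7FFFFFFFFFFFFFFF
    return bits

pos_nan = math.copysign(math.nan, 1.0)   # NaN with the sign bit clear
values = [3.0, pos_nan, 1.0, -math.inf, 0.0, -0.0, 2.0]
ordered = sorted(values, key=total_order)
```

With this key, -0.0 sorts before 0.0, a NaN with the sign bit clear sorts after +inf, and a NaN with the sign bit set sorts before -inf. sorted() itself is untouched.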
>
> * Yes, obviously I made the subject line about statistics.median(), but
> the xtile() functions have all the same concerns, and live in the same
> module.
>
> * For quiet NaNs, it really is easy to get them innocently.  E.g.:
>
> def my_results(it):
>     for x in it:
>         x_1 = func1_with_asymptotes(x)
>         x_2 = func2_with_asymptotes(x)
>         result = x_1 / x_2
>         yield result
>
> median = statistics.median(my_results(my_iter))
>
> That's perfectly reasonable code that will SOMETIMES wind up with qNaNs in
> the collection of values... but that USUALLY will not.
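
To make that concrete, here is a runnable version under invented assumptions: the two functions below are hypothetical stand-ins (1/x and 1/x**2, each reporting its pole at 0 as +inf), chosen only so that inf / inf shows up for one input:

```python
import math

def func1_with_asymptotes(x):
    # Hypothetical stand-in: 1/x, with the pole at 0 reported as +inf.
    return math.inf if x == 0 else 1.0 / x

def func2_with_asymptotes(x):
    # Hypothetical stand-in: 1/x**2, with the same pole.
    return math.inf if x == 0 else 1.0 / (x * x)

def my_results(it):
    for x in it:
        x_1 = func1_with_asymptotes(x)
        x_2 = func2_with_asymptotes(x)
        yield x_1 / x_2   # inf / inf quietly produces NaN, no exception

results = list(my_results([1.0, 2.0, 0.0, 4.0]))
```

Only the input 0.0 produces a NaN; every other input gives an ordinary float, which is why the problem tends to surface only occasionally.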
>
> * There is absolutely no need to lose any efficiency by making the
> statistics functions more friendly.  All we need is an optional parameter
> whose spelling I've suggested as `on_nan` (but bikeshed freely).  Under at
> least one value of that parameter, we can keep EXACTLY the current
> implementation, with all its warts and virtues as-is.  Maybe a spelling for
> that option could be 'unsafe' or 'fast'?
>
> * Another option can be 'ignore' (maybe 'skip', but 'ignore' is more
> Pandas-like) which is simply:
>
> def median(it, on_nan=DEFAULT):
>     if on_nan == 'unsafe':
>         ... do all the current stuff ...
>     elif on_nan == "ignore":
>         return median((x for x in it if not is_nan(x)), on_nan='unsafe')
>     elif on_nan == "ieee_total_order":
>         ... something with sorted(it, key=total_order) ...
>
> Yes, this requires agreeing on the right implementation of is_nan(), with
> several plausible versions proposed in this thread.
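
One of the plausible versions, as a sketch only: it relies on NaN being unequal to itself, which holds for floats and for quiet Decimal NaNs. It makes no attempt to handle signaling Decimal NaNs or types with unusual __eq__ behaviour, so treat it as an assumption rather than the agreed definition:

```python
from decimal import Decimal
from fractions import Fraction

def is_nan(x):
    """Return True if x is a NaN, using the 'unequal to itself' test."""
    # Ordinary numbers (int, float, Fraction, Decimal) compare equal to
    # themselves; a quiet NaN does not.
    return x != x

checks = [is_nan(v) for v in (float("nan"), 1.0, Decimal("NaN"), Fraction(1, 2), 3)]
```

Unlike math.isnan(), this version does not convert its argument to float first, so it also works for values a float cannot represent.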
>
> * With the 'raise' and 'poison' ('propagate'?) options, the implementation
> would be more like this:
>
> items = []
> for x in it:
>     if is_nan(x):
>         if on_nan == 'raise':
>             raise ValueError('No median exists of collections with NaNs')
>         elif on_nan == 'poison':
>             return float('nan')
    else:
        items.append(x)
> return median(items, on_nan='unsafe')
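
Putting the pieces together, a hedged sketch of how the options might wrap the existing statistics.median(). The on_nan spelling and option names follow the proposal above; the default of 'raise', the rejection of unknown option values, and the x != x test for NaN are assumptions made here for illustration:

```python
import math
import statistics

def is_nan(x):
    # Assumed NaN test: a NaN is the only value unequal to itself.
    return x != x

def median(data, on_nan="raise"):
    """Median with an explicit NaN policy (sketch, not the stdlib API)."""
    if on_nan == "unsafe":
        # Exactly the current behaviour, with no extra pass over the data.
        return statistics.median(data)
    if on_nan == "ignore":
        return statistics.median(x for x in data if not is_nan(x))
    if on_nan not in ("raise", "poison"):
        raise ValueError(f"unknown on_nan option: {on_nan!r}")
    items = []
    for x in data:
        if is_nan(x):
            if on_nan == "raise":
                raise ValueError("No median exists of collections with NaNs")
            return math.nan          # 'poison': the NaN propagates
        items.append(x)
    return statistics.median(items)

sample = [1.0, math.nan, 3.0, 2.0]
ignored = median(sample, on_nan="ignore")
poisoned = median(sample, on_nan="poison")
```

The 'unsafe' path adds nothing beyond one string comparison, so callers who know their data is clean pay essentially no cost.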
>
>
> I think that's everything, really.  Nothing gets any slower, all use cases
> are accommodated.
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons.  Intellectual property is
> to the 21st century what the slave trade was to the 16th.
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/V5JTTRAXFAQSWCE3LY3JOZITGS5LG3GB/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
_______________________________________________
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/IZOHKMJWWY4ZDTRKYNEKR4RPGDSWW73M/