[Python-ideas] Re: NAN handling in statistics functions

Christopher Barker Fri, 27 Aug 2021 10:28:14 -0700

If folks want faster processing (checking for, replacing) of NaNs in
sequences, a function written in C could be added to the math module (or
the statistics module).


Now that I've said that, it might make sense to put such a function in the
statistics package, for use there anyway.
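
To make the idea concrete, here is a rough pure-Python sketch of what such
helpers might do. The names has_nan, replace_nans and drop_nans are made up
for illustration (nothing like them exists in math or statistics), and the
sketch only looks at float NaNs, not Decimal ones. A C version would
presumably do the same work, just faster.

```python
import math


def _is_float_nan(x):
    # Only floats are inspected; ints, Fractions, strings etc. pass through.
    return isinstance(x, float) and math.isnan(x)


def has_nan(data):
    """Return True if any item of data is a float NaN (hypothetical helper)."""
    return any(_is_float_nan(x) for x in data)


def replace_nans(data, replacement):
    """Return a new list with every float NaN swapped for replacement."""
    return [replacement if _is_float_nan(x) else x for x in data]


def drop_nans(data):
    """Return a new list with every float NaN filtered out."""
    return [x for x in data if not _is_float_nan(x)]
```

Something like statistics.mean(drop_nans(data)) would then give the
"ignore NaNs" behaviour without touching the statistics functions themselves.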

Personally, I think if you are working with large enough datasets to care,
you probably should use numpy anyway.

-CHB

On Fri, Aug 27, 2021 at 3:39 AM Jeff Allen <ja...@farowl.co.uk> wrote:

> On 26/08/2021 19:41, Brendan Barnwell wrote:
>
> On 2021-08-23 20:53, Steven D'Aprano wrote:
>
> So I propose that statistics functions gain a keyword only parameter to
> specify the desired behaviour when a NAN is found:
>
> - raise an exception
> - return NAN
> - ignore it (filter out NANs)
>
> which seem to be the three most common preferences. (It seems to be
> split roughly equally between the three.)
>
> Thoughts? Objections?
>
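
For readers following along, the three options quoted above could be
sketched as a wrapper around statistics.median. This is only an illustration
under assumptions: the parameter name nan_policy and its values "raise",
"return" and "ignore" are invented here and are not taken from the proposal
or from the statistics module.

```python
import math
import statistics


def median(data, *, nan_policy="raise"):
    """Median with an explicit, keyword-only NaN policy (hypothetical API)."""
    if nan_policy not in ("raise", "return", "ignore"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")

    values = list(data)
    # Only float NaNs are detected in this sketch.
    clean = [x for x in values
             if not (isinstance(x, float) and math.isnan(x))]

    if len(clean) != len(values):      # at least one NaN was present
        if nan_policy == "raise":
            raise ValueError("NaN found in data")
        if nan_policy == "return":
            return float("nan")
        # nan_policy == "ignore": fall through using the filtered values

    return statistics.median(clean)
```

With nan_policy="ignore", a data set that is all NaNs ends up empty, so
statistics.median raises StatisticsError, the same as for any empty input.
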
> I'd like to suggest that there isn't a single answer that is most natural
> for all functions. There may be as few as two.
>
> Guido's proposal was that mean return nan because the naive arithmetic
> formula would return nan. The awkward first example was median(), which is
> based on order (comparison). Now Brendan has pointed out:
>
>     One important thing we should think about is whether to add similar
> handling to `max` and `min`.  These are builtin functions, not in the
> statistics module, but they have similarly confusing behavior with NAN:
> compare `max(1, 2, float('nan'))` with `max(float('nan'), 1, 2)`.
>
> The real behaviour of max() is to return the first argument that is not
> exceeded by any that follow, so:
>
> >>> nan, nan2 = float('nan'), float('nan')
> >>> max(nan, nan2, 1, 2) is nan
> True
> >>> max(nan2, nan, 1, 2) is nan2
> True
>
> As a definition, that is not as easy to understand as "return the largest
> argument". It arises because in Python, x > nan is always False. This choice,
> which is often sensible, makes the set of float values less than totally
> ordered. It seems to me to be an error in principle to apply a function
> whose simple definition assumes a total ordering, to a set that cannot be
> ordered. So most natural to me would be to raise an error for this class of
> function.
>
> Meanwhile, functions that have a purely arithmetic definition most
> naturally return nan. Are there any other classes of function than
> comparison or arithmetic? Counting, perhaps, or is that comparison again?
>
> Proposals for a general solution, especially if based on a replacement
> value, are more a question of how you would like to pre-filter your set. An
> API could offer some filters, or it may be clearer left to the caller. It
> is no doubt too late to alter the default behaviour of familiar functions,
> but there could be a "strict" mode.
>
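
A "strict" mode for an order-based function might look roughly like the
sketch below. The name strict_max is hypothetical and only float NaNs are
checked; the builtin max is left untouched and does the real work once the
arguments are known to be orderable.

```python
import math


def strict_max(*args):
    """Like max(*args), but refuse float NaNs instead of silently
    returning an order-dependent result (hypothetical helper)."""
    for x in args:
        if isinstance(x, float) and math.isnan(x):
            raise ValueError("strict_max() cannot order NaN")
    return max(args)


nan = float("nan")

# The builtin depends on argument order when a NaN is present:
order_a = max(1, 2, nan)    # 2, because nan > 2 is False
order_b = max(nan, 1, 2)    # nan, because nothing compares greater than it
```

strict_max(1, 2, nan) and strict_max(nan, 1, 2) both raise ValueError, so
the answer no longer depends on where the NaN happens to sit.
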
> --
>
> Jeff Allen
>
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/FQNZLNISKHV74CYJMU2HPG5273VMWXUK/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/7CQK5AOT4L5IN4YVVSG7JONAQQCHN6CO/
