If folks want faster processing (checking for, or replacing) of NaNs in sequences, a function written in C could be added to the math module (or the statistics module).
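For reference, a minimal pure-Python sketch of the two operations such a function would speed up. The names has_nan and replace_nan are hypothetical, not existing math or statistics functions, and the sketch assumes every item is a real number that math.isnan() accepts:

```python
import math


def has_nan(values):
    """Return True if any item in values is a NaN (hypothetical helper)."""
    return any(math.isnan(x) for x in values)


def replace_nan(values, replacement):
    """Return a new list with every NaN swapped for replacement (hypothetical helper)."""
    return [replacement if math.isnan(x) else x for x in values]
```

Both make one Python-level pass over the data, which is the per-item overhead a C implementation would avoid.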
Now that I said that, it might make sense to put such a function in the statistics package, for use there anyway.

Personally, I think if you are working with large enough datasets to care, you probably should use numpy anyway.

-CHB

On Fri, Aug 27, 2021 at 3:39 AM Jeff Allen <ja...@farowl.co.uk> wrote:

> On 26/08/2021 19:41, Brendan Barnwell wrote:
>
> On 2021-08-23 20:53, Steven D'Aprano wrote:
>
> So I propose that statistics functions gain a keyword only parameter to
> specify the desired behaviour when a NAN is found:
>
> - raise an exception
> - return NAN
> - ignore it (filter out NANs)
>
> which seem to be the three most common preference. (It seems to be
> split roughly equally between the three.)
>
> Thoughts? Objections?
>
> I'd like to suggest that there isn't a single answer that is most natural
> for all functions. There may be as few as two.
>
> Guido's proposal was that mean return nan because the naive arithmetic
> formula would return nan. The awkward first example was median(), which is
> based on order (comparison). Now Brendan has pointed out:
>
> One important thing we should think about is whether to add similar
> handling to `max` and `min`. These are builtin functions, not in the
> statistics module, but they have similarly confusing behavior with NAN:
> compare `max(1, 2, float('nan'))` with `max(float('nan'), 1, 2)`.
>
> The real behaviour of max() is to return the first argument that is not
> exceeded by any that follow, so:
>
> >>> max(nan, nan2, 1, 2) is nan
> True
> >>> max(nan2, nan, 1, 2) is nan2
> True
>
> As a definition, that is not as easy to understand as "return the largest
> argument". The behaviour is because in Python, x>nan is False. This choice,
> which is often sensible, makes the set of float values less than totally
> ordered. It seems to me to be an error in principle to apply a function
> whose simple definition assumes a total ordering, to a set that cannot be
> ordered. So most natural to me would be to raise an error for this class of
> function.
>
> Meanwhile, functions that have a purely arithmetic definition most
> naturally return nan. Are there any other classes of function than
> comparison or arithmetic? Counting, perhaps or is that comparison again?
>
> Proposals for a general solution, especially if based on a replacement
> value, are more a question of how you would like to pre-filter your set. An
> API could offer some filters, or it may be clearer left to the caller. It
> is no doubt too late to alter the default behaviour of familiar functions,
> but there could be a "strict" mode.
>
> --
>
> Jeff Allen
>
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/FQNZLNISKHV74CYJMU2HPG5273VMWXUK/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

--
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
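The order dependence of max() described in the quoted message can be reproduced in a few lines. The strict_max helper is a hypothetical sketch of the "strict" mode idea, assuming real-number inputs; it is not an existing builtin or statistics function:

```python
import math

nan = float("nan")

# Every comparison with NaN is False, so the result depends on NaN's position.
first = max(nan, 1, 2)  # nothing "exceeds" NaN, so NaN is returned
last = max(1, 2, nan)   # nan > 2 is False, so 2 is kept


def strict_max(values):
    """Like max(), but raise ValueError if any value is NaN (hypothetical helper)."""
    values = list(values)
    if any(math.isnan(x) for x in values):
        raise ValueError("NaN encountered in strict_max()")
    return max(values)
```

Raising here matches the suggestion above for comparison-based functions; an arithmetic function such as mean would more naturally return NaN instead.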
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/7CQK5AOT4L5IN4YVVSG7JONAQQCHN6CO/
Code of Conduct: http://python.org/psf/codeofconduct/