On Wed, 9 Jan 2019 at 05:20, Steven D'Aprano <st...@pearwood.info> wrote:
>
> On Mon, Jan 07, 2019 at 11:27:22AM +1100, Steven D'Aprano wrote:
>
> > [...]
> > I propose adding a "nan_policy" keyword-only parameter to the relevant
> > statistics functions (mean, median, variance etc), and defining the
> > following policies:
>
> I asked some heavy users of statistics software (not just Python users)
> what behaviour they would find useful, and as I feared, I got no
> conclusive answer. So far, the answers seem to be almost evenly split
> into four camps:
>
> - don't do anything, it is the caller's responsibility to filter NANs;
>
> - raise an immediate error;
>
> - return a NAN;
>
> - treat them as missing data.

I would prefer to raise an exception on nan. It's much easier to debug
an exception than a nan.

Take a look at the Julia docs for their statistics module:
https://docs.julialang.org/en/v1/stdlib/Statistics/index.html

In Julia they have defined an explicit "missing" value. With that you can
explicitly distinguish between a calculation error and missing data.
The obvious Python equivalent would be None.

> On consideration of all the views expressed, thank you to everyone who
> commented, I'm now inclined to default to returning a NAN (which happens
> to be the current behaviour of mean etc, but not median except by
> accident) even if it impacts performance.

Whichever way you go with this, it might make sense to provide helper
functions for users to deal with nans, e.g.:

    xbar = mean(without_nans(data))
    xbar = mode(replace_nans_with_None(data))

--
Oscar
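
For illustration, the two helpers named in this message might look roughly
like the sketch below. Neither function exists in the statistics module;
the names without_nans and replace_nans_with_None are taken from the
message, while is_nan is a hypothetical helper added here. The sketch
assumes only float nans matter (Decimal nans are not handled).

```python
import math
from statistics import mean, mode


def is_nan(x):
    # Hypothetical helper: only float nans are detected in this sketch.
    return isinstance(x, float) and math.isnan(x)


def without_nans(data):
    # Drop nans, i.e. treat them as values to be filtered out.
    return [x for x in data if not is_nan(x)]


def replace_nans_with_None(data):
    # Map every nan to None so all missing entries compare equal.
    return [None if is_nan(x) else x for x in data]


data = [1.0, float("nan"), 3.0]
xbar = mean(without_nans(data))

votes = [1.0, float("nan"), 1.0, float("nan"), float("nan")]
most_common = mode(replace_nans_with_None(votes))
```

Replacing nans with None matters for mode because separate nan objects
do not compare equal to each other, so they would otherwise be counted
as distinct values.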