[Python-ideas] Re: Fix statistics.median()?

Andrew Barnert via Python-ideas Sun, 29 Dec 2019 17:18:28 -0800

On Dec 29, 2019, at 16:08, David Mertz wrote:
> 
> * There is absolutely no need to lose any efficiency by making the statistics 
> functions more friendly.  All we need is an optional parameter whose spelling 
> I've suggested as `on_nan` (but bikeshed freely).  Under at least one value 
> of that parameter, we can keep EXACTLY the current implementation, with all 
> its warts and virtues as-is.  Maybe a spelling for that option could be 
> 'unsafe' or 'fast'?


This seems like the right way to go to me.

However, rather than coming up with appropriately-general implementations of 
each of these things, wouldn’t taking a key function to pass through to sorted 
be simpler for some of them? In particular, coming up with a total_order 
function that works for all valid number-like types is difficult; letting the 
user pass key=math.total_order (if math grew such a function), or 
cmp_to_key(decimal.Decimal.compare_total), or the same thing with 
partial(decimal.Decimal.compare_total, context=my_context), or whatever is 
appropriate is a lot simpler and a lot more flexible. (compare_total is a 
three-way comparison rather than a key function, hence the cmp_to_key.) Anyone 
who knows that’s what they want should know how to pass it.

Plus, finding the median_low or median_high with a key function actually seems 
useful even without NaNs. “Find the median employee by salary” doesn’t seem 
like a meaningless operation.
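
A minimal sketch of what I mean, assuming helper names median_low_by and 
median_high_by (neither exists in statistics; the real median_low and 
median_high take no key argument). Employee is just example data.

```python
from collections import namedtuple

Employee = namedtuple("Employee", ["name", "salary"])


def median_low_by(data, key=None):
    """Hypothetical median_low that forwards key to sorted()."""
    ordered = sorted(data, key=key)
    if not ordered:
        raise ValueError("no median for empty data")
    # Middle element for odd lengths, lower of the two middles for even ones.
    return ordered[(len(ordered) - 1) // 2]


def median_high_by(data, key=None):
    """Hypothetical median_high that forwards key to sorted()."""
    ordered = sorted(data, key=key)
    if not ordered:
        raise ValueError("no median for empty data")
    # Middle element for odd lengths, upper of the two middles for even ones.
    return ordered[len(ordered) // 2]


staff = [
    Employee("Ann", 50000),
    Employee("Bob", 42000),
    Employee("Cyd", 61000),
    Employee("Dee", 58000),
]

low = median_low_by(staff, key=lambda e: e.salary)
high = median_high_by(staff, key=lambda e: e.salary)
```

Because the result is always one of the original items, this works for 
objects that have no meaningful average, which is exactly why low/high are 
the variants where a key makes sense.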

A key function could also take care of raise, but not ignore or poison, and at 
least ignore seems like it’s needed. So your API still makes sense otherwise. 
(But, while we’re painting the shed, maybe enum values instead of bare strings? 
They could be StrEnum values where FAST.value == 'fast' for people who are used 
to Pandas, I suppose.)
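
Something like the following, as a sketch only. The class name OnNan and the 
exact set of members are my guesses at a spelling, not a settled API. I've 
used the str/Enum mixin because it works without a stdlib StrEnum.

```python
from enum import Enum


class OnNan(str, Enum):
    """Hypothetical option values; each member also compares equal to its string."""

    FAST = "fast"
    RAISE = "raise"
    IGNORE = "ignore"
    POISON = "poison"


def normalize_on_nan(value):
    """Accept either a member or its bare string; reject anything else."""
    # Enum lookup by value raises ValueError for unknown spellings.
    return OnNan(value)
```

So a caller can write on_nan=OnNan.IGNORE or on_nan='ignore' and the function 
sees the same member either way, while a typo like 'ignroe' fails loudly.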

Maybe the is_nan function could also be a parameter, like the key function. By 
default it’s just the method with a fallback to math or cmath (or it’s just the 
method, and float and complex add those methods, or it’s a new function that 
calls a new protocol method, or whatever). That doesn’t work for every possible 
type that might otherwise work with statistics, but if you have some other 
type, or want some other unusual but sensible behavior (e.g., you’re the one guy 
who actually needs to ignore qNaNs but raise early on sNaNs), you can write it 
and pass it. I’m still not convinced anyone will ever want anything other than 
the method/math/cmath version, but if they do, I think they’d know it and be 
fine with passing it in explicitly.
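
Roughly, the default could look like this. It is a sketch under the "method 
with a fallback to math or cmath" option; default_is_nan and 
snan_raising_is_nan are names I'm inventing here, and the second one is the 
"one guy" case for Decimal.

```python
import cmath
import math
from decimal import Decimal
from fractions import Fraction


def default_is_nan(x):
    """Hypothetical default: prefer an is_nan() method, else math/cmath."""
    method = getattr(x, "is_nan", None)
    if callable(method):
        return bool(method())  # e.g. decimal.Decimal.is_nan
    if isinstance(x, complex):
        return cmath.isnan(x)
    return math.isnan(x)  # floats, ints, Fractions, anything with __float__


def snan_raising_is_nan(x):
    """Custom variant: report quiet NaNs, but raise early on signaling ones."""
    if isinstance(x, Decimal) and x.is_snan():
        raise ValueError("signaling NaN in data")
    return default_is_nan(x)
```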

As for your implementation, I don’t think anything but ignore needs to 
preprocess things. Raise can just pass a key function that raises on NaN to 
sorted. Poison can do the same but handle the exception by returning NaN. Who 
cares that it might take slightly longer to hit the first NaN that way than by 
doing an extra pass, if it’s simpler and slightly faster for the 
non-exceptional case?
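
Here is a minimal sketch of that shape, not a proposed patch. median_sketch, 
the _NanFound exception, and the bare-string option values are all 
placeholders of mine; only 'ignore' does a separate filtering step, while 
'raise' and 'poison' share a key function that raises.

```python
import math


class _NanFound(Exception):
    """Private signal raised by the key function when it sees a NaN."""


def _raising_key(is_nan):
    def key(x):
        if is_nan(x):
            raise _NanFound(x)
        return x
    return key


def median_sketch(data, on_nan="fast", is_nan=math.isnan):
    """Hypothetical median with an on_nan option ('fast', 'ignore', 'raise', 'poison')."""
    if on_nan == "fast":
        ordered = sorted(data)  # same as today, warts and all
    elif on_nan == "ignore":
        ordered = sorted(x for x in data if not is_nan(x))  # the only preprocessing
    elif on_nan in ("raise", "poison"):
        try:
            ordered = sorted(data, key=_raising_key(is_nan))
        except _NanFound:
            if on_nan == "raise":
                raise ValueError("NaN in data") from None
            return math.nan  # poison: the NaN propagates to the result
    else:
        raise ValueError(f"unknown on_nan value: {on_nan!r}")

    n = len(ordered)
    if n == 0:
        raise ValueError("no median for empty data")
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2
```

The NaN check rides along with the key calls that sorted makes anyway, so 
there's no separate scan of the data before sorting.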
