[Python-ideas] Re: Fix statistics.median()?

David Mertz Sun, 29 Dec 2019 17:48:40 -0800

Actually, I wouldn't mind passing a key function to _median(), but that is
way too advanced for beginner users to have to think about.  So maybe
median() could call _median() internally where needed, while the underscore
version would also exist for direct use.
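
As a rough sketch of that split, using the low median for simplicity: the
helper name `_median_low()` and its `key` parameter are hypothetical, not
anything the statistics module provides today, and the employee data is
made up for illustration.

```python
def _median_low(data, key=None):
    """Hypothetical helper: low median of data, ordered by key."""
    ordered = sorted(data, key=key)
    if not ordered:
        raise ValueError("no median for empty data")
    # For an even count, (n - 1) // 2 picks the lower of the two middle items.
    return ordered[(len(ordered) - 1) // 2]


def median_low(data):
    """Beginner-facing wrapper: no key function to think about."""
    return _median_low(data)


# "Find the median employee by salary" (illustrative data only)
employees = [("Ann", 52000), ("Bob", 48000), ("Cy", 61000), ("Di", 75000)]
median_employee = _median_low(employees, key=lambda e: e[1])
```

The public function keeps its one-argument signature, so nothing changes
for beginners; only callers who reach for the underscore version see `key`.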


On Sun, Dec 29, 2019 at 8:14 PM Andrew Barnert <[email protected]> wrote:

> On Dec 29, 2019, at 16:08, David Mertz <[email protected]> wrote:
> >
> > * There is absolutely no need to lose any efficiency by making the
> statistics functions more friendly.  All we need is an optional parameter
> whose spelling I've suggested as `on_nan` (but bikeshed freely).  Under at
> least one value of that parameter, we can keep EXACTLY the current
> implementation, with all its warts and virtues as-is.  Maybe a spelling for
> that option could be 'unsafe' or 'fast'?
>
> This seems like the right way to go to me.
>
> However, rather than coming up with appropriately-general implementations
> of each of these things, wouldn’t taking a key function to pass through to
> sorted be simpler for some? In particular, coming up with a total_order
> function that works for all valid number-like types is difficult; letting
> the user pass key=math.total_order or decimal.Decimal.compare_total or
> partial(decimal.Decimal.compare_total, context=my_context) or whatever is
> appropriate is a lot simpler and a lot more flexible. Anyone who knows
> that’s what they want should know how to pass it.
>
> Plus, finding the median_low or _high, with a key function actually seems
> useful even without NaNs. “Find the median employee by salary” doesn’t seem
> like a meaningless operation.
>
> A key function could also take care of raise, but not ignore or poison,
> and at least ignore seems like it’s needed. So your API still makes sense
> otherwise. (But, while we’re painting the shed, maybe enum values instead
> of bare strings? They could be StrEnum values where FAST.value == 'fast'
> for people who are used to Pandas, I suppose.)
>
> Maybe the is_nan function could also be a parameter, like the key
> function. By default it’s just the method with a fallback to math or cmath
> (or it’s just the method, and float and complex add those methods, or it’s
> a new function that calls a new protocol method, or whatever). That doesn’t
> work for every possible type that might otherwise work with statistics, but
> if you have some other type—or want some other unusual but sensible
> behavior (e.g., you’re the one guy who actually needs to ignore qNaNs but
> raise early on sNaNs), you can write it and pass it. I’m still not
> convinced anyone will ever want anything other than the method/math/cmath
> version, but if they do, I think they’d know it and be fine with passing it
> in explicitly.
>
> As far as your implementation, I don’t think anything but ignore needs to
> preprocess things. Raise can just pass a key function that raises on NaN to
> sorted. Poison can do the same but handle the exception by returning NaN.
> Who cares that it might take slightly longer to hit the first NaN that way
> than by doing an extra pass, if it’s simpler and slightly faster for the
> non-exceptional case?
>
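
The dispatch described in the quoted message could look roughly like the
sketch below. It is only an illustration under several assumptions: the
`on_nan` spellings ('fast', 'ignore', 'raise', 'poison') are the ones
floated in this thread, the `is_nan` parameter and the `_NaNFound`
exception are hypothetical names, the default check is `math.isnan` (so it
assumes real-valued data), and the low median stands in for the whole
family of functions.

```python
import math


class _NaNFound(Exception):
    """Hypothetical internal signal: the key function saw a NaN."""


def median_low(data, on_nan="fast", is_nan=math.isnan):
    def checked(x):
        # Key function for sorted(): passes x through, but refuses NaN.
        if is_nan(x):
            raise _NaNFound
        return x

    if on_nan == "fast":
        # Plain sort, no NaN check at all.
        ordered = sorted(data)
    elif on_nan == "ignore":
        # The only mode that needs a preprocessing pass.
        ordered = sorted(x for x in data if not is_nan(x))
    elif on_nan in ("raise", "poison"):
        try:
            ordered = sorted(data, key=checked)
        except _NaNFound:
            if on_nan == "poison":
                return math.nan
            raise ValueError("NaN in data") from None
    else:
        raise ValueError(f"unknown on_nan value: {on_nan!r}")

    if not ordered:
        raise ValueError("no median for empty data")
    return ordered[(len(ordered) - 1) // 2]
```

Here 'raise' and 'poison' share one code path and differ only in how the
exception is handled, while 'fast' never calls `is_nan` at all.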


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
