[Python-ideas] Re: Fix statistics.median()?

Andrew Barnert via Python-ideas Sun, 29 Dec 2019 18:26:51 -0800

On Dec 29, 2019, at 17:30, David Mertz <[email protected]> wrote:
> 
> 
>> On Sun, Dec 29, 2019 at 8:14 PM Andrew Barnert <[email protected]> wrote:
>> On Dec 29, 2019, at 16:08, David Mertz <[email protected]> wrote:
>> > 
>> > * There is absolutely no need to lose any efficiency by making the 
>> > statistics functions more friendly.  All we need is an optional parameter 
>> > whose spelling I've suggested as `on_nan` (but bikeshed freely).  Under at 
>> > least one value of that parameter, we can keep EXACTLY the current 
>> > implementation, with all its warts and virtues as-is.  Maybe a spelling 
>> > for that option could be 'unsafe' or 'fast'?
>> 
>> This seems like the right way to go to me.
>> However, rather than coming up with appropriately-general implementations of 
>> each of these things, wouldn’t taking a key function to pass through to 
>> sorted be simpler for some?
> 
> No, it wouldn't.  That cannot deal with the 'raise' or 'poison' behaviors.


I already said the same thing in the very next paragraph: that you’d still want 
your API on top of the key parameter.

> Moreover, coming up with statistics.is_nan() and statistics.total_order() 
> really isn't hard.  Chris Barker already provided a pretty good is_nan() that 
> we could use.  
> 
>> In particular, coming up with a total_order function that works for all 
>> valid number-like types is difficult; letting the user pass 
>> key=math.total_order or decimal.Decimal.compare_total or 
>> partial(decimal.Decimal.compare_total, context=my_context) or whatever is 
>> appropriate is a lot simpler and a lot more flexible. Anyone who knows 
>> that’s what they want should know how to pass it.
> 
> Here it is. I could save a line by not using the 'else'.
> 
> def total_order(x):
>     if is_nan(x):
>         return (math.copysign(1, x),)
>     else:
>         return (0, x)

This doesn’t give you IEEE total order. Under what circumstances would you 
prefer this to, say, Decimal.compare_total, which does?
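For reference, decimal.Decimal.compare_total is a two-argument comparison method. It returns Decimal('-1'), Decimal('0') or Decimal('1'), so it is not a one-argument key, and passing it as key= needs functools.cmp_to_key. The minimal sketch below shows that wrapping; the variable names are only illustrative. It also shows the IEEE-style total order you get: negative NaNs first, positive NaNs last, -0 before 0, and 12.0 before 12. A key that only pushes NaNs to the ends treats each of those last two pairs as equal.

```python
from decimal import Decimal
from functools import cmp_to_key

# Decimal.compare_total(a, b) returns Decimal('-1'), Decimal('0') or
# Decimal('1'), i.e. it is a cmp-style function, so wrap it for key=.
total_order_key = cmp_to_key(Decimal.compare_total)

values = [
    Decimal("NaN"), Decimal("1"), Decimal("-0"), Decimal("0"),
    Decimal("-Infinity"), Decimal("-NaN"), Decimal("12.0"), Decimal("12"),
]
ordered = sorted(values, key=total_order_key)
print([str(d) for d in ordered])
```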

I suppose there may be _some_ users who really need a total order that’s sort 
of like the IEEE one but don’t actually need it to be the same, and also really 
need it to be general enough to work with Decimal or some other type that can 
convert to float (including overflowing properly for copysign to work) and know 
their type can do so, but who don’t know how to write that key function 
themselves. But are there enough such users that it's worth catering to them?
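For plain floats, a key like that takes only a few lines to write yourself, assuming floats are IEEE 754 binary64 (as in CPython on mainstream platforms). The sketch below uses a well-known bit trick and nothing from the statistics module. It reinterprets the double as a signed 64-bit integer and flips the non-sign bits of negative values, so that ordinary integer comparison matches IEEE 754 totalOrder. The name float_total_order_key is hypothetical. The sketch only handles float, not Decimal or other number-like types.

```python
import math
import struct


def float_total_order_key(x: float) -> int:
    """Map a float to an int whose natural order is IEEE 754 totalOrder.

    Assumes floats are IEEE 754 binary64 (hypothetical helper, floats only).
    """
    # Reinterpret the 64 bits of the double as a signed integer.
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative floats (sign bit set) must sort in reverse of their magnitude
    # bits, so flip the 63 non-sign bits; non-negative floats are in order.
    return bits ^ ((bits >> 63) & 0x7FFF_FFFF_FFFF_FFFF)


neg_nan = math.copysign(math.nan, -1.0)
pos_nan = math.copysign(math.nan, 1.0)
data = [1.0, pos_nan, -0.0, math.inf, 0.0, -math.inf, neg_nan]
print(sorted(data, key=float_total_order_key))
```

Unlike a key that only moves NaNs to the ends, this one also puts -0.0 before 0.0 and orders NaNs by sign and payload.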

>> Plus, finding the median_low or _high, with a key function actually seems 
>> useful even without NaNs. “Find the median employee by salary” doesn’t seem 
>> like a meaningless operation.
> 
> I'll give you that. That could be useful.  However, I feel like that's too 
> much for the module itself.  It's easy to write it yourself with no function.
> 
>     median_employee = statistics.median_low((e.salary, e) for e in employees)[1]

By the same argument, nothing actually needs a key function because you can 
always decorate-sort-undecorate. And yet, key functions are useful in all kinds 
of places.
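Here is a minimal sketch that makes the key-function versus DSU comparison concrete. It puts a hypothetical median_low_by helper (not a statistics API) next to the tuple-based DSU spelling. The Employee class and its data are made up for illustration. The key version only ever compares salaries. The DSU version falls back to comparing Employee objects when salaries tie, and raises TypeError when they define no ordering. (Plain statistics.median would also fail on an even number of tuples, since it tries to average the two middle items.)

```python
import statistics
from dataclasses import dataclass
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def median_low_by(data: Iterable[T], key: Callable[[T], object]) -> T:
    """Hypothetical helper: the low median item of data, ordered by key."""
    items = sorted(data, key=key)  # stable; compares keys only
    if not items:
        raise statistics.StatisticsError("no median for empty data")
    # Same index statistics.median_low uses for odd and even lengths.
    return items[(len(items) - 1) // 2]


@dataclass
class Employee:  # made-up example type; defines no ordering
    name: str
    salary: int


staff = [Employee("ann", 70), Employee("bob", 50),
         Employee("cy", 70), Employee("di", 60)]
print(median_low_by(staff, key=lambda e: e.salary).name)

# The tuple-based DSU spelling breaks on ties: the two 70s force a
# comparison between Employee objects, which raises TypeError.
try:
    statistics.median_low((e.salary, e) for e in staff)
except TypeError as exc:
    print("DSU failed:", exc)
```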

Likewise, it’s even easier to write ignore-nan yourself than to write the DSU 
yourself:

    median = statistics.median(x for x in xs if not math.isnan(x))

… and yet this whole proposal is still useful, isn’t it?

So, why isn’t adding a key parameter (as well as an on_nan that takes 
fast/ignore/raise/poison) to median useful?
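One way the two parameters could compose is sketched below as a hypothetical median_low_ext function, not the actual statistics API. NaN handling runs first according to on_nan, and then the remaining items are ordered by key. The policy names follow the thread. Three details are assumptions of this sketch: detecting NaN with x != x, returning the first NaN for 'poison', and applying the check to the raw items rather than to their keys. With key=None the function delegates to statistics.median_low, so the existing fast path is untouched.

```python
import statistics
from typing import Any, Callable, Iterable, Optional

# Policy names taken from the thread's on_nan proposal.
_POLICIES = ("fast", "ignore", "raise", "poison")


def _is_nan(x: Any) -> bool:
    # Assumption: NaN is detected with the x != x idiom, which is True for
    # float NaNs and quiet Decimal NaNs and False for ordinary objects.
    return x != x


def median_low_ext(
    data: Iterable[Any],
    *,
    key: Optional[Callable[[Any], Any]] = None,
    on_nan: str = "fast",
) -> Any:
    """Hypothetical median_low with a key function and an on_nan policy."""
    if on_nan not in _POLICIES:
        raise ValueError(f"unknown on_nan policy: {on_nan!r}")
    items = list(data)
    if on_nan != "fast":
        nans = [x for x in items if _is_nan(x)]
        if nans and on_nan == "raise":
            raise ValueError("NaN found in data")
        if nans and on_nan == "poison":
            return nans[0]  # propagate the first NaN seen
        if on_nan == "ignore":
            items = [x for x in items if not _is_nan(x)]
    if key is None:
        # No key: defer to the current stdlib implementation unchanged.
        return statistics.median_low(items)
    if not items:
        raise statistics.StatisticsError("no median for empty data")
    items.sort(key=key)  # only the keys are compared
    # Same index statistics.median_low picks for odd and even lengths.
    return items[(len(items) - 1) // 2]
```

Checking the raw items keeps the NaN test meaningful even when key wraps a comparison (for example a cmp_to_key object), whose results cannot themselves be tested for NaN.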

_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/TYFCFKHUBTKIFLUMFXUZIZ2NDCFPR7JX/
Code of Conduct: http://python.org/psf/codeofconduct/
