Actually, I wouldn't mind passing a key function to _median(), but that is way too advanced for beginner users to have to think about. So maybe median() could call _median() internally where needed, while the underscore version also exists for those who want it.
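A rough sketch of the split I have in mind, not a proposal for the actual implementation. The signature of `_median()`, the helper `_nans_last()`, and the choice of returning the low median (to keep the example short) are all assumptions made up for illustration:

```python
import math


def _nans_last(x):
    """Hypothetical sort key: order every NaN after all ordinary numbers."""
    return (math.isnan(x), x)


def _median(data, key=None):
    """Hypothetical advanced helper: low median of data, ordered by key."""
    ordered = sorted(data, key=key)
    if not ordered:
        raise ValueError("no median for empty data")
    # Index of the lower of the two middle items when len is even.
    return ordered[(len(ordered) - 1) // 2]


def median(data):
    """Beginner-facing version: no key parameter to think about."""
    data = list(data)
    # Only reach for the key-based path where it is needed.
    if any(math.isnan(x) for x in data):
        return _median(data, key=_nans_last)
    return _median(data)


# "Find the median employee by salary" with the underscore version.
employees = [("ann", 30), ("bob", 10), ("cat", 20)]
median_employee = _median(employees, key=lambda e: e[1])
```

Beginners only ever see `median(data)`; anyone who knows they want a particular ordering can call `_median()` with their own key.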
On Sun, Dec 29, 2019 at 8:14 PM Andrew Barnert <abarn...@yahoo.com> wrote:

> On Dec 29, 2019, at 16:08, David Mertz <me...@gnosis.cx> wrote:
> >
> > * There is absolutely no need to lose any efficiency by making the statistics functions more friendly. All we need is an optional parameter whose spelling I've suggested as `on_nan` (but bikeshed freely). Under at least one value of that parameter, we can keep EXACTLY the current implementation, with all its warts and virtues as-is. Maybe a spelling for that option could be 'unsafe' or 'fast'?
>
> This seems like the right way to go to me.
>
> However, rather than coming up with appropriately-general implementations of each of these things, wouldn’t taking a key function to pass through to sorted be simpler for some? In particular, coming up with a total_order function that works for all valid number-like types is difficult; letting the user pass key=math.total_order or decimal.Decimal.compare_total or partial(decimal.Decimal.compare_total, context=my_context) or whatever is appropriate is a lot simpler and a lot more flexible. Anyone who knows that’s what they want should know how to pass it.
>
> Plus, finding the median_low or _high with a key function actually seems useful even without NaNs. “Find the median employee by salary” doesn’t seem like a meaningless operation.
>
> A key function could also take care of raise, but not ignore or poison, and at least ignore seems like it’s needed. So your API still makes sense otherwise. (But, while we’re painting the shed, maybe enum values instead of bare strings? They could be StrEnum values where FAST.value == 'fast' for people who are used to Pandas, I suppose.)
>
> Maybe the is_nan function could also be a parameter, like the key function. By default it’s just the method with a fallback to math or cmath (or it’s just the method, and float and complex add those methods, or it’s a new function that calls a new protocol method, or whatever). That doesn’t work for every possible type that might otherwise work with statistics, but if you have some other type, or want some other unusual but sensible behavior (e.g., you’re the one guy who actually needs to ignore qNaNs but raise early on sNaNs), you can write it and pass it. I’m still not convinced anyone will ever want anything other than the method/math/cmath version, but if they do, I think they’d know it and be fine with passing it in explicitly.
>
> As far as your implementation, I don’t think anything but ignore needs to preprocess things. Raise can just pass a key function that raises on NaN to sorted. Poison can do the same but handle the exception by returning NaN. Who cares that it might take slightly longer to hit the first NaN that way than by doing an extra pass, if it’s simpler and slightly faster for the non-exceptional case?
>

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
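For reference, Andrew's suggestion that raise and poison need no preprocessing pass might look roughly like the sketch below. The function name `median_low_sketch`, the exception class `_NaNFound`, and the use of `ValueError` are assumptions made up for illustration; only the `on_nan` parameter and the 'raise' and 'poison' behaviors come from the thread.

```python
import math


class _NaNFound(Exception):
    """Hypothetical internal signal raised by the key function."""


def _raise_on_nan(x):
    """Key function that refuses NaN and otherwise sorts by the value."""
    if math.isnan(x):
        raise _NaNFound(x)
    return x


def median_low_sketch(data, on_nan="raise"):
    """Low median where NaN handling is done entirely by the sort key."""
    if on_nan not in ("raise", "poison"):
        raise ValueError(f"unsupported on_nan value: {on_nan!r}")
    try:
        ordered = sorted(data, key=_raise_on_nan)
    except _NaNFound:
        if on_nan == "poison":
            # Poison: a NaN anywhere in the data makes the result NaN.
            return math.nan
        raise ValueError("NaN found in data") from None
    if not ordered:
        raise ValueError("no median for empty data")
    return ordered[(len(ordered) - 1) // 2]
```

The 'ignore' behavior is deliberately left out, since that is the one case that would still need to filter the data first.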