Sorry for all these posts, and maybe someone has mentioned this already, but this may be a good time to consider a new algorithm anyway:
https://rcoh.me/posts/linear-time-median-finding/

Doing the NaN check inline might also be faster than pre-filtering.

-CHB

On Sun, Dec 29, 2019 at 4:39 PM Christopher Barker <python...@gmail.com> wrote:

> On Sun, Dec 29, 2019 at 4:05 PM Christopher Barker <python...@gmail.com> wrote:
>
>>> You mean performance?
>>
>> Sure, but as I've argued before (no idea if anyone agrees with me), the statistics package is already not a high-performance package anyway. If it turns out that it slows it down by, say, a factor of two or more, then yes, maybe we need to forget it.
>
> You never know 'till you profile, so I did a quick experiment. Adding a NaN filter is substantial overhead.
>
> This is for a list of 10,000 random floats (no NaNs in there, but the check is made by pre-filtering with a generator expression):
>
> # this just calls statistics.median directly
> In [14]: %timeit plainmedian(lots_of_floats)
> 1.54 ms ± 12.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>
> # this filters with math.isnan()
> In [15]: %timeit nanmedianfloat(lots_of_floats)
> 3.5 ms ± 176 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
> # this filters with a complex NaN checker that works with most types and values: floats, Decimals, numpy scalars, ...
> In [16]: %timeit nanmedian(lots_of_floats)
> 13.5 ms ± 1.11 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
> So the simple math.isnan filter slows it down by a factor of about 2.3 (3.5 ms vs. 1.54 ms), which may be tolerable, and the full-featured isnan checker by a factor of almost nine (13.5 ms vs. 1.54 ms), which is pretty bad.
>
> I suspect that if the check were done more inline it could be a bit faster, and I'm sure the NaN-checking code could be better optimized, but this is a pretty big hit.
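For reference, here is a minimal sketch of the two pre-filtering strategies timed above. It is not the enclosed code, which isn't reproduced in this message: `filtered_median_isnan`, `filtered_median_generic` and `is_nan` are hypothetical stand-ins for `nanmedianfloat` and `nanmedian`, and `is_nan` swaps in the cheap `x != x` self-inequality test in place of a full-featured checker, assuming no element has an unusual `__eq__`.

```python
import functools
import math
import random
import statistics
import timeit
from decimal import Decimal


def is_nan(x):
    """Generic NaN test (assumption: a NaN is any value unequal to itself).

    This holds for float, quiet decimal.Decimal NaNs and numpy floating
    scalars; ints and Fractions are never flagged.
    """
    return x != x


def filtered_median_isnan(data):
    """Median of float data, skipping NaNs via a math.isnan() pre-filter."""
    return statistics.median(x for x in data if not math.isnan(x))


def filtered_median_generic(data):
    """Median of mixed numeric data, skipping anything is_nan() flags."""
    return statistics.median(x for x in data if not is_nan(x))


if __name__ == "__main__":
    # Rough re-creation of the comparison: 10,000 random floats, no NaNs.
    lots_of_floats = [random.random() for _ in range(10_000)]
    for func in (statistics.median, filtered_median_isnan, filtered_median_generic):
        per_call = timeit.timeit(functools.partial(func, lots_of_floats), number=20) / 20
        print(f"{func.__name__}: {per_call * 1e3:.2f} ms")
    # The generic check also works on Decimal data.
    print(filtered_median_generic([Decimal("1"), Decimal("NaN"), Decimal("3")]))
```

Timings from this harness will not match the numbers above, since the generic check here is a single comparison rather than a type-aware checker.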
>
> Note that numpy has a number of "nan*" functions, NaN-aware versions that treat NaN as a missing value (including nanquantile). We could take a similar route and add new names, or a flag to enable or disable NaN checking.
>
> Code enclosed
>
> - CHB
>
> --
> Christopher Barker, PhD
>
> Python Language Consulting
> - Teaching
> - Scientific Software Development
> - Desktop GUI and Web Development
> - wxPython, numpy, scipy, Cython
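A minimal sketch of how a selection algorithm, an inline NaN check and a numpy-style switch could fit together. It uses randomized quickselect, which runs in expected linear time (median-of-medians pivot selection would be needed for a worst-case linear bound). The names `quickselect_median`, `_select` and the `nan_policy` flag are hypothetical, not part of the statistics module; the flag's values simply mirror the `nan_policy` argument used by many `scipy.stats` functions.

```python
import random


def _select(values, k):
    """Return the k-th smallest item (0-based) of values via randomized quickselect.

    Each round partitions around a random pivot and keeps only the part that
    contains position k, giving expected O(n) time (worst case O(n**2)).
    """
    while True:
        pivot = random.choice(values)
        lows = [x for x in values if x < pivot]
        n_pivots = sum(1 for x in values if x == pivot)
        if k < len(lows):
            values = lows
        elif k < len(lows) + n_pivots:
            return pivot
        else:
            k -= len(lows) + n_pivots
            values = [x for x in values if x > pivot]


def quickselect_median(data, nan_policy="omit"):
    """Median of data, with NaN handling chosen by the hypothetical nan_policy flag.

    "omit":      skip NaNs while copying the data (one inline pass, no pre-filter)
    "propagate": return the first NaN encountered
    "raise":     raise ValueError on the first NaN
    """
    if nan_policy not in ("omit", "propagate", "raise"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    values = []
    for x in data:
        if x != x:  # inline NaN check: NaNs compare unequal to themselves
            if nan_policy == "omit":
                continue
            if nan_policy == "propagate":
                return x
            raise ValueError("NaN found in data")
        values.append(x)
    n = len(values)
    if n == 0:
        raise ValueError("no median for empty data")
    mid = n // 2
    if n % 2 == 1:
        return _select(values, mid)
    # Even length: average the two middle values, as statistics.median does.
    return (_select(values, mid - 1) + _select(values, mid)) / 2
```

Whether this actually beats statistics.median would need the same kind of profiling: the partitioning here runs as Python-level list comprehensions, while statistics.median sorts with the C-implemented sorted(), so for lists of around 10,000 items the O(n log n) sort may still come out ahead.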
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/NYQUE2EVEZGD6P2WNDPI2FWGYQCOLSK7/