Sorry for all these posts, and maybe someone has mentioned this already, but
maybe this is the time to consider a new algorithm anyway:

https://rcoh.me/posts/linear-time-median-finding/

And doing the NaN-check inline might be faster than pre-filtering.
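
To make that concrete, here is a minimal sketch of a quickselect-based median
with the NaN check folded into the partition loop. It is only an illustration
of the idea from the linked post, not a proposed patch: the names
quickselect() and median_quickselect() are made up for this example, and this
version raises on NaN rather than skipping it (skipping would need the count
of non-NaN values before the target rank is known).

```python
import random


def quickselect(items, k):
    """Return the k-th smallest element (0-based) of a non-empty list.

    Expected linear time. Raises ValueError as soon as a NaN is seen.
    """
    while True:
        pivot = random.choice(items)
        lows, pivots, highs = [], [], []
        for x in items:
            # Inline NaN check: NaN is the only value not equal to itself.
            if x != x:
                raise ValueError("NaN in data")
            if x < pivot:
                lows.append(x)
            elif x > pivot:
                highs.append(x)
            else:
                pivots.append(x)
        if k < len(lows):
            items = lows
        elif k < len(lows) + len(pivots):
            return pivot
        else:
            k -= len(lows) + len(pivots)
            items = highs


def median_quickselect(data):
    """Median via quickselect (hypothetical helper, not in the stdlib)."""
    values = list(data)
    n = len(values)
    if n == 0:
        raise ValueError("no median for empty data")
    if n % 2 == 1:
        return quickselect(values, n // 2)
    # Even length: average the two middle values, as statistics.median does.
    return (quickselect(values, n // 2 - 1) + quickselect(values, n // 2)) / 2
```

The first pass touches every element, so any NaN is caught there at no extra
cost beyond one comparison per item. Whether this beats sorted() in pure
Python is something only a profile can tell.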

-CHB


On Sun, Dec 29, 2019 at 4:39 PM Christopher Barker <python...@gmail.com>
wrote:

> On Sun, Dec 29, 2019 at 4:05 PM Christopher Barker <python...@gmail.com>
> wrote:
>
>>
>>> You mean performance? Sure, but as I've argued before (no idea if anyone
>> agrees with me) the statistics package is already not a high performance
>> package anyway. If it turns out that it slows it down by, say, a factor of
>> two or more, then yes, maybe we need to forget it.
>>
>
> You never know 'til you profile, so I did a quick experiment -- adding a
> NaN filter adds substantial overhead:
>
> This is for a list of 10,000 random floats (no NaNs in there, but the
> check is made by pre-filtering with a generator expression)
>
> # this just calls statistics.median directly
> In [14]: %timeit plainmedian(lots_of_floats)
>
> 1.54 ms ± 12.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>
> # this filters with math.isnan()
> In [15]: %timeit nanmedianfloat(lots_of_floats)
>
> 3.5 ms ± 176 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
> # this filters with a complex NaN-checker that works with most types and
> # values: floats, Decimals, numpy scalars, ...
> In [16]: %timeit nanmedian(lots_of_floats)
>
> 13.5 ms ± 1.11 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
> So the simple math.isnan filter slows it down by a factor of a bit more
> than two -- maybe tolerable. And the full-featured isnan checker by almost
> a factor of nine -- that's pretty bad.
>
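
For anyone who wants to repeat the measurement, here is a rough
reconstruction of the three variants. The function names come from the
timings above, but the bodies are my guess at what such code looks like, and
is_nan() and benchmark() are hypothetical helpers written for this sketch.

```python
import math
import random
import statistics
import timeit
from decimal import Decimal


def plainmedian(data):
    # Baseline: just call statistics.median directly.
    return statistics.median(data)


def nanmedianfloat(data):
    # Pre-filter with math.isnan() in a generator expression.
    return statistics.median(x for x in data if not math.isnan(x))


def is_nan(x):
    """Hypothetical general NaN check for floats, Decimals, etc."""
    try:
        return x.is_nan()  # decimal.Decimal provides this method
    except AttributeError:
        try:
            return math.isnan(x)  # floats and anything float-convertible
        except TypeError:
            return False  # not a number at all, so not a NaN


def nanmedian(data):
    # Pre-filter with the general (slower) checker.
    return statistics.median(x for x in data if not is_nan(x))


def benchmark(n=10_000, number=100):
    """Return seconds per call for each variant on n random floats."""
    lots_of_floats = [random.random() for _ in range(n)]
    return {
        func.__name__: timeit.timeit(lambda: func(lots_of_floats), number=number) / number
        for func in (plainmedian, nanmedianfloat, nanmedian)
    }
```

Calling benchmark() returns a dict of per-call timings; the absolute numbers
will differ by machine and Python version, so only the ratios are worth
comparing.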
> I suspect if the check were done inline, it could be a bit faster, and I'm
> sure the nan-checking code could be better optimized, but this is a pretty
> big hit.
>
> Note that numpy has a number of "nan*" functions: nan-aware versions
> that treat NaN as missing values (including nanquantile) -- we could take a
> similar route, and have new names or a flag to disable or enable
> nan-checking.
>
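
To show what the two API options would look like from the caller's side,
here is a small sketch. Both median_with_flag() and its ignore_nan keyword
are invented for illustration; nothing like them exists in the statistics
module today, and nanmedian_alias() merely mimics the numpy naming style.

```python
import math
import statistics


def median_with_flag(data, *, ignore_nan=False):
    """Median with an opt-in NaN filter (hypothetical signature).

    With ignore_nan=False the data goes to statistics.median untouched,
    so callers who know their data is clean pay no filtering cost.
    """
    if ignore_nan:
        data = [x for x in data if not math.isnan(x)]
    return statistics.median(data)


def nanmedian_alias(data):
    """The "new name" option: a separate function that always filters."""
    return median_with_flag(data, ignore_nan=True)
```

Either way the default path stays as fast as it is now, and only callers who
ask for NaN handling pay for it.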
> Code enclosed
>
> - CHB
>
> --
> Christopher Barker, PhD
>
> Python Language Consulting
>   - Teaching
>   - Scientific Software Development
>   - Desktop GUI and Web Development
>   - wxPython, numpy, scipy, Cython
>


