[issue35775] Add a general selection function to statistics

Rémi Lapeyre Fri, 24 May 2019 08:36:04 -0700

Rémi Lapeyre <[email protected]> added the comment:

Hi Steven, thanks for taking the time to review my patch.


Regarding the relevance of adding select(), I was looking for work to do in 
the bug tracker and found some references to it 
(https://bugs.python.org/issue21592#msg219934 for example).

I knew that there are multiple definitions of percentiles but got sloppy in 
my previous response because I wanted to answer quickly. I will try not to do 
this again.


Regarding the use of sorting, I thought that sorting would be quicker than 
implementing a linear-time selection algorithm in pure Python, given the 
general performance of Timsort. Some tests in 
https://bugs.python.org/issue21592 agreed with that.
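
To make the comparison concrete, here is a minimal sketch of the two 
approaches. The names quickselect(), select_by_sorting() and time_both() are 
hypothetical and are not part of the patch or of the statistics module. The 
quickselect shown is a simple list-partitioning variant, not a tuned 
implementation.

```python
import random
import timeit


def quickselect(data, i):
    """Return the i-th smallest item (1-based) in expected linear time."""
    items = list(data)
    if not 1 <= i <= len(items):
        raise ValueError(f"i must be between 1 and {len(items)}")
    k = i - 1  # 0-based rank within the current candidate list
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows
        elif k < len(lows) + n_equal:
            return pivot
        else:
            k -= len(lows) + n_equal
            items = highs


def select_by_sorting(data, i):
    """Return the i-th smallest item (1-based) by sorting everything."""
    return sorted(data)[i - 1]


def time_both(n=100_000, repeat=3):
    """Return (sorting_seconds, quickselect_seconds) for a median lookup."""
    data = [random.random() for _ in range(n)]
    i = n // 2
    t_sort = min(timeit.repeat(lambda: select_by_sorting(data, i),
                               number=1, repeat=repeat))
    t_quick = min(timeit.repeat(lambda: quickselect(data, i),
                                number=1, repeat=repeat))
    return t_sort, t_quick
```

Calling time_both() on a given machine gives a rough idea of which approach 
wins there. The result depends on the input size, the data and the 
interpreter, so it is not a substitute for the measurements in the linked 
issue.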

For the iterator, I was thinking about how to implement percentiles when 
writing select() and thought that by writing:


from itertools import islice
from statistics import StatisticsError

def _select(data, i, key=None):
    # Sort once; the returned iterator starts at the i-th smallest item (1-based).
    if not len(data):
        raise StatisticsError("select requires at least one data point")
    if not (1 <= i <= len(data)):
        raise StatisticsError(
            f"The index looked for must be between 1 and {len(data)}")
    data = sorted(data, key=key)
    return islice(data, i-1, None)

def select(data, i, key=None):
    return next(_select(data, i, key=key))


and then doing some variant of:

    it = _select(data, i, key=key)
    left, right = next(it), next(it)
    # compute percentile with left and right

I could implement the quantiles without sorting the list multiple times. Now 
that quantiles() has been implemented by Raymond Hettinger, this is moot 
anyway.
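
For illustration only, the "compute percentile with left and right" step could 
look like the sketch below. It assumes one particular definition, linear 
interpolation between the two closest ranks, which is only one of the several 
percentile definitions in use. The percentile() function and its parameter p 
are hypothetical names. Unlike the snippet above, it uses next(it, left) so 
that asking for the last item does not raise StopIteration.

```python
from itertools import islice
from statistics import StatisticsError


def _select(data, i, key=None):
    """Sort once and return an iterator starting at the i-th smallest item."""
    data = sorted(data, key=key)
    if not data:
        raise StatisticsError("select requires at least one data point")
    if not 1 <= i <= len(data):
        raise StatisticsError(
            f"The index looked for must be between 1 and {len(data)}")
    return islice(data, i - 1, None)


def percentile(data, p):
    """Return the p-th percentile (0 <= p <= 100) by linear interpolation."""
    if not 0 <= p <= 100:
        raise StatisticsError("p must be between 0 and 100")
    data = list(data)
    # Fractional 1-based rank: p=0 maps to the minimum, p=100 to the maximum.
    rank = 1 + (len(data) - 1) * p / 100
    i = int(rank)
    frac = rank - i
    it = _select(data, i)
    left = next(it)
    # Fall back to left when i is the last position (frac is 0 there).
    right = next(it, left)
    return left + (right - left) * frac
```

With this definition, percentile([1, 2, 3, 4], 50) interpolates halfway 
between 2 and 3. As far as I can tell this matches the "inclusive" method of 
statistics.quantiles(), but that correspondence is an assumption worth 
checking against the documentation.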

Since it's probably not useful, feel free to disregard my PR.

----------

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue35775>
_______________________________________