[issue30999] statistics module: add "key" keyword argument to median, mode, ...

2019-01-18 Thread Rémi Lapeyre

Rémi Lapeyre added the comment:

I suggest we close this issue in favor of #35775 to discuss adding a selection 
function and the attached PR.

--
nosy: +remi.lapeyre

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30999] statistics module: add "key" keyword argument to median, mode, ...

2018-11-04 Thread Raymond Hettinger


Raymond Hettinger added the comment:

This issue (as originally proposed) should be closed.  A key function for 
median() and mode() likely isn't a good idea.  Those two functions should be 
kept parallel with mean() as returning simple descriptive statistics.  

Work towards a select() function with a key function can be pursued in a 
separate tracker item.  That would suffice to locate a specific record 
occurring at a median (or quartile or decile).  FWIW, that is how MS Excel 
approaches the problem as well (using RANK with INDEX to locate a record by its 
sort position, leaving AVERAGE, MODE.SNGL, and MEDIAN for straight descriptive 
statistics).
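
As a rough Python analogue of the RANK-with-INDEX approach mentioned above, one can compute the sort order of the records and then index into it. This is only a sketch: the helper name `record_at_rank` and the (score, names) record layout are hypothetical and not part of the statistics module.

```python
def record_at_rank(records, rank, key):
    """Return the record at 0-based sort position `rank`.

    Hypothetical helper for illustration; not a statistics-module function.
    """
    # Indices of the records, ordered by each record's key value.
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    return records[order[rank]]


records = [(4, ["Kate"]), (1, ["Anna"]), (3, ["Paul", "Henry"])]
low_median_rank = (len(records) - 1) // 2  # sort position of the low median
median_record = record_at_rank(records, low_median_rank, key=lambda r: r[0])
print(median_record)  # (3, ['Paul', 'Henry'])
```

The descriptive functions stay untouched here; only the lookup step knows about the record structure.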

--




[issue30999] statistics module: add "key" keyword argument to median, mode, ...

2018-10-17 Thread Michal Nowikowski


Michal Nowikowski added the comment:

What is the status of this issue?
I'm also interested in this feature.
I expected these functions to behave like the built-in min and max.
Those accept a key argument, see here: 
https://docs.python.org/3/library/functions.html#max
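
For reference, this is how the key argument of the built-in min and max behaves: the key decides the comparison, but the original item is returned. The sample data is made up for illustration.

```python
scores = [(1, "Anna"), (4, "Kate"), (3, "Paul")]

# The key function extracts the value to compare; the whole tuple is returned.
highest = max(scores, key=lambda pair: pair[0])
lowest = min(scores, key=lambda pair: pair[0])

print(highest)  # (4, 'Kate')
print(lowest)   # (1, 'Anna')
```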

--
nosy: +godfryd




[issue30999] statistics module: add "key" keyword argument to median, mode, ...

2017-07-31 Thread Raymond Hettinger

Raymond Hettinger added the comment:

> I don't think it makes sense to add key arguments to mode, mean,
> variance etc. I'm having trouble thinking of what that would
> even mean

I concur.  This proposal bends the concept of a key-function to where it is no 
longer obvious what it does.

> I've given this some more thought, and I think that a "key"
> argument would make sense for a general selection function.

Yes, that would make sense:

select(A, k, key=somefunc) == sorted(A, key=somefunc)[k]
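
A minimal sketch of a function with that contract, assuming 0-based k. No `select` function exists in the statistics module; the name and signature here simply follow the equivalence above.

```python
def select(data, k, key=None):
    """Return the item that sits at 0-based index k once data is sorted by key.

    Sketch of the proposed function, implemented directly from the
    equivalence select(A, k, key=f) == sorted(A, key=f)[k].
    """
    items = sorted(data, key=key)
    if not 0 <= k < len(items):
        raise IndexError("k must satisfy 0 <= k < len(data)")
    return items[k]


data = [(1, ["Anna"]), (3, ["Paul", "Henry"]), (4, ["Kate"])]
k = (len(data) - 1) // 2  # position of the low median
print(select(data, k, key=lambda elem: elem[0]))  # (3, ['Paul', 'Henry'])
```

Sorting costs O(n log n); a selection algorithm could do better on average, but the observable result would be the same.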

--
nosy: +rhettinger




[issue30999] statistics module: add "key" keyword argument to median, mode, ...

2017-07-24 Thread Steven D'Aprano

Steven D'Aprano added the comment:

I've given this some more thought, and I think that a "key" argument 
would make sense for a general selection function.

The general selection problem is: given a set of items A, and a number k 
between 1 and the number of items, return the k-th item. In Python 
terms, we would use a list, and 0 <= k < len(A) instead.

https://www.cs.rochester.edu/~gildea/csc282/slides/C09-median.pdf

I've had the idea of adding a select(A, k) function to statistics for a 
while now. With 0-based k, median_low would be equivalent to select(A, 
(len(A) - 1)//2) and median_high to select(A, len(A)//2). I'd leave 
the median_* functions as they are, and possibly include a key function 
in select.
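
To make the relationship concrete, here is a sketch of `select` written as a quickselect and checked against the existing median functions. It uses 0-based positions, so the low and high medians sit at indices (n - 1) // 2 and n // 2. The function is hypothetical; nothing like it exists in the statistics module.

```python
import random
from statistics import median_high, median_low


def select(data, k, key=None):
    """Return the item that would be at 0-based index k if data were sorted.

    Quickselect sketch: average O(n) instead of sorting everything.
    """
    items = list(data)
    if not 0 <= k < len(items):
        raise IndexError("k must satisfy 0 <= k < len(data)")
    if key is None:
        key = lambda item: item
    while True:
        pivot = key(random.choice(items))
        smaller = [x for x in items if key(x) < pivot]
        equal = [x for x in items if key(x) == pivot]
        if k < len(smaller):
            items = smaller
        elif k < len(smaller) + len(equal):
            # Filtering preserves input order, so ties resolve as in sorted().
            return equal[k - len(smaller)]
        else:
            k -= len(smaller) + len(equal)
            items = [x for x in items if key(x) > pivot]


# Check the equivalence for odd and even lengths.
for n in range(1, 10):
    A = random.sample(range(100), n)
    assert select(A, (n - 1) // 2) == median_low(A)
    assert select(A, n // 2) == median_high(A)
```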

I don't think it makes sense to add key arguments to mode, mean, 
variance etc. I'm having trouble thinking of what that would even mean 
(no pun intended): it's unlikely that the mean will actually be a data 
value (except by accident, or by careful construction of the data). 
Variance has the wrong units (it is the units of your data, squared) and 
the stdev is conceptually a difference between data values, not a data 
value itself, so it doesn't even make sense to apply a key function and 
return one of the data points.

And mode counts objects, so it already applies to non-numeric data. 
It's even documented as applying to nominal data.
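
A short illustration of mode() on nominal data, with made-up sample values. Where a "key" would otherwise be wanted, the field of interest can be extracted before counting.

```python
from statistics import mode

colours = ["red", "blue", "red", "green"]
most_common_colour = mode(colours)
print(most_common_colour)  # red

# Counting by one field of a record: extract the field first.
people = [("Anna", "Berlin"), ("Paul", "Paris"), ("Kate", "Berlin")]
most_common_city = mode([city for _name, city in people])
print(most_common_city)  # Berlin
```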

--




[issue30999] statistics module: add "key" keyword argument to median, mode, ...

2017-07-24 Thread gerion

gerion added the comment:

The position might be useful, if you have a second list with some side data 
stored in it, and not a list of tuples :).

I had the idea to file a bug, when I had a list of coordinates and wanted to 
use the point with the median of the x-coordinates as "representation" for the 
dataset. With max() and min() in mind, I called median_low() with a key argument 
and got the error that key is not a valid argument (my solution was to use 
dicts then).

So I thought this would be a use case similar to max() and min(), and in fact 
more consistent. But I fully understand your concern that this breaks 
consistency with the other statistics functions.

This is not a killer feature, but in my opinion it is nice to have, because it 
changes nothing about the default (expected) behaviour, yet provides a lot of 
flexibility with less code.

I cannot say anything about other languages.

--




[issue30999] statistics module: add "key" keyword argument to median, mode, ...

2017-07-24 Thread Steven D'Aprano

Steven D'Aprano added the comment:

Thanks for explaining your use-case.

Although the median_* functions don't perform arithmetic on their data, 
they are still conceptually mathematical functions that operate on 
numbers and I'm reluctant to support arbitrary objects with a key 
function without a solid reason. In your example, I think there are 
existing ways to get the result you want:

(1) Use a dict:

from statistics import median_low

data = {1: ['Anna'], 3: ['Paul', 'Henry'], 4: ['Kate']}
people = data[median_low(data)]  # median_low iterates over the dict's keys

(2) Use a custom numeric type with associated data:

class MyNum(int):
    def __new__(cls, num, data):
        instance = super().__new__(cls, num)
        instance.data = data
        return instance

data = [MyNum(1, ['Anna']), MyNum(3, ['Paul', 'Henry']),
        MyNum(4, ['Kate'])]

people = median_low(data).data

As for your second example, do you have a use-case for wanting to know 
the position of the median in the original, unsorted list? When would 
that be useful?

One other reason for my reluctance: although median_low and median_high 
guarantee to only return an actual data point, that's a fairly special 
case. There are other order statistics (such as quartiles, quantiles, 
etc) which are conceptually related to median but don't necessarily 
return a data value. Indeed, the regular median() function doesn't 
always do so. I would be reluctant for median() and median_low() to have 
different signatures without an excellent reason.
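
The difference is easy to see on an even-length data set, where median() interpolates while median_low() and median_high() return actual data points:

```python
from statistics import median, median_high, median_low

data = [1, 2, 3, 4]

mid = median(data)        # average of the two middle values
low = median_low(data)    # smaller of the two middle values
high = median_high(data)  # larger of the two middle values

print(mid, low, high)  # 2.5 2 3
print(mid in data)     # False: the plain median is not a data point here
```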

I'm not completely ruling this out. One thing which might sway me is if 
there are other languages or statistics libraries which offer this 
feature. (I say *might*, not that it definitely will.)

--




[issue30999] statistics module: add "key" keyword argument to median, mode, ...

2017-07-24 Thread Mark Dickinson

Changes by Mark Dickinson:


--
nosy: +mark.dickinson




[issue30999] statistics module: add "key" keyword argument to median, mode, ...

2017-07-23 Thread gerion

gerion added the comment:

My use case is side data that is somehow connected to the statistically relevant 
data.
(I think this is more or less the same use case as with the min and max 
functions.)

A few examples:

The data structure is a list of tuples: (score, [list of people that have this 
score])
```
median = median_low([(1, ['Anna']), (3, ['Paul', 'Henry']), (4, ['Kate'])],
                    key=lambda elem: elem[0])
for name in median[1]:
    print(f"{name} is one of the people that reach the median score.")
```
or you can enumerate:
```
data = [1, 3, 4]
median = median_low(enumerate(data), key=lambda elem: elem[1])
print(f"median is at position {median[0]}")
```
With the keyword argument, the input can also be a list of self-defined 
objects, where the median makes sense on some member variable or function, etc.:
```
>>> median_low(list_of_self_defined_objects, key=lambda elem: elem.get_score())
```

--




[issue30999] statistics module: add "key" keyword argument to median, mode, ...

2017-07-23 Thread Steven D'Aprano

Steven D'Aprano added the comment:

Apart from being "cool", what is the purpose of this key argument?

For the example shown, where you extract an item from tuple data:

>>> median_low([(1, 2), (3, 3), (4, 1)], key=lambda elem: elem[0])
(3, 3)

I'm not sure I understand when you would use this, and why you would describe 
(3,3) as a median (a kind of average) of the given data.


By the way, although it's not (yet?) officially supported, it turns out that 
this works:

py> median_low([(1, 2), (3, 3), (4, 1)])
(3, 3)

Officially, median requires numeric data. If the median* functions were to 
support tuples, I would be inclined to return a new tuple with the median of 
each column, as such:

median_low([(1, 2), (3, 3), (4, 1)])
(3, 2)  # median of 1,3,4 and median of 2,3,1
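
The column-wise result can already be computed today by transposing the rows with zip. This is a sketch; the helper name `columnwise_median_low` is hypothetical and not part of the statistics module.

```python
from statistics import median_low


def columnwise_median_low(rows):
    """Return a tuple holding the low median of each column of rows."""
    # zip(*rows) transposes the rows into columns.
    return tuple(median_low(column) for column in zip(*rows))


rows = [(1, 2), (3, 3), (4, 1)]
print(columnwise_median_low(rows))  # (3, 2)
```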


I can think of uses for that, e.g. calculating the "Q" correlation coefficient. 
What uses do you have for your suggested key argument?

--
nosy: +steven.daprano




[issue30999] statistics module: add "key" keyword argument to median, mode, ...

2017-07-23 Thread gerion

New submission from gerion:

The statistics module was added in Python 3.4. It would be cool if the 
functions:
median_low()
median_high()
mode()
would have a "key" keyword argument, just like in max() and min():
```
>>> median_low([(1, 2), (3, 3), (4, 1)], key=lambda elem: elem[0])
(3, 3)
```
These functions always choose a specific element of the list, so a "key" 
argument is meaningful.


Maybe such a parameter makes sense for mean() as well, if the return value 
is always the result itself, but that is a separate point:
```
>>> mean([(1, 2), (3, 3), (4, 1)], key=lambda elem: elem[0])
2.6666666666666665
```
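
For comparison, the same mean can be obtained with the existing API by applying the key function to the data before calling mean(). This is a sketch of the workaround, not of the proposed parameter.

```python
from statistics import mean

data = [(1, 2), (3, 3), (4, 1)]

# Extract the first element of each tuple, then average those values.
first_items = [elem[0] for elem in data]
result = mean(first_items)
print(result)
```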

--
components: Library (Lib)
messages: 298918
nosy: gerion
priority: normal
severity: normal
status: open
title: statistics module: add "key" keyword argument to median, mode, ...
type: enhancement
versions: Python 3.7
