-----Original Message-----
From: Python-ideas <python-ideas-bounces+gadgetsteve=live.co...@python.org> On 
Behalf Of Steven D'Aprano
Sent: 19 February 2019 00:16
To: python-ideas@python.org
Subject: [Python-ideas] Discuss: what should statistics.mode do?

See https://bugs.python.org/issue35892

The issue here is that statistics.mode, as initially designed, raises an 
exception for the case of multiple equal most-frequent data points. This is 
true to the way mode is taught in schools, but it may not be the most useful 
behaviour.

Raymond has suggested:

- keep the status quo;

- change mode() to return "the first tie" instead of raising an
  exception (with or without a deprecation warning for one release);

- add a flag to specify the behaviour.


I'm especially interested in opinions from those who use the function. What 
would be useful for you? How do you use it?
interactively or in scripts?

(When I designed this, I mostly imagined that mode() would be used 
interactively, using the interpreter as a calculator.)

Would changing the behaviour break your code?

Note that this question is seperate from that of whether or not there should be 
a multimode function.

[Steve Barnes] 
Can I suggest that one option other would be to modify the current exception, 
(e.g. "StatisticsError: no unique mode; found 2 equally common values"), to 
include the count and the values either as an additional property of the 
exception, or as a part of the text (probably with some truncation).

Being told that there are n equally common values, either working interactively 
or in a script, is of limited value but being told what they are would be much 
more valuable. Currently once I know that there are multiple "mode" values I 
would have to switch to collections Counter for most_common or some such and 
recalculate - (keeping in mind that I might be processing millions of values). 
Since mode only really makes sense for discrete values such as integers it is 
important information as a) the count might be 1 i.e. all the values are 
unique, b) if there are 2 or 3 modes and they are adjacent the mid-point can be 
used as a sensible approximation or c) if there are 2 or more distinct clusters 
it helps to visualise this without having a plot.

I also think that this would be the change with minimal disruption of the 
status quo as all existing code will still get the exception but with 
(optionally) more useful information.

Steve Barnes
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to