[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-08-08 Thread Raymond Hettinger
Raymond Hettinger added the comment: New changeset 5925b7d555bc36bd43ee8704ae75cc51900cf2d4 by Raymond Hettinger (Miss Islington (bot)) in branch '3.8': bpo-35892: Add usage note to mode() (GH-15122) (GH-15176) https://github.com/python/cpython/commit/5925b7d555bc36bd43ee8704ae75cc51900cf2d4

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-08-08 Thread miss-islington
Change by miss-islington: pull_requests: +14907 pull_request: https://github.com/python/cpython/pull/15176

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-08-08 Thread Raymond Hettinger
Raymond Hettinger added the comment: New changeset e43e7ed36480190083740fd75e2b9cdca72f1a68 by Raymond Hettinger in branch 'master': bpo-35892: Add usage note to mode() (GH-15122) https://github.com/python/cpython/commit/e43e7ed36480190083740fd75e2b9cdca72f1a68

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-08-04 Thread Raymond Hettinger
Change by Raymond Hettinger: pull_requests: +14862 pull_request: https://github.com/python/cpython/pull/15122

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-03-12 Thread Raymond Hettinger
Change by Raymond Hettinger: resolution: -> fixed; stage: patch review -> resolved; status: open -> closed

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-03-12 Thread Raymond Hettinger
Raymond Hettinger added the comment: New changeset fc06a192fdc44225ef1cc879f615a81931ad0a85 by Raymond Hettinger in branch 'master': bpo-35892: Fix mode() and add multimode() (#12089) https://github.com/python/cpython/commit/fc06a192fdc44225ef1cc879f615a81931ad0a85

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-03-11 Thread Steven D'Aprano
Steven D'Aprano added the comment: Looks good to me, I'm happy to accept it. Thank you for your efforts, Raymond. Can I trouble you to do the merge yourself, please? I'm still having issues using the GitHub website.

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-03-10 Thread Raymond Hettinger
Raymond Hettinger added the comment: Here's a text-only link to the patch: https://patch-diff.githubusercontent.com/raw/python/cpython/pull/12089.patch

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-03-10 Thread Henry Chen
Change by Henry Chen: nosy: +scotchka

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-03-10 Thread Raymond Hettinger
Raymond Hettinger added the comment: Steven, are you okay with applying this PR so we can put this to rest, cleanly and permanently?

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-28 Thread Raymond Hettinger
Raymond Hettinger added the comment: Attached a draft PR for discussion purposes. Let me know what you think (I'm not wedded to any part of it).

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-28 Thread Raymond Hettinger
Change by Raymond Hettinger: keywords: +patch; pull_requests: +12099; stage: -> patch review

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-26 Thread Raymond Hettinger
Raymond Hettinger added the comment: > Are you happy guaranteeing that it will always be the first > mode encountered? Yes. All of the other implementations I looked at make some guarantee about which mode is returned. Maple, Matlab, and Excel all return the first encountered.¹ That is

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-26 Thread Steven D'Aprano
Steven D'Aprano added the comment: > Proposed spec: > ''' > Modify the API statistics.mode to handle multimodal cases so that the > first mode encountered is the one returned. If the input is empty, > raise a StatisticsError. Are you happy guaranteeing that it will always be the first mode
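The proposed spec quoted above (return the first mode encountered, raise StatisticsError on empty input) can be sketched roughly as follows. This is a minimal illustration, not the merged patch: the name first_mode is hypothetical, and the tie-breaking relies on Counter preserving insertion order, which assumes Python 3.7 or later.

```python
from collections import Counter
from statistics import StatisticsError


def first_mode(data):
    """Return the most common value; on a tie, return the first one encountered.

    Hypothetical helper for illustration only; assumes Python 3.7+ so that
    Counter remembers the order in which values were first seen.
    """
    pairs = Counter(iter(data)).most_common(1)
    if not pairs:
        # Empty input is the one remaining error outcome in the proposed spec.
        raise StatisticsError("no mode for empty data")
    value, _count = pairs[0]
    return value
```

With this sketch, first_mode('BBAAC') yields 'B' because 'B' reaches the top count first, and first_mode([]) raises StatisticsError.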

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-25 Thread Raymond Hettinger
Raymond Hettinger added the comment: > If others agree that it is sufficiently easy, we can assign > the task to Cheryl. It's only easy if we clearly specify what we want to occur. Deciding what the right behavior should be is not a beginner skill. Proposed spec: ''' Modify the API

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-25 Thread Steven D'Aprano
Steven D'Aprano added the comment: What do people think about leaving this as an "Easy First Issue" for the sprint? If others agree that it is sufficiently easy, we can assign the task to Cheryl. It should be fairly easy: mode calls an internal function _counts which is not public and not

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-25 Thread Steven D'Aprano
Steven D'Aprano added the comment: Executive summary: - let's change the behaviour of mode to return a single mode rather than raise an exception if there are multiple modes; - and let's do it without a deprecation period. Further comments in no particular order: I agree that in practice the

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-22 Thread Raymond Hettinger
Raymond Hettinger added the comment: > shouldn't "mode" at some point be replaced by "multimode"? No. The signature is completely incompatible with the existing mode() function. Like MS Excel which has two functions, MODE.SNGL and MODE.MULT, we should also have two functions, each with

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-22 Thread Francis MB
Francis MB added the comment: Good options itemization! >> This would give us a clean, fast API with no flags: >> mode(Iterable) -> scalar >> multimode(Iterable) -> list [...] >> For any of those options, we should still add a separate multimode() >> function. [..] >> * Add a Deprecation
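For readers following along, the two-function shape discussed here (mode returning a scalar, multimode returning a list) is what the standard library offers from Python 3.8 onward. A short usage sketch, assuming Python 3.8 or later; the sample data is made up for illustration:

```python
from statistics import mode, multimode

# Made-up sample with a two-way tie between "red" and "blue".
data = ["red", "blue", "blue", "red", "green"]

single = mode(data)       # scalar: the first of the tied modes encountered
every = multimode(data)   # list: all tied modes, in first-encountered order
nothing = multimode([])   # an empty input gives an empty list

print(single, every, nothing)
```

Keeping the return types separate means callers never have to check whether they got a scalar or a list back.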

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-19 Thread Francis MB
Francis MB added the comment: >> [...] This keeps the signature simple (Iterable -> Scalar). [...] >> >> Categorical, binned, or ordinal data: >> >> mode(data: Iterable, *, first_tie=False) -> object >> multimode(data: Iterable) -> List[object] This seems reasonable to me due to legacy
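To make the flag-based signature in this message concrete, here is one way it could have behaved. This is purely a hypothetical sketch: the function name mode_with_flag is invented, the error messages are placeholders, and other messages in the thread show the discussion moving toward an API with no flags.

```python
from collections import Counter
from statistics import StatisticsError


def mode_with_flag(data, *, first_tie=False):
    """Hypothetical sketch of the discussed first_tie keyword.

    With first_tie=False a tie raises StatisticsError (the pre-change
    behaviour); with first_tie=True the first mode encountered is returned.
    """
    counts = Counter(iter(data))
    if not counts:
        raise StatisticsError("no mode for empty data")
    top = max(counts.values())
    # Counter keeps first-seen order (Python 3.7+), so modes[0] is the
    # first tied value encountered in the input.
    modes = [value for value, count in counts.items() if count == top]
    if len(modes) > 1 and not first_tie:
        raise StatisticsError(f"no unique mode; found {len(modes)} equally common values")
    return modes[0]
```

The sketch shows the cost of the flag: callers must remember to opt in to the behaviour that is almost always wanted.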

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-18 Thread Windson Yang
Windson Yang added the comment: I think right now we can > Change mode() to return the first tie instead of raising an exception. This > is a behavior change but leaves you with the cleanest API going forward. as well as > Add a Deprecation warning to the current behavior of mode() when

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-17 Thread Raymond Hettinger
Raymond Hettinger added the comment: The attraction to having a first_tie=False flag is that it is backwards compatible with the current API. However on further reflection, people would almost never want the existing behavior of raising an exception rather than returning a useful result.

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-17 Thread Raymond Hettinger
Raymond Hettinger added the comment: > Did I miss something? Yes. It doesn't really matter which mode is returned as long as it is deterministically chosen. We're proposing to return the first mode rather than the smallest mode. Scipy returns the smallest mode because that is

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-16 Thread Windson Yang
Windson Yang added the comment: I only tested stats.mode() from scipy: data = 'BBAAC' should also return 'A'. But in your code, **return Counter(seq).most_common(1)[0][0]** will return 'B'. Did I miss something?
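The 'A' versus 'B' discrepancy raised here comes down to the tie-breaking rule: smallest tied value versus first tied value encountered. A small pure-Python sketch of both rules on the same data; the helper tied_modes is hypothetical and no SciPy call is made:

```python
from collections import Counter


def tied_modes(data):
    """Return every value sharing the highest count, in first-seen order.

    Hypothetical helper for illustration; assumes Python 3.7+ ordering.
    """
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


data = "BBAAC"
candidates = tied_modes(data)      # 'B' and 'A' both occur twice
first_rule = candidates[0]         # first mode encountered
smallest_rule = min(candidates)    # smallest mode, the rule attributed to SciPy in this thread
```

Both rules are deterministic; the first-encountered rule has the advantage of not requiring the values to be orderable.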

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-16 Thread Raymond Hettinger
Raymond Hettinger added the comment: I've been thinking about this a bit more. ISTM that for now, it's best to get mode() working for everyday use, returning just the first mode encountered. This keeps the signature simple (Iterable -> Scalar). In the future, if needed, there is room to

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-16 Thread Raymond Hettinger
Raymond Hettinger added the comment: > We can return the smallest value from the **table** instead > of the code below. Internally, that does too much work and then throws most of it away. The difference between Counter(data).most_common()[1] and Counter(data).most_common(1) is that the
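The contrast between the two most_common spellings can be seen in a few lines. In CPython, as far as I am aware, calling most_common() with no argument sorts every (value, count) pair, while most_common(1) only selects the single best pair; the example below checks that both agree on the top entry and makes no timing claims.

```python
from collections import Counter

counts = Counter("BBAAC")

# No argument: a full ranking of every distinct value, highest count first.
full_ranking = counts.most_common()

# n=1: only the top pair is requested, so no full ranking is needed.
top_only = counts.most_common(1)

# Both spellings agree on which pair comes first.
same_winner = full_ranking[0] == top_only[0]
```

For a mode() that needs only one value, asking for a single pair avoids building a ranking that is immediately discarded.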

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-16 Thread Windson Yang
Windson Yang added the comment: IMHO, we don't need to add the option. We can return the smallest value from the **table** instead of the code below. if len(table) == 1: return table[0][0] [1] https://github.com/python/cpython/blob/master/Lib/statistics.py#L502 -- nosy:

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-16 Thread Francis MB
Francis MB added the comment: Good point Raymond! Only a minor observation on the packages API: [1] SciPy: scipy.stats.mode(a, axis=0, nan_policy='propagate') "Returns an array of the modal (most common) **value** in the passed array." --> Here it claims to return just ONE value And use

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-16 Thread Raymond Hettinger
Raymond Hettinger added the comment: I would stick with "first_tie=False". That caters to the common case, avoids API complications, and does something similar to what other stats packages are doing. - API Survey Maple: """This function is only guaranteed to return one potential

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-10 Thread Steven D'Aprano
Steven D'Aprano added the comment: Thanks Raymond for the interesting use-case. The original design of mode() was to support only the basic form taught in secondary schools, namely a single unique mode for categorical data or discrete numerical data. I think it is time to consider a richer

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-10 Thread Francis MB
Francis MB added the comment: >> There may be better names for the flag. "tie_goes_to_first_encountered" >> seemed a bit long though ;-) Could an alternative be to set the mode tie case in a form like: def mode(seq, *, case=CHOOSE_FIRST): [...] (or TIE_CHOOSE_FIRST, ...) where

[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

2019-02-03 Thread Raymond Hettinger
New submission from Raymond Hettinger: The current code for mode() does a good deal of extra work to support its two error outcomes (empty input and multimodal input). The latter case is informative but doesn't provide any reasonable way to find just one of those modes, where any of the