Hi all, I have been using the Counter class recently and came across several things that I was hoping to get feedback on. (This is my first time mailing this list, so any advice is greatly appreciated.)
1) Addition of a Counter.least_common method: This would add a method to Counter that is essentially the opposite of the existing Counter.most_common method. Here, the least common elements are the elements of c with the lowest (non-zero) frequency. This was raised in https://bugs.python.org/issue16994, but it was never resolved and is still open (since Jan. 2013). It is a small change, but I think it would be a useful addition to the stdlib. I have written a patch for this, but have not submitted a PR yet. It can be found at https://github.com/mcognetta/cpython/tree/collections_counter_least_common

2) Undefined behavior when using Counter.most_common: Consider c = Counter([1, 1, 2, 2, 3, 3, 'a', 'a', 'b', 'b', 'c', 'c']). When calling c.most_common(3), there are more than 3 "most common" elements in c, and c.most_common(3) will not always return the same list, since there is no defined total order on the elements of c. Should this be mentioned in the documentation?

Additionally, perhaps there is room for a method that returns all of the elements with the n highest frequencies, ordered by frequency. For example, with c = Counter([1, 1, 1, 2, 2, 3, 3, 4, 4, 5]), c.aforementioned_method(2) would return [(1, 3), (2, 2), (3, 2), (4, 2)], since the two highest frequencies are 3 and 2.

3) Addition of a collections.Frequency or collections.Proportion class derived from collections.Counter: This is partially discussed in https://bugs.python.org/issue25478. The idea is a dictionary that, instead of returning the integer frequency of an element, returns its proportional representation in the iterable. So, for example, with f = Frequency('aabbcc'), f would hold Frequency({'a': 0.3333333333333333, 'b': 0.3333333333333333, 'c': 0.3333333333333333}). To address

> The pitfall I imagine here is that if you continue adding elements after
> normalize() is called, the results will be nonsensical.
from the issue, this would not be a problem, because we could build it entirely on top of a Counter, keep a count of the total number of elements in the Counter, and divide by that total every time we output or return the object or any of its elements. I think this would be a pretty useful addition, especially for code related to discrete probability distributions (which is what motivated this in the first place).

Thanks in advance,
-Marco

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
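P.S. For concreteness, here is a rough sketch of the least_common behavior as a standalone helper (the actual patch adds it as a method on Counter; the function form here is just for illustration):

```python
from collections import Counter
import heapq

def least_common(counter, n=None):
    """Sketch of the proposed least_common: (element, count) pairs with the
    lowest counts first -- the mirror image of Counter.most_common."""
    if n is None:
        return sorted(counter.items(), key=lambda kv: kv[1])
    # For a bounded request, an n-way heap selection avoids a full sort.
    return heapq.nsmallest(n, counter.items(), key=lambda kv: kv[1])

c = Counter('aabbbc')  # counts: a=2, b=3, c=1
least_common(c, 1)     # -> [('c', 1)]
least_common(c)        # -> [('c', 1), ('a', 2), ('b', 3)]
```

(Like most_common, ties would come back in an unspecified order; the heap version mainly matters for large counters with small n.)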
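P.P.S. The "n highest frequencies" method from point 2 could look something like this (the name n_highest_frequencies is just a placeholder, not a proposal):

```python
from collections import Counter

def n_highest_frequencies(counter, n):
    """Hypothetical helper: return every (element, count) pair whose count is
    among the n highest distinct counts, ordered by count (descending)."""
    top = sorted(set(counter.values()), reverse=True)[:n]
    return sorted((kv for kv in counter.items() if kv[1] in top),
                  key=lambda kv: kv[1], reverse=True)

c = Counter([1, 1, 1, 2, 2, 3, 3, 4, 4, 5])
n_highest_frequencies(c, 2)  # -> [(1, 3), (2, 2), (3, 2), (4, 2)]
```

Unlike most_common(2), which would drop one of the count-2 elements arbitrarily, this returns all four pairs because the two highest distinct counts are 3 and 2 (Python's stable sort keeps equal-count elements in insertion order).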
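P.P.P.S. To make the "store counts, divide on access" idea from point 3 concrete, here is a minimal sketch (the class name and repr format are assumptions, not a settled API):

```python
from collections import Counter

class Frequency(Counter):
    """Minimal sketch: raw integer counts are stored internally, so updates
    stay cheap and exact; the proportion is computed only on element access.
    Note: items()/values() would still yield raw counts in this sketch."""

    def __getitem__(self, key):
        total = sum(super().values())  # total of the raw counts
        return super().__getitem__(key) / total if total else 0.0

    def __repr__(self):
        items = ', '.join(f'{k!r}: {self[k]}' for k in self)
        return f'{type(self).__name__}({{{items}}})'

f = Frequency('aabbcc')
f['a']           # -> 0.333... (2 of 6 elements)
f.update('aaa')  # adding elements later stays consistent
f['a']           # -> 0.555... (5 of 9 elements)
```

Because there is no separate normalize() step, the pitfall quoted above never arises: the denominator is always the current element total at the moment of access.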