Thanks for the reply. I will add it to the documentation and will work
on a use case and recipe for the n highest frequency problem.

As for

>I'm not sure if this would be better as an attribute kept directly or as a 
>property that called `sum(self.values())` when accessed.  I believe that 
>having `mycounter.total` would provide the right normalization in a clean API, 
>and also expose easy access to other questions one would naturally ask (e.g. 
>"How many observations were made?")

I am not quite sure what you mean, especially the observations part.
For the attribute part, do you mean we would just have a hidden instance
attribute like num_values that is incremented or decremented whenever
something is added or removed, so that size queries are O(1) instead of
O(n) (where n is the number of keys)? Then there could be a method
like 'normalize' that printed all elements with their frequencies
divided by the total count?
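
To make the question concrete, here is a minimal sketch of the property
variant only. The subclass name TotalCounter and the method name
normalized are hypothetical, chosen just for illustration, and the method
returns the proportions rather than printing them. The O(1) variant would
instead have to update a stored total in every mutating method.

```python
from collections import Counter


class TotalCounter(Counter):
    """Hypothetical Counter subclass, for illustration only."""

    @property
    def total(self):
        # O(n) in the number of keys: recomputed on every access.
        return sum(self.values())

    def normalized(self):
        # Map each element to its share of all observations.
        total = self.total
        if total == 0:
            return {}
        return {elem: count / total for elem, count in self.items()}


c = TotalCounter([1, 1, 1, 2, 2, 3, 3, 4, 4, 5])
print(c.total)            # 10
print(c.normalized()[1])  # 0.3
```

With something like this, "How many observations were made?" would simply
be c.total.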

On Wed, Mar 15, 2017 at 12:52 AM, David Mertz <me...@gnosis.cx> wrote:
> On Tue, Mar 14, 2017 at 2:38 AM, Marco Cognetta <cognetta.ma...@gmail.com>
> wrote:
>>
>> 1) Addition of a Counter.least_common method:
>> This was addressed in https://bugs.python.org/issue16994, but it was
>> never resolved and is still open (since Jan. 2013). This is a small
>> change, but I think that it is useful to include in the stdlib.
>
>
> -1 on adding this.  I read the issue, and do not find a convincing use case
> that is common enough to merit a new method.  As some people noted in the
> issue, the "least common" is really the infinitely many keys not in the
> collection at all.
>
> But I can imagine an occasional need to, e.g. "find outliers."  However,
> that is not hard to spell as `mycounter.most_common()[-1*N:]`.  Or if your
> program does this often, write a utility function `find_outliers(...)`
>
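
For reference, such a utility could be sketched roughly as follows. The
signature (a counter and a count n) is my assumption, since only the name
find_outliers was suggested. Note that the bare slice [-1*N:] returns the
whole list when N is 0, so the sketch guards against that.

```python
from collections import Counter


def find_outliers(counter, n):
    """Return the n least frequent (element, count) pairs, rarest last."""
    if n <= 0:
        # counter.most_common()[-0:] would return every element.
        return []
    return counter.most_common()[-n:]


c = Counter("aaabbc")
print(find_outliers(c, 2))  # [('b', 2), ('c', 1)]
```
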
>> 2) Undefined behavior when using Counter.most_common:
>> 'c', 'c']), when calling c.most_common(3), there are more than 3 "most
>> common" elements in c and c.most_common(3) will not always return the
>> same list, since there is no defined total order on the elements in c.
>>
>> Should this be mentioned in the documentation?
>
>
> +1. I'd definitely support adding this point to the documentation.
>
>>
>> Additionally, perhaps there is room for a method that produces all of
>> the elements with the n highest frequencies in order of their
>> frequencies. For example, in the case of c = Counter([1, 1, 1, 2, 2,
>> 3, 3, 4, 4, 5]) c.aforementioned_method(2) would return [(1, 3), (2,
>> 2), (3, 2), (4, 2)] since the two highest frequencies are 3 and 2.
>
>
> -0 on this.  I can see wanting this, but I'm not sure it comes up often enough to add to
> the promise of the class.  The utility function to do this would be somewhat
> less trivial to write than `find_outliers(..)` but not enormously hard.  I
> think I'd be +0 on adding a recipe to the documentation for a utility
> function.
>
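
A first rough sketch of what such a recipe might look like. The function
name n_highest_frequencies is hypothetical, and the order of elements
with equal counts is whatever most_common() happens to produce.

```python
from collections import Counter


def n_highest_frequencies(counter, n):
    """Return every (element, count) pair whose count is one of the
    n highest distinct counts, most frequent first."""
    if n <= 0:
        return []
    # The n largest distinct counts, in descending order.
    top_counts = sorted(set(counter.values()), reverse=True)[:n]
    if not top_counts:
        return []
    cutoff = top_counts[-1]
    return [(elem, count) for elem, count in counter.most_common()
            if count >= cutoff]


c = Counter([1, 1, 1, 2, 2, 3, 3, 4, 4, 5])
# Contains (1, 3), (2, 2), (3, 2) and (4, 2), as in the example above.
print(n_highest_frequencies(c, 2))
```
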
>>
>> 3) Addition of a collections.Frequency or collections.Proportion class
>> derived from collections.Counter:
>>
>> This is sort of discussed in https://bugs.python.org/issue25478.
>> The idea behind this would be a dictionary that, instead of returning
>> the integer frequency of an element, would return its proportional
>> representation in the iterable.
>
>
> One could write a subclass easily enough.  The essential feature in my mind
> would be to keep an attribute Counter.total around to perform the
> normalization.  I'm +1 on adding that to collections.Counter itself.
>
> I'm not sure if this would be better as an attribute kept directly or as a
> property that called `sum(self.values())` when accessed.  I believe that
> having `mycounter.total` would provide the right normalization in a clean
> API, and also expose easy access to other questions one would naturally ask
> (e.g. "How many observations were made?")
>
>
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons.  Intellectual property is
> to the 21st century what the slave trade was to the 16th.
