> On Apr 15, 2018, at 5:44 PM, Peter Norvig <pe...@norvig.com> wrote:
> If you think of a Counter as a multiset, then it should support __or__, not 
> __add__, right?

FWIW, Counter is explicitly documented to support the four multiset-style 
mathematical operations discussed in Knuth TAOCP Volume II section 4.6.3 
exercise 19:

>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d                       # add two counters together:  c[x] + d[x]
Counter({'a': 4, 'b': 3})
>>> c - d                       # saturating subtraction (keeping only positive 
>>> counts)
Counter({'a': 2})
>>> c & d                       # intersection:  min(c[x], d[x]) 
Counter({'a': 1, 'b': 1})
>>> c | d                       # union:  max(c[x], d[x])
Counter({'a': 3, 'b': 2})

The wikipedia article on Multisets lists a further operation, inclusion, that 
is not currently supported:  

> I do think it would have been fine if Counter did not support "+" at all 
> (and/or if Counter was limited to integer values). But  given where we are 
> now, it feels like we should preserve `c + c == 2 * c`. 

The + operation has legitimate use cases (it is perfectly reasonable to want to 
combine the results two separate counts).  And, as you pointed out, it is what 
we already have and cannot change :-)

So, the API design issue that confronts us is that it would be a bit weird and 
disorienting for the arithmetic operators to have two different signatures:

    <counter> += <counter>
    <counter> -= <counter>
    <counter> *= <scalar>
    <counter> /= <scalar>

Also, we should respect the comments given by others on the tracker issue.  In 
particular, there is a preference to not have an in-place operation and only 
allow a new counter instance to be created.  That will help people avoid data 
structure modality problems:
    c[category] += 1       # Makes sense during the frequency counting or 
accumulation phase
    c /= c.total           # Covert to a probability mass function
    c[category] += 1       # This code looks correct but no longer makes any 

> As to the "doesn't really add any new capabilities" argument, that's true, 
> but it is also true for Counter as a whole: it doesn't add much over 
> defaultdict(int), but it is certainly convenient to have a standard way to do 
> what it does.

IIRC, the defaultdict(int) in your first version triggered a bug because the 
model inadvertently changed during the analysis phase rather than being frozen 
after the training phase.  The Counter doesn't suffer from the same issue 
(modifying the dict on a failed lookup).  Also, the Counter class does have a 
few value added features:  Counter(iterable), c.most_common(), c.elements(), 
etc.   But yes, at its heart the counter is mostly just a specialized 
dictionary.  The thought I was trying to express is that suggestions to build 
out Counter API are a little less compelling when we already have a way to do 
it that is flexible, fast, clear, and standard (i.e. dict comprehensions).

> I agree with your intuition that low level is better. `total` would be 
> useful. If you have total and mul, then as you and others have pointed out, 
> normalize is just c *= 1/c.total.

I fully support adding some functionality for scaling to support probability 
distributions, bayesian update steps, chi-square tests, and whatnot.  The 
people who need convincing are the other respondents on the tracker.  They had 
a strong mental model for the Counter class that is somewhat at odds with this 


Python-ideas mailing list
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to