If you think of a Counter as a multiset, then it should support __or__, not __add__, right?
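To make the distinction concrete, here is a small sketch of how the existing Counter operators behave. The variable names are only illustrative: `|` and `&` take the elementwise maximum and minimum of the counts, while `+` sums them.

```python
from collections import Counter

a = Counter("aab")  # a: 2, b: 1
b = Counter("abc")  # a: 1, b: 1, c: 1

# Union keeps the maximum count per key; intersection keeps the minimum.
union = a | b
intersection = a & b

# Addition sums the counts elementwise.
summed = a + b
```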
I do think it would have been fine if Counter did not support "+" at all (and/or if Counter was limited to integer values). But given where we are now, it feels like we should preserve `c + c == 2 * c`.

As to the "doesn't really add any new capabilities" argument, that's true, but it is also true for Counter as a whole: it doesn't add much over defaultdict(int), but it is certainly convenient to have a standard way to do what it does.

I agree with your intuition that low level is better. `total` would be useful. If you have total and mul, then as you and others have pointed out, normalize is just `c *= 1/c.total`.

I can also see the argument for a new FrequencyTable class in the statistics module. (By the way, I refactored my https://github.com/norvig/pytudes/blob/master/ipynb/Probability.ipynb a bit, and now I no longer need a `normalize` function.)

On Sun, Apr 15, 2018 at 5:06 PM Raymond Hettinger <raymond.hettin...@gmail.com> wrote:

> > On Apr 15, 2018, at 2:05 PM, Peter Norvig <pe...@norvig.com> wrote:
> >
> > For most types that implement __add__, `x + x` is equal to `2 * x`.
> >
> > ...
> >
> > That is true for all numbers, list, tuple, str, timedelta, etc. -- but not for collections.Counter. I can add two Counters, but I can't multiply one by a scalar. That seems like an oversight.
>
> If you view the Counter as a sparse associative array of numeric values, it does seem like an oversight. If you view the Counter as a Multiset or Bag, it doesn't make sense at all ;-)
>
> From an implementation point of view, Counter is just a kind of dict that has a __missing__() method that returns zero. That makes it trivially easy to subclass Counter to add new functionality or just use dictionary comprehensions for bulk updates.
>
> > It would be worthwhile to implement multiplication because, among other reasons, Counters are a nice representation for discrete probability distributions, for which multiplication is an even more fundamental operation than addition.
>
> There is an open issue on this topic. See: https://bugs.python.org/issue25478
>
> One stumbling point is that a number of commenters are fiercely opposed to non-integer uses of Counter. Also, some of the use cases (such as those found in Allen Downey's "Think Stats" and "Think Bayes" books) need division and rescaling to a total (i.e. normalizing the total to 1.0) for a probability mass function.
>
> If the idea were to go forward, it still isn't clear whether the correct API should be low level (__mul__ and __div__ and a "total" property) or higher level (such as a normalize() or rescale() method that produces a new Counter instance). The low level approach has the advantage that it is simple to understand and that it feels like a logical extension of the __add__ and __sub__ methods. The downside is that it doesn't really add any new capabilities (being just short-cuts for a simple dict comprehension or call to c.values()). And, it starts to feature creep the Counter class further away from its core mission of counting and ventures into the realm of generic sparse arrays with numeric values. There is also a learnability/intelligibility issue in that __add__ and __sub__ correspond to "elementwise" operations while __mul__ and __div__ would be "scalar broadcast" operations.
>
> Peter, I'm really glad you chimed in. My advocacy lacked sufficient weight to move this idea forward.
>
> Raymond
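For concreteness, here is a minimal sketch of what the low-level approach discussed above might look like as a subclass. The class name `ScalableCounter` is hypothetical, and none of this is a proposed or actual implementation: it assumes scalar multiplication simply scales every count, and it exposes `total` as a property as described in the quoted message.

```python
from collections import Counter


class ScalableCounter(Counter):
    """Hypothetical Counter subclass with scalar multiplication and a total."""

    @property
    def total(self):
        # Sum of all counts. Note: on Python 3.10+ this property shadows
        # the built-in Counter.total() method.
        return sum(self.values())

    def __mul__(self, scalar):
        # "Scalar broadcast": scale every count by the same factor.
        return type(self)({key: count * scalar for key, count in self.items()})

    __rmul__ = __mul__


c = ScalableCounter(a=2, b=1, c=1)
doubled = 2 * c            # preserves c + c == 2 * c
pmf = c * (1 / c.total)    # normalize so the counts sum to 1.0
```

With these two pieces, normalizing to a probability mass function is a one-liner, which is the point of the `c *= 1/c.total` remark.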
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/