tf.bincount() returns a vector with integer counts.
https://www.tensorflow.org/api_docs/python/tf/bincount

Keras calls np.bincount in an mnist example.

np.bincount returns an array with a __mul__.
https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.bincount.html

- sklearn.preprocessing.normalize
  http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-normalization
  http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.normalize.html

featuretools.primitives.NUnique has a normalize method.
https://docs.featuretools.com/generated/featuretools.primitives.NUnique.html#featuretools.primitives.NUnique

And I'm done sharing non-pure-python solutions for this problem, I promise.

On Sunday, April 15, 2018, Wes Turner <wes.tur...@gmail.com> wrote:

> On Sunday, April 15, 2018, Peter Norvig <pe...@norvig.com> wrote:
>
>> If you think of a Counter as a multiset, then it should support __or__,
>> not __add__, right?
>>
>> I do think it would have been fine if Counter did not support "+" at all
>> (and/or if Counter was limited to integer values). But given where we are
>> now, it feels like we should preserve `c + c == 2 * c`.
>>
>> As to the "doesn't really add any new capabilities" argument, that's
>> true, but it is also true for Counter as a whole: it doesn't add much over
>> defaultdict(int), but it is certainly convenient to have a standard way to
>> do what it does.
>>
>> I agree with your intuition that low level is better. `total` would be
>> useful. If you have total and mul, then as you and others have pointed out,
>> normalize is just c *= 1/c.total.
>>
>> I can also see the argument for a new FrequencyTable class in the
>> statistics module. (By the way, I refactored my
>> https://github.com/norvig/pytudes/blob/master/ipynb/Probability.ipynb a
>> bit, and now I no longer need a `normalize` function.)
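Peter's `c + c == 2 * c` identity and his `c *= 1/c.total` one-liner can be tried out with a small subclass, in the spirit of Raymond's remark further down that Counter is easy to subclass. ScalableCounter and its `total` property are hypothetical names used only for illustration (nothing like this existed in collections when this thread was written), and the sketch multiplies by fractions.Fraction(1, total) instead of 1/c.total so the normalized values stay exact.

```python
from collections import Counter
from fractions import Fraction
from numbers import Real


class ScalableCounter(Counter):
    """Hypothetical Counter subclass with scalar multiplication and a total.

    A sketch of the "low level" API discussed in this thread, not part of
    the standard library.
    """

    @property
    def total(self):
        # Sum of all counts. On Python 3.10+ this property shadows the
        # built-in Counter.total() method.
        return sum(self.values())

    def __mul__(self, scalar):
        # Scalar broadcast: every count is multiplied by the same number.
        if not isinstance(scalar, Real):
            return NotImplemented
        return ScalableCounter({key: count * scalar for key, count in self.items()})

    # Makes 2 * c work as well as c * 2.
    __rmul__ = __mul__


c = ScalableCounter("abracadabra")   # a=5, b=2, r=2, c=1, d=1

# Holds here because every count is positive; Counter.__add__ drops
# counts that end up zero or negative, while __mul__ above keeps them.
print(c + c == 2 * c)                # True

# Peter's one-liner: with no __imul__ defined, *= falls back to __mul__
# and rebinds p to a new ScalableCounter.
p = ScalableCounter(c)
p *= Fraction(1, p.total)            # Fraction keeps the probabilities exact
print(p["a"], sum(p.values()))       # 5/11 1
```

Note that `+` here is elementwise while `*` broadcasts one scalar, which is exactly the learnability concern Raymond raises below.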
>>
>
> nltk.probability.FreqDist(collections.Counter) doesn't have a __mul__
> either.
> http://www.nltk.org/api/nltk.html#nltk.probability.FreqDist
>
> numpy.unique(, return_counts=True) also returns unique_counts, an array
> ordered by the sorted unique values, with a __mul__.
> https://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html
>
> scipy.stats.itemfreq returns an array sorted by value with a __mul__ and
> the items in the first column.
> https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.itemfreq.html
>
> pandas.Series.value_counts(, normalize=False) returns a Series sorted by
> descending frequency.
> https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html
>
>> On Sun, Apr 15, 2018 at 5:06 PM Raymond Hettinger <raymond.hettin...@gmail.com> wrote:
>>
>>> > On Apr 15, 2018, at 2:05 PM, Peter Norvig <pe...@norvig.com> wrote:
>>> >
>>> > For most types that implement __add__, `x + x` is equal to `2 * x`.
>>> >
>>> > ...
>>> >
>>> > That is true for all numbers, list, tuple, str, timedelta, etc. -- but
>>> > not for collections.Counter. I can add two Counters, but I can't multiply
>>> > one by a scalar. That seems like an oversight.
>>>
>>> If you view the Counter as a sparse associative array of numeric values,
>>> it does seem like an oversight. If you view the Counter as a Multiset or
>>> Bag, it doesn't make sense at all ;-)
>>>
>>> From an implementation point of view, Counter is just a kind of dict
>>> that has a __missing__() method that returns zero. That makes it trivially
>>> easy to subclass Counter to add new functionality or just use dictionary
>>> comprehensions for bulk updates.
>>>
>>> > It would be worthwhile to implement multiplication because, among
>>> > other reasons, Counters are a nice representation for discrete probability
>>> > distributions, for which multiplication is an even more fundamental
>>> > operation than addition.
>>>
>>> There is an open issue on this topic. See:
>>> https://bugs.python.org/issue25478
>>>
>>> One stumbling point is that a number of commenters are fiercely opposed
>>> to non-integer uses of Counter. Also, some of the use cases (such as those
>>> found in Allen Downey's "Think Stats" and "Think Bayes" books) need
>>> division and rescaling to a total (i.e. normalizing the total to 1.0) for a
>>> probability mass function.
>>>
>>> If the idea were to go forward, it still isn't clear whether the correct
>>> API should be low level (__mul__ and __div__ and a "total" property) or
>>> higher level (such as a normalize() or rescale() method that produces a new
>>> Counter instance). The low level approach has the advantage that it is
>>> simple to understand and that it feels like a logical extension of the
>>> __add__ and __sub__ methods. The downside is that it doesn't really add any
>>> new capabilities (being just short-cuts for a simple dict comprehension or
>>> call to c.values()). And, it starts to feature creep the Counter class
>>> further away from its core mission of counting and ventures into the realm
>>> of generic sparse arrays with numeric values. There is also a
>>> learnability/intelligibility issue in that __add__ and __sub__ correspond to
>>> "elementwise" operations while __mul__ and __div__ would be "scalar
>>> broadcast" operations.
>>>
>>> Peter, I'm really glad you chimed in. My advocacy lacked sufficient
>>> weight to move this idea forward.
>>>
>>> Raymond
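Raymond's point that the same results are available today through a dict comprehension or a small helper can be sketched with the standard library alone. The normalize() function below is hypothetical (collections.Counter has no such method); it follows the "higher level" style of returning a new Counter, and it assumes integer counts so that fractions.Fraction can keep the probabilities exact.

```python
from collections import Counter
from fractions import Fraction


def normalize(counter):
    """Hypothetical helper: return a new Counter whose values sum to 1.

    This is the "higher level" style (a normalize() that produces a new
    instance); it is not a method of collections.Counter. It assumes integer
    counts, because Fraction(count, total) requires rational arguments.
    """
    total = sum(counter.values())
    if total == 0:
        raise ValueError("counts sum to zero; cannot normalize")
    return Counter({key: Fraction(count, total) for key, count in counter.items()})


counts = Counter("mississippi")      # i=4, s=4, p=2, m=1

# Scalar broadcast with a plain dict comprehension, no new methods needed.
doubled = Counter({key: 2 * n for key, n in counts.items()})

# __add__ is elementwise: counts for matching keys are summed.
print(counts + counts == doubled)    # True

pmf = normalize(counts)
print(pmf["s"], sum(pmf.values()))   # 4/11 1
```

The helper keeps Counter itself focused on counting, at the cost of living outside the class, which is the trade-off Raymond describes between the low-level and higher-level APIs.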
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/