Re: [Python-Dev] tally (and other accumulators)
On Apr 4, 2006, at 10:53 PM, Jess Austin wrote: Alex wrote: import collections def tally(seq): d = collections.defaultdict(int) for item in seq: d[item] += 1 return dict(d) I'll stop lurking and submit the following: def tally(seq): return dict((group[0], len(tuple(group[1]))) for group in itertools.groupby(sorted(seq))) In general itertools.groupby() seems like a very clean way to do this sort of thing, whether you want to end up with a dict or not. I'll go so far as to suggest that the existence of groupby() obviates the proposed tally(). Maybe I've just coded too much SQL and it has warped my brain... OTOH the latter definition of tally won't cope with iterables, and it seems like O(nlogn) rather than O(n). It will cope with any iterable just fine (not sure why you think otherwise), but the major big-O impact seems to me to rule it out as a general solution. Alex ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] tally (and other accumulators)
Alex wrote: On Apr 4, 2006, at 10:53 PM, Jess Austin wrote: Alex wrote: import collections def tally(seq): d = collections.defaultdict(int) for item in seq: d[item] += 1 return dict(d) I'll stop lurking and submit the following: def tally(seq): return dict((group[0], len(tuple(group[1]))) for group in itertools.groupby(sorted(seq))) In general itertools.groupby() seems like a very clean way to do this sort of thing, whether you want to end up with a dict or not. I'll go so far as to suggest that the existence of groupby() obviates the proposed tally(). Maybe I've just coded too much SQL and it has warped my brain... OTOH the latter definition of tally won't cope with iterables, and it seems like O(nlogn) rather than O(n). It will cope with any iterable just fine (not sure why you think otherwise), but the major big-O impact seems to me to rule it out as a general solution. You're right in that it won't raise an exception on an iterator, but the sorted() means that it's much less memory efficient than your version for iterators. Another reason to avoid sorted() for this application, besides the big-O. Anyway, I still like groupby() for this sort of thing, with the aforementioned caveats. Functional code seems a little clearer to me, although I realize that preference is not held universally. cheers, Jess ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] tally (and other accumulators)
Jess Austin wrote: Alex wrote: On Apr 4, 2006, at 10:53 PM, Jess Austin wrote: Alex wrote: import collections def tally(seq): d = collections.defaultdict(int) for item in seq: d[item] += 1 return dict(d) [Jess again] def tally(seq): return dict((group[0], len(tuple(group[1]))) for group in itertools.groupby(sorted(seq))) In general itertools.groupby() seems like a very clean way to do this sort of thing, whether you want to end up with a dict or not. I'll go so far as to suggest that the existence of groupby() obviates the proposed tally(). Maybe I've just coded too much SQL and it has warped my brain... You're right in that it won't raise an exception on an iterator, but the sorted() means that it's much less memory efficient than your version for iterators. Another reason to avoid sorted() for this application, besides the big-O. Anyway, I still like groupby() for this sort of thing, with the aforementioned caveats. Functional code seems a little clearer to me, although I realize that preference is not held universally. However, sorted requires ordering. Try seq = [1, 1j, -1, -1j] * 5 Alex's tally works, but yours does not. -- -- Scott David Daniels [EMAIL PROTECTED] ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] tally (and other accumulators)
Jess Austin wrote: I'll go so far as to suggest that the existence of groupby() obviates the proposed tally(). Except that it requires building a list of values in each group when all you want at the end is the length of the list. -- Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] tally (and other accumulators)
It's a bit late for 2.5, of course, but, I thought I'd propose it anyway -- I noticed it on c.l.py. In 2.3/2.4 we have many ways to generate and process iterators but few accumulators -- functions that accept an iterable and produce some kind of summary result from it. sum, min, max, for example. And any, all in 2.5. The proposed function tally accepts an iterable whose items are hashable and returns a dict mapping each item to its count (number of times it appears). This is quite general and simple at the same time: for example, it was proposed originally to answer some complaint about any and all giving no indication of the count of true/false items: tally(bool(x) for x in seq) would give a dict with two entries, counts of true and false items. Just like the other accumulators mentioned above, tally is simple to implement, especially with the new collections.defaultdict: import collections def tally(seq): d = collections.defaultdict(int) for item in seq: d[item] += 1 return dict(d) Nevertheless, simplicity and generality make it advisable to supply it as part of the standard library (location TBD). A good alternative would be a classmethod tally within collections.defaultdict, building and returning a defaultdict as above (with a .factory left to int, for further possible use as a 'bag'/multiset data structure); this would solve the problem of where to locate tally if it were to be a function. defaultdict.tally would be logically quite similar to dict.fromkeys, except that keys happening repeatedly get counted (and so each associated to a value of 1 and upwards) rather than collapsed. Alex ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] tally (and other accumulators)
On 4/4/06, Alex Martelli [EMAIL PROTECTED] wrote: import collections def tally(seq): d = collections.defaultdict(int) for item in seq: d[item] += 1 return dict(d) Nevertheless, simplicity and generality make it advisable to supply it as part of the standard library (location TBD). A good alternative would be a classmethod tally within collections.defaultdict, building and returning a defaultdict as above (with a .factory left to int, for further possible use as a 'bag'/multiset data structure); this would solve the problem of where to locate tally if it were to be a function. defaultdict.tally would be logically quite similar to dict.fromkeys, except that keys happening repeatedly get counted (and so each associated to a value of 1 and upwards) rather than collapsed. Putting it somewhere in collections seems like a great idea. defaultdict is a bit odd, because the functionality doesn't have anything to do with defaults, just dicts. maybe a classmethod on regular dicts would make more sense? I write this function regularly, so I'd be happy to have it available directly. Jeremy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] tally (and other accumulators)
On Apr 4, 2006, at 8:01 AM, Jeremy Hylton wrote: On 4/4/06, Alex Martelli [EMAIL PROTECTED] wrote: import collections def tally(seq): d = collections.defaultdict(int) for item in seq: d[item] += 1 return dict(d) ... Putting it somewhere in collections seems like a great idea. defaultdict is a bit odd, because the functionality doesn't have anything to do with defaults, just dicts. maybe a classmethod on regular dicts would make more sense? Good points: it should probably be a classmethod on dict, or a function in module collections. I write this function regularly, so I'd be happy to have it available directly. Heh, same here -- soon as I saw it proposed on c.l.py I recognized an old friend and it struck me that, simple but widely used, it should be somewhere in the standard library. Alex ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] tally (and other accumulators)
Alex Martelli wrote: It's a bit late for 2.5, of course, but, I thought I'd propose it anyway -- I noticed it on c.l.py. In 2.3/2.4 we have many ways to generate and process iterators but few accumulators -- functions that accept an iterable and produce some kind of summary result from it. sum, min, max, for example. And any, all in 2.5. The proposed function tally accepts an iterable whose items are hashable and returns a dict mapping each item to its count (number of times it appears). This is quite general and simple at the same time: for example, it was proposed originally to answer some complaint about any and all giving no indication of the count of true/false items: tally(bool(x) for x in seq) would give a dict with two entries, counts of true and false items. Just like the other accumulators mentioned above, tally is simple to implement, especially with the new collections.defaultdict: import collections def tally(seq): d = collections.defaultdict(int) for item in seq: d[item] += 1 return dict(d) Or: import collections bag = collections.Bag([1, 2, 3, 2, 1]) assert bag.count(1) == 2 assert bag.count(0) == 0 assert 3 in bag # etc... -- Ian Bicking / [EMAIL PROTECTED] / http://blog.ianbicking.org ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] tally (and other accumulators)
Alex Martelli wrote: On Apr 4, 2006, at 8:01 AM, Jeremy Hylton wrote: On 4/4/06, Alex Martelli [EMAIL PROTECTED] wrote: import collections def tally(seq): d = collections.defaultdict(int) for item in seq: d[item] += 1 return dict(d) ... Putting it somewhere in collections seems like a great idea. defaultdict is a bit odd, because the functionality doesn't have anything to do with defaults, just dicts. maybe a classmethod on regular dicts would make more sense? Good points: it should probably be a classmethod on dict, or a function in module collections. I write this function regularly, so I'd be happy to have it available directly. Heh, same here -- soon as I saw it proposed on c.l.py I recognized an old friend and it struck me that, simple but widely used, it should be somewhere in the standard library. Why not make it collections.bag, like the following: class bag(dict): def __init__(self, iterable=None): dict.__init__(self) if iterable: self.update(iterable) def update(self, iterable): for item in iterable: self.add(item) def add(self, item): self[item] = self.get(item, 0) + 1 def remove(self, item): if self[item] == 1: del self[item] else: self[item] -= 1 def count(self, item): return self[item] (etc.) Georg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] tally (and other accumulators)
[Alex] This is quite general and simple at the same time: for example, it was proposed originally to answer some complaint about any and all giving no indication of the count of true/false items: tally(bool(x) for x in seq) would give a dict with two entries, counts of true and false items. FWIW, sum() works nicely for counting true entries: sum(x%3==0 for x in range(100)) 34 Raymond ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] tally (and other accumulators)
On 4/4/06, Raymond Hettinger [EMAIL PROTECTED] wrote: [Alex] This is quite general and simple at the same time: for example, it was proposed originally to answer some complaint about any and all giving no indication of the count of true/false items: tally(bool(x) for x in seq) would give a dict with two entries, counts of true and false items. FWIW, sum() works nicely for counting true entries: sum(x%3==0 for x in range(100)) 34 Sure, and also works fine for counting false ones, thanks to 'not', but if you need both counts sum doesn't work (not without dirty tricks that can't be recommended;-). Alex ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] tally (and other accumulators)
Alex wrote: import collections def tally(seq): d = collections.defaultdict(int) for item in seq: d[item] += 1 return dict(d) I'll stop lurking and submit the following: def tally(seq): return dict((group[0], len(tuple(group[1]))) for group in itertools.groupby(sorted(seq))) In general itertools.groupby() seems like a very clean way to do this sort of thing, whether you want to end up with a dict or not. I'll go so far as to suggest that the existence of groupby() obviates the proposed tally(). Maybe I've just coded too much SQL and it has warped my brain... OTOH the latter definition of tally won't cope with iterables, and it seems like O(nlogn) rather than O(n). cheers, Jess ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com