Re: [Python-Dev] tally (and other accumulators)

2006-04-05 Thread Alex Martelli

On Apr 4, 2006, at 10:53 PM, Jess Austin wrote:

 Alex wrote:
 import collections
 def tally(seq):
  d = collections.defaultdict(int)
  for item in seq:
  d[item] += 1
  return dict(d)

 I'll stop lurking and submit the following:

 def tally(seq):
 return dict((group[0], len(tuple(group[1])))
 for group in itertools.groupby(sorted(seq)))

 In general itertools.groupby() seems like a very clean way to do this
 sort of thing, whether you want to end up with a dict or not.  I'll go
 so far as to suggest that the existence of groupby() obviates the
 proposed tally().  Maybe I've just coded too much SQL and it has  
 warped
 my brain...

 OTOH the latter definition of tally won't cope with iterables, and it
 seems like O(nlogn) rather than O(n).

It will cope with any iterable just fine (not sure why you think  
otherwise), but the major big-O impact seems to me to rule it out as  
a general solution.


Alex

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] tally (and other accumulators)

2006-04-05 Thread Jess Austin
Alex wrote:
 On Apr 4, 2006, at 10:53 PM, Jess Austin wrote:
  Alex wrote:
  import collections
  def tally(seq):
   d = collections.defaultdict(int)
   for item in seq:
   d[item] += 1
   return dict(d)
 
  I'll stop lurking and submit the following:
 
  def tally(seq):
  return dict((group[0], len(tuple(group[1])))
  for group in itertools.groupby(sorted(seq)))
 
  In general itertools.groupby() seems like a very clean way to do this
  sort of thing, whether you want to end up with a dict or not.  I'll go
  so far as to suggest that the existence of groupby() obviates the
  proposed tally().  Maybe I've just coded too much SQL and it has  
  warped my brain...
 
  OTOH the latter definition of tally won't cope with iterables, and it
  seems like O(nlogn) rather than O(n).
 
 It will cope with any iterable just fine (not sure why you think  
 otherwise), but the major big-O impact seems to me to rule it out as  
 a general solution.

You're right in that it won't raise an exception on an iterator, but the
sorted() means that it's much less memory efficient than your version
for iterators.  Another reason to avoid sorted() for this application,
besides the big-O.  Anyway, I still like groupby() for this sort of
thing, with the aforementioned caveats.  Functional code seems a little
clearer to me, although I realize that preference is not held
universally.

cheers,
Jess
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] tally (and other accumulators)

2006-04-05 Thread Scott David Daniels
Jess Austin wrote:
 Alex wrote:
 On Apr 4, 2006, at 10:53 PM, Jess Austin wrote:
 Alex wrote:
 import collections
 def tally(seq):
  d = collections.defaultdict(int)
  for item in seq:
  d[item] += 1
  return dict(d)
[Jess again]
 def tally(seq):
 return dict((group[0], len(tuple(group[1])))
 for group in itertools.groupby(sorted(seq)))

 In general itertools.groupby() seems like a very clean way to do this
 sort of thing, whether you want to end up with a dict or not.  I'll go
 so far as to suggest that the existence of groupby() obviates the
 proposed tally().  Maybe I've just coded too much SQL and it has  
 warped my brain...

 You're right in that it won't raise an exception on an iterator, but the
 sorted() means that it's much less memory efficient than your version
 for iterators.  Another reason to avoid sorted() for this application,
 besides the big-O.  Anyway, I still like groupby() for this sort of
 thing, with the aforementioned caveats.  Functional code seems a little
 clearer to me, although I realize that preference is not held
 universally.

However, sorted requires ordering.  Try seq = [1, 1j, -1, -1j] * 5
Alex's tally works, but yours does not.

-- 
-- Scott David Daniels
[EMAIL PROTECTED]

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] tally (and other accumulators)

2006-04-05 Thread Greg Ewing
Jess Austin wrote:
 I'll go
 so far as to suggest that the existence of groupby() obviates the
 proposed tally().

Except that it requires building a list of values in
each group when all you want at the end is the length
of the list.

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] tally (and other accumulators)

2006-04-04 Thread Alex Martelli
It's a bit late for 2.5, of course, but, I thought I'd propose it  
anyway -- I noticed it on c.l.py.

In 2.3/2.4 we have many ways to generate and process iterators but  
few accumulators -- functions that accept an iterable and produce  
some kind of summary result from it.  sum, min, max, for example.  
And any, all in 2.5.

The proposed function tally accepts an iterable whose items are  
hashable and returns a dict mapping each item to its count (number of  
times it appears).

This is quite general and simple at the same time: for example, it  
was proposed originally to answer some complaint about any and all  
giving no indication of the count of true/false items:

tally(bool(x) for x in seq)

would give a dict with two entries, counts of true and false items.

Just like the other accumulators mentioned above, tally is simple to  
implement, especially with the new collections.defaultdict:

import collections
def tally(seq):
 d = collections.defaultdict(int)
 for item in seq:
 d[item] += 1
 return dict(d)

Nevertheless, simplicity and generality make it advisable to supply  
it as part of the standard library (location TBD).

A good alternative would be a classmethod tally within  
collections.defaultdict, building and returning a defaultdict as  
above (with a .factory left to int, for further possible use as a  
'bag'/multiset data structure); this would solve the problem of where  
to locate tally if it were to be a function.  defaultdict.tally would  
be logically quite similar to dict.fromkeys, except that keys  
happening repeatedly get counted (and so each associated to a value  
of 1 and upwards) rather than collapsed.


Alex

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] tally (and other accumulators)

2006-04-04 Thread Jeremy Hylton
On 4/4/06, Alex Martelli [EMAIL PROTECTED] wrote:
 import collections
 def tally(seq):
  d = collections.defaultdict(int)
  for item in seq:
  d[item] += 1
  return dict(d)

 Nevertheless, simplicity and generality make it advisable to supply
 it as part of the standard library (location TBD).

 A good alternative would be a classmethod tally within
 collections.defaultdict, building and returning a defaultdict as
 above (with a .factory left to int, for further possible use as a
 'bag'/multiset data structure); this would solve the problem of where
 to locate tally if it were to be a function.  defaultdict.tally would
 be logically quite similar to dict.fromkeys, except that keys
 happening repeatedly get counted (and so each associated to a value
 of 1 and upwards) rather than collapsed.

Putting it somewhere in collections seems like a great idea. 
defaultdict is a bit odd, because the functionality doesn't have
anything to do with defaults, just dicts.  maybe a classmethod on
regular dicts would make more sense?

I write this function regularly, so I'd be happy to have it available directly.

Jeremy
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] tally (and other accumulators)

2006-04-04 Thread Alex Martelli

On Apr 4, 2006, at 8:01 AM, Jeremy Hylton wrote:

 On 4/4/06, Alex Martelli [EMAIL PROTECTED] wrote:
 import collections
 def tally(seq):
  d = collections.defaultdict(int)
  for item in seq:
  d[item] += 1
  return dict(d)
...
 Putting it somewhere in collections seems like a great idea.
 defaultdict is a bit odd, because the functionality doesn't have
 anything to do with defaults, just dicts.  maybe a classmethod on
 regular dicts would make more sense?

Good points: it should probably be a classmethod on dict, or a  
function in module collections.

 I write this function regularly, so I'd be happy to have it  
 available directly.

Heh, same here -- soon as I saw it proposed on c.l.py I recognized an  
old friend and it struck me that, simple but widely used, it should  
be somewhere in the standard library.


Alex

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] tally (and other accumulators)

2006-04-04 Thread Ian Bicking
Alex Martelli wrote:
 It's a bit late for 2.5, of course, but, I thought I'd propose it  
 anyway -- I noticed it on c.l.py.
 
 In 2.3/2.4 we have many ways to generate and process iterators but  
 few accumulators -- functions that accept an iterable and produce  
 some kind of summary result from it.  sum, min, max, for example.  
 And any, all in 2.5.
 
 The proposed function tally accepts an iterable whose items are  
 hashable and returns a dict mapping each item to its count (number of  
 times it appears).
 
 This is quite general and simple at the same time: for example, it  
 was proposed originally to answer some complaint about any and all  
 giving no indication of the count of true/false items:
 
 tally(bool(x) for x in seq)
 
 would give a dict with two entries, counts of true and false items.
 
 Just like the other accumulators mentioned above, tally is simple to  
 implement, especially with the new collections.defaultdict:
 
 import collections
 def tally(seq):
  d = collections.defaultdict(int)
  for item in seq:
  d[item] += 1
  return dict(d)

Or:

   import collections
   bag = collections.Bag([1, 2, 3, 2, 1])
   assert bag.count(1) == 2
   assert bag.count(0) == 0
   assert 3 in bag
   # etc...


-- 
Ian Bicking  /  [EMAIL PROTECTED]  /  http://blog.ianbicking.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] tally (and other accumulators)

2006-04-04 Thread Georg Brandl
Alex Martelli wrote:
 On Apr 4, 2006, at 8:01 AM, Jeremy Hylton wrote:
 
 On 4/4/06, Alex Martelli [EMAIL PROTECTED] wrote:
 import collections
 def tally(seq):
  d = collections.defaultdict(int)
  for item in seq:
  d[item] += 1
  return dict(d)
 ...
 Putting it somewhere in collections seems like a great idea.
 defaultdict is a bit odd, because the functionality doesn't have
 anything to do with defaults, just dicts.  maybe a classmethod on
 regular dicts would make more sense?
 
 Good points: it should probably be a classmethod on dict, or a  
 function in module collections.
 
 I write this function regularly, so I'd be happy to have it  
 available directly.
 
 Heh, same here -- soon as I saw it proposed on c.l.py I recognized an  
 old friend and it struck me that, simple but widely used, it should  
 be somewhere in the standard library.

Why not make it collections.bag, like the following:


class bag(dict):
def __init__(self, iterable=None):
dict.__init__(self)
if iterable: self.update(iterable)

def update(self, iterable):
for item in iterable:
self.add(item)

def add(self, item):
self[item] = self.get(item, 0) + 1

def remove(self, item):
if self[item] == 1:
del self[item]
else:
self[item] -= 1

def count(self, item):
return self[item]


(etc.)

Georg












___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] tally (and other accumulators)

2006-04-04 Thread Raymond Hettinger
[Alex] 
 This is quite general and simple at the same time: for example, it  
 was proposed originally to answer some complaint about any and all  
 giving no indication of the count of true/false items:
 
 tally(bool(x) for x in seq)
 
 would give a dict with two entries, counts of true and false items.

FWIW, sum() works nicely for counting true entries:

 sum(x%3==0 for x in range(100))
   34


Raymond
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] tally (and other accumulators)

2006-04-04 Thread Alex Martelli
On 4/4/06, Raymond Hettinger [EMAIL PROTECTED] wrote:
 [Alex]
  This is quite general and simple at the same time: for example, it
  was proposed originally to answer some complaint about any and all
  giving no indication of the count of true/false items:
 
  tally(bool(x) for x in seq)
 
  would give a dict with two entries, counts of true and false items.

 FWIW, sum() works nicely for counting true entries:

  sum(x%3==0 for x in range(100))
34

Sure, and also works fine for counting false ones, thanks to 'not',
but if you need both counts sum doesn't work (not without dirty tricks
that can't be recommended;-).


Alex
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] tally (and other accumulators)

2006-04-04 Thread Jess Austin
Alex wrote: 
 import collections
 def tally(seq):
  d = collections.defaultdict(int)
  for item in seq:
  d[item] += 1
  return dict(d)

I'll stop lurking and submit the following:

def tally(seq):
return dict((group[0], len(tuple(group[1])))
for group in itertools.groupby(sorted(seq)))

In general itertools.groupby() seems like a very clean way to do this
sort of thing, whether you want to end up with a dict or not.  I'll go
so far as to suggest that the existence of groupby() obviates the
proposed tally().  Maybe I've just coded too much SQL and it has warped
my brain...

OTOH the latter definition of tally won't cope with iterables, and it
seems like O(nlogn) rather than O(n).

cheers,
Jess
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com