Peter Otten wrote:
Kevin D. Smith wrote:
On 2009-05-07 23:48:43 -0500, Chris Rebert <c...@rebertia.com> said:
On Thu, May 7, 2009 at 2:23 PM, Kevin D. Smith <kevin.sm...@sas.com>
wrote:
I need the behavior of heapq.merge to merge a bunch of results from a
database.... I would prefer to do this with generators.... I need
the key argument to sort on the correct field in the database...
I think so.... [some code]
Ah, that's not a bad idea. I think it could be simplified by letting
Python do the comparison work as follows (also untested).
def keyify(gen, key=lamda x:x):
for item in gen:
yield (key(item), item)
... This is not as general. It makes sorting unstable and can even fail
if the items are unorderable:
One way to fix it:
def decorate(gen, key):
... for index, item in enumerate(gen):
... yield key(item), index, item
sorted(decorate([1j, -1j, 1+1j], key=lambda c: c.real))
[(0.0, 0, 1j), (0.0, 1, -1j), (1.0, 2, (1+1j))]
list(undecorate(_))
[1j, -1j, (1+1j)]
However, a heap gets faster if equal items are equal, so you
really want the decorated value to be the sole component of a
comparison. (and remember, sorts are now stable).
As in your code, key generation is likely best calculated once.
Here's another fix:
class Keyed(object):
def __init__(self, obj, key):
self.obj = obj
self.key = key
def __lt__(self, other): # cmp is old-style magic
return self.key < other.key
def __eq__(self, other):
return self.key == other.key
def __repr__(self):
return '%s(%r, %r)' % (
self.__class__.__name__, self.obj, self.key)
def kmerge(key, *streams):
keyed_streams = [(Keyed(obj, key(obj)) for obj in stream)
for stream in streams]
for element in heapq.merge(*keyed_streams):
yield element.obj
--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list