Peter Otten wrote:
Kevin D.  Smith wrote:

On 2009-05-07 23:48:43 -0500, Chris Rebert <c...@rebertia.com> said:

On Thu, May 7, 2009 at 2:23 PM, Kevin D. Smith <kevin.sm...@sas.com>
wrote:
I need the behavior of heapq.merge to merge a bunch of results from a
database.... I would prefer to do this with generators....  I need
the key argument to sort on the correct field in the database...
I think so.... [some code]
Ah, that's not a bad idea.  I think it could be simplified by letting
Python do the comparison work as follows (also untested).

def keyify(gen, key=lamda x:x):
    for item in gen:
        yield (key(item), item)
... This is not as general. It makes sorting unstable and can even fail if the items are unorderable:
One way to fix it:
def decorate(gen, key):
...     for index, item in enumerate(gen):
...             yield key(item), index, item
sorted(decorate([1j, -1j, 1+1j], key=lambda c: c.real))
[(0.0, 0, 1j), (0.0, 1, -1j), (1.0, 2, (1+1j))]
list(undecorate(_))
[1j, -1j, (1+1j)]

However, a heap gets faster if equal items are equal, so you
really want the decorated value to be the sole component of a
comparison.  (and remember, sorts are now stable).
As in your code, key generation is likely best calculated once.
Here's another fix:

    class Keyed(object):
        def __init__(self, obj, key):
            self.obj = obj
            self.key = key

        def __lt__(self, other): # cmp is old-style magic
            return self.key < other.key

        def __eq__(self, other):
            return self.key == other.key

        def __repr__(self):
            return '%s(%r, %r)' % (
                self.__class__.__name__, self.obj, self.key)

    def kmerge(key, *streams):
        keyed_streams = [(Keyed(obj, key(obj)) for obj in stream)
                         for stream in streams]
        for element in heapq.merge(*keyed_streams):
            yield element.obj

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to