On Jun 1, 1:49 am, Peter Otten <[EMAIL PROTECTED]> wrote:
> Peter Otten wrote:
> > #untested
>
> Already found two major blunders :(
>
> # still untested
> import difflib
>
> def _merge(a, b):
>     sm = difflib.SequenceMatcher(None, a, b)
>     for op, a1, a2, b1, b2 in sm.get_opcodes():
>         if op == "insert":
>             yield b[b1:b2]
>         elif op == "replace":
>             yield a[a1:a2]
>             yield b[b1:b2]
>         else: # delete, equal
>             yield a[a1:a2]
>
> def merge(a, b):
>     return sum(_merge(a, b), [])
>
> def merge_to_unique(sources):
>     return unique(reduce(merge, sorted(sources, key=len, reverse=True)))

difflib.SequenceMatcher looks promising; I'll try it. Thanks!
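To see what the opcode walk actually produces, here is a quick check on a toy pair of ordered lists (my own made-up inputs, run in a Python 2.x session with the quoted definitions pasted in):

    >>> a = [1, 2, 4, 5]
    >>> b = [1, 3, 4, 6]
    >>> merge(a, b)   # "equal" keeps one copy; "replace" yields both sides
    [1, 2, 3, 4, 5, 6]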
> def unique(items):
>     u = set(items)
>     if len(u) == len(items):
>         return items
>     result = []
>     for item in items:
>         if item in u:
>             result.append(item)
>             u.remove(item)
>     return result

You were right to preserve the original (non-alphabetical) ordering, but I'm less enthusiastic about the shape of this function. My original function used 7 lines of code, and only 1 for the unique() step; this version needs up to three container objects. Is it really an improvement?

(Secret: the reference list (or any of the sources) is unlikely to be more than a few dozen elements long. The data set that puts merge_to_unique through a workout will be a giant list of comparatively short lists, so the unique() part just needs to be short and conceptually clean, while merge() should attempt sane behavior for large len(sources).)
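For the record, one way to get unique() back down to a single conceptually clean line while keeping first-occurrence order is the familiar seen-set trick (a sketch, not necessarily the one-liner from my original post):

    def unique(items):
        # set.add() returns None, so "not seen.add(x)" is always true
        # and merely records x as seen; each item survives only once
        seen = set()
        return [x for x in items if x not in seen and not seen.add(x)]

It produces the same sequence as the quoted version (though always as a fresh list), with one temporary container instead of up to three. With that in place, merge_to_unique behaves as expected on a toy pile of short lists (again my own inputs):

    >>> merge_to_unique([[1, 2, 4], [1, 4], [2, 5]])
    [1, 2, 4, 5]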
