Neil Cerutti <[EMAIL PROTECTED]> wrote: > It's probable that a simpler implementation using slice > operations will be faster for shortish lengths of subseq. It was > certainly easier to get it working correctly. ;) > > def find(seq, subseq): > for i, j in itertools.izip(xrange(len(seq)-len(subseq)), > xrange(len(subseq), len(seq))): > if subseq == seq[i:j]: > return i > return -1
Simpler yet (though maybe slower!-): def find(seq, subseq): L = len(subseq) for i in xrange(0, len(seq)-L): if subseq == seq[i:i+L]: return i return -1 also worth trying (may be faster in some cases, e.g. when the first item of the subsequence occurs rarely in the sequence): def find(seq, subseq): L = len(subseq) firstitem = subseq[0] end = len(seq) - len(subseq) i = -1 while 1: try: i = seq.index(firstitem, i+1, end) except ValueError: return -1 if subseq == seq[i:i+L]: return i For particularly long sequences (with hashable items) it might even be worth trying variants of Boyer-Moore, Horspool, or Knuth-Morris-Pratt; while these search algorithms are mostly intended for text strings, since you need tables indexed by the item values, using dicts for such tables might yet be feasible (however, the program won't be quite that simple). Benchmarking of various possibilities on typical input data for your application is recommended, as performance may vary! Alex -- http://mail.python.org/mailman/listinfo/python-list