On Fri, Sep 26, 2008 at 01:39:16PM -0700, [EMAIL PROTECTED] wrote: > # building prefix-function > m = 0 > for i in xrange(1, len_sub): > while m > 0 and sub[m] != sub[i]: > m = table[m - 1] > if sub[m] == sub[i]: > m += 1 > table[i] = m > > # searching > m, i = 0, 0 > for x in items: > while m > 0 and sub[m] != x: > m = table[m - 1] > if sub[m] == x: > m += 1 > if m == len_sub: > return True > i += 1 > > return False
Quite a lot faster than mine... even without using psyco. Which is interesting, especially because if I change my search loop to work like yours, but leave out all your other optimizations, mine runs roughly as fast as yours (i.e. execution time is negligible to a user running the program in both cases, even with the large example data you gave). This leads me to point out two caveats with your version: 1. Guavat posted a version which returns a list of all the indexes where a match is found... which is what I mimiced. Yours returns only true or false indicating whether or not it found a match. The main difference in performance seems to come due to your iteration over the list items, versus my comparing a sublist-sized slice of the whole to the sublist. But this alone is an invalid optimization, because it doesn't produce the same results... If you only want to know if a match exists, it's great; but if you want to know *where*, or *how many times*, you lose. That could be fixed though, by counting the elements as you loop through them... I didn't attempt to determine the performance hit for doing that, but I assume it's negligible. I also imagine that that was your original purpose for the unused variable i... 2. Your secondary optimizations add a great deal of complexity to your code, making the algorithm much harder to understand. However they don't appear to buy you much, given that the cases they optimize would probably be rare, and the difference in execution time gained by the optimization is not noticable to the user. Unless you're doing lots and lots of these in your application, or maybe if you know in advance that your data will contain many instances of the cases you optimized, I think you're better off leaving the optimizations out, for the sake of code clarity. At the very least, if you're going to write complicated optimizations, you ought to have explained what you were doing in comments... :) -- Derek D. Martin http://www.pizzashack.org/ GPG Key ID: 0x81CFE75D
pgpcipqEw7nQM.pgp
Description: PGP signature
-- http://mail.python.org/mailman/listinfo/python-list