On Fri, 03 Jun 2011 02:58:24 +0000, Chris Torek wrote: > Python might be penalized by its use of Unicode here, since a > Boyer-Moore table for a full 16-bit Unicode string would need > 65536 entries (one per possible ord() value). However, if the > string being sought is all single-byte values, a 256-element > table suffices; re.compile(), at least, could scan the pattern > and choose an appropriate underlying search algorithm.
The table can be truncated or compressed at the cost of having to map codepoints to table indices. Or use a hash table instead of an array. -- http://mail.python.org/mailman/listinfo/python-list