On Fri, 03 Jun 2011 02:58:24 +0000, Chris Torek wrote:

> Python might be penalized by its use of Unicode here, since a
> Boyer-Moore table for a full 16-bit Unicode string would need
> 65536 entries (one per possible ord() value).  However, if the
> string being sought is all single-byte values, a 256-element
> table suffices; re.compile(), at least, could scan the pattern
> and choose an appropriate underlying search algorithm.

The table can be truncated or compressed at the cost of having to map
codepoints to table indices. Or use a hash table instead of an array.

-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to