Hello,
this patch allow to use Knuth-Morrison-Pratt algorithm for strpos() function
(see Cormen et al. Introduction to Algorithms, MIT Press, 2001).
It also works with multibyte wchar.
In worst case current brute force strpos() takes O(n * m) (n && m is length of
strings)
time (example: 'aaa.
> Do you have any performance test results for this?
I describe the worst case in first message (search 'aaa..aab' in
'aa..aa', "complete" N^2). It works some msec instead of several sec
(in current version).
Patch intends for artificial language (for example DNA, or
language with small alphabet
Tom Lane writes:
> I wonder why you didn't propose Boyer-Moore instead, as that would have
> some advantage for natural language text as well.
> The difficulty with B-M is the need for a table indexed by character
> code, which at first glance looks impractical for wchars. But it seems
> to me t
Tom Lane writes:
>> hash table?
> I'd think the cost of hashing would render it impractical. Most of the
> papers I've seen on this topic worry about getting single instructions
> out of the search loop --- a hash lookup will cost lots more than that.
> Moreover, you'd lose the guarantee of not-