Tom Lane writes: >> hash table?
> I'd think the cost of hashing would render it impractical. Most of the > papers I've seen on this topic worry about getting single instructions > out of the search loop --- a hash lookup will cost lots more than that. > Moreover, you'd lose the guarantee of not-worse-than-linear time, > because hash lookup can be pathologically bad if you get a lot of hash > collisions. compute max_wchar, min_wchar. If (d = max_wchar - min_wchar) < k (for example, k = 1000), then we use index table (wchar -> wchar - min_wchar). Else we use hash table. Number of collisions would be a few (because hash table needs for pattern characters only. Characters located serially, hash function = whchar % const). >> The main difficulty with BM is verification and understanding "good >> suffix shift" (the second part of BM) (I don't understand it entirely). > Yeah, there seem to be a bunch of variants of BM (many of them not > guaranteed linear, which I'm sure we don't want) and the earliest > papers had bugs. But a good implementation would be a lot easier > sell because it would show benefits for a much wider set of use-cases > than KMP. Is there requirement for some string mathching algorithms/data structure(suffix array/tree) in PG? or "We've had no complaints about the speed of those functions". ---- Ajtkulov Pavel [EMAIL PROTECTED] ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq