Hi!

On Thu, Nov 15, 2012 at 11:39 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
> Note that we cannot do a partial-match if KEEPONLYALNUM is disabled,
> i.e., if query key contains multibyte characters. In this case, byte
> length of the trigram string might be larger than three, and its CRC
> is used as a trigram key instead of the trigram string itself. Because
> of using CRC, we cannot do a partial-match. Attached patch extends
> pg_trgm so that it compares a partial-match query key only when
> KEEPONLYALNUM is enabled.

I didn't get this point. How does KEEPONLYALNUM guarantee that each
trigram character is single-byte?

CREATE TABLE test (val TEXT);
INSERT INTO test VALUES ('aa'), ('aaa'), ('шaaш');
CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
ANALYZE test;

test=# SELECT * FROM test WHERE val LIKE '%aa%';
 val
------
 aa
 aaa
 шaaш
(3 rows)

test=# set enable_seqscan = off;
SET
test=# SELECT * FROM test WHERE val LIKE '%aa%';
 val
-----
 aa
 aaa
(2 rows)

Once the index scan is forced, the 'шaaш' row is lost, even though
'ш' is an alphanumeric character.

I think we can use partial match only for single-byte encodings. Or, at
most, in cases when all alphanumeric characters are single-byte (I have
no idea how to check this).

------
With best regards,
Alexander Korotkov.
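
P.S. To illustrate why the 'шaaш' row disappears, here is a rough Python sketch of the key construction described in the quoted text. It is a simplification under stated assumptions, not pg_trgm's actual code: the function names are hypothetical, the whole string is treated as a single word, UTF-8 is assumed, and zlib.crc32 is only a stand-in for the CRC that pg_trgm computes, so the hashed bytes will differ from the real ones.

```python
import zlib


def trigram_keys(word: str) -> list:
    """Return the 3-byte keys for one word (simplified model, UTF-8 assumed)."""
    # Pad with two leading blanks and one trailing blank, as pg_trgm does.
    padded = "  " + word.lower() + " "
    keys = []
    for i in range(len(padded) - 2):
        raw = padded[i:i + 3].encode("utf-8")
        if len(raw) == 3:
            # All three characters are single-byte: the key is the trigram itself.
            keys.append(raw)
        else:
            # Multibyte trigram: the key is a hash truncated to 3 bytes.
            # zlib.crc32 is a stand-in here, not pg_trgm's own CRC.
            keys.append(zlib.crc32(raw).to_bytes(4, "little")[:3])
    return keys


def partial_match(prefix: str, word: str) -> bool:
    """True if any key of `word` starts with the bytes of `prefix`."""
    p = prefix.encode("utf-8")
    return any(key.startswith(p) for key in trigram_keys(word))


for w in ("aa", "aaa", "шaaш"):
    print(w, trigram_keys(w), partial_match("aa", w))
```

For 'aa' and 'aaa' the key "aa " is stored verbatim, so the prefix "aa" finds it. In 'шaaш' every trigram contains 'ш' and therefore takes four bytes in UTF-8, so all of its keys are hashes, including the one for "aaш". A prefix comparison against a hash can only succeed by accident, which is how the row gets lost even though all characters involved are alphanumeric.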