"MauMau" <maumau...@gmail.com> wrote:

>>> On 09-05-2012 19:17, MauMau wrote:
>>>> Then, does it make sense to remove "#define KEEPONLYALNUM" in
>>>> 9.1.4? Would it cause any problems?

Yes, it will cause problems.

> For information, what kind of breakage would occur?
> I imagined removing KEEPONLYALNUM would just accept
> non-alphanumeric characters and cause no harm to those who use
> only alphanumeric characters.

This would break our current usages because of the handling of
trigrams at the "edges" of groups of qualifying characters. It would
make similarity (and distance) values less useful for our current
name searches using it. To simulate the effect, I used an '8' in
place of a comma instead of recompiling with the suggested change.

test=# select show_trgm('smith,john');
                         show_trgm
-----------------------------------------------------------
 {"  j","  s"," jo"," sm","hn ",ith,joh,mit,ohn,smi,"th "}
(1 row)

test=# select show_trgm('smith8john');
                      show_trgm
-----------------------------------------------------
 {"  s"," sm",8jo,h8j,"hn ",ith,joh,mit,ohn,smi,th8}
(1 row)

test=# select similarity('smith,john', 'jon smith');
 similarity
------------
   0.615385
(1 row)

test=# select similarity('smith8john', 'jon smith');
 similarity
------------
     0.3125
(1 row)

So making the proposed change unconditionally could indeed hurt
current users of the technique. On the other hand, if there was
fine-grained control of this, it might make trigrams useful for
searching statute cites (using all characters) as well as names
(using the current character set); so I wouldn't want it to just be
controlled by a global GUC.

-Kevin
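
For anyone who wants to check the arithmetic without a server at hand,
here is a rough Python sketch of the trigram bookkeeping behind the
numbers above. It is a simplified model, not the pg_trgm C code. It
assumes ASCII input, pads each word with two leading spaces and one
trailing space, and scores similarity as shared / (n1 + n2 - shared).
The function names and the keep_only_alnum flag are made up for
illustration. The flag assumes that without KEEPONLYALNUM every
non-whitespace character counts as part of a word.

```python
import re


def trigrams(text, keep_only_alnum=True):
    """Return the set of trigrams for text (simplified, ASCII-only model)."""
    # With keep_only_alnum, a "word" is a run of letters and digits, so a
    # comma splits 'smith,john' into two words. Without it, this sketch
    # assumes any run of non-whitespace characters forms a single word.
    pattern = r"[a-z0-9]+" if keep_only_alnum else r"\S+"
    result = set()
    for word in re.findall(pattern, text.lower()):
        # Two spaces in front and one behind, as the show_trgm output suggests.
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b, keep_only_alnum=True):
    """Shared trigrams divided by the size of the union of both sets."""
    ta = trigrams(a, keep_only_alnum)
    tb = trigrams(b, keep_only_alnum)
    if not ta or not tb:
        return 0.0
    shared = len(ta & tb)
    return shared / (len(ta) + len(tb) - shared)


if __name__ == "__main__":
    print(sorted(trigrams("smith,john")))
    print(similarity("smith,john", "jon smith"))
    print(similarity("smith8john", "jon smith"))
    # Same input as the first call, but with the comma kept inside the word.
    print(similarity("smith,john", "jon smith", keep_only_alnum=False))
```

With the comma acting as a separator, "smith" and "john" each get their
own padded edges, so 8 of the 13 distinct trigrams are shared with
'jon smith'. When the separator becomes part of the word, the edge
trigrams around "john" disappear and only 5 of 16 are shared.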