On Thu, Oct 10, 2013 at 7:12 AM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote: > On 10.10.2013 15:03, Fujii Masao wrote: >> >> Hi, >> >> The behavior of pg_trgm's similarity function seems strange. Is this >> intentional? >> >> I was thinking that the following three calls of the similarity function >> return >> the same number because the second argument is just the three characters >> contained in the first argument in every calls. >> >> =# SELECT similarity('12345', '123'); >> =# SELECT similarity('12345', '234'); >> =# SELECT similarity('12345', '345'); >> >> But that's not true. Each returns the different number. >> >> =# SELECT similarity('12345', '123'); >> similarity >> ------------ >> 0.428571 >> (1 row) >> >> =# SELECT similarity('12345', '234'); >> similarity >> ------------ >> 0.111111 >> (1 row) >> >> =# SELECT similarity('12345', '345'); >> similarity >> ------------ >> 0.25 >> (1 row) >> >> This happens because, for example, similarity('12345', '123') returns >> the similarity number of '**12345*' and '**123*' (* means the blank >> character), >> NOT '12345' and '123'. IOW, two and one blank characters are added into >> the heading and tailing of each argument, respectively. I wonder why >> pg_trgm's similarity function works in this way. We should change this >> so that no blank characters are added into the arguments? > > > Well, you could also argue that "111111" and "222222" are quite similar, > even though pg_trgm's similarity will not think so. It comes down to the > definition of similarity, and how well that definition matches your > intuition. > > FWIW, it feels right to me that a match in the beginning of a word is worth > more than one in the middle of a string. -1 on changing that.
I'm not so sure that the assumption that leading trigrams should effectively weight > 3x is a good one to build into the library. However, the behavior is clearly documented and can't be changed. I think you'd need to improvise an alternate set of "trigram ops" if you wanted to rig an alternate matching behavior. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers