On Thu, Oct 10, 2013 at 7:12 AM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:
> On 10.10.2013 15:03, Fujii Masao wrote:
>>
>> Hi,
>>
>> The behavior of pg_trgm's similarity function seems strange. Is this
>> intentional?
>>
>> I was thinking that the following three calls of the similarity function
>> return
>> the same number because the second argument is just the three characters
>> contained in the first argument in every calls.
>>
>> =# SELECT similarity('12345', '123');
>> =# SELECT similarity('12345', '234');
>> =# SELECT similarity('12345', '345');
>>
>> But that's not true. Each returns the different number.
>>
>> =# SELECT similarity('12345', '123');
>>   similarity
>> ------------
>>     0.428571
>> (1 row)
>>
>> =# SELECT similarity('12345', '234');
>>   similarity
>> ------------
>>     0.111111
>> (1 row)
>>
>> =# SELECT similarity('12345', '345');
>>   similarity
>> ------------
>>         0.25
>> (1 row)
>>
>> This happens because, for example, similarity('12345', '123') returns
>> the similarity number of '**12345*' and '**123*' (* means the blank
>> character),
>> NOT '12345' and '123'. IOW, two and one blank characters are added into
>> the heading and tailing of each argument, respectively. I wonder why
>> pg_trgm's similarity function works in this way. We should change this
>> so that no blank characters are added into the arguments?
>
>
> Well, you could also argue that "111111" and "222222" are quite similar,
> even though pg_trgm's similarity will not think so. It comes down to the
> definition of similarity, and how well that definition matches your
> intuition.
>
> FWIW, it feels right to me that a match in the beginning of a word is worth
> more than one in the middle of a string. -1 on changing that.

I'm not so sure that the assumption that leading trigrams should
effectively weight > 3x is a good one to build into the library.
However, the behavior is clearly documented and can't be changed.   I
think you'd need to improvise an alternate set of "trigram ops" if you
wanted to rig an alternate matching behavior.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to