The behavior of this function is surprising to me.

select substring_similarity('dog' ,  'hotdogpound') ;

  substring_similarity
----------------------
                  0.25

Substring search was designed to find a similar word within a string:
contrib_regression=# select substring_similarity('dog' ,  'hot dogpound') ;
  substring_similarity
----------------------
                  0.75

contrib_regression=# select substring_similarity('dog' ,  'hot dog pound') ;
  substring_similarity
----------------------
                     1

Hmm, this behavior looks too much like magic to me.  I mean, a substring
is a substring -- why are we treating the space as a special character
here?

Because it isn't a regex for substring search. Ever since it was first implemented, pg_trgm has worked on the words in a string:
contrib_regression=# select similarity('block hole', 'hole black');
 similarity
------------
   0.571429
contrib_regression=# select similarity('block hole', 'black     hole');
 similarity
------------
   0.571429

It ignores the spaces between words and the order of the words.
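
To make the word-based behavior concrete, here is a rough Python sketch of how similarity() can be read. It is not the C code in pg_trgm; it assumes each word is lowercased and padded with two spaces in front and one behind, and that the result is shared trigrams divided by all distinct trigrams. The helper name trigrams is only illustrative.

```python
import re

def trigrams(text):
    """Collect the set of trigrams word by word (assumed pg_trgm-style padding)."""
    result = set()
    # Assumption: anything that is not a letter or digit separates words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two spaces in front, one behind
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

print(round(similarity("block hole", "hole black"), 6))      # 0.571429
print(round(similarity("block hole", "black     hole"), 6))  # 0.571429
```

Because the trigrams are gathered into a set per word, neither the amount of whitespace nor the order of the words can change the result.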

I agree that substring_similarity is a confusing name, but what it actually does is search the second argument for the word most similar to the first argument and return their similarity.
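
One way to picture this, as a sketch only: take the trigrams of the second argument in order, and look for the contiguous run of them that is most similar to the trigram set of the first argument. The Python below is an assumption about the idea, not the pg_trgm implementation, and best_extent_similarity is a made-up name. It does reproduce the three results quoted above.

```python
import re

def ordered_trigrams(text):
    """Trigrams in order of appearance, word by word (assumed padding: 2 spaces + word + 1 space)."""
    seq = []
    # Assumption: anything that is not a letter or digit separates words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        seq.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return seq

def best_extent_similarity(needle, haystack):
    """Best similarity between needle's trigram set and any contiguous run of haystack trigrams."""
    wanted = set(ordered_trigrams(needle))
    seq = ordered_trigrams(haystack)
    best = 0.0
    for start in range(len(seq)):
        extent = set()
        for end in range(start, len(seq)):
            extent.add(seq[end])  # grow the run one trigram at a time
            score = len(wanted & extent) / len(wanted | extent)
            best = max(best, score)
    return best

print(best_extent_similarity("dog", "hotdogpound"))    # 0.25
print(best_extent_similarity("dog", "hot dogpound"))   # 0.75
print(best_extent_similarity("dog", "hot dog pound"))  # 1.0
```

In 'hotdogpound' only the trigram "dog" matches, because the word-start and word-end trigrams of 'dog' never appear. A space before 'dog' supplies the two leading trigrams, and a space after it supplies the trailing one.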


--
Teodor Sigaev                                   E-mail: teo...@sigaev.ru
                                                   WWW: http://www.sigaev.ru/

