What about a word with several infinitives?

select to_tsvector('en', 'leavings');
  'leave':1 'leavings':1
(1 row)

select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
(1 row)

The second example is not correct:

select phraseto_tsquery('en', 'leavings')
will produce 'leave | leavings'


select phraseto_tsquery('en', 'leavings cats')
will produce 'leave <-> cat | leavings <-> cat'

which seems correct, so we don't need special treatment of <0>.

This brings up something else that I am not very sold on: to wit,
do we really want the "less than or equal" distance behavior at all?
The documentation gives the example that
        phraseto_tsquery('cat ate some rats')
        ( 'cat' <-> 'ate' ) <2> 'rat'
because "some" is a stopword.  However, that pattern will also match
"cat ate rats", which seems surprising and unexpected to me; certainly
it would surprise a user who did not realize that "some" is a stopword.

So I think there's a reasonable case for decreeing that <N> should only
match lexemes *exactly* N apart.  If we did that, we would no longer have
the misbehavior that Jean-Pierre is complaining about, and we'd not need
to argue about whether <0> needs to be treated specially.
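
To make the difference concrete, here is a minimal sketch in Python of the two
candidate meanings of <N>. It is a toy model, not the tsquery executor: the
function names, the dict-of-positions stand-in for a tsvector, and the reading
of "less than or equal" as "any gap from 1 to N" are all assumptions made for
illustration.

```python
def phrase_positions(left_pos, right_pos, distance, exact=True):
    """Return the right-operand positions that follow some left position.

    exact=True  : the gap must equal `distance` (the proposed behavior).
    exact=False : any gap from 1 to `distance` is accepted (assumed model
                  of the "less than or equal" behavior); a distance of 0
                  still means "same position".
    """
    found = []
    for rp in right_pos:
        for lp in left_pos:
            gap = rp - lp
            if exact or distance == 0:
                ok = gap == distance
            else:
                ok = 1 <= gap <= distance
            if ok:
                found.append(rp)
                break
    return found


def matches_cat_ate_rat(vec, exact):
    """Evaluate ( 'cat' <-> 'ate' ) <2> 'rat' against a toy vector."""
    cat_ate = phrase_positions(vec.get("cat", []), vec.get("ate", []), 1, exact)
    return bool(phrase_positions(cat_ate, vec.get("rat", []), 2, exact))


# Hypothetical position maps: "some" is dropped as a stopword but still
# consumes position 3 in the first document.
cat_ate_some_rats = {"cat": [1], "ate": [2], "rat": [4]}
cat_ate_rats = {"cat": [1], "ate": [2], "rat": [3]}

print(matches_cat_ate_rat(cat_ate_some_rats, exact=True))   # True
print(matches_cat_ate_rat(cat_ate_rats, exact=True))        # False
print(matches_cat_ate_rat(cat_ate_rats, exact=False))       # True
```

Under the exact rule only the document that really has a word between "ate"
and "rats" matches; under the lenient rule "cat ate rats" matches as well,
which is the surprising case described above.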

Agreed, that seems easy to change. I thought I had seen an issue with hyphenated words but, fortunately, I had forgotten that the parts of a hyphenated word don't share a position:
# select to_tsvector('foo-bar');
 'bar':3 'foo':2 'foo-bar':1
# select phraseto_tsquery('foo-bar');
 ( 'foo-bar' <-> 'foo' ) <-> 'bar'
# select to_tsvector('foo-bar') @@ phraseto_tsquery('foo-bar');

The patch is attached.

Teodor Sigaev                                   E-mail: teo...@sigaev.ru
                                                   WWW: http://www.sigaev.ru/

Attachment: phrase_exact_distance.patch
Description: binary/octet-stream

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)