Oleg Bartunov <obartu...@gmail.com> writes:
> On Wed, Jun 8, 2016 at 1:05 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> I concur that that seems like a rather useless behavior.  If we have
>> "x <-> y" it is not possible to match at distance zero, while if we
>> have "x <-> x" it seems unlikely that the user is expecting us to
>> treat that identically to "x".  So phrase search simply should not
>> consider distance-zero matches.

> what's about word with several infinitives

> select to_tsvector('en', 'leavings');
>       to_tsvector
> ------------------------
>  'leave':1 'leavings':1
> (1 row)

> select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
>  ?column?
> ----------
>  t
> (1 row)

Hmm.  I can grant that there might be some cases where you want to see
if two separate patterns match the same lexeme, but that seems like an
extremely specialized use-case that you would only invoke very
intentionally.  It should not be built in as part of the default behavior
of every phrase search, because 99% of the time this would be an
unexpected and unwanted match.  I'm not even convinced that the operator
for this should be spelled <0> --- that seems more like a hack than a
natural extension of phrase search.  But if we do spell it like that,
then I think it should be called out as a special case that only applies
to <0>; that is, for any other value of N, the match has to be to separate
lexemes.

This brings up something else that I am not very sold on: to wit,
do we really want the "less than or equal" distance behavior at all?
The documentation gives the example that
        phraseto_tsquery('cat ate some rats')
produces
        ( 'cat' <-> 'ate' ) <2> 'rat'
because "some" is a stopword.  However, that pattern will also match
"cat ate rats", which seems surprising and unexpected to me; certainly
it would surprise a user who did not realize that "some" is a stopword.
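
To make the surprise concrete, here is a toy Python model of the
"less than or equal" rule. It is only a sketch, not PostgreSQL's actual
implementation: the helper names and the hand-written position sets are
hypothetical, and it assumes a phrase match reports the position of its
right-hand operand and that distances 0 through N are all accepted.

```python
def phrase_positions(left, right, n):
    """Positions in `right` that lie 0..n positions after some position in `left`.

    Toy model of the at-most-N rule under discussion (assumption: the result
    of a phrase match is the position of its right-hand operand).
    """
    return {r for r in right for l in left if 0 <= r - l <= n}


# Hand-written lexeme positions (hypothetical data, not real to_tsvector output).
# "some" is dropped as a stopword but still occupies position 3.
with_stopword = {"cat": {1}, "ate": {2}, "rat": {4}}      # 'cat ate some rats'
without_stopword = {"cat": {1}, "ate": {2}, "rat": {3}}   # 'cat ate rats'


def matches_at_most(doc):
    """Evaluate ( 'cat' <-> 'ate' ) <2> 'rat' under the at-most-N rule."""
    cat_ate = phrase_positions(doc["cat"], doc["ate"], 1)   # 'cat' <-> 'ate'
    return bool(phrase_positions(cat_ate, doc["rat"], 2))   # ... <2> 'rat'


print(matches_at_most(with_stopword))     # the document the query was built from
print(matches_at_most(without_stopword))  # the shorter document also matches
```

In this model both documents satisfy the query, which is the behavior
questioned above.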

So I think there's a reasonable case for decreeing that <N> should only
match lexemes *exactly* N apart.  If we did that, we would no longer have
the misbehavior that Jean-Pierre is complaining about, and we'd not need
to argue about whether <0> needs to be treated specially.
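
As a rough illustration of what exactly-N matching would mean, here is a
toy Python sketch. It is not PostgreSQL code: the function names and
position sets are hypothetical, and it assumes a phrase match reports the
position of its right-hand operand.

```python
def exact_phrase_positions(left, right, n):
    """Positions in `right` that lie exactly n positions after a position in `left`."""
    return {r for r in right for l in left if r - l == n}


# Hand-written lexeme positions (hypothetical data, not real to_tsvector output).
with_stopword = {"cat": {1}, "ate": {2}, "rat": {4}}      # 'cat ate some rats'
without_stopword = {"cat": {1}, "ate": {2}, "rat": {3}}   # 'cat ate rats'
leavings = {"leave": {1}, "leavings": {1}}                # 'leavings'
single_x = {"x": {1}}                                     # a lone 'x'


def matches_exact(doc):
    """Evaluate ( 'cat' <-> 'ate' ) <2> 'rat' under the exactly-N rule."""
    cat_ate = exact_phrase_positions(doc["cat"], doc["ate"], 1)   # 'cat' <-> 'ate'
    return bool(exact_phrase_positions(cat_ate, doc["rat"], 2))   # ... <2> 'rat'


# 'x <-> x' against a single 'x': no pair of positions is exactly 1 apart.
x_then_x = bool(exact_phrase_positions(single_x["x"], single_x["x"], 1))

# 'leave <0> leavings': both lexemes sit at position 1, so distance 0 matches
# without any special-casing of <0>.
same_position = bool(
    exact_phrase_positions(leavings["leave"], leavings["leavings"], 0)
)

print(matches_exact(with_stopword), matches_exact(without_stopword))
print(x_then_x, same_position)
```

Under this rule the stopword-free document no longer matches, 'x <-> x'
does not match a lone 'x', and <0> keeps its same-position meaning.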

Or maybe we need two operators, one for exactly-N-apart and one for
at-most-N-apart.

                        regards, tom lane

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)