Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-20 Thread Tom Lane
Teodor Sigaev  writes:
>> We're really quickly running out of time to get this done before
>> beta2.  Please don't commit anything that's going to break the tree
>> because we only have about 72 hours before the wrap, but if it's
>> correct then it should go in.

> Isn't late now? Or wait to beta2 is out?

Let's wait till after beta2.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-20 Thread Teodor Sigaev

We're really quickly running out of time to get this done before
beta2.  Please don't commit anything that's going to break the tree
because we only have about 72 hours before the wrap, but if it's
correct then it should go in.


Isn't late now? Or wait to beta2 is out?

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-17 Thread Robert Haas
On Fri, Jun 17, 2016 at 11:07 AM, Teodor Sigaev  wrote:
>>> Such algorithm finds closest pair of (Lpos, Rpos) but satisfying pair
>>> could be
>>> not closest, example: to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';
>>
>>
>> Oh ... the indexes in the lists don't have much to do with the distances,
>> do they.  OK, maybe it's not quite as easy as I was thinking.  I'm
>> okay with the patch as presented.
>
>
> Huh, I found that my isn't correct for example which I show :(. Reworked
> patch is in attach.

We're really quickly running out of time to get this done before
beta2.  Please don't commit anything that's going to break the tree
because we only have about 72 hours before the wrap, but if it's
correct then it should go in.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-17 Thread Teodor Sigaev

Such algorithm finds closest pair of (Lpos, Rpos) but satisfying pair could be
not closest, example: to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';


Oh ... the indexes in the lists don't have much to do with the distances,
do they.  OK, maybe it's not quite as easy as I was thinking.  I'm
okay with the patch as presented.


Huh, I found that my isn't correct for example which I show :(. Reworked patch 
is in attach.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


phrase_exact_distance-2.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-17 Thread Tom Lane
Teodor Sigaev  writes:
> Tom Lane wrote:
>> Hmm, couldn't the loop logic be simplified a great deal if this is the
>> definition?  Or are you leaving it like that with the idea that we might
>> later introduce another operator with the less-than-or-equal behavior?

> Do you suggest something like merge join of two sorted lists? ie:
> ...
> Such algorithm finds closest pair of (Lpos, Rpos) but satisfying pair could 
> be 
> not closest, example: to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';

Oh ... the indexes in the lists don't have much to do with the distances,
do they.  OK, maybe it's not quite as easy as I was thinking.  I'm
okay with the patch as presented.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-17 Thread Teodor Sigaev



Tom Lane wrote:

Teodor Sigaev  writes:

So I think there's a reasonable case for decreeing that  should only
match lexemes *exactly* N apart.  If we did that, we would no longer have
the misbehavior that Jean-Pierre is complaining about, and we'd not need
to argue about whether <0> needs to be treated specially.



Agree, seems that's easy to change.
...
Patch is attached


Hmm, couldn't the loop logic be simplified a great deal if this is the
definition?  Or are you leaving it like that with the idea that we might
later introduce another operator with the less-than-or-equal behavior?


Do you suggest something like merge join of two sorted lists? ie:

while(Rpos < Rdata.pos + Rdata.npos && Lpos < Ldata.pos + Ldata.npos)
{
if (*Lpos > *Rpos)
Rpos++;
else if (*Lpos < *Rpos)
{
if (*Rpos - *Lpos == distance)
match!
Lpos++;
}
else
{
if (distance == 0)
match!
Lpos++; Rpos++;
}
}

Such algorithm finds closest pair of (Lpos, Rpos) but satisfying pair could be 
not closest, example: to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-16 Thread Tom Lane
Teodor Sigaev  writes:
>> So I think there's a reasonable case for decreeing that  should only
>> match lexemes *exactly* N apart.  If we did that, we would no longer have
>> the misbehavior that Jean-Pierre is complaining about, and we'd not need
>> to argue about whether <0> needs to be treated specially.

> Agree, seems that's easy to change.
> ...
> Patch is attached

Hmm, couldn't the loop logic be simplified a great deal if this is the
definition?  Or are you leaving it like that with the idea that we might
later introduce another operator with the less-than-or-equal behavior?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-15 Thread Teodor Sigaev

We need to reach a consensus here, since there is no way to say "I don't know".
I inclined to agree with you, that returning false is better in such a
case.That will
indicate user to the source of problem.


Here is a patch, now phrase operation returns false if there is not postion 
information. If this behavior looks more reasonable, I'll commit that.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


phrase_no_fallback.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-15 Thread Teodor Sigaev

what's about word with several infinitives



select to_tsvector('en', 'leavings');
   to_tsvector

  'leave':1 'leavings':1
(1 row)



select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
  ?column?
--
  t
(1 row)


Second example is not correct:

select phraseto_tsquery('en', 'leavings')
will produce 'leave | leavings'

and

select phraseto_tsquery('en', 'leavings cats')
will produce 'leave <-> cat | leavings <-> cat'

which seems correct and we don't need special threating of <0>.


This brings up something else that I am not very sold on: to wit,
do we really want the "less than or equal" distance behavior at all?
The documentation gives the example that
phraseto_tsquery('cat ate some rats')
produces
( 'cat' <-> 'ate' ) <2> 'rat'
because "some" is a stopword.  However, that pattern will also match
"cat ate rats", which seems surprising and unexpected to me; certainly
it would surprise a user who did not realize that "some" is a stopword.

So I think there's a reasonable case for decreeing that  should only
match lexemes *exactly* N apart.  If we did that, we would no longer have
the misbehavior that Jean-Pierre is complaining about, and we'd not need
to argue about whether <0> needs to be treated specially.


Agree, seems that's easy to change. I thought that I saw an issue with 
hyphenated word but, fortunately, I forget that hyphenated words don't share a 
position:

# select to_tsvector('foo-bar');
 to_tsvector
-
 'bar':3 'foo':2 'foo-bar':1
# select phraseto_tsquery('foo-bar');
 phraseto_tsquery
---
 ( 'foo-bar' <-> 'foo' ) <-> 'bar'
and
# select to_tsvector('foo-bar') @@ phraseto_tsquery('foo-bar');
 ?column?
--
 t


Patch is attached

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


phrase_exact_distance.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-09 Thread Oleg Bartunov
On Thu, Jun 9, 2016 at 12:47 AM, Tom Lane  wrote:
> Oleg Bartunov  writes:
>> On Wed, Jun 8, 2016 at 8:12 PM, Tom Lane  wrote:
>>> Another thing I noticed: if you test with tsvectors that don't contain
>>> position info, <-> seems to reduce to &, that is it doesn't enforce
>>> relative position:
>
>> yes, that's documented behaviour.
>
> Oh?  Where?  I've been going through the phrase-search documentation and
> copy-editing it today, and I have not found this stated anywhere.

Hmm, looks like it is missing.  We have told about this since 2008. Just found
http://www.sai.msu.su/~megera/postgres/talks/2009.pdf (slide 5) and
http://www.sai.msu.su/~megera/postgres/talks/pgcon-2016-fts.pdf (slide 27)

We need to reach a consensus here, since there is no way to say "I don't know".
I inclined to agree with you, that returning false is better in such a
case.That will
indicate user to the source of problem.


>
> regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-08 Thread Tom Lane
Oleg Bartunov  writes:
> On Wed, Jun 8, 2016 at 8:12 PM, Tom Lane  wrote:
>> Another thing I noticed: if you test with tsvectors that don't contain
>> position info, <-> seems to reduce to &, that is it doesn't enforce
>> relative position:

> yes, that's documented behaviour.

Oh?  Where?  I've been going through the phrase-search documentation and
copy-editing it today, and I have not found this stated anywhere.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-08 Thread Tom Lane
Oleg Bartunov  writes:
> On Wed, Jun 8, 2016 at 1:05 AM, Tom Lane  wrote:
>> I concur that that seems like a rather useless behavior.  If we have
>> "x <-> y" it is not possible to match at distance zero, while if we
>> have "x <-> x" it seems unlikely that the user is expecting us to
>> treat that identically to "x".  So phrase search simply should not
>> consider distance-zero matches.

> what's about word with several infinitives

> select to_tsvector('en', 'leavings');
>   to_tsvector
> 
>  'leave':1 'leavings':1
> (1 row)

> select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
>  ?column?
> --
>  t
> (1 row)

Hmm.  I can grant that there might be some cases where you want to see
if two separate patterns match the same lexeme, but that seems like an
extremely specialized use-case that you would only invoke very
intentionally.  It should not be built in as part of the default behavior
of every phrase search, because 99% of the time this would be an
unexpected and unwanted match.  I'm not even convinced that the operator
for this should be spelled <0> --- that seems more like a hack than a
natural extension of phrase search.  But if we do spell it like that,
then I think it should be called out as a special case that only applies
to <0>; that is, for any other value of N, the match has to be to separate
lexemes.

This brings up something else that I am not very sold on: to wit,
do we really want the "less than or equal" distance behavior at all?
The documentation gives the example that
phraseto_tsquery('cat ate some rats')
produces
( 'cat' <-> 'ate' ) <2> 'rat'
because "some" is a stopword.  However, that pattern will also match
"cat ate rats", which seems surprising and unexpected to me; certainly
it would surprise a user who did not realize that "some" is a stopword.

So I think there's a reasonable case for decreeing that  should only
match lexemes *exactly* N apart.  If we did that, we would no longer have
the misbehavior that Jean-Pierre is complaining about, and we'd not need
to argue about whether <0> needs to be treated specially.

Or maybe we need two operators, one for exactly-N-apart and one for
at-most-N-apart.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-08 Thread Oleg Bartunov
On Wed, Jun 8, 2016 at 1:05 AM, Tom Lane  wrote:
> Jean-Pierre Pelletier  writes:
>> I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
>> matching consecutive words but it won't work for us if it cannot handle
>> consecutive *duplicate* words.
>
>> For example, the following returns true:select
>> phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
>
>> Is this expected ?
>
> I concur that that seems like a rather useless behavior.  If we have
> "x <-> y" it is not possible to match at distance zero, while if we
> have "x <-> x" it seems unlikely that the user is expecting us to
> treat that identically to "x".  So phrase search simply should not
> consider distance-zero matches.

what's about word with several infinitives

select to_tsvector('en', 'leavings');
  to_tsvector

 'leave':1 'leavings':1
(1 row)

select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
 ?column?
--
 t
(1 row)


>
> The attached one-liner patch seems to fix this problem, though I am
> uncertain whether any other places need to be changed to match.
> Also, there is a regression test case that changes:
>
> *** /home/postgres/pgsql/src/test/regress/expected/tstypes.out  Thu May  5 
> 19:21:17 2016
> --- /home/postgres/pgsql/src/test/regress/results/tstypes.out   Tue Jun  7 
> 17:55:41 2016
> ***
> *** 897,903 
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
>ts_rank_cd
>   
> !   0.0714286
>   (1 row)
>
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
> --- 897,903 
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
>ts_rank_cd
>   
> !   0
>   (1 row)
>
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
>
>
> I'm not sure if this case is intentionally exhibiting the behavior that
> both parts of "s:* <-> sa:A" can be matched to the same lexeme, or if the
> result simply wasn't thought about carefully.
>
> regards, tom lane
>


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-08 Thread Oleg Bartunov
On Wed, Jun 8, 2016 at 8:12 PM, Tom Lane  wrote:
> Another thing I noticed: if you test with tsvectors that don't contain
> position info, <-> seems to reduce to &, that is it doesn't enforce
> relative position:
>
> regression=# select 'cat bat fat rat'::tsvector @@ 'cat <-> rat'::tsquery;
>  ?column?
> --
>  t
> (1 row)
>
> regression=# select 'rat cat bat fat'::tsvector @@ 'cat <-> rat'::tsquery;
>  ?column?
> --
>  t
> (1 row)

yes, that's documented behaviour.


>
> I'm doubtful that this is a good behavior, because it seems like it can
> silently mask mistakes.  That is, applying <-> to a stripped tsvector
> seems like user error to me.  Actually throwing an error might be too
> much, but perhaps we should make such cases return false not true?

it's question of convention. Probably, returning false will quickly
indicate user
on his error, so such behaviour looks better.

>
> (This is against HEAD, without the patch I suggested yesterday.
> It strikes me that that patch might change this behavior, if the
> lexemes are all being treated as having position zero, but I have
> not checked.)

I didn't see the patch yet.

>
> regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-08 Thread Jean-Pierre Pelletier
If instead of casts, functions to_tsvector() and to_tsquery() are used,
then the results is (I think ?) as expected:

select to_tsvector('simple', 'cat bat fat rat') @@ to_tsquery('simple',
'cat <-> rat');
or
select to_tsvector('simple', 'rat cat bat fat') @@ to_tsquery('simple',
'cat <-> rat');
returns "false"

select to_tsvector('simple', 'cat rat bat fat') @@ to_tsquery('simple',
'cat <-> rat');
returns "true"

Jean-Pierre Pelletier

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
Sent: Wednesday, June 8, 2016 1:12 PM
To: Teodor Sigaev; Oleg Bartunov
Cc: Jean-Pierre Pelletier; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@
to_tsvector('simple', 'blue') be true ?

Another thing I noticed: if you test with tsvectors that don't contain
position info, <-> seems to reduce to &, that is it doesn't enforce
relative position:

regression=# select 'cat bat fat rat'::tsvector @@ 'cat <-> rat'::tsquery;
?column?
--
 t
(1 row)

regression=# select 'rat cat bat fat'::tsvector @@ 'cat <-> rat'::tsquery;
?column?
--
 t
(1 row)

I'm doubtful that this is a good behavior, because it seems like it can
silently mask mistakes.  That is, applying <-> to a stripped tsvector
seems like user error to me.  Actually throwing an error might be too
much, but perhaps we should make such cases return false not true?

(This is against HEAD, without the patch I suggested yesterday.
It strikes me that that patch might change this behavior, if the lexemes
are all being treated as having position zero, but I have not checked.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-08 Thread Oleg Bartunov
On Wed, Jun 8, 2016 at 9:01 PM, Jean-Pierre Pelletier
<jppellet...@e-djuster.com> wrote:
> If instead of casts, functions to_tsvector() and to_tsquery() are used,
> then the results is (I think ?) as expected:

because to_tsvector() function returns positions of words.

>
> select to_tsvector('simple', 'cat bat fat rat') @@ to_tsquery('simple',
> 'cat <-> rat');
> or
> select to_tsvector('simple', 'rat cat bat fat') @@ to_tsquery('simple',
> 'cat <-> rat');
> returns "false"
>
> select to_tsvector('simple', 'cat rat bat fat') @@ to_tsquery('simple',
> 'cat <-> rat');
> returns "true"
>
> Jean-Pierre Pelletier
>
> -Original Message-
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Sent: Wednesday, June 8, 2016 1:12 PM
> To: Teodor Sigaev; Oleg Bartunov
> Cc: Jean-Pierre Pelletier; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@
> to_tsvector('simple', 'blue') be true ?
>
> Another thing I noticed: if you test with tsvectors that don't contain
> position info, <-> seems to reduce to &, that is it doesn't enforce
> relative position:
>
> regression=# select 'cat bat fat rat'::tsvector @@ 'cat <-> rat'::tsquery;
> ?column?
> --
>  t
> (1 row)
>
> regression=# select 'rat cat bat fat'::tsvector @@ 'cat <-> rat'::tsquery;
> ?column?
> --
>  t
> (1 row)
>
> I'm doubtful that this is a good behavior, because it seems like it can
> silently mask mistakes.  That is, applying <-> to a stripped tsvector
> seems like user error to me.  Actually throwing an error might be too
> much, but perhaps we should make such cases return false not true?
>
> (This is against HEAD, without the patch I suggested yesterday.
> It strikes me that that patch might change this behavior, if the lexemes
> are all being treated as having position zero, but I have not checked.)
>
> regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-08 Thread Tom Lane
Another thing I noticed: if you test with tsvectors that don't contain
position info, <-> seems to reduce to &, that is it doesn't enforce
relative position:

regression=# select 'cat bat fat rat'::tsvector @@ 'cat <-> rat'::tsquery;
 ?column? 
--
 t
(1 row)

regression=# select 'rat cat bat fat'::tsvector @@ 'cat <-> rat'::tsquery;
 ?column? 
--
 t
(1 row)

I'm doubtful that this is a good behavior, because it seems like it can
silently mask mistakes.  That is, applying <-> to a stripped tsvector
seems like user error to me.  Actually throwing an error might be too
much, but perhaps we should make such cases return false not true?

(This is against HEAD, without the patch I suggested yesterday.
It strikes me that that patch might change this behavior, if the
lexemes are all being treated as having position zero, but I have
not checked.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-07 Thread Tom Lane
Jean-Pierre Pelletier  writes:
> I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
> matching consecutive words but it won't work for us if it cannot handle
> consecutive *duplicate* words.

> For example, the following returns true:select
> phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');

> Is this expected ?

I concur that that seems like a rather useless behavior.  If we have
"x <-> y" it is not possible to match at distance zero, while if we
have "x <-> x" it seems unlikely that the user is expecting us to
treat that identically to "x".  So phrase search simply should not
consider distance-zero matches.

The attached one-liner patch seems to fix this problem, though I am
uncertain whether any other places need to be changed to match.
Also, there is a regression test case that changes:

*** /home/postgres/pgsql/src/test/regress/expected/tstypes.out  Thu May  5 
19:21:17 2016
--- /home/postgres/pgsql/src/test/regress/results/tstypes.out   Tue Jun  7 
17:55:41 2016
***
*** 897,903 
  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
   ts_rank_cd 
  
!   0.0714286
  (1 row)
  
  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
--- 897,903 
  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
   ts_rank_cd 
  
!   0
  (1 row)
  
  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');


I'm not sure if this case is intentionally exhibiting the behavior that
both parts of "s:* <-> sa:A" can be matched to the same lexeme, or if the
result simply wasn't thought about carefully.

regards, tom lane

diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 591e59c..95ad69b 100644
*** a/src/backend/utils/adt/tsvector_op.c
--- b/src/backend/utils/adt/tsvector_op.c
*** TS_phrase_execute(QueryItem *curitem,
*** 1409,1415 
  		{
  			while (Lpos < Ldata.pos + Ldata.npos)
  			{
! if (WEP_GETPOS(*Lpos) <= WEP_GETPOS(*Rpos))
  {
  	/*
  	 * Lpos is behind the Rpos, so we have to check the
--- 1409,1415 
  		{
  			while (Lpos < Ldata.pos + Ldata.npos)
  			{
! if (WEP_GETPOS(*Lpos) < WEP_GETPOS(*Rpos))
  {
  	/*
  	 * Lpos is behind the Rpos, so we have to check the

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-07 Thread Jean-Pierre Pelletier
Hi,

I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
matching consecutive words but it won't work for us if it cannot handle
consecutive *duplicate* words.

For example, the following returns true:select
phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');

Is this expected ?

Thanks,
Jean-Pierre Pelletier


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers