Re: [HACKERS] Behaviour of to_tsquery(stopwords only)

2008-03-07 Thread Teodor Sigaev

Fixed for CVS HEAD and 8.3, will fix for previous versions too.

Richard Huxton wrote:

Teodor Sigaev wrote:


So - is this a bug, feature, feature?


It's definitely a bug:
select count(*), query from queries group by query;
 count |  query
-------+----------
     3 | 'tender'
     4 | 'tender'
     4 | 'tender'
(3 rows)

Will fix it soon.


Ah, smashing.



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Behaviour of to_tsquery(stopwords only)

2008-03-06 Thread Richard Huxton
I'm not sure what value a tsquery has if it's composed from stopwords 
only, but it doesn't seem to be null or equal to itself.


That strikes me as ... unintuitive, although I'm happy to be re-educated 
on this.


I think it's because CompareTSQ (tsquery_op.c, line 142) doesn't have a
case to handle a query size of zero, which is what seems to be returned
from tsearch/to_tsany.c, lines ~345-350, for a stopword-only query.
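
To make the missing case concrete, here is a minimal sketch of a three-way comparison that handles size zero explicitly. It is an illustration only, not the code in tsquery_op.c: the ToyQuery struct and toy_query_cmp() are hypothetical names, and the real TSQuery layout is different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a parsed query: an item count plus the
 * lexemes. This is NOT PostgreSQL's TSQuery layout.
 */
typedef struct ToyQuery
{
    size_t       size;     /* number of items; 0 for a stopword-only query */
    const char **lexemes;  /* may be NULL when size == 0 */
} ToyQuery;

/* Three-way comparison: returns <0, 0 or >0, in the style of strcmp(). */
int
toy_query_cmp(const ToyQuery *a, const ToyQuery *b)
{
    size_t i;

    assert(a != NULL && b != NULL);

    /* Different item counts: order by count. */
    if (a->size != b->size)
        return (a->size < b->size) ? -1 : 1;

    /*
     * Explicit empty case: two empty queries compare equal. The loop
     * below would also return 0 here; this branch documents the case
     * and makes it obvious that the item arrays are never touched.
     */
    if (a->size == 0)
        return 0;

    /* Same non-zero size: compare item by item. */
    for (i = 0; i < a->size; i++)
    {
        int c = strcmp(a->lexemes[i], b->lexemes[i]);

        if (c != 0)
            return c;
    }
    return 0;
}
```

With a comparison of this shape, an empty query is equal to itself and sorts before any non-empty query, which is the behaviour the issame column below would need in order to show t for the stopword-only rows.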



SELECT
  qid,words,query,
  (query is null) AS isnull,
  (query = to_tsquery(words)) as issame
FROM
  util.queries
ORDER BY qid DESC LIMIT 5;

NOTICE:  text-search query contains only stop words or doesn't contain lexemes, ignored
NOTICE:  text-search query contains only stop words or doesn't contain lexemes, ignored

 qid  |  words   |   query    | isnull | issame
------+----------+------------+--------+--------
 1000 | to       |            | f      | f
  999 | or       |            | f      | f
  998 | requests | 'request'  | f      | t
  997 | site     | 'site'     | f      | t
  996 | document | 'document' | f      | t
(5 rows)

--
  Richard Huxton
  Archonet Ltd



Re: [HACKERS] Behaviour of to_tsquery(stopwords only)

2008-03-06 Thread Richard Huxton

Further tsquery comparison fun:

= SELECT q.qid, q.query, count(*) FROM doc.documents d, util.queries q 
WHERE d.words @@ q.query AND (q.query::text=$$'tender'$$) GROUP BY 
q.qid, q.query ;

 qid |  query   | count
-----+----------+-------
 195 | 'tender' |   374
 248 | 'tender' |   374
 257 | 'tender' |   374
 332 | 'tender' |   374
 401 | 'tender' |   374
 409 | 'tender' |   374
 519 | 'tender' |   374
 557 | 'tender' |   374
 736 | 'tender' |   374
 749 | 'tender' |   374
 869 | 'tender' |   374
 879 | 'tender' |   374
 926 | 'tender' |   374
(13 rows)

= SELECT q.query, count(*) FROM doc.documents d, util.queries q WHERE 
d.words @@ q.query AND (q.query::text=$$'tender'$$) GROUP BY q.query ; 

  query   | count
----------+-------
 'tender' |  1870
 'tender' |  1496
 'tender' |  1496
(3 rows)


It seems that the tsquery is remembering the shape of the original
query, even though it's been trimmed.
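
A toy model of that hypothesis, to show how it would produce three groups from queries that all print as 'tender'. Everything here is assumed for illustration: ShapedQuery, its orig_clauses field and both comparison functions are hypothetical, and the real cause inside PostgreSQL may be different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: the printed text plus a leftover "shape" value. */
typedef struct ShapedQuery
{
    int         orig_clauses;  /* assumed: clause count before trimming */
    const char *text;          /* what the query prints as */
} ShapedQuery;

typedef int (*ShapedCmp)(const ShapedQuery *, const ShapedQuery *);

/* Comparison that still looks at the leftover shape. */
int
cmp_with_shape(const ShapedQuery *a, const ShapedQuery *b)
{
    if (a->orig_clauses != b->orig_clauses)
        return (a->orig_clauses < b->orig_clauses) ? -1 : 1;
    return strcmp(a->text, b->text);
}

/* Comparison that only looks at the visible content. */
int
cmp_text_only(const ShapedQuery *a, const ShapedQuery *b)
{
    return strcmp(a->text, b->text);
}

/*
 * Count the groups a GROUP BY would produce under a given comparison.
 * Simple O(n^2) scan: an entry starts a new group when no earlier
 * entry compares equal to it.
 */
size_t
count_groups(const ShapedQuery *qs, size_t n, ShapedCmp cmp)
{
    size_t groups = 0;
    size_t i, j;

    assert(qs != NULL || n == 0);

    for (i = 0; i < n; i++)
    {
        int seen = 0;

        for (j = 0; j < i; j++)
        {
            if (cmp(&qs[i], &qs[j]) == 0)
            {
                seen = 1;
                break;
            }
        }
        if (!seen)
            groups++;
    }
    return groups;
}
```

Fed thirteen entries that all print as 'tender' but carry original clause counts of 3, 2 and 1, the shape-aware comparison yields three groups and the text-only comparison yields one.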



= SELECT q.query, min(qid), max(qid), count(*) FROM doc.documents d, 
util.queries q WHERE d.words @@ q.query AND (q.query::text=$$'tender'$$) 
GROUP BY q.query ;

  query   | min | max | count
----------+-----+-----+-------
 'tender' | 736 | 926 |  1870 (5 rows aggregated)
 'tender' | 401 | 557 |  1496 (4 rows aggregated)
 'tender' | 195 | 332 |  1496 (4 rows aggregated)
(3 rows)

= SELECT * FROM util.queries WHERE qid IN (195,248, 257, 332, 
401,409,519,557,736,749,869,879,926) ORDER BY qid;

 qid |       words       |  query
-----+-------------------+----------
 195 | can  of  tenders  | 'tender' (3 clauses)
 248 | tender  the  this | 'tender' (3 clauses)
 257 | have  tender  for | 'tender' (3 clauses)
 332 | for  tenders  of  | 'tender' (3 clauses)
 401 | tender  with      | 'tender' (2 clauses)
 409 | tenders  to       | 'tender' (2 clauses)
 519 | tender  to        | 'tender' (2 clauses)
 557 | tenders  be       | 'tender' (2 clauses)
 736 | tenderer          | 'tender' (1 clause)
 749 | tender            | 'tender' (1 clause)
 869 | tender            | 'tender' (1 clause)
 879 | tender            | 'tender' (1 clause)
 926 | tender            | 'tender' (1 clause)
(13 rows)

So - is this a bug, feature, feature?

--
  Richard Huxton
  Archonet Ltd



Re: [HACKERS] Behaviour of to_tsquery(stopwords only)

2008-03-06 Thread Teodor Sigaev
= SELECT * FROM util.queries WHERE qid IN (195,248, 257, 332, 
401,409,519,557,736,749,869,879,926) ORDER BY qid;

 qid |       words       |  query
-----+-------------------+----------
 195 | can  of  tenders  | 'tender' (3 clauses)
 248 | tender  the  this | 'tender' (3 clauses)
 257 | have  tender  for | 'tender' (3 clauses)
 332 | for  tenders  of  | 'tender' (3 clauses)
 401 | tender  with      | 'tender' (2 clauses)
 409 | tenders  to       | 'tender' (2 clauses)
 519 | tender  to        | 'tender' (2 clauses)
 557 | tenders  be       | 'tender' (2 clauses)
 736 | tenderer          | 'tender' (1 clause)
 749 | tender            | 'tender' (1 clause)
 869 | tender            | 'tender' (1 clause)
 879 | tender            | 'tender' (1 clause)
 926 | tender            | 'tender' (1 clause)
(13 rows)

So - is this a bug, feature, feature?


It's definitely a bug:
select count(*), query from queries group by query;
 count |  query
-------+----------
     3 | 'tender'
     4 | 'tender'
     4 | 'tender'
(3 rows)

Will fix it soon.
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [HACKERS] Behaviour of to_tsquery(stopwords only)

2008-03-06 Thread Richard Huxton

Teodor Sigaev wrote:


So - is this a bug, feature, feature?


It's definitely a bug:
select count(*), query from queries group by query;
 count |  query
-------+----------
     3 | 'tender'
     4 | 'tender'
     4 | 'tender'
(3 rows)

Will fix it soon.


Ah, smashing.

--
  Richard Huxton
  Archonet Ltd
