[HACKERS] Custom index structure and strange count problem

2010-06-09 Thread Carsten Kropf
Hi *,
during the last few months I've been building a new index structure as part of 
a research project.
Everything seems to work properly, however I have some strange issues with the 
count sql command.
I introduced some custom structures (mainly document and hybrid_query) with 
which my index access method is supposed to work.
There is an operator  which is supposed to use my index structure (what 
also works properly).
The function that maps to the operator  is called hybrid_index_query, which 
I use to compare my results given from the index with the real results that are 
supposed to appear in the final result set.
Having described the outer circumstances (in a very short way), I will now show 
the strange stuff that happens:

test=# select id from documents where hybrid_index_query(to_document(words, 
points), row('radio pleas news'::tsvector, '[(-90,-180),(90, 180)]')) order by 
id;
  id  
--
 2137
 2151
 2168
 2207
 2208
 2209
 2210
 2211
 2266
 2296
(10 rows)

This query takes a sequential scan and works properly (returning 10 rows).

test=# select id from documents where to_document(words, points)  row('pleas 
radio news'::tsvector, '[(-90,-180),(90,180)]') order by id;
  id  
--
 2137
 2151
 2168
 2207
 2208
 2209
 2210
 2211
 2266
 2296
(10 rows)

This query uses my index structure and returns the same result as in the 
sequential scan above.
Until here, everything seems to work fine. However, if I issue the same queries 
using the count aggregate function in SQL, there are some odd results:
test=# select count(*) from documents where 
hybrid_index_query(to_document(words, points), row('radio pleas 
news'::tsvector, '[(-90,-180),(90, 180)]'));
 count 
---
10
(1 row)

Using the sequential scan, still, everything seems fine.
However, if I now do the index scan (my function will be called 11 times, 
returning false at the end), I get the following result:
test=# select count(*) from documents where to_document(words, points)  
row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]');
 count 
---
 7
(1 row)

This seems strange, because the same query returned 10 rows (when I didn't use 
the aggregate). If I issue queries that count the id column, I receive the 
following:
test=# select count(id) from documents where 
hybrid_index_query(to_document(words, points), row('radio pleas 
news'::tsvector, '[(-90,-180),(90, 180)]'));
 count 
---
10
(1 row)

test=# select count(id) from documents where to_document(words, points)  
row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]');
 count 
---
10
(1 row)

These two queries do again return the same results.
Thus, I don't know, what's wrong here, does anybody know about that behaviour, 
or is it my fault that the results are wrong, somehow?
Thanks in advance

Carsten Kropf
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Custom index structure and strange count problem

2010-06-09 Thread Robert Haas
On Wed, Jun 9, 2010 at 4:35 AM, Carsten Kropf ckro...@fh-hof.de wrote:
 Hi *,
 during the last few months I've been building a new index structure as part 
 of a research project.
 Everything seems to work properly, however I have some strange issues with 
 the count sql command.
 I introduced some custom structures (mainly document and hybrid_query) with 
 which my index access method is supposed to work.
 There is an operator  which is supposed to use my index structure (what 
 also works properly).
 The function that maps to the operator  is called hybrid_index_query, 
 which I use to compare my results given from the index with the real results 
 that are supposed to appear in the final result set.
 Having described the outer circumstances (in a very short way), I will now 
 show the strange stuff that happens:

 test=# select id from documents where hybrid_index_query(to_document(words, 
 points), row('radio pleas news'::tsvector, '[(-90,-180),(90, 180)]')) order 
 by id;
  id
 --
  2137
  2151
  2168
  2207
  2208
  2209
  2210
  2211
  2266
  2296
 (10 rows)

 This query takes a sequential scan and works properly (returning 10 rows).

 test=# select id from documents where to_document(words, points)  
 row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]') order by id;
  id
 --
  2137
  2151
  2168
  2207
  2208
  2209
  2210
  2211
  2266
  2296
 (10 rows)

 This query uses my index structure and returns the same result as in the 
 sequential scan above.
 Until here, everything seems to work fine. However, if I issue the same 
 queries using the count aggregate function in SQL, there are some odd results:
 test=# select count(*) from documents where 
 hybrid_index_query(to_document(words, points), row('radio pleas 
 news'::tsvector, '[(-90,-180),(90, 180)]'));
  count
 ---
    10
 (1 row)

 Using the sequential scan, still, everything seems fine.
 However, if I now do the index scan (my function will be called 11 times, 
 returning false at the end), I get the following result:
 test=# select count(*) from documents where to_document(words, points)  
 row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]');
  count
 ---
     7
 (1 row)

 This seems strange, because the same query returned 10 rows (when I didn't 
 use the aggregate). If I issue queries that count the id column, I receive 
 the following:
 test=# select count(id) from documents where 
 hybrid_index_query(to_document(words, points), row('radio pleas 
 news'::tsvector, '[(-90,-180),(90, 180)]'));
  count
 ---
    10
 (1 row)

 test=# select count(id) from documents where to_document(words, points)  
 row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]');
  count
 ---
    10
 (1 row)

 These two queries do again return the same results.
 Thus, I don't know, what's wrong here, does anybody know about that 
 behaviour, or is it my fault that the results are wrong, somehow?
 Thanks in advance

I am guessing this is a bug in your code - have you used EXPLAIN to
verify that the second-to-last of the above queries is really hitting
your code?  If so, I'd recommend attaching with gdb and setting a
breakpoint wherever you return the tuples, and then poke around...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Custom index structure and strange count problem

2010-06-09 Thread Carsten Kropf
Hi,
thanks so far.
However, if I attach a Debugger (which I did in advance, too) and I use 
explain, I get the same results.
My first guess in each case is always that it is my fault. However, I don't 
know exactly, why this strange behaviour occurs here. The problem I have is 
that EXPLAIN, too, always tells me that it uses an index scan (in the cases 
where the query is supposed to use one).
The query plan looks exactly the same in any case (if I apply count(id) or 
count(*), respectively). However, the results differ.
The query plan is also the same, if I use the select * or select id query 
without applying an aggregate with the small difference that the aggregate is 
used, where it is supposed to be.
I just thought, that somebody has already had problems with something like that 
(actually it is no real problem, except that the aggregate applied to * 
queries causes a different count).
The query data my index structure is called with, stays the same in all cases 
(mentioned in the previous mail).
Does anybody have some hints according to which checks to perform in order to 
determine the problem here?

Best regards
Carsten Kropf
Am 09.06.2010 um 19:09 schrieb Robert Haas:

 On Wed, Jun 9, 2010 at 4:35 AM, Carsten Kropf ckro...@fh-hof.de wrote:
 Hi *,
 during the last few months I've been building a new index structure as part 
 of a research project.
 Everything seems to work properly, however I have some strange issues with 
 the count sql command.
 I introduced some custom structures (mainly document and hybrid_query) with 
 which my index access method is supposed to work.
 There is an operator  which is supposed to use my index structure (what 
 also works properly).
 The function that maps to the operator  is called hybrid_index_query, 
 which I use to compare my results given from the index with the real results 
 that are supposed to appear in the final result set.
 Having described the outer circumstances (in a very short way), I will now 
 show the strange stuff that happens:
 
 test=# select id from documents where hybrid_index_query(to_document(words, 
 points), row('radio pleas news'::tsvector, '[(-90,-180),(90, 180)]')) order 
 by id;
  id
 --
  2137
  2151
  2168
  2207
  2208
  2209
  2210
  2211
  2266
  2296
 (10 rows)
 
 This query takes a sequential scan and works properly (returning 10 rows).
 
 test=# select id from documents where to_document(words, points)  
 row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]') order by id;
  id
 --
  2137
  2151
  2168
  2207
  2208
  2209
  2210
  2211
  2266
  2296
 (10 rows)
 
 This query uses my index structure and returns the same result as in the 
 sequential scan above.
 Until here, everything seems to work fine. However, if I issue the same 
 queries using the count aggregate function in SQL, there are some odd 
 results:
 test=# select count(*) from documents where 
 hybrid_index_query(to_document(words, points), row('radio pleas 
 news'::tsvector, '[(-90,-180),(90, 180)]'));
  count
 ---
10
 (1 row)
 
 Using the sequential scan, still, everything seems fine.
 However, if I now do the index scan (my function will be called 11 times, 
 returning false at the end), I get the following result:
 test=# select count(*) from documents where to_document(words, points)  
 row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]');
  count
 ---
 7
 (1 row)
 
 This seems strange, because the same query returned 10 rows (when I didn't 
 use the aggregate). If I issue queries that count the id column, I receive 
 the following:
 test=# select count(id) from documents where 
 hybrid_index_query(to_document(words, points), row('radio pleas 
 news'::tsvector, '[(-90,-180),(90, 180)]'));
  count
 ---
10
 (1 row)
 
 test=# select count(id) from documents where to_document(words, points)  
 row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]');
  count
 ---
10
 (1 row)
 
 These two queries do again return the same results.
 Thus, I don't know, what's wrong here, does anybody know about that 
 behaviour, or is it my fault that the results are wrong, somehow?
 Thanks in advance
 
 I am guessing this is a bug in your code - have you used EXPLAIN to
 verify that the second-to-last of the above queries is really hitting
 your code?  If so, I'd recommend attaching with gdb and setting a
 breakpoint wherever you return the tuples, and then poke around...
 
 -- 
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers