[HACKERS] TOAST issue on custom index access method
Hi *,

I have a strange issue with a custom-built index structure. My index access method supports a document type composed of words (as a tsvector) and points (1-dimensional arrays of them). For internal reasons, I have to save the documents as a whole inside my structure (for proper reorganisations), so I form the tuples using index_form_tuple with the proper tuple descriptor. The type is constructed as:

CREATE TYPE document AS (words tsvector, points _point);

Everything works fine as long as the documents are quite small. However, if the tsvector becomes too large, I run into the problem of not being able to store the documents, because (obviously) the tsvector is too large for one page. Everything still works as long as I use an insert ... (select ...) statement, i.e. I have some table that feeds the table on which the index is set up, e.g.:

insert into documents(words, points) (select words, points from docs);

This query works in every case, because the words that my index receives are properly toasted in advance and the size of the documents is quite limited then.

However, I tried to insert a document containing a large words vector directly (the tsvector is of length 770; the points don't matter much here). I tried this using:

INSERT INTO documents (words, points) VALUES (to_tsvector('english', 'extremely long string constant.'), 'points array');

If I perform the statement above (I can also extract data from my feeding table 'docs' and use it as input), i.e. I insert the large vector directly without pulling it out of some feeding table, I get a problem when I call index_form_tuple, resulting in the following output:

ERROR: index row requires 12320 bytes, maximum size is 8191

So, creating my tuple using index_form_tuple fails because the tsvector is not toasted prior to being inserted into my index.
What I tried in order to solve this issue was to extract the words from the document (in my index) and call 'Datum toast_compress_datum(Datum value)' in order to compress the tsvector into a proper toast table. Unfortunately, this function always returned NULL, which meant that the vector could not be compressed, resulting in the same error.

Currently, I don't know how to proceed. (Of course, there would be the possibility of forbidding the direct insertion and instead creating a staging table with a trigger that inserts each row into the real, indexed data table after the insertion, but I don't consider that a good solution.) Does anybody know about this issue of getting untoasted values if the values are inserted directly, but toasted values if copying the data from another table? I hope that somebody can help me here, so that my access method is able to run properly.

Thanks in advance
Best regards
Carsten Kropf

P.S.: The version I am programming against is 8.4.2
Re: [HACKERS] TOAST issue on custom index access method
Oh, thanks. I didn't know that indexes don't have access to associated toast tables. The index access method is (at least in the textual part) similar to GIN; however, I needed to make some changes to it. Saving the whole document is actually only important for vacuum. I think I will find some workaround to solve this issue.

However, it is a little bit strange that I get toasted values when inserting from another table, but untoasted values if I insert items directly. Could anybody please explain this to me?

Best regards
Carsten Kropf

Am 24.06.2010 um 16:20 schrieb Tom Lane:

> Carsten Kropf ckro...@fh-hof.de writes:
>> I have a strange issue using a custom built index structure. My index
>> access method supports a document type composed of words (as a tsvector)
>> and points (1-dimensional arrays of them). For internal reasons, I have
>> to save the documents as a whole inside my structure (for proper
>> reorganisations). So, I form the tuples using index_form_tuple with the
>> proper description. Everything works fine, as long as the documents are
>> quite small. However, if the tsvector becomes too large, I run into a
>> problem of not being able to store the documents, because (obviously)
>> the tsvector is too large for one page.
>
> Well, of course. I think this is a fundamentally bad index design. You
> didn't say exactly what sort of searches you want this index type to
> accelerate, but perhaps you need a design closer to GIN, in which you'd
> make index entries for individual words, not whole documents.
>
>> What I tried to solve this issue here is to extract the words from the
>> document (in my index) and call 'Datum toast_compress_datum(Datum value)'
>> in order to compress the tsvector into a proper toast table.
>
> Indexes don't have toast tables.
> regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TOAST issue on custom index access method
Hmm, OK, then I don't know exactly what happens. I have some values in my composite type. The size of the index tuple (if coming from one table and being copied into the target table) is about 15x bytes. Otherwise, if I insert the same tuple directly, it is about 12 kB. So, unfortunately, I don't know exactly what happens here.

Best regards
Carsten Kropf

Am 24.06.2010 um 17:08 schrieb Tom Lane:

> Carsten Kropf ckro...@fh-hof.de writes:
>> However, it is a little bit strange that I get toasted values (when
>> inserting from another table) and untoasted values if I insert items
>> directly. Could anybody please explain this to me?
>
> Huh? An index will never ever get passed an externally-toasted value.
> See the TOAST_INDEX_HACK code in indextuple.c.
>
> regards, tom lane
[HACKERS] Custom index structure and strange count problem
Hi *,

during the last few months I've been building a new index structure as part of a research project. Everything seems to work properly; however, I have some strange issues with the count SQL command. I introduced some custom structures (mainly document and hybrid_query) with which my index access method is supposed to work. There is an operator which is supposed to use my index structure (which also works properly). The function that maps to the operator is called hybrid_index_query, which I use to compare the results given by the index with the real results that are supposed to appear in the final result set.

Having described the outer circumstances (in a very short way), I will now show the strange stuff that happens:

test=# select id from documents where hybrid_index_query(to_document(words, points), row('radio pleas news'::tsvector, '[(-90,-180),(90, 180)]')) order by id;
  id
------
 2137
 2151
 2168
 2207
 2208
 2209
 2210
 2211
 2266
 2296
(10 rows)

This query takes a sequential scan and works properly (returning 10 rows).

test=# select id from documents where to_document(words, points) row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]') order by id;
  id
------
 2137
 2151
 2168
 2207
 2208
 2209
 2210
 2211
 2266
 2296
(10 rows)

This query uses my index structure and returns the same result as the sequential scan above. Until here, everything seems to work fine. However, if I issue the same queries using the count aggregate function in SQL, there are some odd results:

test=# select count(*) from documents where hybrid_index_query(to_document(words, points), row('radio pleas news'::tsvector, '[(-90,-180),(90, 180)]'));
 count
-------
    10
(1 row)

Using the sequential scan, still, everything seems fine.
However, if I now do the index scan (my function will be called 11 times, returning false at the end), I get the following result:

test=# select count(*) from documents where to_document(words, points) row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]');
 count
-------
     7
(1 row)

This seems strange, because the same query returned 10 rows when I didn't use the aggregate. If I issue queries that count the id column instead, I receive the following:

test=# select count(id) from documents where hybrid_index_query(to_document(words, points), row('radio pleas news'::tsvector, '[(-90,-180),(90, 180)]'));
 count
-------
    10
(1 row)

test=# select count(id) from documents where to_document(words, points) row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]');
 count
-------
    10
(1 row)

These two queries do again return the same results. Thus, I don't know what's wrong here. Does anybody know about this behaviour, or are the results somehow wrong through a fault of mine?

Thanks in advance
Carsten Kropf
Re: [HACKERS] Custom index structure and strange count problem
Hi,

thanks so far. However, if I attach a debugger (which I had already done, too) and use EXPLAIN, I get the same results. My first guess in each case is always that it is my own fault; however, I don't know exactly why this strange behaviour occurs here. The problem I have is that EXPLAIN, too, always tells me that an index scan is used (in the cases where the query is supposed to use one). The query plan looks exactly the same in any case (whether I apply count(id) or count(*)); however, the results differ. The query plan is also the same if I use the select * or select id query without an aggregate, with the small difference that the aggregate node appears where it is supposed to be. I just thought that somebody might already have had problems with something like this (actually it is no real problem, except that the aggregate applied to count(*) queries yields a different count). The query data my index structure is called with stays the same in all cases (mentioned in the previous mail). Does anybody have some hints as to which checks to perform in order to pin down the problem here?

Best regards
Carsten Kropf

Am 09.06.2010 um 19:09 schrieb Robert Haas:

> On Wed, Jun 9, 2010 at 4:35 AM, Carsten Kropf ckro...@fh-hof.de wrote:
>> Hi *,
>> during the last few months I've been building a new index structure as
>> part of a research project. Everything seems to work properly, however
>> I have some strange issues with the count sql command. I introduced
>> some custom structures (mainly document and hybrid_query) with which my
>> index access method is supposed to work. There is an operator which is
>> supposed to use my index structure (what also works properly). The
>> function that maps to the operator is called hybrid_index_query, which
>> I use to compare my results given from the index with the real results
>> that are supposed to appear in the final result set.
>> Having described the outer circumstances (in a very short way), I will
>> now show the strange stuff that happens:
>>
>> test=# select id from documents where hybrid_index_query(to_document(words, points), row('radio pleas news'::tsvector, '[(-90,-180),(90, 180)]')) order by id;
>>   id
>> ------
>>  2137
>>  2151
>>  2168
>>  2207
>>  2208
>>  2209
>>  2210
>>  2211
>>  2266
>>  2296
>> (10 rows)
>>
>> This query takes a sequential scan and works properly (returning 10 rows).
>>
>> test=# select id from documents where to_document(words, points) row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]') order by id;
>>   id
>> ------
>>  2137
>>  2151
>>  2168
>>  2207
>>  2208
>>  2209
>>  2210
>>  2211
>>  2266
>>  2296
>> (10 rows)
>>
>> This query uses my index structure and returns the same result as in
>> the sequential scan above. Until here, everything seems to work fine.
>> However, if I issue the same queries using the count aggregate function
>> in SQL, there are some odd results:
>>
>> test=# select count(*) from documents where hybrid_index_query(to_document(words, points), row('radio pleas news'::tsvector, '[(-90,-180),(90, 180)]'));
>>  count
>> -------
>>     10
>> (1 row)
>>
>> Using the sequential scan, still, everything seems fine. However, if I
>> now do the index scan (my function will be called 11 times, returning
>> false at the end), I get the following result:
>>
>> test=# select count(*) from documents where to_document(words, points) row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]');
>>  count
>> -------
>>      7
>> (1 row)
>>
>> This seems strange, because the same query returned 10 rows (when I
>> didn't use the aggregate). If I issue queries that count the id column,
>> I receive the following:
>>
>> test=# select count(id) from documents where hybrid_index_query(to_document(words, points), row('radio pleas news'::tsvector, '[(-90,-180),(90, 180)]'));
>>  count
>> -------
>>     10
>> (1 row)
>>
>> test=# select count(id) from documents where to_document(words, points) row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]');
>>  count
>> -------
>>     10
>> (1 row)
>>
>> These two queries do again return the same results.
>> Thus, I don't know, what's wrong here, does anybody know about that
>> behaviour, or is it my fault that the results are wrong, somehow?
>> Thanks in advance
>
> I am guessing this is a bug in your code - have you used EXPLAIN to
> verify that the second-to-last of the above queries is really hitting
> your code? If so, I'd recommend attaching with gdb and setting a
> breakpoint wherever you return the tuples, and then poke around...
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company