[HACKERS] TOAST issue on custom index access method

2010-06-24 Thread Carsten Kropf
Hi *,
I have a strange issue using a custom-built index structure. My index access 
method supports a document type composed of words (as a tsvector) and points 
(a 1-dimensional array of them). For internal reasons, I have to save the 
documents as a whole inside my structure (for proper reorganisations).
So I form the tuples using index_form_tuple with the proper tuple descriptor. 
Everything works fine as long as the documents are quite small. However, if 
the tsvector becomes too large, I run into the problem of not being able to 
store the documents, because (obviously) the tsvector is too large for one page.
Everything works fine as long as I use an INSERT ... (SELECT ...) statement, 
i.e. I have some table that feeds my table, where the index is set up, with 
data, e.g.:
insert into documents(words, points) (select words, points from docs);

This query works in every case, because the words that I receive in my index 
are properly toasted in advance, so the size of the documents is quite limited. 
The document type is defined as CREATE TYPE document AS (words tsvector, 
points _point).
However, I tried to insert a document containing a large words vector directly 
(the tsvector is of length 770; the points don't matter too much here).
I tried this using:

INSERT INTO documents (words, points) VALUES (to_tsvector('english', 
'extremely long string constant.'), 'points array');

If I perform the statement mentioned above (I can also extract data from my 
feeding table 'docs' and use it as input), i.e. I insert the large vector 
directly without pulling it out of some feeding table, I get a problem when I 
call index_form_tuple, resulting in the following output:

ERROR:  index row requires 12320 bytes, maximum size is 8191

So creating my tuple using index_form_tuple fails because the tsvector is not 
toasted prior to being inserted into my index.
What I tried in order to solve this issue is to extract the words from the 
document (in my index) and call 'Datum toast_compress_datum(Datum value)' to 
compress the tsvector. Unfortunately, this function always returned NULL, 
which means that the vector could not be compressed, resulting in the same 
error.
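[Editorial aside: toast_compress_datum returning a NULL pointer simply means the compressor could not shrink the value enough to be worthwhile. The same "compress or give up" behaviour can be illustrated generically in Python with zlib — this is illustrative only, not PostgreSQL code; the 8191-byte limit is taken from the error message above, and all names here are hypothetical:]

```python
import os
import zlib

PAGE_LIMIT = 8191  # max index row size reported in the error above

# ~16 kB of highly repetitive text: compresses very well
compressible = b"the quick brown fox " * 800
# ~16 kB of random bytes: effectively incompressible
random_data = os.urandom(16000)

def try_compress(data: bytes, limit: int = PAGE_LIMIT):
    """Return the compressed value if it fits under the limit,
    otherwise None (mirroring the 'give up' return)."""
    out = zlib.compress(data)
    return out if len(out) <= limit else None

print(try_compress(compressible) is not None)  # repetitive text shrinks enough
print(try_compress(random_data) is None)       # random bytes do not
```

The point is that compression is a best-effort fallback: a tsvector whose content does not compress below the page limit still cannot be stored, which is exactly the behaviour described above.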

Actually, I currently don't know how to proceed (of course, there would be 
the possibility of preventing direct insertion by creating a staging table 
with a trigger that inserts the rows into the real data table containing the 
index, which I don't assume to be a good solution).
Does anybody know about this issue of getting untoasted values if the values 
are inserted directly, but toasted values if the data is copied from another 
table?
I hope that somebody can help me here, so that my access method is able to 
run properly.


Thanks in advance

Best regards
Carsten Kropf

P.S.: The version I am developing against is 8.4.2.

Re: [HACKERS] TOAST issue on custom index access method

2010-06-24 Thread Tom Lane
Carsten Kropf ckro...@fh-hof.de writes:
> I have a strange issue using a custom-built index structure. My index access
> method supports a document type composed of words (as a tsvector) and points
> (a 1-dimensional array of them). For internal reasons, I have to save the
> documents as a whole inside my structure (for proper reorganisations).
> So I form the tuples using index_form_tuple with the proper tuple descriptor.
> Everything works fine as long as the documents are quite small. However, if
> the tsvector becomes too large, I run into the problem of not being able to
> store the documents, because (obviously) the tsvector is too large for one
> page.

Well, of course.  I think this is a fundamentally bad index design.  You
didn't say exactly what sort of searches you want this index type to
accelerate, but perhaps you need a design closer to GIN, in which you'd
make index entries for individual words not whole documents.

> What I tried in order to solve this issue is to extract the words from the
> document (in my index) and call 'Datum toast_compress_datum(Datum value)'
> to compress the tsvector.

Indexes don't have toast tables.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TOAST issue on custom index access method

2010-06-24 Thread Carsten Kropf
Oh,
thanks. I didn't know that indexes don't have associated toast tables. The 
index access method is (at least in the textual part) similar to GIN; 
however, I needed to make some changes to it. Saving the whole document is 
actually only important for vacuum. I think I will find some workaround to 
solve this issue.
However, it is a little bit strange that I get toasted values when inserting 
from another table, but untoasted values if I insert items directly. Could 
anybody please explain this to me?

Best regards
Carsten Kropf
On 24.06.2010 at 16:20, Tom Lane wrote:

> Carsten Kropf ckro...@fh-hof.de writes:
>> I have a strange issue using a custom-built index structure. My index
>> access method supports a document type composed of words (as a tsvector)
>> and points (a 1-dimensional array of them). For internal reasons, I have
>> to save the documents as a whole inside my structure (for proper
>> reorganisations).
>> So I form the tuples using index_form_tuple with the proper tuple
>> descriptor. Everything works fine as long as the documents are quite
>> small. However, if the tsvector becomes too large, I run into the problem
>> of not being able to store the documents, because (obviously) the
>> tsvector is too large for one page.
>
> Well, of course.  I think this is a fundamentally bad index design.  You
> didn't say exactly what sort of searches you want this index type to
> accelerate, but perhaps you need a design closer to GIN, in which you'd
> make index entries for individual words not whole documents.
>
>> What I tried in order to solve this issue is to extract the words from
>> the document (in my index) and call 'Datum toast_compress_datum(Datum
>> value)' to compress the tsvector.
>
> Indexes don't have toast tables.
>
> regards, tom lane
 




Re: [HACKERS] TOAST issue on custom index access method

2010-06-24 Thread Tom Lane
Carsten Kropf ckro...@fh-hof.de writes:
> However, it is a little bit strange that I get toasted values when
> inserting from another table, but untoasted values if I insert items
> directly. Could anybody please explain this to me?

Huh?  An index will never ever get passed an externally-toasted value.
See the TOAST_INDEX_HACK code in indextuple.c.

regards, tom lane
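[Editorial aside: the TOAST_INDEX_HACK path in index_form_tuple explains the symptom — an index tuple is never handed an external TOAST pointer; external values are fetched inline first, and an oversized inline value is then compressed if possible, otherwise the insert fails. A loose, non-authoritative sketch of that decision logic in Python follows; the function and parameter names are hypothetical, not the actual backend code:]

```python
import zlib

PAGE_LIMIT = 8191  # max index row size from the error message above

def prepare_index_datum(value: bytes, is_external: bool, fetch) -> bytes:
    """Loose sketch: an index never stores an external TOAST pointer,
    so external values are fetched inline first; if the inline value
    is still too large, try compressing it, else fail."""
    if is_external:
        value = fetch(value)  # pull the full value in-line
    if len(value) > PAGE_LIMIT:
        compressed = zlib.compress(value)
        if len(compressed) <= PAGE_LIMIT:
            return compressed
        raise ValueError(f"index row requires {len(value)} bytes, "
                         f"maximum size is {PAGE_LIMIT}")
    return value

# A small value passes through untouched; a large repetitive one
# is stored compressed.
print(len(prepare_index_datum(b"abc", False, lambda v: v)))
print(len(prepare_index_datum(b"x" * 20000, False, lambda v: v)) <= PAGE_LIMIT)
```

Under this model the row coming from another table and the directly inserted row end up in the same code path either way; the observed size difference would have to come from the value's prior storage form, not from the index skipping toasting.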



Re: [HACKERS] TOAST issue on custom index access method

2010-06-24 Thread Carsten Kropf
Hmm,
ok, then I don't know exactly what happens. I have some values in my 
composite type. The size of the index tuple (if it comes from one table and 
is copied to the target table) is about 15x bytes. Otherwise, if I insert 
the same tuple directly, it is about 12 kB. So I don't know exactly what 
happens here, unfortunately.
Best regards
Carsten Kropf
On 24.06.2010 at 17:08, Tom Lane wrote:

> Carsten Kropf ckro...@fh-hof.de writes:
>> However, it is a little bit strange that I get toasted values when
>> inserting from another table, but untoasted values if I insert items
>> directly. Could anybody please explain this to me?
>
> Huh?  An index will never ever get passed an externally-toasted value.
> See the TOAST_INDEX_HACK code in indextuple.c.
>
> regards, tom lane
 

