Hi

On Tue, Apr 11, 2023 at 19:42, Joe Carlson <jwcarl...@lbl.gov> wrote:

> Hello,
>
> I’ve recently run into the issue of trying to insert more than 1 Gb into a
> TEXT column. While the docs say TEXT is of unlimited length, I had been
> unaware of the 1Gb buffer size limitation.
>

I think this is a misunderstanding.

see https://www.postgresql.org/docs/current/datatype-character.html

>>>The storage requirement for a short string (up to 126 bytes) is 1 byte
plus the actual string, which includes the space padding in the case of
character. Longer strings have 4 bytes of overhead instead of 1. Long
strings are compressed by the system automatically, so the physical
requirement on disk might be less. Very long values are also stored in
background tables so that they do not interfere with rapid access to
shorter column values. In any case, the longest possible character string
that can be stored is about 1 GB. (The maximum value that will be allowed
for *n* in the data type declaration is less than that. It wouldn't be
useful to change this because with multibyte character encodings the number
of characters and bytes can be quite different. If you desire to store long
strings with no specific upper limit, use text or character varying without
a length specifier, rather than making up an arbitrary length limit.)<<<

My note: nothing is unlimited in this world :-)

1GB is the theoretical limit, but depending on usage it can already be too
much in practice - a safe limit is about 500MB, because operations on long
strings can require several copies of the value in RAM.

For longer data you can use the large object (LO) API:
https://www.postgresql.org/docs/current/largeobjects.html

It is much better than extra-long strings, and the implementation is roughly
similar to what you proposed.
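
For illustration, a minimal sketch using the server-side large object
functions (the docs table and its columns are just hypothetical names):

CREATE TABLE docs (doc_id integer PRIMARY KEY, body_oid oid);

-- lo_from_bytea(0, ...) creates a new large object and returns its OID
INSERT INTO docs
  VALUES (1, lo_from_bytea(0, convert_to('... very long text ...', 'UTF8')));

-- lo_get() can read an arbitrary byte range, so the whole value never has
-- to be materialized at once; note a byte offset can split a multibyte
-- character, so slice on boundaries you control
SELECT convert_from(lo_get(body_oid, 0, 1000000), 'UTF8')
  FROM docs
 WHERE doc_id = 1;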

Regards

Pavel


>
> We can debate whether or not saving something this big in a single column
> is a good idea (spoiler: it isn’t. But it’s not my design and, in fairness,
> it was not anticipated when the schema was designed.). I’d like to implement
> something that is not a major disruption and keep the mods on the server
> side. My first idea is to have a chunked associated table (in pseudo code):
>
> CREATE TABLE associated(key_id integer references main_table(key_id),
> chunk integer, text_start integer, text_end integer, text_chunk TEXT);
>
> And define functions for inserting and selecting by dividing into 1Mb
> chunks
>
> CREATE FUNCTION insertText(INTEGER, TEXT) RETURNS INTEGER AS $$
> DECLARE
>   chunk INTEGER := 0;
>   key_id ALIAS FOR $1;
>   the_text ALIAS FOR $2;
>   text_chunk TEXT;
> BEGIN
>   LOOP
>     -- substr() is 1-based, so the offset of chunk N is N*1000000 + 1
>     text_chunk := substr(the_text, chunk*1000000 + 1, 1000000);
>     IF length(text_chunk) = 0 THEN
>       EXIT;
>     END IF;
>     INSERT INTO associated(key_id, chunk, text_start, text_end, text_chunk)
>     VALUES (key_id, chunk, chunk*1000000,
>             chunk*1000000 + length(text_chunk), text_chunk);
>     chunk := chunk + 1;
>   END LOOP;
>   RETURN chunk;  -- number of chunks stored
> END;
> $$ LANGUAGE plpgsql;
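>
> A matching reader for the select side might be (a sketch; reassembly only
> works while the combined result stays under the same 1Gb limit):
>
> CREATE FUNCTION getText(INTEGER) RETURNS TEXT AS $$
>   SELECT string_agg(text_chunk, '' ORDER BY chunk)
>     FROM associated
>    WHERE key_id = $1;
> $$ LANGUAGE sql;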
>
> This apparently runs into the same buffer size issue: I get an ‘invalid
> message length’ error in the log file and the insert fails. I can see from
> adding notices in the code that I never enter the LOOP; I assume passing
> function arguments > 1Gb is also a bad thing.
>
> I’d like to keep the modifications on the server side, and I’d like to
> believe someone else has had this problem before. Any suggestions other
> than having the client do the chunking? Can I use a different language
> binding to get around the argument length limitations?
>
> Thanks
>
