On Fri, Jul 31, 2020 at 10:13:48AM +0500, Andrey M. Borodin wrote:
> Hi Anna!
> 
> > On 23 May 2018, at 20:33, Anna Akenteva <a.akent...@postgrespro.ru>
> > wrote:
> > 
> > Some time ago I encountered a problem with the bytea type: we can't
> > SELECT bytea strings whose textual representation is too big to fit
> > into StringInfoData.  And as a side effect, pg_dump refuses to dump
> > tables with big bytea strings.
> > 
> > It's a bug, it's pretty confusing, but it seems like there's no
> > pretty way to fix it so far.  Here's a link to a recent discussion
> > on the issue:
> > https://www.postgresql.org/message-id/flat/c8bdf802d41ec37003ec3b726db79...@postgrespro.ru#c8bdf802d41ec37003ec3b726db79...@postgrespro.ru
> > 
> > Since it won't be fixed anytime soon, I thought it could be worth
> > documenting.  Attaching a patch for the documentation: I added some
> > text to the "Binary Data Types" part where I tried to describe the
> > issue and to explain how to deal with it.
> > 
> > My patch in plain text (for convenience):
> > 
> > It is not recommended to use bytea strings whose textual
> > representation exceeds 1GB, as it may not be possible to SELECT them
> > due to output size limitations.  Consequently, a table containing
> > such big strings cannot be properly processed by pg_dump, as pg_dump
> > will try to SELECT these values from the table and fail.  The exact
> > size limit advised for bytea strings depends on their content, the
> > external format and encoding that you are using, and the context in
> > which they will be selected.  The general rule is that when you use
> > SELECT, the returned tuple should not exceed 1GB.  Even if SELECT
> > does not work, you can still retrieve big bytea strings using COPY
> > in binary format.
> 
> Thanks for this message.  It took me a while to find out what the
> problem was.  +1 for documenting this, maybe even with the exact
> error, like:
> 
> [ 2020-07-30 01:20:32.248 MSK pg_dump - 10.3.3.30,XX000 ]:ERROR:  invalid memory alloc request size 1472599557
> 
> It's really, really scary.  My first feeling was that it was TOAST
> corruption.
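To make the workaround in the quoted text concrete: binary-format COPY
ships each field as a length word plus the raw bytes, so nothing is
expanded to hex text on the way out.  A minimal sketch in psql, with
hypothetical table and file names (assuming a table t with one bytea
column):

	-- dump the bytea values without text-mode hex expansion
	\copy t TO 't.bin' WITH (FORMAT binary)
	-- and load them back the same way
	\copy t FROM 't.bin' WITH (FORMAT binary)
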
I still have Anna Akenteva's 2018 email in my mailbox because I wanted
to research this further.  Now that you have re-reported the problem, I
am on it!  ;-)

I looked for a clearer reproducible test case, and I have found this
one:

	$ awk 'BEGIN { printf "\\\\x"; for (i=0; i < 30000000; i++) \
		printf "7f7f7f7f7f7f7f7f7f7f"; print}' > /tmp/x

	$ psql test

	CREATE TABLE test(x BYTEA);
	COPY test FROM '/tmp/x';
	COPY test FROM '/tmp/x';
	CREATE TABLE big_data AS SELECT (string_agg(x,'')) AS x FROM test;

	SELECT length(x) FROM big_data;
	  length
	-----------
	 600000000

	SELECT octet_length(x) FROM big_data;
	 octet_length
	--------------
	    600000000

	SELECT x FROM big_data;
	ERROR:  invalid memory alloc request size 1200000003

	\q

	$ pg_dump -d test > /rtmp/a
	pg_dump: Dumping the contents of table "big_data" failed: PQgetResult() failed.
	pg_dump: Error message from server: ERROR:  invalid memory alloc request size 1200000003
	pg_dump: The command was: COPY public.big_data (x) TO stdout;

So, the character and octet length is 600 million, but on output every
byte is expanded to two hex digits, plus the two-character "\x" prefix
and a terminating byte: 2 * 600,000,000 + 3 = 1,200,000,003, exactly
the alloc request above and beyond the server's 1GB allocation limit,
so both SELECT and pg_dump fail.  I also can't see how to improve the
error message, since the failure happens so low in the stack.

In reading the previous posts, I understand that expanding the length
limit would be very hard, and it is difficult to see how to know, at
input time, whether output will fail.  I am in favor of the 2018
proposed patch, attached, perhaps with some small adjustments.
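Given that arithmetic, rows that will break text-mode output can be
flagged ahead of time.  A sketch, assuming the default bytea_output =
'hex' and the server's palloc limit of 1,073,741,823 bytes
(MaxAllocSize); the 2 * octet_length(x) + 3 expression mirrors the
allocation byteaout makes:

	-- list rows whose hex text output would exceed the palloc limit
	SELECT ctid, octet_length(x) AS stored_bytes,
	       2 * octet_length(x) + 3 AS hex_output_bytes
	FROM big_data
	WHERE 2 * octet_length(x) + 3 > 1073741823;

Any row this returns will make "SELECT x" (and a text-mode dump of the
table) fail with the alloc error above.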
-- 
  Bruce Momjian  <br...@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 67bae32..3663398 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -1308,6 +1308,22 @@ SELECT b, char_length(b) FROM test2;
     strings are appropriate for storing text.
    </para>
 
+   <para>
+    It is not recommended to use <type>bytea</type> strings whose
+    textual representation exceeds 1GB, as it may not be possible to
+    <command>SELECT</command> them due to output size limitations.
+    Consequently, a table containing such big strings cannot be properly
+    processed by pg_dump, as pg_dump will try to <command>SELECT</command>
+    these values from the table and fail.  The exact size limit advised for
+    bytea strings depends on their content, the external format and encoding
+    that you are using, and the context in which they will be selected.  The
+    general rule is that when you use <command>SELECT</command>, the returned
+    tuple should not exceed 1GB.  Even if <command>SELECT</command> does not
+    work, you can still retrieve big <type>bytea</type> strings using
+    <command>COPY</command> in <literal>binary</literal> format.
+   </para>
+
    <para>
     The <type>bytea</type> type supports two external formats for
     input and output: <productname>PostgreSQL</productname>'s historical