On Fri, Jul 31, 2020 at 10:13:48AM +0500, Andrey M. Borodin wrote:
> Hi Anna!
> 
> > On 23 May 2018, at 20:33, Anna Akenteva <a.akent...@postgrespro.ru>
> > wrote:
> > 
> > 
> > Some time ago I encountered a problem with the bytea type: we can't
> > SELECT bytea strings whose textual representation is too big to fit
> > into StringInfoData. And as a side effect, pg_dump refuses to dump
> > tables with big bytea strings.
> > 
> > It's a bug, and it's pretty confusing, but so far there seems to be no
> > clean way to fix it. Here's a link to a recent discussion on the issue:
> > https://www.postgresql.org/message-id/flat/c8bdf802d41ec37003ec3b726db79...@postgrespro.ru#c8bdf802d41ec37003ec3b726db79...@postgrespro.ru
> > 
> > Since it won't be fixed anytime soon, I thought it could be worth
> > documenting. Attaching a patch for the documentation: I added some
> > text to the "Binary Data Types" part where I tried to describe the
> > issue and to explain how to deal with it.
> > 
> > My patch in plain text (for convenience):
> > 
> > It is not recommended to use bytea strings whose textual representation
> > exceeds 1GB, as it may not be possible to SELECT them due to output size
> > limitations. Consequently, a table containing such big strings cannot be
> > properly processed by pg_dump, as pg_dump will try to SELECT these
> > values from the table and fail. The exact size limit advised for bytea
> > strings depends on their content, the external format and encoding that
> > you are using, and the context in which they will be selected. The
> > general rule is that when you use SELECT, the returned tuple should not
> > exceed 1GB. However, even if SELECT does not work, you can still
> > retrieve big bytea strings using COPY in binary format.
> 
> Thanks for this message. It took me a while to find out what the problem was.
> +1 for documenting this, maybe even with the exact error, like:
> [ 2020-07-30 01:20:32.248 MSK pg_dump - 10.3.3.30,XX000 ]:ERROR:  invalid memory alloc request size 1472599557
> It's really, really scary. My first feeling was that it was TOAST corruption.

I still have Anna Akenteva's 2018 email in my mailbox because I wanted
to research this further.  Now that you have re-reported the problem, I
am on it!  ;-)

I looked for a clearer reproducible test case, and I found this one
(the awk command writes a single 300-million-byte bytea value as
escaped hex, suitable for loading with COPY):

        $ awk 'BEGIN { printf "\\\\x"; for (i=0; i < 30000000; i++) \
                printf "7f7f7f7f7f7f7f7f7f7f"; print}' > /tmp/x
        
        $ psql test
        
        CREATE TABLE test(x BYTEA);
        
        COPY test FROM '/tmp/x';
        
        COPY test FROM '/tmp/x';
        
        CREATE TABLE big_data AS SELECT (string_agg(x,'')) AS x FROM test;
        
        SELECT length(x) FROM big_data;
          length
        -----------
         600000000
        
        SELECT octet_length(x) FROM big_data;
         octet_length
        --------------
            600000000

        SELECT x FROM big_data;
        ERROR:  invalid memory alloc request size 1200000003

        \q

        $ pg_dump -d test > /rtmp/a
        pg_dump: Dumping the contents of table "big_data" failed: PQgetResult() failed.
        pg_dump: Error message from server: ERROR:  invalid memory alloc request size 1200000003
        pg_dump: The command was: COPY public.big_data (x) TO stdout;

So, the character and octet length is 600 million, but on output each
byte expands to two hex digits, plus the two-character \x prefix and a
terminating NUL byte, hence the request for 1,200,000,003 bytes, and
both SELECT and pg_dump fail.  I also can't see how to improve the
error message since the failure happens so low in the stack.
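
You can predict the failure for a given table by computing the size of
the default hex output by hand (the hex_output_size alias is just a
label I made up; the expression mirrors the allocation above):

        SELECT octet_length(x) * 2 + 3 AS hex_output_size FROM big_data;
         hex_output_size
        -----------------
              1200000003

Once that number reaches 1GB, the output allocation fails.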

In reading the previous posts, I understand that raising the length
limit would be very hard, and it is difficult to see how we could know
at input time that output will fail.  I am in favor of the 2018
proposed patch, attached, perhaps with some small adjustments.
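
For the archives, here is a minimal sketch of the COPY workaround the
proposed text mentions (the output path is just an example).  Binary
format writes the raw bytes instead of a hex-expanded text
representation, so the oversized output allocation never happens:

        COPY big_data (x) TO '/tmp/big_data.bin' WITH (FORMAT binary);

For a non-superuser, psql's \copy does the same thing client-side.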

-- 
  Bruce Momjian  <br...@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 67bae32..3663398 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -1308,6 +1308,23 @@ SELECT b, char_length(b) FROM test2;
     strings are appropriate for storing text.
    </para>
 
+
+   <para>
+    It is not recommended to use <type>bytea</type> strings whose textual
+    representation exceeds 1GB, as it may not be possible to
+    <command>SELECT</command> them due to output size limitations.
+    Consequently, a table containing such big strings cannot be properly
+    processed by <application>pg_dump</application>, which will try to
+    <command>SELECT</command> these values from the table and fail.  The
+    exact size limit advised for <type>bytea</type> strings depends on
+    their content, the external format and encoding that you are using,
+    and the context in which they will be selected.  The general rule is
+    that when you use <command>SELECT</command>, the returned tuple should
+    not exceed 1GB.  However, even if <command>SELECT</command> does not
+    work, you can still retrieve big <type>bytea</type> strings using
+    <command>COPY</command> in <literal>binary</literal> format.
+   </para>
+
    <para>
     The <type>bytea</type> type supports two external formats for
     input and output: <productname>PostgreSQL</productname>'s historical
