Re: pg_dump: Remove "blob" terminology
On Mon, Dec 5, 2022 at 10:56 AM Tom Lane wrote: > Interesting idea. We'd have to either read the TOC multiple times, > or shove the LOB TOC entries into a temporary file, either of which > has downsides. I wonder if we'd need to read the TOC entries repeatedly, or just incrementally. I haven't studied the pg_dump format much, but I wonder if we could arrange things so that the "boring" entries without dependencies, or maybe the LOB entries specifically, are grouped together in some way where we know the byte-length of that section and can just skip over it. Then when we need them we go back and read 'em one by one. (Of course this doesn't work if reading for standard input or if multiple phases of processing need them or whatever. Not trying to say I've solved the problem. But in general I think we need to give more thought to the possibility that "just read all the data into memory" is not an adequate solution to $PROBLEM.) -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_dump: Remove "blob" terminology
Robert Haas writes: > I wonder if we can't find a better solution than bunching TOC entries > together. Perhaps we don't need every TOC entry in memory > simultaneously, for instance, especially things like LOBs that don't > have dependencies. Interesting idea. We'd have to either read the TOC multiple times, or shove the LOB TOC entries into a temporary file, either of which has downsides. regards, tom lane
Re: pg_dump: Remove "blob" terminology
On Sat, Dec 3, 2022 at 11:07 AM Andrew Dunstan wrote: > > Well, what this would lose is the ability to selectively restore > > individual large objects using "pg_restore -L". I'm not sure who > > out there might be depending on that, but if we assume that's the > > empty set I fear we'll find out it's not. So a workaround switch > > seemed possibly worth the trouble. I don't have a position yet > > on which way ought to be the default. > > OK, fair point. I suspect making the batched mode the default would gain > more friends than enemies. A lot of people probably don't know that selective restore even exists but it is an AWESOME feature and I'd hate to see us break it, or even just degrade it. I wonder if we can't find a better solution than bunching TOC entries together. Perhaps we don't need every TOC entry in memory simultaneously, for instance, especially things like LOBs that don't have dependencies. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_dump: Remove "blob" terminology
On 02.12.22 09:34, Daniel Gustafsson wrote: On 2 Dec 2022, at 08:09, Peter Eisentraut wrote: fixed +1 on this version of the patch, LGTM. committed I also put back the old long options forms in the documentation and help but marked them deprecated. + --blobs (deprecated) While not in scope for this patch, I wonder if we should add a similar (deprecated) marker on other commandline options which are documented to be deprecated. -i on bin/postgres comes to mind as one candidate. makes sense
Re: pg_dump: Remove "blob" terminology
On 2022-12-02 Fr 12:40, Tom Lane wrote: > Andrew Dunstan writes: >> On 2022-12-02 Fr 09:18, Tom Lane wrote: >>> The scheme I've vaguely thought about, but not got round to writing, >>> is to merge all blobs with the same owner and ACL into one TOC entry. >>> One would hope that would get it down to a reasonable number of >>> TOC entries in practical applications. (Perhaps there'd need to be >>> a switch to make this optional.) >> +1 for fixing this. Your scheme seems reasonable. This has been a pain >> point for a long time. I'm not sure what we'd gain by making the fix >> optional. > Well, what this would lose is the ability to selectively restore > individual large objects using "pg_restore -L". I'm not sure who > out there might be depending on that, but if we assume that's the > empty set I fear we'll find out it's not. So a workaround switch > seemed possibly worth the trouble. I don't have a position yet > on which way ought to be the default. > > OK, fair point. I suspect making the batched mode the default would gain more friends than enemies. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: pg_dump: Remove "blob" terminology
Andrew Dunstan writes: > On 2022-12-02 Fr 09:18, Tom Lane wrote: >> The scheme I've vaguely thought about, but not got round to writing, >> is to merge all blobs with the same owner and ACL into one TOC entry. >> One would hope that would get it down to a reasonable number of >> TOC entries in practical applications. (Perhaps there'd need to be >> a switch to make this optional.) > +1 for fixing this. Your scheme seems reasonable. This has been a pain > point for a long time. I'm not sure what we'd gain by making the fix > optional. Well, what this would lose is the ability to selectively restore individual large objects using "pg_restore -L". I'm not sure who out there might be depending on that, but if we assume that's the empty set I fear we'll find out it's not. So a workaround switch seemed possibly worth the trouble. I don't have a position yet on which way ought to be the default. regards, tom lane
Re: pg_dump: Remove "blob" terminology
On 2022-12-02 Fr 09:18, Tom Lane wrote: > we really need to do something about situations with $BIGNUM > large objects. Currently those tend to run pg_dump or pg_restore > out of memory because of TOC bloat, and we've seen multiple field > reports of that actually happening. > > The scheme I've vaguely thought about, but not got round to writing, > is to merge all blobs with the same owner and ACL into one TOC entry. > One would hope that would get it down to a reasonable number of > TOC entries in practical applications. (Perhaps there'd need to be > a switch to make this optional.) +1 for fixing this. Your scheme seems reasonable. This has been a pain point for a long time. I'm not sure what we'd gain by making the fix optional. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: pg_dump: Remove "blob" terminology
> On 2 Dec 2022, at 15:18, Tom Lane wrote: > I'm not asking you to make that happen as part of this patch, but > please don't refactor things in a way that will make it harder. I have that on my TODO as well since 7da8823d83a2b66bdd917aa6cb2c5c2619d86011.ca...@credativ.de, and having read this patch I don't think it moves the needle in any way which complicates that. -- Daniel Gustafsson https://vmware.com/
Re: pg_dump: Remove "blob" terminology
Peter Eisentraut writes: > On 30.11.22 09:07, Daniel Gustafsson wrote: >> Should BLOB be changed to BLOBS here (and in similar comments) to make it >> clearer that it refers to the archive entry and the concept of a binary large >> object in general? > I changed this one and went through it again to tidy it up a bit more. > (There are both "BLOB" and "BLOBS" archive entries, so both forms still > exist in the code now.) I've not read this patch and don't have time right this moment, but I wanted to make a note about something to have in the back of your head: we really need to do something about situations with $BIGNUM large objects. Currently those tend to run pg_dump or pg_restore out of memory because of TOC bloat, and we've seen multiple field reports of that actually happening. The scheme I've vaguely thought about, but not got round to writing, is to merge all blobs with the same owner and ACL into one TOC entry. One would hope that would get it down to a reasonable number of TOC entries in practical applications. (Perhaps there'd need to be a switch to make this optional.) I'm not asking you to make that happen as part of this patch, but please don't refactor things in a way that will make it harder. regards, tom lane
Re: pg_dump: Remove "blob" terminology
> On 2 Dec 2022, at 08:09, Peter Eisentraut > wrote: > fixed +1 on this version of the patch, LGTM. > I also put back the old long options forms in the documentation and help but > marked them deprecated. + --blobs (deprecated) While not in scope for this patch, I wonder if we should add a similar (deprecated) marker on other commandline options which are documented to be deprecated. -i on bin/postgres comes to mind as one candidate. -- Daniel Gustafsson https://vmware.com/
Re: pg_dump: Remove "blob" terminology
On 30.11.22 09:07, Daniel Gustafsson wrote: The commit message contains a typo: functinos fixed * called for both BLOB and TABLE data; it is the responsibility of - * the format to manage each kind of data using StartBlob/StartData. + * the format to manage each kind of data using StartLO/StartData. Should BLOB be changed to BLOBS here (and in similar comments) to make it clearer that it refers to the archive entry and the concept of a binary large object in general? I changed this one and went through it again to tidy it up a bit more. (There are both "BLOB" and "BLOBS" archive entries, so both forms still exist in the code now.) Theres an additional mention in src/test/modules/test_pg_dump/t/001_base.pl: # Tests which are considered 'full' dumps by pg_dump, but there. # are flags used to exclude specific items (ACLs, blobs, etc). fixed I also put back the old long options forms in the documentation and help but marked them deprecated. From e32d43952a4f2a054f25883c1136177cd1ad1fc3 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 2 Dec 2022 07:55:52 +0100 Subject: [PATCH v2] pg_dump: Remove "blob" terminology For historical reasons, pg_dump refers to large objects as "BLOBs". This term is not used anywhere else in PostgreSQL, and it also means something different in the SQL standard and other SQL systems. This patch renames internal functions, code comments, documentation, etc. to use the "large object" or "LO" terminology instead. There is no functionality change, so the archive format still uses the name "BLOB" for the archive entry. Additional long command-line options are added with the new naming. Discussion: https://www.postgresql.org/message-id/flat/868a381f-4650-9460-1726-1ffd39a270b4%40enterprisedb.com --- doc/src/sgml/ref/pg_dump.sgml | 16 +- src/bin/pg_dump/pg_backup.h | 12 +- src/bin/pg_dump/pg_backup_archiver.c| 78 +++ src/bin/pg_dump/pg_backup_archiver.h| 34 +-- src/bin/pg_dump/pg_backup_custom.c | 66 +++--- src/bin/pg_dump/pg_backup_db.c | 4 +- src/bin/pg_dump/pg_backup_directory.c | 112 +- src/bin/pg_dump/pg_backup_null.c| 50 ++--- src/bin/pg_dump/pg_backup_tar.c | 68 +++--- src/bin/pg_dump/pg_dump.c | 216 ++-- src/bin/pg_dump/pg_dump.h | 8 +- src/bin/pg_dump/pg_dump_sort.c | 16 +- src/bin/pg_dump/t/002_pg_dump.pl| 46 ++--- src/test/modules/test_pg_dump/t/001_base.pl | 2 +- 14 files changed, 367 insertions(+), 361 deletions(-) diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index 363d1327e2..2c938cd7e1 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -132,7 +132,8 @@ Options -b - --blobs + --large-objects + --blobs (deprecated) Include large objects in the dump. This is the default behavior @@ -140,7 +141,7 @@ Options --schema-only is specified. The -b switch is therefore only useful to add large objects to dumps where a specific schema or table has been requested. Note that -blobs are considered data and therefore will be included when +large objects are considered data and therefore will be included when --data-only is used, but not when --schema-only is. @@ -149,7 +150,8 @@ Options -B - --no-blobs + --no-large-objects + --no-blobs (deprecated) Exclude large objects in the dump. @@ -323,7 +325,7 @@ Options Output a directory-format archive suitable for input into pg_restore. This will create a directory - with one file for each table and blob being dumped, plus a + with one file for each table and large object being dumped, plus a so-called Table of Contents file describing the dumped objects in a machine-readable format that pg_restore can read. A directory format archive can be manipulated with @@ -434,9 +436,9 @@ Options - Non-schema objects such as blobs are not dumped when -n is - specified. You can add blobs back to the dump with the - --blobs switch. + Non-schema objects such as large objects are not dumped when -n is + specified. You can add large objects back to the dump with the + --large-objects switch. diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h index bc6b6594af..aba780ef4b 100644 --- a/src/bin/pg_dump/pg_backup.h +++ b/src/bin/pg_dump/pg_backup.h @@ -53,9 +53,9 @@ typedef enum _archiveMode typedef enum _teSection { - SECTION_NONE = 1,
Re: pg_dump: Remove "blob" terminology
> On 30 Nov 2022, at 08:04, Peter Eisentraut > wrote: > > For historical reasons, pg_dump refers to large objects as "BLOBs". This term > is not used anywhere else in PostgreSQL, and it also means something > different in the SQL standard and other SQL systems. > > This patch renames internal functinos, code comments, documentation, etc. to > use the "large object" or "LO" terminology instead. There is no > functionality change, so the archive format still uses the name "BLOB" for > the archive entry. Additional long command-line options are added with the > new naming. +1 on doing this. No pointy bits stood out when reading, just a few small comments: The commit message contains a typo: functinos * called for both BLOB and TABLE data; it is the responsibility of - * the format to manage each kind of data using StartBlob/StartData. + * the format to manage each kind of data using StartLO/StartData. Should BLOB be changed to BLOBS here (and in similar comments) to make it clearer that it refers to the archive entry and the concept of a binary large object in general? Theres an additional mention in src/test/modules/test_pg_dump/t/001_base.pl: # Tests which are considered 'full' dumps by pg_dump, but there. # are flags used to exclude specific items (ACLs, blobs, etc). -- Daniel Gustafsson https://vmware.com/
pg_dump: Remove "blob" terminology
For historical reasons, pg_dump refers to large objects as "BLOBs". This term is not used anywhere else in PostgreSQL, and it also means something different in the SQL standard and other SQL systems. This patch renames internal functinos, code comments, documentation, etc. to use the "large object" or "LO" terminology instead. There is no functionality change, so the archive format still uses the name "BLOB" for the archive entry. Additional long command-line options are added with the new naming.From e4307810980d4874153ab53142a31ddce8d85298 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 30 Nov 2022 07:52:19 +0100 Subject: [PATCH] pg_dump: Remove "blob" terminology For historical reasons, pg_dump refers to large objects as "BLOBs". This term is not used anywhere else in PostgreSQL, and it also means something different in the SQL standard and other SQL systems. This patch renames internal functinos, code comments, documentation, etc. to use the "large object" or "LO" terminology instead. There is no functionality change, so the archive format still uses the name "BLOB" for the archive entry. Additional long command-line options are added with the new naming. --- doc/src/sgml/ref/pg_dump.sgml | 14 +- src/bin/pg_dump/pg_backup.h | 12 +- src/bin/pg_dump/pg_backup_archiver.c | 76 - src/bin/pg_dump/pg_backup_archiver.h | 32 ++-- src/bin/pg_dump/pg_backup_custom.c| 52 +++ src/bin/pg_dump/pg_backup_db.c| 4 +- src/bin/pg_dump/pg_backup_directory.c | 104 ++--- src/bin/pg_dump/pg_backup_null.c | 50 +++--- src/bin/pg_dump/pg_backup_tar.c | 68 - src/bin/pg_dump/pg_dump.c | 212 +- src/bin/pg_dump/pg_dump.h | 8 +- src/bin/pg_dump/pg_dump_sort.c| 16 +- src/bin/pg_dump/t/002_pg_dump.pl | 46 +++--- 13 files changed, 348 insertions(+), 346 deletions(-) diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index 8b9d9f4cad43..989815939bf1 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -132,7 +132,7 @@ Options -b - --blobs + --large-objects Include large objects in the dump. This is the default behavior @@ -140,7 +140,7 @@ Options --schema-only is specified. The -b switch is therefore only useful to add large objects to dumps where a specific schema or table has been requested. Note that -blobs are considered data and therefore will be included when +large objects are considered data and therefore will be included when --data-only is used, but not when --schema-only is. @@ -149,7 +149,7 @@ Options -B - --no-blobs + --no-large-objects Exclude large objects in the dump. @@ -323,7 +323,7 @@ Options Output a directory-format archive suitable for input into pg_restore. This will create a directory - with one file for each table and blob being dumped, plus a + with one file for each table and large object being dumped, plus a so-called Table of Contents file describing the dumped objects in a machine-readable format that pg_restore can read. A directory format archive can be manipulated with @@ -434,9 +434,9 @@ Options - Non-schema objects such as blobs are not dumped when -n is - specified. You can add blobs back to the dump with the - --blobs switch. + Non-schema objects such as large objects are not dumped when -n is + specified. You can add large objects back to the dump with the + --large-objects switch. diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h index e8b78982971e..a39b8e6c9bb3 100644 --- a/src/bin/pg_dump/pg_backup.h +++ b/src/bin/pg_dump/pg_backup.h @@ -52,9 +52,9 @@ typedef enum _archiveMode typedef enum _teSection { - SECTION_NONE = 1, /* COMMENTs, ACLs, etc; can be anywhere */ + SECTION_NONE = 1, /* comments, ACLs, etc; can be anywhere */ SECTION_PRE_DATA, /* stuff to be processed before data */ - SECTION_DATA, /* TABLE DATA, BLOBS, BLOB COMMENTS */ + SECTION_DATA, /* table data, large objects, LO comments */ SECTION_POST_DATA /* stuff to be processed after data */ } teSection; @@ -191,8 +191,8 @@ typedef struct _dumpOptions int outputClean; int outputCreateDB; - booloutputBlobs; - booldontOutputBl