Re: pg_dump: Remove "blob" terminology

2022-12-05 Thread Robert Haas
On Mon, Dec 5, 2022 at 10:56 AM Tom Lane  wrote:
> Interesting idea.  We'd have to either read the TOC multiple times,
> or shove the LOB TOC entries into a temporary file, either of which
> has downsides.

I wonder if we'd need to read the TOC entries repeatedly, or just incrementally.

I haven't studied the pg_dump format much, but I wonder if we could
arrange things so that the "boring" entries without dependencies, or
maybe the LOB entries specifically, are grouped together in some way
where we know the byte-length of that section and can just skip over
it. Then when we need them we go back and read 'em one by one.

(Of course this doesn't work if reading for standard input or if
multiple phases of processing need them or whatever. Not trying to say
I've solved the problem. But in general I think we need to give more
thought to the possibility that "just read all the data into memory"
is not an adequate solution to $PROBLEM.)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump: Remove "blob" terminology

2022-12-05 Thread Tom Lane
Robert Haas  writes:
> I wonder if we can't find a better solution than bunching TOC entries
> together. Perhaps we don't need every TOC entry in memory
> simultaneously, for instance, especially things like LOBs that don't
> have dependencies.

Interesting idea.  We'd have to either read the TOC multiple times,
or shove the LOB TOC entries into a temporary file, either of which
has downsides.

regards, tom lane




Re: pg_dump: Remove "blob" terminology

2022-12-05 Thread Robert Haas
On Sat, Dec 3, 2022 at 11:07 AM Andrew Dunstan  wrote:
> > Well, what this would lose is the ability to selectively restore
> > individual large objects using "pg_restore -L".  I'm not sure who
> > out there might be depending on that, but if we assume that's the
> > empty set I fear we'll find out it's not.  So a workaround switch
> > seemed possibly worth the trouble.  I don't have a position yet
> > on which way ought to be the default.
>
> OK, fair point. I suspect making the batched mode the default would gain
> more friends than enemies.

A lot of people probably don't know that selective restore even exists
but it is an AWESOME feature and I'd hate to see us break it, or even
just degrade it.

I wonder if we can't find a better solution than bunching TOC entries
together. Perhaps we don't need every TOC entry in memory
simultaneously, for instance, especially things like LOBs that don't
have dependencies.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump: Remove "blob" terminology

2022-12-05 Thread Peter Eisentraut

On 02.12.22 09:34, Daniel Gustafsson wrote:

On 2 Dec 2022, at 08:09, Peter Eisentraut  
wrote:



fixed


+1 on this version of the patch, LGTM.


committed


I also put back the old long options forms in the documentation and help but 
marked them deprecated.


+  --blobs (deprecated)
While not in scope for this patch, I wonder if we should add a similar
(deprecated) marker on other commandline options which are documented to be
deprecated.  -i on bin/postgres comes to mind as one candidate.


makes sense





Re: pg_dump: Remove "blob" terminology

2022-12-03 Thread Andrew Dunstan


On 2022-12-02 Fr 12:40, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 2022-12-02 Fr 09:18, Tom Lane wrote:
>>> The scheme I've vaguely thought about, but not got round to writing,
>>> is to merge all blobs with the same owner and ACL into one TOC entry.
>>> One would hope that would get it down to a reasonable number of
>>> TOC entries in practical applications.  (Perhaps there'd need to be
>>> a switch to make this optional.)
>> +1 for fixing this. Your scheme seems reasonable. This has been a pain
>> point for a long time. I'm not sure what we'd gain by making the fix
>> optional.
> Well, what this would lose is the ability to selectively restore
> individual large objects using "pg_restore -L".  I'm not sure who
> out there might be depending on that, but if we assume that's the
> empty set I fear we'll find out it's not.  So a workaround switch
> seemed possibly worth the trouble.  I don't have a position yet
> on which way ought to be the default.
>
>   


OK, fair point. I suspect making the batched mode the default would gain
more friends than enemies.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_dump: Remove "blob" terminology

2022-12-02 Thread Tom Lane
Andrew Dunstan  writes:
> On 2022-12-02 Fr 09:18, Tom Lane wrote:
>> The scheme I've vaguely thought about, but not got round to writing,
>> is to merge all blobs with the same owner and ACL into one TOC entry.
>> One would hope that would get it down to a reasonable number of
>> TOC entries in practical applications.  (Perhaps there'd need to be
>> a switch to make this optional.)

> +1 for fixing this. Your scheme seems reasonable. This has been a pain
> point for a long time. I'm not sure what we'd gain by making the fix
> optional.

Well, what this would lose is the ability to selectively restore
individual large objects using "pg_restore -L".  I'm not sure who
out there might be depending on that, but if we assume that's the
empty set I fear we'll find out it's not.  So a workaround switch
seemed possibly worth the trouble.  I don't have a position yet
on which way ought to be the default.

regards, tom lane




Re: pg_dump: Remove "blob" terminology

2022-12-02 Thread Andrew Dunstan


On 2022-12-02 Fr 09:18, Tom Lane wrote:
> we really need to do something about situations with $BIGNUM
> large objects.  Currently those tend to run pg_dump or pg_restore
> out of memory because of TOC bloat, and we've seen multiple field
> reports of that actually happening.
>
> The scheme I've vaguely thought about, but not got round to writing,
> is to merge all blobs with the same owner and ACL into one TOC entry.
> One would hope that would get it down to a reasonable number of
> TOC entries in practical applications.  (Perhaps there'd need to be
> a switch to make this optional.)


+1 for fixing this. Your scheme seems reasonable. This has been a pain
point for a long time. I'm not sure what we'd gain by making the fix
optional.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_dump: Remove "blob" terminology

2022-12-02 Thread Daniel Gustafsson
> On 2 Dec 2022, at 15:18, Tom Lane  wrote:

> I'm not asking you to make that happen as part of this patch, but
> please don't refactor things in a way that will make it harder.

I have that on my TODO as well since 
7da8823d83a2b66bdd917aa6cb2c5c2619d86011.ca...@credativ.de,
and having read this patch I don't think it moves the needle in any
way which complicates that.

--
Daniel Gustafsson   https://vmware.com/





Re: pg_dump: Remove "blob" terminology

2022-12-02 Thread Tom Lane
Peter Eisentraut  writes:
> On 30.11.22 09:07, Daniel Gustafsson wrote:
>> Should BLOB be changed to BLOBS here (and in similar comments) to make it
>> clearer that it refers to the archive entry and the concept of a binary large
>> object in general?

> I changed this one and went through it again to tidy it up a bit more. 
> (There are both "BLOB" and "BLOBS" archive entries, so both forms still 
> exist in the code now.)

I've not read this patch and don't have time right this moment, but
I wanted to make a note about something to have in the back of your
head: we really need to do something about situations with $BIGNUM
large objects.  Currently those tend to run pg_dump or pg_restore
out of memory because of TOC bloat, and we've seen multiple field
reports of that actually happening.

The scheme I've vaguely thought about, but not got round to writing,
is to merge all blobs with the same owner and ACL into one TOC entry.
One would hope that would get it down to a reasonable number of
TOC entries in practical applications.  (Perhaps there'd need to be
a switch to make this optional.)

I'm not asking you to make that happen as part of this patch, but
please don't refactor things in a way that will make it harder.

regards, tom lane




Re: pg_dump: Remove "blob" terminology

2022-12-02 Thread Daniel Gustafsson
> On 2 Dec 2022, at 08:09, Peter Eisentraut  
> wrote:

> fixed

+1 on this version of the patch, LGTM.

> I also put back the old long options forms in the documentation and help but 
> marked them deprecated.

+  --blobs (deprecated)
While not in scope for this patch, I wonder if we should add a similar
(deprecated) marker on other commandline options which are documented to be
deprecated.  -i on bin/postgres comes to mind as one candidate.

--
Daniel Gustafsson   https://vmware.com/





Re: pg_dump: Remove "blob" terminology

2022-12-01 Thread Peter Eisentraut

On 30.11.22 09:07, Daniel Gustafsson wrote:

The commit message contains a typo: functinos


fixed


   * called for both BLOB and TABLE data; it is the responsibility of
- * the format to manage each kind of data using StartBlob/StartData.
+ * the format to manage each kind of data using StartLO/StartData.

Should BLOB be changed to BLOBS here (and in similar comments) to make it
clearer that it refers to the archive entry and the concept of a binary large
object in general?


I changed this one and went through it again to tidy it up a bit more. 
(There are both "BLOB" and "BLOBS" archive entries, so both forms still 
exist in the code now.)



Theres an additional mention in src/test/modules/test_pg_dump/t/001_base.pl:

 # Tests which are considered 'full' dumps by pg_dump, but there.
 # are flags used to exclude specific items (ACLs, blobs, etc).


fixed

I also put back the old long options forms in the documentation and help 
but marked them deprecated.
From e32d43952a4f2a054f25883c1136177cd1ad1fc3 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 2 Dec 2022 07:55:52 +0100
Subject: [PATCH v2] pg_dump: Remove "blob" terminology

For historical reasons, pg_dump refers to large objects as "BLOBs".
This term is not used anywhere else in PostgreSQL, and it also means
something different in the SQL standard and other SQL systems.

This patch renames internal functions, code comments, documentation,
etc. to use the "large object" or "LO" terminology instead.  There is
no functionality change, so the archive format still uses the name
"BLOB" for the archive entry.  Additional long command-line options
are added with the new naming.

Discussion: 
https://www.postgresql.org/message-id/flat/868a381f-4650-9460-1726-1ffd39a270b4%40enterprisedb.com
---
 doc/src/sgml/ref/pg_dump.sgml   |  16 +-
 src/bin/pg_dump/pg_backup.h |  12 +-
 src/bin/pg_dump/pg_backup_archiver.c|  78 +++
 src/bin/pg_dump/pg_backup_archiver.h|  34 +--
 src/bin/pg_dump/pg_backup_custom.c  |  66 +++---
 src/bin/pg_dump/pg_backup_db.c  |   4 +-
 src/bin/pg_dump/pg_backup_directory.c   | 112 +-
 src/bin/pg_dump/pg_backup_null.c|  50 ++---
 src/bin/pg_dump/pg_backup_tar.c |  68 +++---
 src/bin/pg_dump/pg_dump.c   | 216 ++--
 src/bin/pg_dump/pg_dump.h   |   8 +-
 src/bin/pg_dump/pg_dump_sort.c  |  16 +-
 src/bin/pg_dump/t/002_pg_dump.pl|  46 ++---
 src/test/modules/test_pg_dump/t/001_base.pl |   2 +-
 14 files changed, 367 insertions(+), 361 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 363d1327e2..2c938cd7e1 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -132,7 +132,8 @@ Options
 
  
   -b
-  --blobs
+  --large-objects
+  --blobs (deprecated)
   

 Include large objects in the dump.  This is the default behavior
@@ -140,7 +141,7 @@ Options
 --schema-only is specified.  The -b
 switch is therefore only useful to add large objects to dumps
 where a specific schema or table has been requested.  Note that
-blobs are considered data and therefore will be included when
+large objects are considered data and therefore will be included when
 --data-only is used, but not
 when --schema-only is.

@@ -149,7 +150,8 @@ Options
 
  
   -B
-  --no-blobs
+  --no-large-objects
+  --no-blobs (deprecated)
   

 Exclude large objects in the dump.
@@ -323,7 +325,7 @@ Options
   
Output a directory-format archive suitable for input into
pg_restore. This will create a directory
-   with one file for each table and blob being dumped, plus a
+   with one file for each table and large object being dumped, plus a
so-called Table of Contents file describing the dumped objects in a
machine-readable format that pg_restore
can read. A directory format archive can be manipulated with
@@ -434,9 +436,9 @@ Options
 

 
- Non-schema objects such as blobs are not dumped when 
-n is
- specified.  You can add blobs back to the dump with the
- --blobs switch.
+ Non-schema objects such as large objects are not dumped when 
-n is
+ specified.  You can add large objects back to the dump with the
+ --large-objects switch.
 

 
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index bc6b6594af..aba780ef4b 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -53,9 +53,9 @@ typedef enum _archiveMode
 
 typedef enum _teSection
 {
-   SECTION_NONE = 1, 

Re: pg_dump: Remove "blob" terminology

2022-11-30 Thread Daniel Gustafsson
> On 30 Nov 2022, at 08:04, Peter Eisentraut 
>  wrote:
> 
> For historical reasons, pg_dump refers to large objects as "BLOBs". This term 
> is not used anywhere else in PostgreSQL, and it also means something 
> different in the SQL standard and other SQL systems.
> 
> This patch renames internal functinos, code comments, documentation, etc. to 
> use the "large object" or "LO" terminology instead.  There is no 
> functionality change, so the archive format still uses the name "BLOB" for 
> the archive entry.  Additional long command-line options are added with the 
> new naming.

+1 on doing this.  No pointy bits stood out when reading, just a few small
comments:

The commit message contains a typo: functinos

  * called for both BLOB and TABLE data; it is the responsibility of
- * the format to manage each kind of data using StartBlob/StartData.
+ * the format to manage each kind of data using StartLO/StartData.

Should BLOB be changed to BLOBS here (and in similar comments) to make it
clearer that it refers to the archive entry and the concept of a binary large
object in general?

Theres an additional mention in src/test/modules/test_pg_dump/t/001_base.pl:

# Tests which are considered 'full' dumps by pg_dump, but there.
# are flags used to exclude specific items (ACLs, blobs, etc).

--
Daniel Gustafsson   https://vmware.com/





pg_dump: Remove "blob" terminology

2022-11-29 Thread Peter Eisentraut
For historical reasons, pg_dump refers to large objects as "BLOBs". 
This term is not used anywhere else in PostgreSQL, and it also means 
something different in the SQL standard and other SQL systems.


This patch renames internal functinos, code comments, documentation, 
etc. to use the "large object" or "LO" terminology instead.  There is no 
functionality change, so the archive format still uses the name "BLOB" 
for the archive entry.  Additional long command-line options are added 
with the new naming.From e4307810980d4874153ab53142a31ddce8d85298 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 30 Nov 2022 07:52:19 +0100
Subject: [PATCH] pg_dump: Remove "blob" terminology

For historical reasons, pg_dump refers to large objects as "BLOBs".
This term is not used anywhere else in PostgreSQL, and it also means
something different in the SQL standard and other SQL systems.

This patch renames internal functinos, code comments, documentation,
etc. to use the "large object" or "LO" terminology instead.  There is
no functionality change, so the archive format still uses the name
"BLOB" for the archive entry.  Additional long command-line options
are added with the new naming.
---
 doc/src/sgml/ref/pg_dump.sgml |  14 +-
 src/bin/pg_dump/pg_backup.h   |  12 +-
 src/bin/pg_dump/pg_backup_archiver.c  |  76 -
 src/bin/pg_dump/pg_backup_archiver.h  |  32 ++--
 src/bin/pg_dump/pg_backup_custom.c|  52 +++
 src/bin/pg_dump/pg_backup_db.c|   4 +-
 src/bin/pg_dump/pg_backup_directory.c | 104 ++---
 src/bin/pg_dump/pg_backup_null.c  |  50 +++---
 src/bin/pg_dump/pg_backup_tar.c   |  68 -
 src/bin/pg_dump/pg_dump.c | 212 +-
 src/bin/pg_dump/pg_dump.h |   8 +-
 src/bin/pg_dump/pg_dump_sort.c|  16 +-
 src/bin/pg_dump/t/002_pg_dump.pl  |  46 +++---
 13 files changed, 348 insertions(+), 346 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 8b9d9f4cad43..989815939bf1 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -132,7 +132,7 @@ Options
 
  
   -b
-  --blobs
+  --large-objects
   

 Include large objects in the dump.  This is the default behavior
@@ -140,7 +140,7 @@ Options
 --schema-only is specified.  The -b
 switch is therefore only useful to add large objects to dumps
 where a specific schema or table has been requested.  Note that
-blobs are considered data and therefore will be included when
+large objects are considered data and therefore will be included when
 --data-only is used, but not
 when --schema-only is.

@@ -149,7 +149,7 @@ Options
 
  
   -B
-  --no-blobs
+  --no-large-objects
   

 Exclude large objects in the dump.
@@ -323,7 +323,7 @@ Options
   
Output a directory-format archive suitable for input into
pg_restore. This will create a directory
-   with one file for each table and blob being dumped, plus a
+   with one file for each table and large object being dumped, plus a
so-called Table of Contents file describing the dumped objects in a
machine-readable format that pg_restore
can read. A directory format archive can be manipulated with
@@ -434,9 +434,9 @@ Options
 

 
- Non-schema objects such as blobs are not dumped when 
-n is
- specified.  You can add blobs back to the dump with the
- --blobs switch.
+ Non-schema objects such as large objects are not dumped when 
-n is
+ specified.  You can add large objects back to the dump with the
+ --large-objects switch.
 

 
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index e8b78982971e..a39b8e6c9bb3 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -52,9 +52,9 @@ typedef enum _archiveMode
 
 typedef enum _teSection
 {
-   SECTION_NONE = 1,   /* COMMENTs, ACLs, etc; can be 
anywhere */
+   SECTION_NONE = 1,   /* comments, ACLs, etc; can be 
anywhere */
SECTION_PRE_DATA,   /* stuff to be processed before 
data */
-   SECTION_DATA,   /* TABLE DATA, BLOBS, BLOB 
COMMENTS */
+   SECTION_DATA,   /* table data, large objects, 
LO comments */
SECTION_POST_DATA   /* stuff to be processed after 
data */
 } teSection;
 
@@ -191,8 +191,8 @@ typedef struct _dumpOptions
 
int outputClean;
int outputCreateDB;
-   booloutputBlobs;
-   booldontOutputBl