Re: pg_restore enhancements

2023-11-23 Thread Ron Johnson
Thanks for the explanation.

On Thu, Nov 23, 2023 at 10:55 AM Tom Lane  wrote:

> Ron Johnson  writes:
> > On Thu, Nov 23, 2023 at 3:37 AM Laurenz Albe 
> > wrote:
> >> You can avoidwriting WAL if you set "wal_level = minimal", restart
> >> PostgreSQL
> >> and restore the dump with the --single-transaction option.
>
> > Why does "--single-transaction" prevent WAL writes?  I'd expect _more_
> > pg_wal growth from One Ginormous Transaction.
>
> I don't recall all the details offhand, but there's some optimization
> concerned with not writing WAL if COPY's target table was created in
> the current transaction.  WAL will still be made for the catalog
> changes, but usually the bulk of the WAL for a pg_restore run comes
> from loading data, and this recipe eliminates that.  (Of course,
> you cannot use it on a replication primary.)
>
> regards, tom lane
>


Re: pg_restore enhancements

2023-11-23 Thread Tom Lane
Ron Johnson  writes:
> On Thu, Nov 23, 2023 at 3:37 AM Laurenz Albe 
> wrote:
>> You can avoidwriting WAL if you set "wal_level = minimal", restart
>> PostgreSQL
>> and restore the dump with the --single-transaction option.

> Why does "--single-transaction" prevent WAL writes?  I'd expect _more_
> pg_wal growth from One Ginormous Transaction.

I don't recall all the details offhand, but there's some optimization
concerned with not writing WAL if COPY's target table was created in
the current transaction.  WAL will still be made for the catalog
changes, but usually the bulk of the WAL for a pg_restore run comes
from loading data, and this recipe eliminates that.  (Of course,
you cannot use it on a replication primary.)

regards, tom lane




Re: pg_restore enhancements

2023-11-23 Thread Ron Johnson
On Thu, Nov 23, 2023 at 3:37 AM Laurenz Albe 
wrote:
[snip]

>
> You can avoidwriting WAL if you set "wal_level = minimal", restart
> PostgreSQL
> and restore the dump with the --single-transaction option.
>

Why does "--single-transaction" prevent WAL writes?  I'd expect _more_
pg_wal growth from One Ginormous Transaction.


Re: pg_restore enhancements

2023-11-23 Thread Laurenz Albe
On Wed, 2023-11-22 at 16:55 +, Efrain J. Berdecia wrote:
> Thanks, the issue we've run into, which I guess could be really a setup
> issue, with running a COPY command while executing pg_restore, 
> is that if we are restoring a large table (bigger than 500GB) our WAL 
> directory can grow to be very large.

You can avoidwriting WAL if you set "wal_level = minimal", restart PostgreSQL
and restore the dump with the --single-transaction option.

Yours,
Laurenz Albe




Re: pg_restore enhancements

2023-11-22 Thread Ron Johnson
On Wed, Nov 22, 2023 at 2:28 PM Tom Lane  wrote:

> "Efrain J. Berdecia"  writes:
> > Thanks, the issue we've run into, which I guess could be really a setup
> issue, with running a COPY command while executing pg_restore, is that if
> we are restoring a large table (bigger than 500GB) our WAL directory can
> grow to be very large.
> > I would think that if the pg_restore or COPY command was able to support
> a batch-size option, this should allow postgres to either archive or remove
> wal files and prevent having to re-size the WAL directory for a one time
> refresh operation.
> > I'm trying to gage how feasible would be to start looking at
> contributing to add such a feature to either the COPY command or pg_restore.
>
> Given the shortage of other complaints, I tend to agree with Adrian
> that there's not likely to be much interest in adding complexity
> to pg_restore (or COPY) to address this.  You should probably look
> harder at the idea that you have some configuration problem that's
> triggering your WAL bloat.  If COPY can run you out of WAL space,
> then so could any future bulk insert or update.
>

What OP needs, I think, since I'd use it, too, is "pg_bulkload without the
intrusive hacks and restrictions".


Re: pg_restore enhancements

2023-11-22 Thread Tom Lane
"Efrain J. Berdecia"  writes:
> Thanks, the issue we've run into, which I guess could be really a setup 
> issue, with running a COPY command while executing pg_restore, is that if we 
> are restoring a large table (bigger than 500GB) our WAL directory can grow to 
> be very large.
> I would think that if the pg_restore or COPY command was able to support a 
> batch-size option, this should allow postgres to either archive or remove wal 
> files and prevent having to re-size the WAL directory for a one time refresh 
> operation.
> I'm trying to gage how feasible would be to start looking at contributing to 
> add such a feature to either the COPY command or pg_restore.

Given the shortage of other complaints, I tend to agree with Adrian
that there's not likely to be much interest in adding complexity
to pg_restore (or COPY) to address this.  You should probably look
harder at the idea that you have some configuration problem that's
triggering your WAL bloat.  If COPY can run you out of WAL space,
then so could any future bulk insert or update.

regards, tom lane




Re: pg_restore enhancements

2023-11-22 Thread Efrain J. Berdecia
Thanks, the issue we've run into, which I guess could be really a setup issue, 
with running a COPY command while executing pg_restore, is that if we are 
restoring a large table (bigger than 500GB) our WAL directory can grow to be 
very large.
I would think that if the pg_restore or COPY command was able to support a 
batch-size option, this should allow postgres to either archive or remove wal 
files and prevent having to re-size the WAL directory for a one time refresh 
operation.
I'm trying to gage how feasible would be to start looking at contributing to 
add such a feature to either the COPY command or pg_restore.
Thanks,Efrain J. Berdecia 

On Wednesday, November 22, 2023 at 11:37:13 AM EST, Adrian Klaver 
 wrote:  
 
 On 11/22/23 05:25, Efrain J. Berdecia wrote:
> After working for a site where we are constantly doing logical pg_dump 
> to refresh environments I've come to miss features available in other 
> RDBMS' refresh/restore utilities.
> 
> Someone could point me in the right direction otherwise, but pg_restore 
> seems to be lacking the ability to resume a restore upon failure, is all 
> or nothing with this guy. There also doesn't seem to be a way to control 
> batch size when doing the COPY phase, therefore preventing the WAL 
> directory from filling up and crashing the system.

The above needs more information on Postgres version(community or fork), 
OS and version, the size of the data set, the storage type and size, the 
Postgres conf, etc. Restores are being done all the time and this is the 
first report, as far as I can remember,  about an issue with COPY and 
WAL in a restore.


pg_restore
https://www.postgresql.org/docs/current/app-pgrestore.html

Does have:

--section=sectionname

    Only restore the named section. The section name can be pre-data, 
data, or post-data. This option can be specified more than once to 
select multiple sections. The default is to restore all sections.

    The data section contains actual table data as well as large-object 
definitions. Post-data items consist of definitions of indexes, 
triggers, rules and constraints other than validated check constraints. 
Pre-data items consist of all other data definition items.


AND

-l
--list

    List the table of contents of the archive. The output of this 
operation can be used as input to the -L option. Note that if filtering 
switches such as -n or -t are used with -l, they will restrict the items 
listed.


-L list-file
--use-list=list-file

    Restore only those archive elements that are listed in list-file, 
and restore them in the order they appear in the file. Note that if 
filtering switches such as -n or -t are used with -L, they will further 
restrict the items restored.

    list-file is normally created by editing the output of a previous 
-l operation. Lines can be moved or removed, and can also be commented 
out by placing a semicolon (;) at the start of the line. See below for 
examples.


>
> IMHO, it would be nice to have a feature that would allow pg_restore to 
> resume based on which part of the restore and/or object have already 
> been restored.
> 
> When it comes to the COPY phase of the restore, it would be nice to be 
> able to control batch size and resume COPY of a particular object upon 
> failure.

COPY as it stands now is all or none, so that command would have to be 
changed.

> 
> Thanks in advance for any suggestions or the green light to post this to 
> the PG-developer group :-)
> 
> Thanks,
> Efrain J. Berdecia

-- 
Adrian Klaver
adrian.kla...@aklaver.com

  

Re: pg_restore enhancements

2023-11-22 Thread Adrian Klaver

On 11/22/23 05:25, Efrain J. Berdecia wrote:
After working for a site where we are constantly doing logical pg_dump 
to refresh environments I've come to miss features available in other 
RDBMS' refresh/restore utilities.


Someone could point me in the right direction otherwise, but pg_restore 
seems to be lacking the ability to resume a restore upon failure, is all 
or nothing with this guy. There also doesn't seem to be a way to control 
batch size when doing the COPY phase, therefore preventing the WAL 
directory from filling up and crashing the system.


The above needs more information on Postgres version(community or fork), 
OS and version, the size of the data set, the storage type and size, the 
Postgres conf, etc. Restores are being done all the time and this is the 
first report, as far as I can remember,  about an issue with COPY and 
WAL in a restore.



pg_restore
https://www.postgresql.org/docs/current/app-pgrestore.html

Does have:

--section=sectionname

Only restore the named section. The section name can be pre-data, 
data, or post-data. This option can be specified more than once to 
select multiple sections. The default is to restore all sections.


The data section contains actual table data as well as large-object 
definitions. Post-data items consist of definitions of indexes, 
triggers, rules and constraints other than validated check constraints. 
Pre-data items consist of all other data definition items.



AND

-l
--list

List the table of contents of the archive. The output of this 
operation can be used as input to the -L option. Note that if filtering 
switches such as -n or -t are used with -l, they will restrict the items 
listed.



-L list-file
--use-list=list-file

Restore only those archive elements that are listed in list-file, 
and restore them in the order they appear in the file. Note that if 
filtering switches such as -n or -t are used with -L, they will further 
restrict the items restored.


list-file is normally created by editing the output of a previous 
-l operation. Lines can be moved or removed, and can also be commented 
out by placing a semicolon (;) at the start of the line. See below for 
examples.





IMHO, it would be nice to have a feature that would allow pg_restore to 
resume based on which part of the restore and/or object have already 
been restored.


When it comes to the COPY phase of the restore, it would be nice to be 
able to control batch size and resume COPY of a particular object upon 
failure.


COPY as it stands now is all or none, so that command would have to be 
changed.




Thanks in advance for any suggestions or the green light to post this to 
the PG-developer group :-)


Thanks,
Efrain J. Berdecia


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: pg_restore enhancements

2023-11-22 Thread Efrain J. Berdecia
Thanks, I'm trying to gage the interest on such a feature enhancement. 
Up to now I have not actively contributed to the Postgres Project but this is 
itching my rusty programming fingers lol
Thanks,Efrain J. Berdecia 

On Wednesday, November 22, 2023 at 08:28:18 AM EST, David G. Johnston 
 wrote:  
 
 On Wednesday, November 22, 2023, Efrain J. Berdecia  
wrote:


Thanks in advance for any suggestions or the green light to post this to the 
PG-developer group :-)

If you aren’t offering up a patch for these it isn’t developer material and 
belongs right here.
David J.  

Re: pg_restore enhancements

2023-11-22 Thread David G. Johnston
On Wednesday, November 22, 2023, Efrain J. Berdecia 
wrote:

>
> Thanks in advance for any suggestions or the green light to post this to
> the PG-developer group :-)
>

If you aren’t offering up a patch for these it isn’t developer material and
belongs right here.

David J.


pg_restore enhancements

2023-11-22 Thread Efrain J. Berdecia
After working for a site where we are constantly doing logical pg_dump to 
refresh environments I've come to miss features available in other RDBMS' 
refresh/restore utilities.
Someone could point me in the right direction otherwise, but pg_restore seems 
to be lacking the ability to resume a restore upon failure, is all or nothing 
with this guy. There also doesn't seem to be a way to control batch size when 
doing the COPY phase, therefore preventing the WAL directory from filling up 
and crashing the system.
IMHO, it would be nice to have a feature that would allow pg_restore to resume 
based on which part of the restore and/or object have already been restored.
When it comes to the COPY phase of the restore, it would be nice to be able to 
control batch size and resume COPY of a particular object upon failure.
Thanks in advance for any suggestions or the green light to post this to the 
PG-developer group :-)
Thanks,Efrain J. Berdecia