On Tue, 24 May 2005 02:12 pm, Tom Lane wrote:
> "Thomas F. O'Connell" <[EMAIL PROTECTED]> writes:
> > I'd like to use pg_dump to grab a live backup and, based on the  
> > documentation, this would seem to be a realistic possibility. When I  
> > try, though, during business hours, when people are frequently  
> > logging in and otherwise using the application, the application  
> > becomes almost unusable (to the point where logins take on the order  
> > of minutes).
> 
> The pg_dump sources contain some comments about throttling the rate
> at which data is pulled from the server, with a statement that this
> idea was discussed during July 2000 and eventually dropped.  Perhaps
> you can think of a better implementation.

A brief look at the code suggests a couple of possibilities for fixing this.
There seem to be at least two different issues here from the user's point of view.

Issue 1: Large Tables cause server slowdown
---
There are two dump cases in my short reading of the pg_dump code.  Case 1 is
the COPY dump, which is done as a single command; the server does most of the
work.  Case 2 is the INSERT-type dump, where the pg_dump client does most of
the work building the INSERT statements.  Case 2 is done with a cursor, so it
would be easy to insert a fixed-delay sleep after a certain number of records
has been dumped.  We could also work out the average size of a tuple in this
case and pause after a certain amount of data has been transferred, as in the
sketch below.
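
To make the idea concrete, here is a rough sketch of what a throttled cursor
loop could look like.  This is not actual pg_dump code; the cursor name, batch
size, delay, and table are all invented for illustration, and it assumes we
are already inside the dump's transaction (as pg_dump is):

    /* Hypothetical sketch of a throttled cursor fetch loop using libpq.
     * Names and constants are illustrative, not real pg_dump code. */
    #include <libpq-fe.h>
    #include <unistd.h>

    #define THROTTLE_US  20000    /* sleep 20ms after each batch of rows */

    static void
    dump_table_throttled(PGconn *conn)
    {
        PGresult *res;

        /* Assumes a transaction is already open, as in pg_dump. */
        res = PQexec(conn, "DECLARE dump_cur CURSOR FOR SELECT * FROM big_table");
        PQclear(res);

        for (;;)
        {
            /* Pull a modest batch of rows at a time. */
            res = PQexec(conn, "FETCH 100 FROM dump_cur");
            if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
            {
                PQclear(res);
                break;
            }

            /* ... format and write INSERT statements for these rows ... */

            PQclear(res);
            usleep(THROTTLE_US);   /* fixed delay between batches */
        }

        res = PQexec(conn, "CLOSE dump_cur");
        PQclear(res);
    }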

I am unsure how to attack Case 1; as mentioned, it is handled mostly by the
backend code, which we don't really control.  If the COPY could be declared as
a cursor, you could use the same principle as in Case 2.  The current
throttling suggestions are all based on time.  I think a data/counter-based
solution would be less intense on the system: when the counter is reached,
just do a usleep for the throttle time, along the lines of the sketch below.
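
The counter-based idea might look something like this.  Again, the thresholds
and the helper name are made up for illustration; the point is just that the
pause is triggered by an amount of data rather than by elapsed time:

    /* Hypothetical data-counter throttle: sleep once a given number of
     * bytes has been transferred, then reset the counter.  The thresholds
     * are illustrative only. */
    #include <unistd.h>
    #include <stddef.h>

    #define THROTTLE_BYTES  (1024 * 1024)   /* pause after ~1MB transferred */
    #define THROTTLE_US     50000           /* 50ms pause */

    static size_t bytes_since_pause = 0;

    static void
    throttle_on_write(size_t bytes_written)
    {
        bytes_since_pause += bytes_written;
        if (bytes_since_pause >= THROTTLE_BYTES)
        {
            usleep(THROTTLE_US);
            bytes_since_pause = 0;
        }
    }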


Issue 2: Full backups of large amounts of data saturate disk I/O (many tables make it slow)
---
If the backup dump is large, and given that every table will be sequentially
scanned during the backup, server I/O is going to be pushed to the limit.  A
pause between dumping tables is a simple way to reduce the ongoing I/O load on
the server and allow a window in which other requests can be served.  This
would give a bursty kind of performance improvement.  In environments with a
large number of reasonably sized tables, this could be a real benefit; see the
sketch below.
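
A sketch of the per-table pause, where dump_one_table() and the table list are
stand-ins rather than real pg_dump symbols:

    /* Hypothetical sketch: sleep between per-table dumps to give other
     * backends a window of uncontended I/O. */
    #include <libpq-fe.h>
    #include <unistd.h>

    #define INTER_TABLE_SLEEP_US  500000    /* 0.5s between tables */

    extern void dump_one_table(PGconn *conn, const char *table);  /* placeholder */

    static void
    dump_all_tables(PGconn *conn, const char **tables, int ntables)
    {
        int i;

        for (i = 0; i < ntables; i++)
        {
            dump_one_table(conn, tables[i]);
            if (i + 1 < ntables)
                usleep(INTER_TABLE_SLEEP_US);   /* let other requests through */
        }
    }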


---
In releases prior to 8.0, a wait after a certain amount of data would probably
avoid evicting frequently used data: during the wait the hot pages would be
accessed again, so what gets evicted is the seqscan data fetched for the
previous part of the dump rather than the hot data.  Post-8.0, or in 8.1 with
the clock-sweep replacement, it is probably much the same situation with
regard to the delays, but you could likely process larger amounts of data
before sleeping, since the dump keeps recycling the same buffers.  There the
sleep serves more to reduce disk I/O than to reduce cache eviction.

The problem with timed waits during any backup is that the database cannot be
vacuumed for the duration.  In some limited circumstances (like mine), if you
have a long-running transaction that blocks vacuuming of certain small,
high-update tables, you lose performance as those tables bloat, and you can
only fix it with a VACUUM FULL.

Both of these suggestions may be totally bogus, so I suppose I'm asking for
feedback on whether they would be worth implementing.

Regards

Russell Smith
