Re: [HACKERS] Pg_upgrade speed for many tables

Bruce Momjian Wed, 14 Nov 2012 15:56:23 -0800

On Mon, Nov 12, 2012 at 10:29:39AM -0800, Jeff Janes wrote:
> On Mon, Nov 5, 2012 at 12:08 PM, Bruce Momjian <[email protected]> wrote:
> > Magnus reported that a customer with a million tables was finding
> > pg_upgrade slow.  I had never considered many table to be a problem, but
> > decided to test it.  I created a database with 2k tables like this:
> >
> >         CREATE TABLE test1990 (x SERIAL);
> >
> > Running the git version of pg_upgrade on that took 203 seconds.  Using
> > synchronous_commit=off dropped the time to 78 seconds.  This was tested
> > on magnetic disks with a write-through cache.  (No change on an SSD with
> > a super-capacitor.)
> >
> > I don't see anything unsafe about having pg_upgrade use
> > synchronous_commit=off.  I could set it just for the pg_dump reload, but
> > it seems safe to just use it always.  We don't write to the old cluster,
> > and if pg_upgrade fails, you have to re-initdb the new cluster anyway.
> >
> > Patch attached.  I think it should be applied to 9.2 as well.
> 
> Is turning off synchronous_commit enough?  What about turning off fsync?


I did some testing with the attached patch on a magnetic disk with no
BBU that turns off fsync;   I got these results:

                 sync_com=off  fsync=off
            1        15.90     13.51
         1000        26.09     24.56
         2000        33.41     31.20
         4000        57.39     57.74
         8000       102.84    116.28
        16000       189.43    207.84

It shows fsync faster for < 4k, and slower for > 4k.  Not sure why this
is the cause but perhaps the buffering of the fsync is actually faster
than doing a no-op fsync.

I don't think fsync=off makes sense, except for testing;  let me know if
I should test something else.

> When I'm doing a pg_upgrade with thousands of tables, the shutdown
> checkpoint after restoring the dump to the new cluster takes a very
> long time, as the writer drains its operation table by opening and
> individually fsync-ing thousands of files.  This takes about 40 ms per
> file, which I assume is a combination of slow lap-top disk drive, and
> a strange deal with ext4 which makes fsyncing a recently created file
> very slow.   But even with faster hdd, this would still be a problem
> if it works the same way, with every file needing 4 rotations to be
> fsynced and this happens in serial.

Is this with the current code that does synchronous_commit=off?  If not,
can you test to see if this is still a problem?

> Worse, the shutdown only waits for the default of 60 seconds for the
> shutdown to take place before it throws an error and the entire
> pg_upgrade gives up.  It seems to me that either the -t setting should
> be increased, or should be an option to pg_upgrade.
> 
> My work around was to invoke a system-wide "sync" a couple seconds
> after the 'pg_ctl stop' is initiated.  Flushing the files wholesale
> seems to work to make the checkpoint writer rapidly find it has
> nothing to do when it tries to flush them retail.
> 
> Anyway, the reason I think turning fsync off might be reasonable is
> that as soon as the new cluster is shut down, pg_upgrade starts
> overwriting most of those just-fsynced file with other files from the
> old cluster, and AFAICT makes no effort to fsync them.  So until there
> is a system-wide sync after the pg_upgrade finishes, your new cluster
> is already in mortal danger anyway.

pg_upgrade does a cluster shutdown before overwriting those files.

-- 
  Bruce Momjian  <[email protected]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
new file mode 100644
index 49d4c8f..01e0dd3
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*************** start_postmaster(ClusterInfo *cluster)
*** 219,225 ****
  			 (cluster->controldata.cat_ver >=
  			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
  			 " -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
! 			 (cluster == &new_cluster) ? " -c synchronous_commit=off" : "",
  			 cluster->pgopts ? cluster->pgopts : "", socket_string);
  
  	/*
--- 219,225 ----
  			 (cluster->controldata.cat_ver >=
  			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
  			 " -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
! 			 (cluster == &new_cluster) ? " -c fsync=off" : "",
  			 cluster->pgopts ? cluster->pgopts : "", socket_string);
  
  	/*

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Pg_upgrade speed for many tables

Reply via email to