On Wed, Nov 14, 2012 at 10:08:15AM -0500, Bruce Momjian wrote:
> > I agree that parallel restore for schemas is a hard problem.  But I
> > didn't mean parallelism within the restore, I meant that we could
> > start both postmasters and pipe the output from dump directly to
> > restore.  This way the times for dumping and restoring can overlap.
> 
> Wow, that is a very creative idea.  The current code doesn't do that,
> but this has the potential of doubling pg_upgrade's speed, without
> adding a lot of complexity.  Here are the challenges of this approach:
> 
> *  I would need to log the output of pg_dumpall as it is passed to psql
>    so users can debug problems
> 
> *  pg_upgrade never runs the old and new clusters at the same time for
>    fear that it will run out of resources, e.g. shared memory, or if
>    they are using the same port number.  We can make this optional and
>    force different port numbers.
> 
> Let me work up a prototype in the next few days and see how it performs.
> Thanks for the great idea.
I have developed the attached proof-of-concept patch to test this idea.
Unfortunately, I got poor results:

                                           ---- pg_upgrade ----
          dump    restore    dmp|res        git       dmp/res

     1    0.12      0.07       0.13        11.16       13.03
  1000    3.80      2.83       5.46        18.78       20.27
  2000    5.39      5.65      13.99        26.78       28.54
  4000   16.08     12.40      28.34        41.90       44.03
  8000   32.77     25.70      57.97        78.61       80.09
 16000   57.67     63.42     134.43       158.49      165.78
 32000  131.84    176.27     302.85       380.11      389.48
 64000  270.37    708.30    1004.39      1085.39     1094.70

The last two columns show the patch didn't help at all, and the third
column shows it is just executing the pg_dump, then the restore, not in
parallel, i.e. column 1 + column 2 ~= column 3.

Testing pg_dump for 4k tables (16 seconds) shows the first row is not
output by pg_dump until 15 seconds, meaning there can't be any
parallelism with a pipe.  (Test script attached.)  Does anyone know how
to get pg_dump to send some output earlier?

In summary, it doesn't seem pg_dump makes any attempt to output its data
early.  pg_dump.c has some details:

    /*
     * And finally we can do the actual output.
     *
     * Note: for non-plain-text output formats, the output file is written
     * inside CloseArchive().  This is, um, bizarre; but not worth changing
     * right now.
     */
    if (plainText)
        RestoreArchive(fout);

    CloseArchive(fout);

FYI, log_min_duration_statement shows queries taking 11.2 seconds, even
without the network overhead --- not sure how that can be optimized.

I will now test using PRIMARY KEY and custom dump format with pg_restore
--jobs to see if I can get parallelism that way.  A further parallelism
would be to allow multiple databases to be dumped/restored at the same
time.  I will test for that once this is done.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
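(As an aside, the "first row not output until 15 seconds" measurement can
be sketched as a small helper that times how long a command runs before it
emits its first line of output; the pg_dump invocation in the comment is an
example, with the database name "test" being an assumption:)

```shell
# Rough sketch: seconds elapsed before a command's first line of output.
# If pg_dump holds all output until the end, this is close to the full
# dump time, and a dump|restore pipe cannot overlap the two steps.
first_output_delay() {
	start=$(date +%s)
	"$@" | head -n 1 > /dev/null
	echo $(($(date +%s) - start))
}

# e.g.:  first_output_delay pg_dump --schema-only --binary-upgrade test
```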
diff --git a/contrib/pg_upgrade/dump.c b/contrib/pg_upgrade/dump.c
new file mode 100644
index 577ccac..dbdf9a5
*** a/contrib/pg_upgrade/dump.c
--- b/contrib/pg_upgrade/dump.c
*************** generate_old_dump(void)
*** 24,30 ****
  	 * restores the frozenid's for databases and relations.
  	 */
  	exec_prog(UTILITY_LOG_FILE, NULL, true,
! 			  "\"%s/pg_dumpall\" %s --schema-only --binary-upgrade %s -f %s",
  			  new_cluster.bindir, cluster_conn_opts(&old_cluster),
  			  log_opts.verbose ? "--verbose" : "",
  			  ALL_DUMP_FILE);
--- 24,30 ----
  	 * restores the frozenid's for databases and relations.
  	 */
  	exec_prog(UTILITY_LOG_FILE, NULL, true,
! 			  "\"%s/pg_dumpall\" %s --schema-only --globals-only --binary-upgrade %s -f %s",
  			  new_cluster.bindir, cluster_conn_opts(&old_cluster),
  			  log_opts.verbose ? "--verbose" : "",
  			  ALL_DUMP_FILE);
*************** generate_old_dump(void)
*** 47,63 ****
  void
  split_old_dump(void)
  {
! 	FILE	   *all_dump,
! 			   *globals_dump,
! 			   *db_dump;
! 	FILE	   *current_output;
  	char		line[LINE_ALLOC];
  	bool		start_of_line = true;
  	char		create_role_str[MAX_STRING];
  	char		create_role_str_quote[MAX_STRING];
  	char		filename[MAXPGPATH];
- 	bool		suppressed_username = false;
- 
  	/*
  	 * Open all files in binary mode to avoid line end translation on Windows,
--- 47,58 ----
  void
  split_old_dump(void)
  {
! 	FILE	   *all_dump, *globals_dump;
  	char		line[LINE_ALLOC];
  	bool		start_of_line = true;
  	char		create_role_str[MAX_STRING];
  	char		create_role_str_quote[MAX_STRING];
  	char		filename[MAXPGPATH];
  	/*
  	 * Open all files in binary mode to avoid line end translation on Windows,
*************** split_old_dump(void)
*** 70,80 ****
  	snprintf(filename, sizeof(filename), "%s", GLOBALS_DUMP_FILE);
  	if ((globals_dump = fopen_priv(filename, PG_BINARY_W)) == NULL)
  		pg_log(PG_FATAL, "Could not write to dump file \"%s\": %s\n", filename, getErrorText(errno));
- 	snprintf(filename, sizeof(filename), "%s", DB_DUMP_FILE);
- 	if ((db_dump = fopen_priv(filename, PG_BINARY_W)) == NULL)
- 		pg_log(PG_FATAL, "Could not write to dump file \"%s\": %s\n", filename, getErrorText(errno));
- 
- 	current_output = globals_dump;
 
  	/* patterns used to prevent our own username from being recreated */
  	snprintf(create_role_str, sizeof(create_role_str),
--- 65,70 ----
*************** split_old_dump(void)
*** 84,102 ****
  	while (fgets(line, sizeof(line), all_dump) != NULL)
  	{
- 		/* switch to db_dump file output? */
- 		if (current_output == globals_dump && start_of_line &&
- 			suppressed_username &&
- 			strncmp(line, "\\connect ", strlen("\\connect ")) == 0)
- 			current_output = db_dump;
- 
  		/* output unless we are recreating our own username */
! 		if (current_output != globals_dump || !start_of_line ||
  			(strncmp(line, create_role_str, strlen(create_role_str)) != 0 &&
  			 strncmp(line, create_role_str_quote, strlen(create_role_str_quote)) != 0))
! 			fputs(line, current_output);
! 		else
! 			suppressed_username = true;
 
  		if (strlen(line) > 0 && line[strlen(line) - 1] == '\n')
  			start_of_line = true;
--- 74,84 ----
  	while (fgets(line, sizeof(line), all_dump) != NULL)
  	{
  		/* output unless we are recreating our own username */
! 		if (!start_of_line ||
  			(strncmp(line, create_role_str, strlen(create_role_str)) != 0 &&
  			 strncmp(line, create_role_str_quote, strlen(create_role_str_quote)) != 0))
! 			fputs(line, globals_dump);
 
  		if (strlen(line) > 0 && line[strlen(line) - 1] == '\n')
  			start_of_line = true;
*************** split_old_dump(void)
*** 106,110 ****
 
  	fclose(all_dump);
  	fclose(globals_dump);
- 	fclose(db_dump);
  }
--- 88,91 ----
diff --git a/contrib/pg_upgrade/function.c b/contrib/pg_upgrade/function.c
new file mode 100644
index 77bd3a0..d95dd6f
*** a/contrib/pg_upgrade/function.c
--- b/contrib/pg_upgrade/function.c
*************** uninstall_support_functions_from_new_clu
*** 102,111 ****
 
  	prep_status("Removing support functions from new cluster");
 
! 	for (dbnum = 0; dbnum < new_cluster.dbarr.ndbs; dbnum++)
  	{
! 		DbInfo	   *new_db = &new_cluster.dbarr.dbs[dbnum];
! 		PGconn	   *conn = connectToServer(&new_cluster, new_db->db_name);
 
  		/* suppress NOTICE of dropped objects */
  		PQclear(executeQueryOrDie(conn,
--- 102,112 ----
 
  	prep_status("Removing support functions from new cluster");
 
! 	/* use old db names because there might be a mismatch */
! 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
  	{
! 		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
! 		PGconn	   *conn = connectToServer(&new_cluster, old_db->db_name);
 
  		/* suppress NOTICE of dropped objects */
  		PQclear(executeQueryOrDie(conn,
diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
new file mode 100644
index 4d2e79c..e218138
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*************** main(int argc, char **argv)
*** 117,129 ****
 
  	/* New now using xids of the old system */
 
  	/* -- NEW -- */
  	start_postmaster(&new_cluster);
 
  	prepare_new_databases();
! 	create_new_objects();
 
  	stop_postmaster(false);
 
  	/*
  	 * Most failures happen in create_new_objects(), which has completed at
--- 117,134 ----
 
  	/* New now using xids of the old system */
 
  	/* -- NEW -- */
+ 	old_cluster.port++;
+ 	start_postmaster(&old_cluster);
  	start_postmaster(&new_cluster);
 
  	prepare_new_databases();
! 	create_new_objects();
 
  	stop_postmaster(false);
+ 	os_info.running_cluster = &old_cluster;
+ 	stop_postmaster(false);
+ 	old_cluster.port--;
 
  	/*
  	 * Most failures happen in create_new_objects(), which has completed at
*************** static void
*** 279,308 ****
  create_new_objects(void)
  {
  	int			dbnum;
 
  	prep_status("Adding support functions to new cluster");
 
! 	for (dbnum = 0; dbnum < new_cluster.dbarr.ndbs; dbnum++)
  	{
! 		DbInfo	   *new_db = &new_cluster.dbarr.dbs[dbnum];
 
  		/* skip db we already installed */
! 		if (strcmp(new_db->db_name, "template1") != 0)
! 			install_support_functions_in_new_db(new_db->db_name);
  	}
  	check_ok();
 
  	prep_status("Restoring database schema to new cluster");
! 	exec_prog(RESTORE_LOG_FILE, NULL, true,
! 			  "\"%s/psql\" " EXEC_PSQL_ARGS " %s -f \"%s\"",
! 			  new_cluster.bindir, cluster_conn_opts(&new_cluster),
! 			  DB_DUMP_FILE);
  	check_ok();
 
  	/* regenerate now that we have objects in the databases */
  	get_db_and_rel_infos(&new_cluster);
 
  	uninstall_support_functions_from_new_cluster();
  }
 
  /*
--- 284,330 ----
  create_new_objects(void)
  {
  	int			dbnum;
+ 	/* save off conn_opts because it is a static local var */
+ 	char	   *old_conn_opts = pg_strdup(cluster_conn_opts(&old_cluster));
 
  	prep_status("Adding support functions to new cluster");
 
! 	/*
! 	 * The new cluster might have databases that don't exist in the old
! 	 * one, so cycle over the old database names.
! 	 */
! 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
  	{
! 		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
 
  		/* skip db we already installed */
! 		if (strcmp(old_db->db_name, "template1") != 0)
! 			install_support_functions_in_new_db(old_db->db_name);
  	}
  	check_ok();
 
  	prep_status("Restoring database schema to new cluster");
! 
! 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
! 	{
! 		DbInfo	   *old_db = &old_cluster.dbarr.dbs[dbnum];
! 
! 		exec_prog(RESTORE_LOG_FILE, NULL, true,
! 				  "\"%s/pg_dump\" %s --schema-only --binary-upgrade %s \"%s\" | "
! 				  "\"%s/psql\" " EXEC_PSQL_ARGS " %s -d \"%s\"",
! 				  new_cluster.bindir, old_conn_opts,
! 				  log_opts.verbose ? "--verbose" : "", old_db->db_name,
! 				  new_cluster.bindir, cluster_conn_opts(&new_cluster),
! 				  old_db->db_name);
! 	}
  	check_ok();
 
  	/* regenerate now that we have objects in the databases */
  	get_db_and_rel_infos(&new_cluster);
 
  	uninstall_support_functions_from_new_cluster();
+ 
+ 	pg_free(old_conn_opts);
  }
 
  /*
*************** cleanup(void)
*** 463,468 ****
 
  		/* remove SQL files */
  		unlink(ALL_DUMP_FILE);
  		unlink(GLOBALS_DUMP_FILE);
- 		unlink(DB_DUMP_FILE);
  	}
  }
--- 485,489 ----
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
new file mode 100644
index ace56e5..72ed3bd
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
***************
*** 32,38 ****
  #define ALL_DUMP_FILE		"pg_upgrade_dump_all.sql"
  /* contains both global db information and CREATE DATABASE commands */
  #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
- #define DB_DUMP_FILE		"pg_upgrade_dump_db.sql"
  #define SERVER_LOG_FILE		"pg_upgrade_server.log"
  #define RESTORE_LOG_FILE	"pg_upgrade_restore.log"
--- 32,37 ----
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
new file mode 100644
index ca95bad..ce27fe3
*** a/src/bin/pg_dump/pg_dumpall.c
--- b/src/bin/pg_dump/pg_dumpall.c
*************** main(int argc, char *argv[])
*** 502,508 ****
  	}
 
  	/* Dump CREATE DATABASE commands */
! 	if (!globals_only && !roles_only && !tablespaces_only)
  		dumpCreateDB(conn);
 
  	/* Dump role/database settings */
--- 502,508 ----
  	}
 
  	/* Dump CREATE DATABASE commands */
! 	if (binary_upgrade || (!globals_only && !roles_only && !tablespaces_only))
  		dumpCreateDB(conn);
 
  	/* Dump role/database settings */
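(For readability, here is roughly what the exec_prog() format string in the
patched create_new_objects() expands to for one database.  The bindir,
connection options, and database name below are placeholders, and
EXEC_PSQL_ARGS is omitted:)

```shell
# Build the per-database dump|restore pipeline the patch runs; all
# arguments are placeholders, since pg_upgrade substitutes its own
# paths and connection options via exec_prog().
build_restore_pipe() {	# args: new_bindir old_conn_opts new_conn_opts dbname
	printf '"%s/pg_dump" %s --schema-only --binary-upgrade "%s" | "%s/psql" %s -d "%s"\n' \
		"$1" "$2" "$4" "$1" "$3" "$4"
}

# e.g. with the old server temporarily moved to port 5433:
build_restore_pipe /usr/local/pgsql/bin '-p 5433' '-p 5432' test
```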
:

. traprm

export QUIET=$((QUIET + 1))

> /rtmp/out

export PGOPTIONS="-c synchronous_commit=off"

for CYCLES in 1 1000 2000 4000 8000 16000 32000 64000
do
	echo "$CYCLES" >> /rtmp/out

	for DIR in /pgsql/CURRENT
	do
		echo "$DIR" >> /rtmp/out
		cd "$DIR"
		pginstall
		cd -

		# need for +16k
		pipe sed 's/#max_locks_per_transaction = 64/max_locks_per_transaction = 3000/' /u/pg/data/postgresql.conf
		pipe sed 's/shared_buffers = 128MB/shared_buffers = 1GB/' /u/pg/data/postgresql.conf
		pipe sed 's/#work_mem = 1MB/work_mem = 500MB/' /u/pg/data/postgresql.conf
		pipe sed 's/#maintenance_work_mem = 16MB/maintenance_work_mem = 500MB/' /u/pg/data/postgresql.conf
		pgrestart
		sleep 2

		echo "table creation" >> /rtmp/out
		newdb test
		for JOT in $(jot "$CYCLES"); do echo "CREATE TABLE test$JOT (x SERIAL);"; done | sql test

		echo "pg_dump creation" >> /rtmp/out
		/usr/bin/time --output=/rtmp/out --append --format '%e' aspg pg_dump --schema-only --binary-upgrade test | grep -v 'binary_upgrade\.' > $TMP/1

		echo "pg_dump restore" >> /rtmp/out
		newdb test
		/usr/bin/time --output=/rtmp/out --append --format '%e' sql test < $TMP/1

		echo "combined dump/restore" >> /rtmp/out
		newdb test2
		/usr/bin/time --output=/rtmp/out --append --format '%e' sh -c "aspg pg_dump --schema-only --binary-upgrade test | grep -v 'binary_upgrade\.' | sql test2" > $TMP/1

		pgstop
	done

	for BRANCH in master parallel
	do
		cd /pgtop
		pgsw $BRANCH
		cd -

		tools/setup 9.3 9.3
		sleep 2

		# need for +16k
		pipe sed 's/#max_locks_per_transaction = 64/max_locks_per_transaction = 3000/' /u/pgsql.old/data/postgresql.conf
		pipe sed 's/shared_buffers = 128MB/shared_buffers = 1GB/' /u/pgsql.old/data/postgresql.conf
		pipe sed 's/shared_buffers = 128MB/shared_buffers = 1GB/' /u/pgsql/data/postgresql.conf

		pgstart /u/pgsql.old/data
		sleep 8
		for JOT in $(jot "$CYCLES"); do echo "CREATE TABLE test$JOT (x SERIAL);"; done | sql test
		pgstop /u/pgsql.old/data
		sleep 2

		/usr/bin/time --output=/rtmp/out --append --format '%e' tools/upgrade
		sleep 2
	done
done
bell
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers