Re: [HACKERS] [PATCH] Add --ordered option to pg_dump
I came across this thread from a web search when looking for prior art. I'm reviving it to alert other interested folks to a flaw in the provided patch should they try to use it. The feature request seems somewhat common on the web, and the patch here would cause some fun debugging sessions otherwise. On Thu, Apr 15, 2010 at 10:48:25AM -0700, Bob Lunney wrote: Using --ordered will order the data by primary key or unique index, if one exists, and use the smallest ordering (i.e. least number of columns required for a unique order). The provided patch for pg_dump.c fails on a corner case with deleted columns. It's likely rare, but tripping over it will either cause an unexpected order (at best) or a failed dump (at worst). + if (ordered) + { + appendPQExpBuffer(p, SELECT array_to_string(indkey, ','), array_length(i.indkey, 1) + FROM pg_catalog.pg_index i + WHERE (i.indisprimary = true or i.indisunique = true) + AND i.indisvalid = true + AND i.indrelid = '%s'::regclass + ORDER BY 2, 1 LIMIT 1, In this stanza the internal column numbers of the selected index keys are joined with commas to produce a string. + const char *s = PQgetvalue(res, 0, 0); + if (s != NULL) + { + appendPQExpBuffer(q, ORDER BY %s, s); + } Then that string is used when constructing the ORDER BY clause for the data retrieval query. The query is a SELECT * FROM ... which means the positional ORDER BY takes internal column numbers and uses them for the implicit select-list columns. That's fine if * is guaranteed to match the internal column order, but that's not true in cases where the table has a deleted column between two columns in the index. The query will either order by an unexpected column not part of the index (with potentially very slow results), or it will fail if it overruns the length of the select-list. A SQL script demonstrating the failure mode is attached. To work around this corner case, the patch should map the internal column numbers to column names (via pg_attribute or somesuch) and then use those in the ORDER BY. Hopefully this'll save someone else a bit of debugging in the future. Thomas begin; create table foo ( one int, two int, three int, primary key (one, three) ); insert into foo values (4, 5, 6); select * from foo order by 1, 3; select c.relname, a.attname, a.attnum from pg_attribute a join pg_class c on (c.oid = a.attrelid) where c.relname='foo' and attnum = 0 order by attnum; select indnatts, indkey, indisprimary from pg_index i join pg_class c on (c.oid = i.indrelid) where c.relname='foo'; alter table foo drop column two; select c.relname, a.attname, a.attnum from pg_attribute a join pg_class c on (c.oid = a.attrelid) where c.relname='foo' and attnum = 0 order by attnum; select indnatts, indkey, indisprimary from pg_index i join pg_class c on (c.oid = i.indrelid) where c.relname='foo'; select * from foo order by 1, 3; rollback; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Add --ordered option to pg_dump
I needed a way to run diffs on two database dumps to see what data developers put in their local databases versus the initial database load. The pg_dump utility with --inserts works well for this, but since the order in which the data is returned of the server is not guaranteed I hacked this patch to make life simpler. Using --ordered will order the data by primary key or unique index, if one exists, and use the smallest ordering (i.e. least number of columns required for a unique order). Note that --ordered could crush your database server if you try to order very large tables, so use judiciously. This is my first patch submission, so I hope I've followed protocol. If not, please be gentle! Regards, Bob Lunney bob_lunney dot yahoo dot com diff -cNr src/bin/pg_dump/pg_dump.c.orig src/bin/pg_dump/pg_dump.c *** src/bin/pg_dump/pg_dump.c.orig 2010-04-06 11:21:48.0 -0400 --- src/bin/pg_dump/pg_dump.c 2010-04-15 10:28:49.0 -0400 *** *** 111,116 --- 111,117 static int disable_dollar_quoting = 0; static int dump_inserts = 0; static int column_inserts = 0; + static int ordered = 0; static void help(const char *progname); *** *** 275,280 --- 276,282 {inserts, no_argument, dump_inserts, 1}, {lock-wait-timeout, required_argument, NULL, 2}, {no-tablespaces, no_argument, outputNoTablespaces, 1}, + {ordered, no_argument, ordered, 1}, {role, required_argument, NULL, 3}, {use-set-session-authorization, no_argument, use_setsessauth, 1}, *** *** 493,498 --- 495,506 exit(1); } + if (!dump_inserts ordered) + { + write_msg(NULL, option --ordered cannot be used without --inserts or --column_inserts\n); + exit(1); + } + /* open the output file */ if (pg_strcasecmp(format, a) == 0 || pg_strcasecmp(format, append) == 0) { *** *** 822,827 --- 830,836 printf(_( --disable-dollar-quoting disable dollar quoting, use SQL standard quoting\n));
Re: [HACKERS] [PATCH] Add --ordered option to pg_dump
On tor, 2010-04-15 at 10:48 -0700, Bob Lunney wrote: I needed a way to run diffs on two database dumps to see what data developers put in their local databases versus the initial database load. Maybe pg_comparator would help you? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add --ordered option to pg_dump
Peter Eisentraut wrote: On tor, 2010-04-15 at 10:48 -0700, Bob Lunney wrote: I needed a way to run diffs on two database dumps to see what data developers put in their local databases versus the initial database load. Maybe pg_comparator would help you? Or DBIx::Compare if you like perl :-) Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add --ordered option to pg_dump
Thanks for the suggestion, Peter. It looks like pg_comparator is for comparing the contents of two different servers. I need to compare the contents of two dump files from the same server separated by time and busy developers. Regards, Bob Lunney --- On Thu, 4/15/10, Peter Eisentraut pete...@gmx.net wrote: From: Peter Eisentraut pete...@gmx.net Subject: Re: [HACKERS] [PATCH] Add --ordered option to pg_dump To: Bob Lunney bob_lun...@yahoo.com Cc: pgsql-hackers@postgresql.org Date: Thursday, April 15, 2010, 4:36 PM On tor, 2010-04-15 at 10:48 -0700, Bob Lunney wrote: I needed a way to run diffs on two database dumps to see what data developers put in their local databases versus the initial database load. Maybe pg_comparator would help you? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers