Re: [PATCHES] pg_dump additional options for performance
On Mon, 2008-07-21 at 19:19 -0400, Tom Lane wrote: > Stephen Frost <[EMAIL PROTECTED]> writes: > > Are there use cases for just --omit-post-load or --omit-pre-load? > > Probably not many. The thing that's bothering me is the > action-at-a-distance property of the positive-logic switches. > How are we going to explain this? > > "By default, --schema-pre-load, --data-only, --schema-post-load > are all ON. But if you turn one of them ON (never mind that > it was already ON by default), that changes the defaults for > the other two to OFF. Then you have to turn them ON (never > mind that the default for them is ON) if you want two out of > the three categories." While I accept your argument a certain amount, --schema-only and --data-only already behave in the manner you describe. Whether we pick include or exclude or both, it will make more sense than these existing options, regrettably. With regard to the logic, Insert and COPY also behave this way: if you mention *any* columns then you only get the ones you mention. We manage to describe that also. An Insert statement would be very confusing if you had to list the columns you don't want. So the --omit options seem OK if you assume we'll never add further options or include additional SQL in the dump. But that seems an unreliable prop, so I am inclined towards the inclusive approach. > You have to bend your mind into a pretzel to wrap it around this > behavior. Perhaps my mind was already toroidally challenged? :-} -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump lock timeout
On Mon, Jul 21, 2008 at 03:43:11AM -0400, Tom Lane wrote: > daveg <[EMAIL PROTECTED]> writes: > > On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote: > >> In most cases our policy has been that pg_dumpall should accept and pass > >> through any pg_dump option for which it's sensible to do so. I did not > >> make that happen but it seems it'd be a reasonable follow-on patch. > > > I'll remember that next time. > > Er .. actually that was a direct request for you to do it. Attached is a the followon patch for pg_dumpall and docs to match pg_dump. On a second topic, is anyone working on a parallel dump/load? I'd be interested in helping. -dg -- David Gould [EMAIL PROTECTED] 510 536 1443510 282 0869 If simplicity worked, the world would be overrun with insects. *** a/doc/src/sgml/ref/pg_dumpall.sgml --- b/doc/src/sgml/ref/pg_dumpall.sgml *** *** 196,201 PostgreSQL documentation --- 196,217 + --lock-wait-timeout=timeout + + + Do not wait forever to acquire shared table locks at the beginning of + the dump. Instead fail if unable to lock a table within the specified + timeout. The timeout may be + specified in any of the formats accepted by SET + statement_timeout. (Allowed values vary depending on the server + version you are dumping from, but an integer number of milliseconds + is accepted by all versions since 7.3. This option is ignored when + dumping from a pre-7.3 server.) + + + + + --no-tablespaces *** a/src/bin/pg_dump/pg_dumpall.c --- b/src/bin/pg_dump/pg_dumpall.c *** *** 120,125 main(int argc, char *argv[]) --- 120,126 {"disable-triggers", no_argument, &disable_triggers, 1}, {"no-tablespaces", no_argument, &no_tablespaces, 1}, {"use-set-session-authorization", no_argument, &use_setsessauth, 1}, + {"lock-wait-timeout", required_argument, NULL, 2}, {NULL, 0, NULL, 0} }; *** *** 305,310 main(int argc, char *argv[]) --- 306,316 case 0: break; + case 2: + appendPQExpBuffer(pgdumpopts, " --lock-wait-timeout="); + appendPQExpBuffer(pgdumpopts, optarg); + break; + default: fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname); exit(1); *** *** 488,493 help(void) --- 494,500 printf(_(" -f, --file=FILENAME output file name\n")); printf(_(" --help show this help, then exit\n")); printf(_(" --versionoutput version information, then exit\n")); + printf(_(" --lock-wait-timeout=TIMEOUT fail after waiting TIMEOUT for a table lock\n")); printf(_("\nOptions controlling the output content:\n")); printf(_(" -a, --data-only dump only the data, not the schema\n")); printf(_(" -c, --clean clean (drop) databases prior to create\n")); -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump additional options for performance
Stephen Frost <[EMAIL PROTECTED]> writes: > Are there use cases for just --omit-post-load or --omit-pre-load? Probably not many. The thing that's bothering me is the action-at-a-distance property of the positive-logic switches. How are we going to explain this? "By default, --schema-pre-load, --data-only, --schema-post-load are all ON. But if you turn one of them ON (never mind that it was already ON by default), that changes the defaults for the other two to OFF. Then you have to turn them ON (never mind that the default for them is ON) if you want two out of the three categories." You have to bend your mind into a pretzel to wrap it around this behavior. Yeah, it might be convenient once you understand it, but how long will it take for the savings in typing to repay the time to understand it and the mistakes along the way? regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump additional options for performance
Tom, et al, * Tom Lane ([EMAIL PROTECTED]) wrote: > Ah, I see. No objection to those switch names, at least assuming we > want to stick to positive-logic switches. What did you think of the > negative-logic suggestion (--omit-xxx)? My preference is for positive-logic switches in general. The place where I would use this patch would lend itself to being more options if --omit- were used. I expect that would hold true for most people. It would be: --omit-data --omit-post-load --omit-pre-load --omit-post-load --omit-pre-load --omit-data vs. --schema-pre-load --data-only --schema-post-load Point being that I'd be dumping these into seperate files where I could more easily manipulate the pre-load or post-load files. I'd still want pre/post load to be seperate though since this would be used in cases where there's alot of data (hence the reason for the split) and putting pre and post together and running them before data would slow things down quite a bit. Are there use cases for just --omit-post-load or --omit-pre-load? Probably, but I just don't see any situation where I'd use them like that. Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] pg_dump additional options for performance
Stephen Frost <[EMAIL PROTECTED]> writes: > * Tom Lane ([EMAIL PROTECTED]) wrote: >> As far as the documentation/definition aspect goes, I think it should >> just say the parts are >> * stuff needed before you can load the data >> * the data >> * stuff needed after loading the data > Even that is a lie though, which I guess is what my problem is. True; the stuff done after is done that way at least in part for performance reasons rather than because it has to be done that way. (I think it's not only performance issues, though --- for circular FKs you pretty much have to load the data first.) >> I hadn't realized that Simon was using "pre-schema" and "post-schema" >> to name the first and third parts. I'd agree that this is confusing >> nomenclature: it looks like it's trying to say that the data is the >> schema, and the schema is not! How about "pre-data and "post-data"? > Argh. The command-line options follow the 'data'/'load' line > (--schema-pre-load and --schema-post-load), and so I think those are > fine. The problem was that in the documentation he switched to saying > they were "Pre-Schema" and "Post-Schema", which could lead to confusion. Ah, I see. No objection to those switch names, at least assuming we want to stick to positive-logic switches. What did you think of the negative-logic suggestion (--omit-xxx)? regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Hint Bits and Write I/O
Simon Riggs <[EMAIL PROTECTED]> writes: > On Mon, 2008-07-21 at 11:36 +0530, Pavan Deolasee wrote: >> I think we should try at least one or two possible optimizations and >> get some numbers before we jump and make substantial changes to the >> code. > You know you're suggesting months of tests and further discussion. :-( I agree with Pavan that we should have something that'd at least serve as test scaffolding to verify that the framework patch is sane. The test code needn't be anything we'd want to commit. It seems like largely the same kind of issue as with your stats-hooks patch. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] page macros cleanup (ver 04)
Zdenek Kotala <[EMAIL PROTECTED]> writes: > Thanks for applying patch. I think that hash index "upgradebility" is > currently broken or it will be with new hash index improvement. But if > I think about it it does not make sense to break compatibility by this > patch first. Right, my point exactly. Those necessarily-incompatible changes might or might not ever get applied --- if they are, we can do cosmetic cleanups afterwards. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump additional options for performance
Tom, * Tom Lane ([EMAIL PROTECTED]) wrote: > As far as the documentation/definition aspect goes, I think it should > just say the parts are > * stuff needed before you can load the data > * the data > * stuff needed after loading the data > and not try to be any more specific than that. There are corner cases > that will turn any simple breakdown into a lie, and I doubt that it's > worth trying to explain them all. (Take a close look at the dependency > loop breaking logic in pg_dump if you doubt this.) Even that is a lie though, which I guess is what my problem is. It's really "everything for the schema, except stuff that is better done in bulk", I believe. Also, I'm a bit concerned about people who would argue that you need PKs and FKs before you can load the data. Probably couldn't be avoided tho. > I hadn't realized that Simon was using "pre-schema" and "post-schema" > to name the first and third parts. I'd agree that this is confusing > nomenclature: it looks like it's trying to say that the data is the > schema, and the schema is not! How about "pre-data and "post-data"? Argh. The command-line options follow the 'data'/'load' line (--schema-pre-load and --schema-post-load), and so I think those are fine. The problem was that in the documentation he switched to saying they were "Pre-Schema" and "Post-Schema", which could lead to confusion. Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] pg_dump additional options for performance
Simon, * Simon Riggs ([EMAIL PROTECTED]) wrote: > > I hadn't realized that Simon was using "pre-schema" and "post-schema" > > to name the first and third parts. I'd agree that this is confusing > > nomenclature: it looks like it's trying to say that the data is the > > schema, and the schema is not! How about "pre-data and "post-data"? > > OK by me. Any other takers? Having the command-line options be "--schema-pre-data" and "--schema-post-data" is fine with me. Leaving them the way they are is also fine by me. It's the documentation (back to pg_dump.sgml, ~774/~797) that starts talking about Pre-Schema and Post-Schema. Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] [HACKERS] Hint Bits and Write I/O
On Mon, 2008-07-21 at 11:36 +0530, Pavan Deolasee wrote: > I think we should try at least one or two possible optimizations and > get some numbers before we jump and make substantial changes to the > code. You know you're suggesting months of tests and further discussion. :-( I'll fix the bug, but I'm not doing any more on this now till feature freeze. It's the wrong time to chase mirages. Thanks for checking my logic. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] page macros cleanup (ver 04)
Tom Lane napsal(a): "Heikki Linnakangas" <[EMAIL PROTECTED]> writes: ... That macro is actually doing the same thing as PageGetContents, so I switched to using that. As that moves the data sligthly on those bitmap pages, I guess we'll need a catversion bump. I'm amazed that Zdenek didn't scream bloody murder about that. You're creating a work item for in-place-upgrade that would not otherwise exist, in exchange for a completely trivial bit of code beautification. (The same can be said of his proposed change to hash meta pages.) :-) Yeah, These changes break in-place-upgrade on hash indexes and invokes reindexing request. I have had several reasons why I didn't complaint about it: 1) IIRC, hash function for double has been change 2) there is ongoing project to improve hash index performance -> completely redesigned content 3) hash index is not much used (by my opinion) and it affect only small group of users I'm planning to go over this patch today and apply it sans the parts that would require catversion bump. We can argue later about whether those are really worth doing, but I'm leaning to "not" --- unless Zdenek says that he has no intention of making in-place-upgrade handle hash indexes any time soon. Thanks for applying patch. I think that hash index "upgradebility" is currently broken or it will be with new hash index improvement. But if I think about it it does not make sense to break compatibility by this patch first. I will prepare patch which will be upgrade friendly. And if we will reimplement hash index soon, than we can clean a code. BTW, after further thought about the PageGetContents() situation: right now we can change it to guarantee maxalignment "for free", since SizeOfPageHeaderData happens to be maxaligned on all platforms (this wasn't the case as recently as 8.2). So I'm thinking we should do that. There's at least one place that thinks that PageGetContents is the same as page + SizeOfPageHeaderData, but that's easily fixed. On balance it seems like hidden assumptions about alignment are a bigger risk than assumptions about that offset --- anyone want to argue the contrary? I think it is OK and I seen that you already applied a patch. Thanks Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump additional options for performance
On Mon, 2008-07-21 at 07:46 -0400, Stephen Frost wrote: > * Simon Riggs ([EMAIL PROTECTED]) wrote: > > The options split the dump into 3 parts that's all: before the load, the > > load and after the load. > > > > --schema-pre-load says > > "Dumps exactly what --schema-only would dump, but only those > > statements before the data load." > > > > What is it you are suggesting? I'm unclear. > > That part is fine, the problem is that elsewhere in the documentation > (patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you > change it to be "objects required before data loading", which isn't the > same. OK, gotcha now - will change that. I thought you might mean something about changing the output itself. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump additional options for performance
Andrew Dunstan <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Maybe invert the logic? >> --omit-pre-data >> --omit-data >> --omit-post-data > Please, no. Negative logic seems likely to cause endless confusion. I think it might actually be less confusing, because with this approach, each switch has an identifiable default (no) and setting it doesn't cause side-effects on settings of other switches. The interactions of the switches as Simon presents 'em seem less than obvious. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Is autovacuum doing a wraparound-avoiding VACUUM?
Simon Riggs wrote: > On Fri, 2008-07-18 at 01:44 -0400, Tom Lane wrote: > > I agree, this is important for visibility into what's happening. > > The string isn't getting translated so I don't see any big downside > > to applying the patch in back branches. > > Patches for 8.3 and CVS HEAD. Applied, thanks. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump additional options for performance
Tom Lane wrote: Simon Riggs <[EMAIL PROTECTED]> writes: I also suggested having three options --want-pre-schema --want-data --want-post-schema so we could ask for any or all parts in the one dump. --data-only and --schema-only are negative options so don't allow this. (I don't like those names either, just thinking about capabilities) Maybe invert the logic? --omit-pre-data --omit-data --omit-post-data Not wedded to these either, just tossing out an idea... Please, no. Negative logic seems likely to cause endless confusion. I'd even be happier with --schema-part-1 and --schema-part-2 if we can't find some more expressive way of designating them. cheers andrew -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
[PATCHES] WITH RECUSIVE patches 0721
Hi, Here is the lastest WITH RECURSIVE patches against 2007/07/17 CVS (CVS HEAD won't compile for me). This version includes regression tests and is almost ready for commit IMO. -- Tatsuo Ishii SRA OSS, Inc. Japan recursive_query.patch.gz Description: Binary data -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump additional options for performance
* Simon Riggs ([EMAIL PROTECTED]) wrote: > The options split the dump into 3 parts that's all: before the load, the > load and after the load. > > --schema-pre-load says > "Dumps exactly what --schema-only would dump, but only those > statements before the data load." > > What is it you are suggesting? I'm unclear. That part is fine, the problem is that elsewhere in the documentation (patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you change it to be "objects required before data loading", which isn't the same. Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] pg_dump lock timeout
daveg <[EMAIL PROTECTED]> writes: > On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote: >> In most cases our policy has been that pg_dumpall should accept and pass >> through any pg_dump option for which it's sensible to do so. I did not >> make that happen but it seems it'd be a reasonable follow-on patch. > I'll remember that next time. Er .. actually that was a direct request for you to do it. > Finally, you changed the option value and case from 1 to case 2. getopt_long > only returns the value argument when there is no flag to set, so 1 was unique > and could have been read as "the first no-short option with an argument". Yeah. The code *worked* as you submitted it, but what was bothering me was that the "val = 1" table entries worked in two completely different ways for the different argument types. I first thought that you'd broken the existing long argument options --- you hadn't, but I had to go re-read the getopt_long source to convince myself of that. So it seemed like using a different "val" value might help clarify the difference in behavior for future readers of the code. In particular the next guy who wants to add a long option with parameter value would certainly not be able to use val = 1; but I thought the code as you had it wouldn't give him any clue what to do. If you've got a better idea about how to deal with that, feel free... regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump lock timeout
On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote: > daveg <[EMAIL PROTECTED]> writes: > > Here is an updated version of this patch against head. It builds, runs and > > functions as expected. I did not build the sgml. > > Applied with mostly minor cosmetic improvements --- the only actual > error I noticed was failing to check whether the server version supports > statement_timeout. I chose not to test backend version on the grounds that getting an explicit failure for an explicitly requested option would be preferable to it being silently ignored. However if the user is trying to use the same scripts for many versions then ignoring unsupported but unessential features may be preferred. One of the cosmetic changes made in response to other reviewers was to not reuse lockquery, instead to have a separate query buffer. You have reversed that and eliminated lockquery too. Which seems better. > In most cases our policy has been that pg_dumpall should accept and pass > through any pg_dump option for which it's sensible to do so. I did not > make that happen but it seems it'd be a reasonable follow-on patch. I'll remember that next time. > A minor point is that the syntax "-X lock-wait-timeout=n" or > "-X lock-wait-timeout n" won't work, although perhaps people used to > -X might expect it to. Since we deprecate -X (and don't even document > it anymore), I thought that making this work would be much more trouble > than it's worth, but perhaps that's open to argument. All the other -X options are flags and supported directly by the getopt machinery. Adding a -X option with an argument would require parsing the argument by hand including dealing with '=' or ' ' as the separator and telling getopt you had eaten an extra argument. This seemed a bit too much code for the value of supporting a deprecated format for a brand new option. Finally, you changed the option value and case from 1 to case 2. getopt_long() only returns the value argument when there is no flag to set, so 1 was unique and could have been read as "the first no-short option with an argument". Thanks for checking this in. -dg -- David Gould [EMAIL PROTECTED] 510 536 1443510 282 0869 If simplicity worked, the world would be overrun with insects. -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches