Re: [PATCHES] pg_dump additional options for performance
On Sun, 2008-07-20 at 21:18 -0400, Stephen Frost wrote: * Simon Riggs ([EMAIL PROTECTED]) wrote: On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote: Even this doesn't cover everything though- it's too focused on tables and data loading. Where do functions go? What about types? Yes, it is focused on tables and data loading. What about functions/types? No relevance here. I don't see how they're not relevant, it's not like they're being excluded and in fact they show up in the pre-load output. Heck, even if they *were* excluded, that should be made clear in the documentation (either be an explicit include list, or saying they're excluded). Part of what's driving this is making sure we have a plan for future objects and where they'll go. Perhaps it would be enough to just say pre-load is everything in the schema, except things which are faster done in bulk (eg: indexes, keys). I don't think it's right to say pre-load is only object definitions required to load data when it includes functions and ACLs though. Hopefully my suggestion and these comments will get us to a happy middle-ground. I don't really understand what you're saying. The options split the dump into 3 parts that's all: before the load, the load and after the load. --schema-pre-load says Dumps exactly what option--schema-only/ would dump, but only those statements before the data load. What is it you are suggesting? I'm unclear. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump additional options for performance
On Sun, 2008-07-20 at 23:34 -0400, Tom Lane wrote: Stephen Frost [EMAIL PROTECTED] writes: * daveg ([EMAIL PROTECTED]) wrote: One observation, indexes should be built right after the table data is loaded for each table, this way, the index build gets a hot cache for the table data instead of having to re-read it later as we do now. That's not how pg_dump has traditionally worked, and the point of this patch is to add options to easily segregate the main pieces of the existing pg_dump output (main schema definition, data dump, key/index building). You suggestion brings up an interesting point that should pg_dump's traditional output structure change the --schema-post-load set of objects wouldn't be as clear to newcomers since the load and the indexes would be interleaved in the regular output. Stephen: Agreed. Yeah. Also, that is pushing into an entirely different line of development, which is to enable multithreaded pg_restore. The patch at hand is necessarily incompatible with that type of operation, and wouldn't be used together with it. As far as the documentation/definition aspect goes, I think it should just say the parts are * stuff needed before you can load the data * the data * stuff needed after loading the data and not try to be any more specific than that. There are corner cases that will turn any simple breakdown into a lie, and I doubt that it's worth trying to explain them all. (Take a close look at the dependency loop breaking logic in pg_dump if you doubt this.) Tom: Agreed. I hadn't realized that Simon was using pre-schema and post-schema to name the first and third parts. I'd agree that this is confusing nomenclature: it looks like it's trying to say that the data is the schema, and the schema is not! How about pre-data and post-data? OK by me. Any other takers? I also suggested having three options --want-pre-schema --want-data --want-post-schema so we could ask for any or all parts in the one dump. --data-only and --schema-only are negative options so don't allow this. (I don't like those names either, just thinking about capabilities) -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump lock timeout
daveg [EMAIL PROTECTED] writes: On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote: In most cases our policy has been that pg_dumpall should accept and pass through any pg_dump option for which it's sensible to do so. I did not make that happen but it seems it'd be a reasonable follow-on patch. I'll remember that next time. Er .. actually that was a direct request for you to do it. Finally, you changed the option value and case from 1 to case 2. getopt_long only returns the value argument when there is no flag to set, so 1 was unique and could have been read as the first no-short option with an argument. Yeah. The code *worked* as you submitted it, but what was bothering me was that the val = 1 table entries worked in two completely different ways for the different argument types. I first thought that you'd broken the existing long argument options --- you hadn't, but I had to go re-read the getopt_long source to convince myself of that. So it seemed like using a different val value might help clarify the difference in behavior for future readers of the code. In particular the next guy who wants to add a long option with parameter value would certainly not be able to use val = 1; but I thought the code as you had it wouldn't give him any clue what to do. If you've got a better idea about how to deal with that, feel free... regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump additional options for performance
* Simon Riggs ([EMAIL PROTECTED]) wrote: The options split the dump into 3 parts that's all: before the load, the load and after the load. --schema-pre-load says Dumps exactly what option--schema-only/ would dump, but only those statements before the data load. What is it you are suggesting? I'm unclear. That part is fine, the problem is that elsewhere in the documentation (patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you change it to be objects required before data loading, which isn't the same. Thanks, Stephen signature.asc Description: Digital signature
[PATCHES] WITH RECUSIVE patches 0721
Hi, Here is the lastest WITH RECURSIVE patches against 2007/07/17 CVS (CVS HEAD won't compile for me). This version includes regression tests and is almost ready for commit IMO. -- Tatsuo Ishii SRA OSS, Inc. Japan recursive_query.patch.gz Description: Binary data -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump additional options for performance
Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I also suggested having three options --want-pre-schema --want-data --want-post-schema so we could ask for any or all parts in the one dump. --data-only and --schema-only are negative options so don't allow this. (I don't like those names either, just thinking about capabilities) Maybe invert the logic? --omit-pre-data --omit-data --omit-post-data Not wedded to these either, just tossing out an idea... Please, no. Negative logic seems likely to cause endless confusion. I'd even be happier with --schema-part-1 and --schema-part-2 if we can't find some more expressive way of designating them. cheers andrew -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] Is autovacuum doing a wraparound-avoiding VACUUM?
Simon Riggs wrote: On Fri, 2008-07-18 at 01:44 -0400, Tom Lane wrote: I agree, this is important for visibility into what's happening. The string isn't getting translated so I don't see any big downside to applying the patch in back branches. Patches for 8.3 and CVS HEAD. Applied, thanks. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump additional options for performance
Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: Maybe invert the logic? --omit-pre-data --omit-data --omit-post-data Please, no. Negative logic seems likely to cause endless confusion. I think it might actually be less confusing, because with this approach, each switch has an identifiable default (no) and setting it doesn't cause side-effects on settings of other switches. The interactions of the switches as Simon presents 'em seem less than obvious. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump additional options for performance
On Mon, 2008-07-21 at 07:46 -0400, Stephen Frost wrote: * Simon Riggs ([EMAIL PROTECTED]) wrote: The options split the dump into 3 parts that's all: before the load, the load and after the load. --schema-pre-load says Dumps exactly what option--schema-only/ would dump, but only those statements before the data load. What is it you are suggesting? I'm unclear. That part is fine, the problem is that elsewhere in the documentation (patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you change it to be objects required before data loading, which isn't the same. OK, gotcha now - will change that. I thought you might mean something about changing the output itself. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] page macros cleanup (ver 04)
Tom Lane napsal(a): Heikki Linnakangas [EMAIL PROTECTED] writes: ... That macro is actually doing the same thing as PageGetContents, so I switched to using that. As that moves the data sligthly on those bitmap pages, I guess we'll need a catversion bump. I'm amazed that Zdenek didn't scream bloody murder about that. You're creating a work item for in-place-upgrade that would not otherwise exist, in exchange for a completely trivial bit of code beautification. (The same can be said of his proposed change to hash meta pages.) :-) Yeah, These changes break in-place-upgrade on hash indexes and invokes reindexing request. I have had several reasons why I didn't complaint about it: 1) IIRC, hash function for double has been change 2) there is ongoing project to improve hash index performance - completely redesigned content 3) hash index is not much used (by my opinion) and it affect only small group of users I'm planning to go over this patch today and apply it sans the parts that would require catversion bump. We can argue later about whether those are really worth doing, but I'm leaning to not --- unless Zdenek says that he has no intention of making in-place-upgrade handle hash indexes any time soon. Thanks for applying patch. I think that hash index upgradebility is currently broken or it will be with new hash index improvement. But if I think about it it does not make sense to break compatibility by this patch first. I will prepare patch which will be upgrade friendly. And if we will reimplement hash index soon, than we can clean a code. BTW, after further thought about the PageGetContents() situation: right now we can change it to guarantee maxalignment for free, since SizeOfPageHeaderData happens to be maxaligned on all platforms (this wasn't the case as recently as 8.2). So I'm thinking we should do that. There's at least one place that thinks that PageGetContents is the same as page + SizeOfPageHeaderData, but that's easily fixed. On balance it seems like hidden assumptions about alignment are a bigger risk than assumptions about that offset --- anyone want to argue the contrary? I think it is OK and I seen that you already applied a patch. Thanks Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Hint Bits and Write I/O
On Mon, 2008-07-21 at 11:36 +0530, Pavan Deolasee wrote: I think we should try at least one or two possible optimizations and get some numbers before we jump and make substantial changes to the code. You know you're suggesting months of tests and further discussion. :-( I'll fix the bug, but I'm not doing any more on this now till feature freeze. It's the wrong time to chase mirages. Thanks for checking my logic. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump additional options for performance
Simon, * Simon Riggs ([EMAIL PROTECTED]) wrote: I hadn't realized that Simon was using pre-schema and post-schema to name the first and third parts. I'd agree that this is confusing nomenclature: it looks like it's trying to say that the data is the schema, and the schema is not! How about pre-data and post-data? OK by me. Any other takers? Having the command-line options be --schema-pre-data and --schema-post-data is fine with me. Leaving them the way they are is also fine by me. It's the documentation (back to pg_dump.sgml, ~774/~797) that starts talking about Pre-Schema and Post-Schema. Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] pg_dump additional options for performance
Stephen Frost [EMAIL PROTECTED] writes: * Tom Lane ([EMAIL PROTECTED]) wrote: As far as the documentation/definition aspect goes, I think it should just say the parts are * stuff needed before you can load the data * the data * stuff needed after loading the data Even that is a lie though, which I guess is what my problem is. True; the stuff done after is done that way at least in part for performance reasons rather than because it has to be done that way. (I think it's not only performance issues, though --- for circular FKs you pretty much have to load the data first.) I hadn't realized that Simon was using pre-schema and post-schema to name the first and third parts. I'd agree that this is confusing nomenclature: it looks like it's trying to say that the data is the schema, and the schema is not! How about pre-data and post-data? Argh. The command-line options follow the 'data'/'load' line (--schema-pre-load and --schema-post-load), and so I think those are fine. The problem was that in the documentation he switched to saying they were Pre-Schema and Post-Schema, which could lead to confusion. Ah, I see. No objection to those switch names, at least assuming we want to stick to positive-logic switches. What did you think of the negative-logic suggestion (--omit-xxx)? regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump additional options for performance
Tom, et al, * Tom Lane ([EMAIL PROTECTED]) wrote: Ah, I see. No objection to those switch names, at least assuming we want to stick to positive-logic switches. What did you think of the negative-logic suggestion (--omit-xxx)? My preference is for positive-logic switches in general. The place where I would use this patch would lend itself to being more options if --omit- were used. I expect that would hold true for most people. It would be: --omit-data --omit-post-load --omit-pre-load --omit-post-load --omit-pre-load --omit-data vs. --schema-pre-load --data-only --schema-post-load Point being that I'd be dumping these into seperate files where I could more easily manipulate the pre-load or post-load files. I'd still want pre/post load to be seperate though since this would be used in cases where there's alot of data (hence the reason for the split) and putting pre and post together and running them before data would slow things down quite a bit. Are there use cases for just --omit-post-load or --omit-pre-load? Probably, but I just don't see any situation where I'd use them like that. Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] pg_dump additional options for performance
Stephen Frost [EMAIL PROTECTED] writes: Are there use cases for just --omit-post-load or --omit-pre-load? Probably not many. The thing that's bothering me is the action-at-a-distance property of the positive-logic switches. How are we going to explain this? By default, --schema-pre-load, --data-only, --schema-post-load are all ON. But if you turn one of them ON (never mind that it was already ON by default), that changes the defaults for the other two to OFF. Then you have to turn them ON (never mind that the default for them is ON) if you want two out of the three categories. You have to bend your mind into a pretzel to wrap it around this behavior. Yeah, it might be convenient once you understand it, but how long will it take for the savings in typing to repay the time to understand it and the mistakes along the way? regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] pg_dump lock timeout
On Mon, Jul 21, 2008 at 03:43:11AM -0400, Tom Lane wrote: daveg [EMAIL PROTECTED] writes: On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote: In most cases our policy has been that pg_dumpall should accept and pass through any pg_dump option for which it's sensible to do so. I did not make that happen but it seems it'd be a reasonable follow-on patch. I'll remember that next time. Er .. actually that was a direct request for you to do it. Attached is a the followon patch for pg_dumpall and docs to match pg_dump. On a second topic, is anyone working on a parallel dump/load? I'd be interested in helping. -dg -- David Gould [EMAIL PROTECTED] 510 536 1443510 282 0869 If simplicity worked, the world would be overrun with insects. *** a/doc/src/sgml/ref/pg_dumpall.sgml --- b/doc/src/sgml/ref/pg_dumpall.sgml *** *** 196,201 PostgreSQL documentation --- 196,217 /varlistentry varlistentry + termoption--lock-wait-timeout=replaceable class=parametertimeout/replaceable/option/term + listitem +para + Do not wait forever to acquire shared table locks at the beginning of + the dump. Instead fail if unable to lock a table within the specified + replaceable class=parametertimeout/. The timeout may be + specified in any of the formats accepted by commandSET + statement_timeout/. (Allowed values vary depending on the server + version you are dumping from, but an integer number of milliseconds + is accepted by all versions since 7.3. This option is ignored when + dumping from a pre-7.3 server.) +/para + /listitem + /varlistentry + + varlistentry termoption--no-tablespaces/option/term listitem para *** a/src/bin/pg_dump/pg_dumpall.c --- b/src/bin/pg_dump/pg_dumpall.c *** *** 120,125 main(int argc, char *argv[]) --- 120,126 {disable-triggers, no_argument, disable_triggers, 1}, {no-tablespaces, no_argument, no_tablespaces, 1}, {use-set-session-authorization, no_argument, use_setsessauth, 1}, + {lock-wait-timeout, required_argument, NULL, 2}, {NULL, 0, NULL, 0} }; *** *** 305,310 main(int argc, char *argv[]) --- 306,316 case 0: break; + case 2: + appendPQExpBuffer(pgdumpopts, --lock-wait-timeout=); + appendPQExpBuffer(pgdumpopts, optarg); + break; + default: fprintf(stderr, _(Try \%s --help\ for more information.\n), progname); exit(1); *** *** 488,493 help(void) --- 494,500 printf(_( -f, --file=FILENAME output file name\n)); printf(_( --help show this help, then exit\n)); printf(_( --versionoutput version information, then exit\n)); + printf(_( --lock-wait-timeout=TIMEOUT fail after waiting TIMEOUT for a table lock\n)); printf(_(\nOptions controlling the output content:\n)); printf(_( -a, --data-only dump only the data, not the schema\n)); printf(_( -c, --clean clean (drop) databases prior to create\n)); -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches