Re: [PATCHES] pg_dump additional options for performance

2008-07-27 Thread Joshua D. Drake
Simon Riggs wrote: On Sat, 2008-07-26 at 11:03 -0700, Joshua D. Drake wrote: 2. We have no concurrency which means, anyone with any database over 50G has unacceptable restore times. Agreed. Sounds good. Doesn't help with the main element of dump time: one table at a time to one output

Re: [PATCHES] pg_dump additional options for performance

2008-07-27 Thread Simon Riggs
On Sat, 2008-07-26 at 13:56 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I want to dump tables separately for performance reasons. There are documented tests showing 100% gains using this method. There is no gain adding this to pg_restore. There is a gain to be had -

Re: [PATCHES] pg_dump additional options for performance

2008-07-27 Thread Simon Riggs
On Sat, 2008-07-26 at 11:03 -0700, Joshua D. Drake wrote: 2. We have no concurrency which means, anyone with any database over 50G has unacceptable restore times. Agreed. Also the core reason for wanting -w 3. We have to continue develop hacks to define custom utilization. Why am I

Re: [PATCHES] pg_dump additional options for performance

2008-07-27 Thread Andrew Dunstan
Joshua D. Drake wrote: Agreed but that is a problem I understand with a solution I don't. I am all eyes on a way to fix that. One thought I had and please, be gentle in response was some sort of async transaction capability. I know that libpq has the ability to send async queries. Is it

Re: [PATCHES] pg_dump additional options for performance

2008-07-27 Thread Joshua D. Drake
Andrew Dunstan wrote: Joshua D. Drake wrote: Agreed but that is a problem I understand with a solution I don't. I am all eyes on a way to fix that. One thought I had and please, be gentle in response was some sort of async transaction capability. I know that libpq has the ability to send

Re: [PATCHES] pg_dump additional options for performance

2008-07-27 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote: Stephen Frost [EMAIL PROTECTED] writes: I dislike, and doubt that I'd use, this approach. At the end of the day, it ends up processing the same (very large amount of data) multiple times. Well, that's easily avoided: just replace the third step by

Re: [PATCHES] pg_dump additional options for performance

2008-07-27 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote: Right, but the parallelization is going to happen sometime, and it is going to happen in the context of pg_restore. So I think it's pretty silly to argue that no one will ever want this feature to work in pg_restore. I think you've about convinced me on

Re: [PATCHES] pg_dump additional options for performance

2008-07-27 Thread Joshua D. Drake
Stephen Frost wrote: * Tom Lane ([EMAIL PROTECTED]) wrote: Stephen Frost [EMAIL PROTECTED] writes: I dislike, and doubt that I'd use, this approach. At the end of the day, it ends up processing the same (very large amount of data) multiple times. This would depend on the dump being in the

Re: [PATCHES] pg_dump additional options for performance

2008-07-27 Thread Stephen Frost
* Joshua D. Drake ([EMAIL PROTECTED]) wrote: Custom format rocks for partial set restores from a whole dump. See the TOC option :) I imagine it does, but that's very rarely what I need. Most of the time we're dumping out a schema to load it into a seperate schema (usually on another host).

Re: [PATCHES] pg_dump additional options for performance

2008-07-26 Thread Simon Riggs
On Fri, 2008-07-25 at 19:16 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: [ pg_dump_beforeafter.v6.patch ] Unfortunately there's still a lot of work to do, and I don't feel like doing it so I'm bouncing this patch back for further work. Fair enough. Thanks for the review.

Re: [PATCHES] pg_dump additional options for performance

2008-07-26 Thread Stephen Frost
* Simon Riggs ([EMAIL PROTECTED]) wrote: The key capability here is being able to split the dump into multiple pieces. The equivalent capability on restore is *not* required, because once the dump has been split the restore never needs to be. It might seem that the patch should be symmetrical

Re: [PATCHES] pg_dump additional options for performance

2008-07-26 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2008-07-25 at 19:16 -0400, Tom Lane wrote: The key problem is that pg_restore is broken: The key capability here is being able to split the dump into multiple pieces. The equivalent capability on restore is *not* required, because once the dump

Re: [PATCHES] pg_dump additional options for performance

2008-07-26 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes: Another issue is that the rules for deciding which objects are before data and which are after data are wrong. In particular ACLs are after data not before data, which is relatively easy to fix. OK This was partially why I was complaining about

Re: [PATCHES] pg_dump additional options for performance

2008-07-26 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote: Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2008-07-25 at 19:16 -0400, Tom Lane wrote: The key problem is that pg_restore is broken: The key capability here is being able to split the dump into multiple pieces. The equivalent capability on restore

Re: [PATCHES] pg_dump additional options for performance

2008-07-26 Thread Simon Riggs
On Sat, 2008-07-26 at 12:20 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2008-07-25 at 19:16 -0400, Tom Lane wrote: The key problem is that pg_restore is broken: The key capability here is being able to split the dump into multiple pieces. The equivalent

Re: [PATCHES] pg_dump additional options for performance

2008-07-26 Thread daveg
On Sat, Jul 26, 2008 at 01:56:14PM -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I want to dump tables separately for performance reasons. There are documented tests showing 100% gains using this method. There is no gain adding this to pg_restore. There is a gain to be had -

Re: [PATCHES] pg_dump additional options for performance

2008-07-25 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: [ pg_dump_beforeafter.v6.patch ] I looked over this patch a bit. I have a proposal for a slightly different way of defining the new switches: * --schema-before-data, --data-only, and --schema-after-data can be specified in any combination to obtain any

Re: [PATCHES] pg_dump additional options for performance

2008-07-25 Thread Greg Sabino Mullane
-BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 Tom Lane wrote: * --schema-before-data, --data-only, and --schema-after-data can be I thought you were arguing for some better names at one point? Those seem very confusing to me, especially --schema-after-data. I know it means the parts of

Re: [PATCHES] pg_dump additional options for performance

2008-07-24 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: [80k patch] Surely there is a whole lot of unintended noise in this patch? I certainly don't believe that you meant to change keywords.c for instance. regards, tom lane -- Sent via pgsql-patches mailing list

Re: [PATCHES] pg_dump additional options for performance

2008-07-24 Thread Simon Riggs
On Thu, 2008-07-24 at 03:54 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: [80k patch] Surely there is a whole lot of unintended noise in this patch? I certainly don't believe that you meant to change keywords.c for instance. Removed, thanks. Unrelated to this patch, it

Re: [PATCHES] pg_dump additional options for performance

2008-07-23 Thread Simon Riggs
On Mon, 2008-07-21 at 07:56 -0400, Stephen Frost wrote: Simon, * Simon Riggs ([EMAIL PROTECTED]) wrote: I hadn't realized that Simon was using pre-schema and post-schema to name the first and third parts. I'd agree that this is confusing nomenclature: it looks like it's trying to

Re: [PATCHES] pg_dump additional options for performance

2008-07-23 Thread Simon Riggs
On Wed, 2008-07-23 at 17:40 +0100, Simon Riggs wrote: On Mon, 2008-07-21 at 07:56 -0400, Stephen Frost wrote: Simon, * Simon Riggs ([EMAIL PROTECTED]) wrote: I hadn't realized that Simon was using pre-schema and post-schema to name the first and third parts. I'd agree that this

Re: [PATCHES] pg_dump additional options for performance

2008-07-23 Thread Stephen Frost
Simon, * Simon Riggs ([EMAIL PROTECTED]) wrote: ...and with command line help also. The documentation and whatnot looks good to me now. There are a couple of other issues I found while looking through and testing the patch though- Index: src/bin/pg_dump/pg_dump.c

Re: [PATCHES] pg_dump additional options for performance

2008-07-22 Thread Simon Riggs
On Mon, 2008-07-21 at 19:19 -0400, Tom Lane wrote: Stephen Frost [EMAIL PROTECTED] writes: Are there use cases for just --omit-post-load or --omit-pre-load? Probably not many. The thing that's bothering me is the action-at-a-distance property of the positive-logic switches. How are we

Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Simon Riggs
On Sun, 2008-07-20 at 21:18 -0400, Stephen Frost wrote: * Simon Riggs ([EMAIL PROTECTED]) wrote: On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote: Even this doesn't cover everything though- it's too focused on tables and data loading. Where do functions go? What about types?

Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Simon Riggs
On Sun, 2008-07-20 at 23:34 -0400, Tom Lane wrote: Stephen Frost [EMAIL PROTECTED] writes: * daveg ([EMAIL PROTECTED]) wrote: One observation, indexes should be built right after the table data is loaded for each table, this way, the index build gets a hot cache for the table data

Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Stephen Frost
* Simon Riggs ([EMAIL PROTECTED]) wrote: The options split the dump into 3 parts that's all: before the load, the load and after the load. --schema-pre-load says Dumps exactly what option--schema-only/ would dump, but only those statements before the data load. What is it you are

Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Andrew Dunstan
Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I also suggested having three options --want-pre-schema --want-data --want-post-schema so we could ask for any or all parts in the one dump. --data-only and --schema-only are negative options so don't allow this. (I don't like those

Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: Maybe invert the logic? --omit-pre-data --omit-data --omit-post-data Please, no. Negative logic seems likely to cause endless confusion. I think it might actually be less confusing, because with this approach, each switch has an

Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Simon Riggs
On Mon, 2008-07-21 at 07:46 -0400, Stephen Frost wrote: * Simon Riggs ([EMAIL PROTECTED]) wrote: The options split the dump into 3 parts that's all: before the load, the load and after the load. --schema-pre-load says Dumps exactly what option--schema-only/ would dump, but only those

Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Stephen Frost
Simon, * Simon Riggs ([EMAIL PROTECTED]) wrote: I hadn't realized that Simon was using pre-schema and post-schema to name the first and third parts. I'd agree that this is confusing nomenclature: it looks like it's trying to say that the data is the schema, and the schema is not! How

Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes: * Tom Lane ([EMAIL PROTECTED]) wrote: As far as the documentation/definition aspect goes, I think it should just say the parts are * stuff needed before you can load the data * the data * stuff needed after loading the data Even that is a lie though,

Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Stephen Frost
Tom, et al, * Tom Lane ([EMAIL PROTECTED]) wrote: Ah, I see. No objection to those switch names, at least assuming we want to stick to positive-logic switches. What did you think of the negative-logic suggestion (--omit-xxx)? My preference is for positive-logic switches in general. The

Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes: Are there use cases for just --omit-post-load or --omit-pre-load? Probably not many. The thing that's bothering me is the action-at-a-distance property of the positive-logic switches. How are we going to explain this? By default,

Re: [PATCHES] pg_dump additional options for performance

2008-07-20 Thread Simon Riggs
On Sun, 2008-07-20 at 05:47 +0100, Simon Riggs wrote: On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote: Simon, I agree with adding these options in general, since I find myself frustrated by having to vi huge dumps to change simple schema things. A couple of comments on

Re: [PATCHES] pg_dump additional options for performance

2008-07-20 Thread Stephen Frost
Simon, * Simon Riggs ([EMAIL PROTECTED]) wrote: On Sun, 2008-07-20 at 05:47 +0100, Simon Riggs wrote: On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote: [...] - Conflicting option handling Thanks for putting in the extra code to explicitly indicate which conflicting options were

Re: [PATCHES] pg_dump additional options for performance

2008-07-20 Thread Simon Riggs
On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote: Perhaps this is up for some debate, but I find the documentation added for these options to be lacking the definitions I was looking for, and the explanation of why they are what they are. I'm also not sure I agree with the Pre-Schema

Re: [PATCHES] pg_dump additional options for performance

2008-07-20 Thread Stephen Frost
* Simon Riggs ([EMAIL PROTECTED]) wrote: On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote: Even this doesn't cover everything though- it's too focused on tables and data loading. Where do functions go? What about types? Yes, it is focused on tables and data loading. What about

Re: [PATCHES] pg_dump additional options for performance

2008-07-20 Thread daveg
On Sun, Jul 20, 2008 at 09:18:29PM -0400, Stephen Frost wrote: * Simon Riggs ([EMAIL PROTECTED]) wrote: On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote: Even this doesn't cover everything though- it's too focused on tables and data loading. Where do functions go? What about

Re: [PATCHES] pg_dump additional options for performance

2008-07-20 Thread Stephen Frost
* daveg ([EMAIL PROTECTED]) wrote: One observation, indexes should be built right after the table data is loaded for each table, this way, the index build gets a hot cache for the table data instead of having to re-read it later as we do now. That's not how pg_dump has traditionally worked,

Re: [PATCHES] pg_dump additional options for performance

2008-07-20 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes: * daveg ([EMAIL PROTECTED]) wrote: One observation, indexes should be built right after the table data is loaded for each table, this way, the index build gets a hot cache for the table data instead of having to re-read it later as we do now. That's

[PATCHES] pg_dump additional options for performance

2008-07-19 Thread Stephen Frost
Simon, I agree with adding these options in general, since I find myself frustrated by having to vi huge dumps to change simple schema things. A couple of comments on the patch though: - Conflicting option handling I think we are doing our users a disservice by putting it on them to

Re: [PATCHES] pg_dump additional options for performance

2008-07-19 Thread Simon Riggs
On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote: Simon, I agree with adding these options in general, since I find myself frustrated by having to vi huge dumps to change simple schema things. A couple of comments on the patch though: - Conflicting option handling I