On Tue, 2008-02-26 at 18:19 +0530, Tom Dunstan wrote:
> On Tue, Feb 26, 2008 at 5:35 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> > On Tue, 2008-02-26 at 12:46 +0100, Dimitri Fontaine wrote:
> > > As a user I'd really prefer all of this to be much more transparent,
> > > and could well imagine the -Fc format to be some kind of TOC + zip of
> > > table data + post-load instructions (organized per table), or
> > > something like this. In fact just what you described, all embedded
> > > in a single file.
> >
> > If it's in a single file then it won't perform as well as if it's
> > separate files. We can put separate files on separate drives. We can
> > begin reloading one table while another is still unloading. The OS
> > will perform readahead for us on separate files, whereas on one file
> > it will look like random I/O, etc.
>
> Yeah, writing multiple unknown-length streams to a single file in
> parallel is going to be all kinds of painful, and this use case seems
> to be the biggest complaint against a zip-file kind of approach. I
> didn't know about the custom file format when I suggested the zip file
> one yesterday*, but a zip or equivalent has the major benefit of
> allowing the user to do manual inspection / tweaking of the dump,
> because the file format is one that can be manipulated by standard
> tools. And zip wins over tar because it's indexed: if you want to
> extract just the schema and hack on it, you don't need to touch your
> multiple gigabytes of data.
>
> Perhaps a compromise: we specify a filesystem layout for table data
> files, pre/post scripts, and other metadata that we want to be made
> available to pg_restore. By default, it gets dumped into a zip file /
> whatever, but a user who wants parallel unloads can pass a flag that
> tells pg_dump to stick it into a directory instead, with exactly the
> same file layout. Or how about this: if the filename given to pg_dump
> is a directory, spit out files in there; otherwise create/overwrite a
> single file.
>
> While it's a bit fiddly, putting data on separate drives would then
> involve something like symlinking the tablename inside the dump dir
> off to an appropriate mount point, but that's probably not much worse
> than running n different pg_dump commands specifying different files.
> Heck, if you've got lots of data and want very particular behavior,
> you've got to specify it somehow. :)
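To make the directory idea concrete, here is a rough sketch in Python
(illustrative only; all names, paths, and the TOC layout here are made
up, not pg_dump's actual code or archive format). If the target is a
directory, each table gets its own file and the dumps can run in
parallel with no stream interleaving; if it's a plain filename, the
same layout is packed into one indexed zip archive:

    # Sketch: directory target -> one file per table, dumped in parallel;
    # file target -> same layout inside a single zip. Hypothetical names.
    import os
    import zipfile
    from concurrent.futures import ThreadPoolExecutor

    TABLES = ["orders", "customers"]   # stand-in for the real table list

    def dump_table(name: str, path: str) -> None:
        # Placeholder for the real per-table COPY output.
        with open(path, "wb") as f:
            f.write(b"-- data for " + name.encode() + b"\n")

    def dump(target: str) -> None:
        if os.path.isdir(target):
            # One writer per table: each stream has its own file, so the
            # OS gets sequential readahead and files can sit on
            # different drives.
            with ThreadPoolExecutor() as pool:
                for t in TABLES:
                    pool.submit(dump_table, t,
                                os.path.join(target, t + ".dat"))
            with open(os.path.join(target, "toc.txt"), "w") as toc:
                toc.write("\n".join(TABLES) + "\n")
        else:
            # Single-file case: same members, packed into one zip. The
            # zip index lets a reader pull out just toc.txt (or the
            # schema) without reading the big data members.
            with zipfile.ZipFile(target, "w") as z:
                for t in TABLES:
                    z.writestr(t + ".dat", "-- data for %s\n" % t)
                z.writestr("toc.txt", "\n".join(TABLES) + "\n")

The point of the dispatch is that both forms share one layout, so a
restore tool only needs to know the layout, not which container it
came from.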
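And the symlink trick from the last paragraph, again only as a sketch
with made-up paths (assumes a POSIX filesystem and that the second
mount point exists):

    # Sketch: point one table's data file at another drive before the
    # dump starts, so that stream's bytes land on a different spindle.
    import os

    dumpdir = "/backup/dump"             # hypothetical paths throughout
    fast_disk = "/mnt/disk2/orders.dat"

    os.makedirs(dumpdir, exist_ok=True)
    open(fast_disk, "wb").close()        # file lives on the other drive
    os.symlink(fast_disk, os.path.join(dumpdir, "orders.dat"))
    # The dump then writes dump/orders.dat as usual; the data goes to
    # disk2 via the symlink.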
Separate files seem much simpler...

-- 
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com