Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Simon Riggs

On Sun, 2008-07-20 at 21:18 -0400, Stephen Frost wrote:
 * Simon Riggs ([EMAIL PROTECTED]) wrote:
  On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote:
   Even this doesn't cover everything though- it's too focused on tables
   and data loading.  Where do functions go?  What about types?
  
  Yes, it is focused on tables and data loading. What about
  functions/types? No relevance here.
 
 I don't see how they're not relevant, it's not like they're being
 excluded and in fact they show up in the pre-load output.  Heck, even if
 they *were* excluded, that should be made clear in the documentation
 (either be an explicit include list, or saying they're excluded).
 
 Part of what's driving this is making sure we have a plan for future
 objects and where they'll go.  Perhaps it would be enough to just say
 pre-load is everything in the schema, except things which are faster
 done in bulk (eg: indexes, keys).  I don't think it's right to say
 pre-load is only object definitions required to load data when it
 includes functions and ACLs though.
 
 Hopefully my suggestion and these comments will get us to a happy
 middle-ground.

I don't really understand what you're saying.

The options split the dump into 3 parts that's all: before the load, the
load and after the load.

--schema-pre-load says
Dumps exactly what option--schema-only/ would dump, but only those
statements before the data load.

What is it you are suggesting? I'm unclear.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Simon Riggs

On Sun, 2008-07-20 at 23:34 -0400, Tom Lane wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  * daveg ([EMAIL PROTECTED]) wrote:
  One observation, indexes should be built right after the table data
  is loaded for each table, this way, the index build gets a hot cache
  for the table data instead of having to re-read it later as we do now.
 
  That's not how pg_dump has traditionally worked, and the point of this
  patch is to add options to easily segregate the main pieces of the
  existing pg_dump output (main schema definition, data dump, key/index
  building).  You suggestion brings up an interesting point that should
  pg_dump's traditional output structure change the --schema-post-load
  set of objects wouldn't be as clear to newcomers since the load and the
  indexes would be interleaved in the regular output.

Stephen: Agreed.

 Yeah.  Also, that is pushing into an entirely different line of
 development, which is to enable multithreaded pg_restore.  The patch
 at hand is necessarily incompatible with that type of operation, and
 wouldn't be used together with it.
 
 As far as the documentation/definition aspect goes, I think it should
 just say the parts are
   * stuff needed before you can load the data
   * the data
   * stuff needed after loading the data
 and not try to be any more specific than that.  There are corner cases
 that will turn any simple breakdown into a lie, and I doubt that it's
 worth trying to explain them all.  (Take a close look at the dependency
 loop breaking logic in pg_dump if you doubt this.)

Tom: Agreed.

 I hadn't realized that Simon was using pre-schema and post-schema
 to name the first and third parts.  I'd agree that this is confusing
 nomenclature: it looks like it's trying to say that the data is the
 schema, and the schema is not!  How about pre-data and post-data?

OK by me. Any other takers?



I also suggested having three options
--want-pre-schema
--want-data
--want-post-schema
so we could ask for any or all parts in the one dump. --data-only and
--schema-only are negative options so don't allow this.
(I don't like those names either, just thinking about capabilities)

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump lock timeout

2008-07-21 Thread Tom Lane
daveg [EMAIL PROTECTED] writes:
 On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote:
 In most cases our policy has been that pg_dumpall should accept and pass
 through any pg_dump option for which it's sensible to do so. I did not
 make that happen but it seems it'd be a reasonable follow-on patch.

 I'll remember that next time.

Er .. actually that was a direct request for you to do it.

 Finally, you changed the option value and case from 1 to case 2. getopt_long
 only returns the value argument when there is no flag to set, so 1 was unique
 and could have been read as the first no-short option with an argument.

Yeah.  The code *worked* as you submitted it, but what was bothering me
was that the val = 1 table entries worked in two completely different
ways for the different argument types.  I first thought that you'd
broken the existing long argument options --- you hadn't, but I had to
go re-read the getopt_long source to convince myself of that.  So it
seemed like using a different val value might help clarify the
difference in behavior for future readers of the code.  In particular
the next guy who wants to add a long option with parameter value would
certainly not be able to use val = 1; but I thought the code as you
had it wouldn't give him any clue what to do.  If you've got a better
idea about how to deal with that, feel free...

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Stephen Frost
* Simon Riggs ([EMAIL PROTECTED]) wrote:
 The options split the dump into 3 parts that's all: before the load, the
 load and after the load.
 
 --schema-pre-load says
 Dumps exactly what option--schema-only/ would dump, but only those
 statements before the data load.
 
 What is it you are suggesting? I'm unclear.

That part is fine, the problem is that elsewhere in the documentation
(patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you
change it to be objects required before data loading, which isn't the
same.

Thanks,

Stephen


signature.asc
Description: Digital signature


[PATCHES] WITH RECUSIVE patches 0721

2008-07-21 Thread Tatsuo Ishii
Hi,

Here is the lastest WITH RECURSIVE patches against 2007/07/17 CVS (CVS
HEAD won't compile for me).

This version includes regression tests and is almost ready for commit
IMO.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


recursive_query.patch.gz
Description: Binary data

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Andrew Dunstan



Tom Lane wrote:

Simon Riggs [EMAIL PROTECTED] writes:
  

I also suggested having three options
--want-pre-schema
--want-data
--want-post-schema
so we could ask for any or all parts in the one dump. --data-only and
--schema-only are negative options so don't allow this.
(I don't like those names either, just thinking about capabilities)



Maybe invert the logic?

--omit-pre-data
--omit-data
--omit-post-data

Not wedded to these either, just tossing out an idea...


  


Please, no. Negative logic seems likely to cause endless confusion.

I'd even be happier with --schema-part-1 and --schema-part-2 if we can't 
find some more expressive way of designating them.


cheers

andrew

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] Is autovacuum doing a wraparound-avoiding VACUUM?

2008-07-21 Thread Alvaro Herrera
Simon Riggs wrote:

 On Fri, 2008-07-18 at 01:44 -0400, Tom Lane wrote:
  I agree, this is important for visibility into what's happening.
  The string isn't getting translated so I don't see any big downside
  to applying the patch in back branches.
 
 Patches for 8.3 and CVS HEAD.

Applied, thanks.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Maybe invert the logic?
 --omit-pre-data
 --omit-data
 --omit-post-data

 Please, no. Negative logic seems likely to cause endless confusion.

I think it might actually be less confusing, because with this approach,
each switch has an identifiable default (no) and setting it doesn't
cause side-effects on settings of other switches.  The interactions of
the switches as Simon presents 'em seem less than obvious.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Simon Riggs

On Mon, 2008-07-21 at 07:46 -0400, Stephen Frost wrote:
 * Simon Riggs ([EMAIL PROTECTED]) wrote:
  The options split the dump into 3 parts that's all: before the load, the
  load and after the load.
  
  --schema-pre-load says
  Dumps exactly what option--schema-only/ would dump, but only those
  statements before the data load.
  
  What is it you are suggesting? I'm unclear.
 
 That part is fine, the problem is that elsewhere in the documentation
 (patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you
 change it to be objects required before data loading, which isn't the
 same.

OK, gotcha now - will change that. I thought you might mean something
about changing the output itself.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] page macros cleanup (ver 04)

2008-07-21 Thread Zdenek Kotala

Tom Lane napsal(a):

Heikki Linnakangas [EMAIL PROTECTED] writes:
...  That macro is actually doing the 
same thing as PageGetContents, so I switched to using that. As that 
moves the data sligthly on those bitmap pages, I guess we'll need a 
catversion bump.


I'm amazed that Zdenek didn't scream bloody murder about that.  You're
creating a work item for in-place-upgrade that would not otherwise
exist, in exchange for a completely trivial bit of code beautification.
(The same can be said of his proposed change to hash meta pages.)


:-) Yeah, These changes break in-place-upgrade on hash indexes and invokes 
reindexing request. I have had several reasons why I didn't complaint about it:


1) IIRC, hash function for double has been change
2) there is ongoing project to improve hash index performance - completely 
redesigned content
3) hash index is not much used (by my opinion) and it affect only small group of 
users



I'm planning to go over this patch today and apply it sans the parts
that would require catversion bump.  We can argue later about whether
those are really worth doing, but I'm leaning to not --- unless Zdenek
says that he has no intention of making in-place-upgrade handle hash
indexes any time soon.


Thanks for applying patch. I think that hash index upgradebility is currently 
broken or it will be with new hash index improvement. But if I think about it it 
does not make sense to break compatibility by this patch first. I will prepare 
patch which will be upgrade friendly. And if we will reimplement hash index 
soon, than we can clean a code.



BTW, after further thought about the PageGetContents() situation:
right now we can change it to guarantee maxalignment for free,
since SizeOfPageHeaderData happens to be maxaligned on all platforms
(this wasn't the case as recently as 8.2).  So I'm thinking we should
do that.  There's at least one place that thinks that PageGetContents
is the same as page + SizeOfPageHeaderData, but that's easily fixed.
On balance it seems like hidden assumptions about alignment are a bigger
risk than assumptions about that offset --- anyone want to argue the
contrary?


I think it is OK and I seen that you already applied a patch.

Thanks Zdenek



--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Hint Bits and Write I/O

2008-07-21 Thread Simon Riggs

On Mon, 2008-07-21 at 11:36 +0530, Pavan Deolasee wrote:

 I think we should try at least one or two possible optimizations and
 get some numbers before we jump and make substantial changes to the
 code.

You know you're suggesting months of tests and further discussion. :-(

I'll fix the bug, but I'm not doing any more on this now till feature
freeze. It's the wrong time to chase mirages.

Thanks for checking my logic.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Stephen Frost
Simon,

* Simon Riggs ([EMAIL PROTECTED]) wrote:
  I hadn't realized that Simon was using pre-schema and post-schema
  to name the first and third parts.  I'd agree that this is confusing
  nomenclature: it looks like it's trying to say that the data is the
  schema, and the schema is not!  How about pre-data and post-data?
 
 OK by me. Any other takers?

Having the command-line options be --schema-pre-data and
--schema-post-data is fine with me.  Leaving them the way they are is
also fine by me.  It's the documentation (back to pg_dump.sgml,
~774/~797) that starts talking about Pre-Schema and Post-Schema.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
 * Tom Lane ([EMAIL PROTECTED]) wrote:
 As far as the documentation/definition aspect goes, I think it should
 just say the parts are
 * stuff needed before you can load the data
 * the data
 * stuff needed after loading the data

 Even that is a lie though, which I guess is what my problem is.

True; the stuff done after is done that way at least in part for
performance reasons rather than because it has to be done that way.
(I think it's not only performance issues, though --- for circular
FKs you pretty much have to load the data first.)

 I hadn't realized that Simon was using pre-schema and post-schema
 to name the first and third parts.  I'd agree that this is confusing
 nomenclature: it looks like it's trying to say that the data is the
 schema, and the schema is not!  How about pre-data and post-data?

 Argh.  The command-line options follow the 'data'/'load' line
 (--schema-pre-load and --schema-post-load), and so I think those are
 fine.  The problem was that in the documentation he switched to saying
 they were Pre-Schema and Post-Schema, which could lead to confusion.

Ah, I see.  No objection to those switch names, at least assuming we
want to stick to positive-logic switches.  What did you think of the
negative-logic suggestion (--omit-xxx)?

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Stephen Frost
Tom, et al,

* Tom Lane ([EMAIL PROTECTED]) wrote:
 Ah, I see.  No objection to those switch names, at least assuming we
 want to stick to positive-logic switches.  What did you think of the
 negative-logic suggestion (--omit-xxx)?

My preference is for positive-logic switches in general.  The place
where I would use this patch would lend itself to being more options if
--omit- were used.  I expect that would hold true for most people.
It would be:

  --omit-data --omit-post-load
  --omit-pre-load --omit-post-load
  --omit-pre-load --omit-data

vs.

  --schema-pre-load
  --data-only
  --schema-post-load

Point being that I'd be dumping these into seperate files where I could
more easily manipulate the pre-load or post-load files.  I'd still want
pre/post load to be seperate though since this would be used in cases
where there's alot of data (hence the reason for the split) and putting
pre and post together and running them before data would slow things
down quite a bit.

Are there use cases for just --omit-post-load or --omit-pre-load?
Probably, but I just don't see any situation where I'd use them like
that.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
 Are there use cases for just --omit-post-load or --omit-pre-load?

Probably not many.  The thing that's bothering me is the
action-at-a-distance property of the positive-logic switches.
How are we going to explain this?

By default, --schema-pre-load, --data-only, --schema-post-load
are all ON.  But if you turn one of them ON (never mind that
it was already ON by default), that changes the defaults for
the other two to OFF.  Then you have to turn them ON (never
mind that the default for them is ON) if you want two out of
the three categories.

You have to bend your mind into a pretzel to wrap it around this
behavior.  Yeah, it might be convenient once you understand it,
but how long will it take for the savings in typing to repay the
time to understand it and the mistakes along the way?

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump lock timeout

2008-07-21 Thread daveg
On Mon, Jul 21, 2008 at 03:43:11AM -0400, Tom Lane wrote:
 daveg [EMAIL PROTECTED] writes:
  On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote:
  In most cases our policy has been that pg_dumpall should accept and pass
  through any pg_dump option for which it's sensible to do so. I did not
  make that happen but it seems it'd be a reasonable follow-on patch.
 
  I'll remember that next time.
 
 Er .. actually that was a direct request for you to do it.

 
Attached is a the followon patch for pg_dumpall and docs to match pg_dump.

On a second topic, is anyone working on a parallel dump/load? I'd be
interested in helping.

-dg

-- 
David Gould   [EMAIL PROTECTED]  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.
*** a/doc/src/sgml/ref/pg_dumpall.sgml
--- b/doc/src/sgml/ref/pg_dumpall.sgml
***
*** 196,201  PostgreSQL documentation
--- 196,217 
   /varlistentry
  
   varlistentry
+   termoption--lock-wait-timeout=replaceable 
class=parametertimeout/replaceable/option/term
+   listitem
+para
+ Do not wait forever to acquire shared table locks at the beginning of
+ the dump. Instead fail if unable to lock a table within the specified
+ replaceable class=parametertimeout/. The timeout may be
+ specified in any of the formats accepted by commandSET
+ statement_timeout/.  (Allowed values vary depending on the server
+ version you are dumping from, but an integer number of milliseconds
+ is accepted by all versions since 7.3.  This option is ignored when
+ dumping from a pre-7.3 server.)
+/para
+   /listitem
+  /varlistentry
+ 
+  varlistentry
termoption--no-tablespaces/option/term
listitem
 para
*** a/src/bin/pg_dump/pg_dumpall.c
--- b/src/bin/pg_dump/pg_dumpall.c
***
*** 120,125  main(int argc, char *argv[])
--- 120,126 
{disable-triggers, no_argument, disable_triggers, 1},
{no-tablespaces, no_argument, no_tablespaces, 1},
{use-set-session-authorization, no_argument, 
use_setsessauth, 1},
+   {lock-wait-timeout, required_argument, NULL, 2},
  
{NULL, 0, NULL, 0}
};
***
*** 305,310  main(int argc, char *argv[])
--- 306,316 
case 0:
break;
  
+   case 2:
+   appendPQExpBuffer(pgdumpopts,  
--lock-wait-timeout=);
+   appendPQExpBuffer(pgdumpopts, optarg);
+ break;
+ 
default:
fprintf(stderr, _(Try \%s --help\ for more 
information.\n), progname);
exit(1);
***
*** 488,493  help(void)
--- 494,500 
printf(_(  -f, --file=FILENAME  output file name\n));
printf(_(  --help   show this help, then exit\n));
printf(_(  --versionoutput version information, then 
exit\n));
+   printf(_(  --lock-wait-timeout=TIMEOUT fail after waiting TIMEOUT for 
a table lock\n));
printf(_(\nOptions controlling the output content:\n));
printf(_(  -a, --data-only  dump only the data, not the 
schema\n));
printf(_(  -c, --clean  clean (drop) databases prior to 
create\n));

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches