Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Simon Riggs

On Mon, 2008-07-21 at 19:19 -0400, Tom Lane wrote:
> Stephen Frost <[EMAIL PROTECTED]> writes:
> > Are there use cases for just --omit-post-load or --omit-pre-load?
> 
> Probably not many.  The thing that's bothering me is the
> action-at-a-distance property of the positive-logic switches.
> How are we going to explain this?
> 
>   "By default, --schema-pre-load, --data-only, --schema-post-load
>   are all ON.  But if you turn one of them ON (never mind that
>   it was already ON by default), that changes the defaults for
>   the other two to OFF.  Then you have to turn them ON (never
>   mind that the default for them is ON) if you want two out of
>   the three categories."

While I accept your argument a certain amount, --schema-only and
--data-only already behave in the manner you describe. Whether we pick
include or exclude or both, it will make more sense than these existing
options, regrettably. 

With regard to the logic, Insert and COPY also behave this way: if you
mention *any* columns then you only get the ones you mention. We manage
to describe that also. An Insert statement would be very confusing if
you had to list the columns you don't want.

So the --omit options seem OK if you assume we'll never add further
options or include additional SQL in the dump. But that seems an
unreliable prop, so I am inclined towards the inclusive approach.

> You have to bend your mind into a pretzel to wrap it around this
> behavior. 

Perhaps my mind was already toroidally challenged? :-}

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump lock timeout

2008-07-21 Thread daveg
On Mon, Jul 21, 2008 at 03:43:11AM -0400, Tom Lane wrote:
> daveg <[EMAIL PROTECTED]> writes:
> > On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote:
> >> In most cases our policy has been that pg_dumpall should accept and pass
> >> through any pg_dump option for which it's sensible to do so. I did not
> >> make that happen but it seems it'd be a reasonable follow-on patch.
> 
> > I'll remember that next time.
> 
> Er .. actually that was a direct request for you to do it.

 
Attached is a the followon patch for pg_dumpall and docs to match pg_dump.

On a second topic, is anyone working on a parallel dump/load? I'd be
interested in helping.

-dg

-- 
David Gould   [EMAIL PROTECTED]  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.
*** a/doc/src/sgml/ref/pg_dumpall.sgml
--- b/doc/src/sgml/ref/pg_dumpall.sgml
***
*** 196,201  PostgreSQL documentation
--- 196,217 
   
  
   
+   --lock-wait-timeout=timeout
+   
+
+ Do not wait forever to acquire shared table locks at the beginning of
+ the dump. Instead fail if unable to lock a table within the specified
+ timeout. The timeout may be
+ specified in any of the formats accepted by SET
+ statement_timeout.  (Allowed values vary depending on the server
+ version you are dumping from, but an integer number of milliseconds
+ is accepted by all versions since 7.3.  This option is ignored when
+ dumping from a pre-7.3 server.)
+
+   
+  
+ 
+  
--no-tablespaces

 
*** a/src/bin/pg_dump/pg_dumpall.c
--- b/src/bin/pg_dump/pg_dumpall.c
***
*** 120,125  main(int argc, char *argv[])
--- 120,126 
{"disable-triggers", no_argument, &disable_triggers, 1},
{"no-tablespaces", no_argument, &no_tablespaces, 1},
{"use-set-session-authorization", no_argument, 
&use_setsessauth, 1},
+   {"lock-wait-timeout", required_argument, NULL, 2},
  
{NULL, 0, NULL, 0}
};
***
*** 305,310  main(int argc, char *argv[])
--- 306,316 
case 0:
break;
  
+   case 2:
+   appendPQExpBuffer(pgdumpopts, " 
--lock-wait-timeout=");
+   appendPQExpBuffer(pgdumpopts, optarg);
+ break;
+ 
default:
fprintf(stderr, _("Try \"%s --help\" for more 
information.\n"), progname);
exit(1);
***
*** 488,493  help(void)
--- 494,500 
printf(_("  -f, --file=FILENAME  output file name\n"));
printf(_("  --help   show this help, then exit\n"));
printf(_("  --versionoutput version information, then 
exit\n"));
+   printf(_("  --lock-wait-timeout=TIMEOUT fail after waiting TIMEOUT for 
a table lock\n"));
printf(_("\nOptions controlling the output content:\n"));
printf(_("  -a, --data-only  dump only the data, not the 
schema\n"));
printf(_("  -c, --clean  clean (drop) databases prior to 
create\n"));

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Tom Lane
Stephen Frost <[EMAIL PROTECTED]> writes:
> Are there use cases for just --omit-post-load or --omit-pre-load?

Probably not many.  The thing that's bothering me is the
action-at-a-distance property of the positive-logic switches.
How are we going to explain this?

"By default, --schema-pre-load, --data-only, --schema-post-load
are all ON.  But if you turn one of them ON (never mind that
it was already ON by default), that changes the defaults for
the other two to OFF.  Then you have to turn them ON (never
mind that the default for them is ON) if you want two out of
the three categories."

You have to bend your mind into a pretzel to wrap it around this
behavior.  Yeah, it might be convenient once you understand it,
but how long will it take for the savings in typing to repay the
time to understand it and the mistakes along the way?

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Stephen Frost
Tom, et al,

* Tom Lane ([EMAIL PROTECTED]) wrote:
> Ah, I see.  No objection to those switch names, at least assuming we
> want to stick to positive-logic switches.  What did you think of the
> negative-logic suggestion (--omit-xxx)?

My preference is for positive-logic switches in general.  The place
where I would use this patch would lend itself to being more options if
--omit- were used.  I expect that would hold true for most people.
It would be:

  --omit-data --omit-post-load
  --omit-pre-load --omit-post-load
  --omit-pre-load --omit-data

vs.

  --schema-pre-load
  --data-only
  --schema-post-load

Point being that I'd be dumping these into seperate files where I could
more easily manipulate the pre-load or post-load files.  I'd still want
pre/post load to be seperate though since this would be used in cases
where there's alot of data (hence the reason for the split) and putting
pre and post together and running them before data would slow things
down quite a bit.

Are there use cases for just --omit-post-load or --omit-pre-load?
Probably, but I just don't see any situation where I'd use them like
that.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Tom Lane
Stephen Frost <[EMAIL PROTECTED]> writes:
> * Tom Lane ([EMAIL PROTECTED]) wrote:
>> As far as the documentation/definition aspect goes, I think it should
>> just say the parts are
>> * stuff needed before you can load the data
>> * the data
>> * stuff needed after loading the data

> Even that is a lie though, which I guess is what my problem is.

True; the stuff done after is done that way at least in part for
performance reasons rather than because it has to be done that way.
(I think it's not only performance issues, though --- for circular
FKs you pretty much have to load the data first.)

>> I hadn't realized that Simon was using "pre-schema" and "post-schema"
>> to name the first and third parts.  I'd agree that this is confusing
>> nomenclature: it looks like it's trying to say that the data is the
>> schema, and the schema is not!  How about "pre-data and "post-data"?

> Argh.  The command-line options follow the 'data'/'load' line
> (--schema-pre-load and --schema-post-load), and so I think those are
> fine.  The problem was that in the documentation he switched to saying
> they were "Pre-Schema" and "Post-Schema", which could lead to confusion.

Ah, I see.  No objection to those switch names, at least assuming we
want to stick to positive-logic switches.  What did you think of the
negative-logic suggestion (--omit-xxx)?

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Hint Bits and Write I/O

2008-07-21 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Mon, 2008-07-21 at 11:36 +0530, Pavan Deolasee wrote:
>> I think we should try at least one or two possible optimizations and
>> get some numbers before we jump and make substantial changes to the
>> code.

> You know you're suggesting months of tests and further discussion. :-(

I agree with Pavan that we should have something that'd at least serve
as test scaffolding to verify that the framework patch is sane.  The
test code needn't be anything we'd want to commit.  It seems like
largely the same kind of issue as with your stats-hooks patch.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] page macros cleanup (ver 04)

2008-07-21 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> Thanks for applying patch. I think that hash index "upgradebility" is
> currently broken or it will be with new hash index improvement. But if
> I think about it it does not make sense to break compatibility by this
> patch first.

Right, my point exactly.  Those necessarily-incompatible changes might
or might not ever get applied --- if they are, we can do cosmetic
cleanups afterwards.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Stephen Frost
Tom,

* Tom Lane ([EMAIL PROTECTED]) wrote:
> As far as the documentation/definition aspect goes, I think it should
> just say the parts are
>   * stuff needed before you can load the data
>   * the data
>   * stuff needed after loading the data
> and not try to be any more specific than that.  There are corner cases
> that will turn any simple breakdown into a lie, and I doubt that it's
> worth trying to explain them all.  (Take a close look at the dependency
> loop breaking logic in pg_dump if you doubt this.)

Even that is a lie though, which I guess is what my problem is.  It's
really "everything for the schema, except stuff that is better done in
bulk", I believe.  Also, I'm a bit concerned about people who would
argue that you need PKs and FKs before you can load the data.  Probably
couldn't be avoided tho.

> I hadn't realized that Simon was using "pre-schema" and "post-schema"
> to name the first and third parts.  I'd agree that this is confusing
> nomenclature: it looks like it's trying to say that the data is the
> schema, and the schema is not!  How about "pre-data and "post-data"?

Argh.  The command-line options follow the 'data'/'load' line
(--schema-pre-load and --schema-post-load), and so I think those are
fine.  The problem was that in the documentation he switched to saying
they were "Pre-Schema" and "Post-Schema", which could lead to confusion.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Stephen Frost
Simon,

* Simon Riggs ([EMAIL PROTECTED]) wrote:
> > I hadn't realized that Simon was using "pre-schema" and "post-schema"
> > to name the first and third parts.  I'd agree that this is confusing
> > nomenclature: it looks like it's trying to say that the data is the
> > schema, and the schema is not!  How about "pre-data and "post-data"?
> 
> OK by me. Any other takers?

Having the command-line options be "--schema-pre-data" and
"--schema-post-data" is fine with me.  Leaving them the way they are is
also fine by me.  It's the documentation (back to pg_dump.sgml,
~774/~797) that starts talking about Pre-Schema and Post-Schema.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] [HACKERS] Hint Bits and Write I/O

2008-07-21 Thread Simon Riggs

On Mon, 2008-07-21 at 11:36 +0530, Pavan Deolasee wrote:

> I think we should try at least one or two possible optimizations and
> get some numbers before we jump and make substantial changes to the
> code.

You know you're suggesting months of tests and further discussion. :-(

I'll fix the bug, but I'm not doing any more on this now till feature
freeze. It's the wrong time to chase mirages.

Thanks for checking my logic.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] page macros cleanup (ver 04)

2008-07-21 Thread Zdenek Kotala

Tom Lane napsal(a):

"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
...  That macro is actually doing the 
same thing as PageGetContents, so I switched to using that. As that 
moves the data sligthly on those bitmap pages, I guess we'll need a 
catversion bump.


I'm amazed that Zdenek didn't scream bloody murder about that.  You're
creating a work item for in-place-upgrade that would not otherwise
exist, in exchange for a completely trivial bit of code beautification.
(The same can be said of his proposed change to hash meta pages.)


:-) Yeah, These changes break in-place-upgrade on hash indexes and invokes 
reindexing request. I have had several reasons why I didn't complaint about it:


1) IIRC, hash function for double has been change
2) there is ongoing project to improve hash index performance -> completely 
redesigned content
3) hash index is not much used (by my opinion) and it affect only small group of 
users



I'm planning to go over this patch today and apply it sans the parts
that would require catversion bump.  We can argue later about whether
those are really worth doing, but I'm leaning to "not" --- unless Zdenek
says that he has no intention of making in-place-upgrade handle hash
indexes any time soon.


Thanks for applying patch. I think that hash index "upgradebility" is currently 
broken or it will be with new hash index improvement. But if I think about it it 
does not make sense to break compatibility by this patch first. I will prepare 
patch which will be upgrade friendly. And if we will reimplement hash index 
soon, than we can clean a code.



BTW, after further thought about the PageGetContents() situation:
right now we can change it to guarantee maxalignment "for free",
since SizeOfPageHeaderData happens to be maxaligned on all platforms
(this wasn't the case as recently as 8.2).  So I'm thinking we should
do that.  There's at least one place that thinks that PageGetContents
is the same as page + SizeOfPageHeaderData, but that's easily fixed.
On balance it seems like hidden assumptions about alignment are a bigger
risk than assumptions about that offset --- anyone want to argue the
contrary?


I think it is OK and I seen that you already applied a patch.

Thanks Zdenek



--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Simon Riggs

On Mon, 2008-07-21 at 07:46 -0400, Stephen Frost wrote:
> * Simon Riggs ([EMAIL PROTECTED]) wrote:
> > The options split the dump into 3 parts that's all: before the load, the
> > load and after the load.
> > 
> > --schema-pre-load says
> > "Dumps exactly what --schema-only would dump, but only those
> > statements before the data load."
> > 
> > What is it you are suggesting? I'm unclear.
> 
> That part is fine, the problem is that elsewhere in the documentation
> (patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you
> change it to be "objects required before data loading", which isn't the
> same.

OK, gotcha now - will change that. I thought you might mean something
about changing the output itself.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Maybe invert the logic?
>> --omit-pre-data
>> --omit-data
>> --omit-post-data

> Please, no. Negative logic seems likely to cause endless confusion.

I think it might actually be less confusing, because with this approach,
each switch has an identifiable default (no) and setting it doesn't
cause side-effects on settings of other switches.  The interactions of
the switches as Simon presents 'em seem less than obvious.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] Is autovacuum doing a wraparound-avoiding VACUUM?

2008-07-21 Thread Alvaro Herrera
Simon Riggs wrote:

> On Fri, 2008-07-18 at 01:44 -0400, Tom Lane wrote:
> > I agree, this is important for visibility into what's happening.
> > The string isn't getting translated so I don't see any big downside
> > to applying the patch in back branches.
> 
> Patches for 8.3 and CVS HEAD.

Applied, thanks.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Andrew Dunstan



Tom Lane wrote:

Simon Riggs <[EMAIL PROTECTED]> writes:
  

I also suggested having three options
--want-pre-schema
--want-data
--want-post-schema
so we could ask for any or all parts in the one dump. --data-only and
--schema-only are negative options so don't allow this.
(I don't like those names either, just thinking about capabilities)



Maybe invert the logic?

--omit-pre-data
--omit-data
--omit-post-data

Not wedded to these either, just tossing out an idea...


  


Please, no. Negative logic seems likely to cause endless confusion.

I'd even be happier with --schema-part-1 and --schema-part-2 if we can't 
find some more expressive way of designating them.


cheers

andrew

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


[PATCHES] WITH RECUSIVE patches 0721

2008-07-21 Thread Tatsuo Ishii
Hi,

Here is the lastest WITH RECURSIVE patches against 2007/07/17 CVS (CVS
HEAD won't compile for me).

This version includes regression tests and is almost ready for commit
IMO.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


recursive_query.patch.gz
Description: Binary data

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Stephen Frost
* Simon Riggs ([EMAIL PROTECTED]) wrote:
> The options split the dump into 3 parts that's all: before the load, the
> load and after the load.
> 
> --schema-pre-load says
> "Dumps exactly what --schema-only would dump, but only those
> statements before the data load."
> 
> What is it you are suggesting? I'm unclear.

That part is fine, the problem is that elsewhere in the documentation
(patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you
change it to be "objects required before data loading", which isn't the
same.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_dump lock timeout

2008-07-21 Thread Tom Lane
daveg <[EMAIL PROTECTED]> writes:
> On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote:
>> In most cases our policy has been that pg_dumpall should accept and pass
>> through any pg_dump option for which it's sensible to do so. I did not
>> make that happen but it seems it'd be a reasonable follow-on patch.

> I'll remember that next time.

Er .. actually that was a direct request for you to do it.

> Finally, you changed the option value and case from 1 to case 2. getopt_long
> only returns the value argument when there is no flag to set, so 1 was unique
> and could have been read as "the first no-short option with an argument".

Yeah.  The code *worked* as you submitted it, but what was bothering me
was that the "val = 1" table entries worked in two completely different
ways for the different argument types.  I first thought that you'd
broken the existing long argument options --- you hadn't, but I had to
go re-read the getopt_long source to convince myself of that.  So it
seemed like using a different "val" value might help clarify the
difference in behavior for future readers of the code.  In particular
the next guy who wants to add a long option with parameter value would
certainly not be able to use val = 1; but I thought the code as you
had it wouldn't give him any clue what to do.  If you've got a better
idea about how to deal with that, feel free...

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] pg_dump lock timeout

2008-07-21 Thread daveg
On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote:
> daveg <[EMAIL PROTECTED]> writes:
> > Here is an updated version of this patch against head. It builds, runs and
> > functions as expected. I did not build the sgml.
> 
> Applied with mostly minor cosmetic improvements --- the only actual
> error I noticed was failing to check whether the server version supports
> statement_timeout.

I chose not to test backend version on the grounds that getting an explicit
failure for an explicitly requested option would be preferable to it being
silently ignored. However if the user is trying to use the same scripts for
many versions then ignoring unsupported but unessential features may be
preferred.

One of the cosmetic changes made in response to other reviewers was to
not reuse lockquery, instead to have a separate query buffer. You have
reversed that and eliminated lockquery too. Which seems better.

> In most cases our policy has been that pg_dumpall should accept and pass
> through any pg_dump option for which it's sensible to do so. I did not
> make that happen but it seems it'd be a reasonable follow-on patch.

I'll remember that next time.
 
> A minor point is that the syntax "-X lock-wait-timeout=n" or
> "-X lock-wait-timeout n" won't work, although perhaps people used to
> -X might expect it to.  Since we deprecate -X (and don't even document
> it anymore), I thought that making this work would be much more trouble
> than it's worth, but perhaps that's open to argument.

All the other -X options are flags and supported directly by the getopt
machinery. Adding a -X option with an argument would require parsing
the argument by hand including dealing with '=' or ' ' as the separator and
telling getopt you had eaten an extra argument. This seemed a bit too much
code for the value of supporting a deprecated format for a brand new option.

Finally, you changed the option value and case from 1 to case 2. getopt_long()
only returns the value argument when there is no flag to set, so 1 was unique
and could have been read as "the first no-short option with an argument".

Thanks for checking this in.

-dg

-- 
David Gould   [EMAIL PROTECTED]  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches