Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread daveg
On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote:
 On Thu, Jul 28, 2011 at 5:46 PM, daveg da...@sonic.net wrote:
  On Thu, Jul 28, 2011 at 09:46:41AM -0400, Robert Haas wrote:
  On Wed, Jul 27, 2011 at 8:28 PM, daveg da...@sonic.net wrote:
   My client has been seeing regular instances of the following sort of 
   problem:
  On what version of PostgreSQL?
 
  9.0.4.
 
  I previously said:
   This occurs on postgresql 9.0.4. on 32 core 512GB Dell boxes. We have
   identical systems still running 8.4.8 that do not have this issue, so I'm
   assuming it is related to the vacuum full work done for 9.0. Oddly, we 
   don't
   see this on the smaller hosts (8 core, 64GB, slower cpus) running 9.0.4,
   so it may be timing related.
 
 Ah, OK, sorry.  Well, in 9.0, VACUUM FULL is basically CLUSTER, which
 means that a REINDEX is happening as part of the same operation.  In
 9.0, there's no point in doing VACUUM FULL immediately followed by
 REINDEX.  My guess is that this is happening either right around the
 time the VACUUM FULL commits or right around the time the REINDEX
 commits.  It'd be helpful to know which, if you can figure it out.

I'll update my vacuum script to skip reindexes after vacuum full for 9.0
servers and see if that makes the problem go away. Thanks for reminding
me that they are not needed. However, I suspect it is the vacuum, not the
reindex causing the problem. I'll update when I know.
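As a hedged sketch, the version gate described above might look like the following (the `server_version_num` convention follows PostgreSQL's GUC; the helper name is hypothetical, not from daveg's actual script):

```python
# Hypothetical sketch of the vacuum-script change discussed above:
# from 9.0 on, VACUUM FULL rewrites the table *and* its indexes
# (it is basically CLUSTER), so a follow-up REINDEX is redundant.
def needs_reindex_after_vacuum_full(server_version_num: int) -> bool:
    """server_version_num is e.g. 80408 for 8.4.8, 90004 for 9.0.4."""
    return server_version_num < 90000

# Pre-9.0 servers still benefit from REINDEX after VACUUM FULL;
# 9.0+ servers can skip it.
assert needs_reindex_after_vacuum_full(80408) is True
assert needs_reindex_after_vacuum_full(90004) is False
```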

 If there's not a hardware problem causing those read errors, maybe a
 backend is somehow ending up with a stale or invalid relcache entry.
 I'm not sure exactly how that could be happening, though...

It does not appear to be a hardware problem. I also suspect it is a stale
relcache.

-dg
 
-- 
David Gould   da...@sonic.net  510 536 1443  510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XMLATTRIBUTES vs. values of type XML

2011-07-29 Thread Florian Pflug
On Jul28, 2011, at 22:51 , Peter Eisentraut wrote:
 On ons, 2011-07-27 at 23:21 +0200, Florian Pflug wrote:
 On Jul27, 2011, at 23:08 , Peter Eisentraut wrote:
 Well, offhand I would expect that passing an XML value to XMLATTRIBUTES
 would behave as in
 
SELECT XMLELEMENT(NAME t, XMLATTRIBUTES(XMLSERIALIZE(content '&amp;'::XML
AS text) AS a))

With both 9.1 and 9.2 this query returns

     xmlelement
--------------------
 <t a="&amp;amp;"/>

i.e. makes the value of a represent the *literal* string '&amp;', *not*
the literal string '&'. Just to be sure there's no misunderstanding here
- is this what you expect?
 
 Well, I expect it to fail.

Now you've lost me. What exactly should fail under what circumstances?

 What's the workaround if you have a value of type XML, e.g. '&amp;',
 and want to set an attribute to the value represented by that XML fragment
 (i.e. '&')? Since we have no XMLUNESCAPE function, I don't see an easy
 way to do that. Maybe I'm missing something, though.
 
 It may be worth researching whether the XMLSERIALIZE function is
 actually doing what it should, or whether it could/should do some
 unescaping.

I don't see how that could work. It can't unescape anything in e.g.
'<t>&amp;</t>', because the result would be a quite useless not-really-XML
kind of thing. It could unescape '&amp;' but that kind of content-dependent
behaviour seems even worse than my proposed escaping rules for XMLATTRIBUTES.
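The double-escaping behaviour described in this exchange can be illustrated outside SQL with Python's `xml.sax.saxutils` (an analogy for the escaping semantics, not PostgreSQL's implementation):

```python
from xml.sax.saxutils import escape, unescape

# The XML value '&amp;' is a fragment that *represents* the string '&'.
fragment = "&amp;"

# Serializing the fragment to text and then escaping it again (which is
# effectively what happens when the serialized text is fed to an
# attribute) double-escapes the ampersand:
assert escape(fragment) == "&amp;amp;"

# To get an attribute whose value represents '&', the fragment would
# have to be unescaped first -- the missing XMLUNESCAPE step:
assert unescape(fragment) == "&"
```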

 Unfortunately, in the latest SQL/XML standard the final
 answer is nested deep in the three other standards, so I don't have an
 answer right now.  But there are plenty of standards in this area, so
 I'd hope that one of them can give us the right behavior, instead of us
 making something up.

Which standards do you have in mind there? If you can point me to a place
where I can obtain them, I could check if there's something in them
which helps.

best regards,
Florian Pflug




Re: [HACKERS] Check constraints on partition parents only?

2011-07-29 Thread Nikhil Sontakke
Hi,

Any preferences for the name?
 connoinh
 conisonly
 constatic or confixed

 I'd probably pick conisonly from those choices.


The use of \d inside psql will show ONLY constraints without any
embellishments similar to normal constraints. E.g.


ALTER TABLE ONLY a ADD CONSTRAINT achk CHECK (FALSE);

ALTER TABLE a ADD CONSTRAINT bchk CHECK (b > 0);

psql=# \d a
       Table "public.a"
 Column |  Type   | Modifiers
--------+---------+-----------
 b      | integer |
Check constraints:
    "achk" CHECK (false)
    "bchk" CHECK (b > 0)

Is this acceptable? Or we need to put in work into psql to show ONLY
somewhere in the description? If yes, ONLY CHECK sounds weird, maybe
we should use LOCAL CHECK or some such mention:

Check constraints:
    "achk" LOCAL CHECK (false)
    "bchk" CHECK (b > 0)

Regards,
Nikhils



Re: [HACKERS] Check constraints on partition parents only?

2011-07-29 Thread Robert Haas
On Fri, Jul 29, 2011 at 7:41 AM, Nikhil Sontakke nikkh...@gmail.com wrote:
 Hi,

Any preferences for the name?
 connoinh
 conisonly
 constatic or confixed

 I'd probably pick conisonly from those choices.


 The use of \d inside psql will show ONLY constraints without any
 embellishments similar to normal constraints. E.g.


 ALTER TABLE ONLY a ADD CONSTRAINT achk CHECK (FALSE);

 ALTER TABLE a ADD CONSTRAINT bchk CHECK (b > 0);

 psql=# \d a
        Table "public.a"
  Column |  Type   | Modifiers
 --------+---------+-----------
  b      | integer |
 Check constraints:
     "achk" CHECK (false)
     "bchk" CHECK (b > 0)

 Is this acceptable? Or we need to put in work into psql to show ONLY
 somewhere in the description? If yes, ONLY CHECK sounds weird, maybe
 we should use LOCAL CHECK or some such mention:

 Check constraints:
     "achk" LOCAL CHECK (false)
     "bchk" CHECK (b > 0)

I think you need to stick with ONLY.  Using two different words is
just going to create confusion. You could fool around with where
exactly you put it on the line, but switching to a different word
seems like not a good idea.

(Also, don't forget you need to hack pg_dump, too.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Check constraints on partition parents only?

2011-07-29 Thread Nikhil Sontakke
 psql=# \d a
        Table "public.a"
  Column |  Type   | Modifiers
 --------+---------+-----------
  b      | integer |
 Check constraints:
     "achk" CHECK (false)
     "bchk" CHECK (b > 0)

 Is this acceptable? Or we need to put in work into psql to show ONLY
 somewhere in the description? If yes, ONLY CHECK sounds weird, maybe
 we should use LOCAL CHECK or some such mention:

 Check constraints:
     "achk" LOCAL CHECK (false)
     "bchk" CHECK (b > 0)

I think you need to stick with ONLY.  Using two different words is
just going to create confusion. You could fool around with where
exactly you put it on the line, but switching to a different word
seems like not a good idea.

Ok, maybe something like:

achk (ONLY) CHECK (false)

(Also, don't forget you need to hack pg_dump, too.)

Yeah, I have already hacked it a bit. This constraint now needs to be
spit out later as an ALTER command with ONLY attached to it
appropriately. Earlier all CHECK constraints were generally emitted as
part of the table definition itself.
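As a hedged illustration of the ordering change described above (reusing the example table from earlier in the thread; this is a sketch of the intended dump shape, not actual pg_dump output):

```sql
-- Emitted as part of the table definition, as before:
CREATE TABLE a (
    b integer,
    CONSTRAINT bchk CHECK (b > 0)
);

-- Emitted afterwards, so the ONLY attribute can be attached
-- and the constraint is not inherited by child tables:
ALTER TABLE ONLY a ADD CONSTRAINT achk CHECK (false);
```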

Regards,
Nikhils



Re: [HACKERS] Check constraints on partition parents only?

2011-07-29 Thread Robert Haas
On Fri, Jul 29, 2011 at 8:30 AM, Nikhil Sontakke nikkh...@gmail.com wrote:
 Yeah, I have already hacked it a bit. This constraint now needs to be
 spit out later as an ALTER command with ONLY attached to it
 appropriately. Earlier all CHECK constraints were generally emitted as
 part of the table definition itself.

Hrm.  That doesn't seem so good.  Maybe we've got the design wrong
here.  It doesn't seem like we want to lose the ability to define
arbitrary constraints at table-creation time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Check constraints on partition parents only?

2011-07-29 Thread Tom Lane
Nikhil Sontakke nikkh...@gmail.com writes:
 (Also, don't forget you need to hack pg_dump, too.)

 Yeah, I have already hacked it a bit. This constraint now needs to be
 spit out later as an ALTER command with ONLY attached to it
 appropriately. Earlier all CHECK constraints were generally emitted as
 part of the table definition itself.

IIRC, there's already support for splitting out a constraint that way,
in order to deal with circular dependencies.  You just need to treat
this as an additional reason for splitting.

regards, tom lane



Re: [HACKERS] Check constraints on partition parents only?

2011-07-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Jul 29, 2011 at 8:30 AM, Nikhil Sontakke nikkh...@gmail.com wrote:
 Yeah, I have already hacked it a bit. This constraint now needs to be
 spit out later as an ALTER command with ONLY attached to it
 appropriately. Earlier all CHECK constraints were generally emitted as
 part of the table definition itself.

 Hrm.  That doesn't seem so good.  Maybe we've got the design wrong
 here.  It doesn't seem like we want to lose the ability to define
 arbitrary constraints at table-creation time.

Well, you can't define arbitrary indexes within the CREATE TABLE syntax,
either.  This does not bother me a lot.

We could imagine doing something like CHECK ONLY (foo), but that seems
quite non-orthogonal with (a) everything else in CREATE TABLE, and
(b) ALTER TABLE ONLY.

regards, tom lane



Re: [HACKERS] Check constraints on partition parents only?

2011-07-29 Thread Nikhil Sontakke
  Yeah, I have already hacked it a bit. This constraint now needs to be
  spit out later as an ALTER command with ONLY attached to it
  appropriately. Earlier all CHECK constraints were generally emitted as
  part of the table definition itself.

 Hrm.  That doesn't seem so good.  Maybe we've got the design wrong
 here.  It doesn't seem like we want to lose the ability to define
 arbitrary constraints at table-creation time.


Well, the handling now differs only for ONLY constraints. Normal
constraints can still be attached at table-creation time.

Regards,
Nikhils


Re: [HACKERS] Check constraints on partition parents only?

2011-07-29 Thread Nikhil Sontakke
  Yeah, I have already hacked it a bit. This constraint now needs to be
  spit out later as an ALTER command with ONLY attached to it
  appropriately. Earlier all CHECK constraints were generally emitted as
  part of the table definition itself.

 IIRC, there's already support for splitting out a constraint that way,
 in order to deal with circular dependencies.  You just need to treat
 this as an additional reason for splitting.


Yeah, I have indeed followed the existing separate printing logic for ONLY
constraints. Had to make the table dependent on this constraint to print the
constraint *after* the table definition.

Regards,
Nikhils


Re: [HACKERS] cheaper snapshots

2011-07-29 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 (4)  We communicate acceptable snapshots to the replica to make
 the order of visibility match the master even when
 that doesn't match the order that transactions returned from
 commit.
 
  I (predictably) like (4) -- even though it's a lot of work
 
 I think that (4), beyond being a lot of work, will also have
 pretty terrible performance.  You're basically talking about
 emitting two WAL records for every commit instead of one.
 
Well, I can think of a great many other ways this could be done,
each with its own trade-offs of various types of overhead against
how close the replica is to current.  At one extreme you could do
what you describe, at the other you could generate a new snapshot on
the replica once every few minutes.
 
Then there are more clever ways; in discussions a few months ago I
suggested that adding two new bit flags to the commit record would
suffice, and I don't remember anyone blowing holes in that idea.  Of
course, that was to achieve serializable behavior on the replica,
based on some assumption that the current hot standby already
supported repeatable read.  We might need another bit or two to
solve the problems with that which have surfaced on this thread.
 
-Kevin



Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread Tom Lane
daveg da...@sonic.net writes:
 On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote:
 Ah, OK, sorry.  Well, in 9.0, VACUUM FULL is basically CLUSTER, which
 means that a REINDEX is happening as part of the same operation.  In
 9.0, there's no point in doing VACUUM FULL immediately followed by
 REINDEX.  My guess is that this is happening either right around the
 time the VACUUM FULL commits or right around the time the REINDEX
 commits.  It'd be helpful to know which, if you can figure it out.

 I'll update my vacuum script to skip reindexes after vacuum full for 9.0
 servers and see if that makes the problem go away.

The thing that was bizarre about the one instance in the buildfarm was
that the error was persistent, ie, once a session had failed all its
subsequent attempts to access pg_class failed too.  I gather from Dave's
description that it's working that way for him too.  I can think of ways
that there might be a transient race condition against a REINDEX, but
it's very unclear why the failure would persist across multiple
attempts.  The best idea I can come up with is that the session has
somehow cached a wrong commit status for the reindexing transaction,
causing it to believe that both old and new copies of the index's
pg_class row are dead ... but how could that happen?  The underlying
state in the catalog is not wrong, because no concurrent sessions are
upset (at least not in the buildfarm case ... Dave, do you see more than
one session doing this at a time?).

regards, tom lane



Re: [HACKERS] Check constraints on partition parents only?

2011-07-29 Thread Nikhil Sontakke
 We could imagine doing something like CHECK ONLY (foo), but that seems
 quite non-orthogonal with (a) everything else in CREATE TABLE, and
 (b) ALTER TABLE ONLY.


Yeah, I thought about CHECK ONLY support as part of the table definition,
but as you say - it appears to be too non-standard right now, and we can
always go back to this later if need be.

Regards,
Nikhils


[HACKERS] Incremental checkpoints

2011-07-29 Thread jordani
Hi,
I have read all the information about checkpoints in PostgreSQL that I
could find. I think the current implementation of checkpoints is not good
for a huge shared buffer cache with many WAL segments. If there are more
buffers, and if buffers can be written more rarely, more updates to each
buffer can be combined, so the total number of writes to disk will be
significantly lower. I think incremental checkpoints can achieve this goal
(maybe more), and the price is additional memory (about 1/1000 of the size
of the buffer cache).

My main source of information is
http://wiki.postgresql.org/wiki/User:Gsmith#How_do_checkpoints_happen_inside_the_PostgreSQL_backend.3F
I see that some data is required to be written into the WAL in steps 3) and
6). I will use CD to denote that data, and P1, P2, ... to denote pages that
are dirty and have to be written to disk in step 4).

In an incremental checkpoint, when a WAL segment has been written we do not
start writing buffers; instead we add the pages P1, P2, ... and CD to a
queue. If the background writer meanwhile has to clean some page, that page
is removed from the queue. When checkpoint_segments have been written to
the transaction log, the queue contains:
P1, P2 ... CD, Pi ... CD, Pj ... CD ...
Here we have to perform a checkpoint in order to free the first WAL
segment. Only the pages before the first CD have to be written and fsync'd.

I suppose this task can be done in the background writer, which could make
some number of writes per round, both LRU and checkpoint writes. There is
no deadline for each incremental checkpoint, but if the WAL is growing, the
total number of writes has to increase. Also, it is not required to do a
checkpoint for each WAL segment: it is possible to write N pages from the
queue and to combine several potential checkpoints into one.

I hope I have explained the general idea. I am not a C programmer, so it is
hard for me to give more details.
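The queueing scheme described above can be sketched as a toy model (page names and the CD marker object are illustrative, not PostgreSQL data structures):

```python
from collections import deque

class IncrementalCheckpointQueue:
    """Toy model of the proposal: dirty pages are appended, each WAL
    segment boundary appends a CD marker, and an incremental checkpoint
    flushes only the pages queued before the first CD."""
    CD = object()  # marker standing in for the checkpoint data

    def __init__(self):
        self.q = deque()

    def add_page(self, page):
        if page not in self.q:          # combine repeated updates
            self.q.append(page)

    def segment_written(self):
        self.q.append(self.CD)          # WAL segment boundary

    def bgwriter_cleaned(self, page):
        # a page cleaned by the background writer leaves the queue
        try:
            self.q.remove(page)
        except ValueError:
            pass

    def incremental_checkpoint(self):
        # write and fsync only pages queued before the first CD marker
        flushed = []
        while self.q:
            item = self.q.popleft()
            if item is self.CD:
                break
            flushed.append(item)
        return flushed

q = IncrementalCheckpointQueue()
q.add_page("P1"); q.add_page("P2"); q.segment_written()
q.add_page("P3"); q.segment_written()
q.bgwriter_cleaned("P2")                    # already cleaned, drops out
assert q.incremental_checkpoint() == ["P1"]
assert q.incremental_checkpoint() == ["P3"]
```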

Jordan Ivanov




Re: [HACKERS] per-column FDW options, v5

2011-07-29 Thread Robert Haas
2011/7/29 Shigeru Hanada shigeru.han...@gmail.com:
 Here is a rebased version of per-column FDW options patch.  I've
 proposed this patch in last CF, but it was marked as returned with
 feedback.  So I would like to propose in next CF 2011-09.  I already
 moved CF item into new topic SQL/MED of CF 2011-09.

I did a quick review of this patch and it looks good to me, modulo
some quibbles with the wording of the documentation and comments and a
couple of minor stylistic nitpicks.  Barring objections, I'll fix this
up a bit and commit it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] cheaper snapshots

2011-07-29 Thread Hannu Krosing
On Thu, 2011-07-28 at 20:14 -0400, Robert Haas wrote:
 On Thu, Jul 28, 2011 at 7:54 PM, Ants Aasma ants.aa...@eesti.ee wrote:
  On Thu, Jul 28, 2011 at 11:54 PM, Kevin Grittner
  kevin.gritt...@wicourts.gov wrote:
  (4)  We communicate acceptable snapshots to the replica to make the
  order of visibility match the master even when that
  doesn't match the order that transactions returned from commit.
 
  I wonder if some interpretation of 2 phase commit could make Robert's
  original suggestion implement this.
 
  On the master the commit sequence would look something like:
  1. Insert commit record to the WAL
  2. Wait for replication
  3. Get a commit seq nr and mark XIDs visible
  4. WAL log the seq nr
  5. Return success to client
 
  When replaying:
  * When replaying commit record, do everything but make
   the tx visible.
  * When replaying the commit sequence number
 if there is a gap between last visible commit and current:
   insert the commit sequence nr. to list of waiting commits.
 else:
   mark current and all directly following waiting tx's visible
 
  This would give consistent visibility order on master and slave. Robert
  is right that this would undesirably increase WAL traffic. Delaying this
  traffic would undesirably increase replay lag between master and slave.
  But it seems to me that this could be an optional WAL level on top of
  hot_standby that would only be enabled if consistent visibility on
  slaves is desired.
 
 I think you nailed it.

Agreed, this would keep current semantics on master and same visibility
order on master and slave.
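The replay rule quoted above can be sketched as a toy model (commit-sequence numbers stand in for the WAL records; names are illustrative):

```python
# Sketch of slave-side replay: a commit becomes visible only when its
# commit sequence number arrives, and a gap holds later numbers back
# until the missing ones show up, preserving the master's order.
def make_replayer():
    state = {"last_visible": 0, "waiting": set(), "visible": []}

    def replay_seq(seq):
        if seq == state["last_visible"] + 1:
            state["last_visible"] = seq
            state["visible"].append(seq)
            # drain any directly following waiting transactions
            while state["last_visible"] + 1 in state["waiting"]:
                state["last_visible"] += 1
                state["waiting"].remove(state["last_visible"])
                state["visible"].append(state["last_visible"])
        else:
            state["waiting"].add(seq)   # gap: park it
        return state["visible"]

    return replay_seq

replay = make_replayer()
replay(1)                        # 1 becomes visible
replay(3)                        # gap: 3 waits for 2
assert replay(2) == [1, 2, 3]    # 2 arrives, 3 drains in order
```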

 An additional point to think about: if we were willing to insist on
 streaming replication, we could send the commit sequence numbers via a
 side channel rather than writing them to WAL, which would be a lot
 cheaper. 

Why do you think that a side channel is cheaper than the main WAL?

How would you handle synchronising the two?

 That might even be a reasonable thing to do, because if
 you're doing log shipping, this is all going to be super-not-real-time
 anyway. 

But perhaps you still may want to preserve visibility order to be able
to do PITR to exact transaction commit, no ?

  OTOH, I know we don't want to make WAL shipping anything less
 than a first class citizen, so maybe not.
 
 At any rate, we may be getting a little sidetracked here from the
 original point of the thread, which was how to make snapshot-taking
 cheaper.  Maybe there's some tie-in to when transactions become
 visible, but I think it's pretty weak.  The existing system could be
 hacked up to avoid making transactions visible out of LSN order, and
 the system I proposed could make them visible either in LSN order or
 do the same thing we do now.  They are basically independent problems,
 AFAICS.

Agreed.


-- 
---
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/




Re: [HACKERS] cheaper snapshots

2011-07-29 Thread Robert Haas
On Fri, Jul 29, 2011 at 10:20 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
 An additional point to think about: if we were willing to insist on
 streaming replication, we could send the commit sequence numbers via a
 side channel rather than writing them to WAL, which would be a lot
 cheaper.

 Why do you think that side channel is cheaper than main WAL ?

You don't have to flush it to disk, and you can use some other lock
that isn't as highly contended as WALInsertLock to synchronize it.

 That might even be a reasonable thing to do, because if
 you're doing log shipping, this is all going to be super-not-real-time
 anyway.

 But perhaps you still may want to preserve visibility order to be able
 to do PITR to exact transaction commit, no ?

Maybe.  In practice, I suspect most people won't be willing to pay the
price a feature like this would exact.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] cheaper snapshots

2011-07-29 Thread Hannu Krosing
On Fri, 2011-07-29 at 10:23 -0400, Robert Haas wrote:
 On Fri, Jul 29, 2011 at 10:20 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
  An additional point to think about: if we were willing to insist on
  streaming replication, we could send the commit sequence numbers via a
  side channel rather than writing them to WAL, which would be a lot
  cheaper.
 
  Why do you think that side channel is cheaper than main WAL ?
 
 You don't have to flush it to disk, 

You can probably write the "I became visible" WAL record without forcing
a flush and still get the same visibility order.

 and you can use some other lock
 that isn't as highly contended as WALInsertLock to synchronize it.

but you will need to synchronise it with WAL replay on slave anyway. It
seems easiest to just insert it in the WAL stream and be done with it. 

  That might even be a reasonable thing to do, because if
  you're doing log shipping, this is all going to be super-not-real-time
  anyway.
 
  But perhaps you still may want to preserve visibility order to be able
  to do PITR to exact transaction commit, no ?
 
 Maybe.  In practice, I suspect most people won't be willing to pay the
 price a feature like this would exact.

Unless we find some really bad problems with different visibility orders
on master and slave(s) you are probably right.

-- 
---
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/




Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread Robert Haas
On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 daveg da...@sonic.net writes:
 On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote:
 Ah, OK, sorry.  Well, in 9.0, VACUUM FULL is basically CLUSTER, which
 means that a REINDEX is happening as part of the same operation.  In
 9.0, there's no point in doing VACUUM FULL immediately followed by
 REINDEX.  My guess is that this is happening either right around the
 time the VACUUM FULL commits or right around the time the REINDEX
 commits.  It'd be helpful to know which, if you can figure it out.

 I'll update my vacuum script to skip reindexes after vacuum full for 9.0
 servers and see if that makes the problem go away.

 The thing that was bizarre about the one instance in the buildfarm was
 that the error was persistent, ie, once a session had failed all its
 subsequent attempts to access pg_class failed too.  I gather from Dave's
 description that it's working that way for him too.  I can think of ways
 that there might be a transient race condition against a REINDEX, but
 it's very unclear why the failure would persist across multiple
 attempts.  The best idea I can come up with is that the session has
 somehow cached a wrong commit status for the reindexing transaction,
 causing it to believe that both old and new copies of the index's
 pg_class row are dead ... but how could that happen?  The underlying
 state in the catalog is not wrong, because no concurrent sessions are
 upset (at least not in the buildfarm case ... Dave, do you see more than
 one session doing this at a time?).

I was thinking more along the lines of a failure while processing a
sinval message emitted by the REINDEX.  The sinval message doesn't get
fully processed and therefore we get confused about what the
relfilenode is for pg_class.  If that happened for any other relation,
we could recover by scanning pg_class.  But if it happens for pg_class
or pg_class_oid_index, we're toast, because we can't scan them without
knowing what relfilenode to open.

Now that can't be quite right, because of course those are mapped
relations... and I'm pretty sure there are some other holes in my line
of reasoning too.  Just thinking out loud...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] USECS_* constants undefined with float8 timestamps?

2011-07-29 Thread Johann 'Myrkraverk' Oskarsson
Hi all,

I just noticed that the USECS_* constants are not defined when the server
is compiled without integer dates and timestamps.

Explicitly, timestamp.h is

#ifdef HAVE_INT64_TIMESTAMP
#define USECS_PER_DAY   INT64CONST(86400000000)
#define USECS_PER_HOUR  INT64CONST(3600000000)
#define USECS_PER_MINUTE INT64CONST(60000000)
#define USECS_PER_SEC   INT64CONST(1000000)
#endif

Is there a particular reason for this?  Even with float8 timestamps
there are uses for these constants in extensions.


-- 
   Johann Oskarsson           http://www.2ndquadrant.com/          |[]
   PostgreSQL Development, 24x7 Support, Training and Services  --+--
  |
   Blog: http://my.opera.com/myrkraverk/blog/




Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The thing that was bizarre about the one instance in the buildfarm was
 that the error was persistent, ie, once a session had failed all its
 subsequent attempts to access pg_class failed too.

 I was thinking more along the lines of a failure while processing a
 sinval message emitted by the REINDEX.  The sinval message doesn't get
 fully processed and therefore we get confused about what the
 relfilenode is for pg_class.  If that happened for any other relation,
 we could recover by scanning pg_class.  But if it happens for pg_class
 or pg_class_oid_index, we're toast, because we can't scan them without
 knowing what relfilenode to open.

Well, no, because the ScanPgRelation call is not failing internally.
It's performing a seqscan of pg_class and not finding a matching tuple.

You could hypothesize about maybe an sinval message got missed leading
us to scan the old (pre-VAC-FULL) copy of pg_class, but that still
wouldn't explain how come we can't find a valid-looking entry for
pg_class_oid_index in it.

Tis a puzzlement.

regards, tom lane



Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread Robert Haas
On Fri, Jul 29, 2011 at 11:27 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The thing that was bizarre about the one instance in the buildfarm was
 that the error was persistent, ie, once a session had failed all its
 subsequent attempts to access pg_class failed too.

 I was thinking more along the lines of a failure while processing a
 sinval message emitted by the REINDEX.  The sinval message doesn't get
 fully processed and therefore we get confused about what the
 relfilenode is for pg_class.  If that happened for any other relation,
 we could recover by scanning pg_class.  But if it happens for pg_class
 or pg_class_oid_index, we're toast, because we can't scan them without
 knowing what relfilenode to open.

 Well, no, because the ScanPgRelation call is not failing internally.
 It's performing a seqscan of pg_class and not finding a matching tuple.

SnapshotNow race?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Jul 29, 2011 at 11:27 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, no, because the ScanPgRelation call is not failing internally.
 It's performing a seqscan of pg_class and not finding a matching tuple.

 SnapshotNow race?

That's what I would have guessed to start with, except that then it
wouldn't be a persistent failure.  You'd get one, but then the next
query would succeed.

regards, tom lane



[HACKERS] [RFC] Common object property boards

2011-07-29 Thread Kohei Kaigai
Robert Haas wrote:
| I think that get_object_namespace() needs to be rethought.  If you
| take a look at AlterObjectNamespace() and its callers, you'll notice
| that we already have, encoded in those call sites, the knowledge of
| which object types can be looked up in which system caches, and which
| columns within the resulting heap tuples will contain the schema OIDs.
|  Here, we're essentially duplicating that information in another
| place, which doesn't seem good.  I think perhaps we should create a
| big static array, each row of which would contain:
| 
| - ObjectType
| - system cache ID for OID lookups
| - system catalog table OID for scans
| - attribute number for the name attribute, where applicable (see
|   AlterObjectNamespace)
| - attribute number for the namespace attribute
| - attribute number for the owner attribute
| - ...and maybe some other properties
| 
| We could then create a function (in objectaddress.c, probably) that
| you call with an ObjectType argument, and it returns a pointer to the
| appropriate struct entry, or calls elog() if none is found.  This
| function could be used by the existing object_exists() functions in
| lieu of having its own giant switch statement, by
| AlterObjectNamespace(), and by this new consolidated DROP code.  If
| you agree with this approach, we should probably make it a separate
| patch.
| 

According to the suggestion from Robert, I'd like to see opinions about
this feature prior to its implementation.

In addition to this suggestion, I think the big static array should also
contain the following items:
- Text form of the object type (e.g., table, function, ...)
- A validator function pointer, which raises an error when, for example,
  the relkind is not RELKIND_RELATION for an ObjectType of OBJECT_TABLE.

Should the main lookup function return an entry of the big array?
If so, the definition of the structure should be declared in objectaddress.h,
as follows:

  ObjectProperty get_object_property(ObjectType objtype);

In addition to the big array indexed by ObjectType, I'd like to have
a utility function that translates an ObjectAddress to an ObjectType.
(Some of these need a syscache lookup, because pg_class contains several
 object types: table, view, ...)

  ObjectType get_object_type(const ObjectAddress *address);

If and when we have both the function to reference ObjectProperty and
the one to translate an ObjectAddress to an ObjectType, it will be quite
easy to implement service routines that look up the namespaceId, ownerId,
and other properties for the supplied ObjectAddress, as follows:

  Oid get_object_namespace(const ObjectAddress *address);
  Oid get_object_ownership(const ObjectAddress *address);
  char *get_object_name(const ObjectAddress *address);
  HeapTuple get_object_tuple(const ObjectAddress *address);

And a translation from ObjectType to a type name (e.g., table, type, ...)
would be helpful for generating error messages.

  const char *get_object_type_name(ObjectType objtype);
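
As a standalone sketch only: the proposed array and lookup function might look
roughly like this, where the enum, struct fields, and all numeric values are
illustrative stand-ins rather than the real parsenodes.h/syscache constants,
and the real lookup would call elog(ERROR) instead of returning NULL.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real code would use ObjectType from
 * parsenodes.h plus actual syscache IDs and catalog OIDs. */
typedef enum { OBJECT_TABLE, OBJECT_FUNCTION, OBJECT_TYPE } ObjectType;

typedef struct ObjectPropertyType
{
	ObjectType	objtype;			/* object type this row describes */
	int			cache_id;			/* syscache ID for OID lookups */
	unsigned	catalog_oid;		/* system catalog to scan */
	int			attnum_name;		/* column holding the object name */
	int			attnum_namespace;	/* column holding the namespace OID */
	int			attnum_owner;		/* column holding the owner OID */
} ObjectPropertyType;

/* The big static array; the numbers here are made up for illustration. */
static const ObjectPropertyType ObjectProperty[] =
{
	{ OBJECT_TABLE,    1, 1259, 1, 2, 3 },
	{ OBJECT_FUNCTION, 2, 1255, 1, 2, 4 },
	{ OBJECT_TYPE,     3, 1247, 1, 2, 5 },
};

/* Find the property row for objtype; the real version would raise
 * an error via elog() rather than return NULL. */
static const ObjectPropertyType *
get_object_property(ObjectType objtype)
{
	size_t		i;

	for (i = 0; i < sizeof(ObjectProperty) / sizeof(ObjectProperty[0]); i++)
		if (ObjectProperty[i].objtype == objtype)
			return &ObjectProperty[i];
	return NULL;
}
```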

Any comments are welcome.

Thanks,
--
NEC Europe Ltd, SAP Global Competence Center
KaiGai Kohei kohei.kai...@emea.nec.com



[HACKERS] Re: OT: OFF TOPIC: returning multiple result sets from a stored procedure

2011-07-29 Thread Matt Keranen
I honestly do not mean any offence, just out of curiosity.
If you guys care about money and time why would you spend the best years of 
your life basically copying commercial products for free?

1) For the same reasons commercial vendors build competing products: different 
tools in the same category fit different scenarios

2) The right-tool-for-the-job concept: sometimes a commercial solution is the 
right choice, sometimes an open-source solution is. Budgets, applicability, and 
local expertise should all factor into the platform selection decision.

3) Because sometimes the output of a platform is more important than the 
platform itself: if your role is to work with data, not to produce software, it 
can in the end be more efficient to have a platform you can modify as needed.

Re: [HACKERS] [RFC] Common object property boards

2011-07-29 Thread Tom Lane
Kohei Kaigai kohei.kai...@emea.nec.com writes:
 In addition to this suggestion, I think the big static array also contains
 the following items:
 - Text form of the object type (e.g, table, function, ...)

What will you do with that that wouldn't be better done by calling
getObjectDescription?  The latter's output is somewhat localizable, but
individual words would be hard to translate.

 Does the main lookup function ought to return an entry of the big array?
 If so, the definition of structure should be declared in objectaddress.h,
 as follows:

It would likely be better to not expose the struct type, just individual
lookup functions.

 And, a translation from ObjectType to type name (e.g table, type, ...)
 is helpful to generate error messages.

   const char *get_object_type_name(ObjectType objtype);

Again, I think this is too low level because of message translation
considerations.

regards, tom lane



Re: [HACKERS] Check constraints on partition parents only?

2011-07-29 Thread Nikhil Sontakke
Hi all,

PFA a patch which implements non-inheritable ONLY constraints. This
has been achieved by introducing a new column, conisonly, in the
pg_constraint catalog. Specifying ONLY in the ALTER TABLE ... ADD
CONSTRAINT ... CHECK command sets this new column to true. Constraints
which have this column set to true can never be inherited by present
or future children.

The psql and pg_dump binaries have been modified to account for such
persistent non-inheritable check constraints. This patch also has
documentation changes along with relevant changes to the test cases.
The regression runs pass fine with this patch applied.

Comments and further feedback, if any, appreciated.

Regards,
Nikhils
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 5e5f8a7..683ad67 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1995,6 +1995,16 @@
  </row>
 
  <row>
+  <entry><structfield>conisonly</structfield></entry>
+  <entry><type>bool</type></entry>
+  <entry></entry>
+  <entry>
+   This constraint is defined locally for the relation.  It is a
+   non-inheritable constraint.
+  </entry>
+ </row>
+
+ <row>
   <entry><structfield>conkey</structfield></entry>
   <entry><type>int2[]</type></entry>
   <entry><literal><link linkend="catalog-pg-attribute"><structname>pg_attribute</structname></link>.attnum</literal></entry>
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 4c2a4cd..3ee3ec0 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -984,6 +984,14 @@ ALTER TABLE distributors ADD CONSTRAINT zipchk CHECK (char_length(zipcode) = 5);
   </para>
 
   <para>
+   To add a check constraint only to a table and not to its children:
+<programlisting>
+ALTER TABLE ONLY distributors ADD CONSTRAINT zipchk CHECK (char_length(zipcode) = 5);
+</programlisting>
+   (The check constraint will not be inherited by future children, either.)
+  </para>
+
+  <para>
    To remove a check constraint from a table and all its children:
 <programlisting>
 ALTER TABLE distributors DROP CONSTRAINT zipchk;
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 4399493..1b382b8 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -98,10 +98,10 @@ static Oid AddNewRelationType(const char *typeName,
 					Oid new_array_type);
 static void RelationRemoveInheritance(Oid relid);
 static void StoreRelCheck(Relation rel, char *ccname, Node *expr,
-			  bool is_validated, bool is_local, int inhcount);
+			  bool is_validated, bool is_local, int inhcount, bool is_only);
 static void StoreConstraints(Relation rel, List *cooked_constraints);
 static bool MergeWithExistingConstraint(Relation rel, char *ccname, Node *expr,
-						bool allow_merge, bool is_local);
+						bool allow_merge, bool is_local, bool is_only);
 static void SetRelationNumChecks(Relation rel, int numchecks);
 static Node *cookConstraint(ParseState *pstate,
 				Node *raw_constraint,
@@ -1860,7 +1860,7 @@ StoreAttrDefault(Relation rel, AttrNumber attnum, Node *expr)
  */
 static void
 StoreRelCheck(Relation rel, char *ccname, Node *expr,
-			  bool is_validated, bool is_local, int inhcount)
+			  bool is_validated, bool is_local, int inhcount, bool is_only)
 {
 	char	   *ccbin;
 	char	   *ccsrc;
@@ -1943,7 +1943,8 @@ StoreRelCheck(Relation rel, char *ccname, Node *expr,
 						  ccbin,	/* Binary form of check constraint */
 						  ccsrc,	/* Source form of check constraint */
 						  is_local,	/* conislocal */
-						  inhcount);	/* coninhcount */
+						  inhcount,	/* coninhcount */
+						  is_only);	/* conisonly */
 
 	pfree(ccbin);
 	pfree(ccsrc);
@@ -1984,7 +1985,7 @@ StoreConstraints(Relation rel, List *cooked_constraints)
 				break;
 			case CONSTR_CHECK:
 				StoreRelCheck(rel, con->name, con->expr, !con->skip_validation,
-							  con->is_local, con->inhcount);
+							  con->is_local, con->inhcount, con->is_only);
 				numchecks++;
 				break;
 			default:
@@ -2100,6 +2101,7 @@ AddRelationNewConstraints(Relation rel,
 		cooked->skip_validation = false;
 		cooked->is_local = is_local;
 		cooked->inhcount =
Re: [HACKERS] [RFC] Common object property boards

2011-07-29 Thread Robert Haas
On Fri, Jul 29, 2011 at 1:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 It would likely be better to not expose the struct type, just individual
 lookup functions.

I'm not sure about that...  I think that's just going to introduce a
lot of excess notation.

 And, a translation from ObjectType to type name (e.g table, type, ...)
 is helpful to generate error messages.

   const char *get_object_type_name(ObjectType objtype);

 Again, I think this is too low level because of message translation
 considerations.

On this point I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] include host names in hba error messages

2011-07-29 Thread Peter Eisentraut
On tis, 2011-07-19 at 14:17 -0400, Robert Haas wrote:
 I think it would be less confusing to write the IP address as the main
 piece of information, and put the hostname in parentheses only if we
 accepted it as valid (i.e. we did both lookups, and everything
 matched).
 
 ERROR: no pg_hba.conf entry for host 127.0.0.1 (localhost), user
 x, database y
 
 As for the case where the forward lookup and reverse lookup don't
 match, could we add that as a DETAIL?
 
 ERROR: no pg_hba.conf entry for host 127.0.0.1, user x, database y
 DETAIL: Forward and reverse DNS lookups do not match.

On further reflection, the only way we would get a fully matched host
name is if there actually were a line in pg_hba.conf with that host
name which didn't match because of other parameters.  That would be
quite rare, and so the error message would look one way or the other
depending on obscure circumstances, which would be confusing.

But picking up on your second suggestion, I propose instead that we put
a note in the detail about the host name and what we know about it, if
we know it, e.g.

ERROR: no pg_hba.conf entry for host 127.0.0.1, user x, database y
DETAIL: Client IP address resolved to localhost, forward lookup matches.

I chose to use errdetail_log(), which only goes into the server log, so
we don't expose too much about the server's DNS setup to the client.

diff --git i/src/backend/libpq/auth.c w/src/backend/libpq/auth.c
index d153880..1b6399d 100644
--- i/src/backend/libpq/auth.c
+++ w/src/backend/libpq/auth.c
@@ -439,6 +439,17 @@ ClientAuthentication(Port *port)
 						   NULL, 0,
 						   NI_NUMERICHOST);
 
+#define HOSTNAME_LOOKUP_DETAIL(port) \
+		(port->remote_hostname \
+		 ? (port->remote_hostname_resolv == +1 \
+			? errdetail_log("Client IP address resolved to \"%s\", forward lookup matches.", port->remote_hostname) \
+			: (port->remote_hostname_resolv == 0 \
+			   ? errdetail_log("Client IP address resolved to \"%s\", forward lookup not checked.", port->remote_hostname) \
+			   : (port->remote_hostname_resolv == -1 \
+				  ? errdetail_log("Client IP address resolved to \"%s\", forward lookup does not match.", port->remote_hostname) \
+				  : 0))) \
+		 : 0)
+
 			if (am_walsender)
 			{
 #ifdef USE_SSL
@@ -446,12 +457,14 @@ ClientAuthentication(Port *port)
 				   (errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
 					errmsg("no pg_hba.conf entry for replication connection from host \"%s\", user \"%s\", %s",
 						   hostinfo, port->user_name,
-						   port->ssl ? _("SSL on") : _("SSL off"))));
+						   port->ssl ? _("SSL on") : _("SSL off")),
+					HOSTNAME_LOOKUP_DETAIL(port)));
 #else
 				ereport(FATAL,
 				   (errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
 					errmsg("no pg_hba.conf entry for replication connection from host \"%s\", user \"%s\"",
-						   hostinfo, port->user_name)));
+						   hostinfo, port->user_name),
+					HOSTNAME_LOOKUP_DETAIL(port)));
 #endif
 			}
 			else
@@ -462,13 +475,15 @@ ClientAuthentication(Port *port)
 					errmsg("no pg_hba.conf entry for host \"%s\", user \"%s\", database \"%s\", %s",
 						   hostinfo, port->user_name,
 						   port->database_name,
-						   port->ssl ? _("SSL on") : _("SSL off"))));
+						   port->ssl ? _("SSL on") : _("SSL off")),
+					HOSTNAME_LOOKUP_DETAIL(port)));
 #else
 				ereport(FATAL,
 				   (errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
 					errmsg("no pg_hba.conf entry for host \"%s\", user \"%s\", database \"%s\"",
 						   hostinfo, port->user_name,
-						   port->database_name)));
+						   port->database_name),
+					HOSTNAME_LOOKUP_DETAIL(port)));
 #endif
 			}
 			break;



Re: [HACKERS] SSI error messages

2011-07-29 Thread Peter Eisentraut
On lör, 2011-07-16 at 21:55 +0300, Heikki Linnakangas wrote:
 I think I would prefer something like this:
 
 ERROR:  could not serialize access due to read/write dependencies
 among 
 transactions
 DETAIL: Reason code: %s
 HINT:  The transaction might succeed if retried.
 
 Where %s gets the current detail field, untranslated, like:
 
 Canceled on commit attempt with conflict in from prepared pivot.

Do you have an idea how to address this case:

@@ -3865,7 +3865,7 @@ CheckForSerializableConflictOut(bool visible, Relation relation,
 			ereport(ERROR,
 					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 					 errmsg("could not serialize access due to read/write dependencies among transactions"),
-					 errdetail_internal("Canceled on conflict out to old pivot %u.", xid),
+					 errdetail("Reason code: %s.", "canceled on conflict out to old pivot %u", xid),  // XXX bogus
 					 errhint("The transaction might succeed if retried.")));
 
 	if (SxactHasSummaryConflictIn(MySerializableXact)




Re: [HACKERS] include host names in hba error messages

2011-07-29 Thread Robert Haas
On Fri, Jul 29, 2011 at 2:44 PM, Peter Eisentraut pete...@gmx.net wrote:
 On tis, 2011-07-19 at 14:17 -0400, Robert Haas wrote:
 I think it would be less confusing to write the IP address as the main
 piece of information, and put the hostname in parentheses only if we
 accepted it as valid (i.e. we did both lookups, and everything
 matched).

 ERROR: no pg_hba.conf entry for host 127.0.0.1 (localhost), user
 x, database y

  As for the case where the forward lookup and reverse lookup don't
  match, could we add that as a DETAIL?

 ERROR: no pg_hba.conf entry for host 127.0.0.1, user x, database y
 DETAIL: Forward and reverse DNS lookups do not match.

  On further reflection, the only way we would get a fully matched host
  name is if there actually were a line in pg_hba.conf with that host
  name which didn't match because of other parameters.  That would be
  quite rare, and so the error message would look one way or the other
  depending on obscure circumstances, which would be confusing.

 But picking up on your second suggestion, I propose instead that we put
 a note in the detail about the host name and what we know about it, if
 we know it, e.g.

 ERROR: no pg_hba.conf entry for host 127.0.0.1, user x, database y
 DETAIL: Client IP address resolved to localhost, forward lookup matches.

 I chose to use errdetail_log(), which only goes into the server log, so
 we don't expose too much about the server's DNS setup to the client.

Seems reasonable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Incremental checkpoints

2011-07-29 Thread Greg Smith

On 07/29/2011 11:04 AM, jord...@go-link.net wrote:

I think that the current implementation of checkpoints is not good for a huge
shared buffer cache and for many WAL segments. If there are more buffers, and
if buffers can be written more rarely, more updates of buffers can be
combined, so the total number of writes to disk will be significantly lower. I
think that incremental checkpoints can achieve this goal (maybe more), and the
price is additional memory (about 1/1000 of the size of the buffer cache).
   


The current code optimizes for buffers that are written frequently.  
Those will sit in shared_buffers and, in the hoped-for case, be 
written only once, at checkpoint time.


There are two issues with adopting increment checkpoints instead, one 
fundamental, the other solvable but not started on yet:


1) Postponing writes as long as possible always improves the resulting 
throughput of those writes.  Any incremental checkpoint approach will 
detune throughput by some amount.  If you make writes go out more often, 
they will be less efficient; that's just how things work if you 
benchmark anything that allows write combining.  Any incremental 
checkpoint approach is likely to improve latency in some cases if it 
works well, while decreasing throughput in most cases.


2) The incremental checkpoint approach used by other databases, such as 
the MySQL implementation, works by tracking what transaction IDs were 
associated with a buffer update.  The current way PostgreSQL saves 
buffer sync information for the checkpoint to process things doesn't 
store enough information to do that.  As you say, the main price there 
is some additional memory.


From my perspective, the main problem with plans to tweak the 
checkpoint code is that we don't have a really good benchmark that 
tracks both throughput and latency to test proposed changes against.  
Mark Wong has been working to get his TPC-E clone DBT-5 running 
regularly for that purpose, and last I heard that was basically done at 
this point--he's running daily tests now.  There's already a small pile 
of patches that adjust checkpoint behavior which were postponed from 
inclusion in 9.1, mainly because it was hard to prove they were useful 
given the benchmark used to test them, pgbench.  I have higher hopes 
for DBT-5 as a test that gives informative data in this area.  I would 
want to go back and revisit the existing patches (sorted checkpoints, 
spread sync) before launching into this whole new area.  I don't think 
any of those has been proven not to work; they just didn't help the 
slightly unrealistic pgbench write-heavy workload.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Check constraints on partition parents only?

2011-07-29 Thread Alvaro Herrera
Excerpts from Nikhil Sontakke's message of vie jul 29 14:12:37 -0400 2011:
 Hi all,
 
 PFA, patch which implements non-inheritable ONLY constraints. This
 has been achieved by introducing a new column conisonly in
 pg_constraint catalog. Specification of 'ONLY' in the ALTER TABLE ADD
 CONSTRAINT CHECK command is used to set this new column to true.
 Constraints which have this column set to true cannot be inherited by
 present and future children ever.
 
 The psql and pg_dump binaries have been modified to account for such
 persistent non-inheritable check constraints. This patch also has
 documentation changes along with relevant changes to the test cases.
 The regression runs pass fine with this patch applied.
 
 Comments and further feedback, if any, appreciated.

Did you look at how this conflicts with my patch to add not null
rows to pg_constraint?

https://commitfest.postgresql.org/action/patch_view?id=601

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] SSI error messages

2011-07-29 Thread Alvaro Herrera
Excerpts from Peter Eisentraut's message of vie jul 29 14:46:20 -0400 2011:
 On lör, 2011-07-16 at 21:55 +0300, Heikki Linnakangas wrote:
  I think I would prefer something like this:
  
  ERROR:  could not serialize access due to read/write dependencies
  among 
  transactions
  DETAIL: Reason code: %s
  HINT:  The transaction might succeed if retried.
  
  Where %s gets the current detail field, untranslated, like:
  
  Canceled on commit attempt with conflict in from prepared pivot.
 
 Do you have an idea how to address this case:

Call sprintf to expand the %u before ereport()?
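
For illustration only, a standalone sketch of that approach (the helper name
is hypothetical and the backend's ereport machinery is left out): expand the
untranslated reason into a buffer first, then hand it to the translatable
detail template as a single %s, so the final backend call would be
errdetail("Reason code: %s.", reason).

```c
#include <stdio.h>

/* Pre-format the variable part of the untranslated reason code so the
 * translatable "Reason code: %s." template only ever sees one string. */
static void
build_detail(char *dst, size_t dstlen, unsigned int xid)
{
	char		reason[128];

	/* The untranslated reason, with the xid already expanded */
	snprintf(reason, sizeof(reason),
			 "canceled on conflict out to old pivot %u", xid);

	/* In the backend this second step would be errdetail() */
	snprintf(dst, dstlen, "Reason code: %s.", reason);
}
```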

 @@ -3865,7 +3865,7 @@ CheckForSerializableConflictOut(bool visible, Relation relation,
 			ereport(ERROR,
 					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 					 errmsg("could not serialize access due to read/write dependencies among transactions"),
 -					 errdetail_internal("Canceled on conflict out to old pivot %u.", xid),
 +					 errdetail("Reason code: %s.", "canceled on conflict out to old pivot %u", xid),  // XXX bogus
 					  errhint("The transaction might succeed if retried.")));

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] SSI error messages

2011-07-29 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Peter Eisentraut's message of vie jul 29 14:46:20 -0400 2011:
 On lör, 2011-07-16 at 21:55 +0300, Heikki Linnakangas wrote:
 Do you have an idea how to address this case:

 Call sprintf to expand the %u before ereport()?

That sounds like way too much work for a wording improvement that is
exceedingly marginal in the first place.

IMO the only thing really wrong with those messages is that they aren't
complete sentences, which is a style guideline that we ought to try to
adhere to even if we're not bothering to translate them.  Perhaps
"Transaction canceled because of ..." would read better than "Canceled
on ..."?

regards, tom lane



[HACKERS] pgbench internal contention

2011-07-29 Thread Robert Haas
On machines with lots of CPU cores, pgbench can start eating up a lot
of system time.  Investigation reveals that the problem is with
random(), which glibc implements like this:

long int
__random ()
{
  int32_t retval;
  __libc_lock_lock (lock);
  (void) __random_r (&unsafe_state, &retval);
  __libc_lock_unlock (lock);
  return retval;
}
weak_alias (__random, random)

Rather obviously, if you're running enough pgbench threads, you're
going to have a pretty ugly point of contention there.  On the 32-core
machine provided by Nate Boley, with my usual 5-minute SELECT-only
test, lazy-vxid and sinval-fastmessages applied, and scale factor 100,
time shows that pgbench uses almost as much system time as user
time:

$ time pgbench -n -S -T 300 -c 64 -j 64
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 300 s
number of transactions actually processed: 55319555
tps = 184396.016257 (including connections establishing)
tps = 184410.926840 (excluding connections establishing)

real	5m0.019s
user	21m10.100s
sys	17m45.480s

I patched it to use random_r() - the patch is attached - and here are
the (rather gratifying) results of that test:

$ time ./pgbench -n -S -T 300 -c 64 -j 64
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 300 s
number of transactions actually processed: 71851589
tps = 239503.585813 (including connections establishing)
tps = 239521.816698 (excluding connections establishing)

real	5m0.016s
user	20m40.880s
sys	9m25.930s

Since a client-limited benchmark isn't very interesting, I think this
change makes sense.  Thoughts?  Objections?  Coding style
improvements?
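
(The attached random_r.patch arrives as binary data in the archive. As a rough
illustration only, not the actual patch, the per-thread scheme with glibc's
random_r looks like this: each thread owns its own random_data state, so no
call ever touches a shared lock. Note the state struct must be zeroed before
initstate_r.)

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Per-thread PRNG state: unlike random(), random_r() takes no lock
 * because all state is supplied by the caller. */
struct thread_rng
{
	struct random_data data;
	char			   statebuf[64];
};

static void
thread_rng_init(struct thread_rng *rng, unsigned int seed)
{
	/* glibc requires random_data to be zeroed before initstate_r() */
	memset(rng, 0, sizeof(*rng));
	initstate_r(seed, rng->statebuf, sizeof(rng->statebuf), &rng->data);
}

static long
thread_rng_next(struct thread_rng *rng)
{
	int32_t		result;

	random_r(&rng->data, &result);
	return (long) result;
}
```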

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


random_r.patch
Description: Binary data



Re: [HACKERS] pgbench internal contention

2011-07-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On machines with lots of CPU cores, pgbench can start eating up a lot
 of system time.  Investigation reveals that the problem is with
 random(),

Interesting.

 I patched it to use random_r() - the patch is attached - and here are
 the (rather gratifying) results of that test:
 Since a client-limited benchmark isn't very interesting, I think this
 change makes sense.  Thoughts?  Objections?

Portability, or rather lack of it.  What about using erand48, which we
already have a dependency on (and substitute code for)?

regards, tom lane



Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-07-29 Thread daveg
On Fri, Jul 29, 2011 at 09:55:46AM -0400, Tom Lane wrote:
 The thing that was bizarre about the one instance in the buildfarm was
 that the error was persistent, ie, once a session had failed all its
 subsequent attempts to access pg_class failed too.  I gather from Dave's
 description that it's working that way for him too.  I can think of ways
 that there might be a transient race condition against a REINDEX, but
 it's very unclear why the failure would persist across multiple
 attempts.  The best idea I can come up with is that the session has
 somehow cached a wrong commit status for the reindexing transaction,
 causing it to believe that both old and new copies of the index's
 pg_class row are dead ... but how could that happen?  The underlying

It is definitely persistent. Once triggered, the error occurs for every new
statement until the session exits.

 state in the catalog is not wrong, because no concurrent sessions are
 upset (at least not in the buildfarm case ... Dave, do you see more than
 one session doing this at a time?).

It looks like it happens to multiple sessions, so far as one can tell from
the timestamps of the errors:

 timestamp     sessionid      error
 ------------  -------------  -------------------------------------------
 03:05:37.434 4e26a861.4a6d could not find pg_class tuple for index 2662
 03:05:37.434 4e26a861.4a6f could not find pg_class tuple for index 2662

 03:06:12.041 4e26a731.438e could not find pg_class tuple for index 2662

 03:06:12.042 4e21b6a3.629b could not find pg_class tuple for index 2662
 03:06:12.042 4e26a723.42ec could not find pg_class tuple for index 2662 at 
character 13


-dg

-- 
David Gould   da...@sonic.net  510 536 1443  510 282 0869
If simplicity worked, the world would be overrun with insects.



Re: [HACKERS] pgbench internal contention

2011-07-29 Thread Robert Haas
On Fri, Jul 29, 2011 at 5:25 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On machines with lots of CPU cores, pgbench can start eating up a lot
 of system time.  Investigation reveals that the problem is with
 random(),

 Interesting.

 I patched it to use random_r() - the patch is attached - and here are
 the (rather gratifying) results of that test:
 Since a client-limited benchmark isn't very interesting, I think this
 change makes sense.  Thoughts?  Objections?

 Portability, or rather lack of it.  What about using erand48, which we
 already have a dependency on (and substitute code for)?

Neither our implementation nor glibc's appears to be thread-safe, and
erand48() is deprecated according to my Linux man page:

NOTES
   These functions are declared obsolete by  SVID  3,  which  states  that
   rand(3) should be used instead.

glibc provides erand48_r(), and I suppose we could kludge up something
similar out of what's already in src/port?

This is also not exactly the world's most sophisticated algorithm, but
perhaps for pgbench that doesn't matter.
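
For what it's worth, erand48's re-entrancy comes entirely from the
caller-owned 48-bit state array, so a per-thread generator needs no locking at
all. A minimal sketch (helper names hypothetical, not pgbench's actual code):

```c
#include <stdlib.h>

/* erand48() keeps all state in the caller-supplied array, so each
 * thread can simply carry its own xsubi[3]; no lock involved. */
static double
uniform_draw(unsigned short state[3])
{
	return erand48(state);		/* uniform double in [0.0, 1.0) */
}

/* Map a uniform draw onto 1..max, in the style of pgbench's getrand(). */
static long
range_draw(unsigned short state[3], long max)
{
	return 1 + (long) (erand48(state) * max);
}
```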

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] USECS_* constants undefined with float8 timestamps?

2011-07-29 Thread Robert Haas
On Fri, Jul 29, 2011 at 11:18 AM, Johann 'Myrkraverk' Oskarsson
joh...@2ndquadrant.com wrote:
 Hi all,

 I just noticed that the USECS_* constants are not defined when the server
 is compiled without integer dates and timestamps.

 Explicitly, timestamp.h is

 #ifdef HAVE_INT64_TIMESTAMP
 #define USECS_PER_DAY   INT64CONST(86400000000)
 #define USECS_PER_HOUR  INT64CONST(3600000000)
 #define USECS_PER_MINUTE INT64CONST(60000000)
 #define USECS_PER_SEC   INT64CONST(1000000)
 #endif

 Is there a particular reason for this?  Even with float8 timestamps
 there are uses for these constants in extensions.

I don't see any particular reason not to define them unconditionally.
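
For reference, a self-contained sketch showing why that's harmless: the
constants are plain 64-bit integers with no dependency on the timestamp
representation (values copied from timestamp.h; the standard INT64_C stands in
for INT64CONST here).

```c
#include <stdint.h>

/* Values match src/include/utils/timestamp.h; nothing here depends on
 * whether timestamps themselves are stored as int64 or float8, so the
 * HAVE_INT64_TIMESTAMP guard around them is unnecessary. */
#define USECS_PER_DAY    INT64_C(86400000000)
#define USECS_PER_HOUR   INT64_C(3600000000)
#define USECS_PER_MINUTE INT64_C(60000000)
#define USECS_PER_SEC    INT64_C(1000000)
```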

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] RC1 / Beta4?

2011-07-29 Thread Joshua Berkus
All,

Where are we on RC1 or Beta4 for PostgreSQL 9.1?  

While I know we're not going to do a final release in August due to the 
Europeans, it would be nice to move things along before then. There don't seem 
to be any blockers open.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco



Re: [HACKERS] cataloguing NOT NULL constraints

2011-07-29 Thread Alvaro Herrera
Excerpts from Robert Haas's message of sáb jul 23 07:40:12 -0400 2011:
 On Sat, Jul 23, 2011 at 4:37 AM, Dean Rasheed dean.a.rash...@gmail.com 
 wrote:
  That looks wrong to me, because a NOT NULL constraint is a column
  constraint not a table constraint. The CREATE TABLE syntax explicitly
  distinguishes these 2 cases, and only allows NOT NULLs in column
  constraints. So from a consistency point-of-view, I think that ALTER
  TABLE should follow suit.
 
  So the new syntax could be:
 
  ALTER TABLE table_name ALTER [COLUMN] col_name ADD column_constraint
 
  where column_constraint is the same as in CREATE TABLE (i.e., allowing
  all the other constraint types at the same time).
 
  It looks like that approach would probably lend itself to more
  code-reusability too, especially once we start adding options to the
  constraint.
 
 So you'd end up with something like this?
 
 ALTER TABLE foo ALTER COLUMN bar ADD CONSTRAINT somename NOT NULL
 
 That works for me.  I think sticking the name of the constraint in
 there at the end of the line as Alvaro proposed would be terrible for
 future syntax extensibility - we'll be much less likely to paint
 ourselves into a corner with something like this.

Here's a patch that does things more or less in this way.  Note that
this is separate from the other patch, so while you can specify a
constraint name for the NOT NULL clause, it's not stored anywhere.

This is preliminary: there are no docs or new tests yet.  Here's how it works
(you can also throw PRIMARY KEY into the mix, but not EXCLUSION):

alvherre=# create table bar (a int);
CREATE TABLE
alvherre=# alter table bar alter column a add constraint foo_fk references foo 
initially deferred deferrable check (a < 4) constraint a_uq unique constraint 
fnn not null;
NOTICE:  ALTER TABLE / ADD UNIQUE will create implicit index "a_uq" for 
table "bar"
ALTER TABLE
alvherre=# \d bar
       Table "public.bar"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | not null
Indexes:
    "a_uq" UNIQUE CONSTRAINT, btree (a)
Check constraints:
    "bar_a_check" CHECK (a < 4)
Foreign-key constraints:
    "foo_fk" FOREIGN KEY (a) REFERENCES foo(a) DEFERRABLE INITIALLY DEFERRED


The implementation is a bit dirty (at least IMO), but I don't see a way
around that, mainly because ALTER TABLE / ALTER COLUMN does not have a
proper ColumnDef to stick the Constraint nodes into; so while the other
constraints can do fine without that, it isn't very helpful for NOT NULL.
So it has to create a phony ColumnDef for transformConstraintItems to use.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


alter_add.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reduced power consumption in autovacuum launcher process

2011-07-29 Thread Peter Geoghegan
Attached is a revision of this patch that now treats the latch in
PGPROC, waitLatch, as the generic process latch, rather than just
using it for sync rep; it is initialised appropriately as a shared
latch generically, within InitProcGlobal(), and ownership is
subsequently set within InitProcess(). We were doing so before, though
only for the benefit of sync rep.

The idea here is to have a per-process latch to guard against
timeout-invalidation issues from within each generic signal handler, by
calling SetLatch() on our generic latch there, much as we already do
on the archiver's static, non-generic latch from within its own signal
handlers. (The archiver has its MyProc pointer set to NULL, and we
allow for the possibility that generic handlers may be registered
within processes that have a NULL MyProc; latch timeout invalidation
then becomes the exclusive responsibility of the process, just as is
currently the case with the archiver. In other words, such processes
had better not register a generic signal handler.)

It doesn't really matter that the SetLatch() call will usually be
unnecessary, because, as Heikki once pointed out, redundantly setting
a latch that is already set is very cheap. We don't check if it's set
directly in advance of setting the latch, because the Latch struct is
logically opaque and there is no public function to check if it's
set, nor should there be; the checking simply happens in SetLatch().

On 18 July 2011 20:06, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Right, we can easily change the timeout argument to be in milliseconds
 instead of microseconds.

I've done so in this latest revision as a precautionary measure. I
don't see much point in sub-millisecond granularity, and besides, the
Windows implementation will not provide that granularity anyway as
things stand.

Thoughts?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6a6959f..b2cf973 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10210,7 +10210,7 @@ retry:
 	/*
 	 * Wait for more WAL to arrive, or timeout to be reached
 	 */
-	WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 500L);
+	WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 5000L);
 	ResetLatch(&XLogCtl->recoveryWakeupLatch);
 }
 else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 9940a42..0663eb8 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -182,7 +182,7 @@ DisownLatch(volatile Latch *latch)
  * to wait for. If the latch is already set (and WL_LATCH_SET is given), the
  * function returns immediately.
  *
- * The 'timeout' is given in microseconds. It must be >= 0 if WL_TIMEOUT
+ * The 'timeout' is given in milliseconds. It must be >= 0 if WL_TIMEOUT
  * event is given, otherwise it is ignored. On some platforms, signals cause
  * the timeout to be restarted, so beware that the function can sleep for
  * several times longer than the specified timeout.
@@ -236,8 +236,9 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 	if (wakeEvents & WL_TIMEOUT)
 	{
 		Assert(timeout >= 0);
-		tv.tv_sec = timeout / 1000000L;
-		tv.tv_usec = timeout % 1000000L;
+		tv.tv_sec = timeout / 1000L;
+		/* tv_usec should be in microseconds */
+		tv.tv_usec = (timeout % 1000L) * 1000L;
 		tvp = &tv;
 	}
 	}
 
@@ -329,6 +330,11 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 /*
  * Sets a latch and wakes up anyone waiting on it. Returns quickly if the
  * latch is already set.
+ *
+ * Note that there is a general requirement to call this function within
+ * signal handlers, to ensure that latch timeout is not invalidated. For
+ * generic signal handlers that may be registered for multiple processes,
+ * it will be called with the per-process latch in PGPROC as its argument.
  */
 void
 SetLatch(volatile Latch *latch)
diff --git a/src/backend/port/win32_latch.c b/src/backend/port/win32_latch.c
index 8d59ad4..5ee4f40 100644
--- a/src/backend/port/win32_latch.c
+++ b/src/backend/port/win32_latch.c
@@ -99,23 +99,14 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, SOCKET sock,
 	int			numevents;
 	int			result = 0;
 	int			pmdeath_eventno = 0;
-	long		timeout_ms;
 
 	Assert(wakeEvents != 0);
+	Assert(timeout >= 0 || (wakeEvents & WL_TIMEOUT) == 0);
 
 	/* Ignore WL_SOCKET_* events if no valid socket is given */
 	if (sock == PGINVALID_SOCKET)
 		wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
 
-	/* Convert timeout to milliseconds for WaitForMultipleObjects() */
-	if (wakeEvents & WL_TIMEOUT)
-	{
-		Assert(timeout >= 0);
-		timeout_ms = timeout / 1000;
-	}
-	else
-		timeout_ms = INFINITE;
-
 	/* Construct an array of event handles for 

Re: [HACKERS] Incremental checkpoints

2011-07-29 Thread jordani
 If you make writes go out more often, they will be less efficient

I think fsync is more important, but many writes plus fsync is no good
either. Suppose that 30 WAL segments are good for performance (to be
written at once). With the incremental approach we can have 60 segments
and still write 30 at once. There is no checkpoint_timeout, so more
buffers will stay around for more time.

I cannot see any disadvantage.

Jordan Ivanov


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Check constraints on partition parents only?

2011-07-29 Thread Nikhil Sontakke
 Comments and further feedback, if any, appreciated.

 Did you look at how this conflicts with my patch to add not null
 rows to pg_constraint?

 https://commitfest.postgresql.org/action/patch_view?id=601


I was certainly not aware of this patch in the commitfest. Your patch
has a larger footprint with more functional changes in it. IMHO, it
will be easiest to queue this non-inheritable constraints patch behind
your patch in the commitfest. There will be some bitrot, which I
can fix once your patch gets committed.

Regards,
Nikhils

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fix for pg_upgrade on win32

2011-07-29 Thread Bruce Momjian
Based on EnterpriseDB testing, I have applied the attached patch to fix
a pg_upgrade bug on Win32 on 9.1 and 9.2.  The problem is that on Win32
you can't stat() a directory with a trailing slash --- this is already
mentioned in our src/port/path.c.  The patch removes a lone trailing
slash.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
new file mode 100644
index ef21899..b632584
*** a/contrib/pg_upgrade/exec.c
--- b/contrib/pg_upgrade/exec.c
*** check_data_dir(const char *pg_data)
*** 168,174 
  	{
  		struct stat statBuf;
  
! 		snprintf(subDirName, sizeof(subDirName), "%s/%s", pg_data,
   requiredSubdirs[subdirnum]);
  
  		if (stat(subDirName, statBuf) != 0)
--- 168,176 
  	{
  		struct stat statBuf;
  
! 		snprintf(subDirName, sizeof(subDirName), "%s%s%s", pg_data,
! 			/* Win32 can't stat() a directory with a trailing slash. */
! 				 *requiredSubdirs[subdirnum] ? "/" : "",
   requiredSubdirs[subdirnum]);
  
  		if (stat(subDirName, statBuf) != 0)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers