Re: [HACKERS] Transient plans versus the SPI API

2011-08-03 Thread Simon Riggs
On Wed, Aug 3, 2011 at 8:33 PM, Tom Lane  wrote:
> Simon Riggs  writes:
>> I think its possible to tell automatically whether we need to replan
>> always or not based upon the path we take through selectivity
>> functions.
>
> I don't really believe that, or at least I think it would only detect a
> few cases.  Examples of parameter-value-sensitive decisions that are
> made nowhere near the selectivity functions are constraint exclusion and
> LIKE pattern to index-qual conversion.  And in none of these cases do we
> really know at the bottom level whether a different parameter value will
> lead to a significant change in the finished plan.  For instance, if
> there's no index for column foo, it is a waste of time to force
> replanning just because we have varying selectivity estimates for
> "WHERE foo > $1".
>
> I think we'll be a lot better off with the framework discussed last
> year: build a generic plan, as well as custom plans for the first few
> sets of parameter values, and then observe whether there's a significant
> reduction in estimated costs for the custom plans.

The problem there is which executions we build custom plans for. That
turns the problem into a sampling issue and you'll only fix the
problems that occur with a frequency to match your sampling pattern
and rate. Examples of situations where it won't help.

* plans that vary by table size will be about the same in the first 5
executions. After a large number of executions, things go bad.

* text search where the parameter is provided by user input - sensible
requests have low selectivities; some users put in  or "e" and
then we try to retrieve the whole table by index scan. It is almost
impossible to prevent all potentially high-selectivity inputs from
users. We could add LIMIT, but frequently ORM-generated queries do not
do that.

This isn't my-way-or-your-way - I think we need to look at some form
of "safety barriers" so we generate a plan but also know when the plan
has outlived its usefulness and force a re-plan.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Tom Lane
Peter Geoghegan  writes:
> On 3 August 2011 21:03, Tom Lane  wrote:
>> I mean that it's unclear what you'll get if status has a bitpattern
>> equivalent to a negative integer.  If the compiler implements the
>> comparison as signed, the test will yield TRUE; if unsigned, it's FALSE.

> On compilers on which the enum value is represented as an unsigned
> int, passing -1 is just the same as doing that with any function with
> an unsigned int argument for that argument - it will convert to a
> large unsigned value. So sure, that isolated part of the test's
> outcome will vary depending on whether or not the compiler opts to
> represent the enum as signed when it can, but I don't look at it as
> you do. I simply consider that to be a violation of the enum's
> contract, or the caller's failure to consider that the enum couldn't
> represent -1 -- they got what they asked for.

This argument is completely missing the point of the test, which is to
verify whether or not the caller adhered to the enum's contract.  You
can *not* assume that he did while arguing about whether the test is
valid or accomplishes its goals.

regards, tom lane



Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Heikki Linnakangas

On 04.08.2011 04:21, David Fetter wrote:
> On Wed, Aug 03, 2011 at 01:40:42PM +0300, Heikki Linnakangas wrote:
>> On 03.08.2011 13:05, Peter Geoghegan wrote:
>>> I don't believe that the standard allows for an implementation of
>>> enums as unsigned integers - after all, individual enum literals can
>>> be given corresponding negative integer values.
>>
>> C99 says:
>>
>>> Each enumerated type shall be compatible with char, a signed integer type, or an
>>> unsigned integer type. The choice of type is implementation-defined,110) but shall be
>>> capable of representing the values of all the members of the enumeration.
>
> Are we moving to C99?
>
> C89 says:
>
>     Each enumerated type shall be compatible with an integer type; the
>     choice of type is implementation-defined.


Well, that's the same thing, just in less explicit words.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] mosbench revisited

2011-08-03 Thread Robert Haas
On Wed, Aug 3, 2011 at 6:21 PM, Jim Nasby  wrote:
> On Aug 3, 2011, at 1:21 PM, Robert Haas wrote:
>> 1. "We configure PostgreSQL to use a 2 Gbyte application-level cache
>> because PostgreSQL protects its free-list with a single lock and thus
>> scales poorly with smaller caches."  This is a complaint about
>> BufFreeList lock which, in fact, I've seen as a huge point of
>> contention on some workloads.  In fact, on read-only workloads, with
>> my lazy vxid lock patch applied, this is, I believe, the only
>> remaining unpartitioned LWLock that is ever taken in exclusive mode;
>> or at least the only one that's taken anywhere near often enough to
>> matter.  I think we're going to do something about this, although I
>> don't have a specific idea in mind at the moment.
>
> This has been discussed before: 
> http://archives.postgresql.org/pgsql-hackers/2011-03/msg01406.php (which 
> itself references 2 other threads).
>
> The basic idea is: have a background process that proactively moves buffers 
> onto the free list so that backends should normally never have to run the 
> clock sweep (which is rather expensive). The challenge there is figuring out 
> how to get stuff onto the free list with minimal locking impact. I think one 
> possible option would be to put the freelist under its own lock (IIRC we 
> currently use it to protect the clock sweep as well). Of course, that still 
> means the free list lock could be a point of contention, but presumably it's 
> far faster to add or remove something from the list than it is to run the 
> clock sweep.

Based on recent benchmarking, I'm going to say "no".  It doesn't seem
to matter how short you make the critical section: a single
program-wide mutex is a loser.  Furthermore, the "free list" is a
joke, because it's nearly always going to be completely empty.  We
could probably just rip that out and use the clock sweep and never
miss it, but I doubt it would improve performance much.

I think what we probably need to do is have multiple clock sweeps in
progress at the same time.  So, for example, if you have 8GB of
shared_buffers, you might have 8 mutexes, one for each GB.  When a
process wants a buffer, it locks one of the mutexes and sweeps through
that 1GB partition.  If it finds a buffer before returning to the
point at which it started the scan, it's done.  Otherwise, it releases
its mutex, grabs the next one, and continues on until it finds a free
buffer.
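
To make the idea concrete, here is a minimal sketch of such a
partitioned clock sweep (invented names and sizes; this is not the
actual buffer manager code):

#include <pthread.h>

#define NUM_PARTITIONS   8
#define BUFFERS_PER_PART 1024

typedef struct
{
    pthread_mutex_t lock;                       /* one mutex per partition */
    int             sweep_hand;                 /* next buffer to inspect */
    int             usage_count[BUFFERS_PER_PART];
} BufferPartition;

static BufferPartition partitions[NUM_PARTITIONS];

static void
init_partitions(void)
{
    for (int p = 0; p < NUM_PARTITIONS; p++)
        pthread_mutex_init(&partitions[p].lock, NULL);
}

/* Return a global buffer index, or -1 if no victim was found anywhere. */
static int
get_victim_buffer(int start_partition)
{
    for (int p = 0; p < NUM_PARTITIONS; p++)
    {
        int              pno  = (start_partition + p) % NUM_PARTITIONS;
        BufferPartition *part = &partitions[pno];

        pthread_mutex_lock(&part->lock);
        /* One full lap over this partition only, then move to the next. */
        for (int i = 0; i < BUFFERS_PER_PART; i++)
        {
            int buf = part->sweep_hand;

            part->sweep_hand = (part->sweep_hand + 1) % BUFFERS_PER_PART;
            if (part->usage_count[buf] == 0)
            {
                pthread_mutex_unlock(&part->lock);
                return pno * BUFFERS_PER_PART + buf;
            }
            part->usage_count[buf]--;           /* age the buffer, clock-style */
        }
        pthread_mutex_unlock(&part->lock);
    }
    return -1;
}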

The trick with any modification in this area is that pretty much any
degree of increased parallelism is potentially going to reduce the
quality of buffer replacement to some degree. So the trick will be to
try to squeeze out as much concurrency as possible while minimizing
degradation in the quality of buffer replacements.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-03 Thread Alvaro Herrera
Excerpts from Jim Nasby's message of Wed Aug 03 18:05:21 -0400 2011:

> Not sure how much this relates to this discussion, but I have often wished we 
> had AFTER FOR EACH STATEMENT triggers that provided OLD and NEW recordsets 
> you could make use of. Sometimes it's very valuable to be able to look at 
> *all* the rows that changed in a transaction in one shot.

I have a mind to implement this sometime.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread David Fetter
On Wed, Aug 03, 2011 at 01:40:42PM +0300, Heikki Linnakangas wrote:
> On 03.08.2011 13:05, Peter Geoghegan wrote:
> >I don't believe that the standard allows for an implementation of
> >enums as unsigned integers - after all, individual enum literals can
> >be given corresponding negative integer values.
> 
> C99 says:
> 
> >Each enumerated type shall be compatible with char, a signed integer type, 
> >or an
> >unsigned integer type. The choice of type is implementation-defined,110) but 
> >shall be
> >capable of representing the values of all the members of the enumeration.

Are we moving to C99?

C89 says:

Each enumerated type shall be compatible with an integer type; the
choice of type is implementation-defined.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-03 Thread Robert Haas
On Wed, Aug 3, 2011 at 6:05 PM, Jim Nasby  wrote:
> Not sure how much this relates to this discussion, but I have often wished we 
> had AFTER FOR EACH STATEMENT triggers that provided OLD and NEW recordsets 
> you could make use of. Sometimes it's very valuable to be able to look at 
> *all* the rows that changed in a transaction in one shot.

Yeah, that would be awesome.  I think some of our competitors provide
exactly that feature...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] mosbench revisited

2011-08-03 Thread Robert Haas
On Wed, Aug 3, 2011 at 5:35 PM, Tom Lane  wrote:
> That still seems utterly astonishing to me.  We're touching each of
> those files once per query cycle; a cycle that contains two message
> sends, who knows how many internal spinlock/lwlock/heavyweightlock
> acquisitions inside Postgres (some of which *do* contend with each
> other), and a not insignificant amount of plain old computing.
> Meanwhile, this particular spinlock inside the kernel is protecting
> what, a single doubleword fetch?  How is that the bottleneck?

Spinlocks seem to have a very ugly "tipping point".  When I tested
pgbench -S on a 64-core system with the lazy vxid patch applied and a
patch to use random_r() in lieu of random, the amount of system time
used per SELECT-only transaction at 48 clients was 3.59 times as much
as it was at 4 clients.  And the amount used per transaction at 52
clients was 3.63 times the amount used per transaction at 48 clients.
And the amount used at 56 clients was 3.25 times the amount used at 52
clients.  You can see the throughput graph starting to flatten out in
the 32-44 client range, but it's not particularly alarming.  However,
once you pass that point things rapidly get totally out of control in
a real hurry.  A few more clients and the machine is basically doing
nothing but spin.

> I am wondering whether kernel spinlocks are broken.

I don't think so.  Stefan Kaltenbrunner had one profile where he
showed something like sixty or eighty percent of the usermode CPU time
in s_lock.  I didn't have access to that particular hardware, but the
testing I've done strongly suggests that most of that was the
SInvalReadLock spinlock.  And before I patched pgbench to avoid
calling random(), that was doing the same thing - literally flattening
a 64-core box fighting over a single futex that normally costs almost
nothing.  (That one wasn't quite as bad because the futex actually
deschedules the waiters, but it was still bad.)  I'm actually not
really sure why it shakes out this way (birthday paradox?) but having
seen the effect several times now, I'm disinclined to believe it's an
artifact.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Incremental checkopints

2011-08-03 Thread jordani
I have not explained well in my first message what I have in mind.

The main goal is for more buffers to stay dirty in memory for a longer
time. So checkpoint segments would have to be 2, 3... times more than in
the current approach. And a separate parameter can control how many
buffers to write at once. The DBA can tune:
- checkpoint segments - a higher number for fewer writes but more time
for crash recovery;
- (how many buffers to write at once) - more for throughput and less for
latency.

> I think what we'd need to track is the LSN that first dirtied the page
> (as opposed to the current field, which tracks the LSN that most
> recently wrote the page).  If we write and flush all pages whose
> first-dirtied LSN precedes some cutoff point, then we ought to be able
> to advance the redo pointer to that point.

Also, if the page is written out by a backend, LSN_first_dirtied has to be cleared.
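
A minimal sketch of that bookkeeping (hypothetical structures and names,
not actual PostgreSQL code):

#include <stdbool.h>
#include <stdint.h>

typedef struct
{
    bool        dirty;
    uint64_t    first_dirty_lsn;    /* LSN that first dirtied the page; 0 when clean */
} BufferSketch;

/* Flush every buffer first dirtied before cutoff_lsn; afterwards the redo
 * pointer can be advanced to cutoff_lsn. */
static void
incremental_checkpoint(BufferSketch *buffers, int nbuffers, uint64_t cutoff_lsn)
{
    for (int i = 0; i < nbuffers; i++)
    {
        if (buffers[i].dirty && buffers[i].first_dirty_lsn < cutoff_lsn)
        {
            /* ... write and fsync the buffer here ... */
            buffers[i].dirty = false;
            buffers[i].first_dirty_lsn = 0;     /* also cleared if a backend writes it out */
        }
    }
}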

I believe (but cannot prove) it is worth testing, and the advantage will
be noticeable.

Jordan Ivanov




Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Peter Geoghegan
On 3 August 2011 21:03, Tom Lane  wrote:
> I mean that it's unclear what you'll get if status has a bitpattern
> equivalent to a negative integer.  If the compiler implements the
> comparison as signed, the test will yield TRUE; if unsigned, it's FALSE.

On compilers on which the enum value is represented as an unsigned
int, passing -1 is just the same as doing that with any function with
an unsigned int argument for that argument - it will convert to a
large unsigned value. So sure, that isolated part of the test's
outcome will vary depending on whether or not the compiler opts to
represent the enum as signed when it can, but I don't look at it as
you do. I simply consider that to be a violation of the enum's
contract, or the caller's failure to consider that the enum couldn't
represent -1 -- they got what they asked for.

To put it another way, you could say that the first part of the test
only exists for the benefit of compilers that treat all enums as
signed. Clang recognised that redundancy for its unsigned case, and
complained. It doesn't make sense to me to look at that one part of
the test in isolation though. With my patch, the test as a whole does
its job (in an identical fashion to before, in fact).

Come to think of it, you could argue that the warning is invalid on
the basis that the enum being unsigned is implementation defined and
therefore the first part of the test is conceivably not redundant on
some platforms, and perhaps I will on the Clang list. Probably not
though, because it was hard enough with the warning bug that I managed
to get them to fix, and I thought that was open-and-shut.

>> I'm not convinced that that is an improvement to rely on the
>> conversion doing so, but it's not as if I feel very strongly about it.
>
> The C standard specifies that signed-to-unsigned conversions must work
> like that; and even if the standard didn't, it would surely work like
> that on any machine with two's-complement representation, which is to
> say every computer built in the last forty years or so.  So I don't find
> it a questionable assumption.

I wasn't questioning whether it would work. I just think that relying
on the behaviour of the conversion like that doesn't make your intent
very clear. If I thought that it would or could be plain broken under
certain circumstances, I would naturally feel strongly about it; as
I've said, I don't.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Postgres / plpgsql equivalent to python's getattr() ?

2011-08-03 Thread Tom Lane
James Robinson  writes:
> Python's getattr() allows for dynamic lookup of attributes on an  
> object, as in:

>   inst = MyClass(x=12, y=24)
>   v = getattr(inst, 'x')
>   assert v == 12

> Oftentimes in writing data validating trigger functions, it'd be real  
> handy to be able to do a similar thing in plpgsql

> Is there something in the internals which inherently prevents this? 

plpgsql is strongly typed (much more so than python, anyway).

regards, tom lane



Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread daveg
On Wed, Aug 03, 2011 at 04:03:39PM -0400, Tom Lane wrote:
> The C standard specifies that signed-to-unsigned conversions must work
> like that; and even if the standard didn't, it would surely work like
> that on any machine with two's-complement representation, which is to
> say every computer built in the last forty years or so.  So I don't find
> it a questionable assumption.

I had the "pleasure" of working on a Univac 1108 in about 1978 and
it was very definitely ones complement. I'm somewhat amazed to find that
the Univac 1100 series architecture and instruction set lives on to this
day. The last pure 1100 seems to be the Unisys 2200/3800 released in 1997.
Even later U1100/Exec 8 descendants appear to still exist and are still
actively supported:

  http://en.wikipedia.org/wiki/Unisys_OS_2200_operating_system

So there are still ones complement machines out there. However I suggest we
pretend otherwise and continue to ignore them.

-dg

--
David Gould   da...@sonic.net  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.



Re: [HACKERS] mosbench revisited

2011-08-03 Thread Jim Nasby
On Aug 3, 2011, at 1:21 PM, Robert Haas wrote:
> 1. "We configure PostgreSQL to use a 2 Gbyte application-level cache
> because PostgreSQL protects its free-list with a single lock and thus
> scales poorly with smaller caches."  This is a complaint about
> BufFreeList lock which, in fact, I've seen as a huge point of
> contention on some workloads.  In fact, on read-only workloads, with
> my lazy vxid lock patch applied, this is, I believe, the only
> remaining unpartitioned LWLock that is ever taken in exclusive mode;
> or at least the only one that's taken anywhere near often enough to
> matter.  I think we're going to do something about this, although I
> don't have a specific idea in mind at the moment.

This has been discussed before: 
http://archives.postgresql.org/pgsql-hackers/2011-03/msg01406.php (which itself 
references 2 other threads).

The basic idea is: have a background process that proactively moves buffers 
onto the free list so that backends should normally never have to run the clock 
sweep (which is rather expensive). The challenge there is figuring out how to 
get stuff onto the free list with minimal locking impact. I think one possible 
option would be to put the freelist under its own lock (IIRC we currently use 
it to protect the clock sweep as well). Of course, that still means the free 
list lock could be a point of contention, but presumably it's far faster to add 
or remove something from the list than it is to run the clock sweep.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-03 Thread Jim Nasby
On Aug 2, 2011, at 7:09 AM, Simon Riggs wrote:
>>  The best compression and flexibility in
>>> that case is to store a bitmap since that will average out at about 1
>>> bit per row, with variable length bitmaps. Which is about 8 times
>>> better compression ratio than originally suggested, without any loss
>>> of metadata.
>> 
>> Yeah that's probably possible in specific cases, but I'm still not
>> sure how to make it meet the full requirements of the after trigger
>> queue.
> 
> I think you'd better explain what use case you are trying to optimise
> for. It seems unlikely that you will come up with a compression scheme
> that will fit all cases.
> 
> The only cases that seem a problem to me are
> * bulk RI checks
> * large writes on tables using trigger based replication
> maybe you have others?

Not sure how much this relates to this discussion, but I have often wished we 
had AFTER FOR EACH STATEMENT triggers that provided OLD and NEW recordsets you 
could make use of. Sometimes it's very valuable to be able to look at *all* the 
rows that changed in a transaction in one shot.
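
A sketch of what that might look like (hypothetical syntax - the
REFERENCING clause and the old_rows/new_rows transition tables shown here
do not exist):

CREATE TABLE orders (id int PRIMARY KEY, amount numeric);
CREATE TABLE orders_audit (id int, old_amount numeric, new_amount numeric);

CREATE FUNCTION audit_orders() RETURNS trigger AS $$
BEGIN
    -- old_rows / new_rows would expose *all* rows changed by the statement
    INSERT INTO orders_audit
    SELECT n.id, o.amount, n.amount
    FROM new_rows n JOIN old_rows o USING (id);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_stmt_audit
    AFTER UPDATE ON orders
    REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
    FOR EACH STATEMENT EXECUTE PROCEDURE audit_orders();
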
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Locking end of indexes during VACUUM

2011-08-03 Thread Tom Lane
Simon Riggs  writes:
> What seems strange is that we make no attempt to check whether we have
> already identified all tuples being removed by the VACUUM. We have the
> number of dead tuples we are looking for and we track the number of
> tuples we have deleted from the index, so we could easily make this
> check early and avoid waiting.

> Can we avoid scanning all pages once we have proven we have all dead tuples?

That seems pretty dangerous to me, as it makes correctness of tuple
removal totally dependent on there not being any duplicates, etc.
I don't think there's a sufficiently compelling performance argument
here to justify making VACUUM more fragile.

(In any case, since VACUUM is visiting the leaf pages in physical not
logical order, it's hard to argue that there would be any clear win for
specific application access patterns such as "hitting the largest keys a
lot".)

regards, tom lane



[HACKERS] Postgres / plpgsql equivalent to python's getattr() ?

2011-08-03 Thread James Robinson

Hackers,

Python's getattr() allows for dynamic lookup of attributes on an  
object, as in:


inst = MyClass(x=12, y=24)
v = getattr(inst, 'x')
assert v == 12

Oftentimes in writing data-validating trigger functions, it'd be real
handy to be able to do a similar thing in plpgsql against column
values in a row or record type, such as making use of a trigger
argument as a hint for which column to consider in this table's case.
Oh, to be able to do something like (toy example known to be
equivalent to a CHECK constraint):


CREATE OR REPLACE FUNCTION must_be_positive() RETURNS TRIGGER AS
$$
begin
    if getattr(NEW, TG_ARGV[0]) <= 0
    then
        raise exception '% must be positive', TG_ARGV[0];
    end if;

    -- after trigger
    return null;
end;
$$ LANGUAGE PLPGSQL;


A function which takes a row + a text column name, and/or a peer
function taking a row + an index within the row, would really open up
plpgsql's expressivity in cases where you're writing mainly SQL stuff
and don't really want to go over to plpythonu or whatnot (which
describes rows as dicts).


Is there something in the internals which inherently prevents this? Or
am I a fool and it already exists?


Not having to defer to EXECUTE would be attractive.
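
For comparison, a sketch of the EXECUTE-based workaround alluded to above
(assumes the named column is numeric; the trigger and table names in the
usage comment are made up):

CREATE OR REPLACE FUNCTION must_be_positive() RETURNS TRIGGER AS
$$
declare
    v numeric;
begin
    execute 'select ($1).' || quote_ident(TG_ARGV[0]) into v using NEW;
    if v <= 0
    then
        raise exception '% must be positive', TG_ARGV[0];
    end if;
    return null;
end;
$$ LANGUAGE PLPGSQL;

-- e.g.: CREATE TRIGGER check_price AFTER INSERT OR UPDATE ON items
--           FOR EACH ROW EXECUTE PROCEDURE must_be_positive('price');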



James Robinson
Socialserve.com




Re: [HACKERS] mosbench revisited

2011-08-03 Thread Tom Lane
Robert Haas  writes:
> On Wed, Aug 3, 2011 at 4:38 PM, Tom Lane  wrote:
>> ... We could possibly accept stale values for the
>> planner estimates, but I think heapam's number had better be accurate.

> I think the exact requirement is that, if the relation turns out to be
> larger than the size we read, the extra blocks had better not contain
> any tuples our snapshot can see.  There's actually no interlock
> between smgrnblocks() and smgrextend() right now, so presumably we
> don't need to add one.

No interlock in userspace, you mean.  We're relying on the kernel to do
it, ie, give us a number that is not older than the time of our (already
taken at this point) snapshot.

> I don't really think there's any sensible way to implement a
> per-backend cache, because that would require invalidation events of
> some kind to be sent on relation extension, and that seems utterly
> insane from a performance standpoint, even if we invented something
> less expensive than sinval.

Yeah, that's the issue.  But "relation extension" is not actually a
cheap operation, since it requires a minimum of one kernel call that is
presumably doing something nontrivial in the filesystem.  I'm not
entirely convinced that we couldn't make this work --- especially since
we could certainly derate the duty cycle by a factor of ten or more
without giving up anything remotely meaningful in planning accuracy.
(I'd be inclined to make it send an inval only once the relation size
had changed at least, say, 10%.)
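
A tiny sketch of that derating rule (invented names; just to illustrate
the 10% idea):

#include <stdint.h>

/* Broadcast a relation-size invalidation only when the relation has grown
 * by at least 10% (and at least one block) since the last broadcast. */
static void
maybe_send_size_inval(uint32_t new_nblocks, uint32_t *last_sent_nblocks)
{
    if (new_nblocks >= *last_sent_nblocks + *last_sent_nblocks / 10 + 1)
    {
        /* ... send the invalidation event here ... */
        *last_sent_nblocks = new_nblocks;
    }
}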

> A shared cache seems like it could work, but the locking is tricky.
> Normally we'd just use a hash table protected by an LWLock, one
> LWLock per partition, but here that's clearly not going to work.  The
> kernel is using a spinlock per file, and that's still too
> heavy-weight.

That still seems utterly astonishing to me.  We're touching each of
those files once per query cycle; a cycle that contains two message
sends, who knows how many internal spinlock/lwlock/heavyweightlock
acquisitions inside Postgres (some of which *do* contend with each
other), and a not insignificant amount of plain old computing.
Meanwhile, this particular spinlock inside the kernel is protecting
what, a single doubleword fetch?  How is that the bottleneck?

I am wondering whether kernel spinlocks are broken.

regards, tom lane



Re: [HACKERS] cataloguing NOT NULL constraints

2011-08-03 Thread Alvaro Herrera
Excerpts from Dean Rasheed's message of Sat Jul 23 04:37:06 -0400 2011:
> On 22 July 2011 22:28, Robert Haas  wrote:

> >> mine was that we need a command such as
> >>
> >> ALTER TABLE foo ALTER COLUMN bar SET NOT NULL name_of_notnull_constr
> >>
> >> where the last bit is what's new.
> >
> > Well, if you don't have that, I don't see how you have any chance of
> > pg_dump working correctly.
> 
> Ah yes, pg_dump's dumpConstraint() needs a clause to alter a table
> adding a named NOT NULL constraint (and the DOMAIN case should be
> preserving the constraint's name too). So it looks like some new
> syntax for ALTER TABLE to add named NOT NULL constraints is probably
> needed before this can be committed.

So after writing the code to handle named NOT NULL constraints for
tables, I'm thinking that dumpConstraints needs to be fixed thusly:

@@ -12888,6 +12968,27 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 NULL, NULL);
}
}
+   else if (coninfo->contype == 'n' && tbinfo)
+   {
+   /* NOT NULL constraint on a table */
+   if (coninfo->separate)
+   {
+   write_msg(NULL, "NOT NULL constraints cannot be dumped separately from their owning table\n");
+   exit_nicely();
+   }
+   }
+   else if (coninfo->contype == 'n' && tbinfo == NULL)
+   {
+   /* NOT NULL constraint on a domain */
+   TypeInfo   *tyinfo = coninfo->condomain;
+
+   /* Ignore if not to be dumped separately */
+   if (coninfo->separate)
+   {
+   write_msg(NULL, "NOT NULL constraints cannot be dumped separately from their owning domain\n");
+   exit_nicely();
+   }
+   }
else
{
write_msg(NULL, "unrecognized constraint type: %c\n", coninfo->contype);


There doesn't seem to be any point in giving pg_dump the ability to dump
such constraints separately; I don't think there's any situation in
which dependencies between the table/domain and NOT NULL constraints
would get circular (which is what causes them to acquire the "separate"
flag).  I don't want to write code that is going to be always
unused, particularly not in a beast as hairy as pg_dump.

Note that the code dumps not null constraints along with the table
definition, which includes the constraint name, so it works perfectly
fine already.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] Re: [GENERAL] Odd VACUUM behavior when it is expected to truncate last empty pages

2011-08-03 Thread Pavan Deolasee
On Wed, Aug 3, 2011 at 12:33 PM, Pavan Deolasee
 wrote:
>
>
> The only problem, other than a surprising behavior that you noted,
> that I see with this approach is that we might repeatedly try to
> truncate a relation which in fact does not have anything to truncate.
> The worst  thing is we might unnecessarily take an exclusive lock on
> the table.
>

So it seems we tried to fix this issue sometime back
http://archives.postgresql.org/pgsql-hackers/2008-12/msg01994.php

But I don't quite understand how the fix would really work.
nonempty_pages would most likely be set at a value lower than relpages
if the last page in the relation is all-visible according to the
visibility map. Did we mean to test (nonempty_pages > 0) there ? But
even that may not work except for the case when there are no dead
tuples in the relation.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com



Re: [HACKERS] mosbench revisited

2011-08-03 Thread Robert Haas
On Wed, Aug 3, 2011 at 4:38 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Aug 3, 2011 at 3:38 PM, Tom Lane  wrote:
>>> (If the query ended up being a seqscan, I'd expect a second
>>> lseek(SEEK_END) when the executor starts up, but I gather from the other
>>> complaints that the mosbench people were only testing simple indexscan
>>> queries.)
>
>> Yeah, it seems that for a sequential scan we lseek the heap, then the
>> index, then the heap again; but for index scans we just hit the heap
>> and the index.
>
> Sure.  The first two come from the planner getting the table and index
> sizes for estimation purposes (look in plancat.c).  The last is done in
> heapam.c's initscan().  We could possibly accept stale values for the
> planner estimates, but I think heapam's number had better be accurate.

I think the exact requirement is that, if the relation turns out to be
larger than the size we read, the extra blocks had better not contain
any tuples our snapshot can see.  There's actually no interlock
between smgrnblocks() and smgrextend() right now, so presumably we
don't need to add one.  However, a value cached from a few seconds ago
is clearly not going to cut it.

I don't really think there's any sensible way to implement a
per-backend cache, because that would require invalidation events of
some kind to be sent on relation extension, and that seems utterly
insane from a performance standpoint, even if we invented something
less expensive than sinval.  I guess it might work for planning
purposes if you only sent out invalidation events on every N'th
extension or something, but penalizing the accuracy of planning to
work around a Linux kernel bug that only manifests itself on machines
with >32 cores doesn't seem very appealing.

A shared cache seems like it could work, but the locking is tricky.
Normally we'd just use a hash table protected by an LWLock, one
LWLock per partition, but here that's clearly not going to work.  The
kernel is using a spinlock per file, and that's still too
heavy-weight.  I think that if we could prepopulate the cache with all
the keys (i.e. relfilenodes) we care about and then never add or evict
any, we could run it completely unlocked.  I believe that the existing
memory barriers in things like LWLockAcquire() would be sufficient to
prevent us from reading a too-old value (i.e. block count).  In
particular, you couldn't read a value that predated your snapshot, if
you got your snapshot by holding ProcArrayLock.  But the races
involved with adding and removing items from the cache are hard to
deal with without using locks, especially because the keys are 12
bytes or more and therefore can't be read or written atomically.
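
A rough sketch of the kind of entry such a prepopulated, lock-free-read
cache might hold (hypothetical layout, not the actual attempt described
above):

#include <stdint.h>

typedef struct
{
    uint32_t    spcnode;        /* tablespace OID */
    uint32_t    dbnode;         /* database OID  */
    uint32_t    relnode;        /* relfilenode - a 12-byte key, hence not atomic */
    uint32_t    nblocks;        /* cached relation length in blocks */
} RelSizeEntry;

/* Readers would scan a fixed, never-resized array of these in shared memory
 * and load nblocks without taking a lock; only nblocks is updated in place,
 * which is why adding or evicting entries is the hard part. */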

I've been mulling over how we might deal with this and actually coded
up an implementation, but it turns out (surprise, surprise) to have
problems with insufficient locking.  So I'm thinking it over some
more.  And hoping that the Linux guys decide to do something about it.
 This isn't really our bug - lseek is quite cheap in the uncontended
case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Locking end of indexes during VACUUM

2011-08-03 Thread Simon Riggs
During btvacuumscan(), we lock the index for extension and then wait
to acquire a cleanup lock on the last page. Then loop until we find a
point where the index has not expanded again during our wait for lock
on that last page. On a busy index this can take some time, especially
when people regularly access data with the highest values in the
index.

The comments there say "It is critical that we visit all leaf pages,
including ones added after we start the scan, else we might fail to
delete some deletable tuples."

What seems strange is that we make no attempt to check whether we have
already identified all tuples being removed by the VACUUM. We have the
number of dead tuples we are looking for and we track the number of
tuples we have deleted from the index, so we could easily make this
check early and avoid waiting.

Can we avoid scanning all pages once we have proven we have all dead tuples?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] mosbench revisited

2011-08-03 Thread Tom Lane
Robert Haas  writes:
> On Wed, Aug 3, 2011 at 3:38 PM, Tom Lane  wrote:
>> (If the query ended up being a seqscan, I'd expect a second
>> lseek(SEEK_END) when the executor starts up, but I gather from the other
>> complaints that the mosbench people were only testing simple indexscan
>> queries.)

> Yeah, it seems that for a sequential scan we lseek the heap, then the
> index, then the heap again; but for index scans we just hit the heap
> and the index.

Sure.  The first two come from the planner getting the table and index
sizes for estimation purposes (look in plancat.c).  The last is done in
heapam.c's initscan().  We could possibly accept stale values for the
planner estimates, but I think heapam's number had better be accurate.

regards, tom lane



Re: [HACKERS] mosbench revisited

2011-08-03 Thread Robert Haas
On Wed, Aug 3, 2011 at 3:38 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On a straight pgbench -S test, you get four system calls per query:
>> recvfrom(), lseek(), lseek(), sendto().  Adding -M prepared eliminates
>> the two lseeks.
>
> [ scratches head... ]  Two?

Yep.

> Is that one for the table and one for its
> lone index, or are we being redundant there?

The former.  Specifically, it appears we're smart enough to only test
the last segment (in this case, the table is large enough that there
is a .1 file, and that's what we're lseeking).

> (If the query ended up being a seqscan, I'd expect a second
> lseek(SEEK_END) when the executor starts up, but I gather from the other
> complaints that the mosbench people were only testing simple indexscan
> queries.)

Yeah, it seems that for a sequential scan we lseek the heap, then the
index, then the heap again; but for index scans we just hit the heap
and the index.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Tom Lane
Peter Geoghegan  writes:
> On 3 August 2011 15:29, Tom Lane  wrote:
>> There is another point here, though, which is that if we're not sure
>> whether the compiler considers ExecStatusType to be signed or unsigned,
>> then we have no idea what the test "status < PGRES_EMPTY_QUERY" even
>> means.

> I'm sorry, but I don't know what you mean by this.

I mean that it's unclear what you'll get if status has a bitpattern
equivalent to a negative integer.  If the compiler implements the
comparison as signed, the test will yield TRUE; if unsigned, it's FALSE.

>> So I think the most reasonable fix is probably
>> 
>>if ((unsigned int) status >= sizeof pgresStatus / sizeof 
>> pgresStatus[0])
>> 
>> which is sufficient to cover both directions, since if status is passed
>> as -1 then it will convert to a large unsigned value.  It's also a
>> natural expression of what we really want, ie, that the integer
>> equivalent of the enum value is in range.

> I'm not convinced that that is an improvement to rely on the
> conversion doing so, but it's not as if I feel very strongly about it.

The C standard specifies that signed-to-unsigned conversions must work
like that; and even if the standard didn't, it would surely work like
that on any machine with two's-complement representation, which is to
say every computer built in the last forty years or so.  So I don't find
it a questionable assumption.
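
A standalone illustration of that idiom (the names mimic libpq's, but this
is just a toy example, not the library code):

#include <stdio.h>

typedef enum { PGRES_EMPTY_QUERY, PGRES_COMMAND_OK, PGRES_TUPLES_OK } ExecStatusType;

static const char *pgresStatus[] = { "PGRES_EMPTY_QUERY", "PGRES_COMMAND_OK", "PGRES_TUPLES_OK" };

static const char *
status_to_string(ExecStatusType status)
{
    /* One unsigned comparison catches both "too large" and "negative":
     * a negative bit pattern converts to a huge unsigned value. */
    if ((unsigned int) status >= sizeof pgresStatus / sizeof pgresStatus[0])
        return "invalid ExecStatusType";
    return pgresStatus[status];
}

int
main(void)
{
    printf("%s\n", status_to_string(PGRES_TUPLES_OK));        /* prints PGRES_TUPLES_OK */
    printf("%s\n", status_to_string((ExecStatusType) -1));    /* caught as invalid */
    return 0;
}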

regards, tom lane



Re: [HACKERS] mosbench revisited

2011-08-03 Thread Tom Lane
Robert Haas  writes:
> On a straight pgbench -S test, you get four system calls per query:
> recvfrom(), lseek(), lseek(), sendto().  Adding -M prepared eliminates
> the two lseeks.

[ scratches head... ]  Two?  Is that one for the table and one for its
lone index, or are we being redundant there?

(If the query ended up being a seqscan, I'd expect a second
lseek(SEEK_END) when the executor starts up, but I gather from the other
complaints that the mosbench people were only testing simple indexscan
queries.)

regards, tom lane



Re: [HACKERS] Transient plans versus the SPI API

2011-08-03 Thread Tom Lane
Simon Riggs  writes:
> I think its possible to tell automatically whether we need to replan
> always or not based upon the path we take through selectivity
> functions.

I don't really believe that, or at least I think it would only detect a
few cases.  Examples of parameter-value-sensitive decisions that are
made nowhere near the selectivity functions are constraint exclusion and
LIKE pattern to index-qual conversion.  And in none of these cases do we
really know at the bottom level whether a different parameter value will
lead to a significant change in the finished plan.  For instance, if
there's no index for column foo, it is a waste of time to force
replanning just because we have varying selectivity estimates for
"WHERE foo > $1".

I think we'll be a lot better off with the framework discussed last
year: build a generic plan, as well as custom plans for the first few
sets of parameter values, and then observe whether there's a significant
reduction in estimated costs for the custom plans.
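
In other words, a decision rule along these lines (numbers and names are
invented; this is only a sketch of the policy being described, not an
actual implementation):

#include <stdbool.h>

typedef struct
{
    double  generic_cost;       /* estimated cost of the cached generic plan */
    double  custom_cost_sum;    /* total estimated cost of custom plans built so far */
    int     num_custom_plans;
} PlanChoiceState;

static bool
use_custom_plan(const PlanChoiceState *state)
{
    double  avg_custom_cost;

    /* Always build custom plans for the first few executions. */
    if (state->num_custom_plans < 5)
        return true;

    avg_custom_cost = state->custom_cost_sum / state->num_custom_plans;

    /* Afterwards, keep planning per-call only if it has been paying off
     * by a noticeable margin over the generic plan. */
    return state->generic_cost > avg_custom_cost * 1.1;
}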

But in any case, it's way premature to be debating this until we have
the infrastructure in which we can experiment with different policies.

regards, tom lane



Re: [HACKERS] Transient plans versus the SPI API

2011-08-03 Thread Robert Haas
On Wed, Aug 3, 2011 at 3:19 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> This seems like a good design.  Now what would be really cool is if
>> you could observe a stream of queries like this:
>
>> SELECT a, b FROM foo WHERE c = 123
>> SELECT a, b FROM foo WHERE c = 97
>> SELECT a, b FROM foo WHERE c = 236
>
>> ...and say, hey, I could just make a generic plan and use it every
>> time I see one of these.  It's not too clear to me how you'd make
>> recognition of such queries cheap enough to be practical, but maybe
>> someone will think of a way...
>
> Hm, you mean reverse-engineering the parameterization of the query?
> Interesting thought, but I really don't see a way to make it practical.
>
> In any case, it would amount to making up for a bad decision on the
> application side, ie, not transmitting the query in the parameterized
> form that presumably exists somewhere in the application.  I think
> we'd be better served all around by encouraging app developers to rely
> more heavily on parameterized queries ... but first we have to fix the
> performance risks there.

Fair enough.  I have to admit I'm afraid of them right now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread David E. Wheeler
On Aug 3, 2011, at 11:28 AM, Grzegorz Jaskiewicz wrote:

> export CC=clang 
> ./configure
> ...

Thanks, I'll give that a try the next time I build (RC1 I guess).

David



Re: [HACKERS] Transient plans versus the SPI API

2011-08-03 Thread Tom Lane
Robert Haas  writes:
> This seems like a good design.  Now what would be really cool is if
> you could observe a stream of queries like this:

> SELECT a, b FROM foo WHERE c = 123
> SELECT a, b FROM foo WHERE c = 97
> SELECT a, b FROM foo WHERE c = 236

> ...and say, hey, I could just make a generic plan and use it every
> time I see one of these.  It's not too clear to me how you'd make
> recognition of such queries cheap enough to be practical, but maybe
> someone will think of a way...

Hm, you mean reverse-engineering the parameterization of the query?
Interesting thought, but I really don't see a way to make it practical.

In any case, it would amount to making up for a bad decision on the
application side, ie, not transmitting the query in the parameterized
form that presumably exists somewhere in the application.  I think
we'd be better served all around by encouraging app developers to rely
more heavily on parameterized queries ... but first we have to fix the
performance risks there.

regards, tom lane



Re: [HACKERS] mosbench revisited

2011-08-03 Thread Robert Haas
On Wed, Aug 3, 2011 at 2:49 PM, Tom Lane  wrote:
> Martijn van Oosterhout  writes:
>> On Wed, Aug 03, 2011 at 02:21:25PM -0400, Robert Haas wrote:
>>> It would be nice if the Linux guys would fix this problem for us, but
>>> I'm not sure whether they will.  For those who may be curious, the
>>> problem is in generic_file_llseek() in fs/read-write.c.  On a platform
>>> with 8-byte atomic reads, it seems like it ought to be very possible
>>> to read inode->i_size without taking a spinlock.
>
>> Interesting. There's this thread from 2003 suggesting the use of pread
>> instead, it was rejected on the argument that lseek is cheap so not a
>> problem.
>
>> http://archives.postgresql.org/pgsql-patches/2003-02/msg00197.php
>
> That seems rather unrelated.  The point here is our use of lseek to find
> out the current file size --- or at least, I would hope they're not
> trying to read the inode's file size in a SEEK_CUR call.

Correct.

> The reason "-M prepared" helps is presumably that it eliminates most of
> the RelationGetNumberOfBlocks calls the planner does to check current
> table size.  While we could certainly consider using a cheaper (possibly
> more stale) value there, it's a bit astonishing to think that that's the
> main cost in a parse/plan/execute cycle.  Perhaps there are more hotspot
> calls than that one?

Nope.

On a straight pgbench -S test, you get four system calls per query:
recvfrom(), lseek(), lseek(), sendto().  Adding -M prepared eliminates
the two lseeks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Grzegorz Jaskiewicz

On 3 Aug 2011, at 19:20, David E. Wheeler wrote:

> 
> Yeah, not sure whether or not to tweak the Makefile to use Clang. I guess not?
> 

export CC=clang 
./configure
...




Re: [HACKERS] WAL logging volume and CREATE TABLE

2011-08-03 Thread Bruce Momjian
Alvaro Herrera wrote:
Excerpts from Bruce Momjian's message of Tue Aug 02 22:46:55 -0400 2011:
> 
> > I have created a documentation patch to clarify this, and to mention
> > CREATE TABLE AS which also has this optimization.
> 
> It doesn't seem particularly better to me.  How about something like
> 
> In minimal level, WAL-logging of some operations can be safely skipped,
> which can make those operations much faster (see ).  Operations on
> which this optimization can be applied include: 
> 
>  CREATE INDEX
>  CLUSTER
>  CREATE TABLE AS
>  COPY into tables that were created or truncated in the same
>  transaction
> 
> 
> Minimal WAL does not contain enough information to reconstruct the data
> from a base backup and the WAL logs, so either archive or
> hot_standby level must be used to enable ...

Good idea --- updated patch attached.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index 4fadca9..aac6c3b
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** SET ENABLE_SEQSCAN TO OFF;
*** 1451,1461 
  This parameter can only be set at server start.
 
 
! In minimal level, WAL-logging of some bulk operations, like
! CREATE INDEX, CLUSTER and COPY on
! a table that was created or truncated in the same transaction can be
! safely skipped, which can make those operations much faster (see
! ). But minimal WAL does not contain
  enough information to reconstruct the data from a base backup and the
  WAL logs, so either archive or hot_standby
  level must be used to enable
--- 1451,1468 
  This parameter can only be set at server start.
 
 
! In minimal level, WAL-logging of some bulk
! operations can be safely skipped, which can make those
! operations much faster (see ).
! Operations in which this optimization can be applied include:
! 
!  CREATE INDEX
!  CLUSTER
!  CREATE TABLE AS
!  COPY into tables that were created or truncated in the same
!  transaction
! 
! But minimal WAL does not contain
  enough information to reconstruct the data from a base backup and the
  WAL logs, so either archive or hot_standby
  level must be used to enable
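
For example, with wal_level = minimal (and no archiving), a session like
the following can skip WAL for the COPY (the file path is made up):

BEGIN;
CREATE TABLE items (id int, name text);
COPY items FROM '/tmp/items.csv' WITH (FORMAT csv);   -- WAL-logging skipped
COMMIT;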



Re: [HACKERS] mosbench revisited

2011-08-03 Thread Tom Lane
Martijn van Oosterhout  writes:
> On Wed, Aug 03, 2011 at 02:21:25PM -0400, Robert Haas wrote:
>> It would be nice if the Linux guys would fix this problem for us, but
>> I'm not sure whether they will.  For those who may be curious, the
>> problem is in generic_file_llseek() in fs/read-write.c.  On a platform
>> with 8-byte atomic reads, it seems like it ought to be very possible
>> to read inode->i_size without taking a spinlock.

> Interesting. There's this thread from 2003 suggesting the use of pread
> instead, it was rejected on the argument that lseek is cheap so not a
> problem.

> http://archives.postgresql.org/pgsql-patches/2003-02/msg00197.php

That seems rather unrelated.  The point here is our use of lseek to find
out the current file size --- or at least, I would hope they're not
trying to read the inode's file size in a SEEK_CUR call.

The reason "-M prepared" helps is presumably that it eliminates most of
the RelationGetNumberOfBlocks calls the planner does to check current
table size.  While we could certainly consider using a cheaper (possibly
more stale) value there, it's a bit astonishing to think that that's the
main cost in a parse/plan/execute cycle.  Perhaps there are more hotspot
calls than that one?

regards, tom lane



Re: [HACKERS] Transient plans versus the SPI API

2011-08-03 Thread Dimitri Fontaine
Tom Lane  writes:
> So yes, it'd get a little worse for that use-case.  But you have to
> weigh that against the likelihood that other use-cases will get better.
> If our requirement for a transient-plan mechanism is that no individual
> case can ever be worse than before, then we might as well abandon the
> entire project right now, because the only way to meet that requirement
> is to change nothing.

That is not where I wanted to drift.  It's just that I don't have as much
time as I would like these days, and so it helps me a lot to see a
worked-out example rather than having to make sure I parse your proposal
correctly.  Thanks a lot for your answer; I have a very clear
confirmation of how I read your previous email.

I will have to do some testing, but it could well be that this
application will benefit from locking reductions enough that it buys
this effect back.

> Of course we could address the worst cases by providing some mechanism
> to tell the plancache code "always use a generic plan for this query"
> or "always use a custom plan".  I'm not entirely thrilled with that,
> because it's effectively a planner hint and has got the same problems
> as all planner hints, namely that users are likely to get it wrong.

Yeah.

> But it would be relatively painless to supply such a hint at the SPI
> level, which is why I asked whether we should.  It'd be much harder to
> do something equivalent at higher levels, which is why I'm not that
> eager to do it for SPI.

Given the SLA of those prepared queries in my case, I think I could
accept having to switch from SQL statements to a C-coded SRF to guarantee
the planning behavior.  It will not make the upgrade cheaper, but I
realize it's a very narrow and specific use case.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] mosbench revisited

2011-08-03 Thread Martijn van Oosterhout
On Wed, Aug 03, 2011 at 02:21:25PM -0400, Robert Haas wrote:
> It would be nice if the Linux guys would fix this problem for us, but
> I'm not sure whether they will.  For those who may be curious, the
> problem is in generic_file_llseek() in fs/read-write.c.  On a platform
> with 8-byte atomic reads, it seems like it ought to be very possible
> to read inode->i_size without taking a spinlock.

Interesting. There's this thread from 2003 suggesting the use of pread
instead, it was rejected on the argument that lseek is cheap so not a
problem.

http://archives.postgresql.org/pgsql-patches/2003-02/msg00197.php

Perhaps we now have a benchmark where the effect can be measured.

There's the issue about whether it screws up the readahead mechanism...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




[HACKERS] mosbench revisited

2011-08-03 Thread Robert Haas
About nine months ago, we had a discussion of some benchmarking that
was done by the mosbench folks at MIT:

http://archives.postgresql.org/pgsql-hackers/2010-10/msg00160.php

Although the authors used PostgreSQL as a test harness for driving
load, it's pretty clear from reading the paper that their primary goal
was to stress the Linux kernel, so the applicability of the paper to
real-world PostgreSQL performance improvement is less than it might
be.  Still, having now actually investigated in some detail many of
the same performance issues that they were struggling with, I have a
much clearer understanding of what's really going on here.  In
PostgreSQL terms, here are the bottlenecks they ran into:

1. "We configure PostgreSQL to use a 2 Gbyte application-level cache
because PostgreSQL protects its free-list with a single lock and thus
scales poorly with smaller caches."  This is a complaint about
BufFreeList lock which, in fact, I've seen as a huge point of
contention on some workloads.  In fact, on read-only workloads, with
my lazy vxid lock patch applied, this is, I believe, the only
remaining unpartitioned LWLock that is ever taken in exclusive mode;
or at least the only one that's taken anywhere near often enough to
matter.  I think we're going to do something about this, although I
don't have a specific idea in mind at the moment.

2. "PostgreSQL implements row- and table-level locks atop user-level
mutexes; as a result, even a non-conflicting row- or table-level lock
acquisition requires exclusively locking one of only 16 global
mutexes."  I think that the reference to row-level locks here is a red
herring; or at least, I haven't seen any evidence that row-level
locking is a meaningful source of contention on any workload I've
tested.  Table-level locks clearly are, and this is the problem that
the now-committed fastlock patch addressed.  So, fixed!

3. "Our workload creates one PostgreSQL connection per server core and
sends queries (selects or updates) in batches of 256, aggregating
successive read-only transactions into single transactions. This
workload is intended to minimize application-level contention within
PostgreSQL in order to maximize the stress PostgreSQL places on the
kernel."  I had no idea what this was talking about at the time, but
it's now obvious in retrospect that they were working around the
overhead imposed by acquiring and releasing relation and virtualxid
locks.  My pending "lazy vxids" patch will address the remaining issue
here.

4. "With modified PostgreSQL on stock Linux, throughput for both
workloads collapses at 36 cores .. The main reason is the kernel's
lseek implementation."  With the fastlock, sinval-hasmessages, and
lazy-vxid patches applied (the first two are committed now), it's now
much easier to run headlong into this bottleneck.  Prior to those
patches, for this to be an issue, you would need to batch your queries
together in big groups to avoid getting whacked by the lock manager
and/or sinval overhead first.  With those problems and the recently
discovered bottleneck in glibc's random() implementation fixed, good
old pgbench -S is enough to hit this problem if you have enough
clients and enough cores.  And it turns out that the word "collapse"
is not an exaggeration.  On a 64-core Intel box running RHEL 6.1,
performance ramped up from 24k TPS at 4 clients to 175k TPS at 32
clients and then to 207k TPS at 44 clients.  After that it fell off a
cliff, dropping to 93k TPS at 52 clients and 26k TPS at 64 clients,
consuming truly horrifying amounts of system time in the process.  A
somewhat tedious investigation revealed that the problem is, in fact,
contention on the inode mutex caused by lseek().  Results are much
better with -M prepared (310k TPS at 48 clients, 294k TPS at 64
clients).  All one-minute tests with scale factor 100, fitting inside
8GB of shared_buffers (clearly not enough for serious benchmarking,
but enough to demonstrate this issue).

It would be nice if the Linux guys would fix this problem for us, but
I'm not sure whether they will.  For those who may be curious, the
problem is in generic_file_llseek() in fs/read-write.c.  On a platform
with 8-byte atomic reads, it seems like it ought to be very possible
to read inode->i_size without taking a spinlock.  A little Googling
around suggests that some patches along these lines have been proposed
and - for reasons that I don't fully understand - rejected.  That now
seems unfortunate.  Barring a kernel-level fix, we could try to
implement our own cache to work around this problem.  However, any
such cache would need to be darn cheap to check and update (since we
can't assume that relation extension is an infrequent event) and must
somehow avoid having the same sort of mutex contention that's killing the
kernel in this workload.
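To make the shape of a possible workaround concrete, here is a small standalone sketch (not server code; the struct and function names are invented for illustration) of what the size check boils down to and what a cached variant might look like. The hard part, as noted above, is keeping the cache cheap to check and correct in the face of relation extension.

#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192

typedef struct
{
	int		fd;				/* open file descriptor for the relation segment */
	off_t	cached_size;	/* last size seen, in bytes; -1 means unknown */
} FakeRelation;

/* In effect what the size check does today: one lseek(SEEK_END) per call. */
static off_t
rel_nblocks_uncached(FakeRelation *rel)
{
	return lseek(rel->fd, 0, SEEK_END) / BLCKSZ;
}

/* Possible workaround: trust a cached value until someone extends the file. */
static off_t
rel_nblocks_cached(FakeRelation *rel)
{
	if (rel->cached_size < 0)
		rel->cached_size = lseek(rel->fd, 0, SEEK_END);
	return rel->cached_size / BLCKSZ;
}

/* Anyone extending the relation must invalidate (or bump) the cached size. */
static void
rel_extended(FakeRelation *rel)
{
	rel->cached_size = -1;
}

int
main(int argc, char **argv)
{
	FakeRelation rel = {-1, -1};

	if (argc < 2)
		return 1;
	rel.fd = open(argv[1], O_RDONLY);
	printf("uncached: %ld blocks\n", (long) rel_nblocks_uncached(&rel));
	printf("cached:   %ld blocks\n", (long) rel_nblocks_cached(&rel));
	rel_extended(&rel);			/* pretend the relation just grew */
	printf("recheck:  %ld blocks\n", (long) rel_nblocks_cached(&rel));
	return 0;
}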

5. With all of the above problems fixed or worked around, the authors
write, "PostgreSQL's overall scalability is primarily limited by
contention for the spinlock protecting t

Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread David E. Wheeler
On Aug 3, 2011, at 11:17 AM, Peter Eisentraut wrote:

> Note that what you are using there is a GCC frontend with a LLVM
> backend.  (So I suppose you would get mostly GCC warnings.)  Clang is a
> new/different compiler frontend using the LLVM backend.

Yeah, not sure whether or not to tweak the Makefile to use Clang. I guess not?

David



Re: [HACKERS] cataloguing NOT NULL constraints

2011-08-03 Thread Alvaro Herrera
Excerpts from Alvaro Herrera's message of Wed Aug 03 13:12:38 -0400 2011:

> ... ah, maybe what we could do is have gram.y create a ColumnDef in the
> new production, and stick that in cmd->def instead of the list of
> constraints.  So parse_utilcmd would have to know that if that node
> IsA(ColumnDef) then it needs to deal with column constraints.  It seems
> a bit cleaner overall, though it's still wart-ish.

Yes, this works, thanks for the suggestion.  Attached.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


alter_add-3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Peter Eisentraut
On ons, 2011-08-03 at 10:33 -0700, David E. Wheeler wrote:
> I haven't been paying attention to warnings, but FWIW 9.1beta3 has
> been building with LLVM by default with Xcode 4. Both on Snow Leopard
> and on Lion I saw something like this:
> 
> try=# select version();
> 
>                                version
> --------------------------------------------------------------------
>  PostgreSQL 9.1beta3 on x86_64-apple-darwin11.0.0, compiled by
> i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc.
> build 5658) (LLVM build 2335.15.00), 64-bit
> (1 row)
> 
> Which is just awesome. Things seem to work great.

Note that what you are using there is a GCC frontend with a LLVM
backend.  (So I suppose you would get mostly GCC warnings.)  Clang is a
new/different compiler frontend using the LLVM backend.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Peter Eisentraut
On ons, 2011-08-03 at 10:25 +0100, Peter Geoghegan wrote:
> Attached patch removes the tautological part of an evaluated
> expression, fixing the problem flagged by this quite valid warning.

I think these warnings are completely bogus and should not be worked
around.  Even in the most trivial case of

{
unsigned int foo;

...

if (foo < 0 && ...)
...
}

I would not want to remove the check, because as the code gets moved
around, refactored, reused, whatever, the unsigned int might change into
a signed int, and then you're hosed.  It's fine for -Wextra, but not for
the -Wall level.
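A tiny standalone example of that hazard (hypothetical code, just to show the two behaviors side by side):

#include <stdio.h>

int
main(void)
{
	unsigned int	ufoo = (unsigned int) -1;	/* wraps to a huge value */
	int				sfoo = -1;

	printf("%d\n", ufoo < 0);	/* always 0: the check is dead code today ... */
	printf("%d\n", sfoo < 0);	/* ... but it is live again if the type is later changed */
	return 0;
}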


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread David E. Wheeler
On Aug 3, 2011, at 1:47 AM, Peter Geoghegan wrote:

> So, there are 4 remaining warnings.

I haven't been paying attention to warnings, but FWIW 9.1beta3 has been 
building with LLVM by default with Xcode 4. Both on Snow Leopard and on Lion I 
saw something like this:

try=# select version();

                                   version
----------------------------------------------------------------------
 PostgreSQL 9.1beta3 on x86_64-apple-darwin11.0.0, compiled by 
i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) 
(LLVM build 2335.15.00), 64-bit
(1 row)

Which is just awesome. Things seem to work great.

Best,

David



Re: [HACKERS] WIP fix proposal for bug #6123

2011-08-03 Thread Robert Haas
On Wed, Aug 3, 2011 at 12:27 PM, Florian Pflug  wrote:
> Going down that road opens the door to a *lot* of subtle semantic
> differences between currently equivalent queries. For example,
>
>  UPDATE T SET f=f, a=1
>
> would behave differently then
>
>  UPDATE T SET a=1
>
> because in the first case the new row would depend on the old row's
> value of "f", while in the second case it doesn't.

True.  But I'm not really bothered by that.  If the user gets an error
message that says:

ERROR: updated column "f" has already been modified by a BEFORE trigger

...the user will realize the error of their ways.

>> There's no way to get the
>> same result as if you'd done either one of them first, because they
>> are inextricably intertwined.
>>
>> In practice, my hand-wavy reference to "reconciling the updates" is a
>> problem because of the way the trigger interface works.  It feels
>> pretty impossible to decide that we're going to do the update, but
>> with some other random values we dredged up from some other update
>> replacing some of the ones the user explicitly handed to us.  But if
>> the trigger could return an indication of which column values it
>> wished to override, then it seems to me that we could piece together a
>> reasonable set of semantics.  It's not exactly clear how to make that
>> work, though.
>
> I dunno, that all still feels awfully complex. As you said yourself,
> this case is quite similar to a serialization anomaly. Taking that
> correspondence further, that reconciliation of updates is pretty much
> what the EPQ machinery does in READ COMMITTED mode. Now, we ourselves
> have warned users in the past to *not* use READ COMMITTED mode if they
> do complex updates (e.g., use UPDATE ... FROM ...), because the behaviour
> of that reconciliation machinery in the present of concurrent updates
> is extremely hard to predict. I thus don't believe that it's a good idea
> to introduce similarly complex behaviour in other parts of the system -
> and particularly not if you cannot disable it by switching to another
> isolation level.
>
> Simply throwing an error, on the other hand, makes the behaviour simple
> to understand and explain.

True.  I'm coming around to liking that behavior better than I did on
first hearing, but I'm still not crazy about it, because as an app
developer I would really like to have at least the unproblematic cases
actually work.  Throwing an error at least makes it clear that you've
done something which isn't supported, and that's probably an
improvement over the current, somewhat-baffling behavior.  However,
it's not even 25% as nice as having it actually work as intended.
That's why, even if we can't make all the cases work sanely, I'd be a
lot more enthusiastic about it if we could find a way to make at least
some of them work sanely.  The mind-bending cases are unlikely to be
the ones people do on purpose.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cataloguing NOT NULL constraints

2011-08-03 Thread Alvaro Herrera
Excerpts from Dean Rasheed's message of Wed Aug 03 03:11:32 -0400 2011:
> On 3 August 2011 04:40, Alvaro Herrera  wrote:
> > Excerpts from Dean Rasheed's message of Sat Jul 30 18:40:46 -0400 2011:
> >
> >> Looks pretty good to me (not too dirty). I suppose given that the
> >> parser transforms AT_ColumnConstraint into one of the existing command
> >> subtypes, you could just have gram.y emit an AT_AddConstraint with the
> >> ColumnDef attached, to save adding a new subtype, but there's probably
> >> not much difference.
> >
> > Thanks.  I've done the other changes you suggested, but I don't see that
> > it's desirable to have gram.y emit AT_AddConstraint directly.  It seems
> > cleaner to be able to turn a NOT NULL constraint into AT_SetNotNull
> > in parse_utilcmd instead.
> 
> I wasn't proposing getting rid of that bit in parse_utilcmd. I just
> meant move the block that calls transformColumnConstraints() to the
> AT_AddConstraint case in transformAlterTableStmt(). So it would test
> if it has a ColumnDef attached, and either process a table constraint
> or a set of column constraints. That would avoid the need for a new
> command subtype, and so the changes to tablecmds.c would not be
> needed. I think it would also avoid the need to add colname to the
> Constraint struct, so none of the parsenodes.h changes would be needed
> either.

Oh, I see.  Well, the problem is precisely that we don't have a
ColumnDef at that point.

... ah, maybe what we could do is have gram.y create a ColumnDef in the
new production, and stick that in cmd->def instead of the list of
constraints.  So parse_utilcmd would have to know that if that node
IsA(ColumnDef) then it needs to deal with column constraints.  It seems
a bit cleaner overall, though it's still wart-ish.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP fix proposal for bug #6123

2011-08-03 Thread Kevin Grittner
Florian Pflug  wrote:
 
> To me, it still seems conceptionally cleaner to just decree that a
> row must not be modified while BEFORE triggers are running,
> period.
> 
> This, BTW, also matches what Oracle does, only on a per-row
> instead of per-table basis. Oracle AFAIR simply forbids touching
> of the table a BEFORE trigger is attached to from within that
> trigger. (They even forbid SELECTS, which is presumably because
> they don't have an equivalent of our per-row command id, i.e.
> cannot ensure that such a SELECT sees the state the table was in
> at the beginning of the statement)
 
It appears this was in flight while I was composing my last post. 
This seems pretty strict, but given the trade-offs, perhaps it is
worth it.  I can live with either solution, myself.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-03 Thread Robert Haas
On Wed, Aug 3, 2011 at 12:19 PM, Tom Lane  wrote:
> Dimitri Fontaine  writes:
>> Tom Lane  writes:
>>> Anyone have an opinion about that?
>
>> I still have this application where PREPARE takes between 50ms and 300ms
>> and EXECUTE 5ms to 10ms, and I can handle 1 PREPARE for 10000 EXECUTE
>> quite easily.  (Yes the database fits in RAM, and yes when that's no
>> longer the case we just upgrade the hardware)
>
>> What does your proposal mean for such a use case?
>
> Well, the policy for when to replan or not remains to be worked out in
> detail, but what is likely to happen for such cases is that we'll waste
> a few planning cycles before determining that there's no benefit in a
> custom plan.  So, using the worst-case ends of your ranges above and
> assuming that "a few" means "10", we'd go from 300 + 5 * 10000 = 50300
> ms to execute the query 10000 times, to 10 * 300 + 5 * 10000 = 53000 ms.

A little OT here, but (as I think Simon said elsewhere) I think we
really ought to be considering the table statistics when deciding
whether or not to replan.  It seems to me that the overwhelmingly
common case where this is going to come up is when (some subset of)
the MCVs require a different plan than run-of-the-mill values.  It
would be nice to somehow work that out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FOR KEY LOCK foreign keys

2011-08-03 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Wed Aug 03 12:14:15 -0400 2011:
> On Wed, Jul 27, 2011 at 7:16 PM, Alvaro Herrera
>  wrote:
> > One thing I have not addressed is Noah's idea about creating a new lock
> > mode, KEY UPDATE, that would let us solve the initial problem that this
> > patch set to resolve in the first place.  I am not clear on exactly how
> > that is to be implemented, because currently heap_update and heap_delete
> > do not grab any kind of lock but instead do their own ad-hoc waiting.  I
> > think that might need to be reshuffled a bit, to which I haven't gotten
> > yet, and is a radical enough idea that I would like it to be discussed
> > by the hackers community at large before setting sail on developing it.
> > In the meantime, this patch does improve the current situation quite a
> > lot.
> 
> I haven't looked at the patch yet, but do you have a pointer to Noah's
> proposal?  And/or a description of how it differs from what you
> implemented here?

Yes, see his review email here:
  
http://archives.postgresql.org/message-id/20110211071322.gb26...@tornado.leadboat.com

It's long, but search for the part where he talks about "KEY UPDATE".
The way my patch works is explained by Noah there.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP fix proposal for bug #6123

2011-08-03 Thread Kevin Grittner
Robert Haas  wrote:
> Florian Pflug  wrote:
 
>> Here's a step-by-step description of what happens in the case of
>> two UPDATES, assuming that we don't ignore any updated and throw
>> no error.
>>
>> 1) UPDATE T SET ... WHERE ID=1
>> 2) Row with ID=1 is found & locked
>> 3) BEFORE UPDATE triggers are fired, with OLD containing the
>>original row's contents and NEW the values specified in (1)
>> 4) UPDATE T SET ... WHERE ID=1, executed from within one of the
>>triggers
>> 5) BEFORE UPDATE triggers are fired again, with OLD containing
>>the original row's contents and NEW the value specified in
>>(4).
>> 6) The row is updated with the values specified in (4), possibly
>>changed by the triggers in (5)
>> 7) The row is updated with the values specified in (2), possibly
>>changed by the triggers in (3).
 
>> The different between Kevin's original patch and my suggestion
>> is, BTW, that he made step (7) through an error while I suggested
>> the error to be thrown already at (4).
> 
> I think Kevin's proposal is better, because AFAICS if the BEFORE
> UPDATE trigger returns NULL then we haven't got a problem; and we
> can't know that until we get to step 7.
 
Robert makes a good point -- at step 4 we can't know whether the
BEFORE trigger will return an unmodified record, a modified record,
or NULL.  Without some way to deal with that, throwing the error
earlier seems like a non-starter.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP fix proposal for bug #6123

2011-08-03 Thread Florian Pflug
On Aug3, 2011, at 17:57 , Robert Haas wrote:
> On Wed, Aug 3, 2011 at 4:50 AM, Florian Pflug  wrote:
>> The different between Kevin's original patch and my suggestion is, BTW,
>> that he made step (7) through an error while I suggested the error to
>> be thrown already at (4).
> 
> I think Kevin's proposal is better, because AFAICS if the BEFORE
> UPDATE trigger returns NULL then we haven't got a problem; and we
> can't know that until we get to step 7.

Hm, not sure if I agree. A BEFORE trigger might decide to block an update,
and then take some action based on the assumption that the row is actually
unmodified (i.e., identical to the OLD value observed by the trigger).

To me, it still seems conceptionally cleaner to just decree that a row
must not be modified while BEFORE triggers are running, period.

This, BTW, also matches what Oracle does, only on a per-row instead of
per-table basis. Oracle AFAIR simply forbids touching of the table a BEFORE
trigger is attached to from within that trigger. (They even forbid SELECTS,
which is presumably because they don't have an equivalent of our per-row
command id, i.e. cannot ensure that such a SELECT sees the state the table
was in at the beginning of the statement)

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [GENERAL] Odd VACUUM behavior when it is expected to truncate last empty pages

2011-08-03 Thread Pavan Deolasee
(moving this to hackers since I suspect we got an issue to fix here)

On Wed, Aug 3, 2011 at 6:35 AM, Sergey Konoplev  wrote:
> Hi all,
>
> I have PostgreSQL 9.0.3 installed on my Gentoo Linux box. The
> configuration is default. There is no any activity in the database but
> the described below.
>
> What I am trying to achieve is the effect described in this article
> http://blog.endpoint.com/2010/09/reducing-bloat-without-locking.html.
> In short I am updating last pages of a table to move the tuples to the
> earlier pages to make VACUUM able to truncate the empty tail. However
> I faced a strange VACUUM behavior. So the situation is:
>
> 1. do some UPDATEs on the table so it has several last pages free,
> 2. perform VACUUM of this table the 1st time, no tail pages will be
> truncated (why?),
> 3. perform VACUUM the 2nd time straight after the 1st one and it will
> truncate the tail pages (why this time?).
>

There is a check to attempt truncation only if a minimum of 1000 pages,
or relpages/16 pages (compile-time constants), can be truncated;
otherwise we assume that the cost of attempting truncation is not worth
the pages salvaged. With that logic, we should never have truncated
just 3 pages in a relation with 3000+
pages. But I can see that happening because of visibility maps. So if
we stop scanning beyond a certain percentage of pages, we might be
fooled into believing that the rest of the pages can possibly be
truncated and then do a hard check to find that out.
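A tiny standalone illustration of that threshold check is below (not the actual vacuumlazy.c code; the constant names are from memory and should be treated as approximate):

#include <stdbool.h>
#include <stdio.h>

#define REL_TRUNCATE_MINIMUM	1000
#define REL_TRUNCATE_FRACTION	16

static bool
should_attempt_truncation(long rel_pages, long nonempty_pages)
{
	long	possibly_freeable = rel_pages - nonempty_pages;

	return possibly_freeable > 0 &&
		   (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
			possibly_freeable >= rel_pages / REL_TRUNCATE_FRACTION);
}

int
main(void)
{
	/* 3000-page table with only 3 free tail pages: no truncation attempt */
	printf("%d\n", should_attempt_truncation(3000, 2997));

	/* same table, but the visibility map made the scan stop early, so
	 * nonempty_pages is far too low: truncation is attempted anyway */
	printf("%d\n", should_attempt_truncation(3000, 500));
	return 0;
}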

The first vacuum probably scans even the last few pages because you
just updated the tuples of those pages and then updates the visibility
map for those pages. The second vacuum then stops much before because
the visibility map tells us that all remaining pages are visible (and
thus set nonempty_pages to a lower number) and so the check I
mentioned earlier succeeds and we attempt the truncation. Now, this is
just a theory and a reproducible case will help to confirm this.

The only problem, other than a surprising behavior that you noted,
that I see with this approach is that we might repeatedly try to
truncate a relation which in fact does not have anything to truncate.
The worst  thing is we might unnecessarily take an exclusive lock on
the table.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Fast GiST index build

2011-08-03 Thread Heikki Linnakangas

On 03.08.2011 11:18, Alexander Korotkov wrote:

I found that in previous version of patch I missed PageSetLSN
and PageSetTLI, but huge amount of WAL is still here. Also I found that huge
amount of WAL appears only with -O2. With -O0 amount of WAL is ok, but
messages "FATAL:  xlog flush request BFF11148/809A600 is not satisfied ---
flushed only to 44/9C518750" appears. Seems that there is some totally wrong
use of WAL if even optimization level does matter...


Try this:

diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index 5099330..5a441e0 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -478,7 +478,7 @@ bufferingbuildinsert(GISTInsertState *state,
 	/* Write the WAL record */
 	if (RelationNeedsWAL(state->r))
 	{
-		gistXLogUpdate(state->r->rd_node, buffer, oldoffnum, noldoffnum,
+		recptr = gistXLogUpdate(state->r->rd_node, buffer, oldoffnum, noldoffnum,
 							   itup, ntup, InvalidBuffer);
 		PageSetLSN(page, recptr);
 		PageSetTLI(page, ThisTimeLineID);


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP fix proposal for bug #6123

2011-08-03 Thread Florian Pflug
On Aug3, 2011, at 17:55 , Robert Haas wrote:
> On that note, I think in some ways the problems we're hitting here are
> very much like serialization anomalies.

Yeah, I had the same feeling of familiarity ;-)

> If the user updates a tuple
> based on its PK and sets some non-PK field to a constant, and while
> we're doing that, a BEFORE trigger updates any field in the tuple
> other than the PK, then in theory it seems we ought to be able to
> reconcile the updates.  It feels like the result ought to be the same
> as if we'd simply run the BEFORE-trigger update to completion, and
> then run the main update.  However, if the BEFORE-trigger modifies any
> columns that the main update examined while constructing its value for
> NEW, then the updates can't be serialized.

Going down that road opens the door to a *lot* of subtle semantic
differences between currently equivalent queries. For example,

  UPDATE T SET f=f, a=1

would behave differently then

  UPDATE T SET a=1

because in the first case the new row would depend on the old row's
value of "f", while in the second case it doesn't.

> There's no way to get the
> same result as if you'd done either one of them first, because they
> are inextricably intertwined.
> 
> In practice, my hand-wavy reference to "reconciling the updates" is a
> problem because of the way the trigger interface works.  It feels
> pretty impossible to decide that we're going to do the update, but
> with some other random values we dredged up from some other update
> replacing some of the ones the user explicitly handed to us.  But if
> the trigger could return an indication of which column values it
> wished to override, then it seems to me that we could piece together a
> reasonable set of semantics.  It's not exactly clear how to make that
> work, though.

I dunno, that all still feels awfully complex. As you said yourself,
this case is quite similar to a serialization anomaly. Taking that
correspondence further, that reconciliation of updates is pretty much
what the EPQ machinery does in READ COMMITTED mode. Now, we ourselves
have warned users in the past to *not* use READ COMMITTED mode if they
do complex updates (e.g., use UPDATE ... FROM ...), because the behaviour
of that reconciliation machinery in the present of concurrent updates
is extremely hard to predict. I thus don't believe that it's a good idea
to introduce similarly complex behaviour in other parts of the system -
and particularly not if you cannot disable it by switching to another
isolation level.

Simply throwing an error, on the other hand, makes the behaviour simple
to understand and explain.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-03 Thread Tom Lane
Dimitri Fontaine  writes:
> Tom Lane  writes:
>> Anyone have an opinion about that?

> I still have this application where PREPARE takes between 50ms and 300ms
> and EXECUTE 5ms to 10ms, and I can handle 1 PREPARE for 10000 EXECUTE
> quite easily.  (Yes the database fits in RAM, and yes when that's no
> longer the case we just upgrade the hardware)

> What does your proposal mean for such a use case?

Well, the policy for when to replan or not remains to be worked out in
detail, but what is likely to happen for such cases is that we'll waste
a few planning cycles before determining that there's no benefit in a
custom plan.  So, using the worst-case ends of your ranges above and
assuming that "a few" means "10", we'd go from 300 + 5 * 10000 = 50300
ms to execute the query 10000 times, to 10 * 300 + 5 * 10000 = 53000 ms.
So yes, it'd get a little worse for that use-case.  But you have to
weigh that against the likelihood that other use-cases will get better.
If our requirement for a transient-plan mechanism is that no individual
case can ever be worse than before, then we might as well abandon the
entire project right now, because the only way to meet that requirement
is to change nothing.
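For concreteness, a rough standalone sketch of such a policy follows; the structure and the numbers (10 trial executions, a 10% margin) are placeholders rather than anything settled in this thread:

#include <stdbool.h>

typedef struct
{
	int		num_custom_plans;	/* custom plans built so far */
	double	total_custom_cost;	/* sum of their estimated costs */
	double	generic_cost;		/* estimated cost of the generic plan */
} PlanSourceSketch;

bool
choose_custom_plan(PlanSourceSketch *ps)
{
	double	avg_custom_cost;

	/* always pay the custom-planning price for the first few executions */
	if (ps->num_custom_plans < 10)
		return true;

	avg_custom_cost = ps->total_custom_cost / ps->num_custom_plans;

	/* keep replanning only if it keeps buying a real estimated-cost win */
	return ps->generic_cost > avg_custom_cost * 1.10;
}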

Of course we could address the worst cases by providing some mechanism
to tell the plancache code "always use a generic plan for this query"
or "always use a custom plan".  I'm not entirely thrilled with that,
because it's effectively a planner hint and has got the same problems
as all planner hints, namely that users are likely to get it wrong.
But it would be relatively painless to supply such a hint at the SPI
level, which is why I asked whether we should.  It'd be much harder to
do something equivalent at higher levels, which is why I'm not that
eager to do it for SPI.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FOR KEY LOCK foreign keys

2011-08-03 Thread Robert Haas
On Wed, Jul 27, 2011 at 7:16 PM, Alvaro Herrera
 wrote:
> One thing I have not addressed is Noah's idea about creating a new lock
> mode, KEY UPDATE, that would let us solve the initial problem that this
> patch set to resolve in the first place.  I am not clear on exactly how
> that is to be implemented, because currently heap_update and heap_delete
> do not grab any kind of lock but instead do their own ad-hoc waiting.  I
> think that might need to be reshuffled a bit, to which I haven't gotten
> yet, and is a radical enough idea that I would like it to be discussed
> by the hackers community at large before setting sail on developing it.
> In the meantime, this patch does improve the current situation quite a
> lot.

I haven't looked at the patch yet, but do you have a pointer to Noah's
proposal?  And/or a description of how it differs from what you
implemented here?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Incremental checkopints

2011-08-03 Thread Robert Haas
2011/7/29 Greg Smith :
> 1) Postponing writes as long as possible always improves the resulting
> throughput of those writes.  Any incremental checkpoint approach will detune
> throughput by some amount.  If you make writes go out more often, they will
> be less efficient; that's just how things work if you benchmark anything
> that allows write combining.  Any incremental checkpoint approach is likely
> to improve latency in some cases if it works well, while decreasing
> throughput in most cases.

Agreed.  I came to the same conclusion a while back and then got
depressed.  That might mean we need a parameter to control the
behavior, unless we can find a change where the throughput drop is
sufficiently small that we don't really care, or make the optimization
apply only in cases where we determine that the latency problem will
be so severe that we'll certainly be willing to accept a drop in
throughput to avoid it.

> 2) The incremental checkpoint approach used by other databases, such as the
> MySQL implementation, works by tracking what transaction IDs were associated
> with a buffer update.  The current way PostgreSQL saves buffer sync
> information for the checkpoint to process things doesn't store enough
> information to do that.  As you say, the main price there is some additional
> memory.

I think what we'd need to track is the LSN that first dirtied the page
(as opposed to the current field, which tracks the LSN that most
recently wrote the page).  If we write and flush all pages whose
first-dirtied LSN precedes some cutoff point, then we ought to be able
to advance the redo pointer to that point.
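As a standalone sketch (field and function names invented; today's buffer headers only carry the most recent LSN via the page header), the bookkeeping might look roughly like this:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr	((XLogRecPtr) 0)

typedef struct
{
	bool		is_dirty;
	XLogRecPtr	first_dirtied_lsn;	/* LSN of the record that first dirtied it */
} BufferSketch;

/* called whenever a backend dirties a buffer while emitting WAL at "lsn" */
void
mark_dirty(BufferSketch *buf, XLogRecPtr lsn)
{
	if (!buf->is_dirty)
	{
		buf->is_dirty = true;
		buf->first_dirtied_lsn = lsn;	/* remember only the first write */
	}
}

/*
 * Incremental checkpoint step: flush every buffer whose first-dirtied LSN
 * precedes the cutoff; afterwards the redo pointer can advance to "cutoff".
 */
void
flush_up_to(BufferSketch *bufs, int nbufs, XLogRecPtr cutoff)
{
	int		i;

	for (i = 0; i < nbufs; i++)
	{
		if (bufs[i].is_dirty && bufs[i].first_dirtied_lsn < cutoff)
		{
			/* write + fsync the page here ... */
			bufs[i].is_dirty = false;
			bufs[i].first_dirtied_lsn = InvalidXLogRecPtr;
		}
	}
}

int
main(void)
{
	BufferSketch bufs[3] = {{0}};

	mark_dirty(&bufs[0], 100);
	mark_dirty(&bufs[0], 200);		/* later write: first-dirtied LSN stays 100 */
	mark_dirty(&bufs[1], 300);

	flush_up_to(bufs, 3, 250);		/* flushes bufs[0] only */
	printf("buf0 dirty=%d, buf1 dirty=%d\n", bufs[0].is_dirty, bufs[1].is_dirty);
	return 0;
}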

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP fix proposal for bug #6123

2011-08-03 Thread Robert Haas
On Wed, Aug 3, 2011 at 4:50 AM, Florian Pflug  wrote:
> On Aug3, 2011, at 04:54 , Robert Haas wrote:
>> On Tue, Aug 2, 2011 at 2:32 PM, Kevin Grittner
>>  wrote:
>>>>> Would you feel comfortable with a patch which threw an error on
>>>>> the DELETE case, as it does on the UPDATE case?
>>>>
>>>> Yes, though with a little additional twist. The twist being that
>>>> I'd like the error to be thrown earlier, at the time of the
>>>> offending UPDATE or DELETE, not later, when the original UPDATE or
>>>> DELETE which caused the BEFORE trigger invocation stumbles over
>>>> the updated row. It not only seems more logical that way, but it
>>>> also makes it possible for the trigger to catch the error, and
>>>> react accordingly.
>>>
>>> I hadn't thought of that.  It does seem better in every way except
>>> for how much work it takes to do it and how much testing it needs to
>>> have confidence in it.  Definitely not 9.1 material.
>>
>> I'm not sure I understand the justification for throwing an error.
>> Updating a row twice is not an error under any other circumstances;
>> nor is deleting a row you've updated.  Why should it be here?
>
> Because updating or deleting a row from within a BEFORE trigger fired
> upon updating or deleting that very row causes two intermingled
> UPDATE/DELETE executions, not one UPDATE/DELETE followed by another.
>
> Here's a step-by-step description of what happens in the case of two
> UPDATES, assuming that we don't ignore any updated and throw no error.
>
> 1) UPDATE T SET ... WHERE ID=1
> 2) Row with ID=1 is found & locked
> 3) BEFORE UPDATE triggers are fired, with OLD containing the original
>   row's contents and NEW the values specified in (1)
> 4) UPDATE T SET ... WHERE ID=1, executed from within one of the triggers
> 5) BEFORE UPDATE triggers are fired again, with OLD containing the
>   original row's contents and NEW the value specified in (4).
> 6) The row is updated with the values specified in (4), possibly changed
>   by the triggers in (5)
> 7) The row is updated with the values specified in (2), possibly changed
>   by the triggers in (3).
>
> Now, in step (7), we're overwriting the UPDATE from steps 4-6 without
> even looking at the row that UPDATE produced. If, for example, both UPDATES
> do "counter = counter + 1", we'd end up with the counter incremented by
> 1, *not* by 2, even though there were two distinct UPDATEs. Or even worse,
> the inner UPDATE at (4) might have modified the ID. Then, we'll apply the
> outer UPDATE in (7), even though the original WHERE condition from (1)
> no longer matches the row.
>
> We could attempt to fix that by using the EvalPlanQual machinery to
> re-check the WHERE clause and re-compute the new row in (7). However,
> EvalPlanQual introduces a lot of conceptional complexity, and can lead
> to very surprising results for more complex UPDATES. (There's that whole
> self-join problem with EvalPlanQual, for example)
>
> Also, even if we did use EvalPlanQual, we'd still have executed the outer
> BEFORE trigger (step 3) with values for OLD and NEW which *don't* match
> the row the we actually update in (7). Doing that seems rather hazardous
> too - who knows what the BEFORE trigger has used the values for.
>
> The different between Kevin's original patch and my suggestion is, BTW,
> that he made step (7) through an error while I suggested the error to
> be thrown already at (4).

I think Kevin's proposal is better, because AFAICS if the BEFORE
UPDATE trigger returns NULL then we haven't got a problem; and we
can't know that until we get to step 7.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP fix proposal for bug #6123

2011-08-03 Thread Robert Haas
On Wed, Aug 3, 2011 at 10:48 AM, Kevin Grittner
 wrote:
> As have many other data mangling bugs which we fix in minor
> releases.  Our point here is that it's never been like this in any
> product that the Wisconsin Courts has used for triggers, and never
> will be.

I don't believe this is very similar at all to the sorts of bugs we
fix in minor releases, but let's leave that aside for the moment since
it seems that we're in violent agreement about which release we're
targeting.  With respect to the second part of your comment, it might
be useful to describe how this works in some of those other products.

>> I'm not sure I understand the justification for throwing an error.
>
> Nobody has suggested sane semantics for doing otherwise.  There is
> the pipe dream of what application programmers here would like,
> described earlier in the thread.  I just don't see that happening,
> and I'm not sure it addresses all of Florian's concerns anyway.  You
> had a suggestion for how to salvage the update situation, but it was
> pretty hand-wavy, and I find it very hard to reconcile to some of
> the issues raised by Florian.  Maybe someone can propose sane
> semantics which covers all the objections, and maybe that would even
> be possible to implement; but right now Florian's approach seems
> like the most solid proposal yet put forward.

It probably is, but I'm not sure we've done enough thinking about it
to be certain that it's the right way forward.

On that note, I think in some ways the problems we're hitting here are
very much like serialization anomalies.  If the user updates a tuple
based on its PK and sets some non-PK field to a constant, and while
we're doing that, a BEFORE trigger updates any field in the tuple
other than the PK, then in theory it seems we ought to be able to
reconcile the updates.  It feels like the result ought to be the same
as if we'd simply run the BEFORE-trigger update to completion, and
then run the main update.  However, if the BEFORE-trigger modifies any
columns that the main update examined while constructing its value for
NEW, then the updates can't be serialized.  There's no way to get the
same result as if you'd done either one of them first, because they
are inextricably intertwined.

In practice, my hand-wavy reference to "reconciling the updates" is a
problem because of the way the trigger interface works.  It feels
pretty impossible to decide that we're going to do the update, but
with some other random values we dredged up from some other update
replacing some of the ones the user explicitly handed to us.  But if
the trigger could return an indication of which column values it
wished to override, then it seems to me that we could piece together a
reasonable set of semantics.  It's not exactly clear how to make that
work, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Peter Geoghegan
On 3 August 2011 15:29, Tom Lane  wrote:
> No, this is not an improvement at all.  The point of the code is that we
> are about to use the enum value as an integer array subscript, and we
> want to make sure it is within the array bounds.  Tying that logic to
> some member of the enum is not a readability or safety improvement.
> We aren't trusting the caller to pass a valid enum value, and likewise
> this code doesn't particularly want to trust the enum definition to be
> anything in particular.

I would think that if someone were to have a reason to change the
explicit value set for the identifier PGRES_EMPTY_QUERY, they would
carefully consider the ramifications of doing so. It's far more likely
they'd just append new values to the end of the enum though. This is
why I don't consider that what I've proposed exposes us to any
additional risk.

> There is another point here, though, which is that if we're not sure
> whether the compiler considers ExecStatusType to be signed or unsigned,
> then we have no idea what the test "status < PGRES_EMPTY_QUERY" even
> means.

I'm sorry, but I don't know what you mean by this.

> So I think the most reasonable fix is probably
>
>        if ((unsigned int) status >= sizeof pgresStatus / sizeof 
> pgresStatus[0])
>
> which is sufficient to cover both directions, since if status is passed
> as -1 then it will convert to a large unsigned value.  It's also a
> natural expression of what we really want, ie, that the integer
> equivalent of the enum value is in range.

I'm not convinced that that is an improvement to rely on the
conversion doing so, but it's not as if I feel very strongly about it.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-03 Thread Tom Lane
daveg  writes:
> We have installed the patch and have encountered the error as usual.
> However there is no additional output from the patch. I'm speculating
> that the pg_class scan in ScanPgRelationDetailed() fails to return
> tuples somehow.

Evidently not, if it's not logging anything, but now the question is
why.  One possibility is that for some reason RelationGetNumberOfBlocks
is persistently lying about the file size.  (We've seen kernel bugs
before that resulted in transiently wrong values, so this isn't totally
beyond the realm of possibility.)  Please try the attached patch, which
extends the previous one to add a summary line including the number of
blocks physically scanned by the seqscan.

regards, tom lane

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 81cea8b60406dffa7b5d278697ab5ad6cef6a3d8..589513a2d35987b2202ce1162b326db2d358b6fa 100644
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
***
*** 32,37 
--- 32,38 
  
  #include "access/genam.h"
  #include "access/reloptions.h"
+ #include "access/relscan.h"
  #include "access/sysattr.h"
  #include "access/transam.h"
  #include "access/xact.h"
***
*** 64,69 
--- 65,71 
  #include "optimizer/var.h"
  #include "rewrite/rewriteDefine.h"
  #include "storage/fd.h"
+ #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
  #include "storage/smgr.h"
  #include "utils/array.h"
*** ScanPgRelation(Oid targetRelId, bool ind
*** 310,315 
--- 312,388 
  }
  
  /*
+  *		ScanPgRelationDetailed
+  *
+  *		Try to figure out why we failed to locate row for relation.
+  */
+ static HeapTuple
+ ScanPgRelationDetailed(Oid targetRelId)
+ {
+ 	HeapTuple	pg_class_tuple;
+ 	Relation	pg_class_desc;
+ 	SysScanDesc pg_class_scan;
+ 	ScanKeyData key[1];
+ 	int			count = 0;
+ 
+ 	/*
+ 	 * form a scan key
+ 	 */
+ 	ScanKeyInit(&key[0],
+ ObjectIdAttributeNumber,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(targetRelId));
+ 
+ 	/*
+ 	 * Open pg_class and fetch tuples, forcing heap scan and disabling
+ 	 * visibility checks.
+ 	 */
+ 	pg_class_desc = heap_open(RelationRelationId, AccessShareLock);
+ 	pg_class_scan = systable_beginscan(pg_class_desc, ClassOidIndexId,
+ 	   false,
+ 	   SnapshotAny,
+ 	   1, key);
+ 
+ 	while (HeapTupleIsValid((pg_class_tuple = systable_getnext(pg_class_scan))))
+ 	{
+ 		Buffer		buf = pg_class_scan->scan->rs_cbuf;
+ 		bool		valid;
+ 
+ 		count++;
+ 
+ 		/* need buffer lock to call HeapTupleSatisfiesVisibility */
+ 		LockBuffer(buf, BUFFER_LOCK_SHARE);
+ 		valid = HeapTupleSatisfiesVisibility(pg_class_tuple,
+ 			 SnapshotNow,
+ 			 buf);
+ 		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+ 
+ 		elog(LOG, "searching %u for pg_class tuple for index %u: found ctid (%u,%u), xmin %u, xmax %u, flags 0x%4x 0x%4x, valid %d",
+ 			 pg_class_desc->rd_node.relNode,
+ 			 targetRelId,
+ 			 ItemPointerGetBlockNumber(&(pg_class_tuple->t_self)),
+ 			 ItemPointerGetOffsetNumber(&(pg_class_tuple->t_self)),
+ 			 HeapTupleHeaderGetXmin(pg_class_tuple->t_data),
+ 			 HeapTupleHeaderGetXmax(pg_class_tuple->t_data),
+ 			 pg_class_tuple->t_data->t_infomask,
+ 			 pg_class_tuple->t_data->t_infomask2,
+ 			 valid);
+ 	}
+ 
+ 	elog(LOG, "ScanPgRelationDetailed: found %d tuples with OID %u in %u blocks of filenode %u",
+ 		 count,
+ 		 targetRelId,
+ 		 pg_class_scan->scan->rs_nblocks,
+ 		 pg_class_desc->rd_node.relNode);
+ 
+ 	/* all done */
+ 	systable_endscan(pg_class_scan);
+ 	heap_close(pg_class_desc, AccessShareLock);
+ 
+ 	return NULL;
+ }
+ 
+ /*
   *		AllocateRelationDesc
   *
   *		This is used to allocate memory for a new relation descriptor
*** RelationReloadIndexInfo(Relation relatio
*** 1737,1744 
  	indexOK = (RelationGetRelid(relation) != ClassOidIndexId);
  	pg_class_tuple = ScanPgRelation(RelationGetRelid(relation), indexOK);
  	if (!HeapTupleIsValid(pg_class_tuple))
! 		elog(ERROR, "could not find pg_class tuple for index %u",
! 			 RelationGetRelid(relation));
  	relp = (Form_pg_class) GETSTRUCT(pg_class_tuple);
  	memcpy(relation->rd_rel, relp, CLASS_TUPLE_SIZE);
  	/* Reload reloptions in case they changed */
--- 1810,1824 
  	indexOK = (RelationGetRelid(relation) != ClassOidIndexId);
  	pg_class_tuple = ScanPgRelation(RelationGetRelid(relation), indexOK);
  	if (!HeapTupleIsValid(pg_class_tuple))
! 	{
! 		pg_class_tuple = ScanPgRelationDetailed(RelationGetRelid(relation));
! 		if (!HeapTupleIsValid(pg_class_tuple))
! 			elog(ERROR, "could not find pg_class tuple for index %u",
!  RelationGetRelid(relation));
! 		else
! 			elog(LOG, "could not find pg_class tuple for index %u, but succeeded on second try",
!  RelationGetRelid(relation));
! 	}
  	relp = (Form_pg_class) GETSTRUCT(pg_class_tuple);
  	memcpy(relation->rd_rel, relp, CLASS_TUPLE_SIZE);
  	/* Reload reloptions in case they changed */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP fix proposal for bug #6123

2011-08-03 Thread Kevin Grittner
Robert Haas  wrote:
> Kevin Grittner  wrote:
 
>>>> Would you feel comfortable with a patch which threw an error on
>>>> the DELETE case, as it does on the UPDATE case?
>>>
>>> Yes, though with a little additional twist. The twist being that
>>> I'd like the error to be thrown earlier, at the time of the
>>> offending UPDATE or DELETE, not later, when the original UPDATE
>>> or DELETE which caused the BEFORE trigger invocation stumbles
>>> over the updated row. It not only seems more logical that way,
>>> but it also makes it possible for the trigger to catch the
>>> error, and react accordingly.
>>
>> I hadn't thought of that.  It does seem better in every way
>> except for how much work it takes to do it and how much testing
>> it needs to have confidence in it.  Definitely not 9.1 material.
> 
> IMHO, none of this is 9.1 material.
 
Nobody is currently arguing for including anything to fix this in
9.1.  Perhaps I should have been more explicit that in agreeing with
Florian, I'm giving up on the idea of any PostgreSQL release
attempting to do anything about this before 9.2.
 
That said, this is a data corrupting bug in the view of management
here, based on reports from testers and my explanation of why they
saw that behavior.  They are aware that it can be worked around, but
that's not acceptable if there's no reliable way to find all
occurrences of patterns that hit this bug.  To keep this effort
alive here, I am using my quick hack for 9.0 and 9.1.  Deletes will
behave as they expect, and updates won't quietly discard part of the
work of a transaction -- an error will be thrown in that situation.
 
> It's been like this forever
 
As have many other data mangling bugs which we fix in minor
releases.  Our point here is that it's never been like this in any
product that the Wisconsin Courts has used for triggers, and never
will be.
 
> People who would otherwise be  tempted to write their applications
> this way will end up not doing so precisely because it currently
> does not work.
 
I can attest to that.
 
> So I think we should focus on getting the right semantics here,
> rather than trying to minimize the scope of the change.
 
Agreed.  I thought that's what we were talking about.
 
> I'm not sure I understand the justification for throwing an error.
 
Nobody has suggested sane semantics for doing otherwise.  There is
the pipe dream of what application programmers here would like,
described earlier in the thread.  I just don't see that happening,
and I'm not sure it addresses all of Florian's concerns anyway.  You
had a suggestion for how to salvage the update situation, but it was
pretty hand-wavy, and I find it very hard to reconcile to some of
the issues raised by Florian.  Maybe someone can propose sane
semantics which covers all the objections, and maybe that would even
be possible to implement; but right now Florian's approach seems
like the most solid proposal yet put forward.
 
> Updating a row twice is not an error under any other
> circumstances; nor is deleting a row you've updated.  Why should
> it be here?
 
As Florian pointed out, it's not exactly a matter of doing those
things.  If you have a proposal which addresses the issues raised
earlier in the thread, please share.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Tom Lane
Peter Geoghegan  writes:
> On 3 August 2011 12:19, Heikki Linnakangas
>  wrote:
>> Right, but the purpose of that check is to defend from programmer error. If
>> the programmer screws up and calls "PQresStatus(-1)", we want to give an
>> error, not crash. If you assume that the programmer will only pass a valid
>> enum constant as parameter, then you might as well remove the if-statement
>> altogether. I don't think that would be an improvement.

> Ahh. I failed to consider the intent of the code.

> Attached patch has a better solution than casting though - we simply
> use an enum literal. The fact that PGRES_EMPTY_QUERY is the first
> value in the enum can be safely assumed to be stable, not least
> because we've even already explicitly given it a corresponding value
> of 0.

No, this is not an improvement at all.  The point of the code is that we
are about to use the enum value as an integer array subscript, and we
want to make sure it is within the array bounds.  Tying that logic to
some member of the enum is not a readability or safety improvement.
We aren't trusting the caller to pass a valid enum value, and likewise
this code doesn't particularly want to trust the enum definition to be
anything in particular.

There is another point here, though, which is that if we're not sure
whether the compiler considers ExecStatusType to be signed or unsigned,
then we have no idea what the test "status < PGRES_EMPTY_QUERY" even
means.  So I think the most reasonable fix is probably

if ((unsigned int) status >= sizeof pgresStatus / sizeof pgresStatus[0])

which is sufficient to cover both directions, since if status is passed
as -1 then it will convert to a large unsigned value.  It's also a
natural expression of what we really want, ie, that the integer
equivalent of the enum value is in range.
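A standalone toy (not libpq source) showing why the single unsigned comparison rejects both out-of-range directions:

#include <stdio.h>

static const char *names[] = {"empty", "tuples", "error"};

static const char *
name_for(int status)
{
	if ((unsigned int) status >= sizeof names / sizeof names[0])
		return "invalid status code";
	return names[status];
}

int
main(void)
{
	printf("%s\n", name_for(1));	/* "tuples" */
	printf("%s\n", name_for(-1));	/* -1 converts to a huge unsigned value: rejected */
	printf("%s\n", name_for(99));	/* too large: rejected */
	return 0;
}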

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Peter Geoghegan
It occurred to me that you may be a little surprised that the patch
actually prevents the warning from occurring. If you have any doubts,
I can assure you that it does. Clang differentiates between 0 as an
rvalue, 0 from an enum literal and 0 resulting from evaluating a macro
expression (including a function-like macro with deep nesting).

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench internal contention

2011-08-03 Thread Tom Lane
Robert Haas  writes:
> On Tue, Aug 2, 2011 at 8:44 PM, Tom Lane  wrote:
>> Maybe.  But if that's the approach we want to use, let's just call it
>> pg_erand48 in the code, and dispense with the alias macros as well as
>> all vestiges of configure support.

> Works for me.  Just to confirm, that means we'd also change GEQO to
> use pg_erand48().

Right, that's what I was thinking.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Fast GiST index build

2011-08-03 Thread Robert Haas
On Wed, Aug 3, 2011 at 4:18 AM, Alexander Korotkov  wrote:
> I found that in previous version of patch I missed PageSetLSN
> and PageSetTLI, but huge amount of WAL is still here. Also I found that huge
> amount of WAL appears only with -O2. With -O0 amount of WAL is ok, but
> messages "FATAL:  xlog flush request BFF11148/809A600 is not satisfied ---
> flushed only to 44/9C518750" appears. Seems that there is some totally wrong
> use of WAL if even optimization level does matter...

Try setting wal_debug=true to see what records are getting emitted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-03 Thread Dimitri Fontaine
Tom Lane  writes:
> Anyone have an opinion about that?

I still have this application where PREPARE takes between 50ms and 300ms
and EXECUTE 5ms to 10ms, and I can handle 1 PREPARE for 10000 EXECUTE
quite easily.  (Yes the database fits in RAM, and yes when that's no
longer the case we just upgrade the hardware)

What does your proposal mean for such a use case?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] error: could not find pg_class tuple for index 2662

2011-08-03 Thread daveg
On Mon, Aug 01, 2011 at 01:23:49PM -0400, Tom Lane wrote:
> daveg  writes:
> > On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote:
> >> I think we need to start adding some instrumentation so we can get a
> >> better handle on what's going on in your database.  If I were to send
> >> you a source-code patch for the server that adds some more logging
> >> printout when this happens, would you be willing/able to run a patched
> >> build on your machine?
> 
> > Yes we can run an instrumented server so long as the instrumentation does
> > not interfere with normal operation. However, scheduling downtime to switch
> > binaries is difficult, and generally needs to be happen on a weekend, but
> > sometimes can be expedited. I'll look into that.
> 
> OK, attached is a patch against 9.0 branch that will re-scan pg_class
> after a failure of this sort occurs, and log what it sees in the tuple
> header fields for each tuple for the target index.  This should give us
> some useful information.  It might be worthwhile for you to also log the
> results of
> 
> select relname,pg_relation_filenode(oid) from pg_class
> where relname like 'pg_class%';
> 
> in your script that does VACUUM FULL, just before and after each time it
> vacuums pg_class.  That will help in interpreting the relfilenodes in
> the log output.

We have installed the patch and have encountered the error as usual.
However there is no additional output from the patch. I'm speculating
that the pg_class scan in ScanPgRelationDetailed() fails to return
tuples somehow.


I have also been trying to trace it further by reading the code, but have not
got any solid hypothesis yet. In the absence of any debugging output I've
been trying to deduce the call tree leading to the original failure. So far
it looks like this:

RelationReloadIndexInfo(Relation)
// Relation is 2662 and !rd_isvalid
pg_class_tuple = ScanPgRelation(2662, indexOK=false)  // returns NULL
  pg_class_desc = heap_open(1259, ACC_SHARE)
    r = relation_open(1259, ACC_SHARE)  // locks oid, ensures RelationIsValid(r)
      r = RelationIdGetRelation(1259)
        r = RelationIdCacheLookup(1259)   // assume success
        if !rd_isvalid:
          RelationClearRelation(r, true)
            RelationInitPhysicalAddr(r)  // r is pg_class relcache

-dg

-- 
David Gould   da...@sonic.net  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Peter Geoghegan
On 3 August 2011 12:19, Heikki Linnakangas
 wrote:

> Right, but the purpose of that check is to defend from programmer error. If
> the programmer screws up and calls "PQresStatus(-1)", we want to give an
> error, not crash. If you assume that the programmer will only pass a valid
> enum constant as parameter, then you might as well remove the if-statement
> altogether. I don't think that would be an improvement.

Ahh. I failed to consider the intent of the code.

The attached patch has a better solution than casting, though - we simply
use an enum literal. The fact that PGRES_EMPTY_QUERY is the first
value in the enum can safely be assumed to be stable, not least
because we've already explicitly given it a corresponding value
of 0.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 605d242..b6e6363 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -2386,7 +2386,7 @@ PQresultStatus(const PGresult *res)
 char *
 PQresStatus(ExecStatusType status)
 {
-	if (status < 0 || status >= sizeof pgresStatus / sizeof pgresStatus[0])
+	if (status < PGRES_EMPTY_QUERY || status >= sizeof pgresStatus / sizeof pgresStatus[0])
 		return libpq_gettext("invalid ExecStatusType code");
 	return pgresStatus[status];
 }



Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Heikki Linnakangas

On 03.08.2011 14:13, Peter Geoghegan wrote:

> On 3 August 2011 11:05, Peter Geoghegan  wrote:
>> I don't believe that the standard allows for an implementation of
>> enums as unsigned integers - after all, individual enum literals can
>> be given corresponding negative integer values.
>
> It actually gives leeway to implement the enum as unsigned int when
> the compiler knows that it won't matter, because there are no negative
> integer values that correspond to some enum literal. The hint was in
> my original warning.  :-)
>
>> This warning is only seen because the first enum literal in the enum
>> is explicitly given the value 0, thus precluding the possibility of
>> the value being < 0, barring some abuse of the enum.
>
> It's also seen if no explicit values are given, and the compiler opts
> to make the representation unsigned. It is not seen if the value is
> -1, for example.
>
> Despite the fact that whether or not the value is unsigned is
> implementation defined, I think that the patch is still valid - the
> expression is at least logically tautological, even if it isn't
> necessarily bitwise tautological, because, as I said, barring some
> violation of the enum's contract, it should not be < 0. That's
> precisely why the compiler has opted to make it unsigned.


Right, but the purpose of that check is to defend from programmer error. 
If the programmer screws up and calls "PQresStatus(-1)", we want to give 
an error, not crash. If you assume that the programmer will only pass a 
valid enum constant as parameter, then you might as well remove the 
if-statement altogether. I don't think that would be an improvement.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Peter Geoghegan
On 3 August 2011 11:05, Peter Geoghegan  wrote:
> I don't believe that the standard allows for an implementation of
> enums as unsigned integers - after all, individual enum literals can
> be given corresponding negative integer values.

It actually gives leeway to implement the enum as unsigned int when
the compiler knows that it won't matter, because there are no negative
integer values that correspond to some enum literal. The hint was in
my original warning.  :-)

> This warning is only seen because the first enum literal in the enum
> is explicitly given the value 0, thus precluding the possibility of
> the value being < 0, barring some abuse of the enum.

It's also seen if no explicit values are given, and the compiler opts
to make the representation unsigned. It is not seen if the value is
-1, for example.

Despite the fact that whether or not the value is unsigned is
implementation defined, I think that the patch is still valid - the
expression is at least logically tautological, even if it isn't
necessarily bitwise tautological, because, as I said, barring some
violation of the enum's contract, it should not be < 0. That's
precisely why the compiler has opted to make it unsigned.
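
To make that concrete, here is a minimal standalone illustration (the
DemoStatus enum and helper below are hypothetical, not PostgreSQL source):
because every enumerator is non-negative, C99 6.7.2.2 lets the compiler pick
an unsigned underlying type, at which point the "< 0" half of a range check
can never be true and Clang warns.

#include <stdio.h>

/*
 * Hypothetical enum, analogous to ExecStatusType: every enumerator is
 * non-negative, so the compiler may choose an unsigned underlying type.
 */
typedef enum DemoStatus
{
	DEMO_OK = 0,
	DEMO_WARNING,
	DEMO_ERROR
} DemoStatus;

static const char *demo_names[] = {"ok", "warning", "error"};

const char *
demo_status_name(DemoStatus status)
{
	/*
	 * If the compiler represents DemoStatus as unsigned, "status < 0" is
	 * always false and Clang emits -Wtautological-compare; a caller passing
	 * -1 is converted to a large unsigned value and is caught by the
	 * upper-bound check instead.
	 */
	if (status < 0 || status >= sizeof demo_names / sizeof demo_names[0])
		return "invalid DemoStatus code";
	return demo_names[status];
}

int
main(void)
{
	printf("%s\n", demo_status_name(DEMO_WARNING));
	printf("%s\n", demo_status_name((DemoStatus) -1));
	return 0;
}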

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Heikki Linnakangas

On 03.08.2011 13:05, Peter Geoghegan wrote:

> I don't believe that the standard allows for an implementation of
> enums as unsigned integers - after all, individual enum literals can
> be given corresponding negative integer values.


C99 says:


Each enumerated type shall be compatible with char, a signed integer type, or
an unsigned integer type. The choice of type is implementation-defined, but
shall be capable of representing the values of all the members of the
enumeration.


See also:

http://stackoverflow.com/questions/2579230/signedness-of-enum-in-c-c99-c-cx-gnu-c-gnu-c99

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Peter Geoghegan
On 3 August 2011 10:34, Heikki Linnakangas
 wrote:
> The check is only tautological if the compiler implements enums as unsigned
> integers. Whether enums are signed or not is implementation-dependent.
> Perhaps cast status to unsigned or signed explicitly before the checks?

I don't believe that the standard allows for an implementation of
enums as unsigned integers - after all, individual enum literals can
be given corresponding negative integer values.

This warning is only seen because the first enum literal in the enum
is explicitly given the value 0, thus precluding the possibility of
the value being < 0, barring some abuse of the enum.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Heikki Linnakangas

On 03.08.2011 12:25, Peter Geoghegan wrote:

> Attached patch removes the tautological part of an evaluated
> expression, fixing the problem flagged by this quite valid warning.


The check is only tautological if the compiler implements enums as 
unsigned integers. Whether enums are signed or not is 
implementation-dependent. Perhaps cast status to unsigned or signed 
explicitly before the checks?
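
For concreteness, a minimal sketch of what that cast-based check might look
like (illustrative only; the casts and comment are not from a patch posted in
this thread, while PQresStatus, pgresStatus and libpq_gettext are the existing
fe-exec.c code quoted above):

char *
PQresStatus(ExecStatusType status)
{
	/* cast so the lower bound is checked as signed and the upper as unsigned */
	if ((int) status < 0 ||
		(size_t) status >= sizeof pgresStatus / sizeof pgresStatus[0])
		return libpq_gettext("invalid ExecStatusType code");
	return pgresStatus[status];
}

That would silence the tautological-compare warning without assuming anything
about the enum's underlying type.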


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Peter Geoghegan
Attached patch removes the tautological part of an evaluated
expression, fixing the problem flagged by this quite valid warning.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 605d242..51af0d8 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -2386,7 +2386,7 @@ PQresultStatus(const PGresult *res)
 char *
 PQresStatus(ExecStatusType status)
 {
-	if (status < 0 || status >= sizeof pgresStatus / sizeof pgresStatus[0])
+	if (status >= sizeof pgresStatus / sizeof pgresStatus[0])
 		return libpq_gettext("invalid ExecStatusType code");
 	return pgresStatus[status];
 }



Re: [HACKERS] WIP fix proposal for bug #6123

2011-08-03 Thread Florian Pflug
On Aug3, 2011, at 04:54 , Robert Haas wrote:
> On Tue, Aug 2, 2011 at 2:32 PM, Kevin Grittner
>  wrote:
 Would you feel comfortable with a patch which threw an error on
 the DELETE case, as it does on the UPDATE case?
>>> 
>>> Yes, though with a little additional twist. The twist being that
>>> I'd like the error to be thrown earlier, at the time of the
>>> offending UPDATE or DELETE, not later, when the original UPDATE or
>>> DELETE which caused the BEFORE trigger invocation stumbles over
>>> the updated row. It not only seems more logical that way, but it
>>> also makes it possible for the trigger to catch the error, and
>>> react accordingly.
>> 
>> I hadn't thought of that.  It does seem better in every way except
>> for how much work it takes to do it and how much testing it needs to
>> have confidence in it.  Definitely not 9.1 material.
> 
> I'm not sure I understand the justification for throwing an error.
> Updating a row twice is not an error under any other circumstances;
> nor is deleting a row you've updated.  Why should it be here?

Because updating or deleting a row from within a BEFORE trigger fired
upon updating or deleting that very row causes two intermingled 
UPDATE/DELETE executions, not one UPDATE/DELETE followed by another.

Here's a step-by-step description of what happens in the case of two
UPDATEs, assuming that we don't ignore any updates and don't throw an error.

1) UPDATE T SET ... WHERE ID=1
2) Row with ID=1 is found & locked
3) BEFORE UPDATE triggers are fired, with OLD containing the original
   row's contents and NEW the values specified in (1)
4) UPDATE T SET ... WHERE ID=1, executed from within one of the triggers
5) BEFORE UPDATE triggers are fired again, with OLD containing the
   original row's contents and NEW the value specified in (4). 
6) The row is updated with the values specified in (4), possibly changed
   by the triggers in (5)
7) The row is updated with the values specified in (2), possibly changed
   by the triggers in (3).

Now, in step (7), we're overwriting the UPDATE from steps 4-6 without
even looking at the row that UPDATE produced. If, for example, both UPDATES
do "counter = counter + 1", we'd end up with the counter incremented by
1, *not* by 2, even though there were two distinct UPDATEs. Or even worse,
the inner UPDATE at (4) might have modified the ID. Then, we'll apply the
outer UPDATE in (7), even though the original WHERE condition from (1)
no longer matches the row.

We could attempt to fix that by using the EvalPlanQual machinery to
re-check the WHERE clause and re-compute the new row in (7). However,
EvalPlanQual introduces a lot of conceptual complexity, and can lead
to very surprising results for more complex UPDATES. (There's that whole
self-join problem with EvalPlanQual, for example)

Also, even if we did use EvalPlanQual, we'd still have executed the outer
BEFORE trigger (step 3) with values for OLD and NEW which *don't* match
the row that we actually update in (7). Doing that seems rather hazardous
too - who knows what the BEFORE trigger has used the values for.

The difference between Kevin's original patch and my suggestion is, BTW,
that he made step (7) throw an error while I suggested that the error
be thrown already at (4).

best regards,
Florian Pflug




Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-03 Thread Peter Geoghegan
On 3 August 2011 00:52, Peter Geoghegan  wrote:
> Now, the only warning that remains is that same

Correction - there are actually 3 additional warnings like this in repl_gram.c:

/home/peter/build/Release/bin/clang -O2 -Wall -Wmissing-prototypes
-Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-Wformat-security -fno-strict-aliasing -fwrapv -I.
-I../../../src/include -D_GNU_SOURCE   -c -o repl_gram.o repl_gram.c
repl_gram.y:106:30: warning: implicit conversion from enumeration type
'enum ReplNodeTag' to different enumeration type 'NodeTag' (aka 'enum
NodeTag') [-Wconversion]
(yyval.node) = (Node *)
makeNode(IdentifySystemCmd);

^~~
../../../src/include/nodes/nodes.h:475:64: note: expanded from:
#define makeNode(_type_)((_type_ *)
newNode(sizeof(_type_),T_##_type_))
 ^
:180:1: note: expanded from:
T_IdentifySystemCmd
^
../../../src/include/nodes/nodes.h:452:19: note: expanded from:
_result->type = (tag); \
  ~  ^~~

I'll take a look into them later.

This tautology warning sounds completely valid:

/home/peter/build/Release/bin/clang -O2 -Wall -Wmissing-prototypes
-Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-Wformat-security -fno-strict-aliasing -fwrapv -pthread  -D_REENTRANT
-D_THREAD_SAFE -D_POSIX_PTHREAD_SEMANTICS -fpic -DFRONTEND
-DUNSAFE_STAT_OK -I. -I../../../src/include -D_GNU_SOURCE
-I../../../src/port -I../../../src/port -DSO_MAJOR_VERSION=5  -c -o
fe-exec.o fe-exec.c
fe-exec.c:2408:13: warning: comparison of unsigned enum expression < 0
is always false [-Wtautological-compare]
if (status < 0 || status >= sizeof pgresStatus / sizeof pgresStatus[0])
~~ ^ ~
1 warning generated.

So, there are 4 remaining warnings.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] WIP: Fast GiST index build

2011-08-03 Thread Alexander Korotkov
I found that in the previous version of the patch I missed PageSetLSN
and PageSetTLI, but the huge amount of WAL is still there. I also found that
the huge amount of WAL appears only with -O2. With -O0 the amount of WAL is OK,
but messages like "FATAL:  xlog flush request BFF11148/809A600 is not satisfied ---
flushed only to 44/9C518750" appear. It seems that there is some totally wrong
use of WAL if even the optimization level matters...

--
With best regards,
Alexander Korotkov.


gist_fast_build-0.9.1.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] cataloguing NOT NULL constraints

2011-08-03 Thread Dean Rasheed
On 3 August 2011 04:40, Alvaro Herrera  wrote:
> Excerpts from Dean Rasheed's message of Sat Jul 30 18:40:46 -0400 2011:
>
>> Looks pretty good to me (not too dirty). I suppose given that the
>> parser transforms AT_ColumnConstraint into one of the existing command
>> subtypes, you could just have gram.y emit an AT_AddConstraint with the
>> ColumnDef attached, to save adding a new subtype, but there's probably
>> not much difference.
>
> Thanks.  I've done the other changes you suggested, but I don't see that
> it's desirable to have gram.y emit AT_AddConstraint directly.  It seems
> cleaner to be able to turn a NOT NULL constraint into AT_SetNotNull
> in parse_utilcmd instead.

I wasn't proposing getting rid of that bit in parse_utilcmd. I just
meant move the block that calls transformColumnConstraints() to the
AT_AddConstraint case in transformAlterTableStmt(). So it would test
if it has a ColumnDef attached, and either process a table constraint
or a set of column constraints. That would avoid the need for a new
command subtype, and so the changes to tablecmds.c would not be
needed. I think it would also avoid the need to add colname to the
Constraint struct, so none of the parsenodes.h changes would be needed
either.


>  (Maybe I'll have to bite the bullet and make
> AT_AddConstraint work for not null constraints as well, as part of the
> larger patch.  Not sure.)  Currently, the table constraint syntax only
> lets you do a single constraint at a time, but you can do multiple
> constraints with the column constraint syntax.  I am not sure how hard
> it is to rework the grammar so that only a single constraint is
> allowed, but I'm not sure that it's worth the trouble either.
>
> Attached is an updated version, touching the docs and adding a new
> simple regression test.
>
> But ... I just noticed that I need to touch ALTER DOMAIN in a similar
> way as well.

Oh I keep forgetting about domains. But after a quick glance at the
docs, doesn't it already have syntax for this?

Regards,
Dean
