Re: [HACKERS] foreign key locks, 2nd attempt

2012-04-05 Thread Peter Geoghegan
On 25 March 2012 09:17, Simon Riggs si...@2ndquadrant.com wrote:
 The main thing we're waiting on are the performance tests to confirm
 the lack of regression.

I have extensively benchmarked the latest revision of the patch
(tpc-b.sql), which I pulled from Alvaro's github account. The
benchmark was of the feature branch's then-and-current HEAD, Don't
follow update chains unless caller requests it.

I've had to split these numbers out into two separate reports.
Incidentally, at some future point I hope that pgbench-tools can
handling testing across feature branches, initdb'ing and suchlike
automatically and as needed. That's something that's likely to happen
sooner rather than later.

The server used was kindly supplied by the University of Oregon open
source lab.

Server (It's virtualised, but apparently this is purely for sandboxing
purposes and the virtualisation technology is rather good):

IBM,8231-E2B POWER7 processor (8 cores).
Fedora 16
8GB Ram
Dedicated RAID1 disks. Exact configuration unknown.

postgresql.conf (this information is available when you drill down
into each test too, fwiw):
max_connections = 200
shared_buffers = 2GB
checkpoint_segments = 30
checkpoint_completion_target = 0.8
effective_cache_size = 6GB

Reports:

http://results_fklocks.staticloud.com/
http://results_master_for_fks.staticloud.com/

Executive summary: There is a clear regression of less than 10%. There
also appears to be a new source of contention at higher client counts.

I realise that the likely upshot of this, and other concerns that are
generally held at this late stage is that this patch will not make it
into 9.2 . For what it's worth, that comes as a big disappointment to
me. I would like to thank both Alvaro and Noah for their hard work
here.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-25 Thread Simon Riggs
On Sat, Mar 17, 2012 at 10:45 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Here is v11.  This version is mainly updated to add pg_upgrade support,
 as discussed.  It also contains the README file that was posted earlier
 (plus wording fixes per Bruce), a couple of bug fixes, and some comment
 updates.

The main thing we're waiting on are the performance tests to confirm
the lack of regression.

You are working on that, right?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-17 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of lun mar 05 15:28:59 -0300 2012:
 
 On Mon, Feb 27, 2012 at 2:47 AM, Robert Haas robertmh...@gmail.com wrote:

  Regarding performance, the good thing about this patch is that if you
  have an operation that used to block, it might now not block.  So maybe
  multixact-related operation is a bit slower than before, but if it
  allows you to continue operating rather than sit waiting until some
  other transaction releases you, it's much better.
 
  That's probably true, although there is some deferred cost that is
  hard to account for.  You might not block immediately, but then later
  somebody might block either because the mxact SLRU now needs fsyncs or
  because they've got to decode an mxid long after the relevant segment
  has been evicted from the SLRU buffers.  In general, it's hard to
  bound that latter cost, because you only avoid blocking once (when the
  initial update happens) but you might pay the extra cost of decoding
  the mxid as many times as the row is read, which could be arbitrarily
  many.  How much of a problem that is in practice, I'm not completely
  sure, but it has worried me before and it still does.  In the worst
  case scenario, a handful of frequently-accessed rows with MXIDs all of
  whose members are dead except for the UPDATE they contain could result
  in continual SLRU cache-thrashing.
 
 Cases I regularly see involve wait times of many seconds.
 
 When this patch helps, it will help performance by algorithmic gains,
 so perhaps x10-100.
 
 That can and should be demonstrated though, I agree.

BTW, the isolation tester cases have a few places that in the unpatched
code die with deadlocks and with the patched code continue without
dying.  Others are cases that block in unpatched master, and continue
without blocking in patched.  This should be enough proof that there are
algorithmic gains here.

There's also a test case that demostrates a fix for the problem (pointed
out by docs) that if you acquire a row lock, then a subxact upgrades it
(say by deleting the row) and the subxact aborts, the original row lock
is lost.  With the patched code, the original lock is no longer lost.

I completely agree with the idea that we need some mitigation against
repeated lookups of mxids that contain committed updates.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-17 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of mar mar 06 18:33:13 -0300 2012:

 The lock modes are correct, appropriate and IMHO have meaningful
 names. No redesign required here.
 
 Not sure about the naming of some of the flag bits however.

Feel free to suggest improvements ... I've probably seen them for too
long to find them anything but what I intended them to mean.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-17 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of mar mar 06 17:28:12 -0300 2012:
 On Tue, Mar 6, 2012 at 7:39 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 
  We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
  super-exclusive locking (used to delete tuples and more generally to update
  tuples modifying the values of the columns that make up the key of the 
  tuple);
  SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
  implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak 
  mode
  that does not conflict with exclusive mode, but conflicts with SELECT FOR 
  KEY
  UPDATE.  This last mode implements a mode just strong enough to implement RI
  checks, i.e. it ensures that tuples do not go away from under a check, 
  without
  blocking when some other transaction that want to update the tuple without
  changing its key.
 
 So there are 4 lock types, but we only have room for 3 on the tuple
 header, so we store the least common/deprecated of the 4 types as a
 multixactid. Some rewording would help there.

Hmm, I rewrote that paragraph two times.  I'll try to adjust it a bit
more.

 My understanding is that all of theses workloads will change
 
 * Users of explicit SHARE lockers will be slightly worse in the case
 of the 1st locker, but then after that they'll be the same as before.

Right.  (We're assuming that there *are* users of SHARE locks, which I'm
not sure to be a given.)

 * Updates against an RI locked table will be dramatically faster
 because of reduced lock waits

Correct.

 ...and that these previous workloads are effectively unchanged:
 
 * Stream of RI checks causes mxacts

Yes.

 * Multi row deadlocks still possible

Yes.

 * Queues of writers still wait in the same way

Yes.

 * Deletes don't cause mxacts unless by same transaction

Yeah .. there's no way for anyone to not conflict with a FOR KEY UPDATE
lock (the strength grabbed by a delete) unless you're the same
transaction.

  The possibility of having an update within a MultiXact means that they must
  persist across crashes and restarts: a future reader of the tuple needs to
  figure out whether the update committed or aborted.  So we have a 
  requirement
  that pg_multixact needs to retain pages of its data until we're certain that
  the MultiXacts in them are no longer of interest.
 
 I think the no longer of interest aspect needs to be tracked more
 closely because it will necessarily lead to more I/O.

Not sure what you mean here.

 If we store the LSN on each mxact page, as I think we need to, we can
 get rid of pages more quickly if we know they don't have an LSN set.
 So its possible we can optimise that more.

Hmm, I had originally thought that this was rather pointless because it
was unlikely that a segment would *never* have *all* multis not
containing updates.  But then, maybe Robert is right and there are users
out there that run a lot of RI checks and never update the masters ...
Hm.  I'm not sure that LSN tracking is the right tool to do that
optimization, however -- I mean, a single multi containing an update in
a whole segment will prevent that segment from being considered useless.

  VACUUM is in charge of removing old MultiXacts at the time of tuple 
  freezing.
 
 You mean mxact segments?

Well, both.  When a tuple is frozen, we both remove its Xmin/Xmax and
any possible multi that it might have in Xmax.  That's what I really
meant above.  But also, vacuum will remove pg_multixact segments just as
it will remove pg_clog segments.

(It is possible, and probably desirable, to remove a Multi much earlier
than freezing the tuple.  The patch does not (yet) do that, however.)

 Surely we set hint bits on tuples same as now? Hope so.

We set hint bits, but if a multi contains an update, we don't set
HEAP_XMAX_COMMITTED even when the update is known committed.  I think
we could do this in some cases.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-16 Thread Alvaro Herrera

Excerpts from Bruce Momjian's message of vie mar 16 00:04:06 -0300 2012:
 
 On Tue, Mar 13, 2012 at 02:35:02PM -0300, Alvaro Herrera wrote:
  
  Excerpts from Bruce Momjian's message of mar mar 13 14:00:52 -0300 2012:
   
   On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
  
When there is a single locker in a tuple, we can just store the locking 
info
in the tuple itself.  We do this by storing the locker's Xid in XMAX, 
and
setting hint bits specifying the locking strength.  There is one 
exception
here: since hint bit space is limited, we do not provide a separate 
hint bit
for SELECT FOR SHARE, so we have to use the extended info in a 
MultiXact in
that case.  (The other cases, SELECT FOR UPDATE and SELECT FOR KEY 
SHARE, are
presumably more commonly used due to being the standards-mandated 
locking
mechanism, or heavily used by the RI code, so we want to provide fast 
paths
for those.)
   
   Are those tuple bits actually hint bits?  They seem quite a bit more
   powerful than a hint.
  
  I'm not sure what's your point.  We've had a hint bit for SELECT FOR
  UPDATE for ages.  Even 8.2 had HEAP_XMAX_EXCL_LOCK and
  HEAP_XMAX_SHARED_LOCK.  Maybe they are misnamed and aren't really
  hints, but it's not the job of this patch to fix that problem.
 
 Now I am confused.  Where do you see the word hint used by
 HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK.  These are tuple infomask
 bits, not hints, meaning they are not optional or there just for
 performance.

Okay, I think this is just a case of confusing terminology.  I have
always assumed (because I have not seen any evidence to the contrary)
that anything in t_infomask and t_infomask2 is a hint bit --
regardless of it being actually a hint or something with a stronger
significance.  HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK are
certainly not optional in the sense that if they are missing, the
meaning of the Xmax field is completely different.  So in all
correctness they are not hints, though we call them that.

Now, if we want to differentiate infomask bits that are just hints from
those that are something else, we can do that, but I'm not sure it's
useful -- AFAICS only XMAX_COMMITTED and XMIN_COMMITTED are proper
hints.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-16 Thread Alvaro Herrera

Excerpts from Alvaro Herrera's message of vie mar 16 10:36:11 -0300 2012:

  Now I am confused.  Where do you see the word hint used by
  HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK.  These are tuple infomask
  bits, not hints, meaning they are not optional or there just for
  performance.
 
 Okay, I think this is just a case of confusing terminology.  I have
 always assumed (because I have not seen any evidence to the contrary)
 that anything in t_infomask and t_infomask2 is a hint bit --
 regardless of it being actually a hint or something with a stronger
 significance.

Maybe this is just my mistake.  I see in
http://wiki.postgresql.org/wiki/Hint_Bits that we only call the
COMMITTED/INVALID infomask bits hints.

I think it's easy enough to correct the README to call them infomask
bits rather than hints .. I'll go do that.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-16 Thread Robert Haas
On Thu, Mar 15, 2012 at 11:09 PM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
 On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
  I agree with you that some worst case performance tests should be
  done. Could you please say what you think the worst cases would be, so
  those can be tested? That would avoid wasting time or getting anything
  backwards.

 I've thought about this some and here's what I've come up with so far:

 I question whether we are in a position to do the testing necessary to
 commit this for 9.2.

Is anyone even working on testing it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-16 Thread Bruce Momjian
On Fri, Mar 16, 2012 at 10:40:01AM -0300, Alvaro Herrera wrote:
 
 Excerpts from Alvaro Herrera's message of vie mar 16 10:36:11 -0300 2012:
 
   Now I am confused.  Where do you see the word hint used by
   HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK.  These are tuple infomask
   bits, not hints, meaning they are not optional or there just for
   performance.
  
  Okay, I think this is just a case of confusing terminology.  I have
  always assumed (because I have not seen any evidence to the contrary)
  that anything in t_infomask and t_infomask2 is a hint bit --
  regardless of it being actually a hint or something with a stronger
  significance.
 
 Maybe this is just my mistake.  I see in
 http://wiki.postgresql.org/wiki/Hint_Bits that we only call the
 COMMITTED/INVALID infomask bits hints.
 
 I think it's easy enough to correct the README to call them infomask
 bits rather than hints .. I'll go do that.

OK, thanks.  I only brought it up so people would not be confused by
thinking these were optional pieces of information, and that the real
information is stored somewhere else.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-16 Thread Bruce Momjian
On Fri, Mar 16, 2012 at 10:08:07AM -0400, Robert Haas wrote:
 On Thu, Mar 15, 2012 at 11:09 PM, Bruce Momjian br...@momjian.us wrote:
  On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
  On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
   I agree with you that some worst case performance tests should be
   done. Could you please say what you think the worst cases would be, so
   those can be tested? That would avoid wasting time or getting anything
   backwards.
 
  I've thought about this some and here's what I've come up with so far:
 
  I question whether we are in a position to do the testing necessary to
  commit this for 9.2.
 
 Is anyone even working on testing it?

No one I know of.  I am just trying to set expectations that this still
has a long way to go.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-16 Thread Alvaro Herrera

Excerpts from Bruce Momjian's message of vie mar 16 15:22:05 -0300 2012:
 
 On Fri, Mar 16, 2012 at 10:08:07AM -0400, Robert Haas wrote:
  On Thu, Mar 15, 2012 at 11:09 PM, Bruce Momjian br...@momjian.us wrote:
   On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
   On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs si...@2ndquadrant.com 
   wrote:
I agree with you that some worst case performance tests should be
done. Could you please say what you think the worst cases would be, so
those can be tested? That would avoid wasting time or getting anything
backwards.
  
   I've thought about this some and here's what I've come up with so far:
  
   I question whether we are in a position to do the testing necessary to
   commit this for 9.2.
  
  Is anyone even working on testing it?
 
 No one I know of.  I am just trying to set expectations that this still
 has a long way to go.

A Command Prompt colleague is on it.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Simon Riggs
On Wed, Mar 14, 2012 at 5:23 PM, Robert Haas robertmh...@gmail.com wrote:

 You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
 lookups, so I think something more sophisticated is needed to exercise that
 cost.  Not sure what.

 I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
 all-visible.

So because committed does not equal all visible there will be
additional lookups on mxids? That's complete rubbish.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Simon Riggs
On Thu, Mar 15, 2012 at 2:15 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Mar 14, 2012 at 6:10 PM, Noah Misch n...@leadboat.com wrote:
 Well, post-release, the cat is out of the bag: we'll be stuck with
 this whether the performance characteristics are acceptable or not.
 That's why we'd better be as sure as possible before committing to
 this implementation that there's nothing we can't live with.  It's not
 like there's any reasonable way to turn this off if you don't like it.

 I disagree; we're only carving in stone the FOR KEY SHARE and FOR KEY UPDATE
 syntax additions.  We could even avoid doing that by not documenting them.  A
 later major release could implement them using a completely different
 mechanism or even reduce them to aliases, KEY SHARE = SHARE and KEY UPDATE =
 UPDATE.  To be sure, let's still do a good job the first time.

 What I mean is really that, once the release is out, we don't get to
 take it back.  Sure, the next release can fix things, but any
 regressions will become obstacles to upgrading and pain points for new
 users.

This comment is completely superfluous. It's a complete waste of time
to turn up on a thread and remind people that if they commit something
and it doesn't actually work that it would be a bad thing. Why, we
might ask do you think that thought needs to be expressed here?
Please, don't answer, lets spend the time on actually reviewing the
patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Simon Riggs
On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Mar 14, 2012 at 9:17 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Agreed.  But speaking of that, why exactly do we fsync the multixact SLRU 
  today?

 Good question.  So far, I can't think of a reason.  nextMulti is critical,
 but we already fsync it with pg_control.  We could delete the other 
 multixact
 state data at every startup and set OldestVisibleMXactId accordingly.

 Hmm, yeah.

 In a way, the fact that we don't do that is kind of fortuitous in this
 situation.  I had just assumed that we were not fsyncing it because
 there seems to be no reason to do so.  But since we are, we already
 know that the fsyncs resulting from frequent mxid allocation aren't a
 huge pain point.  If they were, somebody would have presumably
 complained about it and fixed it before now.  So that means that what
 we're really worrying about here is the overhead of fsyncing a little
 more often, which is a lot less scary than starting to do it when we
 weren't previously.

Good

 Now, we could look at this as an opportunity to optimize the existing
 implementation by removing the fsyncs, rather than adding the new
 infrastructure Alvaro is proposing.

This is not an exercise in tuning mxact code. There is a serious
algorithmic problem that is causing real world problems.

Removing the fsync will *not* provide a solution to the problem, so
there is no opportunity here.

 But that would only make sense if
 we thought that getting rid of the fsyncs would be more valuable than
 avoiding the blocking here, and I don't.

You're right that the existing code could use some optimisation.

I'm a little tired, but I can't see a reason to fsync this except at checkpoint.

Also seeing that we issue 2 WAL records for each RI check. We issue
one during MultiXactIdCreate/MultiXactIdExpand and then immediately
afterwards issue a XLOG_HEAP_LOCK record. The comments on both show
that each thinks it is doing it for the same reason and is the only
place its being done. Alvaro, any ideas why that is.


 I still think that someone needs to do some benchmarking here, because
 this is a big complicated performance patch, and we can't predict the
 impact of it on real-world scenarios without testing.  There is
 clearly some additional overhead, and it makes sense to measure it and
 hopefully discover that it isn't excessive.  Still, I'm a bit
 relieved.

Very much agreed.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Simon Riggs
On Thu, Mar 15, 2012 at 1:17 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 As things stand today

Can I confirm where we are now? Is there another version of the patch
coming out soon?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of jue mar 15 18:38:53 -0300 2012:
 On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas robertmh...@gmail.com wrote:

  But that would only make sense if
  we thought that getting rid of the fsyncs would be more valuable than
  avoiding the blocking here, and I don't.
 
 You're right that the existing code could use some optimisation.
 
 I'm a little tired, but I can't see a reason to fsync this except at 
 checkpoint.

Hang on.  What fsyncs are we talking about?  I don't see that the
multixact code calls any fsync except that checkpoint and shutdown.

 Also seeing that we issue 2 WAL records for each RI check. We issue
 one during MultiXactIdCreate/MultiXactIdExpand and then immediately
 afterwards issue a XLOG_HEAP_LOCK record. The comments on both show
 that each thinks it is doing it for the same reason and is the only
 place its being done. Alvaro, any ideas why that is.

AFAIR the XLOG_HEAP_LOCK log entry only records the fact that the row is
being locked by a multixact -- it doesn't record the contents (member
xids) of said multixact, which is what the other log entry records.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of jue mar 15 18:46:44 -0300 2012:
 
 On Thu, Mar 15, 2012 at 1:17 AM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 
  As things stand today
 
 Can I confirm where we are now? Is there another version of the patch
 coming out soon?

Yes, another version is coming soon.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Simon Riggs
On Thu, Mar 15, 2012 at 9:54 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Simon Riggs's message of jue mar 15 18:38:53 -0300 2012:
 On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas robertmh...@gmail.com wrote:

  But that would only make sense if
  we thought that getting rid of the fsyncs would be more valuable than
  avoiding the blocking here, and I don't.

 You're right that the existing code could use some optimisation.

 I'm a little tired, but I can't see a reason to fsync this except at 
 checkpoint.

 Hang on.  What fsyncs are we talking about?  I don't see that the
 multixact code calls any fsync except that checkpoint and shutdown.

If a dirty page is evicted it will fsync.

 Also seeing that we issue 2 WAL records for each RI check. We issue
 one during MultiXactIdCreate/MultiXactIdExpand and then immediately
 afterwards issue a XLOG_HEAP_LOCK record. The comments on both show
 that each thinks it is doing it for the same reason and is the only
 place its being done. Alvaro, any ideas why that is.

 AFAIR the XLOG_HEAP_LOCK log entry only records the fact that the row is
 being locked by a multixact -- it doesn't record the contents (member
 xids) of said multixact, which is what the other log entry records.

Agreed. But issuing two records when we could issue just one seems a
little strange, especially when the two record types follow one
another so closely - so we end up queuing for the lock twice while
holding the lock on the data block.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of jue mar 15 19:04:41 -0300 2012:
 
 On Thu, Mar 15, 2012 at 9:54 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 
  Excerpts from Simon Riggs's message of jue mar 15 18:38:53 -0300 2012:
  On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas robertmh...@gmail.com wrote:
 
   But that would only make sense if
   we thought that getting rid of the fsyncs would be more valuable than
   avoiding the blocking here, and I don't.
 
  You're right that the existing code could use some optimisation.
 
  I'm a little tired, but I can't see a reason to fsync this except at 
  checkpoint.
 
  Hang on.  What fsyncs are we talking about?  I don't see that the
  multixact code calls any fsync except that checkpoint and shutdown.
 
 If a dirty page is evicted it will fsync.

Ah, right.

  Also seeing that we issue 2 WAL records for each RI check. We issue
  one during MultiXactIdCreate/MultiXactIdExpand and then immediately
  afterwards issue a XLOG_HEAP_LOCK record. The comments on both show
  that each thinks it is doing it for the same reason and is the only
  place its being done. Alvaro, any ideas why that is.
 
  AFAIR the XLOG_HEAP_LOCK log entry only records the fact that the row is
  being locked by a multixact -- it doesn't record the contents (member
  xids) of said multixact, which is what the other log entry records.
 
 Agreed. But issuing two records when we could issue just one seems a
 little strange, especially when the two record types follow one
 another so closely - so we end up queuing for the lock twice while
 holding the lock on the data block.

Hmm, that seems optimization that could be done separately.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Simon Riggs
On Thu, Mar 15, 2012 at 10:13 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Simon Riggs's message of jue mar 15 19:04:41 -0300 2012:

 On Thu, Mar 15, 2012 at 9:54 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 
  Excerpts from Simon Riggs's message of jue mar 15 18:38:53 -0300 2012:
  On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas robertmh...@gmail.com 
  wrote:
 
   But that would only make sense if
   we thought that getting rid of the fsyncs would be more valuable than
   avoiding the blocking here, and I don't.
 
  You're right that the existing code could use some optimisation.
 
  I'm a little tired, but I can't see a reason to fsync this except at 
  checkpoint.
 
  Hang on.  What fsyncs are we talking about?  I don't see that the
  multixact code calls any fsync except that checkpoint and shutdown.

 If a dirty page is evicted it will fsync.

 Ah, right.

  Also seeing that we issue 2 WAL records for each RI check. We issue
  one during MultiXactIdCreate/MultiXactIdExpand and then immediately
  afterwards issue a XLOG_HEAP_LOCK record. The comments on both show
  that each thinks it is doing it for the same reason and is the only
  place its being done. Alvaro, any ideas why that is.
 
  AFAIR the XLOG_HEAP_LOCK log entry only records the fact that the row is
  being locked by a multixact -- it doesn't record the contents (member
  xids) of said multixact, which is what the other log entry records.

 Agreed. But issuing two records when we could issue just one seems a
 little strange, especially when the two record types follow one
 another so closely - so we end up queuing for the lock twice while
 holding the lock on the data block.

 Hmm, that seems optimization that could be done separately.

Oh yes, definitely not something for you to add to the main patch.

Just some additional tuning to alleviate Robert's concerns.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Robert Haas
On Thu, Mar 15, 2012 at 5:07 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Mar 14, 2012 at 5:23 PM, Robert Haas robertmh...@gmail.com wrote:
 You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
 lookups, so I think something more sophisticated is needed to exercise that
 cost.  Not sure what.

 I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
 all-visible.

 So because committed does not equal all visible there will be
 additional lookups on mxids? That's complete rubbish.

Noah seemed to be implying that once the updating transaction
committed, HEAP_XMAX_COMMITTED would get set and save the mxid lookup.
 But I think that's not true, because anyone who looks at the tuple
afterward will still need to know the exact xmax, to test it against
their snapshot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Alvaro Herrera

Excerpts from Robert Haas's message of jue mar 15 21:37:36 -0300 2012:
 
 On Thu, Mar 15, 2012 at 5:07 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Wed, Mar 14, 2012 at 5:23 PM, Robert Haas robertmh...@gmail.com wrote:
  You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on 
  mxid
  lookups, so I think something more sophisticated is needed to exercise 
  that
  cost.  Not sure what.
 
  I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
  all-visible.
 
  So because committed does not equal all visible there will be
  additional lookups on mxids? That's complete rubbish.
 
 Noah seemed to be implying that once the updating transaction
 committed, HEAP_XMAX_COMMITTED would get set and save the mxid lookup.
  But I think that's not true, because anyone who looks at the tuple
 afterward will still need to know the exact xmax, to test it against
 their snapshot.

Yeah, we don't set HEAP_XMAX_COMMITTED on multis, even when there's an
update in it and it committed.  I think we could handle it, at least
some of the cases, but that'd require careful re-examination of all the
tqual.c code, which is not something I want to do right now.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Noah Misch
On Thu, Mar 15, 2012 at 08:37:36PM -0400, Robert Haas wrote:
 On Thu, Mar 15, 2012 at 5:07 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Wed, Mar 14, 2012 at 5:23 PM, Robert Haas robertmh...@gmail.com wrote:
  You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on 
  mxid
  lookups, so I think something more sophisticated is needed to exercise 
  that
  cost. ?Not sure what.
 
  I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
  all-visible.
 
  So because committed does not equal all visible there will be
  additional lookups on mxids? That's complete rubbish.
 
 Noah seemed to be implying that once the updating transaction
 committed, HEAP_XMAX_COMMITTED would get set and save the mxid lookup.
  But I think that's not true, because anyone who looks at the tuple
 afterward will still need to know the exact xmax, to test it against
 their snapshot.

Yeah, my comment above was wrong.  I agree that we'll need to retrieve the
mxid members during every MVCC scan until we either mark the page all-visible
or have occasion to simplify the mxid xmax to the updater xid.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 02:35:02PM -0300, Alvaro Herrera wrote:
 
 Excerpts from Bruce Momjian's message of mar mar 13 14:00:52 -0300 2012:
  
  On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
 
   When there is a single locker in a tuple, we can just store the locking 
   info
   in the tuple itself.  We do this by storing the locker's Xid in XMAX, and
   setting hint bits specifying the locking strength.  There is one exception
   here: since hint bit space is limited, we do not provide a separate hint 
   bit
   for SELECT FOR SHARE, so we have to use the extended info in a MultiXact 
   in
   that case.  (The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE, 
   are
   presumably more commonly used due to being the standards-mandated locking
   mechanism, or heavily used by the RI code, so we want to provide fast 
   paths
   for those.)
  
  Are those tuple bits actually hint bits?  They seem quite a bit more
  powerful than a hint.
 
 I'm not sure what's your point.  We've had a hint bit for SELECT FOR
 UPDATE for ages.  Even 8.2 had HEAP_XMAX_EXCL_LOCK and
 HEAP_XMAX_SHARED_LOCK.  Maybe they are misnamed and aren't really
 hints, but it's not the job of this patch to fix that problem.

Now I am confused.  Where do you see the word hint used by
HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK.  These are tuple infomask
bits, not hints, meaning they are not optional or there just for
performance.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Bruce Momjian
On Thu, Mar 15, 2012 at 11:04:06PM -0400, Bruce Momjian wrote:
 On Tue, Mar 13, 2012 at 02:35:02PM -0300, Alvaro Herrera wrote:
  
  Excerpts from Bruce Momjian's message of mar mar 13 14:00:52 -0300 2012:
   
   On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
  
When there is a single locker in a tuple, we can just store the locking 
info
in the tuple itself.  We do this by storing the locker's Xid in XMAX, 
and
setting hint bits specifying the locking strength.  There is one 
exception
here: since hint bit space is limited, we do not provide a separate 
hint bit
for SELECT FOR SHARE, so we have to use the extended info in a 
MultiXact in
that case.  (The other cases, SELECT FOR UPDATE and SELECT FOR KEY 
SHARE, are
presumably more commonly used due to being the standards-mandated 
locking
mechanism, or heavily used by the RI code, so we want to provide fast 
paths
for those.)
   
   Are those tuple bits actually hint bits?  They seem quite a bit more
   powerful than a hint.
  
  I'm not sure what's your point.  We've had a hint bit for SELECT FOR
  UPDATE for ages.  Even 8.2 had HEAP_XMAX_EXCL_LOCK and
  HEAP_XMAX_SHARED_LOCK.  Maybe they are misnamed and aren't really
  hints, but it's not the job of this patch to fix that problem.
 
 Now I am confused.  Where do you see the word hint used by
 HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK.  These are tuple infomask
 bits, not hints, meaning they are not optional or there just for
 performance.

Are you saying that the bit is only a guide and is there only for
performance?  If so, I understand why it is called hint.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
 On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
  I agree with you that some worst case performance tests should be
  done. Could you please say what you think the worst cases would be, so
  those can be tested? That would avoid wasting time or getting anything
  backwards.
 
 I've thought about this some and here's what I've come up with so far:

I question whether we are in a position to do the testing necessary to
commit this for 9.2.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-14 Thread Robert Haas
On Tue, Mar 13, 2012 at 11:42 PM, Noah Misch n...@leadboat.com wrote:
 More often than that; each 2-member mxid takes 4 bytes in an offsets file and
 10 bytes in a members file.  So, more like one fsync per ~580 mxids.  Note
 that we already fsync the multixact SLRUs today, so any increase will arise
 from the widening of member entries from 4 bytes to 5.  The realism of this
 test is attractive.  Nearly-static parent tables are plenty common, and this
 test will illustrate the impact on those designs.

Agreed.  But speaking of that, why exactly do we fsync the multixact SLRU today?

 You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
 lookups, so I think something more sophisticated is needed to exercise that
 cost.  Not sure what.

I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
all-visible.  HEAP_XMAX_INVALID will obviously help, when it happens.

 This isn't exactly a test case, but from Noah's previous comments I
 gather that there is a theoretical risk of mxid consumption running
 ahead of xid consumption.  We should try to think about whether there
 are any realistic workloads where that might actually happen.  I'm
 willing to believe that there aren't, but not just because somebody
 asserts it.  The reason I'm concerned about this is because, if it
 should happen, the result will be more frequent anti-wraparound
 vacuums on every table in the cluster.  Those are already quite
 painful for some users.

 Yes.  Pre-release, what can we really do here other than have more people
 thinking about ways it might happen in practice?  Post-release, we could
 suggest monitoring methods or perhaps have VACUUM emit a WARNING when a table
 is using more mxid space than xid space.

Well, post-release, the cat is out of the bag: we'll be stuck with
this whether the performance characteristics are acceptable or not.
That's why we'd better be as sure as possible before committing to
this implementation that there's nothing we can't live with.  It's not
like there's any reasonable way to turn this off if you don't like it.

 Also consider a benchmark that does plenty of non-key updates on a parent
 table with no activity on the child table.  We'll pay the overhead of
 determining that the key column(s) have not changed, but it will never pay off
 by preventing a lock wait.

Good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-14 Thread Noah Misch
On Wed, Mar 14, 2012 at 01:23:14PM -0400, Robert Haas wrote:
 On Tue, Mar 13, 2012 at 11:42 PM, Noah Misch n...@leadboat.com wrote:
  More often than that; each 2-member mxid takes 4 bytes in an offsets file 
  and
  10 bytes in a members file. ?So, more like one fsync per ~580 mxids. ?Note
  that we already fsync the multixact SLRUs today, so any increase will arise
  from the widening of member entries from 4 bytes to 5. ?The realism of this
  test is attractive. ?Nearly-static parent tables are plenty common, and this
  test will illustrate the impact on those designs.
 
 Agreed.  But speaking of that, why exactly do we fsync the multixact SLRU 
 today?

Good question.  So far, I can't think of a reason.  nextMulti is critical,
but we already fsync it with pg_control.  We could delete the other multixact
state data at every startup and set OldestVisibleMXactId accordingly.

  You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
  lookups, so I think something more sophisticated is needed to exercise that
  cost. ?Not sure what.
 
 I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
 all-visible.  HEAP_XMAX_INVALID will obviously help, when it happens.

True.  The patch (see ResetMultiHintBit()) also replaces a multixact xmax with
the updater xid when all transactions of the multixact have ended.  You would
need a test workload with long-running multixacts that delay such replacement.
However, the workloads that come to mind are the very workloads for which this
patch eliminates lock waits; they wouldn't illustrate a worst-case.

  This isn't exactly a test case, but from Noah's previous comments I
  gather that there is a theoretical risk of mxid consumption running
  ahead of xid consumption. ?We should try to think about whether there
  are any realistic workloads where that might actually happen. ?I'm
  willing to believe that there aren't, but not just because somebody
  asserts it. ?The reason I'm concerned about this is because, if it
  should happen, the result will be more frequent anti-wraparound
  vacuums on every table in the cluster. ?Those are already quite
  painful for some users.
 
  Yes. ?Pre-release, what can we really do here other than have more people
  thinking about ways it might happen in practice? ?Post-release, we could
  suggest monitoring methods or perhaps have VACUUM emit a WARNING when a 
  table
  is using more mxid space than xid space.
 
 Well, post-release, the cat is out of the bag: we'll be stuck with
 this whether the performance characteristics are acceptable or not.
 That's why we'd better be as sure as possible before committing to
 this implementation that there's nothing we can't live with.  It's not
 like there's any reasonable way to turn this off if you don't like it.

I disagree; we're only carving in stone the FOR KEY SHARE and FOR KEY UPDATE
syntax additions.  We could even avoid doing that by not documenting them.  A
later major release could implement them using a completely different
mechanism or even reduce them to aliases, KEY SHARE = SHARE and KEY UPDATE =
UPDATE.  To be sure, let's still do a good job the first time.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-14 Thread Alvaro Herrera

Excerpts from Noah Misch's message of mié mar 14 19:10:00 -0300 2012:
 
 On Wed, Mar 14, 2012 at 01:23:14PM -0400, Robert Haas wrote:
  On Tue, Mar 13, 2012 at 11:42 PM, Noah Misch n...@leadboat.com wrote:
   More often than that; each 2-member mxid takes 4 bytes in an offsets file 
   and
   10 bytes in a members file. ?So, more like one fsync per ~580 mxids. ?Note
   that we already fsync the multixact SLRUs today, so any increase will 
   arise
   from the widening of member entries from 4 bytes to 5. ?The realism of 
   this
   test is attractive. ?Nearly-static parent tables are plenty common, and 
   this
   test will illustrate the impact on those designs.
  
  Agreed.  But speaking of that, why exactly do we fsync the multixact SLRU 
  today?
 
 Good question.  So far, I can't think of a reason.  nextMulti is critical,
 but we already fsync it with pg_control.  We could delete the other multixact
 state data at every startup and set OldestVisibleMXactId accordingly.

Hmm, yeah.

   You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on 
   mxid
   lookups, so I think something more sophisticated is needed to exercise 
   that
   cost. ?Not sure what.
  
  I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
  all-visible.  HEAP_XMAX_INVALID will obviously help, when it happens.
 
 True.  The patch (see ResetMultiHintBit()) also replaces a multixact xmax with
 the updater xid when all transactions of the multixact have ended.

I have noticed that this code is not correct, because we don't know that
we're holding an appropriate lock on the page, so we can't simply change
the Xmax and reset those hint bits.  As things stand today, mxids
persist longer.  (We could do some cleanup at HOT-style page prune, for
example, though the lock we need is even weaker than that.)  Overall
this means that coming up with a test case demonstrating this pressure
probably isn't that hard.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-14 Thread Robert Haas
On Wed, Mar 14, 2012 at 6:10 PM, Noah Misch n...@leadboat.com wrote:
 Well, post-release, the cat is out of the bag: we'll be stuck with
 this whether the performance characteristics are acceptable or not.
 That's why we'd better be as sure as possible before committing to
 this implementation that there's nothing we can't live with.  It's not
 like there's any reasonable way to turn this off if you don't like it.

 I disagree; we're only carving in stone the FOR KEY SHARE and FOR KEY UPDATE
 syntax additions.  We could even avoid doing that by not documenting them.  A
 later major release could implement them using a completely different
 mechanism or even reduce them to aliases, KEY SHARE = SHARE and KEY UPDATE =
 UPDATE.  To be sure, let's still do a good job the first time.

What I mean is really that, once the release is out, we don't get to
take it back.  Sure, the next release can fix things, but any
regressions will become obstacles to upgrading and pain points for new
users.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-14 Thread Robert Haas
On Wed, Mar 14, 2012 at 9:17 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
  Agreed.  But speaking of that, why exactly do we fsync the multixact SLRU 
  today?

 Good question.  So far, I can't think of a reason.  nextMulti is critical,
 but we already fsync it with pg_control.  We could delete the other multixact
 state data at every startup and set OldestVisibleMXactId accordingly.

 Hmm, yeah.

In a way, the fact that we don't do that is kind of fortuitous in this
situation.  I had just assumed that we were not fsyncing it because
there seems to be no reason to do so.  But since we are, we already
know that the fsyncs resulting from frequent mxid allocation aren't a
huge pain point.  If they were, somebody would have presumably
complained about it and fixed it before now.  So that means that what
we're really worrying about here is the overhead of fsyncing a little
more often, which is a lot less scary than starting to do it when we
weren't previously.

Now, we could look at this as an opportunity to optimize the existing
implementation by removing the fsyncs, rather than adding the new
infrastructure Alvaro is proposing.  But that would only make sense if
we thought that getting rid of the fsyncs would be more valuable than
avoiding the blocking here, and I don't.

I still think that someone needs to do some benchmarking here, because
this is a big complicated performance patch, and we can't predict the
impact of it on real-world scenarios without testing.  There is
clearly some additional overhead, and it makes sense to measure it and
hopefully discover that it isn't excessive.  Still, I'm a bit
relieved.

 I have noticed that this code is not correct, because we don't know that
 we're holding an appropriate lock on the page, so we can't simply change
 the Xmax and reset those hint bits.  As things stand today, mxids
 persist longer.  (We could do some cleanup at HOT-style page prune, for
 example, though the lock we need is even weaker than that.)  Overall
 this means that coming up with a test case demonstrating this pressure
 probably isn't that hard.

What would such a test case look like?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-13 Thread Bruce Momjian
On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
 Here's a first attempt at a README illustrating this.  I intend this to
 be placed in src/backend/access/heap/README.tuplock; the first three
 paragraphs are stolen from the comment in heap_lock_tuple, so I'd remove
 those from there, directing people to this new file instead.  Is there
 something that you think should be covered more extensively (or at all)
 here?
...
 
 When there is a single locker in a tuple, we can just store the locking info
 in the tuple itself.  We do this by storing the locker's Xid in XMAX, and
 setting hint bits specifying the locking strength.  There is one exception
 here: since hint bit space is limited, we do not provide a separate hint bit
 for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in
 that case.  (The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE, are
 presumably more commonly used due to being the standards-mandated locking
 mechanism, or heavily used by the RI code, so we want to provide fast paths
 for those.)

Are those tuple bits actually hint bits?  They seem quite a bit more
powerful than a hint.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-13 Thread Robert Haas
On Mon, Mar 12, 2012 at 9:24 PM, Noah Misch n...@leadboat.com wrote:
 When we lock an update-in-progress row, we walk the t_ctid chain and lock all
 descendant tuples.  They may all have uncommitted xmins.  This is essential to
 ensure that the final outcome of the updating transaction does not affect
 whether the locking transaction has its KEY SHARE lock.  Similarly, when we
 update a previously-locked tuple, we copy any locks (always KEY SHARE locks)
 to the new version.  That new tuple is both uncommitted and has locks, and we
 cannot easily sacrifice either property.  Do you see a way to extend your
 scheme to cover these needs?

No, I think that sinks it.  Good analysis.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-13 Thread Alvaro Herrera

Excerpts from Bruce Momjian's message of mar mar 13 14:00:52 -0300 2012:
 
 On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:

  When there is a single locker in a tuple, we can just store the locking info
  in the tuple itself.  We do this by storing the locker's Xid in XMAX, and
  setting hint bits specifying the locking strength.  There is one exception
  here: since hint bit space is limited, we do not provide a separate hint bit
  for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in
  that case.  (The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE, 
  are
  presumably more commonly used due to being the standards-mandated locking
  mechanism, or heavily used by the RI code, so we want to provide fast paths
  for those.)
 
 Are those tuple bits actually hint bits?  They seem quite a bit more
 powerful than a hint.

I'm not sure what's your point.  We've had a hint bit for SELECT FOR
UPDATE for ages.  Even 8.2 had HEAP_XMAX_EXCL_LOCK and
HEAP_XMAX_SHARED_LOCK.  Maybe they are misnamed and aren't really
hints, but it's not the job of this patch to fix that problem.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-13 Thread Robert Haas
On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I agree with you that some worst case performance tests should be
 done. Could you please say what you think the worst cases would be, so
 those can be tested? That would avoid wasting time or getting anything
 backwards.

I've thought about this some and here's what I've come up with so far:

1. SELECT FOR SHARE on a large table on a system with no write cache.

2. A small parent table (say 30 rows or so) and a larger child table
with a many-to-one FK relationship to the parent (say 100 child rows
per parent row), with heavy update activity on the child table, on a
system where fsyncs are very slow.  This should generate lots of mxid
consumption, and every 1600 or so mxids (I think) we've got to fsync;
does that generate a noticeable performance hit?

3. It would be nice to test the impact of increased mxid lookups in
the parent, but I've realized that the visibility map will probably
mask a good chunk of that effect, which is a good thing.  Still, maybe
something like this: a fairly large parent table, say a million rows,
but narrow rows, so that many of them fit on a page, with frequent
reads and occasional updates (if there are only reads, autovacuum
might end with all the visibility map bits set); plus a child table
with one or a few rows per parent which is heavily updated.  In theory
this ought to be good for the patch, since the the more fine-grained
locking will avoid blocking, but in this case the parent table is
large enough that you shouldn't get much blocking anyway, yet you'll
still pay the cost of mxid lookups because the occasional updates on
the parent will clear VM bits.  This might not be the exactly right
workload to measure this effect, but if it's not maybe someone can
devote a little time to thinking about what would be.

4. A plain old pgbench run or two, to see whether there's any
regression when none of this matters at all...

This isn't exactly a test case, but from Noah's previous comments I
gather that there is a theoretical risk of mxid consumption running
ahead of xid consumption.  We should try to think about whether there
are any realistic workloads where that might actually happen.  I'm
willing to believe that there aren't, but not just because somebody
asserts it.  The reason I'm concerned about this is because, if it
should happen, the result will be more frequent anti-wraparound
vacuums on every table in the cluster.  Those are already quite
painful for some users.

It would be nice if Noah or someone else who has reviewed this patch
in detail could comment further.  I am shooting from the hip here, a
bit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-13 Thread Noah Misch
On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
 On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
  I agree with you that some worst case performance tests should be
  done. Could you please say what you think the worst cases would be, so
  those can be tested? That would avoid wasting time or getting anything
  backwards.
 
 I've thought about this some and here's what I've come up with so far:
 
 1. SELECT FOR SHARE on a large table on a system with no write cache.

Easy enough that we may as well check it.  Share-locking an entire large table
is impractical in a real application, so I would not worry if this shows a
substantial regression.

 2. A small parent table (say 30 rows or so) and a larger child table
 with a many-to-one FK relationship to the parent (say 100 child rows
 per parent row), with heavy update activity on the child table, on a
 system where fsyncs are very slow.  This should generate lots of mxid
 consumption, and every 1600 or so mxids (I think) we've got to fsync;
 does that generate a noticeable performance hit?

More often than that; each 2-member mxid takes 4 bytes in an offsets file and
10 bytes in a members file.  So, more like one fsync per ~580 mxids.  Note
that we already fsync the multixact SLRUs today, so any increase will arise
from the widening of member entries from 4 bytes to 5.  The realism of this
test is attractive.  Nearly-static parent tables are plenty common, and this
test will illustrate the impact on those designs.

 3. It would be nice to test the impact of increased mxid lookups in
 the parent, but I've realized that the visibility map will probably
 mask a good chunk of that effect, which is a good thing.  Still, maybe
 something like this: a fairly large parent table, say a million rows,
 but narrow rows, so that many of them fit on a page, with frequent
 reads and occasional updates (if there are only reads, autovacuum
 might end with all the visibility map bits set); plus a child table
 with one or a few rows per parent which is heavily updated.  In theory
 this ought to be good for the patch, since the the more fine-grained
 locking will avoid blocking, but in this case the parent table is
 large enough that you shouldn't get much blocking anyway, yet you'll
 still pay the cost of mxid lookups because the occasional updates on
 the parent will clear VM bits.  This might not be the exactly right
 workload to measure this effect, but if it's not maybe someone can
 devote a little time to thinking about what would be.

You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
lookups, so I think something more sophisticated is needed to exercise that
cost.  Not sure what.

 4. A plain old pgbench run or two, to see whether there's any
 regression when none of this matters at all...

Might as well.

 This isn't exactly a test case, but from Noah's previous comments I
 gather that there is a theoretical risk of mxid consumption running
 ahead of xid consumption.  We should try to think about whether there
 are any realistic workloads where that might actually happen.  I'm
 willing to believe that there aren't, but not just because somebody
 asserts it.  The reason I'm concerned about this is because, if it
 should happen, the result will be more frequent anti-wraparound
 vacuums on every table in the cluster.  Those are already quite
 painful for some users.

Yes.  Pre-release, what can we really do here other than have more people
thinking about ways it might happen in practice?  Post-release, we could
suggest monitoring methods or perhaps have VACUUM emit a WARNING when a table
is using more mxid space than xid space.


Also consider a benchmark that does plenty of non-key updates on a parent
table with no activity on the child table.  We'll pay the overhead of
determining that the key column(s) have not changed, but it will never pay off
by preventing a lock wait.  Granted, this is barely representative of
application behavior.  Perhaps, too, we already have a good sense of this cost
from the HOT benchmarking efforts and have no cause to revisit it.

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-12 Thread Robert Haas
On Sun, Feb 26, 2012 at 9:47 PM, Robert Haas robertmh...@gmail.com wrote:
 Regarding performance, the good thing about this patch is that if you
 have an operation that used to block, it might now not block.  So maybe
 multixact-related operation is a bit slower than before, but if it
 allows you to continue operating rather than sit waiting until some
 other transaction releases you, it's much better.

 That's probably true, although there is some deferred cost that is
 hard to account for.  You might not block immediately, but then later
 somebody might block either because the mxact SLRU now needs fsyncs or
 because they've got to decode an mxid long after the relevant segment
 has been evicted from the SLRU buffers.  In general, it's hard to
 bound that latter cost, because you only avoid blocking once (when the
 initial update happens) but you might pay the extra cost of decoding
 the mxid as many times as the row is read, which could be arbitrarily
 many.  How much of a problem that is in practice, I'm not completely
 sure, but it has worried me before and it still does.  In the worst
 case scenario, a handful of frequently-accessed rows with MXIDs all of
 whose members are dead except for the UPDATE they contain could result
 in continual SLRU cache-thrashing.

 From a performance standpoint, we really need to think not only about
 the cases where the patch wins, but also, and maybe more importantly,
 the cases where it loses.  There are some cases where the current
 mechanism, use SHARE locks for foreign keys, is adequate.  In
 particular, it's adequate whenever the parent table is not updated at
 all, or only very lightly.  I believe that those people will pay
 somewhat more with this patch, and especially in any case where
 backends end up waiting for fsyncs in order to create new mxids, but
 also just because I think this patch will have the effect of
 increasing the space consumed by each individual mxid, which imposes a
 distributed cost of its own.

I spent some time thinking about this over the weekend, and I have an
observation, and an idea.  Here's the observation: I believe that
locking a tuple whose xmin is uncommitted is always a noop, because if
it's ever possible for a transaction to wait for an XID that is part
of its own transaction (exact same XID, or sub-XIDs of the same top
XID), then a transaction could deadlock against itself.  I believe
that this is not possible: if a transaction were to wait for an XID
assigned to that same backend, then the lock manager would observe
that an ExclusiveLock on the xid is already held, so the request for a
ShareLock would be granted immediately.  I also don't believe there's
any situation in which the existence of an uncommitted tuple fails to
block another backend, but a lock on that same uncommitted tuple would
have caused another backend to block.  If any of that sounds wrong,
you can stop reading here (but please tell me why it sounds wrong).

If it's right, then here's the idea: what if we stored mxids using
xmin rather than xmax?  This would mean that, instead of making mxids
contain the tuple's original xmax, they'd need to instead contain the
tuple's original xmin.  This might seem like rearranging the deck
chairs on the titanic, but I think it actually works out substantially
better, because if we can assume that the xmin is committed, then we
only need to know its exact value until it becomes older than
RecentGlobalXmin.  This means that a tuple can be both updated and
locked at the same time without the MultiXact SLRU needing to be
crash-safe, because if we crash and restart, any mxids that are still
around from before the crash are known to contain only xmins that are
now all-visible.  We therefore don't need their exact values, so it
doesn't matter if that data actually made it to disk.  Furthermore, in
the case where a previously-locked tuple is read repeatedly, we only
need to keep doing SLRU lookups until the xmin becomes all-visible;
after that, we can set a hint bit indicating that the tuple's xmin is
all-visible, and any future readers (or writers) can use that to skip
the SLRU lookup.  In the degenerate (and probably common) case where a
tuple is already all-visible at the time it's locked, we don't really
need to record the original xmin at all; we can still do so if
convenient, but we can set the xmin-all-visible hint right away, so
nobody needs to probe the SLRU just to get xmin.

In other words, we'd entirely avoid needing to make mxacts crash-safe,
and we'd avoid most of the extra SLRU lookups that the current
implementation requires; they'd only be needed when (and for as long
as) the locked tuple was not yet all-visible.

This also seems like it would make the anti-wraparound issues simpler
to handle - once an mxid is old enough that any xmin it contains must
be all-visible, we can simply overwrite the tuple's xmin with
FrozenXID, which is pretty much what we're already doing anyway.  It's
already the case that a 

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-12 Thread Simon Riggs
On Mon, Mar 12, 2012 at 5:28 PM, Robert Haas robertmh...@gmail.com wrote:

 In other words, we'd entirely avoid needing to make mxacts crash-safe,
 and we'd avoid most of the extra SLRU lookups that the current
 implementation requires; they'd only be needed when (and for as long
 as) the locked tuple was not yet all-visible.

The current implementation only requires additional lookups in the
update/check case, which is the case that doesn't do anything other
than block right now. Since we're replacing lock contention with
physical access contention even the worst case situation is still
better than what we have now. Please feel free to point out worst case
situations and show that isn't true.

I've also pointed out how to avoid overhead of making mxacts crash
safe when the new facilities are not in use, so I don't see problems
with the proposed mechanism. Given that I am still myself reviewing
the actual code.

So those things are not something we need to avoid.

My feeling is that overwriting xmin is a clever idea, but arrives too
late to require sensible analysis in this stage of the CF. It's not
solving a problem, its just an alternate mechanism and at best an
optimisation of the mechanism. Were we to explore it now, it seems
certain that another person would observe that design were taking
place and so the patch should be rejected, which would be unnecessary
and wasteful. I also think it would alter our ability to diagnose
problems, not least the normal test that xmax matches xmin across an
update.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-12 Thread Robert Haas
On Mon, Mar 12, 2012 at 1:50 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Mar 12, 2012 at 5:28 PM, Robert Haas robertmh...@gmail.com wrote:
 In other words, we'd entirely avoid needing to make mxacts crash-safe,
 and we'd avoid most of the extra SLRU lookups that the current
 implementation requires; they'd only be needed when (and for as long
 as) the locked tuple was not yet all-visible.

 The current implementation only requires additional lookups in the
 update/check case, which is the case that doesn't do anything other
 than block right now. Since we're replacing lock contention with
 physical access contention even the worst case situation is still
 better than what we have now. Please feel free to point out worst case
 situations and show that isn't true.

I think I already have:

http://archives.postgresql.org/pgsql-hackers/2012-02/msg01258.php

The case I'm worried about is where we are allocating mxids quickly,
and we end up having to wait for fsyncs on mxact segments.  That might
be very slow, but you could argue that it could *possibly* be still
worthwhile if it avoids blocking.  That doesn't strike me as a
slam-dunk, though, because we've already seen and fixed cases where
too many fsyncs causes the performance of the entire system to go down
the tubes (cf. commit 7f242d880b5b5d9642675517466d31373961cf98).  But
it's really bad if there are no updates on the parent table - then,
whatever extra overhead there is will be all for naught, since the
more fine-grained locking doesn't help anyway.

 I've also pointed out how to avoid overhead of making mxacts crash
 safe when the new facilities are not in use, so I don't see problems
 with the proposed mechanism. Given that I am still myself reviewing
 the actual code.

The closest thing I can find to a proposal from you in that regard is
this comment:

# I was really thinking we could skip the fsync of a page if we've not
# persisted anything important on that page, since that was one of
# Robert's performance points.

It might be possible to do something with that idea, but at the moment
I'm not seeing how to make it work.

 So those things are not something we need to avoid.

 My feeling is that overwriting xmin is a clever idea, but arrives too
 late to require sensible analysis in this stage of the CF. It's not
 solving a problem, its just an alternate mechanism and at best an
 optimisation of the mechanism. Were we to explore it now, it seems
 certain that another person would observe that design were taking
 place and so the patch should be rejected, which would be unnecessary
 and wasteful.

Considering that nobody's done any work to resolve the uncertainty
about whether the worst-case performance characteristics of this patch
are acceptable, and considering further that it was undergoing massive
code churn for more than a month after the final CommitFest, I think
it's not that unreasonable to think it might not be ready for prime
time at this point.  In any event, your argument is exactly backwards:
we need to first decide whether the patch needs a redesign and then,
if it does, postpone it.  Deciding that we don't want to postpone it
first, and therefore we're not going to redesign it even if that is
what's really needed makes no sense.

 I also think it would alter our ability to diagnose
 problems, not least the normal test that xmax matches xmin across an
 update.

There's nothing stopping the new tuple from being frozen before the
old one, even today.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-12 Thread Simon Riggs
On Mon, Mar 12, 2012 at 6:14 PM, Robert Haas robertmh...@gmail.com wrote:

 Considering that nobody's done any work to resolve the uncertainty
 about whether the worst-case performance characteristics of this patch
 are acceptable, and considering further that it was undergoing massive
 code churn for more than a month after the final CommitFest, I think
 it's not that unreasonable to think it might not be ready for prime
 time at this point.

Thank you for cutting to the chase.

The uncertainty of which you speak is a theoretical point you
raised. It has been explained, but nobody has yet shown the
performance numbers to illustrate the point but only because they
seemed so clear. I would point out that you haven't demonstrated the
existence of a problem either, so redesigning something without any
proof of a problem seems a strange.

Let me explain again what this patch does and why it has such major
performance benefit.

This feature give us a step change in lock reductions from FKs. A real
world best case might be to examine the benefit this patch has on a
large batch load that inserts many new orders for existing customers.
In my example case the orders table has a FK to the customer table. At
the same time as the data load, we attempt to update a customer's
additional details, address or current balance etc. The large load
takes locks on the customer table and keeps them for the whole
transaction. So the customer updates are locked out for multiple
seconds, minutes or maybe hours, depending upon how far you want to
stretch the example. With this patch the customer updates don't cause
lock conflicts but they require mxact lookups in *some* cases, so they
might take 1-10ms extra, rather than 1-10 minutes more. 1000x faster.
The only case that causes the additional lookups is the case that
otherwise would have been locked. So producing best case results is
trivial and can be as enormous as you like.

I agree with you that some worst case performance tests should be
done. Could you please say what you think the worst cases would be, so
those can be tested? That would avoid wasting time or getting anything
backwards.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-12 Thread Noah Misch
On Mon, Mar 12, 2012 at 01:28:11PM -0400, Robert Haas wrote:
 I spent some time thinking about this over the weekend, and I have an
 observation, and an idea.  Here's the observation: I believe that
 locking a tuple whose xmin is uncommitted is always a noop, because if
 it's ever possible for a transaction to wait for an XID that is part
 of its own transaction (exact same XID, or sub-XIDs of the same top
 XID), then a transaction could deadlock against itself.  I believe
 that this is not possible: if a transaction were to wait for an XID
 assigned to that same backend, then the lock manager would observe
 that an ExclusiveLock on the xid is already held, so the request for a
 ShareLock would be granted immediately.  I also don't believe there's
 any situation in which the existence of an uncommitted tuple fails to
 block another backend, but a lock on that same uncommitted tuple would
 have caused another backend to block.  If any of that sounds wrong,
 you can stop reading here (but please tell me why it sounds wrong).

When we lock an update-in-progress row, we walk the t_ctid chain and lock all
descendant tuples.  They may all have uncommitted xmins.  This is essential to
ensure that the final outcome of the updating transaction does not affect
whether the locking transaction has its KEY SHARE lock.  Similarly, when we
update a previously-locked tuple, we copy any locks (always KEY SHARE locks)
to the new version.  That new tuple is both uncommitted and has locks, and we
cannot easily sacrifice either property.  Do you see a way to extend your
scheme to cover these needs?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-07 Thread Gokulakannan Somasundaram
I feel sad, that i followed this topic very late. But i still want to put
forward my views.
Have we thought on the lines of how Robert has implemented relation level
locks. In short it should go like this

a) The locks for enforcing Referential integrity should be taken only when
the rarest of the events( that would cause the integrity failure) occur.
That would be the update of the referenced column. Other cases of update,
delete and insert should not be required to take locks. In this way, we can
reduce a lot of lock traffic.

So if we have a table like employee( empid, empname, ... depid references
dept(deptid)) and table dept(depid depname).

Currently we are taking shared locks on referenced rows in dept table,
whenever we are updating something in the employee table. This should not
happen. Instead any insert / update of referenced column / delete should
check for some lock in its PGPROC structure, which will only get created
when the depid gets updated / deleted( rare event )

b) But the operation of update of the referenced column will be made more
costly. May be it can create something like a predicate lock(used for
enforcing serializable) and keep it in all the PG_PROC structures.

I know this is a abstract idea, but just wanted to know, whether we have
thought on those lines.

Thanks,
Gokul.


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-07 Thread Simon Riggs
On Wed, Mar 7, 2012 at 9:24 AM, Gokulakannan Somasundaram
gokul...@gmail.com wrote:
 I feel sad, that i followed this topic very late. But i still want to put
 forward my views.
 Have we thought on the lines of how Robert has implemented relation level
 locks. In short it should go like this

 a) The locks for enforcing Referential integrity should be taken only when
 the rarest of the events( that would cause the integrity failure) occur.
 That would be the update of the referenced column. Other cases of update,
 delete and insert should not be required to take locks. In this way, we can
 reduce a lot of lock traffic.

Insert, Update and Delete don't take locks they simply mark the tuples
they change with an xid. Anybody else wanting to wait on the lock
just waits on the xid. We do insert a lock row for each xid, but not
one per row changed.

 So if we have a table like employee( empid, empname, ... depid references
 dept(deptid)) and table dept(depid depname).

 Currently we are taking shared locks on referenced rows in dept table,
 whenever we are updating something in the employee table. This should not
 happen. Instead any insert / update of referenced column / delete should
 check for some lock in its PGPROC structure, which will only get created
 when the depid gets updated / deleted( rare event )

It's worked that way for 5 years, so its too late to modify it now and
this patch won't change that.

The way we do RI locking is designed to prevent holding that in memory
and then having the lock table overflow, which then either requires us
to revert to the current design or upgrade to table level locks to
save space in the lock table - which is a total disaster, if you've
ever worked with DB2.

What you're suggesting is that we store the locks in memory only as a
way of avoiding updating the row.

My understanding is we have two optimisation choices. A single set of
xids can be used in many places, since the same set of transactions
may do roughly the same thing.

1. We could assign a new mxactid every time we touch a new row. That
way there is no correspondence between sets of xids, and we may hold
the same set many times. OTOH since each set is unique we can expand
it easily and we don't need to touch each row once for each lock. That
saves on row touches but it also greatly increases the mxactid
creation rate, which causes cache scrolling.

2. We assign a new mxactid each time we create a new unique set of
rows. We have a separate cache for local sets. This way reduces the
mxactid creation rate but causes row updates each time we lock the
row, which then needs WAL.

(2) is how we currently handle the very difficult decision of how to
optimise this for the general case. I'm not sure that is right in all
cases, but it is at least scalable and it is the devil we know.

 b) But the operation of update of the referenced column will be made more
 costly. May be it can create something like a predicate lock(used for
 enforcing serializable) and keep it in all the PG_PROC structures.

No, updates of referenced columns are exactly the same as now when no
RI checks happening.

If the update occurs when an RI check takes place there is more work
to do, but previously it would have just blocked and done nothing. So
that path is relatively heavyweight but much better than nothing.


 I know this is a abstract idea, but just wanted to know, whether we have
 thought on those lines.

Thanks for your thoughts.

The most useful way to help with this patch right now is to run
performance investigations and see if there are non-optimal cases. We
can then see how the patch handles those. Theory is good, but it needs
to drive experimentation, as I myself re-discover continually.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-07 Thread Gokulakannan Somasundaram


 Insert, Update and Delete don't take locks they simply mark the tuples
 they change with an xid. Anybody else wanting to wait on the lock
 just waits on the xid. We do insert a lock row for each xid, but not
 one per row changed.

I mean the foreign key checks here. They take a Select for Share Lock
right. That's what we are trying to optimize here. Or am i missing
something? So by following the suggested methodology, the foreign key
checks won't take any locks.


 It's worked that way for 5 years, so its too late to modify it now and
 this patch won't change that.

 The way we do RI locking is designed to prevent holding that in memory
 and then having the lock table overflow, which then either requires us
 to revert to the current design or upgrade to table level locks to
 save space in the lock table - which is a total disaster, if you've
 ever worked with DB2.

 What you're suggesting is that we store the locks in memory only as a
 way of avoiding updating the row.

 But that memory would be consumed, only when someone updates the
referenced column( which will usually be the primary key of the referenced
table). Any normal database programmer knows that updating primary key is
not good for performance. So we go by the same logic.


 No, updates of referenced columns are exactly the same as now when no
 RI checks happening.

 If the update occurs when an RI check takes place there is more work
 to do, but previously it would have just blocked and done nothing. So
 that path is relatively heavyweight but much better than nothing.

 As i have already said, that path is definitely heavy weight( like how
Robert has made the DDL path heavy weight). If we assume that DDLs are
going to be a rare phenomenon, then we can also assume that update of
primary keys is a rare phenomenon in a normal database.



 The most useful way to help with this patch right now is to run
 performance investigations and see if there are non-optimal cases. We
 can then see how the patch handles those. Theory is good, but it needs
 to drive experimentation, as I myself re-discover continually.

 I understand. I just wanted to know, whether the developer considered that
line of thought.

Thanks,
Gokul.


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-07 Thread Simon Riggs
On Wed, Mar 7, 2012 at 10:18 AM, Gokulakannan Somasundaram
gokul...@gmail.com wrote:

 Insert, Update and Delete don't take locks they simply mark the tuples
 they change with an xid. Anybody else wanting to wait on the lock
 just waits on the xid. We do insert a lock row for each xid, but not
 one per row changed.

 I mean the foreign key checks here. They take a Select for Share Lock right.
 That's what we are trying to optimize here. Or am i missing something? So by
 following the suggested methodology, the foreign key checks won't take any
 locks.

Please explain in detail your idea of how it will work.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-07 Thread Gokulakannan Somasundaram


 Please explain in detail your idea of how it will work.


OK. I will try to explain the abstract idea, i have.
a) Referential integrity gets violated, when there are referencing key
values, not present in the referenced key values. We are maintaining the
integrity by taking a Select for Share Lock during the foreign key checks,
 so that referred value is not updated/deleted during the operation.

b) We can do the same in the reverse way. When there is a update/delete of
the referred value, we don't want any new inserts with the referred value
in referring table, any update that will update its value to the referred
value being updated/deleted. So we will take some kind of lock, which will
stop such a happening. This can be achieved through
i) predicate locking infrastructure already present (or)
ii) a temporary B-Tree index ( no WAL protection ), that gets created only
for the referred value updations and holds those values that are being
updated/deleted (if we are scared of predicate locking).

So whenever we de foreign key checks, we just need to make sure there is no
such referential integrity lock in our own PGPROC structure(if implemented
with predicate locking) /  check the temporary B-Tree index for any entry
matching the entry that we are going to insert/update to.( the empty tree
can be tracked with a flag to optimize )

May be someone can come up with better ideas than this.

Gokul.


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-07 Thread Simon Riggs
n Wed, Mar 7, 2012 at 11:37 AM, Gokulakannan Somasundaram
gokul...@gmail.com wrote:

 Please explain in detail your idea of how it will work.

 So we will take some kind of lock, which will stop such a happening.
...
 May be someone can come up with better ideas than this.

With respect, I don't call this a detailed explanation of an idea. For
consideration here, come up with a very detailed design of how your
suggestion will work. Think about it carefully, spend hours and days
thinking it through and when you are personally sure it is better than
what we have now, please raise it on list at an appropriate time. Bear
in mind that most people throw away 90% of their ideas before even
mentioning them here. I hope that helps you to contribute.

At the moment we're trying to review patches for specific code to
include or exclude, not discuss huge redesign of internal mechanisms
using broad brush descriptions. It is possible you may find an
improvement and if you do, people will be interested but that seems an
unlikely thing to happen here and now.

If you have specific comments or tests on this patch those are very welcome.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-06 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of lun mar 05 16:34:10 -0300 2012:

 It does however, illustrate my next review comment which is that the
 comments and README items are sorely lacking here. It's quite hard to
 see how it works, let along comment on major design decisions. It
 would help myself and others immensely if we could improve that.

Here's a first attempt at a README illustrating this.  I intend this to
be placed in src/backend/access/heap/README.tuplock; the first three
paragraphs are stolen from the comment in heap_lock_tuple, so I'd remove
those from there, directing people to this new file instead.  Is there
something that you think should be covered more extensively (or at all)
here?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Locking tuples
--

Because the shared-memory lock table is of finite size, but users could
reasonably want to lock large numbers of tuples, we do not rely on the
standard lock manager to store tuple-level locks over the long term.  Instead,
a tuple is marked as locked by setting the current transaction's XID as its
XMAX, and setting additional infomask bits to distinguish this usage from the
more normal case of having deleted the tuple.  When multiple transactions
concurrently lock a tuple, a MultiXact is used; see below.

When it is necessary to wait for a tuple-level lock to be released, the basic
delay is provided by XactLockTableWait or MultiXactIdWait on the contents of
the tuple's XMAX.  However, that mechanism will release all waiters
concurrently, so there would be a race condition as to which waiter gets the
tuple, potentially leading to indefinite starvation of some waiters.  The
possibility of share-locking makes the problem much worse --- a steady stream
of share-lockers can easily block an exclusive locker forever.  To provide
more reliable semantics about who gets a tuple-level lock first, we use the
standard lock manager.  The protocol for waiting for a tuple-level lock is
really

 LockTuple()
 XactLockTableWait()
 mark tuple as locked by me
 UnlockTuple()

When there are multiple waiters, arbitration of who is to get the lock next
is provided by LockTuple().  However, at most one tuple-level lock will
be held or awaited per backend at any time, so we don't risk overflow
of the lock table.  Note that incoming share-lockers are required to
do LockTuple as well, if there is any conflict, to ensure that they don't
starve out waiting exclusive-lockers.  However, if there is not any active
conflict for a tuple, we don't incur any extra overhead.

We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
super-exclusive locking (used to delete tuples and more generally to update
tuples modifying the values of the columns that make up the key of the tuple);
SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak mode
that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY
UPDATE.  This last mode implements a mode just strong enough to implement RI
checks, i.e. it ensures that tuples do not go away from under a check, without
blocking when some other transaction that want to update the tuple without
changing its key.

The conflict table is:

KEY UPDATEUPDATESHAREKEY SHARE
KEY UPDATE   conflictconflict  conflict  conflict
UPDATE   conflictconflict  conflict
SHAREconflictconflict
KEY SHAREconflict

When there is a single locker in a tuple, we can just store the locking info
in the tuple itself.  We do this by storing the locker's Xid in XMAX, and
setting hint bits specifying the locking strength.  There is one exception
here: since hint bit space is limited, we do not provide a separate hint bit
for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in
that case.  (The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE, are
presumably more commonly used due to being the standards-mandated locking
mechanism, or heavily used by the RI code, so we want to provide fast paths
for those.)

MultiXacts
--

A tuple header provides very limited space for storing information about tuple
locking and updates: there is room only for a single Xid and a small number of
hint bits.  Whenever we need to store more than one lock, we replace the first
locker's Xid with a new MultiXactId.  Each MultiXact provides extended locking
data; it comprises an array of Xids plus some flags bits for each one.  The
flags are currently used to store the locking strength of each member
transaction.  (The flags also distinguish a pure locker from an actual
updater.)

In earlier PostgreSQL releases, a MultiXact always meant that the tuple was
locked in shared mode by multiple 

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-06 Thread Simon Riggs
On Mon, Mar 5, 2012 at 8:35 PM, Simon Riggs si...@2ndquadrant.com wrote:

 * Why do we need multixact to be persistent? Do we need every page of
 multixact to be persistent, or just particular pages in certain
 circumstances?

 Any page that contains at least one multi with an update as a member
 must persist.  It's possible that some pages contain no update (and this
 is even likely in some workloads, if updates are rare), but I'm not sure
 it's worth complicating the code to cater for early removal of some
 pages.

 If the multixact contains an xid and that is being persisted then you
 need to set an LSN to ensure that a page writes causes an XLogFlush()
 before the multixact write. And you need to set do_fsync, no? Or
 explain why not in comments...

 I was really thinking we could skip the fsync of a page if we've not
 persisted anything important on that page, since that was one of
 Robert's performance points.

We need to increase these values to 32 as well

 #define NUM_MXACTOFFSET_BUFFERS8
 #define NUM_MXACTMEMBER_BUFFERS16

using same logic as for clog.

We're using 25% more space and we already know clog benefits from
increasing them, so there's little doubt we need it here also, since
we are increasing the access rate and potentially the longevity.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-06 Thread Simon Riggs
On Tue, Mar 6, 2012 at 7:39 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
 super-exclusive locking (used to delete tuples and more generally to update
 tuples modifying the values of the columns that make up the key of the tuple);
 SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
 implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak mode
 that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY
 UPDATE.  This last mode implements a mode just strong enough to implement RI
 checks, i.e. it ensures that tuples do not go away from under a check, without
 blocking when some other transaction that want to update the tuple without
 changing its key.

So there are 4 lock types, but we only have room for 3 on the tuple
header, so we store the least common/deprecated of the 4 types as a
multixactid. Some rewording would help there.

Neat scheme!


My understanding is that all of theses workloads will change

* Users of explicit SHARE lockers will be slightly worse in the case
of the 1st locker, but then after that they'll be the same as before.

* Updates against an RI locked table will be dramatically faster
because of reduced lock waits


...and that these previous workloads are effectively unchanged:

* Stream of RI checks causes mxacts

* Multi row deadlocks still possible

* Queues of writers still wait in the same way

* Deletes don't cause mxacts unless by same transaction



 In earlier PostgreSQL releases, a MultiXact always meant that the tuple was
 locked in shared mode by multiple transactions.  This is no longer the case; a
 MultiXact may contain an update or delete Xid.  (Keep in mind that tuple locks
 in a transaction do not conflict with other tuple locks in the same
 transaction, so it's possible to have otherwise conflicting locks in a
 MultiXact if they belong to the same transaction).

Somewhat confusing, but am getting there.

 Note that each lock is attributed to the subtransaction that acquires it.
 This means that a subtransaction that aborts is seen as though it releases the
 locks it acquired; concurrent transactions can then proceed without having to
 wait for the main transaction to finish.  It also means that a subtransaction
 can upgrade to a stronger lock level than an earlier transaction had, and if
 the subxact aborts, the earlier, weaker lock is kept.

OK

 The possibility of having an update within a MultiXact means that they must
 persist across crashes and restarts: a future reader of the tuple needs to
 figure out whether the update committed or aborted.  So we have a requirement
 that pg_multixact needs to retain pages of its data until we're certain that
 the MultiXacts in them are no longer of interest.

I think the no longer of interest aspect needs to be tracked more
closely because it will necessarily lead to more I/O.

If we store the LSN on each mxact page, as I think we need to, we can
get rid of pages more quickly if we know they don't have an LSN set.
So its possible we can optimise that more.

 VACUUM is in charge of removing old MultiXacts at the time of tuple freezing.

You mean mxact segments?

Surely we set hint bits on tuples same as now? Hope so.

 This works in the same way that pg_clog segments are removed: we have a
 pg_class column that stores the earliest multixact that could possibly be
 stored in the table; the minimum of all such values is stored in a pg_database
 column.  VACUUM computes the minimum across all pg_database values, and
 removes pg_multixact segments older than the minimum.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-06 Thread Robert Haas
Preliminary comment:

This README is very helpful.

On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
 super-exclusive locking (used to delete tuples and more generally to update
 tuples modifying the values of the columns that make up the key of the tuple);
 SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
 implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak mode
 that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY
 UPDATE.  This last mode implements a mode just strong enough to implement RI
 checks, i.e. it ensures that tuples do not go away from under a check, without
 blocking when some other transaction that want to update the tuple without
 changing its key.

I feel like there is a naming problem here.  The semantics that have
always been associated with SELECT FOR UPDATE are now attached to
SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened.
 I think users will be surprised to find that SELECT FOR UPDATE
doesn't block all concurrent updates.

It seems to me that SELECT FOR KEY UPDATE should be called SELECT FOR
UPDATE, and what you're calling SELECT FOR UPDATE should be called
something else - essentially NONKEY UPDATE, though I don't much like
that name.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-06 Thread Alvaro Herrera

Excerpts from Robert Haas's message of mar mar 06 18:10:16 -0300 2012:
 
 Preliminary comment:
 
 This README is very helpful.

Thanks.  I feel silly that I didn't write it earlier.

 On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
  super-exclusive locking (used to delete tuples and more generally to update
  tuples modifying the values of the columns that make up the key of the 
  tuple);
  SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
  implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak 
  mode
  that does not conflict with exclusive mode, but conflicts with SELECT FOR 
  KEY
  UPDATE.  This last mode implements a mode just strong enough to implement RI
  checks, i.e. it ensures that tuples do not go away from under a check, 
  without
  blocking when some other transaction that want to update the tuple without
  changing its key.
 
 I feel like there is a naming problem here.  The semantics that have
 always been associated with SELECT FOR UPDATE are now attached to
 SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened.
  I think users will be surprised to find that SELECT FOR UPDATE
 doesn't block all concurrent updates.

I'm not sure why you say that.  Certainly SELECT FOR UPDATE continues to
block all updates.  It continues to block SELECT FOR SHARE as well.
The things that it doesn't block are the new SELECT FOR KEY SHARE locks;
since those didn't exist before, it doesn't seem correct to consider
that SELECT FOR UPDATE changed in any way.

The main difference in the UPDATE behavior is that an UPDATE is regarded
as though it might acquire two different lock modes -- it either
acquires SELECT FOR KEY UPDATE if the key is modified, or SELECT FOR
UPDATE if not.  Since SELECT FOR KEY UPDATE didn't exist before, we can
consider that previous to this patch, what UPDATE did was always acquire
a lock of strength SELECT FOR UPDATE.  So UPDATE also hasn't been
weakened.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-06 Thread Simon Riggs
On Tue, Mar 6, 2012 at 9:10 PM, Robert Haas robertmh...@gmail.com wrote:
 Preliminary comment:

 This README is very helpful.

 On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
 super-exclusive locking (used to delete tuples and more generally to update
 tuples modifying the values of the columns that make up the key of the 
 tuple);
 SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
 implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak 
 mode
 that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY
 UPDATE.  This last mode implements a mode just strong enough to implement RI
 checks, i.e. it ensures that tuples do not go away from under a check, 
 without
 blocking when some other transaction that want to update the tuple without
 changing its key.

 I feel like there is a naming problem here.  The semantics that have
 always been associated with SELECT FOR UPDATE are now attached to
 SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened.
  I think users will be surprised to find that SELECT FOR UPDATE
 doesn't block all concurrent updates.

 It seems to me that SELECT FOR KEY UPDATE should be called SELECT FOR
 UPDATE, and what you're calling SELECT FOR UPDATE should be called
 something else - essentially NONKEY UPDATE, though I don't much like
 that name.

No, because that would stop it from doing what it is designed to do.

The lock modes are correct, appropriate and IMHO have meaningful
names. No redesign required here.

Not sure about the naming of some of the flag bits however.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-06 Thread Robert Haas
On Tue, Mar 6, 2012 at 4:27 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of mar mar 06 18:10:16 -0300 2012:

 Preliminary comment:

 This README is very helpful.

 Thanks.  I feel silly that I didn't write it earlier.

 On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
  super-exclusive locking (used to delete tuples and more generally to update
  tuples modifying the values of the columns that make up the key of the 
  tuple);
  SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
  implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak 
  mode
  that does not conflict with exclusive mode, but conflicts with SELECT FOR 
  KEY
  UPDATE.  This last mode implements a mode just strong enough to implement 
  RI
  checks, i.e. it ensures that tuples do not go away from under a check, 
  without
  blocking when some other transaction that want to update the tuple without
  changing its key.

 I feel like there is a naming problem here.  The semantics that have
 always been associated with SELECT FOR UPDATE are now attached to
 SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened.
  I think users will be surprised to find that SELECT FOR UPDATE
 doesn't block all concurrent updates.

 I'm not sure why you say that.  Certainly SELECT FOR UPDATE continues to
 block all updates.  It continues to block SELECT FOR SHARE as well.
 The things that it doesn't block are the new SELECT FOR KEY SHARE locks;
 since those didn't exist before, it doesn't seem correct to consider
 that SELECT FOR UPDATE changed in any way.

 The main difference in the UPDATE behavior is that an UPDATE is regarded
 as though it might acquire two different lock modes -- it either
 acquires SELECT FOR KEY UPDATE if the key is modified, or SELECT FOR
 UPDATE if not.  Since SELECT FOR KEY UPDATE didn't exist before, we can
 consider that previous to this patch, what UPDATE did was always acquire
 a lock of strength SELECT FOR UPDATE.  So UPDATE also hasn't been
 weakened.

Ah, I see.  My mistake.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-05 Thread Simon Riggs
On Mon, Feb 27, 2012 at 2:47 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Feb 23, 2012 at 11:01 AM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 This
 seems like a horrid mess that's going to be unsustainable both from a
 complexity and a performance standpoint.  The only reason multixacts
 were tolerable at all was that they had only one semantics.  Changing
 it so that maybe a multixact represents an actual updater and maybe
 it doesn't is not sane.

 As far as complexity, yeah, it's a lot more complex now -- no question
 about that.

 Regarding performance, the good thing about this patch is that if you
 have an operation that used to block, it might now not block.  So maybe
 multixact-related operation is a bit slower than before, but if it
 allows you to continue operating rather than sit waiting until some
 other transaction releases you, it's much better.

 That's probably true, although there is some deferred cost that is
 hard to account for.  You might not block immediately, but then later
 somebody might block either because the mxact SLRU now needs fsyncs or
 because they've got to decode an mxid long after the relevant segment
 has been evicted from the SLRU buffers.  In general, it's hard to
 bound that latter cost, because you only avoid blocking once (when the
 initial update happens) but you might pay the extra cost of decoding
 the mxid as many times as the row is read, which could be arbitrarily
 many.  How much of a problem that is in practice, I'm not completely
 sure, but it has worried me before and it still does.  In the worst
 case scenario, a handful of frequently-accessed rows with MXIDs all of
 whose members are dead except for the UPDATE they contain could result
 in continual SLRU cache-thrashing.

Cases I regularly see involve wait times of many seconds.

When this patch helps, it will help performance by algorithmic gains,
so perhaps x10-100.

That can and should be demonstrated though, I agree.

 From a performance standpoint, we really need to think not only about
 the cases where the patch wins, but also, and maybe more importantly,
 the cases where it loses.  There are some cases where the current
 mechanism, use SHARE locks for foreign keys, is adequate.  In
 particular, it's adequate whenever the parent table is not updated at
 all, or only very lightly.  I believe that those people will pay
 somewhat more with this patch, and especially in any case where
 backends end up waiting for fsyncs in order to create new mxids, but
 also just because I think this patch will have the effect of
 increasing the space consumed by each individual mxid, which imposes a
 distributed cost of its own.

That is a concern also.

It's taken me a while reviewing the patch to realise that space usage
is actually 4 times worse than before.

 I think we should avoid having a theoretical argument about how
 serious these problems are; instead, you should try to construct
 somewhat-realistic worst case scenarios and benchmark them.  Tom's
 complaint about code complexity is basically a question of opinion, so
 I don't know how to evaluate that objectively, but performance is
 something we can measure.  We might still disagree on the
 interpretation of the results, but I still think having some real
 numbers to talk about based on carefully-thought-out test cases would
 advance the debate.

It's a shame that the isolation tester can't be used directly by
pgbench - I think we need something similar for performance regression
testing.

So yes, performance testing is required.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-05 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of lun mar 05 15:28:59 -0300 2012:
 
 On Mon, Feb 27, 2012 at 2:47 AM, Robert Haas robertmh...@gmail.com wrote:

  From a performance standpoint, we really need to think not only about
  the cases where the patch wins, but also, and maybe more importantly,
  the cases where it loses.  There are some cases where the current
  mechanism, use SHARE locks for foreign keys, is adequate.  In
  particular, it's adequate whenever the parent table is not updated at
  all, or only very lightly.  I believe that those people will pay
  somewhat more with this patch, and especially in any case where
  backends end up waiting for fsyncs in order to create new mxids, but
  also just because I think this patch will have the effect of
  increasing the space consumed by each individual mxid, which imposes a
  distributed cost of its own.
 
 That is a concern also.
 
 It's taken me a while reviewing the patch to realise that space usage
 is actually 4 times worse than before.

Eh.  You're probably misreading something.  Previously each member of a
multixact used 4 bytes (the size of an Xid).  With the current patch a
member uses 5 bytes (same plus a flags byte).  An earlier version used
4.25 bytes per multi, which I increased to leave space for future
expansion.

So it's 1.25x worse, not 4x worse.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-05 Thread Simon Riggs
On Mon, Mar 5, 2012 at 6:37 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Simon Riggs's message of lun mar 05 15:28:59 -0300 2012:

 On Mon, Feb 27, 2012 at 2:47 AM, Robert Haas robertmh...@gmail.com wrote:

  From a performance standpoint, we really need to think not only about
  the cases where the patch wins, but also, and maybe more importantly,
  the cases where it loses.  There are some cases where the current
  mechanism, use SHARE locks for foreign keys, is adequate.  In
  particular, it's adequate whenever the parent table is not updated at
  all, or only very lightly.  I believe that those people will pay
  somewhat more with this patch, and especially in any case where
  backends end up waiting for fsyncs in order to create new mxids, but
  also just because I think this patch will have the effect of
  increasing the space consumed by each individual mxid, which imposes a
  distributed cost of its own.

 That is a concern also.

 It's taken me a while reviewing the patch to realise that space usage
 is actually 4 times worse than before.

 Eh.  You're probably misreading something.  Previously each member of a
 multixact used 4 bytes (the size of an Xid).  With the current patch a
 member uses 5 bytes (same plus a flags byte).  An earlier version used
 4.25 bytes per multi, which I increased to leave space for future
 expansion.

 So it's 1.25x worse, not 4x worse.

Thanks for correcting me. That sounds better.

It does however, illustrate my next review comment which is that the
comments and README items are sorely lacking here. It's quite hard to
see how it works, let along comment on major design decisions. It
would help myself and others immensely if we could improve that.

Is there a working copy on a git repo? Easier than waiting for next
versions of a patch.

My other comments so far are

* some permutations commented out - no comments as to why
Something of a fault with the isolation tester that it just shows
output, there's no way to record expected output in the spec

Comments required for these points

* Why do we need multixact to be persistent? Do we need every page of
multixact to be persistent, or just particular pages in certain
circumstances?

* Why do we need to expand multixact with flags? Can we avoid that in
some cases?

* Why do we need to store just single xids in multixact members?
Didn't understand comments, no explanation

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-05 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of lun mar 05 16:34:10 -0300 2012:
 On Mon, Mar 5, 2012 at 6:37 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:

 It does however, illustrate my next review comment which is that the
 comments and README items are sorely lacking here. It's quite hard to
 see how it works, let along comment on major design decisions. It
 would help myself and others immensely if we could improve that.

Hm.  Okay.

 Is there a working copy on a git repo? Easier than waiting for next
 versions of a patch.

No, I don't have an external mirror of my local repo.

 My other comments so far are
 
 * some permutations commented out - no comments as to why
 Something of a fault with the isolation tester that it just shows
 output, there's no way to record expected output in the spec

The reason they are commented out is that they are invalid, that is,
it requires running a command on a session that's blocked in the
previous command.  Obviously, that cannot happen in real life.

isolationtester now has support for detecting such conditions; if the
spec specifies running a command in a locked session, the permutation is
killed with an error message invalid permutation and just continues
with the next permutation.  It used to simply die, aborting the test.
Maybe we could just modify the specs so that all permutations are there
(this can be done by simply removing the permutation lines), and the
invalid permutation messages are part of the expected file.  Would
that be better?

 Comments required for these points
 
 * Why do we need multixact to be persistent? Do we need every page of
 multixact to be persistent, or just particular pages in certain
 circumstances?

Any page that contains at least one multi with an update as a member
must persist.  It's possible that some pages contain no update (and this
is even likely in some workloads, if updates are rare), but I'm not sure
it's worth complicating the code to cater for early removal of some
pages.

 * Why do we need to expand multixact with flags? Can we avoid that in
 some cases?

Did you read my blog post?
http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/
This explains the reason -- the point is that we need to distinguish the
lock strength acquired by each locker.

 * Why do we need to store just single xids in multixact members?
 Didn't understand comments, no explanation

This is just for SELECT FOR SHARE.  We don't have a hint bit to indicate
this tuple has a for-share lock, so we need to create a multi for it.
Since FOR SHARE is probably going to be very uncommon, this isn't likely
to be a problem.  We're mainly catering for users of SELECT FOR SHARE so
that it continues to work, i.e. maintain backwards compatibility.

(Maybe I misunderstood your question -- what I think you're asking is,
why are there some multixacts that have a single member?)

I'll try to come up with a good place to add some paragraphs about all
this.  Please let me know if answers here are unclear and/or you have
further questions.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-05 Thread Simon Riggs
On Mon, Mar 5, 2012 at 7:53 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 My other comments so far are

 * some permutations commented out - no comments as to why
 Something of a fault with the isolation tester that it just shows
 output, there's no way to record expected output in the spec

 The reason they are commented out is that they are invalid, that is,
 it requires running a command on a session that's blocked in the
 previous command.  Obviously, that cannot happen in real life.

 isolationtester now has support for detecting such conditions; if the
 spec specifies running a command in a locked session, the permutation is
 killed with an error message invalid permutation and just continues
 with the next permutation.  It used to simply die, aborting the test.
 Maybe we could just modify the specs so that all permutations are there
 (this can be done by simply removing the permutation lines), and the
 invalid permutation messages are part of the expected file.  Would
 that be better?

It would be better to have an isolation tester mode that checks to see
it was invalid and if not, report that.

At the moment we can't say why you commented something out. There's no
comment or explanation, and we need something, otherwise 3 years from
now we'll be completely in the dark.


 Comments required for these points

 * Why do we need multixact to be persistent? Do we need every page of
 multixact to be persistent, or just particular pages in certain
 circumstances?

 Any page that contains at least one multi with an update as a member
 must persist.  It's possible that some pages contain no update (and this
 is even likely in some workloads, if updates are rare), but I'm not sure
 it's worth complicating the code to cater for early removal of some
 pages.

If the multixact contains an xid and that is being persisted then you
need to set an LSN to ensure that a page writes causes an XLogFlush()
before the multixact write. And you need to set do_fsync, no? Or
explain why not in comments...

I was really thinking we could skip the fsync of a page if we've not
persisted anything important on that page, since that was one of
Robert's performance points.


 * Why do we need to expand multixact with flags? Can we avoid that in
 some cases?

 Did you read my blog post?
 http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/
 This explains the reason -- the point is that we need to distinguish the
 lock strength acquired by each locker.

Thanks, I will, but it all belongs in a README please.


 * Why do we need to store just single xids in multixact members?
 Didn't understand comments, no explanation

 This is just for SELECT FOR SHARE.  We don't have a hint bit to indicate
 this tuple has a for-share lock, so we need to create a multi for it.
 Since FOR SHARE is probably going to be very uncommon, this isn't likely
 to be a problem.  We're mainly catering for users of SELECT FOR SHARE so
 that it continues to work, i.e. maintain backwards compatibility.

Good, thanks.

Are we actively recommending people use FOR KEY SHARE rather than FOR
SHARE, in explicit use?

 (Maybe I misunderstood your question -- what I think you're asking is,
 why are there some multixacts that have a single member?)

 I'll try to come up with a good place to add some paragraphs about all
 this.  Please let me know if answers here are unclear and/or you have
 further questions.

Thanks

I think we need to define some test workloads to measure the
performance impact of this patch. We need to be certain that it has a
good impact in target cases, plus a known impact in other cases.

Suggest

* basic pgbench - no RI

* inserts into large table, RI checks to small table, no activity on small table

* large table parent, large table: child
20 child rows per parent, fk from child to parent
updates of multiple children at same time
low/medium/heavy locking

* large table parent, large table: child
20 child rows per parent,fk from child to parent
updates of parent and child at same time
low/medium/heavy locking

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-27 Thread Heikki Linnakangas

On 23.02.2012 18:01, Alvaro Herrera wrote:


Excerpts from Tom Lane's message of jue feb 23 12:28:20 -0300 2012:


Alvaro Herreraalvhe...@commandprompt.com  writes:

Sure.  The problem is that we are allowing updated rows to be locked (and
locked rows to be updated).  This means that we need to store extended
Xmax information in tuples that goes beyond mere locks, which is what we
were doing previously -- they may now have locks and updates simultaneously.



(In the previous code, a multixact never meant an update, it always
signified only shared locks.  After a crash, all backends that could
have been holding locks must necessarily be gone, so the multixact info
is not interesting and can be treated like the tuple is simply live.)


Ugh.  I had not been paying attention to what you were doing in this
patch, and now that I read this I wish I had objected earlier.


Uhm, yeah, a lot earlier -- I initially blogged about this in August
last year:
http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/

and in several posts in pgsql-hackers.


This
seems like a horrid mess that's going to be unsustainable both from a
complexity and a performance standpoint.  The only reason multixacts
were tolerable at all was that they had only one semantics.  Changing
it so that maybe a multixact represents an actual updater and maybe
it doesn't is not sane.


As far as complexity, yeah, it's a lot more complex now -- no question
about that.


How about assigning a new, real, transaction id, to represent the group 
of transaction ids. The new transaction id would be treated as a 
subtransaction of the updater, and the xids of the lockers would be 
stored in the multixact-members slru. That way the multixact structures 
wouldn't need to survive a crash; you don't care about the shared 
lockers after a crash, and the xid of the updater would be safely stored 
as is in the xmax field.


That way you wouldn't need to handle multixact wraparound, because we 
already handle xid wraparound, and you wouldn't need to make multixact 
slrus crash-safe.


Not sure what the performance implications would be. You would use up 
xids more quickly, which would require more frequent anti-wraparound 
vacuuming. And if we just start using real xids as the key to 
multixact-offsets slru, we would need to extend that a lot more often. 
But I feel it would probably be acceptable.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-27 Thread Noah Misch
On Mon, Feb 27, 2012 at 02:13:32PM +0200, Heikki Linnakangas wrote:
 On 23.02.2012 18:01, Alvaro Herrera wrote:
 As far as complexity, yeah, it's a lot more complex now -- no question
 about that.

 How about assigning a new, real, transaction id, to represent the group  
 of transaction ids. The new transaction id would be treated as a  
 subtransaction of the updater, and the xids of the lockers would be  
 stored in the multixact-members slru. That way the multixact structures  
 wouldn't need to survive a crash; you don't care about the shared  
 lockers after a crash, and the xid of the updater would be safely stored  
 as is in the xmax field.

 That way you wouldn't need to handle multixact wraparound, because we  
 already handle xid wraparound, and you wouldn't need to make multixact  
 slrus crash-safe.

 Not sure what the performance implications would be. You would use up  
 xids more quickly, which would require more frequent anti-wraparound  
 vacuuming. And if we just start using real xids as the key to  
 multixact-offsets slru, we would need to extend that a lot more often.  
 But I feel it would probably be acceptable.

When a key locker arrives after the updater and creates this implicit
subtransaction of the updater, how might you arrange for the xid's clog status
to eventually get updated in accordance with the updater's outcome?

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-27 Thread Simon Riggs
On Tue, Feb 28, 2012 at 12:28 AM, Noah Misch n...@leadboat.com wrote:
 On Mon, Feb 27, 2012 at 02:13:32PM +0200, Heikki Linnakangas wrote:
 On 23.02.2012 18:01, Alvaro Herrera wrote:
 As far as complexity, yeah, it's a lot more complex now -- no question
 about that.

 How about assigning a new, real, transaction id, to represent the group
 of transaction ids. The new transaction id would be treated as a
 subtransaction of the updater, and the xids of the lockers would be
 stored in the multixact-members slru. That way the multixact structures
 wouldn't need to survive a crash; you don't care about the shared
 lockers after a crash, and the xid of the updater would be safely stored
 as is in the xmax field.

 That way you wouldn't need to handle multixact wraparound, because we
 already handle xid wraparound, and you wouldn't need to make multixact
 slrus crash-safe.

 Not sure what the performance implications would be. You would use up
 xids more quickly, which would require more frequent anti-wraparound
 vacuuming. And if we just start using real xids as the key to
 multixact-offsets slru, we would need to extend that a lot more often.
 But I feel it would probably be acceptable.

 When a key locker arrives after the updater and creates this implicit
 subtransaction of the updater, how might you arrange for the xid's clog status
 to eventually get updated in accordance with the updater's outcome?

Somewhat off-topic, but just seen another bad case of FK lock contention.

Thanks for working on this everybody.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-26 Thread Robert Haas
On Thu, Feb 23, 2012 at 11:01 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 This
 seems like a horrid mess that's going to be unsustainable both from a
 complexity and a performance standpoint.  The only reason multixacts
 were tolerable at all was that they had only one semantics.  Changing
 it so that maybe a multixact represents an actual updater and maybe
 it doesn't is not sane.

 As far as complexity, yeah, it's a lot more complex now -- no question
 about that.

 Regarding performance, the good thing about this patch is that if you
 have an operation that used to block, it might now not block.  So maybe
 multixact-related operation is a bit slower than before, but if it
 allows you to continue operating rather than sit waiting until some
 other transaction releases you, it's much better.

That's probably true, although there is some deferred cost that is
hard to account for.  You might not block immediately, but then later
somebody might block either because the mxact SLRU now needs fsyncs or
because they've got to decode an mxid long after the relevant segment
has been evicted from the SLRU buffers.  In general, it's hard to
bound that latter cost, because you only avoid blocking once (when the
initial update happens) but you might pay the extra cost of decoding
the mxid as many times as the row is read, which could be arbitrarily
many.  How much of a problem that is in practice, I'm not completely
sure, but it has worried me before and it still does.  In the worst
case scenario, a handful of frequently-accessed rows with MXIDs all of
whose members are dead except for the UPDATE they contain could result
in continual SLRU cache-thrashing.

From a performance standpoint, we really need to think not only about
the cases where the patch wins, but also, and maybe more importantly,
the cases where it loses.  There are some cases where the current
mechanism, use SHARE locks for foreign keys, is adequate.  In
particular, it's adequate whenever the parent table is not updated at
all, or only very lightly.  I believe that those people will pay
somewhat more with this patch, and especially in any case where
backends end up waiting for fsyncs in order to create new mxids, but
also just because I think this patch will have the effect of
increasing the space consumed by each individual mxid, which imposes a
distributed cost of its own.

I think we should avoid having a theoretical argument about how
serious these problems are; instead, you should try to construct
somewhat-realistic worst case scenarios and benchmark them.  Tom's
complaint about code complexity is basically a question of opinion, so
I don't know how to evaluate that objectively, but performance is
something we can measure.  We might still disagree on the
interpretation of the results, but I still think having some real
numbers to talk about based on carefully-thought-out test cases would
advance the debate.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-25 Thread Kevin Grittner
Vik Reykja vikrey...@gmail.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.govwrote:
 
 One of the problems that Florian was trying to address is that
 people often have a need to enforce something with a lot of
 similarity to a foreign key, but with more subtle logic than
 declarative foreign keys support.  One example would be the case
 Robert has used in some presentations, where the manager column
 in each row in a project table must contain the id of a row in a
 person table *which has the project_manager boolean column set to
 TRUE*.  Short of using the new serializable transaction isolation
 level in all related transactions, hand-coding enforcement of
 this useful invariant through trigger code (or application code
 enforced through some framework) is very tricky.  The change to
 SELECT FOR UPDATE that Florian was working on would make it
 pretty straightforward.
 
 I'm not sure what Florian's patch does, but I've been trying to
 advocate syntax like the following for this exact scenario:
 
 foreign key (manager_id, true) references person (id, is_manager)
 
 Basically, allow us to use constants instead of field names as
 part of foreign keys.
 
Interesting.  IMV, a declarative approach like that is almost always
better than the alternatives, so something like this (possibly with
different syntax) would be another step in the right direction.  I
suspect that there will always be a few corner cases where the
business logic required is too esoteric to be handled by a
generalized declarative construct, so I think Florian's idea still
has merit -- especially if we want to ease the transition to
PostgreSQL for large shops using other products.
 
 I have no idea what the implementation aspect of this is,
 but I need the user aspect of it and don't know the best way to
 get it.
 
There are those in the community who make their livings by helping
people get the features they want.  If you have some money to fund
development, I would bet you could get this addressed -- it sure
sounds reasonable to me.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-24 Thread Jeroen Vermeulen

On 2012-02-23 22:12, Noah Misch wrote:


That alone would not simplify the patch much.  INSERT/UPDATE/DELETE on the
foreign side would still need to take some kind of tuple lock on the primary
side to prevent primary-side DELETE.  You then still face the complicated case
of a tuple that's both locked and updated (non-key/immutable columns only).
Updates that change keys are relatively straightforward, following what we
already do today.  It's the non-key updates that complicate things.


Ah, so there's the technical hitch.  From previous discussion I was 
under the impression that:


1. Foreign-key updates only inherently conflict with _key_ updates on 
the foreign side, and that non-key updates on the foreign side were just 
caught in the locking cross-fire, so to speak.


And

2. The DELETE case was somehow trivially accounted for.  But, for 
instance, there does not seem to be a lighter lock type that DELETE 
conflicts with but UPDATE does not.  Bummer.




By then, though, that change would share little or no code with the current
patch.  It may have its own value, but it's not a means for carving a subset
from the current patch.


No, to be clear, it was never meant to be.  Only a possible way to give 
users a way out of foreign-key locks more quickly.  Not a way to get 
some of the branch out to users more quickly.


At any rate, that seems to be moot then.  And to be honest, mechanisms 
designed for more than one purpose rarely pan out.


Thanks for explaining!


Jeroen


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-24 Thread Vik Reykja
On Thu, Feb 23, 2012 at 19:44, Kevin Grittner
kevin.gritt...@wicourts.govwrote:

 One of the problems that Florian was trying to address is that
 people often have a need to enforce something with a lot of
 similarity to a foreign key, but with more subtle logic than
 declarative foreign keys support.  One example would be the case
 Robert has used in some presentations, where the manager column in
 each row in a project table must contain the id of a row in a person
 table *which has the project_manager boolean column set to TRUE*.
 Short of using the new serializable transaction isolation level in
 all related transactions, hand-coding enforcement of this useful
 invariant through trigger code (or application code enforced through
 some framework) is very tricky.  The change to SELECT FOR UPDATE
 that Florian was working on would make it pretty straightforward.


I'm not sure what Florian's patch does, but I've been trying to advocate
syntax like the following for this exact scenario:

foreign key (manager_id, true) references person (id, is_manager)

Basically, allow us to use constants instead of field names as part of
foreign keys.  I have no idea what the implementation aspect of this is,
but I need the user aspect of it and don't know the best way to get it.


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Simon Riggs
On Wed, Feb 22, 2012 at 5:00 PM, Noah Misch n...@leadboat.com wrote:

 All in all, I think this is in pretty much final shape.  Only pg_upgrade
 bits are still missing.  If sharp eyes could give this a critical look
 and knuckle-cracking testers could give it a spin, that would be
 helpful.

 Lack of pg_upgrade support leaves this version incomplete, because that
 omission would constitute a blocker for beta 2.  This version changes as much
 code compared to the version I reviewed at the beginning of the CommitFest as
 that version changed overall.  In that light, it's time to close the books on
 this patch for the purpose of this CommitFest; I'm marking it Returned with
 Feedback.  Thanks for your efforts thus far.

My view would be that with 90 files touched this is a very large
patch, so that alone makes me wonder whether we should commit this
patch, so I agree with Noah and compliment him on an excellent
detailed review.

However, review of such a large patch should not be simply pass or
fail. We should be looking back at the original problem and ask
ourselves whether some subset of the patch could solve a useful subset
of the problem. For me, that seems quite likely and this is very
definitely an important patch.

Even if we can't solve some part of the problem we can at least commit
some useful parts of infrastructure to allow later work to happen more
smoothly and quickly.

So please let's not focus on the 100%, lets focus on 80/20.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Jeroen Vermeulen

On 2012-02-23 10:18, Simon Riggs wrote:


However, review of such a large patch should not be simply pass or
fail. We should be looking back at the original problem and ask
ourselves whether some subset of the patch could solve a useful subset
of the problem. For me, that seems quite likely and this is very
definitely an important patch.

Even if we can't solve some part of the problem we can at least commit
some useful parts of infrastructure to allow later work to happen more
smoothly and quickly.

So please let's not focus on the 100%, lets focus on 80/20.


The suggested immutable-column constraint was meant as a potential 
80/20 workaround.  Definitely not a full solution, helpful to some, 
probably easier to do.  I don't know if an immutable key would actually 
be enough to elide foreign-key locks though.


Simon, I think you had a reason why it couldn't work, but I didn't quite 
get your meaning and didn't want to distract things further at that 
stage.  You wrote that it doesn't do what KEY LOCKS are designed to 
do...  any chance you might recall what the problem was?


I don't mean to be pushy about my pet idea, and heaven knows I don't 
have time to implement it, but it'd be good to know whether I should put 
the whole thought to rest.



Jeroen

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Simon Riggs
On Sun, Dec 4, 2011 at 12:20 PM, Noah Misch n...@leadboat.com wrote:

 Making pg_multixact persistent across clean shutdowns is no bridge to cross
 lightly, since it means committing to an on-disk format for an indefinite
 period.  We should do it; the benefits of this patch justify it, and I haven't
 identified a way to avoid it without incurring worse problems.

I can't actually see anything in the patch that explains why this is
required. (That is something we should reject more patches on, since
it creates a higher maintenance burden).

Can someone explain? We might think of a way to avoid that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Simon Riggs
On Thu, Feb 23, 2012 at 1:08 PM, Jeroen Vermeulen j...@xs4all.nl wrote:

 Simon, I think you had a reason why it couldn't work, but I didn't quite get
 your meaning and didn't want to distract things further at that stage.  You
 wrote that it doesn't do what KEY LOCKS are designed to do...  any chance
 you might recall what the problem was?

The IMMUTABLE idea would work, but it requires all users to recode
their apps. By the time they've done that we'll have probably fixed
the problem in full anyway, so then we have to ask them to stop again,
which is hard so we'll be stuck with a performance tweak that applies
to just one release. So its the fully automatic solution we're looking
for. I don't object to someone implementing IMMUTABLE, I'm just saying
its not a way to get this patch simpler and therefore acceptable.

If people are willing to recode apps to avoid this then hire me and
I'll tell you how ;-)

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of jue feb 23 11:15:45 -0300 2012:
 On Sun, Dec 4, 2011 at 12:20 PM, Noah Misch n...@leadboat.com wrote:
 
  Making pg_multixact persistent across clean shutdowns is no bridge to cross
  lightly, since it means committing to an on-disk format for an indefinite
  period.  We should do it; the benefits of this patch justify it, and I 
  haven't
  identified a way to avoid it without incurring worse problems.
 
 I can't actually see anything in the patch that explains why this is
 required. (That is something we should reject more patches on, since
 it creates a higher maintenance burden).
 
 Can someone explain? We might think of a way to avoid that.

Sure.  The problem is that we are allowing updated rows to be locked (and
locked rows to be updated).  This means that we need to store extended
Xmax information in tuples that goes beyond mere locks, which is what we
were doing previously -- they may now have locks and updates simultaneously.

(In the previous code, a multixact never meant an update, it always
signified only shared locks.  After a crash, all backends that could
have been holding locks must necessarily be gone, so the multixact info
is not interesting and can be treated like the tuple is simply live.)

This means that this extended Xmax info needs to be able to survive, so
that it's possible to retrieve it after a crash; because even if the
lockers are all gone, the updater might have committed and this means
the tuple is dead.  If we failed to keep this, the tuple would be
considered live which would be wrong because the other version of the
tuple, which was created by the update, is also live.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Simon Riggs
On Thu, Feb 23, 2012 at 3:04 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Simon Riggs's message of jue feb 23 11:15:45 -0300 2012:
 On Sun, Dec 4, 2011 at 12:20 PM, Noah Misch n...@leadboat.com wrote:

  Making pg_multixact persistent across clean shutdowns is no bridge to cross
  lightly, since it means committing to an on-disk format for an indefinite
  period.  We should do it; the benefits of this patch justify it, and I 
  haven't
  identified a way to avoid it without incurring worse problems.

 I can't actually see anything in the patch that explains why this is
 required. (That is something we should reject more patches on, since
 it creates a higher maintenance burden).

 Can someone explain? We might think of a way to avoid that.

 Sure.  The problem is that we are allowing updated rows to be locked (and
 locked rows to be updated).  This means that we need to store extended
 Xmax information in tuples that goes beyond mere locks, which is what we
 were doing previously -- they may now have locks and updates simultaneously.

 (In the previous code, a multixact never meant an update, it always
 signified only shared locks.  After a crash, all backends that could
 have been holding locks must necessarily be gone, so the multixact info
 is not interesting and can be treated like the tuple is simply live.)

 This means that this extended Xmax info needs to be able to survive, so
 that it's possible to retrieve it after a crash; because even if the
 lockers are all gone, the updater might have committed and this means
 the tuple is dead.  If we failed to keep this, the tuple would be
 considered live which would be wrong because the other version of the
 tuple, which was created by the update, is also live.

OK, thanks.

So why do we need pg_upgrade support?

If pg_multixact is not persistent now, surely there is no requirement
to have pg_upgrade do any form of upgrade? The only time we'll need to
do this is from 9.2 to 9.3, which can of course occur any time in next
year. That doesn't sound like a reason to block a patch now, because
of something that will be needed a year from now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Sure.  The problem is that we are allowing updated rows to be locked (and
 locked rows to be updated).  This means that we need to store extended
 Xmax information in tuples that goes beyond mere locks, which is what we
 were doing previously -- they may now have locks and updates simultaneously.

 (In the previous code, a multixact never meant an update, it always
 signified only shared locks.  After a crash, all backends that could
 have been holding locks must necessarily be gone, so the multixact info
 is not interesting and can be treated like the tuple is simply live.)

Ugh.  I had not been paying attention to what you were doing in this
patch, and now that I read this I wish I had objected earlier.  This
seems like a horrid mess that's going to be unsustainable both from a
complexity and a performance standpoint.  The only reason multixacts
were tolerable at all was that they had only one semantics.  Changing
it so that maybe a multixact represents an actual updater and maybe
it doesn't is not sane.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of jue feb 23 06:18:57 -0300 2012:
 
 On Wed, Feb 22, 2012 at 5:00 PM, Noah Misch n...@leadboat.com wrote:
 
  All in all, I think this is in pretty much final shape.  Only pg_upgrade
  bits are still missing.  If sharp eyes could give this a critical look
  and knuckle-cracking testers could give it a spin, that would be
  helpful.
 
  Lack of pg_upgrade support leaves this version incomplete, because that
  omission would constitute a blocker for beta 2.  This version changes as 
  much
  code compared to the version I reviewed at the beginning of the CommitFest 
  as
  that version changed overall.  In that light, it's time to close the books 
  on
  this patch for the purpose of this CommitFest; I'm marking it Returned with
  Feedback.  Thanks for your efforts thus far.

Now this is an interesting turn of events.  I must thank you for your
extensive review effort in the current version of the patch, and also
thank you and credit you for the idea that initially kicked this patch
from the older, smaller, simpler version I wrote during the 9.1 timeline
(which you also reviewed exhaustively).  Without your and Simon's
brilliant ideas, this patch wouldn't exist at all.

I completely understand that you don't want to review this latest
version of the patch; it's a lot of effort and I wouldn't inflict it on
anybody who hasn't not volunteered.  However, it doesn't seem to me that
this is reason to boot the patch from the commitfest.  I think the thing
to do would be to remove yourself from the reviewers column and set it
back to needs review, so that other reviewers can pick it up.

As for the late code churn, it mostly happened as a result of your
own feedback; I would have left most of it in the original state, but as
I went ahead it seemed much better to refactor things.  This is mostly
in heapam.c.  As for multixact.c, it also had a lot of churn, but that
was mostly to restore it to the state it has in the master branch,
dropping much of the code I had written to handle multixact truncation.
The new code there and in the vacuum code path (relminmxid and so on) is
a lot smaller than that other code was, and it's closely based on
relfrozenxid which is a known piece of technology.

 My view would be that with 90 files touched this is a very large
 patch, so that alone makes me wonder whether we should commit this
 patch, so I agree with Noah and compliment him on an excellent
 detailed review.

I note, however, that the bulk of the patch is in three files --
multixact.c, tqual.c, heapam.c, as is clearly illustrated in the diff
stats I posted.  The rest of them are touched mostly to follow their new
APIs (and of course to add tests and docs).

To summarize, of 94 files touched in total:
* 22 files are in src/test/isolation/
  (new and updated tests and expected files)
* 19 files are in src/include/
* 10 files are in contrib/
* 39 files are in src/backend;
 * in that subdir, there are 3097 insertions and 1006 deletions
 * 3047 (83%) of which are in heapam.c multixact.c tqual.c
 * one is a README

 However, review of such a large patch should not be simply pass or
 fail. We should be looking back at the original problem and ask
 ourselves whether some subset of the patch could solve a useful subset
 of the problem. For me, that seems quite likely and this is very
 definitely an important patch.
 
 Even if we can't solve some part of the problem we can at least commit
 some useful parts of infrastructure to allow later work to happen more
 smoothly and quickly.
 
 So please let's not focus on the 100%, lets focus on 80/20.

Well, we have the patch I originally posted in the 9.1 timeframe.
That's a lot smaller and simpler.  However, that solves only part of the
blocking problem, and in particular it doesn't fix the initial deadlock
reports from Joel Jacobson at Glue Finance (now renamed Trustly, in case
you wonder about his change of email address) that started this effort
in the first place.  I don't think we can cut down to that and still
satisfy the users that requested this; and Glue was just the first one,
because after I started blogging about this, some more people started
asking for it.

I don't think there's any useful middlepoint between that one and the
current one, but maybe I'm wrong.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of jue feb 23 12:12:13 -0300 2012:
 On Thu, Feb 23, 2012 at 3:04 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:

  Sure.  The problem is that we are allowing updated rows to be locked (and
  locked rows to be updated).  This means that we need to store extended
  Xmax information in tuples that goes beyond mere locks, which is what we
  were doing previously -- they may now have locks and updates simultaneously.

 OK, thanks.
 
 So why do we need pg_upgrade support?

Two reasons.  One is that in upgrades from a version that contains this
patch to another version that also contains this patch (i.e. future
upgrades), we need to copy the multixact files from the old cluster to
the new.

The other is that in upgrades from a version that doesn't contain this
patch to a version that does, we need to set the multixact limit values
so that values that were used in the old cluster are returned as empty
values (keeping the old semantics); otherwise they would cause errors
trying to read the member Xids from disk.

 If pg_multixact is not persistent now, surely there is no requirement
 to have pg_upgrade do any form of upgrade? The only time we'll need to
 do this is from 9.2 to 9.3, which can of course occur any time in next
 year. That doesn't sound like a reason to block a patch now, because
 of something that will be needed a year from now.

I think there's a policy that we must allow upgrades from one beta to
the next, which is why Noah says this is a blocker starting from beta2.

The pg_upgrade code for this is rather simple however.  There's no
rocket science there.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Alvaro Herrera

Excerpts from Tom Lane's message of jue feb 23 12:28:20 -0300 2012:
 
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Sure.  The problem is that we are allowing updated rows to be locked (and
  locked rows to be updated).  This means that we need to store extended
  Xmax information in tuples that goes beyond mere locks, which is what we
  were doing previously -- they may now have locks and updates simultaneously.
 
  (In the previous code, a multixact never meant an update, it always
  signified only shared locks.  After a crash, all backends that could
  have been holding locks must necessarily be gone, so the multixact info
  is not interesting and can be treated like the tuple is simply live.)
 
 Ugh.  I had not been paying attention to what you were doing in this
 patch, and now that I read this I wish I had objected earlier.

Uhm, yeah, a lot earlier -- I initially blogged about this in August
last year:
http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/

and in several posts in pgsql-hackers.

 This
 seems like a horrid mess that's going to be unsustainable both from a
 complexity and a performance standpoint.  The only reason multixacts
 were tolerable at all was that they had only one semantics.  Changing
 it so that maybe a multixact represents an actual updater and maybe
 it doesn't is not sane.

As far as complexity, yeah, it's a lot more complex now -- no question
about that.

Regarding performance, the good thing about this patch is that if you
have an operation that used to block, it might now not block.  So maybe
multixact-related operation is a bit slower than before, but if it
allows you to continue operating rather than sit waiting until some
other transaction releases you, it's much better.

As for sanity -- I regard multixacts as a way to store extended Xmax
information.  The original idea was obviously much more limited in that
the extended info was just locking info.  We've extended it but I don't
think it's such a stretch.

I have been posting about (most? all of?) the ideas that I've been
following to make this work at all, so that people had plenty of chances
to disagree with them -- and Noah and others did disagree with many of
them, so I changed the patch accordingly.  I'm not closed to further
rework, but I'm not going to entirely abandon the idea too lightly.

I'm sure there are bugs too, but hopefully there are as shallow as
interested reviewer eyeballs there are.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Kevin Grittner
Alvaro Herrera alvhe...@commandprompt.com wrote:
 
 As for sanity -- I regard multixacts as a way to store extended
 Xmax information.  The original idea was obviously much more
 limited in that the extended info was just locking info.  We've
 extended it but I don't think it's such a stretch.
 
Since the limitation on what can be stored in xmax was the killer
for Florian's attempt to support SELECT FOR UPDATE in a form which
was arguably more useful (and certainly more convenient for those
converting from other database products), I'm wondering whether
anyone has determined whether this new scheme would allow Florian's
work to be successfully completed.  The issues seem very similar. 
If this approach also provides a basis for the other work, I think
it helps bolster the argument that this is a good design; if not, I
think it suggests that maybe it should be made more general or
extensible in some way.  Once this has to be supported by pg_upgrade
it will be harder to change the format, if that is needed for some
other feature.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Alvaro Herrera

Excerpts from Kevin Grittner's message of jue feb 23 13:31:36 -0300 2012:
 
 Alvaro Herrera alvhe...@commandprompt.com wrote:
  
  As for sanity -- I regard multixacts as a way to store extended
  Xmax information.  The original idea was obviously much more
  limited in that the extended info was just locking info.  We've
  extended it but I don't think it's such a stretch.
  
 Since the limitation on what can be stored in xmax was the killer
 for Florian's attempt to support SELECT FOR UPDATE in a form which
 was arguably more useful (and certainly more convenient for those
 converting from other database products), I'm wondering whether
 anyone has determined whether this new scheme would allow Florian's
 work to be successfully completed.  The issues seem very similar. 
 If this approach also provides a basis for the other work, I think
 it helps bolster the argument that this is a good design; if not, I
 think it suggests that maybe it should be made more general or
 extensible in some way.  Once this has to be supported by pg_upgrade
 it will be harder to change the format, if that is needed for some
 other feature.

I have no idea what improvements Florian was seeking, but multixacts now
have plenty of bit flag space to indicate whatever we want for each
member transaction, so most likely the answer is yes.  However we need
to make clear that a single SELECT FOR UPDATE in a tuple does not
currently use a multixact; if we wish to always store flags then we are
forced to use one which incurs a performance hit.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Greg Smith

On 02/23/2012 10:43 AM, Alvaro Herrera wrote:

I completely understand that you don't want to review this latest
version of the patch; it's a lot of effort and I wouldn't inflict it on
anybody who hasn't not volunteered.  However, it doesn't seem to me that
this is reason to boot the patch from the commitfest.  I think the thing
to do would be to remove yourself from the reviewers column and set it
back to needs review, so that other reviewers can pick it up.


This feature made Robert's list of serious CF concerns, too, and the 
idea that majorly revised patches might be punted isn't a new one.  Noah 
is certainly justified in saying you're off his community support list, 
after all the review work he's been doing for this CF.


We here think it would be a shame for all of these other performance 
bits to be sorted but still have this one loose though, if it's possible 
to keep going on it.  It's well known as something on Simon's peeve list 
for some time now.  I was just reading someone else ranting about how 
this foreign key locking issue proves Postgres isn't enterprise scale 
yesterday, it was part of an article proving why DB2 is worth paying for 
I think.  This change crosses over into the advocacy area due to that, 
albeit only for the people who have been burned by this already.


If the main problem is pg_upgrade complexity, eventually progress on 
that front needs to be made.  I'm surprised the project has survived 
this long without needing anything beyond catalog conversion for 
in-place upgrade.  That luck won't hold forever.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Alvaro Herrera

Excerpts from Greg Smith's message of jue feb 23 14:48:13 -0300 2012:
 On 02/23/2012 10:43 AM, Alvaro Herrera wrote:
  I completely understand that you don't want to review this latest
  version of the patch; it's a lot of effort and I wouldn't inflict it on
  anybody who hasn't not volunteered.  However, it doesn't seem to me that
  this is reason to boot the patch from the commitfest.  I think the thing
  to do would be to remove yourself from the reviewers column and set it
  back to needs review, so that other reviewers can pick it up.
 
 This feature made Robert's list of serious CF concerns, too, and the 
 idea that majorly revised patches might be punted isn't a new one.

Well, this patch (or rather, a previous incarnation of it) got punted
from 9.1's fourth commitfest; I intended to have the new version in
9.2's first CF, but business reasons (which I will not discuss in
public) forced me otherwise.  So here we are again -- as I said to Tom,
I don't intend to let go of this one easily, though of course I will
concede to whatever the community decides.

 Noah 
 is certainly justified in saying you're off his community support list, 
 after all the review work he's been doing for this CF.

Yeah, I can't blame him.  I've been trying to focus most of my review
availability on his own patches precisely due to that, but it's very
clear to me that his effort is larger than mine.

 We here think it would be a shame for all of these other performance 
 bits to be sorted but still have this one loose though, if it's possible 
 to keep going on it.  It's well known as something on Simon's peeve list 
 for some time now.  I was just reading someone else ranting about how 
 this foreign key locking issue proves Postgres isn't enterprise scale 
 yesterday, it was part of an article proving why DB2 is worth paying for 
 I think.  This change crosses over into the advocacy area due to that, 
 albeit only for the people who have been burned by this already.

Yeah, Simon's been on this particular issue for quite some time -- which
is probably why the initial idea that kickstarted this patch was his.
Personally I've been in the not enterprise strength camp for a long
time, mostly unintentionally; you can see that by tracing how my major
patches close holes in that kind of area (cluster loses indexes, we
don't have subtransactions, foreign key concurrency sucks (-- SELECT
FOR SHARE), manual vacuum is teh sux0r, and now this one about FKs
again).

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Greg Smith

On 02/23/2012 01:04 PM, Alvaro Herrera wrote:

manual vacuum is teh sux0r


I think you've just named my next conference talk submission.

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Simon Riggs
On Thu, Feb 23, 2012 at 4:01 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 As far as complexity, yeah, it's a lot more complex now -- no question
 about that.

As far as complexity goes, would it be easier if we treated the UPDATE
of a primary key column as a DELETE plus an INSERT?

There's not really a logical reason why updating a primary key has
meaning, so allowing an ExecPlanQual to follow the chain across
primary key values doesn't seem valid to me.

That would make all primary keys IMMUTABLE to updates.

No primary key, no problem.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Kevin Grittner
Alvaro Herrera alvhe...@commandprompt.com wrote:
 Excerpts from Kevin Grittner's message:
 
 Since the limitation on what can be stored in xmax was the killer
 for Florian's attempt to support SELECT FOR UPDATE in a form
 which was arguably more useful (and certainly more convenient for
 those converting from other database products), I'm wondering
 whether anyone has determined whether this new scheme would allow
 Florian's work to be successfully completed.  The issues seem
 very similar. If this approach also provides a basis for the
 other work, I think it helps bolster the argument that this is a
 good design; if not, I think it suggests that maybe it should be
 made more general or extensible in some way.  Once this has to be
 supported by pg_upgrade it will be harder to change the format,
 if that is needed for some other feature.
 
 I have no idea what improvements Florian was seeking, but
 multixacts now have plenty of bit flag space to indicate whatever
 we want for each member transaction, so most likely the answer is
 yes.  However we need to make clear that a single SELECT FOR
 UPDATE in a tuple does not currently use a multixact; if we wish
 to always store flags then we are forced to use one which incurs a
 performance hit.
 
Well, his effort really started to go into a tailspin on the related
issues here:
 
http://archives.postgresql.org/pgsql-hackers/2010-12/msg01743.php
 
... with a summary of the problem and possible directions for a
solution here:
 
http://archives.postgresql.org/pgsql-hackers/2010-12/msg01833.php
 
One of the problems that Florian was trying to address is that
people often have a need to enforce something with a lot of
similarity to a foreign key, but with more subtle logic than
declarative foreign keys support.  One example would be the case
Robert has used in some presentations, where the manager column in
each row in a project table must contain the id of a row in a person
table *which has the project_manager boolean column set to TRUE*. 
Short of using the new serializable transaction isolation level in
all related transactions, hand-coding enforcement of this useful
invariant through trigger code (or application code enforced through
some framework) is very tricky.  The change to SELECT FOR UPDATE
that Florian was working on would make it pretty straightforward.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Noah Misch
On Thu, Feb 23, 2012 at 02:08:28PM +0100, Jeroen Vermeulen wrote:
 On 2012-02-23 10:18, Simon Riggs wrote:

 However, review of such a large patch should not be simply pass or
 fail. We should be looking back at the original problem and ask
 ourselves whether some subset of the patch could solve a useful subset
 of the problem. For me, that seems quite likely and this is very
 definitely an important patch.

 Even if we can't solve some part of the problem we can at least commit
 some useful parts of infrastructure to allow later work to happen more
 smoothly and quickly.

 So please let's not focus on the 100%, lets focus on 80/20.

 The suggested immutable-column constraint was meant as a potential  
 80/20 workaround.  Definitely not a full solution, helpful to some,  
 probably easier to do.  I don't know if an immutable key would actually  
 be enough to elide foreign-key locks though.

That alone would not simplify the patch much.  INSERT/UPDATE/DELETE on the
foreign side would still need to take some kind of tuple lock on the primary
side to prevent primary-side DELETE.  You then still face the complicated case
of a tuple that's both locked and updated (non-key/immutable columns only).
Updates that change keys are relatively straightforward, following what we
already do today.  It's the non-key updates that complicate things.

If you had both an immutable column constraint and a never-deleted table
constraint, that combination would be sufficient to simplify the picture.
(Directly or indirectly, it would not actually be a never-deleted constraint
so much as a you must take AccessExclusiveLock to DELETE constraint.)
Foreign-side DML would then take an AccessShareLock on the parent table with
no tuple lock at all.

By then, though, that change would share little or no code with the current
patch.  It may have its own value, but it's not a means for carving a subset
from the current patch.

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Alvaro Herrera

Excerpts from Noah Misch's message of mié feb 22 14:00:07 -0300 2012:
 
 On Mon, Feb 13, 2012 at 07:16:58PM -0300, Alvaro Herrera wrote:

 On Mon, Jan 30, 2012 at 06:48:47PM -0500, Noah Misch wrote:
  On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote:
   * Columns that are part of the key
 Noah thinks the set of columns should only consider those actually 
   referenced
 by keys, not those that *could* be referenced.
  
  Well, do you disagree?  To me it's low-hanging fruit, because it isolates 
  the
  UPDATE-time overhead of this patch to FK-referenced tables rather than all
  tables having a PK or PK-like index (often just all tables).
 
 You have not answered my question above.

Sorry.  The reason I didn't research this is that at the very start of
the discussion it was said that having heapam.c figure out whether
columns are being used as FK destinations or not would be more of a
modularity violation than indexed columns already are for HOT support
(this was a contentious issue for HOT, so I don't take it lightly.  I
don't think I need any more reasons for Tom to object to this patch, or
more bulk into it.  Both are already serious issues.)

In any case, with the way we've defined FOR KEY SHARE locks (in the docs
it's explicitely said that the set of columns considered could vary in
the future), it's a relatively easy patch to add on top of what I've
submitted.  Just as the ALTER TABLE bits to add columns to the set of
columns considered, it could be left for a second pass on the issue.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Noah Misch
On Thu, Feb 23, 2012 at 06:36:42PM -0300, Alvaro Herrera wrote:
 
 Excerpts from Noah Misch's message of mi?? feb 22 14:00:07 -0300 2012:
  
  On Mon, Feb 13, 2012 at 07:16:58PM -0300, Alvaro Herrera wrote:
 
  On Mon, Jan 30, 2012 at 06:48:47PM -0500, Noah Misch wrote:
   On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote:
* Columns that are part of the key
  Noah thinks the set of columns should only consider those actually 
referenced
  by keys, not those that *could* be referenced.
   
   Well, do you disagree?  To me it's low-hanging fruit, because it isolates 
   the
   UPDATE-time overhead of this patch to FK-referenced tables rather than all
   tables having a PK or PK-like index (often just all tables).
  
  You have not answered my question above.
 
 Sorry.  The reason I didn't research this is that at the very start of
 the discussion it was said that having heapam.c figure out whether
 columns are being used as FK destinations or not would be more of a
 modularity violation than indexed columns already are for HOT support
 (this was a contentious issue for HOT, so I don't take it lightly.  I
 don't think I need any more reasons for Tom to object to this patch, or
 more bulk into it.  Both are already serious issues.)

That's fair.

 In any case, with the way we've defined FOR KEY SHARE locks (in the docs
 it's explicitely said that the set of columns considered could vary in
 the future), it's a relatively easy patch to add on top of what I've
 submitted.  Just as the ALTER TABLE bits to add columns to the set of
 columns considered, it could be left for a second pass on the issue.

Agreed.  Let's have that debate another day, as a follow-on patch.

Thanks for shedding this light.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Noah Misch
On Thu, Feb 23, 2012 at 12:43:11PM -0300, Alvaro Herrera wrote:
 Excerpts from Simon Riggs's message of jue feb 23 06:18:57 -0300 2012:
  On Wed, Feb 22, 2012 at 5:00 PM, Noah Misch n...@leadboat.com wrote:
  
   All in all, I think this is in pretty much final shape. ??Only pg_upgrade
   bits are still missing. ??If sharp eyes could give this a critical look
   and knuckle-cracking testers could give it a spin, that would be
   helpful.
  
   Lack of pg_upgrade support leaves this version incomplete, because that
   omission would constitute a blocker for beta 2. ??This version changes as 
   much
   code compared to the version I reviewed at the beginning of the 
   CommitFest as
   that version changed overall. ??In that light, it's time to close the 
   books on
   this patch for the purpose of this CommitFest; I'm marking it Returned 
   with
   Feedback. ??Thanks for your efforts thus far.
 
 Now this is an interesting turn of events.  I must thank you for your
 extensive review effort in the current version of the patch, and also
 thank you and credit you for the idea that initially kicked this patch
 from the older, smaller, simpler version I wrote during the 9.1 timeline
 (which you also reviewed exhaustively).  Without your and Simon's
 brilliant ideas, this patch wouldn't exist at all.
 
 I completely understand that you don't want to review this latest
 version of the patch; it's a lot of effort and I wouldn't inflict it on
 anybody who hasn't not volunteered.  However, it doesn't seem to me that
 this is reason to boot the patch from the commitfest.  I think the thing
 to do would be to remove yourself from the reviewers column and set it
 back to needs review, so that other reviewers can pick it up.

It would indeed be wrong to change any patch from Needs Review to Returned
with Feedback on account of a personal distaste for reviewing the patch.  I
hope I did not harbor such a motive here.  Rather, this CommitFest has given
your patch its fair shake, and I and other reviewers would better serve the
CF's needs by reviewing, say, ECPG FETCH readahead instead of your latest
submission.  Likewise, you would better serve the CF by evaluating one of the
four non-committer patches that have been Ready for Committer since January.
That's not to imply that the goals of the CF align with my goals, your goals,
or broader PGDG goals.  The patch status on commitfest.postgresql.org does
exist solely for the advancement of the CF, and I have set it in accordingly.

 As for the late code churn, it mostly happened as a result of your
 own feedback; I would have left most of it in the original state, but as
 I went ahead it seemed much better to refactor things.  This is mostly
 in heapam.c.  As for multixact.c, it also had a lot of churn, but that
 was mostly to restore it to the state it has in the master branch,
 dropping much of the code I had written to handle multixact truncation.
 The new code there and in the vacuum code path (relminmxid and so on) is
 a lot smaller than that other code was, and it's closely based on
 relfrozenxid which is a known piece of technology.

I appreciate that.

  However, review of such a large patch should not be simply pass or
  fail. We should be looking back at the original problem and ask
  ourselves whether some subset of the patch could solve a useful subset
  of the problem. For me, that seems quite likely and this is very
  definitely an important patch.

Incidentally, I find it harmful to think of Returned with Feedback as
fail.  For large patches, it's healthier to think of a CF as a bimonthly
project status meeting with stakeholders.  When the project is done,
wonderful!  When there's work left, that's no great surprise.

  Even if we can't solve some part of the problem we can at least commit
  some useful parts of infrastructure to allow later work to happen more
  smoothly and quickly.
  
  So please let's not focus on the 100%, lets focus on 80/20.
 
 Well, we have the patch I originally posted in the 9.1 timeframe.
 That's a lot smaller and simpler.  However, that solves only part of the
 blocking problem, and in particular it doesn't fix the initial deadlock
 reports from Joel Jacobson at Glue Finance (now renamed Trustly, in case
 you wonder about his change of email address) that started this effort
 in the first place.  I don't think we can cut down to that and still
 satisfy the users that requested this; and Glue was just the first one,
 because after I started blogging about this, some more people started
 asking for it.
 
 I don't think there's any useful middlepoint between that one and the
 current one, but maybe I'm wrong.

Nothing additional comes to my mind, either.  This patch is monolithic.

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-22 Thread Noah Misch
On Mon, Feb 13, 2012 at 07:16:58PM -0300, Alvaro Herrera wrote:
 Okay, so this patch fixes the truncation and wraparound issues through a
 mechanism much like pg_clog's: it keeps track of the oldest possibly
 existing multis on each and every table, and then during tuple freezing
 those are removed.  I also took the liberty to make the code remove
 multis altogether (i.e. resetting the IS_MULTI hint bit) when only the
 update remains and lockers are all gone.
 
 I also cleaned up the code in heapam so that there's a couple of tables
 mapping MultiXactStatus to LockTupleMode and back, and to heavyweight
 lock modes (the older patches used functions to do this, which was
 pretty ugly).  I had to add a little helper function to lock.c to make
 this work.  I made a rather large bunch of other minor changes to close
 minor bugs here and there.
 
 Docs have been added, as have new tests for the isolation harness, which
 I've ensured pass in both read committed and serializable modes; WAL
 logging for locking updated versions of a tuple, when an old one is
 locked due to an old snapshot, was also added; there's plenty of room
 for growth in the MultiXact flag bits; the bit that made tables with no
 keys lock the entire row all the time was removed; multiple places in
 code comments were cleaned up that referred to this feature as FOR KEY
 LOCK and ensured that it also mentions FOR KEY UPDATE; the pg_rowlocks,
 pageinspect, pg_controldata, pg_resetxlog utilities have been updated.

All of the above sounds great.  I especially like the growing test coverage.

 All in all, I think this is in pretty much final shape.  Only pg_upgrade
 bits are still missing.  If sharp eyes could give this a critical look
 and knuckle-cracking testers could give it a spin, that would be
 helpful.

Lack of pg_upgrade support leaves this version incomplete, because that
omission would constitute a blocker for beta 2.  This version changes as much
code compared to the version I reviewed at the beginning of the CommitFest as
that version changed overall.  In that light, it's time to close the books on
this patch for the purpose of this CommitFest; I'm marking it Returned with
Feedback.  Thanks for your efforts thus far.

On Mon, Jan 30, 2012 at 06:48:47PM -0500, Noah Misch wrote:
 On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote:
  * Columns that are part of the key
Noah thinks the set of columns should only consider those actually 
  referenced
by keys, not those that *could* be referenced.
 
 Well, do you disagree?  To me it's low-hanging fruit, because it isolates the
 UPDATE-time overhead of this patch to FK-referenced tables rather than all
 tables having a PK or PK-like index (often just all tables).

You have not answered my question above.

nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-01 Thread Jim Nasby
On Jan 31, 2012, at 10:58 AM, Alvaro Herrera wrote:
 I think it's butt-ugly, but it's only slightly uglier than
 relfrozenxid which we're already stuck with.  The slight amount of
 additional ugliness is that you're going to use an XID column to store
 a uint4 that is not an XID - but I don't have a great idea how to fix
 that.  You could mislabel it as an OID or a (signed) int4, but I'm not
 sure that either of those is any better.  We could also create an mxid
 data type, but that seems like it might be overkill.
 
 Well, we're already storing a multixact in Xmax, so it's not like we
 don't assume that we can store multis in space normally reserved for
 Xids.  What I've been wondering is not how ugly it is, but rather of the
 fact that we're bloating pg_class some more.

FWIW, users have been known to request uint datatypes; so if this really is a 
uint perhaps we should just create a uint datatype...
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-01 Thread Alvaro Herrera

Excerpts from Jim Nasby's message of mié feb 01 21:33:47 -0300 2012:
 
 On Jan 31, 2012, at 10:58 AM, Alvaro Herrera wrote:
  I think it's butt-ugly, but it's only slightly uglier than
  relfrozenxid which we're already stuck with.  The slight amount of
  additional ugliness is that you're going to use an XID column to store
  a uint4 that is not an XID - but I don't have a great idea how to fix
  that.  You could mislabel it as an OID or a (signed) int4, but I'm not
  sure that either of those is any better.  We could also create an mxid
  data type, but that seems like it might be overkill.
  
  Well, we're already storing a multixact in Xmax, so it's not like we
  don't assume that we can store multis in space normally reserved for
  Xids.  What I've been wondering is not how ugly it is, but rather of the
  fact that we're bloating pg_class some more.
 
 FWIW, users have been known to request uint datatypes; so if this really is a 
 uint perhaps we should just create a uint datatype...

Yeah.  This is just for internal consumption, though, not a full-blown
datatype.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-01-31 Thread Robert Haas
On Mon, Jan 30, 2012 at 6:48 PM, Noah Misch n...@leadboat.com wrote:
 On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote:
 The biggest item remaining is the point you raised about multixactid
 wraparound.  This is closely related to multixact truncation and the way
 checkpoints are to be handled.  If we think that MultiXactId wraparound
 is possible, and we need to involve autovacuum to keep it at bay, then I

 To prove it possible, we need prove there exists some sequence of operations
 consuming N xids and M  N multixactids.  Have N transactions key-lock N-1
 rows apiece, then have each of them key-lock one of the rows locked by each
 other transaction.  This consumes N xids and N(N-1) multixactids.  I believe
 you could construct a workload with N! multixact usage, too.

 Existence proofs are one thing, real workloads another.  My unsubstantiated
 guess is that multixactid use will not overtake xid use in bulk on a workload
 not specifically tailored to do so.  So, I think it's enough to notice it,
 refuse to assign a new multixactid, and tell the user to clear the condition
 with a VACUUM FREEZE of all databases.  Other opinions would indeed be useful.

I suspect you are right that it is unlikely, but OTOH that sounds like
an extremely painful recovery procedure.  We probably don't need to
put a ton of thought into handling this case as efficiently as
possible, but I think we would do well to avoid situations that could
lead to, basically, a full-cluster shutdown.  If that happens to one
of my customers I expect to lose the customer.

I have a couple of other concerns about this patch:

1. I think it's probably fair to assume that this is going to be a
huge win in cases where it avoids deadlocks or lock waits.  But is
there a worst case where we don't avoid that but still add a lot of
extra multi-xact lookups?  What's the worst case we can imagine and
how pathological does the workload have to be to tickle that case?

2. What algorithm did we end up using do fix the set of key columns,
and is there any user configuration that can or needs to happen there?
 Do we handle cleanly the case where the set of key columns is changed
by DDL?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-01-31 Thread Alvaro Herrera

Excerpts from Robert Haas's message of mar ene 31 10:17:40 -0300 2012:

 I suspect you are right that it is unlikely, but OTOH that sounds like
 an extremely painful recovery procedure.  We probably don't need to
 put a ton of thought into handling this case as efficiently as
 possible, but I think we would do well to avoid situations that could
 lead to, basically, a full-cluster shutdown.  If that happens to one
 of my customers I expect to lose the customer.

Okay, so the worst case here is really bad and we should do something
about it.  Are you okay with a new pg_class column of type xid?  The
advantage is not only that we would be able to track it with high
precision; we would also get rid of a lot of code in which I have little
confidence.

 I have a couple of other concerns about this patch:
 
 1. I think it's probably fair to assume that this is going to be a
 huge win in cases where it avoids deadlocks or lock waits.  But is
 there a worst case where we don't avoid that but still add a lot of
 extra multi-xact lookups?  What's the worst case we can imagine and
 how pathological does the workload have to be to tickle that case?

Hm.  I haven't really thought about this.  There are some code paths
that now have to resolve Multixacts that previously did not; things like
vacuum.  I don't think there's any case in which we previously did not
block and now block, but there might be things that got slower without
blocking.  One thing that definitely got slower is use of SELECT FOR
SHARE.  (This command previously used hint bits to mark the row as
locked; now it is always going to create a multixact).  However, I
expect that with foreign keys switching to FOR KEY SHARE, the use of FOR
SHARE is going to decline, maybe disappear completely, so it shouldn't
be a problem.

 2. What algorithm did we end up using do fix the set of key columns,
 and is there any user configuration that can or needs to happen there?

Currently we just use all columns indexed by unique indexes (excluding
expressional and partial ones).  Furthermore we consider key column
all columns in a table without unique indexes.  Noah disagrees with this
choice; he says we should drop this last point, and that we should relax
the first to columns actually used by foreign key constraints.  I
expect that this is a rather simple change.

Currently there's nothing that the user can do to add more columns to
the set considered (other than creating more unique indexes, of course).
We discussed having an ALTER TABLE command to do it, but this isn't seen
as essential.

  Do we handle cleanly the case where the set of key columns is changed
 by DDL?

Hmm, I remember thinking about this at some point, but now I'm not 100%
sure.  I think it doesn't matter due to multis being so ephemeral.  Let
me try and figure it out.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-01-31 Thread Robert Haas
On Tue, Jan 31, 2012 at 9:19 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of mar ene 31 10:17:40 -0300 2012:
 I suspect you are right that it is unlikely, but OTOH that sounds like
 an extremely painful recovery procedure.  We probably don't need to
 put a ton of thought into handling this case as efficiently as
 possible, but I think we would do well to avoid situations that could
 lead to, basically, a full-cluster shutdown.  If that happens to one
 of my customers I expect to lose the customer.

 Okay, so the worst case here is really bad and we should do something
 about it.  Are you okay with a new pg_class column of type xid?  The
 advantage is not only that we would be able to track it with high
 precision; we would also get rid of a lot of code in which I have little
 confidence.

I think it's butt-ugly, but it's only slightly uglier than
relfrozenxid which we're already stuck with.  The slight amount of
additional ugliness is that you're going to use an XID column to store
a uint4 that is not an XID - but I don't have a great idea how to fix
that.  You could mislabel it as an OID or a (signed) int4, but I'm not
sure that either of those is any better.  We could also create an mxid
data type, but that seems like it might be overkill.

 I have a couple of other concerns about this patch:

 1. I think it's probably fair to assume that this is going to be a
 huge win in cases where it avoids deadlocks or lock waits.  But is
 there a worst case where we don't avoid that but still add a lot of
 extra multi-xact lookups?  What's the worst case we can imagine and
 how pathological does the workload have to be to tickle that case?

 Hm.  I haven't really thought about this.  There are some code paths
 that now have to resolve Multixacts that previously did not; things like
 vacuum.  I don't think there's any case in which we previously did not
 block and now block, but there might be things that got slower without
 blocking.  One thing that definitely got slower is use of SELECT FOR
 SHARE.  (This command previously used hint bits to mark the row as
 locked; now it is always going to create a multixact).  However, I
 expect that with foreign keys switching to FOR KEY SHARE, the use of FOR
 SHARE is going to decline, maybe disappear completely, so it shouldn't
 be a problem.

What about SELECT FOR UPDATE?  That's a pretty common case, I think.
If that's now going to force a multixact to get created and
additionally force multixact lookups when the row is subsequently
examined, that seems, well, actually pretty scary at first glance.
SELECT FOR UPDATE is fairly expensive as it stands, and is commonly
used.

 2. What algorithm did we end up using do fix the set of key columns,
 and is there any user configuration that can or needs to happen there?

 Currently we just use all columns indexed by unique indexes (excluding
 expressional and partial ones).  Furthermore we consider key column
 all columns in a table without unique indexes.  Noah disagrees with this
 choice; he says we should drop this last point, and that we should relax
 the first to columns actually used by foreign key constraints.  I
 expect that this is a rather simple change.

Why the special case for tables without unique indexes?  Like Noah, I
don't see the point.  Unless there's some trade-off I'm not seeing, we
should want the number of key columns to be as minimal as possible, so
that as many updates as possible can use the cheap path that doesn't
involve locking the whole tuple.

  Do we handle cleanly the case where the set of key columns is changed
 by DDL?

 Hmm, I remember thinking about this at some point, but now I'm not 100%
 sure.  I think it doesn't matter due to multis being so ephemeral.  Let
 me try and figure it out.

I thought part of the point here was that multixacts aren't so
ephemeral any more: they're going to stick around until the table gets
frozen.  I'm worried that's going to turn out to be a problem somehow.
 With respect to this particular issue, what I'm worried about is
something like this:

1. Transaction A begins.
2. Transaction B begins and does some updating or locking of table T,
and then commits.
3. Transaction C begins and does DDL on table T, acquiring
AccessExclusiveLock while it does so, and changes the set of key
columns.  It then commits.
4A.Transaction A now accesses table T
and/or
4B. Transaction D begins and accesses table T.

At step 4, A and/or D have an up-to-date relcache entries that
correctly describes the current set of key columns in T.  But the work
done by transaction B was done with a different set of key columns
(could be more or less), and A and/or D mustn't get confused on that
basis.  Also, in the case of A, there is the further possibility that
A's snapshot can't see B as committed yet (even though C subsequently
held an AccessExclusiveLock on the table).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise 

Re: [HACKERS] foreign key locks, 2nd attempt

2012-01-31 Thread Alvaro Herrera

Excerpts from Robert Haas's message of mar ene 31 13:18:30 -0300 2012:
 
 On Tue, Jan 31, 2012 at 9:19 AM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Excerpts from Robert Haas's message of mar ene 31 10:17:40 -0300 2012:
  I suspect you are right that it is unlikely, but OTOH that sounds like
  an extremely painful recovery procedure.  We probably don't need to
  put a ton of thought into handling this case as efficiently as
  possible, but I think we would do well to avoid situations that could
  lead to, basically, a full-cluster shutdown.  If that happens to one
  of my customers I expect to lose the customer.
 
  Okay, so the worst case here is really bad and we should do something
  about it.  Are you okay with a new pg_class column of type xid?  The
  advantage is not only that we would be able to track it with high
  precision; we would also get rid of a lot of code in which I have little
  confidence.
 
 I think it's butt-ugly, but it's only slightly uglier than
 relfrozenxid which we're already stuck with.  The slight amount of
 additional ugliness is that you're going to use an XID column to store
 a uint4 that is not an XID - but I don't have a great idea how to fix
 that.  You could mislabel it as an OID or a (signed) int4, but I'm not
 sure that either of those is any better.  We could also create an mxid
 data type, but that seems like it might be overkill.

Well, we're already storing a multixact in Xmax, so it's not like we
don't assume that we can store multis in space normally reserved for
Xids.  What I've been wondering is not how ugly it is, but rather of the
fact that we're bloating pg_class some more.

  1. I think it's probably fair to assume that this is going to be a
  huge win in cases where it avoids deadlocks or lock waits.  But is
  there a worst case where we don't avoid that but still add a lot of
  extra multi-xact lookups?  What's the worst case we can imagine and
  how pathological does the workload have to be to tickle that case?
 
  Hm.  I haven't really thought about this.  There are some code paths
  that now have to resolve Multixacts that previously did not; things like
  vacuum.  I don't think there's any case in which we previously did not
  block and now block, but there might be things that got slower without
  blocking.  One thing that definitely got slower is use of SELECT FOR
  SHARE.  (This command previously used hint bits to mark the row as
  locked; now it is always going to create a multixact).  However, I
  expect that with foreign keys switching to FOR KEY SHARE, the use of FOR
  SHARE is going to decline, maybe disappear completely, so it shouldn't
  be a problem.
 
 What about SELECT FOR UPDATE?  That's a pretty common case, I think.
 If that's now going to force a multixact to get created and
 additionally force multixact lookups when the row is subsequently
 examined, that seems, well, actually pretty scary at first glance.
 SELECT FOR UPDATE is fairly expensive as it stands, and is commonly
 used.

SELECT FOR UPDATE is still going to work without a multi in the simple
cases.  The case where it's different is when somebody else grabs a KEY
SHARE lock on the same tuple; it's now going to get a multi, where it
previously blocked.  So other transactions later checking the tuple will
have a bit of a larger cost.  That's okay considering that it meant
the other transaction did not have to wait anymore.

  2. What algorithm did we end up using do fix the set of key columns,
  and is there any user configuration that can or needs to happen there?
 
  Currently we just use all columns indexed by unique indexes (excluding
  expressional and partial ones).  Furthermore we consider key column
  all columns in a table without unique indexes.  Noah disagrees with this
  choice; he says we should drop this last point, and that we should relax
  the first to columns actually used by foreign key constraints.  I
  expect that this is a rather simple change.
 
 Why the special case for tables without unique indexes?  Like Noah, I
 don't see the point.  Unless there's some trade-off I'm not seeing, we
 should want the number of key columns to be as minimal as possible, so
 that as many updates as possible can use the cheap path that doesn't
 involve locking the whole tuple.

No trade-off.  I just thought it was safer: my thought was that if
there's no nominated key column, the safer bet was that any of them
could have been.  But then, in reality there cannot be any foreign key
here anyway.  I'll revert that bit.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-01-31 Thread Robert Haas
On Tue, Jan 31, 2012 at 11:58 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Well, we're already storing a multixact in Xmax, so it's not like we
 don't assume that we can store multis in space normally reserved for
 Xids.  What I've been wondering is not how ugly it is, but rather of the
 fact that we're bloating pg_class some more.

I don't think another 4 bytes in pg_class is that big a deal.  We
don't do relcache rebuilds frequently enough for that to really matter
much.  The bigger cost of this patch seems to me to be that we're
going to have to carry around multi-xact IDs for a long time, and
probably fsync and/or WAL-log them moreso than now.  I'm not sure how
much you've worried about that, but a new column in pg_class seems
like line noise by comparison.

 What about SELECT FOR UPDATE?  That's a pretty common case, I think.
 If that's now going to force a multixact to get created and
 additionally force multixact lookups when the row is subsequently
 examined, that seems, well, actually pretty scary at first glance.
 SELECT FOR UPDATE is fairly expensive as it stands, and is commonly
 used.

 SELECT FOR UPDATE is still going to work without a multi in the simple
 cases.  The case where it's different is when somebody else grabs a KEY
 SHARE lock on the same tuple; it's now going to get a multi, where it
 previously blocked.  So other transactions later checking the tuple will
 have a bit of a larger cost.  That's okay considering that it meant
 the other transaction did not have to wait anymore.

OK.  I assume that the different treatment of SELECT FOR SHARE is due
to lack of bit space?

  2. What algorithm did we end up using do fix the set of key columns,
  and is there any user configuration that can or needs to happen there?
 
  Currently we just use all columns indexed by unique indexes (excluding
  expressional and partial ones).  Furthermore we consider key column
  all columns in a table without unique indexes.  Noah disagrees with this
  choice; he says we should drop this last point, and that we should relax
  the first to columns actually used by foreign key constraints.  I
  expect that this is a rather simple change.

 Why the special case for tables without unique indexes?  Like Noah, I
 don't see the point.  Unless there's some trade-off I'm not seeing, we
 should want the number of key columns to be as minimal as possible, so
 that as many updates as possible can use the cheap path that doesn't
 involve locking the whole tuple.

 No trade-off.  I just thought it was safer: my thought was that if
 there's no nominated key column, the safer bet was that any of them
 could have been.  But then, in reality there cannot be any foreign key
 here anyway.  I'll revert that bit.

OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-01-30 Thread Noah Misch
On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote:
 The biggest item remaining is the point you raised about multixactid
 wraparound.  This is closely related to multixact truncation and the way
 checkpoints are to be handled.  If we think that MultiXactId wraparound
 is possible, and we need to involve autovacuum to keep it at bay, then I

To prove it possible, we need prove there exists some sequence of operations
consuming N xids and M  N multixactids.  Have N transactions key-lock N-1
rows apiece, then have each of them key-lock one of the rows locked by each
other transaction.  This consumes N xids and N(N-1) multixactids.  I believe
you could construct a workload with N! multixact usage, too.

Existence proofs are one thing, real workloads another.  My unsubstantiated
guess is that multixactid use will not overtake xid use in bulk on a workload
not specifically tailored to do so.  So, I think it's enough to notice it,
refuse to assign a new multixactid, and tell the user to clear the condition
with a VACUUM FREEZE of all databases.  Other opinions would indeed be useful.

 think the only way to make that work is to add another column to
 pg_class so that each table's oldest multixact is tracked, same as we do
 with relfrozenxid for Xids.  If we do that, I think we can do away with
 most of the MultiXactTruncate junk I added -- things would become a lot
 simpler.  The cost would be bloating pg_class a bit more.  Are we okay
 with paying that cost?  I asked this question some months ago and I
 decided that I would not add the column, but I am starting to lean the
 other way.  I would like some opinions on this.

That's not the only way; autovacuum could just accelerate normal freezing to
advance the multixactid horizon indirectly.  Your proposal could make it far
more efficient, though.

 You asked two questions about WAL-logging locks: one was about the level
 of detail we log for each lock we grab; the other was about
 heap_xlog_update logging the right info.  AFAICS, the main thing that
 makes detailed WAL logging necessary is hot standbys.  That is, the
 standby must have all the locking info so that concurrent transactions
 are similarly locked as in the master ... or am I wrong in that?  (ISTM
 this means I need to fix heap_xlog_update so that it faithfully
 registers the lock info we're storing, not just the current Xid).

Standby servers do not need accurate tuple locks, because it's not possible to
wait on a tuple lock during recovery.  (By the way, the question about log
detail was just from my own curiosity.  We don't especially need an answer to
move forward with this patch, unless you want one.)

 * Columns that are part of the key
   Noah thinks the set of columns should only consider those actually 
 referenced
   by keys, not those that *could* be referenced.

Well, do you disagree?  To me it's low-hanging fruit, because it isolates the
UPDATE-time overhead of this patch to FK-referenced tables rather than all
tables having a PK or PK-like index (often just all tables).

   Also, in a table without columns, are all columns part of the key, or is the
   key the empty set?  I changed HeapSatisfiesHOTUpdate but that seems 
 arbitrary.

It does seem arbitrary.  What led you to switch in a later version?

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-01-26 Thread Alvaro Herrera

Excerpts from Alvaro Herrera's message of mar ene 24 15:47:16 -0300 2012:

 Need more code changes for the following:

 * export FOR KEY UPDATE lock mode in SQL

While doing this, I realized that there's an open item here regarding a
transaction that locks a tuple, and then in an aborted savepoint deletes
it.  As things stand, what happens is that the original tuple lock is
forgotten entirely, which was one of the things I wanted to fix (and in
fact is fixed for all other cases AFAICS).  So what we need is to be
able to store a MultiXactId that includes a member for KeyUpdate locks,
which will represent an UPDATE that touches key columns as well as
DELETEs.  That closes the hole.  However, the problem with this is that
we have no more bits left in the flag bitmask, which is another concern
you had raised.  I chose the easy way out and added a full byte of flags
per transaction.

This means that we now have 1636 xacts per members page rather than
1900+, but I'm not too concerned about that.  (We could cut back to 4
flag bits per xact -- but then, having some room for future growth is
probably a good plan anyway).

So DELETEs can also create multis.  I'm going to submit an updated patch
shortly.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-01-18 Thread Alvaro Herrera

Excerpts from Heikki Linnakangas's message of mar ene 17 03:21:28 -0300 2012:
 
 On 16.01.2012 21:52, Alvaro Herrera wrote:
 
  Excerpts from Heikki Linnakangas's message of lun ene 16 16:17:42 -0300 
  2012:
 
  On 15.01.2012 06:49, Alvaro Herrera wrote:
  - pg_upgrade bits are missing.
 
  I guess we'll need to rewrite pg_multixact contents in pg_upgrade. Is
  the page format backwards-compatible?
 
  It's not.
 
  I haven't worked out what pg_upgrade needs to do, honestly.  I think we
  should just not copy old pg_multixact files when upgrading across this
  patch.
 
 Sorry, I meant whether the *data* page format is backwards-compatible? 
 the multixact page format clearly isn't.

It's supposed to be, yes.  The HEAP_XMAX_IS_NOT_UPDATE bit (now renamed)
was chosen so that it'd take the place of the old HEAP_XMAX_SHARE_LOCK
bit, so any pages with that bit set should continue to work similarly.
The other infomask bits I used were previously unused.

   I was initially thinking that pg_multixact should return the
  empty set if requested members of a multi that preceded the freeze
  point.  That way, I thought, we would just never try to access a page
  originated in the older version (assuming the freeze point is set to
  current whenever pg_upgrade runs).  However, as things currently
  stand, accessing an old multi raises an error.  So maybe we need a
  scheme a bit more complex to handle this.
 
 Hmm, could we create new multixact files filled with zeros, covering the 
 range that was valid in the old cluster?

Hm, we could do something like that I guess.  I'm not sure that just
zeroes is the right pattern, but it should be something simple.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-01-18 Thread Noah Misch
On Wed, Jan 18, 2012 at 05:18:31PM -0300, Alvaro Herrera wrote:
 Excerpts from Heikki Linnakangas's message of mar ene 17 03:21:28 -0300 2012:
  On 16.01.2012 21:52, Alvaro Herrera wrote:
I was initially thinking that pg_multixact should return the
   empty set if requested members of a multi that preceded the freeze
   point.  That way, I thought, we would just never try to access a page
   originated in the older version (assuming the freeze point is set to
   current whenever pg_upgrade runs).  However, as things currently
   stand, accessing an old multi raises an error.  So maybe we need a
   scheme a bit more complex to handle this.
  
  Hmm, could we create new multixact files filled with zeros, covering the 
  range that was valid in the old cluster?
 
 Hm, we could do something like that I guess.  I'm not sure that just
 zeroes is the right pattern, but it should be something simple.

PostgreSQL 9.1 can have all ~4B MultiXactId on disk at any given time.

We could silently ignore the lookup miss when HEAP_XMAX_LOCK_ONLY is also set.
That makes existing data files acceptable while still catching data loss
scenarios going forward.  (It's tempting to be stricter when we know the
cluster data files originated in PostgreSQL 9.2+, but I'm not sure whether
that's worth its weight.)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-01-17 Thread Noah Misch
On Sun, Jan 15, 2012 at 01:49:54AM -0300, Alvaro Herrera wrote:
 - I'm not sure that the multixact truncation code is sane on
 checkpoints.  It might be that I need to tweak more the pg_control info
 we keep about truncation.  The whole truncation thing needs more
 testing, too.

My largest outstanding concern involves the possibility of MultiXactId
wraparound.  From my last review:

This raises a notable formal hazard: it's possible to burn through the
MultiXactId space faster than the regular TransactionId space.  We 
could get
into a situation where pg_clog is covering 2B xids, and yet we need 4B
MultiXactId to cover that period.  We had better at least notice this 
and
halt, if not have autovacuum actively prevent it.

(That should have been 2B rather than 4B, since MultiXactId uses the same
2B-in-past, 2B-in-future behavior as regular xids.)

Are we willing to guess that this will never happen and make recovery
minimally possible?  If so, we could have GetNewMultiXactId() grow defenses
similar to GetNewTransactionId() and leave it at that.  If not, we need to
involve autovacuum.


The other remaining high-level thing is to have key_attrs contain only columns
actually referenced by FKs.

 - Go over Noah's two reviews again and see if I missed anything; also
 make sure I haven't missed anything from other reviewers.

There are some, yes.

 *** a/src/backend/access/heap/heapam.c
 --- b/src/backend/access/heap/heapam.c

 ***
 *** 2773,2783  l2:
   }
   else if (result == HeapTupleBeingUpdated  wait)
   {
 ! TransactionId xwait;
   uint16  infomask;
   
   /* must copy state data before unlocking buffer */
 ! xwait = HeapTupleHeaderGetXmax(oldtup.t_data);
   infomask = oldtup.t_data-t_infomask;
   
   LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 --- 2848,2871 
   }
   else if (result == HeapTupleBeingUpdated  wait)
   {
 ! TransactionId   xwait;
   uint16  infomask;
 + boolnone_remain = false;

Nothing can ever set this variable to anything different.  It seems that
keep_xact == InvalidTransactionId substitutes well enough, though.


   /*
* We may overwrite if previous xmax aborted, or if it 
 committed but
 !  * only locked the tuple without updating it, or if we are 
 going to
 !  * keep it around in Xmax.
*/

The second possibility is just a subset of the third.

 ! if (TransactionIdIsValid(keep_xmax) ||
 ! none_remain ||
 ! (oldtup.t_data-t_infomask  HEAP_XMAX_INVALID))
   result = HeapTupleMayBeUpdated;
   else
   result = HeapTupleUpdated;

 ***
 *** 3314,3323  heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
*/
   static bool
   HeapSatisfiesHOTUpdate(Relation relation, Bitmapset *hot_attrs,
 !HeapTuple oldtup, HeapTuple newtup)
   {
   int attrnum;
   
   while ((attrnum = bms_first_member(hot_attrs)) = 0)
   {
   /* Adjust for system attributes */
 --- 3537,3549 
*/
   static bool
   HeapSatisfiesHOTUpdate(Relation relation, Bitmapset *hot_attrs,
 !HeapTuple oldtup, HeapTuple newtup, 
 bool empty_okay)
   {
   int attrnum;
   
 + if (!empty_okay  bms_is_empty(hot_attrs))
 + return false;

When a table contains no key attributes, it seems arbitrary whether we call
the key revoked or not.  What is the motivation for this change?

 ! /*
 !  * If we're requesting KeyShare, and there's no update present, 
 we
 !  * don't need to wait.  Even if there is an update, we can still
 !  * continue if the key hasn't been modified.
 !  *
 !  * However, if there are updates, we need to walk the update 
 chain
 !  * to mark future versions of the row as locked, too.  That 
 way, if
 !  * somebody deletes that future version, we're protected 
 against the
 !  * key going away.  This locking of future versions could block
 !  * momentarily, if a concurrent transaction is deleting a key; 
 or it
 !  * could return a value to the effect that the transaction 
 deleting the
 !  * key has already committed.  So we do this before re-locking 
 the
 !  * buffer; otherwise this would be prone to deadlocks.  Note 
 that the TID
 !  * we're locking was grabbed before we unlocked the buffer.  
 For it to
 !  * change while we're not looking, the other properties we're 
 testing
 !  * for below after re-locking the buffer would also change, in 
 which
 !  

  1   2   >