Re: [HACKERS] foreign key locks, 2nd attempt

On Wed, Mar 14, 2012 at 5:23 PM, Robert Haas robertmh...@gmail.com wrote:

 You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
 lookups, so I think something more sophisticated is needed to exercise that
 cost.  Not sure what.

 I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
 all-visible.

So because committed does not equal all visible there will be
additional lookups on mxids? That's complete rubbish.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] foreign key locks, 2nd attempt

On Thu, Mar 15, 2012 at 2:15 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Mar 14, 2012 at 6:10 PM, Noah Misch n...@leadboat.com wrote:
 Well, post-release, the cat is out of the bag: we'll be stuck with
 this whether the performance characteristics are acceptable or not.
 That's why we'd better be as sure as possible before committing to
 this implementation that there's nothing we can't live with.  It's not
 like there's any reasonable way to turn this off if you don't like it.

 I disagree; we're only carving in stone the FOR KEY SHARE and FOR KEY UPDATE
 syntax additions.  We could even avoid doing that by not documenting them.  A
 later major release could implement them using a completely different
 mechanism or even reduce them to aliases, KEY SHARE = SHARE and KEY UPDATE =
 UPDATE.  To be sure, let's still do a good job the first time.

 What I mean is really that, once the release is out, we don't get to
 take it back.  Sure, the next release can fix things, but any
 regressions will become obstacles to upgrading and pain points for new
 users.

This comment is completely superfluous. It's a complete waste of time
to turn up on a thread and remind people that if they commit something
and it doesn't actually work, that would be a bad thing. Why, we
might ask, do you think that thought needs to be expressed here?
Please don't answer; let's spend the time on actually reviewing the
patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] foreign key locks, 2nd attempt

On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Mar 14, 2012 at 9:17 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Agreed.  But speaking of that, why exactly do we fsync the multixact SLRU 
  today?

 Good question.  So far, I can't think of a reason.  nextMulti is critical,
 but we already fsync it with pg_control.  We could delete the other multixact
 state data at every startup and set OldestVisibleMXactId accordingly.

 Hmm, yeah.

 In a way, the fact that we don't do that is kind of fortuitous in this
 situation.  I had just assumed that we were not fsyncing it because
 there seems to be no reason to do so.  But since we are, we already
 know that the fsyncs resulting from frequent mxid allocation aren't a
 huge pain point.  If they were, somebody would have presumably
 complained about it and fixed it before now.  So that means that what
 we're really worrying about here is the overhead of fsyncing a little
 more often, which is a lot less scary than starting to do it when we
 weren't previously.

Good

 Now, we could look at this as an opportunity to optimize the existing
 implementation by removing the fsyncs, rather than adding the new
 infrastructure Alvaro is proposing.

This is not an exercise in tuning mxact code. There is a serious
algorithmic problem that is causing real world problems.

Removing the fsync will *not* provide a solution to the problem, so
there is no opportunity here.

 But that would only make sense if
 we thought that getting rid of the fsyncs would be more valuable than
 avoiding the blocking here, and I don't.

You're right that the existing code could use some optimisation.

I'm a little tired, but I can't see a reason to fsync this except at checkpoint.

I also see that we issue 2 WAL records for each RI check. We issue
one during MultiXactIdCreate/MultiXactIdExpand and then immediately
afterwards issue an XLOG_HEAP_LOCK record. The comments on both show
that each thinks it is doing it for the same reason and is the only
place it's being done. Alvaro, any ideas why that is?


 I still think that someone needs to do some benchmarking here, because
 this is a big complicated performance patch, and we can't predict the
 impact of it on real-world scenarios without testing.  There is
 clearly some additional overhead, and it makes sense to measure it and
 hopefully discover that it isn't excessive.  Still, I'm a bit
 relieved.

Very much agreed.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] foreign key locks, 2nd attempt

On Thu, Mar 15, 2012 at 1:17 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 As things stand today

Can I confirm where we are now? Is there another version of the patch
coming out soon?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] foreign key locks, 2nd attempt


Excerpts from Simon Riggs's message of Thu Mar 15 18:38:53 -0300 2012:
 On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas robertmh...@gmail.com wrote:

  But that would only make sense if
  we thought that getting rid of the fsyncs would be more valuable than
  avoiding the blocking here, and I don't.
 
 You're right that the existing code could use some optimisation.
 
 I'm a little tired, but I can't see a reason to fsync this except at 
 checkpoint.

Hang on.  What fsyncs are we talking about?  I don't see that the
multixact code calls any fsync except at checkpoint and shutdown.

 Also seeing that we issue 2 WAL records for each RI check. We issue
 one during MultiXactIdCreate/MultiXactIdExpand and then immediately
 afterwards issue a XLOG_HEAP_LOCK record. The comments on both show
 that each thinks it is doing it for the same reason and is the only
 place its being done. Alvaro, any ideas why that is.

AFAIR the XLOG_HEAP_LOCK log entry only records the fact that the row is
being locked by a multixact -- it doesn't record the contents (member
xids) of said multixact, which is what the other log entry records.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] foreign key locks, 2nd attempt


Excerpts from Simon Riggs's message of Thu Mar 15 18:46:44 -0300 2012:
 
 On Thu, Mar 15, 2012 at 1:17 AM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 
  As things stand today
 
 Can I confirm where we are now? Is there another version of the patch
 coming out soon?

Yes, another version is coming soon.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] foreign key locks, 2nd attempt

On Thu, Mar 15, 2012 at 9:54 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Simon Riggs's message of Thu Mar 15 18:38:53 -0300 2012:
 On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas robertmh...@gmail.com wrote:

  But that would only make sense if
  we thought that getting rid of the fsyncs would be more valuable than
  avoiding the blocking here, and I don't.

 You're right that the existing code could use some optimisation.

 I'm a little tired, but I can't see a reason to fsync this except at 
 checkpoint.

 Hang on.  What fsyncs are we talking about?  I don't see that the
 multixact code calls any fsync except that checkpoint and shutdown.

If a dirty page is evicted it will fsync.

 Also seeing that we issue 2 WAL records for each RI check. We issue
 one during MultiXactIdCreate/MultiXactIdExpand and then immediately
 afterwards issue a XLOG_HEAP_LOCK record. The comments on both show
 that each thinks it is doing it for the same reason and is the only
 place its being done. Alvaro, any ideas why that is.

 AFAIR the XLOG_HEAP_LOCK log entry only records the fact that the row is
 being locked by a multixact -- it doesn't record the contents (member
 xids) of said multixact, which is what the other log entry records.

Agreed. But issuing two records when we could issue just one seems a
little strange, especially when the two record types follow one
another so closely - so we end up queuing for the lock twice while
holding the lock on the data block.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] foreign key locks, 2nd attempt


Excerpts from Simon Riggs's message of Thu Mar 15 19:04:41 -0300 2012:
 
 On Thu, Mar 15, 2012 at 9:54 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 
  Excerpts from Simon Riggs's message of Thu Mar 15 18:38:53 -0300 2012:
  On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas robertmh...@gmail.com wrote:
 
   But that would only make sense if
   we thought that getting rid of the fsyncs would be more valuable than
   avoiding the blocking here, and I don't.
 
  You're right that the existing code could use some optimisation.
 
  I'm a little tired, but I can't see a reason to fsync this except at 
  checkpoint.
 
  Hang on.  What fsyncs are we talking about?  I don't see that the
  multixact code calls any fsync except that checkpoint and shutdown.
 
 If a dirty page is evicted it will fsync.

Ah, right.

  Also seeing that we issue 2 WAL records for each RI check. We issue
  one during MultiXactIdCreate/MultiXactIdExpand and then immediately
  afterwards issue a XLOG_HEAP_LOCK record. The comments on both show
  that each thinks it is doing it for the same reason and is the only
  place its being done. Alvaro, any ideas why that is.
 
  AFAIR the XLOG_HEAP_LOCK log entry only records the fact that the row is
  being locked by a multixact -- it doesn't record the contents (member
  xids) of said multixact, which is what the other log entry records.
 
 Agreed. But issuing two records when we could issue just one seems a
 little strange, especially when the two record types follow one
 another so closely - so we end up queuing for the lock twice while
 holding the lock on the data block.

Hmm, that seems like an optimization that could be done separately.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] foreign key locks, 2nd attempt

On Thu, Mar 15, 2012 at 10:13 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Simon Riggs's message of Thu Mar 15 19:04:41 -0300 2012:

 On Thu, Mar 15, 2012 at 9:54 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 
  Excerpts from Simon Riggs's message of Thu Mar 15 18:38:53 -0300 2012:
  On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas robertmh...@gmail.com 
  wrote:
 
   But that would only make sense if
   we thought that getting rid of the fsyncs would be more valuable than
   avoiding the blocking here, and I don't.
 
  You're right that the existing code could use some optimisation.
 
  I'm a little tired, but I can't see a reason to fsync this except at 
  checkpoint.
 
  Hang on.  What fsyncs are we talking about?  I don't see that the
  multixact code calls any fsync except that checkpoint and shutdown.

 If a dirty page is evicted it will fsync.

 Ah, right.

  Also seeing that we issue 2 WAL records for each RI check. We issue
  one during MultiXactIdCreate/MultiXactIdExpand and then immediately
  afterwards issue a XLOG_HEAP_LOCK record. The comments on both show
  that each thinks it is doing it for the same reason and is the only
  place its being done. Alvaro, any ideas why that is.
 
  AFAIR the XLOG_HEAP_LOCK log entry only records the fact that the row is
  being locked by a multixact -- it doesn't record the contents (member
  xids) of said multixact, which is what the other log entry records.

 Agreed. But issuing two records when we could issue just one seems a
 little strange, especially when the two record types follow one
 another so closely - so we end up queuing for the lock twice while
 holding the lock on the data block.

 Hmm, that seems optimization that could be done separately.

Oh yes, definitely not something for you to add to the main patch.

Just some additional tuning to alleviate Robert's concerns.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Robert Haas

On Thu, Mar 15, 2012 at 5:07 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Mar 14, 2012 at 5:23 PM, Robert Haas robertmh...@gmail.com wrote:
 You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
 lookups, so I think something more sophisticated is needed to exercise that
 cost.  Not sure what.

 I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
 all-visible.

 So because committed does not equal all visible there will be
 additional lookups on mxids? That's complete rubbish.

Noah seemed to be implying that once the updating transaction
committed, HEAP_XMAX_COMMITTED would get set and save the mxid lookup.
 But I think that's not true, because anyone who looks at the tuple
afterward will still need to know the exact xmax, to test it against
their snapshot.
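
A minimal sketch of that point, written in Python rather than PostgreSQL's C and under simplifying assumptions: the Snapshot class, xid_in_snapshot(), updater_of() and the multixacts table are hypothetical stand-ins, and xid wraparound and subtransactions are ignored. It shows that knowing the updater committed is not enough; each reader must compare the updater's xid against its own snapshot, and when xmax is a multixact that xid has to be looked up first.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    xmin: int   # every xid below this had finished when the snapshot was taken
    xmax: int   # every xid at or above this had not started yet
    running: set = field(default_factory=set)  # xids in progress in between


def xid_in_snapshot(xid, snap):
    """True if xid was still running (or not yet started) for this snapshot."""
    if xid >= snap.xmax:
        return True
    if xid < snap.xmin:
        return False
    return xid in snap.running


# Hypothetical multixact table: mxid -> xid of the member that updated the tuple.
multixacts = {1000: 42}
stats = {"lookups": 0}


def updater_of(xmax, is_multi):
    """Resolve the updating xid; a multixact xmax costs a member lookup."""
    if not is_multi:
        return xmax
    stats["lookups"] += 1
    return multixacts[xmax]


def deleted_for(snap, xmax, is_multi):
    """Is the tuple gone for this snapshot? Assumes the updater committed."""
    return not xid_in_snapshot(updater_of(xmax, is_multi), snap)


old_reader = Snapshot(xmin=40, xmax=45, running={42})  # taken while xid 42 ran
new_reader = Snapshot(xmin=50, xmax=55)                # taken after 42 committed

print(deleted_for(old_reader, 1000, True))  # old snapshot still sees the tuple
print(deleted_for(new_reader, 1000, True))  # new snapshot sees it as updated
print(stats["lookups"])                     # one member lookup per check
```

Both readers look at the same committed updater yet reach different answers, which is why a "committed" bit alone cannot replace the lookup.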

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] foreign key locks, 2nd attempt


Excerpts from Robert Haas's message of Thu Mar 15 21:37:36 -0300 2012:
 
 On Thu, Mar 15, 2012 at 5:07 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Wed, Mar 14, 2012 at 5:23 PM, Robert Haas robertmh...@gmail.com wrote:
  You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
  lookups, so I think something more sophisticated is needed to exercise that
  cost.  Not sure what.
 
  I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
  all-visible.
 
  So because committed does not equal all visible there will be
  additional lookups on mxids? That's complete rubbish.
 
 Noah seemed to be implying that once the updating transaction
 committed, HEAP_XMAX_COMMITTED would get set and save the mxid lookup.
  But I think that's not true, because anyone who looks at the tuple
 afterward will still need to know the exact xmax, to test it against
 their snapshot.

Yeah, we don't set HEAP_XMAX_COMMITTED on multis, even when there's an
update in it and it committed.  I think we could handle it, at least
some of the cases, but that'd require careful re-examination of all the
tqual.c code, which is not something I want to do right now.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Noah Misch

On Thu, Mar 15, 2012 at 08:37:36PM -0400, Robert Haas wrote:
 On Thu, Mar 15, 2012 at 5:07 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Wed, Mar 14, 2012 at 5:23 PM, Robert Haas robertmh...@gmail.com wrote:
  You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
  lookups, so I think something more sophisticated is needed to exercise that
  cost.  Not sure what.
 
  I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
  all-visible.
 
  So because committed does not equal all visible there will be
  additional lookups on mxids? That's complete rubbish.
 
 Noah seemed to be implying that once the updating transaction
 committed, HEAP_XMAX_COMMITTED would get set and save the mxid lookup.
  But I think that's not true, because anyone who looks at the tuple
 afterward will still need to know the exact xmax, to test it against
 their snapshot.

Yeah, my comment above was wrong.  I agree that we'll need to retrieve the
mxid members during every MVCC scan until we either mark the page all-visible
or have occasion to simplify the mxid xmax to the updater xid.


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Bruce Momjian

On Tue, Mar 13, 2012 at 02:35:02PM -0300, Alvaro Herrera wrote:
 
 Excerpts from Bruce Momjian's message of Tue Mar 13 14:00:52 -0300 2012:
  
  On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
 
   When there is a single locker in a tuple, we can just store the locking info
   in the tuple itself.  We do this by storing the locker's Xid in XMAX, and
   setting hint bits specifying the locking strength.  There is one exception
   here: since hint bit space is limited, we do not provide a separate hint bit
   for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in
   that case.  (The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE, are
   presumably more commonly used due to being the standards-mandated locking
   mechanism, or heavily used by the RI code, so we want to provide fast paths
   for those.)
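
The single-locker rule quoted above can be sketched as follows. This is an illustration in Python, not the patch's C code: the bit names and values, the multixact_members table and the lock_tuple_single() helper are all hypothetical, and only the three lock modes named in the paragraph are modelled.

```python
import itertools

# Illustrative bit values only; the patch's real infomask layout may differ.
XMAX_KEY_SHARE_LOCK = 0x01  # hypothetical name: SELECT FOR KEY SHARE
XMAX_EXCL_LOCK = 0x02       # hypothetical name: SELECT FOR UPDATE
XMAX_IS_MULTI = 0x04        # hypothetical name: xmax holds a MultiXact id

# Lock modes that get a dedicated bit, and so a fast path.
SINGLE_LOCKER_BITS = {
    "FOR KEY SHARE": XMAX_KEY_SHARE_LOCK,
    "FOR UPDATE": XMAX_EXCL_LOCK,
}

_mxid_counter = itertools.count(1)
multixact_members = {}  # mxid -> list of (xid, lock mode)


def create_multixact(members):
    """Allocate a new MultiXact id and remember its members."""
    mxid = next(_mxid_counter)
    multixact_members[mxid] = list(members)
    return mxid


def lock_tuple_single(xid, mode):
    """Return (xmax, infomask bits) for a tuple with exactly one locker."""
    bit = SINGLE_LOCKER_BITS.get(mode)
    if bit is not None:
        # Fast path: the locker's xid goes straight into xmax.
        return xid, bit
    # No spare bit for this mode (FOR SHARE here), so even a single
    # locker has to be recorded through a MultiXact.
    return create_multixact([(xid, mode)]), XMAX_IS_MULTI


print(lock_tuple_single(7, "FOR UPDATE"))
print(lock_tuple_single(8, "FOR SHARE"))
```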
  
  Are those tuple bits actually hint bits?  They seem quite a bit more
  powerful than a hint.
 
 I'm not sure what's your point.  We've had a hint bit for SELECT FOR
 UPDATE for ages.  Even 8.2 had HEAP_XMAX_EXCL_LOCK and
 HEAP_XMAX_SHARED_LOCK.  Maybe they are misnamed and aren't really
 hints, but it's not the job of this patch to fix that problem.

Now I am confused.  Where do you see the word hint used for
HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK?  These are tuple infomask
bits, not hints, meaning they are not optional and are not there just for
performance.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Bruce Momjian

On Thu, Mar 15, 2012 at 11:04:06PM -0400, Bruce Momjian wrote:
 On Tue, Mar 13, 2012 at 02:35:02PM -0300, Alvaro Herrera wrote:
  
  Excerpts from Bruce Momjian's message of Tue Mar 13 14:00:52 -0300 2012:
   
   On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
  
    When there is a single locker in a tuple, we can just store the locking info
    in the tuple itself.  We do this by storing the locker's Xid in XMAX, and
    setting hint bits specifying the locking strength.  There is one exception
    here: since hint bit space is limited, we do not provide a separate hint bit
    for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in
    that case.  (The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE, are
    presumably more commonly used due to being the standards-mandated locking
    mechanism, or heavily used by the RI code, so we want to provide fast paths
    for those.)
   
   Are those tuple bits actually hint bits?  They seem quite a bit more
   powerful than a hint.
  
  I'm not sure what's your point.  We've had a hint bit for SELECT FOR
  UPDATE for ages.  Even 8.2 had HEAP_XMAX_EXCL_LOCK and
  HEAP_XMAX_SHARED_LOCK.  Maybe they are misnamed and aren't really
  hints, but it's not the job of this patch to fix that problem.
 
 Now I am confused.  Where do you see the word hint used by
 HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK.  These are tuple infomask
 bits, not hints, meaning they are not optional or there just for
 performance.

Are you saying that the bit is only a guide and is there only for
performance?  If so, I understand why it is called hint.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-15 Thread Bruce Momjian

On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
 On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
  I agree with you that some worst case performance tests should be
  done. Could you please say what you think the worst cases would be, so
  those can be tested? That would avoid wasting time or getting anything
  backwards.
 
 I've thought about this some and here's what I've come up with so far:

I question whether we are in a position to do the testing necessary to
commit this for 9.2.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-14 Thread Robert Haas

On Tue, Mar 13, 2012 at 11:42 PM, Noah Misch n...@leadboat.com wrote:
 More often than that; each 2-member mxid takes 4 bytes in an offsets file and
 10 bytes in a members file.  So, more like one fsync per ~580 mxids.  Note
 that we already fsync the multixact SLRUs today, so any increase will arise
 from the widening of member entries from 4 bytes to 5.  The realism of this
 test is attractive.  Nearly-static parent tables are plenty common, and this
 test will illustrate the impact on those designs.
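
The ~580 figure can be reproduced with a little arithmetic, sketched here in Python. The 8192-byte page size (PostgreSQL's default block size) and the model of one fsync per filled SLRU page are assumptions, not something stated in the message; the helper name is hypothetical.

```python
PAGE_BYTES = 8192   # assumption: default block size, also used for SLRU pages
OFFSET_BYTES = 4    # one offsets-file entry per mxid, as stated in the message


def mxids_per_filled_page(members, member_bytes):
    """Average number of mxids created per SLRU page filled, counting
    the offsets file and the members file together."""
    bytes_per_mxid = OFFSET_BYTES + members * member_bytes
    return PAGE_BYTES / bytes_per_mxid


# 2-member mxids with 5-byte member entries (the widened format).
with_patch = mxids_per_filled_page(members=2, member_bytes=5)
# 2-member mxids with 4-byte member entries (the existing format).
today = mxids_per_filled_page(members=2, member_bytes=4)

print(f"patched: one page filled per ~{with_patch:.0f} mxids")
print(f"today:   one page filled per ~{today:.0f} mxids")
```

Under these assumptions the widening moves the rate from roughly one page per 683 mxids to roughly one per 585, which matches the "~580" quoted above.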

Agreed.  But speaking of that, why exactly do we fsync the multixact SLRU today?

 You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
 lookups, so I think something more sophisticated is needed to exercise that
 cost.  Not sure what.

I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
all-visible.  HEAP_XMAX_INVALID will obviously help, when it happens.

 This isn't exactly a test case, but from Noah's previous comments I
 gather that there is a theoretical risk of mxid consumption running
 ahead of xid consumption.  We should try to think about whether there
 are any realistic workloads where that might actually happen.  I'm
 willing to believe that there aren't, but not just because somebody
 asserts it.  The reason I'm concerned about this is because, if it
 should happen, the result will be more frequent anti-wraparound
 vacuums on every table in the cluster.  Those are already quite
 painful for some users.

 Yes.  Pre-release, what can we really do here other than have more people
 thinking about ways it might happen in practice?  Post-release, we could
 suggest monitoring methods or perhaps have VACUUM emit a WARNING when a table
 is using more mxid space than xid space.

Well, post-release, the cat is out of the bag: we'll be stuck with
this whether the performance characteristics are acceptable or not.
That's why we'd better be as sure as possible before committing to
this implementation that there's nothing we can't live with.  It's not
like there's any reasonable way to turn this off if you don't like it.

 Also consider a benchmark that does plenty of non-key updates on a parent
 table with no activity on the child table.  We'll pay the overhead of
 determining that the key column(s) have not changed, but it will never pay off
 by preventing a lock wait.

Good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-14 Thread Noah Misch

On Wed, Mar 14, 2012 at 01:23:14PM -0400, Robert Haas wrote:
 On Tue, Mar 13, 2012 at 11:42 PM, Noah Misch n...@leadboat.com wrote:
  More often than that; each 2-member mxid takes 4 bytes in an offsets file and
  10 bytes in a members file.  So, more like one fsync per ~580 mxids.  Note
  that we already fsync the multixact SLRUs today, so any increase will arise
  from the widening of member entries from 4 bytes to 5.  The realism of this
  test is attractive.  Nearly-static parent tables are plenty common, and this
  test will illustrate the impact on those designs.
 
 Agreed.  But speaking of that, why exactly do we fsync the multixact SLRU 
 today?

Good question.  So far, I can't think of a reason.  nextMulti is critical,
but we already fsync it with pg_control.  We could delete the other multixact
state data at every startup and set OldestVisibleMXactId accordingly.

  You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
  lookups, so I think something more sophisticated is needed to exercise that
  cost.  Not sure what.
 
 I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
 all-visible.  HEAP_XMAX_INVALID will obviously help, when it happens.

True.  The patch (see ResetMultiHintBit()) also replaces a multixact xmax with
the updater xid when all transactions of the multixact have ended.  You would
need a test workload with long-running multixacts that delay such replacement.
However, the workloads that come to mind are the very workloads for which this
patch eliminates lock waits; they wouldn't illustrate a worst-case.
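
As a rough illustration of the replacement rule described above (this is not the patch's ResetMultiHintBit() code, which is C and is not shown in this thread), the decision can be sketched in Python. The member representation, INVALID_XID and simplify_xmax() are hypothetical, aborted updaters are ignored, and treating a lockers-only multixact as "no xmax" is an assumption.

```python
INVALID_XID = 0  # stand-in for "no xmax" (assumption for lockers-only multis)


def simplify_xmax(members, running_xids):
    """members: list of (xid, is_updater) pairs belonging to one multixact.

    Returns the xid that could replace the multixact in xmax, or None
    if the multixact must stay because some member is still running.
    """
    if any(xid in running_xids for xid, _ in members):
        return None  # a long-running member delays the replacement
    for xid, is_updater in members:
        if is_updater:
            return xid  # all members ended: keep only the updater's xid
    return INVALID_XID  # only lockers, and they are all gone


members = [(101, False), (102, True), (103, False)]
print(simplify_xmax(members, running_xids={103}))  # still needed
print(simplify_xmax(members, running_xids=set()))  # collapses to the updater
```

A workload that keeps at least one member running for a long time stays on the first branch, so readers keep paying for member lookups.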

  This isn't exactly a test case, but from Noah's previous comments I
  gather that there is a theoretical risk of mxid consumption running
  ahead of xid consumption.  We should try to think about whether there
  are any realistic workloads where that might actually happen.  I'm
  willing to believe that there aren't, but not just because somebody
  asserts it.  The reason I'm concerned about this is because, if it
  should happen, the result will be more frequent anti-wraparound
  vacuums on every table in the cluster.  Those are already quite
  painful for some users.
 
  Yes.  Pre-release, what can we really do here other than have more people
  thinking about ways it might happen in practice?  Post-release, we could
  suggest monitoring methods or perhaps have VACUUM emit a WARNING when a table
  is using more mxid space than xid space.
 
 Well, post-release, the cat is out of the bag: we'll be stuck with
 this whether the performance characteristics are acceptable or not.
 That's why we'd better be as sure as possible before committing to
 this implementation that there's nothing we can't live with.  It's not
 like there's any reasonable way to turn this off if you don't like it.

I disagree; we're only carving in stone the FOR KEY SHARE and FOR KEY UPDATE
syntax additions.  We could even avoid doing that by not documenting them.  A
later major release could implement them using a completely different
mechanism or even reduce them to aliases, KEY SHARE = SHARE and KEY UPDATE =
UPDATE.  To be sure, let's still do a good job the first time.


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-14 Thread Alvaro Herrera


Excerpts from Noah Misch's message of Wed Mar 14 19:10:00 -0300 2012:
 
 On Wed, Mar 14, 2012 at 01:23:14PM -0400, Robert Haas wrote:
  On Tue, Mar 13, 2012 at 11:42 PM, Noah Misch n...@leadboat.com wrote:
   More often than that; each 2-member mxid takes 4 bytes in an offsets file and
   10 bytes in a members file.  So, more like one fsync per ~580 mxids.  Note
   that we already fsync the multixact SLRUs today, so any increase will arise
   from the widening of member entries from 4 bytes to 5.  The realism of this
   test is attractive.  Nearly-static parent tables are plenty common, and this
   test will illustrate the impact on those designs.
  
  Agreed.  But speaking of that, why exactly do we fsync the multixact SLRU 
  today?
 
 Good question.  So far, I can't think of a reason.  nextMulti is critical,
 but we already fsync it with pg_control.  We could delete the other multixact
 state data at every startup and set OldestVisibleMXactId accordingly.

Hmm, yeah.

   You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
   lookups, so I think something more sophisticated is needed to exercise that
   cost.  Not sure what.
  
  I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
  all-visible.  HEAP_XMAX_INVALID will obviously help, when it happens.
 
 True.  The patch (see ResetMultiHintBit()) also replaces a multixact xmax with
 the updater xid when all transactions of the multixact have ended.

I have noticed that this code is not correct, because we don't know that
we're holding an appropriate lock on the page, so we can't simply change
the Xmax and reset those hint bits.  As things stand today, mxids
persist longer.  (We could do some cleanup at HOT-style page prune, for
example, though the lock we need is even weaker than that.)  Overall
this means that coming up with a test case demonstrating this pressure
probably isn't that hard.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-14 Thread Robert Haas

On Wed, Mar 14, 2012 at 6:10 PM, Noah Misch n...@leadboat.com wrote:
 Well, post-release, the cat is out of the bag: we'll be stuck with
 this whether the performance characteristics are acceptable or not.
 That's why we'd better be as sure as possible before committing to
 this implementation that there's nothing we can't live with.  It's not
 like there's any reasonable way to turn this off if you don't like it.

 I disagree; we're only carving in stone the FOR KEY SHARE and FOR KEY UPDATE
 syntax additions.  We could even avoid doing that by not documenting them.  A
 later major release could implement them using a completely different
 mechanism or even reduce them to aliases, KEY SHARE = SHARE and KEY UPDATE =
 UPDATE.  To be sure, let's still do a good job the first time.

What I mean is really that, once the release is out, we don't get to
take it back.  Sure, the next release can fix things, but any
regressions will become obstacles to upgrading and pain points for new
users.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-14 Thread Robert Haas

On Wed, Mar 14, 2012 at 9:17 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
  Agreed.  But speaking of that, why exactly do we fsync the multixact SLRU
  today?

 Good question.  So far, I can't think of a reason.  nextMulti is critical,
 but we already fsync it with pg_control.  We could delete the other multixact
 state data at every startup and set OldestVisibleMXactId accordingly.

 Hmm, yeah.

In a way, the fact that we don't do that is kind of fortuitous in this
situation.  I had just assumed that we were not fsyncing it because
there seems to be no reason to do so.  But since we are, we already
know that the fsyncs resulting from frequent mxid allocation aren't a
huge pain point.  If they were, somebody would have presumably
complained about it and fixed it before now.  So that means that what
we're really worrying about here is the overhead of fsyncing a little
more often, which is a lot less scary than starting to do it when we
weren't previously.

Now, we could look at this as an opportunity to optimize the existing
implementation by removing the fsyncs, rather than adding the new
infrastructure Alvaro is proposing.  But that would only make sense if
we thought that getting rid of the fsyncs would be more valuable than
avoiding the blocking here, and I don't.

I still think that someone needs to do some benchmarking here, because
this is a big complicated performance patch, and we can't predict the
impact of it on real-world scenarios without testing.  There is
clearly some additional overhead, and it makes sense to measure it and
hopefully discover that it isn't excessive.  Still, I'm a bit
relieved.

 I have noticed that this code is not correct, because we don't know that
 we're holding an appropriate lock on the page, so we can't simply change
 the Xmax and reset those hint bits.  As things stand today, mxids
 persist longer.  (We could do some cleanup at HOT-style page prune, for
 example, though the lock we need is even weaker than that.)  Overall
 this means that coming up with a test case demonstrating this pressure
 probably isn't that hard.

What would such a test case look like?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-13 Thread Bruce Momjian

On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
 Here's a first attempt at a README illustrating this.  I intend this to
 be placed in src/backend/access/heap/README.tuplock; the first three
 paragraphs are stolen from the comment in heap_lock_tuple, so I'd remove
 those from there, directing people to this new file instead.  Is there
 something that you think should be covered more extensively (or at all)
 here?
...
 
 When there is a single locker in a tuple, we can just store the locking info
 in the tuple itself.  We do this by storing the locker's Xid in XMAX, and
 setting hint bits specifying the locking strength.  There is one exception
 here: since hint bit space is limited, we do not provide a separate hint bit
 for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in
 that case.  (The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE, are
 presumably more commonly used due to being the standards-mandated locking
 mechanism, or heavily used by the RI code, so we want to provide fast paths
 for those.)

Are those tuple bits actually hint bits?  They seem quite a bit more
powerful than a hint.
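
As a rough illustration of the rule in the README excerpt quoted above, here
is a minimal Python sketch of when the locking info can live in the tuple
header alone and when a MultiXact is needed.  The enum and function names are
made up for this example and are not identifiers from the patch; the rule that
more than one locker always needs a MultiXact is an assumption inferred from
the README's "single locker" wording.

```python
from enum import Enum


class LockStrength(Enum):
    # Illustrative names only; not identifiers from the patch.
    KEY_SHARE = "FOR KEY SHARE"
    SHARE = "FOR SHARE"
    UPDATE = "FOR UPDATE"


# Strengths that, per the README text, get their own bit in the tuple header.
HAS_OWN_BIT = {LockStrength.KEY_SHARE, LockStrength.UPDATE}


def needs_multixact(lockers):
    """Return True if the given (xid, strength) lockers require a MultiXact.

    A single locker whose strength has its own bit is stored directly in
    xmax.  A single FOR SHARE locker, or (assumed here) several lockers,
    must go through a MultiXact.
    """
    if len(lockers) == 0:
        return False
    if len(lockers) > 1:
        return True
    _xid, strength = lockers[0]
    return strength not in HAS_OWN_BIT
```

Under these assumptions a lone FOR UPDATE or FOR KEY SHARE locker stays on the
fast path, while a lone FOR SHARE locker pays for a MultiXact.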

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-13 Thread Robert Haas

On Mon, Mar 12, 2012 at 9:24 PM, Noah Misch n...@leadboat.com wrote:
 When we lock an update-in-progress row, we walk the t_ctid chain and lock all
 descendant tuples.  They may all have uncommitted xmins.  This is essential to
 ensure that the final outcome of the updating transaction does not affect
 whether the locking transaction has its KEY SHARE lock.  Similarly, when we
 update a previously-locked tuple, we copy any locks (always KEY SHARE locks)
 to the new version.  That new tuple is both uncommitted and has locks, and we
 cannot easily sacrifice either property.  Do you see a way to extend your
 scheme to cover these needs?

No, I think that sinks it.  Good analysis.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-13 Thread Alvaro Herrera


Excerpts from Bruce Momjian's message of mar mar 13 14:00:52 -0300 2012:
 
 On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:

  When there is a single locker in a tuple, we can just store the locking info
  in the tuple itself.  We do this by storing the locker's Xid in XMAX, and
  setting hint bits specifying the locking strength.  There is one exception
  here: since hint bit space is limited, we do not provide a separate hint bit
  for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in
  that case.  (The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE, are
  presumably more commonly used due to being the standards-mandated locking
  mechanism, or heavily used by the RI code, so we want to provide fast paths
  for those.)
 
 Are those tuple bits actually hint bits?  They seem quite a bit more
 powerful than a hint.

I'm not sure what your point is.  We've had a hint bit for SELECT FOR
UPDATE for ages.  Even 8.2 had HEAP_XMAX_EXCL_LOCK and
HEAP_XMAX_SHARED_LOCK.  Maybe they are misnamed and aren't really
hints, but it's not the job of this patch to fix that problem.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-13 Thread Robert Haas

On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I agree with you that some worst case performance tests should be
 done. Could you please say what you think the worst cases would be, so
 those can be tested? That would avoid wasting time or getting anything
 backwards.

I've thought about this some and here's what I've come up with so far:

1. SELECT FOR SHARE on a large table on a system with no write cache.

2. A small parent table (say 30 rows or so) and a larger child table
with a many-to-one FK relationship to the parent (say 100 child rows
per parent row), with heavy update activity on the child table, on a
system where fsyncs are very slow.  This should generate lots of mxid
consumption, and every 1600 or so mxids (I think) we've got to fsync;
does that generate a noticeable performance hit?

3. It would be nice to test the impact of increased mxid lookups in
the parent, but I've realized that the visibility map will probably
mask a good chunk of that effect, which is a good thing.  Still, maybe
something like this: a fairly large parent table, say a million rows,
but narrow rows, so that many of them fit on a page, with frequent
reads and occasional updates (if there are only reads, autovacuum
might end with all the visibility map bits set); plus a child table
with one or a few rows per parent which is heavily updated.  In theory
this ought to be good for the patch, since the more fine-grained
locking will avoid blocking, but in this case the parent table is
large enough that you shouldn't get much blocking anyway, yet you'll
still pay the cost of mxid lookups because the occasional updates on
the parent will clear VM bits.  This might not be exactly the right
workload to measure this effect, but if it's not, maybe someone can
devote a little time to thinking about what would be.

4. A plain old pgbench run or two, to see whether there's any
regression when none of this matters at all...

This isn't exactly a test case, but from Noah's previous comments I
gather that there is a theoretical risk of mxid consumption running
ahead of xid consumption.  We should try to think about whether there
are any realistic workloads where that might actually happen.  I'm
willing to believe that there aren't, but not just because somebody
asserts it.  The reason I'm concerned about this is because, if it
should happen, the result will be more frequent anti-wraparound
vacuums on every table in the cluster.  Those are already quite
painful for some users.

It would be nice if Noah or someone else who has reviewed this patch
in detail could comment further.  I am shooting from the hip here, a
bit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-13 Thread Noah Misch

On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
 On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
  I agree with you that some worst case performance tests should be
  done. Could you please say what you think the worst cases would be, so
  those can be tested? That would avoid wasting time or getting anything
  backwards.
 
 I've thought about this some and here's what I've come up with so far:
 
 1. SELECT FOR SHARE on a large table on a system with no write cache.

Easy enough that we may as well check it.  Share-locking an entire large table
is impractical in a real application, so I would not worry if this shows a
substantial regression.

 2. A small parent table (say 30 rows or so) and a larger child table
 with a many-to-one FK relationship to the parent (say 100 child rows
 per parent row), with heavy update activity on the child table, on a
 system where fsyncs are very slow.  This should generate lots of mxid
 consumption, and every 1600 or so mxids (I think) we've got to fsync;
 does that generate a noticeable performance hit?

More often than that; each 2-member mxid takes 4 bytes in an offsets file and
10 bytes in a members file.  So, more like one fsync per ~580 mxids.  Note
that we already fsync the multixact SLRUs today, so any increase will arise
from the widening of member entries from 4 bytes to 5.  The realism of this
test is attractive.  Nearly-static parent tables are plenty common, and this
test will illustrate the impact on those designs.
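
The estimate above can be reproduced with a little arithmetic.  The sketch
below assumes 8192-byte SLRU pages and roughly one fsync per page's worth of
newly written multixact data; neither figure is stated in this thread, and the
function name is made up for the example.

```python
PAGE_SIZE = 8192  # assumed SLRU page size in bytes


def mxids_per_page_of_io(members_per_mxid, member_bytes, offset_bytes=4,
                         page_size=PAGE_SIZE):
    """Approximate number of mxids created per page's worth of SLRU data.

    Each mxid costs one offsets entry plus one members entry per member.
    """
    bytes_per_mxid = offset_bytes + members_per_mxid * member_bytes
    return page_size // bytes_per_mxid


patched = mxids_per_page_of_io(2, member_bytes=5)  # 5-byte member entries
current = mxids_per_page_of_io(2, member_bytes=4)  # 4-byte member entries
print(patched, current)
```

With 2-member mxids this gives 585 mxids per page of data for 5-byte member
entries, in line with the "~580" figure, against 682 for 4-byte entries.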

 3. It would be nice to test the impact of increased mxid lookups in
 the parent, but I've realized that the visibility map will probably
 mask a good chunk of that effect, which is a good thing.  Still, maybe
 something like this: a fairly large parent table, say a million rows,
 but narrow rows, so that many of them fit on a page, with frequent
 reads and occasional updates (if there are only reads, autovacuum
 might end with all the visibility map bits set); plus a child table
 with one or a few rows per parent which is heavily updated.  In theory
 this ought to be good for the patch, since the more fine-grained
 locking will avoid blocking, but in this case the parent table is
 large enough that you shouldn't get much blocking anyway, yet you'll
 still pay the cost of mxid lookups because the occasional updates on
 the parent will clear VM bits.  This might not be exactly the right
 workload to measure this effect, but if it's not, maybe someone can
 devote a little time to thinking about what would be.

You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
lookups, so I think something more sophisticated is needed to exercise that
cost.  Not sure what.

 4. A plain old pgbench run or two, to see whether there's any
 regression when none of this matters at all...

Might as well.

 This isn't exactly a test case, but from Noah's previous comments I
 gather that there is a theoretical risk of mxid consumption running
 ahead of xid consumption.  We should try to think about whether there
 are any realistic workloads where that might actually happen.  I'm
 willing to believe that there aren't, but not just because somebody
 asserts it.  The reason I'm concerned about this is because, if it
 should happen, the result will be more frequent anti-wraparound
 vacuums on every table in the cluster.  Those are already quite
 painful for some users.

Yes.  Pre-release, what can we really do here other than have more people
thinking about ways it might happen in practice?  Post-release, we could
suggest monitoring methods or perhaps have VACUUM emit a WARNING when a table
is using more mxid space than xid space.


Also consider a benchmark that does plenty of non-key updates on a parent
table with no activity on the child table.  We'll pay the overhead of
determining that the key column(s) have not changed, but it will never pay off
by preventing a lock wait.  Granted, this is barely representative of
application behavior.  Perhaps, too, we already have a good sense of this cost
from the HOT benchmarking efforts and have no cause to revisit it.

Thanks,
nm


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-12 Thread Robert Haas

On Sun, Feb 26, 2012 at 9:47 PM, Robert Haas robertmh...@gmail.com wrote:
 Regarding performance, the good thing about this patch is that if you
 have an operation that used to block, it might now not block.  So maybe
 multixact-related operation is a bit slower than before, but if it
 allows you to continue operating rather than sit waiting until some
 other transaction releases you, it's much better.

 That's probably true, although there is some deferred cost that is
 hard to account for.  You might not block immediately, but then later
 somebody might block either because the mxact SLRU now needs fsyncs or
 because they've got to decode an mxid long after the relevant segment
 has been evicted from the SLRU buffers.  In general, it's hard to
 bound that latter cost, because you only avoid blocking once (when the
 initial update happens) but you might pay the extra cost of decoding
 the mxid as many times as the row is read, which could be arbitrarily
 many.  How much of a problem that is in practice, I'm not completely
 sure, but it has worried me before and it still does.  In the worst
 case scenario, a handful of frequently-accessed rows with MXIDs all of
 whose members are dead except for the UPDATE they contain could result
 in continual SLRU cache-thrashing.

 From a performance standpoint, we really need to think not only about
 the cases where the patch wins, but also, and maybe more importantly,
 the cases where it loses.  There are some cases where the current
 mechanism, use SHARE locks for foreign keys, is adequate.  In
 particular, it's adequate whenever the parent table is not updated at
 all, or only very lightly.  I believe that those people will pay
 somewhat more with this patch, and especially in any case where
 backends end up waiting for fsyncs in order to create new mxids, but
 also just because I think this patch will have the effect of
 increasing the space consumed by each individual mxid, which imposes a
 distributed cost of its own.

I spent some time thinking about this over the weekend, and I have an
observation, and an idea.  Here's the observation: I believe that
locking a tuple whose xmin is uncommitted is always a noop, because if
it's ever possible for a transaction to wait for an XID that is part
of its own transaction (exact same XID, or sub-XIDs of the same top
XID), then a transaction could deadlock against itself.  I believe
that this is not possible: if a transaction were to wait for an XID
assigned to that same backend, then the lock manager would observe
that an ExclusiveLock on the xid is already held, so the request for a
ShareLock would be granted immediately.  I also don't believe there's
any situation in which the existence of an uncommitted tuple fails to
block another backend, but a lock on that same uncommitted tuple would
have caused another backend to block.  If any of that sounds wrong,
you can stop reading here (but please tell me why it sounds wrong).
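
To make the lock-manager part of that observation concrete, here is a toy
Python model.  It is only a sketch of the behaviour described above (a backend
holds an exclusive lock on each of its own XIDs, and a backend never conflicts
with its own locks); the class and method names are hypothetical and have
nothing to do with the real lock manager code.

```python
class XidLockTable:
    """Toy model: each running transaction holds an exclusive lock on its XID."""

    def __init__(self):
        self._holder = {}  # xid -> backend holding the exclusive lock

    def begin(self, backend, xid):
        # Also used for sub-XIDs: they are held by the same backend.
        self._holder[xid] = backend

    def finish(self, xid):
        # Commit or abort releases the lock on the XID.
        self._holder.pop(xid, None)

    def share_lock_would_wait(self, backend, xid):
        holder = self._holder.get(xid)
        # No holder: the transaction has ended, so there is nothing to wait for.
        # Same backend: a backend does not conflict with its own locks.
        return holder is not None and holder != backend
```

In this model a backend asking for a share lock on its own XID or sub-XID is
granted immediately, while any other backend waits until the XID's owner
finishes.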

If it's right, then here's the idea: what if we stored mxids using
xmin rather than xmax?  This would mean that, instead of making mxids
contain the tuple's original xmax, they'd need to instead contain the
tuple's original xmin.  This might seem like rearranging the deck
chairs on the titanic, but I think it actually works out substantially
better, because if we can assume that the xmin is committed, then we
only need to know its exact value until it becomes older than
RecentGlobalXmin.  This means that a tuple can be both updated and
locked at the same time without the MultiXact SLRU needing to be
crash-safe, because if we crash and restart, any mxids that are still
around from before the crash are known to contain only xmins that are
now all-visible.  We therefore don't need their exact values, so it
doesn't matter if that data actually made it to disk.  Furthermore, in
the case where a previously-locked tuple is read repeatedly, we only
need to keep doing SLRU lookups until the xmin becomes all-visible;
after that, we can set a hint bit indicating that the tuple's xmin is
all-visible, and any future readers (or writers) can use that to skip
the SLRU lookup.  In the degenerate (and probably common) case where a
tuple is already all-visible at the time it's locked, we don't really
need to record the original xmin at all; we can still do so if
convenient, but we can set the xmin-all-visible hint right away, so
nobody needs to probe the SLRU just to get xmin.

In other words, we'd entirely avoid needing to make mxacts crash-safe,
and we'd avoid most of the extra SLRU lookups that the current
implementation requires; they'd only be needed when (and for as long
as) the locked tuple was not yet all-visible.

This also seems like it would make the anti-wraparound issues simpler
to handle - once an mxid is old enough that any xmin it contains must
be all-visible, we can simply overwrite the tuple's xmin with
FrozenXID, which is pretty much what we're already doing anyway.  It's
already the case that a

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-12 Thread Simon Riggs

On Mon, Mar 12, 2012 at 5:28 PM, Robert Haas robertmh...@gmail.com wrote:

 In other words, we'd entirely avoid needing to make mxacts crash-safe,
 and we'd avoid most of the extra SLRU lookups that the current
 implementation requires; they'd only be needed when (and for as long
 as) the locked tuple was not yet all-visible.

The current implementation only requires additional lookups in the
update/check case, which is the case that doesn't do anything other
than block right now. Since we're replacing lock contention with
physical access contention even the worst case situation is still
better than what we have now. Please feel free to point out worst case
situations and show that isn't true.

I've also pointed out how to avoid overhead of making mxacts crash
safe when the new facilities are not in use, so I don't see problems
with the proposed mechanism. Given that I am still myself reviewing
the actual code.

So those things are not something we need to avoid.

My feeling is that overwriting xmin is a clever idea, but it arrives
too late to receive sensible analysis at this stage of the CF. It's not
solving a problem; it's just an alternate mechanism and at best an
optimisation of the mechanism. Were we to explore it now, it seems
certain that another person would observe that design was taking
place and so the patch should be rejected, which would be unnecessary
and wasteful. I also think it would alter our ability to diagnose
problems, not least the normal test that xmax matches xmin across an
update.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-12 Thread Robert Haas

On Mon, Mar 12, 2012 at 1:50 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Mar 12, 2012 at 5:28 PM, Robert Haas robertmh...@gmail.com wrote:
 In other words, we'd entirely avoid needing to make mxacts crash-safe,
 and we'd avoid most of the extra SLRU lookups that the current
 implementation requires; they'd only be needed when (and for as long
 as) the locked tuple was not yet all-visible.

 The current implementation only requires additional lookups in the
 update/check case, which is the case that doesn't do anything other
 than block right now. Since we're replacing lock contention with
 physical access contention even the worst case situation is still
 better than what we have now. Please feel free to point out worst case
 situations and show that isn't true.

I think I already have:

http://archives.postgresql.org/pgsql-hackers/2012-02/msg01258.php

The case I'm worried about is where we are allocating mxids quickly,
and we end up having to wait for fsyncs on mxact segments.  That might
be very slow, but you could argue that it could *possibly* be still
worthwhile if it avoids blocking.  That doesn't strike me as a
slam-dunk, though, because we've already seen and fixed cases where
too many fsyncs causes the performance of the entire system to go down
the tubes (cf. commit 7f242d880b5b5d9642675517466d31373961cf98).  But
it's really bad if there are no updates on the parent table - then,
whatever extra overhead there is will be all for naught, since the
more fine-grained locking doesn't help anyway.

 I've also pointed out how to avoid the overhead of making mxacts
 crash-safe when the new facilities are not in use, so I don't see
 problems with the proposed mechanism, though I am still reviewing the
 actual code myself.

The closest thing I can find to a proposal from you in that regard is
this comment:

# I was really thinking we could skip the fsync of a page if we've not
# persisted anything important on that page, since that was one of
# Robert's performance points.

It might be possible to do something with that idea, but at the moment
I'm not seeing how to make it work.

 So those things are not something we need to avoid.

 My feeling is that overwriting xmin is a clever idea, but it arrives
 too late to receive sensible analysis at this stage of the CF. It's not
 solving a problem; it's just an alternate mechanism and at best an
 optimisation of the mechanism. Were we to explore it now, it seems
 certain that another person would observe that design was taking
 place and so the patch should be rejected, which would be unnecessary
 and wasteful.

Considering that nobody's done any work to resolve the uncertainty
about whether the worst-case performance characteristics of this patch
are acceptable, and considering further that it was undergoing massive
code churn for more than a month after the final CommitFest, I think
it's not that unreasonable to think it might not be ready for prime
time at this point.  In any event, your argument is exactly backwards:
we need to first decide whether the patch needs a redesign and then,
if it does, postpone it.  Deciding that we don't want to postpone it
first, and therefore we're not going to redesign it even if that is
what's really needed makes no sense.

 I also think it would alter our ability to diagnose
 problems, not least the normal test that xmax matches xmin across an
 update.

There's nothing stopping the new tuple from being frozen before the
old one, even today.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-12 Thread Simon Riggs

On Mon, Mar 12, 2012 at 6:14 PM, Robert Haas robertmh...@gmail.com wrote:

 Considering that nobody's done any work to resolve the uncertainty
 about whether the worst-case performance characteristics of this patch
 are acceptable, and considering further that it was undergoing massive
 code churn for more than a month after the final CommitFest, I think
 it's not that unreasonable to think it might not be ready for prime
 time at this point.

Thank you for cutting to the chase.

The uncertainty of which you speak is a theoretical point you
raised. It has been explained; nobody has yet shown performance
numbers to illustrate the point, but only because the outcome seemed
so clear. I would point out that you haven't demonstrated the
existence of a problem either, so redesigning something without any
proof of a problem seems strange.

Let me explain again what this patch does and why it has such major
performance benefit.

This feature gives us a step change in lock reductions from FKs. A real
world best case might be to examine the benefit this patch has on a
large batch load that inserts many new orders for existing customers.
In my example case the orders table has a FK to the customer table. At
the same time as the data load, we attempt to update a customer's
additional details, address or current balance etc. The large load
takes locks on the customer table and keeps them for the whole
transaction. So the customer updates are locked out for multiple
seconds, minutes or maybe hours, depending upon how far you want to
stretch the example. With this patch the customer updates don't cause
lock conflicts but they require mxact lookups in *some* cases, so they
might take 1-10ms extra, rather than 1-10 minutes more. 1000x faster.
The only case that causes the additional lookups is the case that
otherwise would have been locked. So producing best case results is
trivial and can be as enormous as you like.
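
The blocking behaviour in that example can be summarised in a deliberately
simplified Python sketch.  It assumes, as discussed elsewhere in this thread,
that FK checks take SHARE locks on the referenced row today and KEY SHARE
locks with the patch, and that a KEY SHARE lock only gets in the way of an
update that changes key columns.  The function name is made up for the
illustration, and deletes and all other lock modes are ignored.

```python
def customer_update_waits(fk_lock_mode, update_changes_key):
    """Does an UPDATE of a customer row wait behind the load's FK locks?

    fk_lock_mode is the lock the orders load holds on the customer row:
    "SHARE" (current behaviour) or "KEY SHARE" (with the patch).
    """
    if fk_lock_mode == "SHARE":
        # Today: any update of the locked row waits for the load to finish.
        return True
    if fk_lock_mode == "KEY SHARE":
        # With the patch: only key-changing updates conflict.
        return update_changes_key
    raise ValueError("unknown lock mode: " + fk_lock_mode)
```

Changing a customer's address or balance (no key change) waits in the first
case and proceeds in the second, at the price of the mxact lookups mentioned
above.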

I agree with you that some worst case performance tests should be
done. Could you please say what you think the worst cases would be, so
those can be tested? That would avoid wasting time or getting anything
backwards.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-12 Thread Noah Misch

On Mon, Mar 12, 2012 at 01:28:11PM -0400, Robert Haas wrote:
 I spent some time thinking about this over the weekend, and I have an
 observation, and an idea.  Here's the observation: I believe that
 locking a tuple whose xmin is uncommitted is always a noop, because if
 it's ever possible for a transaction to wait for an XID that is part
 of its own transaction (exact same XID, or sub-XIDs of the same top
 XID), then a transaction could deadlock against itself.  I believe
 that this is not possible: if a transaction were to wait for an XID
 assigned to that same backend, then the lock manager would observe
 that an ExclusiveLock on the xid is already held, so the request for a
 ShareLock would be granted immediately.  I also don't believe there's
 any situation in which the existence of an uncommitted tuple fails to
 block another backend, but a lock on that same uncommitted tuple would
 have caused another backend to block.  If any of that sounds wrong,
 you can stop reading here (but please tell me why it sounds wrong).

When we lock an update-in-progress row, we walk the t_ctid chain and lock all
descendant tuples.  They may all have uncommitted xmins.  This is essential to
ensure that the final outcome of the updating transaction does not affect
whether the locking transaction has its KEY SHARE lock.  Similarly, when we
update a previously-locked tuple, we copy any locks (always KEY SHARE locks)
to the new version.  That new tuple is both uncommitted and has locks, and we
cannot easily sacrifice either property.  Do you see a way to extend your
scheme to cover these needs?


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-07 Thread Gokulakannan Somasundaram

I'm sorry that I came to this topic so late, but I still want to put
forward my views.
Have we thought along the lines of how Robert has implemented relation-level
locks?  In short, it should go like this:

a) The locks for enforcing referential integrity should be taken only when
the rarest of events (those that would cause an integrity failure) occur.
That would be the update of the referenced column. Other cases of update,
delete and insert should not be required to take locks. In this way, we can
reduce a lot of lock traffic.

So suppose we have a table employee(empid, empname, ..., deptid references
dept(deptid)) and a table dept(deptid, deptname).

Currently we take shared locks on the referenced rows in the dept table
whenever we update something in the employee table. This should not
happen. Instead, any insert / update of referenced column / delete should
check for some lock in its PGPROC structure, which will only get created
when the deptid gets updated / deleted (a rare event).

b) But the operation of update of the referenced column will be made more
costly. Maybe it can create something like a predicate lock (as used for
enforcing serializable) and keep it in all the PGPROC structures.

I know this is an abstract idea, but I just wanted to know whether we have
thought along those lines.

Thanks,
Gokul.

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-07 Thread Simon Riggs

On Wed, Mar 7, 2012 at 9:24 AM, Gokulakannan Somasundaram
gokul...@gmail.com wrote:
 I'm sorry that I came to this topic so late, but I still want to put
 forward my views.
 Have we thought along the lines of how Robert has implemented relation-level
 locks?  In short, it should go like this:

 a) The locks for enforcing referential integrity should be taken only when
 the rarest of events (those that would cause an integrity failure) occur.
 That would be the update of the referenced column. Other cases of update,
 delete and insert should not be required to take locks. In this way, we can
 reduce a lot of lock traffic.

Insert, Update and Delete don't take locks; they simply mark the tuples
they change with an xid. Anybody else wanting to wait on the lock
just waits on the xid. We do insert a lock row for each xid, but not
one per row changed.

 So if we have a table like employee( empid, empname, ... depid references
 dept(deptid)) and table dept(depid depname).

 Currently we are taking shared locks on referenced rows in dept table,
 whenever we are updating something in the employee table. This should not
 happen. Instead any insert / update of referenced column / delete should
 check for some lock in its PGPROC structure, which will only get created
 when the depid gets updated / deleted( rare event )

It's worked that way for 5 years, so it's too late to modify it now and
this patch won't change that.

The way we do RI locking is designed to prevent holding that in memory
and then having the lock table overflow, which then either requires us
to revert to the current design or upgrade to table level locks to
save space in the lock table - which is a total disaster, if you've
ever worked with DB2.

What you're suggesting is that we store the locks in memory only as a
way of avoiding updating the row.

My understanding is we have two optimisation choices. A single set of
xids can be used in many places, since the same set of transactions
may do roughly the same thing.

1. We could assign a new mxactid every time we touch a new row. That
way there is no correspondence between sets of xids, and we may hold
the same set many times. OTOH since each set is unique we can expand
it easily and we don't need to touch each row once for each lock. That
saves on row touches but it also greatly increases the mxactid
creation rate, which causes cache scrolling.

2. We assign a new mxactid each time we create a new unique set of
rows. We have a separate cache for local sets. This way reduces the
mxactid creation rate but causes row updates each time we lock the
row, which then needs WAL.

(2) is how we currently handle the very difficult decision of how to
optimise this for the general case. I'm not sure that is right in all
cases, but it is at least scalable and it is the devil we know.
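
A minimal sketch of option (2), reading "unique set" as the set of member xids (that reading is an assumption). The class name `MultiXactCache` and its method are hypothetical and only illustrate why reusing an identifier for a repeated member set lowers the mxactid creation rate:

```python
import itertools


class MultiXactCache:
    """Toy model: hand out one mxactid per distinct set of member xids."""

    def __init__(self):
        self._next_id = itertools.count(1)  # hypothetical id allocator
        self._by_members = {}               # frozenset of xids -> mxactid

    def get_or_create(self, xids):
        key = frozenset(xids)  # member order does not matter
        if key not in self._by_members:
            # Only a member set we have not seen before costs a new id.
            self._by_members[key] = next(self._next_id)
        return self._by_members[key]
```

With this scheme, locking a second row under the same set of transactions reuses the existing id, but each row still has to be updated to point at it.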

 b) But the operation of update of the referenced column will be made more
 costly. May be it can create something like a predicate lock(used for
 enforcing serializable) and keep it in all the PG_PROC structures.

No, updates of referenced columns are exactly the same as now when no
RI checks are happening.

If the update occurs when an RI check takes place there is more work
to do, but previously it would have just blocked and done nothing. So
that path is relatively heavyweight but much better than nothing.


 I know this is a abstract idea, but just wanted to know, whether we have
 thought on those lines.

Thanks for your thoughts.

The most useful way to help with this patch right now is to run
performance investigations and see if there are non-optimal cases. We
can then see how the patch handles those. Theory is good, but it needs
to drive experimentation, as I myself re-discover continually.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-07 Thread Gokulakannan Somasundaram



 Insert, Update and Delete don't take locks they simply mark the tuples
 they change with an xid. Anybody else wanting to wait on the lock
 just waits on the xid. We do insert a lock row for each xid, but not
 one per row changed.

I mean the foreign key checks here. They take a Select for Share Lock
right. That's what we are trying to optimize here. Or am i missing
something? So by following the suggested methodology, the foreign key
checks won't take any locks.


 It's worked that way for 5 years, so its too late to modify it now and
 this patch won't change that.

 The way we do RI locking is designed to prevent holding that in memory
 and then having the lock table overflow, which then either requires us
 to revert to the current design or upgrade to table level locks to
 save space in the lock table - which is a total disaster, if you've
 ever worked with DB2.

 What you're suggesting is that we store the locks in memory only as a
 way of avoiding updating the row.

 But that memory would be consumed, only when someone updates the
referenced column( which will usually be the primary key of the referenced
table). Any normal database programmer knows that updating primary key is
not good for performance. So we go by the same logic.


 No, updates of referenced columns are exactly the same as now when no
 RI checks happening.

 If the update occurs when an RI check takes place there is more work
 to do, but previously it would have just blocked and done nothing. So
 that path is relatively heavyweight but much better than nothing.

 As i have already said, that path is definitely heavy weight( like how
Robert has made the DDL path heavy weight). If we assume that DDLs are
going to be a rare phenomenon, then we can also assume that update of
primary keys is a rare phenomenon in a normal database.



 The most useful way to help with this patch right now is to run
 performance investigations and see if there are non-optimal cases. We
 can then see how the patch handles those. Theory is good, but it needs
 to drive experimentation, as I myself re-discover continually.

 I understand. I just wanted to know, whether the developer considered that
line of thought.

Thanks,
Gokul.

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-07 Thread Simon Riggs

On Wed, Mar 7, 2012 at 10:18 AM, Gokulakannan Somasundaram
gokul...@gmail.com wrote:

 Insert, Update and Delete don't take locks they simply mark the tuples
 they change with an xid. Anybody else wanting to wait on the lock
 just waits on the xid. We do insert a lock row for each xid, but not
 one per row changed.

 I mean the foreign key checks here. They take a Select for Share Lock right.
 That's what we are trying to optimize here. Or am i missing something? So by
 following the suggested methodology, the foreign key checks won't take any
 locks.

Please explain in detail your idea of how it will work.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-07 Thread Gokulakannan Somasundaram



 Please explain in detail your idea of how it will work.


OK. I will try to explain the abstract idea, i have.
a) Referential integrity gets violated, when there are referencing key
values, not present in the referenced key values. We are maintaining the
integrity by taking a Select for Share Lock during the foreign key checks,
 so that referred value is not updated/deleted during the operation.

b) We can do the same in the reverse way. When there is a update/delete of
the referred value, we don't want any new inserts with the referred value
in referring table, any update that will update its value to the referred
value being updated/deleted. So we will take some kind of lock, which will
stop such a happening. This can be achieved through
i) predicate locking infrastructure already present (or)
ii) a temporary B-Tree index ( no WAL protection ), that gets created only
for the referred value updations and holds those values that are being
updated/deleted (if we are scared of predicate locking).

So whenever we do foreign key checks, we just need to make sure there is no
such referential integrity lock in our own PGPROC structure(if implemented
with predicate locking) /  check the temporary B-Tree index for any entry
matching the entry that we are going to insert/update to.( the empty tree
can be tracked with a flag to optimize )

May be someone can come up with better ideas than this.

Gokul.

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-07 Thread Simon Riggs

On Wed, Mar 7, 2012 at 11:37 AM, Gokulakannan Somasundaram
gokul...@gmail.com wrote:

 Please explain in detail your idea of how it will work.

 So we will take some kind of lock, which will stop such a happening.
...
 May be someone can come up with better ideas than this.

With respect, I don't call this a detailed explanation of an idea. For
consideration here, come up with a very detailed design of how your
suggestion will work. Think about it carefully, spend hours and days
thinking it through and when you are personally sure it is better than
what we have now, please raise it on list at an appropriate time. Bear
in mind that most people throw away 90% of their ideas before even
mentioning them here. I hope that helps you to contribute.

At the moment we're trying to review patches for specific code to
include or exclude, not discuss huge redesign of internal mechanisms
using broad brush descriptions. It is possible you may find an
improvement and if you do, people will be interested but that seems an
unlikely thing to happen here and now.

If you have specific comments or tests on this patch those are very welcome.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-06 Thread Alvaro Herrera


Excerpts from Simon Riggs's message of lun mar 05 16:34:10 -0300 2012:

 It does however, illustrate my next review comment which is that the
 comments and README items are sorely lacking here. It's quite hard to
 see how it works, let alone comment on major design decisions. It
 would help myself and others immensely if we could improve that.

Here's a first attempt at a README illustrating this.  I intend this to
be placed in src/backend/access/heap/README.tuplock; the first three
paragraphs are stolen from the comment in heap_lock_tuple, so I'd remove
those from there, directing people to this new file instead.  Is there
something that you think should be covered more extensively (or at all)
here?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Locking tuples
--------------

Because the shared-memory lock table is of finite size, but users could
reasonably want to lock large numbers of tuples, we do not rely on the
standard lock manager to store tuple-level locks over the long term.  Instead,
a tuple is marked as locked by setting the current transaction's XID as its
XMAX, and setting additional infomask bits to distinguish this usage from the
more normal case of having deleted the tuple.  When multiple transactions
concurrently lock a tuple, a MultiXact is used; see below.

When it is necessary to wait for a tuple-level lock to be released, the basic
delay is provided by XactLockTableWait or MultiXactIdWait on the contents of
the tuple's XMAX.  However, that mechanism will release all waiters
concurrently, so there would be a race condition as to which waiter gets the
tuple, potentially leading to indefinite starvation of some waiters.  The
possibility of share-locking makes the problem much worse --- a steady stream
of share-lockers can easily block an exclusive locker forever.  To provide
more reliable semantics about who gets a tuple-level lock first, we use the
standard lock manager.  The protocol for waiting for a tuple-level lock is
really

 LockTuple()
 XactLockTableWait()
 mark tuple as locked by me
 UnlockTuple()

When there are multiple waiters, arbitration of who is to get the lock next
is provided by LockTuple().  However, at most one tuple-level lock will
be held or awaited per backend at any time, so we don't risk overflow
of the lock table.  Note that incoming share-lockers are required to
do LockTuple as well, if there is any conflict, to ensure that they don't
starve out waiting exclusive-lockers.  However, if there is not any active
conflict for a tuple, we don't incur any extra overhead.
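
As a rough illustration of the arbitration described above, here is a single-threaded toy model. It handles only exclusive lockers and assumes the lock manager serves waiters in FIFO order; `TupleModel`, `request` and `finish` are hypothetical names, not PostgreSQL functions:

```python
from collections import deque


class TupleModel:
    """Toy model of one tuple with exclusive lockers only."""

    def __init__(self):
        self.xmax = None      # xid of the current locker, or None
        self.queue = deque()  # waiters queued via the lock manager (FIFO)

    def request(self, xid):
        """Return True if the lock is acquired immediately."""
        if self.xmax is None and not self.queue:
            # No active conflict: mark the tuple, no lock manager overhead.
            self.xmax = xid
            return True
        # Conflict: queue behind earlier waiters (the LockTuple step).
        self.queue.append(xid)
        return False

    def finish(self):
        """Current locker ends; the first waiter marks the tuple as its own."""
        self.xmax = self.queue.popleft() if self.queue else None
        return self.xmax
```

The queue is what prevents the race among waiters: without it, every waiter would wake at once when the holder's transaction ends.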

We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
super-exclusive locking (used to delete tuples and more generally to update
tuples modifying the values of the columns that make up the key of the tuple);
SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak mode
that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY
UPDATE.  This last mode implements a mode just strong enough to implement RI
checks, i.e. it ensures that tuples do not go away from under a check, without
blocking other transactions that want to update the tuple without
changing its key.

The conflict table is:

             KEY UPDATE   UPDATE     SHARE      KEY SHARE
KEY UPDATE   conflict     conflict   conflict   conflict
UPDATE       conflict     conflict   conflict
SHARE        conflict     conflict
KEY SHARE    conflict
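
The conflict table can be transcribed into a small lookup, shown here as a sketch. The dictionary follows the table as given in this draft; the function name `conflicts` is hypothetical:

```python
# Conflicts per lock mode, transcribed from the draft README's table.
CONFLICTS = {
    "KEY UPDATE": {"KEY UPDATE", "UPDATE", "SHARE", "KEY SHARE"},
    "UPDATE": {"KEY UPDATE", "UPDATE", "SHARE"},
    "SHARE": {"KEY UPDATE", "UPDATE"},
    "KEY SHARE": {"KEY UPDATE"},
}


def conflicts(held, requested):
    """Return True if a lock in mode `requested` must wait for `held`."""
    return requested in CONFLICTS[held]
```

For example, `conflicts("UPDATE", "KEY SHARE")` is false, which is the case the patch is built around: an RI check does not wait for an update that leaves the key alone.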

When there is a single locker in a tuple, we can just store the locking info
in the tuple itself.  We do this by storing the locker's Xid in XMAX, and
setting hint bits specifying the locking strength.  There is one exception
here: since hint bit space is limited, we do not provide a separate hint bit
for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in
that case.  (The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE, are
presumably more commonly used due to being the standards-mandated locking
mechanism, or heavily used by the RI code, so we want to provide fast paths
for those.)
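
The storage rule in the paragraph above can be sketched as a small decision function. This is only a model of the rule as stated in this draft; `lock_storage` and its return strings are hypothetical:

```python
def lock_storage(lockers):
    """Decide where lock info lives for a list of (xid, mode) lockers."""
    if not lockers:
        raise ValueError("at least one locker is required")
    # A single locker fits in the tuple header (XMAX plus hint bits),
    # except FOR SHARE, which has no dedicated hint bit in this draft.
    if len(lockers) == 1 and lockers[0][1] != "SHARE":
        return "tuple header"
    return "multixact"
```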

MultiXacts
----------

A tuple header provides very limited space for storing information about tuple
locking and updates: there is room only for a single Xid and a small number of
hint bits.  Whenever we need to store more than one lock, we replace the first
locker's Xid with a new MultiXactId.  Each MultiXact provides extended locking
data; it comprises an array of Xids plus some flag bits for each one.  The
flags are currently used to store the locking strength of each member
transaction.  (The flags also distinguish a pure locker from an actual
updater.)
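
A sketch of the data a MultiXact carries, as described above. The field names and the use of Python dataclasses are illustrative assumptions; the patch stores the per-member information as flag bits, not as named fields:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    xid: int
    mode: str                # locking strength of this member
    is_update: bool = False  # True for an actual updater, False for a locker


@dataclass
class MultiXact:
    members: list

    def updater(self):
        """Return the xid of the updating member, or None if all are lockers."""
        for member in self.members:
            if member.is_update:
                return member.xid
        return None
```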

In earlier PostgreSQL releases, a MultiXact always meant that the tuple was
locked in shared mode by multiple

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-06 Thread Simon Riggs

On Mon, Mar 5, 2012 at 8:35 PM, Simon Riggs si...@2ndquadrant.com wrote:

 * Why do we need multixact to be persistent? Do we need every page of
 multixact to be persistent, or just particular pages in certain
 circumstances?

 Any page that contains at least one multi with an update as a member
 must persist.  It's possible that some pages contain no update (and this
 is even likely in some workloads, if updates are rare), but I'm not sure
 it's worth complicating the code to cater for early removal of some
 pages.

 If the multixact contains an xid and that is being persisted then you
 need to set an LSN to ensure that a page write causes an XLogFlush()
 before the multixact write. And you need to set do_fsync, no? Or
 explain why not in comments...

 I was really thinking we could skip the fsync of a page if we've not
 persisted anything important on that page, since that was one of
 Robert's performance points.

We need to increase these values to 32 as well

 #define NUM_MXACTOFFSET_BUFFERS 8
 #define NUM_MXACTMEMBER_BUFFERS 16

using same logic as for clog.

We're using 25% more space and we already know clog benefits from
increasing them, so there's little doubt we need it here also, since
we are increasing the access rate and potentially the longevity.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-06 Thread Simon Riggs

On Tue, Mar 6, 2012 at 7:39 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
 super-exclusive locking (used to delete tuples and more generally to update
 tuples modifying the values of the columns that make up the key of the tuple);
 SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
 implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak mode
 that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY
 UPDATE.  This last mode implements a mode just strong enough to implement RI
 checks, i.e. it ensures that tuples do not go away from under a check, without
 blocking when some other transaction that want to update the tuple without
 changing its key.

So there are 4 lock types, but we only have room for 3 on the tuple
header, so we store the least common/deprecated of the 4 types as a
multixactid. Some rewording would help there.

Neat scheme!


My understanding is that all of these workloads will change:

* Users of explicit SHARE lockers will be slightly worse in the case
of the 1st locker, but then after that they'll be the same as before.

* Updates against an RI locked table will be dramatically faster
because of reduced lock waits


...and that these previous workloads are effectively unchanged:

* Stream of RI checks causes mxacts

* Multi row deadlocks still possible

* Queues of writers still wait in the same way

* Deletes don't cause mxacts unless by same transaction



 In earlier PostgreSQL releases, a MultiXact always meant that the tuple was
 locked in shared mode by multiple transactions.  This is no longer the case; a
 MultiXact may contain an update or delete Xid.  (Keep in mind that tuple locks
 in a transaction do not conflict with other tuple locks in the same
 transaction, so it's possible to have otherwise conflicting locks in a
 MultiXact if they belong to the same transaction).

Somewhat confusing, but am getting there.

 Note that each lock is attributed to the subtransaction that acquires it.
 This means that a subtransaction that aborts is seen as though it releases the
 locks it acquired; concurrent transactions can then proceed without having to
 wait for the main transaction to finish.  It also means that a subtransaction
 can upgrade to a stronger lock level than an earlier transaction had, and if
 the subxact aborts, the earlier, weaker lock is kept.

OK

 The possibility of having an update within a MultiXact means that they must
 persist across crashes and restarts: a future reader of the tuple needs to
 figure out whether the update committed or aborted.  So we have a requirement
 that pg_multixact needs to retain pages of its data until we're certain that
 the MultiXacts in them are no longer of interest.

I think the "no longer of interest" aspect needs to be tracked more
closely because it will necessarily lead to more I/O.

If we store the LSN on each mxact page, as I think we need to, we can
get rid of pages more quickly if we know they don't have an LSN set.
So it's possible we can optimise that more.

 VACUUM is in charge of removing old MultiXacts at the time of tuple freezing.

You mean mxact segments?

Surely we set hint bits on tuples same as now? Hope so.

 This works in the same way that pg_clog segments are removed: we have a
 pg_class column that stores the earliest multixact that could possibly be
 stored in the table; the minimum of all such values is stored in a pg_database
 column.  VACUUM computes the minimum across all pg_database values, and
 removes pg_multixact segments older than the minimum.
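
The minimum-of-minimums computation quoted above can be sketched as follows. Identifier wraparound is ignored for simplicity, and the function names and data shapes are hypothetical:

```python
def truncation_horizon(databases):
    """databases maps dbname -> {table: oldest multixact possibly stored}."""
    # Per-database value: the minimum over that database's tables.
    per_database = {
        name: min(tables.values()) for name, tables in databases.items()
    }
    # Cluster-wide value: the minimum over all databases.
    return min(per_database.values())


def removable_segments(segments, horizon):
    """Keep only (first, last) segment ranges entirely older than horizon."""
    return [(first, last) for first, last in segments if last < horizon]
```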

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-06 Thread Robert Haas

Preliminary comment:

This README is very helpful.

On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
 super-exclusive locking (used to delete tuples and more generally to update
 tuples modifying the values of the columns that make up the key of the tuple);
 SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
 implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak mode
 that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY
 UPDATE.  This last mode implements a mode just strong enough to implement RI
 checks, i.e. it ensures that tuples do not go away from under a check, without
 blocking when some other transaction that want to update the tuple without
 changing its key.

I feel like there is a naming problem here.  The semantics that have
always been associated with SELECT FOR UPDATE are now attached to
SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened.
 I think users will be surprised to find that SELECT FOR UPDATE
doesn't block all concurrent updates.

It seems to me that SELECT FOR KEY UPDATE should be called SELECT FOR
UPDATE, and what you're calling SELECT FOR UPDATE should be called
something else - essentially NONKEY UPDATE, though I don't much like
that name.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-06 Thread Alvaro Herrera


Excerpts from Robert Haas's message of mar mar 06 18:10:16 -0300 2012:
 
 Preliminary comment:
 
 This README is very helpful.

Thanks.  I feel silly that I didn't write it earlier.

 On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
  super-exclusive locking (used to delete tuples and more generally to update
  tuples modifying the values of the columns that make up the key of the 
  tuple);
  SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
  implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak 
  mode
  that does not conflict with exclusive mode, but conflicts with SELECT FOR 
  KEY
  UPDATE.  This last mode implements a mode just strong enough to implement RI
  checks, i.e. it ensures that tuples do not go away from under a check, 
  without
  blocking when some other transaction that want to update the tuple without
  changing its key.
 
 I feel like there is a naming problem here.  The semantics that have
 always been associated with SELECT FOR UPDATE are now attached to
 SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened.
  I think users will be surprised to find that SELECT FOR UPDATE
 doesn't block all concurrent updates.

I'm not sure why you say that.  Certainly SELECT FOR UPDATE continues to
block all updates.  It continues to block SELECT FOR SHARE as well.
The things that it doesn't block are the new SELECT FOR KEY SHARE locks;
since those didn't exist before, it doesn't seem correct to consider
that SELECT FOR UPDATE changed in any way.

The main difference in the UPDATE behavior is that an UPDATE is regarded
as though it might acquire two different lock modes -- it either
acquires SELECT FOR KEY UPDATE if the key is modified, or SELECT FOR
UPDATE if not.  Since SELECT FOR KEY UPDATE didn't exist before, we can
consider that previous to this patch, what UPDATE did was always acquire
a lock of strength SELECT FOR UPDATE.  So UPDATE also hasn't been
weakened.
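
To make the two-mode behaviour of UPDATE concrete, here is a minimal sketch. Rows are modelled as dictionaries and `update_lock_mode` is a hypothetical helper, not a function from the patch:

```python
def update_lock_mode(old_row, new_row, key_columns):
    """Return the tuple lock strength an UPDATE is regarded as acquiring."""
    key_changed = any(old_row[col] != new_row[col] for col in key_columns)
    # Changing any key column needs the strongest mode; otherwise the
    # update stays compatible with KEY SHARE locks taken by RI checks.
    return "KEY UPDATE" if key_changed else "UPDATE"
```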

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-06 Thread Simon Riggs

On Tue, Mar 6, 2012 at 9:10 PM, Robert Haas robertmh...@gmail.com wrote:
 Preliminary comment:

 This README is very helpful.

 On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
 super-exclusive locking (used to delete tuples and more generally to update
 tuples modifying the values of the columns that make up the key of the 
 tuple);
 SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
 implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak 
 mode
 that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY
 UPDATE.  This last mode implements a mode just strong enough to implement RI
 checks, i.e. it ensures that tuples do not go away from under a check, 
 without
 blocking when some other transaction that want to update the tuple without
 changing its key.

 I feel like there is a naming problem here.  The semantics that have
 always been associated with SELECT FOR UPDATE are now attached to
 SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened.
  I think users will be surprised to find that SELECT FOR UPDATE
 doesn't block all concurrent updates.

 It seems to me that SELECT FOR KEY UPDATE should be called SELECT FOR
 UPDATE, and what you're calling SELECT FOR UPDATE should be called
 something else - essentially NONKEY UPDATE, though I don't much like
 that name.

No, because that would stop it from doing what it is designed to do.

The lock modes are correct, appropriate and IMHO have meaningful
names. No redesign required here.

Not sure about the naming of some of the flag bits however.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-06 Thread Robert Haas

On Tue, Mar 6, 2012 at 4:27 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of mar mar 06 18:10:16 -0300 2012:

 Preliminary comment:

 This README is very helpful.

 Thanks.  I feel silly that I didn't write it earlier.

 On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
  super-exclusive locking (used to delete tuples and more generally to update
  tuples modifying the values of the columns that make up the key of the 
  tuple);
  SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
  implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak 
  mode
  that does not conflict with exclusive mode, but conflicts with SELECT FOR 
  KEY
  UPDATE.  This last mode implements a mode just strong enough to implement 
  RI
  checks, i.e. it ensures that tuples do not go away from under a check, 
  without
  blocking when some other transaction that want to update the tuple without
  changing its key.

 I feel like there is a naming problem here.  The semantics that have
 always been associated with SELECT FOR UPDATE are now attached to
 SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened.
  I think users will be surprised to find that SELECT FOR UPDATE
 doesn't block all concurrent updates.

 I'm not sure why you say that.  Certainly SELECT FOR UPDATE continues to
 block all updates.  It continues to block SELECT FOR SHARE as well.
 The things that it doesn't block are the new SELECT FOR KEY SHARE locks;
 since those didn't exist before, it doesn't seem correct to consider
 that SELECT FOR UPDATE changed in any way.

 The main difference in the UPDATE behavior is that an UPDATE is regarded
 as though it might acquire two different lock modes -- it either
 acquires SELECT FOR KEY UPDATE if the key is modified, or SELECT FOR
 UPDATE if not.  Since SELECT FOR KEY UPDATE didn't exist before, we can
 consider that previous to this patch, what UPDATE did was always acquire
 a lock of strength SELECT FOR UPDATE.  So UPDATE also hasn't been
 weakened.

Ah, I see.  My mistake.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-05 Thread Simon Riggs

On Mon, Feb 27, 2012 at 2:47 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Feb 23, 2012 at 11:01 AM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 This
 seems like a horrid mess that's going to be unsustainable both from a
 complexity and a performance standpoint.  The only reason multixacts
 were tolerable at all was that they had only one semantics.  Changing
 it so that maybe a multixact represents an actual updater and maybe
 it doesn't is not sane.

 As far as complexity, yeah, it's a lot more complex now -- no question
 about that.

 Regarding performance, the good thing about this patch is that if you
 have an operation that used to block, it might now not block.  So maybe
 multixact-related operation is a bit slower than before, but if it
 allows you to continue operating rather than sit waiting until some
 other transaction releases you, it's much better.

 That's probably true, although there is some deferred cost that is
 hard to account for.  You might not block immediately, but then later
 somebody might block either because the mxact SLRU now needs fsyncs or
 because they've got to decode an mxid long after the relevant segment
 has been evicted from the SLRU buffers.  In general, it's hard to
 bound that latter cost, because you only avoid blocking once (when the
 initial update happens) but you might pay the extra cost of decoding
 the mxid as many times as the row is read, which could be arbitrarily
 many.  How much of a problem that is in practice, I'm not completely
 sure, but it has worried me before and it still does.  In the worst
 case scenario, a handful of frequently-accessed rows with MXIDs all of
 whose members are dead except for the UPDATE they contain could result
 in continual SLRU cache-thrashing.

Cases I regularly see involve wait times of many seconds.

When this patch helps, it will help performance by algorithmic gains,
so perhaps x10-100.

That can and should be demonstrated though, I agree.

 From a performance standpoint, we really need to think not only about
 the cases where the patch wins, but also, and maybe more importantly,
 the cases where it loses.  There are some cases where the current
 mechanism, use SHARE locks for foreign keys, is adequate.  In
 particular, it's adequate whenever the parent table is not updated at
 all, or only very lightly.  I believe that those people will pay
 somewhat more with this patch, and especially in any case where
 backends end up waiting for fsyncs in order to create new mxids, but
 also just because I think this patch will have the effect of
 increasing the space consumed by each individual mxid, which imposes a
 distributed cost of its own.

That is a concern also.

It's taken me a while reviewing the patch to realise that space usage
is actually 4 times worse than before.

 I think we should avoid having a theoretical argument about how
 serious these problems are; instead, you should try to construct
 somewhat-realistic worst case scenarios and benchmark them.  Tom's
 complaint about code complexity is basically a question of opinion, so
 I don't know how to evaluate that objectively, but performance is
 something we can measure.  We might still disagree on the
 interpretation of the results, but I still think having some real
 numbers to talk about based on carefully-thought-out test cases would
 advance the debate.

It's a shame that the isolation tester can't be used directly by
pgbench - I think we need something similar for performance regression
testing.

So yes, performance testing is required.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-05 Thread Alvaro Herrera


Excerpts from Simon Riggs's message of Mon Mar 05 15:28:59 -0300 2012:
 
 On Mon, Feb 27, 2012 at 2:47 AM, Robert Haas robertmh...@gmail.com wrote:

  From a performance standpoint, we really need to think not only about
  the cases where the patch wins, but also, and maybe more importantly,
  the cases where it loses.  There are some cases where the current
  mechanism, use SHARE locks for foreign keys, is adequate.  In
  particular, it's adequate whenever the parent table is not updated at
  all, or only very lightly.  I believe that those people will pay
  somewhat more with this patch, and especially in any case where
  backends end up waiting for fsyncs in order to create new mxids, but
  also just because I think this patch will have the effect of
  increasing the space consumed by each individual mxid, which imposes a
  distributed cost of its own.
 
 That is a concern also.
 
 It's taken me a while reviewing the patch to realise that space usage
 is actually 4 times worse than before.

Eh.  You're probably misreading something.  Previously each member of a
multixact used 4 bytes (the size of an Xid).  With the current patch a
member uses 5 bytes (same plus a flags byte).  An earlier version used
4.25 bytes per multi, which I increased to leave space for future
expansion.

So it's 1.25x worse, not 4x worse.
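
To make the arithmetic concrete, here is a minimal sketch in Python. The
byte counts are the ones quoted above; the helper name is hypothetical, and
it assumes the 4.25 figure for the earlier version is also a per-member cost.

```python
# Per-member storage cost of a multixact, using the figures quoted above.
OLD_BYTES_PER_MEMBER = 4        # one Xid
EARLIER_PATCH_BYTES = 4.25      # assumption: per-member cost in the earlier version
NEW_BYTES_PER_MEMBER = 5        # Xid plus one flags byte


def members_bytes(n_members, bytes_per_member):
    """Bytes needed to store the members of one multixact (hypothetical helper)."""
    return n_members * bytes_per_member


def growth_factor(new, old=OLD_BYTES_PER_MEMBER):
    """Ratio of new to old per-member cost; independent of member count."""
    return new / old


for n in (2, 10, 100):
    old = members_bytes(n, OLD_BYTES_PER_MEMBER)
    new = members_bytes(n, NEW_BYTES_PER_MEMBER)
    print(n, old, new, new / old)
```

The ratio is the same for any number of members, which is why the overhead
is a flat 1.25x rather than something that grows with the multixact size.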

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-05 Thread Simon Riggs

On Mon, Mar 5, 2012 at 6:37 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Simon Riggs's message of Mon Mar 05 15:28:59 -0300 2012:

 On Mon, Feb 27, 2012 at 2:47 AM, Robert Haas robertmh...@gmail.com wrote:

  From a performance standpoint, we really need to think not only about
  the cases where the patch wins, but also, and maybe more importantly,
  the cases where it loses.  There are some cases where the current
  mechanism, use SHARE locks for foreign keys, is adequate.  In
  particular, it's adequate whenever the parent table is not updated at
  all, or only very lightly.  I believe that those people will pay
  somewhat more with this patch, and especially in any case where
  backends end up waiting for fsyncs in order to create new mxids, but
  also just because I think this patch will have the effect of
  increasing the space consumed by each individual mxid, which imposes a
  distributed cost of its own.

 That is a concern also.

 It's taken me a while reviewing the patch to realise that space usage
 is actually 4 times worse than before.

 Eh.  You're probably misreading something.  Previously each member of a
 multixact used 4 bytes (the size of an Xid).  With the current patch a
 member uses 5 bytes (same plus a flags byte).  An earlier version used
 4.25 bytes per multi, which I increased to leave space for future
 expansion.

 So it's 1.25x worse, not 4x worse.

Thanks for correcting me. That sounds better.

It does, however, illustrate my next review comment, which is that the
comments and README items are sorely lacking here. It's quite hard to
see how it works, let alone comment on major design decisions. It
would help me and others immensely if we could improve that.

Is there a working copy on a git repo? Easier than waiting for next
versions of a patch.

My other comments so far are

* some permutations commented out - no comments as to why
Something of a fault with the isolation tester that it just shows
output, there's no way to record expected output in the spec

Comments required for these points

* Why do we need multixact to be persistent? Do we need every page of
multixact to be persistent, or just particular pages in certain
circumstances?

* Why do we need to expand multixact with flags? Can we avoid that in
some cases?

* Why do we need to store just single xids in multixact members?
Didn't understand comments, no explanation

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-05 Thread Alvaro Herrera


Excerpts from Simon Riggs's message of Mon Mar 05 16:34:10 -0300 2012:
 On Mon, Mar 5, 2012 at 6:37 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:

 It does, however, illustrate my next review comment, which is that the
 comments and README items are sorely lacking here. It's quite hard to
 see how it works, let alone comment on major design decisions. It
 would help me and others immensely if we could improve that.

Hm.  Okay.

 Is there a working copy on a git repo? Easier than waiting for next
 versions of a patch.

No, I don't have an external mirror of my local repo.

 My other comments so far are
 
 * some permutations commented out - no comments as to why
 Something of a fault with the isolation tester that it just shows
 output, there's no way to record expected output in the spec

The reason they are commented out is that they are invalid; that is,
they require running a command on a session that's blocked in the
previous command.  Obviously, that cannot happen in real life.

isolationtester now has support for detecting such conditions; if the
spec specifies running a command in a locked session, the permutation is
killed with an error message "invalid permutation" and just continues
with the next permutation.  It used to simply die, aborting the test.
Maybe we could just modify the specs so that all permutations are there
(this can be done by simply removing the permutation lines), and the
"invalid permutation" messages are part of the expected file.  Would
that be better?
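
As a rough illustration of what "invalid" means here, the sketch below is a
toy model in Python, not isolationtester itself. The step names, the single
exclusive row lock, and the classify() helper are all hypothetical; the
model only assumes that each session's steps run in order and that a
blocked session cannot run its next step.

```python
from itertools import permutations

# Hypothetical toy spec: step name -> (session, action, resource).
# "lock" takes an exclusive lock; "commit" releases the session's locks.
STEPS = {
    "s1_lock": ("s1", "lock", "row"),
    "s1_commit": ("s1", "commit", None),
    "s2_lock": ("s2", "lock", "row"),
    "s2_commit": ("s2", "commit", None),
}


def interleavings():
    """All orderings that keep each session's own steps in order."""
    for order in permutations(STEPS):
        if (order.index("s1_lock") < order.index("s1_commit")
                and order.index("s2_lock") < order.index("s2_commit")):
            yield order


def classify(order):
    """Return 'ok', or 'invalid permutation' if a blocked session is asked to run."""
    holders = {}   # resource -> session currently holding the lock
    waiting = {}   # blocked session -> resource it is waiting for
    for name in order:
        session, action, resource = STEPS[name]
        if session in waiting:
            return "invalid permutation"
        if action == "lock":
            owner = holders.get(resource)
            if owner is None or owner == session:
                holders[resource] = session
            else:
                waiting[session] = resource      # session blocks here
        else:  # commit: release locks and wake one waiter per resource
            for res in [r for r, s in holders.items() if s == session]:
                del holders[res]
                for waiter, wanted in list(waiting.items()):
                    if wanted == res:
                        holders[res] = waiter
                        del waiting[waiter]
                        break
    return "ok"


for order in interleavings():
    print(" ".join(order), "->", classify(order))
```

In this toy spec, the orderings where a session commits while it is still
waiting for the row lock are the ones that come out as invalid.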

 Comments required for these points
 
 * Why do we need multixact to be persistent? Do we need every page of
 multixact to be persistent, or just particular pages in certain
 circumstances?

Any page that contains at least one multi with an update as a member
must persist.  It's possible that some pages contain no update (and this
is even likely in some workloads, if updates are rare), but I'm not sure
it's worth complicating the code to cater for early removal of some
pages.

 * Why do we need to expand multixact with flags? Can we avoid that in
 some cases?

Did you read my blog post?
http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/
This explains the reason -- the point is that we need to distinguish the
lock strength acquired by each locker.

 * Why do we need to store just single xids in multixact members?
 Didn't understand comments, no explanation

This is just for SELECT FOR SHARE.  We don't have a hint bit to indicate
this tuple has a for-share lock, so we need to create a multi for it.
Since FOR SHARE is probably going to be very uncommon, this isn't likely
to be a problem.  We're mainly catering for users of SELECT FOR SHARE so
that it continues to work, i.e. maintain backwards compatibility.

(Maybe I misunderstood your question -- what I think you're asking is,
why are there some multixacts that have a single member?)

I'll try to come up with a good place to add some paragraphs about all
this.  Please let me know if answers here are unclear and/or you have
further questions.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-05 Thread Simon Riggs

On Mon, Mar 5, 2012 at 7:53 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

  My other comments so far are

  * some permutations commented out - no comments as to why
  Something of a fault with the isolation tester that it just shows
  output, there's no way to record expected output in the spec

 The reason they are commented out is that they are invalid; that is,
 they require running a command on a session that's blocked in the
 previous command. Obviously, that cannot happen in real life.

 isolationtester now has support for detecting such conditions; if the
 spec specifies running a command in a locked session, the permutation is
 killed with an error message "invalid permutation" and just continues
 with the next permutation. It used to simply die, aborting the test.
 Maybe we could just modify the specs so that all permutations are there
 (this can be done by simply removing the permutation lines), and the
 "invalid permutation" messages are part of the expected file. Would
 that be better?

It would be better to have an isolation tester mode that checks whether
a permutation really is invalid and, if not, reports that.

At the moment we can't say why you commented something out. There's no
comment or explanation, and we need something, otherwise 3 years from
now we'll be completely in the dark.

  Comments required for these points

  * Why do we need multixact to be persistent? Do we need every page of
  multixact to be persistent, or just particular pages in certain
  circumstances?

 Any page that contains at least one multi with an update as a member
 must persist. It's possible that some pages contain no update (and this
 is even likely in some workloads, if updates are rare), but I'm not sure
 it's worth complicating the code to cater for early removal of some
 pages.

If the multixact contains an xid and that is being persisted then you
need to set an LSN to ensure that a page write causes an XLogFlush()
before the multixact write. And you need to set do_fsync, no? Or
explain why not in comments...

I was really thinking we could skip the fsync of a page if we've not
persisted anything important on that page, since that was one of
Robert's performance points.

  * Why do we need to expand multixact with flags? Can we avoid that in
  some cases?

 Did you read my blog post?
 http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/
 This explains the reason -- the point is that we need to distinguish the
 lock strength acquired by each locker.

Thanks, I will, but it all belongs in a README please.

  * Why do we need to store just single xids in multixact members?
  Didn't understand comments, no explanation

 This is just for SELECT FOR SHARE. We don't have a hint bit to indicate
 this tuple has a for-share lock, so we need to create a multi for it.
 Since FOR SHARE is probably going to be very uncommon, this isn't likely
 to be a problem. We're mainly catering for users of SELECT FOR SHARE so
 that it continues to work, i.e. maintain backwards compatibility.

Good, thanks.

Are we actively recommending people use FOR KEY SHARE rather than FOR
SHARE, in explicit use?

 (Maybe I misunderstood your question -- what I think you're asking is,
 why are there some multixacts that have a single member?)

 I'll try to come up with a good place to add some paragraphs about all
 this. Please let me know if answers here are unclear and/or you have
 further questions.

Thanks

I think we need to define some test workloads to measure the
performance impact of this patch. We need to be certain that it has a
good impact in target cases, plus a known impact in other cases.

Suggest

* basic pgbench - no RI

* inserts into large table, RI checks to small table, no activity on small table

* large table: parent, large table: child
  20 child rows per parent, fk from child to parent
  updates of multiple children at same time
  low/medium/heavy locking

* large table: parent, large table: child
  20 child rows per parent, fk from child to parent
  updates of parent and child at same time
  low/medium/heavy locking
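
One way to pin the list above down as a run matrix is sketched below in
Python. The scenario identifiers and the test_matrix() helper are
hypothetical; only the scenarios, the 20 child rows per parent, and the
low/medium/heavy locking levels come from the list.

```python
# Hypothetical enumeration of the suggested benchmark runs.
CHILD_ROWS_PER_PARENT = 20
LOCKING_LEVELS = ("low", "medium", "heavy")

# (identifier, description, whether the locking level is varied)
BASE_SCENARIOS = [
    ("pgbench-basic", "basic pgbench, no RI", False),
    ("insert-large-ri-small",
     "inserts into large table, RI checks to idle small table", False),
    ("update-children",
     "updates of multiple children at the same time", True),
    ("update-parent-and-child",
     "updates of parent and child at the same time", True),
]


def test_matrix():
    """Expand the scenarios into one entry per benchmark run."""
    runs = []
    for name, description, varies_locking in BASE_SCENARIOS:
        levels = LOCKING_LEVELS if varies_locking else (None,)
        for level in levels:
            runs.append({
                "scenario": name,
                "description": description,
                "locking": level,
                "child_rows_per_parent":
                    CHILD_ROWS_PER_PARENT if varies_locking else None,
            })
    return runs


for run in test_matrix():
    print(run["scenario"], run["locking"])
```

Each run would then be measured with and without the patch, so that wins
and losses can be compared scenario by scenario.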

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-27 Thread Heikki Linnakangas

On 23.02.2012 18:01, Alvaro Herrera wrote:
 Excerpts from Tom Lane's message of Thu Feb 23 12:28:20 -0300 2012:
  Alvaro Herrera alvhe...@commandprompt.com writes:
   Sure. The problem is that we are allowing updated rows to be locked (and
   locked rows to be updated). This means that we need to store extended
   Xmax information in tuples that goes beyond mere locks, which is what we
   were doing previously -- they may now have locks and updates simultaneously.

   (In the previous code, a multixact never meant an update, it always
   signified only shared locks. After a crash, all backends that could
   have been holding locks must necessarily be gone, so the multixact info
   is not interesting and can be treated like the tuple is simply live.)

  Ugh. I had not been paying attention to what you were doing in this
  patch, and now that I read this I wish I had objected earlier.

 Uhm, yeah, a lot earlier -- I initially blogged about this in August
 last year:
 http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/

 and in several posts in pgsql-hackers.

  This seems like a horrid mess that's going to be unsustainable both from a
  complexity and a performance standpoint. The only reason multixacts
  were tolerable at all was that they had only one semantics. Changing
  it so that maybe a multixact represents an actual updater and maybe
  it doesn't is not sane.

 As far as complexity, yeah, it's a lot more complex now -- no question
 about that.

How about assigning a new, real, transaction id, to represent the group
of transaction ids. The new transaction id would be treated as a
subtransaction of the updater, and the xids of the lockers would be
stored in the multixact-members slru. That way the multixact structures
wouldn't need to survive a crash; you don't care about the shared
lockers after a crash, and the xid of the updater would be safely stored
as is in the xmax field.

That way you wouldn't need to handle multixact wraparound, because we
already handle xid wraparound, and you wouldn't need to make multixact
slrus crash-safe.

Not sure what the performance implications would be. You would use up
xids more quickly, which would require more frequent anti-wraparound
vacuuming. And if we just start using real xids as the key to
multixact-offsets slru, we would need to extend that a lot more often.
But I feel it would probably be acceptable.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-27 Thread Noah Misch

On Mon, Feb 27, 2012 at 02:13:32PM +0200, Heikki Linnakangas wrote:
 On 23.02.2012 18:01, Alvaro Herrera wrote:
 As far as complexity, yeah, it's a lot more complex now -- no question
 about that.

 How about assigning a new, real, transaction id, to represent the group  
 of transaction ids. The new transaction id would be treated as a  
 subtransaction of the updater, and the xids of the lockers would be  
 stored in the multixact-members slru. That way the multixact structures  
 wouldn't need to survive a crash; you don't care about the shared  
 lockers after a crash, and the xid of the updater would be safely stored  
 as is in the xmax field.

 That way you wouldn't need to handle multixact wraparound, because we  
 already handle xid wraparound, and you wouldn't need to make multixact  
 slrus crash-safe.

 Not sure what the performance implications would be. You would use up  
 xids more quickly, which would require more frequent anti-wraparound  
 vacuuming. And if we just start using real xids as the key to  
 multixact-offsets slru, we would need to extend that a lot more often.  
 But I feel it would probably be acceptable.

When a key locker arrives after the updater and creates this implicit
subtransaction of the updater, how might you arrange for the xid's clog status
to eventually get updated in accordance with the updater's outcome?

Thanks,
nm

Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-27 Thread Simon Riggs

On Tue, Feb 28, 2012 at 12:28 AM, Noah Misch n...@leadboat.com wrote:
 On Mon, Feb 27, 2012 at 02:13:32PM +0200, Heikki Linnakangas wrote:
 On 23.02.2012 18:01, Alvaro Herrera wrote:
 As far as complexity, yeah, it's a lot more complex now -- no question
 about that.

 How about assigning a new, real, transaction id, to represent the group
 of transaction ids. The new transaction id would be treated as a
 subtransaction of the updater, and the xids of the lockers would be
 stored in the multixact-members slru. That way the multixact structures
 wouldn't need to survive a crash; you don't care about the shared
 lockers after a crash, and the xid of the updater would be safely stored
 as is in the xmax field.

 That way you wouldn't need to handle multixact wraparound, because we
 already handle xid wraparound, and you wouldn't need to make multixact
 slrus crash-safe.

 Not sure what the performance implications would be. You would use up
 xids more quickly, which would require more frequent anti-wraparound
 vacuuming. And if we just start using real xids as the key to
 multixact-offsets slru, we would need to extend that a lot more often.
 But I feel it would probably be acceptable.

 When a key locker arrives after the updater and creates this implicit
 subtransaction of the updater, how might you arrange for the xid's clog status
 to eventually get updated in accordance with the updater's outcome?

Somewhat off-topic, but just seen another bad case of FK lock contention.

Thanks for working on this everybody.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-26 Thread Robert Haas

On Thu, Feb 23, 2012 at 11:01 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 This
 seems like a horrid mess that's going to be unsustainable both from a
 complexity and a performance standpoint.  The only reason multixacts
 were tolerable at all was that they had only one semantics.  Changing
 it so that maybe a multixact represents an actual updater and maybe
 it doesn't is not sane.

 As far as complexity, yeah, it's a lot more complex now -- no question
 about that.

 Regarding performance, the good thing about this patch is that if you
 have an operation that used to block, it might now not block.  So maybe
 multixact-related operation is a bit slower than before, but if it
 allows you to continue operating rather than sit waiting until some
 other transaction releases you, it's much better.

That's probably true, although there is some deferred cost that is
hard to account for.  You might not block immediately, but then later
somebody might block either because the mxact SLRU now needs fsyncs or
because they've got to decode an mxid long after the relevant segment
has been evicted from the SLRU buffers.  In general, it's hard to
bound that latter cost, because you only avoid blocking once (when the
initial update happens) but you might pay the extra cost of decoding
the mxid as many times as the row is read, which could be arbitrarily
many.  How much of a problem that is in practice, I'm not completely
sure, but it has worried me before and it still does.  In the worst
case scenario, a handful of frequently-accessed rows with MXIDs all of
whose members are dead except for the UPDATE they contain could result
in continual SLRU cache-thrashing.

From a performance standpoint, we really need to think not only about
the cases where the patch wins, but also, and maybe more importantly,
the cases where it loses.  There are some cases where the current
mechanism, use SHARE locks for foreign keys, is adequate.  In
particular, it's adequate whenever the parent table is not updated at
all, or only very lightly.  I believe that those people will pay
somewhat more with this patch, and especially in any case where
backends end up waiting for fsyncs in order to create new mxids, but
also just because I think this patch will have the effect of
increasing the space consumed by each individual mxid, which imposes a
distributed cost of its own.

I think we should avoid having a theoretical argument about how
serious these problems are; instead, you should try to construct
somewhat-realistic worst case scenarios and benchmark them.  Tom's
complaint about code complexity is basically a question of opinion, so
I don't know how to evaluate that objectively, but performance is
something we can measure.  We might still disagree on the
interpretation of the results, but I still think having some real
numbers to talk about based on carefully-thought-out test cases would
advance the debate.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-25 Thread Kevin Grittner

Vik Reykja vikrey...@gmail.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 One of the problems that Florian was trying to address is that
 people often have a need to enforce something with a lot of
 similarity to a foreign key, but with more subtle logic than
 declarative foreign keys support.  One example would be the case
 Robert has used in some presentations, where the manager column
 in each row in a project table must contain the id of a row in a
 person table *which has the project_manager boolean column set to
 TRUE*.  Short of using the new serializable transaction isolation
 level in all related transactions, hand-coding enforcement of
 this useful invariant through trigger code (or application code
 enforced through some framework) is very tricky.  The change to
 SELECT FOR UPDATE that Florian was working on would make it
 pretty straightforward.
 
 I'm not sure what Florian's patch does, but I've been trying to
 advocate syntax like the following for this exact scenario:
 
 foreign key (manager_id, true) references person (id, is_manager)
 
 Basically, allow us to use constants instead of field names as
 part of foreign keys.
 
Interesting.  IMV, a declarative approach like that is almost always
better than the alternatives, so something like this (possibly with
different syntax) would be another step in the right direction.  I
suspect that there will always be a few corner cases where the
business logic required is too esoteric to be handled by a
generalized declarative construct, so I think Florian's idea still
has merit -- especially if we want to ease the transition to
PostgreSQL for large shops using other products.
 
 I have no idea what the implementation aspect of this is,
 but I need the user aspect of it and don't know the best way to
 get it.
 
There are those in the community who make their livings by helping
people get the features they want.  If you have some money to fund
development, I would bet you could get this addressed -- it sure
sounds reasonable to me.
 
-Kevin

Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-24 Thread Jeroen Vermeulen


On 2012-02-23 22:12, Noah Misch wrote:

 That alone would not simplify the patch much.  INSERT/UPDATE/DELETE on the
 foreign side would still need to take some kind of tuple lock on the primary
 side to prevent primary-side DELETE.  You then still face the complicated case
 of a tuple that's both locked and updated (non-key/immutable columns only).
 Updates that change keys are relatively straightforward, following what we
 already do today.  It's the non-key updates that complicate things.

Ah, so there's the technical hitch.  From previous discussion I was
under the impression that:

1. Foreign-key updates only inherently conflict with _key_ updates on
the foreign side, and that non-key updates on the foreign side were just
caught in the locking cross-fire, so to speak.

And

2. The DELETE case was somehow trivially accounted for.  But, for
instance, there does not seem to be a lighter lock type that DELETE
conflicts with but UPDATE does not.  Bummer.

 By then, though, that change would share little or no code with the current
 patch.  It may have its own value, but it's not a means for carving a subset
 from the current patch.

No, to be clear, it was never meant to be.  Only a possible way to give
users a way out of foreign-key locks more quickly.  Not a way to get
some of the branch out to users more quickly.

At any rate, that seems to be moot then.  And to be honest, mechanisms
designed for more than one purpose rarely pan out.

Thanks for explaining!

Jeroen


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-24 Thread Vik Reykja

On Thu, Feb 23, 2012 at 19:44, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:

 One of the problems that Florian was trying to address is that
 people often have a need to enforce something with a lot of
 similarity to a foreign key, but with more subtle logic than
 declarative foreign keys support.  One example would be the case
 Robert has used in some presentations, where the manager column in
 each row in a project table must contain the id of a row in a person
 table *which has the project_manager boolean column set to TRUE*.
 Short of using the new serializable transaction isolation level in
 all related transactions, hand-coding enforcement of this useful
 invariant through trigger code (or application code enforced through
 some framework) is very tricky.  The change to SELECT FOR UPDATE
 that Florian was working on would make it pretty straightforward.


I'm not sure what Florian's patch does, but I've been trying to advocate
syntax like the following for this exact scenario:

foreign key (manager_id, true) references person (id, is_manager)

Basically, allow us to use constants instead of field names as part of
foreign keys.  I have no idea what the implementation aspect of this is,
but I need the user aspect of it and don't know the best way to get it.
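
Until something like that syntax exists, one commonly used approximation is
to carry the constant in an extra column pinned by a CHECK constraint and
include it in a composite foreign key. The sketch below demonstrates the
idea with Python's sqlite3 module, purely so that it runs standalone; it is
not PostgreSQL, the manager_flag column and helper names are hypothetical,
and the same shape (composite FK against a UNIQUE constraint) would need to
be checked against PostgreSQL separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE person (
    id         INTEGER PRIMARY KEY,
    is_manager BOOLEAN NOT NULL,
    UNIQUE (id, is_manager)            -- target of the composite foreign key
);
CREATE TABLE project (
    id           INTEGER PRIMARY KEY,
    manager_id   INTEGER NOT NULL,
    -- stands in for the constant "true" in the proposed syntax
    manager_flag BOOLEAN NOT NULL DEFAULT 1 CHECK (manager_flag = 1),
    FOREIGN KEY (manager_id, manager_flag) REFERENCES person (id, is_manager)
);
INSERT INTO person VALUES (1, 1), (2, 0);  -- person 1 is a manager, person 2 is not
""")


def try_insert_project(project_id, manager_id):
    """Insert a project; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO project (id, manager_id) VALUES (?, ?)",
                (project_id, manager_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def try_demote(person_id):
    """Clear is_manager; return False if a project still references the manager."""
    try:
        with conn:
            conn.execute(
                "UPDATE person SET is_manager = 0 WHERE id = ?", (person_id,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert_project(100, 1))  # manager: accepted
print(try_insert_project(101, 2))  # not a manager: rejected
print(try_demote(1))               # still manages project 100: rejected
```

The cost of this workaround is the extra column and the extra unique
constraint on the referenced table, which is the clutter the proposed
constant-in-key syntax would avoid.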

Re: [HACKERS] foreign key locks, 2nd attempt

On Wed, Feb 22, 2012 at 5:00 PM, Noah Misch n...@leadboat.com wrote:

 All in all, I think this is in pretty much final shape.  Only pg_upgrade
 bits are still missing.  If sharp eyes could give this a critical look
 and knuckle-cracking testers could give it a spin, that would be
 helpful.

 Lack of pg_upgrade support leaves this version incomplete, because that
 omission would constitute a blocker for beta 2.  This version changes as much
 code compared to the version I reviewed at the beginning of the CommitFest as
 that version changed overall.  In that light, it's time to close the books on
 this patch for the purpose of this CommitFest; I'm marking it Returned with
 Feedback.  Thanks for your efforts thus far.

With 90 files touched, this is a very large patch, and that alone
makes me wonder whether we should commit it. So I agree with Noah,
and compliment him on an excellent, detailed review.

However, review of such a large patch should not be simply pass or
fail. We should be looking back at the original problem and ask
ourselves whether some subset of the patch could solve a useful subset
of the problem. For me, that seems quite likely and this is very
definitely an important patch.

Even if we can't solve some part of the problem we can at least commit
some useful parts of infrastructure to allow later work to happen more
smoothly and quickly.

So please let's not focus on the 100%, let's focus on 80/20.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Jeroen Vermeulen


On 2012-02-23 10:18, Simon Riggs wrote:

 However, review of such a large patch should not be simply pass or
 fail. We should be looking back at the original problem and ask
 ourselves whether some subset of the patch could solve a useful subset
 of the problem. For me, that seems quite likely and this is very
 definitely an important patch.

 Even if we can't solve some part of the problem we can at least commit
 some useful parts of infrastructure to allow later work to happen more
 smoothly and quickly.

 So please let's not focus on the 100%, let's focus on 80/20.

The suggested immutable-column constraint was meant as a potential
80/20 workaround.  Definitely not a full solution, helpful to some,
probably easier to do.  I don't know if an immutable key would actually
be enough to elide foreign-key locks though.

Simon, I think you had a reason why it couldn't work, but I didn't quite
get your meaning and didn't want to distract things further at that
stage.  You wrote that it doesn't do what KEY LOCKS are designed to
do...  any chance you might recall what the problem was?

I don't mean to be pushy about my pet idea, and heaven knows I don't
have time to implement it, but it'd be good to know whether I should put
the whole thought to rest.

Jeroen

Re: [HACKERS] foreign key locks, 2nd attempt

On Sun, Dec 4, 2011 at 12:20 PM, Noah Misch n...@leadboat.com wrote:

 Making pg_multixact persistent across clean shutdowns is no bridge to cross
 lightly, since it means committing to an on-disk format for an indefinite
 period.  We should do it; the benefits of this patch justify it, and I haven't
 identified a way to avoid it without incurring worse problems.

I can't actually see anything in the patch that explains why this is
required. (That is something we should reject more patches on, since
it creates a higher maintenance burden).

Can someone explain? We might think of a way to avoid that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

Re: [HACKERS] foreign key locks, 2nd attempt

On Thu, Feb 23, 2012 at 1:08 PM, Jeroen Vermeulen j...@xs4all.nl wrote:

 Simon, I think you had a reason why it couldn't work, but I didn't quite get
 your meaning and didn't want to distract things further at that stage.  You
 wrote that it doesn't do what KEY LOCKS are designed to do...  any chance
 you might recall what the problem was?

The IMMUTABLE idea would work, but it requires all users to recode
their apps. By the time they've done that we'll have probably fixed
the problem in full anyway, so then we have to ask them to stop again,
which is hard, so we'll be stuck with a performance tweak that applies
to just one release. So it's the fully automatic solution we're looking
for. I don't object to someone implementing IMMUTABLE; I'm just saying
it's not a way to get this patch simpler and therefore acceptable.

If people are willing to recode apps to avoid this then hire me and
I'll tell you how ;-)

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] foreign key locks, 2nd attempt


Excerpts from Simon Riggs's message of Thu Feb 23 11:15:45 -0300 2012:
 On Sun, Dec 4, 2011 at 12:20 PM, Noah Misch n...@leadboat.com wrote:
 
  Making pg_multixact persistent across clean shutdowns is no bridge to cross
  lightly, since it means committing to an on-disk format for an indefinite
  period.  We should do it; the benefits of this patch justify it, and I 
  haven't
  identified a way to avoid it without incurring worse problems.
 
 I can't actually see anything in the patch that explains why this is
 required. (That is something we should reject more patches on, since
 it creates a higher maintenance burden).
 
 Can someone explain? We might think of a way to avoid that.

Sure.  The problem is that we are allowing updated rows to be locked (and
locked rows to be updated).  This means that we need to store extended
Xmax information in tuples that goes beyond mere locks, which is what we
were doing previously -- they may now have locks and updates simultaneously.

(In the previous code, a multixact never meant an update, it always
signified only shared locks.  After a crash, all backends that could
have been holding locks must necessarily be gone, so the multixact info
is not interesting and can be treated like the tuple is simply live.)

This means that this extended Xmax info needs to be able to survive, so
that it's possible to retrieve it after a crash; because even if the
lockers are all gone, the updater might have committed and this means
the tuple is dead.  If we failed to keep this, the tuple would be
considered live which would be wrong because the other version of the
tuple, which was created by the update, is also live.
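
As a rough illustration of the argument above, here is a minimal Python
model.  It is not code from the patch: Member, MemberStatus and
tuple_is_dead are hypothetical names, and the model assumes a multixact
is nothing more than a list of member transactions, each flagged as a
locker or an updater.

```python
from dataclasses import dataclass
from enum import Enum


class MemberStatus(Enum):
    LOCK = "lock"      # member only holds a row lock
    UPDATE = "update"  # member updated (or deleted) the row


@dataclass(frozen=True)
class Member:
    xid: int
    status: MemberStatus


def tuple_is_dead(members, committed_xids):
    """A tuple version is dead if some member updated it and committed.

    Lockers never kill a tuple; once they are gone their entries are moot.
    """
    return any(
        m.status is MemberStatus.UPDATE and m.xid in committed_xids
        for m in members
    )


# Old semantics: members are lockers only, so losing them in a crash is harmless.
old_style = [Member(100, MemberStatus.LOCK), Member(101, MemberStatus.LOCK)]

# New semantics: one member is an updater that committed before the crash.
new_style = [Member(100, MemberStatus.LOCK), Member(102, MemberStatus.UPDATE)]
committed = {102}
```

Treating new_style as an empty list after a crash would report the tuple
as live even though the version created by transaction 102 is also live,
which is the inconsistency described above.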

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] foreign key locks, 2nd attempt

On Thu, Feb 23, 2012 at 3:04 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Simon Riggs's message of Thu Feb 23 11:15:45 -0300 2012:
 On Sun, Dec 4, 2011 at 12:20 PM, Noah Misch n...@leadboat.com wrote:

  Making pg_multixact persistent across clean shutdowns is no bridge to cross
  lightly, since it means committing to an on-disk format for an indefinite
  period.  We should do it; the benefits of this patch justify it, and I 
  haven't
  identified a way to avoid it without incurring worse problems.

 I can't actually see anything in the patch that explains why this is
 required. (That is something we should reject more patches on, since
 it creates a higher maintenance burden).

 Can someone explain? We might think of a way to avoid that.

 Sure.  The problem is that we are allowing updated rows to be locked (and
 locked rows to be updated).  This means that we need to store extended
 Xmax information in tuples that goes beyond mere locks, which is what we
 were doing previously -- they may now have locks and updates simultaneously.

 (In the previous code, a multixact never meant an update, it always
 signified only shared locks.  After a crash, all backends that could
 have been holding locks must necessarily be gone, so the multixact info
 is not interesting and can be treated like the tuple is simply live.)

 This means that this extended Xmax info needs to be able to survive, so
 that it's possible to retrieve it after a crash; because even if the
 lockers are all gone, the updater might have committed and this means
 the tuple is dead.  If we failed to keep this, the tuple would be
 considered live which would be wrong because the other version of the
 tuple, which was created by the update, is also live.

OK, thanks.

So why do we need pg_upgrade support?

If pg_multixact is not persistent now, surely there is no requirement
to have pg_upgrade do any form of upgrade? The only time we'll need to
do this is from 9.2 to 9.3, which can of course occur any time in next
year. That doesn't sound like a reason to block a patch now, because
of something that will be needed a year from now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Tom Lane

Alvaro Herrera alvhe...@commandprompt.com writes:
 Sure.  The problem is that we are allowing updated rows to be locked (and
 locked rows to be updated).  This means that we need to store extended
 Xmax information in tuples that goes beyond mere locks, which is what we
 were doing previously -- they may now have locks and updates simultaneously.

 (In the previous code, a multixact never meant an update, it always
 signified only shared locks.  After a crash, all backends that could
 have been holding locks must necessarily be gone, so the multixact info
 is not interesting and can be treated like the tuple is simply live.)

Ugh.  I had not been paying attention to what you were doing in this
patch, and now that I read this I wish I had objected earlier.  This
seems like a horrid mess that's going to be unsustainable both from a
complexity and a performance standpoint.  The only reason multixacts
were tolerable at all was that they had only one semantics.  Changing
it so that maybe a multixact represents an actual updater and maybe
it doesn't is not sane.

regards, tom lane


Re: [HACKERS] foreign key locks, 2nd attempt


Excerpts from Simon Riggs's message of Thu Feb 23 06:18:57 -0300 2012:
 
 On Wed, Feb 22, 2012 at 5:00 PM, Noah Misch n...@leadboat.com wrote:
 
  All in all, I think this is in pretty much final shape.  Only pg_upgrade
  bits are still missing.  If sharp eyes could give this a critical look
  and knuckle-cracking testers could give it a spin, that would be
  helpful.
 
  Lack of pg_upgrade support leaves this version incomplete, because that
  omission would constitute a blocker for beta 2.  This version changes as 
  much
  code compared to the version I reviewed at the beginning of the CommitFest 
  as
  that version changed overall.  In that light, it's time to close the books 
  on
  this patch for the purpose of this CommitFest; I'm marking it Returned with
  Feedback.  Thanks for your efforts thus far.

Now this is an interesting turn of events.  I must thank you for your
extensive review effort in the current version of the patch, and also
thank you and credit you for the idea that initially kicked this patch
from the older, smaller, simpler version I wrote during the 9.1 timeline
(which you also reviewed exhaustively).  Without your and Simon's
brilliant ideas, this patch wouldn't exist at all.

I completely understand that you don't want to review this latest
version of the patch; it's a lot of effort and I wouldn't inflict it on
anybody who hasn't volunteered.  However, it doesn't seem to me that
this is reason to boot the patch from the commitfest.  I think the thing
to do would be to remove yourself from the reviewers column and set it
back to "Needs Review", so that other reviewers can pick it up.

As for the late code churn, it mostly happened as a result of your
own feedback; I would have left most of it in the original state, but as
I went ahead it seemed much better to refactor things.  This is mostly
in heapam.c.  As for multixact.c, it also had a lot of churn, but that
was mostly to restore it to the state it has in the master branch,
dropping much of the code I had written to handle multixact truncation.
The new code there and in the vacuum code path (relminmxid and so on) is
a lot smaller than that other code was, and it's closely based on
relfrozenxid which is a known piece of technology.

 My view would be that with 90 files touched this is a very large
 patch, so that alone makes me wonder whether we should commit this
 patch, so I agree with Noah and compliment him on an excellent
 detailed review.

I note, however, that the bulk of the patch is in three files --
multixact.c, tqual.c, heapam.c, as is clearly illustrated in the diff
stats I posted.  The rest of them are touched mostly to follow their new
APIs (and of course to add tests and docs).

To summarize, of 94 files touched in total:
* 22 files are in src/test/isolation/
  (new and updated tests and expected files)
* 19 files are in src/include/
* 10 files are in contrib/
* 39 files are in src/backend;
  * in that subdir, there are 3097 insertions and 1006 deletions
  * 3047 (83%) of which are in heapam.c, multixact.c, tqual.c
  * one is a README

 However, review of such a large patch should not be simply pass or
 fail. We should be looking back at the original problem and ask
 ourselves whether some subset of the patch could solve a useful subset
 of the problem. For me, that seems quite likely and this is very
 definitely an important patch.
 
 Even if we can't solve some part of the problem we can at least commit
 some useful parts of infrastructure to allow later work to happen more
 smoothly and quickly.
 
 So please let's not focus on the 100%, let's focus on 80/20.

Well, we have the patch I originally posted in the 9.1 timeframe.
That's a lot smaller and simpler.  However, that solves only part of the
blocking problem, and in particular it doesn't fix the initial deadlock
reports from Joel Jacobson at Glue Finance (now renamed Trustly, in case
you wonder about his change of email address) that started this effort
in the first place.  I don't think we can cut down to that and still
satisfy the users that requested this; and Glue was just the first one,
because after I started blogging about this, some more people started
asking for it.

I don't think there's any useful middlepoint between that one and the
current one, but maybe I'm wrong.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] foreign key locks, 2nd attempt


Excerpts from Simon Riggs's message of Thu Feb 23 12:12:13 -0300 2012:
 On Thu, Feb 23, 2012 at 3:04 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:

  Sure.  The problem is that we are allowing updated rows to be locked (and
  locked rows to be updated).  This means that we need to store extended
  Xmax information in tuples that goes beyond mere locks, which is what we
  were doing previously -- they may now have locks and updates simultaneously.

 OK, thanks.
 
 So why do we need pg_upgrade support?

Two reasons.  One is that in upgrades from a version that contains this
patch to another version that also contains this patch (i.e. future
upgrades), we need to copy the multixact files from the old cluster to
the new.

The other is that in upgrades from a version that doesn't contain this
patch to a version that does, we need to set the multixact limit values
so that values that were used in the old cluster are returned as empty
values (keeping the old semantics); otherwise they would cause errors
trying to read the member Xids from disk.
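
The second case can be pictured with a small Python model.  This is a
sketch under assumptions, not the pg_upgrade code: oldest_valid_mxid
stands in for the limit value, member_storage for the on-disk member
files, and wraparound of multixact ids is ignored for simplicity.

```python
def get_members(mxid, oldest_valid_mxid, member_storage):
    """Return the member xids of a multixact.

    Ids below the limit come from the pre-upgrade cluster, whose member
    data was never meant to be read back; they are reported as having no
    members, which matches the old "lockers only, all gone" semantics.
    NOTE: real multixact ids wrap around; this sketch ignores that.
    """
    if mxid < oldest_valid_mxid:
        return []
    if mxid not in member_storage:
        # Without the limit, an old id would end up here.
        raise LookupError(f"no member data on disk for multixact {mxid}")
    return member_storage[mxid]


# Hypothetical state after upgrading: ids below 500 were used by the old cluster.
storage = {500: [9001, 9002]}
```

With the limit set to 500, looking up an old id such as 42 yields an
empty member list; with the limit left at 0, the same lookup fails,
which is the error the limit values are meant to prevent.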

 If pg_multixact is not persistent now, surely there is no requirement
 to have pg_upgrade do any form of upgrade? The only time we'll need to
 do this is from 9.2 to 9.3, which can of course occur any time in next
 year. That doesn't sound like a reason to block a patch now, because
 of something that will be needed a year from now.

I think there's a policy that we must allow upgrades from one beta to
the next, which is why Noah says this is a blocker starting from beta2.

The pg_upgrade code for this is rather simple however.  There's no
rocket science there.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] foreign key locks, 2nd attempt

Excerpts from Tom Lane's message of Thu Feb 23 12:28:20 -0300 2012:

 Alvaro Herrera alvhe...@commandprompt.com writes:
  Sure. The problem is that we are allowing updated rows to be locked (and
  locked rows to be updated). This means that we need to store extended
  Xmax information in tuples that goes beyond mere locks, which is what we
  were doing previously -- they may now have locks and updates simultaneously.

 Ugh. I had not been paying attention to what you were doing in this
 patch, and now that I read this I wish I had objected earlier.

Uhm, yeah, a lot earlier -- I initially blogged about this in August
last year:
http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/

and in several posts in pgsql-hackers.

As far as complexity, yeah, it's a lot more complex now -- no question
about that.

Regarding performance, the good thing about this patch is that if you
have an operation that used to block, it might now not block. So maybe
multixact-related operation is a bit slower than before, but if it
allows you to continue operating rather than sit waiting until some
other transaction releases you, it's much better.

As for sanity -- I regard multixacts as a way to store extended Xmax
information. The original idea was obviously much more limited in that
the extended info was just locking info. We've extended it but I don't
think it's such a stretch.

I have been posting about (most? all of?) the ideas that I've been
following to make this work at all, so that people had plenty of chances
to disagree with them -- and Noah and others did disagree with many of
them, so I changed the patch accordingly. I'm not closed to further
rework, but I'm not going to entirely abandon the idea too lightly.

I'm sure there are bugs too, but hopefully they are as shallow as
there are interested reviewer eyeballs.

--
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Kevin Grittner

Alvaro Herrera alvhe...@commandprompt.com wrote:
 
 As for sanity -- I regard multixacts as a way to store extended
 Xmax information.  The original idea was obviously much more
 limited in that the extended info was just locking info.  We've
 extended it but I don't think it's such a stretch.
 
Since the limitation on what can be stored in xmax was the killer
for Florian's attempt to support SELECT FOR UPDATE in a form which
was arguably more useful (and certainly more convenient for those
converting from other database products), I'm wondering whether
anyone has determined whether this new scheme would allow Florian's
work to be successfully completed.  The issues seem very similar. 
If this approach also provides a basis for the other work, I think
it helps bolster the argument that this is a good design; if not, I
think it suggests that maybe it should be made more general or
extensible in some way.  Once this has to be supported by pg_upgrade
it will be harder to change the format, if that is needed for some
other feature.
 
-Kevin


Re: [HACKERS] foreign key locks, 2nd attempt


Excerpts from Kevin Grittner's message of Thu Feb 23 13:31:36 -0300 2012:
 
 Alvaro Herrera alvhe...@commandprompt.com wrote:
  
  As for sanity -- I regard multixacts as a way to store extended
  Xmax information.  The original idea was obviously much more
  limited in that the extended info was just locking info.  We've
  extended it but I don't think it's such a stretch.
  
 Since the limitation on what can be stored in xmax was the killer
 for Florian's attempt to support SELECT FOR UPDATE in a form which
 was arguably more useful (and certainly more convenient for those
 converting from other database products), I'm wondering whether
 anyone has determined whether this new scheme would allow Florian's
 work to be successfully completed.  The issues seem very similar. 
 If this approach also provides a basis for the other work, I think
 it helps bolster the argument that this is a good design; if not, I
 think it suggests that maybe it should be made more general or
 extensible in some way.  Once this has to be supported by pg_upgrade
 it will be harder to change the format, if that is needed for some
 other feature.

I have no idea what improvements Florian was seeking, but multixacts now
have plenty of bit flag space to indicate whatever we want for each
member transaction, so most likely the answer is yes.  However we need
to make clear that a single SELECT FOR UPDATE in a tuple does not
currently use a multixact; if we wish to always store flags then we are
forced to use one which incurs a performance hit.
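
To make the trade-off concrete, here is a Python sketch with
hypothetical names (choose_xmax, allocate_multixact).  It is an
assumption-laden model of the rule described above, not the heapam.c
logic; the multixacts dict merely stands in for pg_multixact.

```python
from itertools import count

_next_mxid = count(1)
multixacts = {}  # mxid -> list of (xid, flags); stands in for pg_multixact


def allocate_multixact(members):
    """Record a member list and hand back its id (this is the extra cost)."""
    mxid = next(_next_mxid)
    multixacts[mxid] = list(members)
    return mxid


def choose_xmax(lockers, always_store_flags=False):
    """Decide what goes into a tuple's xmax.

    lockers: list of (xid, flags) pairs currently interested in the row.
    A lone locker's plain xid fits directly in xmax; per-member flags,
    or more than one member, require a multixact.
    """
    if not lockers:
        raise ValueError("at least one locker is required")
    if len(lockers) == 1 and not always_store_flags:
        xid, _flags = lockers[0]
        return ("xid", xid)
    return ("multixact", allocate_multixact(lockers))
```

A single FOR UPDATE locker takes the cheap path and allocates nothing;
passing always_store_flags=True forces even that case through
allocate_multixact, which is where the performance hit would come from.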

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Greg Smith


On 02/23/2012 10:43 AM, Alvaro Herrera wrote:

 I completely understand that you don't want to review this latest
 version of the patch; it's a lot of effort and I wouldn't inflict it on
 anybody who hasn't volunteered.  However, it doesn't seem to me that
 this is reason to boot the patch from the commitfest.  I think the thing
 to do would be to remove yourself from the reviewers column and set it
 back to needs review, so that other reviewers can pick it up.


This feature made Robert's list of serious CF concerns, too, and the 
idea that majorly revised patches might be punted isn't a new one.  Noah 
is certainly justified in saying you're off his community support list, 
after all the review work he's been doing for this CF.


We here think it would be a shame for all of these other performance 
bits to be sorted but still have this one loose though, if it's possible 
to keep going on it.  It's well known as something on Simon's peeve list 
for some time now.  Just yesterday I was reading someone else ranting 
about how this foreign key locking issue proves Postgres isn't 
enterprise scale; it was part of an article proving why DB2 is worth 
paying for, I think.  This change crosses over into the advocacy area 
due to that, albeit only for the people who have been burned by this already.


If the main problem is pg_upgrade complexity, eventually progress on 
that front needs to be made.  I'm surprised the project has survived 
this long without needing anything beyond catalog conversion for 
in-place upgrade.  That luck won't hold forever.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


Re: [HACKERS] foreign key locks, 2nd attempt


Excerpts from Greg Smith's message of Thu Feb 23 14:48:13 -0300 2012:
 On 02/23/2012 10:43 AM, Alvaro Herrera wrote:
  I completely understand that you don't want to review this latest
  version of the patch; it's a lot of effort and I wouldn't inflict it on
  anybody who hasn't volunteered.  However, it doesn't seem to me that
  this is reason to boot the patch from the commitfest.  I think the thing
  to do would be to remove yourself from the reviewers column and set it
  back to needs review, so that other reviewers can pick it up.
 
 This feature made Robert's list of serious CF concerns, too, and the 
 idea that majorly revised patches might be punted isn't a new one.

Well, this patch (or rather, a previous incarnation of it) got punted
from 9.1's fourth commitfest; I intended to have the new version in
9.2's first CF, but business reasons (which I will not discuss in
public) forced me otherwise.  So here we are again -- as I said to Tom,
I don't intend to let go of this one easily, though of course I will
concede to whatever the community decides.

 Noah 
 is certainly justified in saying you're off his community support list, 
 after all the review work he's been doing for this CF.

Yeah, I can't blame him.  I've been trying to focus most of my review
availability on his own patches precisely due to that, but it's very
clear to me that his effort is larger than mine.

 We here think it would be a shame for all of these other performance 
 bits to be sorted but still have this one loose though, if it's possible 
 to keep going on it.  It's well known as something on Simon's peeve list 
 for some time now.  I was just reading someone else ranting about how 
 this foreign key locking issue proves Postgres isn't enterprise scale 
 yesterday, it was part of an article proving why DB2 is worth paying for 
 I think.  This change crosses over into the advocacy area due to that, 
 albeit only for the people who have been burned by this already.

Yeah, Simon's been on this particular issue for quite some time -- which
is probably why the initial idea that kickstarted this patch was his.
Personally I've been in the "not enterprise strength" camp for a long
time, mostly unintentionally; you can see that by tracing how my major
patches close holes in that kind of area (cluster loses indexes, we
don't have subtransactions, foreign key concurrency sucks (-- SELECT
FOR SHARE), manual vacuum is teh sux0r, and now this one about FKs
again).

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Greg Smith


On 02/23/2012 01:04 PM, Alvaro Herrera wrote:

 manual vacuum is teh sux0r


I think you've just named my next conference talk submission.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD


Re: [HACKERS] foreign key locks, 2nd attempt

On Thu, Feb 23, 2012 at 4:01 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 As far as complexity, yeah, it's a lot more complex now -- no question
 about that.

As far as complexity goes, would it be easier if we treated the UPDATE
of a primary key column as a DELETE plus an INSERT?

There's not really a logical reason why updating a primary key has
meaning, so allowing an ExecPlanQual to follow the chain across
primary key values doesn't seem valid to me.

That would make all primary keys IMMUTABLE to updates.

No primary key, no problem.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Kevin Grittner

Alvaro Herrera alvhe...@commandprompt.com wrote:
 Excerpts from Kevin Grittner's message:
 
 Since the limitation on what can be stored in xmax was the killer
 for Florian's attempt to support SELECT FOR UPDATE in a form
 which was arguably more useful (and certainly more convenient for
 those converting from other database products), I'm wondering
 whether anyone has determined whether this new scheme would allow
 Florian's work to be successfully completed.  The issues seem
 very similar. If this approach also provides a basis for the
 other work, I think it helps bolster the argument that this is a
 good design; if not, I think it suggests that maybe it should be
 made more general or extensible in some way.  Once this has to be
 supported by pg_upgrade it will be harder to change the format,
 if that is needed for some other feature.
 
 I have no idea what improvements Florian was seeking, but
 multixacts now have plenty of bit flag space to indicate whatever
 we want for each member transaction, so most likely the answer is
 yes.  However we need to make clear that a single SELECT FOR
 UPDATE in a tuple does not currently use a multixact; if we wish
 to always store flags then we are forced to use one which incurs a
 performance hit.
 
Well, his effort really started to go into a tailspin on the related
issues here:
 
http://archives.postgresql.org/pgsql-hackers/2010-12/msg01743.php
 
... with a summary of the problem and possible directions for a
solution here:
 
http://archives.postgresql.org/pgsql-hackers/2010-12/msg01833.php
 
One of the problems that Florian was trying to address is that
people often have a need to enforce something with a lot of
similarity to a foreign key, but with more subtle logic than
declarative foreign keys support.  One example would be the case
Robert has used in some presentations, where the manager column in
each row in a project table must contain the id of a row in a person
table *which has the project_manager boolean column set to TRUE*. 
Short of using the new serializable transaction isolation level in
all related transactions, hand-coding enforcement of this useful
invariant through trigger code (or application code enforced through
some framework) is very tricky.  The change to SELECT FOR UPDATE
that Florian was working on would make it pretty straightforward.
 
-Kevin


Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Noah Misch

On Thu, Feb 23, 2012 at 02:08:28PM +0100, Jeroen Vermeulen wrote:
 On 2012-02-23 10:18, Simon Riggs wrote:

 However, review of such a large patch should not be simply pass or
 fail. We should be looking back at the original problem and ask
 ourselves whether some subset of the patch could solve a useful subset
 of the problem. For me, that seems quite likely and this is very
 definitely an important patch.

 Even if we can't solve some part of the problem we can at least commit
 some useful parts of infrastructure to allow later work to happen more
 smoothly and quickly.

 So please let's not focus on the 100%, let's focus on 80/20.

 The suggested immutable-column constraint was meant as a potential  
 80/20 workaround.  Definitely not a full solution, helpful to some,  
 probably easier to do.  I don't know if an immutable key would actually  
 be enough to elide foreign-key locks though.

That alone would not simplify the patch much.  INSERT/UPDATE/DELETE on the
foreign side would still need to take some kind of tuple lock on the primary
side to prevent primary-side DELETE.  You then still face the complicated case
of a tuple that's both locked and updated (non-key/immutable columns only).
Updates that change keys are relatively straightforward, following what we
already do today.  It's the non-key updates that complicate things.

If you had both an immutable column constraint and a "never-deleted" table
constraint, that combination would be sufficient to simplify the picture.
(Directly or indirectly, it would not actually be a "never-deleted" constraint
so much as a "you must take AccessExclusiveLock to DELETE" constraint.)
Foreign-side DML would then take an AccessShareLock on the parent table with
no tuple lock at all.
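
The reasoning in the two paragraphs above reduces to a small truth
table.  The Python sketch below uses hypothetical names
(remaining_threats, needs_parent_tuple_lock) and only restates that
argument; it says nothing about how either constraint would be
implemented.

```python
def remaining_threats(immutable_key, never_deleted):
    """Primary-side changes that foreign-side DML must still guard against."""
    threats = set()
    if not immutable_key:
        threats.add("key update")
    if not never_deleted:
        threats.add("delete")
    return threats


def needs_parent_tuple_lock(immutable_key, never_deleted):
    """A tuple lock on the primary side is needed while any threat remains."""
    return bool(remaining_threats(immutable_key, never_deleted))
```

An immutable key alone still leaves "delete" in the set, so the tuple
lock stays; only the combination of both constraints empties it.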

By then, though, that change would share little or no code with the current
patch.  It may have its own value, but it's not a means for carving a subset
from the current patch.

Thanks,
nm


Re: [HACKERS] foreign key locks, 2nd attempt