Re: [HACKERS] patch: avoid heavyweight locking on hash metapage

2012-06-26 Thread Robert Haas
On Mon, Jun 25, 2012 at 11:05 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Mon, Jun 18, 2012 at 5:42 PM, Robert Haas robertmh...@gmail.com wrote:

 Hmm.  That was actually a gloss I added on existing code to try to
 convince myself that it was safe; I don't think that the changes I
 made make that any more or less safe than it was before.

 Right, sorry.  I thought there was some strength reduction going on
 there as well.

 Thanks for the various explanations; they address my concerns.  I see
 that v2 applies over v1.

 I've verified performance improvements using 8 cores with my proposed
 pgbench -P benchmark, with a scale that fits in shared_buffers.
 It brings hash performance most of the way, though not quite all the
 way, up to btree performance.


 I've marked this as ready for committer.

Thanks for the review; committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] patch: avoid heavyweight locking on hash metapage

2012-06-25 Thread Jeff Janes
On Mon, Jun 18, 2012 at 5:42 PM, Robert Haas robertmh...@gmail.com wrote:

 Hmm.  That was actually a gloss I added on existing code to try to
 convince myself that it was safe; I don't think that the changes I
 made make that any more or less safe than it was before.

Right, sorry.  I thought there was some strength reduction going on
there as well.

Thanks for the various explanations; they address my concerns.  I see
that v2 applies over v1.

I've verified performance improvements using 8 cores with my proposed
pgbench -P benchmark, with a scale that fits in shared_buffers.
It brings hash performance most of the way, though not quite all the
way, up to btree performance.


I've marked this as ready for committer.

Cheers,

Jeff


Re: [HACKERS] patch: avoid heavyweight locking on hash metapage

2012-06-18 Thread Robert Haas
On Fri, Jun 15, 2012 at 1:58 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Do we need the retry flag (applies to two places)?  If oldblkno is
 still InvalidBlockNumber then it can't equal blkno.

I was a bit concerned that blkno = BUCKET_TO_BLKNO(metap, bucket)
might set blkno to InvalidBlockNumber, thus possibly messing up the
algorithm.  This way seemed better in terms of not requiring any
assumption in that area.
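
For reference, the macro in question looks approximately like this
(from src/include/access/hash.h of that era; treat the exact text as a
sketch):

    #define BUCKET_TO_BLKNO(metap,B) \
        ((BlockNumber) ((B) + ((B) ? (metap)->hashm_spares[_hash_log2((B)+1)-1] : 0)) + 1)

Whether its arithmetic can ever land on InvalidBlockNumber is exactly
the assumption I didn't want to have to make.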

 In the README, the pseudocode probably needs to mention re-locking
 the meta page in the loop.

Good point.  Fixed.

 Also, "page" is used to mean either the disk page or the shared buffer
 currently holding that page, depending on context.  This is confusing.
 Maybe we should say "Lock the meta page buffer".  Of course this
 gripe precedes your patch and applies to other parts of the code as
 well, but since we mingle LWLocks (on buffers) and heavy locks (on
 names of disk pages) it might make sense to be more meticulous here.

I attempted to improve this somewhat in the README, but I think it may
be more than I can do completely.

 "exclusive-lock page 0 (assert the right to begin a split)" is no
 longer true, nor is "release X-lock on page 0".

I think this is fixed.

 Also in the README, the section "To prevent deadlock we enforce these
 coding rules:" would need to be changed, as those rules are being
 changed.  But should we change them at all?

 In _hash_expandtable, the claim "but since we are only trylocking here
 it should be OK" doesn't completely satisfy me.  Even a conditional
 heavy-lock acquire still takes a transient, non-conditional LWLock on
 the lock manager partition.  Unless there is a global rule that no one
 can take a buffer content lock while holding a lock manager partition
 lock, this seems dangerous.  Could this be redone to follow the
 pattern of heavy locking with no content lock, then reacquiring the
 buffer content lock to check that we locked the correct things?  I
 don't know if it would be better to loop, or just give up, if the meta
 page changed underneath us.

Hmm.  That was actually a gloss I added on existing code to try to
convince myself that it was safe; I don't think that the changes I
made make that any more or less safe than it was before.  But the
question of whether or not it is safe is an interesting one.  I do
believe that your statement is true, though: lock manager locks - or
backend locks, for the fast-path - should be the last thing we lock.
In some cases we lock more than one lock manager partition lock, but
we never lock anything else afterwards, AFAIK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


hash-avoid-heavyweight-metapage-locks-v2.patch
Description: Binary data


Re: [HACKERS] patch: avoid heavyweight locking on hash metapage

2012-06-15 Thread Jeff Janes
On Wed, May 30, 2012 at 3:14 PM, Robert Haas robertmh...@gmail.com wrote:
 I developed the attached patch to avoid taking a heavyweight lock on
 the metapage of a hash index.  Instead, an exclusive buffer content
 lock is viewed as sufficient permission to modify the metapage, and a
 shared buffer content lock is used when such modifications need to be
 prevented.  For the most part this is a trivial change, because we
 were already taking these locks: we were just taking the heavyweight
 locks in addition.  The only sticking point is that, when we're
 searching or inserting, we previously locked the bucket before
 releasing the heavyweight metapage lock, which is unworkable when
 holding only a buffer content lock because (1) we might deadlock and
 (2) buffer content locks can't be held for long periods of time even
 when there's no deadlock risk.  To fix this, I implemented a simple
 loop-and-retry system: we release the metapage content lock, acquire
 the heavyweight lock on the target bucket, and then reacquire the
 metapage content lock and check that the bucket mapping has not
 changed.   Normally it hasn't, and we're done.  But if by chance it
 has, we simply unlock the metapage, release the heavyweight lock we
 acquired previously, lock the new bucket, and loop around again.  Even
 in the worst case we cannot loop very many times here, since we don't
 split the same bucket again until we've split all the other buckets,
 and 2^N gets big pretty fast.

Do we need the retry flag (applies to two places)?  If oldblkno is
still InvalidBlockNumber then it can't equal blkno.
I think the extra variable might be clearer than the magic value, but
since we already have the magic value, do we want both a flag variable
and a magic value?

+   if (retry)
+   {
+       if (oldblkno == blkno)
+           break;
+       _hash_droplock(rel, oldblkno, HASH_SHARE);
+   }
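
For illustration, the flag-free version might look like this (just a
sketch, assuming oldblkno is initialized to InvalidBlockNumber and that
BUCKET_TO_BLKNO can never return InvalidBlockNumber):

    /* hypothetical: rely on the sentinel value instead of a flag */
    if (oldblkno == blkno)
        break;          /* we already hold a lock on the right bucket */
    if (oldblkno != InvalidBlockNumber)
        _hash_droplock(rel, oldblkno, HASH_SHARE);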


In the README, the pseudocode probably needs to mention re-locking
the meta page in the loop.

Also, "page" is used to mean either the disk page or the shared buffer
currently holding that page, depending on context.  This is confusing.
Maybe we should say "Lock the meta page buffer".  Of course this
gripe precedes your patch and applies to other parts of the code as
well, but since we mingle LWLocks (on buffers) and heavy locks (on
names of disk pages) it might make sense to be more meticulous here.

"exclusive-lock page 0 (assert the right to begin a split)" is no
longer true, nor is "release X-lock on page 0".


Also in the README, the section "To prevent deadlock we enforce these
coding rules:" would need to be changed, as those rules are being
changed.  But should we change them at all?

In _hash_expandtable, the claim "but since we are only trylocking here
it should be OK" doesn't completely satisfy me.  Even a conditional
heavy-lock acquire still takes a transient, non-conditional LWLock on
the lock manager partition.  Unless there is a global rule that no one
can take a buffer content lock while holding a lock manager partition
lock, this seems dangerous.  Could this be redone to follow the
pattern of heavy locking with no content lock, then reacquiring the
buffer content lock to check that we locked the correct things?  I
don't know if it would be better to loop, or just give up, if the meta
page changed underneath us.
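
Concretely, I'm imagining something along these lines (only a sketch;
old_maxbucket is a hypothetical local holding the value observed before
the content lock was released):

    /* release the metapage content lock, keeping the pin */
    _hash_chgbufaccess(rel, metabuf, HASH_WRITE, HASH_NOLOCK);

    /* now an unconditional heavyweight lock is safe to wait on */
    _hash_getlock(rel, start_oblkno, HASH_EXCLUSIVE);

    /* retake the content lock and verify nothing moved underneath us */
    _hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE);
    if (metap->hashm_maxbucket != old_maxbucket)
    {
        /* a concurrent split intervened: drop the lock, then either
         * loop around with the new mapping or just give up */
        _hash_droplock(rel, start_oblkno, HASH_EXCLUSIVE);
        goto fail;
    }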

Cheers,

Jeff


Re: [HACKERS] patch: avoid heavyweight locking on hash metapage

2012-05-31 Thread Dickson S. Guedes
2012/5/30 Robert Haas robertmh...@gmail.com:
 I tested the effect of this by setting up a series of 5-minute
 read-only pgbench runs at scale factor 300 with 8GB of shared buffers
 on the IBM POWER7 machine.

I know it doesn't matter, but out of curiosity, what OS did you use?

best regards,
-- 
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br


Re: [HACKERS] patch: avoid heavyweight locking on hash metapage

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 7:07 AM, Dickson S. Guedes lis...@guedesoft.net wrote:
 2012/5/30 Robert Haas robertmh...@gmail.com:
 I tested the effect of this by setting up a series of 5-minute
 read-only pgbench runs at scale factor 300 with 8GB of shared buffers
 on the IBM POWER7 machine.

 I know it doesn't matter, but out of curiosity, what OS did you use?

Fedora 16, Linux 3.2.6-3.fc16.ppc64

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


[HACKERS] patch: avoid heavyweight locking on hash metapage

2012-05-30 Thread Robert Haas
I developed the attached patch to avoid taking a heavyweight lock on
the metapage of a hash index.  Instead, an exclusive buffer content
lock is viewed as sufficient permission to modify the metapage, and a
shared buffer content lock is used when such modifications need to be
prevented.  For the most part this is a trivial change, because we
were already taking these locks: we were just taking the heavyweight
locks in addition.  The only sticking point is that, when we're
searching or inserting, we previously locked the bucket before
releasing the heavyweight metapage lock, which is unworkable when
holding only a buffer content lock because (1) we might deadlock and
(2) buffer content locks can't be held for long periods of time even
when there's no deadlock risk.  To fix this, I implemented a simple
loop-and-retry system: we release the metapage content lock, acquire
the heavyweight lock on the target bucket, and then reacquire the
metapage content lock and check that the bucket mapping has not
changed.   Normally it hasn't, and we're done.  But if by chance it
has, we simply unlock the metapage, release the heavyweight lock we
acquired previously, lock the new bucket, and loop around again.  Even
in the worst case we cannot loop very many times here, since we don't
split the same bucket again until we've split all the other buckets,
and 2^N gets big pretty fast.
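
In code, the retry loop comes out looking roughly like this (a
condensed sketch of the search path; the insert path is analogous, and
declarations and error handling are omitted):

    metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
    metap = HashPageGetMeta(BufferGetPage(metabuf));

    oldblkno = InvalidBlockNumber;
    retry = false;

    /* loop until we hold the heavyweight lock on the correct bucket */
    for (;;)
    {
        bucket = _hash_hashkey2bucket(hashkey,
                                      metap->hashm_maxbucket,
                                      metap->hashm_highmask,
                                      metap->hashm_lowmask);
        blkno = BUCKET_TO_BLKNO(metap, bucket);

        /* release the metapage content lock, but keep the pin */
        _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);

        /* if we already hold the lock on the right bucket, we're done */
        if (retry)
        {
            if (oldblkno == blkno)
                break;
            _hash_droplock(rel, oldblkno, HASH_SHARE);
        }
        _hash_getlock(rel, blkno, HASH_SHARE);

        /* relock the metapage and recheck the bucket mapping */
        _hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
        oldblkno = blkno;
        retry = true;
    }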

I tested the effect of this by setting up a series of 5-minute
read-only pgbench runs at scale factor 300 with 8GB of shared buffers
on the IBM POWER7 machine.  For these runs, I dropped the primary
key constraint on pgbench_accounts (aid) and created a hash index on
that column instead.  I ran each test three times and took the median
result.  Here are the results on unpatched master, at various client
counts:

m01 tps = 9004.070422 (including connections establishing)
m04 tps = 34838.126542 (including connections establishing)
m08 tps = 70584.356826 (including connections establishing)
m16 tps = 128726.248198 (including connections establishing)
m32 tps = 123639.248172 (including connections establishing)
m64 tps = 104650.296143 (including connections establishing)
m80 tps = 88412.736416 (including connections establishing)

And here are the results with the patch:

h01 tps = 9110.561413 (including connections establishing) [+1.2%]
h04 tps = 36012.787524 (including connections establishing) [+3.4%]
h08 tps = 72606.302993 (including connections establishing) [+2.9%]
h16 tps = 141938.762793 (including connections establishing) [+10%]
h32 tps = 205325.232316 (including connections establishing) [+66%]
h64 tps = 274156.881975 (including connections establishing) [+162%]
h80 tps = 291224.012066 (including connections establishing) [+229%]

Obviously, even with this change, there's a lot not to like about hash
indexes: they still won't be crash-safe, and they still won't perform
as well under high concurrency as btree indexes.  But neither of those
problems seems like a good reason not to fix this problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


hash-avoid-heavyweight-metapage-locks-v1.patch
Description: Binary data
