Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-12-07 Thread Michael Paquier
On Wed, Dec 3, 2014 at 4:03 PM, Michael Paquier
 wrote:
> Ping? This patch has been in a stale state for a couple of weeks and is still
> marked as waiting on author for this CF.
Marked as returned with feedback.
-- 
Michael




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-12-02 Thread Michael Paquier
On Tue, Nov 18, 2014 at 12:33 AM, Robert Haas  wrote:
> On Mon, Nov 17, 2014 at 10:31 AM, Andres Freund  
> wrote:
>> On 2014-11-17 10:21:04 -0500, Robert Haas wrote:
>>> Andres, where are we with this patch?
>>>
>>> 1. You're going to commit it, but haven't gotten around to it yet.
>>>
>>> 2. You're going to modify it some more and repost, but haven't gotten
>>> around to it yet.
>>>
>>> 3. You're willing to see it modified if somebody else does the work,
>>> but are out of time to spend on it yourself.
>>>
>>> 4. Something else?
>>
>> I'm working on it. Amit had found a hang on PPC that I couldn't
>> reproduce on x86. Since then I've reproduced it and I think yesterday I
>> found the problem. Unfortunately it always took a couple hours to
>> trigger...
>>
>> I've also made some, in my opinion, cleanups to the patch since
>> then. Those have the nice side effect of making the size of struct
>> LWLock smaller, but that wasn't actually the intended effect.
>>
>> I'll repost once I've verified the problem is fixed and I've updated all
>> commentary.
>>
>> The current problem is that I seem to have found a problem that's also
>> reproducible with master :(. After a couple of hours a
>> pgbench -h /tmp -p 5440 scale3000 -M prepared -P 5 -c 180 -j 60 -T 2 -S
>> against a
>> -c max_connections=200 -c shared_buffers=4GB
>> cluster seems to hang on PPC. With all the backends waiting in buffer
>> mapping locks. I'm now making sure it's really master and not my patch
>> causing the problem - it's just not trivial with 180 processes involved.
>
> Ah, OK.  Thanks for the update.
Ping? This patch has been in a stale state for a couple of weeks and is still
marked as waiting on author for this CF.
-- 
Michael




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-11-17 Thread Robert Haas
On Mon, Nov 17, 2014 at 10:31 AM, Andres Freund  wrote:
> On 2014-11-17 10:21:04 -0500, Robert Haas wrote:
>> Andres, where are we with this patch?
>>
>> 1. You're going to commit it, but haven't gotten around to it yet.
>>
>> 2. You're going to modify it some more and repost, but haven't gotten
>> around to it yet.
>>
>> 3. You're willing to see it modified if somebody else does the work,
>> but are out of time to spend on it yourself.
>>
>> 4. Something else?
>
> I'm working on it. Amit had found a hang on PPC that I couldn't
> reproduce on x86. Since then I've reproduced it and I think yesterday I
> found the problem. Unfortunately it always took a couple hours to
> trigger...
>
> I've also made some, in my opinion, cleanups to the patch since
> then. Those have the nice side effect of making the size of struct
> LWLock smaller, but that wasn't actually the intended effect.
>
> I'll repost once I've verified the problem is fixed and I've updated all
> commentary.
>
> The current problem is that I seem to have found a problem that's also
> reproducible with master :(. After a couple of hours a
> pgbench -h /tmp -p 5440 scale3000 -M prepared -P 5 -c 180 -j 60 -T 2 -S
> against a
> -c max_connections=200 -c shared_buffers=4GB
> cluster seems to hang on PPC. With all the backends waiting in buffer
> mapping locks. I'm now making sure it's really master and not my patch
> causing the problem - it's just not trivial with 180 processes involved.

Ah, OK.  Thanks for the update.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-11-17 Thread Andres Freund
On 2014-11-17 10:21:04 -0500, Robert Haas wrote:
> Andres, where are we with this patch?
> 
> 1. You're going to commit it, but haven't gotten around to it yet.
> 
> 2. You're going to modify it some more and repost, but haven't gotten
> around to it yet.
> 
> 3. You're willing to see it modified if somebody else does the work,
> but are out of time to spend on it yourself.
> 
> 4. Something else?

I'm working on it. Amit had found a hang on PPC that I couldn't
reproduce on x86. Since then I've reproduced it and I think yesterday I
found the problem. Unfortunately it always took a couple hours to
trigger...

I've also made some, in my opinion, cleanups to the patch since
then. Those have the nice side effect of making the size of struct
LWLock smaller, but that wasn't actually the intended effect.

I'll repost once I've verified the problem is fixed and I've updated all
commentary.

The current problem is that I seem to have found a problem that's also
reproducible with master :(. After a couple of hours a
pgbench -h /tmp -p 5440 scale3000 -M prepared -P 5 -c 180 -j 60 -T 2 -S
against a
-c max_connections=200 -c shared_buffers=4GB
cluster seems to hang on PPC. With all the backends waiting in buffer
mapping locks. I'm now making sure it's really master and not my patch
causing the problem - it's just not trivial with 180 processes involved.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-11-17 Thread Robert Haas
On Sat, Oct 25, 2014 at 1:50 AM, Amit Kapila  wrote:
> On Fri, Oct 24, 2014 at 4:05 PM, Andres Freund 
> wrote:
>>
>> On 2014-10-24 15:59:30 +0530, Amit Kapila wrote:
>> > > and w.r.t. performance it can lead to an extra
>> > > function call, a few checks and I think in some cases can even
>> > > acquire/release a spinlock.
>> > >
>> > > I fail to see how that could be the case.
>> >
>> > Won't it happen in case the first backend sets releaseOK to true and another
>> > backend that tries to wake up waiters on the lock acquires the spinlock
>> > and tries to release the waiters?
>>
>> Sure, that can happen.
>> > > And again, this is code that's
>> > > only executed around a couple syscalls. And the cacheline will be
>> > > touched around there *anyway*.
>> >
>> > Sure, but I think syscalls are required in case we need to wake any
>> > waiter.
>>
>> It won't wake up a waiter if there's none on the list.
>
> Yeap, but still it will acquire/release spinlock.
>
>> > > > > And it'd be a pretty pointless
>> > > > > behaviour, leading to useless increased contention. The only time
>> > > > > it'd
>> > > > > make sense for X to be woken up is when it gets run faster than
>> > > > > the S
>> > > > > processes.
>> > > >
>> > > > Do we get any major benefit by changing the logic of waking up
>> > > > waiters?
>> > >
>> > > Yes.
>> >
>> > I think one downside I could see of the new strategy is that the chance of
>> > an Exclusive waiter taking more time before getting woken up is increased,
>> > as now it will bypass Exclusive waiters in the queue.
>>
>> Note that that *already* happens for any *new* shared locker that comes
>> in. It doesn't really make sense to have share lockers queued behind the
>> exclusive locker if others just go in front of it anyway.
>
> Yeah, but I think it is difficult to avoid that behaviour as even when it
> wakes the Exclusive locker, some new shared locker can come in and acquire
> the lock before the Exclusive locker.
>
> I think it is difficult to say what is the best waking strategy, as the
> priority for Exclusive lockers is not clearly defined in the case of LWLocks.

Andres, where are we with this patch?

1. You're going to commit it, but haven't gotten around to it yet.

2. You're going to modify it some more and repost, but haven't gotten
around to it yet.

3. You're willing to see it modified if somebody else does the work,
but are out of time to spend on it yourself.

4. Something else?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-24 Thread Amit Kapila
On Fri, Oct 24, 2014 at 4:05 PM, Andres Freund 
wrote:
>
> On 2014-10-24 15:59:30 +0530, Amit Kapila wrote:
> > > > and w.r.t. performance it can lead to an extra
> > > > function call, a few checks and I think in some cases can even
> > > > acquire/release a spinlock.
> > >
> > > I fail to see how that could be the case.
> >
> > Won't it happen in case the first backend sets releaseOK to true and another
> > backend that tries to wake up waiters on the lock acquires the spinlock
> > and tries to release the waiters?
>
> Sure, that can happen.
> > > And again, this is code that's
> > > only executed around a couple syscalls. And the cacheline will be
> > > touched around there *anyway*.
> >
> > Sure, but I think syscalls are required in case we need to wake any
> > waiter.
>
> It won't wake up a waiter if there's none on the list.

Yeap, but still it will acquire/release spinlock.

> > > > > And it'd be a pretty pointless
> > > > > behaviour, leading to useless increased contention. The only time
it'd
> > > > > make sense for X to be woken up is when it gets run faster than
the S
> > > > > processes.
> > > >
> > > > Do we get any major benefit by changing the logic of waking up
waiters?
> > >
> > > Yes.
> >
> > I think one downside I could see of the new strategy is that the chance of
> > an Exclusive waiter taking more time before getting woken up is increased,
> > as now it will bypass Exclusive waiters in the queue.
>
> Note that that *already* happens for any *new* shared locker that comes
> in. It doesn't really make sense to have share lockers queued behind the
> exclusive locker if others just go in front of it anyway.

Yeah, but I think it is difficult to avoid that behaviour as even when it
wakes the Exclusive locker, some new shared locker can come in and acquire
the lock before the Exclusive locker.

I think it is difficult to say what is the best waking strategy, as the
priority for Exclusive lockers is not clearly defined in the case of LWLocks.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-24 Thread Andres Freund
On 2014-10-24 15:59:30 +0530, Amit Kapila wrote:
> > > and w.r.t. performance it can lead to an extra
> > > function call, a few checks and I think in some cases can even
> > > acquire/release a spinlock.
> >
> > I fail to see how that could be the case.
> 
> Won't it happen in case the first backend sets releaseOK to true and another
> backend that tries to wake up waiters on the lock acquires the spinlock
> and tries to release the waiters?

Sure, that can happen.

> > And again, this is code that's
> > only executed around a couple syscalls. And the cacheline will be
> > touched around there *anyway*.
> 
> Sure, but I think syscalls are required in case we need to wake any
> waiter.

It won't wake up a waiter if there's none on the list.
> > > > And it'd be a pretty pointless
> > > > behaviour, leading to useless increased contention. The only time it'd
> > > > make sense for X to be woken up is when it gets run faster than the S
> > > > processes.
> > >
> > > Do we get any major benefit by changing the logic of waking up waiters?
> >
> > Yes.
> 
> I think one downside I could see of the new strategy is that the chance of
> an Exclusive waiter taking more time before getting woken up is increased,
> as now it will bypass Exclusive waiters in the queue.

Note that that *already* happens for any *new* shared locker that comes
in. It doesn't really make sense to have share lockers queued behind the
exclusive locker if others just go in front of it anyway.

> > > Code is more readable, but I don't understand why you
> > > want to do refactoring as part of this patch which ideally
> > > doesn't get any benefit from the same.
> >
> > I did it first without. But there's required stuff like
> > LWLockDequeueSelf(). And I had several bugs because of the list stuff.
> >
> > And I did separate the conversion into a separate patch?
> 
> Yeah, but the main patch for wait free LW_SHARED also uses
> it.

Well, the only thing that it could have done, given that the other patch
is a prerequisite, is reverting the behaviour?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-24 Thread Amit Kapila
On Wed, Oct 22, 2014 at 8:04 PM, Andres Freund 
wrote:
> On 2014-10-21 19:56:05 +0530, Amit Kapila wrote:
> > On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund 
> > wrote:
> > spin_delay_count gives
> > how much delay has happened to acquire the spinlock, which when
> > combined with other stats gives a clear picture of
> > the contention around acquisition of the corresponding LWLock.
> > Now if we want to count the spinlock delay for the Release call
> > as well, then the meaning of the stat changes.
> > It might be that the new meaning of the spin_delay_count stat is more
> > useful in some situations; however, the older one has its own
> > benefits, so I am not sure if changing this as part of this
> > patch is the best thing to do.
>
> In which case does the old definition make sense, where the new one
> doesn't? I don't think it exists.
>
> And changing it here seems to make sense because spinlock contention
> fundamentally changes its meaning for lwlocks anyway, as in most paths we
> don't take a spinlock anymore.

On second thought, I think probably you are right here.

> > > > Why can't we decrement the nwaiters after waking up? I don't think
> > > > there is any major problem even if callers do that themselves, but
> > > > in some rare cases LWLockRelease() might spuriously assume that
> > > > there are some waiters and tries to call LWLockWakeup().  Although
> > > this doesn't create any problem, keeping the value sane is good unless
> > > there is some problem in doing so.
> > >
> > > That won't work because then LWLockWakeup() wouldn't be called when
> > > necessary - precisely because nwaiters is 0.
> >
> > > The reason I've done so is that it's otherwise much harder to debug
> > > issues where there are backends that have been woken up already, but
> > > haven't rerun yet. Without this there's simply no evidence of that
> > > state. I can't see this being relevant for performance, so I'd rather
> > > have it stay that way.
> >
> > I am not sure what useful information we can get during debugging by not
> > doing this in LWLockWakeup()
>
> It's useful because you can detect backends that have been scheduled to
> acquire the lock, but haven't yet. They're otherwise "invisible".
>
> > and w.r.t. performance it can lead to an extra
> > function call, a few checks and I think in some cases can even
> > acquire/release a spinlock.
>
> I fail to see how that could be the case.

Won't it happen in case the first backend sets releaseOK to true and another
backend that tries to wake up waiters on the lock acquires the spinlock
and tries to release the waiters?

> And again, this is code that's
> only executed around a couple syscalls. And the cacheline will be
> touched around there *anyway*.

Sure, but I think syscalls are required in case we need to wake any
waiter.

> >
> >
> > In the above code, if the first waiter to be woken up is an Exclusive waiter,
> > then it will wake that waiter only; otherwise it wakes shared waiters till it
> > gets to the first exclusive waiter, and then the first exclusive waiter as well.
>
> That would be a bug then.

I am not sure of it, but I think it's more important to validate the
new waking strategy, as you see benefits by doing so.

> Per the comment you quoted "If the front
> waiter wants exclusive lock, awaken him only. Otherwise awaken as many
> waiters as want shared access.".
>
> But I don't think it's what's happening. Note that 'proc =
> proc->lwWaitLink;' is only executed if 'proc->lwWaitLink->lwWaitMode !=
> LW_EXCLUSIVE'. Which is the next waiter...
>
>
> > > And it'd be a pretty pointless
> > > behaviour, leading to useless increased contention. The only time it'd
> > > make sense for X to be woken up is when it gets run faster than the S
> > > processes.
> >
> > Do we get any major benefit by changing the logic of waking up waiters?
>
> Yes.

I think one downside I could see of the new strategy is that the chance of
an Exclusive waiter taking more time before getting woken up is increased,
as now it will bypass Exclusive waiters in the queue.  I don't have any
concrete proof that it can do any harm to performance, so maybe it's
okay to have this new mechanism; however, I think it might be helpful
if you could add a comment in the code to explain the benefit of skipping
Exclusive lockers.

> > Code is more readable, but I don't understand why you
> > want to do refactoring as part of this patch which ideally
> > doesn't get any benefit from the same.
>
> I did it first without. But there's required stuff like
> LWLockDequeueSelf(). And I had several bugs because of the list stuff.
>
> And I did separate the conversion into a separate patch?

Yeah, but the main patch for wait free LW_SHARED also uses
it.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-24 Thread Amit Kapila
On Wed, Oct 22, 2014 at 7:12 PM, Andres Freund 
wrote:
>
> On 2014-10-22 13:32:07 +0530, Amit Kapila wrote:
> > On Tue, Oct 21, 2014 at 7:56 PM, Amit Kapila 
> > wrote:
> > > On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund 
> > wrote:
> > > >
> > > > On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
> >
> > Today, I have verified all the previous comments raised by
> > me and looked at the new code; below are my findings:
> >
> > >>
> > >> 4.
> > >> LWLockAcquireCommon()
> > >> {
> > >> ..
> > >> if (!LWLockDequeueSelf(l))
> > >> {
> > >>  for (;;)
> > >> {
> > >> PGSemaphoreLock(&proc->sem, false);
> > >>  if (!proc->lwWaiting)
> > >> break;
> > >> extraWaits++;
> > >>  }
> > >> lock->releaseOK = true;
> > >> ..
> > >> }
> > >>
> > >> Setting releaseOK in above context might not be required because if the
> > >> control comes in this part of code, it will not retry to acquire another
> > >> time.
> >
> > > Hm. You're probably right.
> >
> > You have agreed to fix this comment, but it seems you have forgot
> > to change it.
>
> After I've thought more about it, it is actually required. This
> essentially *is* a retry.

Won't it need to be set before the retry? What's the use of setting it
when we have got the lock and are not going to retry?

> Somebody woke us up, which is where releaseOK is supposed to be set.

I think that is true only in the case when we are going to retry again, or
at least that seems to be the mechanism currently used in
LWLockAcquireCommon.

>
> > >> 11.
> > >> LWLockRelease()
> > >> {
> > >> ..
> > >> PRINT_LWDEBUG("LWLockRelease", lock, mode);
> > >> }
> > >>
> > >> Shouldn't this be in the beginning of the LWLockRelease function rather than
> > >> after processing held_lwlocks array?
> >
> > > Ok.
> >
You have agreed to fix this comment, but it seems you have forgotten
to change it.
>
>
>
> > Below comment doesn't seem to be addressed?
> >
> > > LWLockAcquireOrWait()
> > > {
> > > ..
> > > LOG_LWDEBUG("LWLockAcquireOrWait", lockid, mode, "succeeded");
> > > ..
> > > }
> >
> > > a. such a log is not there in any other LWLock.. variants,
> > >  if we want to introduce it, then shouldn't it be done at
> > >  other places as well.
>
> I think you're placing unnecessarily high consistency constraints on a
> debugging feature here.

This was just a very minor suggestion to keep the code consistent,
which is okay to ignore if you want.  I understand that having
or not having code consistency for this doesn't matter much.

> > Below point is yet to be resolved.
> >
> > > > 12.
> > > > #ifdef LWLOCK_DEBUG
> > > > lock->owner = MyProc;
> > > > #endif
> > > >
> > > > Shouldn't it be reset in LWLockRelease?
> > >
> > > That's actually intentional. It's quite useful to know the last owner
> > > when debugging lwlock code.
> >
> > Won't it cause any problem if the last owner process exits?
>
> No. PGPROCs aren't deallocated or anything. And it's a debugging only
> variable.

That's right; the problem I was thinking of is wrong information.
E.g. if the process holding the Exclusive lock has exited and then
a lot of other processes took shared locks and one new Exclusive
locker waits on getting the lock, at that moment during debugging
we can get wrong information about the lock owner.

However, I think you are mainly worried about situations where many
backends are waiting for the Exclusive locker, which is probably the
most common scenario.


> > Can you explain how pg_read_barrier() in below code makes this
> > access safe?
> >
> > LWLockWakeup()
> > {
> > ..
> > + pg_read_barrier(); /* pairs with nwaiters-- */
> > + if (!BOOL_ACCESS_ONCE(lock->releaseOK))
> > ..
> > }
>
> What's the concern you have? Full memory barriers (the atomic
> nwaiters--) pair with read memory barriers.

IIUC, the pairing with nwaiters in LWLockAcquireCommon() ensures
that releaseOK is set before attempting the lock again, as the atomic
operation provides the necessary barrier.  The point I am not
getting is what kind of guarantee pg_read_barrier() provides us,
or why it is required.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-22 Thread Andres Freund
On 2014-10-21 19:56:05 +0530, Amit Kapila wrote:
> On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund 
> wrote:
> >
> > On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
> > > 2.
> > > LWLockWakeup()
> > > {
> > > ..
> > > #ifdef LWLOCK_STATS
> > > lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
> > > #else
> > > SpinLockAcquire(&lock->mutex);
> > > #endif
> > > ..
> > > }
> > >
> > > Earlier while releasing lock, we don't count it towards LWLock stats
> > spin_delay_count.  I think if we see other places in lwlock.c, it only gets
> > counted when we try to acquire it in a loop.
> >
> > I think the previous situation was clearly suboptimal. I've now modified
> > things so all spinlock acquisitions are counted.
> 
> Code has mainly 4 stats (sh_acquire_count, ex_acquire_count,
> block_count, spin_delay_count) to track; if I try to see
> all stats together to understand the contention situation,
> the unpatched code makes sense.

I don't think it does. It completely disregards that the contention may
actually be in LWLockRelease(). That contributes to spinlock
contention just as much as LWLockAcquire().

> spin_delay_count gives
> how much delay has happened to acquire the spinlock, which when
> combined with other stats gives a clear picture of
> the contention around acquisition of the corresponding LWLock.
> Now if we want to count the spinlock delay for the Release call
> as well, then the meaning of the stat changes.
> It might be that the new meaning of the spin_delay_count stat is more
> useful in some situations; however, the older one has its own
> benefits, so I am not sure if changing this as part of this
> patch is the best thing to do.

In which case does the old definition make sense, where the new one
doesn't? I don't think it exists.

And changing it here seems to make sense because spinlock contention
fundamentally changes its meaning for lwlocks anyway, as in most paths we
don't take a spinlock anymore.
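
(For illustration, a minimal sketch of the kind of atomic fast path this
implies -- not the actual patch code; the EXCLUSIVE_LOCK sentinel encoding
and the surrounding variable names are assumptions:)

/* hypothetical sketch: shared acquisition as a single atomic increment,
 * with no spinlock in the fast path */
#define EXCLUSIVE_LOCK (((uint32) 1) << 24)	/* assumed sentinel for X holders */

uint32	oldcount = pg_atomic_fetch_add_u32(&lock->lockcount, 1);

if (oldcount >= EXCLUSIVE_LOCK)
{
	/* an exclusive holder exists: undo the spurious increment and wait */
	pg_atomic_fetch_sub_u32(&lock->lockcount, 1);
	mustwait = true;
}
else
	mustwait = false;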

> > > 5.
> > > LWLockWakeup()
> > > {
> > > ..
> > > dlist_foreach_modify(iter, (dlist_head *) &wakeup)
> > > {
> > > PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
> > > LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
> > > dlist_delete(&waiter->lwWaitLink);
> > > pg_write_barrier();
> > > waiter->lwWaiting = false;
> > > PGSemaphoreUnlock(&waiter->sem);
> > > }
> > > ..
> > > }
> > >
> > > Why can't we decrement the nwaiters after waking up? I don't think
> > > there is any major problem even if callers do that themselves, but
> > > in some rare cases LWLockRelease() might spuriously assume that
> > > there are some waiters and tries to call LWLockWakeup().  Although
> > > this doesn't create any problem, keeping the value sane is good unless
> > > there is some problem in doing so.
> >
> > That won't work because then LWLockWakeup() wouldn't be called when
> > necessary - precisely because nwaiters is 0.
> 
> > The reason I've done so is that it's otherwise much harder to debug
> > issues where there are backends that have been woken up already, but
> > haven't rerun yet. Without this there's simply no evidence of that
> > state. I can't see this being relevant for performance, so I'd rather
> > have it stay that way.
> 
> I am not sure what useful information we can get during debugging by not
> doing this in LWLockWakeup()

It's useful because you can detect backends that have been scheduled to
acquire the lock, but haven't yet. They're otherwise "invisible".

> and w.r.t. performance it can lead to an extra
> function call, a few checks and I think in some cases can even
> acquire/release a spinlock.

I fail to see how that could be the case. And again, this is code that's
only executed around a couple syscalls. And the cacheline will be
touched around there *anyway*.

> > > 6.
> > > LWLockWakeup()
> > > {
> > > ..
> > > dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
> > > {
> > > ..
> > > if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
> > > continue;
> > > ..
> > > if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
> > > {
> > > ..
> > > wokeup_somebody = true;
> > > }
> > > ..
> > > }
> > > ..
> > > }
> > >
> > > a.
> > > IIUC above logic, if the waiter queue is as follows:
> > > (S-Shared; X-Exclusive) S X S S S X S S
> > >
> > > it can skip the exclusive waiters and release shared waiter.
> > >
> > > If my understanding is right, then I think instead of continue, there
> > > should be *break* in above logic.
> >
> > No, it looks correct to me. What happened is that the first S was woken
> > up. So there's no point in waking up an exclusive locker, but further
> > non-exclusive lockers can be woken up.
> 
> Okay, even then it makes the current waking-up logic
> different, which I am not sure is what this patch is intended
> for.

It's already done in a separate patch...

> > > b.
> > > Consider below sequence of waiters:
> > > (S-Shared; X-Exclusive) S S X S S
> > >
> > I think as per the un-patched code, it will wake up waiters up to and
> > including the first Exclusive, but the patch will wake up waiters up to
> > but *excluding* the first Exclusive.

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-22 Thread Andres Freund
On 2014-10-22 13:32:07 +0530, Amit Kapila wrote:
> On Tue, Oct 21, 2014 at 7:56 PM, Amit Kapila 
> wrote:
> > On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund 
> wrote:
> > >
> > > On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
> 
> Today, I have verified all the previous comments raised by
> me and looked at the new code; below are my findings:
> 
> >>
> >> 4.
> >> LWLockAcquireCommon()
> >> {
> >> ..
> >> if (!LWLockDequeueSelf(l))
> >> {
> >>  for (;;)
> >> {
> >> PGSemaphoreLock(&proc->sem, false);
> >>  if (!proc->lwWaiting)
> >> break;
> >> extraWaits++;
> >>  }
> >> lock->releaseOK = true;
> >> ..
> >> }
> >>
> >> Setting releaseOK in above context might not be required because if the
> >> control comes in this part of code, it will not retry to acquire another
> >> time.
> 
> > Hm. You're probably right.
> 
> You have agreed to fix this comment, but it seems you have forgot
> to change it.

After I've thought more about it, it is actually required. This
essentially *is* a retry. Somebody woke us up, which is where releaseOK
is supposed to be set.
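
(Spelling that out against the quoted code -- the annotations below are
mine, not comments from the patch:)

if (!LWLockDequeueSelf(l))
{
	for (;;)
	{
		/* sleep; whoever dequeued us will unlock the semaphore */
		PGSemaphoreLock(&proc->sem, false);
		if (!proc->lwWaiting)
			break;			/* the real wakeup */
		extraWaits++;		/* absorb superfluous wakeups */
	}
	/* being woken here is effectively a retry after a wakeup, and a
	 * wakeup is exactly where releaseOK is supposed to be re-enabled */
	lock->releaseOK = true;
}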

> >> 11.
> >> LWLockRelease()
> >> {
> >> ..
> >> PRINT_LWDEBUG("LWLockRelease", lock, mode);
> >> }
> >>
> >> Shouldn't this be in the beginning of the LWLockRelease function rather than
> >> after processing held_lwlocks array?
> 
> > Ok.
> 
> You have agreed to fix this comment, but it seems you have forgotten
> to change it.



> Below comment doesn't seem to be addressed?
> 
> > LWLockAcquireOrWait()
> > {
> > ..
> > LOG_LWDEBUG("LWLockAcquireOrWait", lockid, mode, "succeeded");
> > ..
> > }
> 
> > a. such a log is not there in any other LWLock.. variants,
> >  if we want to introduce it, then shouldn't it be done at
> >  other places as well.

I think you're placing unnecessarily high consistency constraints on a
debugging feature here.

> Below point is yet to be resolved.
> 
> > > 12.
> > > #ifdef LWLOCK_DEBUG
> > > lock->owner = MyProc;
> > > #endif
> > >
> > > Shouldn't it be reset in LWLockRelease?
> >
> > That's actually intentional. It's quite useful to know the last owner
> > when debugging lwlock code.
> 
> Won't it cause any problem if the last owner process exits?

No. PGPROCs aren't deallocated or anything. And it's a debugging only
variable.

> Can you explain how pg_read_barrier() in below code makes this
> access safe?
> 
> LWLockWakeup()
> {
> ..
> + pg_read_barrier(); /* pairs with nwaiters-- */
> + if (!BOOL_ACCESS_ONCE(lock->releaseOK))
> ..
> }

What's the concern you have? Full memory barriers (the atomic
nwaiters--) pair with read memory barriers.
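
(A rough sketch of the pairing, as illustration only -- the exact placement
in the patch may differ:)

/* waiter side, after wakeup in LWLockAcquireCommon(): the atomic
 * fetch-sub acts as a full memory barrier */
pg_atomic_fetch_sub_u32(&lock->nwaiters, 1);
lock->releaseOK = true;

/* releaser side, in LWLockWakeup(): */
pg_read_barrier();		/* pairs with the nwaiters-- above */
if (!BOOL_ACCESS_ONCE(lock->releaseOK))
..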

Greetings,

Andres Freund




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-22 Thread Amit Kapila
On Tue, Oct 21, 2014 at 7:56 PM, Amit Kapila 
wrote:
> On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund 
wrote:
> >
> > On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:

Today, I have verified all the previous comments raised by
me and looked at the new code; below are my findings:

>>
>> 4.
>> LWLockAcquireCommon()
>> {
>> ..
>> if (!LWLockDequeueSelf(l))
>> {
>>  for (;;)
>> {
>> PGSemaphoreLock(&proc->sem, false);
>>  if (!proc->lwWaiting)
>> break;
>> extraWaits++;
>>  }
>> lock->releaseOK = true;
>> ..
>> }
>>
>> Setting releaseOK in above context might not be required because if the
>> control comes in this part of code, it will not retry to acquire another
>> time.

> Hm. You're probably right.

You have agreed to fix this comment, but it seems you have forgotten
to change it.

>> 11.
>> LWLockRelease()
>> {
>> ..
>> PRINT_LWDEBUG("LWLockRelease", lock, mode);
>> }
>>
>> Shouldn't this be in the beginning of the LWLockRelease function rather than
>> after processing held_lwlocks array?

> Ok.

You have agreed to fix this comment, but it seems you have forgotten
to change it.

Below comment doesn't seem to be addressed?

> LWLockAcquireOrWait()
> {
> ..
> LOG_LWDEBUG("LWLockAcquireOrWait", lockid, mode, "succeeded");
> ..
> }

> a. such a log is not there in any other LWLock.. variants,
>  if we want to introduce it, then shouldn't it be done at
>  other places as well.


Below point is yet to be resolved.

> > 12.
> > #ifdef LWLOCK_DEBUG
> > lock->owner = MyProc;
> > #endif
> >
> > Shouldn't it be reset in LWLockRelease?
>
> That's actually intentional. It's quite useful to know the last owner
> when debugging lwlock code.

Won't it cause any problem if the last owner process exits?


Can you explain how pg_read_barrier() in below code makes this
access safe?

LWLockWakeup()
{
..
+ pg_read_barrier(); /* pairs with nwaiters-- */
+ if (!BOOL_ACCESS_ONCE(lock->releaseOK))
..
}



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-21 Thread Amit Kapila
On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund 
wrote:
>
> On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
> > 2.
> > LWLockWakeup()
> > {
> > ..
> > #ifdef LWLOCK_STATS
> > lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
> > #else
> > SpinLockAcquire(&lock->mutex);
> > #endif
> > ..
> > }
> >
> > Earlier while releasing lock, we don't count it towards LWLock stats
> > spin_delay_count.  I think if we see other places in lwlock.c, it only gets
> > counted when we try to acquire it in a loop.
>
> I think the previous situation was clearly suboptimal. I've now modified
> things so all spinlock acquisitions are counted.

Code has mainly 4 stats (sh_acquire_count, ex_acquire_count,
block_count, spin_delay_count) to track; if I try to see
all stats together to understand the contention situation,
the unpatched code makes sense.  spin_delay_count gives
how much delay has happened to acquire the spinlock, which when
combined with other stats gives a clear picture of
the contention around acquisition of the corresponding LWLock.
Now if we want to count the spinlock delay for the Release call
as well, then the meaning of the stat changes.
It might be that the new meaning of the spin_delay_count stat is more
useful in some situations; however, the older one has its own
benefits, so I am not sure if changing this as part of this
patch is the best thing to do.

> > 3.
> > LWLockRelease()
> > {
> > ..
> > /* grant permission to run, even if a spurious share lock increases lockcount */
> > else if (mode == LW_EXCLUSIVE && have_waiters)
> > check_waiters = true;
> > /* nobody has this locked anymore, potential exclusive lockers get a chance */
> > else if (lockcount == 0 && have_waiters)
> > check_waiters = true;
> > ..
> > }
> >
> > It seems comments have been reversed in above code.
>
> No, they look right. But I've expanded them in the version I'm going to
> post in a couple minutes.

okay.

> > 5.
> > LWLockWakeup()
> > {
> > ..
> > dlist_foreach_modify(iter, (dlist_head *) &wakeup)
> > {
> > PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
> > LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
> > dlist_delete(&waiter->lwWaitLink);
> > pg_write_barrier();
> > waiter->lwWaiting = false;
> > PGSemaphoreUnlock(&waiter->sem);
> > }
> > ..
> > }
> >
> > Why can't we decrement the nwaiters after waking up? I don't think
> > there is any major problem even if callers do that themselves, but
> > in some rare cases LWLockRelease() might spuriously assume that
> > there are some waiters and tries to call LWLockWakeup().  Although
> > this doesn't create any problem, keeping the value sane is good unless
> > there is some problem in doing so.
>
> That won't work because then LWLockWakeup() wouldn't be called when
> necessary - precisely because nwaiters is 0.

> The reason I've done so is that it's otherwise much harder to debug
> > issues where there are backends that have been woken up already, but
> haven't rerun yet. Without this there's simply no evidence of that
> state. I can't see this being relevant for performance, so I'd rather
> have it stay that way.

I am not sure what useful information we can get during debugging by not
doing this in LWLockWakeup(), and w.r.t. performance it can lead to an extra
function call, a few checks and I think in some cases can even
acquire/release a spinlock.

> > 6.
> > LWLockWakeup()
> > {
> > ..
> > dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
> > {
> > ..
> > if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
> > continue;
> > ..
> > if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
> > {
> > ..
> > wokeup_somebody = true;
> > }
> > ..
> > }
> > ..
> > }
> >
> > a.
> > IIUC above logic, if the waiter queue is as follows:
> > (S-Shared; X-Exclusive) S X S S S X S S
> >
> > it can skip the exclusive waiters and release shared waiter.
> >
> > If my understanding is right, then I think instead of continue, there
> > should be *break* in above logic.
>
> No, it looks correct to me. What happened is that the first S was woken
> up. So there's no point in waking up an exclusive locker, but further
> non-exclusive lockers can be woken up.

Okay, even then it makes the current waking-up logic
different, which I am not sure is what this patch is intended
for.

> > b.
> > Consider below sequence of waiters:
> > (S-Shared; X-Exclusive) S S X S S
> >
> > I think as per the un-patched code, it will wake up waiters up to and
> > including the first Exclusive, but the patch will wake up waiters up to
> > but *excluding* the first Exclusive.
>
> I don't think the current code does that.

LWLockRelease()
{
..
/*
 * If the front waiter wants exclusive lock, awaken him only.
 * Otherwise awaken as many waiters as want shared access.
 */
if (proc->lwWaitMode != LW_EXCLUSIVE)
{
	while (proc->lwWaitLink != NULL &&
		   proc->lwWaitLink->lwWaitMode != LW_EXCLUSIVE)
	{
		if (proc->lwWaitMode != LW_WAIT_UNTIL_FREE)
			releaseOK = false;
		proc = proc->lwWaitLink;
	}
}
/* proc is now the last PGPROC to be released */

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-10 Thread Andres Freund
On 2014-10-08 20:07:35 -0400, Robert Haas wrote:
> On Wed, Oct 8, 2014 at 2:04 PM, Andres Freund  wrote:
> > So, what makes it work for me (among other unrelated stuff) seems to be
> > the following in .gdbinit, defining away some things that gdb doesn't
> > handle:
> > macro define __builtin_offsetof(T, F) ((int) &(((T *) 0)->F))
> > macro define __extension__
> > macro define AssertVariableIsOfTypeMacro(x, y) ((void)0)
> >
> > Additionally I have "-ggdb -g3" in CFLAGS. That way gdb knows about
> > postgres' macros. At least if you're in the right scope.
> >
> > As an example, the following works:
> > (gdb) p dlist_is_empty(&BackendList) ? NULL : dlist_head_element(Backend, 
> > elem, &BackendList)
> 
> Ah, cool.  I'll try that.

If that works for you, should we put it somewhere in the docs? If so,
where?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Robert Haas
On Wed, Oct 8, 2014 at 2:04 PM, Andres Freund  wrote:
> So, what makes it work for me (among other unrelated stuff) seems to be
> the following in .gdbinit, defining away some things that gdb doesn't
> handle:
> macro define __builtin_offsetof(T, F) ((int) &(((T *) 0)->F))
> macro define __extension__
> macro define AssertVariableIsOfTypeMacro(x, y) ((void)0)
>
> Additionally I have "-ggdb -g3" in CFLAGS. That way gdb knows about
> postgres' macros. At least if you're in the right scope.
>
> As an example, the following works:
> (gdb) p dlist_is_empty(&BackendList) ? NULL : dlist_head_element(Backend, 
> elem, &BackendList)

Ah, cool.  I'll try that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Andres Freund
On 2014-10-08 14:23:44 -0300, Alvaro Herrera wrote:
> Robert Haas wrote:
> > On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund  
> > wrote:
> > > I don't see that as being relevant. The difference is an instruction or
> > > two - in the slow path we'll enter the kernel and sleep. This doesn't
> > > matter in comparison.
> > > And the code is *so* much more readable.
> > 
> > I find the slist/dlist stuff actually quite difficult to get right
> > compared to a hand-rolled linked list.  But the really big problem is
> > that the debugger can't do anything useful with it.  You have to work
> > out the structure-member offset in order to walk the list and manually
> > cast to char *, adjust the pointer, and cast back.  That sucks.
> 
> As far as I recall you can get gdb to understand those pointer games
> by defining some structs or macros.  Maybe we can improve by documenting
> this.

So, what makes it work for me (among other unrelated stuff) seems to be
the following in .gdbinit, defining away some things that gdb doesn't
handle:
macro define __builtin_offsetof(T, F) ((int) &(((T *) 0)->F))
macro define __extension__
macro define AssertVariableIsOfTypeMacro(x, y) ((void)0)

Additionally I have "-ggdb -g3" in CFLAGS. That way gdb knows about
postgres' macros. At least if you're in the right scope.

As an example, the following works:
(gdb) p dlist_is_empty(&BackendList) ? NULL : dlist_head_element(Backend, elem, 
&BackendList)
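
With the macros in place, following an individual link by hand works the
same way, e.g. (hypothetical):
(gdb) p *dlist_container(Backend, elem, BackendList.head.next)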

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Alvaro Herrera
Robert Haas wrote:
> On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund  wrote:
> > I don't see that as being relevant. The difference is an instruction or
> > two - in the slow path we'll enter the kernel and sleep. This doesn't
> > matter in comparison.
> > And the code is *so* much more readable.
> 
> I find the slist/dlist stuff actually quite difficult to get right
> compared to a hand-rolled linked list.  But the really big problem is
> that the debugger can't do anything useful with it.  You have to work
> out the structure-member offset in order to walk the list and manually
> cast to char *, adjust the pointer, and cast back.  That sucks.

As far as I recall you can get gdb to understand those pointer games
by defining some structs or macros.  Maybe we can improve by documenting
this.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Andres Freund
On 2014-10-08 13:13:33 -0400, Robert Haas wrote:
> On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund  wrote:
> > I don't see that as being relevant. The difference is an instruction or
> > two - in the slow path we'll enter the kernel and sleep. This doesn't
> > matter in comparison.
> > And the code is *so* much more readable.
> 
> I find the slist/dlist stuff actually quite difficult to get right
> compared to a hand-rolled linked list.

Really? I've spent more than a day debugging things with the current
code. And Heikki introduced a bug in it. If you look at how the code
looks before/after, I find the difference pretty clear.

> But the really big problem is
> that the debugger can't do anything useful with it.  You have to work
> out the structure-member offset in order to walk the list and manually
> cast to char *, adjust the pointer, and cast back.  That sucks.

Hm. I can just do that with the debugger here. Not sure if that's
because I added the right thing to my .gdbinit or because I use the
correct compiler flags.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Robert Haas
On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund  wrote:
> I don't see that as being relevant. The difference is an instruction or
> two - in the slow path we'll enter the kernel and sleep. This doesn't
> matter in comparison.
> And the code is *so* much more readable.

I find the slist/dlist stuff actually quite difficult to get right
compared to a hand-rolled linked list.  But the really big problem is
that the debugger can't do anything useful with it.  You have to work
out the structure-member offset in order to walk the list and manually
cast to char *, adjust the pointer, and cast back.  That sucks.
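
(For concreteness, the manual dance looks something like this --
hypothetical session, using the Backend/elem names from Andres' example
elsewhere in the thread:)

(gdb) # work out the member offset by hand, adjust, and cast back
(gdb) p (Backend *) ((char *) BackendList.head.next - (long) &((Backend *) 0)->elem)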

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Andres Freund
On 2014-10-08 14:47:44 +0200, Andres Freund wrote:
> On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
> > 5.
> > LWLockWakeup()
> > {
> > ..
> > dlist_foreach_modify(iter, (dlist_head *) &wakeup)
> > {
> > PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
> > LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
> > dlist_delete(&waiter->lwWaitLink);
> > pg_write_barrier();
> > waiter->lwWaiting = false;
> > PGSemaphoreUnlock(&waiter->sem);
> > }
> > ..
> > }
> > 
> > Why can't we decrement the nwaiters after waking up? I don't think
> > there is any major problem even if callers do that themselves, but
> > in some rare cases LWLockRelease() might spuriously assume that
> > there are some waiters and tries to call LWLockWakeup().  Although
> > this doesn't create any problem, keeping the value sane is good unless
> > there is some problem in doing so.
> 
> That won't work because then LWLockWakeup() wouldn't be called when
> necessary - precisely because nwaiters is 0.

Err, this is bogus. Memory fail.

The reason I've done so is that it's otherwise much harder to debug
issues where there are backends that have been woken up already, but
haven't rerun yet. Without this there's simply no evidence of that
state. I can't see this being relevant for performance, so I'd rather
have it stay that way.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-08 Thread Andres Freund
On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
> 2.
> LWLockWakeup()
> {
> ..
> #ifdef LWLOCK_STATS
> lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
> #else
> SpinLockAcquire(&lock->mutex);
> #endif
> ..
> }
> 
> Earlier while releasing lock, we don't count it towards LWLock stats
> spin_delay_count.  I think if we see other places in lwlock.c, it only gets
> counted when we try to acquire it in a loop.

I think the previous situation was clearly suboptimal. I've now modified
> things so all spinlock acquisitions are counted.

> 3.
> LWLockRelease()
> {
> ..
> /* grant permission to run, even if a spurious share lock increases lockcount */
> else if (mode == LW_EXCLUSIVE && have_waiters)
> check_waiters = true;
> /* nobody has this locked anymore, potential exclusive lockers get a chance */
> else if (lockcount == 0 && have_waiters)
> check_waiters = true;
> ..
> }
> 
> It seems comments have been reversed in above code.

No, they look right. But I've expanded them in the version I'm going to
post in a couple minutes.
 
> 5.
> LWLockWakeup()
> {
> ..
> dlist_foreach_modify(iter, (dlist_head *) &wakeup)
> {
> PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
> LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
> dlist_delete(&waiter->lwWaitLink);
> pg_write_barrier();
> waiter->lwWaiting = false;
> PGSemaphoreUnlock(&waiter->sem);
> }
> ..
> }
> 
> Why can't we decrement the nwaiters after waking up? I don't think
> there is any major problem even if callers do that themselves, but
> in some rare cases LWLockRelease() might spuriously assume that
> there are some waiters and tries to call LWLockWakeup().  Although
> this doesn't create any problem, keeping the value sane is good unless
> there is some problem in doing so.

That won't work because then LWLockWakeup() wouldn't be called when
necessary - precisely because nwaiters is 0.

> 6.
> LWLockWakeup()
> {
> ..
> dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
> {
> ..
> if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
> continue;
> ..
> if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
> {
> ..
> wokeup_somebody = true;
> }
> ..
> }
> ..
> }
> 
> a.
> IIUC above logic, if the waiter queue is as follows:
> (S-Shared; X-Exclusive) S X S S S X S S
> 
> it can skip the exclusive waiters and release shared waiter.
> 
> If my understanding is right, then I think instead of continue, there
> should be *break* in above logic.

No, it looks correct to me. What happened is that the first S was woken
up. So there's no point in waking up an exclusive locker, but further
non-exclusive lockers can be woken up.
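
(Concretely, for the example queue above, my reading of the quoted loop is:

queue: S  X  S  S  S  X  S  S
woken: S  -  S  S  S  -  S  S

The first S sets wokeup_somebody; each X then hits the continue and stays
queued, while every following S is still woken.)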

> b.
> Consider below sequence of waiters:
> (S-Shared; X-Exclusive) S S X S S
> 
> I think as per the un-patched code, it will wake up waiters up to and
> including the first Exclusive, but the patch will wake up waiters up to
> but *excluding* the first Exclusive.

I don't think the current code does that. And it'd be a pretty pointless
behaviour, leading to useless increased contention. The only time it'd
make sense for X to be woken up is when it gets run faster than the S
processes.

> 7.
> LWLockWakeup()
> {
> ..
> dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
> {
> ..
> dlist_delete(&waiter->lwWaitLink);
> dlist_push_tail(&wakeup, &waiter->lwWaitLink);
> ..
> }
> ..
> }
> 
> Use of dlist has simplified the code, but I think there might be a slight
> overhead of maintaining the wakeup queue as compared to the un-patched
> mechanism, especially when there is a long waiter queue.

I don't see that as being relevant. The difference is an instruction or
two - in the slow path we'll enter the kernel and sleep. This doesn't
matter in comparison.
And the code is *so* much more readable.
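
(For comparison, a sketch of what the dlist API buys here -- illustrative
only, not code from either version; lwWaitLink is a plain PGPROC pointer in
the old scheme and a dlist_node in the new one:)

/* hand-rolled singly linked list: deleting a node means first walking
 * to its predecessor, which is easy to get wrong */
while (p->lwWaitLink != waiter)
	p = p->lwWaitLink;
p->lwWaitLink = waiter->lwWaitLink;

/* intrusive dlist: O(1) delete of a known node */
dlist_delete(&waiter->lwWaitLink);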

> 8.
> LWLockConditionalAcquire()
> {
> ..
> /*
>  * We ran into an exclusive lock and might have blocked another
>  * exclusive lock from taking a shot because it took a time to back
>  * off. Retry till we are either sure we didn't block somebody (because
>  * somebody else certainly has the lock) or till we got it.
>  *
>  * We cannot rely on the two-step lock-acquisition protocol as in
>  * LWLockAcquire because we're not using it.
>  */
> if (potentially_spurious)
> {
> SPIN_DELAY();
> goto retry;
> }
> ..
> }
> 
> Due to the above logic, I think it can keep on retrying for a long time before
> it actually concludes whether it got the lock or not, in case other backend(s)
> take the Exclusive lock after *double_check* and release it before the
> unconditional increment of the shared lock in function LWLockAttemptLock.
> I understand that it might be difficult to have such a practical scenario;
> however, there is still a theoretical possibility of the same.

I'm not particularly concerned. We could optimize it a bit, but I really
don't think it's necessary.

> Is there any advantage of retrying in LWLockConditionalAcquire()?

It's required for correctness. We only retry if we potentially blocked
an exclusive acquirer (by spuriously incrementing/decrementing lockcount
by 1). We 

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-25 Thread Amit Kapila
On Tue, Jun 24, 2014 at 9:33 AM, Amit Kapila 
wrote:
> On Mon, Jun 23, 2014 at 9:12 PM, Andres Freund 
wrote:
> > On 2014-06-23 19:59:10 +0530, Amit Kapila wrote:
> > > 7.
> > > LWLockWaitForVar()
> > > {
> > > ..
> > > /*
> > >  * Add myself to wait queue. Note that this is racy, somebody else
> > >  * could wakeup before we're finished queuing.
> > >  * NB: We're using nearly the same twice-in-a-row lock acquisition
> > >  * protocol as LWLockAcquire(). Check its comments for details.
> > >  */
> > > LWLockQueueSelf(l, LW_WAIT_UNTIL_FREE);
> > >
> > > /* we're now guaranteed to be woken up if necessary */
> > >  mustwait = LWLockAttemptLock(l, LW_EXCLUSIVE, false,
> > > &potentially_spurious);
> > > }
> > >
> > > Why is it important to Attempt lock after queuing in this case, can't
> > > we just re-check exclusive lock as done before queuing?
> >
> > Well, that's how Heikki designed LWLockWaitForVar().
>
> In that case I might be missing some point here: the un-patched code of
> LWLockWaitForVar() never tries to acquire the lock, but the new code
> does so.  Basically I am not able to see what the problem is if we just
> do the below after queuing:
> mustwait = pg_atomic_read_u32(&lock->lockcount) != 0;
>
> Could you please explain what the problem is with just rechecking?


I have further reviewed the lwlock related changes and thought
it's good to share my findings with you. This completes my initial
review of the lwlock related changes, and below are my findings:

1.
LWLockRelease()
{
..
TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(l), T_ID(l));
}

The dynamic tracing macro seems to be omitted from the LWLockRelease()
call.

2.
LWLockWakeup()
{
..
#ifdef LWLOCK_STATS
lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
#else
SpinLockAcquire(&lock->mutex);
#endif
..
}

Earlier while releasing lock, we don't count it towards LWLock stats
spin_delay_count.  I think if we see other places in lwlock.c, it only gets
counted when we try to acquire it in a loop.

3.
LWLockRelease()
{
..
/* grant permission to run, even if a spurious share lock increases lockcount */
else if (mode == LW_EXCLUSIVE && have_waiters)
	check_waiters = true;
/* nobody has this locked anymore, potential exclusive lockers get a chance */
else if (lockcount == 0 && have_waiters)
	check_waiters = true;
..
}

It seems comments have been reversed in above code.

4.
LWLockWakeup()
{
..
dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
..
}

Shouldn't we use a volatile variable in the above loop (lock instead of
l)?

5.
LWLockWakeup()
{
..
dlist_foreach_modify(iter, (dlist_head *) &wakeup)
{
PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
dlist_delete(&waiter->lwWaitLink);
pg_write_barrier();
waiter->lwWaiting = false;
PGSemaphoreUnlock(&waiter->sem);
}
..
}

Why can't we decrement the nwaiters after waking up? I don't think
there is any major problem even if callers do that themselves, but
in some rare cases LWLockRelease() might spuriously assume that
there are some waiters and tries to call LWLockWakeup().  Although
this doesn't create any problem, keeping the value sane is good unless
there is some problem in doing so.

6.
LWLockWakeup()
{
..
dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
{
..
if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
continue;
..
if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
{
..
wokeup_somebody = true;
}
..
}
..
}

a.
IIUC above logic, if the waiter queue is as follows:
(S-Shared; X-Exclusive) S X S S S X S S

it can skip the exclusive waiters and release shared waiter.

If my understanding is right, then I think instead of continue, there
should be *break* in above logic.

b.
Consider below sequence of waiters:
(S-Shared; X-Exclusive) S S X S S

I think as per the un-patched code, it will wake up waiters up to and
including the first Exclusive, but the patch will wake up waiters up to
but *excluding* the first Exclusive.

7.
LWLockWakeup()
{
..
dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
{
..
dlist_delete(&waiter->lwWaitLink);
dlist_push_tail(&wakeup, &waiter->lwWaitLink);
..
}
..
}

Use of dlist has simplified the code, but I think there might be a slight
overhead of maintaining the wakeup queue as compared to the un-patched
mechanism, especially when there is a long waiter queue.

8.
LWLockConditionalAcquire()
{
..
/*
 * We ran into an exclusive lock and might have blocked another
 * exclusive lock from taking a shot because it took a time to back
 * off. Retry till we are either sure we didn't block somebody (because
 * somebody else certainly has the lock) or till we got it.
 *
 * We cannot rely on the two-step lock-acquisition protocol as in
 * LWLockAcquire because we're not using it.
 */
if (potentially_spurious)
{
SPIN_DELAY();
goto retry;
}
..
}

Due to the above logic, I think it can keep on retrying for a long time before
it actually concludes whether it got the lock or not, in case other backend(s)
take the Exclusive lock after *double_check* and release it before the
unconditional increment of the shared lock in function LWLockAttemptLock.

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-23 Thread Amit Kapila
On Mon, Jun 23, 2014 at 9:12 PM, Andres Freund 
wrote:
> On 2014-06-23 19:59:10 +0530, Amit Kapila wrote:
> > 12.
> > #ifdef LWLOCK_DEBUG
> > lock->owner = MyProc;
> > #endif
> >
> > Shouldn't it be reset in LWLockRelease?
>
> That's actually intentional. It's quite useful to know the last owner
> when debugging lwlock code.

Won't it cause a problem if the last owner process exits?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-23 Thread Amit Kapila
On Mon, Jun 23, 2014 at 9:12 PM, Andres Freund 
wrote:
> On 2014-06-23 19:59:10 +0530, Amit Kapila wrote:
> > On Tue, Jun 17, 2014 at 8:56 PM, Andres Freund 
> > wrote:
> > 2.
> > LWLockAcquireCommon()
> > {
> > ..
> > if (!LWLockDequeueSelf(l))
> > {
> > /*
> > * Somebody else dequeued us and has or will..
> >  ..
> > */
> > for (;;)
> > {
> >  PGSemaphoreLock(&proc->sem, false);
> > if (!proc->lwWaiting)
> > break;
> >  extraWaits++;
> > }
> > lock->releaseOK = true;
> > }
> >
> > Do we want to set result = false; after waking in the above code?
> > The idea behind setting it to false is to indicate whether we got the lock
> > immediately or not, which previously was decided based on whether it needed
> > to queue itself.
>
> Hm. I don't think it's clear which version is better.

I thought that if we get the lock on the first attempt, then result should be
true, which seems clear; but for the case of the second attempt you
are right that it's not clear.  In such a case, I think we can go either
way, and later, if any problem is discovered during tests or otherwise,
we can revert it.

> > 7.
> > LWLockWaitForVar()
> > {
> > ..
> > /*
> >  * Add myself to wait queue. Note that this is racy, somebody else
> >  * could wakeup before we're finished queuing.
> >  * NB: We're using nearly the same twice-in-a-row lock acquisition
> >  * protocol as LWLockAcquire(). Check its comments for details.
> >  */
> > LWLockQueueSelf(l, LW_WAIT_UNTIL_FREE);
> >
> > /* we're now guaranteed to be woken up if necessary */
> >  mustwait = LWLockAttemptLock(l, LW_EXCLUSIVE, false,
> > &potentially_spurious);
> > }
> >
> > Why is it important to attempt the lock after queuing in this case? Can't
> > we just re-check the exclusive lock as done before queuing?
>
> Well, that's how Heikki designed LWLockWaitForVar().

In that case I might be missing some point here: the un-patched code of
LWLockWaitForVar() never tries to acquire the lock, but the new code
does so.  Basically, I cannot see what the problem would be if we just
did the below after queuing:
mustwait = pg_atomic_read_u32(&lock->lockcount) != 0;

Could you please explain what the problem is with just rechecking?

> > > I think both are actually critical for performance... Otherwise even an
> > > only lightly contended lock would require scheduler activity when a
> > > process tries to lock something twice. Given the frequency we acquire
> > > some locks with, that'd be disastrous...
> >
> > Do you have any suggestion on how both behaviours can be retained?
>
> Not sure what you mean.

I just wanted to say that the current behaviour of releaseOK seems to
be useful in some cases, and if you want to change it, would the new
scheme retain the behaviour we currently get from releaseOK?

I understand that so far your patch has not changed anything specific
to releaseOK, but from the above discussion I got the impression that you
are planning to change it; that's why I asked the above question.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-23 Thread Andres Freund
On 2014-06-23 19:59:10 +0530, Amit Kapila wrote:
> On Tue, Jun 17, 2014 at 8:56 PM, Andres Freund 
> wrote:
> > On 2014-06-17 20:47:51 +0530, Amit Kapila wrote:
> > > On Tue, Jun 17, 2014 at 6:35 PM, Andres Freund 
> > > wrote:
> > >
> > > You have followed it pretty well as far as I can understand from your
> > > replies; as there is no reproducible test (which I think is a bit tricky
> > > to prepare), it becomes difficult to explain by theory.
> >
> > I'm working on an updated patch that moves the releaseOK into the
> > spinlocks. Maybe that's the problem already - it's certainly not correct
> > as is.
> 
> Sure, I will run the tests/performance tests with the updated patch; you
> might want to include some more changes based on the comments
> in the mail below:

I'm nearly finished cleaning up the atomics part of the patch, which
also includes a bit of cleanup of the lwlocks code.

> Few more comments:
> 
> 1.
> LWLockAcquireCommon()
> {
> ..
> iterations++;
> }
> 
> In the current logic, I could not see any use of this *iterations* variable.

It's useful for debugging. Should be gone in the final code.

> 2.
> LWLockAcquireCommon()
> {
> ..
> if (!LWLockDequeueSelf(l))
> {
> /*
> * Somebody else dequeued us and has or will..
>  ..
> */
> for (;;)
> {
>  PGSemaphoreLock(&proc->sem, false);
> if (!proc->lwWaiting)
> break;
>  extraWaits++;
> }
> lock->releaseOK = true;
> }
> 
> Do we want to set result = false; after waking in the above code?
> The idea behind setting it to false is to indicate whether we got the lock
> immediately or not, which previously was decided based on whether it needed
> to queue itself.

Hm. I don't think it's clear which version is better.

> 3.
> LWLockAcquireCommon()
> {
> ..
> /*
>  * Ok, at this point we couldn't grab the lock on the first try. We
>  * cannot simply queue ourselves to the end of the list and wait to be
>  * woken up because by now the lock could long have been released.
>  * Instead add us to the queue and try to grab the lock again. If we
>  * suceed we need to revert the queuing and be happy, otherwise we
>  * recheck the lock. If we still couldn't grab it, we know that the
>  * other lock will see our queue entries when releasing since they
>  * existed before we checked for the lock.
>  */
> /* add to the queue */
> LWLockQueueSelf(l, mode);
> 
> /* we're now guaranteed to be woken up if necessary */
> mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious);
> ..
> }
> 
> a. From the above code and comments alone, it is not quite clear why the
> second attempt is important, unless one thinks it through or finds
> your comments in the *Notes* section at the top of the file.  I think it's
> better to indicate this in some way, so that the code reader knows to refer
> to the Notes section or wherever you are planning to keep those comments.

Ok.

> b. There is typo in above comment suceed/succeed.

Thanks, fixed.

> 
> 4.
> LWLockAcquireCommon()
> {
> ..
> if (!LWLockDequeueSelf(l))
> {
>  for (;;)
> {
> PGSemaphoreLock(&proc->sem, false);
>  if (!proc->lwWaiting)
> break;
> extraWaits++;
>  }
> lock->releaseOK = true;
> ..
> }
> 
> Setting releaseOK in the above context might not be required, because if
> control reaches this part of the code, it will not retry the acquisition
> another time.

Hm. You're probably right.

> 5.
> LWLockWaitForVar()
> {
> ..
> else
> mustwait = false;
> 
> if (!mustwait)
> break;
> ..
> }
> 
> I think we can directly break in else part in above code.

Well, there's another case of mustwait = false above, which is triggered
while the spinlock is held. I don't think it'd get simpler.

> 6.
> LWLockWaitForVar()
> {
> ..
> /*
>  * Quick test first to see if the slot is free right now.
>  *
>  * XXX: the caller uses a spinlock before this,...
>  */
> if (pg_atomic_read_u32(&lock->lockcount) == 0)
>  return true;
> }
> 
> Is the part of the comment that refers to the spinlock still relevant
> after using atomic ops?

Yes. pg_atomic_read_u32() isn't a memory barrier (and explicitly
documented not to be).
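
To illustrate (a sketch, not code from the patch): the spinlock the caller
takes acts as a barrier, whereas a bare atomic read does not, so ordering
against earlier accesses would otherwise need an explicit fence:

/* hypothetical: without the caller's spinlock, ordering this read
 * against preceding reads would need an explicit barrier */
pg_read_barrier();
if (pg_atomic_read_u32(&lock->lockcount) == 0)
    return true;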

> 7.
> LWLockWaitForVar()
> {
> ..
> /*
>  * Add myself to wait queue. Note that this is racy, somebody else
>  * could wakeup before we're finished queuing.
>  * NB: We're using nearly the same twice-in-a-row lock acquisition
>  * protocol as LWLockAcquire(). Check its comments for details.
>  */
> LWLockQueueSelf(l, LW_WAIT_UNTIL_FREE);
> 
> /* we're now guaranteed to be woken up if necessary */
>  mustwait = LWLockAttemptLock(l, LW_EXCLUSIVE, false,
> &potentially_spurious);
> }
> 
> Why is it important to attempt the lock after queuing in this case? Can't
> we just re-check the exclusive lock as done before queuing?

Well, that's how Heikki designed LWLockWaitForVar().

> 8.
> LWLockWaitForVar()
> {
> ..
> PRINT_LWDEBUG("LWLockAcquire undo queue", lock, mode);
>  break;
> }
> else
> {
>  PRINT_LWDEBUG("LWLockAcquire waiting 4", lock, mode);
> }
> ..
> }
> 
> a. I think instead of LWLockAcquire, here we should use
>    LWLockWaitForVar

right.

> b. Isn't it better to use LOG_LWDEBUG instead of PRINT_LWDEB

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-23 Thread Amit Kapila
On Tue, Jun 17, 2014 at 8:56 PM, Andres Freund 
wrote:
> On 2014-06-17 20:47:51 +0530, Amit Kapila wrote:
> > On Tue, Jun 17, 2014 at 6:35 PM, Andres Freund 
> > wrote:
> >
> > You have followed it pretty well as far as I can understand from your
> > replies; as there is no reproducible test (which I think is a bit tricky
> > to prepare), it becomes difficult to explain by theory.
>
> I'm working on an updated patch that moves the releaseOK into the
> spinlocks. Maybe that's the problem already - it's certainly not correct
> as is.

Sure, I will run the tests/performance tests with the updated patch; you
might want to include some more changes based on the comments
in the mail below:

> > You are right, it will wake up the existing waiters, but I think the
> > new logic has one difference, which is that it can allow a backend to
> > take the Exclusive lock when there are already waiters in the queue.  As per
> > the above example, even though Session-2 and Session-3 are in the wait
> > queue, Session-4 will be able to acquire the Exclusive lock, which I think
> > was previously not possible.
>
> I think that was previously possible as well - in a slightly different
> set of circumstances though. After a process releases a lock and wakes
> up some of several waiters another process can come in and acquire the
> lock. Before the woken up process gets scheduled again. lwlocks aren't
> fair locks...

Okay, but I think changing the behaviour of lwlocks might impact some
tests/applications.  As they are not fair, defining the exact
behaviour is not easy, and we don't have any concrete scenario that
would be affected, so there should not be a problem in accepting
slightly different behaviour.

Few more comments:

1.
LWLockAcquireCommon()
{
..
iterations++;
}

In the current logic, I could not see any use of this *iterations* variable.

2.
LWLockAcquireCommon()
{
..
if (!LWLockDequeueSelf(l))
{
/*
* Somebody else dequeued us and has or will..
 ..
*/
for (;;)
{
 PGSemaphoreLock(&proc->sem, false);
if (!proc->lwWaiting)
break;
 extraWaits++;
}
lock->releaseOK = true;
}

Do we want to set result = false; after waking in the above code?
The idea behind setting it to false is to indicate whether we got the lock
immediately or not, which previously was decided based on whether it needed
to queue itself.

3.
LWLockAcquireCommon()
{
..
/*
 * Ok, at this point we couldn't grab the lock on the first try. We
 * cannot simply queue ourselves to the end of the list and wait to be
 * woken up because by now the lock could long have been released.
 * Instead add us to the queue and try to grab the lock again. If we
 * suceed we need to revert the queuing and be happy, otherwise we
 * recheck the lock. If we still couldn't grab it, we know that the
 * other lock will see our queue entries when releasing since they
 * existed before we checked for the lock.
 */
/* add to the queue */
LWLockQueueSelf(l, mode);

/* we're now guaranteed to be woken up if necessary */
mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious);
..
}

a. From the above code and comments alone, it is not quite clear why the
second attempt is important, unless one thinks it through or finds
your comments in the *Notes* section at the top of the file.  I think it's
better to indicate this in some way, so that the code reader knows to refer
to the Notes section or wherever you are planning to keep those comments.

b. There is typo in above comment suceed/succeed.


4.
LWLockAcquireCommon()
{
..
if (!LWLockDequeueSelf(l))
{
 for (;;)
{
PGSemaphoreLock(&proc->sem, false);
 if (!proc->lwWaiting)
break;
extraWaits++;
 }
lock->releaseOK = true;
..
}

Setting releaseOK in the above context might not be required, because if
control reaches this part of the code, it will not retry the acquisition
another time.

5.
LWLockWaitForVar()
{
..
else
mustwait = false;

if (!mustwait)
break;
..
}

I think we can directly break in else part in above code.

6.
LWLockWaitForVar()
{
..
/*
 * Quick test first to see if the slot is free right now.
 *
 * XXX: the caller uses a spinlock before this,...
 */
if (pg_atomic_read_u32(&lock->lockcount) == 0)
 return true;
}

Is the part of the comment that refers to the spinlock still relevant
after using atomic ops?

7.
LWLockWaitForVar()
{
..
/*
 * Add myself to wait queue. Note that this is racy, somebody else
 * could wakeup before we're finished queuing.
 * NB: We're using nearly the same twice-in-a-row lock acquisition
 * protocol as LWLockAcquire(). Check its comments for details.
 */
LWLockQueueSelf(l, LW_WAIT_UNTIL_FREE);

/* we're now guaranteed to be woken up if necessary */
 mustwait = LWLockAttemptLock(l, LW_EXCLUSIVE, false,
&potentially_spurious);
}

Why is it important to attempt the lock after queuing in this case? Can't
we just re-check the exclusive lock as done before queuing?

8.
LWLockWaitForVar()
{
..
PRINT_LWDEBUG("LWLockAcquire undo queue", lock, mode);
 break;
}
else
{
 PRINT_LWDEBUG("LWLockAcquire waiting 4", lock, mode);
}
..
}

a. I think instead of LWLockAcquire, here we should use
   LWLockWaitForVar
b. Isn't it b

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-17 Thread Andres Freund
On 2014-06-17 20:47:51 +0530, Amit Kapila wrote:
> On Tue, Jun 17, 2014 at 6:35 PM, Andres Freund 
> wrote:
> > On 2014-06-17 18:01:58 +0530, Amit Kapila wrote:
> > > On Tue, Jun 17, 2014 at 3:56 PM, Andres Freund 
> > > > On 2014-06-17 12:41:26 +0530, Amit Kapila wrote:
> > I unfortunately still can't follow.
> 
> You have followed it pretty well as far as I can understand from your
> replies; as there is no reproducible test (which I think is a bit tricky to
> prepare), it becomes difficult to explain by theory.

I'm working on an updated patch that moves the releaseOK into the
spinlocks. Maybe that's the problem already - it's certainly not correct
as is.

> > If Session-1 woke up some previous
> > waiter the woken up process will set releaseOK to true again when it
> > loops to acquire the lock?
> 
> You are right, it will wake up the existing waiters, but I think the
> new logic has one difference, which is that it can allow a backend to
> take the Exclusive lock when there are already waiters in the queue.  As per
> the above example, even though Session-2 and Session-3 are in the wait
> queue, Session-4 will be able to acquire the Exclusive lock, which I think
> was previously not possible.

I think that was previously possible as well - in a slightly different
set of circumstances though. After a process releases a lock and wakes
up some of several waiters another process can come in and acquire the
lock. Before the woken up process gets scheduled again. lwlocks aren't
fair locks...

> > Somewhat unrelated:
> >
> > I have a fair amount of doubt about the effectiveness of the releaseOK
> > logic (which imo also is pretty poorly documented).
> > Essentially its intent is to avoid unnecessary scheduling when other
> > processes have already been woken up (i.e. releaseOK has been set to
> > false). I believe the theory is that if any process has already been
> > woken up it's pointless to wake up additional processes
> > (i.e. PGSemaphoreUnlock()) because the originally woken up process will
> > wake up at some point. But if the to-be-woken process is scheduled
> > out because it has fully used up its timeslices, that means we won't
> > wake up other waiters for a relatively long time.
> 
> I think it also ensures that the woken-up process won't stall for a
> very long time, because if we wake new waiters, then the previously woken
> process can again enter the wait queue, and the same thing can repeat
> for a long time.

I don't think it effectively does that - newly incoming lockers ignore
the queue and just acquire the lock. Even if there's some other backend
scheduled to wake up. And shared locks can be acquired when there's
exclusive locks waiting.
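
For illustration, the unfairness boils down to the fast path never consulting
the wait queue; a sketch using the patch's helper as quoted earlier in this
thread:

/* a freshly arriving locker just attempts the atomic state change;
 * it never looks at lock->waiters, so it can overtake queued waiters */
mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious);
if (!mustwait)
    return;     /* acquired, jumping the queue */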

I think both are actually critical for performance... Otherwise even an
only lightly contended lock would require scheduler activity when a
process tries to lock something twice. Given the frequency we acquire
some locks with, that'd be disastrous...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-17 Thread Amit Kapila
On Tue, Jun 17, 2014 at 6:35 PM, Andres Freund 
wrote:
>
> On 2014-06-17 18:01:58 +0530, Amit Kapila wrote:
> > On Tue, Jun 17, 2014 at 3:56 PM, Andres Freund 
> > > On 2014-06-17 12:41:26 +0530, Amit Kapila wrote:
> > > > 2.
> > > > Handling of potentialy_spurious case seems to be pending
> > > > in LWLock functions like LWLockAcquireCommon().
> > > >
> > > > LWLockAcquireCommon()
> > > > {
> > > > ..
> > > > /* add to the queue */
> > > > LWLockQueueSelf(l, mode);
> > > >
> > > > /* we're now guaranteed to be woken up if necessary */
> > > > mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious);
> > > >
> > > > }
> > > >
> > > > I think it can lead to some problems, example:
> > > >
> > > > Session -1
> > > > 1. Acquire Exclusive LWlock
> > > >
> > > > Session -2
> > > > 1. Acquire Shared LWlock
> > > >
> > > > 1a. Unconditionally incrementing shared count by session-2
> > > >
> > > > Session -1
> > > > 2. Release Exclusive lock
> > > >
> > > > Session -3
> > > > 1. Acquire Exclusive LWlock
> > > > It will put itself to wait queue by seeing the lock count incremented
> > > > by Session-2
> > > >
> > > > Session-2
> > > > 1b. Decrement the shared count and add it to wait queue.
> > > >
> > > > Session-4
> > > > 1. Acquire Exclusive lock
> > > >This session will get the exclusive lock, because even
> > > >though other lockers are waiting, lockcount is zero.
> > > >
> > > > Session-2
> > > > 2. Try second time to take shared lock, it won't get
> > > >as session-4 already has an exclusive lock, so it will
> > > >start waiting
> > > >
> > > > Session-4
> > > > 2. Release Exclusive lock
> > > >it will not wake the waiters because waiters have been added
> > > >before acquiring this lock.
> > >
> > > I don't understand this step here? When releasing the lock it'll notice
> > > that the waiters is <> 0 and acquire the spinlock which should protect
> > > against badness here?
> >
> > While releasing the lock, I think it will not go on to wake up waiters
> > (LWLockWakeup), because releaseOK will be false.  releaseOK
> > can be set to false when Session-1 has released the Exclusive lock
> > and woken up some previous waiter.  Once it is set to false, it can
> > be reset to true only by the retry logic (after getting the semaphore).
>
> I unfortunately still can't follow.

You have followed it pretty well as far as I can understand from your
replies; as there is no reproducible test (which I think is a bit tricky to
prepare), it becomes difficult to explain by theory.

> If Session-1 woke up some previous
> waiter the woken up process will set releaseOK to true again when it
> loops to acquire the lock?

You are right, it will wake up the existing waiters, but I think the
new logic has one difference, which is that it can allow a backend to
take the Exclusive lock when there are already waiters in the queue.  As per
the above example, even though Session-2 and Session-3 are in the wait
queue, Session-4 will be able to acquire the Exclusive lock, which I think
was previously not possible.


> Somewhat unrelated:
>
> I have a fair amount of doubt about the effectiveness of the releaseOK
> logic (which imo also is pretty poorly documented).
> Essentially its intent is to avoid unnecessary scheduling when other
> processes have already been woken up (i.e. releaseOK has been set to
> false). I believe the theory is that if any process has already been
> woken up it's pointless to wake up additional processes
> (i.e. PGSemaphoreUnlock()) because the originally woken up process will
> wake up at some point. But if the to-be-woken process is scheduled
> out because it has fully used up its timeslices, that means we won't
> wake up other waiters for a relatively long time.

I think it also ensures that the woken-up process won't stall for a
very long time, because if we wake new waiters, then the previously woken
process can again enter the wait queue, and the same thing can repeat
for a long time.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-17 Thread Andres Freund
On 2014-06-17 18:01:58 +0530, Amit Kapila wrote:
> On Tue, Jun 17, 2014 at 3:56 PM, Andres Freund 
> > On 2014-06-17 12:41:26 +0530, Amit Kapila wrote:
> > > 2.
> > > Handling of the potentially_spurious case seems to be pending
> > > in LWLock functions like LWLockAcquireCommon().
> > >
> > > LWLockAcquireCommon()
> > > {
> > > ..
> > > /* add to the queue */
> > > LWLockQueueSelf(l, mode);
> > >
> > > /* we're now guaranteed to be woken up if necessary */
> > > mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious);
> > >
> > > }
> > >
> > > I think it can lead to some problems, example:
> > >
> > > Session -1
> > > 1. Acquire Exclusive LWlock
> > >
> > > Session -2
> > > 1. Acquire Shared LWlock
> > >
> > > 1a. Unconditionally incrementing shared count by session-2
> > >
> > > Session -1
> > > 2. Release Exclusive lock
> > >
> > > Session -3
> > > 1. Acquire Exclusive LWlock
> > > It will put itself to wait queue by seeing the lock count incremented
> > > by Session-2
> > >
> > > Session-2
> > > 1b. Decrement the shared count and add it to wait queue.
> > >
> > > Session-4
> > > 1. Acquire Exclusive lock
> > >This session will get the exclusive lock, because even
> > >though other lockers are waiting, lockcount is zero.
> > >
> > > Session-2
> > > 2. Try second time to take shared lock, it won't get
> > >as session-4 already has an exclusive lock, so it will
> > >start waiting
> > >
> > > Session-4
> > > 2. Release Exclusive lock
> > >it will not wake the waiters because waiters have been added
> > >before acquiring this lock.
> >
> > I don't understand this step here? When releasing the lock it'll notice
> > that the waiters is <> 0 and acquire the spinlock which should protect
> > against badness here?
> 
> While releasing the lock, I think it will not go on to wake up waiters
> (LWLockWakeup), because releaseOK will be false.  releaseOK
> can be set to false when Session-1 has released the Exclusive lock
> and woken up some previous waiter.  Once it is set to false, it can
> be reset to true only by the retry logic (after getting the semaphore).

I unfortunately still can't follow. If Session-1 woke up some previous
waiter the woken up process will set releaseOK to true again when it
loops to acquire the lock?


Somewhat unrelated:

I have a fair amount of doubt about the effectiveness of the releaseOK
logic (which imo also is pretty poorly documented).
Essentially its intent is to avoid unnecessary scheduling when other
processes have already been woken up (i.e. releaseOK has been set to
false). I believe the theory is that if any process has already been
woken up it's pointless to wake up additional processes
(i.e. PGSemaphoreUnlock()) because the originally woken up process will
wake up at some point. But if the to-be-woken process is scheduled
out because it has fully used up its timeslices, that means we won't
wake up other waiters for a relatively long time.

It's been introduced in the course of
5b9a058384e714b89e050fc0b6381f97037c665a whose logic generally is rather
sound - I just doubt that the releaseOK part is necessary.

It'd certainly be interesting to rip releaseOK out and benchmark the
result... My theory is that the average latency will go down on busy
systems that aren't IO bound.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-17 Thread Amit Kapila
On Tue, Jun 17, 2014 at 3:56 PM, Andres Freund 
wrote:
>
> On 2014-06-17 12:41:26 +0530, Amit Kapila wrote:
> > On Fri, May 23, 2014 at 10:01 PM, Amit Kapila 
> > wrote:
> > > On Fri, Jan 31, 2014 at 3:24 PM, Andres Freund 
> > wrote:
> > > > I've pushed a rebased version of the patchset to
> > > > http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git
> > > > branch rwlock contention.
> > > > 220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem,
> > > > ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA.
> > >
> > > As per discussion in developer meeting, I wanted to test shared
> > > buffer scaling patch with this branch.  I am getting merge
> > > conflicts as per HEAD.  Could you please get it resolved, so that
> > > I can get the data.
> >
> > I have started looking into this patch and have few questions/
> > findings which are shared below:
> >
> > 1. I think the stats for lwstats->ex_acquire_count will be counted twice:
> > first it is incremented in LWLockAcquireCommon() and then in
> > LWLockAttemptLock()
>
> Hrmpf. Will fix.
>
> > 2.
> > Handling of the potentially_spurious case seems to be pending
> > in LWLock functions like LWLockAcquireCommon().
> >
> > LWLockAcquireCommon()
> > {
> > ..
> > /* add to the queue */
> > LWLockQueueSelf(l, mode);
> >
> > /* we're now guaranteed to be woken up if necessary */
> > mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious);
> >
> > }
> >
> > I think it can lead to some problems, example:
> >
> > Session -1
> > 1. Acquire Exclusive LWlock
> >
> > Session -2
> > 1. Acquire Shared LWlock
> >
> > 1a. Unconditionally incrementing shared count by session-2
> >
> > Session -1
> > 2. Release Exclusive lock
> >
> > Session -3
> > 1. Acquire Exclusive LWlock
> > It will put itself to wait queue by seeing the lock count incremented
> > by Session-2
> >
> > Session-2
> > 1b. Decrement the shared count and add it to wait queue.
> >
> > Session-4
> > 1. Acquire Exclusive lock
> >This session will get the exclusive lock, because even
> >though other lockers are waiting, lockcount is zero.
> >
> > Session-2
> > 2. Try second time to take shared lock, it won't get
> >as session-4 already has an exclusive lock, so it will
> >start waiting
> >
> > Session-4
> > 2. Release Exclusive lock
> >it will not wake the waiters because waiters have been added
> >before acquiring this lock.
>
> I don't understand this step here? When releasing the lock it'll notice
> that the waiters is <> 0 and acquire the spinlock which should protect
> against badness here?

While releasing the lock, I think it will not go on to wake up waiters
(LWLockWakeup), because releaseOK will be false.  releaseOK
can be set to false when Session-1 has released the Exclusive lock
and woken up some previous waiter.  Once it is set to false, it can
be reset to true only by the retry logic (after getting the semaphore).

> > 3.
> I don't think there's any danger here, lwWaiting will only ever be
> manipulated by the PGPROC's owner. As discussed elsewhere there
> needs to be a write barrier before the proc->lwWaiting = false, even in
> upstream code.

Agreed.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-17 Thread Andres Freund
On 2014-06-17 12:41:26 +0530, Amit Kapila wrote:
> On Fri, May 23, 2014 at 10:01 PM, Amit Kapila 
> wrote:
> > On Fri, Jan 31, 2014 at 3:24 PM, Andres Freund 
> wrote:
> > > I've pushed a rebased version of the patchset to
> > > http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git
> > > branch rwlock contention.
> > > 220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem,
> > > ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA.
> >
> > As per discussion in developer meeting, I wanted to test shared
> > buffer scaling patch with this branch.  I am getting merge
> > conflicts as per HEAD.  Could you please get it resolved, so that
> > I can get the data.
>
> I have started looking into this patch and have few questions/
> findings which are shared below:
>
> 1. I think the stats for lwstats->ex_acquire_count will be counted twice:
> first it is incremented in LWLockAcquireCommon() and then in
> LWLockAttemptLock()

Hrmpf. Will fix.

> 2.
> Handling of the potentially_spurious case seems to be pending
> in LWLock functions like LWLockAcquireCommon().
>
> LWLockAcquireCommon()
> {
> ..
> /* add to the queue */
> LWLockQueueSelf(l, mode);
>
> /* we're now guaranteed to be woken up if necessary */
> mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious);
>
> }
>
> I think it can lead to some problems, example:
>
> Session -1
> 1. Acquire Exclusive LWlock
>
> Session -2
> 1. Acquire Shared LWlock
>
> 1a. Unconditionally incrementing shared count by session-2
>
> Session -1
> 2. Release Exclusive lock
>
> Session -3
> 1. Acquire Exclusive LWlock
> It will put itself to wait queue by seeing the lock count incremented
> by Session-2
>
> Session-2
> 1b. Decrement the shared count and add it to wait queue.
>
> Session-4
> 1. Acquire Exclusive lock
>This session will get the exclusive lock, because even
>though other lockers are waiting, lockcount is zero.
>
> Session-2
> 2. Try second time to take shared lock, it won't get
>as session-4 already has an exclusive lock, so it will
>start waiting
>
> Session-4
> 2. Release Exclusive lock
>it will not wake the waiters because waiters have been added
>before acquiring this lock.

I don't understand this step here? When releasing the lock it'll notice
that the waiters is <> 0 and acquire the spinlock which should protect
against badness here?

> 3.
> LWLockAcquireCommon()
> {
> for (;;)
> {
> PGSemaphoreLock(&proc->sem, false);
> if (!proc->lwWaiting)
> ..
> }
> proc->lwWaiting is checked and updated without a spinlock, whereas
> previously it was done under the spinlock; won't that be unsafe?

It was previously checked/unset without a spinlock as well:
/*
 * Awaken any waiters I removed from the queue.
 */
while (head != NULL)
{
LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release 
waiter");
proc = head;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
proc->lwWaiting = false;
PGSemaphoreUnlock(&proc->sem);
}
I don't think there's any danger here, lwWaiting will only ever be
manipulated by the PGPROC's owner. As discussed elsewhere there
needs to be a write barrier before the proc->lwWaiting = false, even in
upstream code.
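
A sketch of that fix applied to the upstream loop quoted above (the only
change is the added barrier):

while (head != NULL)
{
    proc = head;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    /* make all list manipulation visible before the waiter can return */
    pg_write_barrier();
    proc->lwWaiting = false;
    PGSemaphoreUnlock(&proc->sem);
}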

> 4.
> LWLockAcquireCommon()
> {
> ..
> for (;;)
> {
> /* "false" means cannot accept cancel/die interrupt here. */
> PGSemaphoreLock(&proc->sem, false);
> if (!proc->lwWaiting)
> break;
> extraWaits++;
> }
> lock->releaseOK = true;
> }
>
> lock->releaseOK is updated/checked without a spinlock, whereas
> previously it was done under the spinlock; won't that be unsafe?

Hm. That's probably buggy. Good catch. Especially if you have a compiler
that does byte manipulation by reading e.g. 4 bytes from a struct and
then writing the wider variable back... So the releaseOK bit needs to move
into LWLockDequeueSelf().
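
To illustrate the hazard (the wide read-modify-write below is hypothetical
compiler output, not actual code):

/* source: an unprotected store to a sub-word field */
lock->releaseOK = true;

/* what such a compiler may effectively emit:
 *
 *   uint32 tmp = *(uint32 *) lock;   -- reads neighbouring fields too
 *   tmp |= RELEASEOK_BIT;
 *   *(uint32 *) lock = tmp;          -- writes them all back, possibly
 *                                       clobbering a concurrent update
 */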

Thanks for looking!

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-17 Thread Amit Kapila
On Fri, May 23, 2014 at 10:01 PM, Amit Kapila 
wrote:
> On Fri, Jan 31, 2014 at 3:24 PM, Andres Freund 
wrote:
> > I've pushed a rebased version of the patchset to
> > http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git
> > branch rwlock contention.
> > 220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem,
> > ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA.
>
> As per discussion in developer meeting, I wanted to test shared
> buffer scaling patch with this branch.  I am getting merge
> conflicts as per HEAD.  Could you please get it resolved, so that
> I can get the data.

I have started looking into this patch and have few questions/
findings which are shared below:

1. I think the stats for lwstats->ex_acquire_count will be counted twice:
first it is incremented in LWLockAcquireCommon() and then in
LWLockAttemptLock()

2.
Handling of the potentially_spurious case seems to be pending
in LWLock functions like LWLockAcquireCommon().

LWLockAcquireCommon()
{
..
/* add to the queue */
LWLockQueueSelf(l, mode);

/* we're now guaranteed to be woken up if necessary */
mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious);

}

I think it can lead to some problems, example:

Session -1
1. Acquire Exclusive LWlock

Session -2
1. Acquire Shared LWlock

1a. Unconditionally incrementing shared count by session-2

Session -1
2. Release Exclusive lock

Session -3
1. Acquire Exclusive LWlock
It will put itself to wait queue by seeing the lock count incremented
by Session-2

Session-2
1b. Decrement the shared count and add it to wait queue.

Session-4
1. Acquire Exclusive lock
   This session will get the exclusive lock, because even
   though other lockers are waiting, lockcount is zero.

Session-2
2. Try second time to take shared lock, it won't get
   as session-4 already has an exclusive lock, so it will
   start waiting

Session-4
2. Release Exclusive lock
   it will not wake the waiters because waiters have been added
   before acquiring this lock.

So in the above scenario, Session-3 and Session-2 are waiting in the queue
with nobody to wake them.

I have not reproduced the exact scenario above,
so I might be missing something that would prevent
the above situation.

3.
LWLockAcquireCommon()
{
for (;;)
{
PGSemaphoreLock(&proc->sem, false);
if (!proc->lwWaiting)
..
}
proc->lwWaiting is checked and updated without a spinlock, whereas
previously it was done under the spinlock; won't that be unsafe?

4.
LWLockAcquireCommon()
{
..
for (;;)
{
/* "false" means cannot accept cancel/die interrupt here. */
PGSemaphoreLock(&proc->sem, false);
if (!proc->lwWaiting)
break;
extraWaits++;
}
lock->releaseOK = true;
}

lock->releaseOK is updated/checked without a spinlock, whereas
previously it was done under the spinlock; won't that be unsafe?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-05-23 Thread Amit Kapila
On Fri, Jan 31, 2014 at 3:24 PM, Andres Freund 
wrote:
> I've pushed a rebased version of the patchset to
> http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git
> branch rwlock contention.
> 220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem,
> ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA.

As per discussion in developer meeting, I wanted to test shared
buffer scaling patch with this branch.  I am getting merge
conflicts as per HEAD.  Could you please get it resolved, so that
I can get the data.

From git://git.postgresql.org/git/users/andresfreund/postgres
 * branch rwlock-contention -> FETCH_HEAD
Auto-merging src/test/regress/regress.c
CONFLICT (content): Merge conflict in src/test/regress/regress.c
Auto-merging src/include/storage/proc.h
Auto-merging src/include/storage/lwlock.h
CONFLICT (content): Merge conflict in src/include/storage/lwlock.h
Auto-merging src/include/storage/ipc.h
CONFLICT (content): Merge conflict in src/include/storage/ipc.h
Auto-merging src/include/storage/barrier.h
CONFLICT (content): Merge conflict in src/include/storage/barrier.h
Auto-merging src/include/pg_config_manual.h
Auto-merging src/include/c.h
Auto-merging src/backend/storage/lmgr/spin.c
Auto-merging src/backend/storage/lmgr/proc.c
Auto-merging src/backend/storage/lmgr/lwlock.c
CONFLICT (content): Merge conflict in src/backend/storage/lmgr/lwlock.c
Auto-merging src/backend/storage/ipc/shmem.c
Auto-merging src/backend/storage/ipc/ipci.c
Auto-merging src/backend/access/transam/xlog.c
CONFLICT (content): Merge conflict in src/backend/access/transam/xlog.c
Auto-merging src/backend/access/transam/twophase.c
Auto-merging configure.in
Auto-merging configure
Auto-merging config/c-compiler.m4
Automatic merge failed; fix conflicts and then commit the result.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-02-10 Thread Heikki Linnakangas

On 01/31/2014 11:54 AM, Andres Freund wrote:

Hi,

On 2014-01-28 21:27:29 -0800, Peter Geoghegan wrote:

On Fri, Nov 15, 2013 at 11:47 AM, Andres Freund  wrote:

1) I've added an abstracted atomic ops implementation. Needs a fair
amount of work, also submitted as a separate CF entry. (Patch 1 & 2)


Commit 220b34331f77effdb46798ddd7cca0cffc1b2858 caused bitrot when
applying 0002-Very-basic-atomic-ops-implementation.patch. Please
rebase.


I've pushed a rebased version of the patchset to
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git
branch rwlock contention.
220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem,
ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA.

I plan to split the atomics patch into smaller chunks before
reposting. Imo the "Convert the PGPROC->lwWaitLink list into a dlist
instead of open coding it." is worth being applied independently from
the rest of the series, it simplifies code and it fixes a bug...


I committed a fix for the WakeupWaiters() bug now, without the rest of 
the "open coding" patch. Converting lwWaitLInk into a dlist is probably 
a good idea, but seems better to fix the bug separately, for the sake of 
git history if nothing else.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-02-04 Thread Andres Freund
On 2014-02-03 17:51:20 -0800, Peter Geoghegan wrote:
> On Sun, Feb 2, 2014 at 6:00 AM, Andres Freund  wrote:
> > On 2014-02-01 19:47:29 -0800, Peter Geoghegan wrote:
> >> Here are the results of a benchmark on Nathan Boley's 64-core, 4
> >> socket server: 
> >> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/
> >
> > That's interesting. The maximum number of what you see here (~293125)
> > is markedly lower than what I can get.
> >
> > ... poke around ...
> >
> > Hm, that's partially because you're using pgbench without -M prepared if
> > I see that correctly. The bottleneck in that case is primarily memory
> > allocation. But even after that I am getting higher
> > numbers: ~342497.
> >
> > Trying to nail down the difference, it oddly seems to be your
> > max_connections=80 vs my 100. The profile in both cases is markedly
> > different, way much more spinlock contention with 80. All in
> > Pin/UnpinBuffer().
> 
> I updated this benchmark, with your BufferDescriptors alignment patch
> [1] applied on top of master (while still not using "-M prepared" in
> order to keep the numbers comparable). So once again, that's:
> 
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/
> 
> It made a bigger, fairly noticeable difference, but not so big a
> difference as you describe here. Are you sure that you saw this kind
> of difference with only 64 clients, as you mentioned elsewhere [1]
> (perhaps you fat-fingered [1] -- "-cj" is ambiguous)? Obviously
> max_connections is still 80 in the above. Should I have gone past 64
> clients to see the problem? The best numbers I see with the [1] patch
> applied on master is only ~327809 for -S 10 64 clients. Perhaps I've
> misunderstood.

That's likely -M prepared.  It was with -c 64 -j 64...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-02-03 Thread Peter Geoghegan
On Sun, Feb 2, 2014 at 6:00 AM, Andres Freund  wrote:
> On 2014-02-01 19:47:29 -0800, Peter Geoghegan wrote:
>> Here are the results of a benchmark on Nathan Boley's 64-core, 4
>> socket server: 
>> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/
>
> That's interesting. The maximum number of what you see here (~293125)
> is markedly lower than what I can get.
>
> ... poke around ...
>
> Hm, that's partially because you're using pgbench without -M prepared if
> I see that correctly. The bottleneck in that case is primarily memory
> allocation. But even after that I am getting higher
> numbers: ~342497.
>
> Trying to nail down the difference, it oddly seems to be your
> max_connections=80 vs my 100. The profile in both cases is markedly
> different, way much more spinlock contention with 80. All in
> Pin/UnpinBuffer().

I updated this benchmark, with your BufferDescriptors alignment patch
[1] applied on top of master (while still not using "-M prepared" in
order to keep the numbers comparable). So once again, that's:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/

It made a bigger, fairly noticeable difference, but not so big a
difference as you describe here. Are you sure that you saw this kind
of difference with only 64 clients, as you mentioned elsewhere [1]
(perhaps you fat-fingered [1] -- "-cj" is ambiguous)? Obviously
max_connections is still 80 in the above. Should I have gone past 64
clients to see the problem? The best numbers I see with the [1] patch
applied on master is only ~327809 for -S 10 64 clients. Perhaps I've
misunderstood.

[1] "Misaligned BufferDescriptors causing major performance problems
on AMD": 
http://www.postgresql.org/message-id/20140202151319.gd32...@awork2.anarazel.de
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-02-03 Thread Peter Geoghegan
On Sun, Feb 2, 2014 at 6:00 AM, Andres Freund  wrote:
> The changed algorithm for lwlock imo is an *algorithmic* improvement,
> not one for a particular architecture. The advantage being that locking
> a lwlock which is primarily taken in shared mode will never need to
> wait or loop.

I agree. My point was only that the messaging ought to be that this is
something that those with multi-socket Intel systems should take note
of.

> Yes, that branch is used by some of them. But to make that clear to all
> that are still reading, I have *first* presented the patch & findings to
> -hackers and *then* backported it, and I have referenced the existance
> of the patch for 9.2 on list before. This isn't some kind of "secret
> sauce" deal...

No, of course not. I certainly didn't mean to imply that. My point was
only that anyone that is affected to the same degree as the party with
the 4 socket server might be left with a very poor impression of
Postgres if we failed to fix the problem. It clearly rises to the
level of a bugfix.

> That might be something to do later, as it *really* can hurt in
> practice. We had one server go from load 240 to 11...

Well, we have to commit something on master first. But it should be a
priority to avoid having this hurt users further, since the problems
are highly predictable for certain types of servers.

> But I think we should first focus on getting the patch ready for
> master, then we can see where it's going. At the very least I'd like to
> split of the part modifying the current spinlocks to use the atomics,
> that seems far to invasive.

Agreed.

> I unfortunately can't tell you that much more, not because it's private,
> but because it mostly was diagnosed by remote hand debugging, limiting
> insights considerably.

Of course.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-02-03 Thread Jeff Janes
On Sun, Feb 2, 2014 at 6:00 AM, Andres Freund wrote:

>
> Some background:
> The setups that triggered me into working on the patchset didn't really
> have a pgbench like workload, the individual queries were/are more
> complicated even though it's still an high throughput OLTP workload. And
> the contention was *much* higher than what I can reproduce with pgbench
> -S, there was often nearly all time spent in the lwlock's spinlock, and
> it was primarily the buffer mapping lwlocks, being locked in shared
> mode. The difference is that instead of locking very few buffers per
> query like pgbench does, they touched much more.
>


Perhaps I should try to argue for this extension to pgbench again:

http://www.postgresql.org/message-id/CAMkU=1w0K3RNhtPuLF8WQoVi6gxgG6mcnpC=-ivjwkjkydp...@mail.gmail.com

I think it would do a good job of exercising what you want, provided you
set the scale so that all data fit in RAM but not in shared_buffers.

Or maybe you want it to fit in shared_buffers, since the buffer mapping
lock was contended in shared mode--that suggests the problem is finding the
buffer that already has the page, not making a buffer to have the page.

Cheers,

Jeff


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-02-02 Thread Andres Freund
Hi,

On 2014-02-01 19:47:29 -0800, Peter Geoghegan wrote:
> Here are the results of a benchmark on Nathan Boley's 64-core, 4
> socket server: 
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/

That's interesting. The maximum number of what you see here (~293125)
is markedly lower than what I can get.

... poke around ...

Hm, that's partially because you're using pgbench without -M prepared if
I see that correctly. The bottleneck in that case is primarily memory
allocation. But even after that I am getting higher
numbers: ~342497.

Trying to nail down the difference, it oddly seems to be your
max_connections=80 vs my 100. The profile in both cases is markedly
different, way much more spinlock contention with 80. All in
Pin/UnpinBuffer().

I think =80 has to lead to some data being badly aligned. I can
reproduce that =91 has *much* better performance than =90: 170841.844938
vs 368490.268577 in a 10s test. Reproducible both with and without the test.
That's certainly worth some investigation.
This is *not* reproducible on the Intel machine, so it might be the
associativity of the L1/L2 cache on the AMD.
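
For reference, the usual remedy for this kind of alignment-dependent false
sharing is to pad each hot array element to a full cache line; a minimal
sketch (the union trick; the name and the 64-byte size are assumptions, not
the actual patch):

#define ASSUMED_CACHE_LINE_SIZE 64

typedef union BufferDescPadded
{
    BufferDesc  bufferdesc;
    /* force each descriptor onto its own cache line, independent of
     * where the array itself happens to start */
    char        pad[ASSUMED_CACHE_LINE_SIZE];
} BufferDescPadded;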

> Perhaps I should have gone past 64 clients, because in the document
> "Lock Scaling Analysis on Intel Xeon Processors" [1], Intel write:
>
> "This implies that with oversubscription (more threads running than
> available logical CPUs), the performance of spinlocks can depend
> heavily on the exact OS scheduler behavior, and may change drastically
> with operating system or VM updates."
>
> I haven't bothered with a higher client counts though, because Andres
> noted it's the same with 90 clients on this AMD system. Andres: Do you
> see big problems when # clients < # logical cores on the affected
> Intel systems?

There's some slowdown with the patch applied, but it's not big. Without
it, the slowdown is much earlier.

> There is only a marginal improvement in performance on this big 4
> socket system. Andres informs me privately that he has reproduced the
> problem on multiple new 4-socket Intel servers, so it seems reasonable
> to suppose that it's more or less an Intel thing.

I've just poked around, it's not just 4 socket, but also 2 socket
systems.

Some background:
The setups that triggered me into working on the patchset didn't really
have a pgbench like workload, the individual queries were/are more
complicated even though it's still a high throughput OLTP workload. And
the contention was *much* higher than what I can reproduce with pgbench
-S, there was often nearly all time spent in the lwlock's spinlock, and
it was primarily the buffer mapping lwlocks, being locked in shared
mode. The difference is that instead of locking very few buffers per
query like pgbench does, they touched much more.
If you look at a profile of a pgbench -S workload -cj64 it's pretty much all
bottlenecked by GetSnapshotData():
unpatched:
-  40.91%  postgres_plainl  postgres_plainlw[.] s_lock
   - s_lock
  - 51.34% LWLockAcquire
   GetSnapshotData
 - GetTransactionSnapshot
+ 55.23% PortalStart
+ 44.77% PostgresMain
  - 48.66% LWLockRelease
   GetSnapshotData
 - GetTransactionSnapshot
+ 53.64% PostgresMain
+ 46.36% PortalStart
+   2.65%  pgbench  [kernel.kallsyms]   [k] try_to_wake_up
+   2.61%  postgres_plainl  postgres_plainlw[.] LWLockRelease
+   1.36%  postgres_plainl  postgres_plainlw[.] LWLockAcquire
+   1.25%  postgres_plainl  postgres_plainlw[.] GetSnapshotData

patched:
-   2.94%   postgres  postgres  [.] LWLockAcquire
   - LWLockAcquire
  + 26.61% ReadBuffer_common
  + 17.08% GetSnapshotData
  + 10.71% _bt_relandgetbuf
  + 10.24% LockAcquireExtended
  + 10.11% VirtualXactLockTableInsert
  + 9.17% VirtualXactLockTableCleanup
  + 7.55% _bt_getbuf
  + 3.40% index_fetch_heap
  + 1.51% LockReleaseAll
  + 0.89% StartTransactionCommand
  + 0.83% _bt_next
  + 0.75% LockRelationOid
  + 0.52% ReadBufferExtended
-   2.75%   postgres  postgres  [.] _bt_compare
   - _bt_compare
  + 96.72% _bt_binsrch
  + 1.71% _bt_moveright
  + 1.29% _bt_search
-   2.67%   postgres  postgres  [.] GetSnapshotData
   - GetSnapshotData
  + 97.03% GetTransactionSnapshot
  + 2.16% PostgresMain
  + 0.81% PortalStart

So now the profile looks much saner/less contended which immediately is
visible in transaction rates: 192584.218530 vs 552834.002573.

But if you try to attack the contention from the other end, by setting
default_transaction_isolation='repeatable read' to reduce the number of
snapshots taken, it's suddenly 536789.807391 vs 566839.328922. A *much*
smaller benefit.
The reason the patch doesn't help that much with that setting is that there
simply isn't as much actual contention there:

+   2.77%   postgres  postgres[.] _bt_compare
-   2.72%   postgres  postgres

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-02-01 Thread Peter Geoghegan
On Sat, Feb 1, 2014 at 1:41 PM, Andres Freund  wrote:
>> However, I tested the
>> most recent revision from your git remote on the AWS instance.

> But that was before my fix, right. Except you managed to timetravel :)

Heh, okay. So Nathan Boley has generously made available a machine
with 4 AMD Opteron 6272s. I've performed the same benchmark on that
server. However, I thought it might be interesting to circle back and
get some additional numbers for the AWS instance already tested - I'd
like to see what it looks like after your recent tweaks to fix the
regression. The single client performance of that instance seems to be
markedly better than that of Nathan's server.

Tip: AWS command line tools + S3 are a great way to easily publish
bulky pgbench-tools results, once you figure out how to correctly set
your S3 bucket's security manifest to allow public http. It has
similar advantages to rsync, and just works with the minimum of fuss.

Anyway, I don't think that the new, third c3.8xlarge-rwlocks testset
tells us much of anything:
http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/c38xlarge-rwlocks/

Here are the results of a benchmark on Nathan Boley's 64-core, 4
socket server: 
http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/

Perhaps I should have gone past 64 clients, because in the document
"Lock Scaling Analysis on Intel Xeon Processors" [1], Intel write:

"This implies that with oversubscription (more threads running than
available logical CPUs), the performance of spinlocks can depend
heavily on the exact OS scheduler behavior, and may change drastically
with operating system or VM updates."

I haven't bothered with a higher client counts though, because Andres
noted it's the same with 90 clients on this AMD system. Andres: Do you
see big problems when # clients < # logical cores on the affected
Intel systems?

There is only a marginal improvement in performance on this big 4
socket system. Andres informs me privately that he has reproduced the
problem on multiple new 4-socket Intel servers, so it seems reasonable
to suppose that it's more or less an Intel thing.

The Intel document [1] further notes:

"As the number of threads polling the status of a lock address
increases, the time it takes to process those polling requests will
increase. Initially, the latency to transfer data across socket
boundaries will always be an order of magnitude longer than the
on-chip cache-to-cache transfer latencies. Such cross-socket
transfers, if they are not effectively minimized by software, will
negatively impact the performance of any lock algorithm that depends
on them."

So, I think it's fair to say, given what we now know from Andres'
numbers and the numbers I got from Nathan's server, that this patch is
closer to being something that addresses a particularly unfortunate
pathology on many-socket Intel systems than it is to being a general
performance optimization. Based on the above quoted passage, it isn't
unreasonable to suppose other vendors or architectures could be
affected, but that isn't in evidence. While I welcome the use of
atomic operations in the context of LW_SHARED acquisition as general
progress, I think that to the outside world my proposed messaging is
more useful. It's not quite a bug-fix, but if you're using a
many-socket Intel server, you're *definitely* going to want to use a
PostgreSQL version that is unaffected. You may well not want to take
on the burden of waiting for 9.4, or waiting for it to fully
stabilize.

I note that Andres has a feature branch of this backported to Postgres
9.2, no doubt because of a request from a 2ndQuadrant customer. I have
to wonder if we should think about making this available with a
configure switch in one or more back branches. I think that the
complete back-porting of the fsync request queue issue's fix in commit
758728 could be considered a precedent - that too was a fix for a
really bad worst-case that was encountered fairly infrequently in the
wild. It's sort of horrifying to have red-hot spinlocks in production,
so that seems like the kind of thing we should make an effort to
address for those running multi-socket systems. Of those running
Postgres on new multi-socket systems, the reality is that the majority
are running on Intel hardware. Unfortunately, everyone knows that
Intel will soon be the only game in town when it comes to high-end
x86_64 servers, which contributes to my feeling that we need to target
back branches. We should do something about the possible regression
with older compilers using the fallback first, though.

It would be useful to know more about the nature of the problem that
made such an appreciable difference in Andres' original post. Again,
through private e-mail, I saw perf profiles from affected servers and
an unaffected though roughly comparable server (i.e. Nathan's 64 core
AMD server). Andres observed that "stalled-cycles-frontend" and
"stalled-cycles-backend" Linux perf event

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-02-01 Thread Andres Freund
On 2014-02-01 13:40:20 -0800, Peter Geoghegan wrote:
> On Sat, Feb 1, 2014 at 4:57 AM, Andres Freund  wrote:
> >> I'm looking at alternative options, because this is not terribly
> >> helpful. With those big caveats in mind, consider the results of the
> >> benchmark, which show the patch performing somewhat worse than the
> >> master baseline at higher client counts:
> >
> > I think that's actually something else. I'd tried to make some
> > definitions simpler, and that has, at least for the machine I have
> > occasional access to, pessimized things. I can't always run the tests
> > there, so I hadn't noticed before the repost.
> 
> I should have been clearer on one point: The pre-rebased patch (actual
> patch series) [1] was applied on top of a commit from around the same
> period, in order to work around the bit rot.

Ah. Then I indeed wouldn't expect improvements.

> However, I tested the
> most recent revision from your git remote on the AWS instance.
> 
> [1] 
> http://www.postgresql.org/message-id/20131115194725.gg5...@awork2.anarazel.de

But that was before my fix, right? Except you managed to time-travel :)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-02-01 Thread Peter Geoghegan
On Sat, Feb 1, 2014 at 4:57 AM, Andres Freund  wrote:
>> I'm looking at alternative options, because this is not terribly
>> helpful. With those big caveats in mind, consider the results of the
>> benchmark, which show the patch performing somewhat worse than the
>> master baseline at higher client counts:
>
> I think that's actually something else. I'd tried to make some
> definitions simpler, and that has, at least for the machine I have
> occasional access to, pessimized things. I can't always run the tests
> there, so I hadn't noticed before the repost.

I should have been clearer on one point: The pre-rebased patch (actual
patch series) [1] was applied on top of a commit from around the same
period, in order to work around the bit rot. However, I tested the
most recent revision from your git remote on the AWS instance.

[1] 
http://www.postgresql.org/message-id/20131115194725.gg5...@awork2.anarazel.de
-- 
Peter Geoghegan




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-02-01 Thread Andres Freund
On 2014-01-31 17:52:58 -0800, Peter Geoghegan wrote:
> On Fri, Jan 31, 2014 at 1:54 AM, Andres Freund  wrote:
> > I plan to split the atomics patch into smaller chunks before
> > reposting. Imo the "Convert the PGPROC->lwWaitLink list into a dlist
> > instead of open coding it." is worth being applied independently from
> > the rest of the series; it simplifies the code and fixes a bug...
> 
> For things that require a format-patch series, I personally find it
> easier to work off a feature branch on a remote under the control of
> the patch author. The only reason that I don't do so myself is that I
> know that that isn't everyone's preference.

I do too; that's why I have a git branch for all but the most trivial
patches.

> I have access to a large server for the purposes of benchmarking this.
> On the plus side, this is a box very much capable of exercising these
> bottlenecks: a 48 core AMD system, with 256GB of RAM. However, I had
> to instruct someone else on how to conduct the benchmark, since I
> didn't have SSH access, and the OS and toolchain were antiquated,
> particularly for this kind of thing. This is Linux kernel
> 2.6.18-274.3.1.el5 (RHEL5.7). The GCC version that Postgres was built
> with was 4.1.2. This is not what I'd hoped for; obviously I would have
> preferred to be able to act on your warning: "Please also note that
> due to the current state of the atomics implementation this likely
> will only work on a somewhat recent gcc and that the performance might
> be slightly worse than in the previous version because the atomic add
> is implemented using the CAS fallback". Even still, it might be
> marginally useful to get a sense of that cost.

I *think* it should actually work on gcc 4.1; I've since added the
fallbacks that were missing back when I wrote the above. I've added
exactly those atomics that are needed for the scalable lwlock things
(xadd, xsub (yes, that's essentially the same) and cmpxchg).
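
(To illustrate what the CAS fallback for the add amounts to - this is
the general shape of the idiom, not the actual code from the patch; the
function name is made up, and __sync_val_compare_and_swap is the gcc
builtin available since 4.1:

    #include <stdint.h>

    /* Emulate atomic fetch-and-add using only compare-and-swap: read
     * the current value, try to install old + add, and retry if some
     * other process changed the value in the meantime. */
    static inline uint32_t
    cas_fetch_add_u32(volatile uint32_t *ptr, uint32_t add)
    {
        uint32_t old;

        do
        {
            old = *ptr;
        } while (__sync_val_compare_and_swap(ptr, old, old + add) != old);

        return old;
    }

The retry loop is where the fallback can lose to a native LOCK XADD,
which always completes in a single instruction.)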

> I'm looking at alternative options, because this is not terribly
> helpful. With those big caveats in mind, consider the results of the
> benchmark, which show the patch performing somewhat worse than the
> master baseline at higher client counts:

I think that's actually something else. I'd tried to make some
definitions simpler, and that has, at least for the machine I have
occasional access to, pessimized things. I can't always run the tests
there, so I hadn't noticed before the repost.
I've pushed a preliminary fix to the git repository - any chance you
could retry?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-02-01 Thread Peter Geoghegan
I thought I'd try out what I was in an immediate position to do
without having access to dedicated multi-socket hardware: A benchmark
on AWS. This was a "c3.8xlarge" instance, which are reportedly backed
by Intel Xeon E5-2680 processors. Since the Intel ARK website reports
that these CPUs have 16 "threads" (8 cores + hyperthreading), and
AWS's marketing material indicates that this instance type has 32
"vCPUs", I inferred that the underlying hardware had 2 sockets.
However, that isn't what procfs reported, no doubt due to Xen
hypervisor voodoo:

ubuntu@ip-10-67-128-2:~$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    32
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Stepping:              4
CPU MHz:               2800.074
BogoMIPS:              5600.14
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-31

I ran the benchmark on Ubuntu 13.10 server, because that seemed to be
the only prominent "enterprise" x86_64 AMI (operating system image)
that came with GCC 4.8 as part of its standard toolchain. This exact
setup is benchmarked here:

http://www.phoronix.com/scan.php?page=article&item=amazon_ec2_c3&num=1

(Incidentally, some of the other benchmarks on that site use pgbench
to benchmark the Linux kernel, filesystems, disks and so on. e.g.:
http://www.phoronix.com/scan.php?page=news_item&px=NzI0NQ).

I was hesitant to benchmark using a virtualized system. There is a lot
of contradictory information about the overhead and/or noise added,
which may vary from one workload or hypervisor to the next. But, needs
must when the devil drives, and all that. Besides, this kind of setup
is very commercially relevant these days, so it doesn't seem
unreasonable to see how things work out on an AWS instance that
generally performs well for this workload. Of course, I still want to
replicate the big improvement you reported for multi-socket systems,
but you might have to wait a while for that, unless some kindly
benefactor that has a 4 socket server lying around would like to help
me out.

Results:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/c38xlarge-rwlocks/

You can drill down and find the postgresql.conf settings from the
report. There appears to be a modest improvement in transaction
throughput. It's not as large as the improvement you reported for your
2 socket workstation, but it's there, just about.

-- 
Peter Geoghegan




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-01-31 Thread Peter Geoghegan
On Fri, Jan 31, 2014 at 1:54 AM, Andres Freund  wrote:
> I plan to split the atomics patch into smaller chunks before
> reposting. Imo the "Convert the PGPROC->lwWaitLink list into a dlist
> instead of open coding it." is worth being applied independently from
> the rest of the series; it simplifies the code and fixes a bug...

For things that require a format-patch series, I personally find it
easier to work off a feature branch on a remote under the control of
the patch author. The only reason that I don't do so myself is that I
know that that isn't everyone's preference.

I have access to a large server for the purposes of benchmarking this.
On the plus side, this is a box very much capable of exercising these
bottlenecks: a 48 core AMD system, with 256GB of RAM. However, I had
to instruct someone else on how to conduct the benchmark, since I
didn't have SSH access, and the OS and toolchain were antiquated,
particularly for this kind of thing. This is Linux kernel
2.6.18-274.3.1.el5 (RHEL5.7). The GCC version that Postgres was built
with was 4.1.2. This is not what I'd hoped for; obviously I would have
preferred to be able to act on your warning: "Please also note that
due to the current state of the atomics implementation this likely
will only work on a somewhat recent gcc and that the performance might
be slightly worse than in the previous version because the atomic add
is implemented using the CAS fallback". Even still, it might be
marginally useful to get a sense of that cost.

I'm looking at alternative options, because this is not terribly
helpful. With those big caveats in mind, consider the results of the
benchmark, which show the patch performing somewhat worse than the
master baseline at higher client counts:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/rwlock-contention/

This is exactly what you said would happen, but at least now we have a
sense of that cost.
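
(For anyone wanting to run this kind of comparison themselves: a
generic read-only pgbench sweep over client counts looks roughly like
the following - the scale factor, duration and client counts are
illustrative, not the exact parameters of this run:

    pgbench -i -s 300 bench
    for c in 1 2 4 8 16 32 64 96; do
        pgbench -S -M prepared -c $c -j $c -T 60 bench
    done

-S selects the built-in select-only script and -M prepared avoids
re-measuring parse/plan overhead, keeping the focus on lock
contention.)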

-- 
Peter Geoghegan




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-01-31 Thread Andres Freund
Hi,

On 2014-01-28 21:27:29 -0800, Peter Geoghegan wrote:
> On Fri, Nov 15, 2013 at 11:47 AM, Andres Freund  
> wrote:
> > 1) I've added an abstracted atomic ops implementation. Needs a fair
> >amount of work, also submitted as a separate CF entry. (Patch 1 & 2)
> 
> Commit 220b34331f77effdb46798ddd7cca0cffc1b2858 caused bitrot when
> applying 0002-Very-basic-atomic-ops-implementation.patch. Please
> rebase.

I've pushed a rebased version of the patchset to
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git
branch rwlock contention.
220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem,
ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA.
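
(Assuming the usual clone-URL layout for postgresql.org user
repositories, and assuming the branch is named "rwlock-contention",
fetching it would look something like:

    git clone git://git.postgresql.org/git/users/andresfreund/postgres.git
    cd postgres && git checkout rwlock-contention

Neither the clone URL nor the exact branch name is spelled out in the
message, so treat both as guesses.)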

I plan to split the atomics patch into smaller chunks before
reposting. Imo the "Convert the PGPROC->lwWaitLink list into a dlist
instead of open coding it." is worth being applied independently from
the rest of the series; it simplifies the code and fixes a bug...
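
(To illustrate the idiom the conversion moves to - an open-coded next
pointer becomes an embedded dlist_node from ilist.h. This is a sketch
with simplified names, not the actual patch hunk:

    #include "lib/ilist.h"

    typedef struct Waiter
    {
        dlist_node  lwWaitLink;     /* replaces a hand-rolled next pointer */
        int         pid;
    } Waiter;

    static dlist_head waiters = DLIST_STATIC_INIT(waiters);

    static void
    enqueue_waiter(Waiter *w)
    {
        dlist_push_tail(&waiters, &w->lwWaitLink);
    }

    static void
    release_all_waiters(void)
    {
        dlist_mutable_iter iter;

        /* the _modify variant permits deleting the current node */
        dlist_foreach_modify(iter, &waiters)
        {
            Waiter *w = dlist_container(Waiter, lwWaitLink, iter.cur);

            dlist_delete(&w->lwWaitLink);
            /* ... wake up w->pid ... */
        }
    }

Getting the prev/next maintenance and deletion logic from ilist.h
instead of duplicating it at each call site is where the readability
win comes from.)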

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-01-28 Thread Peter Geoghegan
On Fri, Nov 15, 2013 at 11:47 AM, Andres Freund  wrote:
> 1) I've added an abstracted atomic ops implementation. Needs a fair
>amount of work, also submitted as a separate CF entry. (Patch 1 & 2)

Commit 220b34331f77effdb46798ddd7cca0cffc1b2858 caused bitrot when
applying 0002-Very-basic-atomic-ops-implementation.patch. Please
rebase.


-- 
Peter Geoghegan




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2013-12-21 Thread Peter Eisentraut
This patch didn't make it out of the 2013-11 commit fest.  You should
move it to the next commit fest (probably with an updated patch)
before January 15th.





Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2013-11-16 Thread Peter Eisentraut
On Fri, 2013-11-15 at 20:47 +0100, Andres Freund wrote:
> So, here's the next version of this patchset:

The 0002 patch contains non-ASCII, non-UTF8 characters:

0002-Very-basic-atomic-ops-implementation.patch: line 609, char 1, byte offset 43: invalid UTF-8 code

Please change that to ASCII, or UTF-8 if necessary.
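
(A quick way to locate such bytes, for what it's worth - plain iconv
usage, not something prescribed in this thread:

    iconv -f UTF-8 -t UTF-8 0002-Very-basic-atomic-ops-implementation.patch > /dev/null

GNU iconv exits non-zero and reports the position of the first invalid
input sequence.)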






Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2013-11-15 Thread Andres Freund
On 2013-11-15 20:47:26 +0100, Andres Freund wrote:
> Hi,
> 
> On 2013-09-27 00:55:45 +0200, Andres Freund wrote:
> > So what's todo? The file header tells us:
> >  * - revive pure-spinlock implementation
> >  * - abstract away atomic ops, we really only need a few.
> >  *   - CAS
> >  *   - LOCK XADD
> >  * - convert PGPROC->lwWaitLink to ilist.h slist or even dlist.
> >  * - remove LWLockWakeup dealing with MyProc
> >  * - overhaul the mask offsets, make SHARED/EXCLUSIVE_LOCK_MASK wider, 
> > MAX_BACKENDS
> 
> So, here's the next version of this patchset:
> 1) I've added an abstracted atomic ops implementation. Needs a fair
>amount of work, also submitted as a separate CF entry. (Patch 1 & 2)
> 2) I've converted PGPROC->lwWaitLink into a dlist. That makes a fair bit
>of code easier to read and reduces the size of the patchset. Also
>fixes a bug in the xlog-scalability code. (Patch 3)
> 3) Alvaro and I updated the comments in lwlock.c. (Patch 4)
> 
> I think 2) should be committable pretty soon. It's imo a pretty clear
> win in readability. 1) will need a good bit of more work.
> 
> With regard to the scalable lwlock work, what's most needed now is a
> good amount of testing.
> 
> Please note that you need to 'autoreconf' after applying the patchset. I
> don't have a compatible autoconf version on this computer, which would
> make the diff humongous if I included those changes.

Please also note that due to the current state of the atomics
implementation this likely will only work on a somewhat recent gcc and
that the performance might be slightly worse than in the previous
version because the atomic add is implemented using the CAS fallback.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

