Re: [HACKERS] Is the unfair lwlock behavior intended?

2016-05-25 Thread Andres Freund
On 2016-05-25 17:24:22 -0400, Robert Haas wrote:
> On Wed, May 25, 2016 at 3:26 PM, Andres Freund  wrote:
> > On 2016-05-25 11:15:37 -0700, Andres Freund wrote:
> >> On 2016-05-25 14:09:43 -0400, Robert Haas wrote:
> >> I don't think anybody was doing that? The first questions on this thread
> >> were about upgrading and retesting...
> >
> > Something I've repeatedly wondered about around this topic is whether we
> > could split ProcArrayLock into one lock that governs entering or leaving
> > the procarray and another that's for consistent snapshots.  I think there's
> > no need for
> > ProcArrayAdd/ProcArrayRemove/CountDBBackends()/CancelDBBackends()/
> > CountUserBackends()/CountOtherDBBackends() (and potentially some more)
> > to conflict with GetSnapshotData()/ProcArrayEndTransaction()/
> > TransactionIdIsInProgress()/TransactionIdIsActive()/GetOldestXmin()/...
> > as long as we're careful to ensure that by the time an entry is removed
> > ProcArrayEndTransaction() has been called.
> 
> I'm doubtful about how much that would reduce contention,

Yea, I don't think it'll usually make a huge difference.  It'd likely
have solved this specific complaint, though; and it'd remove nearly all
exclusive acquisitions of ProcArrayLock other than
ProcArrayEndTransaction().


> However, I think it might be worth doing anyway, because redesigning
> the whole mechanism might be easier if that lock weren't doing so many
> only-semi-related things.

Very much agreed on this.  Having a distinct 'SnapshotLock' for
ProcArrayEndTransaction() and GetSnapshotData() would be a first step in
making the locking more granular.
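
In rough terms, that could look something like the sketch below (the lock
names ProcArrayMembershipLock and SnapshotLock are hypothetical, and this
is not actual procarray.c code, just an illustration of the split):

#include "postgres.h"
#include "storage/lwlock.h"
#include "storage/proc.h"
#include "storage/procarray.h"

/*
 * Sketch: membership changes (ProcArrayAdd/ProcArrayRemove and the
 * counting functions) would take only the membership lock...
 */
void
ProcArrayAdd(PGPROC *proc)
{
    LWLockAcquire(ProcArrayMembershipLock, LW_EXCLUSIVE);
    /* ... link proc into the procarray ... */
    LWLockRelease(ProcArrayMembershipLock);
}

/*
 * ...while snapshot-related paths (GetSnapshotData, ProcArrayEndTransaction,
 * TransactionIdIsInProgress, ...) would take only the snapshot lock.  The
 * two never need to conflict, as long as every entry has gone through
 * ProcArrayEndTransaction() before ProcArrayRemove() drops it.
 */
Snapshot
GetSnapshotData(Snapshot snapshot)
{
    LWLockAcquire(SnapshotLock, LW_SHARED);
    /* ... collect xmin/xip from entries currently in the array ... */
    LWLockRelease(SnapshotLock);
    return snapshot;
}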

Andres




Re: [HACKERS] Is the unfair lwlock behavior intended?

2016-05-25 Thread Robert Haas
On Wed, May 25, 2016 at 3:26 PM, Andres Freund  wrote:
> On 2016-05-25 11:15:37 -0700, Andres Freund wrote:
>> On 2016-05-25 14:09:43 -0400, Robert Haas wrote:
>> I don't think anybody was doing that? The first questions on this thread
>> were about upgrading and retesting...
>
> Something I've repeatedly wondered about around this topic is whether we
> could split ProcArrayLock into one lock that governs entering or leaving
> the procarray and another that's for consistent snapshots.  I think there's
> no need for ProcArrayAdd/ProcArrayRemove/CountDBBackends()/CancelDBBackends()/
> CountUserBackends()/CountOtherDBBackends() (and potentially some more)
> to conflict with GetSnapshotData()/ProcArrayEndTransaction()/
> TransactionIdIsInProgress()/TransactionIdIsActive()/GetOldestXmin()/...
> as long as we're careful to ensure that by the time an entry is removed
> ProcArrayEndTransaction() has been called.

I'm doubtful about how much that would reduce contention, because when
I've used perf or inserted instrumentation to see which actual call
sites are the problem, it's always been entirely down to
GetSnapshotData() and ProcArrayEndTransaction().  However, I think it
might be worth doing anyway, because redesigning the whole mechanism
might be easier if that lock weren't doing so many only-semi-related
things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Is the unfair lwlock behavior intended?

2016-05-25 Thread Andres Freund
Hi,

On 2016-05-24 06:03:07 +, Tsunakawa, Takayuki wrote:
> At that time, many backend processes (I forgot the number) were acquiring and 
> releasing share mode lock on ProcArrayLock, most of which were from 
> TransactionIdIsInProgress().

FWIW, I suspect that 9.6 might be a fair bit better here, due to
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=8a7d0701814a4e293efad22091d6f6fb441bbe1c

Regards,

Andres




Re: [HACKERS] Is the unfair lwlock behavior intended?

2016-05-25 Thread Andres Freund
On 2016-05-25 11:15:37 -0700, Andres Freund wrote:
> On 2016-05-25 14:09:43 -0400, Robert Haas wrote:
> I don't think anybody was doing that? The first questions on this thread
> were about upgrading and retesting...

Something I've repeatedly wondered about around this topic is whether we
could split ProcArrayLock into one lock that governs entering or leaving
the procarray and another that's for consistent snapshots.  I think there's
no need for ProcArrayAdd/ProcArrayRemove/CountDBBackends()/CancelDBBackends()/
CountUserBackends()/CountOtherDBBackends() (and potentially some more)
to conflict with GetSnapshotData()/ProcArrayEndTransaction()/
TransactionIdIsInProgress()/TransactionIdIsActive()/GetOldestXmin()/...
as long as we're careful to ensure that by the time an entry is removed
ProcArrayEndTransaction() has been called.

Greetings,

Andres Freund




Re: [HACKERS] Is the unfair lwlock behavior intended?

2016-05-25 Thread Andres Freund
On 2016-05-25 14:09:43 -0400, Robert Haas wrote:
> I think this is looking at the problem from the wrong angle.  The OP's
> complaint is pretty fair: a 30-second wait for ProcArrayLock is
> horrendous, and if that's actually something that is happening with
> any significant regularity on well-configured systems, we need to fix
> it somehow.

No disagreement there.


> I don't think we can just say "oh, well,
> sometimes the system becomes totally unresponsive for more than 30
> seconds, but we don't care".  We have to care about that.

I don't think anybody was doing that? The first questions on this thread
were about upgrading and retesting...


Andres




Re: [HACKERS] Is the unfair lwlock behavior intended?

2016-05-25 Thread Robert Haas
On Tue, May 24, 2016 at 6:50 PM, Andres Freund  wrote:
> Now we potentially could mark individual lwlocks as being fair
> locks. But which ones would those be? Certainly not ProcArrayLock, it's
> way too heavily contended.

I think this is looking at the problem from the wrong angle.  The OP's
complaint is pretty fair: a 30-second wait for ProcArrayLock is
horrendous, and if that's actually something that is happening with
any significant regularity on well-configured systems, we need to fix
it somehow.  Whether FIFO queueing on LWLocks is the right way to fix
it is a separate question, and I agree that the answer is probably
"no" ... although it does strike me that Amit's work on group XID
clearing might make that a lot more palatable, since you won't
normally have more than one exclusive waiter in the queue at a time.
But regardless of that, I don't think we can just say "oh, well,
sometimes the system becomes totally unresponsive for more than 30
seconds, but we don't care".  We have to care about that.

What I think we need to understand better from Tsunakawa-san is (1)
whether this behavior still occurs in 9.6 and (2) what sort of test
case is required to produce it.  Depending on how extreme that test
case is, we can decide how much of a problem we think this is.  And we
can test potential solutions to see whether and to what degree they
address it, as well as how they affect throughput.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Is the unfair lwlock behavior intended?

2016-05-24 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Alexander Korotkov
I've already observed such behavior, see [1].  I think there is currently no
consensus on how to fix that.  For instance, Andres expressed the opinion that
this shouldn't be fixed from the LWLock side [2].


Thank you for the nice pointers.  Understood.


> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Ants Aasma
> 9.5 had significant LWLock scalability improvements. This might
> improve performance enough so that exclusive lockers don't get
> completely starved. It would be helpful if you could test if it's
> still possible to trigger starvation with the new code.

Unfortunately, we can no longer test because the customer's system is now in
production.  The heavy ProcArray contention was caused mainly by too many tuple
visibility tests, which in turn were caused by unintended sequential scans.
The customer then avoided the contention problem by adding an index and
reducing the number of concurrent active sessions.

> From: Andres Freund [mailto:and...@anarazel.de]
> Are you sure you're actually queued behind share locks, and not primarily
> behind the lwlock's spinlocks? The latter is what I've seen in similar cases.

I think so, because the stack traces showed that the backends were waiting in
TransactionIdIsInProgress() (or some function in the commit processing) ->
LWLockAcquire() -> PGSemaphoreLock() -> semop(), with no spinlock-related
functions on the stack.


> The problem is that half-way fair locks, which are frequently acquired both
> in shared and exclusive mode, have really bad throughput characteristics
> on modern multi-socket systems. We mostly get away with fair locking on
> object level (after considerable work re fast-path locking), because nearly
> all accesses are non-conflicting.  But prohibiting any snapshot acquisitions
> when there's a single LW_EXCLUSIVE ProcArrayLock waiter can reduce
> throughput dramatically.

Thanks, I understand that you chose total throughput over stable response time.
I sympathize with that decision, and I think it's the way to go.

OTOH, maybe I'd object if I were the pitiful waiter... I'd leave Disneyland
if the staff said, "Please stay in line as long as there are efficient guests
behind you.  That benefits Disneyland as a whole."

Regards
Takayuki Tsunakawa





Re: [HACKERS] Is the unfair lwlock behavior intended?

2016-05-24 Thread Peter Geoghegan
On Tue, May 24, 2016 at 3:50 PM, Andres Freund  wrote:
>> Jim Gray's paper on the "Convoy phenomenon" remains relevant, decades later:
>>
>> http://www.msr-waypoint.com/en-us/um/people/gray/papers/Convoy%20Phenomenon%20RJ%202516.pdf
>>
>> I could believe that there's a case to be made for per-LWLock fairness
>> characteristics, which may be roughly what Andres meant.
>
> The problem is that half-way fair locks, which are frequently acquired
> both in shared and exclusive mode, have really bad throughput
> characteristics on modern multi-socket systems. We mostly get away with
> fair locking on object level (after considerable work re fast-path
> locking), because nearly all accesses are non-conflicting.  But
> prohibiting any snapshot acquisitions when there's a single LW_EXCLUSIVE
> ProcArrayLock waiter can reduce throughput dramatically.

That's a useful example of where things are different for different,
well-known cases.

> I don't think randomly processing the wait queue - which is what the
> quoted paper essentially describes - is really useful here. We
> intentionally *ignore* the wait queue entirely if a lock is not
> conflicting, and that's what can prohibit exclusive locks from ever
> succeeding, because you essentially can get repetitions of:

My point about the paper is that it made me reconsider my assumption
that fairness, according to some definition, was essential for
something like LWLocks. The paper made me aware that it's perfectly
acceptable practice to make informed trade-offs around fairness for
what it calls "latches" (what we'd call LWLocks). Maybe that was
obvious to you, but it wasn't to me.

-- 
Peter Geoghegan




Re: [HACKERS] Is the unfair lwlock behavior intended?

2016-05-24 Thread Andres Freund
On 2016-05-24 15:34:31 -0700, Peter Geoghegan wrote:
> On Tue, May 24, 2016 at 1:38 PM, Ants Aasma  wrote:
> >> I've already observed such behavior, see [1].  I think there is currently no
> >> consensus on how to fix that.  For instance, Andres expressed the opinion that
> >> this shouldn't be fixed from the LWLock side [2].
> >> FYI, I'm planning to pick up work on the CSN patch [3] for 10.0.  CSN should fix
> >> various scalability issues, including high ProcArrayLock contention.
> >
> > Some amount of non-fairness is ok, but degrading to the point of
> > complete denial of service is not very graceful. I don't think it's
> > realistic to hope that all lwlock contention issues will be fixed any
> > time soon. Some fallback mechanism would be extremely nice until then.
> 
> Jim Gray's paper on the "Convoy phenomenon" remains relevant, decades later:
> 
> http://www.msr-waypoint.com/en-us/um/people/gray/papers/Convoy%20Phenomenon%20RJ%202516.pdf
> 
> I could believe that there's a case to be made for per-LWLock fairness
> characteristics, which may be roughly what Andres meant.

The problem is that half-way fair locks, which are frequently acquired
both in shared and exclusive mode, have really bad throughput
characteristics on modern multi-socket systems. We mostly get away with
fair locking on object level (after considerable work re fast-path
locking), because nearly all accesses are non-conflicting.  But
prohibiting any snapshot acquisitions when there's a single LW_EXCLUSIVE
ProcArrayLock waiter can reduce throughput dramatically.

I don't think randomly processing the wait queue - which is what the
quoted paper essentially describes - is really useful here. We
intentionally *ignore* the wait queue entirely if a lock is not
conflicting, and that's what can prohibit exclusive locks from ever
succeeding, because you essentially can get repetitions of:

S1: acq(SHARED) -> shared = 1
S2: acq(EXCLUSIVE) -> shared = 1, waiters = 1 
...
S3: acq(SHARED) -> shared = 2
S1: rel(SHARED) -> shared = 1
S1: acq(SHARED) -> shared = 2
S3: rel(SHARED) -> shared = 1
...
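
In simplified terms, the non-fair shared fast path looks roughly like this
(a sketch only, not the actual lwlock.c source; LWLockSketch and
LW_SHARED_REF are made-up names standing in for the real state layout):

#include "postgres.h"
#include "port/atomics.h"

typedef struct LWLockSketch
{
    pg_atomic_uint32 state;     /* exclusive bit + shared refcount + flags */
} LWLockSketch;

#define LW_VAL_EXCLUSIVE    ((uint32) 1 << 24)  /* set while held exclusively */
#define LW_SHARED_REF       ((uint32) 1)        /* one shared holder */

/*
 * A shared locker succeeds whenever no exclusive holder is present; the
 * wait queue is never consulted, which is why the S3-style arrivals above
 * keep getting the lock while the queued exclusive waiter starves.
 */
static bool
try_acquire_shared(LWLockSketch *lock)
{
    uint32 old_state = pg_atomic_read_u32(&lock->state);

    while ((old_state & LW_VAL_EXCLUSIVE) == 0)
    {
        if (pg_atomic_compare_exchange_u32(&lock->state, &old_state,
                                           old_state + LW_SHARED_REF))
            return true;        /* took the lock in shared mode */
        /* CAS failed: old_state was refreshed, retry */
    }
    return false;               /* exclusive holder present, must queue */
}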

Now, we could potentially mark individual lwlocks as being fair
locks.  But which ones would those be?  Certainly not ProcArrayLock;
it's way too heavily contended.

Regards,

Andres




Re: [HACKERS] Is the unfair lwlock behavior intended?

2016-05-24 Thread Peter Geoghegan
On Tue, May 24, 2016 at 1:38 PM, Ants Aasma  wrote:
>> I've already observed such behavior, see [1].  I think there is currently no
>> consensus on how to fix that.  For instance, Andres expressed the opinion that
>> this shouldn't be fixed from the LWLock side [2].
>> FYI, I'm planning to pick up work on the CSN patch [3] for 10.0.  CSN should fix
>> various scalability issues, including high ProcArrayLock contention.
>
> Some amount of non-fairness is ok, but degrading to the point of
> complete denial of service is not very graceful. I don't think it's
> realistic to hope that all lwlock contention issues will be fixed any
> time soon. Some fallback mechanism would be extremely nice until then.

Jim Gray's paper on the "Convoy phenomenon" remains relevant, decades later:

http://www.msr-waypoint.com/en-us/um/people/gray/papers/Convoy%20Phenomenon%20RJ%202516.pdf

I could believe that there's a case to be made for per-LWLock fairness
characteristics, which may be roughly what Andres meant.

-- 
Peter Geoghegan




Re: [HACKERS] Is the unfair lwlock behavior intended?

2016-05-24 Thread Andres Freund
On 2016-05-24 17:20:48 -0400, Robert Haas wrote:
> On Tue, May 24, 2016 at 4:39 PM, Ants Aasma  wrote:
> > On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki
> >  wrote:
> >> I encountered a strange behavior of lightweight locks in PostgreSQL 9.2.
> >> That appears to apply to 9.6, too, as far as I can tell from the code.  Could
> >> you tell me if the behavior is intended or needs a fix?
> >>
> >> Simply put, the unfair behavior is that waiters for exclusive mode are 
> >> overtaken by share-mode lockers who arrive later.

Are you sure you're actually queued behind share locks, and not
primarily behind the lwlock's spinlocks? The latter is what I've seen in
similar cases.


> > 9.5 had significant LWLock scalability improvements. This might
> > improve performance enough so that exclusive lockers don't get
> > completely starved. It would be helpful if you could test if it's
> > still possible to trigger starvation with the new code.
> 
> 9.5 didn't just increase the scalability; it also whacked the fairness
> aspects of this code around.

True, but the difference isn't that big. As the commit says:


> and the fairness isn't really much worse than before, as we always
> allowed new shared lockers to jump the queue.

if you have lots of incoming locks, as appears to be the case for the
OP, the fact that queued share locks are woken up earlier doesn't make
much of a difference.


Regards,

Andres




Re: [HACKERS] Is the unfair lwlock behavior intended?

2016-05-24 Thread Robert Haas
On Tue, May 24, 2016 at 4:39 PM, Ants Aasma  wrote:
> On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki
>  wrote:
>> I encountered a strange behavior of lightweight locks in PostgreSQL 9.2.
>> That appears to apply to 9.6, too, as far as I can tell from the code.  Could
>> you tell me if the behavior is intended or needs a fix?
>>
>> Simply put, the unfair behavior is that waiters for exclusive mode are 
>> overtaken by share-mode lockers who arrive later.
>
> 9.5 had significant LWLock scalability improvements. This might
> improve performance enough so that exclusive lockers don't get
> completely starved. It would be helpful if you could test if it's
> still possible to trigger starvation with the new code.

9.5 didn't just increase the scalability; it also whacked the fairness
aspects of this code around.

Author: Andres Freund 
Branch: master Release: REL9_5_BR [7882c3b0b] 2014-12-25 17:24:30 +0100

Convert the PGPROC->lwWaitLink list into a dlist instead of open coding it.

Besides being shorter and much easier to read it changes the logic in
LWLockRelease() to release all shared lockers when waking up any. This
can yield some significant performance improvements - and the fairness
isn't really much worse than before, as we always allowed new shared
lockers to jump the queue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Is the unfair lwlock behavior intended?

2016-05-24 Thread Ants Aasma
On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki
 wrote:
> I encountered a strange behavior of lightweight locks in PostgreSQL 9.2.  That
> appears to apply to 9.6, too, as far as I can tell from the code.  Could you
> tell me if the behavior is intended or needs a fix?
>
> Simply put, the unfair behavior is that waiters for exclusive mode are 
> overtaken by share-mode lockers who arrive later.

9.5 had significant LWLock scalability improvements. This might
improve performance enough so that exclusive lockers don't get
completely starved. It would be helpful if you could test if it's
still possible to trigger starvation with the new code.

Regards,
Ants Aasma




Re: [HACKERS] Is the unfair lwlock behavior intended?

2016-05-24 Thread Ants Aasma
On Tue, May 24, 2016 at 1:29 PM, Alexander Korotkov
 wrote:
> On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki
>  wrote:
>> Is this intentional, or should we make the later share-lockers wait if
>> someone is in the wait queue?
>
> I've already observed such behavior, see [1].  I think there is currently no
> consensus on how to fix that.  For instance, Andres expressed the opinion that
> this shouldn't be fixed from the LWLock side [2].
> FYI, I'm planning to pick up work on the CSN patch [3] for 10.0.  CSN should fix
> various scalability issues, including high ProcArrayLock contention.

Some amount of non-fairness is ok, but degrading to the point of
complete denial of service is not very graceful. I don't think it's
realistic to hope that all lwlock contention issues will be fixed any
time soon. Some fallback mechanism would be extremely nice until then.

It seems to me that we could fix complete starvation with no
noticeable effect on common cases. The general idea would be to have a
separate flag bit for starving exclusive lockers. When the lock is taken
in shared mode, an exclusive locker puts itself on the wait queue as
it does now, and then proceeds to wait on its semaphore. But unlike
now, the semaphore wait has a (configurable?) timeout after which the
exclusive locker wakes up and sets the new LW_FLAG_STARVING flag. When
this flag is set, new shared lockers will consider the lock taken and
will queue up to wait. The exclusive locker that marked itself as
starving will get a chance to run once the existing shared lockers
complete and wake it up. If there are other nearly starved exclusive
lockers, they are likely to be at the head of the wait queue and will
also get a chance to run. If the wait queue is empty or has a shared
locker at the head, the starvation flag gets reset and all of the
queued-up shared lockers get their turn.

With the proposed scheme, non-starved cases will execute pretty much
exactly as they do now, except for a slightly different check for
shared lock availability. In starvation scenarios the exclusive
lockers will get a chance to run once per starvation grace period.
That might still not be ideal or "fair", but it is a lot better than
the status quo of blocking indefinitely.
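
To make the idea a bit more concrete, here is a rough sketch of what the
checks could look like (LW_FLAG_STARVING, PGSemaphoreTimedLock, and
starvation_grace_period are all hypothetical names for things that don't
exist yet; this is a sketch of the proposal, not existing lwlock.c code):

#include "postgres.h"
#include "port/atomics.h"
#include "storage/pg_sema.h"

#define LW_VAL_EXCLUSIVE    ((uint32) 1 << 24)   /* simplified */
#define LW_FLAG_STARVING    ((uint32) 1 << 30)   /* proposed new flag */

static int  starvation_grace_period = 1000;      /* ms; would be a GUC? */

/* Shared lockers treat a starving lock as taken and join the wait queue. */
static bool
shared_lock_available(uint32 state)
{
    return (state & (LW_VAL_EXCLUSIVE | LW_FLAG_STARVING)) == 0;
}

/*
 * Exclusive lockers queue up and sleep on their semaphore as today, but
 * with a timeout; if it expires, they flag the lock as starving and keep
 * waiting.  PGSemaphoreTimedLock() would be new semaphore infrastructure.
 */
static void
exclusive_lock_wait(pg_atomic_uint32 *lockstate, PGSemaphore sem)
{
    while (!PGSemaphoreTimedLock(sem, starvation_grace_period))
        pg_atomic_fetch_or_u32(lockstate, LW_FLAG_STARVING);

    /*
     * Woken up: the releasing backend granted us the lock.  It clears
     * LW_FLAG_STARVING once the wait queue is empty or has a shared locker
     * at its head, so the queued-up shared lockers get their turn again.
     */
}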

PS: if/when you are picking up the CSN work, ping me to write up some
of the insights and ideas I have had while thinking about this topic.

Regards,
Ants Aasma




Re: [HACKERS] Is the unfair lwlock behavior intended?

2016-05-24 Thread Alexander Korotkov
Hi!

On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> I encountered a strange behavior of lightweight locks in PostgreSQL 9.2.
> That appears to apply to 9.6, too, as far as I can tell from the code.  Could
> you tell me if the behavior is intended or needs a fix?
>
> Simply put, the unfair behavior is that waiters for exclusive mode are
> overtaken by share-mode lockers who arrive later.
>
>
> PROBLEM
> 
>
> Under a heavy read/write workload on a big machine with dozens of CPUs and
> hundreds of GBs of RAM, psql sometimes took more than 30 seconds to connect
> to the database (and actually, it failed to connect due to our
> connect_timeout setting.)  The backend corresponding to the psql was
> waiting to acquire exclusive mode lock on ProcArrayLock.  Some other
> backends took more than 10 seconds to commit their transactions, waiting
> for exclusive mode lock on ProcArrayLock.
>
> At that time, many backend processes (I forgot the number) were acquiring
> and releasing share mode lock on ProcArrayLock, most of which were from
> TransactionIdIsInProgress().
>
>
> CAUSE
> 
>
> Going into the 9.2 code, I realized that those who request share mode
> don't pay attention to the wait queue.  That is, if some processes hold
> share mode lock and someone is waiting for exclusive mode in the wait
> queue, other processes who come later can get share mode overtaking those
> who are already waiting.  If many processes repeatedly request share mode,
> the waiters can't get exclusive mode for a long time.
>
> Is this intentional, or should we make the later share-lockers wait if
> someone is in the wait queue?
>

I've already observed such behavior, see [1].  I think there is currently no
consensus on how to fix that.  For instance, Andres expressed the opinion that
this shouldn't be fixed from the LWLock side [2].
FYI, I'm planning to pick up work on the CSN patch [3] for 10.0.  CSN should fix
various scalability issues, including high ProcArrayLock contention.

References:

1. http://www.postgresql.org/message-id/CAPpHfdsytkTFMy3N-zfSo+kAuUx=u-7jg6q2byb6fpuw2cd...@mail.gmail.com
2. http://www.postgresql.org/message-id/20151211130413.go14...@awork2.anarazel.de
3. http://www.postgresql.org/message-id/CA+CSw_tEpJ=md1zgxPkjH6CWDnTDft4gBi=+p9snoc+wy3p...@mail.gmail.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Is the unfair lwlock behavior intended?

2016-05-24 Thread Tsunakawa, Takayuki
Hello,

I encountered a strange behavior of lightweight locks in PostgreSQL 9.2.  That
appears to apply to 9.6, too, as far as I can tell from the code.  Could you
tell me if the behavior is intended or needs a fix?

Simply put, the unfair behavior is that waiters for exclusive mode are 
overtaken by share-mode lockers who arrive later.


PROBLEM


Under a heavy read/write workload on a big machine with dozens of CPUs and
hundreds of GBs of RAM, psql sometimes took more than 30 seconds to connect to
the database (and actually failed to connect due to our connect_timeout
setting).  The backend corresponding to that psql was waiting to acquire an
exclusive-mode lock on ProcArrayLock.  Some other backends took more than 10
seconds to commit their transactions, also waiting for an exclusive-mode lock
on ProcArrayLock.

At that time, many backend processes (I forgot the number) were acquiring and
releasing share-mode locks on ProcArrayLock, most of them from
TransactionIdIsInProgress().


CAUSE


Going into the 9.2 code, I realized that those who request share mode don't pay
attention to the wait queue.  That is, if some processes hold the lock in share
mode and someone is waiting for exclusive mode in the wait queue, other
processes that come later can get share mode, overtaking those who are already
waiting.  If many processes repeatedly request share mode, the waiters can't
get exclusive mode for a long time.

Is this intentional, or should we make the later share-lockers wait if someone
is in the wait queue?

Regards
Takayuki Tsunakawa


