On Sun, Aug 14, 2016 at 9:04 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Fri, Aug 12, 2016 at 9:47 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> [condition-variable-v1.patch]
>
> Don't you need to set proc->cvSleeping = false in ConditionVariableSignal?

I poked at this a bit... OK, a lot... and have some feedback:

1.  As above, we need to clear cvSleeping before setting the latch.

2.  The following schedule corrupts the waitlist by trying to delete
something from it that isn't in it:

P1: ConditionVariablePrepareToSleep: push self onto waitlist
P2:   ConditionVariableSignal: pop P1 from waitlist
P1: <check user condition, decide to break without ever sleeping>
P3:     ConditionVariablePrepareToSleep: push self onto waitlist
P1: ConditionVariableCancelSleep: delete self from waitlist (!)

Without P3 coming along you probably wouldn't notice because the
waitlist will be happily cleared and look valid, but P3's entry gets
lost and then it hangs forever.

One solution is to teach ConditionVariableCancelSleep to check if
we're still actually there first.  That can be done in O(1) time by
clearing nodes' next and prev pointers when deleting, so that we can
rely on that in a new proclist_contains() function.  See attached.

3.  The following schedule corrupts the waitlist by trying to insert
something into it that is already in it:

P1: ConditionVariablePrepareToSleep: push self onto waitlist
P1: <check user condition, decide to sleep>
P1: ConditionVariableSleep
P1: ConditionVariablePrepareToSleep: push self onto waitlist (!)

Nodes before and after self's pre-existing position can be forgotten
when self's node is pushed to the front of the list.  That can be
fixed by making ConditionVariablePrepareToSleep also check if we're
already in the list.

4.  The waitlist is handled LIFO-ly.  Would it be better for the first
guy to start waiting to be woken up first, like we do for locks?  The
Pthreads equivalent says that it depends on "scheduling policy".  I
don't know if it matters much, just an observation.

5.  The new proclist function you added is the first to work in terms
of PGPROC* rather than procno.  Maybe the whole interface should work
with either PGPROC pointers or procnos?  No strong view.

Please find attached a -v2 of your patch which includes suggestions
1-3 above.  Like the -v1, it applies on top of
lwlocks-in-dsm-v3.patch.  Also, I have attached a v2->v3 diff to show
just my proposed changes.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment: diff.patch
Description: Binary data

Attachment: condition-variable-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to