Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>> Gilles Chanteperdrix wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Gilles Chanteperdrix wrote:
>>>>>>> Jan Kiszka wrote:
>>>>>>>> +      xnarch_atomic_set(mutex->owner,
>>>>>>>> +                        set_claimed(xnthread_handle(owner),
>>>>>>>> +                                    
>>>>>>>> xnsynch_nsleepers(&mutex->synchbase)));
>>>>>>> Ok. I think you have spotted a bug here. This should be mutex->sleepers
>>>>>>> instead of xnsynch_nsleepers.
>>>>>> BTW, why do you need to track sleepers separately in POSIX? Native
>>>>>> doesn't do so, e.g.
>>>>> Because of the "syscall-needed-when-unlocking-stolen-mutex" issue I
>>>>> already explained (sleepers - xnsynch_nsleepers is precisely the count
>>>>> of pending threads which have been awake then robbed the mutex).
>>>> Hmm, sounds like the new lock owner should better clear the 'claimed'
>>>> bit then, not the old one on return from unlock. Or where is the
>>>> pitfall? How does the futex algorithm handle this scenario?
>>> Ok. Please read my explanation again, I have already explained this in
>>> another mail.
>> I did this, but I'm unable to derive the answer for my question from it.
>> Let's go through it in more details:
>> When we pass a mutex to a new owner, we set its reference in the fast
>> lock variable + set the claimed bit if there are more waiters. Instead,
>> I would simple set that bit if there is a new owner. That owner will
>> then pick up the mutex eventually and clear 'claimed' on exit from it
>> lock service (if there are no further waiters then). If the new owner is
>> not able to run and we steal the lock, we simple keep the 'claimed' bit
>> as is. On exit from the stolen lock we find it set, thus we are forced
>> to issue a syscall as it should be.
>> OK, what happens if some waiter wants to leave the party while we are
>> holding the stolen lock? Then the sleeper number must be correct - that
>> is one pitfall!
>> I will have to dig into this more deeply, considering more cases. But
>> the additional "sleepers" field remains at least misplaced IMHO.
>> xnsynch_sleepers should better be fixed to respect lock stealing, as
>> lock stealing is an xnsynch property, nothing POSIX-specific.
> Ok. I have read this but did not get what you mean. I will read it again
>  quietly from home.

I think I'm getting closer to the issue. Our actual problem comes from
the fact that the xnsynch_owner is easily out of sync with the real
owner, it even sometimes points to a former owner:

Thread A releases a mutex on which thread B pends. It wakes up B,
causing it to become the new xnsynch owner, and clears the claimed bit
as there are no further sleepers. B returns, and when it wants to
release the mutex, it does this happily in user space because claimed is
not set. Now the fast lock variable is 'unlocked', while xnsynch still
reports B being the owner. This is no problem as the next time two
threads fight over this lock the waiter will simply overwrite the
xnsynch_owner before it falls asleep. But this "trick" doesn't work for
waiters that have been robbed. They will spin inside xnsynch_sleep_on
and stumble over this inconsistency.

I have two approaches in mind now: First one is something like
XNSYNCH_STEALNOINFORM, i.e. causing xnsynch_sleep_on to not set XNROBBED
so that the robbed thread spins one level higher in the skin code -
which would have to be extended a bit.

Option two is to clear xnsynch_owner once a new owner is about to return
from kernel with the lock held while there are no more xnsynch_sleepers.
That should work with even less changes and save us one syscall in the
robbed case. Need to think about it more, though.


Attachment: signature.asc
Description: OpenPGP digital signature

Xenomai-core mailing list

Reply via email to