Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> Gilles Chanteperdrix wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> Hi Gilles,
>>>>>>>
>>>>>>> I'm pushing your findings to the list, also as my colleagues showed
>>>>>>> strong interest - this thing may explain rare corruptions for us as 
>>>>>>> well.
>>>>>>>
>>>>>>> I thought a bit about that likely u_mode-related crash in your test case
>>>>>>> and have the following theory so far: If the xeno_current_mode storage
>>>>>>> is allocated on the application heap (!HAVE_THREAD, that's also what we
>>>>>>> are forced to use), it is automatically freed on thread termination in
>>>>>>> the context of the dying thread. If the thread is already migrated to
>>>>>>> secondary or if that happens while it is cleaned up (i.e. before calling
>>>>>>> for exit into the kernel), there is no problem, Xenomai will not touch
>>>>>>> the mode storage anymore. But if the thread happens to delete the
>>>>>>> storage "silently", without any migration, the final exit will trigger
>>>>>>> one further access. And that takes place against an invalid head area at
>>>>>>> this point.
>>>>>>>
>>>>>>> Does this make sense?
>>>>>> Yes, it is the issue we observed.
>>>>>>
>>>>>>> If that is true, all we need to do is to force a migration before
>>>>>>> releasing the mode storage. Could you check this?
>>>>>> No, that does not fly. Calling, for instance, __wrap_pthread_mutex_lock
>>>>>> in another TSD cleanup function is which could be called after the
>>>>>> current_mode TSD cleanup is allowed and could trigger a switch to
>>>>>> primary mode and a write to the u_mode.
>>>>>>
>>>>> Good point. Mmh. Another, but ABI-breaking, way would be to add a
>>>>> syscall for deregistering the u_mode pointer...
>>>> That is the thing we did to verify that we had this bug. But this
>>>> syscall would be also called too soon, and suffers from the TSD cleanup
>>>> functions order again.
>>>>
>>> Right, the only complete fix without losing functionality is to add an
>>> option to our ABI for requesting kernel-managed memory if dynamic
>>> allocation is necessary (i.e. no TLS is available).
>> No. TLS may as well suffer from the same issue, since it is handled by
>> the glibc or libgcc, over which we have no control. So yes, it may work
>> by chance today, but may as well stop working tomorrow. We use
>> kernel-managed memory all the time, final point.
> 
> I think we are still in the solution finding process, no need for early
> conclusions.
> 
> See, we actually do not need kernel-managed storage for u_mode at all.
> u_mode is an optimization, mostly for our fast user space mutexes. We
> can indeed switch off all updates by the kernel and will still be able
> to provide all required features - just less optimally. Adding a third
> state, "invalid", we can make all mutex users assume they need the slow
> syscall path on uncontended acquisition. And assert_nrt will probably be
> happy about a syscall replacement for u_mode when it became invalid.

Thinking about the "fast" part in "fast userspace mutex": Would it be an
argument in favour of not using the global semaphore heap that said
memory is uncached on some architectures? Or is that irrelevant?

Regards,

Wolfgang

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core

Reply via email to