On 2011-05-26 09:29, Gilles Chanteperdrix wrote:
> On 05/26/2011 09:18 AM, Jan Kiszka wrote:
>> On 2011-05-25 20:48, Gilles Chanteperdrix wrote:
>>> On 05/25/2011 02:22 PM, Jan Kiszka wrote:
>>>> On 2011-05-25 14:19, Gilles Chanteperdrix wrote:
>>>>> On 05/25/2011 02:12 PM, Jan Kiszka wrote:
>>>>>> On 2011-05-25 13:58, Gilles Chanteperdrix wrote:
>>>>>>> On 05/25/2011 01:20 PM, Jan Kiszka wrote:
>>>>>>>> On 2011-05-24 16:03, Gilles Chanteperdrix wrote:
>>>>>>>>> On 05/24/2011 03:52 PM, Jan Kiszka wrote:
>>>>>>>>>> On 2011-05-24 14:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>> Do you already have an idea how to get that info to the delete 
>>>>>>>>>>>>>>>> hook
>>>>>>>>>>>>>>>> function?
>>>>>>>>>>>>>>> Yes. We start by not applying the list reversal patch, then the 
>>>>>>>>>>>>>>> sys_ppd
>>>>>>>>>>>>>>> is the first in the list. So, we can, in the function 
>>>>>>>>>>>>>>> ppd_remove_mm,
>>>>>>>>>>>>>>> start by removing all the others ppd, then remove the sys ppd 
>>>>>>>>>>>>>>> (that is
>>>>>>>>>>>>>>> the first), last. This changes a few signatures in the core 
>>>>>>>>>>>>>>> code, a lot
>>>>>>>>>>>>>>> of things in the skin code, but that would be for the better...
>>>>>>>>>>>>>> I still don't see how this affects the order we use in
>>>>>>>>>>>>>> do_taskexit_event, the one that prevents xnsys_get_ppd usage 
>>>>>>>>>>>>>> even when
>>>>>>>>>>>>>> the mm is still present.
>>>>>>>>>>>>> The idea is to change the cleanup routines not to call 
>>>>>>>>>>>>> xnsys_get_ppd.
>>>>>>>>>>>> ...and use what instead? Sorry, I'm slow today.
>>>>>>>>>>> The sys_ppd passed as other argument to the cleanup function.
>>>>>>>>>> That would affect all thread hooks, not only the one for deletion. 
>>>>>>>>>> And
>>>>>>>>>> it would pull in more shadow-specific bits into the pod.
>>>>>>>>>> Moreover, I think we would still be in troubles as mm, thus ppd,
>>>>>>>>>> deletion takes place before last task deletion, thus taskexit hook
>>>>>>>>>> invocation. That's due to the cleanup ordering in the kernel's 
>>>>>>>>>> do_exit.
>>>>>>>>>> However, if you have a patch, I'd be happy to test and rework my 
>>>>>>>>>> leakage
>>>>>>>>>> fix.
>>>>>>>>> I will work on this ASAP.
>>>>>>>> Sorry for pushing, but I need to decide if we should role out my
>>>>>>>> imperfect fix or if there is chance to use some upstream version
>>>>>>>> directly. Were you able to look into this, or will this likely take a
>>>>>>>> bit more time?
>>>>>>> I intended to try and do this next week-end. If it is more urgent than
>>>>>>> that, I can try in one or two days. In any case, I do not think we
>>>>>>> should try and workaround the current code, it is way to fragile.
>>>>>> Mmh, might be true. I'm getting the feeling we should locally revert all
>>>>>> the recent MPS changes to work around the issues. It looks like there
>>>>>> are more related problems sleeping (we are still facing spurious
>>>>>> fast-synch related crashes here - examining ATM).
>>>>> This is the development head, it may remain broken for short times while
>>>>> we are fixing. I would understand reverting on the 2.5 branch, not on 
>>>>> -head.
>>>> I was thinking loudly about our (Siemens) local branch, not -head. We
>>>> are forced to use head to have features like automatic non-RT shadow
>>>> migration.
>>> Now that I think about it, leaking a few bytes in the private heap is
>>> completely harmless, since it is destroyed anyway,
>> Not at all harmless if you create and destroy tasks without restarting
>> the application. That's what our users are do, so this bug is killing them.
> Still, it should not cause crashes. Only allocation returning NULL at
> some point.

That might be true. Reverting some suspicious patches did not resolve
the problem so far.

>>> We do not care, we only use the mm pointer value as a key, and this one
>>> is still available when the exit callback is called. So, we have the mm
>>> pointer, we can have the process private ppd, normally, we have all that
>>> we need. It is just a question of finding an interface which is not too
>>> intrusive.
>> The problem is that the heap we need to release the fastlock to will
>> already be history at this point - classic use after release...
> As I said, xnheap_free is robust against this, so, this user after
> release does not cause any trouble. But I have also already agreed that
> we should fix it.

xnheap_test_and_free is not at all robust against working on an invalid
heap object.


Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Xenomai-core mailing list

Reply via email to