On 05/25/2011 02:12 PM, Jan Kiszka wrote:
> On 2011-05-25 13:58, Gilles Chanteperdrix wrote:
>> On 05/25/2011 01:20 PM, Jan Kiszka wrote:
>>> On 2011-05-24 16:03, Gilles Chanteperdrix wrote:
>>>> On 05/24/2011 03:52 PM, Jan Kiszka wrote:
>>>>> On 2011-05-24 14:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>> Do you already have an idea how to get that info to the delete hook
>>>>>>>>>>> function?
>>>>>>>>>> Yes. We start by not applying the list reversal patch, then the sys_ppd
>>>>>>>>>> is the first in the list. So, we can, in the function ppd_remove_mm,
>>>>>>>>>> start by removing all the other ppds, then remove the sys ppd (which is
>>>>>>>>>> the first) last. This changes a few signatures in the core code, a lot
>>>>>>>>>> of things in the skin code, but that would be for the better...
>>>>>>>>> I still don't see how this affects the order we use in
>>>>>>>>> do_taskexit_event, the one that prevents xnsys_get_ppd usage even when
>>>>>>>>> the mm is still present.
>>>>>>>> The idea is to change the cleanup routines not to call xnsys_get_ppd.
>>>>>>> ...and use what instead? Sorry, I'm slow today.
>>>>>> The sys_ppd, passed as an additional argument to the cleanup function.
>>>>> That would affect all thread hooks, not only the one for deletion. And
>>>>> it would pull in more shadow-specific bits into the pod.
>>>>> Moreover, I think we would still be in trouble as mm, thus ppd,
>>>>> deletion takes place before last task deletion, thus taskexit hook
>>>>> invocation. That's due to the cleanup ordering in the kernel's do_exit.
>>>>> However, if you have a patch, I'd be happy to test and rework my leakage
>>>>> fix.
>>>> I will work on this ASAP.
>>> Sorry for pushing, but I need to decide if we should roll out my
>>> imperfect fix or if there is a chance to use some upstream version
>>> directly. Were you able to look into this, or will this likely take a
>>> bit more time?
>> I intended to try and do this next weekend. If it is more urgent than
>> that, I can try in one or two days. In any case, I do not think we
>> should try to work around the current code; it is way too fragile.
> Mmh, might be true. I'm getting the feeling we should locally revert all
> the recent MPS changes to work around the issues. It looks like there
> are more related problems sleeping (we are still facing spurious
> fast-synch related crashes here - examining ATM).

This is the development head; it may remain broken for short periods while
we are fixing it. I would understand reverting on the 2.5 branch, not on -head.

> Another thing that just came to my mind: Do we have a well-defined
> destruction order of native skin or native tasks vs. system skin? I mean
> the latter destroys the local sem_heap while the former may purge
> remaining native resources (including the MPS fastlock). I think the
> ordering is inverted to what the code assumes (heap is destructed before
> the last task), no?

IMO, the system skin destroy callback should be called last; that should
solve these problems. This is what I was talking about.
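
To make the ordering concrete, here is a rough sketch in plain C. It is not
the actual nucleus code: struct ppd, ppd_destroy, ppd_remove_all, ppd_demo and
the heap_alive flag are all hypothetical names made up for illustration. It
only assumes what was said above, namely that the sys holder sits at the head
of the list (no list reversal), that the other holders are removed first, and
that the sys holder is passed to each cleanup call instead of being looked up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PPD 8

/* Hypothetical stand-in for a per-process data holder (not the real struct). */
struct ppd {
    const char *skin;   /* owning skin, for illustration only */
    int is_sys;         /* non-zero for the system skin's holder */
    int heap_alive;     /* only meaningful for the sys holder: local heap state */
    struct ppd *next;
};

/* Records the order of destroy calls and whether the sys heap was still up. */
struct teardown_log {
    const char *order[MAX_PPD];
    int heap_alive_at[MAX_PPD];
    size_t count;
};

/* The sys holder is handed in explicitly, so cleanup needs no lookup. */
static void ppd_destroy(struct ppd *p, struct ppd *sys, struct teardown_log *log)
{
    if (log->count < MAX_PPD) {
        log->order[log->count] = p->skin;
        log->heap_alive_at[log->count] = sys->heap_alive;
        log->count++;
    }
    if (p == sys)
        sys->heap_alive = 0;    /* the local heap goes away last */
}

/* Assumes the sys holder is the list head (i.e. no list reversal). */
static void ppd_remove_all(struct ppd *head, struct teardown_log *log)
{
    struct ppd *p, *next;

    assert(head != NULL && head->is_sys);
    for (p = head->next; p != NULL; p = next) {
        next = p->next;
        ppd_destroy(p, head, log);  /* other skins first */
    }
    ppd_destroy(head, head, log);   /* system skin last */
}

/* Builds a three-entry list (sys, native, posix) and tears it down. */
void ppd_demo(struct teardown_log *log)
{
    struct ppd posix = { "posix", 0, 0, NULL };
    struct ppd native = { "native", 0, 0, &posix };
    struct ppd sys = { "sys", 1, 1, &native };

    memset(log, 0, sizeof(*log));
    ppd_remove_all(&sys, log);
}
```

With this ordering, the cleanup of the native and posix holders in the sketch
still sees the system heap as alive, which is the property the native skin
would need when purging its remaining resources.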


