Jan Kiszka wrote:
> Dmitry Adamushko wrote:
>> On 12/09/06, Philippe Gerum <[EMAIL PROTECTED]> wrote:
>>> On Tue, 2006-09-12 at 15:24 +0200, Nils Kemper wrote:
>>>> Hi,
>>>> I want to use Xenomai, but I sometimes get a kernel Oops (always the
>>>> same one) just by running xeno-test:
>>>>
>>>> [..]
>>>> Xenomai: stopping native API services.
>>>> I-pipe: Domain Xenomai unregistered.
>>>> Xenomai: hal/x86 stopped.
>>>> Xenomai: real-time nucleus unloaded.
>>> Does the issue still pop up if you set the Xenomai nucleus as static
>>> (i.e. not as a module) in the kernel configuration?
>>
>>
>> Just a wild guess.
>>
>> In __ipipe_dispatch_event()
>>
>>  ipipe_lock_cpu(flags);
>>
>>  start_domain = this_domain = ipipe_percpu_domain[cpuid];
>>
>>  list_for_each_safe(pos,npos,&__ipipe_pipeline) {
>>
>>                next_domain = list_entry(pos,struct ipipe_domain,p_link);
>>
>> //...
>>                if (next_domain->evhand[event] != NULL) {
>>                        ipipe_percpu_domain[cpuid] = next_domain;
>>                        ipipe_unlock_cpu(flags);
>> (1)
>>                        propagate = !next_domain->evhand[event](event,start_domain,data);
>>
>> Does anything prevent another thread from preempting the current one at (1)
>> and making "next_domain" invalid?
> 
> That could explain it. I only read ipipe_lock_cpu during my first scan
> of this code earlier today, missing the unlock. One had better save
> the handler in a local variable before releasing the lock...
> 
>> then :
>>
>> if next_domain == "rthal_domain" (aka Xenomai), e.g. because someone
>> unloaded all the modules.
>>
>> then if it's static :
>>
>> rthal_domain is still kind of a valid object - it's at least in a valid
>> memory region + evhand points to a valid function. It's even possible to
>> jump to the next element if the rthal_domain::fields were not cleared...
>>
>> non-static :
>>
>> the module image was unloaded, next_domain doesn't point to anything
>> reasonable.
> 
> Mmh, we probably need some grace period on unload to avoid such races.
> Reminds me of issues with the IRQ proc output or the shared IRQ
> deregistration...
> 
>> Jan or Nils, what instructions does "objdump -d kernel/ipipe/core.o" show
>> for a given offset in the __ipipe_dispatch_event().
>>
>> 0xcd in case of Nils.
>>
>> [<c013f158>] __ipipe_dispatch_event+0xcd/0xeb
>>
>> ?
>>
>>
> 
> Will check this tomorrow.

It's the indirect call to the event handler.

   8b3:   8b 55 e4                mov    0xffffffe4(%ebp),%edx
   8b6:   50                      push   %eax
-> 8b7:   ff 94 93 80 22 00 00    call   *0x2280(%ebx,%edx,4)
   8be:   83 c4 0c                add    $0xc,%esp
   8c1:   85 c0                   test   %eax,%eax

In my case the kernel tries to access the address 0xd09bc5e5, which looks
like an address that used to be valid.

So it looks like we really need some mechanism to make sure that all CPUs
see the updated pointers after an event handler is unhooked and before
any further cleanup proceeds. Sounds like a job for RCU, but we don't
have that on 2.4.
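
To make the idea concrete, here is a rough user-space sketch of a
poor man's grace period, using C11 atomics instead of RCU. It is not
I-pipe code: the names (evhand, in_flight, hook_event, dispatch_event,
unhook_event, count_and_stop) are made up for illustration, and a real
fix would have to fit the per-CPU, per-domain structures. The dispatcher
announces itself before it reads the handler pointer, and the unhook
path clears the pointer and then waits until nobody is in flight.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_EVENTS 4 /* hypothetical number of event slots */

typedef int (*ev_handler_t)(int event, void *data);

/* Hypothetical stand-ins for a domain's handler table and a reader count. */
static _Atomic(ev_handler_t) evhand[NR_EVENTS];
static atomic_int in_flight;

void hook_event(int event, ev_handler_t fn)
{
	assert(event >= 0 && event < NR_EVENTS);
	atomic_store(&evhand[event], fn);
}

/* Returns nonzero if the event should be propagated further. */
int dispatch_event(int event, void *data)
{
	ev_handler_t fn;
	int propagate = 1;

	assert(event >= 0 && event < NR_EVENTS);

	/* Announce the reader first, then snapshot the handler pointer. */
	atomic_fetch_add(&in_flight, 1);
	fn = atomic_load(&evhand[event]);
	if (fn != NULL)
		propagate = !fn(event, data);
	atomic_fetch_sub(&in_flight, 1);

	return propagate;
}

/* Clear the slot, then wait until no dispatcher can still be using it. */
void unhook_event(int event)
{
	ev_handler_t none = NULL;

	assert(event >= 0 && event < NR_EVENTS);
	atomic_store(&evhand[event], none);
	while (atomic_load(&in_flight) != 0)
		; /* busy-wait: the "grace period" */
}

/* Example handler: counts its calls via data and stops propagation. */
int count_and_stop(int event, void *data)
{
	(void)event;
	++*(int *)data;
	return 1;
}
```

Once unhook_event() returns, any dispatcher that saw the old pointer
has finished its call, so only then would it be safe to go on with
cleanups such as unloading the module text. The order matters: if the
dispatcher read the pointer before bumping the counter, the unhook path
could see a zero count while a call is still about to happen.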

Jan


_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
