Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>  > On Mon, 2007-02-12 at 00:07 +0100, Gilles Chanteperdrix wrote:
>  > > Philippe Gerum wrote:
>  > >  > On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
>  > >  > > Hi,
>  > >  > > 
>  > >  > > while testing 2.6.20 with RTnet, I got this kernel BUG during the slave
>  > >  > > startup procedure:
>  > >  > > 
>  > >  > > <4>[  137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us)
>  > >  > > <4>[  142.291455] BUG: at kernel/fork.c:993 copy_process()
>  > >  > > <4>[  142.291585]  [<c0103a8f>] show_trace_log_lvl+0x1f/0x40
>  > >  > > <4>[  142.291767]  [<c0104237>] show_trace+0x17/0x20
>  > >  > > <4>[  142.291896]  [<c010432b>] dump_stack+0x1b/0x20
>  > >  > > <4>[  142.292026]  [<c0111e94>] copy_process+0x914/0x13d0
>  > >  > > <4>[  142.292190]  [<c0112b80>] do_fork+0x70/0x1b0
>  > >  > > <4>[  142.292323]  [<c0101078>] sys_clone+0x38/0x40
>  > >  > > <4>[  142.292620]  [<c010320f>] syscall_call+0x7/0xb
>  > >  > > <4>[  142.292747]  =======================
>  > >  > > <3>[  142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
>  > >  > > <4>[  142.293052] in_atomic():0, irqs_disabled():1
>  > >  >                                                  ^^^^
>  > >  > 
>  > >  > Typical of something going wrong in entry.S.
>  > > 
>  > > You mean, interrupts are not really disabled when forking ? :-)
>  > > 
>  > 
>  > Eh, mmmh, no. Hopefully.
>  > 
>  > > So, I am afraid the new fpu_counter optimization is buggy: if a task
>  > > forks with fpu_counter greater than 5 and is preempted right after
>  > > prepare_to_copy in dup_task_struct, when the system switches back to
>  > > this task, the task FPU context will be restored and TS_USEDFPU set in
>  > > the task flags, thereby voiding the effect of prepare_to_copy.
>  > > 
>  > 
>  > You mean that the parent FPU context would leak into the child's one?
> 
> Yes, something like that. The result is random segfaults, I do not
> remember exactly why.
> 
>  > Well, maybe the LKML people would like to know about this. As a
>  > sidenote, I don't see anything bad with your latest counter-measure
>  > disabling this optimization in Xenomai's context switch code, even in
>  > the buggy case above. Right?
> 
> Right, if there are random segfaults, they will not be Xenomai's fault.
> 

I'm currently sorting through the symptoms again, or rather, I'm trying
to find out where they went. 2.6.20 has just decided to work normally
again; 2.6.19 needs a re-check.

It appears now that the tracer played an important role, but I'm not
100% sure yet. I'll keep you posted.

Jan
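
For readers following the fpu_counter discussion quoted above, here is a
minimal user-space sketch of the suspected race. It is a toy model under
stated assumptions, not kernel code: the toy_* names, the struct layout
and the eager_restore switch are hypothetical, and only the "greater than
5" threshold, prepare_to_copy, dup_task_struct and TS_USEDFPU come from
the thread itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the per-task FPU state. */
struct toy_task {
    int  fpu_counter; /* consecutive time slices in which the task used the FPU */
    bool ts_usedfpu;  /* models TS_USEDFPU: live FPU registers belong to this task */
};

/* Models the effect of prepare_to_copy(): FPU state is saved and the flag cleared. */
static void toy_prepare_to_copy(struct toy_task *t)
{
    t->ts_usedfpu = false;
}

/*
 * Models switching back to a task after it was preempted. With the
 * optimization enabled, a task whose counter exceeds 5 gets its FPU
 * context restored eagerly, which sets the flag again.
 */
static void toy_switch_in(struct toy_task *t, bool eager_restore)
{
    if (eager_restore && t->fpu_counter > 5)
        t->ts_usedfpu = true;
}

/*
 * Models the fork path described in the thread. The parent is passed by
 * value so the caller's copy is untouched. Returns the flag the child
 * inherits; true means the child wrongly believes it owns live FPU state.
 */
static bool toy_fork_child_owns_fpu(struct toy_task parent,
                                    bool preempted, bool eager_restore)
{
    struct toy_task child;

    toy_prepare_to_copy(&parent);
    if (preempted)                  /* preemption right after prepare_to_copy */
        toy_switch_in(&parent, eager_restore);
    child = parent;                 /* structure copy, as dup_task_struct would do */
    return child.ts_usedfpu;
}

/* Walks through the scenarios discussed in the thread. */
void toy_check_scenarios(void)
{
    struct toy_task heavy = { .fpu_counter = 6, .ts_usedfpu = true };
    struct toy_task light = { .fpu_counter = 5, .ts_usedfpu = true };

    assert(toy_fork_child_owns_fpu(heavy, true, true));   /* the suspected bug */
    assert(!toy_fork_child_owns_fpu(heavy, false, true)); /* no preemption: fine */
    assert(!toy_fork_child_owns_fpu(heavy, true, false)); /* optimization disabled */
    assert(!toy_fork_child_owns_fpu(light, true, true));  /* counter not above 5 */
}
```

In this model the child only inherits the stale flag when the preemption
and the eager restore coincide, which matches the argument that disabling
the optimization in the context switch code avoids the problem.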

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core