Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Philippe Gerum wrote:
> On Mon, 2007-02-12 at 15:39 +0100, Gilles Chanteperdrix wrote:
>> Philippe Gerum wrote:
>>> On Mon, 2007-02-12 at 14:49 +0100, Jan Kiszka wrote:
>>>> Philippe Gerum wrote:
>>>>> On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> Jan Kiszka wrote:
>>>>>>>> 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment. I included a full set of 16k points, but the thrilling things are around -73 to -25: some Linux process with IRQs on gets preempted by an RT IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. I don't know where it comes from, nor why there is no report about THAT issue in the kernel logs.
>>>>>>>
>>>>>>> The cause of this page fault will get tracked down later today, but the way it is handled already raises some doubts for me. To make discussion easier, here is the relevant excerpt from the trace:
>>>>>>
>>>>>> Maybe this fault is due to the no-cow patch? Before the no-cow patch, vmalloc'ed areas were added to all processes' page directories; now they are added only to the page directories of processes with the VM_PINNED flag. So, if ipipe_test_root tries to access some module memory area over the context of a non-realtime thread, a fault will occur.
>>>>>
>>>>> Yes, it's a minor fault occurring due to on-demand memory mapping, which is why we don't get any alarming message in the kernel log.
>>>>
>>>> Looks like it's something that should never happen, for sure.
>>>
>>> Now that vmalloc & ioremap memory may have their PTEs set on demand anew due to the no-cow patch, minor faults in kernel space are possible again, but this should only happen on behalf of the Linux domain; it is not expected to happen in primary mode.
>>
>> Does not a primary mode IRQ handler borrow the MMU context of the tasks it preempts?
>
> Yes, this is where the problem stands if we happen to preempt a regular task and tread over code which might trigger minor faults. The best way to check this would be to somehow enable VM_PINNED for all tasks. Back to square #1.

Ok. I'll try to change this and send a patch ASAP.

--
Gilles Chanteperdrix

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
On Mon, 2007-02-12 at 15:39 +0100, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>> On Mon, 2007-02-12 at 14:49 +0100, Jan Kiszka wrote:
>>> Philippe Gerum wrote:
>>>> On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment.
>>>>>>>
>>>>>>> I included a full set of 16k points, but the thrilling things are around -73 to -25: some Linux process with IRQs on gets preempted by an RT IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. I don't know where it comes from, nor why there is no report about THAT issue in the kernel logs.
>>>>>>
>>>>>> The cause of this page fault will get tracked down later today, but the way it is handled already raises some doubts for me. To make discussion easier, here is the relevant excerpt from the trace:
>>>>>
>>>>> Maybe this fault is due to the no-cow patch? Before the no-cow patch, vmalloc'ed areas were added to all processes' page directories; now they are added only to the page directories of processes with the VM_PINNED flag. So, if ipipe_test_root tries to access some module memory area over the context of a non-realtime thread, a fault will occur.
>>>>
>>>> Yes, it's a minor fault occurring due to on-demand memory mapping, which is why we don't get any alarming message in the kernel log.
>>>
>>> Looks like it's something that should never happen, for sure.
>>
>> Now that vmalloc & ioremap memory may have their PTEs set on demand anew due to the no-cow patch, minor faults in kernel space are possible again, but this should only happen on behalf of the Linux domain; it is not expected to happen in primary mode.
>
> Does not a primary mode IRQ handler borrow the MMU context of the tasks it preempts?

Yes, this is where the problem stands if we happen to preempt a regular task and tread over code which might trigger minor faults. The best way to check this would be to somehow enable VM_PINNED for all tasks. Back to square #1.

--
Philippe.
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Philippe Gerum wrote:
> On Mon, 2007-02-12 at 14:49 +0100, Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> Jan Kiszka wrote:
>>>>>> 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment.
>>>>>>
>>>>>> I included a full set of 16k points, but the thrilling things are around -73 to -25: some Linux process with IRQs on gets preempted by an RT IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. I don't know where it comes from, nor why there is no report about THAT issue in the kernel logs.
>>>>>
>>>>> The cause of this page fault will get tracked down later today, but the way it is handled already raises some doubts for me. To make discussion easier, here is the relevant excerpt from the trace:
>>>>
>>>> Maybe this fault is due to the no-cow patch? Before the no-cow patch, vmalloc'ed areas were added to all processes' page directories; now they are added only to the page directories of processes with the VM_PINNED flag. So, if ipipe_test_root tries to access some module memory area over the context of a non-realtime thread, a fault will occur.
>>>
>>> Yes, it's a minor fault occurring due to on-demand memory mapping, which is why we don't get any alarming message in the kernel log.
>>
>> Looks like it's something that should never happen, for sure.
>
> Now that vmalloc & ioremap memory may have their PTEs set on demand anew due to the no-cow patch, minor faults in kernel space are possible again, but this should only happen on behalf of the Linux domain; it is not expected to happen in primary mode.

Does not a primary mode IRQ handler borrow the MMU context of the tasks it preempts?

--
Gilles Chanteperdrix
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
On Mon, 2007-02-12 at 14:49 +0100, Jan Kiszka wrote:
> Philippe Gerum wrote:
>> On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>> Jan Kiszka wrote:
>>>>> 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment.
>>>>>
>>>>> I included a full set of 16k points, but the thrilling things are around -73 to -25: some Linux process with IRQs on gets preempted by an RT IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. I don't know where it comes from, nor why there is no report about THAT issue in the kernel logs.
>>>>
>>>> The cause of this page fault will get tracked down later today, but the way it is handled already raises some doubts for me. To make discussion easier, here is the relevant excerpt from the trace:
>>>
>>> Maybe this fault is due to the no-cow patch? Before the no-cow patch, vmalloc'ed areas were added to all processes' page directories; now they are added only to the page directories of processes with the VM_PINNED flag. So, if ipipe_test_root tries to access some module memory area over the context of a non-realtime thread, a fault will occur.
>>
>> Yes, it's a minor fault occurring due to on-demand memory mapping, which is why we don't get any alarming message in the kernel log.
>
> Looks like it's something that should never happen, for sure.

Now that vmalloc & ioremap memory may have their PTEs set on demand anew due to the no-cow patch, minor faults in kernel space are possible again, but this should only happen on behalf of the Linux domain; it is not expected to happen in primary mode.

> But are we fine with screwing up the Linux IRQ state nevertheless? In other words, are we seeing one or two ipipe issues here?

The I-pipe would only restore the virtual flag as seen on entry from an exception on behalf of the Linux domain, not in primary mode.

--
Philippe.
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Philippe Gerum wrote:
> On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Jan Kiszka wrote:
>>>> 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment.
>>>>
>>>> I included a full set of 16k points, but the thrilling things are around -73 to -25: some Linux process with IRQs on gets preempted by an RT IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. I don't know where it comes from, nor why there is no report about THAT issue in the kernel logs.
>>>
>>> The cause of this page fault will get tracked down later today, but the way it is handled already raises some doubts for me. To make discussion easier, here is the relevant excerpt from the trace:
>>
>> Maybe this fault is due to the no-cow patch? Before the no-cow patch, vmalloc'ed areas were added to all processes' page directories; now they are added only to the page directories of processes with the VM_PINNED flag. So, if ipipe_test_root tries to access some module memory area over the context of a non-realtime thread, a fault will occur.
>
> Yes, it's a minor fault occurring due to on-demand memory mapping, which is why we don't get any alarming message in the kernel log.

Looks like it's something that should never happen, for sure. But are we fine with screwing up the Linux IRQ state nevertheless? In other words, are we seeing one or two ipipe issues here?
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment.
>>>
>>> I included a full set of 16k points, but the thrilling things are around -73 to -25: some Linux process with IRQs on gets preempted by an RT IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. I don't know where it comes from, nor why there is no report about THAT issue in the kernel logs.
>>
>> The cause of this page fault will get tracked down later today, but the way it is handled already raises some doubts for me. To make discussion easier, here is the relevant excerpt from the trace:
>
> Maybe this fault is due to the no-cow patch? Before the no-cow patch, vmalloc'ed areas were added to all processes' page directories; now they are added only to the page directories of processes with the VM_PINNED flag. So, if ipipe_test_root tries to access some module memory area over the context of a non-realtime thread, a fault will occur.

Yes, it's a minor fault occurring due to on-demand memory mapping, which is why we don't get any alarming message in the kernel log.

--
Philippe.
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Jan Kiszka wrote:
> Jan Kiszka wrote:
>> 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment.
>>
>> I included a full set of 16k points, but the thrilling things are around -73 to -25: some Linux process with IRQs on gets preempted by an RT IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. I don't know where it comes from, nor why there is no report about THAT issue in the kernel logs.
>
> The cause of this page fault will get tracked down later today, but the way it is handled already raises some doubts for me. To make discussion easier, here is the relevant excerpt from the trace:

Maybe this fault is due to the no-cow patch? Before the no-cow patch, vmalloc'ed areas were added to all processes' page directories; now they are added only to the page directories of processes with the VM_PINNED flag. So, if ipipe_test_root tries to access some module memory area over the context of a non-realtime thread, a fault will occur.

--
Gilles Chanteperdrix
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Jan Kiszka wrote:
> 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment.
>
> I included a full set of 16k points, but the thrilling things are around -73 to -25: some Linux process with IRQs on gets preempted by an RT IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. I don't know where it comes from, nor why there is no report about THAT issue in the kernel logs.

The cause of this page fault will get tracked down later today, but the way it is handled already raises some doubts for me. To make discussion easier, here is the relevant excerpt from the trace:

> :+func      -73+  1.426  link_path_walk+0x14  (__link_path_walk+0xca0)
> :| +func    -72   0.605  __ipipe_handle_irq+0x14  (common_interrupt+0x18)
> :| +func    -71   0.472  __ipipe_ack_irq+0x8  (__ipipe_handle_irq+0xaf)
> :| +func    -70   0.224  __ipipe_ack_level_irq+0x12  (__ipipe_ack_irq+0x19)
> :| +func    -70+  4.424  mask_and_ack_8259A+0x14  (__ipipe_ack_level_irq+0x22)
> :| +func    -66   0.475  __ipipe_dispatch_wired+0x14  (__ipipe_handle_irq+0x62)
> :| # func   -65   0.974  xnintr_irq_handler+0xe  (__ipipe_dispatch_wired+0x95)
> :| # func   -64+  1.892  rtl8139_interrupt+0x11 [rt_8139too]  (xnintr_irq_handler+0x3b)
> :| # func   -62   0.382  __ipipe_handle_exception+0xe  (error_code+0x3e)
> :| # func   -62   0.222  __ipipe_test_root+0x8  (__ipipe_handle_exception+0x1a)
> :| # func   -62   0.377  __ipipe_stall_root+0x8  (__ipipe_handle_exception+0x15b)
> :| #*func   -62   0.173  trace_hardirqs_off+0xc  (__ipipe_handle_exception+0x165)
> :| #*func   -61   0.211  __ipipe_test_root+0x8  (trace_hardirqs_off+0x2d)
> :| #*func   -61+  1.965  do_page_fault+0xe  (__ipipe_handle_exception+0x6d)
> :  #*func   -59   0.180  trace_hardirqs_on+0x11  (__ipipe_handle_exception+0xd9)
> :  #*func   -59   0.163  __ipipe_test_root+0x8  (trace_hardirqs_on+0x5e)
> :  #*func   -59   0.396  mark_held_locks+0xe  (trace_hardirqs_on+0x8b)
> :  #*func   -58   0.212  mark_held_locks+0xe  (trace_hardirqs_on+0xc9)
> :  #*func   -58   0.461  __ipipe_restore_root+0x8  (__ipipe_handle_exception+0xe1)
> :  #*func   -58   0.253  __ipipe_unstall_root+0x8  (__ipipe_restore_root+0x18)
> :  # func   -57   0.224  __ipipe_stall_root+0x8  (ret_from_exception+0x5)
> :  #*func   -57   0.366  trace_hardirqs_off+0xc  (ret_from_exception+0xe)
> :  #*func   -57   0.327  __ipipe_test_root+0x8  (trace_hardirqs_off+0x2d)
> :| #*func   -57+  2.089  __ipipe_unstall_iret_root+0x8  (restore_nocheck_notrace+0x0)
> :| #*func   -54+  1.444  alloc_rtskb+0xa [rtnet]  (rtl8139_interrupt+0x182 [rt_8139too])
> :| #*func   -53+  1.172  rt_eth_type_trans+0xe [rtnet]  (rtl8139_interrupt+0x1d6 [rt_8139too])

The fault gets forwarded to Linux because ipipe_trap_notify doesn't choke: we are running neither over a task with PF_EVNOTIFY set nor over a kernel thread yet (IPIPE_NOSTACK_FLAG). Still, we are already in the primary domain, so I wonder if this forwarding is intentional. At least it seems to break some things later on...

Jan
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>> On Mon, 2007-02-12 at 00:07 +0100, Gilles Chanteperdrix wrote:
>>> Philippe Gerum wrote:
>>>> On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
>>>>> Hi,
>>>>>
>>>>> while testing 2.6.20 with RTnet, I got this kernel BUG during the slave startup procedure:
>>>>>
>>>>> <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us)
>>>>> <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process()
>>>>> <4>[ 142.291585] [] show_trace_log_lvl+0x1f/0x40
>>>>> <4>[ 142.291767] [] show_trace+0x17/0x20
>>>>> <4>[ 142.291896] [] dump_stack+0x1b/0x20
>>>>> <4>[ 142.292026] [] copy_process+0x914/0x13d0
>>>>> <4>[ 142.292190] [] do_fork+0x70/0x1b0
>>>>> <4>[ 142.292323] [] sys_clone+0x38/0x40
>>>>> <4>[ 142.292620] [] syscall_call+0x7/0xb
>>>>> <4>[ 142.292747] ===
>>>>> <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
>>>>> <4>[ 142.293052] in_atomic():0, irqs_disabled():1
>>>>
>>>> Typical of something going wrong in entry.S.
>>>
>>> You mean, interrupts are not really disabled when forking? :-)
>>
>> Eh, mmmh, no. Hopefully.
>>
>>> So, I am afraid the new fpu_counter optimization is buggy: if a task forks with fpu_counter greater than 5 and is preempted right after prepare_to_copy in dup_task_struct, when the system switches back to this task, the task's FPU context will be restored and TS_USEDFPU set in the task flags, thereby voiding the effect of prepare_to_copy.
>>
>> You mean that the parent FPU context would leak into the child's one?
>
> Yes, something like that. The result is random segfaults; I do not remember exactly why.
>
>> Well, maybe the LKML people would like to know about this. As a sidenote, I don't see anything bad with your latest counter-measure disabling this optimization in Xenomai's context switch code, even in the bogus case above. Right?
>
> Right, if there are random segfaults, they will not be Xenomai's fault.

I'm currently sorting the symptoms again, or rather I'm looking where they went. 2.6.20 just decided to work normally again; 2.6.19 needs a re-check. It appears now that the tracer played an important role, but I'm not 100% sure yet. I'll keep you posted.

Jan
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Philippe Gerum wrote:
> On Mon, 2007-02-12 at 00:07 +0100, Gilles Chanteperdrix wrote:
>> Philippe Gerum wrote:
>>> On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
>>>> Hi,
>>>>
>>>> while testing 2.6.20 with RTnet, I got this kernel BUG during the slave startup procedure:
>>>>
>>>> <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us)
>>>> <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process()
>>>> <4>[ 142.291585] [] show_trace_log_lvl+0x1f/0x40
>>>> <4>[ 142.291767] [] show_trace+0x17/0x20
>>>> <4>[ 142.291896] [] dump_stack+0x1b/0x20
>>>> <4>[ 142.292026] [] copy_process+0x914/0x13d0
>>>> <4>[ 142.292190] [] do_fork+0x70/0x1b0
>>>> <4>[ 142.292323] [] sys_clone+0x38/0x40
>>>> <4>[ 142.292620] [] syscall_call+0x7/0xb
>>>> <4>[ 142.292747] ===
>>>> <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
>>>> <4>[ 142.293052] in_atomic():0, irqs_disabled():1
>>>
>>> Typical of something going wrong in entry.S.
>>
>> You mean, interrupts are not really disabled when forking? :-)
>
> Eh, mmmh, no. Hopefully.
>
>> So, I am afraid the new fpu_counter optimization is buggy: if a task forks with fpu_counter greater than 5 and is preempted right after prepare_to_copy in dup_task_struct, when the system switches back to this task, the task's FPU context will be restored and TS_USEDFPU set in the task flags, thereby voiding the effect of prepare_to_copy.
>
> You mean that the parent FPU context would leak into the child's one?

Yes, something like that. The result is random segfaults; I do not remember exactly why.

> Well, maybe the LKML people would like to know about this. As a sidenote, I don't see anything bad with your latest counter-measure disabling this optimization in Xenomai's context switch code, even in the bogus case above. Right?

Right, if there are random segfaults, they will not be Xenomai's fault.

--
Gilles Chanteperdrix.
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
On Mon, 2007-02-12 at 00:07 +0100, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>> On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
>>> Hi,
>>>
>>> while testing 2.6.20 with RTnet, I got this kernel BUG during the slave startup procedure:
>>>
>>> <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us)
>>> <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process()
>>> <4>[ 142.291585] [] show_trace_log_lvl+0x1f/0x40
>>> <4>[ 142.291767] [] show_trace+0x17/0x20
>>> <4>[ 142.291896] [] dump_stack+0x1b/0x20
>>> <4>[ 142.292026] [] copy_process+0x914/0x13d0
>>> <4>[ 142.292190] [] do_fork+0x70/0x1b0
>>> <4>[ 142.292323] [] sys_clone+0x38/0x40
>>> <4>[ 142.292620] [] syscall_call+0x7/0xb
>>> <4>[ 142.292747] ===
>>> <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
>>> <4>[ 142.293052] in_atomic():0, irqs_disabled():1
>>
>> Typical of something going wrong in entry.S.
>
> You mean, interrupts are not really disabled when forking? :-)

Eh, mmmh, no. Hopefully.

> So, I am afraid the new fpu_counter optimization is buggy: if a task forks with fpu_counter greater than 5 and is preempted right after prepare_to_copy in dup_task_struct, when the system switches back to this task, the task's FPU context will be restored and TS_USEDFPU set in the task flags, thereby voiding the effect of prepare_to_copy.

You mean that the parent FPU context would leak into the child's one? Well, maybe the LKML people would like to know about this. As a sidenote, I don't see anything bad with your latest counter-measure disabling this optimization in Xenomai's context switch code, even in the bogus case above. Right?

--
Philippe.
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Philippe Gerum wrote:
> On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
>> Hi,
>>
>> while testing 2.6.20 with RTnet, I got this kernel BUG during the slave startup procedure:
>>
>> <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us)
>> <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process()
>> <4>[ 142.291585] [] show_trace_log_lvl+0x1f/0x40
>> <4>[ 142.291767] [] show_trace+0x17/0x20
>> <4>[ 142.291896] [] dump_stack+0x1b/0x20
>> <4>[ 142.292026] [] copy_process+0x914/0x13d0
>> <4>[ 142.292190] [] do_fork+0x70/0x1b0
>> <4>[ 142.292323] [] sys_clone+0x38/0x40
>> <4>[ 142.292620] [] syscall_call+0x7/0xb
>> <4>[ 142.292747] ===
>> <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
>> <4>[ 142.293052] in_atomic():0, irqs_disabled():1
>
> Typical of something going wrong in entry.S.

You mean, interrupts are not really disabled when forking? :-)

So, I am afraid the new fpu_counter optimization is buggy: if a task forks with fpu_counter greater than 5 and is preempted right after prepare_to_copy in dup_task_struct, when the system switches back to this task, the task's FPU context will be restored and TS_USEDFPU set in the task flags, thereby voiding the effect of prepare_to_copy.

--
Gilles Chanteperdrix.
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
> Hi,
>
> while testing 2.6.20 with RTnet, I got this kernel BUG during the slave startup procedure:
>
> <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us)
> <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process()
> <4>[ 142.291585] [] show_trace_log_lvl+0x1f/0x40
> <4>[ 142.291767] [] show_trace+0x17/0x20
> <4>[ 142.291896] [] dump_stack+0x1b/0x20
> <4>[ 142.292026] [] copy_process+0x914/0x13d0
> <4>[ 142.292190] [] do_fork+0x70/0x1b0
> <4>[ 142.292323] [] sys_clone+0x38/0x40
> <4>[ 142.292620] [] syscall_call+0x7/0xb
> <4>[ 142.292747] ===
> <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
> <4>[ 142.293052] in_atomic():0, irqs_disabled():1
> <4>[ 142.293152] no locks held by init/1.
> <4>[ 142.293244] irq event stamp: 500992
> <4>[ 142.293335] hardirqs last enabled at (500991): [] __ipipe_handle_exception+0xdc/0x188
> <4>[ 142.293737] hardirqs last disabled at (500992): [] ret_from_exception+0xe/0x20
> <4>[ 142.293967] softirqs last enabled at (500868): [] __do_softirq+0xab/0xc0
> <4>[ 142.294189] softirqs last disabled at (500861): [] do_softirq+0x95/0xa0
> <4>[ 142.294562] [] show_trace_log_lvl+0x1f/0x40
> <4>[ 142.294743] [] show_trace+0x17/0x20
> <4>[ 142.294897] [] dump_stack+0x1b/0x20
> <4>[ 142.295050] [] __might_sleep+0xcd/0x100
> <4>[ 142.295220] [] kmem_cache_alloc+0xa1/0xc0
> <4>[ 142.295527] [] dup_fd+0x29/0x2d0
> <4>[ 142.295689] [] copy_files+0x49/0x70
> <4>[ 142.295851] [] copy_process+0x6af/0x13d0
> <4>[ 142.296019] [] do_fork+0x70/0x1b0
> <4>[ 142.296178] [] sys_clone+0x38/0x40
> <4>[ 142.296326] [] syscall_call+0x7/0xb
> <4>[ 142.296641] ===
>
> I'm seeing this with Xenomai trunk + the xnpod_suspend_thread patch. I attached my .config. The interesting thing is that this doesn't show up with v2.3.x head (kernel & config identical). Switching back to 2.6.19 doesn't change the picture.

Could you disable the tracer and remove the xnpod_suspend_thread() patch, then downgrade to 1.6-04 to remove the COW support? TIA,

> Anyone any idea?

> No-COW is now both in trunk and 2.3.x, right?

Only with the I-pipe patches from the 1.7 series on x86.

> Jan

--
Philippe.
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Jan Kiszka wrote:
> Hi,
>
> while testing 2.6.20 with RTnet, I got this kernel BUG during the slave startup procedure:
>
> <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us)
> <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process()
> <4>[ 142.291585] [] show_trace_log_lvl+0x1f/0x40
> <4>[ 142.291767] [] show_trace+0x17/0x20
> <4>[ 142.291896] [] dump_stack+0x1b/0x20
> <4>[ 142.292026] [] copy_process+0x914/0x13d0
> <4>[ 142.292190] [] do_fork+0x70/0x1b0
> <4>[ 142.292323] [] sys_clone+0x38/0x40
> <4>[ 142.292620] [] syscall_call+0x7/0xb
> <4>[ 142.292747] ===
> <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
> <4>[ 142.293052] in_atomic():0, irqs_disabled():1
> <4>[ 142.293152] no locks held by init/1.
> <4>[ 142.293244] irq event stamp: 500992
> <4>[ 142.293335] hardirqs last enabled at (500991): [] __ipipe_handle_exception+0xdc/0x188
> <4>[ 142.293737] hardirqs last disabled at (500992): [] ret_from_exception+0xe/0x20
> <4>[ 142.293967] softirqs last enabled at (500868): [] __do_softirq+0xab/0xc0
> <4>[ 142.294189] softirqs last disabled at (500861): [] do_softirq+0x95/0xa0
> <4>[ 142.294562] [] show_trace_log_lvl+0x1f/0x40
> <4>[ 142.294743] [] show_trace+0x17/0x20
> <4>[ 142.294897] [] dump_stack+0x1b/0x20
> <4>[ 142.295050] [] __might_sleep+0xcd/0x100
> <4>[ 142.295220] [] kmem_cache_alloc+0xa1/0xc0
> <4>[ 142.295527] [] dup_fd+0x29/0x2d0
> <4>[ 142.295689] [] copy_files+0x49/0x70
> <4>[ 142.295851] [] copy_process+0x6af/0x13d0
> <4>[ 142.296019] [] do_fork+0x70/0x1b0
> <4>[ 142.296178] [] sys_clone+0x38/0x40
> <4>[ 142.296326] [] syscall_call+0x7/0xb
> <4>[ 142.296641] ===
>
> I'm seeing this with Xenomai trunk + the xnpod_suspend_thread patch. I attached my .config. The interesting thing is that this doesn't show up with v2.3.x head (kernel & config identical). Switching back to 2.6.19 doesn't change the picture.
>
> Anyone any idea?

No-COW is now both in trunk and 2.3.x, right?

alloc_page_vma in copy_one_pte uses the GFP_HIGHUSER flag, whereas the bug suggests we are in an atomic section, so maybe we should use GFP_ATOMIC | __GFP_HIGHMEM.

--
Gilles Chanteperdrix.
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Jan Kiszka wrote:
> Hi,
>
> while testing 2.6.20 with RTnet, I got this kernel BUG during the slave startup procedure:
>
> <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us)
> <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process()
> <4>[ 142.291585] [] show_trace_log_lvl+0x1f/0x40
> <4>[ 142.291767] [] show_trace+0x17/0x20
> <4>[ 142.291896] [] dump_stack+0x1b/0x20
> <4>[ 142.292026] [] copy_process+0x914/0x13d0
> <4>[ 142.292190] [] do_fork+0x70/0x1b0
> <4>[ 142.292323] [] sys_clone+0x38/0x40
> <4>[ 142.292620] [] syscall_call+0x7/0xb
> <4>[ 142.292747] ===
> <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
> <4>[ 142.293052] in_atomic():0, irqs_disabled():1
> <4>[ 142.293152] no locks held by init/1.
> <4>[ 142.293244] irq event stamp: 500992
> <4>[ 142.293335] hardirqs last enabled at (500991): [] __ipipe_handle_exception+0xdc/0x188
> <4>[ 142.293737] hardirqs last disabled at (500992): [] ret_from_exception+0xe/0x20
> <4>[ 142.293967] softirqs last enabled at (500868): [] __do_softirq+0xab/0xc0
> <4>[ 142.294189] softirqs last disabled at (500861): [] do_softirq+0x95/0xa0
> <4>[ 142.294562] [] show_trace_log_lvl+0x1f/0x40
> <4>[ 142.294743] [] show_trace+0x17/0x20
> <4>[ 142.294897] [] dump_stack+0x1b/0x20
> <4>[ 142.295050] [] __might_sleep+0xcd/0x100
> <4>[ 142.295220] [] kmem_cache_alloc+0xa1/0xc0
> <4>[ 142.295527] [] dup_fd+0x29/0x2d0
> <4>[ 142.295689] [] copy_files+0x49/0x70
> <4>[ 142.295851] [] copy_process+0x6af/0x13d0
> <4>[ 142.296019] [] do_fork+0x70/0x1b0
> <4>[ 142.296178] [] sys_clone+0x38/0x40
> <4>[ 142.296326] [] syscall_call+0x7/0xb
> <4>[ 142.296641] ===
>
> I'm seeing this with Xenomai trunk + the xnpod_suspend_thread patch. I attached my .config. The interesting thing is that this doesn't show up with v2.3.x head (kernel & config identical). Switching back to 2.6.19 doesn't change the picture.
>
> Anyone any idea?

No-COW is now both in trunk and 2.3.x, right? In order to see if it is the effect of the no-cow patch, comment out the I-pipe portion in copy_one_pte, in mm/memory.c. This code should have an effect only if you are forking in a real-time application, though.

--
Gilles Chanteperdrix.