Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Philippe Gerum wrote:
> On Mon, 2007-02-12 at 15:39 +0100, Gilles Chanteperdrix wrote:
>> Philippe Gerum wrote:
>>> On Mon, 2007-02-12 at 14:49 +0100, Jan Kiszka wrote:
>>>> Philippe Gerum wrote:
>>>>> On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> Jan Kiszka wrote:
>>>>>>>> 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment. I included a full set of 16k points, but the thrilling things are around -73 to -25: some Linux process with IRQs on gets preempted by an RT IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. I don't know where it comes from, nor why there is no report about THAT issue in the kernel logs.
>>>>>>>
>>>>>>> The cause of this page fault will get tracked down later today, but the way it is handled already raises some doubts for me. To make discussion easier, here is the relevant excerpt from the trace:
>>>>>>
>>>>>> Maybe this fault is due to the no-cow patch? Before the no-cow patch, vmalloc'ed areas were added to all processes' page directories; now they are added only to the page directories of processes with the VM_PINNED flag. So, if ipipe_test_root tries to access some module memory area over the context of a non-realtime thread, a fault will occur.
>>>>>
>>>>> Yes, it's a minor fault occurring due to on-demand memory mapping, which is why we don't get any alarming message in the kernel log.
>>>>
>>>> Looks like it's something that should never happen, for sure.
>>>
>>> Now that vmalloc & ioremap memory may have their PTEs set on demand anew due to the no-cow patch, minor faults in kernel space are possible again, but this should only happen on behalf of the Linux domain; it is not expected to happen in primary mode.
>>
>> Does not a primary mode IRQ handler borrow the MMU context of the tasks it preempts?
>
> Yes, this is where the problem stands if we happen to preempt a regular task and tread over code which might trigger minor faults. The best way to check this would be to somehow enable VM_PINNED for all tasks. Back to square #1.

Ok. I'll try to change this and send a patch ASAP.

--
Gilles Chanteperdrix

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
On Mon, 2007-02-12 at 15:39 +0100, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>> On Mon, 2007-02-12 at 14:49 +0100, Jan Kiszka wrote:
>>> Philippe Gerum wrote:
>>>> On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment.
>>>>>>>
>>>>>>> I included a full set of 16k points, but the thrilling things are around -73 to -25: some Linux process with IRQs on gets preempted by an RT IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. I don't know where it comes from, nor why there is no report about THAT issue in the kernel logs.
>>>>>>
>>>>>> The cause of this page fault will get tracked down later today, but the way it is handled already raises some doubts for me. To make discussion easier, here is the relevant excerpt from the trace:
>>>>>
>>>>> Maybe this fault is due to the no-cow patch? Before the no-cow patch, vmalloc'ed areas were added to all processes' page directories; now they are added only to the page directories of processes with the VM_PINNED flag. So, if ipipe_test_root tries to access some module memory area over the context of a non-realtime thread, a fault will occur.
>>>>
>>>> Yes, it's a minor fault occurring due to on-demand memory mapping, which is why we don't get any alarming message in the kernel log.
>>>
>>> Looks like it's something that should never happen, for sure.
>>
>> Now that vmalloc & ioremap memory may have their PTEs set on demand anew due to the no-cow patch, minor faults in kernel space are possible again, but this should only happen on behalf of the Linux domain; it is not expected to happen in primary mode.
>
> Does not a primary mode IRQ handler borrow the MMU context of the tasks it preempts?

Yes, this is where the problem stands if we happen to preempt a regular task and tread over code which might trigger minor faults. The best way to check this would be to somehow enable VM_PINNED for all tasks. Back to square #1.

--
Philippe.
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Philippe Gerum wrote:
> On Mon, 2007-02-12 at 14:49 +0100, Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> Jan Kiszka wrote:
>>>>>> 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment.
>>>>>>
>>>>>> I included a full set of 16k points, but the thrilling things are around -73 to -25: some Linux process with IRQs on gets preempted by an RT IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. I don't know where it comes from, nor why there is no report about THAT issue in the kernel logs.
>>>>>
>>>>> The cause of this page fault will get tracked down later today, but the way it is handled already raises some doubts for me. To make discussion easier, here is the relevant excerpt from the trace:
>>>>
>>>> Maybe this fault is due to the no-cow patch? Before the no-cow patch, vmalloc'ed areas were added to all processes' page directories; now they are added only to the page directories of processes with the VM_PINNED flag. So, if ipipe_test_root tries to access some module memory area over the context of a non-realtime thread, a fault will occur.
>>>
>>> Yes, it's a minor fault occurring due to on-demand memory mapping, which is why we don't get any alarming message in the kernel log.
>>
>> Looks like it's something that should never happen, for sure.
>
> Now that vmalloc & ioremap memory may have their PTEs set on demand anew due to the no-cow patch, minor faults in kernel space are possible again, but this should only happen on behalf of the Linux domain; it is not expected to happen in primary mode.

Does not a primary mode IRQ handler borrow the MMU context of the tasks it preempts?

--
Gilles Chanteperdrix
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
On Mon, 2007-02-12 at 14:49 +0100, Jan Kiszka wrote:
> Philippe Gerum wrote:
>> On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>> Jan Kiszka wrote:
>>>>> 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment.
>>>>>
>>>>> I included a full set of 16k points, but the thrilling things are around -73 to -25: some Linux process with IRQs on gets preempted by an RT IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. I don't know where it comes from, nor why there is no report about THAT issue in the kernel logs.
>>>>
>>>> The cause of this page fault will get tracked down later today, but the way it is handled already raises some doubts for me. To make discussion easier, here is the relevant excerpt from the trace:
>>>
>>> Maybe this fault is due to the no-cow patch? Before the no-cow patch, vmalloc'ed areas were added to all processes' page directories; now they are added only to the page directories of processes with the VM_PINNED flag. So, if ipipe_test_root tries to access some module memory area over the context of a non-realtime thread, a fault will occur.
>>
>> Yes, it's a minor fault occurring due to on-demand memory mapping, which is why we don't get any alarming message in the kernel log.
>
> Looks like it's something that should never happen, for sure.

Now that vmalloc & ioremap memory may have their PTEs set on demand anew due to the no-cow patch, minor faults in kernel space are possible again, but this should only happen on behalf of the Linux domain; it is not expected to happen in primary mode.

> But are we fine with screwing up the Linux IRQ state nevertheless? In other words, are we seeing one or two ipipe issues here?

The I-pipe would only restore the virtual flag as seen on entry from an exception on behalf of the Linux domain, not in primary mode.

--
Philippe.
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Philippe Gerum wrote:
> On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Jan Kiszka wrote:
>>>> 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment.
>>>>
>>>> I included a full set of 16k points, but the thrilling things are around -73 to -25: some Linux process with IRQs on gets preempted by an RT IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. I don't know where it comes from, nor why there is no report about THAT issue in the kernel logs.
>>>
>>> The cause of this page fault will get tracked down later today, but the way it is handled already raises some doubts for me. To make discussion easier, here is the relevant excerpt from the trace:
>>
>> Maybe this fault is due to the no-cow patch? Before the no-cow patch, vmalloc'ed areas were added to all processes' page directories; now they are added only to the page directories of processes with the VM_PINNED flag. So, if ipipe_test_root tries to access some module memory area over the context of a non-realtime thread, a fault will occur.
>
> Yes, it's a minor fault occurring due to on-demand memory mapping, which is why we don't get any alarming message in the kernel log.

Looks like it's something that should never happen, for sure. But are we fine with screwing up the Linux IRQ state nevertheless? In other words, are we seeing one or two ipipe issues here?
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment.
>>>
>>> I included a full set of 16k points, but the thrilling things are around -73 to -25: some Linux process with IRQs on gets preempted by an RT IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. I don't know where it comes from, nor why there is no report about THAT issue in the kernel logs.
>>
>> The cause of this page fault will get tracked down later today, but the way it is handled already raises some doubts for me. To make discussion easier, here is the relevant excerpt from the trace:
>
> Maybe this fault is due to the no-cow patch? Before the no-cow patch, vmalloc'ed areas were added to all processes' page directories; now they are added only to the page directories of processes with the VM_PINNED flag. So, if ipipe_test_root tries to access some module memory area over the context of a non-realtime thread, a fault will occur.

Yes, it's a minor fault occurring due to on-demand memory mapping, which is why we don't get any alarming message in the kernel log.

--
Philippe.
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Jan Kiszka wrote:
> Jan Kiszka wrote:
>> 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment.
>>
>> I included a full set of 16k points, but the thrilling things are around -73 to -25: some Linux process with IRQs on gets preempted by an RT IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. I don't know where it comes from, nor why there is no report about THAT issue in the kernel logs.
>
> The cause of this page fault will get tracked down later today, but the way it is handled already raises some doubts for me. To make discussion easier, here is the relevant excerpt from the trace:

Maybe this fault is due to the no-cow patch? Before the no-cow patch, vmalloc'ed areas were added to all processes' page directories; now they are added only to the page directories of processes with the VM_PINNED flag. So, if ipipe_test_root tries to access some module memory area over the context of a non-realtime thread, a fault will occur.

--
Gilles Chanteperdrix
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Jan Kiszka wrote:
> 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment.
>
> I included a full set of 16k points, but the thrilling things are around -73 to -25: some Linux process with IRQs on gets preempted by an RT IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. I don't know where it comes from, nor why there is no report about THAT issue in the kernel logs.

The cause of this page fault will get tracked down later today, but the way it is handled already raises some doubts for me. To make discussion easier, here is the relevant excerpt from the trace:

> :+func      -73+  1.426  link_path_walk+0x14  (__link_path_walk+0xca0)
> :| +func    -72   0.605  __ipipe_handle_irq+0x14  (common_interrupt+0x18)
> :| +func    -71   0.472  __ipipe_ack_irq+0x8  (__ipipe_handle_irq+0xaf)
> :| +func    -70   0.224  __ipipe_ack_level_irq+0x12  (__ipipe_ack_irq+0x19)
> :| +func    -70+  4.424  mask_and_ack_8259A+0x14  (__ipipe_ack_level_irq+0x22)
> :| +func    -66   0.475  __ipipe_dispatch_wired+0x14  (__ipipe_handle_irq+0x62)
> :| # func   -65   0.974  xnintr_irq_handler+0xe  (__ipipe_dispatch_wired+0x95)
> :| # func   -64+  1.892  rtl8139_interrupt+0x11 [rt_8139too]  (xnintr_irq_handler+0x3b)
> :| # func   -62   0.382  __ipipe_handle_exception+0xe  (error_code+0x3e)
> :| # func   -62   0.222  __ipipe_test_root+0x8  (__ipipe_handle_exception+0x1a)
> :| # func   -62   0.377  __ipipe_stall_root+0x8  (__ipipe_handle_exception+0x15b)
> :| #*func   -62   0.173  trace_hardirqs_off+0xc  (__ipipe_handle_exception+0x165)
> :| #*func   -61   0.211  __ipipe_test_root+0x8  (trace_hardirqs_off+0x2d)
> :| #*func   -61+  1.965  do_page_fault+0xe  (__ipipe_handle_exception+0x6d)
> :  #*func   -59   0.180  trace_hardirqs_on+0x11  (__ipipe_handle_exception+0xd9)
> :  #*func   -59   0.163  __ipipe_test_root+0x8  (trace_hardirqs_on+0x5e)
> :  #*func   -59   0.396  mark_held_locks+0xe  (trace_hardirqs_on+0x8b)
> :  #*func   -58   0.212  mark_held_locks+0xe  (trace_hardirqs_on+0xc9)
> :  #*func   -58   0.461  __ipipe_restore_root+0x8  (__ipipe_handle_exception+0xe1)
> :  #*func   -58   0.253  __ipipe_unstall_root+0x8  (__ipipe_restore_root+0x18)
> :  # func   -57   0.224  __ipipe_stall_root+0x8  (ret_from_exception+0x5)
> :  #*func   -57   0.366  trace_hardirqs_off+0xc  (ret_from_exception+0xe)
> :  #*func   -57   0.327  __ipipe_test_root+0x8  (trace_hardirqs_off+0x2d)
> :| #*func   -57+  2.089  __ipipe_unstall_iret_root+0x8  (restore_nocheck_notrace+0x0)
> :| #*func   -54+  1.444  alloc_rtskb+0xa [rtnet]  (rtl8139_interrupt+0x182 [rt_8139too])
> :| #*func   -53+  1.172  rt_eth_type_trans+0xe [rtnet]  (rtl8139_interrupt+0x1d6 [rt_8139too])

The fault gets forwarded to Linux because ipipe_trap_notify doesn't choke: we are running neither over a task with PF_EVNOTIFY set nor over a kernel thread yet (IPIPE_NOSTACK_FLAG). Still, we are already in the primary domain, so I wonder if this forwarding is intentional. At least it seems to break some things later on...

Jan
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>> On Mon, 2007-02-12 at 00:07 +0100, Gilles Chanteperdrix wrote:
>>> Philippe Gerum wrote:
>>>> On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
>>>>> Hi,
>>>>>
>>>>> while testing 2.6.20 with RTnet, I got this kernel BUG during the slave startup procedure:
>>>>>
>>>>> <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us)
>>>>> <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process()
>>>>> <4>[ 142.291585] [] show_trace_log_lvl+0x1f/0x40
>>>>> <4>[ 142.291767] [] show_trace+0x17/0x20
>>>>> <4>[ 142.291896] [] dump_stack+0x1b/0x20
>>>>> <4>[ 142.292026] [] copy_process+0x914/0x13d0
>>>>> <4>[ 142.292190] [] do_fork+0x70/0x1b0
>>>>> <4>[ 142.292323] [] sys_clone+0x38/0x40
>>>>> <4>[ 142.292620] [] syscall_call+0x7/0xb
>>>>> <4>[ 142.292747] ===
>>>>> <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
>>>>> <4>[ 142.293052] in_atomic():0, irqs_disabled():1
>>>>
>>>> Typical of something going wrong in entry.S.
>>>
>>> You mean, interrupts are not really disabled when forking? :-)
>>
>> Eh, mmmh, no. Hopefully.
>>
>>> So, I am afraid the new fpu_counter optimization is buggy: if a task forks with fpu_counter greater than 5 and is preempted right after prepare_to_copy in dup_task_struct, when the system switches back to this task, the task's FPU context will be restored and TS_USEDFPU set in the task flags, thereby voiding the effect of prepare_to_copy.
>>
>> You mean that the parent FPU context would leak into the child's one?
>
> Yes, something like that. The result is random segfaults; I do not remember exactly why.
>
>> Well, maybe the LKML people would like to know about this. As a sidenote, I don't see anything bad with your latest counter-measure disabling this optimization in Xenomai's context switch code, even in the bogus case above. Right?
>
> Right, if there are random segfaults, they will not be Xenomai's fault.

I'm currently sorting the symptoms again, or rather I'm looking where they went. 2.6.20 just decided to work normally again; 2.6.19 needs a re-check. It appears now that the tracer played an important role, but I'm not 100% sure yet. I'll keep you posted.

Jan
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Philippe Gerum wrote:
> On Mon, 2007-02-12 at 00:07 +0100, Gilles Chanteperdrix wrote:
>> Philippe Gerum wrote:
>>> On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
>>>> Hi,
>>>>
>>>> while testing 2.6.20 with RTnet, I got this kernel BUG during the slave startup procedure:
>>>>
>>>> <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us)
>>>> <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process()
>>>> <4>[ 142.291585] [] show_trace_log_lvl+0x1f/0x40
>>>> <4>[ 142.291767] [] show_trace+0x17/0x20
>>>> <4>[ 142.291896] [] dump_stack+0x1b/0x20
>>>> <4>[ 142.292026] [] copy_process+0x914/0x13d0
>>>> <4>[ 142.292190] [] do_fork+0x70/0x1b0
>>>> <4>[ 142.292323] [] sys_clone+0x38/0x40
>>>> <4>[ 142.292620] [] syscall_call+0x7/0xb
>>>> <4>[ 142.292747] ===
>>>> <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
>>>> <4>[ 142.293052] in_atomic():0, irqs_disabled():1
>>>
>>> Typical of something going wrong in entry.S.
>>
>> You mean, interrupts are not really disabled when forking? :-)
>
> Eh, mmmh, no. Hopefully.
>
>> So, I am afraid the new fpu_counter optimization is buggy: if a task forks with fpu_counter greater than 5 and is preempted right after prepare_to_copy in dup_task_struct, when the system switches back to this task, the task's FPU context will be restored and TS_USEDFPU set in the task flags, thereby voiding the effect of prepare_to_copy.
>
> You mean that the parent FPU context would leak into the child's one?

Yes, something like that. The result is random segfaults; I do not remember exactly why.

> Well, maybe the LKML people would like to know about this. As a sidenote, I don't see anything bad with your latest counter-measure disabling this optimization in Xenomai's context switch code, even in the bogus case above. Right?

Right, if there are random segfaults, they will not be Xenomai's fault.

--
Gilles Chanteperdrix.
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
On Mon, 2007-02-12 at 00:07 +0100, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>> On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
>>> Hi,
>>>
>>> while testing 2.6.20 with RTnet, I got this kernel BUG during the slave startup procedure:
>>>
>>> <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us)
>>> <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process()
>>> <4>[ 142.291585] [] show_trace_log_lvl+0x1f/0x40
>>> <4>[ 142.291767] [] show_trace+0x17/0x20
>>> <4>[ 142.291896] [] dump_stack+0x1b/0x20
>>> <4>[ 142.292026] [] copy_process+0x914/0x13d0
>>> <4>[ 142.292190] [] do_fork+0x70/0x1b0
>>> <4>[ 142.292323] [] sys_clone+0x38/0x40
>>> <4>[ 142.292620] [] syscall_call+0x7/0xb
>>> <4>[ 142.292747] ===
>>> <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
>>> <4>[ 142.293052] in_atomic():0, irqs_disabled():1
>>
>> Typical of something going wrong in entry.S.
>
> You mean, interrupts are not really disabled when forking? :-)

Eh, mmmh, no. Hopefully.

> So, I am afraid the new fpu_counter optimization is buggy: if a task forks with fpu_counter greater than 5 and is preempted right after prepare_to_copy in dup_task_struct, when the system switches back to this task, the task's FPU context will be restored and TS_USEDFPU set in the task flags, thereby voiding the effect of prepare_to_copy.

You mean that the parent FPU context would leak into the child's one? Well, maybe the LKML people would like to know about this. As a sidenote, I don't see anything bad with your latest counter-measure disabling this optimization in Xenomai's context switch code, even in the bogus case above. Right?

--
Philippe.
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Philippe Gerum wrote:
> On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
>> Hi,
>>
>> while testing 2.6.20 with RTnet, I got this kernel BUG during the slave startup procedure:
>>
>> <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us)
>> <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process()
>> <4>[ 142.291585] [] show_trace_log_lvl+0x1f/0x40
>> <4>[ 142.291767] [] show_trace+0x17/0x20
>> <4>[ 142.291896] [] dump_stack+0x1b/0x20
>> <4>[ 142.292026] [] copy_process+0x914/0x13d0
>> <4>[ 142.292190] [] do_fork+0x70/0x1b0
>> <4>[ 142.292323] [] sys_clone+0x38/0x40
>> <4>[ 142.292620] [] syscall_call+0x7/0xb
>> <4>[ 142.292747] ===
>> <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
>> <4>[ 142.293052] in_atomic():0, irqs_disabled():1
>
> Typical of something going wrong in entry.S.

You mean, interrupts are not really disabled when forking? :-)

So, I am afraid the new fpu_counter optimization is buggy: if a task forks with fpu_counter greater than 5 and is preempted right after prepare_to_copy in dup_task_struct, when the system switches back to this task, the task's FPU context will be restored and TS_USEDFPU set in the task flags, thereby voiding the effect of prepare_to_copy.

--
Gilles Chanteperdrix.
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
> Hi,
>
> while testing 2.6.20 with RTnet, I got this kernel BUG during the slave startup procedure:
>
> <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us)
> <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process()
> <4>[ 142.291585] [] show_trace_log_lvl+0x1f/0x40
> <4>[ 142.291767] [] show_trace+0x17/0x20
> <4>[ 142.291896] [] dump_stack+0x1b/0x20
> <4>[ 142.292026] [] copy_process+0x914/0x13d0
> <4>[ 142.292190] [] do_fork+0x70/0x1b0
> <4>[ 142.292323] [] sys_clone+0x38/0x40
> <4>[ 142.292620] [] syscall_call+0x7/0xb
> <4>[ 142.292747] ===
> <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
> <4>[ 142.293052] in_atomic():0, irqs_disabled():1
> <4>[ 142.293152] no locks held by init/1.
> <4>[ 142.293244] irq event stamp: 500992
> <4>[ 142.293335] hardirqs last enabled at (500991): [] __ipipe_handle_exception+0xdc/0x188
> <4>[ 142.293737] hardirqs last disabled at (500992): [] ret_from_exception+0xe/0x20
> <4>[ 142.293967] softirqs last enabled at (500868): [] __do_softirq+0xab/0xc0
> <4>[ 142.294189] softirqs last disabled at (500861): [] do_softirq+0x95/0xa0
> <4>[ 142.294562] [] show_trace_log_lvl+0x1f/0x40
> <4>[ 142.294743] [] show_trace+0x17/0x20
> <4>[ 142.294897] [] dump_stack+0x1b/0x20
> <4>[ 142.295050] [] __might_sleep+0xcd/0x100
> <4>[ 142.295220] [] kmem_cache_alloc+0xa1/0xc0
> <4>[ 142.295527] [] dup_fd+0x29/0x2d0
> <4>[ 142.295689] [] copy_files+0x49/0x70
> <4>[ 142.295851] [] copy_process+0x6af/0x13d0
> <4>[ 142.296019] [] do_fork+0x70/0x1b0
> <4>[ 142.296178] [] sys_clone+0x38/0x40
> <4>[ 142.296326] [] syscall_call+0x7/0xb
> <4>[ 142.296641] ===
>
> I'm seeing this with Xenomai trunk + the xnpod_suspend_thread patch. I attached my .config. The interesting thing is that this doesn't show up with v2.3.x head (kernel & config identical). Switching back to 2.6.19 doesn't change the picture.

Could you disable the tracer and remove the xnpod_suspend_thread() patch, then downgrade to 1.6-04 to remove the COW support? TIA,

> Anyone any idea?

> No-COW is now both in trunk and 2.3.x, right?

Only with the I-pipe patches from the 1.7 series on x86.

> Jan

--
Philippe.
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Jan Kiszka wrote:
> Hi,
>
> while testing 2.6.20 with RTnet, I got this kernel BUG during the slave startup procedure:
>
> <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us)
> <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process()
> <4>[ 142.291585] [] show_trace_log_lvl+0x1f/0x40
> <4>[ 142.291767] [] show_trace+0x17/0x20
> <4>[ 142.291896] [] dump_stack+0x1b/0x20
> <4>[ 142.292026] [] copy_process+0x914/0x13d0
> <4>[ 142.292190] [] do_fork+0x70/0x1b0
> <4>[ 142.292323] [] sys_clone+0x38/0x40
> <4>[ 142.292620] [] syscall_call+0x7/0xb
> <4>[ 142.292747] ===
> <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
> <4>[ 142.293052] in_atomic():0, irqs_disabled():1
> <4>[ 142.293152] no locks held by init/1.
> <4>[ 142.293244] irq event stamp: 500992
> <4>[ 142.293335] hardirqs last enabled at (500991): [] __ipipe_handle_exception+0xdc/0x188
> <4>[ 142.293737] hardirqs last disabled at (500992): [] ret_from_exception+0xe/0x20
> <4>[ 142.293967] softirqs last enabled at (500868): [] __do_softirq+0xab/0xc0
> <4>[ 142.294189] softirqs last disabled at (500861): [] do_softirq+0x95/0xa0
> <4>[ 142.294562] [] show_trace_log_lvl+0x1f/0x40
> <4>[ 142.294743] [] show_trace+0x17/0x20
> <4>[ 142.294897] [] dump_stack+0x1b/0x20
> <4>[ 142.295050] [] __might_sleep+0xcd/0x100
> <4>[ 142.295220] [] kmem_cache_alloc+0xa1/0xc0
> <4>[ 142.295527] [] dup_fd+0x29/0x2d0
> <4>[ 142.295689] [] copy_files+0x49/0x70
> <4>[ 142.295851] [] copy_process+0x6af/0x13d0
> <4>[ 142.296019] [] do_fork+0x70/0x1b0
> <4>[ 142.296178] [] sys_clone+0x38/0x40
> <4>[ 142.296326] [] syscall_call+0x7/0xb
> <4>[ 142.296641] ===
>
> I'm seeing this with Xenomai trunk + the xnpod_suspend_thread patch. I attached my .config. The interesting thing is that this doesn't show up with v2.3.x head (kernel & config identical). Switching back to 2.6.19 doesn't change the picture.
>
> Anyone any idea?

No-COW is now both in trunk and 2.3.x, right?

alloc_page_vma in copy_one_pte uses the GFP_HIGHUSER flag, whereas the bug suggests we are in an atomic section, so maybe we should use GFP_ATOMIC | __GFP_HIGHMEM.

--
Gilles Chanteperdrix.
Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
Jan Kiszka wrote:
> Hi,
>
> while testing 2.6.20 with RTnet, I got this kernel BUG during the slave startup procedure:
>
> <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us)
> <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process()
> <4>[ 142.291585] [] show_trace_log_lvl+0x1f/0x40
> <4>[ 142.291767] [] show_trace+0x17/0x20
> <4>[ 142.291896] [] dump_stack+0x1b/0x20
> <4>[ 142.292026] [] copy_process+0x914/0x13d0
> <4>[ 142.292190] [] do_fork+0x70/0x1b0
> <4>[ 142.292323] [] sys_clone+0x38/0x40
> <4>[ 142.292620] [] syscall_call+0x7/0xb
> <4>[ 142.292747] ===
> <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
> <4>[ 142.293052] in_atomic():0, irqs_disabled():1
> <4>[ 142.293152] no locks held by init/1.
> <4>[ 142.293244] irq event stamp: 500992
> <4>[ 142.293335] hardirqs last enabled at (500991): [] __ipipe_handle_exception+0xdc/0x188
> <4>[ 142.293737] hardirqs last disabled at (500992): [] ret_from_exception+0xe/0x20
> <4>[ 142.293967] softirqs last enabled at (500868): [] __do_softirq+0xab/0xc0
> <4>[ 142.294189] softirqs last disabled at (500861): [] do_softirq+0x95/0xa0
> <4>[ 142.294562] [] show_trace_log_lvl+0x1f/0x40
> <4>[ 142.294743] [] show_trace+0x17/0x20
> <4>[ 142.294897] [] dump_stack+0x1b/0x20
> <4>[ 142.295050] [] __might_sleep+0xcd/0x100
> <4>[ 142.295220] [] kmem_cache_alloc+0xa1/0xc0
> <4>[ 142.295527] [] dup_fd+0x29/0x2d0
> <4>[ 142.295689] [] copy_files+0x49/0x70
> <4>[ 142.295851] [] copy_process+0x6af/0x13d0
> <4>[ 142.296019] [] do_fork+0x70/0x1b0
> <4>[ 142.296178] [] sys_clone+0x38/0x40
> <4>[ 142.296326] [] syscall_call+0x7/0xb
> <4>[ 142.296641] ===
>
> I'm seeing this with Xenomai trunk + the xnpod_suspend_thread patch. I attached my .config. The interesting thing is that this doesn't show up with v2.3.x head (kernel & config identical). Switching back to 2.6.19 doesn't change the picture.
>
> Anyone any idea?

No-COW is now both in trunk and 2.3.x, right? In order to see if it is the effect of the no-cow patch, comment out the I-pipe portion in copy_one_pte, in mm/memory.c. This code should have an effect only if you are forking in a real-time application, though.

--
Gilles Chanteperdrix.