Re: [Xenomai-core] latency -t 1 crashes on SMP.
Jan Kiszka wrote: > Gilles Chanteperdrix wrote: > > Jan Kiszka wrote: > > > o instrument the xenomai exception handler with an ipipe_trace_freeze() > > > (something which should be merged into SVN later) > > > > The I-pipe tracer as well as Linux NMI handling have the same issue > > which is the cause for the exceptions 14: they rely on a correct > > "current" to read the processor id. The first attached patch, to be > > applied after the ipipe patch, add an ipipe_processor_id_notrace > > function, which is called instead of raw_smp_processor_id() in the > > second, I-pipe tracer, attached patch. > > > > Ah, good point! But why not make the original __ipipe_*_cpuid functions > non-traced? I don't think we would loose significant information. The attached patches do that. -- Gilles Chanteperdrix. --- linux/arch/i386/kernel/ipipe-root.c~2006-04-13 17:11:19.0 +0200 +++ linux/arch/i386/kernel/ipipe-root.c 2006-04-13 17:26:59.0 +0200 @@ -85,13 +85,14 @@ static void __ipipe_null_handler(unsigne #ifdef CONFIG_SMP -static int __ipipe_boot_cpuid(void) +static __attribute__((no_instrument_function)) int __ipipe_boot_cpuid(void) { return 0; } u8 __ipipe_apicid_2_cpu[IPIPE_NR_CPUS]; +#ifndef CONFIG_IPIPE_TRACE static int __ipipe_hard_cpuid(void) { unsigned long flags; @@ -102,6 +103,18 @@ static int __ipipe_hard_cpuid(void) local_irq_restore_hw(flags); return cpu; } +#else /* CONFIG_IPIPE_TRACE */ +static notrace int __ipipe_hard_cpuid(void) +{ + unsigned long flags; + int cpu; + + local_irq_save_hw_notrace(flags); + cpu = __ipipe_apicid_2_cpu[GET_APIC_ID(apic_read(APIC_ID))]; + local_irq_restore_hw_notrace(flags); + return cpu; +} +#endif /* CONFIG_IPIPE_TRACE */ int (*__ipipe_logical_cpuid)(void) = &__ipipe_boot_cpuid; --- linux/include/asm-i386/ipipe.h~ 2006-04-13 17:11:19.0 +0200 +++ linux/include/asm-i386/ipipe.h 2006-04-13 17:14:24.0 +0200 @@ -97,7 +97,7 @@ #define IPIPE_CRITICAL_VECTOR 0xf9/* Used by ipipe_critical_enter/exit() */ #define IPIPE_CRITICAL_IPI (IPIPE_CRITICAL_VECTOR - FIRST_EXTERNAL_VECTOR) -static inline int ipipe_processor_id(void) +static inline __attribute__((no_instrument_function)) int ipipe_processor_id(void) { extern int (*__ipipe_logical_cpuid)(void); return __ipipe_logical_cpuid(); --- common/kernel/ipipe/Kconfig +++ /home/rpm/gna/adeos/ipipe/v2.6/common/kernel/ipipe/Kconfig.trace 2006-01-09 03:41:35.0 +0100 @@ -13,3 +13,24 @@ config IPIPE_STATS Activate this option if you want runtime statistics to be collected while the I-pipe is operating. This option adds a small overhead, but is useful to detect unexpected latency spots. + +config IPIPE_TRACE + bool "Latency Tracing" + depends on IPIPE && PROC_FS + select FRAME_POINTER + default n + ---help--- + Activate this option if you want to use per-function tracing of + the kernel. The tracer will collect data via instrumentation + features like the one below or with the help of explicite calls + of ipipe_trace_xxx(). See include/linux/ipipe_trace.h for the + in-kernel tracing API. The collected data and runtime control + is available via /proc/ipipe/trace/*. + +config IPIPE_TRACE_IRQSOFF + bool "Trace IRQs-off times" + depends on IPIPE_TRACE + default y + ---help--- + Activate this option if I-pipe shall trace the longest path + with hard-IRQs switched off. --- 2.6.16/Makefile 2006-03-20 06:53:29.0 +0100 +++ 2.6.16-ipipe/Makefile 2006-03-21 14:52:40.0 +0100 @@ -511,11 +511,15 @@ CFLAGS+= $(call add-align,CONFIG_CC_AL CFLAGS += $(call add-align,CONFIG_CC_ALIGN_LOOPS,-loops) CFLAGS += $(call add-align,CONFIG_CC_ALIGN_JUMPS,-jumps) +ifdef CONFIG_IPIPE_TRACE +CFLAGS += -pg -fno-omit-frame-pointer $(call cc-option,-fno-optimize-sibling-calls,) +else ifdef CONFIG_FRAME_POINTER CFLAGS += -fno-omit-frame-pointer $(call cc-option,-fno-optimize-sibling-calls,) else CFLAGS += -fomit-frame-pointer endif +endif ifdef CONFIG_DEBUG_INFO CFLAGS += -g --- 2.6.16/include/linux/ipipe_trace.h 1970-01-01 01:00:00.0 +0100 +++ 2.6.16-ipipe/include/linux/ipipe_trace.h2006-01-31 09:06:32.0 +0100 @@ -0,0 +1,37 @@ +/* -*- linux-c -*- + * include/linux/ipipe_trace.h + * + * Copyright (C) 2005 Luotao Fu. + * 2005 Jan Kiszka. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139, + * USA; either version 2 of the License, or (at your option) any later + * version. + * + * This
Re: [Xenomai-core] latency -t 1 crashes on SMP.
Gilles Chanteperdrix wrote: > Jan Kiszka wrote: > > o instrument the xenomai exception handler with an ipipe_trace_freeze() > > (something which should be merged into SVN later) > > The I-pipe tracer as well as Linux NMI handling have the same issue > which is the cause for the exceptions 14: they rely on a correct > "current" to read the processor id. The first attached patch, to be > applied after the ipipe patch, add an ipipe_processor_id_notrace > function, which is called instead of raw_smp_processor_id() in the > second, I-pipe tracer, attached patch. > Ah, good point! But why not make the original __ipipe_*_cpuid functions non-traced? I don't think we would loose significant information. Jan signature.asc Description: OpenPGP digital signature ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] latency -t 1 crashes on SMP.
Jan Kiszka wrote: > o instrument the xenomai exception handler with an ipipe_trace_freeze() > (something which should be merged into SVN later) The I-pipe tracer as well as Linux NMI handling have the same issue which is the cause for the exceptions 14: they rely on a correct "current" to read the processor id. The first attached patch, to be applied after the ipipe patch, add an ipipe_processor_id_notrace function, which is called instead of raw_smp_processor_id() in the second, I-pipe tracer, attached patch. -- Gilles Chanteperdrix. --- linux/arch/i386/kernel/ipipe-root.c~2006-04-13 15:21:05.0 +0200 +++ linux/arch/i386/kernel/ipipe-root.c 2006-04-13 14:55:29.0 +0200 @@ -107,6 +107,30 @@ int (*__ipipe_logical_cpuid)(void) = &__ EXPORT_SYMBOL(__ipipe_logical_cpuid); +#ifdef CONFIG_IPIPE_TRACE + +static notrace int __ipipe_boot_cpuid_notrace(void) +{ +return 0; +} + +static notrace int __ipipe_hard_cpuid_notrace(void) +{ + unsigned long flags; + int cpu; + + local_irq_save_hw_notrace(flags); + cpu = __ipipe_apicid_2_cpu[GET_APIC_ID(apic_read(APIC_ID))]; + local_irq_restore_hw_notrace(flags); + return cpu; +} + +int (*__ipipe_logical_cpuid_notrace)(void) = &__ipipe_boot_cpuid_notrace; + +EXPORT_SYMBOL(__ipipe_logical_cpuid_notrace); + +#endif + #endif /* CONFIG_SMP */ #endif /* CONFIG_X86_LOCAL_APIC */ @@ -223,6 +247,10 @@ void __init __ipipe_enable_pipeline(void * trap_init(), so it's safe to use it. */ __ipipe_logical_cpuid = &__ipipe_hard_cpuid; +#ifdef CONFIG_IPIPE_TRACE +__ipipe_logical_cpuid_notrace = &__ipipe_hard_cpuid_notrace; +#endif + #endif /* CONFIG_SMP */ --- linux/include/asm-i386/ipipe.h~ 2006-04-13 15:21:34.0 +0200 +++ linux/include/asm-i386/ipipe.h 2006-04-13 14:49:03.0 +0200 @@ -103,6 +103,15 @@ static inline int ipipe_processor_id(voi return __ipipe_logical_cpuid(); } +#ifdef CONFIG_IPIPE_TRACE +static notrace inline int ipipe_processor_id_notrace(void) +{ + extern int (*__ipipe_logical_cpuid_notrace)(void); + return __ipipe_logical_cpuid_notrace(); +} + +#endif + extern u8 __ipipe_apicid_2_cpu[]; #define ipipe_note_apicid(apicid,cpu) \ --- common/kernel/ipipe/Kconfig +++ /home/rpm/gna/adeos/ipipe/v2.6/common/kernel/ipipe/Kconfig.trace 2006-01-09 03:41:35.0 +0100 @@ -13,3 +13,24 @@ config IPIPE_STATS Activate this option if you want runtime statistics to be collected while the I-pipe is operating. This option adds a small overhead, but is useful to detect unexpected latency spots. + +config IPIPE_TRACE + bool "Latency Tracing" + depends on IPIPE && PROC_FS + select FRAME_POINTER + default n + ---help--- + Activate this option if you want to use per-function tracing of + the kernel. The tracer will collect data via instrumentation + features like the one below or with the help of explicite calls + of ipipe_trace_xxx(). See include/linux/ipipe_trace.h for the + in-kernel tracing API. The collected data and runtime control + is available via /proc/ipipe/trace/*. + +config IPIPE_TRACE_IRQSOFF + bool "Trace IRQs-off times" + depends on IPIPE_TRACE + default y + ---help--- + Activate this option if I-pipe shall trace the longest path + with hard-IRQs switched off. --- 2.6.16/Makefile 2006-03-20 06:53:29.0 +0100 +++ 2.6.16-ipipe/Makefile 2006-03-21 14:52:40.0 +0100 @@ -511,11 +511,15 @@ CFLAGS+= $(call add-align,CONFIG_CC_AL CFLAGS += $(call add-align,CONFIG_CC_ALIGN_LOOPS,-loops) CFLAGS += $(call add-align,CONFIG_CC_ALIGN_JUMPS,-jumps) +ifdef CONFIG_IPIPE_TRACE +CFLAGS += -pg -fno-omit-frame-pointer $(call cc-option,-fno-optimize-sibling-calls,) +else ifdef CONFIG_FRAME_POINTER CFLAGS += -fno-omit-frame-pointer $(call cc-option,-fno-optimize-sibling-calls,) else CFLAGS += -fomit-frame-pointer endif +endif ifdef CONFIG_DEBUG_INFO CFLAGS += -g --- 2.6.16/include/linux/ipipe_trace.h 1970-01-01 01:00:00.0 +0100 +++ 2.6.16-ipipe/include/linux/ipipe_trace.h2006-01-31 09:06:32.0 +0100 @@ -0,0 +1,37 @@ +/* -*- linux-c -*- + * include/linux/ipipe_trace.h + * + * Copyright (C) 2005 Luotao Fu. + * 2005 Jan Kiszka. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139, + * USA; either version 2 of the License, or (at your option) any later + * version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR
Re: [Xenomai-core] latency -t 1 crashes on SMP.
Philippe Gerum wrote: > Gilles Chanteperdrix wrote: > > I tried latency -t 1 on an SMP machines, and observed a very > > reproducible crash. Most of the time, the machine locks up > > completely. The rest of the time, I get : > > > > Xenomai: suspending kernel thread e7559044 ('timerbench') at 0xb01051f2 > > after ex > > ception #14 > > > > And the system remains runnable, but 0xb01051f2 is not a valid kernel > > module text address. > > Looks like a kernel-based task stack address on x86. Actually, this address really is a kernel text address, because I use the CONFIG_VMSPLIT_3G_OPT, which seems a way to configure a machine with 1GB RAM. The crash also happens with CONFIG_VMSPLIT_3G, but the text address is 0xc01051f2. I tried adding a call to show_stack() in xnpod_fault_handler, and it shows that the crash happens right after the call to rthal_nmi_arm in xnarch_program_timer_shot: Xenomai: suspending kernel thread eeed5044 ('timerbench') at 0xc01051f2 after ex ception #14 ee8c1768 c02d24c1 ee8c1798 f8ff868c eeed53b0 c01051f2 000e bfa49f69 58454e4f 000e ee8c17a4 f900aca1 ee8c17b0 ee8c17c0 f8ff57b3 ee8c17b0 000e ee8c183c 0082 ee8c17dc c0259df0 Call Trace: [] show_stack_log_lvl+0xaa/0xe0 [] show_stack+0x21/0x30 [] xnpod_fault_handler+0x8c/0x220 [xeno_nucleus] [] xnpod_trap_fault+0x61/0x70 [xeno_nucleus] [] xnarch_trap_fault+0x23/0x30 [xeno_nucleus] [] exception_event+0x70/0x80 [] __ipipe_dispatch_event+0x103/0x130 [] __ipipe_handle_exception+0x2a/0xf0 [] error_code+0x54/0x64 [] nmi_stack_correct+0x1d/0x22 [] xntimer_do_start_aperiodic+0x441/0x560 [xeno_nucleus] [] xntimer_start+0x130/0x350 [xeno_nucleus] [] xnpod_suspend_thread+0x7ad/0xd40 [xeno_nucleus] [] rtdm_task_sleep_until+0x179/0x3d0 [xeno_rtdm] [] timer_task_proc+0x1c6/0x450 [xeno_timerbench] [] xnarch_thread_redirect+0x5d/0x80 [xeno_nucleus] [<>] _stext+0x3feffd68/0x8 And now the most interesting: when compiling Xenomai in the kernel, (which moves Xenomai code out of vmalloc'ed memory) the bug disappears. -- Gilles Chanteperdrix. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] latency -t 1 crashes on SMP.
Gilles Chanteperdrix wrote: I tried latency -t 1 on an SMP machines, and observed a very reproducible crash. Most of the time, the machine locks up completely. The rest of the time, I get : Xenomai: suspending kernel thread e7559044 ('timerbench') at 0xb01051f2 after ex ception #14 And the system remains runnable, but 0xb01051f2 is not a valid kernel module text address. Looks like a kernel-based task stack address on x86. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] latency -t 1 crashes on SMP.
Gilles Chanteperdrix wrote: > I tried latency -t 1 on an SMP machines, and observed a very > reproducible crash. Most of the time, the machine locks up > completely. The rest of the time, I get : > > Xenomai: suspending kernel thread e7559044 ('timerbench') at 0xb01051f2 after > ex > ception #14 > > And the system remains runnable, but 0xb01051f2 is not a valid kernel > module text address. > > The lock up does not seem to be detected by any Xenomai or Linux > debug or watchdog. Only enabling the NMI watchdog seems to > systematically produce the exception 14 instead of the lockup. > > I ve tried to put a printk at the beginning of timer_task_proc outer > loop. It get printed once when getting the exception 14, and twice when > getting the lock up. > > Any idea where to look ? > Two proposals to collect information: o give KGDB a try o instrument the xenomai exception handler with an ipipe_trace_freeze() (something which should be merged into SVN later) Jan signature.asc Description: OpenPGP digital signature ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
[Xenomai-core] latency -t 1 crashes on SMP.
I tried latency -t 1 on an SMP machines, and observed a very reproducible crash. Most of the time, the machine locks up completely. The rest of the time, I get : Xenomai: suspending kernel thread e7559044 ('timerbench') at 0xb01051f2 after ex ception #14 And the system remains runnable, but 0xb01051f2 is not a valid kernel module text address. The lock up does not seem to be detected by any Xenomai or Linux debug or watchdog. Only enabling the NMI watchdog seems to systematically produce the exception 14 instead of the lockup. I ve tried to put a printk at the beginning of timer_task_proc outer loop. It get printed once when getting the exception 14, and twice when getting the lock up. Any idea where to look ? -- Gilles Chanteperdrix. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core