[Xenomai-core] [PATCH] fix 16550A for 2.4
Hi,

it takes more to get the serial driver running on 2.4 kernels. The
current trunk version does not let the user set any module parameter
(effectively commented out). This patch fixes it.

Jan

Index: ksrc/drivers/16550A/16550A.c
===================================================================
--- ksrc/drivers/16550A/16550A.c	(Revision 287)
+++ ksrc/drivers/16550A/16550A.c	(Arbeitskopie)
@@ -114,26 +114,34 @@
 static struct rtdm_device *device[MAX_DEVICES];
 
 static unsigned long ioaddr[MAX_DEVICES];
-static int ioaddr_c;
 static unsigned int irq[MAX_DEVICES];
-static int irq_c;
 static unsigned int baud_base[MAX_DEVICES];
-static int baud_base_c;
 static int tx_fifo[MAX_DEVICES];
-static int tx_fifo_c;
 static unsigned int start_index;
 
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
+static int ioaddr_c;
+static int irq_c;
+static int baud_base_c;
+static int tx_fifo_c;
+
 module_param_array(ioaddr, ulong, &ioaddr_c, 0400);
+module_param_array(irq, uint, &irq_c, 0400);
+module_param_array(baud_base, uint, &baud_base_c, 0400);
+module_param_array(tx_fifo, int, &tx_fifo_c, 0400);
+#else /* LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0) */
+MODULE_PARM(ioaddr, "1-" __MODULE_STRING(MAX_DEVICES) "i");
+MODULE_PARM(irq, "1-" __MODULE_STRING(MAX_DEVICES) "i");
+MODULE_PARM(baud_base, "1-" __MODULE_STRING(MAX_DEVICES) "i");
+MODULE_PARM(tx_fifo, "1-" __MODULE_STRING(MAX_DEVICES) "i");
+#endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0) */
+
 MODULE_PARM_DESC(ioaddr, "I/O addresses of the serial devices");
-module_param_array(irq, uint, &irq_c, 0400);
 MODULE_PARM_DESC(irq, "IRQ numbers of the serial devices");
-module_param_array(baud_base, uint, &baud_base_c, 0400);
 MODULE_PARM_DESC(baud_base, "Maximum baud rate of the serial device (internal clock rate / 16)");
-module_param_array(tx_fifo, int, &tx_fifo_c, 0400);
 MODULE_PARM_DESC(tx_fifo, "Transmitter FIFO size");
-#endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0) */
+
 module_param(start_index, uint, 0400);
 MODULE_PARM_DESC(start_index, "First device instance number to be used");
@@ -1026,7 +1034,7 @@
     device_class:       RTDM_CLASS_SERIAL,
     device_sub_class:   RTDM_SUBCLASS_16550A,
     driver_name:        "xeno_16550A",
-    driver_version:     RTDM_DRIVER_VER(1, 2, 1),
+    driver_version:     RTDM_DRIVER_VER(1, 2, 2),
     peripheral_name:    "UART 16550A",
     provider_name:      "Jan Kiszka",
 };
@@ -1040,10 +1048,15 @@
     int                 i;
 
-    if (irq_c < ioaddr_c)
-        return -EINVAL;
+    for (i = 0; i < MAX_DEVICES; i++) {
+        if (!ioaddr[i])
+            continue;
 
-    for (i = 0; i < ioaddr_c; i++) {
+        ret = -EINVAL;
+        if (!irq[i]) {
+            goto cleanup_out;
+        }
+
         dev = kmalloc(sizeof(struct rtdm_device), GFP_KERNEL);
         ret = -ENOMEM;
         if (!dev)

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PATCH] debug_maxlat as module_param
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Hi, this tiny patch exports the NMI watchdog's threshold as module
>>> parameter debug_maxlat of xeno_hal. This means that one can either
>>> override the value via a kernel parameter or in
>>> sysfs/modules/xeno_hal/parameters (before activating the hal).
>>
>> rthal_maxlat_tsc should be updated when rthal_maxlat_us_arg changes.
>
> I didn't dig that deep - is this possible during runtime? My current
> workflow looks like this: unload xeno_nucleus (and higher modules),
> change maxlat, reload the modules.
>
> Mmh, I guess one has to register some update handler with sysfs in
> that case. Any hint where to look for a pattern?

In nucleus/module.c, the latency parameter may be read and written at
run-time through procfs; have a look at the functions latency_read_proc
and latency_write_proc.

-- 
Gilles Chanteperdrix.
Re: [Xenomai-core] [bug] don't try this at home...
Philippe Gerum wrote:
> ...
> Fixed. The cause was related to the thread migration routine to
> primary mode (xnshadow_harden), which would spuriously call the Linux
> rescheduling procedure from the primary domain under certain
> circumstances. This bug only triggers on preemptible kernels. This
> also fixes the spinlock recursion issue which is sometimes triggered
> when the spinlock debug option is active.
>
> Gasp. I've found a severe regression with this fix, so more work is
> needed. More later.
>
> End of alert. Should be ok now.

No crashes so far, looks good. But the final test, a box which always
went to hell very quickly, is still waiting in my office - more on
Monday.

Anyway, there seem to be some latency issues pending. I discovered this
again with my migration test. Please give it a try on a mid- (800 MHz
Athlon in my case) to low-end box. On that Athlon I got peaks of over
100 us in the userspace latency test right on starting migration. The
Athlon does not support the NMI watchdog, but on my 1.4 GHz notebook
there were alarms (30 us) hitting in the native registry during
rt_task_create. I have no clue yet if anything is broken there.

We need that back-tracer soon - did I mention this before? ;)

BTW, a kernel timer latency test based on an RTDM device is half-done.
I'm able to dump kernel-based timed-task latencies via a patched
testsuite latency. Histograms need to be added, as well as a timer
handler latency test. Will keep you posted.

Jan
Re: [Xenomai-core] [bug] don't try this at home...
Jan Kiszka wrote:
> Philippe Gerum wrote:
>> ...
>> Fixed. The cause was related to the thread migration routine to
>> primary mode (xnshadow_harden), which would spuriously call the Linux
>> rescheduling procedure from the primary domain under certain
>> circumstances. This bug only triggers on preemptible kernels. This
>> also fixes the spinlock recursion issue which is sometimes triggered
>> when the spinlock debug option is active.
>>
>> Gasp. I've found a severe regression with this fix, so more work is
>> needed. More later.
>>
>> End of alert. Should be ok now.
>
> No crashes so far, looks good. But the final test, a box which always
> went to hell very quickly, is still waiting in my office - more on
> Monday.
>
> Anyway, there seem to be some latency issues pending. I discovered
> this again with my migration test. Please give it a try on a mid-
> (800 MHz Athlon in my case) to low-end box. On that Athlon I got
> peaks of over 100 us in the userspace latency test right on starting
> migration. The Athlon does not support the NMI watchdog, but on my
> 1.4 GHz notebook there were alarms (30 us) hitting in the native
> registry during rt_task_create. I have no clue yet if anything is
> broken there.

I suspect that rt_registry_enter() is inherently a long operation when
considered as a non-preemptible sum of reasonably short ones. Since it
is always called with interrupts enabled, we should split the work in
there, releasing interrupts in the middle. The tricky thing is that we
must ensure that the new registration slot is not exposed in a
half-baked state during the preemptible section.

> We need that back-tracer soon - did I mention this before? ;)

Well, we have backtrace support for detecting latency peaks, but it's
dependent on NMI availability. The thing is that not every platform
provides programmable NMI support. A possible option would be to
overload the existing LTT tracepoints in order to keep an execution
backtrace, so that we would not have to rely on any hw support.

> BTW, a kernel timer latency test based on an RTDM device is
> half-done. I'm able to dump kernel-based timed-task latencies via a
> patched testsuite latency. Histograms need to be added, as well as a
> timer handler latency test. Will keep you posted.

Ack. This would also cleanly solve the
where-am-i-going-to-put-that-stuff issue wrt the latency kernel module
the user-space section cannot/should not have to compile anymore in
2.1. I guess that moving it to the ksrc/drivers/ section would then be
the most natural thing to do.

> Jan

-- 
Philippe.
Re: [Xenomai-core] [bug] don't try this at home...
Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> ...
>>> Fixed. The cause was related to the thread migration routine to
>>> primary mode (xnshadow_harden), which would spuriously call the
>>> Linux rescheduling procedure from the primary domain under certain
>>> circumstances. This bug only triggers on preemptible kernels. This
>>> also fixes the spinlock recursion issue which is sometimes
>>> triggered when the spinlock debug option is active.
>>>
>>> Gasp. I've found a severe regression with this fix, so more work is
>>> needed. More later.
>>>
>>> End of alert. Should be ok now.
>>
>> No crashes so far, looks good. But the final test, a box which
>> always went to hell very quickly, is still waiting in my office -
>> more on Monday.
>>
>> Anyway, there seem to be some latency issues pending. I discovered
>> this again with my migration test. Please give it a try on a mid-
>> (800 MHz Athlon in my case) to low-end box. On that Athlon I got
>> peaks of over 100 us in the userspace latency test right on starting
>> migration. The Athlon does not support the NMI watchdog, but on my
>> 1.4 GHz notebook there were alarms (30 us) hitting in the native
>> registry during rt_task_create. I have no clue yet if anything is
>> broken there.
>
> I suspect that rt_registry_enter() is inherently a long operation
> when considered as a non-preemptible sum of reasonably short ones.
> Since it is always called with interrupts enabled, we should split
> the work in there, releasing interrupts in the middle. The tricky
> thing is that we must ensure that the new registration slot is not
> exposed in a half-baked state during the preemptible section.

Yea, I guess there are a few more such complex call chains inside the
core lock, at least when looking at the native skin. For a regression
test suite, we should define load scenarios of low-prio realtime tasks
doing some init/cleanup and communication while e.g. the latency test
is running. This should give a clearer picture of what numbers you can
expect in normal application scenarios.

>> We need that back-tracer soon - did I mention this before? ;)
>
> Well, we have backtrace support for detecting latency peaks, but it's
> dependent on NMI availability. The thing is that not every platform
> provides programmable NMI support. A possible option would be to
> overload the existing LTT tracepoints in order to keep an execution
> backtrace, so that we would not have to rely on any hw support.

The advantage of Fu's mcount-based tracer will be that it can also
capture functions you do not expect, e.g. accidentally called kernel
services. His patch, likely against Adeos, will enable kernel-wide
function tracing which you can use to instrument IRQ-off paths or (in
a second step or so) other things you are interested in. And it will
maintain a FULL calling history, something that NMI can't do. NMI will
still be useful for hard lock-ups, LTT for a more global view of
what's happening, but the mcount instrumentation should give deep
insights into the core's and skins' critical timing behaviours.

Jan
Re: [Xenomai-core] [bug] don't try this at home...
Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> Philippe Gerum wrote:
>>>> ...
>>>> Fixed. The cause was related to the thread migration routine to
>>>> primary mode (xnshadow_harden), which would spuriously call the
>>>> Linux rescheduling procedure from the primary domain under certain
>>>> circumstances. This bug only triggers on preemptible kernels. This
>>>> also fixes the spinlock recursion issue which is sometimes
>>>> triggered when the spinlock debug option is active.
>>>>
>>>> Gasp. I've found a severe regression with this fix, so more work
>>>> is needed. More later.
>>>>
>>>> End of alert. Should be ok now.
>>>
>>> No crashes so far, looks good. But the final test, a box which
>>> always went to hell very quickly, is still waiting in my office -
>>> more on Monday.
>>>
>>> Anyway, there seem to be some latency issues pending. I discovered
>>> this again with my migration test. Please give it a try on a mid-
>>> (800 MHz Athlon in my case) to low-end box. On that Athlon I got
>>> peaks of over 100 us in the userspace latency test right on
>>> starting migration. The Athlon does not support the NMI watchdog,
>>> but on my 1.4 GHz notebook there were alarms (30 us) hitting in the
>>> native registry during rt_task_create. I have no clue yet if
>>> anything is broken there.
>>
>> I suspect that rt_registry_enter() is inherently a long operation
>> when considered as a non-preemptible sum of reasonably short ones.
>> Since it is always called with interrupts enabled, we should split
>> the work in there, releasing interrupts in the middle. The tricky
>> thing is that we must ensure that the new registration slot is not
>> exposed in a half-baked state during the preemptible section.
>
> Yea, I guess there are a few more such complex call chains inside the
> core lock, at least when looking at the native skin. For a regression
> test suite, we should define load scenarios of low-prio realtime
> tasks doing some init/cleanup and communication while e.g. the
> latency test is running. This should give a clearer picture of what
> numbers you can expect in normal application scenarios.
>
>>> We need that back-tracer soon - did I mention this before? ;)
>>
>> Well, we have backtrace support for detecting latency peaks, but
>> it's dependent on NMI availability. The thing is that not every
>> platform provides programmable NMI support. A possible option would
>> be to overload the existing LTT tracepoints in order to keep an
>> execution backtrace, so that we would not have to rely on any hw
>> support.
>
> The advantage of Fu's mcount-based tracer will be that it can also
> capture functions you do not expect, e.g. accidentally called kernel
> services. His patch, likely against Adeos, will enable kernel-wide
> function tracing which you can use to instrument IRQ-off paths or
> (in a second step or so) other things you are interested in. And it
> will maintain a FULL calling history, something that NMI can't do.
> NMI will still be useful for hard lock-ups, LTT for a more global
> view of what's happening, but the mcount instrumentation should give
> deep insights into the core's and skins' critical timing behaviours.

No problem. I've just suggested building a bicycle to go to the shop
around the corner, but if you tell me that a spaceship to visit Venus
is at hand, I'll wait for it: shopping can wait.

-- 
Philippe.