On 19.01.2013 at 14:29, Gilles Chanteperdrix wrote:
> On 01/17/2013 02:30 PM, Bas Laarhoven wrote:
>
>> On 17-1-2013 9:53, Gilles Chanteperdrix wrote:
>>> On 01/17/2013 08:59 AM, Bas Laarhoven wrote:
>>>
>>>> On 16-1-2013 20:36, Michael Haberler wrote:
>>>>> On 16.01.2013 at 17:45, Bas Laarhoven wrote:
>>>>>
>>>>>> On 16-1-2013 15:15, Michael Haberler wrote:
>>>>>>> ARM work:
>>>>>>>
>>>>>>> Several people have been able to get the Beaglebone ubuntu/xenomai
>>>>>>> setup working as outlined here:
>>>>>>> http://wiki.linuxcnc.org/cgi-bin/wiki.pl?BeagleboneDevsetup
>>>>>>> I updated the kernel and rootfs image a few days ago, so the kernel
>>>>>>> now includes ext2/3/4 support compiled in, which should take care of
>>>>>>> two failure reports I got.
>>>>>>>
>>>>>>> Again, that xenomai kernel is based on 3.2.21; it works very stably
>>>>>>> for me, but there have been several reports of 'sudden stops'. The BB
>>>>>>> is a bit sensitive to power fluctuations, but it might be more than
>>>>>>> that. As for that kernel: it works, but it is based on a branch which
>>>>>>> will see no further development. It supports most of the stuff needed
>>>>>>> for development; there might be some patches coming from more active
>>>>>>> BB users than me.
>>>>>> Hi Michael,
>>>>>>
>>>>>> Are you saying you haven't seen these 'sudden stops' yourself?
>>>>> No, never, after swapping to stronger power supplies; I have two of
>>>>> these boards running over NFS all the time. I don't have LinuxCNC
>>>>> running on them, though; I'll do that and see if that changes the
>>>>> picture. Maybe keeping the torture test running helps trigger it.
>>>> Beginner's error! :-P The power supply is indeed critical, but the
>>>> step-down converter on my BeBoPr is dimensioned for at least 2 A and
>>>> hasn't failed me yet.
>>>>
>>>> I think that running LinuxCNC is mandatory for the lockup. After a
>>>> dozen runs, it looks like I can reproduce the lockup with 100%
>>>> certainty within one hour.
>>>> Using the JTAG interface to attach a debugger to the Bone, I've found
>>>> that once stalled, the kernel is still running. It looks like it won't
>>>> schedule properly, and almost all time is spent in the cpu_idle thread.
>>>
>>> This is typical of a tsc emulation or timer issue. On a system without
>>> anything running, please let the "tsc -w" command run. It will take some
>>> time to run (the wrap time of the hardware timer used for tsc
>>> emulation). If it runs correctly, then you need to check whether the
>>> timer is still running when the bug happens (cat /proc/xenomai/irq
>>> should continue increasing when, for instance, the latency test is
>>> running). If the timer is stopped, it may have been programmed for too
>>> short a delay. To avoid that, you can try:
>>> - increasing the ipipe_timer min_delay_ticks member (by default, it
>>>   uses a value corresponding to the min_delta_ns member of the
>>>   clockevent structure);
>>> - checking, after programming the timer (in the set_next_event method),
>>>   whether the timer counter is already 0, in which case you can return
>>>   a negative value, usually -ETIME.
>>>
>>
>> Hi Gilles,
>>
>> Thanks for the swift reply.
>>
>> As far as I can see, tsc -w runs without an error:
>>
>> ARM: counter wrap time: 179 seconds
>> Checking tsc for 6 minute(s)
>> min: 5, max: 12, avg: 5.04168
>> ...
>> min: 5, max: 6, avg: 5.03771
>> min: 5, max: 28, avg: 5.03989 -> 0.209995 us
>>
>> real 6m0.284s
>>
>> I've also run the other regression tests, and all were successful.
>>
>> The problem is that once the bug happens, I won't be able to issue the
>> cat command.
>> I've fixed my debug setup so I don't have to use the System.map to
>> manually translate the debugger addresses :/
>> Now I'm waiting for another lockup to see what's happening.
>
>
> You may want to have a look at the xeno-regression-test script to put
> your system under pressure (and likely generate the lockup faster).
Running tsc -w and xeno-regression-test in parallel, I get errors like so (not on every run; no lockup so far):

++ /usr/xenomai/bin/mutex-torture-native
simple_wait
recursive_wait
timed_mutex
mode_switch
pi_wait
lock_stealing
NOTE: lock_stealing mutex_trylock: not supported
deny_stealing
simple_condwait
recursive_condwait
auto_switchback
FAILURE: current prio (0) != expected prio (2)

dmesg:
[501963.390598] Xenomai: native: cleaning up mutex "" (ret=0).
[502170.164984] usb 1-1: reset high-speed USB device number 2 using musb-hdrc

On another run, I got a segfault while running sigdebug:

++ /usr/xenomai/bin/regression/native/sigdebug
mayday page starting at 0x400eb000 [/dev/rtheap]
mayday code: 0c 00 9f e5 0c 70 9f e5 00 00 00 ef 00 00 a0 e3 00 00 80 e5 2b 02 00 0a 42 00 0f 00 db d7 ee b8
mlockall
syscall
signal
relaxed
mutex owner
page fault
watchdog
./xeno-regression-test: line 53: 4210 Segmentation fault /usr/xenomai/bin/regression/native/sigdebug

root@bb1:/usr/xenomai/bin# dmesg
[502442.312996] Xenomai: watchdog triggered -- signaling runaway thread 'rt_task'
[502443.054186] Xenomai: native: cleaning up mutex "prio_invert" (ret=0).
[502443.055730] Xenomai: native: cleaning up sem "send_signal" (ret=0).
[502518.134977] usb 1-1: reset high-speed USB device number 2 using musb-hdrc

Unsure what to make of it - any suggestions? The usb reset looks suspicious.

- Michael

_______________________________________________
Xenomai mailing list
[email protected]
http://www.xenomai.org/mailman/listinfo/xenomai
