On 01/19/2013 03:09 PM, Michael Haberler wrote:
>
> On 19.01.2013 at 14:29, Gilles Chanteperdrix wrote:
>
>> On 01/17/2013 02:30 PM, Bas Laarhoven wrote:
>>
>>> On 17-1-2013 9:53, Gilles Chanteperdrix wrote:
>>>> On 01/17/2013 08:59 AM, Bas Laarhoven wrote:
>>>>
>>>>> On 16-1-2013 20:36, Michael Haberler wrote:
>>>>>> On 16.01.2013 at 17:45, Bas Laarhoven wrote:
>>>>>>
>>>>>>> On 16-1-2013 15:15, Michael Haberler wrote:
>>>>>>>> ARM work:
>>>>>>>>
>>>>>>>> Several people have been able to get the Beaglebone Ubuntu/Xenomai
>>>>>>>> setup working as outlined here:
>>>>>>>> http://wiki.linuxcnc.org/cgi-bin/wiki.pl?BeagleboneDevsetup
>>>>>>>> I updated the kernel and rootfs image a few days ago so that the
>>>>>>>> kernel has ext2/3/4 support compiled in, which should take care
>>>>>>>> of two failure reports I got.
>>>>>>>>
>>>>>>>> Again, that Xenomai kernel is based on 3.2.21; it is very stable for
>>>>>>>> me, but there have been several reports of 'sudden stops'. The BB is a
>>>>>>>> bit sensitive to power fluctuations, but it might be more than that. As
>>>>>>>> for that kernel, it works, but it is based on a branch which will see
>>>>>>>> no further development. It supports most of the stuff needed for
>>>>>>>> development; there might be some patches coming from more active BB
>>>>>>>> users than me.
>>>>>>> Hi Michael,
>>>>>>>
>>>>>>> Are you saying you haven't seen these 'sudden stops' yourself?
>>>>>> No, never, after swapping to stronger power supplies; I have two of
>>>>>> these boards running over NFS all the time. I don't have LinuxCNC running
>>>>>> on them, though; I'll do that and see if that changes the picture. Maybe
>>>>>> keeping the torture test running helps trigger it.
>>>>> Beginner's error! :-P The power supply is indeed critical, but the
>>>>> step-down converter on my BeBoPr is dimensioned for at least 2A and
>>>>> hasn't failed me yet.
>>>>>
>>>>> I think that running LinuxCNC is required to trigger the lockup. After
>>>>> a dozen runs, it looks like I can reproduce the lockup with 100%
>>>>> certainty within one hour.
>>>>> Using the JTAG interface to attach a debugger to the Bone, I've found
>>>>> that, once stalled, the kernel is still running. It looks like it won't
>>>>> schedule properly, and almost all the time is spent in the cpu_idle thread.
>>>>
>>>> This is typical of a tsc emulation or timer issue. On a system without
>>>> anything running, please let the "tsc -w" command run. It will take some
>>>> time to run (the wrap time of the hardware timer used for tsc
>>>> emulation). If it runs correctly, then you need to check whether the
>>>> timer is still running when the bug happens (cat /proc/xenomai/irq
>>>> should keep increasing while, for instance, the latency test is
>>>> running). If the timer is stopped, it may have been programmed with too
>>>> short a delay; to avoid that, you can try:
>>>> - increasing the ipipe_timer min_delay_ticks member (by default, it uses
>>>> a value corresponding to the min_delta_ns member in the clockevent
>>>> structure);
>>>> - checking, after programming the timer (in the set_next_event method),
>>>> whether the timer counter is already 0, in which case you can return a
>>>> negative value, usually -ETIME.
>>>>
>>>
>>> Hi Gilles,
>>>
>>> Thanks for the swift reply.
>>>
>>> As far as I can see, tsc -w runs without an error:
>>>
>>> ARM: counter wrap time: 179 seconds
>>> Checking tsc for 6 minute(s)
>>> min: 5, max: 12, avg: 5.04168
>>> ...
>>> min: 5, max: 6, avg: 5.03771
>>> min: 5, max: 28, avg: 5.03989 -> 0.209995 us
>>>
>>> real 6m0.284s
>>>
>>> I've also run the other regression tests and all were successful.
>>>
>>> The problem is that once the bug happens, I won't be able to issue the
>>> cat command.
>>> I've fixed my debug setup so I don't have to use the System.map to
>>> manually translate the debugger addresses :/
>>> Now I'm waiting for another lockup to see what's happening.
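Gilles's two suggestions (clamping to min_delay_ticks, and re-checking the counter after programming so that set_next_event can return -ETIME) can be made concrete with a small user-space model. This is a minimal sketch under stated assumptions, not kernel or I-pipe code: the `SimTimer` class, its `write_latency_ticks` field (ticks assumed to elapse while the compare value is being written), and the `program_with_retry` caller with its doubling back-off are all hypothetical. Only `min_delay_ticks`, `set_next_event`, and the `-ETIME` convention come from the advice above.

```python
import errno
from dataclasses import dataclass


@dataclass
class SimTimer:
    """Hypothetical down-counting one-shot timer (a model, not a driver)."""
    min_delay_ticks: int = 0      # smallest delay the driver will program
    write_latency_ticks: int = 3  # assumed ticks that pass during programming
    counter: int = 0              # ticks left before the timer fires

    def set_next_event(self, delta_ticks: int) -> int:
        """Program the timer; return 0 on success or -ETIME if already expired."""
        # Suggestion 1: never program a delay shorter than min_delay_ticks.
        delta = max(delta_ticks, self.min_delay_ticks)
        # Model the ticks that elapse while the compare value is written.
        self.counter = max(delta - self.write_latency_ticks, 0)
        # Suggestion 2: re-check the counter after programming. If it has
        # already reached 0, the event was missed and would never fire,
        # so report -ETIME instead of silently stopping the timer.
        if self.counter == 0:
            return -errno.ETIME
        return 0


def program_with_retry(timer: SimTimer, delta_ticks: int, max_tries: int = 10) -> int:
    """Hypothetical caller: lengthen the delay until programming succeeds."""
    for _ in range(max_tries):
        if timer.set_next_event(delta_ticks) == 0:
            return delta_ticks
        delta_ticks *= 2  # arbitrary back-off chosen for illustration
    raise RuntimeError("timer could not be programmed")


if __name__ == "__main__":
    t = SimTimer()
    print(t.set_next_event(2))        # negative: 2 ticks expired while programming
    t.min_delay_ticks = 5
    print(t.set_next_event(2))        # 0: the request was clamped up to 5 ticks
    print(program_with_retry(SimTimer(), 2))
```

With `min_delay_ticks` left at 0, a 2-tick request expires during programming and is reported as `-ETIME` rather than leaving a timer that never fires, which matches the "stalled, mostly in cpu_idle" symptom described above. The Linux clockevents core handles a failed `set_next_event` in its own way, which this sketch does not reproduce.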
>>
>>
>> You may want to have a look at the xeno-regression-test script to put
>> your system under pressure (and likely generate the lockup faster).
>
> Running tsc -w and xeno-regression-test in parallel, I get errors like these
> (not on every run; no lockup so far):
>
> ++ /usr/xenomai/bin/mutex-torture-native
> simple_wait
> recursive_wait
> timed_mutex
> mode_switch
> pi_wait
> lock_stealing
> NOTE: lock_stealing mutex_trylock: not supported
> deny_stealing
> simple_condwait
> recursive_condwait
> auto_switchback
> FAILURE: current prio (0) != expected prio (2)
>
> dmesg
> [501963.390598] Xenomai: native: cleaning up mutex "" (ret=0).
> [502170.164984] usb 1-1: reset high-speed USB device number 2 using musb-hdrc
>
> On another run, I got a segfault while running sigdebug:
> ++ /usr/xenomai/bin/regression/native/sigdebug
> mayday page starting at 0x400eb000 [/dev/rtheap]
> mayday code: 0c 00 9f e5 0c 70 9f e5 00 00 00 ef 00 00 a0 e3 00 00 80 e5 2b 02 00 0a 42 00 0f 00 db d7 ee b8
> mlockall
> syscall
> signal
> relaxed mutex owner
> page fault
> watchdog
> ./xeno-regression-test: line 53: 4210 Segmentation fault /usr/xenomai/bin/regression/native/sigdebug
>
> root@bb1:/usr/xenomai/bin# dmesg
> [502442.312996] Xenomai: watchdog triggered -- signaling runaway thread 'rt_task'
> [502443.054186] Xenomai: native: cleaning up mutex "prio_invert" (ret=0).
> [502443.055730] Xenomai: native: cleaning up sem "send_signal" (ret=0).
> [502518.134977] usb 1-1: reset high-speed USB device number 2 using musb-hdrc
>
>
> Unsure what to make of it; any suggestions? The USB reset looks suspicious.
What version of Xenomai are you using? These look like old issues?

-- 
Gilles.
_______________________________________________
Xenomai mailing list
Xenomai@xenomai.org
http://www.xenomai.org/mailman/listinfo/xenomai