On 03.11.2010 12:44, Anders Blomdell wrote:
> Anders Blomdell wrote:
>> Jan Kiszka wrote:
>>> On 01.11.2010 17:55, Anders Blomdell wrote:
>>>> Jan Kiszka wrote:
>>>>> On 28.10.2010 11:34, Anders Blomdell wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> On 28.10.2010 09:34, Anders Blomdell wrote:
>>>>>>>> Anders Blomdell wrote:
>>>>>>>>> Anders Blomdell wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm trying to use rt_eepro100 for sending raw ethernet packets,
>>>>>>>>>> but I'm experiencing occasionally weird behaviour.
>>>>>>>>>>
>>>>>>>>>> Versions of things:
>>>>>>>>>>
>>>>>>>>>> linux-2.6.34.5
>>>>>>>>>> xenomai-2.5.5.2
>>>>>>>>>> rtnet-39f7fcf
>>>>>>>>>>
>>>>>>>>>> The test program runs on two computers with an "Intel Corporation
>>>>>>>>>> 82557/8/9/0/1 Ethernet Pro 100 (rev 08)" controller, where one
>>>>>>>>>> computer acts as a mirror, sending back packets received from the
>>>>>>>>>> ethernet (only those two computers are on the network), and the
>>>>>>>>>> other sends packets and measures the roundtrip time. Most packets
>>>>>>>>>> come back in approximately 100 us, but occasionally reception
>>>>>>>>>> times out (once in about 100000 packets or more), yet the packet
>>>>>>>>>> is received immediately when reception is retried, which might
>>>>>>>>>> indicate a race between rt_dev_recvmsg and the interrupt, but I
>>>>>>>>>> might be missing something obvious.
>>>>>>>>> Changing one of the ethernet cards to an "Intel Corporation 82541PI
>>>>>>>>> Gigabit Ethernet Controller (rev 05)", while keeping everything
>>>>>>>>> else constant, changes the behavior somewhat; after receiving a
>>>>>>>>> few 100000 packets, reception stops entirely (-EAGAIN is
>>>>>>>>> returned), while transmission proceeds as it should (and the
>>>>>>>>> mirror returns packets).
>>>>>>>>>
>>>>>>>>> Any suggestions on what to try?
>>>>>>>> Since the problem disappears with 'maxcpus=1', I suspect I have an
>>>>>>>> SMP issue (the machine is a Core2 Quad), so I'll move to
>>>>>>>> xenomai-core. (The original message can be found at
>>>>>>>> http://sourceforge.net/mailarchive/message.php?msg_name=4CC82C8D.3080808%40control.lth.se)
>>>>>>>>
>>>>>>>> Xenomai-core gurus: what is the correct way to debug SMP issues?
>>>>>>>> Can I run the I-pipe tracer and expect to be able to save at least
>>>>>>>> 150 us of traces for all CPUs? Any hints/suggestions/insights are
>>>>>>>> welcome...
>>>>>>> The i-pipe tracer unfortunately only saves traces for the CPU that
>>>>>>> triggered the freeze. To have a full picture, you may want to try
>>>>>>> the ftrace port I posted recently for 2.6.35.
>>>>>> 2.6.35.7?
>>>>>>
>>>>> Exactly.
>>>> Finally managed to get ftrace to work (one possible bug: I had to
>>>> manually copy include/xenomai/trace/xn_nucleus.h to
>>>> include/xenomai/trace/events/xn_nucleus.h), and it looks like it can
>>>> be very useful...
>>>>
>>>> But I don't think it will give much info at the moment, since no
>>>> xenomai/ipipe interrupt activity shows up, and adding that is far
>>>> above my league :-(
>>>
>>> You could use the function tracer, provided you are able to stop the
>>> trace quickly enough on error.
>>>
>>>> My current theory is that the problem occurs when something like this
>>>> takes place:
>>>>
>>>> CPU-i           CPU-j     CPU-k           CPU-l
>>>>
>>>> rt_dev_sendmsg
>>>>                 xmit_irq
>>>>                           rt_dev_recvmsg  recv_irq
>>>
>>> Can't follow. What races here, and what will go wrong then?
>> That's the good question. Find attached:
>>
>> 1. .config (so you can check for stupid mistakes)
>> 2. console log
>> 3. latest version of the test program
>> 4. tail of the ftrace dump
>>
>> These are the xenomai tasks running when the test program is active:
>>
>> CPU  PID    CLASS  PRI  TIMEOUT  TIMEBASE  STAT  NAME
>>   0      0  idle    -1  -        master    R     ROOT/0
>>   1      0  idle    -1  -        master    R     ROOT/1
>>   2      0  idle    -1  -        master    R     ROOT/2
>>   3      0  idle    -1  -        master    R     ROOT/3
>>   0      0  rt      98  -        master    W     rtnet-stack
>>   0      0  rt       0  -        master    W     rtnet-rtpc
>>   0  29901  rt      50  -        master          raw_test
>>   0  29906  rt       0  -        master    X     reporter
>>
>> The lines of interest from the trace are probably:
>>
>> [003] 2061.347855: xn_nucleus_thread_resume: thread=f9bf7b00
>>                    thread_name=rtnet-stack mask=2
>> [003] 2061.347862: xn_nucleus_sched: status=2000000
>> [000] 2061.347866: xn_nucleus_sched_remote: status=0
>>
>> since this is the only place where a packet gets delayed, and the only
>> place in the trace where sched_remote reports status=0.
> Since the CPU that runs rtnet-stack, and hence should be resumed, is
> doing heavy I/O at the time of the fault: could it be that
> send_ipi/schedule_handler needs barriers to make sure that decisions
> are made on the right status?
That was my first idea as well - but we should run all relevant code
under nklock here. But please correct me if I'm missing something.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core