Anders Blomdell wrote:
Jan Kiszka wrote:
Am 01.11.2010 17:55, Anders Blomdell wrote:
Jan Kiszka wrote:
Am 28.10.2010 11:34, Anders Blomdell wrote:
Jan Kiszka wrote:
Am 28.10.2010 09:34, Anders Blomdell wrote:
Anders Blomdell wrote:
Anders Blomdell wrote:
Hi,

I'm trying to use rt_eepro100 for sending raw Ethernet packets, but I'm
occasionally experiencing weird behaviour.

Versions of things:

  linux-2.6.34.5
  xenomai-2.5.5.2
  rtnet-39f7fcf

The test program runs on two computers with an "Intel Corporation
82557/8/9/0/1 Ethernet Pro 100 (rev 08)" controller, where one computer
acts as a mirror, sending back packets received from the Ethernet (only
those two computers are on the network), and the other sends packets and
measures roundtrip time. Most packets come back in approximately 100 us,
but occasionally the reception times out (once in about 100000 packets
or more); the packet is then received immediately when reception is
retried, which might indicate a race between rt_dev_recvmsg and the
interrupt, but I might be missing something obvious.
Changing one of the Ethernet cards to an "Intel Corporation 82541PI
Gigabit Ethernet Controller (rev 05)", while keeping everything else
constant, changes the behaviour somewhat: after receiving a few hundred
thousand packets, reception stops entirely (-EAGAIN is returned), while
transmission proceeds as it should (and the mirror returns packets).

Any suggestions on what to try?
Since the problem disappears with 'maxcpus=1', I suspect I have an SMP
issue (the machine is a Core2 Quad), so I'll move to xenomai-core.
(The original message can be found at
http://sourceforge.net/mailarchive/message.php?msg_name=4CC82C8D.3080808%40control.lth.se
)

Xenomai-core gurus: what is the correct way to debug SMP issues?
Can I run the I-pipe tracer and expect to be able to save at least 150 us
of traces for all CPUs? Any hints/suggestions/insights are welcome...
The i-pipe tracer unfortunately only saves traces for the CPU that
triggered the freeze. To have a full picture, you may want to try my
ftrace port I posted recently for 2.6.35.
2.6.35.7 ?

Exactly.
Finally managed to get ftrace to work
(one possible bug: I had to manually copy
include/xenomai/trace/xn_nucleus.h to
include/xenomai/trace/events/xn_nucleus.h), and it looks like it can be
very useful...

But I don't think it will give much info at the moment, since no
xenomai/ipipe interrupt activity shows up, and adding that is well out of
my league :-(

You could use the function tracer, provided you are able to stop the
trace quickly enough on error.

My current theory is that the problem occurs when something like this
takes place:

  CPU-i        CPU-j        CPU-k        CPU-l

rt_dev_sendmsg
        xmit_irq
rt_dev_recvmsg            recv_irq

Can't follow. What races here, and what will go wrong then?
That's the good question. Find attached:

1. .config (so you can check for stupid mistakes)
2. console log
3. latest version of test program
4. tail of ftrace dump

These are the xenomai tasks running when the test program is active:

CPU  PID    CLASS  PRI      TIMEOUT   TIMEBASE   STAT       NAME
  0  0      idle    -1      -         master     R          ROOT/0
  1  0      idle    -1      -         master     R          ROOT/1
  2  0      idle    -1      -         master     R          ROOT/2
  3  0      idle    -1      -         master     R          ROOT/3
  0  0      rt      98      -         master     W          rtnet-stack
  0  0      rt       0      -         master     W          rtnet-rtpc
  0  29901  rt      50      -         master                raw_test
  0  29906  rt       0      -         master     X          reporter



The lines of interest from the trace are probably:

[003] 2061.347855: xn_nucleus_thread_resume: thread=f9bf7b00 thread_name=rtnet-stack mask=2
[003]  2061.347862: xn_nucleus_sched: status=2000000
[000]  2061.347866: xn_nucleus_sched_remote: status=0

since this is the only place where a packet gets delayed, and the only place in the trace where sched_remote reports status=0.
Since the CPU that runs rtnet-stack, and hence should be resumed, is doing heavy I/O at the time of the fault: could it be that send_ipi/schedule_handler needs barriers to make sure that decisions are made on the right status?

/Anders


_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
