On 25.03.2011 08:23, Jan Kiszka wrote:
> On 2011-03-24 17:28, Vinzenz Bargsten wrote:
>> On 08.03.2011 21:59, Jan Kiszka wrote:
>>> On 2011-03-08 21:21, Vinzenz Bargsten wrote:
>>>>>>>> Smells like IRQ conflict in line 17. What devices are using it?
>>>>>>>> Check /proc/interrupts or lspci.
>>>>>>>>
>>>>>>> The real-time nic was using IRQ 16, see
>>>>>>>
>>>>>>> Feb 28 14:37:12 robot02 kernel: [ 2700.099343] rt_8139too 0000:03:00.0: PCI->APIC IRQ transform: INT A -> IRQ 16
>>>>>>>
>>>>>>> You are right, IRQ 16 was shared with a usb controller.
>>>>>>>
>>>>>>> As the 2nd (non-rt) 8139 nic had its own IRQ 17, I tried to use
>>>>>>> that instead and just swapped the cables. The problem occurred,
>>>>>>> too.
>>>>>>>
>>>>>>> Then I swapped the cards in the PCI slots and the situation got
>>>>>>> worse. See list of interrupts attached (interrupts.txt)
>>>>>>>
>>>>>> Both 8139 cards are of the same type. What were your steps to
>>>>>> ensure that the right driver handles the right card?
>>>>>>
>>>>> - I use the cards parameter (well, that doesn't really ensure)
>>>>> - the cards and cables are labeled (also with mac addresses)
>>>>> - If I use the wrong interface, there shouldn't be any
>>>>>   communication since my program only uses the rt interface
>>>>> - one card's mac address is registered in the uni network, that's
>>>>>   why I swapped the cards' pci slots rather than just using the
>>>>>   other one
>>>>>
>>>> The problem also occurs with only one 8139 card, which has its own
>>>> irq. I could not reproduce the PCI Bus errors.
>>>>
>>>> I'm letting rtping run in the background, did not get that problem
>>>> then (so far).
>>>> Maybe unrelated or a problem of the remote side, I see periodically
>>>> high ping times, see attachment.
>>> If you have an IRQ conflict (which I still assume) but you still have
>>> valid IRQ reason at a sufficiently high rate, the system will
>>> continuously recover. Only if a large number of unhandled IRQs were
>>> received, Xenomai will disable the chatty line.
>>>
>>>>> I also have a tulip card available, but if I remember correctly,
>>>>> loading the rt_tulip module locked up the pc. I can check again,
>>>>> though.
>>>>> I can also try to remove one of the 8139 nics.
>>>> The driver is loaded but no communication is possible. In wireshark
>>>> 3-4 outgoing rtping packets show up, then rtping causes a
>>>> "ioctl: No buffer space available".
>>>> Nothing is received.
>>> The fact that two different adapters show similar issues (IRQs are
>>> not properly delivered, which will exhaust buffer resources sooner or
>>> later) indicates that the problem is not the adapter but likely the
>>> system with its IRQ routing.
>>>
>>>> Can someone recommend a PCI network card which works well with
>>>> recent RTNet / kernels / hardware? I thought about an Intel Pro 1000
>>>> or 100 PCI as well as via_rhine cards, which are also quite common.
>>>>
>>> On x86, a good way to avoid IRQ issues is to pick a recent supported
>>> card with MSI(-X) support. rt_igb is a known-to-work example.
>>>
>>> Jan
>> The setup changed in the following way:
>> - I do not use any 8139too card anymore.
>> - Instead, I use an igb card (Intel E1G42ET); to free the PCIe-x16
>>   slot I use a PCI graphics card.
>>
>> The problem changed in the following way:
>> - The communication does not stop, but I encountered a delayed output
>>   packet, and 2 or 3 consecutive input packets are not received (do
>>   not show up in wireshark).
>>   I think they are on the line / sent by the remote station but it is
>>   difficult to prove.
>>   Actually it counts delayed (or non-answered) packets and the counter
>>   increases to 3.
>> - The rtping response times are similar (maybe a bit less often), i.e.
>>   up to ~1000 µs every ~100 s.
>>   Not sure if this is caused by the remote station.
>>
>> What do you recommend to resolve these issues?
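
[Editor's sketch] The IRQ-sharing question raised in the quoted thread can be checked mechanically from /proc/interrupts. The snippet below is only a minimal illustration, not a tool from RTnet or Xenomai. It assumes the older single-column controller layout (e.g. "IO-APIC-fasteoi", "PCI-MSI-edge"); newer kernels print this column differently. The sample text and the device names in it (rteth0, rteth1-TxRx-0, eth1) are made up for illustration, and the function names are hypothetical.

```python
# Sketch: find shared IRQ lines and MSI vectors in /proc/interrupts text.
# Assumes the old layout: "<irq>: <count per CPU>... <controller> <handlers>".

SAMPLE = """\
           CPU0       CPU1
  0:        127          0   IO-APIC-edge      timer
 16:      40312          0   IO-APIC-fasteoi   uhci_hcd:usb3, rteth0
 17:       9120          0   IO-APIC-fasteoi   eth1
 45:      88001          0   PCI-MSI-edge      rteth1-TxRx-0
NMI:          0          0   Non-maskable interrupts
"""  # made-up sample data, not taken from the machine in this thread


def parse_interrupts(text):
    """Return {irq: (controller, [handler names], total count)}."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())          # header row: one column per CPU
    result = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue                      # skip NMI, LOC, ERR and similar rows
        fields = rest.split()
        if len(fields) <= ncpu:
            continue                      # no controller/handler columns
        total = sum(int(c) for c in fields[:ncpu])
        controller = fields[ncpu]
        names = " ".join(fields[ncpu + 1:]).split(",")
        handlers = [n.strip() for n in names if n.strip()]
        result[int(label)] = (controller, handlers, total)
    return result


def shared_lines(info):
    """IRQ lines with more than one registered handler."""
    return {irq: h for irq, (_, h, _) in info.items() if len(h) > 1}


def msi_lines(info):
    """IRQ numbers whose controller column starts with PCI-MSI."""
    return sorted(irq for irq, (c, _, _) in info.items()
                  if c.startswith("PCI-MSI"))


if __name__ == "__main__":
    info = parse_interrupts(SAMPLE)
    print("shared:", shared_lines(info))
    print("MSI vectors:", msi_lines(info))
```

On a real system the text would come from open("/proc/interrupts").read() instead of SAMPLE. A real-time NIC that appears in the shared list is a candidate for the conflict described above; one that appears only under MSI vectors is not sharing a legacy line.
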
> - Do Xenomai's tests work flawlessly on your box? Check e.g. latency.

As far as I can interpret it, the latency test is successful.
I considered SMI problems as a cause, but I do not find any indication of
them (Xenomai's SMI detection is enabled and I do not see any messages).

> - Does the igb use MSI-X? Check the kernel log and your kernel .config
>   (CONFIG_PCI_MSI).

Yes, it is enabled. /proc/interrupts also shows that the card uses several
MSI-X interrupts.

By the way, I do not understand this entry in the Xenomai FAQ:

"- CONFIG_PCI_MSI
 The I-pipe patch currently does not support MSI interrupts. For more
 details, see the thread at:
 https://mail.gna.org/public/adeos-main/2008-08/threads.html "

Is it outdated?

> - If you don't trust the peer station, use a known-to-work Linux box
>   without RTnet for the tests.

I did that for the stuck packets when I was using the 8139too card, to
verify that packets really were no longer arriving (result: nothing was
received any longer).

Now I have checked the cause of the ping latencies; they are definitely
caused by the remote machine. With a non-rt Linux PC as the remote, the
ping latency is constantly about 70 µs.

I was already confident that the problems were fixed, but I encountered the
stuck-packet problem again after some hours of measurement, and after that
at shorter intervals until I rebooted. I am sure this is still caused by
the RTnet box, because even restarting the remote program does not allow a
connection in this stuck condition, and once rtping has "freed" the
connection, some old packets show up in wireshark.

Vinzenz

_______________________________________________
RTnet-users mailing list
RTnet-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rtnet-users