On 2011-03-25 08:23, Jan Kiszka wrote:
> On 2011-03-24 17:28, Vinzenz Bargsten wrote:
>> On 2011-03-08 21:59, Jan Kiszka wrote:
>>> On 2011-03-08 21:21, Vinzenz Bargsten wrote:
>>>>>>>> Smells like an IRQ conflict on line 17. What devices are using it?
>>>>>>>> Check /proc/interrupts or lspci.
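[A minimal sketch of that check: a shared IRQ shows several handler names on one line of /proc/interrupts. The excerpt below is hypothetical, invented to match the setup discussed in this thread:]

```shell
# Hypothetical /proc/interrupts excerpt; on the real box, run
#   grep -E '^ *1[67]:' /proc/interrupts
# and `lspci -v` to see which devices sit on IRQ 16/17.
sample='  16:  12345  IO-APIC-fasteoi  uhci_hcd:usb3, rt_8139too
  17:    678  IO-APIC-fasteoi  8139too'

# A comma in the handler column means the IRQ line is shared:
echo "$sample" | grep ','
```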
>>>>>>>>
>>>>>>>>
>>>>>>> The real-time NIC was using IRQ 16, see
>>>>>>>
>>>>>>> Feb 28 14:37:12 robot02 kernel: [ 2700.099343] rt_8139too
>>>>>>> 0000:03:00.0:
>>>>>>> PCI->APIC IRQ transform: INT A ->    IRQ 16
>>>>>>> You are right, IRQ 16 was shared with a USB controller.
>>>>>>>
>>>>>>> As the 2nd (non-rt) 8139 NIC had its own IRQ 17,
>>>>>>> I tried to use that instead and just swapped the cables. The
>>>>>>> problem occurred, too.
>>>>>>>
>>>>>>> Then I swapped the cards in the PCI slots and the situation got
>>>>>>> worse.
>>>>>>> See list of interrupts attached (interrupts.txt)
>>>>>>>
>>>>>> Both 8139 cards are of the same type. What were your steps to ensure
>>>>>> that the right driver handles the right card?
>>>>>>
>>>>> - I use the cards parameter (well, that doesn't really ensure it)
>>>>> - the cards and cables are labeled (also with MAC addresses)
>>>>> - if I use the wrong interface, there shouldn't be any communication,
>>>>> since my program only uses the rt interface
>>>>> - one card's MAC address is registered in the university network; that's
>>>>> why I swapped the cards' PCI slots rather than just using the other one
>>>>>
>>>> The problem also occurs with only one 8139 card, which has its own
>>>> IRQ. I could not reproduce the PCI bus errors.
>>>>
>>>> I left rtping running in the background and did not get that
>>>> problem then (so far).
>>>> Maybe unrelated, or a problem of the remote side: I periodically see
>>>> high ping times, see attachment.
>>> If you have an IRQ conflict (which I still assume) but valid IRQ
>>> reasons still arrive at a sufficiently high rate, the system will
>>> continuously recover. Only if a large number of unhandled IRQs is
>>> received will Xenomai disable the chatty line.
>>>
>>>>> I also have a tulip card available, but if I remember correctly,
>>>>> loading the rt_tulip module locked up the PC. I can check again, though.
>>>>> I can also try to remove one of the 8139 NICs.
>>>> The driver is loaded but no communication is possible. In Wireshark,
>>>> 3-4 outgoing rtping packets show up, then rtping fails with "ioctl: No
>>>> buffer space available". Nothing is received.
>>> The fact that two different adapters show similar issues (IRQs are not
>>> properly delivered, which will exhaust buffer resources sooner or later)
>>> indicates that the problem is not the adapter but likely the system with
>>> its IRQ routing.
>>>
>>>> Can someone recommend a PCI network card which works well with recent
>>>> RTnet / kernels / hardware? I thought about an Intel PRO/1000 or
>>>> PRO/100 PCI card, as well as via_rhine cards, which are also quite
>>>> common.
>>>>
>>> On x86, a good way to avoid IRQ issues is to pick a recent supported
>>> card with MSI(-X) support. rt_igb is a known-to-work example.
>>>
>>> Jan
>> The setup changed in the following way:
>> - I do not use any 8139too card anymore.
>> - Instead, I use an igb card (Intel E1G42ET); to free the PCIe x16 slot
>> I use a PCI graphics card.
>>
>> The problem changed in the following way:
>> - The communication does not stop, but I encountered a delayed output
>> packet, and 2 or 3 consecutive input packets are not received (they do
>> not show up in Wireshark).
>>    I think they are on the line / sent by the remote station, but it is
>> difficult to prove.
>>    Actually, my program counts delayed (or unanswered) packets, and the
>> counter increases to 3.
>> - The rtping response times are similar (maybe a bit less frequent), i.e.
>> up to ~1000 µs every ~100 s.
>>    Not sure if this is caused by the remote station.
>>
>> What do you recommend to resolve these issues?
> - Do Xenomai's tests work flawlessly on your box? Check e.g. latency.
As far as I can interpret it, the latency test is successful.
I considered SMI problems as the cause, but I do not find any indications
(Xenomai's SMI detection is enabled and I do not see any messages).
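[For reference, a sketch of reading the latency test's output: Xenomai's `latency` tool prints RTD rows with min/avg/max columns, and the maximum is what matters here. The sample line below is made up:]

```shell
# Hypothetical output line of Xenomai's `latency` test (RTD rows show
# minimum, average and maximum latency in microseconds):
line='RTD|      2.154|      4.432|     18.970|       0|     0|      2.154|     18.970'

# Extract the per-period maximum (4th pipe-separated field):
max=$(echo "$line" | awk -F'|' '{ gsub(/ /, "", $4); print $4 }')
echo "max latency: ${max} us"
```

A steady maximum well below the sampling period, with zero overruns, is what "successful" should look like.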

> - Does the igb use MSI-X? Check the kernel log and your kernel .config
>    (CONFIG_PCI_MSI).
Yes, it is enabled. /proc/interrupts also shows that the card uses 
several MSI-X interrupts.
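[As a sketch, the number of MSI(-X) vectors the NIC owns can be counted like this; the /proc/interrupts lines below are invented, with a hypothetical interface name eth1:]

```shell
# Hypothetical MSI-X entries as they would appear in /proc/interrupts for
# an igb NIC with two TxRx queues; on the real box:
#   grep -i msi /proc/interrupts
sample=' 65:  1000  PCI-MSI-edge  eth1-TxRx-0
 66:   900  PCI-MSI-edge  eth1-TxRx-1'

# One MSI(-X) line per vector; a count > 1 confirms multi-vector MSI-X:
echo "$sample" | grep -c 'PCI-MSI'
```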
Btw., I do not understand the Xenomai FAQ:
"- CONFIG_PCI_MSI
The I-pipe patch currently does not support MSI interrupts. For more
details, see the thread at:
https://mail.gna.org/public/adeos-main/2008-08/threads.html "
Is it outdated?

> - If you don't trust the peer station, use a known-to-work Linux box
>    without RTnet for the tests.
I did that for the stuck packets when I still used the 8139too card, to
verify that really no packets are received (result: nothing was received
any longer).

Now I have checked the cause of the ping latencies; they are definitely
caused by the remote machine.
Using a non-rt Linux PC as the remote, the ping latency is constantly
ca. 70 µs.
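[A small sketch of counting such spikes instead of eyeballing the rtping output; the round-trip samples below are made up, roughly matching the ~70 µs baseline and ~1000 µs spikes mentioned above:]

```shell
# Hypothetical round-trip times in microseconds, one per line, e.g.
# extracted from rtping output:
samples='69.8
70.2
1012.4
70.1'

# Count samples that exceed a 100 us threshold:
echo "$samples" | awk '$1 > 100 { n++ } END { printf "%d spike(s) over 100 us\n", n+0 }'
```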

I was already confident that the problems were fixed, but I encountered
the stuck-packet problem again after some hours of measurement, and
after that at shorter intervals until I rebooted. I am sure this is
still caused by the RTnet box, because even restarting the remote program
does not allow a connection anymore in this stuck condition, and when
rtping "freed" the connection, some old packets showed up in Wireshark.

Vinzenz

_______________________________________________
RTnet-users mailing list
RTnet-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rtnet-users
