On 13.05.22 14:51, Mauro S. via Xenomai wrote:
> On 05/05/22 17:04, Mauro S. via Xenomai wrote:
>> On 05/05/22 15:05, Jan Kiszka wrote:
>>> On 03.05.22 17:18, Mauro S. via Xenomai wrote:
>>>> Hi all,
>>>>
>>>> I'm trying to use RTNet with TDMA.
>>>>
>>>> I successfully set up my bus:
>>>>
>>>> - 1 Gbps speed
>>>> - 3 devices
>>>> - cycle time 1 ms
>>>> - timeslots with 200 us offset
>>>>
>>>> I wrote a simple application that receives and sends UDP packets
>>>> on the TDMA bus in parallel:
>>>>
>>>> - sendto() is done to the broadcast address, port 1111
>>>> - recvfrom() is done on port 1111
>>>>
>>>> The application sends a small packet (5 bytes) in a periodic task
>>>> with 1 ms period and priority 51. Receiving is done in a
>>>> non-periodic task with priority 50.
>>>>
>>>> The application is running on all three devices, and I can see
>>>> that packets are sent and received correctly by all of them.
>>>>
>>>> But after a while, all send() calls on all devices fail with error
>>>> EAGAIN.
>>>>
>>>> Could this error be related to some internal buffer/queue that
>>>> becomes full? Or am I missing something?
>>>
>>> When you get EAGAIN on the sender side, cleanup of TX buffers
>>> likely failed, and the socket ran out of buffers to send further
>>> frames. That may be related to TX IRQs not making it. Check whether
>>> the TX IRQ counter on the sender increases at the same pace as you
>>> send packets.
>>>
>>> Jan
>>>
>>
>> Thanks Jan for your fast answer.
>>
>> I forgot to mention that I'm using the rt_igb driver.
>>
>> I have only one IRQ field in /proc/xenomai/irq, counting both TX and
>> RX:
>>
>> cat /proc/xenomai/irq | grep rteth0
>> 125: 0 0 2312152 0 rteth0-TxRx-0
>>
>> I did this test:
>>
>> * On the master I send a packet every 1 ms in a periodic RT task
>> (period 1 ms, prio 51) with my test app.
>>
>> * On the master I see an increment of about 2000 IRQs per second: I
>> guess 1000 are for my sent packets (1 packet every ms), and 1000 for
>> the TDMA sync packet.
>> In fact I see the "rtifconfig" RX counter almost stationary (only 8
>> packets every 2-3 seconds; refresh requests from the slaves?), while
>> the TX counter increments by about 2000 packets per second.
>>
>> * On the two slaves (which are running nothing) I observe the same
>> rate (about 2000 IRQs per second). I see the "rtifconfig" TX counter
>> almost stationary (only 4 packets every 2-3 seconds), while the RX
>> counter increments by about 2000 packets per second.
>>
>> * If I stop sending packets with my app, all the rates drop to
>> about 1000 per second.
>>
>> If I start send/receive on all three devices, I see an IRQ rate of
>> around 4000 IRQs per second on all devices (1000 sync, 1000 send and
>> 1000 + 1000 receive).
>>
>> I observed that if I only send from the master and receive on the
>> slaves, the problem does not appear. Nor does it appear if I
>> send/receive from all devices, but with a packet every 2 ms.
>>
>> Could it be a CPU performance problem (4k IRQs per second being too
>> much for an Intel Atom x5-E8000 CPU @ 1.04 GHz)?
>>
>> Thanks in advance, regards
>>
>
> Hi all,
>
> I did further tests.
>
> First of all, I modified my code to wait for the TDMA sync event
> before doing a send. I'm doing it with the RTMAC_RTIOC_WAITONCYCLE
> ioctl (the .h file that defines it is not exported to userland; I had
> to copy kernel/drivers/net/stack/include/rtmac.h into my project dir
> to include it).
>
> I send one broadcast packet each TDMA cycle (1 ms) from each device
> (3 devices in total), and each device also receives the packets from
> the other two (I use two different sockets to send and receive).
>
> The first problem I detected is that the EAGAIN error happens anyway
> (only less frequently). I expected this error to disappear, since I
> send one packet synced with the TDMA cycle time, so the rtskb queue
> should remain empty (or hold at most a single queued packet). I tried
> changing the cycle time (2 ms, then 4 ms) but the problem remains.
> The only mode that seems not to produce the EAGAIN error (or at
> least produces it much less frequently) is sending the packet every
> two TDMA cycles, independently of the cycle duration (1 ms, 2 ms,
> 4 ms...).
>
> Am I missing something?
>
> Are there any benchmarks/use cases using TDMA in this manner?
>
> The second problem is that sometimes one slave stops sending and
> receiving packets. Send is blocked in RTMAC_RTIOC_WAITONCYCLE, and
> recv receives nothing. When the lockup happens, rtifconfig shows the
> dropped and overruns counters incrementing at the TDMA cycle rate
> (e.g. 250 per second for a 4 ms cycle): it seems the RX queue is
> completely stuck. Dmesg shows no errors, and /proc/xenomai/irq shows
> that the IRQ counter is almost still (1 IRQ every 2-3 seconds). A
> "rtnet stop && rtnet start" recovers from this situation. The strange
> thing is that the problematic device is always the same one. With a
> different switch the problem disappears. Could it be a problem caused
> by some switch buffering?
>
Hmm, my first try then would be using a cross-link between two nodes
and seeing if the issue is gone. If so, there is very likely some
compatibility issue with the hardware and/or the current driver
version. Keep in mind that the RTnet drivers are all aging.

Jan

--
Siemens AG, Technology Competence Center Embedded Linux
