Hi Alan,

Try increasing the TX ring buffer for the network interface, and make sure
the CPU frequency governor is not throttling the CPU (i.e. set it to
"performance" rather than "ondemand" or "powersave").
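
For example (the interface name below is illustrative; ethtool -g shows the
maximum ring size your NIC supports):

```shell
# Inspect the current and maximum ring sizes for the 10GbE interface
ethtool -g enp3s0f0
# Raise the TX ring to the NIC maximum (4096 on the Intel X520)
sudo ethtool -G enp3s0f0 tx 4096
# Set all cores to the "performance" governor
sudo cpupower frequency-set -g performance
```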

The TX samples-per-packet value was reduced because the larger frame size was
actually producing even more underruns in testing.  The trade-off is that
more, smaller packets create more load on the CPU to process the network
interrupts, and the settings above will help tune the system to handle that
load.  We are looking at ways to restore the original TX frame size and
reduce that load, but it will take significant changes to accomplish that
and those changes probably won't be available for a while.

Regards,
Michael

On Wed, Sep 5, 2018 at 1:22 PM Alan Conrad via USRP-users <
[email protected]> wrote:

> I tried Brian’s suggestion to rebuild UHD and the FPGA off of the commits
> he suggested (thanks Brian).  However, with this combination I am getting
> significantly more underruns than I did previously, even with the benchmark
> rate program.  Here’s the output of benchmark_rate that I got originally
> with UHD 4.0.0.rfnoc-devel-788-g1f8463cc.
>
>
>
> ./benchmark_rate --rx_rate 100e6 --tx_rate 100e6 --channels="0,1"
>
>
>
> Benchmark rate summary:
>
>   Num received samples:     2016651900
>
>   Num dropped samples:      0
>
>   Num overruns detected:    0
>
>   Num transmitted samples:  2005972016
>
>   Num sequence errors (Tx): 0
>
>   Num sequence errors (Rx): 0
>
>   Num underruns detected:   562
>
>   Num late commands:        0
>
>   Num timeouts (Tx):        0
>
>   Num timeouts (Rx):        0
>
>
>
> And now I get this with UHD 3.14.0.HEAD-31-g98057752.
>
>
>
> Benchmark rate summary:
>
>   Num received samples:     2001309816
>
>   Num dropped samples:      0
>
>   Num overruns detected:    0
>
>   Num transmitted samples:  1841996424
>
>   Num sequence errors (Tx): 0
>
>   Num sequence errors (Rx): 0
>
>   Num underruns detected:   353655
>
>   Num late commands:        0
>
>   Num timeouts (Tx):        0
>
>   Num timeouts (Rx):        0
>
>
>
> One difference I did notice between these two versions of UHD is the
> maximum samples per packet returned from the get_max_num_samps() function
> for both the Rx and Tx streams.  With the version from the rfnoc-devel
> branch, I get 1996 samples for both the Rx and Tx streams.  But, the UHD
> 3.14.0 version gives 1996 samples for the Rx stream but only 996 samples
> for the Tx stream.  I’m not sure if this is causing the additional
> underruns.
>
>
>
> In any case, was a change made to limit the number of transmit samples per
> packet?  Are there additional network configurations that I need to make to
> increase the maximum samples per packet for the Tx stream or to limit the
> underruns with these versions of UHD and the FPGA firmware?  BTW, setting
> “spp” in the transmit stream args does not allow more than 996 samples per
> packet.
>
>
>
> Thanks,
>
>
>
> Alan
>
>
>
>
>
> *From:* Brian Padalino <[email protected]>
> *Sent:* Tuesday, August 28, 2018 8:57 PM
> *To:* Alan Conrad <[email protected]>
> *Cc:* [email protected]
> *Subject:* Re: [USRP-users] Transmit Thread Stuck Receiving Tx Flow
> Control Packets
>
>
>
>
>
> On Tue, Aug 28, 2018 at 4:02 PM Alan Conrad via USRP-users <
> [email protected]> wrote:
>
> Hi All,
>
>
>
> I’ve been working on an application that requires two receive streams and
> two transmit streams, written using the C++ API.  I have run into a problem
> when transmitting packets and I am hoping that someone has seen something
> similar and/or may be able to shed some light on this.
>
>
>
> My application is streaming two receive and two transmit channels, each at
> 100 Msps over dual 10GigE interfaces (NIC is Intel X520-DA2).  I have two
> receive threads, each calling recv() on separate receive streams, and two
> transmit threads each calling send(), also on separate transmit streams.
> Each receive thread copies samples into a large circular buffer.  Each
> transmit thread reads samples from the buffer to be sent in the send()
> call.  So, each receive thread is paired with a transmit thread through a
> shared circular buffer, with mutex locking to prevent simultaneous access
> to the shared buffer memory.
>
>
>
> I did read in the UHD manual that recv() is not thread safe.  I assumed
> that this meant that recv() is not thread safe when called on the same
> rx_streamer from two different threads but would be ok when called on
> different rx_streamers.  If this is not the case, please let me know.
>
>
>
> On to my problem…
>
>
>
> After running for several minutes, one of the transmit threads will get
> stuck in the send() call.  Using strace to monitor the system calls it
> appears that the thread is in a loop continuously calling the
>
> poll() and recvfrom() system calls from within the UHD API.  Here’s the
> output of strace attached to one of the transmit threads after this has
> occurred.  These are the only two system calls that get logged for the
> transmit thread once this problem occurs.
>
>
>
> 11:19:04.564078 poll([{fd=62, events=POLLIN}], 1, 100) = 0 (Timeout)
>
> 11:19:04.664276 recvfrom(62, 0x5619724e90c0, 1472, MSG_DONTWAIT, NULL,
> NULL) = -1 EAGAIN (Resource temporarily unavailable)
>
> 11:19:04.664381 poll([{fd=62, events=POLLIN}], 1, 100) = 0 (Timeout)
>
> 11:19:04.764600 recvfrom(62, 0x5619724e90c0, 1472, MSG_DONTWAIT, NULL,
> NULL) = -1 EAGAIN (Resource temporarily unavailable)
>
> 11:19:04.764699 poll([{fd=62, events=POLLIN}], 1, 100) = 0 (Timeout)
>
> 11:19:04.864906 recvfrom(62, 0x5619724e90c0, 1472, MSG_DONTWAIT, NULL,
> NULL) = -1 EAGAIN (Resource temporarily unavailable)
>
>
>
> This partial stack trace shows that the transmit thread is stuck in the
> while loop in the tx_flow_ctrl() function.  I think this is happening due
> to missed or missing TX flow control packets.
>
>
>
> #0  0x00007fdb8fe4fbf9 in __GI___poll (fds=fds@entry=0x7fdb167fb510,
> nfds=nfds@entry=1, timeout=timeout@entry=100) at
> ../sysdeps/unix/sysv/linux/poll.c:29
>
> #1  0x00007fdb9186de45 in poll (__timeout=100, __nfds=1,
> __fds=0x7fdb167fb510) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
>
> #2  uhd::transport::wait_for_recv_ready (timeout=0.10000000000000001,
> sock_fd=<optimized out>) at
> /home/aconrad/rfnoc/src/uhd/host/lib/transport/udp_common.hpp:59
>
> #3  udp_zero_copy_asio_mrb::get_new (index=@0x55726266f6e8: 28,
> timeout=<optimized out>, this=<optimized out>)
>
>     at /home/aconrad/rfnoc/src/uhd/host/lib/transport/udp_zero_copy.cpp:79
>
> #4  udp_zero_copy_asio_impl::get_recv_buff (this=0x55726266f670,
> timeout=<optimized out>) at
> /home/aconrad/rfnoc/src/uhd/host/lib/transport/udp_zero_copy.cpp:226
>
> #5  0x00007fdb915d48cc in tx_flow_ctrl (fc_cache=..., async_xport=...,
> endian_conv=0x7fdb915df600 <uhd::ntohx<unsigned int>(unsigned int)>,
>
>     unpack=0x7fdb918b1090
> <uhd::transport::vrt::chdr::if_hdr_unpack_be(unsigned int const*,
> uhd::transport::vrt::if_packet_info_t&)>)
>
>     at
> /home/aconrad/rfnoc/src/uhd/host/lib/usrp/device3/device3_io_impl.cpp:345
>
>
>
> The poll() and recvfrom() calls are in the
> udp_zero_copy_asio_mrb::get_new() function in udp_zero_copy.cpp.
>
>
>
> Has anyone seen this problem before or have any suggestions on what else
> to look at to further debug this problem?  I have not yet used Wireshark to
> see what’s happening on the wire, but I’m planning to do that.  Also note
> that, if I run a single transmit/receive pair (instead of two) I don’t see
> this problem and everything works as I expect.
>
>
>
> My hardware is an X310 with the XG firmware and dual SBX-120
> daughterboards.  Here are the software versions I’m using, as displayed by
> the UHD API when the application starts.
>
>
>
> [00:00:00.000049] Creating the usrp device with:
> addr=192.168.30.2,second_addr=192.168.40.2...
>
> [INFO] [UHD] linux; GNU C++ version 7.3.0; Boost_106501;
> UHD_4.0.0.rfnoc-devel-788-g1f8463cc
>
>
>
> The host is a Dell PowerEdge R420 with 24 CPU cores and 24 GB of RAM.  I
> think the clock speed is a little lower than recommended at 2.7 GHz, but I
> thought that I could distribute the workload across the various cores to
> account for that.  Also, I have followed the instructions to set up dual 10
> GigE interfaces for the X310 here,
> https://kb.ettus.com/Using_Dual_10_Gigabit_Ethernet_on_the_USRP_X300/X310
> .
>
>
>
> Any help is appreciated.
>
>
>
> I think you're hitting this:
>
>
>
>   https://github.com/EttusResearch/uhd/issues/203
>
>
>
> Which is the same thing that I hit.  I tracked it down to something
> happening in the FPGA with the DMA FIFO.
>
>
>
> I rebuilt my FPGA and UHD off the following commits, which switch over to
> byte based flow control:
>
>
>
>   UHD commit 98057752006b5c567ed331c5b14e3b8a281b83b9
>
>   FPGA commit c7015a9a57a77c0e312f0c56e461ac479cf7f1e9
>
>
>
> And the problem disappeared for the time being.  The infinite loop still
> exists as a potential issue, but it seemed whatever was causing the lockup
> in the DMA FIFO disappeared or at least couldn't be reproduced.
>
>
>
> Give that a shot and see if it works for you, or whether you can still
> reproduce it.  We never got to the root cause of the problem.
>
>
>
> Brian
> _______________________________________________
> USRP-users mailing list
> [email protected]
> http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com
>
