On Wed, Dec 11, 2019 at 9:33 AM Nate Temple <[email protected]> wrote:
> Hi Thomas,
>
> You will need to apply the changes below to the fpga-src/usrp3/top/x300/rfnoc_ce_default_inst_x310.v file. This will add additional SRAM FIFOs, which is basically what the "XGS" / SRAM image is. Make sure to start with the v3.14.1.1 FPGA sources (run "git submodule init; git submodule update" in your UHD repo after checking out v3.14.1.1).
>
> ########################################################################
>
> diff --git a/usrp3/top/x300/rfnoc_ce_default_inst_x310.v b/usrp3/top/x300/rfnoc_ce_default_inst_x310.v
> index d20a64962..bcb4c3c32 100644
> --- a/usrp3/top/x300/rfnoc_ce_default_inst_x310.v
> +++ b/usrp3/top/x300/rfnoc_ce_default_inst_x310.v
> @@ -1,4 +1,4 @@
> - localparam NUM_CE = 4; // Must be no more than 10 (6 ports taken by transport and IO connected CEs)
> + localparam NUM_CE = 6; // Must be no more than 10 (6 ports taken by transport and IO connected CEs)
>
> wire [NUM_CE*64-1:0] ce_flat_o_tdata, ce_flat_i_tdata;
> wire [63:0] ce_o_tdata[0:NUM_CE-1], ce_i_tdata[0:NUM_CE-1];
> @@ -46,7 +46,9 @@
> genvar n;
> generate
> for (n = 4; n < NUM_CE; n = n + 1) begin
> - noc_block_axi_fifo_loopback inst_noc_block_axi_fifo_loopback (
> + noc_block_axi_fifo_loopback #(
> + .STR_SINK_FIFOSIZE(15)
> + ) inst_noc_block_axi_fifo_loopback (
> .bus_clk(bus_clk), .bus_rst(bus_rst),
> .ce_clk(ce_clk), .ce_rst(ce_rst),
> .i_tdata(ce_o_tdata[n]), .i_tlast(ce_o_tlast[n]),
> .i_tvalid(ce_o_tvalid[n]), .i_tready(ce_o_tready[n]),
>
> ########################################################################
>
> After making these modifications to the FPGA sources, you can build an FPGA image with the commands:
>
> cd fpga-src/usrp3/top/x300/
> source setupenv.sh
> make X310_XG
>
> Note: even though you are calling X310_XG, it is really an "XGS" image, since it has the additional SRAM FIFOs.
>
> After the build has completed, you should write that FPGA image to the X310 using uhd_image_loader.
>
> uhd_image_loader --args "addr=192.168.40.2,type=x300" --fpga-path /path/to/x300.bit
>
> After loading the FPGA image and restarting the USRP, run uhd_usrp_probe. At the end of the output, where the RFNoC blocks are listed, you should see two additional FIFO blocks:
>
> FIFO_0
> FIFO_1
>
>
> Random performance tuning notes:
>
> * Ensure your CPU governor is set to performance:
>
> sudo apt install cpufrequtils
>
> To set performance for all cores:
>
> for ((i=0;i<$(nproc);i++)); do sudo cpufreq-set -c $i -r -g performance; done
>
> Verify with:
>
> cpufreq-info
>
> * Set your network buffers:
>
> sudo sysctl -w net.core.rmem_max=625000000
> sudo sysctl -w net.core.wmem_max=625000000
>
> * Set the MTU to 8000 on your 10Gb NICs.
>
> * Ensure your user is allowed to use real-time thread priority scheduling:
>
> https://kb.ettus.com/Building_and_Installing_the_USRP_Open-Source_Toolchain_(UHD_and_GNU_Radio)_on_Linux#Thread_priority_scheduling
> http://files.ettus.com/manual/page_general.html#general_threading
>
> * Disable hyper-threading in the BIOS. This will typically give about a 10% boost in per-core performance if you can work without the additional cores. You'll need to update your CPU core list in DPDK.
>
> * Disable KPTI for Spectre/Meltdown. If the machine is offline, I would recommend trying to disable the KPTI protections for your CPU; you may see a 10-15% performance increase.
>
> This can be done by adding the options below to GRUB_CMDLINE_LINUX_DEFAULT="" in your /etc/default/grub, then running sudo update-grub and rebooting.
>
> pti=off spectre_v2=off l1tf=off nospec_store_bypass_disable no_stf_barrier
>
> Note that this disables protections against Meltdown/Spectre (links below). So if you try this, I would recommend disconnecting that host from any Internet-connected network.
>
> https://en.wikipedia.org/wiki/Meltdown_(security_vulnerability)
> https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)
>
> * There are additional recommendations from Intel on various adjustments you can make to improve performance with DPDK:
> http://doc.dpdk.org/guides/linux_gsg/nic_perf_intel_platform.html
>
> Specifically, I would recommend trying section 10.1.3, item 3, where you isolate the CPU cores that are used for DPDK.
>
> * Here is a performance report from Intel on DPDK 17.11:
> https://fast.dpdk.org/doc/perf/DPDK_17_11_Intel_NIC_performance_report.pdf
>
> In its tables of boot and BIOS settings, the additional kernel boot options nohz_full="" and rcu_nocbs="" are added; this may help as well.
>
> Additionally, they made the changes listed below:
>
> CPU Power and Performance Policy <Performance> (you should already be doing this)
> CPU C-state Disabled
> CPU P-state Disabled
> Enhanced Intel® Speedstep® Tech Disabled
> Turbo Boost Disabled
>
>
> Regards,
> Nate Temple
>
> On Wed, Dec 11, 2019 at 9:18 AM Thomas Harder <[email protected]> wrote:
>
>> Rob,
>>
>> I am definitely interested in your custom 'txarb' RFNoC block. For now I am using TX waveforms of about 10,000 samples, so 2^15 samples would be sufficient.
>>
>> I had already been trying to find out what exactly this SRAM image means, because today I was able to set up DPDK with UHD 3.14.1, and benchmark_rate (run exactly as described in Nate's mail) was still full of underruns with the stock XG FPGA image that I downloaded with uhd_images_downloader. So I will also try to build an image with a second FIFO block, since I still have the trial version of Vivado for two more weeks.
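The storage figures in this exchange can be sanity-checked with a little arithmetic. The C++ sketch below assumes sc16 samples (4 bytes each) and a 64-bit (8-byte) RFNoC data word, and it assumes that STR_SINK_FIFOSIZE(15) in Nate's diff means log2 of the FIFO depth in 64-bit words; that last point is my reading of the parameter, not something stated in the thread. The helper names fifo_capacity_samples and duration_s are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// One sc16 sample = 16-bit I + 16-bit Q = 4 bytes.
constexpr std::uint64_t kBytesPerSample = 4;
// One RFNoC data word on the X310 = 64 bits = 8 bytes.
constexpr std::uint64_t kBytesPerWord = 8;

// Number of sc16 samples held by a FIFO that is 2^log2_words words deep.
// ASSUMPTION: STR_SINK_FIFOSIZE is log2(depth in 64-bit words).
std::uint64_t fifo_capacity_samples(unsigned log2_words) {
    assert(log2_words < 48);  // keep the shift and the product inside 64 bits
    const std::uint64_t words = std::uint64_t{1} << log2_words;
    return words * kBytesPerWord / kBytesPerSample;
}

// Playback time in seconds of `samples` at `rate_sps` samples per second.
double duration_s(std::uint64_t samples, double rate_sps) {
    assert(rate_sps > 0.0);
    return static_cast<double>(samples) / rate_sps;
}
```

Under these assumptions, duration_s(32768, 200e6) gives 163.84 µs for Rob's default 2^15-sample txarb storage, so Thomas's 10,000-sample waveform (50 µs) fits, and fifo_capacity_samples(15) gives 65,536 samples (327.68 µs at 200 MS/s) per FIFO built with STR_SINK_FIFOSIZE(15).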
>>
>> Thomas
>>
>>
>> *From: *Rob Kossler <[email protected]>
>> *Sent: *Wednesday, December 11, 2019 4:50 PM
>> *To: *Thomas Harder <[email protected]>; Nate Temple <[email protected]>
>> *Subject: *Re: [USRP-users] transmitting on two channels with replay block
>>
>> Thomas,
>>
>> I believe that Nate and I were saying basically the same thing. When he referred to an SRAM image, I believe that this means an image with the FIFO blocks. I believe that such an image needs to be built by the user (rather than downloaded using uhd_images_downloader), but I'm not 100% certain.
>>
>> If you are interested, I have a custom 'txarb' RFNoC block that implements my 2nd option below. By default, it includes storage for up to 2^15 samples, but this can be modified using an input parameter (FPGA resources permitting). This block requires some specialized behavior, but it is pretty simple. Similar to the Replay block, you need to construct a custom RFNoC graph that connects the txarb block to the Radio. When you want to stream, you stream just one full waveform to the txarb block. Once the txarb block receives end-of-burst, it automatically stops "recording the samples to memory" and begins "playing the samples from memory repeatedly". The streaming continues indefinitely until you send a new TX waveform. If the new TX waveform contains fewer than 2 samples, the streaming is turned off. There are no control registers to worry about. Timed behavior is supported because the block preserves the command time of the incoming stream from the host when it starts playing out.
>>
>> It is not terribly difficult to build this custom block, but if you haven't built out-of-tree RFNoC blocks before, it might be easiest to just put this block in-tree (in the Ettus folder structure) and manually modify the makefiles as needed. Let me know if you are interested.
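To make the 'txarb' protocol described above concrete, here is a hypothetical host-side behavioral model in C++. It is not Rob's block or its RTL: TxArbModel, load() and next() are illustrative names, samples are treated as packed sc16 values in a std::uint32_t, and rejecting a waveform larger than the storage is my assumption, since the thread does not say what the block does in that case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// HYPOTHETICAL behavioral model of the txarb protocol described in the thread.
class TxArbModel {
public:
    explicit TxArbModel(std::size_t capacity) : capacity_(capacity) {}

    // Deliver one complete waveform (everything up to and including end-of-burst).
    // Returns false if the waveform does not fit in the on-FPGA storage.
    bool load(const std::vector<std::uint32_t>& waveform) {
        if (waveform.size() < 2) {  // fewer than 2 samples: streaming is turned off
            memory_.clear();
            playing_ = false;
            return true;
        }
        if (waveform.size() > capacity_) {
            return false;  // ASSUMPTION: oversize waveforms are rejected
        }
        memory_ = waveform;  // "record" until end-of-burst...
        pos_ = 0;
        playing_ = true;     // ...then switch to repeated playback
        return true;
    }

    bool playing() const { return playing_; }

    // Next sample sent toward the radio; wraps around to loop the stored waveform.
    std::uint32_t next() {
        assert(playing_);
        const std::uint32_t sample = memory_[pos_];
        pos_ = (pos_ + 1) % memory_.size();
        return sample;
    }

private:
    std::size_t capacity_;
    std::vector<std::uint32_t> memory_;
    std::size_t pos_ = 0;
    bool playing_ = false;
};
```

Loading {10, 20, 30} makes next() return 10, 20, 30, 10, ... indefinitely, and loading a one-sample waveform stops playback, mirroring the "fewer than 2 samples" rule. Timed starts, which Rob says the real block supports, are deliberately left out of this model.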
>>
>> Rob
>>
>>
>> On Wed, Dec 11, 2019 at 10:07 AM Nate Temple <[email protected]> wrote:
>>
>> Hi Thomas,
>>
>> Instead of using the Replay block, one option could be to stream 2x 200e6 from your host.
>>
>> On the X310, this requires using an SRAM image and DPDK. DPDK support for the X310 was added in UHD 3.14.1.0, though I'd suggest using 3.14.1.1 at this time.
>>
>> Some links on DPDK:
>>
>> https://www.dpdk.org/
>> http://files.ettus.com/manual/page_dpdk.html
>>
>> I've been able to run 2x2 @ 200e6 with the X310 with DPDK using a 4 GHz CPU.
>>
>> ./benchmark_rate --rx_rate 200e6 --rx_channels 0,1 --tx_rate 200e6 --tx_channels 0,1 --args "addr=192.168.10.2,second_addr=192.168.20.2,use_dpdk=1,num_recv_frames=512,enable_tx_dual_eth=1,skip_ddc=1,skip_duc=1"
>>
>> num_recv_frames=512 can help if you're seeing overflows.
>>
>> enable_tx_dual_eth=1 is required for 2x TX @ 200e6.
>>
>> skip_ddc=1,skip_duc=1 can help as well, since you'd be sending at the full rate.
>>
>>
>> Regards,
>> Nate Temple
>>
>>
>> On Wed, Dec 11, 2019 at 7:03 AM Rob Kossler via USRP-users <[email protected]> wrote:
>>
>> I do not think it is possible using the stock FPGA image. However, I can think of a couple of possibilities:
>>
>> · On the N310, Ettus includes 4 FIFO blocks (rather than the DmaFIFO, which uses the off-FPGA RAM for memory) to provide 4x125 MS/s streaming capability. Perhaps if you built an X310 FPGA image with 2 such FIFO blocks, you could use these rather than the DmaFIFO and achieve the desired streaming. Note that this requires a Vivado license to build your own FPGA image, but does not require FPGA experience, because you would be building an image with "stock" blocks. One caution, though: streaming at this very high rate still requires a high-performance host, so it is still possible that you would have underruns if your host could not keep up.
>> If you go this route, I believe you will likely need to use the "DPDK" capability, which is a bit of a pain to configure and get working properly.
>>
>> · Another possibility is to create a custom RFNoC block that is similar to the replay block but uses FPGA memory to store a fixed-duration waveform and then plays it out cyclically like the replay block. The Ettus 'window' RFNoC block provides a good example of how to store coefficients and play them out repeatedly. However, making the needed modifications is not a trivial task unless you are pretty good at FPGA programming.
>>
>> Given that you were trying the replay block, I'm guessing that your TX waveforms are of fixed duration. What duration (in number of samples) do you require?
>>
>> Rob
>>
>>
>> On Wed, Dec 11, 2019 at 5:05 AM Thomas Harder <[email protected]> wrote:
>>
>> Thank you, Rob, for this comment.
>>
>> But I am not sure I understand you correctly. Are you saying that it is *IMPOSSIBLE* to transmit two different, synchronized waveforms on the 2 channels of the X310 with the full bandwidth of 200 MS/s on each channel?
>>
>> That is what I have been trying to do full time for the last 6 months, starting with LabVIEW under Windows and then UHD under Linux on a Dell Precision 5820 desktop (16 GB RAM, Intel Xeon W-2125 CPU @ 4 GHz x8), with an MXI connection, a dual 10 Gbit connection (Intel X520-DA2), and recently the replay block. The result is always the same: continuous underruns.
>>
>> If you can confirm that this is not possible without a major FPGA change (I have no experience in this field and do not have the time to invest in it), I must look for another way to create two different, synchronized RF waveforms with 160 MHz bandwidth (optical, electronic, …), because this will be just one part of my experimental setup, but it is crucial for moving forward.
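A back-of-the-envelope calculation shows why the dual 10 Gbit connection and Nate's enable_tx_dual_eth=1 option matter at these rates. The sketch below assumes sc16 samples (32 bits each), counts payload only (CHDR, UDP, IP and Ethernet headers add overhead on top), and assumes that each channel's stream is carried on a single link rather than split across links. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kBitsPerSc16Sample = 32;  // 16-bit I + 16-bit Q

// Payload bit rate of one sc16 stream; ignores all protocol header overhead.
std::uint64_t stream_payload_bps(std::uint64_t rate_sps) {
    return rate_sps * kBitsPerSc16Sample;
}

// Links of capacity `link_bps` needed for `channels` streams at `rate_sps`,
// ASSUMING each stream stays on one link. Returns 0 if a single stream
// alone exceeds one link.
std::uint64_t links_needed(std::uint64_t channels, std::uint64_t rate_sps,
                           std::uint64_t link_bps) {
    assert(rate_sps > 0);
    const std::uint64_t per_stream = stream_payload_bps(rate_sps);
    const std::uint64_t streams_per_link = link_bps / per_stream;
    if (streams_per_link == 0) {
        return 0;  // one stream does not fit on one link
    }
    return (channels + streams_per_link - 1) / streams_per_link;  // ceiling division
}
```

At 200 MS/s, one sc16 stream already carries 6.4 Gbit/s of payload, so two such streams (12.8 Gbit/s) cannot share one 10 Gbit link and need one link each; at 100 MS/s per channel, both would fit on a single link.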
>>
>> I am thankful for any advice,
>>
>> Thomas
>>
>>
>> *From: *Rob Kossler <[email protected]>
>> *Sent: *Tuesday, December 10, 2019 5:01 AM
>> *To: *Thomas Harder <[email protected]>
>> *Cc: *Sam Reiter <[email protected]>; [email protected]
>> *Subject: *Re: [USRP-users] transmitting on two channels with replay block
>>
>> Apart from solving the underrun issue, there is also an issue with synchronization. The replay block doesn't presently support timed commands.
>>
>> And, as a side note, the problem with streaming from the host is not just the host. The DMA FIFO has a maximum bandwidth of something like 600 MS/s (all inputs and outputs combined), which precludes streaming 400 MS/s into and out of the block simultaneously (800 MS/s in total). So, even if the host could keep up, the FIFO could not.
>>
>> Rob
>>
>>
>> On Mon, Dec 9, 2019 at 4:34 AM Thomas Harder via USRP-users <[email protected]> wrote:
>>
>> Hi Sam,
>>
>> Thank you for your reply.
>>
>> This morning I set the MCR to 184.32 MHz and I am still getting continuous underruns, even though I also use
>>
>> replay_ctrl->get_record_fullness
>>
>> for both channels.
>>
>> But since I need the full bandwidth of 160 MHz, I would like to implement a second replay block in my FPGA image.
>>
>> Could anyone help me with this?
>>
>> I am really new to FPGA programming, and for the image with one replay block I just followed the instructions in https://kb.ettus.com/Using_the_RFNoC_Replay_Block.
>>
>> Thank you,
>>
>> Thomas
>>
>>
>> *From: *Sam Reiter <[email protected]>
>> *Sent: *Friday, December 6, 2019 10:23 PM
>> *To: *Thomas Harder <[email protected]>
>> *Cc: *[email protected]
>> *Subject: *Re: [USRP-users] transmitting on two channels with replay block
>>
>> Thomas,
>>
>> Upon further investigation, we may be running into a practical limit of a single CHDR interface rather than an issue with your code.
>> A single replay block servicing two radios will have a max (theoretical) rate of 187.5 MSPS on either channel. This means that you might be able to squeeze full rate out of 2 channels with an MCR of 184.32, but that's cutting it pretty close. It sounds like 2 channels at 200 MSPS with a replay setup will require 2 replay blocks, each serving one channel independently. If you end up trying either of the above, I'd be curious to know what results you observe.
>>
>> Sam Reiter
>> Ettus Research
>>
>>
>> On Fri, Dec 6, 2019 at 2:38 PM Sam Reiter <[email protected]> wrote:
>>
>> Thomas,
>>
>> I'd need to set it up on my end, but I believe you can TX two distinct waveforms from a single replay block instance. You'd need to make sure that you're adding your data to the buffer in separate locations and at an address that is a multiple of 8 bytes (which it looks like you're doing from the above snippets). Are you seeing continuous underruns, or just a handful at the beginning of the run? Does your duplicated code also use:
>>
>> replay_ctrl->get_record_fullness
>>
>> on both channels before kicking off the stream start?
>>
>> Sam Reiter
>> Ettus Research
>>
>>
>> On Wed, Dec 4, 2019 at 3:48 AM Thomas Harder via USRP-users <[email protected]> wrote:
>>
>> Hello everyone,
>>
>> Is it possible to transmit two different waveforms on the two channels of the USRP X310 with the two UBX-160 daughterboards?
>>
>> I want to transmit two different waveforms simultaneously (synchronized) on the two channels of the USRP at the full sample rate of 200 MS/s. I already tried to do it with a dual 10 Gbit Ethernet connection, and I seemed to be limited by my computer. Now I am trying to do it with the replay block.
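The two bandwidth ceilings quoted earlier in this thread can be compared against the requested rates with simple arithmetic. This sketch takes Sam's theoretical 187.5 MSPS per channel (one replay block serving two radios) and Rob's approximate 600 MS/s aggregate for the DMA FIFO as given; both are estimates from the thread, not measured values, and the helper names are hypothetical.

```cpp
#include <cassert>

// Ceilings quoted in the thread (estimates, in MS/s).
constexpr double kReplayPerChannelMsps = 187.5;  // Sam: one replay block, two radios
constexpr double kDmaFifoTotalMsps = 600.0;      // Rob: DMA FIFO, all ports combined

// True if `demand` fits under `ceiling` (same units, e.g. MS/s).
bool fits(double demand, double ceiling) { return demand <= ceiling; }

// Remaining margin as a fraction of the ceiling; negative means over budget.
double headroom(double demand, double ceiling) {
    assert(ceiling > 0.0);
    return (ceiling - demand) / ceiling;
}

// Aggregate DMA FIFO traffic when every channel's samples pass both into and
// out of the FIFO (host -> FIFO -> radio), as in Rob's 400 + 400 example.
double dma_fifo_load(int channels, double rate_msps) {
    return 2.0 * channels * rate_msps;
}
```

With these numbers, an MCR of 184.32 MS/s clears the 187.5 MSPS ceiling with only about 1.7% headroom, 200 MS/s does not fit, and two channels at 200 MS/s through the DMA FIFO need 800 MS/s against roughly 600 MS/s available.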
>>
>> I built the FPGA image with one Replay block as described in https://kb.ettus.com/Using_the_RFNoC_Replay_Block to run the example "replay_samples_from_file", and it works fine if I transmit on just one channel. I then modified the code to connect the replay block to both channels:
>>
>> replay_graph->connect(replay_ctrl->get_block_id(), replay_chan, tx_blockid, tx_chan, replay_spp);
>> replay_graph->connect(replay_ctrl->get_block_id(), replay_chan1, tx_blockid1, tx_chan, replay_spp);
>>
>> and to write the same waveform into another region of the DRAM buffer:
>>
>> replay_ctrl->config_record(0, words_to_replay*replay_word_size, replay_chan);
>> replay_ctrl->config_record(20000, words_to_replay*replay_word_size, replay_chan1);
>>
>> and
>>
>> replay_ctrl->config_play(0, words_to_replay*replay_word_size, replay_chan);
>> replay_ctrl->config_play(20000, words_to_replay*replay_word_size, replay_chan1);
>>
>> where
>>
>> words_to_replay*replay_word_size = 16000
>> replay_chan = 0
>> replay_chan1 = 1
>> tx_blockid = 0/Radio_0
>> tx_blockid1 = 0/Radio_1
>>
>> Then I stream my waveforms to the replay block as in the example and start replaying the data:
>>
>> replay_ctrl->issue_stream_cmd(stream_cmd, replay_chan);
>> replay_ctrl->issue_stream_cmd(stream_cmd, replay_chan1);
>>
>> It works, but with plenty of underflows!!
>>
>> So what does the manual mean when it says:
>>
>> "Note that the record and playback buffers do not need to be the same, allowing a single Replay block to both record and playback to different regions of memory *simultaneously*."
>>
>> (https://kb.ettus.com/Using_the_RFNoC_Replay_Block)?
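Sam's two conditions for sharing one replay block between channels (separate buffer regions, and base addresses that are multiples of 8 bytes) can be checked mechanically before calling config_record() and config_play(). This is a hypothetical helper, not part of the UHD API: Region and regions_ok are illustrative names, and the byte offsets and sizes are the values Thomas passes in the snippets above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A record/playback region in the replay block's DRAM buffer, in bytes.
struct Region {
    std::uint32_t base;  // byte offset, as passed to config_record/config_play
    std::uint32_t size;  // byte length
};

constexpr std::uint32_t kAlign = 8;  // per Sam: addresses must be multiples of 8 bytes

bool aligned(const Region& r) { return r.base % kAlign == 0; }

// Regions are half-open intervals [base, base + size).
bool overlaps(const Region& a, const Region& b) {
    assert(a.size > 0 && b.size > 0);
    const std::uint64_t a_end = std::uint64_t{a.base} + a.size;
    const std::uint64_t b_end = std::uint64_t{b.base} + b.size;
    return a.base < b_end && b.base < a_end;
}

// True if every region is 8-byte aligned and no two regions overlap.
bool regions_ok(const std::vector<Region>& regions) {
    for (std::size_t i = 0; i < regions.size(); ++i) {
        if (!aligned(regions[i])) return false;
        for (std::size_t j = i + 1; j < regions.size(); ++j) {
            if (overlaps(regions[i], regions[j])) return false;
        }
    }
    return true;
}
```

Thomas's regions [0, 16000) and [20000, 36000) pass both checks. Assuming sc16 data, each holds 4,000 samples, or 20 µs at 200 MS/s. Since the layout checks out, this supports Sam's later point that the underruns come from a throughput limit rather than from the buffer addressing.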
>>
>> But the manual also says:
>>
>> "The replay block has the following features: One input and *one* output"
>>
>> (https://files.ettus.com/manual/classuhd_1_1rfnoc_1_1replay__block__ctrl.html)
>>
>> So if the replay block has just one output, why does it have two channels connected to it (replay_chan = 0 and 1)?
>>
>> If one replay block can only stream to one channel at a time, can I easily implement a second replay block in the FPGA to stream two different waveforms simultaneously on the two channels of my USRP?
>>
>> Thank you,
>>
>> Thomas
>>
>>
>> _______________________________________________
>> USRP-users mailing list
>> [email protected]
>> http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com
>
