On Wed, Dec 11, 2019 at 9:33 AM Nate Temple <[email protected]> wrote:

> Hi Thomas,
>
> You will need to apply these changes below to the
> fpga-src/usrp3/top/x300/rfnoc_ce_default_inst_x310.v file. This will add
> additional SRAM FIFOs, which is basically what the "XGS" / SRAM image is.
> Make sure to start with the v3.14.1.1 fpga sources. (run git submodule
> init; git submodule update; in your UHD repo after checking out v3.14.1.1).
>
> ########################################################################
>
> diff --git a/usrp3/top/x300/rfnoc_ce_default_inst_x310.v b/usrp3/top/x300/rfnoc_ce_default_inst_x310.v
> index d20a64962..bcb4c3c32 100644
> --- a/usrp3/top/x300/rfnoc_ce_default_inst_x310.v
> +++ b/usrp3/top/x300/rfnoc_ce_default_inst_x310.v
> @@ -1,4 +1,4 @@
> -  localparam NUM_CE = 4;  // Must be no more than 10 (6 ports taken by transport and IO connected CEs)
> +  localparam NUM_CE = 6;  // Must be no more than 10 (6 ports taken by transport and IO connected CEs)
>
>   wire [NUM_CE*64-1:0] ce_flat_o_tdata, ce_flat_i_tdata;
>   wire [63:0]          ce_o_tdata[0:NUM_CE-1], ce_i_tdata[0:NUM_CE-1];
> @@ -46,7 +46,9 @@
>   genvar n;
>   generate
>     for (n = 4; n < NUM_CE; n = n + 1) begin
> -      noc_block_axi_fifo_loopback inst_noc_block_axi_fifo_loopback (
> +      noc_block_axi_fifo_loopback #(
> +        .STR_SINK_FIFOSIZE(15)
> +      ) inst_noc_block_axi_fifo_loopback (
>         .bus_clk(bus_clk), .bus_rst(bus_rst),
>         .ce_clk(ce_clk), .ce_rst(ce_rst),
>         .i_tdata(ce_o_tdata[n]), .i_tlast(ce_o_tlast[n]), .i_tvalid(ce_o_tvalid[n]), .i_tready(ce_o_tready[n]),
>
> ########################################################################
>
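To put the STR_SINK_FIFOSIZE(15) setting from the diff above into numbers, here is a rough Python sketch. It assumes that the parameter is the log2 of the FIFO depth in 64-bit words and that samples are sc16 (4 bytes per complex sample). Neither assumption is stated in this thread, so please check both against the noc_shell sources before relying on the result.

```python
# Rough sizing of one loopback FIFO block. Two assumptions that are NOT
# confirmed in the thread: STR_SINK_FIFOSIZE is log2(depth in 64-bit words),
# and samples are sc16 on the wire (4 bytes per complex sample).

WORD_BYTES = 8      # one 64-bit word
SAMPLE_BYTES = 4    # sc16: 16-bit I + 16-bit Q (assumption)


def fifo_capacity(log2_words, sample_rate):
    """Return the FIFO depth in bytes, samples, and microseconds of buffering."""
    size_bytes = (1 << log2_words) * WORD_BYTES
    samples = size_bytes // SAMPLE_BYTES
    return {
        "bytes": size_bytes,
        "samples": samples,
        "microseconds": samples / sample_rate * 1e6,
    }


capacity = fifo_capacity(15, 200e6)
print(capacity)
```

Under those assumptions each FIFO block holds 256 KiB, which gives a feel for how much host-side jitter the buffer can absorb at full rate.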
>
> After making these modifications to the FPGA sources, you can build an FPGA
> image with the commands:
>
> cd fpga-src/usrp3/top/x300/
> source setupenv.sh
> make X310_XG
>
> Note: Even though the make target is X310_XG, the result is really an "XGS"
> image, since it has the additional SRAM FIFOs.
>
> After that has completed building, you should write that FPGA image to the
> X310 using uhd_image_loader.
>
> uhd_image_loader --args "addr=192.168.40.2,type=x300" --fpga-path /path/to/x300.bit
>
> After the FPGA image load and restarting the USRP, run uhd_usrp_probe and
> at the end of the output where the RFNoC blocks are listed, you should see
> two additional FIFO blocks:
>
> FIFO_0
> FIFO_1
>
>
>
>
>
> Random performance tuning notes:
>
> * Ensure your CPU governor is set to performance:
>
> sudo apt install cpufrequtils
>
> To set performance for all cores:
>
> for ((i=0;i<$(nproc);i++)); do sudo cpufreq-set -c $i -r -g performance;
> done
>
>
> Verify with:
>
> cpufreq-info
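As an alternative to reading the cpufreq-info output by eye, the governor check can be sketched in Python. This is a minimal sketch that assumes the standard Linux sysfs layout (cpuN/cpufreq/scaling_governor); the function name is made up for illustration.

```python
from pathlib import Path


def cores_not_in_performance(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every core whose governor is not 'performance'."""
    offenders = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        governor = gov_file.read_text().strip()
        if governor != "performance":
            # gov_file is .../cpuN/cpufreq/scaling_governor, so go up two levels
            offenders[gov_file.parent.parent.name] = governor
    return offenders


# Usage on the host that runs UHD:
#   print(cores_not_in_performance() or "all cores are set to performance")
```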
>
> * Set your network buffers
>
> sudo sysctl -w net.core.rmem_max=625000000
> sudo sysctl -w net.core.wmem_max=625000000
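To confirm that the two sysctl settings above actually took effect, a small Python check could look like the following. It is only a sketch: it reads the usual /proc/sys/net/core files, and the helper names are invented for this example.

```python
from pathlib import Path

TARGET_BYTES = 625_000_000  # value used in the sysctl commands above


def buffer_limits(proc_root="/proc/sys/net/core"):
    """Read the current rmem_max and wmem_max values as integers."""
    limits = {}
    for name in ("rmem_max", "wmem_max"):
        limits[name] = int((Path(proc_root) / name).read_text().split()[0])
    return limits


def too_small(limits, target=TARGET_BYTES):
    """Return the names of the limits that are below the target."""
    return [name for name, value in limits.items() if value < target]


# Usage on the host:
#   print(too_small(buffer_limits()) or "network buffers are large enough")
```

Note that sysctl -w does not persist across reboots, so a check like this is worth repeating after a restart.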
>
> * Set the MTU to 8000 on your 10Gb NICs
>
> * Ensure thread priority scheduling is enabled for your user
>
> https://kb.ettus.com/Building_and_Installing_the_USRP_Open-Source_Toolchain_(UHD_and_GNU_Radio)_on_Linux#Thread_priority_scheduling
>
> http://files.ettus.com/manual/page_general.html#general_threading
>
>
> * Disable hyper-threading in the BIOS. This will typically give about a 10%
> boost in per-core performance if you can work without the additional logical
> cores. You'll need to update your CPU core list in DPDK.
>
> * Disable the KPTI and related Spectre/Meltdown mitigations. If the machine
> is offline, I would recommend trying this; you may see a 10-15% performance
> increase.
>
> This can be done by adding the parameters below to
> GRUB_CMDLINE_LINUX_DEFAULT="" in /etc/default/grub, then running
> sudo update-grub and rebooting.
>
> pti=off spectre_v2=off l1tf=off nospec_store_bypass_disable no_stf_barrier
>
> Note that this disables protections against Meltdown/Spectre (links below).
> If you try this, I would recommend disconnecting that host from any
> internet-connected network.
>
> https://en.wikipedia.org/wiki/Meltdown_(security_vulnerability)
> https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)
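The GRUB edit described above can be sketched as a small Python helper that merges the parameters into an existing GRUB_CMDLINE_LINUX_DEFAULT line without duplicating them. The function name is hypothetical, and the sketch only transforms text; writing the file back and running update-grub are left to you.

```python
import re

# Parameters quoted in the message above
MITIGATIONS_OFF = [
    "pti=off",
    "spectre_v2=off",
    "l1tf=off",
    "nospec_store_bypass_disable",
    "no_stf_barrier",
]

_CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)


def add_kernel_params(grub_text, params):
    """Return grub_text with params appended to GRUB_CMDLINE_LINUX_DEFAULT."""

    def merge(match):
        current = match.group(2).split()
        merged = current + [p for p in params if p not in current]
        return match.group(1) + " ".join(merged) + match.group(3)

    new_text, count = _CMDLINE.subn(merge, grub_text, count=1)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT not found")
    return new_text


example = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
print(add_kernel_params(example, MITIGATIONS_OFF))
```

Existing options such as "quiet splash" are kept, and running the helper twice leaves the line unchanged.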
>
> * There are additional recommendations here from Intel on various
> adjustments you can do to improve performance with DPDK:
> http://doc.dpdk.org/guides/linux_gsg/nic_perf_intel_platform.html
>
> Specifically I would recommend to try section 10.1.3 #3 where you isolate
> the CPU cores that are used for DPDK.
>
> * Here is a performance report from Intel on DPDK 17.11:
> https://fast.dpdk.org/doc/perf/DPDK_17_11_Intel_NIC_performance_report.pdf
>
> In the tables of boot and BIOS settings, the additional kernel options
> nohz_full="" and rcu_nocbs="" are added to their kernel command line; this
> may help as well.
>
> Additionally they made the changes listed below:
>
> CPU Power and Performance Policy <Performance> (you should already be
> doing this)
> CPU C-state Disabled
> CPU P-state Disabled
> Enhanced Intel® Speedstep® Tech Disabled
> Turbo Boost Disabled
>
>
>
>
> Regards,
> Nate Temple
>
> On Wed, Dec 11, 2019 at 9:18 AM Thomas Harder <[email protected]>
> wrote:
>
>> Rob,
>>
>> I am definitely interested in your custom ‘txarb’ RFNoC block. For now I
>> am using tx waveforms of about 10,000 samples, so the 2^15 samples would be
>> sufficient.
>>
>> I was already trying to find out what exactly this SRAM image means. Today
>> I was able to set up DPDK with UHD 3.14.1, and the benchmark_rate run
>> (exactly as described in Nate's mail) was still full of underruns
>> with the stock XG FPGA image, which I downloaded with uhd_images_downloader.
>> So I will also try to build a second FIFO block, since I still have the
>> trial version of Vivado for two more weeks.
>>
>> Thomas
>>
>>
>>
>>
>>
>> *From: *Rob Kossler <[email protected]>
>> *Sent: *Wednesday, December 11, 2019 4:50 PM
>> *To: *Thomas Harder <[email protected]>; Nate Temple
>> <[email protected]>
>> *Subject: *Re: [USRP-users] transmitting on two channels with replay
>> block
>>
>>
>>
>> Thomas,
>>
>> I believe that Nate and I were saying basically the same thing.  When he
>> referred to an SRAM image, I believe that this means an image with the FIFO
>> blocks.  I believe that such an image needs to be built by the user (rather
>> than downloaded using uhd_images_downloader), but I'm not 100% certain.
>>
>>
>>
>> If you are interested, I have a custom 'txarb' RFNoC block that
>> implements my 2nd option below.  By default, it includes storage of up to
>> 2^15 samples, but this can be modified using an input parameter (FPGA
>> resources permitting). This block requires some specialized behavior, but
>> it is pretty simple.  Similar to the Replay block, you need to construct a
>> custom RFNoC graph that connects the txarb block to the Radio.  When you
>> want to stream, you need to stream just one full waveform to the
>> txarb block.  Once the txarb block receives end-of-burst, it will
>> automatically stop "recording the samples to memory" and begin "playing the
>> samples from memory repeatedly".  The streaming will continue indefinitely
>> until you send a new tx waveform.  If the new tx waveform contains less
>> than 2 samples, the streaming is turned off.  There are no control
>> registers to worry about. Timed behavior is supported because the block
>> preserves the command time of the incoming stream from the host when it
>> starts playing out.
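The record-then-loop behaviour described above can be illustrated with a toy host-side model in Python. This is not Rob's FPGA code: the class and method names are invented, timed commands are not modelled, and the only facts taken from the message are the default 2^15-sample storage, looping after end-of-burst, and turning off when a new waveform has fewer than 2 samples.

```python
class TxArbModel:
    """Toy model of the described txarb behaviour (illustrative only)."""

    def __init__(self, capacity=2**15):
        self.capacity = capacity   # default storage mentioned in the message
        self.memory = []
        self.position = 0
        self.playing = False

    def load(self, waveform):
        """Model one complete burst (ending in end-of-burst) from the host."""
        if len(waveform) > self.capacity:
            raise ValueError("waveform does not fit in the block's memory")
        if len(waveform) < 2:
            # A waveform of fewer than 2 samples switches streaming off
            self.memory, self.playing = [], False
        else:
            self.memory, self.playing = list(waveform), True
        self.position = 0

    def play(self, count):
        """Return the next `count` output samples, looping over the memory."""
        if not self.playing:
            return []
        out = []
        for _ in range(count):
            out.append(self.memory[self.position])
            self.position = (self.position + 1) % len(self.memory)
        return out


block = TxArbModel()
block.load([1, 2, 3])
print(block.play(7))
```

A waveform of about 10,000 samples, as mentioned elsewhere in this thread, fits within the default 2^15-sample capacity.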
>>
>>
>>
>> It is not terribly difficult to build this custom block, but if you
>> haven't built out-of-tree RFNOC blocks before, it might be easiest to just
>> put this block in-tree (in the Ettus folder structure) and manually modify
>> makefiles as needed. Let me know if you are interested.
>>
>> Rob
>>
>>
>>
>>
>>
>> On Wed, Dec 11, 2019 at 10:07 AM Nate Temple <[email protected]>
>> wrote:
>>
>> Hi Thomas,
>>
>> One option instead of using the Replay block could be to stream 2x 200e6
>> from your host.
>>
>> On the X310, this requires using a SRAM image and DPDK. DPDK support was
>> added with UHD 3.14.1.0 for the X310, I'd suggest to use 3.14.1.1 at this
>> time though.
>>
>> Some links on DPDK:
>>
>> https://www.dpdk.org/
>> http://files.ettus.com/manual/page_dpdk.html
>>
>> I've been able to run 2x2 @ 200e6 with the X310 with DPDK using a 4 GHz
>> CPU.
>>
>> ./benchmark_rate --rx_rate 200e6 --rx_channels 0,1 --tx_rate 200e6
>> --tx_channels 0,1 --args
>> "addr=192.168.10.2,second_addr=192.168.20.2,use_dpdk=1,num_recv_frames=512,enable_tx_dual_eth=1,skip_ddc=1,skip_duc=1"
>>
>> num_recv_frames=512 can help if you're seeing overflows.
>>
>> enable_tx_dual_eth=1 is required for 2x TX @ 200e6
>>
>> skip_ddc=1,skip_duc=1 can help as well since you'd be sending at full
>> rate.
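A back-of-the-envelope Python sketch shows why enable_tx_dual_eth=1 matters at this rate, and how the long --args string can be assembled without typos. It assumes sc16 samples (4 bytes each) and ignores packet header overhead; the helper names are made up, while the option names and values are the ones from the command above.

```python
SAMPLE_BYTES = 4  # assumption: sc16 over the wire (16-bit I + 16-bit Q)


def payload_gbps(sample_rate, channels=1):
    """Raw sample payload in Gbit/s, ignoring packet headers."""
    return sample_rate * SAMPLE_BYTES * 8 * channels / 1e9


def device_args(options):
    """Join a dict of options into a UHD-style comma-separated args string."""
    return ",".join(f"{key}={value}" for key, value in options.items())


ARGS = device_args({
    "addr": "192.168.10.2",
    "second_addr": "192.168.20.2",
    "use_dpdk": 1,
    "num_recv_frames": 512,
    "enable_tx_dual_eth": 1,
    "skip_ddc": 1,
    "skip_duc": 1,
})

print(ARGS)
print(payload_gbps(200e6), payload_gbps(200e6, channels=2))
```

Under the sc16 assumption, one 200 MS/s channel fits on a 10 Gbit link but two do not, which is consistent with needing both links for 2x TX.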
>>
>>
>>
>> Regards,
>> Nate Temple
>>
>>
>>
>> On Wed, Dec 11, 2019 at 7:03 AM Rob Kossler via USRP-users <
>> [email protected]> wrote:
>>
>> I do not think it is possible using the stock FPGA image.  However, I can
>> think of a couple of possibilities
>>
>> ·       On the N310, Ettus includes 4 FIFO blocks (rather than the
>> DmaFIFO which used the off-FPGA RAM for memory), to provide capability for
>> 4x125 MS/s streaming. Perhaps if you built an X310 FPGA image with 2 such
>> FIFO blocks, you could use these rather than the DmaFIFO and achieve the
>> desired streaming.  Note that this requires a Vivado license to build your
>> own FPGA image, but does not require FPGA experience because you would be
>> building an image with "stock" blocks.  One caution though is that
>> streaming at this very high rate still requires a high performance host and
>> so it is still possible that you would have underruns if your host could
>> not keep up.  If you go this route, I believe you will likely need to use
>> the "DPDK" capability which is a bit of a pain to configure and get it
>> working properly.
>>
>> ·       Another possibility is to create a custom RFNoC block that is
>> similar to the replay block but that uses FPGA memory to store a fixed
>> duration waveform and then plays it out cyclically like the replay block.
>> The Ettus 'window' RFNoC block provides a good example of how to store
>> coefficients and play them out repeatedly.  But, making the needed
>> modifications is not a trivial task except for someone who is pretty good
>> at FPGA programming.
>>
>> Given that you were trying the replay block, I'm guessing that your Tx
>> waveforms are of fixed duration.  What is the duration (in number of
>> samples) that you require?
>>
>> Rob
>>
>>
>>
>> On Wed, Dec 11, 2019 at 5:05 AM Thomas Harder <[email protected]>
>> wrote:
>>
>> Thank you Rob for this comment.
>>
>> But I am not sure I understand you correctly. Are you saying that
>> it is *IMPOSSIBLE* to transmit two different, synchronized waveforms on
>> the 2 channels of the X310 with the full bandwidth of 200 MS/s on each
>> channel?
>>
>> That is what I have been trying to do full time for the last 6 months,
>> starting with LabVIEW under Windows and then UHD under Linux with a Dell
>> Precision 5820 desktop (16GB RAM, Intel Xeon W-2125 CPU@ 4.GHz x8) with an
>> MXI connection, a dual 10Gbit connection (Intel X520-DA2), and recently the
>> replay block. The result is always the same: continuous underruns.
>>
>> If you can confirm that this is not possible without a significant FPGA
>> change (I have no experience in this field and do not have the time
>> to invest in it), I must look for another solution to create two
>> different synchronized RF waveforms with 160 MHz bandwidth (optical,
>> electronic, …), because this is just one part of my experimental setup,
>> but it is crucial for going on.
>>
>> I am thankful for any advice,
>>
>> Thomas
>>
>>
>>
>>
>>
>> *From: *Rob Kossler <[email protected]>
>> *Sent: *Tuesday, December 10, 2019 5:01 AM
>> *To: *Thomas Harder <[email protected]>
>> *Cc: *Sam Reiter <[email protected]>; [email protected]
>> *Subject: *Re: [USRP-users] transmitting on two channels with replay
>> block
>>
>>
>>
>> Apart from solving the underrun issue, there is also an issue with
>> synchronization.  The replay block doesn't presently support timed commands.
>>
>>
>>
>> And, as a side note, the issue with streaming from the host is not just
>> the host.  The DMA FIFO has a maximum bandwidth of something like 600 MS/s
>> (combination of all inputs and outputs) that precludes streaming 400 MS/s
>> in and out of the block simultaneously.  So, even if the host could keep
>> up, the FIFO could not.
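The DMA FIFO argument above is simple arithmetic, sketched here in Python. The 600 MS/s figure is the approximate limit quoted in the message ("something like 600 MS/s"), not a measured or documented value, and the function name is invented.

```python
DMA_FIFO_LIMIT_MSPS = 600  # approximate combined in+out limit quoted above


def dma_fifo_load_msps(rate_msps, channels):
    """Total FIFO traffic: every sample is written once and read once."""
    return 2 * rate_msps * channels


for channels in (1, 2):
    load = dma_fifo_load_msps(200, channels)
    verdict = "fits" if load <= DMA_FIFO_LIMIT_MSPS else "exceeds the limit"
    print(f"{channels} channel(s) at 200 MS/s: {load} MS/s total, {verdict}")
```

With two channels, 400 MS/s go in and 400 MS/s come out, so the combined load is above the quoted limit regardless of how fast the host is.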
>>
>> Rob
>>
>>
>>
>> On Mon, Dec 9, 2019 at 4:34 AM Thomas Harder via USRP-users <
>> [email protected]> wrote:
>>
>> Hi Sam,
>>
>> Thank you for your reply.
>>
>> This morning I set the MCR to 184.32 and I am still having continuous
>> underruns using also
>>
>> replay_ctrl->get_record_fullness
>>
>> for both channels.
>>
>>
>>
>> But since I need the full bandwidth of 160 MHz, I would like to implement a
>> second replay block in my FPGA image.
>>
>>
>>
>> Could anyone help me with this?
>>
>> I am really new to FPGA programming, and for the image with one replay
>> block I was just following the instructions in
>> https://kb.ettus.com/Using_the_RFNoC_Replay_Block.
>>
>> Thank you,
>>
>> Thomas
>>
>>
>>
>>
>>
>> *From: *Sam Reiter <[email protected]>
>> *Sent: *Friday, December 6, 2019 10:23 PM
>> *To: *Thomas Harder <[email protected]>
>> *Cc: *[email protected]
>> *Subject: *Re: [USRP-users] transmitting on two channels with replay
>> block
>>
>>
>>
>> Thomas,
>>
>>
>>
>> Upon further investigation, we may be running up to a practical limit of
>> a single CHDR interface rather than an issue with your code. A single
>> replay block servicing two radios will have a max (theoretical) rate of
>> 187.5 MSPS on either channel. This means that you might be able to squeeze
>> full rate out on 2 channels with an MCR of 184.32, but that's cutting it
>> pretty close. Sounds like 2 channels at 200 MSPS with a replay setup will
>> require 2 replay blocks serving each channel independently. If you end up
>> trying either of the above out, I'd be curious to know what results you
>> observe.
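To see how tight "cutting it pretty close" is, the headroom can be computed in a few lines of Python. The 187.5 MSPS per-channel ceiling is taken from the message above as given; the function name is made up for this sketch.

```python
REPLAY_MAX_MSPS = 187.5  # theoretical per-channel ceiling quoted above


def headroom_percent(rate_msps, ceiling_msps=REPLAY_MAX_MSPS):
    """Positive: spare capacity in percent. Negative: the rate exceeds the ceiling."""
    return (ceiling_msps - rate_msps) / ceiling_msps * 100


for rate in (184.32, 200.0):
    print(f"{rate} MSPS -> headroom {headroom_percent(rate):.2f} %")
```

At an MCR of 184.32 the margin is under 2 percent, while 200 MSPS is above the ceiling, which matches the suggestion of one replay block per channel.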
>>
>>
>>
>> Sam Reiter
>>
>> Ettus Research
>>
>>
>>
>>
>>
>> On Fri, Dec 6, 2019 at 2:38 PM Sam Reiter <[email protected]> wrote:
>>
>> Thomas,
>>
>>
>>
>> I'd need to set it up on my end, but I believe you can TX two distinct
>> waveforms from a single replay block instance. You'd need to make sure that
>> you're adding your data to the buffer in separate locations and at an address
>> that is a multiple of 8 bytes (which it looks like you're doing from the
>> above snippets). Are you seeing continuous underruns, or just a handful at
>> the beginning of the run? Does your duplicated code also use:
>>
>>
>>
>> replay_ctrl->get_record_fullness
>>
>>
>>
>> on both channels before kicking off the stream start?
>>
>>
>>
>> Sam Reiter
>>
>> Ettus Research
>>
>>
>>
>> On Wed, Dec 4, 2019 at 3:48 AM Thomas Harder via USRP-users <
>> [email protected]> wrote:
>>
>> Hello everyone,
>>
>> Is it possible to transmit two different waveforms on the two channels of
>> the USRP X310 with the two UBX-160 daughterboards?
>>
>> I want to transmit two different waveforms simultaneously (synchronized)
>> on the two channels of the USRP at the full sample rate of 200 MS/s. I
>> already tried to do it with a dual 10Gbit Ethernet connection, and I seemed
>> to be limited by my computer. Now I am trying to do it with the replay
>> block.
>>
>>
>>
>> I built the FPGA image with one Replay block as described in
>> https://kb.ettus.com/Using_the_RFNoC_Replay_Block to run the example
>> “replay_samples_from_file”, and it works fine if I transmit on just one
>> channel. I then modified the code to connect the replay block to
>> both channels:
>>
>>
>> replay_graph->connect(replay_ctrl->get_block_id(),replay_chan,tx_blockid,tx_chan,replay_spp);
>>
>>
>> replay_graph->connect(replay_ctrl->get_block_id(),replay_chan1,tx_blockid1,tx_chan,replay_spp);
>>
>>
>>
>> and writing the same waveform into another region of the DRAM-buffer:
>>
>> replay_ctrl->config_record(0,words_to_replay*replay_word_size,
>> replay_chan);
>>
>> replay_ctrl->config_record(20000,words_to_replay*replay_word_size,
>> replay_chan1);
>>
>> and
>>
>> replay_ctrl->config_play(0,words_to_replay*replay_word_size, replay_chan);
>>
>> replay_ctrl->config_play(20000,words_to_replay*replay_word_size,
>> replay_chan1);
>>
>>
>>
>> where
>>
>> words_to_replay*replay_word_size=16000
>>
>> replay_chan=0
>>
>> replay_chan1=1
>>
>> tx_blockid=0/Radio_0
>>
>> tx_blockid1=0/Radio_1
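Before configuring the record and playback buffers, the two regions can be sanity-checked on the host. The Python sketch below verifies that offsets and sizes are multiples of 8 bytes and that the regions do not overlap. The 8-byte alignment requirement comes from a later reply in this thread; the function name is hypothetical and the sketch does not call into UHD.

```python
def check_regions(regions, alignment=8):
    """Check (offset_bytes, size_bytes) regions; return a list of problems."""
    problems = []
    for index, (offset, size) in enumerate(regions):
        if offset % alignment or size % alignment:
            problems.append(f"region {index} is not {alignment}-byte aligned")
    # Sort by offset, then compare each region with its next neighbour
    ordered = sorted(enumerate(regions), key=lambda item: item[1][0])
    for (i, (off_a, size_a)), (j, (off_b, _)) in zip(ordered, ordered[1:]):
        if off_a + size_a > off_b:
            problems.append(f"regions {i} and {j} overlap")
    return problems


# Values from the message: 16000 bytes at offset 0 and at offset 20000
print(check_regions([(0, 16000), (20000, 16000)]))
```

An empty list means the layout passes both checks, so with these values the buffer layout itself is not the source of the underflows.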
>>
>>
>>
>> then I stream my waveforms to the replay block as defined in the example
>> and I start to replay the data:
>>
>> replay_ctrl->issue_stream_cmd(stream_cmd, replay_chan);
>>
>> replay_ctrl->issue_stream_cmd(stream_cmd, replay_chan1);
>>
>>
>>
>> It works but with plenty of Underflows!!
>>
>>
>>
>> So what does it mean when it says in the manual:
>>
>> “Note that the record and playback buffers do not need to the same,
>> allowing a single Replay block to both record and playback to different
>> regions of memory* simultaneously*.”
>>
>> (https://kb.ettus.com/Using_the_RFNoC_Replay_Block)?
>>
>>
>>
>> Because in the manual it says also:
>>
>> “The replay block has the following features: One input and *one* output”
>>
>> (
>> https://files.ettus.com/manual/classuhd_1_1rfnoc_1_1replay__block__ctrl.html
>> )
>>
>>
>>
>> So if the replay block has just one output why does it have two channels
>> connected to it (replay_chan= 0 and 1)?
>>
>>
>>
>> If one replay block can just stream to one channel at the same time, can
>> I implement easily a second replay block in the FPGA to stream on the two
>> channels of my USRP two different waveforms simultaneously?
>>
>>
>>
>> Thank you,
>>
>> Thomas
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> USRP-users mailing list
>> [email protected]
>> http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com
>