Thanks for looking at this, EJ. The output does go straight to the axi_wrapper, so that was good news. Out of curiosity, besides cleanliness, what does the axi_flop buy that simply adding a one-clock register doesn't? Yeah, I didn't like that multiply when I put it in, but like I said, it got the job done quickly :). Of course I had good intentions to come back and do everything the right way, but... I hadn't touched it until I wanted to test on the X310...

--------- Original Message ---------
Subject: Re: Re: [USRP-users] rfnoc build works for E310, doesn't meet timing with X310
From: "EJ Kreinar" <ejkrei...@gmail.com>
Date: 11/8/18 10:02 am
To: "Jason Matusiak" <ja...@gardettoengineering.com>
Cc: "USRP-users@lists.ettus.com" <usrp-users@lists.ettus.com>
First, it's really not failing by much -- you got under 7 ns, so it's *almost* there. Two suggestions:

1. If the input/output of the block does not go directly into axi_wrapper, try adding an axi_flop at the end to insert a one-cycle delay. This could break up a critical path if you have a few "0-delay" blocks in a row. If it goes into axi_wrapper, it should probably be fine.
2. This is a smoking gun: `(vector_cnt == (of_n-1)*keep_m)`. See if you can find a way around the multiply operation, and I expect it will work just fine.

EJ

On Thu, Nov 8, 2018 at 9:47 AM Jason Matusiak <ja...@gardettoengineering.com> wrote:

Gents, thanks for the input. I actually found the section I needed in the timing report just before you guys wrote (I hate trying to sift through those). It is indeed my block that is causing the issues. I was getting ready to break out my testbench and start playing with it by adding some registers to see if that helps. (The testbench worked before, so it won't tell me whether this helps timing, but I could at least make sure I didn't break anything.)

Excellent point on the clock differences. I was stuck down a rabbit hole as to why the E310 would be fine but not the X310, but that makes sense. I guess I was just getting lucky at the slower clock rates.
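EJ's second suggestion targets `(vector_cnt == (of_n-1)*keep_m)`. Since `of_n` and `keep_m` presumably come from settings registers (the failing path in the report starts at `sr_n/out_reg`), the product only changes when a setting is written, so it does not have to be produced combinationally every clock. The simplest fix is to register the product; the Python model below sketches a multiply-free alternative instead: a sequential shift-and-add multiplier that builds the threshold over `width` clocks using one adder. The names `ShiftAddMultiplier` and `compute_threshold` and the 16-bit default width are hypothetical, not taken from the attached Verilog.

```python
from typing import Tuple


class ShiftAddMultiplier:
    """Cycle-level model of a sequential shift-and-add multiplier.

    Each clock() call stands for one clock edge and performs at most one
    addition plus two shifts, so no multiplier (DSP48) sits between a
    settings register and the counter compare.
    """

    def __init__(self, width: int = 16) -> None:
        self.width = width  # bit width of the multiplier operand (assumed)
        self.acc = 0        # running sum of partial products
        self.mcand = 0      # multiplicand, shifted left each cycle
        self.mplier = 0     # multiplier, shifted right each cycle
        self.count = 0      # cycles since start()
        self.busy = False
        self.result = 0     # "registered" product, valid while not busy

    def start(self, a: int, b: int) -> None:
        """Latch new operands, e.g. on a settings-register write."""
        if not 0 <= b < (1 << self.width):
            raise ValueError("b must fit in `width` bits")
        self.acc, self.mcand, self.mplier = 0, a, b
        self.count, self.busy = 0, True

    def clock(self) -> None:
        """Advance one clock cycle."""
        if not self.busy:
            return
        if self.mplier & 1:           # add the current partial product
            self.acc += self.mcand
        self.mcand <<= 1
        self.mplier >>= 1
        self.count += 1
        if self.count == self.width:  # all multiplier bits consumed
            self.result = self.acc
            self.busy = False


def compute_threshold(of_n: int, keep_m: int, width: int = 16) -> Tuple[int, int]:
    """Return ((of_n - 1) * keep_m, cycles taken) using only add/shift steps."""
    mult = ShiftAddMultiplier(width)
    mult.start(of_n - 1, keep_m)
    cycles = 0
    while mult.busy:
        mult.clock()
        cycles += 1
    return mult.result, cycles


print(compute_threshold(10, 2))  # (18, 16): 9 * 2, ready after 16 clocks
```

In RTL, `vector_cnt` would then be compared against `result` held in a register. The catch is the `width`-cycle latency after a settings write; whether that is acceptable depends on when keepMinN's settings change, which this sketch does not know.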
Here is the relevant timing issue:

---------------------------------------------------------------------------------------------------
From Clock: ce_clk
  To Clock: ce_clk

Setup : 8006 Failing Endpoints, Worst Slack -2.711ns, Total Violation -4376.156ns
Hold  : 0 Failing Endpoints, Worst Slack 0.035ns, Total Violation 0.000ns
PW    : 0 Failing Endpoints, Worst Slack 1.565ns, Total Violation 0.000ns
---------------------------------------------------------------------------------------------------

Max Delay Paths
--------------------------------------------------------------------------------------
Slack (VIOLATED) : -2.711ns (required time - arrival time)
  Source: x300_core/inst_keepMinN/sr_n/out_reg[2]_replica/C
    (rising edge-triggered cell FDRE clocked by ce_clk {rise@0.000ns fall@2.333ns period=4.667ns})
  Destination: x300_core/inst_keepMinN/keepMinN/vector_cnt_reg[12]/R
    (rising edge-triggered cell FDRE clocked by ce_clk {rise@0.000ns fall@2.333ns period=4.667ns})
  Path Group: ce_clk
  Path Type: Setup (Max at Slow Process Corner)
  Requirement: 4.667ns (ce_clk rise@4.667ns - ce_clk rise@0.000ns)
  Data Path Delay: 6.966ns (logic 5.340ns (76.654%) route 1.626ns (23.346%))
  Logic Levels: 10 (CARRY4=5 DSP48E1=2 LUT1=1 LUT3=1 LUT5=1)
  Clock Path Skew: -0.053ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD): -1.659ns = ( 3.008 - 4.667 )
    Source Clock Delay (SCD): -2.028ns
    Clock Pessimism Removal (CPR): -0.422ns
  Clock Uncertainty: 0.054ns ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter (TSJ): 0.071ns
    Discrete Jitter (DJ): 0.082ns
    Phase Error (PE): 0.000ns

(It continues with more stuff that I don't think is particularly useful.)

So, it is plain to see that something I am doing inside my block is very stupid (though it accomplishes what I wanted). I can post the code so maybe someone can spot the issue (I really think it is a registering problem that I need to fix on the output side).
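The report numbers can be turned into frequencies with a little arithmetic. It also shows where the time goes: the path has 10 logic levels, two of them DSP48E1 cells, and 5.340 ns of the 6.966 ns data path is logic rather than routing. The sketch below converts the ce_clk period and estimates the highest clock this path could meet using the common rule of thumb fmax ≈ 1 / (period - WNS). That estimate assumes setup, skew and uncertainty overheads stay the same at the new period; it is not a Vivado result. The helper names are illustrative.

```python
# Rough timing arithmetic for the ce_clk path in the report above.
# Input numbers are copied from the report; the fmax figure is an estimate.

def mhz_to_ns(freq_mhz: float) -> float:
    """Clock period in ns for a frequency in MHz."""
    return 1e3 / freq_mhz


def ns_to_mhz(period_ns: float) -> float:
    """Clock frequency in MHz for a period in ns."""
    return 1e3 / period_ns


def estimated_fmax_mhz(period_ns: float, wns_ns: float) -> float:
    """Approximate highest clock the failing path could meet.

    A negative worst slack means the path needs about (period - WNS) ns,
    so the estimate is 1 / (period - WNS).
    """
    return ns_to_mhz(period_ns - wns_ns)


CE_CLK_PERIOD_NS = 4.667  # "period=4.667ns" in the report
WORST_SLACK_NS = -2.711   # "Worst Slack -2.711ns" (setup)

if __name__ == "__main__":
    print(f"ce_clk          : {ns_to_mhz(CE_CLK_PERIOD_NS):6.1f} MHz")
    print(f"path fmax (est.): {estimated_fmax_mhz(CE_CLK_PERIOD_NS, WORST_SLACK_NS):6.1f} MHz")
    for f in (50.0, 64.0, 200.0, 215.0):  # clock rates quoted in EJ's earlier message
        print(f"{f:5.0f} MHz -> {mhz_to_ns(f):5.2f} ns")
```

This gives about 214.3 MHz for ce_clk, which fits EJ's 200-215 MHz estimate for the X310, and roughly 135.5 MHz as the most this path could currently support. That is comfortably above the E310's 50-64 MHz clocks, which is consistent with the block passing timing there.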
Please excuse the simplicity of the code; I needed to throw something together VERY fast, and instead of being elegant, I went with easy to code up. The block's title is a little misleading: it basically keeps M vectors out of every N vectors. So if the vector size is 512, M==2, and N==10, I will pass through 1024 samples (2 vectors of 512) and then dump the next 4096 samples (the remaining 8 vectors). Then wash, rinse, repeat. The Verilog is attached since I was having trouble keeping the formatting when I pasted it here.

--------- Original Message ---------
Subject: Re: [USRP-users] rfnoc build works for E310, doesn't meet timing with X310
From: "EJ Kreinar" <ejkrei...@gmail.com>
Date: 11/8/18 9:09 am
To: "Jason Matusiak" <ja...@gardettoengineering.com>
Cc: "USRP-users@lists.ettus.com" <usrp-users@lists.ettus.com>

Hi Jason,

That actually makes sense to me... Bus clk on the e310 is usually 50 MHz if I remember correctly (and if it didn't change), and the max radio_clk is something like 64ish MHz. Max clock rates on the x310 are, I believe, more like 200-215 MHz. So logic in the x310 nominally needs to settle within about 5 ns, while logic on the e310 can have a luxurious 15-20 ns. Xilinx will try very hard to optimize timing, and the build for the x310 could take a LONG time.

If you can access the timing report, it will often show critical paths, but the text report is often unintelligible to me. I have also built in GUI mode before (like when setting up an ILA core), because the Vivado interface for organizing and understanding timing failures is actually pretty helpful.

I'd also suggest, if you'd like some Verilog input, sending the relevant code you think might contribute to the timing errors, and we can take a look for ideas. It could potentially lead to a quick solution if you're up for it. A MaxNinM block sounds like it could easily contribute to poor timing depending on the implementation.
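Jason describes the block as keeping M vectors out of every N (512-sample vectors, M=2, N=10: pass 1024 samples, drop the next 4096). As a behavioural reference, here is a Python sketch of that rule written with two nested counters, one for the sample within a vector and one for the vector within a frame. Every comparison is then against a setting (or the setting minus one) rather than a product of settings, which in hardware is one way to avoid the multiply EJ flags in `(vector_cnt == (of_n-1)*keep_m)`. This is a hypothetical model, not a translation of the attached Verilog; `keep_m_of_n`, `sample_in_vec`, and `vec_idx` are illustrative names. It could also serve as a golden model for the testbench.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def keep_m_of_n(samples: Iterable[T], vec_len: int,
                keep_m: int, of_n: int) -> Iterator[T]:
    """Pass the first keep_m vectors of every of_n vectors, drop the rest.

    Uses two nested counters so every comparison is against a setting
    (or setting - 1) rather than against a product of settings.
    """
    if vec_len <= 0 or not 0 < keep_m <= of_n:
        raise ValueError("need vec_len > 0 and 0 < keep_m <= of_n")
    sample_in_vec = 0  # position inside the current vector
    vec_idx = 0        # vector index inside the current N-vector frame
    for s in samples:
        if vec_idx < keep_m:  # still in the "keep" part of the frame
            yield s
        if sample_in_vec == vec_len - 1:  # last sample of this vector
            sample_in_vec = 0
            vec_idx = 0 if vec_idx == of_n - 1 else vec_idx + 1
        else:
            sample_in_vec += 1


# Jason's example: 512-sample vectors, keep 2 of every 10, two full frames.
kept = list(keep_m_of_n(range(2 * 10 * 512), vec_len=512, keep_m=2, of_n=10))
print(len(kept), kept[0], kept[1023], kept[1024])  # 2048 0 1023 5120
```

In RTL, `vec_len - 1` and `of_n - 1` would typically be registered once from the settings, leaving only increments and equality compares on the per-sample path.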
EJ

On Thu, Nov 8, 2018, 8:11 AM Jason Matusiak via USRP-users <usrp-users@lists.ettus.com> wrote:

OK, this has befuddled me for 3 days and I can't seem to get past it. I have a prefix that seems to work fine. Here are my working steps for building a bitfile on an E310:

cd /opt/gnuradio/e300/src/uhd/fpga-src/usrp3/tools/scripts
source ../../top/e300/setupenv.sh
./uhd_image_builder.py keepMinN ddc split_stream axi_fifo_loopback -d e310 -t E310_RFNOC_sg3 -I /opt/gnuradio/e300/src/rfnoc-nocblocks

This builds and runs fine. keepMinN is a small custom block I made that doesn't use many resources and has been working fine for weeks. Now, if I open a new terminal and run this:

cd /opt/gnuradio/e300/src/uhd/fpga-src/usrp3/tools/scripts
source ../../top/x300/setupenv.sh
./uhd_image_builder.py keepMinN ddc ddc split_stream axi_fifo_loopback -d x310 -t X310_RFNOC_XG -I /opt/gnuradio/e300/src/rfnoc-nocblocks -m 5

it never seems to meet timing. I have done this with and without the "-m" directive; that doesn't seem to matter. The only real difference in the command is the second ddc block. So what the heck could be causing these issues? If anything, I would have expected the X310 build to be fine and the E310 to not meet timing.

Another odd thing (though I am chalking it up to the X310 doing more) is that the X310 build is taking A LOT longer. I don't recall it taking this long before, but I am not sure. It tells me to look at the report_timing_summary, but it hasn't updated yet (it keeps running for a bit after throwing the timing warning). If I remember correctly, though, the issue it had was with the ce_clk for some reason.

_______________________________________________
USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com