Thanks for looking at this, EJ.
 
The outputs do go straight to the axi_wrapper, so that was good news.  Out of 
curiosity, besides cleanliness, what does the axi_flop buy that a plain 
one-clock register stage doesn't?
 
Yeah, I didn't like that multiply when I put it in, but like I said, it got the 
job done quickly :).  Of course I had good intentions to come back and do 
everything the right way, but.....  I haven't touched it until I wanted to test 
on the X310.....
 
 
--------- Original Message ---------
Subject: Re: Re: [USRP-users] rfnoc build works for E310, doesn't meet timing with X310
From: "EJ Kreinar" <ejkrei...@gmail.com>
Date: 11/8/18 10:02 am
To: "Jason Matusiak" <ja...@gardettoengineering.com>
Cc: "USRP-users@lists.ettus.com" <usrp-users@lists.ettus.com>

   First, it's really not failing by much -- you got under 7 ns, so it's 
*almost* there.
 
Two suggestions:
 
1. If the input/output of the block does not go directly into axi_wrapper, try 
adding an axi_flop at the end to insert a one-cycle delay. This could break up 
a critical path if you have a few "0-delay" blocks in a row. If this goes into 
axi_wrapper, it should probably be fine.
 
2. This is a smoking gun: 
 
`(vector_cnt == (of_n-1)*keep_m)`
 
See if you can find a way around the multiply operation and I expect it will 
work just fine.
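
One possible way around the multiply, sketched here as a cycle-level Python model rather than Verilog: keep one counter for the sample position inside a vector and a second counter for the vector position inside the group of N. Every comparison is then against a counter or a setting value directly, so no product is needed. The names `vec_len`, `sample_cnt`, and `vec_idx` are made up for this sketch, and the intended behavior is assumed from the description in this thread, not taken from the attached source.

```python
def keep_m_in_n(samples, vec_len, keep_m, of_n):
    """Pass the first keep_m vectors of every of_n vectors, drop the rest.

    Models two hardware counters; only increments and equality/less-than
    compares are used, no multiplication.
    """
    sample_cnt = 0  # position inside the current vector (hypothetical name)
    vec_idx = 0     # position of the current vector inside the group of of_n
    kept = []
    for s in samples:
        # Keep decision depends only on the vector index.
        if vec_idx < keep_m:
            kept.append(s)
        # Advance the counters the way a registered design would.
        if sample_cnt == vec_len - 1:
            sample_cnt = 0
            vec_idx = 0 if vec_idx == of_n - 1 else vec_idx + 1
        else:
            sample_cnt += 1
    return kept


if __name__ == "__main__":
    out = keep_m_in_n(list(range(2 * 5120)), vec_len=512, keep_m=2, of_n=10)
    print(len(out), out[0], out[1023], out[1024])
```

In an HDL version the two compares could also be registered one cycle ahead, which is the same idea as the one-cycle delay in suggestion 1.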
 
EJ
 
 



  On Thu, Nov 8, 2018 at 9:47 AM Jason Matusiak <ja...@gardettoengineering.com> 
wrote:
 Gents, thanks for the input.  I actually found the section I needed in the 
timing report just before you guys wrote (I hate trying to sift through those). 
 It is indeed my block that is causing issues.  I was getting ready to 
break out my testbench and start playing with it by adding some registers to 
see if that helps (the testbench worked before, so I won't know if this helps 
timing, but I could at least make sure I didn't break anything).
 
Excellent point on the clock differences.  I was stuck down a rabbit hole as to 
why the E310 would be fine, but not the X310, but that makes sense.  I was just 
getting lucky I guess at the slower clock rates.
 
Here is the relevant timing issue:
---------------------------------------------------------------------------------------------------
From Clock: ce_clk
 To Clock: ce_clk
 Setup : 8006 Failing Endpoints, Worst Slack -2.711ns, Total Violation -4376.156ns
Hold : 0 Failing Endpoints, Worst Slack 0.035ns, Total Violation 0.000ns
PW : 0 Failing Endpoints, Worst Slack 1.565ns, Total Violation 0.000ns
---------------------------------------------------------------------------------------------------
 
Max Delay Paths
--------------------------------------------------------------------------------------
Slack (VIOLATED) : -2.711ns (required time - arrival time)
 Source: x300_core/inst_keepMinN/sr_n/out_reg[2]_replica/C
 (rising edge-triggered cell FDRE clocked by ce_clk {rise@0.000ns fall@2.333ns period=4.667ns})
 Destination: x300_core/inst_keepMinN/keepMinN/vector_cnt_reg[12]/R
 (rising edge-triggered cell FDRE clocked by ce_clk {rise@0.000ns fall@2.333ns period=4.667ns})
 Path Group: ce_clk
 Path Type: Setup (Max at Slow Process Corner)
 Requirement: 4.667ns (ce_clk rise@4.667ns - ce_clk rise@0.000ns)
 Data Path Delay: 6.966ns (logic 5.340ns (76.654%) route 1.626ns (23.346%))
 Logic Levels: 10 (CARRY4=5 DSP48E1=2 LUT1=1 LUT3=1 LUT5=1)
 Clock Path Skew: -0.053ns (DCD - SCD + CPR)
 Destination Clock Delay (DCD): -1.659ns = ( 3.008 - 4.667 )
 Source Clock Delay (SCD): -2.028ns
 Clock Pessimism Removal (CPR): -0.422ns
 Clock Uncertainty: 0.054ns ((TSJ^2 + DJ^2)^1/2) / 2 + PE
 Total System Jitter (TSJ): 0.071ns
 Discrete Jitter (DJ): 0.082ns
 Phase Error (PE): 0.000ns
 (and it continues with more stuff that I don't think is particularly useful).
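
The derived lines in the report can be checked by hand. The sketch below plugs the printed values into the two formulas the report itself shows (DCD - SCD + CPR for skew, and the jitter expression for uncertainty). The last step is an assumption: the gap between the reported slack and what these numbers alone give is presumed to be the destination flop's setup requirement, which would sit in the part of the report not shown here.

```python
import math

# Values copied from the report above, all in ns.
dcd, scd, cpr = -1.659, -2.028, -0.422
tsj, dj, pe = 0.071, 0.082, 0.000
requirement = 4.667
data_path_delay = 6.966
reported_slack = -2.711

# Clock Path Skew = DCD - SCD + CPR (formula as printed in the report).
skew = dcd - scd + cpr

# Clock Uncertainty = sqrt(TSJ^2 + DJ^2) / 2 + PE (formula as printed).
uncertainty = math.sqrt(tsj**2 + dj**2) / 2 + pe

# Slack using only the terms visible in the excerpt.
partial_slack = requirement + skew - uncertainty - data_path_delay

# Whatever is left over; assumed (not confirmed) to be endpoint setup time.
residual = reported_slack - partial_slack

print(f"skew          = {skew:.3f} ns")
print(f"uncertainty   = {uncertainty:.3f} ns")
print(f"partial slack = {partial_slack:.3f} ns")
print(f"residual      = {residual:.3f} ns")
```

Either way, the 6.966 ns data path alone is longer than the 4.667 ns period, and the report's logic-level summary shows two DSP48E1 cells and five CARRY4 cells on that path.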
  
 So, it is plain to see that something I am doing inside my block is very 
stupid (though it accomplishes what I wanted).  I can post the code, so maybe 
someone can spot the issue (I really think it is a registering problem that I 
need to fix on the output side).  Please excuse the simplicity of the code, I 
needed to throw something together VERY fast, and instead of being elegant, I 
went with easy to code up.
  
 The block's title is a little misleading, it basically keeps M vectors out of 
N vectors.  So if the vector size is 512, M==2, and N==10, I will pass 
through 1024 samples, and then dump the next 4096 samples.  Then wash, rinse, 
repeat.  The verilog is attached since I was having trouble keeping formatting 
when I pasted it here.
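
For checking testbench output, the behavior described above can be written as a tiny index-based reference model. This is a sketch of the described behavior only, not the attached Verilog; the function name and the `vec_len` parameter are hypothetical.

```python
def expected_kept_indices(num_samples, vec_len, keep_m, of_n):
    """Indices of input samples that should appear at the output.

    A sample is kept when its vector falls in the first keep_m
    vectors of its group of of_n vectors.
    """
    return [i for i in range(num_samples)
            if (i // vec_len) % of_n < keep_m]


if __name__ == "__main__":
    # One full group with the numbers from the message: 512, M=2, N=10.
    group = 512 * 10
    kept = expected_kept_indices(group, vec_len=512, keep_m=2, of_n=10)
    print("passed:", len(kept), "dropped:", group - len(kept))
```

The integer divide and modulo are fine in a software reference; they are not meant as a suggestion for the hardware implementation.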
 
 
 
 
 
 
--------- Original Message ---------
Subject: Re: [USRP-users] rfnoc build works for E310, doesn't meet timing with X310
From: "EJ Kreinar" <ejkrei...@gmail.com>
Date: 11/8/18 9:09 am
To: "Jason Matusiak" <ja...@gardettoengineering.com>
Cc: "USRP-users@lists.ettus.com" <usrp-users@lists.ettus.com>

  Hi Jason,  
That actually makes sense to me... Bus clk on the e310 is usually 50 MHz if I 
remember correctly (and if it didn't change), and the max radio_clk is 
something like 64ish MHz.
 
Max clock rates on the x310 are, I believe, more like 200-215 MHz. So logic in 
the x310 nominally needs to settle within 5 ns while logic on the e310 can have 
a luxurious 15-20 ns. Xilinx will try very hard to optimize timing and the 
build for x310 could take a LONG time.
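
The settle-time figures follow from period = 1 / frequency. A quick sketch, using the approximate clock rates mentioned in this message (they are recollections, not verified values) plus the 4.667 ns ce_clk period that appears in the timing report quoted earlier in this thread:

```python
def period_ns(freq_mhz):
    """Clock period in ns for a frequency in MHz."""
    return 1000.0 / freq_mhz


def freq_mhz(period_in_ns):
    """Clock frequency in MHz for a period in ns."""
    return 1000.0 / period_in_ns


if __name__ == "__main__":
    # Approximate rates as recalled in this thread.
    for label, f in [("E310 bus clk, about 50 MHz", 50.0),
                     ("E310 radio_clk, about 64 MHz", 64.0),
                     ("X310, about 200 MHz", 200.0)]:
        print(f"{label}: {period_ns(f):.3f} ns per cycle")

    # ce_clk period taken from the timing report in this thread.
    print(f"4.667 ns period: {freq_mhz(4.667):.1f} MHz")
```

So the same combinational path has roughly three to four times less time to settle on the X310 than on the E310.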
 
If you can access the timing report, it will often show critical paths, but the 
text report is often unintelligible to me. I have also built in GUI mode before 
(like setting up an ILA core) because the Vivado interface for organizing and 
understanding timing failures is actually pretty helpful. 
 
I'd also suggest, if you'd like some verilog input, sending the relevant code 
you think might contribute to the timing errors, and we can take a look for 
ideas. It could potentially lead to a quick solution if you're up for it. A 
MaxNinM block sounds like it could easily contribute to poor timing depending 
on the implementation.
 
EJ


  On Thu, Nov 8, 2018, 8:11 AM Jason Matusiak via USRP-users 
<usrp-users@lists.ettus.com> wrote:
 OK, this has befuddled me for 3 days and I can't seem to get past it.  I have 
a prefix that seems to work fine.
 
Here are my working steps for building a bitfile on an E310:
cd /opt/gnuradio/e300/src/uhd/fpga-src/usrp3/tools/scripts
 
source ../../top/e300/setupenv.sh
 
./uhd_image_builder.py keepMinN ddc split_stream axi_fifo_loopback -d e310 -t 
E310_RFNOC_sg3 -I /opt/gnuradio/e300/src/rfnoc-nocblocks
 
This builds and runs fine.  keepMinN is a small custom block I made that doesn't 
use many resources and has been working fine for weeks.
 
Now, if I open a new terminal and run this:
 
cd /opt/gnuradio/e300/src/uhd/fpga-src/usrp3/tools/scripts
source ../../top/x300/setupenv.sh
./uhd_image_builder.py keepMinN ddc ddc split_stream axi_fifo_loopback -d x310 
-t X310_RFNOC_XG -I /opt/gnuradio/e300/src/rfnoc-nocblocks -m 5
 
it never seems to meet timing.  I have tried this with and without the "-m" 
directive, and that doesn't seem to matter.  The only real difference in the 
command is the second ddc block.
 
So what the heck could be causing these issues?  If anything, I would have 
expected the X310 build to be fine and the E310 to not meet timing.  Another 
odd thing (though I am chalking it up to the X310 doing more) is that the X310 
build is taking A LOT longer.  I don't recall it taking this long before, but I 
am not sure.
 
It tells me to look at the report_timing_summary, but it hasn't updated yet (it 
keeps running for a bit after throwing the timing warning).  If I remember 
right, though, I think the issue it had was with the ce_clk for some reason.

_______________________________________________
 USRP-users mailing list
 USRP-users@lists.ettus.com
 http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com