Re: [casper] Mystery timing errors

2015-08-30 Thread Jack Hickish
Hi Michael,

As one of the Xilinx timing closure documents helpfully articulates, it's
very difficult to give specific recipes for solving timing problems, other
than reading the timing report, looking at things in PlanAhead/FPGA Editor,
iterating compiles and developing some intuition.

That said, there are a few things you could try, of varying levels of
complexity.

Since you have quite a lot of channels in your design, I suspect your vacc
problems are related to the size of the bram buffer (especially if your
samples are wide). Keep in mind that a bram in a Virtex6 is a 36 kbit
memory block (for example 36 bits wide x 1k deep), so building a vacc to
accommodate many thousands of channels means banking together lots of
brams, which is liable to cause fanout issues on your address/control
signals. The same is probably true of the large delays in the PFB blocks,
though I'm not sure how these are implemented in the various block versions
which exist in mlib_devel.

Setting brams to optimize for speed and giving them at least 3 cycles of
latency may help. You could also try manually controlling bram control
signals. I believe the casper_library_bus/bus_single_port_ram block will
do some of this for you if you set it to optimize for speed and include
some fanout latency. Replacing the bram delay in the vacc with one of these
(plus the associated address counter logic to make the block act as a
delay) may help. Be careful to get your latencies right so the block still
functions properly.

Implementing the adder in a DSP slice (if you haven't already) might also
help.
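
To illustrate the latency point: below is a minimal behavioural sketch in
Python (not CASPER or Xilinx code) of a delay built from a RAM plus a
free-running address counter. It assumes the RAM is used in
read-before-write mode and that read_latency counts its registered output
stages; the function names are hypothetical. Under those assumptions the
total delay is the counter period plus the read latency, so the counter has
to wrap early by the RAM latency to keep the original delay.

```python
from collections import deque


def bram_delay(samples, counter_period, read_latency):
    """Model a RAM-plus-counter delay line, one sample per clock.

    Assumes read-before-write behaviour: each clock the old word at the
    current address is read, then overwritten with the new sample.
    """
    mem = [0] * counter_period
    pipe = deque([0] * read_latency)  # registered RAM output stages
    out = []
    for t, x in enumerate(samples):
        addr = t % counter_period     # free-running address counter
        pipe.append(mem[addr])        # read the word written one lap ago
        mem[addr] = x                 # then store the new sample
        out.append(pipe.popleft())    # data emerges read_latency clocks later
    return out


def counter_period_for(total_delay, read_latency):
    """Counter period that gives the requested end-to-end delay."""
    period = total_delay - read_latency
    if period < 1:
        raise ValueError("read latency exceeds the requested delay")
    return period


samples = list(range(1, 41))
# Keep a 16-cycle delay while giving the RAM 3 cycles of latency.
fixed = bram_delay(samples, counter_period_for(16, 3), 3)
# Adding latency without shortening the counter stretches the delay to 19.
naive = bram_delay(samples, 16, 3)
print(fixed.index(1), naive.index(1))
```

In this model, raising the RAM latency from 0 to 3 without touching the
counter turns a 16-cycle delay into a 19-cycle one, which is the kind of
slip that would misalign a vacc's feedback path.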

A more involved option is to manually constrain the placement of
components. Setting up pblocks in PlanAhead to place the major parts of
your design might free up the compiler to do a little better with the
remainder of the logic. Adding pblocks for the PFB and the various FFT
stages can be very helpful in this regard. Or perhaps just constraining the
vacc that's causing you problems will be enough.
Once you have some constraints from PlanAhead which work, you can
auto-include them in your Simulink compiles using various methods, my
favourite being the 'UCF' yellow block, which is in the current
casper-astro mlib_devel.
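
For reference, a pblock ends up in a UCF file as an AREA_GROUP constraint.
The Python sketch below just formats such a constraint so the shape is
visible. The instance pattern and the site ranges are made-up placeholders
(real ones should come from PlanAhead for your own netlist and device), and
pblock_ucf is a hypothetical helper, not part of mlib_devel.

```python
def pblock_ucf(group, instance_pattern, ranges):
    """Return UCF text assigning matching instances to an area group.

    group            -- area group (pblock) name
    instance_pattern -- netlist instance name or wildcard (placeholder here)
    ranges           -- site ranges, e.g. "SLICE_X0Y0:SLICE_X47Y79"
    """
    lines = ['INST "%s" AREA_GROUP = "%s";' % (instance_pattern, group)]
    for site_range in ranges:
        lines.append('AREA_GROUP "%s" RANGE = %s;' % (group, site_range))
    return "\n".join(lines)


# Placeholder names and coordinates, for illustration only.
print(pblock_ucf(
    "pblock_vacc",
    "my_design_vacc*",
    ["SLICE_X0Y0:SLICE_X47Y79", "RAMB36_X0Y0:RAMB36_X2Y15"],
))
```

A pblock that contains brams needs a RAMB36 range as well as a SLICE range,
otherwise only the fabric logic is constrained to the region.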

In general, I suspect that you would be well served by trying to reduce
your bram use in your model - the resource utilisation report can probably
either confirm or contest that this is the case. You could do this by
trying to use alternatives to brams where blocks allow it (for example the
FFT will allow coefficients to be generated using DSPs in some cases), or
by reducing the number of FIR taps, or implementing something using QDR
rather than bram if appropriate.
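
As a rough cross-check against the resource utilisation report, the number
of brams a buffer needs can be estimated by hand. The Python sketch below
assumes each bram is tiled as 36 bits wide x 1k deep; the tools may choose
other aspect ratios, and the 32-bit and 64-bit accumulator widths and the
stream counts are illustrative guesses rather than values taken from the
design.

```python
import math

BRAM_WIDTH_BITS = 36   # assumed tiling: 36 bits wide
BRAM_DEPTH = 1024      # by 1k words deep


def brams_for_buffer(depth_words, width_bits):
    """Estimate brams needed to bank up a depth x width buffer."""
    columns = math.ceil(width_bits / BRAM_WIDTH_BITS)  # brams side by side
    rows = math.ceil(depth_words / BRAM_DEPTH)         # brams stacked in depth
    return columns * rows


# One stream carrying all 32k channels, 32-bit accumulators:
print(brams_for_buffer(32 * 1024, 32))   # 32
# The same buffer with 64-bit accumulators:
print(brams_for_buffer(32 * 1024, 64))   # 64
# A 16k-channel buffer, for comparison with the design that meets timing:
print(brams_for_buffer(16 * 1024, 32))   # 16
```

Every bram in such a bank shares the same address and write-enable signals,
so doubling the channel count doubles the fanout on those nets.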

The unfortunate truth is that some, none or all of these suggestions may
help, or might make things worse. I would say that in my experience
over-zealous use of pipeline/register stages throughout a design can
often do more harm than good, and a sensibly placed PFB can result in
incredible timing improvements.

Hope that helps. Please email back with any more info and/or updates. I
suspect that if you solve this problem, documenting your method may prove
very useful for others encountering similar problems.

Cheers, and good luck!
Jack

On 30 August 2015 at 10:42, Michael D'Cruze 
michael.dcr...@postgrad.manchester.ac.uk wrote:

Hi everyone,

I’m having quite a lot of problems getting a particular design to
compile, with XPS consistently reporting timing errors. The design is a
wideband spectrometer, somewhat similar to the tutorial 3 design. I’m
running an iADC at 1024 MHz, and the FPGA at 256 MHz. It is a
two-polarisation design, with a 64k-point PFB and FFT (for 32k channels) in
each polarisation. I am running the PFB and FFT blocks in
“two-polarisation” mode, so there is just one of each block. I am using the
Vacc block from the xBlocks repository.

I should add at this point that the same design with 16k channels
compiles successfully at this speed, and that the 32k channel design
compiles at slower speeds (e.g. 200 MHz FPGA).

The timing reports suggest negative slack in the PFB and Vacc blocks.
I’ve tried adding various combinations of latency in each; however, no
permutations have resulted in substantial improvements. The overwhelming
majority of the errors are reported by the Vacc block. I have also tried
replacing the xBlocks Vacc block with the wide_bram_vacc block in 64-bit
mode; however, this results in even more timing errors (and without the
option to adjust latencies).

I’ve tried various combinations of latencies suggested previously on the
mail archive, but again nothing has really given any improvement.

Am I simply pushing such a large design too hard? Is the only option to
slow it down or is there some strategic method I can adopt to get closer to
timing closure? Suggestions much appreciated!

Thanks

Michael


[casper] Mystery timing errors

2015-08-30 Thread Michael D'Cruze
Hi everyone,

I'm having quite a lot of problems getting a particular design to compile,
with XPS consistently reporting timing errors. The design is a wideband
spectrometer, somewhat similar to the tutorial 3 design. I'm running an iADC
at 1024 MHz, and the FPGA at 256 MHz. It is a two-polarisation design, with a
64k-point PFB and FFT (for 32k channels) in each polarisation. I am running
the PFB and FFT blocks in two-polarisation mode, so there is just one of each
block. I am using the Vacc block from the xBlocks repository.

I should add at this point that the same design with 16k channels compiles 
successfully at this speed, and that the 32k channel design compiles at slower 
speeds (e.g. 200 MHz FPGA).
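
To put numbers on the two clock rates mentioned above, here is a small
Python sketch of the per-cycle timing budget. It only converts frequency to
period; the conclusion that the worst paths in those particular compiles
fall between the two periods is an inference, and placement differences
between runs may blur it.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


target = clock_period_ns(256)    # 3.90625 ns available per cycle
relaxed = clock_period_ns(200)   # 5.0 ns available per cycle
extra = relaxed - target         # 1.09375 ns of additional budget

print("256 MHz period: %.3f ns" % target)
print("200 MHz period: %.3f ns" % relaxed)
print("extra budget at 200 MHz: %.3f ns (%.0f%%)"
      % (extra, 100.0 * extra / target))
```

So a path that only just passes at 200 MHz would need roughly 1.1 ns
removed, about a fifth of its delay, to pass at 256 MHz.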

The timing reports suggest negative slack in the PFB and Vacc blocks. I've
tried adding various combinations of latency in each; however, no
permutations have resulted in substantial improvements. The overwhelming
majority of the errors are reported by the Vacc block. I have also tried
replacing the xBlocks Vacc block with the wide_bram_vacc block in 64-bit
mode; however, this results in even more timing errors (and without the
option to adjust latencies).

I've tried various combinations of latencies suggested previously on the mail 
archive, but again nothing has really given any improvement.

Am I simply pushing such a large design too hard? Is the only option to slow it 
down or is there some strategic method I can adopt to get closer to timing 
closure? Suggestions much appreciated!

Thanks
Michael