<forwarding my answer, which Paul apparently missed>

-------- Forwarded Message --------
Subject: Re: [Cryptech Tech] Alpha Platform Upgrade
Date: Thu, 20 Feb 2020 10:38:59 +0300
From: Pavel Shatov <meisterpa...@yandex.ru>
To: Paul Selkirk <p...@psgd.org>

On 20.02.2020 1:14, Paul Selkirk wrote:
> I'm trying to build your new stuff, but it fails to meet timing on
> TS_clkmgr_mmcm_inst_mmcm_clkout2.


That would be the new high-speed core clock. Haven't had time to look at the reports yet, but here's what immediately comes to mind:

1) Which set of cores are you trying to build? I added a non-default "hsm_ng" project to core/platform/alpha, which is the current "hsm" project plus 1x ModExpNG. This configuration should build just fine: I tried it using both the GUI push-button flow and the console Makefile flow in my Ubuntu VM. I believe I even got identical bitstreams, since the "checksums" printed throughout both flows were identical.

2) I tweaked the synthesis settings in core/platform/alpha, namely I disabled resource sharing. Have you pulled the changes? I believe the new stuff won't build with resource sharing enabled.

Speaking of resource sharing, I now see that I didn't explain the change in the commit message. The problem is that during synthesis XST detects certain arithmetic operations and then looks for those that are never carried out simultaneously, so that it can combine them to save logic resources. Sometimes that does the trick, sometimes it only makes things worse.

Montgomery modular multiplication has three phases: computation of the full-size product, computation of the reduction coefficient, and computation of the multiple of the modulus. Each intermediate product is written into a different internal storage space, which has its own address register. Since the phases run in series, the phase address registers are never incremented at the same time. With resource sharing enabled, XST gets too smart: it throws away two of the three adders and instead adds a 3:1 mux on the input and a 1:3 selector with clock enable on the output. This introduces unnecessarily complex logic and makes timing impossible to meet. Disabling resource sharing increases slice utilization by ~1%, but allows a much higher clock frequency.
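To make the three phases concrete, here is a minimal single-word sketch in C. It is not the ModExpNG implementation, which works on multi-word operands in hardware; the names mont_n0inv and mont_mul and the choice R = 2^32 are assumptions made only for illustration. Each phase yields its own intermediate value, and the phases run strictly one after another, which is why the per-phase address counters in the core never advance at the same time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns -n^(-1) mod 2^32 for an odd modulus n. */
static uint32_t mont_n0inv(uint32_t n)
{
    assert(n & 1u);              /* Montgomery reduction needs an odd modulus */
    uint32_t x = n;              /* n*n == 1 (mod 8), so x is correct to 3 bits */
    for (int i = 0; i < 5; i++)
        x *= 2u - n * x;         /* Newton step doubles the number of correct bits */
    return 0u - x;
}

/* Returns a*b*R^(-1) mod n with R = 2^32; requires a < n and b < n. */
static uint32_t mont_mul(uint32_t a, uint32_t b, uint32_t n, uint32_t n0inv)
{
    /* Phase 1: full-size product */
    uint64_t t = (uint64_t)a * b;

    /* Phase 2: reduction coefficient, m = (t mod R) * (-n^(-1)) mod R */
    uint32_t m = (uint32_t)t * n0inv;

    /* Phase 3: multiple of the modulus */
    uint64_t u = (uint64_t)m * n;

    /* (t + u) / R: the low words cancel to zero, leaving a carry of 1
       unless the low word of t was already zero. Adding the high words
       separately avoids overflowing 64 bits. */
    uint64_t r = (t >> 32) + (u >> 32) + ((uint32_t)t != 0u);

    if (r >= n)                  /* final conditional subtraction */
        r -= n;
    return (uint32_t)r;
}
```

To multiply two ordinary residues, convert them to Montgomery form first (multiply by R mod n), call mont_mul on the converted values, and convert back with mont_mul(x, 1, n, n0inv).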


> I've attached a couple report files that seem relevant.
>
> Also, I'm updating the FMC driver, and I see where I need to change
> fmc_timing.CLKDivision, but do I need to do anything with
> fmc_timing.DataLatency to account for the 2-cycle read?
>
> And is the following still true?
>
>      // not needed, since nwait will be polled manually
>      fmc_timing.BusTurnAroundDuration = 0;
>
>                                 paul



Hm, I believe I already made those changes:
https://trac.cryptech.is/changeset/39f2d7ec4f35191884978db447bd97638b283d1d/sw/stm32

CLKDivision is now 4 and DataLatency is now 6.
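As a rough sketch of what those two values mean, the snippet below uses a stand-in struct rather than the real HAL type, so it compiles on its own. The struct and function names are hypothetical, and hclk_hz is a caller-supplied assumption, not a figure from the Alpha board. As far as I understand the STM32 FMC, CLKDivision is the number of HCLK periods per FMC clock period, so the bus clock is HCLK divided by that value.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the HAL timing struct; only the two fields discussed here. */
struct fmc_timing_sketch {
    uint32_t CLKDivision;   /* HCLK periods per FMC clock period */
    uint32_t DataLatency;   /* latency setting for synchronous accesses */
};

/* Values quoted in this message; everything else is left out on purpose. */
static struct fmc_timing_sketch fmc_timing_sketch_new(void)
{
    struct fmc_timing_sketch t;
    t.CLKDivision = 4;
    t.DataLatency = 6;
    return t;
}

/* FMC bus clock for a given HCLK (hclk_hz is an assumed input). */
static uint32_t fmc_clk_hz(uint32_t hclk_hz, const struct fmc_timing_sketch *t)
{
    assert(t->CLKDivision != 0u);   /* guard against division by zero */
    return hclk_hz / t->CLKDivision;
}
```

For example, with a hypothetical 180 MHz HCLK, a divider of 4 gives a 45 MHz FMC clock.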

Speaking of NWAIT, you're right, we no longer poll, so I removed the misleading comment; that should be in the same commit.


--
With best regards,
Pavel Shatov
_______________________________________________
Tech mailing list
Tech@cryptech.is
https://lists.cryptech.is/listinfo/tech
