Hi,

I want to report my progress with the fmc_clk branch that will hopefully allow us to further increase the signing performance.

The primary obstacle was that the bitstream failed timing at 90 MHz, mostly in the ECDSA cores. I rewrote those using the approach I learned while working on the Ed25519 and X25519 cores. I also got rid of the dedicated modular inversion sub-module and replaced it with a micro-coded inversion based on Fermat's little theorem, which actually turned out to be ~10% faster.
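For readers who have not seen inversion done this way, here is a minimal Python sketch of the idea: for a prime p and a not divisible by p, a^(p-2) mod p is the inverse of a. This is only an illustration, not the microcode in the cores. The function name and the choice of the NIST P-256 prime are assumptions made for the example.

```python
# Illustrative only: modular inversion via Fermat's little theorem.
# P256 is the NIST P-256 field prime, chosen here as an example modulus;
# which moduli the actual cores use is not stated in this message.
P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1


def mod_inv_fermat(a, p):
    """Return a^(p-2) mod p, which equals a^(-1) mod p for prime p."""
    if a % p == 0:
        raise ZeroDivisionError("zero has no inverse modulo p")
    exponent = p - 2
    result = 1
    # Left-to-right square-and-multiply. The exponent depends only on
    # the public modulus, so the sequence of squarings and multiplications
    # is the same for every input a.
    for bit in bin(exponent)[2:]:
        result = (result * result) % p
        if bit == "1":
            result = (result * a) % p
    return result


inv = mod_inv_fermat(3, P256)
print((3 * inv) % P256 == 1)
```

Because the exponent is fixed, the schedule of operations does not depend on the value being inverted, which is what makes this approach a natural fit for a fixed micro-program. Python's big-integer arithmetic is of course not constant-time itself.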

As a side note, that constant-time modular inversion module can be turned into a separate modular inversion math core (with a certain amount of sanding, of course). I'm not sure that there's currently an urgent need for such a core. From Paul's profiling report it looks like we might want a "full" RSA core that will also do the final part of CRT ("Garner's formula") in hardware, not just the exponentiation part.
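To make the "final part of CRT" concrete, here is a small Python sketch of an RSA private-key operation using CRT, with the recombination step done by Garner's formula. The function name, parameter names and the toy key are hypothetical and only for illustration. A real key would use primes of 1024 bits or more.

```python
# Illustrative only: RSA private-key operation with CRT and Garner's
# recombination. All names and the toy key are made up for this example.
def rsa_crt_private_op(c, p, q, dp, dq, q_inv):
    # The two half-size exponentiations (the part a ModExp core does).
    m1 = pow(c, dp, p)
    m2 = pow(c, dq, q)
    # Garner's formula: combine the two residues into the result mod p*q.
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q


# Toy parameters: n = 61 * 53 = 3233, e = 17, d = 2753.
p, q, d = 61, 53, 2753
dp = d % (p - 1)          # exponent used modulo p
dq = d % (q - 1)          # exponent used modulo q
q_inv = pow(q, p - 2, p)  # q^(-1) mod p, valid because p is prime

c = 65
print(rsa_crt_private_op(c, p, q, dp, dq, q_inv) == pow(c, d, p * q))
```

The recombination is just one subtraction, two multiplications and one reduction, but it operates on full-width operands, so when the exponentiations are fast it can show up in a profile.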

The second change is that I overhauled the Verilog repository a bit. The problem was that both the ECDSA multipliers and ModExpA7 use vendor math slices, and each had its own, essentially identical, copy of the wrappers and of the generic replacements for simulation. I moved all that reused stuff to core/lib and updated the cores to use primitives from there. I updated the Makefile and the core configuration file accordingly. I believe I was also able to fix the issue that it was not possible to build a bitstream with only ECDSA cores or only ModExp cores. Given the amount of updates I've just pushed, I would really appreciate it if someone pulled the changes and tried to build the bitstream, to make sure that I haven't broken anything by chance.

Now the bad news. For some reason the new bitstream ('fpga show cores' should now report ECDSA version 0.20 instead of 0.11) fails unit-tests.py. It locks up in hal_get_random() waiting for the valid bit of the CSPRNG core to go high. If I disable this test, it locks up a bit further on in 'test_attribute_bloat_token_big', which calls hal_uuid_gen(), which in turn calls hal_get_random() again. If I drop the FMC clock to 60 MHz (line 156 of stm-fmc.c sets the divisor: 2 is for 90 MHz, 3 is for 60 MHz), then all the tests pass just fine. This is strange: we've seen this situation before, but back then the design clearly failed timing, while now everything should be fine.

I haven't done any thorough investigation yet. My very preliminary guess is that maybe we need to change the number of taps in the ring oscillator entropy source or something like that. I do have a platform cable, so given hints on where to look I can try to debug. I suggest, though, that someone first tries to reproduce the situation, because maybe I'm doing something wrong (not having pulled the latest changes from Joachim, etc.).


--
With best regards,
Pavel Shatov
_______________________________________________
Tech mailing list
[email protected]
https://lists.cryptech.is/listinfo/tech
