[Cryptech Tech] RSA Key Format

Pavel Shatov Wed, 06 Mar 2019 10:22:19 -0800

Hi,

Rob, I have certain questions about the way we store RSA keys. The background for those who have not been following is that I've been working on a faster ModExp core with full CRT support and integrated blinding. Those modifications will change the distribution of computations done inside the STM32 and on the FPGA, so as Rob pointed out, it makes sense to re-evaluate where and when specific things are carried out.

The "standard" RSA key (at least in the OpenSSL sense, as far as I know) consists of the following quantities: modulus, public exponent, secret exponent, two smaller primes, two shorter exponents and the CRT helper coefficient. Our ModExp currently needs two additional quantities besides the aforementioned "standard" set of numbers.
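For readers less familiar with the "standard" set, here is a minimal Python sketch of the eight quantities and how the CRT ones are used. The names `RsaKey`, `make_key` and `crt_private_op` are made up for illustration and are not taken from the Cryptech code base; the toy primes are far too small for real use, and `pow(x, -1, m)` assumes Python 3.8 or newer.

```python
from dataclasses import dataclass


@dataclass
class RsaKey:
    n: int     # modulus, n = p * q
    e: int     # public exponent
    d: int     # secret exponent
    p: int     # first smaller prime
    q: int     # second smaller prime
    dp: int    # shorter exponent, d mod (p - 1)
    dq: int    # shorter exponent, d mod (q - 1)
    qinv: int  # CRT helper coefficient, q^-1 mod p


def make_key(p: int, q: int, e: int) -> RsaKey:
    """Derive all eight quantities from two primes and a public exponent."""
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return RsaKey(n=p * q, e=e, d=d, p=p, q=q,
                  dp=d % (p - 1), dq=d % (q - 1), qinv=pow(q, -1, p))


def crt_private_op(key: RsaKey, c: int) -> int:
    """Private-key operation as two half-size exponentiations plus recombination."""
    m1 = pow(c, key.dp, key.p)
    m2 = pow(c, key.dq, key.q)
    h = (key.qinv * (m1 - m2)) % key.p
    return m2 + h * key.q


key = make_key(61, 53, 17)          # toy parameters only
ciphertext = pow(65, key.e, key.n)
recovered = crt_private_op(key, ciphertext)
```

The CRT path gives the same result as a single `pow(c, d, n)`, but each exponentiation works on numbers half the size of the modulus.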

The first one is the "Montgomery factor". It only depends on the modulus and is relatively easy to compute. Given modulus N which is L bits long, the factor is 2 ** (2*L) mod N.
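The formula above maps directly onto Python's three-argument `pow`. This is only a sketch: the function name `montgomery_factor` is hypothetical, and it assumes L is the exact bit length of N (a hardware core may instead use the fixed operand width).

```python
def montgomery_factor(n: int) -> int:
    """Return 2 ** (2*L) mod N, where L is the bit length of N."""
    l = n.bit_length()
    return pow(2, 2 * l, n)


# Toy 8-bit modulus, for illustration only.
factor = montgomery_factor(197)
```

With R = 2 ** L, this factor is R*R mod N, which is the constant typically used to convert an operand into the Montgomery domain.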

The second quantity is the modulus-dependent reduction coefficient. Montgomery modular reduction works by adding a certain number of multiples of the modulus to the intermediate result to make some of the least significant bits zero. Then the product is reduced by simply shifting it to the right. The problem is to determine how many multiples to add, and that's where the coefficient comes into play.
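A minimal Python sketch of this reduction with a full-width coefficient follows. It assumes the common textbook convention that the coefficient is -N^-1 mod 2**L; the actual core may use a different sign or width. The function names are hypothetical, N must be odd, and `pow(n, -1, r)` needs Python 3.8 or newer.

```python
def reduction_coefficient(n: int, l: int) -> int:
    """Assumed convention: coefficient = -N^-1 mod 2**L (N must be odd)."""
    r = 1 << l
    return (-pow(n, -1, r)) % r


def montgomery_reduce(t: int, n: int, l: int, coeff: int) -> int:
    """Return t * 2**(-L) mod N for 0 <= t < N * 2**L."""
    mask = (1 << l) - 1
    # How many multiples of N to add so that the low L bits become zero.
    m = ((t & mask) * coeff) & mask
    # The low L bits of t + m*N are now zero, so the shift is exact.
    u = (t + m * n) >> l
    # One conditional subtraction brings the result below N.
    return u - n if u >= n else u


n, l = 197, 8                     # toy 8-bit modulus
coeff = reduction_coefficient(n, l)
reduced = montgomery_reduce(12345, n, l, coeff)
```

The value `m` is the "how many multiples to add" from the paragraph above, and the right shift replaces the division that an ordinary modular reduction would need.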

Software implementations typically compute only the lowest 32 bits of this coefficient on the fly and then work by repeatedly clearing the lower 32 bits of the intermediate product. This is not viable in hardware, because both hardware multipliers and adders have latency. As soon as you're finished computing the multiple of the modulus, you need to add it to the intermediate result. While you're adding, the multipliers will be idle; once you've added, you start multiplying and the adders are stalled. Hardware needs the entire coefficient to keep the math pipeline busy at all times.
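For comparison, the word-by-word software approach can be sketched in Python as below. This is an illustration under the same assumed sign convention (-N^-1), not code from any particular library; `montgomery_reduce_wordwise` is a hypothetical name and `pow(n, -1, m)` needs Python 3.8 or newer.

```python
W = 32  # word size used by typical software implementations


def montgomery_reduce_wordwise(t: int, n: int, num_words: int) -> int:
    """Return t * 2**(-W*num_words) mod N, clearing one 32-bit word per step.

    Requires odd N < 2**(W*num_words) and 0 <= t < N * 2**(W*num_words).
    """
    mask = (1 << W) - 1
    # Only the lowest 32 bits of the coefficient are needed here.
    n0_coeff = (-pow(n, -1, 1 << W)) & mask
    for _ in range(num_words):
        m = ((t & mask) * n0_coeff) & mask  # multiply ...
        t = (t + m * n) >> W                # ... then add and shift
    return t - n if t >= n else t


n = (1 << 61) - 1                  # odd toy modulus that fits in two words
t = 123456789 * 987654321
reduced = montgomery_reduce_wordwise(t, n, 2)
```

Each loop iteration depends on the sum produced by the previous one, which is the multiply-then-add serialization that stalls a hardware pipeline.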

We currently have two dedicated pieces of Verilog that do the two precomputations. I'm trying to understand whether it would be viable to offload the computation to the STM32 and get rid of those Verilog modules to simplify the core.

I think that it may even be possible to not store the Montgomery factor at all and just compute it on the fly when a key is loaded.

The reduction coefficient is computed according to the extended Euclidean algorithm and, as far as I remember, takes about the same time as the exponentiation itself, so it still makes sense to precompute it and store it along with other key components.
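As a rough illustration of the extended Euclidean approach, the Python sketch below computes the coefficient without relying on a built-in modular inverse. It again assumes the coefficient is -N^-1 mod 2**L; the function names are hypothetical and this says nothing about how the Verilog module or the STM32 firmware actually does it.

```python
def extended_gcd(a: int, b: int):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def reduction_coefficient_euclid(n: int, l: int) -> int:
    """Assumed convention: -N^-1 mod 2**L, computed via extended Euclid."""
    r = 1 << l
    g, x, _ = extended_gcd(n, r)
    if g != 1:
        raise ValueError("modulus must be odd")
    return (-x) % r


coeff = reduction_coefficient_euclid(197, 8)  # toy 8-bit modulus
```

A quick sanity check is that N * coeff + 1 is divisible by 2**L.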

There's also a math trick that allows you to get a ~10% speed increase at the expense of precomputing one more word of the reduction coefficient than there are words in the modulus. ModExp internally operates on 16-bit words, because that's what the math slices in the FPGA can handle. So to take advantage of the trick we need to store a 1040-bit quantity for 1024-bit keys, a 2064-bit quantity for 2048-bit keys, etc. I wonder how inconvenient that might be?
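The storage sizes follow from adding one 16-bit word to the modulus length. A small Python sketch of that arithmetic, with hypothetical function names:

```python
WORD_BITS = 16  # internal ModExp word size mentioned above


def extended_coefficient_bits(key_bits: int, word_bits: int = WORD_BITS) -> int:
    """Bits needed for a coefficient that is one word longer than the modulus."""
    return key_bits + word_bits


def extended_coefficient_bytes(key_bits: int, word_bits: int = WORD_BITS) -> int:
    """Same size expressed in bytes (key and word sizes are multiples of 8)."""
    return extended_coefficient_bits(key_bits, word_bits) // 8


sizes = {bits: extended_coefficient_bits(bits) for bits in (1024, 2048, 4096)}
```

For 1024-bit and 2048-bit keys this gives 1040 and 2064 bits, i.e. 130 and 258 bytes, so the stored field is two bytes longer than the modulus.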


--
With best regards,
Pavel Shatov
_______________________________________________
Tech mailing list
Tech@cryptech.is
https://lists.cryptech.is/listinfo/tech
