Summary: The keywrap core does what it says on the package, and does it a lot faster than software keywrap.

The following table shows wrap and unwrap times (in ms) for various object sizes. The speedup increases slightly for larger sizes, but in general let's call it a 5x improvement.

      |             WRAP             |            UNWRAP
 size | software |  core   | speedup | software |  core   | speedup
------+----------+---------+---------+----------+---------+---------
  128 |    1.086 |   0.220 |   4.94x |    1.102 |   0.221 |   4.99x
  256 |    2.151 |   0.411 |   5.22x |    2.186 |   0.414 |   5.28x
  512 |    4.283 |   0.795 |   5.39x |    4.352 |   0.799 |   5.45x
 1024 |    8.547 |   1.564 |   5.46x |    8.686 |   1.572 |   5.53x
 2048 |   17.075 |   3.103 |   5.50x |   17.354 |   3.117 |   5.57x
 4096 |   34.130 |   6.182 |   5.52x |   34.689 |   6.208 |   5.59x

(Note: I haven't yet tested the practical effects on keygen, signing, and verification. This is purely about key wrap/unwrap.)

To reproduce this:

Check out branch 'master' on repo:
    user/js/keywrap

Check out branch 'js_keywrap' on repos:
    core/platform/alpha
    core/platform/common
    sw/libhal
    sw/stm32

In core/platform/alpha/build, run
    `make keywrap`
    `cryptech_upload --fpga -i alpha_fmc_keywrap.bit`

In sw/stm32, run
    `make cli-test`
    `bin/flash-target projects/cli-test/cli-test`

In cryptech_console, log in with username=ct, password=ct, and run
    `keywrap test` (for test vectors)
    `keywrap test 256 1000` (for timing tests)

Enhancement requests:

> The core does not implement the magic value calculation as specified
> in RFC 5649. Nor does it do any padding of object to a complete 8
> byte block. The caller needs to calculate the 64-bit magic value and
> store it in the A registers.

It seems like this shouldn't be too hard to do in the core. It already has the length in the RLEN register, which directly gives it both the MLI field of the AIV ("magic value") and the padding length.

> Due to address space limitations in the cryptech cores, the core
> implements bank switching (with four banks). The caller must write
> to the BANK register to ensure that subsequent writes to the DATA
> addresses ends up in the correct bank.
> And similarly, the caller
> needs to set the bank as needed when reading the wrapped/unwrapped
> object from the core. Basically the first 128 words of the object is
> stored in bank 0, the next 128 in bank 1 etc.

I would like to get rid of the bank switching, and unroll the block memory into a set of contiguous core register blocks (where each block is 256 4-byte registers), as Pavel did for modexps6 and modexpa7. Of course, this would require agreement about the block memory size between keywrap_mem.v, core/platform/common/config/core.cfg, and sw/libhal/core.c. At present, the HSM keystore uses fixed-size 8096-byte slots, so that's the practical limit.

(In the long term, each core should encode its own length, right after its name and version, and we could remove the core-specific knowledge from core.c. But that's a project for another day.)

Finally, I would like this core to fetch the KEK directly from the MKM, rather than continuing to require the caller to fetch the KEK via the mkmif core and feed it right back to the keywrap core. Keeping the KEK out of RAM can only improve the security of the device.

paul

_______________________________________________
Tech mailing list
Tech@cryptech.is
https://lists.cryptech.is/listinfo/tech
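
P.S. For anyone who wants to see how little bookkeeping the first two enhancement requests involve, here is a rough Python sketch of what the caller currently has to work out in software. It is an illustration only, not the libhal code: the function names are made up, I am assuming the length is a byte count and that a "word" is 32 bits, and the only constants taken from outside this mail are the RFC 5649 AIV prefix (0xA65959A6) and its 8-byte block size.

```python
import struct

AIV_PREFIX = 0xA65959A6   # fixed upper 32 bits of the RFC 5649 AIV
WORDS_PER_BANK = 128      # from the quoted core description
NUM_BANKS = 4             # from the quoted core description


def aiv_and_padding(mli):
    """Return (8-byte AIV, number of zero pad bytes) for an mli-byte object.

    Hypothetical helper: assumes mli is the object length in bytes.
    """
    if not 0 < mli < 2**32:
        raise ValueError("MLI must be non-zero and fit in 32 bits")
    # AIV = fixed prefix followed by the MLI as a big-endian 32-bit integer.
    aiv = struct.pack(">II", AIV_PREFIX, mli)
    # Zero padding needed to round the object up to a whole 8-byte block.
    pad_len = -mli % 8
    return aiv, pad_len


def bank_and_offset(word_index):
    """Map a word index within the object to (bank, offset within bank).

    Hypothetical helper: assumes 128 words per bank and four banks.
    """
    bank, offset = divmod(word_index, WORDS_PER_BANK)
    if not 0 <= bank < NUM_BANKS:
        raise ValueError("word index outside the four banks")
    return bank, offset
```

With these assumptions, a 20-byte object gets the AIV a65959a600000014 and 4 bytes of padding, and word 128 of an object lands at offset 0 of bank 1. Both calculations depend only on the length and the word index, which is why moving them into the core (or unrolling the banks) looks feasible.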