Update after some profiling: it looks like the current bottleneck is loading keys from the keystore, not ModExp. More specifically:
* We're spending twice as much time in hal_ks_fetch() as we are in hal_rsa_decrypt();

* Most of the time spent in hal_ks_fetch() is waiting for hal_aes_keyunwrap(), which in turn breaks down into time spent decrypting keys and time spent waiting for the keystore lock because some other client is currently hogging it.

Adding more AES cores did not help, so the bottleneck here is probably the C code, not the AES core.

This looks like an architectural issue, and in retrospect it's sort of obvious: the current C code makes no attempt to track which signer cores were just loaded with which key components (in part because, until relatively recently, we were using bitstreams with one signer core: since RSA CRT involves two ModExp operations, there would have been no point in tracking this). So we reload the key components for every signature, and the profiling results say that's expensive.

Which of course also raises the question of whether we *should* be preserving key components in the signer cores. Doctrine for the C code has been to wipe any copy of private key components immediately after use; we're not currently doing that for the signer cores (oops), but adding code to do that would be straightforward.

Adding code to be more clever about keeping key components in signer cores seems like a fun source of additional complexity. We do have a notion of an "open" key object, so presumably we could somehow hook into that, perhaps with some kind of LRU mechanism for reclaiming cores when there are too many open keys for the number of cores available.

At any rate: some work to do, but the above at least sort of makes sense, which is an improvement over not understanding the results.
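
To make the LRU idea a bit more concrete, here is a rough sketch of the bookkeeping it might involve. This is not existing code: core_cache_t, core_cache_acquire(), core_cache_release(), the key_id field and NUM_SIGNER_CORES are all made-up names for illustration, and the sketch ignores locking and the question of whether a core is busy at the moment it is reclaimed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_SIGNER_CORES 4      /* assumption: cores available in the bitstream */

typedef struct {
    bool     in_use;            /* core currently holds key components */
    uint32_t key_id;            /* hypothetical identifier of an open key */
    uint64_t last_used;         /* logical timestamp of the most recent use */
} core_slot_t;

typedef struct {
    core_slot_t slot[NUM_SIGNER_CORES];
    uint64_t    clock;          /* bumped on every acquire */
} core_cache_t;

static void core_cache_init(core_cache_t *c)
{
    memset(c, 0, sizeof(*c));
}

/*
 * Pick a core for key_id.  Returns the core index and sets *needs_load:
 * false if the core already holds this key's components, true if the
 * caller must wipe the core and load the components before using it.
 */
static int core_cache_acquire(core_cache_t *c, uint32_t key_id, bool *needs_load)
{
    assert(c != NULL && needs_load != NULL);
    c->clock++;

    /* Hit: some core is already loaded with this key. */
    for (int i = 0; i < NUM_SIGNER_CORES; i++) {
        if (c->slot[i].in_use && c->slot[i].key_id == key_id) {
            c->slot[i].last_used = c->clock;
            *needs_load = false;
            return i;
        }
    }

    /* Miss: take the first free core, else the least recently used one. */
    int victim = -1;
    for (int i = 0; i < NUM_SIGNER_CORES; i++) {
        if (!c->slot[i].in_use) {
            victim = i;
            break;
        }
        if (victim < 0 || c->slot[i].last_used < c->slot[victim].last_used)
            victim = i;
    }

    c->slot[victim].in_use    = true;
    c->slot[victim].key_id    = key_id;
    c->slot[victim].last_used = c->clock;
    *needs_load = true;
    return victim;
}

/*
 * Forget a key, e.g. when its "open" key object is closed.  The caller
 * would also wipe the hardware core at this point.  Returns the core
 * index that was released, or -1 if the key was not loaded anywhere.
 */
static int core_cache_release(core_cache_t *c, uint32_t key_id)
{
    for (int i = 0; i < NUM_SIGNER_CORES; i++) {
        if (c->slot[i].in_use && c->slot[i].key_id == key_id) {
            memset(&c->slot[i], 0, sizeof(c->slot[i]));
            return i;
        }
    }
    return -1;
}
```

The point of returning needs_load is that the expensive path (fetch from the keystore, unwrap, load into the core) would only run on a miss; on a hit the signature could go straight to the core. Whether that is acceptable given the wipe-after-use doctrine is the open question above.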