02.04.2018 1:44, Rob Austein wrote:
One thing we discussed at the last face to face meeting was what kind
of overall signature rate we could get with RSA if we were to load up
the FPGA with enough cores that we could run batches of signatures in
parallel. What numbers we were able to generate quickly at the time
showed surprisingly little benefit from running parallel signers.
Clearly, more research was required.
The following is a report on what I saw with an FPGA configuration
using eight ModExpA7 cores. Our CRT implementation uses two cores in
parallel, so in theory this allows four signatures in parallel.
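
For readers unfamiliar with why CRT signing occupies two cores: the
private-key operation splits into two independent half-size modular
exponentiations, one modulo each prime, which can run side by side
before being recombined. The following is a minimal sketch with toy
parameters, not the Cryptech firmware; the function name `crt_sign` is
hypothetical, and the thread pool merely stands in for two hardware
cores (Python threads do not give real parallelism here). It needs
Python 3.8+ for `pow(x, -1, n)`.

```python
from concurrent.futures import ThreadPoolExecutor

def crt_sign(m, p, q, d):
    """RSA private-key operation via CRT: two half-size modexps, then recombine."""
    dp, dq = d % (p - 1), d % (q - 1)   # reduced private exponents
    q_inv = pow(q, -1, p)               # q^-1 mod p, used for recombination
    # The two exponentiations share no state, so each could be handed to
    # its own ModExp core; the thread pool is only a stand-in for that.
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(pow, m % p, dp, p)
        f2 = pool.submit(pow, m % q, dq, q)
        s1, s2 = f1.result(), f2.result()
    # Garner's recombination: s = s2 + q * (q_inv * (s1 - s2) mod p)
    h = (q_inv * (s1 - s2)) % p
    return s2 + h * q

# Toy key, far too small for real use.
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))
m = 65
s = crt_sign(m, p, q, d)
print(s == pow(m, d, n))   # CRT result matches the direct computation
```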
Each line in the results below represents 10,000 samples.
Explanation of fields:
* "sigs/sec" is overall signing rate: number of signatures (10,000)
divided by elapsed time from start of first signature until end of
last signature.
* "secs/sig" is inverse of "sigs/sec", as the name would suggest.
* "mean" is the arithmetic mean of the actual measured times for each
of the 10,000 signatures.
* "clients" is how many separate RPC client connections the test
program was driving in parallel. In theory, separate client streams
should execute in parallel on the HSM, resources permitting.
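
The field definitions above can be sketched as follows. This is an
illustration only, not the actual test program: the `summarize`
function and the sample timings are hypothetical, and it assumes each
signature is recorded as a (start, end) pair in seconds.

```python
from datetime import timedelta
from statistics import mean

def summarize(samples):
    """samples: list of (start, end) times in seconds, one pair per signature."""
    first_start = min(start for start, _ in samples)
    last_end = max(end for _, end in samples)
    rate = len(samples) / (last_end - first_start)
    return {
        "sigs/sec": rate,                              # overall throughput
        "secs/sig": timedelta(seconds=1 / rate),       # inverse of throughput
        "mean": timedelta(seconds=mean(end - start for start, end in samples)),
    }

# Made-up data: two clients, each issuing two back-to-back signatures
# that take 0.2 s apiece.
samples = [(0.0, 0.2), (0.0, 0.2), (0.2, 0.4), (0.2, 0.4)]
stats = summarize(samples)
print(stats)
```

Note that "secs/sig" and "mean" measure different things: the first is
the inverse of throughput, the second is the latency each client sees,
so the two only coincide when a single client is running.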
Given the above, one might expect four clients to be optimal with
eight cores. In practice it's weirder than that, and we don't (yet)
really know why.
RSA blinding (as currently implemented, anyway) is a significant
component of the overall workload, so I ran the same set of tests
twice, once with blinding enabled, once with blinding disabled. As
mentioned at the last face-to-face meeting, I'm also looking into
options to amortize the cost of blinding factor generation over
multiple signatures (e.g., the squaring method that Cryptlib uses), but
the "no blinding" case below is probably still useful for comparison,
since it seems unlikely that any blinding implementation, no matter
how clever, would be faster than no blinding at all.
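
To make the amortization idea concrete: generating a fresh blinding
pair costs a modular exponentiation and a modular inverse, whereas
squaring an existing pair costs only two modular multiplications. The
sketch below shows the general technique under toy parameters; it is
not taken from Cryptlib or from the Cryptech code, and the names
`BlindingFactors` and `blinded_sign` are hypothetical. It needs
Python 3.8+ for `pow(x, -1, n)`.

```python
import secrets
from math import gcd

class BlindingFactors:
    """Holds a pair (r^e mod n, r^-1 mod n) and refreshes it by squaring."""

    def __init__(self, n, e):
        self.n = n
        while True:
            r = secrets.randbelow(n - 2) + 2     # random r in [2, n-1]
            if gcd(r, n) == 1:
                break
        self.bf = pow(r, e, n)     # blinding factor: one full modexp
        self.ubf = pow(r, -1, n)   # unblinding factor: one modular inverse

    def next_pair(self):
        pair = (self.bf, self.ubf)
        # (r^e)^2 = (r^2)^e and (r^-1)^2 = (r^2)^-1, so the squared pair is
        # again a valid blinding pair, at the cost of two multiplications.
        self.bf = self.bf * self.bf % self.n
        self.ubf = self.ubf * self.ubf % self.n
        return pair

def blinded_sign(m, d, n, factors):
    bf, ubf = factors.next_pair()
    blinded = m * bf % n                  # hide m before the private-key op
    return pow(blinded, d, n) * ubf % n   # unblind the result

# Toy key, far too small for real use.
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))
factors = BlindingFactors(n, e)
sigs = [blinded_sign(65, d, n, factors) for _ in range(5)]
print(all(s == pow(65, d, n) for s in sigs))
```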
Results with RSA blinding enabled:
rsa_1024 sigs/sec 8.23051184652 secs/sig 0:00:00.121499 mean 0:00:00.242737 clients 2
rsa_1024 sigs/sec 7.98897745442 secs/sig 0:00:00.125172 mean 0:00:00.375233 clients 3
rsa_1024 sigs/sec 7.81232996586 secs/sig 0:00:00.128002 mean 0:00:00.511688 clients 4
rsa_1024 sigs/sec 7.47154351327 secs/sig 0:00:00.133841 mean 0:00:00.668838 clients 5
rsa_1024 sigs/sec 7.39642568217 secs/sig 0:00:00.135200 mean 0:00:00.810773 clients 6
rsa_1024 sigs/sec 7.39639878825 secs/sig 0:00:00.135200 mean 0:00:00.945906 clients 7
rsa_1024 sigs/sec 7.39569001033 secs/sig 0:00:00.135213 mean 0:00:01.081113 clients 8
rsa_1024 sigs/sec 6.94447171114 secs/sig 0:00:00.143999 mean 0:00:00.143754 clients 1
rsa_2048 sigs/sec 3.58840968475 secs/sig 0:00:00.278674 mean 0:00:00.557077 clients 2
rsa_2048 sigs/sec 3.57318846246 secs/sig 0:00:00.279862 mean 0:00:00.839253 clients 3
rsa_2048 sigs/sec 3.52841577927 secs/sig 0:00:00.283413 mean 0:00:01.133249 clients 4
rsa_2048 sigs/sec 3.43730017148 secs/sig 0:00:00.290926 mean 0:00:01.454111 clients 5
rsa_2048 sigs/sec 3.43150490015 secs/sig 0:00:00.291417 mean 0:00:02.039086 clients 7
rsa_2048 sigs/sec 3.43119570233 secs/sig 0:00:00.291443 mean 0:00:01.748002 clients 6
rsa_2048 sigs/sec 3.43097294848 secs/sig 0:00:00.291462 mean 0:00:02.330666 clients 8
rsa_2048 sigs/sec 2.83980804247 secs/sig 0:00:00.352136 mean 0:00:00.351889 clients 1
rsa_4096 sigs/sec 1.26531176783 secs/sig 0:00:00.790319 mean 0:00:02.370533 clients 3
rsa_4096 sigs/sec 1.25766369367 secs/sig 0:00:00.795125 mean 0:00:03.179863 clients 4
rsa_4096 sigs/sec 1.24165903669 secs/sig 0:00:00.805374 mean 0:00:04.025947 clients 5
rsa_4096 sigs/sec 1.23907931329 secs/sig 0:00:00.807050 mean 0:00:04.841009 clients 6
rsa_4096 sigs/sec 1.23877049069 secs/sig 0:00:00.807252 mean 0:00:05.649015 clients 7
rsa_4096 sigs/sec 1.23818618332 secs/sig 0:00:00.807632 mean 0:00:06.458782 clients 8
rsa_4096 sigs/sec 1.20014282040 secs/sig 0:00:00.833234 mean 0:00:01.666174 clients 2
rsa_4096 sigs/sec 0.75256731769 secs/sig 0:00:01.328784 mean 0:00:01.328541 clients 1
Results with RSA blinding disabled:
rsa_1024 sigs/sec 20.8328738383 secs/sig 0:00:00.048001 mean 0:00:00.095756 clients 2
rsa_1024 sigs/sec 19.9879970479 secs/sig 0:00:00.050030 mean 0:00:00.149835 clients 3
rsa_1024 sigs/sec 19.2903102572 secs/sig 0:00:00.051839 mean 0:00:00.207086 clients 4
rsa_1024 sigs/sec 18.0929850137 secs/sig 0:00:00.055270 mean 0:00:00.276055 clients 5
rsa_1024 sigs/sec 17.6741526966 secs/sig 0:00:00.056579 mean 0:00:00.339158 clients 6
rsa_1024 sigs/sec 17.6726372423 secs/sig 0:00:00.056584 mean 0:00:00.395741 clients 7
rsa_1024 sigs/sec 17.6706028853 secs/sig 0:00:00.056591 mean 0:00:00.452338 clients 8
rsa_1024 sigs/sec 15.6249343020 secs/sig 0:00:00.064000 mean 0:00:00.063758 clients 1
rsa_2048 sigs/sec 11.8560079254 secs/sig 0:00:00.084345 mean 0:00:00.252769 clients 3
rsa_2048 sigs/sec 11.5105757962 secs/sig 0:00:00.086876 mean 0:00:00.347218 clients 4
rsa_2048 sigs/sec 10.8679436493 secs/sig 0:00:00.092013 mean 0:00:00.183782 clients 2
rsa_2048 sigs/sec 10.8513037353 secs/sig 0:00:00.092154 mean 0:00:00.460453 clients 5
rsa_2048 sigs/sec 10.7127620993 secs/sig 0:00:00.093346 mean 0:00:00.559706 clients 6
rsa_2048 sigs/sec 10.7120583942 secs/sig 0:00:00.093352 mean 0:00:00.653052 clients 7
rsa_2048 sigs/sec 10.7114778457 secs/sig 0:00:00.093357 mean 0:00:00.746364 clients 8
rsa_2048 sigs/sec 6.25004214872 secs/sig 0:00:00.159998 mean 0:00:00.159755 clients 1
rsa_4096 sigs/sec 5.14717930991 secs/sig 0:00:00.194281 mean 0:00:00.776755 clients 4
rsa_4096 sigs/sec 5.13575633082 secs/sig 0:00:00.194713 mean 0:00:00.973146 clients 5
rsa_4096 sigs/sec 5.10692320688 secs/sig 0:00:00.195812 mean 0:00:01.174336 clients 6
rsa_4096 sigs/sec 5.10315211883 secs/sig 0:00:00.195957 mean 0:00:01.371050 clients 7
rsa_4096 sigs/sec 5.10234529358 secs/sig 0:00:00.195988 mean 0:00:01.567119 clients 8
rsa_4096 sigs/sec 4.07444053710 secs/sig 0:00:00.245432 mean 0:00:00.736003 clients 3
rsa_4096 sigs/sec 2.78873489846 secs/sig 0:00:00.358585 mean 0:00:00.716912 clients 2
rsa_4096 sigs/sec 1.48805229671 secs/sig 0:00:00.672019 mean 0:00:00.671777 clients 1
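
To compare these results with the theoretical four-way parallelism,
one can divide the four-client rate by the one-client rate for each
key size. The sketch below does only that; the rates are copied from
the result lines above, while the `RATES` table and `speedup` helper
are names made up for this illustration.

```python
# sigs/sec copied from the result lines above, keyed by (mode, key size).
# Each value is (rate with 1 client, rate with 4 clients).
RATES = {
    ("blinding", "rsa_1024"): (6.94447171114, 7.81232996586),
    ("blinding", "rsa_2048"): (2.83980804247, 3.52841577927),
    ("blinding", "rsa_4096"): (0.75256731769, 1.25766369367),
    ("no blinding", "rsa_1024"): (15.6249343020, 19.2903102572),
    ("no blinding", "rsa_2048"): (6.25004214872, 11.5105757962),
    ("no blinding", "rsa_4096"): (1.48805229671, 5.14717930991),
}

def speedup(one_client, four_clients):
    """Throughput gain from four clients relative to a single client."""
    return four_clients / one_client

for (mode, key), (r1, r4) in RATES.items():
    print(f"{mode:12s} {key}  {speedup(r1, r4):.2f}x of a theoretical 4x")
```

Only the largest key without blinding comes anywhere near 4x; with
blinding enabled, none of the key sizes reaches 2x.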
Well, that's very weird. I guess it's a stupid question, but are you
sure you're using the cores correctly? I mean, when you need to sign,
do you look for the first idle core, load it with input data and toggle
the corresponding control bit, then save the core's number (to keep
track of which core is signing what) and go do something else?
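
The allocation pattern described in the question can be sketched as
below. This is a software model only, under the assumption that each
core has a busy flag and an input register; the `Core` and `CorePool`
classes and their fields are hypothetical and do not correspond to the
actual HSM firmware or FPGA register map.

```python
class Core:
    """Software stand-in for one ModExp core."""
    def __init__(self, number):
        self.number = number
        self.busy = False
        self.operand = None

class CorePool:
    def __init__(self, count):
        self.cores = [Core(i) for i in range(count)]
        self.jobs = {}                      # core number -> job id

    def start(self, job_id, operand):
        """Give the job to the first idle core; return its number, or None."""
        for core in self.cores:
            if not core.busy:
                core.operand = operand      # load input data
                core.busy = True            # models toggling the control bit
                self.jobs[core.number] = job_id   # remember who signs what
                return core.number
        return None                         # all cores busy; caller retries

    def finish(self, number):
        """Mark a core idle again and return the job it was running."""
        self.cores[number].busy = False
        return self.jobs.pop(number)

pool = CorePool(8)
assigned = [pool.start(f"job{i}", i) for i in range(9)]
print(assigned)   # the ninth request finds no idle core
```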
Another question: with, say, three clients, does each one of them do
3,333 signatures on average? I mean, are you by any chance doing
30,000 tests for 3 clients?
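
One rough way to sanity-check the sample count from the published
numbers alone is Little's law: average requests in flight = throughput
x mean latency. Assuming each client keeps exactly one request
outstanding at all times (an assumption about the test program, not
something stated in the report), the product of "sigs/sec" and "mean"
should come out close to the number of clients if the rate was
computed from the signatures actually performed. If three clients had
each done 10,000 signatures while the rate was computed from 10,000,
the product would instead be near 1. The values are copied from the
blinding-enabled rsa_1024 lines; `ROWS` and `in_flight` are names made
up for this sketch.

```python
# (reported sigs/sec, mean latency in seconds, clients), copied from the
# "blinding enabled" rsa_1024 result lines.
ROWS = [
    (6.94447171114, 0.143754, 1),
    (8.23051184652, 0.242737, 2),
    (7.98897745442, 0.375233, 3),
    (7.81232996586, 0.511688, 4),
]

def in_flight(rate, mean_latency):
    """Little's law: average requests in flight = throughput * latency."""
    return rate * mean_latency

for rate, mean_latency, clients in ROWS:
    print(f"clients {clients}: rate * mean = {in_flight(rate, mean_latency):.3f}")
```

For these rows the product lands within about 0.01 of the client
count, which is at least consistent with 10,000 signatures in total
per line rather than 10,000 per client.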
--
With best regards,
Pavel Shatov