[openssl.org #555] RSA blinding MT patch
I've committed something similar to 0.9.8, see [1]. Please try a recent snapshot. [1] http://marc.theaimsgroup.com/?l=openssl-cvsm=111455472305028w=2 Cheers, Nils __ OpenSSL Project http://www.openssl.org Development Mailing List openssl-dev@openssl.org Automated List Manager [EMAIL PROTECTED]
Re: [openssl.org #555] RSA blinding MT patch
Bodo Moeller via RT wrote: Tom Wu via RT [EMAIL PROTECTED]: I just tried benchmarking the snapshot code against my earlier patch on an 8-way P3-700 server (Win2K AS). My patch gets ~100 RSA sign/s (1024-bit) with a single thread and peaks at ~790 RSA sign/s with 8 threads. The 0402 snapshot also starts at ~100 RSA sign/s with 1 thread and peaks at ~650 RSA sign/s with 8 threads. Thanks for the timings. One thing to take into account when interpreting these is that some additional random blinding should be added to your patch; maybe once in ten or hundred RSA operations, so the timing difference would not really change a lot. A more important aspect is that you are comparing just the case that multiple threads do share an RSA structure. A different scenario is that you have multiple threads with *individual* RSA structures -- then the snapshot version will be very fast while the version with locking will be unnecessarily slowed down because the locks are global. This is why we are trying to avoid excessive locking. We should try our best to quantify the cost of locking to weigh it against the cost of local blinding. If we are concerned about contention leading to a loss of parallelism on multi-processor systems, I would suggest that my patch places only a small amount of code (the blinding update squarings) inside a critical section, which results in very little contention, since the window of time for an RSA private op is still dominated by the CRT modexp. If we go by the benchmark numbers from our 8-way box, assuming the snapshot version gets perfect parallelism from multiple threads with individual RSA structs, that still makes its maximum theoretical performance 8 * 100 = 800 RSA sign/s, while the version with locking got about 790 sign/s. So far, it looks like locking is costing at most about 1% (10/800) in performance for both single and multithreaded cases, as opposed to 0% (single) and 18% (150/800) (multi) for local blinding. Perhaps there are other benchmarks we could run to get a more comprehensive picture? Tom -- Tom Wu Chief Security Architect Arcot Systems (408) 969-6124 __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
RE: [openssl.org #555] RSA blinding MT patch
Approximately ten days ago, I posted about having problems with the RSA Blinding patch that resulted in seeing an intermittent problem of dropping GIFs from my SSL server implementation. I continued to see these problems until yesterday when I built with the 402 Snapshot for 0.9.6 (openssl-0.9.6-stable-SNAP-20030402.tar.gz). In short, I want to post for the record that the fixes in the 402 Snapshot for 0.9.6 are quite acceptable for my implementation. I realize there are some performance concerns being brought up, but I am quite HAPPY to see my stuff work for the first time since the original RSA blinding patch went in. In closing, does the OpenSSL Release Group have any idea as to when OpenSSL 0.9.6j might be officially released? --- Pete Bobco --- -Original Message- From: Tom Wu via RT [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 02, 2003 7:43 PM Cc: [EMAIL PROTECTED] Subject: Re: [openssl.org #555] RSA blinding MT patch Bodo Moeller via RT wrote: [EMAIL PROTECTED] - Mon Mar 31 17:14:12 2003]: The latest snapshots have not been fixed, some more patience is required ... The next round of snapshots (20030402, to appear at ftp://ftp.openssl.org/snapshot;type=d in about six hours) should solve the multi-threading problems. Please test them when they are available. The good news is that the fix in the snapshot fixes the problem, but the bad news is that it seems to kill performance in my benchmarks. On a P3-750 running Linux, I get 106 RSA sign/s (1024-bit) with my patch, regardless of the number of simultaneous threads. With the snapshot fix, I get 102 RSA sign/s with one thread, but if I try with 2 or more threads it drops down to 81 sign/s. It's quite possible that I've misconfigured something on my own end, but I suspect that it is more likely that the local blinding operation is slowing things down. In the case where the blinding struct is owned by a different thread from the one doing an RSA op, the code has to do a modexp and a mod inverse, as opposed to the two squarings that the update normally does. I believe that on most if not all platforms, the cost of putting critical sections around the blinding convert/update will be drastically smaller than the cost of the extra local blinding computation. Tom -- Tom Wu Chief Security Architect Arcot Systems (408) 969-6124 __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED] __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
Re: [openssl.org #555] RSA blinding MT patch
In message [EMAIL PROTECTED] on Thu, 3 Apr 2003 11:26:58 -0600, Bobco, Pete [EMAIL PROTECTED] said: Pete.Bobco In closing, does the OpenSSL Release Group have any idea Pete.Bobco as to when OpenSSL 0.9.6j might be officially released? I don't know, but I'm guessing someone is working on the performance problem. Therefore, I can only say soon, and I do realise I've said that before. Be sure that we're working toward as much perfection as we can, and will do a release as soon as it's pronounced done. -- Richard Levitte \ Spannvägen 38, II \ [EMAIL PROTECTED] [EMAIL PROTECTED] \ S-168 35 BROMMA \ T: +46-8-26 52 47 \ SWEDEN \ or +46-708-26 53 44 Procurator Odiosus Ex Infernis-- [EMAIL PROTECTED] Member of the OpenSSL development team: http://www.openssl.org/ Unsolicited commercial email is subject to an archival fee of $400. See http://www.stacken.kth.se/~levitte/mail/ for more info. __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
Re: [openssl.org #555] RSA blinding MT patch
Tom Wu via RT [EMAIL PROTECTED]: Bodo Moeller via RT wrote: The next round of snapshots (20030402, to appear at ftp://ftp.openssl.org/snapshot;type=d in about six hours) should solve the multi-threading problems. Please test them when they are available. The good news is that the fix in the snapshot fixes the problem, but the bad news is that it seems to kill performance in my benchmarks. On a P3-750 running Linux, I get 106 RSA sign/s (1024-bit) with my patch, regardless of the number of simultaneous threads. With the snapshot fix, I get 102 RSA sign/s with one thread, but if I try with 2 or more threads it drops down to 81 sign/s. It's quite possible that I've misconfigured something on my own end, but I suspect that it is more likely that the local blinding operation is slowing things down. Yes, surely this is what is happening, local blinding is somewhat expensive. In the case where the blinding struct is owned by a different thread from the one doing an RSA op, the code has to do a modexp and a mod inverse, as opposed to the two squarings that the update normally does. These two squarings should be changed, though -- OpenSSL should use a random new blinding factor after a couple of RSA secret key operations instead of predictably updating the factor (I didn't want to change all of this at once). So the update code will be slower in the future; not every time, but sometimes. I believe that on most if not all platforms, the cost of putting critical sections around the blinding convert/update will be drastically smaller than the cost of the extra local blinding computation. This depends: on a single-processor machine, indeed additional locking should usually be faster than using local blinding; but for multi-processor systems, the cost of locking could be quite high. There are some strategies that could be used to make blinding faster without expensive locking, but these would require incompatible changes to the RSA and/or BN_BLINDING structures (the addition of thread_id is an incompatible change in theory, but it's one that does not directly affect applications). -- Bodo Möller [EMAIL PROTECTED] PGP http://www.informatik.tu-darmstadt.de/TI/Mitarbeiter/moeller/0x36d2c658.html * TU Darmstadt, Theoretische Informatik, Alexanderstr. 10, D-64283 Darmstadt * Tel. +49-6151-16-6628, Fax +49-6151-16-6036 __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
Re: [openssl.org #555] RSA blinding MT patch
(Bodo Moeller) via RT wrote: Tom Wu via RT [EMAIL PROTECTED]: In the case where the blinding struct is owned by a different thread from the one doing an RSA op, the code has to do a modexp and a mod inverse, as opposed to the two squarings that the update normally does. These two squarings should be changed, though -- OpenSSL should use a random new blinding factor after a couple of RSA secret key operations instead of predictably updating the factor (I didn't want to change all of this at once). So the update code will be slower in the future; not every time, but sometimes. Is there any established wisdom on the security implications of refreshing the blinding factor? Assuming that the initial blinding value had sufficient entropy and was unknown to an attacker, is there still some way to carry out a timing attack given the knowledge that the host is just squaring A after each RSA operation? I also wonder if slowing down blinding would weaken the rationale for turning it on all the time when doing RSA? I believe that on most if not all platforms, the cost of putting critical sections around the blinding convert/update will be drastically smaller than the cost of the extra local blinding computation. This depends: on a single-processor machine, indeed additional locking should usually be faster than using local blinding; but for multi-processor systems, the cost of locking could be quite high. I just tried benchmarking the snapshot code against my earlier patch on an 8-way P3-700 server (Win2K AS). My patch gets ~100 RSA sign/s (1024-bit) with a single thread and peaks at ~790 RSA sign/s with 8 threads. The 0402 snapshot also starts at ~100 RSA sign/s with 1 thread and peaks at ~650 RSA sign/s with 8 threads. There are some strategies that could be used to make blinding faster without expensive locking, but these would require incompatible changes to the RSA and/or BN_BLINDING structures (the addition of thread_id is an incompatible change in theory, but it's one that does not directly affect applications). As for what to do for the short term in 0.9.6j or 0.9.7b, is there any chance of using my original patch with the _r methods? It doesn't change either the RSA or BN_BLINDING structures at all, and seems to give the least slowdown on both single and multi-processor systems. Perhaps the always square A behavior should be the default if it is good enough to block the obvious timing attack, with future options to enable random refresh strategies if the caller wants to do a performance/security tradeoff. Tom -- Tom Wu Chief Security Architect Arcot Systems (408) 969-6124 __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
Re: [openssl.org #555] RSA blinding MT patch
Tom Wu via RT [EMAIL PROTECTED]: (Bodo Moeller) via RT wrote: Tom Wu via RT [EMAIL PROTECTED]: Is there any established wisdom on the security implications of refreshing the blinding factor? Assuming that the initial blinding value had sufficient entropy and was unknown to an attacker, is there still some way to carry out a timing attack given the knowledge that the host is just squaring A after each RSA operation? I am not aware of descriptions of such attacks. Maybe it would even suffice to randomly square or cube the factor to avoid potential attacks, but better safe than sorry ... I also wonder if slowing down blinding would weaken the rationale for turning it on all the time when doing RSA? Well, in any case you can still turn it off if it is too slow. I believe that on most if not all platforms, the cost of putting critical sections around the blinding convert/update will be drastically smaller than the cost of the extra local blinding computation. This depends: on a single-processor machine, indeed additional locking should usually be faster than using local blinding; but for multi-processor systems, the cost of locking could be quite high. I just tried benchmarking the snapshot code against my earlier patch on an 8-way P3-700 server (Win2K AS). My patch gets ~100 RSA sign/s (1024-bit) with a single thread and peaks at ~790 RSA sign/s with 8 threads. The 0402 snapshot also starts at ~100 RSA sign/s with 1 thread and peaks at ~650 RSA sign/s with 8 threads. Thanks for the timings. One thing to take into account when interpreting these is that some additional random blinding should be added to your patch; maybe once in ten or hundred RSA operations, so the timing difference would not really change a lot. A more important aspect is that you are comparing just the case that multiple threads do share an RSA structure. A different scenario is that you have multiple threads with *individual* RSA structures -- then the snapshot version will be very fast while the version with locking will be unnecessarily slowed down because the locks are global. This is why we are trying to avoid excessive locking. There are some strategies that could be used to make blinding faster without expensive locking, but these would require incompatible changes to the RSA and/or BN_BLINDING structures (the addition of thread_id is an incompatible change in theory, but it's one that does not directly affect applications). As for what to do for the short term in 0.9.6j or 0.9.7b, is there any chance of using my original patch with the _r methods? It doesn't change either the RSA or BN_BLINDING structures at all, and seems to give the least slowdown on both single and multi-processor systems. Perhaps the always square A behavior should be the default if it is good enough to block the obvious timing attack, with future options to enable random refresh strategies if the caller wants to do a performance/security tradeoff. One idea to consider is to include a second BN_BLINDING slot with each RSA object: the first BN_BLINDING object would be single-threaded (as in the current snapshots), the second one would be shared between all the other threads (and require locking). The major problem is how to do this without harming compatibility too much. -- Bodo Möller [EMAIL PROTECTED] PGP http://www.informatik.tu-darmstadt.de/TI/Mitarbeiter/moeller/0x36d2c658.html * TU Darmstadt, Theoretische Informatik, Alexanderstr. 10, D-64283 Darmstadt * Tel. +49-6151-16-6628, Fax +49-6151-16-6036 __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
[openssl.org #555] RSA blinding MT patch
[EMAIL PROTECTED] - Mon Mar 31 17:14:12 2003]: The latest snapshots have not been fixed, some more patience is required ... The next round of snapshots (20030402, to appear at ftp://ftp.openssl.org/snapshot;type=d in about six hours) should solve the multi-threading problems. Please test them when they are available. __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
Re: [openssl.org #555] RSA blinding MT patch
Bodo Moeller via RT wrote: [EMAIL PROTECTED] - Mon Mar 31 17:14:12 2003]: The latest snapshots have not been fixed, some more patience is required ... The next round of snapshots (20030402, to appear at ftp://ftp.openssl.org/snapshot;type=d in about six hours) should solve the multi-threading problems. Please test them when they are available. The good news is that the fix in the snapshot fixes the problem, but the bad news is that it seems to kill performance in my benchmarks. On a P3-750 running Linux, I get 106 RSA sign/s (1024-bit) with my patch, regardless of the number of simultaneous threads. With the snapshot fix, I get 102 RSA sign/s with one thread, but if I try with 2 or more threads it drops down to 81 sign/s. It's quite possible that I've misconfigured something on my own end, but I suspect that it is more likely that the local blinding operation is slowing things down. In the case where the blinding struct is owned by a different thread from the one doing an RSA op, the code has to do a modexp and a mod inverse, as opposed to the two squarings that the update normally does. I believe that on most if not all platforms, the cost of putting critical sections around the blinding convert/update will be drastically smaller than the cost of the extra local blinding computation. Tom -- Tom Wu Chief Security Architect Arcot Systems (408) 969-6124 __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
[openssl.org #555] RSA blinding MT patch
Could you please download the latest 0.9.6 snapshot and check that it works for you? As far as I understand, the MT issue has been adressed, but solved in a different manner. [EMAIL PROTECTED] - Fri Mar 28 08:22:16 2003]: This patch fixes the multithreading issues I was having when an RSA struct was being used by multiple threads simultaneously with blinding enabled. It adds _r versions of the convert/invert functions to save the unblinding value, and does the update in the convert step. rsa_eay.c uses the RSA_BLINDING lock to make the convert-and-update step atomic. The patch is for 0.9.6i. Tom -- Richard Levitte [EMAIL PROTECTED] __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
Re: [openssl.org #555] RSA blinding MT patch
On Mon, Mar 31, 2003 at 03:01:10PM +0200, Richard Levitte via RT wrote: Could you please download the latest 0.9.6 snapshot and check that it works for you? As far as I understand, the MT issue has been adressed, but solved in a different manner. The latest snapshots have not been fixed, some more patience is required ... -- Bodo Möller [EMAIL PROTECTED] PGP http://www.informatik.tu-darmstadt.de/TI/Mitarbeiter/moeller/0x36d2c658.html * TU Darmstadt, Theoretische Informatik, Alexanderstr. 10, D-64283 Darmstadt * Tel. +49-6151-16-6628, Fax +49-6151-16-6036 __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
Re: [openssl.org #555] RSA blinding MT patch
Hi! This patch fixes the multithreading issues I was having when an RSA struct was being used by multiple threads simultaneously with blinding enabled. It adds _r versions of the convert/invert functions to save the unblinding value, and does the update in the convert step. rsa_eay.c uses the RSA_BLINDING lock to make the convert-and-update step atomic. The patch is for 0.9.6i. I can confirm that this patch really solves our problems. We wrote a test that creates 10 threads to sign and verify. Without this patch most of the signatures were invalid. With this patch all of them were correct. No memory leaks. Arne __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
[openssl.org #555] RSA blinding MT patch
This patch fixes the multithreading issues I was having when an RSA struct was being used by multiple threads simultaneously with blinding enabled. It adds _r versions of the convert/invert functions to save the unblinding value, and does the update in the convert step. rsa_eay.c uses the RSA_BLINDING lock to make the convert-and-update step atomic. The patch is for 0.9.6i. Tom -- Tom Wu Chief Security Architect Arcot Systems (408) 969-6124 __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]