OK, I've taken a look at this, and scratched my head a little. It's a touch complicated by the fact that thread-ids have changed in the head of development relative to what you're looking at in 0.9.8. But I'm now wondering if you haven't misunderstood the nature of openssl's threading support;
On Tuesday 31 March 2009 04:52:19 Marc Haisenko wrote: > Hi folks, > last week I described a problem with RSA blinding and its locking[1]. > Coincidentally another user ran into the same problem in a totally > different scenario, the work-around of disabling the blinding fixed > the issue for him[2]. > > So here's a patch to fix this issue. > > The issue is that the rsa_get_blinding method checks whether there is > concurrent access to the blinding and tells its caller whether locking > needs to be done. The check is done by comparing thread/process IDs. > If they differ locking needs to be done. The problem is that locking > is not done if the IDs are same. It is assumed that the creator > doesn't need to do locking which is correct for a single-threaded > application but is wrong in a multi- threaded/processed environment > once another process has accessed the blinding. We need to separate multi-threading from multi-processing here. Once you've forked, there should be no locking relationship whatsoever between processes, irrespective of any threads (and thread-ids) within those processes. Do you have some reason to believe this isn't the case? The fork has essentially duplicated the process and created isolation between the two (copy-on-write or otherwise). So our only concern *should* be between threads within the same process. So ... multi-threading. My understanding of the RSA blinding (which I didn't write) is that rsa->blinding is lazy-initialised by whichever thread gets to it first. The call to RSA_setup_blinding() creates the blinding struct and tags it with the current thread-id. Subsequent accesses to RSA blinding will use rsa->blinding if and only if the caller is the thread that lazy-initialised it (ipso facto, if processing of a given RSA struct is always in one thread - this is optimal). Accesses to RSA blinding from other threads should use rsa->mt_blinding (which is lazy-initialised by the first such thread) and this places obligations on the caller - ie. do your own locking, don't use a common blinding factor, etc. This should work. I'm wondering if I understood what you meant by "The problem is that locking is not done if the IDs are same." ... If the IDs are the same, that means you're in the same thread, period!! If that's not the case, then the problem is that your thread-id callback isn't set up correctly. For threading support to work correctly, you need to provide hooks in order to make openssl compatible with your thread model (pthreads or otherwise). For 0.9.8, the APIs include; void CRYPTO_set_locking_callback(void (*func)(int mode,int type, const char *file,int line)); void CRYPTO_set_id_callback(unsigned long (*func)(void)); Locking callbacks default to nops, and id callbacks default to different things depending on platform, but one possibility is "&errno" (ie. the address of the errno var, in the hopes that it's thread-local) and another is getpid() (ie. in the hopes that the threads have distinct pids too). It may be that the default behaviour is degenerating on your platform and not correctly differentiating between threads. > > As a fix I've introduced a new flag for struct rsa_st, > RSA_FLAG_BLINDING_NEEDS_LOCKING. Once it has been noticed that locking > has to be done the flag is set and from then on we always do locking, > whether we are the creator or not. > > However, I'm scratching my head why this fixes the issue given the > fact that there are actually two blindings: one when doing no locking > (rsa->blinding) and another one when doing locking (rsa->mt_blinding). > Yet our load test indicates that this really does fix the issue: > without it we get about a dozen Bad Record MAC per day due to the > blinding. Since I've implemented the attached patch none has occured > (which is about 1.5 days of load test now). As before, the thread detection ('id_callback') might be failing you such that all threads appear to be "the same thread"? Try calling CRYPTO_thread_id() from different threads in the same process and see whether they return the same value. This might explain why you "fixed" the problem by locking all callers even when they all have the same thread ID... Cheers, Geoff -- Un terrien, c'est un singe avec des clefs de char... ______________________________________________________________________ OpenSSL Project http://www.openssl.org Development Mailing List openssl-dev@openssl.org Automated List Manager majord...@openssl.org