Darryl Mile wrote:

> A compiler will not generate a store instruction to put back a 
> "cached_copy" into the variable location.  Principally because there was 
> no assignment operation in the original code and because even a 
> non-optimizing compiler knows it can just dump the "cached_copy" 
> temporary register on the floor.

That may be true, but it doesn't help. The compiler generating a store 
instruction is only one way a cached copy can be written back to memory. 
Anything a compiler can do could also be done by the CPU or memory hardware. 
Only operations with specific guaranteed semantics solve that problem.
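
To be concrete about what "guaranteed semantics" means here, a minimal sketch 
(the names are made up, and pthreads stands in for OpenSSL's CRYPTO_w_lock, 
but the idea is the same): every access to the shared pointer happens under an 
operation whose concurrent behaviour is actually specified, i.e. inside the 
lock, rather than relying on an unlocked peek.

    #include <pthread.h>
    #include <stdlib.h>

    /* Hypothetical stand-ins for ssl_comp_methods and CRYPTO_w_lock. */
    static void *comp_methods;
    static pthread_mutex_t comp_lock = PTHREAD_MUTEX_INITIALIZER;

    void comp_methods_free(void)
    {
        /* No unlocked fast-path read of comp_methods at all: the only
         * reads and writes of the shared pointer happen while the
         * mutex is held, so the locking primitive's guarantees apply
         * to every access. */
        pthread_mutex_lock(&comp_lock);
        if (comp_methods != NULL) {
            free(comp_methods);
            comp_methods = NULL;
        }
        pthread_mutex_unlock(&comp_lock);
    }

That trades the unlocked fast path for semantics the threading library 
actually promises.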

> The conversation stemmed from the code:
> 
> 1289     if (ssl_comp_methods == NULL)
> 1290             return;
> 1291     CRYPTO_w_lock(CRYPTO_LOCK_SSL);
> 1292     if (ssl_comp_methods != NULL)
> 1293     {
> ...SNIP...
> 1296     }
> 1297     CRYPTO_w_unlock(CRYPTO_LOCK_SSL);
> 
> 
> If ssl_comp_methods is NULL, then no store instruction to the memory 
> location of ssl_comp_methods will ever happen.  So nothing is subject to 
> the so called "lost write" concurrency problem.

You are correct that no store *instruction* will happen, but that doesn't mean 
the CPU won't generate a store anyway. It's true that current caches track 
whether a line is dirty and only write back the lines that are, but a future 
CPU might find it more efficient to drop that dirty flag and write back every 
line it touches. What if maintaining the flag gets expensive while write 
operations are cheap?

There is precedent. On the x86, for example, at least some compare-exchange 
operations will write back the original data if the compare fails.
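
Roughly, and paraphrasing the documented behaviour from memory (flags omitted, 
and the helper name is mine), the destination of a CMPXCHG gets a write cycle 
even when the comparison fails:

    #include <stdint.h>

    /* Rough C paraphrase of what x86 CMPXCHG does with its destination
     * memory operand.  Note the failure branch: the destination is
     * still written, it just receives its old value back. */
    uint32_t cmpxchg_sketch(uint32_t *dst, uint32_t *eax, uint32_t src)
    {
        uint32_t old = *dst;      /* read the destination              */
        if (old == *eax) {
            *dst = src;           /* success: store the new value      */
        } else {
            *eax = old;           /* failure: report the old value...  */
            *dst = old;           /* ...and write it back anyway       */
        }
        return old;
    }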

In other words, if you ask for:

if(a==7) a=3;

You might get (in *microcode* even though the instructions say what you want):

a = (a==7) ? 3 : a;

This can be an optimization because the code as specified requires a 
micro-branch internal to the compare-exchange instruction, which can be 
expensive if mispredicted. The unconditional store avoids the branch entirely 
and can be implemented purely arithmetically (as a conditional select), with 
no control flow.

If a compiler was designed before anyone knew a CPU might do this, it could 
easily generate a compare-exchange instruction for a volatile variable 
accessed in this form. The compiler author "knows" the CPU follows the 
requested code precisely. The next CPU, however, makes the store unconditional 
as an optimization, because an unconditional store is cheaper than a possibly 
mispredicted branch. So sorry, your new CPU breaks your old executables.
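
To spell out the resulting "lost write" concretely (the thread functions and 
the interleaving below are hypothetical, and assume a CPU that writes back 
unconditionally as described above):

    /* Shared variable, initially 4.  Note that thread_one's source
     * contains no store to 'a' on the path where the test fails. */
    int a = 4;

    void thread_one(void) { if (a == 7) a = 3; }
    void thread_two(void) { a = 5; }

    /* One possible interleaving on a CPU that implements the
     * conditional store as  a = (a==7) ? 3 : a;
     *
     *   T1: reads a, sees 4, test fails
     *   T2: stores a = 5
     *   T1: writes 4 back to a   (the "do nothing" path still stores)
     *
     * Final value: 4.  Thread two's store is lost, even though thread
     * one's C code never assigns to 'a' on that path. */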

The code we are talking about is of this precise form. It compares the value 
of a pointer and changes that pointer only if it was not NULL to begin with. 
Now, the code we are talking about takes a lock before the second check, so it 
will not fail in exactly this way. (I believe the x86 will only do this for 
bus-locked instructions anyway, but there is no reason the next x86 couldn't 
do the same thing for non-bus-locked accesses too.)

Your argument is of the form "I can't think of any way it could fail." That 
says more about your imagination than about the validity of the code. ;)

I agree that none of the ways I can think of for it to fail seem particularly 
likely. The problem is the ways it can fail that neither of us can think of -- 
until it fails. Experience has shown me (and I cited a few examples earlier in 
this thread) that there will be ways for it to fail that you can't think of.

That is why standards provide guarantees.

> So marking it 'volatile' will not gain the above code anything.  But it 
> will inhibit valid optimizations the compiler might make to all code 
> that uses the variable 'ssl_comp_methods' that are inside the locked 
> regions.

I agree with that.

DS

