Kyle Hamilton wrote:
> Oh.  I'm sorry.  Someone needs to use a keyword 'volatile'.
>
> Bingo.  Problem solved on the improper optimization issue.

Actually 'volatile' doesn't provide a useful fix.

Setting aside the fact that there is no problem with any CPU that currently exists: 'volatile' is most useful for memory-mapped I/O, where touching a memory location (with a load or a store) can have side effects on the I/O device. So 'volatile' can be used to enforce a strict access policy based on the code as written (as opposed to how the compiler might re-order loads/stores if it were regular memory).

My point here is that even the best optimizing compiler, applying only 100% safe optimizations, will never have a problem with the original poster's code. There is no need to use volatile: it provides no benefit, and it will inhibit correct optimizations in code that makes use of the variable.

David seems to be thinking ahead into the realms of CPUs that have not been invented yet. I don't give two hoots about those CPUs, and I certainly wouldn't waste brain cells trying to make my code portable to a piece of hardware that does not exist.


On 3/27/07, David Schwartz <[EMAIL PROTECTED]> wrote:
> > For POSIX threads, the result of reading a variable in one
> > thread while it
> > might be modified in another thread is undefined. Line 1289 and
> > 1290 should
> > be removed.

> Not this old chestnut again.

Like it or not, it's a fact.

I would disagree. It's merely your opinion; and in my opinion your understanding of C, CPU architecture, CPU cache coherency and memory access isn't as good as you think it is.

You did point out a scenario that was not at all relevant to the situation in the code of the original post.


> Of course, that's because those CPUs don't exist yet. But I expect code that
> I write and compile today to work on future CPUs because I don't rely on
> guarantees I don't actually have.

LOL. Hey, that trans-warp CPU which converts charged ions into bits uses an equally spartan language better able to describe all those new features it brings to the party.


> Umm, what CPU? *Every* CPU OpenSSL does or might ever run on? And that would
> have to include every compiler OpenSSL does or might support, because the
> compiler can change things around too.

I have already pointed out how and why the compiler CANNOT change anything around in the situation embodied in the original post.


> The function entrance, the "return" and the "CRYPTO_w_lock()" calls all
> act as the correct memory barriers with regards to compiler
> optimizations and load/store rescheduling concerns.  Even if the
> CRYPTO_w_lock() is a macro or inline, since the writer would have been
> sure to insert the correct barriers if he chose to implement
> CRYPTO_w_lock() as a macro/inline.

> Where is that documented or guaranteed? You mean that they *happen* to be
> barriers on the platforms you have used. You have no idea what the next
> version of GCC might do. We are talking about saving *NOTHING* here.

It's documented in the multi-CPU errata for each CPU that can be wired up to access the same memory.

There are different levels to the entire problem area: there is the electrical and timing behaviour of memory bus accesses to the same location, and there is the area of CPU cache and shared CPU cache coherency. These are all documented for each architecture.

Then there is the area of the C language optimizing compiler. This is less well documented, since it is the language constructs that are documented rather than the optimizations.

The barriers being discussed in this point are those for the compiler that affect code generation and how load/store operations are managed.

But in principle C is a sequential language, which means the observable order of events is STRICTLY the order laid out in the source code. Any optimization that defies that basic principle is a bug.


> The problem occurs after the beginning of the function, if the compare is
> done on a cached copy in a register. Look at this example:
>
> if (variable != NULL)
> {
>     free(variable);
>     variable = NULL;
> }
>
> This could easily be optimized to (cached_copy is a register or other fast
> place to store data):
>
> int cached_copy = variable;
> if (cached_copy != NULL)
> {
>     acquire_lock();
>     cached_copy = variable;
>     free(cached_copy);
>     cached_copy = NULL;
>     variable = cached_copy;
>     release_lock();
> }
> else variable = cached_copy;
>
> If you think this cannot happen, I defy you to explain to me why. What
> standard or guarantee is being violated? How can POSIX-compliant or
> C-compliant code break with this optimization? How can you say it can never
> be faster?

I presume that in this example the variable 'cached_copy' is not a real variable (it does not exist in the original C source) but a compiler-generated temporary introduced during the optimization process.

It cannot happen, because when the compiler emits a call to acquire_lock() (in the original post this is CRYPTO_w_lock()), by the definition of the C language it cannot keep "cached_copy" live across that call: the compiler cannot work out at compile time whether acquire_lock() has a side effect which could modify 'variable'. This is standard stuff for any compiler that wants to remain consistent with the observable order of execution.

The argument then naturally leads to: well, what if acquire_lock()/CRYPTO_w_lock() is not a real function but a macro or an inline? To that argument I have already noted that any thread-mutex designer using macros or inlines would be well aware of the gotchas in that situation. He would use an appropriate compiler memory barrier to ensure correct behavior (I could probably cite real-life code which does exactly this).



> The example you cite is not valid for line 1289 and 1290.  The only
> write operation to 'ssl_comp_methods' occurs inside the locked region
> and after the test condition has been re-evaluated.

> So what? The compiler may not be smart enough to realize that deserves to
> undo the optimization and may instead refresh before the lock acquisition.

This clearly illustrates your incorrect understanding of these matters. Sure, there is some correct theory being demonstrated in your point, but it is not being applied correctly, and it is not relevant to the code in the original posting.


> So your pseudo-code serves a useful purpose to illustrate to others on
> the list about the concern you raise, but the concern itself is not
> relevant to the example given.

> Nonsense. You are assuming that future compilers won't do things that are
> perfectly legal for them to do just because it doesn't make sense today.
> This type of code is not permitted (except as a platform-specific
> optimization) at any of the places I have worked as a professional
> programmer and violates most professional coding standards.

FFS. A compiler will not emit a store to memory as in the example pseudo-code you posted; for this code a store insn will NEVER EVER EVER happen.

In the code:

1289     if (ssl_comp_methods == NULL)
1290             return;

'ssl_comp_methods' is not being assigned to in any way; the only operations that take place are a load, then a compare, then a branch.

The variable 'ssl_comp_methods' is outside the scope of the function.

The above two lines are the first two lines in a function block.

All this adds up to a 100% guarantee that the compiler will always execute that load instruction (and will never optimize it away).

There is a 100% guarantee it will never execute a store instruction on the memory address for 'ssl_comp_methods'.

Then the code goes on to:

1291     CRYPTO_w_lock(CRYPTO_LOCK_SSL);

Now, because this is a function call, the compiler MUST flush any cached data in registers if the original source of that data could have been modified by CRYPTO_w_lock(). Since even an uber-optimizing compiler can't be sure, it has to presume there was a side effect (the worst-case scenario).

This means the next line:

1292     if (ssl_comp_methods != NULL)

is 100% guaranteed to cause ANOTHER load instruction to be executed.

Because, as I keep saying, the compiler has no way to know whether the call to CRYPTO_w_lock() had a side effect which modified the value of ssl_comp_methods. So it has to perform another load to ensure that the observable execution order is correct.


Darryl
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           [EMAIL PROTECTED]