Kyle Hamilton wrote:
Oh. I'm sorry. Someone needs to use a keyword 'volatile'.
Bingo. Problem solved on the improper optimization issue.
Actually 'volatile' doesn't provide a useful fix.
That's ignoring the fact that there is no problem with any CPU that
currently exists. 'volatile' is most useful for memory-mapped I/O,
where touching a memory location (with a load or a store) can have
side effects on the I/O device. So 'volatile' can be used to enforce a
strict access policy based on the code as written (as opposed to how
the compiler might re-order loads/stores if it were regular memory).
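For the record, this is the kind of situation where 'volatile' earns
its keep (a made-up device with invented register addresses, purely
for illustration, nothing to do with OpenSSL):

/* Hypothetical memory-mapped UART registers; the addresses and bit
 * layout are invented for illustration only. */
#include <stdint.h>

#define UART_STATUS (*(volatile uint32_t *)0x4000A000u)
#define UART_DATA   (*(volatile uint32_t *)0x4000A004u)

static void uart_putc(uint8_t c)
{
    /* Without 'volatile' the compiler could hoist this load out of
     * the loop and spin forever on a stale value; with it, every
     * iteration really touches the device register. */
    while ((UART_STATUS & 0x1u) == 0)
        ;                   /* wait for the TX-ready bit */
    UART_DATA = c;          /* each store must actually reach the device */
}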
My point here is that even the best optimizing compiler, applying only
100% safe optimizations, will never have a problem with the original
poster's code. There is no need to use volatile; it provides no
benefit and will hurt correct optimizations in code making use of the
variable.
David seems to be thinking ahead into the realms of CPUs that have not
been invented yet. I'm don't give two hoots about those CPUs and I
certainly wouldn't waste brain cells trying to make my code portable to
a piece of hardware that does not exist.
On 3/27/07, David Schwartz <[EMAIL PROTECTED]> wrote:
> > For POSIX threads, the result of reading a variable in one thread
> > while it might be modified in another thread is undefined. Lines
> > 1289 and 1290 should be removed.
> Not this old chestnut again.
Like it or not, it's a fact.
I would disagree. It's merely your opinion; and in my opinion your
understanding of C, CPU architecture, CPU cache coherency and memory
access isn't as good as you think it is.
You did point out a scenario that was not at all relevant to the
situation in the code of the original post.
Of course, that's because those CPUs don't exist yet. But I expect
code that I write and compile today to work on future CPUs, because I
don't rely on guarantees I don't actually have.
LOL. Hey, that trans-wrap-cpu which converts charged ions into bits
uses an equally spartan language better able to describe all those new
features it brings to the party.
Umm, what CPU? *Every* CPU OpenSSL does or might ever run on? And that
would have to include every compiler OpenSSL does or might support,
because the compiler can change things around too.
I already pointed out how and why the compiler CANNOT change anything
around in the situation embodied in the original post.
> The function entrance, the "return" and the "CRYPTO_w_lock()" calls all
> act as the correct memory barriers with regards to compiler
> optimizations and load/store rescheduling concerns. Even if the
> CRYPTO_w_lock() is a macro or inline, since the writer would have been
> sure to insert the correct barriers if he chose to implement
> CRYPTO_w_lock() as a macro/inline.
Where is that documented or guaranteed? You mean that they *happen* to be
barriers on the platforms you have used. You have no idea what the next
version of GCC might do. We are talking about saving *NOTHING* here.
It's documented in the multi-CPU errata for each CPU that can be wired
up to access the same memory.
There are different levels to the entire problem area: there is the
electrical and timing behavior of memory bus access to the same memory
location, and there is the area of CPU cache and shared CPU cache
coherency. These are all documented for each architecture.
Then there is the area of the C language optimizing compiler, which is
less documented, since it is the language constructs that are
documented rather than the optimizations.
The barriers being discussed at this point are those for the compiler,
the ones that affect code generation and how load/store operations are
managed. But in principle C is a sequential language; that means the
observable order of events is STRICTLY the order laid out in the
source code. Any optimization that defies that basic principle is a
bug.
The problem occurs after the beginning of the function, if the compare
is done on a cached copy in a register. Look at this example:
if (variable != NULL)
{
    free(variable);
    variable = NULL;
}
This could easily be optimized to (cached_copy is a register or other
fast place to store data):
int cached_copy = variable;
if (cached_copy != NULL)
{
    acquire_lock();
    cached_copy = variable;
    free(cached_copy);
    cached_copy = NULL;
    variable = cached_copy;
    release_lock();
}
else variable = cached_copy;
If you think this cannot happen, I defy you to explain to me why. What
standard or guarantee is being violated? How can POSIX-compliant or
C-compliant code break with this optimization? How can you say it can
never be faster?
I presume in this example the variable 'cached_copy' is not a real
variable (as in it does not exist in the original C source) but a
compiler generated variable during the optimization process.
It cannot happen, because when the compiler makes a function call to
acquire_lock() (in the o/p this is "CRYPTO_w_lock();"), by definition
of the C language it cannot hold onto "cached_copy" across that
function call: the compiler can't work out at compile time whether
acquire_lock() has a side effect which could modify 'variable'. This
is standard stuff for any compiler that wants to stay consistent with
the observable order of execution.
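To make that concrete, here is a small sketch with invented names
(this is not the code from the original post) of what a compiler may
and may not keep in a register:

/* 'counter' and 'do_something_opaque()' are invented names. 'counter'
 * has external linkage, so code outside this file may touch it. */
int counter;

void do_something_opaque(void);   /* body in another translation unit */

int demo(void)
{
    int a = counter;          /* load from memory */
    int b = counter;          /* no intervening call: the compiler is
                                 free to reuse the value it already has
                                 in a register instead of loading again */
    do_something_opaque();    /* opaque call: it might write 'counter',
                                 so any cached register copy is stale */
    int c = counter;          /* a fresh load must be issued here */
    return a + b + c;
}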
Then the argument would naturally lead into: well, what if
acquire_lock()/CRYPTO_w_lock() is not a real function but a macro or
inline? To that argument I already mooted that any thread mutex
designer using macros or inlines would already be well aware of the
gotchas in that situation. He would use an appropriate compiler memory
barrier to ensure correct behavior (I can probably cite examples of
real-life code which does exactly this).
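As an illustration of what such a designer would do: with GCC-flavoured
compilers the usual trick is an empty asm statement with a "memory"
clobber. An inline or macro lock would contain something along these
lines (a sketch with invented names, not CRYPTO_w_lock() itself):

/* Sketch of a compiler-level memory barrier inside an inline lock;
 * 'my_lock_word', 'my_spin_acquire' and 'my_spin_release' are
 * invented names. */
#define COMPILER_BARRIER()  __asm__ __volatile__("" ::: "memory")

static volatile int my_lock_word;       /* 0 = free, 1 = held */

static inline void my_spin_acquire(void)
{
    /* __sync_lock_test_and_set() atomically stores 1 and returns
     * the previous value. */
    while (__sync_lock_test_and_set(&my_lock_word, 1))
        ;                   /* spin until the lock was seen free */
    COMPILER_BARRIER();     /* stop the compiler hoisting loads or
                               sinking stores past the acquire */
}

static inline void my_spin_release(void)
{
    COMPILER_BARRIER();     /* complete accesses before releasing */
    __sync_lock_release(&my_lock_word); /* store 0, release semantics */
}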
> The example you cite is not valid for lines 1289 and 1290. The only
> write operation to 'ssl_comp_methods' occurs inside the locked region
> and after the test condition has been re-evaluated.
So what? The compiler may not be smart enough to realize that it
should undo the optimization, and may instead refresh before the lock
acquisition.
This clearly illustrates your incorrect understanding of these
matters. Sure, there is some correct theory being demonstrated in your
point, but it's not being applied correctly and is not relevant to the
code in the original posting.
> So your pseudo-code serves a useful purpose to illustrate to others on
> the list about the concern you raise, but the concern itself is not
> relevant to the example given.
Nonsense. You are assuming that future compilers won't do things that are
perfectly legal for them to do just because it doesn't make sense today.
This type of code is not permitted (except as a platform-specific
optimization) at any of the places I have worked as a professional
programmer and violates most professional coding standards.
FFS. A compiler will not execute a store instruction to memory as in
the example pseudo-code you posted; a store insn will NEVER EVER EVER
happen here. In the code:
1289 if (ssl_comp_methods == NULL)
1290     return;
'ssl_comp_methods' is not being assigned to in any way; the only
operation that takes place is a load, followed by a compare and then a
branch.
The variable 'ssl_comp_methods' is outside the scope of the function.
The above two lines are the first two lines in a function block.
All this adds up to a 100% guarantee that the compiler will always
execute that load instruction (and will never optimize it away).
There is a 100% guarantee it will never execute a store instruction on
the memory address of 'ssl_comp_methods'.
Then the code goes on to:
1291 CRYPTO_w_lock(CRYPTO_LOCK_SSL);
Now, because this is a function call, the compiler MUST flush any data
it has cached in registers if the original source of that data could
have been modified by CRYPTO_w_lock(). Since even an uber-optimizing
compiler can't be sure, it has to presume there was a side effect (the
worst-case scenario).
This means the next line:
1292 if (ssl_comp_methods != NULL)
is 100% guaranteed to cause ANOTHER load instruction to be executed,
because, as I keep saying, the compiler has no way to know whether the
call to CRYPTO_w_lock() had a side effect which modified the value of
ssl_comp_methods. So it has to perform another load to ensure that the
observable execution order is correct.
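Put together, the shape of the code path being argued about is roughly
the following (the function name and the locked body are paraphrased,
so treat it as a sketch rather than the exact OpenSSL source):

/* Sketch of the code path around lines 1289-1292; 'free_comp_methods'
 * is an invented name and the body inside the lock is paraphrased. */
void free_comp_methods(void)
{
    if (ssl_comp_methods == NULL)   /* load #1: always executed, never
                                       optimized away */
        return;
    CRYPTO_w_lock(CRYPTO_LOCK_SSL); /* opaque call: any register copy
                                       of ssl_comp_methods is now
                                       assumed stale */
    if (ssl_comp_methods != NULL)   /* load #2: a second, fresh load is
                                       guaranteed here */
    {
        /* ... free the list and set ssl_comp_methods = NULL; the only
         * store, and it happens inside the locked region ... */
    }
    CRYPTO_w_unlock(CRYPTO_LOCK_SSL);
}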
Darryl
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-dev@openssl.org
Automated List Manager [EMAIL PROTECTED]