Kyle Hamilton wrote:
> Oh.  I'm sorry.  Someone needs to use a keyword 'volatile'.
>
> Bingo.  Problem solved on the improper optimization issue.

Actually 'volatile' doesn't provide a useful fix.

Setting aside the fact that there is no problem with any CPU that currently exists: 'volatile' is most useful for memory-mapped I/O, where touching a memory location (with a load or a store) can have side effects on the I/O device. So 'volatile' can be used to enforce a strict access policy based on the code as written (as opposed to how the compiler might re-order loads/stores if it were regular memory).

My point here is that even the best optimizing compiler, applying only 100% safe optimizations, will never have a problem with the original poster's code. There is no need to use volatile: it provides no benefit, and it will inhibit correct optimizations in code that makes use of the variable.

David seems to be thinking ahead into the realms of CPUs that have not been invented yet. I don't give two hoots about those CPUs, and I certainly wouldn't waste brain cells trying to make my code portable to a piece of hardware that does not exist.


On 3/27/07, David Schwartz <[EMAIL PROTECTED]> wrote:
> > For POSIX threads, the result of reading a variable in one
> > thread while it
> > might be modified in another thread is undefined. Line 1289 and
> > 1290 should
> > be removed.

> Not this old chestnut again.

Like it or not, it's a fact.

I would disagree. It's merely your opinion; and in my opinion your understanding of C, CPU architecture, CPU cache coherency and memory access isn't as good as you think it is.

You did point out a scenario that was not at all relevant to the situation in the code of the original post.


> Of course, that's because those CPUs don't exist yet. But I expect code that
> I write and compile today to work on future CPUs because I don't rely on
> guarantees I don't actually have.

LOL. Hey, that trans-warp CPU which converts charged ions into bits uses an equally spartan language better able to describe all those new features it brings to the party.


> Umm, what CPU? *Every* CPU OpenSSL does or might ever run on? And that would
> have to include every compiler OpenSSL does or might support, because the
> compiler can change things around too.

I have already pointed out how and why the compiler CANNOT change anything around in the situation embodied in the original post.


> The function entrance, the "return" and the "CRYPTO_w_lock()" calls all
> act as the correct memory barriers with regards to compiler
> optimizations and load/store rescheduling concerns.  Even if the
> CRYPTO_w_lock() is a macro or inline, since the writer would have been
> sure to insert the correct barriers if he chose to implement
> CRYPTO_w_lock() as a macro/inline.

> Where is that documented or guaranteed? You mean that they *happen* to be
> barriers on the platforms you have used. You have no idea what the next
> version of GCC might do. We are talking about saving *NOTHING* here.

It's documented in the multi-CPU errata for each CPU that can be wired up to access the same memory.

There are different levels to the entire problem area: there is the electrical and timing behaviour of memory bus accesses to the same location, and there is the area of CPU cache and shared CPU cache coherency. These are all documented for each architecture.

Then there is the area of the C language optimizing compiler. This is less well documented, since it is the language constructs that are documented rather than the optimizations.

The barriers being discussed in this point are those for the compiler that affect code generation and how load/store operations are managed.

But in principle C is a sequential language, which means the observable order of events is STRICTLY the order laid out in the source code. Any optimization that defies that basic principle is a bug.


> The problem occurs after the beginning of the function, if the compare is
> done on a cached copy in a register. Look at this example:
>
> if (variable != NULL)
> {
>     free(variable);
>     variable = NULL;
> }
>
> This could easily be optimized to (cached_copy is a register or other fast
> place to store data):
>
> int cached_copy = variable;
> if (cached_copy != NULL)
> {
>     acquire_lock();
>     cached_copy = variable;
>     free(cached_copy);
>     cached_copy = NULL;
>     variable = cached_copy;
>     release_lock();
> }
> else variable = cached_copy;
>
> If you think this cannot happen, I defy you to explain to me why. What
> standard or guarantee is being violated? How can POSIX-compliant or
> C-compliant code break with this optimization? How can you say it can never
> be faster?

I presume that in this example the variable 'cached_copy' is not a real variable (it does not exist in the original C source) but a compiler-generated temporary introduced during the optimization process.

It cannot happen, because when the compiler emits a call to acquire_lock() (in the original post this is CRYPTO_w_lock()), by the definition of the C language it cannot keep "cached_copy" live across that call: the compiler cannot work out at compile time whether acquire_lock() has a side effect which could modify 'variable'. This is standard stuff for any compiler that wants to remain consistent with the observable order of execution.

The argument then naturally leads to: well, what if acquire_lock()/CRYPTO_w_lock() is not a real function but a macro or an inline? To that argument I have already noted that any thread-mutex designer using macros or inlines would be well aware of the gotchas in that situation. He would use an appropriate compiler memory barrier to ensure correct behavior (I could probably cite real-life code which does exactly this).



> The example you cite is not valid for line 1289 and 1290.  The only
> write operation to 'ssl_comp_methods' occurs inside the locked region
> and after the test condition has been re-evaluated.

> So what? The compiler may not be smart enough to realize that deserves to
> undo the optimization and may instead refresh before the lock acquisition.

This clearly illustrates your incorrect understanding of these matters. Sure, there is some correct theory being demonstrated in your point, but it is not being applied correctly, and it is not relevant to the code in the original posting.


> So your pseudo-code serves a useful purpose to illustrate to others on
> the list about the concern you raise, but the concern itself is not
> relevant to the example given.

> Nonsense. You are assuming that future compilers won't do things that are
> perfectly legal for them to do just because it doesn't make sense today.
> This type of code is not permitted (except as a platform-specific
> optimization) at any of the places I have worked as a professional
> programmer and violates most professional coding standards.

FFS. A compiler will not emit a store to memory as in the example pseudo-code you posted; for this code a store insn will NEVER EVER EVER happen.

In the code:

1289     if (ssl_comp_methods == NULL)
1290             return;

'ssl_comp_methods' is not being assigned to in any way; the only operations that take place are a load, then a compare, then a branch.

The variable 'ssl_comp_methods' is outside the scope of the function.

The above two lines are the first two lines in a function block.

All this adds up to a 100% guarantee that the compiler will always execute that load instruction (and will never optimize it away).

There is a 100% guarantee it will never execute a store instruction on the memory address for 'ssl_comp_methods'.

Then the code goes on to:

1291     CRYPTO_w_lock(CRYPTO_LOCK_SSL);

Now, because this is a function call, the compiler MUST flush any cached data in registers if the original source of that data could have been modified by CRYPTO_w_lock(). Since even an uber-optimizing compiler can't be sure, it has to presume there was a side effect (the worst-case scenario).

This means the next line:

1292     if (ssl_comp_methods != NULL)

is 100% guaranteed to cause ANOTHER load instruction to be executed.

Because, as I keep saying, the compiler has no way to know whether the call to CRYPTO_w_lock() had a side effect which modified the value of ssl_comp_methods. So it has to perform another load to ensure that the observable execution order is correct.


Darryl
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           [EMAIL PROTECTED]