RE: OpenSSL use of DCLP may not be thread-safe on multiple processors

Steven Reddie Fri, 08 Apr 2005 20:56:18 -0700

FWIW, the Itanium compiler included with the Microsoft Platform SDK does
employ release/acquire semantics when accessing volatile variables.  However,
the standard x86 cl.exe does not, and although the Pentium 4 appears not to
reorder as much as the specification allows, the specification still permits
it, so future x86 processors may do so, probably breaking existing code that
expects volatile accesses to be ordered a certain way.

Most compilers up until now probably didn't do anything with memory barriers
simply because most processors didn't require them.  It's modern processors
that support reordering.

Volatile should really be thought of as useful only for accessing memory
mapped devices; if you think of it this way, you probably won't make the
mistake of trying volatile-based hacks.  The compiler vendor may well expect
that, when volatile is used, the data segment being accessed has been set up
with caching and write combining disabled.  As Brian pointed out, you can
hardly expect (or want) the compiler to generate code to flush the cache,
etc., based on whether caching is enabled in the table descriptors.
Therefore, volatile accesses to cacheable memory segments are likely
candidates for reordering on processors that support it.  While Microsoft's
IA64 compiler does give volatile accesses release/acquire semantics, the C
standard is vague enough in this area that this behaviour shouldn't be
relied upon.
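To make the distinction concrete, here is a minimal sketch of the kind of use volatile was designed for: polling a memory-mapped status register. The register layout and the DEV_STATUS_READY bit are hypothetical. Note that volatile only forces the compiler to re-read the register on every iteration; it adds no CPU memory barrier, so on real hardware this assumes the register page is mapped uncached.

```c
#include <stdint.h>

/* Hypothetical READY bit; a real driver takes this from the device's
 * datasheet. */
#define DEV_STATUS_READY 0x1u

/* Spin until the READY bit is set in a memory-mapped status register.
 * Because `status` points to volatile storage, the compiler must perform
 * a fresh load on each iteration instead of hoisting the read out of the
 * loop.  No fence or cache-flush instruction is emitted. */
static void wait_until_ready(const volatile uint32_t *status)
{
    while ((*status & DEV_STATUS_READY) == 0) {
        /* busy-wait */
    }
}
```

On real hardware `status` would point into a mapped device page; for a quick run, a local `volatile uint32_t` already holding DEV_STATUS_READY can stand in for the register.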

I've been reading these specs very closely recently, though more to find out
what cannot be done than what I can get away with.
There's nothing wrong with looking for possible optimisations, but in this
case the specs show that DCLP is not safe and cannot be modified in any way
(short of losing the optimisation) to make it generally safe.
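For reference, a sketch of the DCLP shape under discussion (left in a comment, since it is exactly the code that is not safe) next to a portable alternative using pthread_once(), which POSIX lists among the functions that synchronize memory between threads. The struct table type and the build_table()/init_table() names are hypothetical. Whether pthread_once() is as cheap as DCLP's unlocked first check depends on the implementation; the point here is correctness, not speed.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical shared object that is expensive to build. */
struct table { int ready; int values[16]; };

/* The DCLP shape discussed in this thread (NOT safe):
 *
 * static struct table *volatile g_table;
 * static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
 * struct table *get_table_dclp(void) {
 *     if (g_table == NULL) {              // unlocked first check
 *         pthread_mutex_lock(&g_lock);
 *         if (g_table == NULL)            // locked second check
 *             g_table = build_table();    // publish the pointer
 *         pthread_mutex_unlock(&g_lock);
 *     }
 *     return g_table;  // another CPU may see the pointer before it
 *                      // sees the stores that filled in *g_table
 * }
 */

/* Portable alternative: let pthread_once() do the synchronisation. */
static struct table *g_once_table;
static pthread_once_t g_once = PTHREAD_ONCE_INIT;

/* Runs exactly once, no matter how many threads call get_table(). */
static void init_table(void)
{
    struct table *t = malloc(sizeof *t);
    if (t == NULL)
        abort();
    for (int i = 0; i < 16; i++)
        t->values[i] = i * i;
    t->ready = 1;
    g_once_table = t;
}

/* Every caller that returns from pthread_once() sees the fully
 * initialised table, on any number of processors. */
struct table *get_table(void)
{
    pthread_once(&g_once, init_table);
    return g_once_table;
}
```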

Steven

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Brian Hurt
Sent: Saturday, 9 April 2005 6:07 AM
To: openssl-dev@openssl.org
Cc: [EMAIL PROTECTED]
Subject: RE: OpenSSL use of DCLP may not be thread-safe on multiple
processors

On Fri, 8 Apr 2005, David Schwartz wrote:

>       No. The C standard is not telling the compiler what to do. It is 
> saying what the system must do when it runs the particular source 
> code. If the compiler cannot generate code that makes the system as a 
> whole comply with the standard, then the compiler does not conform.

Yes, but the standard is only defined in terms of what is visible from a
single thread, and not in terms of what is visible from external vantage
points (like other threads).

No C compiler I ever worked with issued the memory barrier/cache flush
instructions needed to enforce cache behavior for volatile references. 
Specifically, neither Visual Studio nor GCC for the x86 issues those sorts of
instructions.
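For contrast, C11 (which came well after this exchange) added <stdatomic.h>, whose release/acquire operations oblige the compiler to emit whatever ordering the target CPU needs, something volatile does not do. A minimal message-passing sketch, assuming a C11 compiler and POSIX threads; all names are illustrative.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;            /* ordinary, non-atomic data */
static atomic_int published;   /* flag that carries the ordering */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                                    /* plain store */
    /* Release: the payload store cannot become visible after the flag. */
    atomic_store_explicit(&published, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    /* Acquire: once the flag reads 1, the payload store is visible too. */
    while (atomic_load_explicit(&published, memory_order_acquire) == 0) {
        /* spin */
    }
    *(int *)arg = payload;
    return NULL;
}

/* Runs one producer/consumer round and returns what the consumer saw. */
int run_message_passing(void)
{
    pthread_t p, c;
    int seen = 0;
    payload = 0;
    atomic_store(&published, 0);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

With a volatile int flag in place of the atomic, the code would still compile, but neither the compiler nor the standard would promise that the consumer sees 42.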

I haven't looked at the code in question, but my general experience has been
if you're relying on some precise memory specification and exacting
standards adherence, you're probably screwing up.

>
>> int a;
>> int c;
>> void foo(int b)
>> {
>>      c = b;
>>      a = c;
>> }

You will probably get code (x86 gas format) like:
        movl    8(%ebp), %eax   # eax = b
        movl    %eax, c         # c = b
        movl    %eax, a         # a = c

>
>> into an assembly language sequence that loads the contents of b into a
>> register, and then stores it into both a and c.  The following code:
>
>> int a;
>> volatile int c;
>> void foo(int b)
>> {
>>      c = b;
>>      a = c;
>> }

This will produce code like:
        movl    8(%ebp), %eax   # eax = b
        movl    %eax, c         # c = b
        movl    c, %eax         # eax = c
        movl    %eax, a         # a = c

Note the reload of c.  Also note the utter lack of memory barrier (e.g.
MFENCE), CLFLUSH, and similar instructions.

This is actually pretty standard behavior in the face of caches, write
combining, speculative execution, and all the other tricks modern CPUs
are doing.  The compiler issued the write, and then issued a separate read
to read the value back in; the fact that the CPU short-circuited this isn't
the compiler's problem.  You can argue until the cows come home whether this
is conformant or not, but that's the behavior on the ground.
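The short circuit here is store-to-load forwarding from the core's store buffer, and the same buffering lets a later load complete before an earlier store to a different address is visible to other processors. The classic store-buffering litmus test below is a sketch assuming C11 atomics and POSIX threads: with sequentially consistent atomics the outcome r0 == 0 && r1 == 0 is forbidden, whereas with plain volatile ints x86 is allowed to produce it (though with a fresh pair of threads per round it may be rare in practice).

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x_flag, y_flag;
static int r0, r1;

/* Each thread stores to its own flag, then loads the other thread's.
 * seq_cst stores are typically compiled to xchg or mov+mfence on x86,
 * which drains the store buffer before the following load. */
static void *thread0(void *arg)
{
    (void)arg;
    atomic_store(&x_flag, 1);      /* seq_cst store */
    r0 = atomic_load(&y_flag);     /* seq_cst load */
    return NULL;
}

static void *thread1(void *arg)
{
    (void)arg;
    atomic_store(&y_flag, 1);
    r1 = atomic_load(&x_flag);
    return NULL;
}

/* Repeats the test and counts how often both loads saw 0. */
int count_both_zero(int iterations)
{
    int both_zero = 0;
    for (int i = 0; i < iterations; i++) {
        pthread_t t0, t1;
        atomic_store(&x_flag, 0);
        atomic_store(&y_flag, 0);
        pthread_create(&t0, NULL, thread0, NULL);
        pthread_create(&t1, NULL, thread1, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        if (r0 == 0 && r1 == 0)
            both_zero++;
    }
    return both_zero;
}
```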

>       The compiler is not free to ignore anything. If the C standard 
> specifies that the writes must occur in order, then the compiler must 
> make the writes occur in order. Not generate assembly code that makes it 
> look like the writes occur in order, but occur in order. The abstract 
> machine is not about assembly language, it's about what actually 
> happens.

That's what the compilers do.  And if the machine combines the writes, as
most modern CPUs almost certainly would, the compilers will not issue
extra instructions to overcome this, especially considering that it's
non-trivial to determine whether the extra instructions are even needed.
On the x86, for instance, the CD and NW flags in CR0, the MTRRs, and bit 6
of the IA32_MISC_ENABLE MSR all statically control various types of
caching.

Brian

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           [EMAIL PROTECTED]
