FWIW, the Itanium compiler included with the Microsoft Platform SDK does employ release/acquire semantics when accessing volatile variables. However, the standard x86 cl.exe does not. Although the Pentium 4 does not appear to reorder as much as the spec allows, the spec does still allow it, so future versions of the x86 line may do so, probably breaking existing code that expects volatile accesses to be ordered a certain way.
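For contrast, here is a minimal sketch of requesting release/acquire ordering explicitly instead of relying on one compiler's treatment of volatile. It uses C11 `<stdatomic.h>`, which postdates this 2005 thread. `payload`, `ready`, `publish` and `try_consume` are hypothetical names chosen for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: ordinary data plus a publication flag.
   The flag is a C11 atomic (not volatile); the explicit memory_order
   arguments ask for the ordering that volatile does not portably give. */
static int payload;
static atomic_bool ready = false;

/* Writer: the release store keeps the payload write from being
   reordered after the flag write. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: the acquire load keeps the payload read from being reordered
   before the flag read. Fills *out and returns true only after publish(). */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

If `try_consume` observes `ready == true`, the release/acquire pair guarantees it also observes the `payload` value written before the flag. This holds on any conforming compiler and processor, not just on IA64 with Microsoft's compiler.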
Most compilers up until now probably didn't do anything with memory barriers simply because most processors didn't require them; it's modern processors that support reordering. Volatile should really be thought of as useful only for accessing memory-mapped devices. If you think of it this way, you probably won't go wrong, since you won't be tempted into volatile-based hacks. The compiler vendor may well expect that when volatile is being used, the data segment being accessed has been set up to disable caching and write combining. As Brian pointed out, you can hardly expect (or want) the compiler to generate code to flush the cache, etc., based on whether caching is enabled in the table descriptors. Therefore, volatile accesses to cacheable memory segments are likely to be candidates for reordering on processors that support it.

While Microsoft's IA64 compiler does do release/acquire for volatile types, the C standard is vague enough in this area that this behaviour shouldn't be relied upon. I've been reading these specs very closely recently, though it's more about finding out what cannot be done than what I can get away with. There's nothing wrong with looking for possible optimisations, but in this case the specs show that DCLP is not safe and cannot be modified in any way (that doesn't result in losing the optimisation) to make it generally safe.

Steven

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Brian Hurt
Sent: Saturday, 9 April 2005 6:07 AM
To: openssl-dev@openssl.org
Cc: [EMAIL PROTECTED]
Subject: RE: OpenSSL use of DCLP may not be thread-safe on multiple processors

On Fri, 8 Apr 2005, David Schwartz wrote:

> No. The C standard is not telling the compiler what to do. It is
> saying what the system must do when it runs the particular source
> code. If the compiler cannot generate code that makes the system as a
> whole comply with the standard, then the compiler does not conform.
Yes, but the standard is defined only in terms of what is visible from a single thread, not in terms of what is visible from external vantage points (like other threads). No C compiler I have ever worked with issued the memory barrier/cache flush instructions needed to enforce cache behavior for volatile references. Specifically, neither Visual Studio nor GCC for the x86 issues those sorts of instructions.

I haven't looked at the code in question, but my general experience has been that if you're relying on some precise memory specification and exacting standards adherence, you're probably screwing up.

>
>> int a;
>> int c;
>> void foo(int b)
>> {
>>     c = b;
>>     a = c;
>> }

You will probably get code (x86 gas format) like:

    movl 8(%ebp), %eax   ; eax = b
    movl %eax, c         ; c = b
    movl %eax, a         ; a = c

>
>> into an assembly language sequence that loads the contents of b into a
>> register, and then stores it into both a and c. The following code:
>
>> int a;
>> volatile int c;
>> void foo(int b)
>> {
>>     c = b;
>>     a = c;
>> }

This will produce code like:

    movl 8(%ebp), %eax   ; eax = b
    movl %eax, c         ; c = b
    movl c, %eax         ; eax = c
    movl %eax, a         ; a = c

Note the reload of c. Also note the utter lack of MBAR, CLFLUSH, and similar instructions. This is actually pretty standard behavior in the face of caches, write combining, speculative execution, and all the other tricks modern CPUs are doing. The compiler issued the write, and then issued a separate read to load the value back in; the fact that the CPU short-circuited this isn't the compiler's problem. You can argue until the cows come home whether this is conformant or not, but that's the behavior on the ground.

> The compiler is not free to ignore anything. If the C standard
> specifies that the writes must occur in order, then the compiler must
> make the writes occur in order. Not generate assembly code that makes it
> look like the writes occur in order, but occur in order.
> The abstract
> machine is not about assembly language, it's about what actually
> happens.

That's what the compilers do. And if the machine combines the writes, as most modern CPUs almost certainly would, the compilers will not issue extra instructions to overcome this, especially considering that it's non-trivial to determine whether the extra instructions are even needed. I mean, on the x86 you have the CD and NW flags in CR0, the MTRRs, plus bit 6 of the IA32_MISC_ENABLE MSR, all statically controlling various types of caching.

Brian

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           [EMAIL PROTECTED]
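The usual way around the DCLP problem discussed in this thread is to hand one-time initialization to the threads library instead of hand-rolling the double check with volatile. Below is a minimal sketch, assuming a POSIX system. `struct resource`, `instance`, `init_instance` and `get_instance` are hypothetical names for illustration and are not taken from the OpenSSL sources.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical lazily initialised object (illustrative names only). */
struct resource {
    int initialised;
};

static struct resource *instance;
static pthread_once_t instance_once = PTHREAD_ONCE_INIT;

/*
 * The double-checked version this replaces looks roughly like:
 *
 *     static struct resource *volatile inst;
 *     if (inst == NULL) {              unlocked first check
 *         lock();
 *         if (inst == NULL) {          locked second check
 *             struct resource *r = malloc(sizeof *r);
 *             r->initialised = 1;
 *             inst = r;                another CPU may see this store
 *         }                            before it sees r->initialised
 *         unlock();
 *     }
 *
 * volatile only forces the compiler to emit each access to inst; it does
 * not stop the compiler from moving the non-volatile r->initialised store,
 * nor the processor from reordering, so an unlocked reader can get a
 * pointer to an object that does not yet look initialised.
 */

/* Called by pthread_once at most once for instance_once. */
static void init_instance(void)
{
    struct resource *r = malloc(sizeof *r);
    if (r != NULL)
        r->initialised = 1;
    instance = r;
}

/* POSIX specifies that init_instance runs only on the first call and has
   completed by the time pthread_once returns; any barriers the platform
   needs are the threads library's job, not the caller's. */
struct resource *get_instance(void)
{
    pthread_once(&instance_once, init_instance);
    return instance;
}
```

Whether `pthread_once` has a cheap fast path after initialization is up to the implementation, which can use whatever barriers the target processor actually requires.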