On Tue, Jan 5, 2016 at 5:12 PM, Zoltán Herczeg <[email protected]> wrote:
> Perhaps we could start by supporting some platforms, and gradually cover more
> with the community help. I heard that asm volatile forces GCC (and perhaps
> clang) to disable moving instructions around such asm blocks.
>
> E.g:
>
> statement1;
> asm volatile (" ");
> statement2;
>
> Is it true, that statement1 is fully completed before statement2 is executed
> even if the assembly part is nothing?

I don't think so; the documentation even says otherwise:

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

"Note that the compiler can move even volatile asm instructions relative
to other code"

What you can use on older GCCs is __sync_synchronize() (a full barrier),
assuming pointer assignments are atomic. Newer GCCs have __atomic_load /
__atomic_store with the __ATOMIC_ACQUIRE and __ATOMIC_RELEASE memory
models; even newer GCCs support C11, which provides the _Atomic type
qualifier and the <stdatomic.h> operations.

... totally open question marks about other platforms / compilers ...

> The CPU can still reorder stores. An x86 CPU does not have (need) data write
> barrier instruction as far as I know. Recent ARM 32 CPUs has data write
> barrier. Could somebody tell me how can I test whether this instruction is
> available at compile time? ARM 64 should not be a problem. I have not checked
> other CPUs yet.

I guess you want to check the exact ARM revision of the CPU? Perhaps with
some detection code like this:

http://code.woboq.org/qt5/qtbase/src/corelib/global/qprocessordetection.h.html#90

> There is one more thing. This theoretically affects everything, not just JIT
> compilation. If we compile a pattern with pcre2_compile, it is possible that
> the result pointer has been shared with another thread, but the compiled
> pattern data is not.
>
> Main thread:
>
> compiled_pcre_pattern->byte_code = something;
> return compiled_pcre_pattern;
>
> shared_pattern = compiled_pcre_pattern;
>
> Another thread:
> match (shared_pattern, subject);
>
> The byte_code part can be a garbage on the other thread (since it can be
> executed by another CPU). People did not complain about these effects before,
> is there a reason for that? I don't want to solve a non-existing problem.

Well, this is a problem for PCRE users, and it depends on how they share
that "shared_pattern" across threads.
The "proper" way is to make it an atomic pointer, then turn the assignment
into an atomic store with release semantics and the read in match into an
atomic load with acquire semantics.

Cheers,
--
Giuseppe D'Angelo

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
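
To illustrate the atomic store/release and load/acquire pairing described
above, here is a minimal C11 sketch. It assumes a compiler that provides
<stdatomic.h>; struct pattern, publish_pattern() and get_pattern() are
made-up names for illustration only and are not part of the PCRE2 API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiled pattern; NOT the real pcre2_code. */
struct pattern {
    int byte_code;
};

/* The shared pointer is atomic, so publishing it can carry ordering. */
static _Atomic(struct pattern *) shared_pattern = NULL;

/* Main thread: fill in the object first, then publish it with a release
 * store. Writes made before the store cannot be reordered after it. */
void publish_pattern(struct pattern *p, int code)
{
    assert(p != NULL);
    p->byte_code = code;
    atomic_store_explicit(&shared_pattern, p, memory_order_release);
}

/* Another thread: read the pointer with an acquire load. Returns NULL
 * if nothing has been published yet. */
struct pattern *get_pattern(void)
{
    return atomic_load_explicit(&shared_pattern, memory_order_acquire);
}
```

If get_pattern() returns a non-NULL pointer, the acquire load synchronizes
with the release store, so the reader also sees the byte_code value written
before publication. On GCCs without C11 support, __atomic_store_n and
__atomic_load_n with __ATOMIC_RELEASE and __ATOMIC_ACQUIRE should be usable
in the same way.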
