On Tue, Jan 5, 2016 at 5:12 PM, Zoltán Herczeg <[email protected]> wrote:
> Perhaps we could start by supporting some platforms and gradually cover more
> with the community's help. I heard that asm volatile prevents GCC (and perhaps
> clang) from moving instructions across such asm blocks.
>
> E.g:
>
> statement1;
> asm volatile (" ");
> statement2;
>
> Is it true that statement1 is fully completed before statement2 is executed,
> even if the assembly part is empty?

I don't think so; it's even documented:
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

"Note that the compiler can move even volatile asm instructions
relative to other code"

What you can use on older GCCs is __sync_synchronize() (a full barrier),
assuming pointer assignments are atomic.
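
A minimal sketch of that older-GCC approach (not PCRE code: `struct pattern`, `publish_pattern()` and `read_byte_code()` are hypothetical names, and the plain pointer store/load is assumed to be atomic, as above):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiled pattern. */
struct pattern {
    const unsigned char *byte_code;
};

/* volatile forces a real load/store of the pointer on every access;
   the pointer store itself is ASSUMED to be atomic on the target. */
static struct pattern *volatile shared_pattern = NULL;

void publish_pattern(struct pattern *p, const unsigned char *code)
{
    assert(p != NULL && code != NULL);
    p->byte_code = code;      /* 1. initialise the pattern first */
    __sync_synchronize();     /* 2. full barrier (compiler and CPU) */
    shared_pattern = p;       /* 3. only now make it reachable */
}

const unsigned char *read_byte_code(void)
{
    struct pattern *p = shared_pattern;
    if (p == NULL)
        return NULL;          /* nothing published yet */
    __sync_synchronize();     /* pairs with the writer's barrier */
    return p->byte_code;
}
```

The barrier is needed on both sides: the writer's keeps byte_code ahead of the pointer, the reader's keeps its read of byte_code after the pointer load.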

Newer GCCs have __atomic_load / __atomic_store with the
__ATOMIC_ACQUIRE and __ATOMIC_RELEASE memory models; even newer GCCs
support C11, i.e. the _Atomic type qualifier and the <stdatomic.h> operations.
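
The same publication with the __atomic builtins, as a sketch. It uses the `_n` variants (`__atomic_store_n` / `__atomic_load_n`, available since GCC 4.7), which take the value directly rather than through a pointer; the struct and function names are again hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiled pattern. */
struct pattern {
    const unsigned char *byte_code;
};

/* Plain pointer, but only ever accessed through __atomic builtins. */
static struct pattern *shared_pattern = NULL;

void publish_pattern(struct pattern *p, const unsigned char *code)
{
    assert(p != NULL && code != NULL);
    p->byte_code = code;
    /* release: every write above becomes visible before the pointer does */
    __atomic_store_n(&shared_pattern, p, __ATOMIC_RELEASE);
}

const unsigned char *read_byte_code(void)
{
    /* acquire: pairs with the release store in publish_pattern() */
    struct pattern *p = __atomic_load_n(&shared_pattern, __ATOMIC_ACQUIRE);
    return p != NULL ? p->byte_code : NULL;
}
```

Unlike the full barrier, release/acquire only orders what it has to, which is usually cheaper on weakly ordered CPUs such as ARM.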

... other platforms / compilers remain a totally open question ...

> The CPU can still reorder stores. An x86 CPU does not have (or need) a data
> write barrier instruction, as far as I know. Recent 32-bit ARM CPUs have a data
> write barrier. Could somebody tell me how I can test at compile time whether
> this instruction is available? 64-bit ARM should not be a problem. I have not
> checked other CPUs yet.

I guess you want to check the exact ARM revision of the CPU? Something
like this detection code?

http://code.woboq.org/qt5/qtbase/src/corelib/global/qprocessordetection.h.html#90
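
For the compile-time part, a rough sketch based only on the compiler's predefined macros (`__aarch64__`, `__arm__`, `__ARM_ARCH` from the ARM C Language Extensions, `__i386__`, `__x86_64__`). `STORE_BARRIER()`, `publish()` and `store_barrier_kind()` are hypothetical names; ARMv6 and older, or compilers that do not define `__ARM_ARCH`, simply fall back to `__sync_synchronize()`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical STORE_BARRIER(): orders earlier stores before later ones,
   chosen at compile time from predefined compiler macros. */
#if defined(__aarch64__)
#  define STORE_BARRIER() __asm__ __volatile__("dmb ishst" ::: "memory")
#  define STORE_BARRIER_KIND "dmb ishst (AArch64)"
#elif defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7
#  define STORE_BARRIER() __asm__ __volatile__("dmb" ::: "memory")
#  define STORE_BARRIER_KIND "dmb (ARMv7 or later)"
#elif defined(__i386__) || defined(__x86_64__)
   /* x86 does not reorder ordinary stores with other stores,
      so stopping compiler reordering is enough here. */
#  define STORE_BARRIER() __asm__ __volatile__("" ::: "memory")
#  define STORE_BARRIER_KIND "compiler barrier only (x86)"
#else
   /* Unknown target (including ARM before v7): use GCC's full barrier. */
#  define STORE_BARRIER() __sync_synchronize()
#  define STORE_BARRIER_KIND "__sync_synchronize() fallback"
#endif

/* Hypothetical publisher: fill in the data, then make it reachable.
   The pointer store is ASSUMED to be atomic. */
void publish(int *field, int value, int **slot)
{
    assert(field != NULL && slot != NULL);
    *field = value;     /* initialise the data first */
    STORE_BARRIER();    /* make that store visible before the next one */
    *slot = field;      /* then publish the pointer */
}

const char *store_barrier_kind(void)
{
    return STORE_BARRIER_KIND;
}
```

This only orders the writer's stores; the thread that reads the pointer still needs matching ordering on its side (an acquire load or a read barrier).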

> There is one more thing. This theoretically affects everything, not just JIT
> compilation. If we compile a pattern with pcre2_compile, it is possible that
> the result pointer has been shared with another thread, but the compiled
> pattern data has not.
>
> Main thread:
>
> compiled_pcre_pattern->byte_code = something;
> return compiled_pcre_pattern;
>
> shared_pattern = compiled_pcre_pattern;
>
> Another thread:
> match (shared_pattern, subject);
>
> The byte_code part can be garbage in the other thread (since that thread may
> run on another CPU). Nobody has complained about these effects before; is
> there a reason for that? I don't want to solve a non-existent problem.

Well, this is a problem for PCRE users and how they share that
"shared_pattern" across threads. The "proper" way is to make it an
atomic pointer, turning the assignment into an atomic store-release
and the read in match into an atomic load-acquire.
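
A sketch of that with C11 <stdatomic.h>. The struct and function names are hypothetical stand-ins; with PCRE2 the shared pointer would be a pcre2_code * instead:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiled pattern. */
struct pattern {
    const unsigned char *byte_code;
};

/* The shared pointer itself is atomic. */
static _Atomic(struct pattern *) shared_pattern = NULL;

/* Writer: finish building the pattern, then publish it with a release store. */
void share_pattern(struct pattern *p)
{
    assert(p != NULL && p->byte_code != NULL);
    atomic_store_explicit(&shared_pattern, p, memory_order_release);
}

/* Reader: an acquire load that returns p also sees every write
   made to *p before the matching release store. */
struct pattern *get_shared_pattern(void)
{
    return atomic_load_explicit(&shared_pattern, memory_order_acquire);
}
```

Any thread that gets a non-NULL pointer from get_shared_pattern() is guaranteed to see the byte_code written before share_pattern() was called, which is the guarantee the plain assignment in the quoted example lacks.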

Cheers,
-- 
Giuseppe D'Angelo

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 
