Re: Memory barrier for fat initialization

2015-01-16 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: "x86_64 and arm are the most important." OK, those are interesting to GMP as well. What's Nettle's intended fat granularity? I assume existence of instructions is sometimes enough, e.g., if AES hardware support exists then it seems safe to assume

Re: Memory barrier for fat initialization

2015-01-16 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: "What's Nettle's intended fat granularity?" For a start, only checking existence of instructions. When it comes to clever scheduling, I don't do very much. The only nettle function currently implemented in assembler where it might be beneficial to use

Re: Memory barrier for fat initialization

2015-01-15 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: Which architectures do you intend to target for fat nettle builds? x86_64 and arm are the most important. I really would want to move GMP towards fattyness for all current platforms. Unfortunately, this is not easy, and the problem is neither

Re: Memory barrier for fat initialization

2015-01-14 Thread Torbjörn Granlund
Which architectures do you intend to target for fat nettle builds? I really would want to move GMP towards fattyness for all current platforms. Unfortunately, this is not easy, and the problem is neither writing the actual code (fat.c, fat_entry.asm) nor getting memory ordering right. The real

Re: Memory barrier for fat initialization

2015-01-14 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Sounds doable for the sqr_basecase threshold, at least. On the other hand, on x86_64, maybe all chips we care about have the needed extensions, so it's *easy* to add an mfence or sfence instruction and not have to worry? I guess 32-bit x86 is

Re: Memory barrier for fat initialization

2015-01-13 Thread Torbjörn Granlund
This article summarises things well. The x86/AMD64 indeed only reorders loads before stores (except under the obscure OOStore mode). http://www.rdrop.com/users/paulmck/scalability/paper/ordering.2007.09.19a.pdf There is some reasoning about the weaker ordering of some SSE instructions, as well

Re: Memory barrier for fat initialization

2015-01-13 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: GMP's fat initialization (I'm looking at the x86_64 code now) ends with *((volatile int *) __gmpn_cpuvec_initialized) = 1; I suspect that it's possible (but unlikely) that a different thread on another cpu may read

Re: Memory barrier for fat initialization

2015-01-13 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Fat gmp library, cpuvec not yet initialized. Several threads, on different cpus, call the CPUVEC_THRESHOLD macro at about the same time. That's the scenario. When the first of those threads gets to setting __gmpn_cpuvec_initialized, the magic

Memory barrier for fat initialization

2015-01-13 Thread Niels Möller
GMP's fat initialization (I'm looking at the x86_64 code now) ends with *((volatile int *) __gmpn_cpuvec_initialized) = 1; I suspect that it's possible (but unlikely) that a different thread on another cpu may read __gmpn_cpuvec_initialized, get 1, read thresholds or pointers, and still get
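
The concern in this message can be illustrated with a minimal C11 sketch. It is not GMP's code: the struct, its fields, and every function name ending in `_sketch` are hypothetical stand-ins, and the value 42 is arbitrary. It assumes a single initializing thread that fills in a table and then publishes a flag with a release store, while readers check the flag with an acquire load before touching the table. Under the C11 model, that pairing is what keeps a reader that sees the flag as 1 from also seeing stale table contents.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a cpuvec-style table; not GMP's real layout. */
struct cpuvec_sketch {
    int sqr_basecase_threshold;
    int (*add_fn)(int, int);
};

static struct cpuvec_sketch cpuvec;
static atomic_int cpuvec_initialized;   /* static storage: starts at 0 */

static int add_generic(int a, int b) { return a + b; }

/* Writer: fill in the table with plain stores, then publish the flag.
   The release store orders the table writes before the flag write. */
static void cpuvec_init_sketch(void) {
    cpuvec.sqr_basecase_threshold = 42;  /* arbitrary illustrative value */
    cpuvec.add_fn = add_generic;
    atomic_store_explicit(&cpuvec_initialized, 1, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above, so a
   reader that observes 1 also observes the completed table. If the flag
   is still 0, the caller-supplied fallback is returned instead. */
static int cpuvec_threshold_or_sketch(int fallback) {
    if (!atomic_load_explicit(&cpuvec_initialized, memory_order_acquire))
        return fallback;
    assert(cpuvec.add_fn != NULL);
    return cpuvec.sqr_basecase_threshold;
}
```

A caller would run `cpuvec_init_sketch()` once and afterwards read through `cpuvec_threshold_or_sketch()`. The sketch deliberately leaves out the case where several threads initialize at once, which is the harder part of the scenario under discussion.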

Re: Memory barrier for fat initialization

2015-01-13 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: "My understanding is that the AMD64 as well as older x86 architectures do not allow store/store reordering, except when explicitly told otherwise." I'm a bit confused about this. Then when are memory barriers (mfence and friends) ever needed? I have a

Re: Memory barrier for fat initialization

2015-01-13 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I'm a bit confused about this. Then when are memory barriers (mfence and friends) ever needed? I have a pretty vague idea about how memory models work in both theory and practice. I'm thinking about something like: cpu0 for some reason has

Re: Memory barrier for fat initialization

2015-01-13 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I've tried to look in the intel architecture manual, but I can't find any obvious place where this ordering guarantee is described. It is not well-documented. That makes me wonder even more when mfence (or in particular, sfence) is ever

Re: Memory barrier for fat initialization

2015-01-13 Thread Victor Shoup
Just to chime in here... I've recently worked on making some thread safe code, and learned all about the new memory model in the C11 and C++11 standards (these are thankfully compatible models). There's a lot of stuff online (look for C11 atomics or C++11 atomics, and buzzwords like
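
As one possible illustration of the C11 atomics mentioned here, the following sketch uses a compare-and-exchange to elect a single initializing thread while the others wait. It is an assumption-laden example and not code from this thread or from GMP: the three-state flag, the names ending in `_sketch`, and the value 7 are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

enum { UNINIT = 0, BUSY = 1, READY = 2 };

static atomic_int init_state;   /* static storage: starts as UNINIT */
static int init_runs;           /* counts how often initialization ran */
static int table_value;         /* hypothetical data guarded by the flag */

/* Make sure the one-time initialization has completed before returning. */
static void ensure_init_sketch(void) {
    int expected = UNINIT;

    /* Fast path: already published. */
    if (atomic_load_explicit(&init_state, memory_order_acquire) == READY)
        return;

    if (atomic_compare_exchange_strong_explicit(
            &init_state, &expected, BUSY,
            memory_order_acquire, memory_order_acquire)) {
        /* This thread won the race and is the only initializer. */
        init_runs++;
        table_value = 7;        /* arbitrary illustrative value */
        atomic_store_explicit(&init_state, READY, memory_order_release);
    } else {
        /* Another thread is (or was) initializing: wait for READY. */
        while (atomic_load_explicit(&init_state, memory_order_acquire) != READY) {
            /* spin */
        }
    }
}

/* Reader helper: returns the guarded value once it is safely published. */
static int read_table_sketch(void) {
    ensure_init_sketch();
    assert(atomic_load_explicit(&init_state, memory_order_acquire) == READY);
    return table_value;
}
```

Spinning is only reasonable when initialization is short. A real library might prefer a mutex or a platform once-primitive instead.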