ni...@lysator.liu.se (Niels Möller) writes:

> x86_64 and arm are the most important.

OK, those are interesting to GMP as well.

What's Nettle's intended fat granularity? I assume existence of
instructions is sometimes enough, e.g., if AES hardware support exists
then it seems safe to assume
t...@gmplib.org (Torbjörn Granlund) writes:

> What's Nettle's intended fat granularity?

For a start, only checking existence of instructions. When it comes to
clever scheduling, I don't do very much. The only nettle function
currently implemented in assembler where it might be beneficial to use
t...@gmplib.org (Torbjörn Granlund) writes:

> Which architectures do you intend to target for fat nettle builds?

x86_64 and arm are the most important.

> I really would want to move GMP towards fattyness for all current
> platforms. Unfortunately, this is not easy, and the problem is neither
Which architectures do you intend to target for fat nettle builds?

I really would want to move GMP towards fattyness for all current
platforms. Unfortunately, this is not easy, and the problem is neither
writing the actual code (fat.c, fat_entry.asm) nor getting memory
ordering right. The real
ni...@lysator.liu.se (Niels Möller) writes:

> Sounds doable for the sqr_basecase threshold, at least.
>
> On the other hand, on x86_64, maybe all chips we care about have the
> needed extensions, so it's *easy* to add an mfence or sfence instruction
> and not have to worry? I guess 32-bit x86 is

This article summarises things well; the x86/AMD64 indeed only reorders
loads before stores (except under the obscure OOStore mode):

http://www.rdrop.com/users/paulmck/scalability/paper/ordering.2007.09.19a.pdf

There is some reasoning about the weaker ordering of some SSE
instructions, as well
ni...@lysator.liu.se (Niels Möller) writes:

> GMP's fat initialization (I'm looking at the x86_64 code now) ends with
>
>   *((volatile int *) &__gmpn_cpuvec_initialized) = 1;
>
> I suspect that it's possible (but unlikely) that a different thread on
> another cpu may read
ni...@lysator.liu.se (Niels Möller) writes:

> Fat gmp library, cpuvec not yet initialized. Several threads, on
> different cpus, call the CPUVEC_THRESHOLD macro at about the same time.
> That's the scenario. When the first of those threads gets to setting
> __gmpn_cpuvec_initialized, the magic
>
> GMP's fat initialization (I'm looking at the x86_64 code now) ends with
>
>   *((volatile int *) &__gmpn_cpuvec_initialized) = 1;
>
> I suspect that it's possible (but unlikely) that a different thread on
> another cpu may read __gmpn_cpuvec_initialized, get 1, read thresholds
> or pointers, and still get
t...@gmplib.org (Torbjörn Granlund) writes:

> My understanding is that the AMD64 as well as older x86 architectures do
> not allow store/store reordering, except when explicitly told otherwise.

I'm a bit confused about this. Then when are memory barriers (mfence and
friends) ever needed? I have a
ni...@lysator.liu.se (Niels Möller) writes:

> I'm a bit confused about this. Then when are memory barriers (mfence and
> friends) ever needed? I have a pretty vague idea about how memory models
> work in both theory and practice. I'm thinking about something like:
> cpu0 for some reason has
ni...@lysator.liu.se (Niels Möller) writes:

> I've tried to look in the Intel architecture manual, but I can't find
> any obvious place where this ordering guarantee is described.

It is not well-documented.

> That makes me wonder even more when mfence (or in particular, sfence) is
> ever
Just to chime in here...

I've recently worked on making some thread-safe code, and learned all
about the new memory model in the C11 and C++11 standards (these are
thankfully compatible models). There's lots of stuff online (look for
C11 atomics or C++11 atomics, and buzzwords like