* From: Christian Ullrich
> On February 13, 2016 4:10:34 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Christian Ullrich <ch...@chrullrich.net> writes:
> > Lastly, I'd like to see some discussion of what side effects
> > "_set_FMA3_enable(0);" has ... I rather doubt that it's really
> > a magic-elixir-against-crashes-with-no-downsides.
> It tells the math library (in the CRT, no separate libm on Windows)
> not to use the AVX2-based implementations of log() and possibly
> other functions. AIUI, FMA means "fused multiply-add" and is
> apparently something that increases performance and accuracy in
> transcendental functions.
> I can check the CRT source later today and figure out exactly what
> it does.
OK, it turns out that the CRT source MS ships is not quite as complete as I
thought it was (up until 2013, at least), so I had a look at the disassembly.
When the library initializes, it checks whether the CPU supports the FMA
instructions by looking at a certain bit in the CPUID result. If that is set,
it sets a flag to use the FMA instructions. Later, in exp(), log(), pow() and
the trigonometrical functions, it first checks whether that flag is set, and if
so, uses the AVX-based implementation. If the flag is not set, it falls back to
an SSE2-based one. So, yes, that function only and specifically disables the
use of instructions that do not work in the problematic case.
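
To make the dispatch described above concrete, here is a rough model of it in
portable C. This is only a sketch of the behaviour seen in the disassembly, not
the CRT's actual code: every name in it (use_fma3, model_crt_init,
model_set_fma3_enable, model_log_dispatch, math_path) is made up for
illustration, and the real _set_FMA3_enable() may behave differently in detail.

```c
#include <stdbool.h>

/* Which implementation a math function ends up using. Illustrative only. */
typedef enum { PATH_SSE2, PATH_FMA3 } math_path;

/* Stand-in for the library's internal "use the FMA code paths" flag. */
static bool use_fma3 = false;

/* Model of library start-up: the flag simply follows the CPUID FMA bit. */
static void model_crt_init(bool cpuid_reports_fma)
{
    use_fma3 = cpuid_reports_fma;
}

/*
 * Model of _set_FMA3_enable(): 0 clears the flag, nonzero sets it.
 * Returns the resulting state. Simplification: this model does not
 * refuse to enable the flag on hardware without FMA.
 */
static int model_set_fma3_enable(int flag)
{
    use_fma3 = (flag != 0);
    return use_fma3 ? 1 : 0;
}

/* Model of the test at the top of exp(), log(), pow() and friends. */
static math_path model_log_dispatch(void)
{
    return use_fma3 ? PATH_FMA3 : PATH_SSE2;
}
```

In this model, calling model_set_fma3_enable(0) once after start-up is enough
to send every later call down the SSE2 path, which is the effect the patch
relies on.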
The bug appears to be that it uses all manner of AVX and AVX2 instructions
based only on the FMA support flag in CPUID, even though AVX2 has its own bit.
To reiterate: The problem occurs because the library only asks the CPU whether
it is *able* to perform the AVX instructions, but not whether it is *willing*
to do so. In this particular situation, the former applies but not the latter,
because the CPU needs OS support (saving the XMM/YMM registers across context
switches), and the OS has not declared its support for that.
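
The difference between "able" and "willing" can be sketched as two predicates
over the raw register values. This is not what the CRT does, only a minimal
illustration of a more complete check. The bit positions are the documented
ones (CPUID leaf 1 ECX: FMA = bit 12, OSXSAVE = bit 27, AVX = bit 28; CPUID
leaf 7 subleaf 0 EBX: AVX2 = bit 5; XCR0: SSE state = bit 1, AVX state = bit
2). The function names are made up, and the caller is assumed to have obtained
the values via CPUID and XGETBV (for example __cpuid/_xgetbv on MSVC).

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX */
#define ECX_FMA      (UINT32_C(1) << 12)
#define ECX_OSXSAVE  (UINT32_C(1) << 27)
#define ECX_AVX      (UINT32_C(1) << 28)
/* CPUID leaf 7, subleaf 0, EBX */
#define EBX_AVX2     (UINT32_C(1) << 5)
/* XCR0, as returned by XGETBV with ECX = 0 */
#define XCR0_SSE     (UINT64_C(1) << 1)
#define XCR0_AVX     (UINT64_C(1) << 2)

/* The check as described: only asks whether the CPU is *able*. */
static bool check_as_described(uint32_t leaf1_ecx)
{
    return (leaf1_ecx & ECX_FMA) != 0;
}

/*
 * A fuller check: the CPU must report FMA, AVX and AVX2, and the OS must
 * have enabled saving of XMM and YMM state. XGETBV is only valid when
 * OSXSAVE is set, so the caller should pass xcr0 = 0 if it is not.
 */
static bool check_with_os_support(uint32_t leaf1_ecx, uint32_t leaf7_ebx,
                                  uint64_t xcr0)
{
    const uint32_t need_ecx = ECX_FMA | ECX_AVX | ECX_OSXSAVE;
    const uint64_t need_xcr0 = XCR0_SSE | XCR0_AVX;

    if ((leaf1_ecx & need_ecx) != need_ecx)
        return false;
    if ((leaf7_ebx & EBX_AVX2) == 0)
        return false;
    return (xcr0 & need_xcr0) == need_xcr0;
}
```

With FMA and AVX reported but OSXSAVE clear, the first predicate says yes and
the second says no, which matches the situation described above.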
The downside to disabling the AVX implementations is a performance loss
compared to using them. I ran a microbenchmark (avg(log(x)) from
generate_series(1,1e8)), and the result was that with FMA enabled, it is ~5.5%
faster than without.
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)