On Tue, Jan 23, 2018 at 4:15 PM, Richard Henderson
<richard.hender...@linaro.org> wrote:
> Ok. Now it depends on what result you care about for madd specifically.
>
> If, like x86 and Power, fmsub returns the (silenced) original input NaN, you
> want the float_muladd_* flags.
>
> If, like ARM, fmsub returns the (silenced) negated input NaN, then you do need
> to change sign externally. If this is the case, please use float32_chs instead
> of open-coding it with xor.

The ISA spec is a little ambiguous here. There is text that says we
multiply, optionally negate the result, and then add or subtract the
addend. However, this is followed by a sentence that gives equations, and
for fnmsub it says -rs1*rs2+rs3. If we assume C semantics, then they are
negating rs1, not the multiply result. This could potentially give a
different result if rs2 is a +qNaN and rs1/rs3 are not NaNs. qemu is
implementing what the equation says, not what the text says. The
definitive source would be the Spike simulator, and it agrees with qemu
and the equations.

The ISA spec does not specify what happens when one of the operands is a
NaN, except to mention that operations comply with the IEEE 754-2008
standard. I think that qemu is correct here, and that you want to use
float32_chs.

Although, looking at this again, I see another statement in a different
place that says:

    Except when otherwise stated, if the result of a floating-point
    operation is NaN, it is the canonical NaN. The canonical NaN has a
    positive sign and all significand bits clear except the MSB, a.k.a.
    the quiet bit. For single-precision floating-point, this corresponds
    to the pattern 0x7fc00000.

So it sounds like maybe we do want default NaN support. It appears that
Spike is using default NaNs. Unfortunately, enabling default NaN support
causes gcc and glibc testsuite failures, which complicates upstreaming
glibc support, as they won't accept it unless we get failures below a
certain number. Also, not having hardware to compare against makes it
impossible to determine if the simulator is doing exactly what the
hardware does. This could take a little time to sort out.

Jim
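
For illustration only, here is a rough C sketch of the two behaviours
being weighed: flipping the sign bit of a value (which also negates a
NaN) versus replacing any NaN result with the canonical pattern
0x7fc00000 quoted from the spec. The helper names are hypothetical and
work on raw bit patterns; this is not qemu's softfloat code and not how
float32_chs or the default-NaN mode are actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Single-precision canonical NaN pattern, as quoted from the ISA spec. */
#define F32_CANONICAL_NAN 0x7fc00000u
#define F32_SIGN_MASK     0x80000000u

/* Hypothetical helper: true if the raw bits encode any NaN
 * (exponent all ones and a non-zero significand). */
static bool f32_bits_is_nan(uint32_t bits)
{
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

/* Hypothetical helper: change sign by flipping the sign bit.
 * This negates NaNs too, which is the "negated input NaN" case. */
static uint32_t f32_bits_chs(uint32_t bits)
{
    return bits ^ F32_SIGN_MASK;
}

/* Hypothetical helper: model a default-NaN rule by replacing any NaN
 * with the canonical NaN and passing every other value through. */
static uint32_t f32_bits_default_nan(uint32_t bits)
{
    return f32_bits_is_nan(bits) ? F32_CANONICAL_NAN : bits;
}

/* Small self-check showing how the two rules differ on a NaN input. */
static void f32_bits_selfcheck(void)
{
    uint32_t negated = f32_bits_chs(F32_CANONICAL_NAN);

    /* Sign flip alone leaves a negative quiet NaN. */
    assert(negated == 0xffc00000u);
    /* Under the default-NaN rule the sign of the NaN is discarded. */
    assert(f32_bits_default_nan(negated) == F32_CANONICAL_NAN);
    /* Ordinary values (here 1.0f) and infinities are left alone. */
    assert(f32_bits_default_nan(0x3f800000u) == 0x3f800000u);
    assert(!f32_bits_is_nan(0x7f800000u));
}
```

Under a default-NaN rule like the one sketched here, whether rs1 or the
product gets negated would no longer be visible in a NaN result, since
the sign is thrown away either way.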