Re: [Qemu-devel] [RFC PATCH] fpu: add compile time check for old glibc/libm and fma

2019-01-07 Thread Alex Bennée


Emilio G. Cota  writes:

> On Thu, Dec 20, 2018 at 11:10:08 +, Alex Bennée wrote:
> (snip)
>> +#if defined(__GLIBC__) && (__GLIBC__ == 2 && __GLIBC_MINOR__ <= 12)
>> +#define QEMU_HARDFLOAT_USE_FMA 0
>> +#else
>> +#define QEMU_HARDFLOAT_USE_FMA 1
>> +#endif
>> +#else
>> +#define QEMU_HARDFLOAT_USE_FMA 1
>> +#endif
>> +
>>  /*
>>   * QEMU_HARDFLOAT_USE_ISINF chooses whether to use isinf() over
>>   * float{32,64}_is_infinity when !USE_FP.
>> @@ -1551,6 +1570,9 @@ float32_muladd(float32 xa, float32 xb, float32 xc, int 
>> flags, float_status *s)
>>  ub.s = xb;
>>  uc.s = xc;
>>
>> +if (!QEMU_HARDFLOAT_USE_FMA) {
>> +goto soft;
>> +}
>
> I don't think this should be a compile-time check; if the QEMU binary
> is run on a system with a newer, fixed glibc (or any other libc), then
> we'll have disabled fma hardfloat unnecessarily.
>
> What do you think about the following?
>
> Laurent: if you want to test the below, you can pull it from
>https://github.com/cota/qemu/tree/fma-fix
>
> Thanks,
>
>   Emilio
> ---
> commit ddeec29a2c33550c5d018aeea05d45a23579ae1b
> Author: Emilio G. Cota 
> Date:   Fri Dec 21 14:08:57 2018 -0500
>
> softfloat: enforce softfloat if the host's FMA is broken
>
> The added branch is marked as unlikely and therefore its impact
> on performance (measured with fp-bench) is within the noise range
> when measured on an Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz.
>
> Laurent Desnogues 
> Signed-off-by: Emilio G. Cota 

I've applied b8cc3928cee7b1d91bf39c86bec4801b9dc612e1 from your tree to
fpu/next although the numbers look a bit odd. I see:

Before:
  143.83 MFlops
  89.34 MFlops

After:
  150.20 MFlops
  85.73 MFlops

On my i7-4770 where as your commit seems to show a big jump in
performance which is odd as this is preventing a bug not enabling FMA.

> +
> +static void __attribute__((constructor)) softfloat_init(void)
> +{
> +union_float64 ua, ub, uc, ur;
> +
> +if (QEMU_NO_HARDFLOAT) {
> +return;
> +}
> +
> +/*
> + * Test that the host's FMA is not obviously broken. For example,
> + * glibc < 2.23 can perform an incorrect FMA on certain hosts; see
> + *   https://sourceware.org/bugzilla/show_bug.cgi?id=13304
> + */
> +ua.s = 0x0021;
> +ub.s = 0x3ca0;
> +uc.s = 0x0020;
> +ur.h = fma(ua.h, ub.h, uc.h);
> +if (ur.s != 0x0021) {
> +host_fma_is_broken = true;
> +}
> +}

I'm fine with the cpuid stuff at the bottom of softfloat for now. We can
move it later if the other micro-architectures want to get in on the
detecting features game.

--
Alex Bennée



Re: [Qemu-devel] [RFC PATCH] fpu: add compile time check for old glibc/libm and fma

2019-01-07 Thread Alex Bennée


Richard Henderson  writes:

> On 12/21/18 11:30 AM, Emilio G. Cota wrote:
>> +ua.s = 0x0021;
>> +ub.s = 0x3ca0;
>> +uc.s = 0x0020;
>> +ur.h = fma(ua.h, ub.h, uc.h);
>> +if (ur.s != 0x0021) {
>
> Forgot your ull's, but otherwise ok.
>
> In email to Alex, I did wonder if we should check for fma hardware (at least 
> on
> x86).  Without a hardware insn, the libm routine is probably no faster than
> softmmu.

My only worry is we could end up doing a bunch of these processor
id/capability type checks. Is a correct but slow libm FMA going to be
much slower than our FMA? Will users even notice?

--
Alex Bennée



Re: [Qemu-devel] [RFC PATCH] fpu: add compile time check for old glibc/libm and fma

2018-12-21 Thread Aleksandar Markovic
On Dec 20, 2018 12:11 PM, "Alex Bennée"  wrote:
>
> Some versions of glibc have been reported to have problems with
> fused-multiply-accumulate operations. If the underlying fma
> implementation does a two step operation it will instroduce subtle
> rounding errors. Newer versions of the library seem to deal with this
> better and modern hardware has fused operations which the library can
> use.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Alex Bennée 
> Cc: Emilio G. Cota 
> ---
>  fpu/softfloat.c | 25 +
>  1 file changed, 25 insertions(+)
>

What about using gnu_get_libc_version() at runtime?

http://man7.org/linux/man-pages/man3/gnu_get_libc_version.3.html

> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 59eac97d10..9c2dbd04b5 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -203,6 +203,25 @@ GEN_INPUT_FLUSH3(float64_input_flush3, float64)
>  # define QEMU_HARDFLOAT_3F64_USE_FP 0
>  #endif
>
> +/*
> + * Choose whether to accelerate fused multiply-accumulate operations
> + * with hard float functions. Some versions of glibc's maths library
> + * have been reported to be broken on x86 without FMA instructions.
> + */
> +#if defined(__x86_64__)
> +/* this was actually reported as glibc-2.12-1.149.el6_6.5.x86_64 was
> + * broken but glibc 2.12-1.209 works but out of caution lets disable
> + * for all older glibcs.
> + */
> +#if defined(__GLIBC__) && (__GLIBC__ == 2 && __GLIBC_MINOR__ <= 12)
> +#define QEMU_HARDFLOAT_USE_FMA 0
> +#else
> +#define QEMU_HARDFLOAT_USE_FMA 1
> +#endif
> +#else
> +#define QEMU_HARDFLOAT_USE_FMA 1
> +#endif
> +
>  /*
>   * QEMU_HARDFLOAT_USE_ISINF chooses whether to use isinf() over
>   * float{32,64}_is_infinity when !USE_FP.
> @@ -1551,6 +1570,9 @@ float32_muladd(float32 xa, float32 xb, float32 xc,
int flags, float_status *s)
>  ub.s = xb;
>  uc.s = xc;
>
> +if (!QEMU_HARDFLOAT_USE_FMA) {
> +goto soft;
> +}
>  if (unlikely(!can_use_fpu(s))) {
>  goto soft;
>  }
> @@ -1612,6 +1634,9 @@ float64_muladd(float64 xa, float64 xb, float64 xc,
int flags, float_status *s)
>  ub.s = xb;
>  uc.s = xc;
>
> +if (!QEMU_HARDFLOAT_USE_FMA) {
> +goto soft;
> +}
>  if (unlikely(!can_use_fpu(s))) {
>  goto soft;
>  }
> --
> 2.17.1
>
>


Re: [Qemu-devel] [RFC PATCH] fpu: add compile time check for old glibc/libm and fma

2018-12-21 Thread Richard Henderson
On 12/21/18 11:30 AM, Emilio G. Cota wrote:
> +ua.s = 0x0021;
> +ub.s = 0x3ca0;
> +uc.s = 0x0020;
> +ur.h = fma(ua.h, ub.h, uc.h);
> +if (ur.s != 0x0021) {

Forgot your ull's, but otherwise ok.

In email to Alex, I did wonder if we should check for fma hardware (at least on
x86).  Without a hardware insn, the libm routine is probably no faster than
softmmu.


r~



Re: [Qemu-devel] [RFC PATCH] fpu: add compile time check for old glibc/libm and fma

2018-12-21 Thread Emilio G. Cota
On Thu, Dec 20, 2018 at 11:10:08 +, Alex Bennée wrote:
(snip)
> +#if defined(__GLIBC__) && (__GLIBC__ == 2 && __GLIBC_MINOR__ <= 12)
> +#define QEMU_HARDFLOAT_USE_FMA 0
> +#else
> +#define QEMU_HARDFLOAT_USE_FMA 1
> +#endif
> +#else
> +#define QEMU_HARDFLOAT_USE_FMA 1
> +#endif
> +
>  /*
>   * QEMU_HARDFLOAT_USE_ISINF chooses whether to use isinf() over
>   * float{32,64}_is_infinity when !USE_FP.
> @@ -1551,6 +1570,9 @@ float32_muladd(float32 xa, float32 xb, float32 xc, int 
> flags, float_status *s)
>  ub.s = xb;
>  uc.s = xc;
>  
> +if (!QEMU_HARDFLOAT_USE_FMA) {
> +goto soft;
> +}

I don't think this should be a compile-time check; if the QEMU binary
is run on a system with a newer, fixed glibc (or any other libc), then
we'll have disabled fma hardfloat unnecessarily.

What do you think about the following?

Laurent: if you want to test the below, you can pull it from
   https://github.com/cota/qemu/tree/fma-fix

Thanks,

Emilio
---
commit ddeec29a2c33550c5d018aeea05d45a23579ae1b
Author: Emilio G. Cota 
Date:   Fri Dec 21 14:08:57 2018 -0500

softfloat: enforce softfloat if the host's FMA is broken

The added branch is marked as unlikely and therefore its impact
on performance (measured with fp-bench) is within the noise range
when measured on an Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz.

Laurent Desnogues 
Signed-off-by: Emilio G. Cota 

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 59eac97d10..8b3670ca9d 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1542,6 +1542,8 @@ soft_f64_muladd(float64 a, float64 b, float64 c, int 
flags,
 return float64_round_pack_canonical(pr, status);
 }
 
+static bool host_fma_is_broken;
+
 float32 QEMU_FLATTEN
 float32_muladd(float32 xa, float32 xb, float32 xc, int flags, float_status *s)
 {
@@ -1562,6 +1564,11 @@ float32_muladd(float32 xa, float32 xb, float32 xc, int 
flags, float_status *s)
 if (unlikely(!f32_is_zon3(ua, ub, uc))) {
 goto soft;
 }
+
+if (unlikely(host_fma_is_broken)) {
+goto soft;
+}
+
 /*
  * When (a || b) == 0, there's no need to check for under/over flow,
  * since we know the addend is (normal || 0) and the product is 0.
@@ -1623,6 +1630,11 @@ float64_muladd(float64 xa, float64 xb, float64 xc, int 
flags, float_status *s)
 if (unlikely(!f64_is_zon3(ua, ub, uc))) {
 goto soft;
 }
+
+if (unlikely(host_fma_is_broken)) {
+goto soft;
+}
+
 /*
  * When (a || b) == 0, there's no need to check for under/over flow,
  * since we know the addend is (normal || 0) and the product is 0.
@@ -7974,3 +7986,25 @@ float128 float128_scalbn(float128 a, int n, float_status 
*status)
  , status);
 
 }
+
+static void __attribute__((constructor)) softfloat_init(void)
+{
+union_float64 ua, ub, uc, ur;
+
+if (QEMU_NO_HARDFLOAT) {
+return;
+}
+
+/*
+ * Test that the host's FMA is not obviously broken. For example,
+ * glibc < 2.23 can perform an incorrect FMA on certain hosts; see
+ *   https://sourceware.org/bugzilla/show_bug.cgi?id=13304
+ */
+ua.s = 0x0021;
+ub.s = 0x3ca0;
+uc.s = 0x0020;
+ur.h = fma(ua.h, ub.h, uc.h);
+if (ur.s != 0x0021) {
+host_fma_is_broken = true;
+}
+}



Re: [Qemu-devel] [RFC PATCH] fpu: add compile time check for old glibc/libm and fma

2018-12-21 Thread Laurent Desnogues
Hi,

On Thu, Dec 20, 2018 at 12:10 PM Alex Bennée  wrote:
>
> Some versions of glibc have been reported to have problems with
> fused-multiply-accumulate operations. If the underlying fma
> implementation does a two step operation it will instroduce subtle
> rounding errors. Newer versions of the library seem to deal with this
> better and modern hardware has fused operations which the library can
> use.

Thanks for taking care of this issue.  The fix was tested on our
machines and it works as expected.

So:

Tested-by: Laurent Desnogues 

> Reported-by: Laurent Desnogues 
> Signed-off-by: Alex Bennée 
> Cc: Emilio G. Cota 
> ---
>  fpu/softfloat.c | 25 +
>  1 file changed, 25 insertions(+)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 59eac97d10..9c2dbd04b5 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -203,6 +203,25 @@ GEN_INPUT_FLUSH3(float64_input_flush3, float64)
>  # define QEMU_HARDFLOAT_3F64_USE_FP 0
>  #endif
>
> +/*
> + * Choose whether to accelerate fused multiply-accumulate operations
> + * with hard float functions. Some versions of glibc's maths library
> + * have been reported to be broken on x86 without FMA instructions.
> + */
> +#if defined(__x86_64__)
> +/* this was actually reported as glibc-2.12-1.149.el6_6.5.x86_64 was
> + * broken but glibc 2.12-1.209 works but out of caution lets disable
> + * for all older glibcs.
> + */
> +#if defined(__GLIBC__) && (__GLIBC__ == 2 && __GLIBC_MINOR__ <= 12)
> +#define QEMU_HARDFLOAT_USE_FMA 0
> +#else
> +#define QEMU_HARDFLOAT_USE_FMA 1
> +#endif
> +#else
> +#define QEMU_HARDFLOAT_USE_FMA 1
> +#endif
> +
>  /*
>   * QEMU_HARDFLOAT_USE_ISINF chooses whether to use isinf() over
>   * float{32,64}_is_infinity when !USE_FP.
> @@ -1551,6 +1570,9 @@ float32_muladd(float32 xa, float32 xb, float32 xc, int 
> flags, float_status *s)
>  ub.s = xb;
>  uc.s = xc;
>
> +if (!QEMU_HARDFLOAT_USE_FMA) {
> +goto soft;
> +}
>  if (unlikely(!can_use_fpu(s))) {
>  goto soft;
>  }
> @@ -1612,6 +1634,9 @@ float64_muladd(float64 xa, float64 xb, float64 xc, int 
> flags, float_status *s)
>  ub.s = xb;
>  uc.s = xc;
>
> +if (!QEMU_HARDFLOAT_USE_FMA) {
> +goto soft;
> +}
>  if (unlikely(!can_use_fpu(s))) {
>  goto soft;
>  }
> --
> 2.17.1
>



Re: [Qemu-devel] [RFC PATCH] fpu: add compile time check for old glibc/libm and fma

2018-12-20 Thread Aleksandar Markovic
On Dec 20, 2018 12:11 PM, "Alex Bennée"  wrote:
>
> Some versions of glibc have been reported to have problems with
> fused-multiply-accumulate operations. If the underlying fma
> implementation does a two step operation it will instroduce subtle
> rounding errors. Newer versions of the library seem to deal with this
> better and modern hardware has fused operations which the library can
> use.
>
> Reported-by: Laurent Desnogues 
> Signed-off-by: Alex Bennée 
> Cc: Emilio G. Cota 
> ---
>  fpu/softfloat.c | 25 +
>  1 file changed, 25 insertions(+)
>

Acked-by: Aleksandar Markovic 

> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 59eac97d10..9c2dbd04b5 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -203,6 +203,25 @@ GEN_INPUT_FLUSH3(float64_input_flush3, float64)
>  # define QEMU_HARDFLOAT_3F64_USE_FP 0
>  #endif
>
> +/*
> + * Choose whether to accelerate fused multiply-accumulate operations
> + * with hard float functions. Some versions of glibc's maths library
> + * have been reported to be broken on x86 without FMA instructions.
> + */
> +#if defined(__x86_64__)
> +/* this was actually reported as glibc-2.12-1.149.el6_6.5.x86_64 was
> + * broken but glibc 2.12-1.209 works but out of caution lets disable
> + * for all older glibcs.
> + */
> +#if defined(__GLIBC__) && (__GLIBC__ == 2 && __GLIBC_MINOR__ <= 12)
> +#define QEMU_HARDFLOAT_USE_FMA 0
> +#else
> +#define QEMU_HARDFLOAT_USE_FMA 1
> +#endif
> +#else
> +#define QEMU_HARDFLOAT_USE_FMA 1
> +#endif
> +
>  /*
>   * QEMU_HARDFLOAT_USE_ISINF chooses whether to use isinf() over
>   * float{32,64}_is_infinity when !USE_FP.
> @@ -1551,6 +1570,9 @@ float32_muladd(float32 xa, float32 xb, float32 xc,
int flags, float_status *s)
>  ub.s = xb;
>  uc.s = xc;
>
> +if (!QEMU_HARDFLOAT_USE_FMA) {
> +goto soft;
> +}
>  if (unlikely(!can_use_fpu(s))) {
>  goto soft;
>  }
> @@ -1612,6 +1634,9 @@ float64_muladd(float64 xa, float64 xb, float64 xc,
int flags, float_status *s)
>  ub.s = xb;
>  uc.s = xc;
>
> +if (!QEMU_HARDFLOAT_USE_FMA) {
> +goto soft;
> +}
>  if (unlikely(!can_use_fpu(s))) {
>  goto soft;
>  }
> --
> 2.17.1
>
>