Re: [PULL 14/31] target/i386: reimplement f2xm1 using floatx80 operations

2020-07-14 Thread Laszlo Ersek
On 06/24/20 12:50, Paolo Bonzini wrote:
> From: Joseph Myers 
>
> The x87 f2xm1 emulation is currently based around conversion to
> double.  This is inherently unsuitable for a good emulation of any
> floatx80 operation, even before considering that it is a particularly
> naive implementation using double (computing with pow and then
> subtracting 1 rather than attempting a better emulation using expm1).
>
> Reimplement using the soft-float operations, including additions and
> multiplications with higher precision where appropriate to limit
> accumulation of errors.  I considered reusing some of the m68k code
> for transcendental operations, but the instructions don't generally
> correspond exactly to x87 operations (for example, m68k has 2^x and
> e^x - 1, but not 2^x - 1); to avoid possible accumulation of errors
> from applying multiple such operations each rounding to floatx80
> precision, I wrote a direct implementation of 2^x - 1 instead.  It
> would be possible in principle to make the implementation more
> efficient by doing the intermediate operations directly with
> significands, signs and exponents and not packing / unpacking floatx80
> format for each operation, but that would make it significantly more
> complicated and it's not clear that's worthwhile; the m68k emulation
> doesn't try to do that.
>
> A test is included with many randomly generated inputs.  The
> assumption of the test is that the result in round-to-nearest mode
> should always be one of the two closest floating-point numbers to the
> mathematical value of 2^x - 1; the implementation aims to do somewhat
> better than that (about 70 correct bits before rounding).  I haven't
> investigated how accurate hardware is.
>
> Signed-off-by: Joseph Myers 
>
> Message-Id: 
> Signed-off-by: Paolo Bonzini 
> ---
>  target/i386/fpu_helper.c |  385 +-
>  tests/tcg/i386/test-i386-f2xm1.c | 1140 ++
>  2 files changed, 1522 insertions(+), 3 deletions(-)
>  create mode 100644 tests/tcg/i386/test-i386-f2xm1.c
>
> diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
> index 8ef5b463ea..e32a2aa74b 100644
> --- a/target/i386/fpu_helper.c
> +++ b/target/i386/fpu_helper.c
> @@ -25,6 +25,7 @@
>  #include "exec/exec-all.h"
>  #include "exec/cpu_ldst.h"
>  #include "fpu/softfloat.h"
> +#include "fpu/softfloat-macros.h"
>
>  #ifdef CONFIG_SOFTMMU
>  #include "hw/irq.h"
> @@ -836,12 +837,390 @@ void helper_fbst_ST0(CPUX86State *env, target_ulong 
> ptr)
>  merge_exception_flags(env, old_flags);
>  }
>
> +/* 128-bit significand of log(2).  */
> +#define ln2_sig_high 0xb17217f7d1cf79abULL
> +#define ln2_sig_low 0xc9e3b39803f2f6afULL
> +
> +/*
> + * Polynomial coefficients for an approximation to (2^x - 1) / x, on
> + * the interval [-1/64, 1/64].
> + */
> +#define f2xm1_coeff_0 make_floatx80(0x3ffe, 0xb17217f7d1cf79acULL)
> +#define f2xm1_coeff_0_low make_floatx80(0xbfbc, 0xd87edabf495b3762ULL)
> +#define f2xm1_coeff_1 make_floatx80(0x3ffc, 0xf5fdeffc162c7543ULL)
> +#define f2xm1_coeff_2 make_floatx80(0x3ffa, 0xe35846b82505fcc7ULL)
> +#define f2xm1_coeff_3 make_floatx80(0x3ff8, 0x9d955b7dd273b899ULL)
> +#define f2xm1_coeff_4 make_floatx80(0x3ff5, 0xaec3ff3c4ef4ac0cULL)
> +#define f2xm1_coeff_5 make_floatx80(0x3ff2, 0xa184897c3a7f0de9ULL)
> +#define f2xm1_coeff_6 make_floatx80(0x3fee, 0xffe634d0ec30d504ULL)
> +#define f2xm1_coeff_7 make_floatx80(0x3feb, 0xb160111d2db515e4ULL)
> +
> +struct f2xm1_data {
> +/*
> + * A value very close to a multiple of 1/32, such that 2^t and 2^t - 1
> + * are very close to exact floatx80 values.
> + */
> +floatx80 t;
> +/* The value of 2^t.  */
> +floatx80 exp2;
> +/* The value of 2^t - 1.  */
> +floatx80 exp2m1;
> +};
> +
> +static const struct f2xm1_data f2xm1_table[65] = {
> +{ make_floatx80(0xbfff, 0x8000ULL),

This line (in commit eca30647fc07, "target/i386: reimplement f2xm1 using
floatx80 operations", 2020-06-26) causes a build failure for me, with
gcc-4.8 (on RHEL-7.6):

> target/i386/fpu_helper.c:871:5: error: initializer element is not constant
>  { make_floatx80(0xbfff, 0x8000ULL),
>  ^

The macro is defined as follows:

  #define make_floatx80(exp, mant) ((floatx80) { mant, exp })

with floatx80 being:

  typedef struct {
  uint64_t low;
  uint16_t high;
  } floatx80;

The macro produces a compound literal. While the compound literal occurs
outside of the body of a function, and thus has static storage duration
(per ISO C99 6.5.2.5p6), it is not a constant expression.

The array object "f2xm1_table" has static storage duration, and "All the
expressions in an initializer for an object that has static storage
duration shall be constant expressions or string literals" (ISO C99
6.7.8p4).

Per ISO C99 6.6:

> 7 More latitude is permitted for constant expressions in initializers.
>   Such a constant expression shall be, or evaluate to, one of the
>   following:
>   - an arithmetic 

[PULL 14/31] target/i386: reimplement f2xm1 using floatx80 operations

2020-06-24 Thread Paolo Bonzini
From: Joseph Myers 

The x87 f2xm1 emulation is currently based around conversion to
double.  This is inherently unsuitable for a good emulation of any
floatx80 operation, even before considering that it is a particularly
naive implementation using double (computing with pow and then
subtracting 1 rather than attempting a better emulation using expm1).

Reimplement using the soft-float operations, including additions and
multiplications with higher precision where appropriate to limit
accumulation of errors.  I considered reusing some of the m68k code
for transcendental operations, but the instructions don't generally
correspond exactly to x87 operations (for example, m68k has 2^x and
e^x - 1, but not 2^x - 1); to avoid possible accumulation of errors
from applying multiple such operations each rounding to floatx80
precision, I wrote a direct implementation of 2^x - 1 instead.  It
would be possible in principle to make the implementation more
efficient by doing the intermediate operations directly with
significands, signs and exponents and not packing / unpacking floatx80
format for each operation, but that would make it significantly more
complicated and it's not clear that's worthwhile; the m68k emulation
doesn't try to do that.

A test is included with many randomly generated inputs.  The
assumption of the test is that the result in round-to-nearest mode
should always be one of the two closest floating-point numbers to the
mathematical value of 2^x - 1; the implementation aims to do somewhat
better than that (about 70 correct bits before rounding).  I haven't
investigated how accurate hardware is.

Signed-off-by: Joseph Myers 

Message-Id: 
Signed-off-by: Paolo Bonzini 
---
 target/i386/fpu_helper.c |  385 +-
 tests/tcg/i386/test-i386-f2xm1.c | 1140 ++
 2 files changed, 1522 insertions(+), 3 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-f2xm1.c

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 8ef5b463ea..e32a2aa74b 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -25,6 +25,7 @@
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 #include "fpu/softfloat.h"
+#include "fpu/softfloat-macros.h"
 
 #ifdef CONFIG_SOFTMMU
 #include "hw/irq.h"
@@ -836,12 +837,390 @@ void helper_fbst_ST0(CPUX86State *env, target_ulong ptr)
 merge_exception_flags(env, old_flags);
 }
 
+/* 128-bit significand of log(2).  */
+#define ln2_sig_high 0xb17217f7d1cf79abULL
+#define ln2_sig_low 0xc9e3b39803f2f6afULL
+
+/*
+ * Polynomial coefficients for an approximation to (2^x - 1) / x, on
+ * the interval [-1/64, 1/64].
+ */
+#define f2xm1_coeff_0 make_floatx80(0x3ffe, 0xb17217f7d1cf79acULL)
+#define f2xm1_coeff_0_low make_floatx80(0xbfbc, 0xd87edabf495b3762ULL)
+#define f2xm1_coeff_1 make_floatx80(0x3ffc, 0xf5fdeffc162c7543ULL)
+#define f2xm1_coeff_2 make_floatx80(0x3ffa, 0xe35846b82505fcc7ULL)
+#define f2xm1_coeff_3 make_floatx80(0x3ff8, 0x9d955b7dd273b899ULL)
+#define f2xm1_coeff_4 make_floatx80(0x3ff5, 0xaec3ff3c4ef4ac0cULL)
+#define f2xm1_coeff_5 make_floatx80(0x3ff2, 0xa184897c3a7f0de9ULL)
+#define f2xm1_coeff_6 make_floatx80(0x3fee, 0xffe634d0ec30d504ULL)
+#define f2xm1_coeff_7 make_floatx80(0x3feb, 0xb160111d2db515e4ULL)
+
+struct f2xm1_data {
+/*
+ * A value very close to a multiple of 1/32, such that 2^t and 2^t - 1
+ * are very close to exact floatx80 values.
+ */
+floatx80 t;
+/* The value of 2^t.  */
+floatx80 exp2;
+/* The value of 2^t - 1.  */
+floatx80 exp2m1;
+};
+
+static const struct f2xm1_data f2xm1_table[65] = {
+{ make_floatx80(0xbfff, 0x8000ULL),
+  make_floatx80(0x3ffe, 0x8000ULL),
+  make_floatx80(0xbffe, 0x8000ULL) },
+{ make_floatx80(0xbffe, 0xf8002e7eULL),
+  make_floatx80(0x3ffe, 0x82cd8698ac2b9160ULL),
+  make_floatx80(0xbffd, 0xfa64f2cea7a8dd40ULL) },
+{ make_floatx80(0xbffe, 0xefffe960ULL),
+  make_floatx80(0x3ffe, 0x85aac367cc488345ULL),
+  make_floatx80(0xbffd, 0xf4aa7930676ef976ULL) },
+{ make_floatx80(0xbffe, 0xe8006f10ULL),
+  make_floatx80(0x3ffe, 0x88980e8092da5c14ULL),
+  make_floatx80(0xbffd, 0xeecfe2feda4b47d8ULL) },
+{ make_floatx80(0xbffe, 0xe0008a45ULL),
+  make_floatx80(0x3ffe, 0x8b95c1e3ea8ba2a5ULL),
+  make_floatx80(0xbffd, 0xe8d47c382ae8bab6ULL) },
+{ make_floatx80(0xbffe, 0xd7ff8a9eULL),
+  make_floatx80(0x3ffe, 0x8ea4398b45cd8116ULL),
+  make_floatx80(0xbffd, 0xe2b78ce97464fdd4ULL) },
+{ make_floatx80(0xbffe, 0xd00019a0ULL),
+  make_floatx80(0x3ffe, 0x91c3d373ab11b919ULL),
+  make_floatx80(0xbffd, 0xdc785918a9dc8dceULL) },
+{ make_floatx80(0xbffe, 0xc7ff14dfULL),
+  make_floatx80(0x3ffe, 0x94f4efa8fef76836ULL),
+  make_floatx80(0xbffd, 0xd61620ae02112f94ULL) },
+{ make_floatx80(0xbffe, 0xc0006530ULL),
+  make_floatx80(0x3ffe,