Re: [PULL 14/31] target/i386: reimplement f2xm1 using floatx80 operations
On 06/24/20 12:50, Paolo Bonzini wrote: > From: Joseph Myers > > The x87 f2xm1 emulation is currently based around conversion to > double. This is inherently unsuitable for a good emulation of any > floatx80 operation, even before considering that it is a particularly > naive implementation using double (computing with pow and then > subtracting 1 rather than attempting a better emulation using expm1). > > Reimplement using the soft-float operations, including additions and > multiplications with higher precision where appropriate to limit > accumulation of errors. I considered reusing some of the m68k code > for transcendental operations, but the instructions don't generally > correspond exactly to x87 operations (for example, m68k has 2^x and > e^x - 1, but not 2^x - 1); to avoid possible accumulation of errors > from applying multiple such operations each rounding to floatx80 > precision, I wrote a direct implementation of 2^x - 1 instead. It > would be possible in principle to make the implementation more > efficient by doing the intermediate operations directly with > significands, signs and exponents and not packing / unpacking floatx80 > format for each operation, but that would make it significantly more > complicated and it's not clear that's worthwhile; the m68k emulation > doesn't try to do that. > > A test is included with many randomly generated inputs. The > assumption of the test is that the result in round-to-nearest mode > should always be one of the two closest floating-point numbers to the > mathematical value of 2^x - 1; the implementation aims to do somewhat > better than that (about 70 correct bits before rounding). I haven't > investigated how accurate hardware is. > > Signed-off-by: Joseph Myers > > Message-Id: > Signed-off-by: Paolo Bonzini > --- > target/i386/fpu_helper.c | 385 +- > tests/tcg/i386/test-i386-f2xm1.c | 1140 ++ > 2 files changed, 1522 insertions(+), 3 deletions(-) > create mode 100644 tests/tcg/i386/test-i386-f2xm1.c > > diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c > index 8ef5b463ea..e32a2aa74b 100644 > --- a/target/i386/fpu_helper.c > +++ b/target/i386/fpu_helper.c > @@ -25,6 +25,7 @@ > #include "exec/exec-all.h" > #include "exec/cpu_ldst.h" > #include "fpu/softfloat.h" > +#include "fpu/softfloat-macros.h" > > #ifdef CONFIG_SOFTMMU > #include "hw/irq.h" > @@ -836,12 +837,390 @@ void helper_fbst_ST0(CPUX86State *env, target_ulong > ptr) > merge_exception_flags(env, old_flags); > } > > +/* 128-bit significand of log(2). */ > +#define ln2_sig_high 0xb17217f7d1cf79abULL > +#define ln2_sig_low 0xc9e3b39803f2f6afULL > + > +/* > + * Polynomial coefficients for an approximation to (2^x - 1) / x, on > + * the interval [-1/64, 1/64]. > + */ > +#define f2xm1_coeff_0 make_floatx80(0x3ffe, 0xb17217f7d1cf79acULL) > +#define f2xm1_coeff_0_low make_floatx80(0xbfbc, 0xd87edabf495b3762ULL) > +#define f2xm1_coeff_1 make_floatx80(0x3ffc, 0xf5fdeffc162c7543ULL) > +#define f2xm1_coeff_2 make_floatx80(0x3ffa, 0xe35846b82505fcc7ULL) > +#define f2xm1_coeff_3 make_floatx80(0x3ff8, 0x9d955b7dd273b899ULL) > +#define f2xm1_coeff_4 make_floatx80(0x3ff5, 0xaec3ff3c4ef4ac0cULL) > +#define f2xm1_coeff_5 make_floatx80(0x3ff2, 0xa184897c3a7f0de9ULL) > +#define f2xm1_coeff_6 make_floatx80(0x3fee, 0xffe634d0ec30d504ULL) > +#define f2xm1_coeff_7 make_floatx80(0x3feb, 0xb160111d2db515e4ULL) > + > +struct f2xm1_data { > +/* > + * A value very close to a multiple of 1/32, such that 2^t and 2^t - 1 > + * are very close to exact floatx80 values. > + */ > +floatx80 t; > +/* The value of 2^t. */ > +floatx80 exp2; > +/* The value of 2^t - 1. */ > +floatx80 exp2m1; > +}; > + > +static const struct f2xm1_data f2xm1_table[65] = { > +{ make_floatx80(0xbfff, 0x8000ULL), This line (in commit eca30647fc07, "target/i386: reimplement f2xm1 using floatx80 operations", 2020-06-26) causes a build failure for me, with gcc-4.8 (on RHEL-7.6): > target/i386/fpu_helper.c:871:5: error: initializer element is not constant > { make_floatx80(0xbfff, 0x8000ULL), > ^ The macro is defined as follows: #define make_floatx80(exp, mant) ((floatx80) { mant, exp }) with floatx80 being: typedef struct { uint64_t low; uint16_t high; } floatx80; The macro produces a compound literal. While the compound literal occurs outside of the body of a function, and thus has static storage duration (per ISO C99 6.5.2.5p6), it is not a constant expression. The array object "f2xm1_table" has static storage duration, and "All the expressions in an initializer for an object that has static storage duration shall be constant expressions or string literals" (ISO C99 6.7.8p4). Per ISO C99 6.6: > 7 More latitude is permitted for constant expressions in initializers. > Such a constant expression shall be, or evaluate to, one of the > following: > - an arithmetic
[PULL 14/31] target/i386: reimplement f2xm1 using floatx80 operations
From: Joseph Myers The x87 f2xm1 emulation is currently based around conversion to double. This is inherently unsuitable for a good emulation of any floatx80 operation, even before considering that it is a particularly naive implementation using double (computing with pow and then subtracting 1 rather than attempting a better emulation using expm1). Reimplement using the soft-float operations, including additions and multiplications with higher precision where appropriate to limit accumulation of errors. I considered reusing some of the m68k code for transcendental operations, but the instructions don't generally correspond exactly to x87 operations (for example, m68k has 2^x and e^x - 1, but not 2^x - 1); to avoid possible accumulation of errors from applying multiple such operations each rounding to floatx80 precision, I wrote a direct implementation of 2^x - 1 instead. It would be possible in principle to make the implementation more efficient by doing the intermediate operations directly with significands, signs and exponents and not packing / unpacking floatx80 format for each operation, but that would make it significantly more complicated and it's not clear that's worthwhile; the m68k emulation doesn't try to do that. A test is included with many randomly generated inputs. The assumption of the test is that the result in round-to-nearest mode should always be one of the two closest floating-point numbers to the mathematical value of 2^x - 1; the implementation aims to do somewhat better than that (about 70 correct bits before rounding). I haven't investigated how accurate hardware is. Signed-off-by: Joseph Myers Message-Id: Signed-off-by: Paolo Bonzini --- target/i386/fpu_helper.c | 385 +- tests/tcg/i386/test-i386-f2xm1.c | 1140 ++ 2 files changed, 1522 insertions(+), 3 deletions(-) create mode 100644 tests/tcg/i386/test-i386-f2xm1.c diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c index 8ef5b463ea..e32a2aa74b 100644 --- a/target/i386/fpu_helper.c +++ b/target/i386/fpu_helper.c @@ -25,6 +25,7 @@ #include "exec/exec-all.h" #include "exec/cpu_ldst.h" #include "fpu/softfloat.h" +#include "fpu/softfloat-macros.h" #ifdef CONFIG_SOFTMMU #include "hw/irq.h" @@ -836,12 +837,390 @@ void helper_fbst_ST0(CPUX86State *env, target_ulong ptr) merge_exception_flags(env, old_flags); } +/* 128-bit significand of log(2). */ +#define ln2_sig_high 0xb17217f7d1cf79abULL +#define ln2_sig_low 0xc9e3b39803f2f6afULL + +/* + * Polynomial coefficients for an approximation to (2^x - 1) / x, on + * the interval [-1/64, 1/64]. + */ +#define f2xm1_coeff_0 make_floatx80(0x3ffe, 0xb17217f7d1cf79acULL) +#define f2xm1_coeff_0_low make_floatx80(0xbfbc, 0xd87edabf495b3762ULL) +#define f2xm1_coeff_1 make_floatx80(0x3ffc, 0xf5fdeffc162c7543ULL) +#define f2xm1_coeff_2 make_floatx80(0x3ffa, 0xe35846b82505fcc7ULL) +#define f2xm1_coeff_3 make_floatx80(0x3ff8, 0x9d955b7dd273b899ULL) +#define f2xm1_coeff_4 make_floatx80(0x3ff5, 0xaec3ff3c4ef4ac0cULL) +#define f2xm1_coeff_5 make_floatx80(0x3ff2, 0xa184897c3a7f0de9ULL) +#define f2xm1_coeff_6 make_floatx80(0x3fee, 0xffe634d0ec30d504ULL) +#define f2xm1_coeff_7 make_floatx80(0x3feb, 0xb160111d2db515e4ULL) + +struct f2xm1_data { +/* + * A value very close to a multiple of 1/32, such that 2^t and 2^t - 1 + * are very close to exact floatx80 values. + */ +floatx80 t; +/* The value of 2^t. */ +floatx80 exp2; +/* The value of 2^t - 1. */ +floatx80 exp2m1; +}; + +static const struct f2xm1_data f2xm1_table[65] = { +{ make_floatx80(0xbfff, 0x8000ULL), + make_floatx80(0x3ffe, 0x8000ULL), + make_floatx80(0xbffe, 0x8000ULL) }, +{ make_floatx80(0xbffe, 0xf8002e7eULL), + make_floatx80(0x3ffe, 0x82cd8698ac2b9160ULL), + make_floatx80(0xbffd, 0xfa64f2cea7a8dd40ULL) }, +{ make_floatx80(0xbffe, 0xefffe960ULL), + make_floatx80(0x3ffe, 0x85aac367cc488345ULL), + make_floatx80(0xbffd, 0xf4aa7930676ef976ULL) }, +{ make_floatx80(0xbffe, 0xe8006f10ULL), + make_floatx80(0x3ffe, 0x88980e8092da5c14ULL), + make_floatx80(0xbffd, 0xeecfe2feda4b47d8ULL) }, +{ make_floatx80(0xbffe, 0xe0008a45ULL), + make_floatx80(0x3ffe, 0x8b95c1e3ea8ba2a5ULL), + make_floatx80(0xbffd, 0xe8d47c382ae8bab6ULL) }, +{ make_floatx80(0xbffe, 0xd7ff8a9eULL), + make_floatx80(0x3ffe, 0x8ea4398b45cd8116ULL), + make_floatx80(0xbffd, 0xe2b78ce97464fdd4ULL) }, +{ make_floatx80(0xbffe, 0xd00019a0ULL), + make_floatx80(0x3ffe, 0x91c3d373ab11b919ULL), + make_floatx80(0xbffd, 0xdc785918a9dc8dceULL) }, +{ make_floatx80(0xbffe, 0xc7ff14dfULL), + make_floatx80(0x3ffe, 0x94f4efa8fef76836ULL), + make_floatx80(0xbffd, 0xd61620ae02112f94ULL) }, +{ make_floatx80(0xbffe, 0xc0006530ULL), + make_floatx80(0x3ffe,