Re: [PATCH v2] target/i386: reimplement fpatan using floatx80 operations

2020-06-23 Thread Joseph Myers
On Tue, 23 Jun 2020, Paolo Bonzini wrote:

> On 23/06/20 02:01, Joseph Myers wrote:
> > The x87 fpatan emulation is currently based around conversion to
> > double.  This is inherently unsuitable for a good emulation of any
> > floatx80 operation.  Reimplement using the soft-float operations, as
> > for other such instructions.
> > 
> > Signed-off-by: Joseph Myers 
> 
> Queued, thanks.
> 
> Just one question: do recent processors still use the same CORDIC
> approximations as the 8087, and if so would it be better or simpler to
> do that instead of using a good implementation such as this one?

I don't know what approximations the processors use, but they're 
definitely different for at least some instructions between Intel and AMD 
processors (as shown by glibc test ulps baselines created on one processor 
sometimes needing increasing to work on other processors; avoiding test 
problems means the emulation needs to be at least as accurate as 
hardware).  (Whereas the AVX-512 approximation instructions have reference 
implementations for their exact semantics.)
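
To make the "ulps" comparison above concrete: one common way to express how far a computed result lies from a reference is to count the representable values between the two. The sketch below does that for double by mapping bit patterns onto a monotonic integer scale. It is only an illustration under stated assumptions: ordered_bits and ulp_distance are hypothetical helper names, not glibc or QEMU functions, and glibc's test harness has its own ulp computation that this does not reproduce.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Map a double's bit pattern to an integer that increases monotonically
 * with the value, so that adjacent doubles differ by exactly 1.
 * (Hypothetical helper for illustration.)
 */
static int64_t ordered_bits(double d)
{
    int64_t i;
    memcpy(&i, &d, sizeof i);
    /* Negative doubles sort in reverse bit order; mirror them below 0. */
    return i < 0 ? INT64_MIN - i : i;
}

/* Number of representable doubles between a and b (0 if they are equal). */
static uint64_t ulp_distance(double a, double b)
{
    assert(a == a && b == b);   /* NaNs have no position on the scale */
    int64_t ia = ordered_bits(a);
    int64_t ib = ordered_bits(b);
    return ia >= ib ? (uint64_t)ia - (uint64_t)ib
                    : (uint64_t)ib - (uint64_t)ia;
}
```

With this definition, ulp_distance(1.0, 1.0 + 0x1p-52) is 1, and +0.0 and -0.0 are at distance 0. An emulation that is "at least as accurate as hardware" would then show a distance from the correctly rounded result no larger than the hardware's.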

-- 
Joseph S. Myers
jos...@codesourcery.com



Re: [PATCH v2] target/i386: reimplement fpatan using floatx80 operations

2020-06-22 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/alpine.deb.2.21.200623340.24...@digraph.polyomino.org.uk/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

  GEN docs/interop/qemu-qmp-ref.7
  CC  qga/commands.o
  CC  qga/guest-agent-command-state.o
  CC  qga/main.o
  CC  qga/commands-posix.o
  CC  qga/channel-posix.o
/usr/bin/ld: 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o):
 warning: common of `__interception::real_vfork' overridden by definition from 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
---
  AR  libvhost-user.a
  AS  pc-bios/optionrom/multiboot.o
  AS  pc-bios/optionrom/linuxboot.o
/usr/bin/ld: 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o):
 warning: common of `__interception::real_vfork' overridden by definition from 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  CC  pc-bios/optionrom/linuxboot_dma.o
  GEN docs/interop/qemu-ga-ref.html
  GEN docs/interop/qemu-ga-ref.txt
---
  BUILD   pc-bios/optionrom/pvh.raw
  SIGN    pc-bios/optionrom/pvh.bin
  LINK    qemu-ga
/usr/bin/ld: 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o):
 warning: common of `__interception::real_vfork' overridden by definition from 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-keymap
  LINK    ivshmem-client
/usr/bin/ld: 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o):
 warning: common of `__interception::real_vfork' overridden by definition from 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
/usr/bin/ld: 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o):
 warning: common of `__interception::real_vfork' overridden by definition from 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    ivshmem-server
  LINK    qemu-nbd
  LINK    qemu-storage-daemon
  LINK    qemu-img
/usr/bin/ld: 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o):
 warning: common of `__interception::real_vfork' overridden by definition from 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
/usr/bin/ld: 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o):
 warning: common of `__interception::real_vfork' overridden by definition from 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
/usr/bin/ld: 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o):
 warning: common of `__interception::real_vfork' overridden by definition from 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-io
/usr/bin/ld: 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o):
 warning: common of `__interception::real_vfork' overridden by definition from 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-edid
/usr/bin/ld: 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o):
 warning: common of `__interception::real_vfork' overridden by definition from 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    fsdev/virtfs-proxy-helper
/usr/bin/ld: 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o):
 warning: common of `__interception::real_vfork' overridden by definition from 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    scsi/qemu-pr-helper
/usr/bin/ld: 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o):
 warning: common of `__interception::real_vfork' overridden by definition from 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-bridge-helper
/usr/bin/ld: 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o):
 warning: common of `__interception::real_vfork' overridden by definition from 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    virtiofsd
/usr/bin/ld: 
/usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o):
 warning: common of `__interception::real_vfork' overridden by definition from 

Re: [PATCH v2] target/i386: reimplement fpatan using floatx80 operations

2020-06-22 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/alpine.deb.2.21.200623340.24...@digraph.polyomino.org.uk/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing
commands and their output below. If you have Docker installed, you can probably
reproduce it locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  aarch64-softmmu/target/arm/pauth_helper.o
  GEN trace/generated-helpers.c
/tmp/qemu-test/src/target/i386/fpu_helper.c: In function 'helper_fpatan':
/tmp/qemu-test/src/target/i386/fpu_helper.c:1098:17: error: implicit 
declaration of function 'shift128Right' [-Werror=implicit-function-declaration]
 shift128Right(remsig0, remsig1, 1, , );
 ^
/tmp/qemu-test/src/target/i386/fpu_helper.c:1098:17: error: nested extern 
declaration of 'shift128Right' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1101:13: error: implicit 
declaration of function 'estimateDiv128To64' 
[-Werror=implicit-function-declaration]
 xsig0 = estimateDiv128To64(remsig0, remsig1, den_sig);
 ^
/tmp/qemu-test/src/target/i386/fpu_helper.c:1101:13: error: nested extern 
declaration of 'estimateDiv128To64' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1102:13: error: implicit 
declaration of function 'mul64To128' [-Werror=implicit-function-declaration]
 mul64To128(den_sig, xsig0, , );
 ^
/tmp/qemu-test/src/target/i386/fpu_helper.c:1102:13: error: nested extern 
declaration of 'mul64To128' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1103:13: error: implicit 
declaration of function 'sub128' [-Werror=implicit-function-declaration]
 sub128(remsig0, remsig1, msig0, msig1, , );
 ^
/tmp/qemu-test/src/target/i386/fpu_helper.c:1103:13: error: nested extern 
declaration of 'sub128' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1106:17: error: implicit 
declaration of function 'add128' [-Werror=implicit-function-declaration]
 add128(remsig0, remsig1, 0, den_sig, , );
 ^
/tmp/qemu-test/src/target/i386/fpu_helper.c:1106:17: error: nested extern 
declaration of 'add128' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1143:33: error: implicit 
declaration of function 'shift128Left' [-Werror=implicit-function-declaration]
 shift128Left(ysig0, ysig1, shift,
 ^
/tmp/qemu-test/src/target/i386/fpu_helper.c:1143:33: error: nested extern 
declaration of 'shift128Left' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1171:21: error: implicit 
declaration of function 'shift128RightJamming' 
[-Werror=implicit-function-declaration]
 shift128RightJamming(xsig0, xsig1, texp - xexp,
 ^
/tmp/qemu-test/src/target/i386/fpu_helper.c:1171:21: error: nested extern 
declaration of 'shift128RightJamming' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1200:17: error: implicit 
declaration of function 'mul128By64To192' 
[-Werror=implicit-function-declaration]
 mul128By64To192(xsig0, xsig1, tsig, , , );
 ^
/tmp/qemu-test/src/target/i386/fpu_helper.c:1200:17: error: nested extern 
declaration of 'mul128By64To192' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1218:17: error: implicit 
declaration of function 'sub192' [-Werror=implicit-function-declaration]
 sub192(remsig0, remsig1, remsig2, msig0, msig1, msig2,
 ^
/tmp/qemu-test/src/target/i386/fpu_helper.c:1218:17: error: nested extern 
declaration of 'sub192' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1222:21: error: implicit 
declaration of function 'add192' [-Werror=implicit-function-declaration]
 add192(remsig0, remsig1, remsig2, 0, dsig0, dsig1,
 ^
/tmp/qemu-test/src/target/i386/fpu_helper.c:1222:21: error: nested extern 
declaration of 'add192' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1237:17: error: implicit 
declaration of function 'mul128To256' [-Werror=implicit-function-declaration]
 mul128To256(zsig0, zsig1, zsig0, zsig1,
 ^
/tmp/qemu-test/src/target/i386/fpu_helper.c:1237:17: error: nested extern 
declaration of 'mul128To256' [-Werror=nested-externs]
cc1: all warnings being treated as errors
  CC  aarch64-softmmu/trace/control-target.o
make[1]: *** [target/i386/fpu_helper.o] Error 1
make[1]: *** Waiting for unfinished jobs
  CC  aarch64-softmmu/softmmu/main.o
  CC  aarch64-softmmu/target/arm/translate.o
  CC  aarch64-softmmu/gdbstub-xml.o
  CC  aarch64-softmmu/trace/generated-helpers.o
  CC  aarch64-softmmu/target/arm/translate-sve.o
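
Every call flagged in the log above is a multi-word integer helper that works on 64-bit halves of a wider value, and the errors say only that no declaration was visible in fpu_helper.c at the point of use; the log does not show which header was expected to supply them. As a rough guide to what such helpers do, here is a minimal, self-contained sketch. The sk_ names are hypothetical and this is not QEMU's implementation; the argument order (high word, low word, then output pointers) is an assumption that simply mirrors the call sites quoted in the log.

```c
#include <assert.h>
#include <stdint.h>

/* (a0:a1) + (b0:b1) -> (*z0:*z1), wrapping modulo 2^128. */
static void sk_add128(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1,
                      uint64_t *z0, uint64_t *z1)
{
    uint64_t lo = a1 + b1;
    *z1 = lo;
    *z0 = a0 + b0 + (lo < a1);          /* carry out of the low word */
}

/* (a0:a1) - (b0:b1) -> (*z0:*z1), wrapping modulo 2^128. */
static void sk_sub128(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1,
                      uint64_t *z0, uint64_t *z1)
{
    *z1 = a1 - b1;
    *z0 = a0 - b0 - (a1 < b1);          /* borrow from the high word */
}

/* Logical right shift of (a0:a1) by count bits, 0 <= count < 64. */
static void sk_shift128_right(uint64_t a0, uint64_t a1, int count,
                              uint64_t *z0, uint64_t *z1)
{
    assert(count >= 0 && count < 64);
    if (count == 0) {
        *z0 = a0;
        *z1 = a1;
        return;
    }
    *z1 = (a0 << (64 - count)) | (a1 >> count);
    *z0 = a0 >> count;
}

/* Full 64x64 -> 128-bit product, built from four 32x32 partial products. */
static void sk_mul64_to_128(uint64_t a, uint64_t b,
                            uint64_t *z0, uint64_t *z1)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
    uint64_t ll = a_lo * b_lo;
    uint64_t lh = a_lo * b_hi;
    uint64_t hl = a_hi * b_lo;
    uint64_t hh = a_hi * b_hi;
    /* Sum of the three contributions to bits 32..63; fits in 34 bits. */
    uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

    *z1 = (mid << 32) | (uint32_t)ll;
    *z0 = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}
```

Because the inputs are passed by value, a call may safely write its result back over its own operands, which is the pattern the quoted call sites appear to use.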

Re: [PATCH v2] target/i386: reimplement fpatan using floatx80 operations

2020-06-22 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/alpine.deb.2.21.200623340.24...@digraph.polyomino.org.uk/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing
commands and their output below. If you have Docker installed, you can probably
reproduce it locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  aarch64-softmmu/target/arm/translate-sve.o
  LINK    aarch64-softmmu/qemu-system-aarch64w.exe
/tmp/qemu-test/src/target/i386/fpu_helper.c: In function 'helper_fpatan':
/tmp/qemu-test/src/target/i386/fpu_helper.c:1098:17: error: implicit 
declaration of function 'shift128Right' [-Werror=implicit-function-declaration]
 1098 | shift128Right(remsig0, remsig1, 1, , );
  | ^
/tmp/qemu-test/src/target/i386/fpu_helper.c:1098:17: error: nested extern 
declaration of 'shift128Right' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1101:21: error: implicit 
declaration of function 'estimateDiv128To64' 
[-Werror=implicit-function-declaration]
 1101 | xsig0 = estimateDiv128To64(remsig0, remsig1, den_sig);
  | ^~
/tmp/qemu-test/src/target/i386/fpu_helper.c:1101:21: error: nested extern 
declaration of 'estimateDiv128To64' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1102:13: error: implicit 
declaration of function 'mul64To128' [-Werror=implicit-function-declaration]
 1102 | mul64To128(den_sig, xsig0, , );
  | ^~
/tmp/qemu-test/src/target/i386/fpu_helper.c:1102:13: error: nested extern 
declaration of 'mul64To128' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1103:13: error: implicit 
declaration of function 'sub128' [-Werror=implicit-function-declaration]
 1103 | sub128(remsig0, remsig1, msig0, msig1, , );
  | ^~
/tmp/qemu-test/src/target/i386/fpu_helper.c:1103:13: error: nested extern 
declaration of 'sub128' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1106:17: error: implicit 
declaration of function 'add128' [-Werror=implicit-function-declaration]
 1106 | add128(remsig0, remsig1, 0, den_sig, , 
);
  | ^~
/tmp/qemu-test/src/target/i386/fpu_helper.c:1106:17: error: nested extern 
declaration of 'add128' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1143:33: error: implicit 
declaration of function 'shift128Left' [-Werror=implicit-function-declaration]
 1143 | shift128Left(ysig0, ysig1, shift,
  | ^~~~
/tmp/qemu-test/src/target/i386/fpu_helper.c:1143:33: error: nested extern 
declaration of 'shift128Left' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1171:21: error: implicit 
declaration of function 'shift128RightJamming' 
[-Werror=implicit-function-declaration]
 1171 | shift128RightJamming(xsig0, xsig1, texp - xexp,
  | ^~~~
/tmp/qemu-test/src/target/i386/fpu_helper.c:1171:21: error: nested extern 
declaration of 'shift128RightJamming' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1200:17: error: implicit 
declaration of function 'mul128By64To192' 
[-Werror=implicit-function-declaration]
 1200 | mul128By64To192(xsig0, xsig1, tsig, , , 
);
  | ^~~
/tmp/qemu-test/src/target/i386/fpu_helper.c:1200:17: error: nested extern 
declaration of 'mul128By64To192' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1218:17: error: implicit 
declaration of function 'sub192' [-Werror=implicit-function-declaration]
 1218 | sub192(remsig0, remsig1, remsig2, msig0, msig1, msig2,
  | ^~
/tmp/qemu-test/src/target/i386/fpu_helper.c:1218:17: error: nested extern 
declaration of 'sub192' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1222:21: error: implicit 
declaration of function 'add192' [-Werror=implicit-function-declaration]
 1222 | add192(remsig0, remsig1, remsig2, 0, dsig0, dsig1,
  | ^~
/tmp/qemu-test/src/target/i386/fpu_helper.c:1222:21: error: nested extern 
declaration of 'add192' [-Werror=nested-externs]
/tmp/qemu-test/src/target/i386/fpu_helper.c:1237:17: error: implicit 
declaration of function 'mul128To256' [-Werror=implicit-function-declaration]
 1237 | mul128To256(zsig0, zsig1, zsig0, zsig1,
  | ^~~
/tmp/qemu-test/src/target/i386/fpu_helper.c:1237:17: error: nested extern 
declaration of 'mul128To256' [-Werror=nested-externs]
cc1: all warnings being treated as errors
make[1]: *** [/tmp/qemu-test/src/rules.mak:69: target/i386/fpu_helper.o] Error 1

Re: [PATCH v2] target/i386: reimplement fpatan using floatx80 operations

2020-06-22 Thread Paolo Bonzini
On 23/06/20 02:01, Joseph Myers wrote:
> The x87 fpatan emulation is currently based around conversion to
> double.  This is inherently unsuitable for a good emulation of any
> floatx80 operation.  Reimplement using the soft-float operations, as
> for other such instructions.
> 
> Signed-off-by: Joseph Myers 

Queued, thanks.

Just one question: do recent processors still use the same CORDIC
approximations as the 8087, and if so would it be better or simpler to
do that instead of using a good implementation such as this one?

Thanks,

Paolo




[PATCH v2] target/i386: reimplement fpatan using floatx80 operations

2020-06-22 Thread Joseph Myers
The x87 fpatan emulation is currently based around conversion to
double.  This is inherently unsuitable for a good emulation of any
floatx80 operation.  Reimplement using the soft-float operations, as
for other such instructions.

Signed-off-by: Joseph Myers 

---

Changes in version 2: adjust the "Dividing ST1 by ST0 gives the
correct result." case to ensure correct exceptions, as well as a
correctly rounded result in non-to-nearest modes, when the division is
exact.
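
For readers following the patch below: its table has nine entries, matching atan(k/8) for k = 0..8, and its polynomial is specified on [-1/16, 1/16]. That is consistent with reducing the argument against the nearest multiple of 1/8 using atan(x) = atan(t) + atan((x - t) / (1 + x*t)). The sketch below illustrates the identity in plain double for 0 <= x <= 1. It is not the patch's code: atan_sketch and atan_eighths are hypothetical names, the table holds rounded decimal values rather than high/low floatx80 pairs, and Taylor coefficients stand in for the patch's minimax coefficients.

```c
#include <assert.h>

/* atan(k/8) for k = 0..8, as rounded decimal constants (illustrative). */
static const double atan_eighths[9] = {
    0.0,
    0.12435499454676144,
    0.24497866312686414,
    0.3587706702705722,
    0.4636476090008061,
    0.5585993153435624,
    0.6435011087932844,
    0.7188299996216245,
    0.7853981633974483,
};

/* atan(x) for 0 <= x <= 1: table lookup plus a short odd polynomial. */
static double atan_sketch(double x)
{
    assert(x >= 0.0 && x <= 1.0);

    int k = (int)(8.0 * x + 0.5);           /* nearest multiple of 1/8 */
    double t = k / 8.0;
    double z = (x - t) / (1.0 + x * t);     /* |z| <= 1/16 */
    double z2 = z * z;

    /* Horner form of 1 - z^2/3 + z^4/5 - ... + z^12/13 (seven terms). */
    double p = 1.0 / 13.0;
    p = p * z2 - 1.0 / 11.0;
    p = p * z2 + 1.0 / 9.0;
    p = p * z2 - 1.0 / 7.0;
    p = p * z2 + 1.0 / 5.0;
    p = p * z2 - 1.0 / 3.0;
    p = p * z2 + 1.0;

    return atan_eighths[k] + z * p;
}
```

Keeping |z| within 1/16 is what lets a seven-term odd polynomial suffice; splitting each table entry into high and low parts, as the patch does, is what would carry the extra precision that a single double per entry cannot.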
---
 target/i386/fpu_helper.c  |  487 -
 tests/tcg/i386/test-i386-fpatan.c | 1071 +
 2 files changed, 1554 insertions(+), 4 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-fpatan.c

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 62820bc735..71cec3962f 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -1239,14 +1239,493 @@ void helper_fptan(CPUX86State *env)
 }
 }
 
+/* Values of pi/4, pi/2, 3pi/4 and pi, with 128-bit precision.  */
+#define pi_4_exp 0x3ffe
+#define pi_4_sig_high 0xc90fdaa22168c234ULL
+#define pi_4_sig_low 0xc4c6628b80dc1cd1ULL
+#define pi_2_exp 0x3fff
+#define pi_2_sig_high 0xc90fdaa22168c234ULL
+#define pi_2_sig_low 0xc4c6628b80dc1cd1ULL
+#define pi_34_exp 0x4000
+#define pi_34_sig_high 0x96cbe3f9990e91a7ULL
+#define pi_34_sig_low 0x9394c9e8a0a5159dULL
+#define pi_exp 0x4000
+#define pi_sig_high 0xc90fdaa22168c234ULL
+#define pi_sig_low 0xc4c6628b80dc1cd1ULL
+
+/*
+ * Polynomial coefficients for an approximation to atan(x), with only
+ * odd powers of x used, for x in the interval [-1/16, 1/16].  (Unlike
+ * for some other approximations, no low part is needed for the first
+ * coefficient here to achieve a sufficiently accurate result, because
+ * the coefficient in this minimax approximation is very close to
+ * exactly 1.)
+ */
+#define fpatan_coeff_0 make_floatx80(0x3fff, 0x8000000000000000ULL)
+#define fpatan_coeff_1 make_floatx80(0xbffd, 0xaaaaaaaaaaaaaa43ULL)
+#define fpatan_coeff_2 make_floatx80(0x3ffc, 0xccccccccccbfe4f8ULL)
+#define fpatan_coeff_3 make_floatx80(0xbffc, 0x92492491fbab2e66ULL)
+#define fpatan_coeff_4 make_floatx80(0x3ffb, 0xe38e372881ea1e0bULL)
+#define fpatan_coeff_5 make_floatx80(0xbffb, 0xba2c0104bbdd0615ULL)
+#define fpatan_coeff_6 make_floatx80(0x3ffb, 0x9baf7ebf898b42efULL)
+
+struct fpatan_data {
+    /* High and low parts of atan(x).  */
+    floatx80 atan_high, atan_low;
+};
+
+static const struct fpatan_data fpatan_table[9] = {
+    { floatx80_zero,
+      floatx80_zero },
+    { make_floatx80(0x3ffb, 0xfeadd4d5617b6e33ULL),
+      make_floatx80(0xbfb9, 0xdda19d8305ddc420ULL) },
+    { make_floatx80(0x3ffc, 0xfadbafc96406eb15ULL),
+      make_floatx80(0x3fbb, 0xdb8f3debef442fccULL) },
+    { make_floatx80(0x3ffd, 0xb7b0ca0f26f78474ULL),
+      make_floatx80(0xbfbc, 0xeab9bdba460376faULL) },
+    { make_floatx80(0x3ffd, 0xed63382b0dda7b45ULL),
+      make_floatx80(0x3fbc, 0xdfc88bd978751a06ULL) },
+    { make_floatx80(0x3ffe, 0x8f005d5ef7f59f9bULL),
+      make_floatx80(0x3fbd, 0xb906bc2ccb886e90ULL) },
+    { make_floatx80(0x3ffe, 0xa4bc7d1934f70924ULL),
+      make_floatx80(0x3fbb, 0xcd43f9522bed64f8ULL) },
+    { make_floatx80(0x3ffe, 0xb8053e2bc2319e74ULL),
+      make_floatx80(0xbfbc, 0xd3496ab7bd6eef0cULL) },
+    { make_floatx80(0x3ffe, 0xc90fdaa22168c235ULL),
+      make_floatx80(0xbfbc, 0xece675d1fc8f8cbcULL) },
+};
+
 void helper_fpatan(CPUX86State *env)
 {
-    double fptemp, fpsrcop;
+    uint8_t old_flags = save_exception_flags(env);
+    uint64_t arg0_sig = extractFloatx80Frac(ST0);
+    int32_t arg0_exp = extractFloatx80Exp(ST0);
+    bool arg0_sign = extractFloatx80Sign(ST0);
+    uint64_t arg1_sig = extractFloatx80Frac(ST1);
+    int32_t arg1_exp = extractFloatx80Exp(ST1);
+    bool arg1_sign = extractFloatx80Sign(ST1);
+
+    if (floatx80_is_signaling_nan(ST0, &env->fp_status)) {
+        float_raise(float_flag_invalid, &env->fp_status);
+        ST1 = floatx80_silence_nan(ST0, &env->fp_status);
+    } else if (floatx80_is_signaling_nan(ST1, &env->fp_status)) {
+        float_raise(float_flag_invalid, &env->fp_status);
+        ST1 = floatx80_silence_nan(ST1, &env->fp_status);
+    } else if (floatx80_invalid_encoding(ST0) ||
+               floatx80_invalid_encoding(ST1)) {
+        float_raise(float_flag_invalid, &env->fp_status);
+        ST1 = floatx80_default_nan(&env->fp_status);
+    } else if (floatx80_is_any_nan(ST0)) {
+        ST1 = ST0;
+    } else if (floatx80_is_any_nan(ST1)) {
+        /* Pass this NaN through.  */
+    } else if (floatx80_is_zero(ST1) && !arg0_sign) {
+        /* Pass this zero through.  */
+    } else if (((floatx80_is_infinity(ST0) && !floatx80_is_infinity(ST1)) ||
+                 arg0_exp - arg1_exp >= 80) &&
+               !arg0_sign) {
+        /*
+         * Dividing ST1 by ST0 gives the correct result up to
+         * rounding, and avoids spurious underflow exceptions that
+         * might result from passing some small values through the
+         * polynomial