from:"Joseph Myers"

Re: [Qemu-devel] undefined behavior of signed left shifts (was Re: [PULL 00/40] ppc patch queue 2015-06-03)

2015-06-05 Thread Joseph Myers

On Fri, 5 Jun 2015, Paolo Bonzini wrote:

> The GCC manual says "GCC does not use the latitude given in C99 and C11
> only to treat certain aspects of signed '<<' as undefined, but this is
> subject to change".  It would certainly be nice if they removed the
> "this is subject to change" part.

The correct statement would be more complicated.  That is: the value 
returned is as documented, without that latitude being used for 
*optimization*, but (a) -fsanitize=undefined (and its subcase 
-fsanitize=shift) intends to follow exactly what the different standards 
specify when giving runtime errors and (b) the cases that are undefined 
are thereby not considered integer constant expressions (with consequent 
pedwarns-if-pedantic in various cases, and corner case effects on what's a 
null pointer constant).  (The only "subject to change" would be that if 
there are still missing cases from the runtime detection or the not 
treating as integer constant expressions, then those missing cases may be 
fixed.  I don't think it would be a good idea to add optimizations on this 
basis - for example, optimizations of x * 2 based on undefined overflow 
should not be applied to x << 1.)

-- 
Joseph S. Myers
jos...@codesourcery.com
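
For code that must not depend on any compiler's treatment of signed '<<', the shift can be done on an unsigned type instead. The following is a minimal sketch, not something taken from the thread: the helper name shl_s32 is hypothetical, and the final conversion back to int32_t relies on GCC's documented modulo-2^32 behavior for out-of-range conversions, which is implementation-defined in C99 and C11.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shift a signed 32-bit value left without using
 * signed '<<'.  The shift is done on uint32_t, where it is fully
 * defined and bits shifted out are simply discarded. */
static int32_t shl_s32(int32_t x, unsigned n)
{
    assert(n < 32);   /* a shift count >= the width is undefined */
    uint32_t u = (uint32_t)x << n;
    /* Assumption: converting an out-of-range uint32_t to int32_t wraps
     * modulo 2^32 (GCC documents this; ISO C leaves it
     * implementation-defined). */
    return (int32_t)u;
}
```

With that assumption, shl_s32(-1, 1) gives -2 and shl_s32(0x40000000, 1) gives INT32_MIN, and -fsanitize=shift has nothing to report because no signed shift is executed.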

Re: [Qemu-devel] [PATCH] tcg: increase MAX_OP_PER_INSTR to 395

2016-09-23 Thread Joseph Myers

On Fri, 23 Sep 2016, Richard Henderson wrote:

> While increasing the max per insn is indeed one way to approach this, aarch64
> is being remarkably inefficient in this case.  With the following, I see a
> reduction from 387 ops to 261 ops; for a 64-bit host, the reduction is from
> 258 ops to 195 ops.

Of course, 261 ops, plus the ops generated in gen_intermediate_code_a64 
after the loop, plus the ops added by optimization, may still require an 
increase from 266.  (I don't know how to bound the number of ops for which 
space must still be available once translating an instruction has made 
tcg_op_buf_full() return true, but in my testing there were cases where 
it was at least 8.)

-- 
Joseph S. Myers
jos...@codesourcery.com

[Qemu-devel] [PATCH] tcg: increase MAX_OP_PER_INSTR to 395

2016-09-22 Thread Joseph Myers

MAX_OP_PER_INSTR is currently 266, reported in commit
14dcdac82f398cbac874c8579b9583fab31c67bf to be the worst case for the
ARM A64 decoder.

Whether or not it was in fact the worst case at that time in 2014, I'm
observing the instruction 0x4c006020 (st1 {v0.16b-v2.16b}, [x1])
generate 386 ops from disas_ldst_multiple_struct with current sources,
plus one op from the call to tcg_gen_insn_start in the loop in
gen_intermediate_code_a64.  Furthermore, I see six ops generated after
the loop in gen_intermediate_code_a64, and at least two added
subsequently in optimization, so MAX_OP_PER_INSTR needs to be at least
395.  I do not know whether other instructions, or code during or
after the loop in gen_intermediate_code_a64, might actually require
the value to be bigger than 395 (possibly depending on the
instructions translated before the one generating 386 ops), just that
395 is definitely needed for a GCC testcase that generates that
particular instruction.  So if there is a reliable methodology to
determine the maximum number of ops that might be generated by one
pass through that loop, plus the code after that loop, plus
optimization, it should be used instead, and might result in a higher
figure (or maybe a higher figure would be safer anyway).

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

diff --git a/tcg/tcg.h b/tcg/tcg.h
index c9949aa..a7fa452 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -32,7 +32,7 @@
 #include "tcg-target.h"
 
 /* XXX: make safe guess about sizes */
-#define MAX_OP_PER_INSTR 266
+#define MAX_OP_PER_INSTR 395
 
 #if HOST_LONG_BITS == 32
 #define MAX_OPC_PARAM_PER_ARG 2

-- 
Joseph S. Myers
jos...@codesourcery.com
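
The figure 395 in the patch above is just the sum of the four observed counts. As a small sketch of that arithmetic (the function name min_ops_budget is hypothetical and not part of TCG):

```c
#include <assert.h>

/* Hypothetical helper: lower bound on MAX_OP_PER_INSTR implied by
 * observed op counts.  It is only a lower bound; other instructions
 * or translation contexts might need more. */
static int min_ops_budget(int insn_ops, int insn_start_ops,
                          int post_loop_ops, int optimizer_ops)
{
    assert(insn_ops >= 0 && insn_start_ops >= 0);
    assert(post_loop_ops >= 0 && optimizer_ops >= 0);
    return insn_ops + insn_start_ops + post_loop_ops + optimizer_ops;
}
```

With the counts reported in the commit message, min_ops_budget(386, 1, 6, 2) is 395.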

Re: [Qemu-devel] [PATCH] tcg: increase MAX_OP_PER_INSTR to 395

2016-09-23 Thread Joseph Myers

On Fri, 23 Sep 2016, Laurent Desnogues wrote:

> Hello,
> 
> On Fri, Sep 23, 2016 at 1:53 AM, Joseph Myers <jos...@codesourcery.com> wrote:
> > MAX_OP_PER_INSTR is currently 266, reported in commit
> > 14dcdac82f398cbac874c8579b9583fab31c67bf to be the worst case for the
> > ARM A64 decoder.
> >
> > Whether or not it was in fact the worst case at that time in 2014, I'm
> > observing the instruction 0x4c006020 (st1 {v0.16b-v2.16b}, [x1])
> > generate 386 ops from disas_ldst_multiple_struct with current sources,
> 
> Something's odd, I get exactly half of that with 193.

Does the number of ops depend on the system for which TCG is generating 
code?  (I'm building QEMU for 32-bit x86.)  If 32-bit systems require 
twice as many ops as 64-bit systems, maybe the existing value should be 
doubled, so using 532 (plus whatever is needed to allow for extra ops from 
optimization etc.) - or made conditional on the system for which code is 
generated.

> That being said st1 {v0.16b-v3.16b}, [x1], #64 generates even more ops 
> with 258.

My empirical observations are only from examining cases where QEMU gives 
errors running GCC tests; it's quite possible some instructions aren't 
covered, or that the relevant tests got lucky and avoided buffer overruns 
despite generating too many ops.

-- 
Joseph S. Myers
jos...@codesourcery.com

[Qemu-devel] [PATCH] tcg: correct 32-bit tcg_gen_ld8s_i64 sign-extension

2016-10-27 Thread Joseph Myers

The version of tcg_gen_ld8s_i64 for 32-bit systems does a load into
the low part of the return value - then attempts a sign extension into
the high part, but wrongly sets the high part to a sign extension of
itself rather than of the low part.  This results in TCG internal
errors from the use of the uninitialized high part (in some GCC tests
of AArch64 NEON shift intrinsics, in particular).  This patch corrects
the sign-extension logic, making it match other functions such as
tcg_gen_ld16s_i64.

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index bb2bfee..43d34ea 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -790,7 +790,7 @@ void tcg_gen_ld8u_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset)
 void tcg_gen_ld8s_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset)
 {
 tcg_gen_ld8s_i32(TCGV_LOW(ret), arg2, offset);
-tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_HIGH(ret), 31);
+tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
 }
 
 void tcg_gen_ld16u_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset)

-- 
Joseph S. Myers
jos...@codesourcery.com
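
The rule the patch above restores is that the high half of a sign-extended 64-bit value is derived from the sign of the freshly loaded low half, never from the old high half. A minimal model in plain C, assuming a 64-bit value stored as two 32-bit halves as on a 32-bit host (struct pair64 and ld8s_pair are hypothetical names, not TCG API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves. */
struct pair64 {
    uint32_t low;
    uint32_t high;
};

/* Hypothetical model of a sign-extending byte load into such a pair. */
static struct pair64 ld8s_pair(const int8_t *p)
{
    struct pair64 r;
    int32_t low;

    assert(p != NULL);
    low = *p;                    /* sign-extend the byte to 32 bits */
    r.low = (uint32_t)low;
    /* The high half is 32 copies of the low half's sign bit, which is
     * what an arithmetic right shift of the low half by 31 yields. */
    r.high = low < 0 ? UINT32_MAX : 0;
    return r;
}
```

Loading the byte -2 gives low 0xFFFFFFFE and high 0xFFFFFFFF; loading 5 gives low 5 and high 0.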

[Qemu-devel] [PATCH] target/i386: fix pcmpxstrx substring search

2017-08-10 Thread Joseph Myers

One of the cases of the SSE4.2 pcmpestri / pcmpestrm / pcmpistri /
pcmpistrm instructions does a substring search.  The implementation of
this case in the pcmpxstrx helper is incorrect.  The operation in this
case is a search for a string (argument d to the helper) in another
string (argument s to the helper); if a copy of d at a particular
position would run off the end of s, the resulting output bit should
be 0 whether or not the strings match in the region where they
overlap, but the QEMU implementation was wrongly comparing only up to
the point where s ends and counting it as a match if an initial
segment of d matched a terminal segment of s.  Here, "run off the end
of s" means that some byte of d would overlap some byte outside of s;
thus, if d has zero length, it is considered to match everywhere,
including after the end of s.  This patch fixes the implementation to
correspond with the proper instruction semantics.  This fixes four gcc
test failures in my GCC 6-based testing.

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 16509d0..9f1b351 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2037,10 +2040,14 @@ static inline unsigned pcmpxstrx(CPUX86State *env, Reg *d, Reg *s,
 }
 break;
 case 3:
-for (j = valids; j >= 0; j--) {
+if (validd == -1) {
+res = (2 << upper) - 1;
+break;
+}
+for (j = valids - validd; j >= 0; j--) {
 res <<= 1;
 v = 1;
-for (i = MIN(valids - j, validd); i >= 0; i--) {
+for (i = validd; i >= 0; i--) {
 v &= (pcmp_val(s, ctrl, i + j) == pcmp_val(d, ctrl, i));
 }
 res |= v;

-- 
Joseph S. Myers
jos...@codesourcery.com
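
The semantics described in the commit message above can be modelled on plain byte arrays. This is only a sketch under stated assumptions: it uses explicit lengths rather than the helper's valids / validd indices, it covers byte elements only, and the name substring_mask is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit j of the result is set when needle occurs in
 * hay starting at element j and lies entirely inside hay.  nelem is
 * the number of elements in a vector (16 for byte elements). */
static uint32_t substring_mask(const uint8_t *hay, int hay_len,
                               const uint8_t *needle, int needle_len,
                               int nelem)
{
    uint32_t res = 0;
    int i, j;

    assert(nelem > 0 && nelem <= 16);
    assert(hay_len >= 0 && hay_len <= nelem);
    assert(needle_len >= 0 && needle_len <= nelem);

    if (needle_len == 0) {
        /* An empty needle matches everywhere, even past the end. */
        return (UINT32_C(1) << nelem) - 1;
    }
    /* Only start positions where the whole needle fits are tried, so
     * a partial match against the tail of hay never sets a bit. */
    for (j = 0; j + needle_len <= hay_len; j++) {
        int match = 1;
        for (i = 0; i < needle_len; i++) {
            match &= (hay[j + i] == needle[i]);
        }
        if (match) {
            res |= UINT32_C(1) << j;
        }
    }
    return res;
}
```

Searching for "ab" in "abcab" sets bits 0 and 3 (mask 0x9). Searching for "ab" in "xxa" gives 0, whereas the behavior described as wrong above would have counted the trailing "a" as a match.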

[Qemu-devel] [PATCH] target/i386: fix phminposuw in-place operation

2017-08-11 Thread Joseph Myers

The SSE4.1 phminposuw instruction finds the minimum 16-bit element in
the source vector, putting the value of that element in the low 16
bits of the destination vector, the index of that element in the next
three bits and zeroing the rest of the destination.  The helper for
this operation fills the destination from high to low, meaning that
when the source and destination are the same register, the minimum
source element can be overwritten before it is copied to the
destination.  This patch fixes it to fill the destination from low to
high instead, so the minimum source element is always copied first.
This fixes one gcc test failure in my GCC 6-based testing (and so
concludes the present sequence of patches, as I don't have any further
gcc test failures left in that testing that I attribute to QEMU bugs).

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 16509d0..ed05989 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -1707,10 +1710,10 @@ void glue(helper_phminposuw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 idx = 7;
 }
 
-d->Q(1) = 0;
-d->L(1) = 0;
-d->W(1) = idx;
 d->W(0) = s->W(idx);
+d->W(1) = idx;
+d->L(1) = 0;
+d->Q(1) = 0;
 }
 
 void glue(helper_roundps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,

-- 
Joseph S. Myers
jos...@codesourcery.com
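
The ordering argument in the commit message above can be checked with a small model that allows source and destination to alias. This sketch uses a plain array of eight 16-bit elements instead of QEMU's Reg type and accessor macros; struct vec8x16 and phminposuw_model are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct vec8x16 {
    uint16_t w[8];
};

/* Hypothetical model of phminposuw; d and s may point to the same
 * object. */
static void phminposuw_model(struct vec8x16 *d, const struct vec8x16 *s)
{
    int i, idx = 0;

    assert(d != NULL && s != NULL);
    for (i = 1; i < 8; i++) {
        if (s->w[i] < s->w[idx]) {   /* strict '<' keeps the lowest index */
            idx = i;
        }
    }
    /* Fill low to high: the one source element still needed is read
     * by the very first store, before anything is overwritten. */
    d->w[0] = s->w[idx];
    d->w[1] = (uint16_t)idx;
    for (i = 2; i < 8; i++) {
        d->w[i] = 0;
    }
}
```

For an in-place call on {5, 4, 3, 9, 1, 7, 8, 6} the result is {1, 4, 0, 0, 0, 0, 0, 0}. Zeroing the upper elements first would have destroyed the minimum at index 4 before it was copied.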

[Qemu-devel] [PATCH] target/i386: set rip_offset for further SSE instructions

2017-08-08 Thread Joseph Myers

It turns out that my recent fix to set rip_offset when emulating some
SSE4.1 instructions needs generalizing to cover a wider class of
instructions.  Specifically, every instruction in the sse_op_table7
table, coming from various instruction set extensions, has an 8-bit
immediate operand that comes after any memory operand, and so needs
rip_offset set for correctness if there is a memory operand that is
rip-relative, and my patch only set it for a subset of those
instructions.  This patch moves the rip_offset setting to cover the
wider class of instructions, so fixing 9 further gcc testsuite
failures in my GCC 6-based testing.  (I do not know whether there
might be still further classes of instructions missing this setting.)

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 5fdadf9..95f7261 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4077,10 +4077,11 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
 if (!(s->cpuid_ext_features & sse_op_table7[b].ext_mask))
 goto illegal_op;
 
+s->rip_offset = 1;
+
 if (sse_fn_eppi == SSE_SPECIAL) {
 ot = mo_64_32(s->dflag);
 rm = (modrm & 7) | REX_B(s);
-s->rip_offset = 1;
 if (mod != 3)
 gen_lea_modrm(env, s, modrm);
 reg = ((modrm >> 3) & 7) | rex_r;

-- 
Joseph S. Myers
jos...@codesourcery.com

[Qemu-devel] [PATCH] target/i386: set rip_offset for some SSE4.1 instructions

2017-08-07 Thread Joseph Myers

When emulating various SSE4.1 instructions such as pinsrd, the address
of a memory operand is computed without allowing for the 8-bit
immediate operand located after the memory operand, meaning that the
memory operand uses the wrong address in the case where it is
rip-relative.  This patch adds the required rip_offset setting for
those instructions, so fixing some GCC test failures (13 in the gcc
testsuite in my GCC 6-based testing) when testing with a default CPU
setting enabling those instructions.

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

diff --git a/target/i386/translate.c b/target/i386/translate.c
index cab9e32..5fdadf9 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4080,6 +4080,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
 if (sse_fn_eppi == SSE_SPECIAL) {
 ot = mo_64_32(s->dflag);
 rm = (modrm & 7) | REX_B(s);
+s->rip_offset = 1;
 if (mod != 3)
 gen_lea_modrm(env, s, modrm);
 reg = ((modrm >> 3) & 7) | rex_r;

-- 
Joseph S. Myers
jos...@codesourcery.com
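
Both rip_offset patches above rest on the same address calculation: an x86-64 rip-relative operand is addressed relative to the start of the next instruction, so any immediate bytes that follow the displacement have to be counted. A minimal sketch of that arithmetic, with hypothetical names (rip_relative_addr, pc_after_disp, trailing_imm_bytes) that do not appear in translate.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model.  pc_after_disp is the address just past the
 * displacement bytes, which is where a decoder stands when it computes
 * the operand address; trailing_imm_bytes plays the role of
 * rip_offset. */
static uint64_t rip_relative_addr(uint64_t pc_after_disp, int32_t disp,
                                  unsigned trailing_imm_bytes)
{
    assert(trailing_imm_bytes <= 8);
    /* rip-relative means relative to the next instruction, which
     * starts after any immediate operand. */
    uint64_t next_insn = pc_after_disp + trailing_imm_bytes;
    return next_insn + (uint64_t)(int64_t)disp;
}
```

With the decoder at 0x1000 and a displacement of 0x20, an instruction with an 8-bit immediate addresses 0x1021; leaving the immediate out of the calculation gives 0x1020, one byte too low.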

[Qemu-devel] [PATCH] target/i386: fix pmovsx/pmovzx in-place operations

2017-08-08 Thread Joseph Myers

The SSE4.1 pmovsx* and pmovzx* instructions take packed 1-byte, 2-byte
or 4-byte inputs and sign-extend or zero-extend them to a wider vector
output.  The associated helpers for these instructions do the
extension on each element in turn, starting with the lowest.  If the
input and output are the same register, this means that all the input
elements after the first have been overwritten before they are read.
This patch makes the helpers extend starting with the highest element,
not the lowest, to avoid such overwriting.  This fixes many GCC test
failures (161 in the gcc testsuite in my GCC 6-based testing) when
testing with a default CPU setting enabling those instructions.

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 16509d0..d578216 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -1617,18 +1617,18 @@ void glue(helper_ptest, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 #define SSE_HELPER_F(name, elem, num, F)\
 void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \
 {   \
-d->elem(0) = F(0);  \
-d->elem(1) = F(1);  \
 if (num > 2) {  \
-d->elem(2) = F(2);  \
-d->elem(3) = F(3);  \
 if (num > 4) {  \
-d->elem(4) = F(4);  \
-d->elem(5) = F(5);  \
-d->elem(6) = F(6);  \
 d->elem(7) = F(7);  \
+d->elem(6) = F(6);  \
+d->elem(5) = F(5);  \
+d->elem(4) = F(4);  \
 }   \
+d->elem(3) = F(3);  \
+d->elem(2) = F(2);  \
 }   \
+d->elem(1) = F(1);  \
+d->elem(0) = F(0);  \
 }
 
 SSE_HELPER_F(helper_pmovsxbw, W, 8, (int8_t) s->B)

-- 
Joseph S. Myers
jos...@codesourcery.com
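
The reason that extending from the highest element downwards is safe in place can be seen in a small model of pmovsxbw. This is a sketch only: union bw_vec and pmovsxbw_model are hypothetical names, and plain array indices stand in for QEMU's B() and W() accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 128 bits viewed as sixteen bytes or eight 16-bit words. */
union bw_vec {
    int8_t  b[16];
    int16_t w[8];
};

/* Hypothetical model: sign-extend the low eight bytes of s into the
 * eight words of d.  d and s may point to the same object. */
static void pmovsxbw_model(union bw_vec *d, const union bw_vec *s)
{
    int i;

    assert(d != NULL && s != NULL);
    /* Highest element first.  Storing w[i] overwrites bytes 2i and
     * 2i+1, while the inputs still to be read are b[0..i-1], all at
     * lower byte positions, so none of them is clobbered. */
    for (i = 7; i >= 0; i--) {
        d->w[i] = s->b[i];
    }
}
```

Running the loop upwards instead would overwrite b[1] with the first store, which is the failure the patch above fixes.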

[Qemu-devel] [PATCH] target/i386: fix packusdw in-place operation

2017-08-09 Thread Joseph Myers

The SSE4.1 packusdw instruction combines source and destination
vectors of signed 32-bit integers into a single vector of unsigned
16-bit integers, with unsigned saturation.  When the source and
destination are the same register, this means each 32-bit element of
that register is used twice as an input, to produce two of the 16-bit
output elements, and so if the operation is carried out
element-by-element in-place, no matter what the order in which it is
applied to the elements, the first element's operation will overwrite
some future input.  The helper for packssdw avoids this issue by
computing the result in a local temporary and copying it to the
destination at the end; this patch fixes the packusdw helper to do
likewise.  This fixes three gcc test failures in my GCC 6-based
testing.

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 16509d0..05b1701 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -1655,14 +1655,17 @@ SSE_HELPER_Q(helper_pcmpeqq, FCMPEQQ)
 
 void glue(helper_packusdw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 {
-d->W(0) = satuw((int32_t) d->L(0));
-d->W(1) = satuw((int32_t) d->L(1));
-d->W(2) = satuw((int32_t) d->L(2));
-d->W(3) = satuw((int32_t) d->L(3));
-d->W(4) = satuw((int32_t) s->L(0));
-d->W(5) = satuw((int32_t) s->L(1));
-d->W(6) = satuw((int32_t) s->L(2));
-d->W(7) = satuw((int32_t) s->L(3));
+Reg r;
+
+r.W(0) = satuw((int32_t) d->L(0));
+r.W(1) = satuw((int32_t) d->L(1));
+r.W(2) = satuw((int32_t) d->L(2));
+r.W(3) = satuw((int32_t) d->L(3));
+r.W(4) = satuw((int32_t) s->L(0));
+r.W(5) = satuw((int32_t) s->L(1));
+r.W(6) = satuw((int32_t) s->L(2));
+r.W(7) = satuw((int32_t) s->L(3));
+*d = r;
 }
 
 #define FMINSB(d, s) MIN((int8_t)d, (int8_t)s)

-- 
Joseph S. Myers
jos...@codesourcery.com
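
Where no store order is safe, as in the packusdw case above, building the result in a temporary is the straightforward approach. The following model mirrors the shape of the patched helper using plain arrays; union wl_vec and packusdw_model are hypothetical names, and satuw is written out here rather than taken from QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 128 bits viewed as eight 16-bit words or four 32-bit values. */
union wl_vec {
    uint16_t w[8];
    int32_t  l[4];
};

/* Saturate a signed 32-bit value to the unsigned 16-bit range. */
static uint16_t satuw(int32_t x)
{
    if (x < 0) {
        return 0;
    }
    return x > 65535 ? 65535 : (uint16_t)x;
}

/* Hypothetical model; d and s may point to the same object. */
static void packusdw_model(union wl_vec *d, const union wl_vec *s)
{
    union wl_vec r;
    int i;

    assert(d != NULL && s != NULL);
    for (i = 0; i < 4; i++) {
        r.w[i] = satuw(d->l[i]);       /* low half from destination */
        r.w[i + 4] = satuw(s->l[i]);   /* high half from source */
    }
    *d = r;   /* inputs are only overwritten once all have been read */
}
```

An in-place call on the 32-bit values {-5, 70000, 100, 65535} yields the words {0, 65535, 100, 65535, 0, 65535, 100, 65535}.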

Re: [Qemu-devel] d_off field in struct dirent and 32-on-64 emulation

2018-12-31 Thread Joseph Myers

On Fri, 28 Dec 2018, Adhemerval Zanella wrote:

> >> Currently we only have nios2 and csky (unfortunately).  But since generic 
> >> definition for off_t and off64_t still assumes non-LFS support, all new
> >> 32-bits ports potentially might carry the issue.
> > 
> > For csky, we could still change the type of the non-standard d_off
> > field to long long int.  This way, only telldir would have to fail
> > when truncation is necessary, as mentioned below:
> 
> I think it makes no sense to continue making non-LFS as default for
> newer 32 bits ports, the support will be emulated with LFS syscalls.

Any new 32-bit port that uses 64-bit time_t will also use 64-bit offsets 
(because we don't have any glibc configurations that support the 
combination of 64-bit time with 32-bit offsets, and don't want to add 
them).  That should apply for RISC-V 32-bit at least.

I've filed  for 
missing overflow checks in telldir when the default off_t is wider than 
long int (currently just applies to x32; not sure why we don't see glibc 
test failures on x32 resulting from the quiet truncation, as the issue is 
certainly there in the source code).

-- 
Joseph S. Myers
jos...@codesourcery.com
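
The kind of overflow check referred to above can be sketched as follows. This is not glibc's code or its eventual fix: the helper name narrow_dir_offset is hypothetical, int32_t stands in for a 32-bit long int, and reporting the failure through EOVERFLOW is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: narrow a 64-bit directory offset to a 32-bit
 * type, failing instead of truncating quietly.  Returns 0 on success,
 * or -1 with errno set to EOVERFLOW (an assumed error convention). */
static int narrow_dir_offset(int64_t d_off, int32_t *out)
{
    assert(out != NULL);
    if (d_off < INT32_MIN || d_off > INT32_MAX) {
        errno = EOVERFLOW;
        return -1;
    }
    *out = (int32_t)d_off;
    return 0;
}
```

An offset of 5 narrows successfully, while an offset of 2^40 is rejected rather than being truncated.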

[PATCH 2/4] softfloat: fix floatx80 pseudo-denormal addition / subtraction

2020-04-30 Thread Joseph Myers

The softfloat function addFloatx80Sigs, used for addition of values
with the same sign and subtraction of values with opposite sign, fails
to handle the case where the two values both have biased exponent zero
and there is a carry resulting from adding the significands, which can
occur if one or both values are pseudo-denormals (biased exponent
zero, explicit integer bit 1).  Add a check for that case, so making
the results match those seen on x86 hardware for pseudo-denormals.

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index ac116c70b8..6094d267b5 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5866,6 +5866,12 @@ static floatx80 addFloatx80Sigs(floatx80 a, floatx80 b, flag zSign,
 zSig1 = 0;
 zSig0 = aSig + bSig;
 if ( aExp == 0 ) {
+if ((aSig | bSig) & UINT64_C(0x8000000000000000) && zSig0 < aSig) {
+/* At least one of the values is a pseudo-denormal,
+ * and there is a carry out of the result.  */
+zExp = 1;
+goto shiftRight1;
+}
 if (zSig0 == 0) {
 return packFloatx80(zSign, 0, 0);
 }
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com
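
The carry case handled by the patch above can be illustrated with a simplified model of adding two magnitudes that both have biased exponent zero. It is a sketch under stated assumptions: rounding is left out (the bit shifted out is handed back to the caller), and struct x80_mag and add_exp0_sigs are hypothetical names rather than softfloat functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Biased exponent and 64-bit significand (explicit integer bit in
 * bit 63) of a floatx80 magnitude, i386 interpretation. */
struct x80_mag {
    uint16_t exp;
    uint64_t sig;
};

/* Hypothetical model: add two significands that both have biased
 * exponent 0, without rounding. */
static struct x80_mag add_exp0_sigs(uint64_t a_sig, uint64_t b_sig,
                                    int *shifted_out)
{
    struct x80_mag z;
    uint64_t sum = a_sig + b_sig;      /* wraps modulo 2^64 */

    assert(shifted_out != NULL);
    *shifted_out = 0;
    if (sum < a_sig) {
        /* Carry out of bit 63, possible only if at least one operand
         * is a pseudo-denormal.  Exponent 0 scales like exponent 1,
         * so shifting the carry back in gives exponent 2. */
        *shifted_out = (int)(sum & 1);
        z.sig = (sum >> 1) | UINT64_C(0x8000000000000000);
        z.exp = 2;
    } else if (sum & UINT64_C(0x8000000000000000)) {
        z.sig = sum;                   /* integer bit set: normal */
        z.exp = 1;
    } else {
        z.sig = sum;                   /* still subnormal (or zero) */
        z.exp = 0;
    }
    return z;
}
```

Adding the pseudo-denormal 2^-16382 to itself (both significands 0x8000000000000000) gives exponent 2 with significand 0x8000000000000000, that is 2^-16381. Without a carry check the wrapped sum is 0, which looks like an exact zero.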

[PATCH 3/4] softfloat: fix floatx80 pseudo-denormal comparisons

2020-04-30 Thread Joseph Myers

The softfloat floatx80 comparisons fail to allow for pseudo-denormals,
which should compare equal to corresponding values with biased
exponent 1 rather than 0.  Add an adjustment for that case when
comparing numbers with the same sign.

Note that this fix only changes floatx80_compare_internal, not the
other more specific comparison operations.  That is the only
comparison function for floatx80 used in the i386 port, which is the
only supported port with these pseudo-denormal semantics.

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 6094d267b5..8e9c714e6f 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -7966,6 +7966,11 @@ static inline int floatx80_compare_internal(floatx80 a, floatx80 b,
 return 1 - (2 * aSign);
 }
 } else {
+/* Normalize pseudo-denormals before comparison.  */
+if ((a.high & 0x7fff) == 0 && a.low & UINT64_C(0x8000000000000000))
+++a.high;
+if ((b.high & 0x7fff) == 0 && b.low & UINT64_C(0x8000000000000000))
+++b.high;
 if (a.low == b.low && a.high == b.high) {
 return float_relation_equal;
 } else {
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 4/4] softfloat: fix floatx80 pseudo-denormal round to integer

2020-04-30 Thread Joseph Myers

The softfloat function floatx80_round_to_int incorrectly handles the
case of a pseudo-denormal where only the high bit of the significand
is set, ignoring that bit (treating the number as an exact zero)
rather than treating the number as an alternative representation of
+/- 2^-16382 (which may round to +/- 1 depending on the rounding mode)
as hardware does.  Fix this check (simplifying the code in the
process).

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 8e9c714e6f..e29b07542a 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5741,7 +5741,7 @@ floatx80 floatx80_round_to_int(floatx80 a, float_status *status)
 }
 if ( aExp < 0x3FFF ) {
 if (( aExp == 0 )
- && ( (uint64_t) ( extractFloatx80Frac( a )<<1 ) == 0 ) ) {
+ && ( (uint64_t) ( extractFloatx80Frac( a ) ) == 0 ) ) {
 return a;
 }
 status->float_exception_flags |= float_flag_inexact;
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 0/4] softfloat: fix floatx80 emulation bugs

2020-04-30 Thread Joseph Myers

Attempting to run the GCC and glibc testsuites for i686 under QEMU
shows up a range of bugs in the x87 floating-point emulation.  This
series fixes some bugs (found both through those testsuites and
through subsequent code inspection) that appear to be in the softfloat
code itself rather than in the target/i386 code; I intend to address
such bugs in target/i386 separately.

Note that the floatx80 code is used for both i386 and m68k emulation,
but the two variants of the floatx80 format are not entirely
compatible.  Where the code should do different things for i386 and
m68k, it consistently only does the thing that is right for i386, not
the thing that is right for m68k, and my patches (specifically, the
second and third patches) continue this, doing the things that are
right for i386 but not for m68k.

Specifically, the formats have the following differences (based on
documentation; I don't have m68k hardware to test):

* For m68k, the explicit integer bit of the significand may be either
  0 or 1 for infinities and NaNs, but for i386 it must be 1 and having
  0 there makes it an invalid encoding.

* For i386, when the biased exponent is 0, this is interpreted the
  same way as a biased exponent of 0 in an IEEE format; an explicit
  integer bit 0 means a subnormal value while an explicit integer bit
  1 means a pseudo-denormal; the integer bit has value 2^-16382, as
  for a biased exponent of 1.  For m68k, a biased exponent of 0
  results in the integer bit having value 2^-16383, so values with
  integer bit 1 are normal and those with integer bit 0 are
  subnormal.  So the least subnormal value is 2^-16445 for i386 and
  2^-16446 for m68k.  (This means that the i386 floatx80 format meets
  the IEEE definition of an extended format, which requires a certain
  relation between the largest and smallest exponents, but the m68k
  floatx80 format does not meet that definition.)

  Patches 2 and 3 in this series deal with pseudo-denormals in a way
  that is correct for i386 but not for m68k; to support the m68k
  format properly, the new code in patch 3 could simply be disabled
  for m68k, but addition / subtraction would need more complicated
  changes to be correct for m68k and just disabling the new code would
  not make it correct (likewise, various changes elsewhere in the
  softfloat code would be needed to handle the m68k semantics for
  biased exponent 0).

Joseph Myers (4):
  softfloat: silence sNaN for conversions to/from floatx80
  softfloat: fix floatx80 pseudo-denormal addition / subtraction
  softfloat: fix floatx80 pseudo-denormal comparisons
  softfloat: fix floatx80 pseudo-denormal round to integer

 fpu/softfloat.c | 37 ++---
 1 file changed, 30 insertions(+), 7 deletions(-)

-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com
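
The least subnormal values quoted in the cover letter above follow from the value of the integer bit at biased exponent 0 and the 63 fraction bits below it. A small sketch of that arithmetic, based only on the description in the message (the enum and function names are hypothetical):

```c
#include <assert.h>

enum x80_variant { X80_I386, X80_M68K };

/* Power-of-two exponent of the integer bit for a finite value with
 * the given biased exponent. */
static int x80_integer_bit_exponent(enum x80_variant v,
                                    unsigned biased_exp)
{
    assert(biased_exp < 0x7fff);       /* finite values only */
    if (biased_exp == 0 && v == X80_I386) {
        biased_exp = 1;   /* i386: exponent 0 scales like exponent 1 */
    }
    return (int)biased_exp - 16383;
}

/* The lowest significand bit sits 63 places below the integer bit. */
static int x80_least_subnormal_exponent(enum x80_variant v)
{
    return x80_integer_bit_exponent(v, 0) - 63;
}
```

This gives -16445 for i386 and -16446 for m68k, matching the figures in the list above.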

[PATCH 1/4] softfloat: silence sNaN for conversions to/from floatx80

2020-04-30 Thread Joseph Myers

Conversions between IEEE floating-point formats should convert
signaling NaNs to quiet NaNs.  Most of those in QEMU's softfloat code
do so, but those for floatx80 fail to.  Fix those conversions to
silence signaling NaNs as well.

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index ae6ba71854..ac116c70b8 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -4498,7 +4498,9 @@ floatx80 float32_to_floatx80(float32 a, float_status *status)
 aSign = extractFloat32Sign( a );
 if ( aExp == 0xFF ) {
 if (aSig) {
-return commonNaNToFloatx80(float32ToCommonNaN(a, status), status);
+floatx80 res = commonNaNToFloatx80(float32ToCommonNaN(a, status),
+   status);
+return floatx80_silence_nan(res, status);
 }
 return packFloatx80(aSign,
 floatx80_infinity_high,
@@ -5016,7 +5018,9 @@ floatx80 float64_to_floatx80(float64 a, float_status *status)
 aSign = extractFloat64Sign( a );
 if ( aExp == 0x7FF ) {
 if (aSig) {
-return commonNaNToFloatx80(float64ToCommonNaN(a, status), status);
+floatx80 res = commonNaNToFloatx80(float64ToCommonNaN(a, status),
+   status);
+return floatx80_silence_nan(res, status);
 }
 return packFloatx80(aSign,
 floatx80_infinity_high,
@@ -5618,7 +5622,9 @@ float32 floatx80_to_float32(floatx80 a, float_status *status)
 aSign = extractFloatx80Sign( a );
 if ( aExp == 0x7FFF ) {
 if ( (uint64_t) ( aSig<<1 ) ) {
-return commonNaNToFloat32(floatx80ToCommonNaN(a, status), status);
+float32 res = commonNaNToFloat32(floatx80ToCommonNaN(a, status),
+ status);
+return float32_silence_nan(res, status);
 }
 return packFloat32( aSign, 0xFF, 0 );
 }
@@ -5650,7 +5656,9 @@ float64 floatx80_to_float64(floatx80 a, float_status *status)
 aSign = extractFloatx80Sign( a );
 if ( aExp == 0x7FFF ) {
 if ( (uint64_t) ( aSig<<1 ) ) {
-return commonNaNToFloat64(floatx80ToCommonNaN(a, status), status);
+float64 res = commonNaNToFloat64(floatx80ToCommonNaN(a, status),
+ status);
+return float64_silence_nan(res, status);
 }
 return packFloat64( aSign, 0x7FF, 0 );
 }
@@ -5681,7 +5689,9 @@ float128 floatx80_to_float128(floatx80 a, float_status *status)
 aExp = extractFloatx80Exp( a );
 aSign = extractFloatx80Sign( a );
 if ( ( aExp == 0x7FFF ) && (uint64_t) ( aSig<<1 ) ) {
-return commonNaNToFloat128(floatx80ToCommonNaN(a, status), status);
+float128 res = commonNaNToFloat128(floatx80ToCommonNaN(a, status),
+   status);
+return float128_silence_nan(res, status);
 }
 shift128Right( aSig<<1, 0, 16, &zSig0, &zSig1 );
 return packFloat128( aSign, aExp, zSig0, zSig1 );
@@ -6959,7 +6969,9 @@ floatx80 float128_to_floatx80(float128 a, float_status *status)
 aSign = extractFloat128Sign( a );
 if ( aExp == 0x7FFF ) {
 if ( aSig0 | aSig1 ) {
-return commonNaNToFloatx80(float128ToCommonNaN(a, status), status);
+floatx80 res = commonNaNToFloatx80(float128ToCommonNaN(a, status),
+   status);
+return floatx80_silence_nan(res, status);
 }
 return packFloatx80(aSign, floatx80_infinity_high,
floatx80_infinity_low);
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH 2/4] softfloat: fix floatx80 pseudo-denormal addition / subtraction

2020-05-01 Thread Joseph Myers

On Fri, 1 May 2020, Alex Bennée wrote:

> 
> Joseph Myers  writes:
> 
> > The softfloat function addFloatx80Sigs, used for addition of values
> > with the same sign and subtraction of values with opposite sign, fails
> > to handle the case where the two values both have biased exponent zero
> > and there is a carry resulting from adding the significands, which can
> > occur if one or both values are pseudo-denormals (biased exponent
> > zero, explicit integer bit 1).  Add a check for that case, so making
> > the results match those seen on x86 hardware for pseudo-denormals.
> 
> Hmm running the super detailed test:
> 
>   fp-test -s -l 2 -r all  extF80_add extF80_sub
> 
> I don't see any difference between before and after the patch. This
> makes me wonder if we are (or rather TestFloat) is missing something in
> it's test case.

It could well only be testing kinds of floating-point representations that 
are meaningful in IEEE interchange formats.  Pseudo-denormals don't exist 
in IEEE interchange formats (and nor do pseudo-NaNs, pseudo-infinities and 
un-normals, which are dealt with in floatx80_invalid_encoding).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH 3/4] softfloat: fix floatx80 pseudo-denormal comparisons

2020-05-01 Thread Joseph Myers

On Fri, 1 May 2020, Alex Bennée wrote:

> 
> Joseph Myers  writes:
> 
> > The softfloat floatx80 comparisons fail to allow for pseudo-denormals,
> > which should compare equal to corresponding values with biased
> > exponent 1 rather than 0.  Add an adjustment for that case when
> > comparing numbers with the same sign.
> >
> > Note that this fix only changes floatx80_compare_internal, not the
> > other more specific comparison operations.  That is the only
> > comparison function for floatx80 used in the i386 port, which is the
> > only supported port with these pseudo-denormal semantics.
> 
> Again I can't see anything that triggers this although I noticed
> le_quiet has been fixed in the meantime. lt_quiet still fails with:

It looks like this test is only testing the separate comparison functions, 
which aren't used in the i386 port and which I didn't change, not anything 
that uses floatx80_compare_internal.  (That's apart from probably not 
covering pseudo-denormals either.)

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH 3/4] softfloat: fix floatx80 pseudo-denormal comparisons

2020-05-01 Thread Joseph Myers

On Fri, 1 May 2020, Alex Bennée wrote:

> OK - so these only turn up in i386?

Patch 1, silencing sNaN, is about generic semantics of IEEE floating-point 
conversions (which are implemented correctly in various other cases in 
QEMU), and would be equally applicable to m68k (I believe, without having 
m68k hardware to test).

Patches 2 and 3 are i386-specific (just like everything in the existing 
softfloat code relating to floatx80 subnormals), because m68k interprets 
biased exponent zero differently.

Patch 4 would apply equally to m68k, because all that matters there is 
that a certain representation is a small nonzero value, not exactly what 
value it is.

None of these apply to any other architectures supported by QEMU.

> We have two tests currently (float_convs and float_madds) which
> currently exercise the various combinations of limits and NaN types
> using some common float_helpers.c support. Maybe extend it for have a
> table of the various ext80 types and write a i386 only test case to
> exercise the functions you fixed?

It seems to me that appropriate tests would be entirely i386-specific (in 
tests/tcg/i386?).  How are such tests supposed to signal success or 
failure, since all the tests currently there seem to exit with status 0 
unconditionally?

I do have a test I'm using to check these fixes (in C code for convenience 
of implementation, with only a little inline asm), but it's not suitable 
for inclusion as-is, since it includes many tests that currently fail 
(e.g. for exceptions generated, since the i386 floating-point support in 
QEMU currently discards exceptions from the softfloat code; one of the 
things I intend to fix but haven't yet).  It also doesn't yet cover all 
the problems I think I've found so far in the floating-point support in 
the i386 port (at least ten such bugs beyond the ones fixed in the present 
patch series).  And it might well depend on details of compiler code 
generation to test some of the bugs effectively.

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH v2 0/4] softfloat: fix floatx80 emulation bugs

2020-05-04 Thread Joseph Myers

Attempting to run the GCC and glibc testsuites for i686 under QEMU
shows up a range of bugs in the x87 floating-point emulation.  This
series fixes some bugs (found both through those testsuites and
through subsequent code inspection) that appear to be in the softfloat
code itself rather than in the target/i386 code; I intend to address
such bugs in target/i386 separately.

Note that the floatx80 code is used for both i386 and m68k emulation,
but the two variants of the floatx80 format are not entirely
compatible.  Where the code should do different things for i386 and
m68k, it consistently only does the thing that is right for i386, not
the thing that is right for m68k, and my patches (specifically, the
second and third patches) continue this, doing the things that are
right for i386 but not for m68k.

Specifically, the formats have the following differences (based on
documentation; I don't have m68k hardware to test):

* For m68k, the explicit integer bit of the significand may be either
  0 or 1 for infinities and NaNs, but for i386 it must be 1 and having
  0 there makes it an invalid encoding.

* For i386, when the biased exponent is 0, this is interpreted the
  same way as a biased exponent of 0 in an IEEE format; an explicit
  integer bit 0 means a subnormal value while an explicit integer bit
  1 means a pseudo-denormal; the integer bit has value 2^-16382, as
  for a biased exponent of 1.  For m68k, a biased exponent of 0
  results in the integer bit having value 2^-16383, so values with
  integer bit 1 are normal and those with integer bit 0 are
  subnormal.  So the least subnormal value is 2^-16445 for i386 and
  2^-16446 for m68k.  (This means that the i386 floatx80 format meets
  the IEEE definition of an extended format, which requires a certain
  relation between the largest and smallest exponents, but the m68k
  floatx80 format does not meet that definition.)

  Patches 2 and 3 in this series deal with pseudo-denormals in a way
  that is correct for i386 but not for m68k; to support the m68k
  format properly, the new code in patch 3 could simply be disabled
  for m68k, but addition / subtraction would need more complicated
  changes to be correct for m68k and just disabling the new code would
  not make it correct (likewise, various changes elsewhere in the
  softfloat code would be needed to handle the m68k semantics for
  biased exponent 0).
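
To make the difference in exponent-0 semantics concrete, here is a minimal sketch (not code from QEMU; the function name and the is_m68k flag are made up for illustration) that computes the power-of-two weight of a given significand bit under the two interpretations described above:

```c
#include <stdint.h>

/*
 * Illustrative helper, not part of softfloat.  Returns e such that
 * significand bit 'bit' (63 is the explicit integer bit, 0 the least
 * significant bit) has weight 2^e for the given biased exponent.
 */
static int floatx80_bit_weight(int bit, uint16_t biased_exp, int is_m68k)
{
    int int_bit_weight;

    if (biased_exp == 0) {
        /* i386 treats biased exponent 0 like biased exponent 1;
         * m68k gives the integer bit one binade less.  */
        int_bit_weight = is_m68k ? -16383 : -16382;
    } else {
        int_bit_weight = (int) biased_exp - 16383;
    }
    return int_bit_weight - (63 - bit);
}
```

Under these assumptions, floatx80_bit_weight(0, 0, 0) is -16445 and floatx80_bit_weight(0, 0, 1) is -16446, matching the two least subnormal values quoted above.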

This second version of the patch series includes i386-specific tests
for the bugs being fixed (written to be reasonably self-contained
rather than depending on libm functionality).  Given the previous
discussion of how some existing tests for floating-point operations
that are present but not enabled fail for unrelated reasons if enabled
for floatx80, this does not do anything regarding enabling such tests.

Joseph Myers (4):
  softfloat: silence sNaN for conversions to/from floatx80
  softfloat: fix floatx80 pseudo-denormal addition / subtraction
  softfloat: fix floatx80 pseudo-denormal comparisons
  softfloat: fix floatx80 pseudo-denormal round to integer

 fpu/softfloat.c| 37 ++---
 tests/tcg/i386/test-i386-pseudo-denormal.c | 38 +
 tests/tcg/i386/test-i386-snan-convert.c| 63 ++
 3 files changed, 131 insertions(+), 7 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-pseudo-denormal.c
 create mode 100644 tests/tcg/i386/test-i386-snan-convert.c

-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH v2 3/4] softfloat: fix floatx80 pseudo-denormal comparisons

2020-05-04 Thread Joseph Myers

The softfloat floatx80 comparisons fail to allow for pseudo-denormals,
which should compare equal to corresponding values with biased
exponent 1 rather than 0.  Add an adjustment for that case when
comparing numbers with the same sign.

Note that this fix only changes floatx80_compare_internal, not the
other more specific comparison operations.  That is the only
comparison function for floatx80 used in the i386 port, which is the
only supported port with these pseudo-denormal semantics.
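
As a standalone illustration of the adjustment (a sketch only; the struct and function names here are hypothetical stand-ins, not the softfloat types), a pseudo-denormal can be mapped to its canonical encoding by bumping the biased exponent from 0 to 1 before the bitwise equality check:

```c
#include <stdint.h>

/* Hypothetical stand-in for floatx80: 'high' holds sign and biased
 * exponent, 'low' holds the 64-bit significand.  */
struct fx80 {
    uint64_t low;
    uint16_t high;
};

/* Map a pseudo-denormal (biased exponent 0, integer bit 1) to the
 * equivalent encoding with biased exponent 1; leave others alone.  */
static struct fx80 fx80_canonicalize(struct fx80 v)
{
    if ((v.high & 0x7fff) == 0 && (v.low & UINT64_C(0x8000000000000000))) {
        v.high++;
    }
    return v;
}

/* Bitwise equality after canonicalization.  */
static int fx80_same_value(struct fx80 a, struct fx80 b)
{
    a = fx80_canonicalize(a);
    b = fx80_canonicalize(b);
    return a.low == b.low && a.high == b.high;
}
```

The sign bit lives in the top bit of 'high' and is untouched by the increment, so the same adjustment works for negative values.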

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c| 5 +
 tests/tcg/i386/test-i386-pseudo-denormal.c | 4 
 2 files changed, 9 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 6094d267b5..8e9c714e6f 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -7966,6 +7966,11 @@ static inline int floatx80_compare_internal(floatx80 a, 
floatx80 b,
 return 1 - (2 * aSign);
 }
 } else {
+/* Normalize pseudo-denormals before comparison.  */
+if ((a.high & 0x7fff) == 0 && a.low & UINT64_C(0x8000000000000000))
+++a.high;
+if ((b.high & 0x7fff) == 0 && b.low & UINT64_C(0x8000000000000000))
+++b.high;
 if (a.low == b.low && a.high == b.high) {
 return float_relation_equal;
 } else {
diff --git a/tests/tcg/i386/test-i386-pseudo-denormal.c 
b/tests/tcg/i386/test-i386-pseudo-denormal.c
index cfa2a500b0..acf2b9cf03 100644
--- a/tests/tcg/i386/test-i386-pseudo-denormal.c
+++ b/tests/tcg/i386/test-i386-pseudo-denormal.c
@@ -20,5 +20,9 @@ int main(void)
 printf("FAIL: pseudo-denormal add\n");
 ret = 1;
 }
+if (ld_pseudo_m16382.ld != 0x1p-16382L) {
+printf("FAIL: pseudo-denormal compare\n");
+ret = 1;
+}
 return ret;
 }
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH v2 1/4] softfloat: silence sNaN for conversions to/from floatx80

2020-05-04 Thread Joseph Myers

Conversions between IEEE floating-point formats should convert
signaling NaNs to quiet NaNs.  Most of those in QEMU's softfloat code
do so, but those for floatx80 fail to.  Fix those conversions to
silence signaling NaNs as well.
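
For readers unfamiliar with what silencing means at the bit level, here is a minimal sketch on raw binary32 bit patterns (hypothetical helper names; it assumes the usual convention, as on x86, that a set top fraction bit marks a quiet NaN):

```c
#include <stdint.h>

/* Illustrative only: operate on the raw bits of an IEEE binary32.  */
#define F32_QUIET_BIT UINT32_C(0x00400000)

/* NaN with the quiet bit clear.  */
static int f32_bits_is_snan(uint32_t bits)
{
    return (bits & UINT32_C(0x7fffffff)) > UINT32_C(0x7f800000)
           && (bits & F32_QUIET_BIT) == 0;
}

/* Silencing sets the quiet bit and keeps sign and payload.  */
static uint32_t f32_bits_silence_nan(uint32_t bits)
{
    return bits | F32_QUIET_BIT;
}
```

The conversions fixed by this patch need the same final step on their NaN results, via the existing *_silence_nan functions.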

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c | 24 +++---
 tests/tcg/i386/test-i386-snan-convert.c | 63 +
 2 files changed, 81 insertions(+), 6 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-snan-convert.c

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index ae6ba71854..ac116c70b8 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -4498,7 +4498,9 @@ floatx80 float32_to_floatx80(float32 a, float_status 
*status)
 aSign = extractFloat32Sign( a );
 if ( aExp == 0xFF ) {
 if (aSig) {
-return commonNaNToFloatx80(float32ToCommonNaN(a, status), status);
+floatx80 res = commonNaNToFloatx80(float32ToCommonNaN(a, status),
+   status);
+return floatx80_silence_nan(res, status);
 }
 return packFloatx80(aSign,
 floatx80_infinity_high,
@@ -5016,7 +5018,9 @@ floatx80 float64_to_floatx80(float64 a, float_status 
*status)
 aSign = extractFloat64Sign( a );
 if ( aExp == 0x7FF ) {
 if (aSig) {
-return commonNaNToFloatx80(float64ToCommonNaN(a, status), status);
+floatx80 res = commonNaNToFloatx80(float64ToCommonNaN(a, status),
+   status);
+return floatx80_silence_nan(res, status);
 }
 return packFloatx80(aSign,
 floatx80_infinity_high,
@@ -5618,7 +5622,9 @@ float32 floatx80_to_float32(floatx80 a, float_status 
*status)
 aSign = extractFloatx80Sign( a );
 if ( aExp == 0x7FFF ) {
 if ( (uint64_t) ( aSig<<1 ) ) {
-return commonNaNToFloat32(floatx80ToCommonNaN(a, status), status);
+float32 res = commonNaNToFloat32(floatx80ToCommonNaN(a, status),
+ status);
+return float32_silence_nan(res, status);
 }
 return packFloat32( aSign, 0xFF, 0 );
 }
@@ -5650,7 +5656,9 @@ float64 floatx80_to_float64(floatx80 a, float_status 
*status)
 aSign = extractFloatx80Sign( a );
 if ( aExp == 0x7FFF ) {
 if ( (uint64_t) ( aSig<<1 ) ) {
-return commonNaNToFloat64(floatx80ToCommonNaN(a, status), status);
+float64 res = commonNaNToFloat64(floatx80ToCommonNaN(a, status),
+ status);
+return float64_silence_nan(res, status);
 }
 return packFloat64( aSign, 0x7FF, 0 );
 }
@@ -5681,7 +5689,9 @@ float128 floatx80_to_float128(floatx80 a, float_status 
*status)
 aExp = extractFloatx80Exp( a );
 aSign = extractFloatx80Sign( a );
 if ( ( aExp == 0x7FFF ) && (uint64_t) ( aSig<<1 ) ) {
-return commonNaNToFloat128(floatx80ToCommonNaN(a, status), status);
+float128 res = commonNaNToFloat128(floatx80ToCommonNaN(a, status),
+   status);
+return float128_silence_nan(res, status);
 }
 shift128Right( aSig<<1, 0, 16, &zSig0, &zSig1 );
 return packFloat128( aSign, aExp, zSig0, zSig1 );
@@ -6959,7 +6969,9 @@ floatx80 float128_to_floatx80(float128 a, float_status 
*status)
 aSign = extractFloat128Sign( a );
 if ( aExp == 0x7FFF ) {
 if ( aSig0 | aSig1 ) {
-return commonNaNToFloatx80(float128ToCommonNaN(a, status), status);
+floatx80 res = commonNaNToFloatx80(float128ToCommonNaN(a, status),
+   status);
+return floatx80_silence_nan(res, status);
 }
 return packFloatx80(aSign, floatx80_infinity_high,
floatx80_infinity_low);
diff --git a/tests/tcg/i386/test-i386-snan-convert.c 
b/tests/tcg/i386/test-i386-snan-convert.c
new file mode 100644
index 0000000000..ed6d535ce2
--- /dev/null
+++ b/tests/tcg/i386/test-i386-snan-convert.c
@@ -0,0 +1,63 @@
+/* Test conversions of signaling NaNs to and from long double.  */
+
+#include <stdint.h>
+#include <stdio.h>
+
+volatile float f_res;
+volatile double d_res;
+volatile long double ld_res;
+
+volatile float f_snan = __builtin_nansf("");
+volatile double d_snan = __builtin_nans("");
+volatile long double ld_snan = __builtin_nansl("");
+
+int issignaling_f(float x)
+{
+union { float f; uint32_t u; } u = { .f = x };
+return (u.u & 0x7fffffff) > 0x7f800000 && (u.u & 0x400000) == 0;
+}
+
+int issignaling_d(double x)
+{
+union { double d; uint64_t u; } u = { .d = x };
+return (((u.u & UINT64_C(0x7fffffffffffffff)) >
+UINT64_C(0x7ff0000000000000)) &&
+(u.u & UINT64_C(0x8000000000000)) == 0);
+}
+
+int i

[PATCH v2 4/4] softfloat: fix floatx80 pseudo-denormal round to integer

2020-05-04 Thread Joseph Myers

The softfloat function floatx80_round_to_int incorrectly handles the
case of a pseudo-denormal where only the high bit of the significand
is set, ignoring that bit (treating the number as an exact zero)
rather than treating the number as an alternative representation of
+/- 2^-16382 (which may round to +/- 1 depending on the rounding mode)
as hardware does.  Fix this check (simplifying the code in the
process).
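
The effect of dropping the shift can be seen in isolation with a small sketch (hypothetical function names; only the "is this an exact zero" test is modelled, not the rest of floatx80_round_to_int):

```c
#include <stdint.h>

/* Old check: shifts out the integer bit, so a significand of
 * exactly 2^63 with biased exponent 0 is taken for zero.  */
static int rint_is_zero_old(uint16_t biased_exp, uint64_t sig)
{
    return biased_exp == 0 && (uint64_t) (sig << 1) == 0;
}

/* New check: only an all-zero significand counts as zero.  */
static int rint_is_zero_new(uint16_t biased_exp, uint64_t sig)
{
    return biased_exp == 0 && sig == 0;
}
```

For the pseudo-denormal with significand 2^63 the old check returns 1 and the new check returns 0, so the value falls through to the rounding code as intended.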

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c|  2 +-
 tests/tcg/i386/test-i386-pseudo-denormal.c | 10 ++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 8e9c714e6f..e29b07542a 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5741,7 +5741,7 @@ floatx80 floatx80_round_to_int(floatx80 a, float_status 
*status)
 }
 if ( aExp < 0x3FFF ) {
 if (( aExp == 0 )
- && ( (uint64_t) ( extractFloatx80Frac( a )<<1 ) == 0 ) ) {
+ && ( (uint64_t) ( extractFloatx80Frac( a ) ) == 0 ) ) {
 return a;
 }
 status->float_exception_flags |= float_flag_inexact;
diff --git a/tests/tcg/i386/test-i386-pseudo-denormal.c 
b/tests/tcg/i386/test-i386-pseudo-denormal.c
index acf2b9cf03..00d510cf4a 100644
--- a/tests/tcg/i386/test-i386-pseudo-denormal.c
+++ b/tests/tcg/i386/test-i386-pseudo-denormal.c
@@ -14,6 +14,7 @@ volatile long double ld_res;
 
 int main(void)
 {
+short cw;
 int ret = 0;
 ld_res = ld_pseudo_m16382.ld + ld_pseudo_m16382.ld;
 if (ld_res != 0x1p-16381L) {
@@ -24,5 +25,14 @@ int main(void)
 printf("FAIL: pseudo-denormal compare\n");
 ret = 1;
 }
+/* Set round-upward.  */
+__asm__ volatile ("fnstcw %0" : "=m" (cw));
+cw = (cw & ~0xc00) | 0x800;
+__asm__ volatile ("fldcw %0" : : "m" (cw));
+__asm__ ("frndint" : "=t" (ld_res) : "0" (ld_pseudo_m16382.ld));
+if (ld_res != 1.0L) {
+printf("FAIL: pseudo-denormal round-to-integer\n");
+ret = 1;
+}
 return ret;
 }
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH v2 2/4] softfloat: fix floatx80 pseudo-denormal addition / subtraction

2020-05-04 Thread Joseph Myers

The softfloat function addFloatx80Sigs, used for addition of values
with the same sign and subtraction of values with opposite sign, fails
to handle the case where the two values both have biased exponent zero
and there is a carry resulting from adding the significands, which can
occur if one or both values are pseudo-denormals (biased exponent
zero, explicit integer bit 1).  Add a check for that case, so making
the results match those seen on x86 hardware for pseudo-denormals.
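
The carry case can be sketched on its own as follows (an illustration with made-up names, not the softfloat code; it ignores the bit shifted out on carry, which the real code keeps for rounding):

```c
#include <stdint.h>

#define FX80_INT_BIT UINT64_C(0x8000000000000000)

/* Hypothetical result type: biased exponent and significand.  */
struct exp0_sum {
    int exp;
    uint64_t sig;
};

/* Add two same-sign significands that both have biased exponent 0.  */
static struct exp0_sum add_exp0_sigs(uint64_t a_sig, uint64_t b_sig)
{
    struct exp0_sum r;
    uint64_t z = a_sig + b_sig;     /* wraps modulo 2^64 on carry */

    if (((a_sig | b_sig) & FX80_INT_BIT) && z < a_sig) {
        /* Carry out of bit 63: the true sum is 2^64 + z.  Shift right
         * by one, reinsert the carry as the integer bit, and move one
         * binade above biased exponent 1.  */
        r.sig = (z >> 1) | FX80_INT_BIT;
        r.exp = 2;
    } else {
        /* No carry: the result is normal (exponent 1) if the integer
         * bit ended up set, otherwise still subnormal or zero.  */
        r.sig = z;
        r.exp = (z & FX80_INT_BIT) ? 1 : 0;
    }
    return r;
}
```

Adding the pseudo-denormal 2^-16382 to itself gives exponent 2 and significand 2^63 in this sketch, that is 2^-16381, which is the value the accompanying test expects.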

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c|  6 ++
 tests/tcg/i386/test-i386-pseudo-denormal.c | 24 ++
 2 files changed, 30 insertions(+)
 create mode 100644 tests/tcg/i386/test-i386-pseudo-denormal.c

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index ac116c70b8..6094d267b5 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5866,6 +5866,12 @@ static floatx80 addFloatx80Sigs(floatx80 a, floatx80 b, 
flag zSign,
 zSig1 = 0;
 zSig0 = aSig + bSig;
 if ( aExp == 0 ) {
+if ((aSig | bSig) & UINT64_C(0x8000000000000000) && zSig0 < aSig) {
+/* At least one of the values is a pseudo-denormal,
+ * and there is a carry out of the result.  */
+zExp = 1;
+goto shiftRight1;
+}
 if (zSig0 == 0) {
 return packFloatx80(zSign, 0, 0);
 }
diff --git a/tests/tcg/i386/test-i386-pseudo-denormal.c 
b/tests/tcg/i386/test-i386-pseudo-denormal.c
new file mode 100644
index 0000000000..cfa2a500b0
--- /dev/null
+++ b/tests/tcg/i386/test-i386-pseudo-denormal.c
@@ -0,0 +1,24 @@
+/* Test pseudo-denormal operations.  */
+
+#include <stdint.h>
+#include <stdio.h>
+
+union u {
+struct { uint64_t sig; uint16_t sign_exp; } s;
+long double ld;
+};
+
+volatile union u ld_pseudo_m16382 = { .s = { UINT64_C(1) << 63, 0 } };
+
+volatile long double ld_res;
+
+int main(void)
+{
+int ret = 0;
+ld_res = ld_pseudo_m16382.ld + ld_pseudo_m16382.ld;
+if (ld_res != 0x1p-16381L) {
+printf("FAIL: pseudo-denormal add\n");
+ret = 1;
+}
+return ret;
+}
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 4/5] target/i386: fix fscale handling of infinite exponents

2020-05-06 Thread Joseph Myers

The fscale implementation passes infinite exponents through to generic
code that rounds the exponent to a 32-bit integer before using
floatx80_scalbn.  In round-to-nearest mode, and ignoring exceptions,
this works in many cases.  But it fails to handle the special cases of
scaling 0 by a +Inf exponent or an infinity by a -Inf exponent, which
should produce a NaN, and because it produces an inexact result for
finite nonzero numbers being scaled, the result is sometimes incorrect
in other rounding modes.  Add appropriate handling of infinite
exponents to produce a NaN or an appropriately signed exact zero or
infinity as a result.
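
The intended behaviour for infinite exponents amounts to a small decision table.  A sketch of it, using hypothetical enums rather than floatx80 values (the sign of a zero or infinity result is taken from ST0 and is not modelled here):

```c
/* Hypothetical classification of the value being scaled (ST0).  */
enum fscale_operand { OPERAND_ZERO, OPERAND_FINITE_NONZERO, OPERAND_INFINITY };

/* Hypothetical classification of the result.  */
enum fscale_result { RESULT_DEFAULT_NAN, RESULT_ZERO, RESULT_INFINITY };

/* Result class of fscale when the exponent operand (ST1) is infinite.  */
static enum fscale_result fscale_infinite_exponent(enum fscale_operand st0,
                                                   int exponent_is_negative)
{
    if (exponent_is_negative) {
        /* Inf scaled by 2^-Inf is invalid; everything else is zero.  */
        return st0 == OPERAND_INFINITY ? RESULT_DEFAULT_NAN : RESULT_ZERO;
    }
    /* 0 scaled by 2^+Inf is invalid; everything else is infinity.  */
    return st0 == OPERAND_ZERO ? RESULT_DEFAULT_NAN : RESULT_INFINITY;
}
```

Because the zero and infinity results are exact, they do not depend on the rounding mode, which is the point of the round-downward cases in the test.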

Signed-off-by: Joseph Myers 
---
 target/i386/fpu_helper.c  | 22 ++
 tests/tcg/i386/test-i386-fscale.c | 29 +
 2 files changed, 51 insertions(+)

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 7709af8fdd..d4c15728e1 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -977,6 +977,28 @@ void helper_fscale(CPUX86State *env)
 float_raise(float_flag_invalid, &env->fp_status);
 ST0 = floatx80_silence_nan(ST0, &env->fp_status);
 }
+} else if (floatx80_is_infinity(ST1) &&
+   !floatx80_invalid_encoding(ST0) &&
+   !floatx80_is_any_nan(ST0)) {
+if (floatx80_is_neg(ST1)) {
+if (floatx80_is_infinity(ST0)) {
+float_raise(float_flag_invalid, &env->fp_status);
+ST0 = floatx80_default_nan(&env->fp_status);
+} else {
+ST0 = (floatx80_is_neg(ST0) ?
+   floatx80_chs(floatx80_zero) :
+   floatx80_zero);
+}
+} else {
+if (floatx80_is_zero(ST0)) {
+float_raise(float_flag_invalid, &env->fp_status);
+ST0 = floatx80_default_nan(&env->fp_status);
+} else {
+ST0 = (floatx80_is_neg(ST0) ?
+   floatx80_chs(floatx80_infinity) :
+   floatx80_infinity);
+}
+}
 } else {
 int n = floatx80_to_int32_round_to_zero(ST1, &env->fp_status);
 ST0 = floatx80_scalbn(ST0, n, &env->fp_status);
diff --git a/tests/tcg/i386/test-i386-fscale.c 
b/tests/tcg/i386/test-i386-fscale.c
index b65a055d0a..b953e7c563 100644
--- a/tests/tcg/i386/test-i386-fscale.c
+++ b/tests/tcg/i386/test-i386-fscale.c
@@ -31,6 +31,7 @@ int issignaling_ld(long double x)
 
 int main(void)
 {
+short cw;
 int ret = 0;
 __asm__ volatile ("fscale" : "=t" (ld_res) :
   "0" (2.5L), "u" (__builtin_nansl("")));
@@ -62,5 +63,33 @@ int main(void)
 printf("FAIL: fscale invalid 4\n");
 ret = 1;
 }
+__asm__ volatile ("fscale" : "=t" (ld_res) :
+  "0" (0.0L), "u" (__builtin_infl()));
+if (!isnan_ld(ld_res) || issignaling_ld(ld_res)) {
+printf("FAIL: fscale 0 up inf\n");
+ret = 1;
+}
+__asm__ volatile ("fscale" : "=t" (ld_res) :
+  "0" (__builtin_infl()), "u" (-__builtin_infl()));
+if (!isnan_ld(ld_res) || issignaling_ld(ld_res)) {
+printf("FAIL: fscale inf down inf\n");
+ret = 1;
+}
+/* Set round-downward.  */
+__asm__ volatile ("fnstcw %0" : "=m" (cw));
+cw = (cw & ~0xc00) | 0x400;
+__asm__ volatile ("fldcw %0" : : "m" (cw));
+__asm__ volatile ("fscale" : "=t" (ld_res) :
+  "0" (1.0L), "u" (__builtin_infl()));
+if (ld_res != __builtin_infl()) {
+printf("FAIL: fscale finite up inf\n");
+ret = 1;
+}
+__asm__ volatile ("fscale" : "=t" (ld_res) :
+  "0" (-1.0L), "u" (-__builtin_infl()));
+if (ld_res != -0.0L || __builtin_copysignl(1.0L, ld_res) != -1.0L) {
+printf("FAIL: fscale finite down inf\n");
+ret = 1;
+}
 return ret;
 }
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 2/5] target/i386: fix fscale handling of signaling NaN

2020-05-06 Thread Joseph Myers

The implementation of the fscale instruction returns a NaN exponent
unchanged.  Fix it to return a quiet NaN when the provided exponent is
a signaling NaN.

Signed-off-by: Joseph Myers 
---
 target/i386/fpu_helper.c  |  4 
 tests/tcg/i386/test-i386-fscale.c | 37 +++
 2 files changed, 41 insertions(+)
 create mode 100644 tests/tcg/i386/test-i386-fscale.c

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 71a696a863..60012c405c 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -970,6 +970,10 @@ void helper_fscale(CPUX86State *env)
 {
 if (floatx80_is_any_nan(ST1)) {
 ST0 = ST1;
+if (floatx80_is_signaling_nan(ST0, &env->fp_status)) {
+float_raise(float_flag_invalid, &env->fp_status);
+ST0 = floatx80_silence_nan(ST0, &env->fp_status);
+}
 } else {
 int n = floatx80_to_int32_round_to_zero(ST1, &env->fp_status);
 ST0 = floatx80_scalbn(ST0, n, &env->fp_status);
diff --git a/tests/tcg/i386/test-i386-fscale.c 
b/tests/tcg/i386/test-i386-fscale.c
new file mode 100644
index 0000000000..aecac5125f
--- /dev/null
+++ b/tests/tcg/i386/test-i386-fscale.c
@@ -0,0 +1,37 @@
+/* Test fscale instruction.  */
+
+#include <stdint.h>
+#include <stdio.h>
+
+union u {
+struct { uint64_t sig; uint16_t sign_exp; } s;
+long double ld;
+};
+
+volatile long double ld_res;
+
+int isnan_ld(long double x)
+{
+  union u tmp = { .ld = x };
+  return ((tmp.s.sign_exp & 0x7fff) == 0x7fff &&
+  (tmp.s.sig >> 63) != 0 &&
+  (tmp.s.sig << 1) != 0);
+}
+
+int issignaling_ld(long double x)
+{
+union u tmp = { .ld = x };
+return isnan_ld(x) && (tmp.s.sig & UINT64_C(0x4000000000000000)) == 0;
+}
+
+int main(void)
+{
+int ret = 0;
+__asm__ volatile ("fscale" : "=t" (ld_res) :
+  "0" (2.5L), "u" (__builtin_nansl("")));
+if (!isnan_ld(ld_res) || issignaling_ld(ld_res)) {
+printf("FAIL: fscale snan\n");
+ret = 1;
+}
+return ret;
+}
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 3/5] target/i386: fix fscale handling of invalid exponent encodings

2020-05-06 Thread Joseph Myers

The fscale implementation does not check for invalid encodings in the
exponent operand, thus treating them like INT_MIN (the value returned
for invalid encodings by floatx80_to_int32_round_to_zero).  Fix it to
treat them similarly to signaling NaN exponents, thus generating a
quiet NaN result.
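
For reference, the invalid encodings exercised by the test all share one property: a nonzero biased exponent with the explicit integer bit clear.  A minimal sketch of that predicate for the i386 format (the function name is hypothetical; it is meant to mirror the rule, not to reproduce the softfloat implementation):

```c
#include <stdint.h>

/* Illustrative only: i386 floatx80 encodings with a nonzero biased
 * exponent must have the explicit integer bit (bit 63) set.  */
static int fx80_is_invalid_encoding(uint16_t sign_exp, uint64_t sig)
{
    return (sign_exp & 0x7fff) != 0 && (sig >> 63) == 0;
}
```

Subnormals and pseudo-denormals (biased exponent 0) are valid whatever the integer bit is, so they are not caught by this check.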

Signed-off-by: Joseph Myers 
---
 target/i386/fpu_helper.c  |  5 -
 tests/tcg/i386/test-i386-fscale.c | 29 +
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 60012c405c..7709af8fdd 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -968,7 +968,10 @@ void helper_frndint(CPUX86State *env)
 
 void helper_fscale(CPUX86State *env)
 {
-if (floatx80_is_any_nan(ST1)) {
+if (floatx80_invalid_encoding(ST1)) {
+float_raise(float_flag_invalid, &env->fp_status);
+ST0 = floatx80_default_nan(&env->fp_status);
+} else if (floatx80_is_any_nan(ST1)) {
 ST0 = ST1;
 if (floatx80_is_signaling_nan(ST0, &env->fp_status)) {
 float_raise(float_flag_invalid, &env->fp_status);
diff --git a/tests/tcg/i386/test-i386-fscale.c 
b/tests/tcg/i386/test-i386-fscale.c
index aecac5125f..b65a055d0a 100644
--- a/tests/tcg/i386/test-i386-fscale.c
+++ b/tests/tcg/i386/test-i386-fscale.c
@@ -8,6 +8,11 @@ union u {
 long double ld;
 };
 
+volatile union u ld_invalid_1 = { .s = { 1, 1234 } };
+volatile union u ld_invalid_2 = { .s = { 0, 1234 } };
+volatile union u ld_invalid_3 = { .s = { 0, 0x7fff } };
+volatile union u ld_invalid_4 = { .s = { (UINT64_C(1) << 63) - 1, 0x7fff } };
+
 volatile long double ld_res;
 
 int isnan_ld(long double x)
@@ -33,5 +38,29 @@ int main(void)
 printf("FAIL: fscale snan\n");
 ret = 1;
 }
+__asm__ volatile ("fscale" : "=t" (ld_res) :
+  "0" (2.5L), "u" (ld_invalid_1.ld));
+if (!isnan_ld(ld_res) || issignaling_ld(ld_res)) {
+printf("FAIL: fscale invalid 1\n");
+ret = 1;
+}
+__asm__ volatile ("fscale" : "=t" (ld_res) :
+  "0" (2.5L), "u" (ld_invalid_2.ld));
+if (!isnan_ld(ld_res) || issignaling_ld(ld_res)) {
+printf("FAIL: fscale invalid 2\n");
+ret = 1;
+}
+__asm__ volatile ("fscale" : "=t" (ld_res) :
+  "0" (2.5L), "u" (ld_invalid_3.ld));
+if (!isnan_ld(ld_res) || issignaling_ld(ld_res)) {
+printf("FAIL: fscale invalid 3\n");
+ret = 1;
+}
+__asm__ volatile ("fscale" : "=t" (ld_res) :
+  "0" (2.5L), "u" (ld_invalid_4.ld));
+if (!isnan_ld(ld_res) || issignaling_ld(ld_res)) {
+printf("FAIL: fscale invalid 4\n");
+ret = 1;
+}
 return ret;
 }
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 5/5] target/i386: fix fscale handling of rounding precision

2020-05-06 Thread Joseph Myers

The fscale implementation uses floatx80_scalbn for the final scaling
operation.  floatx80_scalbn ends up rounding the result using the
dynamic rounding precision configured for the FPU.  But only a limited
set of x87 floating-point instructions are supposed to respect the
dynamic rounding precision, and fscale is not in that set.  Fix the
implementation to save and restore the rounding precision around the
call to floatx80_scalbn.

Signed-off-by: Joseph Myers 
---
 target/i386/fpu_helper.c  |  3 +++
 tests/tcg/i386/test-i386-fscale.c | 13 +
 2 files changed, 16 insertions(+)

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index d4c15728e1..0c3fce933c 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -1001,7 +1001,10 @@ void helper_fscale(CPUX86State *env)
 }
 } else {
 int n = floatx80_to_int32_round_to_zero(ST1, &env->fp_status);
+signed char save = env->fp_status.floatx80_rounding_precision;
+env->fp_status.floatx80_rounding_precision = 80;
 ST0 = floatx80_scalbn(ST0, n, &env->fp_status);
+env->fp_status.floatx80_rounding_precision = save;
 }
 }
 
diff --git a/tests/tcg/i386/test-i386-fscale.c 
b/tests/tcg/i386/test-i386-fscale.c
index b953e7c563..d23b3cfeec 100644
--- a/tests/tcg/i386/test-i386-fscale.c
+++ b/tests/tcg/i386/test-i386-fscale.c
@@ -8,6 +8,8 @@ union u {
 long double ld;
 };
 
+volatile long double ld_third = 1.0L / 3.0L;
+volatile long double ld_four_thirds = 4.0L / 3.0L;
 volatile union u ld_invalid_1 = { .s = { 1, 1234 } };
 volatile union u ld_invalid_2 = { .s = { 0, 1234 } };
 volatile union u ld_invalid_3 = { .s = { 0, 0x7fff } };
@@ -91,5 +93,16 @@ int main(void)
 printf("FAIL: fscale finite down inf\n");
 ret = 1;
 }
+/* Set round-to-nearest with single-precision rounding.  */
+cw = cw & ~0xf00;
+__asm__ volatile ("fldcw %0" : : "m" (cw));
+__asm__ volatile ("fscale" : "=t" (ld_res) :
+  "0" (ld_third), "u" (2.0L));
+cw = cw | 0x300;
+__asm__ volatile ("fldcw %0" : : "m" (cw));
+if (ld_res != ld_four_thirds) {
+printf("FAIL: fscale single-precision\n");
+ret = 1;
+}
 return ret;
 }
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 1/5] target/i386: implement special cases for fxtract

2020-05-06 Thread Joseph Myers

The implementation of the fxtract instruction treats all nonzero
operands as normal numbers, so yielding incorrect results for invalid
formats, infinities, NaNs and subnormal and pseudo-denormal operands.
Implement appropriate handling of all those cases.
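
For the subnormal and pseudo-denormal case, the idea is to normalize the significand and account for the shift in the exponent.  A standalone sketch (hypothetical names; a simple loop stands in for a count-leading-zeros primitive):

```c
#include <stdint.h>

#define FX80_EXP_BIAS 16383

/*
 * Illustrative only: for a nonzero significand with biased exponent 0,
 * return the unbiased exponent fxtract should report and store the
 * normalized significand (integer bit set) in *norm_sig.
 */
static int fxtract_exp0(uint64_t sig, uint64_t *norm_sig)
{
    int shift = 0;

    if (sig == 0) {
        /* Zero is a separate case in the real code; guard the loop.  */
        *norm_sig = 0;
        return 0;
    }
    while ((sig & UINT64_C(0x8000000000000000)) == 0) {
        sig <<= 1;
        shift++;
    }
    *norm_sig = sig;
    /* Biased exponent 0 is interpreted like biased exponent 1.  */
    return 1 - FX80_EXP_BIAS - shift;
}
```

A pseudo-denormal needs no shift and yields -16382; the least subnormal needs a shift of 63 and yields -16445.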

Signed-off-by: Joseph Myers 
---
 target/i386/fpu_helper.c   |  25 +-
 tests/tcg/i386/test-i386-fxtract.c | 120 +
 2 files changed, 144 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/i386/test-i386-fxtract.c

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 792a128a6d..71a696a863 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -767,10 +767,33 @@ void helper_fxtract(CPUX86State *env)
 &env->fp_status);
 fpush(env);
 ST0 = temp.d;
+} else if (floatx80_invalid_encoding(ST0)) {
+float_raise(float_flag_invalid, &env->fp_status);
+ST0 = floatx80_default_nan(&env->fp_status);
+fpush(env);
+ST0 = ST1;
+} else if (floatx80_is_any_nan(ST0)) {
+if (floatx80_is_signaling_nan(ST0, &env->fp_status)) {
+float_raise(float_flag_invalid, &env->fp_status);
+ST0 = floatx80_silence_nan(ST0, &env->fp_status);
+}
+fpush(env);
+ST0 = ST1;
+} else if (floatx80_is_infinity(ST0)) {
+fpush(env);
+ST0 = ST1;
+ST1 = floatx80_infinity;
 } else {
 int expdif;
 
-expdif = EXPD(temp) - EXPBIAS;
+if (EXPD(temp) == 0) {
+int shift = clz64(temp.l.lower);
+temp.l.lower <<= shift;
+expdif = 1 - EXPBIAS - shift;
+float_raise(float_flag_input_denormal, &env->fp_status);
+} else {
+expdif = EXPD(temp) - EXPBIAS;
+}
 /* DP exponent bias */
 ST0 = int32_to_floatx80(expdif, &env->fp_status);
 fpush(env);
diff --git a/tests/tcg/i386/test-i386-fxtract.c 
b/tests/tcg/i386/test-i386-fxtract.c
new file mode 100644
index 0000000000..64fd93d333
--- /dev/null
+++ b/tests/tcg/i386/test-i386-fxtract.c
@@ -0,0 +1,120 @@
+/* Test fxtract instruction.  */
+
+#include <stdint.h>
+#include <stdio.h>
+
+union u {
+struct { uint64_t sig; uint16_t sign_exp; } s;
+long double ld;
+};
+
+volatile union u ld_pseudo_m16382 = { .s = { UINT64_C(1) << 63, 0 } };
+volatile union u ld_invalid_1 = { .s = { 1, 1234 } };
+volatile union u ld_invalid_2 = { .s = { 0, 1234 } };
+volatile union u ld_invalid_3 = { .s = { 0, 0x7fff } };
+volatile union u ld_invalid_4 = { .s = { (UINT64_C(1) << 63) - 1, 0x7fff } };
+
+volatile long double ld_sig, ld_exp;
+
+int isnan_ld(long double x)
+{
+  union u tmp = { .ld = x };
+  return ((tmp.s.sign_exp & 0x7fff) == 0x7fff &&
+  (tmp.s.sig >> 63) != 0 &&
+  (tmp.s.sig << 1) != 0);
+}
+
+int issignaling_ld(long double x)
+{
+union u tmp = { .ld = x };
+return isnan_ld(x) && (tmp.s.sig & UINT64_C(0x4000000000000000)) == 0;
+}
+
+int main(void)
+{
+int ret = 0;
+__asm__ volatile ("fxtract" : "=t" (ld_sig), "=u" (ld_exp) : "0" (2.5L));
+if (ld_sig != 1.25L || ld_exp != 1.0L) {
+printf("FAIL: fxtract 2.5\n");
+ret = 1;
+}
+__asm__ volatile ("fxtract" : "=t" (ld_sig), "=u" (ld_exp) : "0" (0.0L));
+if (ld_sig != 0.0L || __builtin_copysignl(1.0L, ld_sig) != 1.0L ||
+ld_exp != -__builtin_infl()) {
+printf("FAIL: fxtract 0.0\n");
+ret = 1;
+}
+__asm__ volatile ("fxtract" : "=t" (ld_sig), "=u" (ld_exp) : "0" (-0.0L));
+if (ld_sig != -0.0L || __builtin_copysignl(1.0L, ld_sig) != -1.0L ||
+ld_exp != -__builtin_infl()) {
+printf("FAIL: fxtract -0.0\n");
+ret = 1;
+}
+__asm__ volatile ("fxtract" : "=t" (ld_sig), "=u" (ld_exp) :
+  "0" (__builtin_infl()));
+if (ld_sig != __builtin_infl() || ld_exp != __builtin_infl()) {
+printf("FAIL: fxtract inf\n");
+ret = 1;
+}
+__asm__ volatile ("fxtract" : "=t" (ld_sig), "=u" (ld_exp) :
+  "0" (-__builtin_infl()));
+if (ld_sig != -__builtin_infl() || ld_exp != __builtin_infl()) {
+printf("FAIL: fxtract -inf\n");
+ret = 1;
+}
+__asm__ volatile ("fxtract" : "=t" (ld_sig), "=u" (ld_exp) :
+  "0" (__builtin_nanl("")));
+if (!isnan_ld(ld_sig) || issignaling_ld(ld_sig) ||
+!isnan_ld(ld_exp) || issignaling_ld(ld_exp)) {
+printf("FAIL: fxtract qnan\n");
+ret = 1;
+}
+__asm__ volatile ("fxtract" : "=t" (ld_sig), "=u" (ld_exp) :
+

[PATCH 0/5] target/i386: fxtract, fscale fixes

2020-05-06 Thread Joseph Myers

Among the various bugs in the x87 floating-point emulation that show
up through a combination of glibc testing and code inspection, there
are several in the implementations of the fxtract and fscale
instructions.  This series fixes those bugs.

Bugs in other instructions, and bugs relating to floating-point
exceptions and flag setting, will be addressed separately.  In
particular, while some of these patches add code that sets exception
flags in the softfloat state, it's generally the case that the x87
emulation ignores exceptions in that state rather than propagating
them to the status word (and to generating traps where appropriate).
I intend to address that missing propagation of exceptions in a
subsequent patch series; until it's addressed, the code setting
exceptions won't actually do anything useful.  (There is also code in
the x87 emulation, including that of fscale, that would result in
spurious exceptions being set from a naive propagation of exceptions
from the softfloat state, and thus will need updating to avoid
propagating inappropriate exceptions when such propagation is
implemented.)

Joseph Myers (5):
  target/i386: implement special cases for fxtract
  target/i386: fix fscale handling of signaling NaN
  target/i386: fix fscale handling of invalid exponent encodings
  target/i386: fix fscale handling of infinite exponents
  target/i386: fix fscale handling of rounding precision

 target/i386/fpu_helper.c   |  59 +-
 tests/tcg/i386/test-i386-fscale.c  | 108 ++
 tests/tcg/i386/test-i386-fxtract.c | 120 +
 3 files changed, 285 insertions(+), 2 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-fscale.c
 create mode 100644 tests/tcg/i386/test-i386-fxtract.c

-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 4/4] target/i386: fix fbstp handling of out-of-range values

2020-05-13 Thread Joseph Myers

The fbstp implementation fails to check for out-of-range and invalid
values, instead just taking the result of conversion to int64_t and
storing its sign and low 18 decimal digits.  Fix this by checking for
an out-of-range result (invalid conversions always result in INT64_MAX
or INT64_MIN from the softfloat code, which are large enough to be
considered as out-of-range by this code) and storing the packed BCD
indefinite encoding in that case.
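
As an illustration of the storage format involved (a sketch with hypothetical names, working from an already-converted integer rather than from ST0, and so not handling the sign of -0), the 10-byte packed BCD result holds 18 decimal digits in the low nine bytes and the sign in the top byte, with a fixed indefinite pattern for anything of magnitude 10^18 or more:

```c
#include <stdint.h>
#include <string.h>

#define BCD_LIMIT INT64_C(1000000000000000000)  /* 10^18 */

/* Illustrative only: store 'val' as x87 packed BCD in out[0..9].  */
static void store_packed_bcd(int64_t val, uint8_t out[10])
{
    uint64_t mag;
    int i;

    if (val >= BCD_LIMIT || val <= -BCD_LIMIT) {
        /* Packed BCD indefinite: 7 zero bytes, then c0 ff ff.  */
        memset(out, 0, 7);
        out[7] = 0xc0;
        out[8] = 0xff;
        out[9] = 0xff;
        return;
    }
    out[9] = val < 0 ? 0x80 : 0x00;          /* sign byte */
    mag = val < 0 ? (uint64_t) -val : (uint64_t) val;
    for (i = 0; i < 9; i++) {
        /* Two decimal digits per byte, least significant first.  */
        out[i] = (uint8_t) ((mag % 10) | ((mag / 10 % 10) << 4));
        mag /= 100;
    }
}
```

With this sketch, -987654321987654321 is stored as 21 43 65 87 19 32 54 76 98 80, the byte sequence checked in the test below.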

Signed-off-by: Joseph Myers 
---
 target/i386/fpu_helper.c |  10 +++
 tests/tcg/i386/test-i386-fbstp.c | 115 +++
 2 files changed, 125 insertions(+)

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index e1872b3fc0..96c512fedf 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -732,6 +732,16 @@ void helper_fbst_ST0(CPUX86State *env, target_ulong ptr)
 
 val = floatx80_to_int64(ST0, &env->fp_status);
 mem_ref = ptr;
+if (val >= 1000000000000000000LL || val <= -1000000000000000000LL) {
+float_raise(float_flag_invalid, &env->fp_status);
+while (mem_ref < ptr + 7) {
+cpu_stb_data_ra(env, mem_ref++, 0, GETPC());
+}
+cpu_stb_data_ra(env, mem_ref++, 0xc0, GETPC());
+cpu_stb_data_ra(env, mem_ref++, 0xff, GETPC());
+cpu_stb_data_ra(env, mem_ref++, 0xff, GETPC());
+return;
+}
 mem_end = mem_ref + 9;
 if (SIGND(temp)) {
 cpu_stb_data_ra(env, mem_end, 0x80, GETPC());
diff --git a/tests/tcg/i386/test-i386-fbstp.c b/tests/tcg/i386/test-i386-fbstp.c
index d368949188..73bf56b9dc 100644
--- a/tests/tcg/i386/test-i386-fbstp.c
+++ b/tests/tcg/i386/test-i386-fbstp.c
@@ -1,8 +1,19 @@
 /* Test fbstp instruction.  */
 
+#include <stdint.h>
 #include <stdio.h>
 #include <string.h>
 
+union u {
+struct { uint64_t sig; uint16_t sign_exp; } s;
+long double ld;
+};
+
+volatile union u ld_invalid_1 = { .s = { 1, 1234 } };
+volatile union u ld_invalid_2 = { .s = { 0, 1234 } };
+volatile union u ld_invalid_3 = { .s = { 0, 0x7fff } };
+volatile union u ld_invalid_4 = { .s = { (UINT64_C(1) << 63) - 1, 0x7fff } };
+
 int main(void)
 {
 int ret = 0;
@@ -21,5 +32,109 @@ int main(void)
 printf("FAIL: fbstp -0.1\n");
 ret = 1;
 }
+memset(out, 0x1f, sizeof out);
+__asm__ volatile ("fbstp %0" : "=m" (out) : "t" (-987654321987654321.0L) :
+  "st");
+out[9] &= 0x80;
+if (memcmp(out, "\x21\x43\x65\x87\x19\x32\x54\x76\x98\x80",
+   sizeof out) != 0) {
+printf("FAIL: fbstp -987654321987654321\n");
+ret = 1;
+}
+memset(out, 0x12, sizeof out);
+__asm__ volatile ("fbstp %0" : "=m" (out) : "t" (999999999999999999.5L) :
+  "st");
+if (memcmp(out, "\0\0\0\0\0\0\0\xc0\xff\xff", sizeof out) != 0) {
+printf("FAIL: fbstp 999999999999999999.5\n");
+ret = 1;
+}
+memset(out, 0x12, sizeof out);
+__asm__ volatile ("fbstp %0" : "=m" (out) : "t" (1000000000000000000.0L) :
+  "st");
+if (memcmp(out, "\0\0\0\0\0\0\0\xc0\xff\xff", sizeof out) != 0) {
+printf("FAIL: fbstp 1000000000000000000\n");
+ret = 1;
+}
+memset(out, 0x12, sizeof out);
+__asm__ volatile ("fbstp %0" : "=m" (out) : "t" (1e30L) : "st");
+if (memcmp(out, "\0\0\0\0\0\0\0\xc0\xff\xff", sizeof out) != 0) {
+printf("FAIL: fbstp 1e30\n");
+ret = 1;
+}
+memset(out, 0x12, sizeof out);
+__asm__ volatile ("fbstp %0" : "=m" (out) : "t" (-999999999999999999.5L) :
+  "st");
+if (memcmp(out, "\0\0\0\0\0\0\0\xc0\xff\xff", sizeof out) != 0) {
+printf("FAIL: fbstp -999999999999999999.5\n");
+ret = 1;
+}
+memset(out, 0x12, sizeof out);
+__asm__ volatile ("fbstp %0" : "=m" (out) : "t" (-1000000000000000000.0L) :
+  "st");
+if (memcmp(out, "\0\0\0\0\0\0\0\xc0\xff\xff", sizeof out) != 0) {
+printf("FAIL: fbstp -1000000000000000000\n");
+ret = 1;
+}
+memset(out, 0x12, sizeof out);
+__asm__ volatile ("fbstp %0" : "=m" (out) : "t" (-1e30L) : "st");
+if (memcmp(out, "\0\0\0\0\0\0\0\xc0\xff\xff", sizeof out) != 0) {
+printf("FAIL: fbstp -1e30\n");
+ret = 1;
+}
+memset(out, 0x12, sizeof out);
+__asm__ volatile ("fbstp %0" : "=m" (out) : "t" (__builtin_infl()) : "st");
+if (memcmp(out, "\0\0\0\0\0\0\0\xc0\xff\xff", sizeof out) != 0) {
+printf("FAIL: fbstp inf\n&quo

[PATCH 2/4] target/i386: fix fxam handling of invalid encodings

2020-05-13 Thread Joseph Myers

The fxam implementation does not check for invalid encodings, instead
treating them like NaN or normal numbers depending on the exponent.
Fix it to check that the high bit of the significand is set before
treating an encoding as NaN or normal, thus resulting in correct
handling (all of C0, C2 and C3 cleared) for invalid encodings.
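
As an illustration only (not part of the patch), the classification rule
described above can be sketched on raw floatx80 fields.  The function
name fxam_flags is hypothetical; the C0..C3 bit positions are those used
by the test's #defines, and pseudo-denormals are reported as denormal,
matching the helper code shown in the diff below.

```c
#include <stdint.h>

/* x87 status-word condition-code bits. */
#define C0 (1 << 8)
#define C1 (1 << 9)
#define C2 (1 << 10)
#define C3 (1 << 14)

/*
 * Hypothetical helper: classify a raw floatx80 encoding (64-bit
 * significand plus 16-bit sign/exponent word) as fxam would.
 */
int fxam_flags(uint64_t sig, uint16_t sign_exp)
{
    int flags = (sign_exp & 0x8000) ? C1 : 0;  /* C1 reports the sign */
    unsigned exp = sign_exp & 0x7fff;
    int high_bit = (int)(sig >> 63);           /* explicit integer bit */

    if (exp == 0x7fff) {
        if (sig == (UINT64_C(1) << 63)) {
            flags |= C2 | C0;                  /* infinity */
        } else if (high_bit) {
            flags |= C0;                       /* NaN */
        }                                      /* else: invalid encoding */
    } else if (exp == 0) {
        if (sig == 0) {
            flags |= C3;                       /* zero */
        } else {
            flags |= C3 | C2;                  /* denormal or pseudo-denormal */
        }
    } else if (high_bit) {
        flags |= C2;                           /* normal */
    }                                          /* else: invalid encoding */
    return flags;
}
```

With this sketch, an encoding such as { sig = 1, sign_exp = 1234 } has a
nonzero exponent but a clear high significand bit, so none of C0, C2 and
C3 is set.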

Signed-off-by: Joseph Myers 
---
 target/i386/fpu_helper.c|   4 +-
 tests/tcg/i386/test-i386-fxam.c | 143 
 2 files changed, 145 insertions(+), 2 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-fxam.c

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 38968b2ec7..51372c371b 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -1099,7 +1099,7 @@ void helper_fxam_ST0(CPUX86State *env)
 if (expdif == MAXEXPD) {
 if (MANTD(temp) == 0x8000000000000000ULL) {
 env->fpus |= 0x500; /* Infinity */
-} else {
+} else if (MANTD(temp) & 0x8000000000000000ULL) {
 env->fpus |= 0x100; /* NaN */
 }
 } else if (expdif == 0) {
@@ -1108,7 +1108,7 @@ void helper_fxam_ST0(CPUX86State *env)
 } else {
 env->fpus |= 0x4400; /* Denormal */
 }
-} else {
+} else if (MANTD(temp) & 0x8000000000000000ULL) {
 env->fpus |= 0x400;
 }
 }
diff --git a/tests/tcg/i386/test-i386-fxam.c b/tests/tcg/i386/test-i386-fxam.c
new file mode 100644
index 0000000000..ddd76ca42d
--- /dev/null
+++ b/tests/tcg/i386/test-i386-fxam.c
@@ -0,0 +1,143 @@
+/* Test fxam instruction.  */
+
+#include 
+#include 
+
+union u {
+struct { uint64_t sig; uint16_t sign_exp; } s;
+long double ld;
+};
+
+volatile union u ld_pseudo_m16382 = { .s = { UINT64_C(1) << 63, 0 } };
+volatile union u ld_pseudo_nm16382 = { .s = { UINT64_C(1) << 63, 0x8000 } };
+volatile union u ld_invalid_1 = { .s = { 1, 1234 } };
+volatile union u ld_invalid_2 = { .s = { 0, 1234 } };
+volatile union u ld_invalid_3 = { .s = { 0, 0x7fff } };
+volatile union u ld_invalid_4 = { .s = { (UINT64_C(1) << 63) - 1, 0x7fff } };
+volatile union u ld_invalid_n1 = { .s = { 1, 0x8123 } };
+volatile union u ld_invalid_n2 = { .s = { 0, 0x8123 } };
+volatile union u ld_invalid_n3 = { .s = { 0, 0xffff } };
+volatile union u ld_invalid_n4 = { .s = { (UINT64_C(1) << 63) - 1, 0xffff } };
+
+#define C0 (1 << 8)
+#define C1 (1 << 9)
+#define C2 (1 << 10)
+#define C3 (1 << 14)
+#define FLAGS (C0 | C1 | C2 | C3)
+
+int main(void)
+{
+short sw;
+int ret = 0;
+__asm__ volatile ("fxam\nfnstsw" : "=a" (sw) : "t" (0.0L));
+if ((sw & FLAGS) != C3) {
+printf("FAIL: +0\n");
+ret = 1;
+}
+__asm__ volatile ("fxam\nfnstsw" : "=a" (sw) : "t" (-0.0L));
+if ((sw & FLAGS) != (C3 | C1)) {
+printf("FAIL: -0\n");
+ret = 1;
+}
+__asm__ volatile ("fxam\nfnstsw" : "=a" (sw) : "t" (1.0L));
+if ((sw & FLAGS) != C2) {
+printf("FAIL: +normal\n");
+ret = 1;
+}
+__asm__ volatile ("fxam\nfnstsw" : "=a" (sw) : "t" (-1.0L));
+if ((sw & FLAGS) != (C2 | C1)) {
+printf("FAIL: -normal\n");
+ret = 1;
+}
+__asm__ volatile ("fxam\nfnstsw" : "=a" (sw) : "t" (__builtin_infl()));
+if ((sw & FLAGS) != (C2 | C0)) {
+printf("FAIL: +inf\n");
+ret = 1;
+}
+__asm__ volatile ("fxam\nfnstsw" : "=a" (sw) : "t" (-__builtin_infl()));
+if ((sw & FLAGS) != (C2 | C1 | C0)) {
+printf("FAIL: -inf\n");
+ret = 1;
+}
+__asm__ volatile ("fxam\nfnstsw" : "=a" (sw) : "t" (__builtin_nanl("")));
+if ((sw & FLAGS) != C0) {
+printf("FAIL: +nan\n");
+ret = 1;
+}
+__asm__ volatile ("fxam\nfnstsw" : "=a" (sw) : "t" (-__builtin_nanl("")));
+if ((sw & FLAGS) != (C1 | C0)) {
+printf("FAIL: -nan\n");
+ret = 1;
+}
+__asm__ volatile ("fxam\nfnstsw" : "=a" (sw) : "t" (__builtin_nansl("")));
+if ((sw & FLAGS) != C0) {
+printf("FAIL: +snan\n");
+ret = 1;
+}
+__asm__ volatile ("fxam\nfnstsw" : "=a" (sw) : "t" (-__builtin_nansl("")));
+if ((sw & FLAGS) != (C1 | C0)) {
+printf("FAIL: -snan\n");
+ret = 1;
+}
+__asm__ volatile ("fxam\nfnstsw" : "=a" (sw) : "t" (0x1p-16445L));
+if ((sw & FLAGS) != (C3 | C2)) {
+printf("FAIL: +denormal\n");
+ret = 1;
+}
+__asm__ volatile (

[PATCH 3/4] target/i386: fix fbstp handling of negative zero

2020-05-13 Thread Joseph Myers

The fbstp implementation stores +0 when the rounded result should be
-0 because it compares an integer value with 0 to determine the sign.
Fix this by checking the sign bit of the operand instead.
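
A minimal sketch of why the integer comparison loses the sign (an
illustration, not code from the patch): both helper names below are
hypothetical, and the C cast truncates rather than honouring the x87
rounding mode, although for -0.0 and -0.1 the integer result is 0 either
way.

```c
#include <math.h>
#include <stdint.h>

/* Sign as the old code derived it: from the converted integer. */
int sign_from_integer(long double x)
{
    int64_t val = (int64_t)x;   /* -0.0 and -0.1 both convert to 0 */
    return val < 0;
}

/* Sign as the fixed code derives it: from the operand's sign bit. */
int sign_from_operand(long double x)
{
    return signbit(x) != 0;
}
```

For -0.0L and -0.1L the first function reports "not negative" while the
second reports "negative", which is the difference between storing +0
and -0 in the packed BCD result.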

Signed-off-by: Joseph Myers 
---
 target/i386/fpu_helper.c |  5 -
 tests/tcg/i386/test-i386-fbstp.c | 25 +
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/i386/test-i386-fbstp.c

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 51372c371b..e1872b3fc0 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -726,11 +726,14 @@ void helper_fbst_ST0(CPUX86State *env, target_ulong ptr)
 int v;
 target_ulong mem_ref, mem_end;
 int64_t val;
+CPU_LDoubleU temp;
+
+temp.d = ST0;
 
 val = floatx80_to_int64(ST0, &env->fp_status);
 mem_ref = ptr;
 mem_end = mem_ref + 9;
-if (val < 0) {
+if (SIGND(temp)) {
 cpu_stb_data_ra(env, mem_end, 0x80, GETPC());
 val = -val;
 } else {
diff --git a/tests/tcg/i386/test-i386-fbstp.c b/tests/tcg/i386/test-i386-fbstp.c
new file mode 100644
index 0000000000..d368949188
--- /dev/null
+++ b/tests/tcg/i386/test-i386-fbstp.c
@@ -0,0 +1,25 @@
+/* Test fbstp instruction.  */
+
+#include 
+#include 
+
+int main(void)
+{
+int ret = 0;
+unsigned char out[10];
+memset(out, 0xfe, sizeof out);
+__asm__ volatile ("fbstp %0" : "=m" (out) : "t" (-0.0L) : "st");
+out[9] &= 0x80;
+if (memcmp(out, "\0\0\0\0\0\0\0\0\0\x80", sizeof out) != 0) {
+printf("FAIL: fbstp -0\n");
+ret = 1;
+}
+memset(out, 0x12, sizeof out);
+__asm__ volatile ("fbstp %0" : "=m" (out) : "t" (-0.1L) : "st");
+out[9] &= 0x80;
+if (memcmp(out, "\0\0\0\0\0\0\0\0\0\x80", sizeof out) != 0) {
+printf("FAIL: fbstp -0.1\n");
+ret = 1;
+}
+return ret;
+}
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 1/4] target/i386: fix floating-point load-constant rounding

2020-05-13 Thread Joseph Myers

The implementations of the fldl2t, fldl2e, fldpi, fldlg2 and fldln2
instructions load fixed constants independent of the rounding mode.
Fix them to load a value correctly rounded for the current rounding
mode (but always rounded to 64-bit precision independent of the
precision control, and without setting "inexact") as specified.
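
The selection logic can be sketched outside QEMU as follows.  This is an
illustration under stated assumptions: the function names and RC_* macros
are hypothetical, the rounding-control field values are the ones the
test below writes into the control word, and the significands are the
floatx80_l2t / floatx80_l2t_u and floatx80_l2e / floatx80_l2e_d values
from the diff.

```c
#include <stdint.h>

/* Rounding-control field of the x87 control word (bits 10-11). */
#define RC_MASK 0xc00
#define RC_NEAR 0x000
#define RC_DOWN 0x400
#define RC_UP   0x800
#define RC_CHOP 0xc00

/* log2(10): the nearest value is the rounded-down one, so only
   upward rounding selects a different significand. */
uint64_t fldl2t_significand(uint16_t control_word)
{
    switch (control_word & RC_MASK) {
    case RC_UP:
        return UINT64_C(0xd49a784bcd1b8aff);
    default:
        return UINT64_C(0xd49a784bcd1b8afe);
    }
}

/* log2(e): the nearest value is the rounded-up one, so downward
   and toward-zero rounding select the next significand down. */
uint64_t fldl2e_significand(uint16_t control_word)
{
    switch (control_word & RC_MASK) {
    case RC_DOWN:
    case RC_CHOP:
        return UINT64_C(0xb8aa3b295c17f0bb);
    default:
        return UINT64_C(0xb8aa3b295c17f0bc);
    }
}
```

Since all these constants are positive, rounding toward zero always
agrees with rounding downward, which is why the two cases share a branch.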

Signed-off-by: Joseph Myers 
---
 target/i386/fpu_helper.c  |  54 +++-
 tests/tcg/i386/test-i386-fldcst.c | 199 ++
 2 files changed, 248 insertions(+), 5 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-fldcst.c

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 0c3fce933c..38968b2ec7 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -59,8 +59,13 @@
 #define FPUC_EM 0x3f
 
 #define floatx80_lg2 make_floatx80(0x3ffd, 0x9a209a84fbcff799LL)
+#define floatx80_lg2_d make_floatx80(0x3ffd, 0x9a209a84fbcff798LL)
 #define floatx80_l2e make_floatx80(0x3fff, 0xb8aa3b295c17f0bcLL)
+#define floatx80_l2e_d make_floatx80(0x3fff, 0xb8aa3b295c17f0bbLL)
 #define floatx80_l2t make_floatx80(0x4000, 0xd49a784bcd1b8afeLL)
+#define floatx80_l2t_u make_floatx80(0x4000, 0xd49a784bcd1b8affLL)
+#define floatx80_ln2_d make_floatx80(0x3ffe, 0xb17217f7d1cf79abLL)
+#define floatx80_pi_d make_floatx80(0x4000, 0xc90fdaa22168c234LL)
 
 #if !defined(CONFIG_USER_ONLY)
 static qemu_irq ferr_irq;
@@ -544,27 +549,66 @@ void helper_fld1_ST0(CPUX86State *env)
 
 void helper_fldl2t_ST0(CPUX86State *env)
 {
-ST0 = floatx80_l2t;
+switch (env->fpuc & FPU_RC_MASK) {
+case FPU_RC_UP:
+ST0 = floatx80_l2t_u;
+break;
+default:
+ST0 = floatx80_l2t;
+break;
+}
 }
 
 void helper_fldl2e_ST0(CPUX86State *env)
 {
-ST0 = floatx80_l2e;
+switch (env->fpuc & FPU_RC_MASK) {
+case FPU_RC_DOWN:
+case FPU_RC_CHOP:
+ST0 = floatx80_l2e_d;
+break;
+default:
+ST0 = floatx80_l2e;
+break;
+}
 }
 
 void helper_fldpi_ST0(CPUX86State *env)
 {
-ST0 = floatx80_pi;
+switch (env->fpuc & FPU_RC_MASK) {
+case FPU_RC_DOWN:
+case FPU_RC_CHOP:
+ST0 = floatx80_pi_d;
+break;
+default:
+ST0 = floatx80_pi;
+break;
+}
 }
 
 void helper_fldlg2_ST0(CPUX86State *env)
 {
-ST0 = floatx80_lg2;
+switch (env->fpuc & FPU_RC_MASK) {
+case FPU_RC_DOWN:
+case FPU_RC_CHOP:
+ST0 = floatx80_lg2_d;
+break;
+default:
+ST0 = floatx80_lg2;
+break;
+}
 }
 
 void helper_fldln2_ST0(CPUX86State *env)
 {
-ST0 = floatx80_ln2;
+switch (env->fpuc & FPU_RC_MASK) {
+case FPU_RC_DOWN:
+case FPU_RC_CHOP:
+ST0 = floatx80_ln2_d;
+break;
+default:
+ST0 = floatx80_ln2;
+break;
+}
 }
 
 void helper_fldz_ST0(CPUX86State *env)
diff --git a/tests/tcg/i386/test-i386-fldcst.c b/tests/tcg/i386/test-i386-fldcst.c
new file mode 100644
index 0000000000..e635432ccf
--- /dev/null
+++ b/tests/tcg/i386/test-i386-fldcst.c
@@ -0,0 +1,199 @@
+/* Test instructions loading floating-point constants.  */
+
+#include 
+#include 
+
+volatile long double ld_res;
+
+int main(void)
+{
+short cw;
+int ret = 0;
+
+/* Round to nearest.  */
+__asm__ volatile ("fnstcw %0" : "=m" (cw));
+cw = (cw & ~0xc00) | 0x000;
+__asm__ volatile ("fldcw %0" : : "m" (cw));
+__asm__ volatile ("fldl2t" : "=t" (ld_res));
+if (ld_res != 0x3.5269e12f346e2bf8p+0L) {
+printf("FAIL: fldl2t N\n");
+ret = 1;
+}
+/* Round downward.  */
+__asm__ volatile ("fnstcw %0" : "=m" (cw));
+cw = (cw & ~0xc00) | 0x400;
+__asm__ volatile ("fldcw %0" : : "m" (cw));
+__asm__ volatile ("fldl2t" : "=t" (ld_res));
+if (ld_res != 0x3.5269e12f346e2bf8p+0L) {
+printf("FAIL: fldl2t D\n");
+ret = 1;
+}
+/* Round toward zero.  */
+__asm__ volatile ("fnstcw %0" : "=m" (cw));
+cw = (cw & ~0xc00) | 0xc00;
+__asm__ volatile ("fldcw %0" : : "m" (cw));
+__asm__ volatile ("fldl2t" : "=t" (ld_res));
+if (ld_res != 0x3.5269e12f346e2bf8p+0L) {
+printf("FAIL: fldl2t Z\n");
+ret = 1;
+}
+/* Round upward.  */
+__asm__ volatile ("fnstcw %0" : "=m" (cw));
+cw = (cw & ~0xc00) | 0x800;
+__asm__ volatile ("fldcw %0" : : "m" (cw));
+__asm__ volatile ("fldl2t" : "=t" (ld_res));
+if (ld_res != 0x3.5269e12f346e2bfcp+0L) {
+printf("FAIL: fldl2t U\n");
+ret = 1;
+}
+
+/* Round to nearest.  */
+__asm__ volatile ("fnstcw %0" : "=m" (cw));
+cw = (cw & ~0xc00) | 0x000;
+__asm__ volatile (

[PATCH 0/4] target/i386: miscellaneous x87 fixes

2020-05-13 Thread Joseph Myers

Following my previous patch series
<https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg00781.html>
and
<https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg01465.html>
for problems found in the x87 floating-point emulation, this patch
series fixes further miscellaneous bugs in that emulation.

There are further problems with x87 emulation that I am aware of and
intend to address in future patch series.  Those other problems, not
addressed by the first three patch series, generally relate to
exceptions, flag setting and those instructions for which the
emulation currently converts to host double (so losing range and
precision) and then works on host double for the rest of the emulation
process before converting back to floatx80 at the end.  Thus, the same
comments as for the previous patch series apply about this patch
series not fixing missing propagation of exceptions even when it adds
code to set exceptions in the softfloat state.

Joseph Myers (4):
  target/i386: fix floating-point load-constant rounding
  target/i386: fix fxam handling of invalid encodings
  target/i386: fix fbstp handling of negative zero
  target/i386: fix fbstp handling of out-of-range values

 target/i386/fpu_helper.c  |  73 +--
 tests/tcg/i386/test-i386-fbstp.c  | 140 +
 tests/tcg/i386/test-i386-fldcst.c | 199 ++
 tests/tcg/i386/test-i386-fxam.c   | 143 +
 4 files changed, 547 insertions(+), 8 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-fbstp.c
 create mode 100644 tests/tcg/i386/test-i386-fldcst.c
 create mode 100644 tests/tcg/i386/test-i386-fxam.c

-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 2/2] target/i386: fix IEEE x87 floating-point exception raising

2020-05-15 Thread Joseph Myers

Most x87 instruction implementations fail to raise the expected IEEE
floating-point exceptions because they do nothing to convert the
exception state from the softfloat machinery into the exception flags
in the x87 status word.  There is special-case handling of division to
raise the divide-by-zero exception, but that handling is itself buggy:
it raises the exception in inappropriate cases (inf / 0 and nan / 0,
which should not raise any exceptions, and 0 / 0, which should raise
"invalid" instead).

Fix this by converting the floating-point exceptions raised during an
operation by the softfloat machinery into exceptions in the x87 status
word (passing through the existing fpu_set_exception function for
handling related to trapping exceptions).  There are special cases
where some functions convert to integer internally but exceptions from
that conversion are not always correct exceptions for the instruction
to raise.

There might be scope for some simplification if the softfloat
exception state either could always be assumed to be in sync with the
state in the status word, or could always be ignored at the start of
each instruction and just set to 0 then; I haven't looked into that in
detail, and it might run into interactions with the various ways the
emulation does not yet handle trapping exceptions properly.  I think
the approach taken here, of saving the softfloat state, setting
exceptions there to 0 and then merging the old exceptions back in
after carrying out the operation, is conservatively safe.
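
The save / clear / merge pattern can be shown on a toy model.  This is a
sketch only: struct toy_fpu, the SF_* flag values and the function names
are hypothetical stand-ins for the softfloat state, and the real
fpu_set_exception also deals with trapping-related status bits that the
toy omits.  The X87_* values are the architectural status-word bit
positions.

```c
#include <stdint.h>

/* Hypothetical softfloat-style accrued flags (values chosen for
   this sketch, not taken from QEMU). */
enum {
    SF_INVALID   = 1,
    SF_DIVBYZERO = 2,
    SF_OVERFLOW  = 4,
    SF_UNDERFLOW = 8,
    SF_INEXACT   = 16
};

/* x87 status-word exception bits. */
enum {
    X87_IE = 0x01,
    X87_ZE = 0x04,
    X87_OE = 0x08,
    X87_UE = 0x10,
    X87_PE = 0x20
};

struct toy_fpu {
    uint8_t soft_flags;    /* flags accrued by the arithmetic library */
    uint16_t status_word;  /* flags visible to the guest */
};

/* Remember the accrued flags and clear them, so that anything set
   afterwards was raised by the operation about to run. */
uint8_t toy_save_flags(struct toy_fpu *fpu)
{
    uint8_t old = fpu->soft_flags;
    fpu->soft_flags = 0;
    return old;
}

/* Report only the newly raised flags in the status word, then put
   the previously accrued flags back. */
void toy_merge_flags(struct toy_fpu *fpu, uint8_t old)
{
    uint8_t raised = fpu->soft_flags;

    fpu->soft_flags |= old;
    fpu->status_word |= (raised & SF_INVALID ? X87_IE : 0) |
                        (raised & SF_DIVBYZERO ? X87_ZE : 0) |
                        (raised & SF_OVERFLOW ? X87_OE : 0) |
                        (raised & SF_UNDERFLOW ? X87_UE : 0) |
                        (raised & SF_INEXACT ? X87_PE : 0);
}
```

The point of clearing first is that a stale flag left over from an
earlier operation is not reported again as if the current instruction
had raised it.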

Signed-off-by: Joseph Myers 
---
 target/i386/fpu_helper.c | 126 +++-
 tests/tcg/i386/test-i386-fp-exceptions.c | 831 +++
 2 files changed, 926 insertions(+), 31 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-fp-exceptions.c

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 8dcc9ddf68..c19cad466e 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -161,12 +161,32 @@ static void fpu_set_exception(CPUX86State *env, int mask)
 }
 }
 
+static inline uint8_t save_exception_flags(CPUX86State *env)
+{
+uint8_t old_flags = get_float_exception_flags(&env->fp_status);
+set_float_exception_flags(0, &env->fp_status);
+return old_flags;
+}
+
+static void merge_exception_flags(CPUX86State *env, uint8_t old_flags)
+{
+uint8_t new_flags = get_float_exception_flags(&env->fp_status);
+float_raise(old_flags, &env->fp_status);
+fpu_set_exception(env,
+  ((new_flags & float_flag_invalid ? FPUS_IE : 0) |
+   (new_flags & float_flag_divbyzero ? FPUS_ZE : 0) |
+   (new_flags & float_flag_overflow ? FPUS_OE : 0) |
+   (new_flags & float_flag_underflow ? FPUS_UE : 0) |
+   (new_flags & float_flag_inexact ? FPUS_PE : 0) |
+   (new_flags & float_flag_input_denormal ? FPUS_DE : 0)));
+}
+
 static inline floatx80 helper_fdiv(CPUX86State *env, floatx80 a, floatx80 b)
 {
-if (floatx80_is_zero(b)) {
-fpu_set_exception(env, FPUS_ZE);
-}
-return floatx80_div(a, b, &env->fp_status);
+uint8_t old_flags = save_exception_flags(env);
+floatx80 ret = floatx80_div(a, b, &env->fp_status);
+merge_exception_flags(env, old_flags);
+return ret;
 }
 
 static void fpu_raise_exception(CPUX86State *env, uintptr_t retaddr)
@@ -183,6 +203,7 @@ static void fpu_raise_exception(CPUX86State *env, uintptr_t retaddr)
 
 void helper_flds_FT0(CPUX86State *env, uint32_t val)
 {
+uint8_t old_flags = save_exception_flags(env);
 union {
 float32 f;
 uint32_t i;
@@ -190,10 +211,12 @@ void helper_flds_FT0(CPUX86State *env, uint32_t val)
 
 u.i = val;
 FT0 = float32_to_floatx80(u.f, &env->fp_status);
+merge_exception_flags(env, old_flags);
 }
 
 void helper_fldl_FT0(CPUX86State *env, uint64_t val)
 {
+uint8_t old_flags = save_exception_flags(env);
 union {
 float64 f;
 uint64_t i;
@@ -201,6 +224,7 @@ void helper_fldl_FT0(CPUX86State *env, uint64_t val)
 
 u.i = val;
 FT0 = float64_to_floatx80(u.f, &env->fp_status);
+merge_exception_flags(env, old_flags);
 }
 
 void helper_fildl_FT0(CPUX86State *env, int32_t val)
@@ -210,6 +234,7 @@ void helper_fildl_FT0(CPUX86State *env, int32_t val)
 
 void helper_flds_ST0(CPUX86State *env, uint32_t val)
 {
+uint8_t old_flags = save_exception_flags(env);
 int new_fpstt;
 union {
 float32 f;
@@ -221,10 +246,12 @@ void helper_flds_ST0(CPUX86State *env, uint32_t val)
 env->fpregs[new_fpstt].d = float32_to_floatx80(u.f, &env->fp_status);
 env->fpstt = new_fpstt;
 env->fptags[new_fpstt] = 0; /* validate stack entry */
+merge_exception_flags(env, old_flags);
 }
 
 void helper_fldl_ST0(CPUX86State *env, uint64_t val)
 {
+uint8_t old_flags = save_exception_flags(env);
 int new_fpstt;
 union {
 float64 f;
@@ -236,6 +263,7 @@

[PATCH 1/2] target/i386: fix fisttpl, fisttpll handling of out-of-range values

2020-05-15 Thread Joseph Myers

The fist / fistt family of instructions should all store the most
negative integer in the destination format when the rounded /
truncated integer result is out of range or the input is an invalid
encoding, infinity or NaN.  The fisttpl and fisttpll implementations
(32-bit and 64-bit results, truncate towards zero) failed to do this,
producing the most positive integer in some cases instead.  Fix this
by copying the code used to handle this issue for fistpl and fistpll,
adjusted to use the _round_to_zero functions for the actual
conversion (but without any other changes to that code).
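
The intended result can be modelled in portable C.  This is a sketch of
the behaviour described above, not the QEMU code: the function name is
hypothetical, and an explicit range check stands in for the softfloat
"invalid" flag that the real helper inspects.

```c
#include <stdint.h>

/*
 * Hypothetical model of fisttpl's result: truncate toward zero, and
 * return the most negative integer ("integer indefinite") when the
 * truncated value does not fit or the input is infinity or NaN.
 */
int32_t truncate_to_int32_x87(long double x)
{
    /* A NaN fails both comparisons and so takes the indefinite path. */
    if (x > -2147483649.0L && x < 2147483648.0L) {
        return (int32_t)x;      /* C casts truncate toward zero */
    }
    return INT32_MIN;
}
```

Note that a large positive input yields INT32_MIN here, not INT32_MAX,
which is the case the old implementation got wrong.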

Signed-off-by: Joseph Myers 
---
 target/i386/fpu_helper.c  |  28 -
 tests/tcg/i386/test-i386-fisttp.c | 100 ++
 2 files changed, 126 insertions(+), 2 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-fisttp.c

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 96c512fedf..8dcc9ddf68 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -338,12 +338,36 @@ int32_t helper_fistt_ST0(CPUX86State *env)
 
 int32_t helper_fisttl_ST0(CPUX86State *env)
 {
-return floatx80_to_int32_round_to_zero(ST0, &env->fp_status);
+int32_t val;
+signed char old_exp_flags;
+
+old_exp_flags = get_float_exception_flags(&env->fp_status);
+set_float_exception_flags(0, &env->fp_status);
+
+val = floatx80_to_int32_round_to_zero(ST0, &env->fp_status);
+if (get_float_exception_flags(&env->fp_status) & float_flag_invalid) {
+val = 0x80000000;
+}
+set_float_exception_flags(get_float_exception_flags(&env->fp_status)
+| old_exp_flags, &env->fp_status);
+return val;
 }
 
 int64_t helper_fisttll_ST0(CPUX86State *env)
 {
-return floatx80_to_int64_round_to_zero(ST0, &env->fp_status);
+int64_t val;
+signed char old_exp_flags;
+
+old_exp_flags = get_float_exception_flags(&env->fp_status);
+set_float_exception_flags(0, &env->fp_status);
+
+val = floatx80_to_int64_round_to_zero(ST0, &env->fp_status);
+if (get_float_exception_flags(&env->fp_status) & float_flag_invalid) {
+val = 0x8000000000000000ULL;
+}
+set_float_exception_flags(get_float_exception_flags(&env->fp_status)
+| old_exp_flags, &env->fp_status);
+return val;
 }
 
 void helper_fldt_ST0(CPUX86State *env, target_ulong ptr)
diff --git a/tests/tcg/i386/test-i386-fisttp.c b/tests/tcg/i386/test-i386-fisttp.c
new file mode 100644
index 0000000000..16af59a774
--- /dev/null
+++ b/tests/tcg/i386/test-i386-fisttp.c
@@ -0,0 +1,100 @@
+/* Test fisttpl and fisttpll instructions.  */
+
+#include 
+#include 
+#include 
+
+union u {
+struct { uint64_t sig; uint16_t sign_exp; } s;
+long double ld;
+};
+
+volatile union u ld_invalid_1 = { .s = { 1, 1234 } };
+
+int main(void)
+{
+int ret = 0;
+int32_t res_32;
+int64_t res_64;
+__asm__ volatile ("fisttpl %0" : "=m" (res_32) : "t" (0x1p100L) : "st");
+if (res_32 != INT32_MIN) {
+printf("FAIL: fisttpl 0x1p100\n");
+ret = 1;
+}
+__asm__ volatile ("fisttpl %0" : "=m" (res_32) : "t" (-0x1p100L) : "st");
+if (res_32 != INT32_MIN) {
+printf("FAIL: fisttpl -0x1p100\n");
+ret = 1;
+}
+__asm__ volatile ("fisttpl %0" : "=m" (res_32) : "t" (__builtin_infl()) :
+  "st");
+if (res_32 != INT32_MIN) {
+printf("FAIL: fisttpl inf\n");
+ret = 1;
+}
+__asm__ volatile ("fisttpl %0" : "=m" (res_32) : "t" (-__builtin_infl()) :
+  "st");
+if (res_32 != INT32_MIN) {
+printf("FAIL: fisttpl -inf\n");
+ret = 1;
+}
+__asm__ volatile ("fisttpl %0" : "=m" (res_32) : "t" (__builtin_nanl("")) :
+  "st");
+if (res_32 != INT32_MIN) {
+printf("FAIL: fisttpl nan\n");
+ret = 1;
+}
+__asm__ volatile ("fisttpl %0" : "=m" (res_32) :
+  "t" (-__builtin_nanl("")) : "st");
+if (res_32 != INT32_MIN) {
+printf("FAIL: fisttpl -nan\n");
+ret = 1;
+}
+__asm__ volatile ("fisttpl %0" : "=m" (res_32) : "t" (ld_invalid_1.ld) :
+  "st");
+if (res_32 != INT32_MIN) {
+printf("FAIL: fisttpl invalid\n");
+ret = 1;
+}
+__asm__ volatile ("fisttpll %0" : "=m" (res_64) : "t" (0x1p100L) : "st");
+if (res_64 != INT64_MIN) {
+printf("FAIL: fisttpll 0x1p100\n");
+ret = 1;
+}
+__asm__ volatile ("fisttpll %0" : "=m" (res_64) : "t&qu

[PATCH 0/2] target/i386: x87 exceptions fixes

2020-05-15 Thread Joseph Myers

Following (and depending on) my three previous patch series for
problems found in the x87 floating-point emulation, this patch series
fixes some issues relating to floating-point exceptions.

Other issues in that area remain that I hope to address in future
patch series.  In particular, this patch series does not address the
"input denormal" exception (the generic softfloat code only raises
that in the flush-to-zero case; x87 has different logic for when to
raise it, generally raising it for all denormal and pseudo-denormal
operands but including a few instructions that don't raise it at all),
does not address issues with functions whose emulation currently goes
via host double (which need to be reimplemented to work properly with
the full floatx80 range and precision, probably reusing some of the
code from the m68k target), and does not address issues with the
handling of exceptions for which traps are enabled (where there are
many different bugs in the current implementation in QEMU).

Joseph Myers (2):
  target/i386: fix fisttpl, fisttpll handling of out-of-range values
  target/i386: fix IEEE x87 floating-point exception raising

 target/i386/fpu_helper.c | 130 +++-
 tests/tcg/i386/test-i386-fisttp.c| 100 +++
 tests/tcg/i386/test-i386-fp-exceptions.c | 831 +++
 3 files changed, 1040 insertions(+), 21 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-fisttp.c
 create mode 100644 tests/tcg/i386/test-i386-fp-exceptions.c

-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

Ping Re: [PATCH 0/5] target/i386: fxtract, fscale fixes

2020-05-14 Thread Joseph Myers

Ping for this patch series.

Although my three patch series so far fixing floatx80 and i386 floating-point 
instructions are independent of each other, it's likely that future patch 
series in this area will depend on some of the previous ones.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH 0/4] target/i386: miscellaneous x87 fixes

2020-05-14 Thread Joseph Myers

On Thu, 14 May 2020, no-re...@patchew.org wrote:

> This series seems to have some coding style problems. See output below for
> more information:

These are all false positives for the same reasons as for the previous 
patch series.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH 0/5] target/i386: fxtract, fscale fixes

2020-05-07 Thread Joseph Myers

On Thu, 7 May 2020, no-re...@patchew.org wrote:

> === OUTPUT BEGIN ===
> 1/5 Checking commit 69eed0bcaaaf (target/i386: implement special cases for 
> fxtract)
> WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?

I don't think any MAINTAINERS update is needed for a new testcase in an 
existing directory.

> ERROR: Use of volatile is usually wrong, please add a comment

I think the justification for volatile in such testcase code is obvious 
without comments in individual cases - to avoid any code movement or 
optimization that might break what the tests are intending to test (these 
tests are making heavy use of mixed C and inline asm to test how emulated 
instructions behave, including on input representations that are not valid 
long double values in the ABI and with the rounding precision changed 
behind the compiler's back).  I think making everything possibly relevant 
volatile in these tests is better than trying to produce a fragile 
argument that in fact certain data does not need to be volatile to avoid 
problematic code movement.

> ERROR: spaces required around that '-' (ctx:VxV)
> #139: FILE: tests/tcg/i386/test-i386-fxtract.c:80:
> +  "0" (0x1p-16445L));
> ^

No, this is a C99 hex float constant, not a subtraction.  There are 
already such constants in tests/tcg/multiarch/float_helpers.c and 
tests/tcg/multiarch/float_madds.c at least, so I assume they are OK in 
QEMU floating-point tests and this style checker should not be objecting 
to them.
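
For readers unfamiliar with the syntax, a small illustration (the
function names are hypothetical): in a hex float constant the part after
'p' is a signed binary exponent, so the '-' is part of the token.

```c
/* 0x1p-3 means 1 * 2^-3; no subtraction is involved. */
double hexfloat_eighth(void)
{
    return 0x1p-3;
}

/* 0x1.8p+1 means (1 + 8/16) * 2^1. */
double hexfloat_three(void)
{
    return 0x1.8p+1;
}
```

By the same reading, 0x1p-16445L denotes 2^-16445, the smallest positive
subnormal value of the x87 extended format.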

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH 2/2] target/i386: fix IEEE x87 floating-point exception raising

2020-05-19 Thread Joseph Myers

On Tue, 19 May 2020, Richard Henderson wrote:

> > Note that another bug in the x87 emulation is the lack of setting C1 for 
> > most instructions with inexact results based on the direction of rounding 
> > (which will require a new feature to be added to the softfloat code to 
> > record that information so the x87 emulation can use it).
> 
> Wow, I don't believe I ever knew about that detail.

musl libc uses it to get correctly rounded double-precision sqrt with x87 
arithmetic.  (glibc instead temporarily sets the rounding precision to 
achieve the same goal.)

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH 2/2] target/i386: fix IEEE x87 floating-point exception raising

2020-05-19 Thread Joseph Myers

On Tue, 19 May 2020, Richard Henderson wrote:

> To retain the hard float fast path, we need to leave float_flag_invalid set
> when the accrued exception bit is set.  To me this suggests keep all of the
> FPUS_* bits in fp_status and only convert to FPUS_* when we read the fp status
> word.

There is no hard float fast path that I can see for floatx80.  The issue 
of the fast path might be relevant for fixing SSE exception handling 
(which has some similar issues to x87), but not for floatx80.

Note that another bug in the x87 emulation is the lack of setting C1 for 
most instructions with inexact results based on the direction of rounding 
(which will require a new feature to be added to the softfloat code to 
record that information so the x87 emulation can use it).

> When it comes to raising unmasked exceptions... I have a couple of thoughts.

I expect some code will be needed in each individual instruction 
implementation, and probably extra softfloat code, to handle unmasked 
exceptions.  Some exceptions, when unmasked, should result in instructions 
not popping inputs from the stack and not updating destinations.  The 
softfloat case needs to provide information about the exact underflow case 
that targets can use when that exception is set to trap.  x87 overflow and 
underflow, when unmasked and with a register destination, are supposed to 
compute and store a result with a biased exponent for use by the trap 
handler.  The code will also need to know exactly which instructions 
should result in a trap handler being called rather than only doing it for 
fwait.  Stack underflow and overflow need to be checked for, regardless of 
exception masking.  (There are other issues relating to trapped exception 
handling as well, but that's a summary of the main ones I've noticed.)

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH] target/i386: correct fix for pcmpxstrx substring search

2020-05-21 Thread Joseph Myers

This corrects a bug introduced in my previous fix for SSE4.2 pcmpestri
/ pcmpestrm / pcmpistri / pcmpistrm substring search, commit
ae35eea7e4a9f21dd147406dfbcd0c4c6aaf2a60.

That commit fixed a bug that showed up in four GCC tests with one libc
implementation.  The tests in question generate random inputs to the
intrinsics and compare results to a C implementation, but they only
test 1024 possible random inputs, and when the tests use the cases of
those instructions that work with word rather than byte inputs, it's
easy to have problematic cases that show up much less frequently than
that.  Thus, testing with a different libc implementation, and so a
different random number generator, showed up a problem with the
previous patch.

When investigating the previous test failures, I found the description
of these instructions in the Intel manuals (starting from computing a
16x16 or 8x8 set of comparison results) confusing and hard to match up
with the more optimized implementation in QEMU, and referred to AMD
manuals which described the instructions in a different way.  Those
AMD descriptions are very explicit that the whole of the string being
searched for must be found in the other operand, not running off the
end of that operand; they say "If the prototype and the SUT are equal
in length, the two strings must be identical for the comparison to be
TRUE.".  However, that statement is incorrect.

In my previous commit message, I noted:

  The operation in this case is a search for a string (argument d to
  the helper) in another string (argument s to the helper); if a copy
  of d at a particular position would run off the end of s, the
  resulting output bit should be 0 whether or not the strings match in
  the region where they overlap, but the QEMU implementation was
  wrongly comparing only up to the point where s ends and counting it
  as a match if an initial segment of d matched a terminal segment of
  s.  Here, "run off the end of s" means that some byte of d would
  overlap some byte outside of s; thus, if d has zero length, it is
  considered to match everywhere, including after the end of s.

The description "some byte of d would overlap some byte outside of s"
is accurate only when understood to refer to overlapping some byte
*within the 16-byte operand* but at or after the zero terminator; it
is valid to run over the end of s if the end of s is the end of the
16-byte operand.  So the fix in the previous patch for the case of d
being empty was correct, but the other part of that patch was not
correct (as it never allowed partial matches even at the end of the
16-byte operand).  Nor was the code before the previous patch correct
for the case of d nonempty, as it would always have allowed partial
matches at the end of s.
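
The rule as described above can be written as a simple reference model.
This is a sketch for byte elements only, under my reading of the
paragraphs above; it is not the optimized QEMU loop, and the function
name and parameters are hypothetical.  Bit j of the result is set when d
matches s starting at position j, where running past the end of the
16-byte operand is allowed but overlapping a position at or after the
end of a shorter s is not.

```c
#include <stddef.h>
#include <stdint.h>

#define OPERAND_BYTES 16

/*
 * Hypothetical reference model of the substring search mask.
 * len_s and len_d are the valid lengths (0..16) of s and d.
 */
uint16_t substring_mask(const char *s, size_t len_s,
                        const char *d, size_t len_d)
{
    uint16_t mask = 0;

    for (size_t j = 0; j < OPERAND_BYTES; j++) {
        int match = 1;

        for (size_t i = 0; i < len_d; i++) {
            size_t pos = j + i;

            if (pos >= OPERAND_BYTES) {
                break;          /* ran off the operand itself: allowed */
            }
            if (pos >= len_s || s[pos] != d[i]) {
                match = 0;      /* mismatch, or overlap past the end of s */
                break;
            }
        }
        if (match) {
            mask |= (uint16_t)(1u << j);
        }
    }
    return mask;
}
```

With s of full length 16 ending in "ab" and d = "abc", bit 14 is set
(a partial match at the end of the operand); with s of length 15 ending
in "ab", no bit is set; with an empty d, all 16 bits are set.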

Fix with a partial revert of my previous change, combined with
inserting a check for the special case of s having maximum length to
determine where it is necessary to check for matches.

In the added test, test 1 is for the case of empty strings, which
failed before my 2017 patch, test 2 is for the bug introduced by my
2017 patch and test 3 deals with the case where a match of an initial
segment at the end of the string is not valid when the string ends
before the end of the 16-byte operand (that is, the case that would be
broken by a simple revert of the non-empty-string part of my 2017
patch).

Signed-off-by: Joseph Myers 
---
 target/i386/ops_sse.h|  4 ++--
 tests/tcg/i386/Makefile.target   |  3 +++
 tests/tcg/i386/test-i386-pcmpistri.c | 33 
 3 files changed, 38 insertions(+), 2 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-pcmpistri.c

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index ec1ec745d0..f5ede2ca27 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2076,10 +2076,10 @@ static inline unsigned pcmpxstrx(CPUX86State *env, Reg *d, Reg *s,
 res = (2 << upper) - 1;
 break;
 }
-for (j = valids - validd; j >= 0; j--) {
+for (j = valids == upper ? valids : valids - validd; j >= 0; j--) {
 res <<= 1;
 v = 1;
-for (i = validd; i >= 0; i--) {
+for (i = MIN(valids - j, validd); i >= 0; i--) {
 v &= (pcmp_val(s, ctrl, i + j) == pcmp_val(d, ctrl, i));
 }
 res |= v;
diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target
index 43ee2e181e..de5a3a275f 100644
--- a/tests/tcg/i386/Makefile.target
+++ b/tests/tcg/i386/Makefile.target
@@ -10,6 +10,9 @@ ALL_X86_TESTS=$(I386_SRCS:.c=)
 SKIP_I386_TESTS=test-i386-ssse3
 X86_64_TESTS:=$(filter test-i386-ssse3, $(ALL_X86_TESTS))
 
+test-i386-pcmpistri: CFLAGS += -msse4.2
+test-i386-pcmpistri: QEMU_OPTS += -cpu max
+
 #
 # hello-i386 is a barebones app
 #
diff --git a/tests/tcg/i386/test-i386-pcmpistri.c b/tests/tcg/i386/test-i386-pcmpistri.c

Re: [PATCH 1/4] softfloat: silence sNaN for conversions to/from floatx80

2020-05-01 Thread Joseph Myers

On Fri, 1 May 2020, Alex Bennée wrote:

> I still see some failures for:
> 
>   f64_to_extF80
>   f128_to_extF80

Running what I think are those tests, I see e.g.

./fp-test -s -l 1 -r all f64_to_extF80
>> Testing f64_to_extF80
768 tests total.
Errors found in f64_to_extF80:
-368.800FF
=> -3F68.C007F800 .  expected -3F68.C000 x

which looks like it's a test of the floatx80 format with 24-bit precision.

If that's what this is testing, then:

(a) float64_to_floatx80 would need, in 24-bit mode, to call 
roundAndPackFloatx80 rather than just packFloatx80, to get appropriate 
rounding;

(b) float128_to_floatx80 would need to use the dynamically specified 
rounding precision in its call to roundAndPackFloatx80 instead of 
hardcoded 80;

(c) but i386 instruction semantics are that a load of a double value into 
a floating-point register, in the 24-bit mode, does *not* convert the 
significand to 24-bit precision, but loads the full 53-bit-precision value 
into the register, so making such a change to float64_to_floatx80 would 
render it incorrect for i386 emulation without changes to the target/i386 
code to adjust the rounding precision used for loads;

(d) float128_to_floatx80 shouldn't actually be used by any QEMU target, 
because no supported CPU architecture has support for both formats in 
hardware (although I made my sNaN change to the conversions between them 
anyway for completeness).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [RFC PATCH 08/21] contrib/gitdm: Add Mentor Graphics to the domain map

2020-10-05 Thread Joseph Myers

On Mon, 5 Oct 2020, Alex Bennée wrote:

> Joseph Myers  writes:
> 
> > On Sun, 4 Oct 2020, Philippe Mathieu-Daudé wrote:
> >
> >> There is a number of contributors from this domain,
> >> add its own entry to the gitdm domain map.
> >
> > At some point the main branding will be Siemens; not sure how you want to 
> > handle that.
> 
> We've already done something similar with WaveComp who have rolled up
> the various mips and imgtec contributions into
> contrib/gitdm/group-map-wavecomp.
> 
> It's really up to you and which corporate entity would like internet
> bragging points. The only Siemens contributor I could find is Jan Kiszka
> but he has contributed a fair amount ;-)

Given that the Mentor branding is going away (the "Mentor Graphics" 
version has largely gone away already; "Mentor, a Siemens Business" is 
what's currently used as a Mentor brand), it probably makes sense to use 
Siemens for both codesourcery.com and mentor.com addresses.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [RFC PATCH 08/21] contrib/gitdm: Add Mentor Graphics to the domain map

2020-10-05 Thread Joseph Myers

On Sun, 4 Oct 2020, Philippe Mathieu-Daudé wrote:

> There is a number of contributors from this domain,
> add its own entry to the gitdm domain map.

At some point the main branding will be Siemens; not sure how you want to 
handle that.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH 0/4] target/i386: miscellaneous x87 fixes

2020-06-02 Thread Joseph Myers

Ping for this patch series 
, and 
the subsequent series 
 and 
individual patch 
.

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH v2] target/i386: reimplement fpatan using floatx80 operations

2020-06-22 Thread Joseph Myers

The x87 fpatan emulation is currently based around conversion to
double.  This is inherently unsuitable for a good emulation of any
floatx80 operation.  Reimplement using the soft-float operations, as
for other such instructions.

Signed-off-by: Joseph Myers 

---

Changes in version 2: adjust the "Dividing ST1 by ST0 gives the
correct result." case to ensure correct exceptions, as well as a
correctly rounded result in non-to-nearest modes, when the division is
exact.
---
 target/i386/fpu_helper.c  |  487 -
 tests/tcg/i386/test-i386-fpatan.c | 1071 +
 2 files changed, 1554 insertions(+), 4 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-fpatan.c

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 62820bc735..71cec3962f 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -1239,14 +1239,493 @@ void helper_fptan(CPUX86State *env)
 }
 }
 
+/* Values of pi/4, pi/2, 3pi/4 and pi, with 128-bit precision.  */
+#define pi_4_exp 0x3ffe
+#define pi_4_sig_high 0xc90fdaa22168c234ULL
+#define pi_4_sig_low 0xc4c6628b80dc1cd1ULL
+#define pi_2_exp 0x3fff
+#define pi_2_sig_high 0xc90fdaa22168c234ULL
+#define pi_2_sig_low 0xc4c6628b80dc1cd1ULL
+#define pi_34_exp 0x4000
+#define pi_34_sig_high 0x96cbe3f9990e91a7ULL
+#define pi_34_sig_low 0x9394c9e8a0a5159dULL
+#define pi_exp 0x4000
+#define pi_sig_high 0xc90fdaa22168c234ULL
+#define pi_sig_low 0xc4c6628b80dc1cd1ULL
+
+/*
+ * Polynomial coefficients for an approximation to atan(x), with only
+ * odd powers of x used, for x in the interval [-1/16, 1/16].  (Unlike
+ * for some other approximations, no low part is needed for the first
+ * coefficient here to achieve a sufficiently accurate result, because
+ * the coefficient in this minimax approximation is very close to
+ * exactly 1.)
+ */
+#define fpatan_coeff_0 make_floatx80(0x3fff, 0x8000000000000000ULL)
+#define fpatan_coeff_1 make_floatx80(0xbffd, 0xaaaaaaaaaaaaaa43ULL)
+#define fpatan_coeff_2 make_floatx80(0x3ffc, 0xccccccccccbfe4f8ULL)
+#define fpatan_coeff_3 make_floatx80(0xbffc, 0x92492491fbab2e66ULL)
+#define fpatan_coeff_4 make_floatx80(0x3ffb, 0xe38e372881ea1e0bULL)
+#define fpatan_coeff_5 make_floatx80(0xbffb, 0xba2c0104bbdd0615ULL)
+#define fpatan_coeff_6 make_floatx80(0x3ffb, 0x9baf7ebf898b42efULL)
+
+struct fpatan_data {
+/* High and low parts of atan(x).  */
+floatx80 atan_high, atan_low;
+};
+
+static const struct fpatan_data fpatan_table[9] = {
+{ floatx80_zero,
+  floatx80_zero },
+{ make_floatx80(0x3ffb, 0xfeadd4d5617b6e33ULL),
+  make_floatx80(0xbfb9, 0xdda19d8305ddc420ULL) },
+{ make_floatx80(0x3ffc, 0xfadbafc96406eb15ULL),
+  make_floatx80(0x3fbb, 0xdb8f3debef442fccULL) },
+{ make_floatx80(0x3ffd, 0xb7b0ca0f26f78474ULL),
+  make_floatx80(0xbfbc, 0xeab9bdba460376faULL) },
+{ make_floatx80(0x3ffd, 0xed63382b0dda7b45ULL),
+  make_floatx80(0x3fbc, 0xdfc88bd978751a06ULL) },
+{ make_floatx80(0x3ffe, 0x8f005d5ef7f59f9bULL),
+  make_floatx80(0x3fbd, 0xb906bc2ccb886e90ULL) },
+{ make_floatx80(0x3ffe, 0xa4bc7d1934f70924ULL),
+  make_floatx80(0x3fbb, 0xcd43f9522bed64f8ULL) },
+{ make_floatx80(0x3ffe, 0xb8053e2bc2319e74ULL),
+  make_floatx80(0xbfbc, 0xd3496ab7bd6eef0cULL) },
+{ make_floatx80(0x3ffe, 0xc90fdaa22168c235ULL),
+  make_floatx80(0xbfbc, 0xece675d1fc8f8cbcULL) },
+};
+
 void helper_fpatan(CPUX86State *env)
 {
-double fptemp, fpsrcop;
+uint8_t old_flags = save_exception_flags(env);
+uint64_t arg0_sig = extractFloatx80Frac(ST0);
+int32_t arg0_exp = extractFloatx80Exp(ST0);
+bool arg0_sign = extractFloatx80Sign(ST0);
+uint64_t arg1_sig = extractFloatx80Frac(ST1);
+int32_t arg1_exp = extractFloatx80Exp(ST1);
+bool arg1_sign = extractFloatx80Sign(ST1);
+
+if (floatx80_is_signaling_nan(ST0, &env->fp_status)) {
+float_raise(float_flag_invalid, &env->fp_status);
+ST1 = floatx80_silence_nan(ST0, &env->fp_status);
+} else if (floatx80_is_signaling_nan(ST1, &env->fp_status)) {
+float_raise(float_flag_invalid, &env->fp_status);
+ST1 = floatx80_silence_nan(ST1, &env->fp_status);
+} else if (floatx80_invalid_encoding(ST0) ||
+   floatx80_invalid_encoding(ST1)) {
+float_raise(float_flag_invalid, &env->fp_status);
+ST1 = floatx80_default_nan(&env->fp_status);
+} else if (floatx80_is_any_nan(ST0)) {
+ST1 = ST0;
+} else if (floatx80_is_any_nan(ST1)) {
+/* Pass this NaN through.  */
+} else if (floatx80_is_zero(ST1) && !arg0_sign) {
+/* Pass this zero through.  */
+} else if (((floatx80_is_infinity(ST0) && !floatx80_is_infinity(ST1)) ||
+ arg0_exp - arg1_exp >= 80) &&
+   !arg0_sign) {
+/*
+ * Dividing ST1 by ST0 gives the correct result up to
+ * rounding, and avoids spurious underflow exceptions that
+ * might result from

Re: [PATCH v2] target/i386: reimplement fpatan using floatx80 operations

2020-06-23 Thread Joseph Myers

On Tue, 23 Jun 2020, Paolo Bonzini wrote:

> On 23/06/20 02:01, Joseph Myers wrote:
> > The x87 fpatan emulation is currently based around conversion to
> > double.  This is inherently unsuitable for a good emulation of any
> > floatx80 operation.  Reimplement using the soft-float operations, as
> > for other such instructions.
> > 
> > Signed-off-by: Joseph Myers 
> 
> Queued, thanks.
> 
> Just one question: do recent processors still use the same CORDIC
> approximations as the 8087, and if so would it be better or simpler to
> do that instead of using a good implementation such as this one?

I don't know what approximations the processors use, but they're 
definitely different for at least some instructions between Intel and AMD 
processors (as shown by glibc test ulps baselines created on one processor 
sometimes needing increasing to work on other processors; avoiding test 
problems means the emulation needs to be at least as accurate as 
hardware).  (Whereas the AVX-512 approximation instructions have reference 
implementations for their exact semantics.)

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 0/2] target/i386: SSE floating-point fixes

2020-06-25 Thread Joseph Myers

Fix some issues relating to SSE floating-point emulation.  The first
patch fixes a problem with the handling of the FTZ bit that was found
through the testcase written for the second patch.  Rather than
writing a separate standalone test for that bug, it seemed sufficient
for the testcase in the second patch to cover both patches.

The style checker will produce its usual inapplicable warnings about
use of "volatile" in the testcase and about C99 hex float constants.

Joseph Myers (2):
  target/i386: set SSE FTZ in correct floating-point state
  target/i386: fix IEEE SSE floating-point exception raising

 target/i386/cpu.h |   1 +
 target/i386/fpu_helper.c  |  35 +-
 target/i386/gdbstub.c |   1 +
 target/i386/helper.c  |   1 +
 target/i386/helper.h  |   1 +
 target/i386/ops_sse.h |  28 +-
 target/i386/translate.c   |   1 +
 tests/tcg/i386/Makefile.target|   4 +
 tests/tcg/i386/test-i386-sse-exceptions.c | 813 ++
 9 files changed, 872 insertions(+), 13 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-sse-exceptions.c

-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 2/2] target/i386: fix IEEE SSE floating-point exception raising

2020-06-25 Thread Joseph Myers

The SSE instruction implementations all fail to raise the expected
IEEE floating-point exceptions because they do nothing to convert the
exception state from the softfloat machinery into the exception flags
in MXCSR.

Fix this by adding such conversions.  Unlike for x87, emulated SSE
floating-point operations might be optimized using hardware floating
point on the host, and so a different approach is taken that is
compatible with such optimizations.  The required invariant is that
all exceptions set in env->sse_status (other than "denormal operand",
for which the SSE semantics are different from those in the softfloat
code) are ones that are set in the MXCSR; the emulated MXCSR is
updated lazily when code reads MXCSR, while when code sets MXCSR, the
exceptions in env->sse_status are set accordingly.
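
As a rough sketch of the invariant described above (not the actual
patch): the structure, function names and softfloat-style flag values
below are hypothetical, and only the MXCSR bit positions are the
architectural ones.  Writing MXCSR replaces the softfloat-side flags;
reading MXCSR folds any newly raised flags back in.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural MXCSR exception flag bits. */
#define MXCSR_IE 0x01
#define MXCSR_ZE 0x04
#define MXCSR_OE 0x08
#define MXCSR_UE 0x10
#define MXCSR_PE 0x20

/* Hypothetical softfloat-style flags; the values are illustrative only. */
enum {
    SOFT_INVALID = 1,
    SOFT_DIVBYZERO = 2,
    SOFT_OVERFLOW = 4,
    SOFT_UNDERFLOW = 8,
    SOFT_INEXACT = 16,
};

/* Hypothetical model of the state involved. */
struct sse_model {
    uint32_t mxcsr;
    uint8_t soft_flags;
};

/* Guest writes MXCSR: softfloat flags become exactly the MXCSR flags. */
static void model_write_mxcsr(struct sse_model *m, uint32_t val)
{
    m->mxcsr = val;
    m->soft_flags = (val & MXCSR_IE ? SOFT_INVALID : 0) |
                    (val & MXCSR_ZE ? SOFT_DIVBYZERO : 0) |
                    (val & MXCSR_OE ? SOFT_OVERFLOW : 0) |
                    (val & MXCSR_UE ? SOFT_UNDERFLOW : 0) |
                    (val & MXCSR_PE ? SOFT_INEXACT : 0);
}

/* Guest reads MXCSR: lazily merge flags raised since the last write. */
static uint32_t model_read_mxcsr(struct sse_model *m)
{
    m->mxcsr |= (m->soft_flags & SOFT_INVALID ? MXCSR_IE : 0) |
                (m->soft_flags & SOFT_DIVBYZERO ? MXCSR_ZE : 0) |
                (m->soft_flags & SOFT_OVERFLOW ? MXCSR_OE : 0) |
                (m->soft_flags & SOFT_UNDERFLOW ? MXCSR_UE : 0) |
                (m->soft_flags & SOFT_INEXACT ? MXCSR_PE : 0);
    return m->mxcsr;
}

void sse_model_check(void)
{
    struct sse_model m = { 0, 0 };

    model_write_mxcsr(&m, 0x1f80);
    assert(m.soft_flags == 0);
    m.soft_flags |= SOFT_INEXACT;   /* stands in for an emulated operation */
    assert(model_read_mxcsr(&m) == 0x1fa0);
}
```

Because flags only accumulate between a write and the next read, the
merge on read can be a plain OR.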

A few instructions do not raise all the exceptions that would be
raised by the softfloat code, and those instructions are made to save
and restore the softfloat exception state accordingly.

Nothing is done about "denormal operand"; setting that (only for the
case when input denormals are *not* flushed to zero, the opposite of
the logic in the softfloat code for such an exception) will require
custom code for relevant instructions, or else architecture-specific
conditionals in the softfloat code for when to set such an exception
together with custom code for various SSE conversion and rounding
instructions that do not set that exception.

Nothing is done about trapping exceptions (for which there is minimal
and largely broken support in QEMU's emulation in the x87 case and no
support at all in the SSE case).

Signed-off-by: Joseph Myers 
---
 target/i386/cpu.h |   1 +
 target/i386/fpu_helper.c  |  33 +
 target/i386/gdbstub.c |   1 +
 target/i386/helper.c  |   1 +
 target/i386/helper.h  |   1 +
 target/i386/ops_sse.h |  28 +-
 target/i386/translate.c   |   1 +
 tests/tcg/i386/Makefile.target|   4 +
 tests/tcg/i386/test-i386-sse-exceptions.c | 813 ++
 9 files changed, 871 insertions(+), 12 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-sse-exceptions.c

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 7d77efd9e4..06b2e3a5c6 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2143,6 +2143,7 @@ static inline bool cpu_vmx_maybe_enabled(CPUX86State *env)
 /* fpu_helper.c */
 void update_fp_status(CPUX86State *env);
 void update_mxcsr_status(CPUX86State *env);
+void update_mxcsr_from_sse_status(CPUX86State *env);
 
 static inline void cpu_set_mxcsr(CPUX86State *env, uint32_t mxcsr)
 {
diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 6590ce482f..bc79812fe3 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -1397,6 +1397,7 @@ static void do_xsave_fpu(CPUX86State *env, target_ulong ptr, uintptr_t ra)
 
 static void do_xsave_mxcsr(CPUX86State *env, target_ulong ptr, uintptr_t ra)
 {
+update_mxcsr_from_sse_status(env);
 cpu_stl_data_ra(env, ptr + XO(legacy.mxcsr), env->mxcsr, ra);
 cpu_stl_data_ra(env, ptr + XO(legacy.mxcsr_mask), 0x0000ffff, ra);
 }
@@ -1826,6 +1827,14 @@ void update_mxcsr_status(CPUX86State *env)
 }
 set_float_rounding_mode(rnd_type, &env->sse_status);
 
+/* Set exception flags.  */
+set_float_exception_flags((mxcsr & FPUS_IE ? float_flag_invalid : 0) |
+  (mxcsr & FPUS_ZE ? float_flag_divbyzero : 0) |
+  (mxcsr & FPUS_OE ? float_flag_overflow : 0) |
+  (mxcsr & FPUS_UE ? float_flag_underflow : 0) |
+  (mxcsr & FPUS_PE ? float_flag_inexact : 0),
+  &env->sse_status);
+
 /* set denormals are zero */
 set_flush_inputs_to_zero((mxcsr & SSE_DAZ) ? 1 : 0, &env->sse_status);
 
@@ -1833,6 +1842,30 @@ void update_mxcsr_status(CPUX86State *env)
 set_flush_to_zero((mxcsr & SSE_FZ) ? 1 : 0, &env->sse_status);
 }
 
+void update_mxcsr_from_sse_status(CPUX86State *env)
+{
+uint8_t flags = get_float_exception_flags(&env->sse_status);
+/*
+ * The MXCSR denormal flag has opposite semantics to
+ * float_flag_input_denormal (the softfloat code sets that flag
+ * only when flushing input denormals to zero, but SSE sets it
+ * only when not flushing them to zero), so is not converted
+ * here.
+ */
+env->mxcsr |= ((flags & float_flag_invalid ? FPUS_IE : 0) |
+   (flags & float_flag_divbyzero ? FPUS_ZE : 0) |
+   (flags & float_flag_overflow ? FPUS_OE : 0) |
+   (flags & float_flag_underflow ? FPUS_UE : 0) |
+   (flags & float_flag_inexact ? FPUS_PE : 0) |
+   (flags & float_flag_output_denor

[PATCH 1/2] target/i386: set SSE FTZ in correct floating-point state

2020-06-25 Thread Joseph Myers

The code to set floating-point state when MXCSR changes calls
set_flush_to_zero on &env->fp_status, so affecting the x87
floating-point state rather than the SSE state.  Fix to call it for
&env->sse_status instead.

Signed-off-by: Joseph Myers 
---
 target/i386/fpu_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 8ef5b463ea..6590ce482f 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -1830,7 +1830,7 @@ void update_mxcsr_status(CPUX86State *env)
 set_flush_inputs_to_zero((mxcsr & SSE_DAZ) ? 1 : 0, &env->sse_status);
 
 /* set flush to zero */
-set_flush_to_zero((mxcsr & SSE_FZ) ? 1 : 0, &env->fp_status);
+set_flush_to_zero((mxcsr & SSE_FZ) ? 1 : 0, &env->sse_status);
 }
 
 void helper_ldmxcsr(CPUX86State *env, uint32_t val)
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] target/i386: reimplement fpatan using floatx80 operations

2020-06-19 Thread Joseph Myers

Testing with the glibc testsuite shows this patch needs a little more work 
to get correct underflow/inexact exceptions in the case where ST0 is 
positive and ST1/ST0 is small.  I'll send a revised patch next week (I 
don't expect any changes in the rest of the code).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] target/i386: reimplement fpatan using floatx80 operations

2020-06-19 Thread Joseph Myers

On Fri, 19 Jun 2020, no-re...@patchew.org wrote:

> This series failed the docker-mingw@fedora build test. Please find the 
> testing commands and their output below. If you have Docker installed, 
> you can probably reproduce it locally.

This is because the patch depends on my previous patch to reimplement 
f2xm1, which adds an include of fpu/softfloat-macros.h to 
target/i386/fpu_helper.c.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] target/i386: reimplement fpatan using floatx80 operations

2020-06-19 Thread Joseph Myers

On Fri, 19 Jun 2020, no-re...@patchew.org wrote:

> This series seems to have some coding style problems. See output below for
> more information:

These are the same issues as before: this patch checker does not understand 
hex float constants, and it does not seem particularly useful to wrap lines 
in a large table of randomly generated tests.

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH] target/i386: reimplement fpatan using floatx80 operations

2020-06-19 Thread Joseph Myers

The x87 fpatan emulation is currently based around conversion to
double.  This is inherently unsuitable for a good emulation of any
floatx80 operation.  Reimplement using the soft-float operations, as
for other such instructions.

Signed-off-by: Joseph Myers 
---
 target/i386/fpu_helper.c  |  461 -
 tests/tcg/i386/test-i386-fpatan.c | 1071 +
 2 files changed, 1528 insertions(+), 4 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-fpatan.c

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 62820bc735..7436c62c9b 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -1239,14 +1239,467 @@ void helper_fptan(CPUX86State *env)
 }
 }
 
+/* Values of pi/4, pi/2, 3pi/4 and pi, with 128-bit precision.  */
+#define pi_4_exp 0x3ffe
+#define pi_4_sig_high 0xc90fdaa22168c234ULL
+#define pi_4_sig_low 0xc4c6628b80dc1cd1ULL
+#define pi_2_exp 0x3fff
+#define pi_2_sig_high 0xc90fdaa22168c234ULL
+#define pi_2_sig_low 0xc4c6628b80dc1cd1ULL
+#define pi_34_exp 0x4000
+#define pi_34_sig_high 0x96cbe3f9990e91a7ULL
+#define pi_34_sig_low 0x9394c9e8a0a5159dULL
+#define pi_exp 0x4000
+#define pi_sig_high 0xc90fdaa22168c234ULL
+#define pi_sig_low 0xc4c6628b80dc1cd1ULL
+
+/*
+ * Polynomial coefficients for an approximation to atan(x), with only
+ * odd powers of x used, for x in the interval [-1/16, 1/16].  (Unlike
+ * for some other approximations, no low part is needed for the first
+ * coefficient here to achieve a sufficiently accurate result, because
+ * the coefficient in this minimax approximation is very close to
+ * exactly 1.)
+ */
+#define fpatan_coeff_0 make_floatx80(0x3fff, 0x8000000000000000ULL)
+#define fpatan_coeff_1 make_floatx80(0xbffd, 0xaaaaaaaaaaaaaa43ULL)
+#define fpatan_coeff_2 make_floatx80(0x3ffc, 0xccccccccccbfe4f8ULL)
+#define fpatan_coeff_3 make_floatx80(0xbffc, 0x92492491fbab2e66ULL)
+#define fpatan_coeff_4 make_floatx80(0x3ffb, 0xe38e372881ea1e0bULL)
+#define fpatan_coeff_5 make_floatx80(0xbffb, 0xba2c0104bbdd0615ULL)
+#define fpatan_coeff_6 make_floatx80(0x3ffb, 0x9baf7ebf898b42efULL)
+
+struct fpatan_data {
+/* High and low parts of atan(x).  */
+floatx80 atan_high, atan_low;
+};
+
+static const struct fpatan_data fpatan_table[9] = {
+{ floatx80_zero,
+  floatx80_zero },
+{ make_floatx80(0x3ffb, 0xfeadd4d5617b6e33ULL),
+  make_floatx80(0xbfb9, 0xdda19d8305ddc420ULL) },
+{ make_floatx80(0x3ffc, 0xfadbafc96406eb15ULL),
+  make_floatx80(0x3fbb, 0xdb8f3debef442fccULL) },
+{ make_floatx80(0x3ffd, 0xb7b0ca0f26f78474ULL),
+  make_floatx80(0xbfbc, 0xeab9bdba460376faULL) },
+{ make_floatx80(0x3ffd, 0xed63382b0dda7b45ULL),
+  make_floatx80(0x3fbc, 0xdfc88bd978751a06ULL) },
+{ make_floatx80(0x3ffe, 0x8f005d5ef7f59f9bULL),
+  make_floatx80(0x3fbd, 0xb906bc2ccb886e90ULL) },
+{ make_floatx80(0x3ffe, 0xa4bc7d1934f70924ULL),
+  make_floatx80(0x3fbb, 0xcd43f9522bed64f8ULL) },
+{ make_floatx80(0x3ffe, 0xb8053e2bc2319e74ULL),
+  make_floatx80(0xbfbc, 0xd3496ab7bd6eef0cULL) },
+{ make_floatx80(0x3ffe, 0xc90fdaa22168c235ULL),
+  make_floatx80(0xbfbc, 0xece675d1fc8f8cbcULL) },
+};
+
 void helper_fpatan(CPUX86State *env)
 {
-double fptemp, fpsrcop;
+uint8_t old_flags = save_exception_flags(env);
+uint64_t arg0_sig = extractFloatx80Frac(ST0);
+int32_t arg0_exp = extractFloatx80Exp(ST0);
+bool arg0_sign = extractFloatx80Sign(ST0);
+uint64_t arg1_sig = extractFloatx80Frac(ST1);
+int32_t arg1_exp = extractFloatx80Exp(ST1);
+bool arg1_sign = extractFloatx80Sign(ST1);
+
+if (floatx80_is_signaling_nan(ST0, &env->fp_status)) {
+float_raise(float_flag_invalid, &env->fp_status);
+ST1 = floatx80_silence_nan(ST0, &env->fp_status);
+} else if (floatx80_is_signaling_nan(ST1, &env->fp_status)) {
+float_raise(float_flag_invalid, &env->fp_status);
+ST1 = floatx80_silence_nan(ST1, &env->fp_status);
+} else if (floatx80_invalid_encoding(ST0) ||
+   floatx80_invalid_encoding(ST1)) {
+float_raise(float_flag_invalid, &env->fp_status);
+ST1 = floatx80_default_nan(&env->fp_status);
+} else if (floatx80_is_any_nan(ST0)) {
+ST1 = ST0;
+} else if (floatx80_is_any_nan(ST1)) {
+/* Pass this NaN through.  */
+} else if (floatx80_is_zero(ST1) && !arg0_sign) {
+/* Pass this zero through.  */
+} else if (((floatx80_is_infinity(ST0) && !floatx80_is_infinity(ST1)) ||
+ arg0_exp - arg1_exp >= 80) &&
+   !arg0_sign) {
+/* Dividing ST1 by ST0 gives the correct result.  */
+signed char save_prec = env->fp_status.floatx80_rounding_precision;
+env->fp_status.floatx80_rounding_precision = 80;
+ST1 = floatx80_div(ST1, ST0, &env->fp_status);
+env->fp_status.floatx80_rounding_precision = save_prec;
+} else {
+/* The result is inexact.  */
+bool rs

Re: [PATCH 1/5] target/i386: implement special cases for fxtract

2020-06-23 Thread Joseph Myers

On Tue, 23 Jun 2020, Eduardo Habkost wrote:

> > +if (EXPD(temp) == 0) {
> > +int shift = clz64(temp.l.lower);
> > +temp.l.lower <<= shift;
> 
> Coverity reports the following.  It looks like a false positive
> because floatx80_is_zero() would be true if both EXPD(temp) and
> temp.l.lower were zero, but maybe I'm missing something.

Yes, that looks like a false positive to me.

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH] target/i386: reimplement fyl2xp1 using floatx80 operations

2020-06-16 Thread Joseph Myers

The x87 fyl2xp1 emulation is currently based around conversion to
double.  This is inherently unsuitable for a good emulation of any
floatx80 operation, even before considering that it is a particularly
naive implementation using double (adding 1 then using log rather than
attempting a better emulation using log1p).
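
To illustrate why adding 1 before taking the logarithm is naive even
within double, here is a minimal sketch (the function name is
hypothetical).  It shows how much of x survives forming 1.0 + x in
double; a log1p-style approach avoids forming that sum at all.

```c
#include <assert.h>

/*
 * Return what is left of x after computing 1.0 + x in double and
 * subtracting the 1.0 again.  The volatile forces the sum to be
 * rounded to double even where excess precision is in use.
 */
static double x_surviving_add_one(double x)
{
    volatile double sum = 1.0 + x;
    return sum - 1.0;
}

void x_surviving_add_one_check(void)
{
    /* 0.5 is preserved exactly. */
    assert(x_surviving_add_one(0.5) == 0.5);
    /* 2^-60 is below half an ulp of 1.0, so it is lost entirely. */
    assert(x_surviving_add_one(0x1p-60) == 0.0);
}
```

For such a small argument the old code would therefore compute
log(1.0), that is zero, whatever the value of ST0.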

Reimplement using the soft-float operations, as was done for f2xm1; as
in that case, m68k has related operations but not exactly this one and
it seemed safest to implement directly rather than reusing the m68k
code to avoid accumulation of errors.

A test is included with many randomly generated inputs.  The
assumption of the test is that the result in round-to-nearest mode
should always be one of the two closest floating-point numbers to the
mathematical value of y * log2(x + 1); the implementation aims to do
somewhat better than that (about 70 correct bits before rounding).  I
haven't investigated how accurate hardware is.

Intel manuals describe a narrower range of valid arguments to this
instruction than AMD manuals.  The implementation accepts the wider
range (it's needed anyway for the core code to be reusable in a
subsequent patch reimplementing fyl2x), but the test only has inputs
in the narrower range so that it's valid on hardware that may reject
or produce poor results for inputs outside that range.

Code in the previous implementation that sets C2 for some out-of-range
arguments is not carried forward to the new implementation; C2 is
undefined for this instruction and I suspect that code was just
cut-and-pasted from the trigonometric instructions (fcos, fptan, fsin,
fsincos) where C2 *is* defined to be set for out-of-range arguments.

Signed-off-by: Joseph Myers 

---

This patch *does* depend on my previous one for f2xm1, but only at the
trivial level of needing a #include added by the previous patch.
---
 target/i386/fpu_helper.c   |  208 -
 tests/tcg/i386/test-i386-fyl2xp1.c | 1155 
 2 files changed, 1354 insertions(+), 9 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-fyl2xp1.c

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 8f34ea9776..63b8d20824 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -1373,19 +1373,209 @@ void helper_fprem(CPUX86State *env)
 helper_fprem_common(env, true);
 }
 
+/* 128-bit significand of log2(e).  */
+#define log2_e_sig_high 0xb8aa3b295c17f0bbULL
+#define log2_e_sig_low 0xbe87fed0691d3e89ULL
+
+/*
+ * Polynomial coefficients for an approximation to log2((1+x)/(1-x)),
+ * with only odd powers of x used, for x in the interval [2*sqrt(2)-3,
+ * 3-2*sqrt(2)], which corresponds to logarithms of numbers in the
+ * interval [sqrt(2)/2, sqrt(2)].
+ */
+#define fyl2x_coeff_0 make_floatx80(0x4000, 0xb8aa3b295c17f0bcULL)
+#define fyl2x_coeff_0_low make_floatx80(0xbfbf, 0x834972fe2d7bab1bULL)
+#define fyl2x_coeff_1 make_floatx80(0x3ffe, 0xf6384ee1d01febb8ULL)
+#define fyl2x_coeff_2 make_floatx80(0x3ffe, 0x93bb62877cdfa2e3ULL)
+#define fyl2x_coeff_3 make_floatx80(0x3ffd, 0xd30bb153d808f269ULL)
+#define fyl2x_coeff_4 make_floatx80(0x3ffd, 0xa42589eaf451499eULL)
+#define fyl2x_coeff_5 make_floatx80(0x3ffd, 0x864d42c0f8f17517ULL)
+#define fyl2x_coeff_6 make_floatx80(0x3ffc, 0xe3476578adf26272ULL)
+#define fyl2x_coeff_7 make_floatx80(0x3ffc, 0xc506c5f874e6d80fULL)
+#define fyl2x_coeff_8 make_floatx80(0x3ffc, 0xac5cf50cc57d6372ULL)
+#define fyl2x_coeff_9 make_floatx80(0x3ffc, 0xb1ed0066d971a103ULL)
+
 void helper_fyl2xp1(CPUX86State *env)
 {
-double fptemp = floatx80_to_double(env, ST0);
-
-if ((fptemp + 1.0) > 0.0) {
-fptemp = log(fptemp + 1.0) / log(2.0); /* log2(ST + 1.0) */
-fptemp *= floatx80_to_double(env, ST1);
-ST1 = double_to_floatx80(env, fptemp);
-fpop(env);
+uint8_t old_flags = save_exception_flags(env);
+uint64_t arg0_sig = extractFloatx80Frac(ST0);
+int32_t arg0_exp = extractFloatx80Exp(ST0);
+bool arg0_sign = extractFloatx80Sign(ST0);
+uint64_t arg1_sig = extractFloatx80Frac(ST1);
+int32_t arg1_exp = extractFloatx80Exp(ST1);
+bool arg1_sign = extractFloatx80Sign(ST1);
+
+if (floatx80_is_signaling_nan(ST0, &env->fp_status)) {
+float_raise(float_flag_invalid, &env->fp_status);
+ST1 = floatx80_silence_nan(ST0, &env->fp_status);
+} else if (floatx80_is_signaling_nan(ST1, &env->fp_status)) {
+float_raise(float_flag_invalid, &env->fp_status);
+ST1 = floatx80_silence_nan(ST1, &env->fp_status);
+} else if (floatx80_invalid_encoding(ST0) ||
+   floatx80_invalid_encoding(ST1)) {
+float_raise(float_flag_invalid, &env->fp_status);
+ST1 = floatx80_default_nan(&env->fp_status);
+} else if (floatx80_is_any_nan(ST0)) {
+ST1 = ST0;
+} else if (floatx80_is_any_nan(ST1)) {
+/* Pass this NaN through.  */
+} else if (arg0_exp > 0x3ffd ||
+   (arg0_exp == 0

[PATCH 5/7] softfloat: return low bits of quotient from floatx80_modrem

2020-06-05 Thread Joseph Myers

Both x87 and m68k need the low parts of the quotient for their
remainder operations.  Arrange for floatx80_modrem to track those bits
and return them via a pointer.

The architectures using float32_rem and float64_rem do not appear to
need this information, so the *_rem interface is left unchanged and
the information returned only from floatx80_modrem.  The logic used to
determine the low 7 bits of the quotient for m68k
(target/m68k/fpu_helper.c:make_quotient) appears completely bogus (it
looks at the result of converting the remainder to integer, the
quotient having been discarded by that point); this patch does not
change that, but the m68k maintainers may wish to do so.
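
For context, a minimal sketch of how a caller might consume the low
quotient bits on the x87 side, and of how the low 64 bits can be
accumulated chunk by chunk.  The macro and function names here are
hypothetical, not those used in target/i386; the status-word bit
positions and the mapping of quotient bits to C0, C3 and C1 follow the
documented behavior of FPREM.

```c
#include <assert.h>
#include <stdint.h>

/* x87 status word condition code bits. */
#define X87_C0 0x0100
#define X87_C1 0x0200
#define X87_C3 0x4000

/* FPREM reports quotient bits 2, 1, 0 in C0, C3, C1 respectively. */
static uint16_t quotient_to_cc(uint64_t quotient)
{
    uint16_t cc = 0;

    if (quotient & 4) {
        cc |= X87_C0;
    }
    if (quotient & 2) {
        cc |= X87_C3;
    }
    if (quotient & 1) {
        cc |= X87_C1;
    }
    return cc;
}

/*
 * Append a chunk of 'bits' new quotient bits, keeping only the low 64
 * bits.  Shifting a 64-bit value by 64 or more is undefined in C, so
 * that case is handled separately.
 */
static uint64_t append_quotient_chunk(uint64_t low, uint64_t chunk, int bits)
{
    if (bits < 64) {
        low <<= bits;
    } else {
        low = 0;
    }
    return low + chunk;
}

void quotient_helpers_check(void)
{
    assert(quotient_to_cc(5) == (X87_C0 | X87_C1));
    assert(quotient_to_cc(2) == X87_C3);
    assert(append_quotient_chunk(1, 3, 2) == 7);
    assert(append_quotient_chunk(5, 1, 64) == 1);
}
```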

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c | 23 ++-
 include/fpu/softfloat.h |  3 ++-
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 423a815196..c3c3f382af 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5684,10 +5684,11 @@ floatx80 floatx80_div(floatx80 a, floatx80 b, float_status *status)
 | `a' with respect to the corresponding value `b'.  The operation is performed
 | according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic,
 | if 'mod' is false; if 'mod' is true, return the remainder based on truncating
-| the quotient toward zero instead.
+| the quotient toward zero instead.  '*quotient' is set to the low 64 bits of
+| the absolute value of the integer quotient.
 **/
 
-floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
+floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod, uint64_t *quotient,
  float_status *status)
 {
 bool aSign, zSign;
@@ -5695,6 +5696,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 uint64_t aSig0, aSig1, bSig;
 uint64_t q, term0, term1, alternateASig0, alternateASig1;
 
+*quotient = 0;
 if (floatx80_invalid_encoding(a) || floatx80_invalid_encoding(b)) {
 float_raise(float_flag_invalid, status);
 return floatx80_default_nan(status);
@@ -5749,7 +5751,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 shift128Right( aSig0, 0, 1, &aSig0, &aSig1 );
 expDiff = 0;
 }
-q = ( bSig <= aSig0 );
+*quotient = q = ( bSig <= aSig0 );
 if ( q ) aSig0 -= bSig;
 expDiff -= 64;
 while ( 0 < expDiff ) {
@@ -5759,6 +5761,8 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 sub128( aSig0, aSig1, term0, term1, &aSig0, &aSig1 );
 shortShift128Left( aSig0, aSig1, 62, &aSig0, &aSig1 );
 expDiff -= 62;
+*quotient <<= 62;
+*quotient += q;
 }
 expDiff += 64;
 if ( 0 < expDiff ) {
@@ -5772,6 +5776,12 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 ++q;
 sub128( aSig0, aSig1, term0, term1, &aSig0, &aSig1 );
 }
+if (expDiff < 64) {
+*quotient <<= expDiff;
+} else {
+*quotient = 0;
+}
+*quotient += q;
 }
 else {
 term1 = 0;
@@ -5786,6 +5796,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 aSig0 = alternateASig0;
 aSig1 = alternateASig1;
 zSign = ! zSign;
+++*quotient;
 }
 }
 return
@@ -5802,7 +5813,8 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 
 floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
 {
-return floatx80_modrem(a, b, false, status);
+uint64_t quotient;
+return floatx80_modrem(a, b, false, &quotient, status);
 }
 
 /*
@@ -5813,7 +5825,8 @@ floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
 
 floatx80 floatx80_mod(floatx80 a, floatx80 b, float_status *status)
 {
-return floatx80_modrem(a, b, true, status);
+uint64_t quotient;
+return floatx80_modrem(a, b, true, &quotient, status);
 }
 
 /*
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index bff6934d09..ff4e2605b1 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -687,7 +687,8 @@ floatx80 floatx80_add(floatx80, floatx80, float_status *status);
 floatx80 floatx80_sub(floatx80, floatx80, float_status *status);
 floatx80 floatx80_mul(floatx80, floatx80, float_status *status);
 floatx80 floatx80_div(floatx80, floatx80, float_status *status);
-floatx80 floatx80_modrem(floatx80, floatx80, bool, float_status *status);
+floatx80 floatx80_modrem(floatx80, floatx80, bool, uint64_t *,
+ float_status *status);
 floatx80 floatx80_mod(floatx80, floatx80, float_status *status);
 floatx80 floatx80_rem(floatx80, floatx80, float_status *status);
 floatx80 floatx80_sqrt(floatx80, float_status *status);
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 4/7] softfloat: do not set denominator high bit for floatx80 remainder

2020-06-05 Thread Joseph Myers

The floatx80 remainder implementation unnecessarily sets the high bit
of bSig explicitly.  By that point in the function, arguments that are
invalid, zero, infinity or NaN have already been handled and
subnormals have been through normalizeFloatx80Subnormal, so the high
bit will already be set.  Remove the unnecessary code.
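
As a minimal sketch of why the explicit OR is redundant (the helper
below is hypothetical and only mirrors what the normalization step is
described as doing; it is not the QEMU code): normalizing a nonzero
subnormal significand shifts it left until bit 63 is set, so a later
OR with the high bit cannot change it.

```c
#include <stdint.h>

/* Hypothetical stand-in for the subnormal normalization step: shift a
 * nonzero significand left until its top bit is set and report the
 * matching exponent.  The caller must guarantee sig != 0. */
void normalize_subnormal_sketch(uint64_t sig, int32_t *exp_out,
                                uint64_t *sig_out)
{
    int shift = 0;

    while (!(sig >> 63)) {
        sig <<= 1;
        shift++;
    }
    *exp_out = 1 - shift;   /* subnormals share the exponent of 1 */
    *sig_out = sig;         /* bit 63 is now set */
}
```

With the result in hand, sig | UINT64_C(0x8000000000000000) equals sig
for every nonzero input.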

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 00f362af23..423a815196 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5734,7 +5734,6 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 if ( aSig0 == 0 ) return a;
 normalizeFloatx80Subnormal( aSig0, &aExp, &aSig0 );
 }
-bSig |= UINT64_C(0x8000000000000000);
 zSign = aSign;
 expDiff = aExp - bExp;
 aSig1 = 0;
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 6/7] target/i386: reimplement fprem1 using floatx80 operations

2020-06-05 Thread Joseph Myers

The x87 fprem1 emulation is currently based around conversion to
double, which is inherently unsuitable for a good emulation of any
floatx80 operation.  Reimplement using the soft-float floatx80
remainder operations.
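
For orientation, the way the low three quotient bits end up in the x87
status word can be sketched as below.  The function and macro names are
assumptions made for this illustration; only the bit positions (C0 at
bit 8, C1 at bit 9, C2 at bit 10, C3 at bit 14) are taken from the
shifts and the 0x4700 mask used in the patch.

```c
#include <stdint.h>

/* Illustrative names; bit positions follow the patch's shifts. */
#define FPUS_C0 0x0100u   /* status word bit 8  */
#define FPUS_C1 0x0200u   /* status word bit 9  */
#define FPUS_C2 0x0400u   /* status word bit 10 */
#define FPUS_C3 0x4000u   /* status word bit 14 */

/* Clear C3..C0, then either flag a partial remainder (C2) or store
 * the low quotient bits as (C0,C3,C1) <-- (q2,q1,q0). */
uint16_t fprem_flags_sketch(uint16_t fpus, uint64_t quotient, int partial)
{
    fpus &= (uint16_t)~(FPUS_C0 | FPUS_C1 | FPUS_C2 | FPUS_C3);
    if (partial) {
        fpus |= FPUS_C2;
    } else {
        if (quotient & 4) {
            fpus |= FPUS_C0;
        }
        if (quotient & 2) {
            fpus |= FPUS_C3;
        }
        if (quotient & 1) {
            fpus |= FPUS_C1;
        }
    }
    return fpus;
}
```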

Signed-off-by: Joseph Myers 
---
 target/i386/fpu_helper.c | 96 +++-
 1 file changed, 45 insertions(+), 51 deletions(-)

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 8ef5b463ea..bab35e00a0 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -934,63 +934,57 @@ void helper_fxtract(CPUX86State *env)
 merge_exception_flags(env, old_flags);
 }
 
-void helper_fprem1(CPUX86State *env)
+static void helper_fprem_common(CPUX86State *env, bool mod)
 {
-double st0, st1, dblq, fpsrcop, fptemp;
-CPU_LDoubleU fpsrcop1, fptemp1;
-int expdif;
-signed long long int q;
-
-st0 = floatx80_to_double(env, ST0);
-st1 = floatx80_to_double(env, ST1);
-
-if (isinf(st0) || isnan(st0) || isnan(st1) || (st1 == 0.0)) {
-ST0 = double_to_floatx80(env, 0.0 / 0.0); /* NaN */
-env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
-return;
-}
-
-fpsrcop = st0;
-fptemp = st1;
-fpsrcop1.d = ST0;
-fptemp1.d = ST1;
-expdif = EXPD(fpsrcop1) - EXPD(fptemp1);
-
-if (expdif < 0) {
-/* optimisation? taken from the AMD docs */
-env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
-/* ST0 is unchanged */
-return;
-}
+uint8_t old_flags = save_exception_flags(env);
+uint64_t quotient;
+CPU_LDoubleU temp0, temp1;
+int exp0, exp1, expdiff;
 
-if (expdif < 53) {
-dblq = fpsrcop / fptemp;
-/* round dblq towards nearest integer */
-dblq = rint(dblq);
-st0 = fpsrcop - fptemp * dblq;
+temp0.d = ST0;
+temp1.d = ST1;
+exp0 = EXPD(temp0);
+exp1 = EXPD(temp1);
 
-/* convert dblq to q by truncating towards zero */
-if (dblq < 0.0) {
-q = (signed long long int)(-dblq);
+env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
+if (floatx80_is_zero(ST0) || floatx80_is_zero(ST1) ||
+exp0 == 0x7fff || exp1 == 0x7fff ||
+floatx80_invalid_encoding(ST0) || floatx80_invalid_encoding(ST1)) {
+ST0 = floatx80_modrem(ST0, ST1, mod, &quotient, &env->fp_status);
+} else {
+if (exp0 == 0) {
+exp0 = 1 - clz64(temp0.l.lower);
+}
+if (exp1 == 0) {
+exp1 = 1 - clz64(temp1.l.lower);
+}
+expdiff = exp0 - exp1;
+if (expdiff < 64) {
+ST0 = floatx80_modrem(ST0, ST1, mod, &quotient, &env->fp_status);
+env->fpus |= (quotient & 0x4) << (8 - 2);  /* (C0) <-- q2 */
+env->fpus |= (quotient & 0x2) << (14 - 1); /* (C3) <-- q1 */
+env->fpus |= (quotient & 0x1) << (9 - 0);  /* (C1) <-- q0 */
 } else {
-q = (signed long long int)dblq;
+/* Partial remainder.  This choice of how many bits to
+ * process at once is specified in AMD instruction set
+ * manuals, and empirically is followed by Intel
+ * processors as well; it ensures that the final remainder
+ * operation in a loop does produce the correct low three
+ * bits of the quotient.  AMD manuals specify that the
+ * flags other than C2 are cleared, and empirically Intel
+ * processors clear them as well.  */
+int n = 32 + (expdiff % 32);
+temp1.d = floatx80_scalbn(temp1.d, expdiff - n, &env->fp_status);
+ST0 = floatx80_mod(ST0, temp1.d, &env->fp_status);
+env->fpus |= 0x400;  /* C2 <-- 1 */
 }
-
-env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
-/* (C0,C3,C1) <-- (q2,q1,q0) */
-env->fpus |= (q & 0x4) << (8 - 2);  /* (C0) <-- q2 */
-env->fpus |= (q & 0x2) << (14 - 1); /* (C3) <-- q1 */
-env->fpus |= (q & 0x1) << (9 - 0);  /* (C1) <-- q0 */
-} else {
-env->fpus |= 0x400;  /* C2 <-- 1 */
-fptemp = pow(2.0, expdif - 50);
-fpsrcop = (st0 / st1) / fptemp;
-/* fpsrcop = integer obtained by chopping */
-fpsrcop = (fpsrcop < 0.0) ?
-  -(floor(fabs(fpsrcop))) : floor(fpsrcop);
-st0 -= (st1 * fpsrcop * fptemp);
 }
-ST0 = double_to_floatx80(env, st0);
+merge_exception_flags(env, old_flags);
+}
+
+void helper_fprem1(CPUX86State *env)
+{
+helper_fprem_common(env, false);
 }
 
 void helper_fprem(CPUX86State *env)
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 1/7] softfloat: merge floatx80_mod and floatx80_rem

2020-06-05 Thread Joseph Myers

The m68k-specific softfloat code includes a function floatx80_mod that
is extremely similar to floatx80_rem, but computing the remainder
based on truncating the quotient toward zero rather than rounding it
to nearest integer.  This is also useful for emulating the x87 fprem
and fprem1 instructions.  Change the floatx80_rem implementation into
floatx80_modrem that can perform either operation, with both
floatx80_rem and floatx80_mod as thin wrappers available for all
targets.
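
The difference between the two operations can be sketched on plain
unsigned integers, assuming b is nonzero.  The struct and function
names are invented for this illustration; the comparison against the
"alternate" remainder and the tie-break on the low quotient bit follow
the shape of the softfloat logic but are not copied from it.

```c
#include <stdbool.h>
#include <stdint.h>

struct rem_sketch {
    uint64_t magnitude;   /* |remainder| */
    bool     sign_flip;   /* remainder has the opposite sign to a */
    uint64_t quotient;    /* integer quotient actually used */
};

/* mod == true:  quotient truncated toward zero.
 * mod == false: quotient rounded to nearest, ties to even.
 * Assumes b != 0. */
struct rem_sketch modrem_sketch(uint64_t a, uint64_t b, bool mod)
{
    struct rem_sketch r = { a % b, false, a / b };

    if (!mod) {
        /* Distance from a to the next multiple of b above it. */
        uint64_t alt = b - r.magnitude;

        if (alt < r.magnitude ||
            (alt == r.magnitude && (r.quotient & 1))) {
            r.magnitude = alt;
            r.sign_flip = true;
            r.quotient++;
        }
    }
    return r;
}
```

For a = 5, b = 3 the truncating variant gives 2 (quotient 1), while the
round-to-nearest variant gives -1 (quotient 2).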

There does not appear to be any use for the _mod operation for other
floating-point formats in QEMU (the only other architectures using
_rem at all are linux-user/arm/nwfpe, for FPA emulation, and openrisc,
for instructions that have been removed in the latest version of the
architecture), so no change is made to the code for other formats.

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c | 49 ++--
 include/fpu/softfloat.h |  2 +
 target/m68k/softfloat.c | 83 -
 target/m68k/softfloat.h |  1 -
 4 files changed, 40 insertions(+), 95 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 6c8f2d597a..7b1ce7664f 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5682,10 +5682,13 @@ floatx80 floatx80_div(floatx80 a, floatx80 b, float_status *status)
 /*
 | Returns the remainder of the extended double-precision floating-point value
 | `a' with respect to the corresponding value `b'.  The operation is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic,
+| if 'mod' is false; if 'mod' is true, return the remainder based on truncating
+| the quotient toward zero instead.
 **/
 
-floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
+floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
+ float_status *status)
 {
 bool aSign, zSign;
 int32_t aExp, bExp, expDiff;
@@ -5731,7 +5734,7 @@ floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
 expDiff = aExp - bExp;
 aSig1 = 0;
 if ( expDiff < 0 ) {
-if ( expDiff < -1 ) return a;
+if ( mod || expDiff < -1 ) return a;
 shift128Right( aSig0, 0, 1, &aSig0, &aSig1 );
 expDiff = 0;
 }
@@ -5763,14 +5766,16 @@ floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
 term1 = 0;
 term0 = bSig;
 }
-sub128( term0, term1, aSig0, aSig1, &alternateASig0, &alternateASig1 );
-if (lt128( alternateASig0, alternateASig1, aSig0, aSig1 )
- || (eq128( alternateASig0, alternateASig1, aSig0, aSig1 )
-  && ( q & 1 ) )
-   ) {
-aSig0 = alternateASig0;
-aSig1 = alternateASig1;
-zSign = ! zSign;
+if (!mod) {
+sub128( term0, term1, aSig0, aSig1, &alternateASig0, &alternateASig1 );
+if (lt128( alternateASig0, alternateASig1, aSig0, aSig1 )
+|| (eq128( alternateASig0, alternateASig1, aSig0, aSig1 )
+&& ( q & 1 ) )
+) {
+aSig0 = alternateASig0;
+aSig1 = alternateASig1;
+zSign = ! zSign;
+}
 }
 return
 normalizeRoundAndPackFloatx80(
@@ -5778,6 +5783,28 @@ floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
 
 }
 
+/*
+| Returns the remainder of the extended double-precision floating-point value
+| `a' with respect to the corresponding value `b'.  The operation is performed
+| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+**/
+
+floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
+{
+return floatx80_modrem(a, b, false, status);
+}
+
+/*
+| Returns the remainder of the extended double-precision floating-point value
+| `a' with respect to the corresponding value `b', with the quotient truncated
+| toward zero.
+**/
+
+floatx80 floatx80_mod(floatx80 a, floatx80 b, float_status *status)
+{
+return floatx80_modrem(a, b, true, status);
+}
+
 /*
 | Returns the square root of the extended double-precision floating-point
 | value `a'.  The operation is performed according to the IEC/IEEE Standard
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 16ca697a73..bff6934d09 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -687,6 +687,8 @@ floatx80 floatx80_add(floatx80, floatx80, float_status *status);
 floatx8

[PATCH 0/7] softfloat, target/i386: fprem, fprem1 fixes

2020-06-05 Thread Joseph Myers

The x87 floating-point emulation of the fprem and fprem1 instructions
works via conversion to and from double.  This is inherently
unsuitable for a good emulation of any floatx80 operation.  This patch
series adapts the softfloat floatx80_rem implementation to be suitable
for these instructions and uses it to reimplement them.
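
To make the precision problem concrete: floatx80 carries a 64-bit
significand, while double carries 53 bits.  The helper below (a
hypothetical name, not QEMU code) round-trips a 64-bit value through
double; it assumes the default round-to-nearest mode and an input
whose rounded value still fits in 64 bits.

```c
#include <stdint.h>

/* Round-trip a 64-bit significand through double.  Low-order bits
 * beyond double's 53-bit significand are lost in the conversion.
 * Assumes the rounded value is below 2^64. */
uint64_t through_double_sketch(uint64_t sig)
{
    return (uint64_t)(double)sig;
}
```

A value such as 2^63 + 1 comes back as 2^63, whereas 2^52 + 1 survives
unchanged because it fits in 53 bits.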

There is an existing test for these instructions, test-i386-fprem.c,
based on comparison of output.  It produces 1679695 lines of output,
and before this patch series 415422 of those lines are different on
hardware from the output produced by QEMU.  Some of those differences
are because QEMU's x87 emulation does not yet produce the "denormal
operand" exception; ignoring such differences (modifying the output
from a native run not to report that exception), there are still
398833 different lines.  This patch series reduces that latter number
to 1 (that one difference being because of missing checks for
floating-point stack underflow, another global issue with the x87
emulation), or 35517 different lines without the correction for lack
of denormal operand exception support.

Several fixes to and new features in the softfloat support for this
operation are needed; floatx80_mod, previously present in the m68k
code only, is made generic and unified with floatx80_rem in a new
floatx80_modrem of which floatx80_mod and floatx80_rem are thin
wrappers.  The only architectures using float*_rem for other formats
are arm (FPA emulation) and openrisc (instructions that have been
removed in the latest architecture version); they do not appear to
need any of the new features, and all the bugs fixed are specific to
floatx80, so no changes are made to the remainder implementation for
those formats.

A new feature added is returning the low bits of the quotient from
floatx80_modrem, as needed for both x87 and m68k.  The logic used to
determine the low 7 bits of the quotient for m68k
(target/m68k/fpu_helper.c:make_quotient) appears completely bogus (it
looks at the result of converting the remainder to integer, the
quotient having been discarded by that point); this patch series does
not change that to use the new interface, but the m68k maintainers may
wish to do so.

The Intel instruction set documentation leaves unspecified the exact
number of bits by which the remainder instructions reduce the operand
each time.  The AMD documentation gives a specific formula, which
empirically Intel processors follow as well, and that formula is
implemented in the code.  The AMD documentation also specifies that
flags other than C2 are cleared in the partial remainder case, whereas
the Intel manual is silent on that (but the processors do appear to
clear those flags); this patch implements that flag clearing, and
keeps the existing flag clearing in cases where the instructions raise
"invalid" (although it seems hardware in fact only clears some but not
all flags in that case, leaving other flags unchanged).
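
The AMD formula referred to above is the n = 32 + (expdiff % 32) used
in the i386 helper when the exponent difference is 64 or more.  A
minimal sketch of how that choice plays out over a software loop
follows; both function names are invented here, and the step count
makes the simplifying assumption that each partial step lowers the
exponent difference by exactly n (a real step may lower it further).

```c
/* Quotient bits handled by one partial-remainder step.  Only
 * meaningful when expdiff >= 64. */
int partial_step_bits_sketch(int expdiff)
{
    return 32 + (expdiff % 32);
}

/* Number of partial steps (each setting C2) before the exponent
 * difference drops below 64 and a final, complete remainder can be
 * taken, under the assumption stated above. */
int partial_steps_sketch(int expdiff)
{
    int steps = 0;

    while (expdiff >= 64) {
        expdiff -= partial_step_bits_sketch(expdiff);
        steps++;
    }
    return steps;
}
```

For example, an exponent difference of 100 is reduced by 36 to 64, then
by 32 to 32, after which the final operation delivers the low quotient
bits.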

The Intel manuals include an inaccurate table asserting that (finite
REM 0) should raise "divide by zero"; actually, in accordance with
IEEE semantics, it raises "invalid".  The AMD manuals inaccurately say
for both fprem and fprem1 that if the exponent difference is negative,
the numerator is returned unchanged, which is correct (apart from
normalizing pseudo-denormals) for fprem but not for fprem1 (and the
old QEMU code had an incorrect optimization following the AMD manuals
for fprem1).
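
A small integer illustration of the fprem1 point, with hypothetical
helper names and restricted to small positive operands so that nothing
overflows: when the exponent difference is exactly -1 the
round-to-nearest quotient can be 1, so the numerator is not returned
unchanged; from -2 downwards the quotient rounds to 0 and it is.

```c
#include <stdint.h>

/* floor(log2(x)) for x != 0, standing in for the binary exponent. */
int ilog2_sketch(uint64_t x)
{
    int e = 0;

    while (x >>= 1) {
        e++;
    }
    return e;
}

/* Remainder with the quotient rounded to nearest, ties to even.
 * Assumes 0 < a, 0 < b and values small enough that 2 * r fits. */
int64_t rem_nearest_sketch(int64_t a, int64_t b)
{
    int64_t q = a / b;
    int64_t r = a % b;

    if (2 * r > b || (2 * r == b && (q & 1))) {
        r -= b;
    }
    return r;
}
```

With a = 3 and b = 4 the exponent difference is -1; the truncating
remainder is 3 (unchanged) but the round-to-nearest remainder is -1.
With a = 1 and b = 4 the difference is -2 and both give 1.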

Joseph Myers (7):
  softfloat: merge floatx80_mod and floatx80_rem
  softfloat: fix floatx80 remainder pseudo-denormal check for zero
  softfloat: do not return pseudo-denormal from floatx80 remainder
  softfloat: do not set denominator high bit for floatx80 remainder
  softfloat: return low bits of quotient from floatx80_modrem
  target/i386: reimplement fprem1 using floatx80 operations
  target/i386: reimplement fprem using floatx80 operations

 fpu/softfloat.c  |  83 +
 include/fpu/softfloat.h  |   3 +
 target/i386/fpu_helper.c | 154 ---
 target/m68k/softfloat.c  |  83 -
 target/m68k/softfloat.h  |   1 -
 5 files changed, 116 insertions(+), 208 deletions(-)

-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 3/7] softfloat: do not return pseudo-denormal from floatx80 remainder

2020-06-05 Thread Joseph Myers

The floatx80 remainder implementation sometimes returns the numerator
unchanged when the denominator is sufficiently larger than the
numerator.  But if the value to be returned unchanged is a
pseudo-denormal, that is incorrect.  Fix it to normalize the numerator
in that case.
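
A minimal sketch of the normalization in question, using a made-up
struct layout rather than QEMU's floatx80 type: a pseudo-denormal has
a biased exponent of 0 but the explicit integer bit set, and its
normalized form has the same significand with a biased exponent of 1.

```c
#include <stdint.h>

/* Hypothetical layout for illustration only. */
struct x80_sketch {
    uint16_t sign_exp;   /* sign in bit 15, biased exponent below */
    uint64_t sig;        /* explicit integer bit in bit 63 */
};

int is_pseudo_denormal_sketch(struct x80_sketch v)
{
    return (v.sign_exp & 0x7fff) == 0 && (v.sig >> 63);
}

/* Return v with a pseudo-denormal rewritten to its normalized
 * encoding; all other values pass through unchanged. */
struct x80_sketch canonicalize_sketch(struct x80_sketch v)
{
    if (is_pseudo_denormal_sketch(v)) {
        v.sign_exp |= 1;   /* exponent 0 -> 1, sign preserved */
    }
    return v;
}
```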

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 091847beb9..00f362af23 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5691,7 +5691,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
  float_status *status)
 {
 bool aSign, zSign;
-int32_t aExp, bExp, expDiff;
+int32_t aExp, bExp, expDiff, aExpOrig;
 uint64_t aSig0, aSig1, bSig;
 uint64_t q, term0, term1, alternateASig0, alternateASig1;
 
@@ -5700,7 +5700,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 return floatx80_default_nan(status);
 }
 aSig0 = extractFloatx80Frac( a );
-aExp = extractFloatx80Exp( a );
+aExpOrig = aExp = extractFloatx80Exp( a );
 aSign = extractFloatx80Sign( a );
 bSig = extractFloatx80Frac( b );
 bExp = extractFloatx80Exp( b );
@@ -5715,6 +5715,11 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 if ((uint64_t)(bSig << 1)) {
 return propagateFloatx80NaN(a, b, status);
 }
+if (aExp == 0 && aSig0 >> 63) {
+/* Pseudo-denormal argument must be returned in normalized
+ * form.  */
+return packFloatx80(aSign, 1, aSig0);
+}
 return a;
 }
 if ( bExp == 0 ) {
@@ -5734,7 +5739,14 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 expDiff = aExp - bExp;
 aSig1 = 0;
 if ( expDiff < 0 ) {
-if ( mod || expDiff < -1 ) return a;
+if ( mod || expDiff < -1 ) {
+if (aExp == 1 && aExpOrig == 0) {
+/* Pseudo-denormal argument must be returned in
+ * normalized form.  */
+return packFloatx80(aSign, aExp, aSig0);
+}
+return a;
+}
 shift128Right( aSig0, 0, 1, &aSig0, &aSig1 );
 expDiff = 0;
 }
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 2/7] softfloat: fix floatx80 remainder pseudo-denormal check for zero

2020-06-05 Thread Joseph Myers

The floatx80 remainder implementation ignores the high bit of the
significand when checking whether an operand (numerator) with zero
exponent is zero.  This means it mishandles a pseudo-denormal
representation of 0x1p-16382L by treating it as zero.  Fix this by
checking the whole significand instead.
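
The two checks can be compared side by side; the function names are
invented for this illustration, and the expressions are the ones in
the diff below.

```c
#include <stdint.h>

/* Old check: shifts the integer bit (bit 63) out before testing. */
int old_is_zero_sketch(uint64_t sig)
{
    return (uint64_t)(sig << 1) == 0;
}

/* New check: tests the whole 64-bit significand. */
int new_is_zero_sketch(uint64_t sig)
{
    return sig == 0;
}
```

For a significand of 0x8000000000000000 with zero exponent (the
pseudo-denormal form of 0x1p-16382L), the old check reports zero and
the new one does not.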

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 7b1ce7664f..091847beb9 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5726,7 +5726,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 normalizeFloatx80Subnormal( bSig, &bExp, &bSig );
 }
 if ( aExp == 0 ) {
-if ( (uint64_t) ( aSig0<<1 ) == 0 ) return a;
+if ( aSig0 == 0 ) return a;
 normalizeFloatx80Subnormal( aSig0, &aExp, &aSig0 );
 }
 bSig |= UINT64_C(0x8000000000000000);
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH 7/7] target/i386: reimplement fprem using floatx80 operations

2020-06-05 Thread Joseph Myers

The x87 fprem emulation is currently based around conversion to
double, which is inherently unsuitable for a good emulation of any
floatx80 operation.  Reimplement using the soft-float floatx80
remainder operations.

Signed-off-by: Joseph Myers 
---
 target/i386/fpu_helper.c | 58 +---
 1 file changed, 1 insertion(+), 57 deletions(-)

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index bab35e00a0..d2fc2c1dde 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -989,63 +989,7 @@ void helper_fprem1(CPUX86State *env)
 
 void helper_fprem(CPUX86State *env)
 {
-double st0, st1, dblq, fpsrcop, fptemp;
-CPU_LDoubleU fpsrcop1, fptemp1;
-int expdif;
-signed long long int q;
-
-st0 = floatx80_to_double(env, ST0);
-st1 = floatx80_to_double(env, ST1);
-
-if (isinf(st0) || isnan(st0) || isnan(st1) || (st1 == 0.0)) {
-ST0 = double_to_floatx80(env, 0.0 / 0.0); /* NaN */
-env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
-return;
-}
-
-fpsrcop = st0;
-fptemp = st1;
-fpsrcop1.d = ST0;
-fptemp1.d = ST1;
-expdif = EXPD(fpsrcop1) - EXPD(fptemp1);
-
-if (expdif < 0) {
-/* optimisation? taken from the AMD docs */
-env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
-/* ST0 is unchanged */
-return;
-}
-
-if (expdif < 53) {
-dblq = fpsrcop / fptemp; /* ST0 / ST1 */
-/* round dblq towards zero */
-dblq = (dblq < 0.0) ? ceil(dblq) : floor(dblq);
-st0 = fpsrcop - fptemp * dblq; /* fpsrcop is ST0 */
-
-/* convert dblq to q by truncating towards zero */
-if (dblq < 0.0) {
-q = (signed long long int)(-dblq);
-} else {
-q = (signed long long int)dblq;
-}
-
-env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
-/* (C0,C3,C1) <-- (q2,q1,q0) */
-env->fpus |= (q & 0x4) << (8 - 2);  /* (C0) <-- q2 */
-env->fpus |= (q & 0x2) << (14 - 1); /* (C3) <-- q1 */
-env->fpus |= (q & 0x1) << (9 - 0);  /* (C1) <-- q0 */
-} else {
-int N = 32 + (expdif % 32); /* as per AMD docs */
-
-env->fpus |= 0x400;  /* C2 <-- 1 */
-fptemp = pow(2.0, (double)(expdif - N));
-fpsrcop = (st0 / st1) / fptemp;
-/* fpsrcop = integer obtained by chopping */
-fpsrcop = (fpsrcop < 0.0) ?
-  -(floor(fabs(fpsrcop))) : floor(fpsrcop);
-st0 -= (st1 * fpsrcop * fptemp);
-}
-ST0 = double_to_floatx80(env, st0);
+helper_fprem_common(env, true);
 }
 
 void helper_fyl2xp1(CPUX86State *env)
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH v2 5/6] softfloat: return low bits of quotient from floatx80_modrem

2020-06-08 Thread Joseph Myers

Both x87 and m68k need the low parts of the quotient for their
remainder operations.  Arrange for floatx80_modrem to track those bits
and return them via a pointer.
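
One detail of the tracking worth spelling out is the final shift by
expDiff, which the diff below guards with a comparison against 64.  In
C, shifting a 64-bit value by 64 or more is undefined, so the "all
tracked bits fall off the top" case has to be written out explicitly.
A minimal sketch with an invented helper name, assuming n is
non-negative:

```c
#include <stdint.h>

/* Low 64 bits of v * 2^n.  Bits shifted past bit 63 are discarded;
 * for n >= 64 nothing of v remains.  Assumes n >= 0. */
uint64_t shl_low64_sketch(uint64_t v, int n)
{
    return n < 64 ? v << n : 0;
}
```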

The architectures using float32_rem and float64_rem do not appear to
need this information, so the *_rem interface is left unchanged and
the information returned only from floatx80_modrem.  The logic used to
determine the low 7 bits of the quotient for m68k
(target/m68k/fpu_helper.c:make_quotient) appears completely bogus (it
looks at the result of converting the remainder to integer, the
quotient having been discarded by that point); this patch does not
change that, but the m68k maintainers may wish to do so.

Signed-off-by: Joseph Myers 
Reviewed-by: Richard Henderson 
---
 fpu/softfloat.c | 23 ++-
 include/fpu/softfloat.h |  3 ++-
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 1552241b5e..72f45b0103 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5684,10 +5684,11 @@ floatx80 floatx80_div(floatx80 a, floatx80 b, float_status *status)
 | `a' with respect to the corresponding value `b'.  The operation is performed
 | according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic,
 | if 'mod' is false; if 'mod' is true, return the remainder based on truncating
-| the quotient toward zero instead.
+| the quotient toward zero instead.  '*quotient' is set to the low 64 bits of
+| the absolute value of the integer quotient.
 **/
 
-floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
+floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod, uint64_t *quotient,
  float_status *status)
 {
 bool aSign, zSign;
@@ -5695,6 +5696,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 uint64_t aSig0, aSig1, bSig;
 uint64_t q, term0, term1, alternateASig0, alternateASig1;
 
+*quotient = 0;
 if (floatx80_invalid_encoding(a) || floatx80_invalid_encoding(b)) {
 float_raise(float_flag_invalid, status);
 return floatx80_default_nan(status);
@@ -5753,7 +5755,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 shift128Right( aSig0, 0, 1, &aSig0, &aSig1 );
 expDiff = 0;
 }
-q = ( bSig <= aSig0 );
+*quotient = q = ( bSig <= aSig0 );
 if ( q ) aSig0 -= bSig;
 expDiff -= 64;
 while ( 0 < expDiff ) {
@@ -5763,6 +5765,8 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 sub128( aSig0, aSig1, term0, term1, &aSig0, &aSig1 );
 shortShift128Left( aSig0, aSig1, 62, &aSig0, &aSig1 );
 expDiff -= 62;
+*quotient <<= 62;
+*quotient += q;
 }
 expDiff += 64;
 if ( 0 < expDiff ) {
@@ -5776,6 +5780,12 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 ++q;
 sub128( aSig0, aSig1, term0, term1, &aSig0, &aSig1 );
 }
+if (expDiff < 64) {
+*quotient <<= expDiff;
+} else {
+*quotient = 0;
+}
+*quotient += q;
 }
 else {
 term1 = 0;
@@ -5790,6 +5800,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 aSig0 = alternateASig0;
 aSig1 = alternateASig1;
 zSign = ! zSign;
+++*quotient;
 }
 }
 return
@@ -5806,7 +5817,8 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 
 floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
 {
-return floatx80_modrem(a, b, false, status);
+uint64_t quotient;
+return floatx80_modrem(a, b, false, &quotient, status);
 }
 
 /*
@@ -5817,7 +5829,8 @@ floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
 
 floatx80 floatx80_mod(floatx80 a, floatx80 b, float_status *status)
 {
-return floatx80_modrem(a, b, true, status);
+uint64_t quotient;
+return floatx80_modrem(a, b, true, &quotient, status);
 }
 
 /*
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index bff6934d09..ff4e2605b1 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -687,7 +687,8 @@ floatx80 floatx80_add(floatx80, floatx80, float_status *status);
 floatx80 floatx80_sub(floatx80, floatx80, float_status *status);
 floatx80 floatx80_mul(floatx80, floatx80, float_status *status);
 floatx80 floatx80_div(floatx80, floatx80, float_status *status);
-floatx80 floatx80_modrem(floatx80, floatx80, bool, float_status *status);
+floatx80 floatx80_modrem(floatx80, floatx80, bool, uint64_t *,
+ float_status *status);
 floatx80 floatx80_mod(floatx80, floatx80, float_status *status);
 floatx80 floatx80_rem(floatx80, floatx80, float_status *status);
 floatx80 floatx80_sqrt(floatx80, float_status *status);
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH v2 2/6] softfloat: fix floatx80 remainder pseudo-denormal check for zero

2020-06-08 Thread Joseph Myers

The floatx80 remainder implementation ignores the high bit of the
significand when checking whether an operand (numerator) with zero
exponent is zero.  This means it mishandles a pseudo-denormal
representation of 0x1p-16382L by treating it as zero.  Fix this by
checking the whole significand instead.

Signed-off-by: Joseph Myers 
Reviewed-by: Richard Henderson 
---
 fpu/softfloat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 7b1ce7664f..091847beb9 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5726,7 +5726,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 normalizeFloatx80Subnormal( bSig, &bExp, &bSig );
 }
 if ( aExp == 0 ) {
-if ( (uint64_t) ( aSig0<<1 ) == 0 ) return a;
+if ( aSig0 == 0 ) return a;
 normalizeFloatx80Subnormal( aSig0, &aExp, &aSig0 );
 }
 bSig |= UINT64_C(0x8000000000000000);
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH v2 0/6] softfloat, target/i386: fprem, fprem1 fixes

2020-06-08 Thread Joseph Myers

The x87 floating-point emulation of the fprem and fprem1 instructions
works via conversion to and from double.  This is inherently
unsuitable for a good emulation of any floatx80 operation.  This patch
series adapts the softfloat floatx80_rem implementation to be suitable
for these instructions and uses it to reimplement them.

There is an existing test for these instructions, test-i386-fprem.c,
based on comparison of output.  It produces 1679695 lines of output,
and before this patch series 415422 of those lines are different on
hardware from the output produced by QEMU.  Some of those differences
are because QEMU's x87 emulation does not yet produce the "denormal
operand" exception; ignoring such differences (modifying the output
from a native run not to report that exception), there are still
398833 different lines.  This patch series reduces that latter number
to 1 (that one difference being because of missing checks for
floating-point stack underflow, another global issue with the x87
emulation), or 35517 different lines without the correction for lack
of denormal operand exception support.

Several fixes to and new features in the softfloat support for this
operation are needed; floatx80_mod, previously present in the m68k
code only, is made generic and unified with floatx80_rem in a new
floatx80_modrem of which floatx80_mod and floatx80_rem are thin
wrappers.  The only architectures using float*_rem for other formats
are arm (FPA emulation) and openrisc (instructions that have been
removed in the latest architecture version); they do not appear to
need any of the new features, and all the bugs fixed are specific to
floatx80, so no changes are made to the remainder implementation for
those formats.

A new feature added is returning the low bits of the quotient from
floatx80_modrem, as needed for both x87 and m68k.  The logic used to
determine the low 7 bits of the quotient for m68k
(target/m68k/fpu_helper.c:make_quotient) appears completely bogus (it
looks at the result of converting the remainder to integer, the
quotient having been discarded by that point); this patch series does
not change that to use the new interface, but the m68k maintainers may
wish to do so.

The Intel instruction set documentation leaves unspecified the exact
number of bits by which the remainder instructions reduce the operand
each time.  The AMD documentation gives a specific formula, which
empirically Intel processors follow as well, and that formula is
implemented in the code.  The AMD documentation also specifies that
flags other than C2 are cleared in the partial remainder case, whereas
the Intel manual is silent on that (but the processors do appear to
clear those flags); this patch implements that flag clearing, and
keeps the existing flag clearing in cases where the instructions raise
"invalid" (although it seems hardware in fact only clears some but not
all flags in that case, leaving other flags unchanged).

The Intel manuals include an inaccurate table asserting that (finite
REM 0) should raise "divide by zero"; actually, in accordance with
IEEE semantics, it raises "invalid".  The AMD manuals inaccurately say
for both fprem and fprem1 that if the exponent difference is negative,
the numerator is returned unchanged, which is correct (apart from
normalizing pseudo-denormals) for fprem but not for fprem1 (and the
old QEMU code had an incorrect optimization following the AMD manuals
for fprem1).

Changes in version 2 of the patch series: fix comment formatting and
combine patches 6 and 7.

Joseph Myers (6):
  softfloat: merge floatx80_mod and floatx80_rem
  softfloat: fix floatx80 remainder pseudo-denormal check for zero
  softfloat: do not return pseudo-denormal from floatx80 remainder
  softfloat: do not set denominator high bit for floatx80 remainder
  softfloat: return low bits of quotient from floatx80_modrem
  target/i386: reimplement fprem, fprem1 using floatx80 operations

 fpu/softfloat.c  |  87 ++
 include/fpu/softfloat.h  |   3 +
 target/i386/fpu_helper.c | 156 ---
 target/m68k/softfloat.c  |  83 -
 target/m68k/softfloat.h  |   1 -
 5 files changed, 122 insertions(+), 208 deletions(-)

-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH 6/7] target/i386: reimplement fprem1 using floatx80 operations

2020-06-08 Thread Joseph Myers

On Mon, 8 Jun 2020, Alex Bennée wrote:

> > +uint8_t old_flags = save_exception_flags(env);
> 
> Hmm where did this come from:

This series assumes all my other recent x87 fixes (11 such patches in 
three series that aren't yet on master, there's also a single patch for 
pcmpxstrx which is independent of those) are already present.

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH v2 6/6] target/i386: reimplement fprem, fprem1 using floatx80 operations

2020-06-08 Thread Joseph Myers

The x87 fprem and fprem1 emulation is currently based around
conversion to double, which is inherently unsuitable for a good
emulation of any floatx80 operation.  Reimplement using the soft-float
floatx80 remainder operations.

Signed-off-by: Joseph Myers 
Reviewed-by: Richard Henderson 
---
 target/i386/fpu_helper.c | 156 ---
 1 file changed, 48 insertions(+), 108 deletions(-)

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 8ef5b463ea..0e531e3821 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -934,124 +934,64 @@ void helper_fxtract(CPUX86State *env)
 merge_exception_flags(env, old_flags);
 }
 
-void helper_fprem1(CPUX86State *env)
+static void helper_fprem_common(CPUX86State *env, bool mod)
 {
-double st0, st1, dblq, fpsrcop, fptemp;
-CPU_LDoubleU fpsrcop1, fptemp1;
-int expdif;
-signed long long int q;
-
-st0 = floatx80_to_double(env, ST0);
-st1 = floatx80_to_double(env, ST1);
-
-if (isinf(st0) || isnan(st0) || isnan(st1) || (st1 == 0.0)) {
-ST0 = double_to_floatx80(env, 0.0 / 0.0); /* NaN */
-env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
-return;
-}
-
-fpsrcop = st0;
-fptemp = st1;
-fpsrcop1.d = ST0;
-fptemp1.d = ST1;
-expdif = EXPD(fpsrcop1) - EXPD(fptemp1);
-
-if (expdif < 0) {
-/* optimisation? taken from the AMD docs */
-env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
-/* ST0 is unchanged */
-return;
-}
+uint8_t old_flags = save_exception_flags(env);
+uint64_t quotient;
+CPU_LDoubleU temp0, temp1;
+int exp0, exp1, expdiff;
 
-if (expdif < 53) {
-dblq = fpsrcop / fptemp;
-/* round dblq towards nearest integer */
-dblq = rint(dblq);
-st0 = fpsrcop - fptemp * dblq;
+temp0.d = ST0;
+temp1.d = ST1;
+exp0 = EXPD(temp0);
+exp1 = EXPD(temp1);
 
-/* convert dblq to q by truncating towards zero */
-if (dblq < 0.0) {
-q = (signed long long int)(-dblq);
+env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
+if (floatx80_is_zero(ST0) || floatx80_is_zero(ST1) ||
+exp0 == 0x7fff || exp1 == 0x7fff ||
+floatx80_invalid_encoding(ST0) || floatx80_invalid_encoding(ST1)) {
+ST0 = floatx80_modrem(ST0, ST1, mod, &quotient, &env->fp_status);
+} else {
+if (exp0 == 0) {
+exp0 = 1 - clz64(temp0.l.lower);
+}
+if (exp1 == 0) {
+exp1 = 1 - clz64(temp1.l.lower);
+}
+expdiff = exp0 - exp1;
+if (expdiff < 64) {
+ST0 = floatx80_modrem(ST0, ST1, mod, &quotient, &env->fp_status);
+env->fpus |= (quotient & 0x4) << (8 - 2);  /* (C0) <-- q2 */
+env->fpus |= (quotient & 0x2) << (14 - 1); /* (C3) <-- q1 */
+env->fpus |= (quotient & 0x1) << (9 - 0);  /* (C1) <-- q0 */
 } else {
-q = (signed long long int)dblq;
+/*
+ * Partial remainder.  This choice of how many bits to
+ * process at once is specified in AMD instruction set
+ * manuals, and empirically is followed by Intel
+ * processors as well; it ensures that the final remainder
+ * operation in a loop does produce the correct low three
+ * bits of the quotient.  AMD manuals specify that the
+ * flags other than C2 are cleared, and empirically Intel
+ * processors clear them as well.
+ */
+int n = 32 + (expdiff % 32);
+temp1.d = floatx80_scalbn(temp1.d, expdiff - n, &env->fp_status);
+ST0 = floatx80_mod(ST0, temp1.d, &env->fp_status);
+env->fpus |= 0x400;  /* C2 <-- 1 */
 }
-
-env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
-/* (C0,C3,C1) <-- (q2,q1,q0) */
-env->fpus |= (q & 0x4) << (8 - 2);  /* (C0) <-- q2 */
-env->fpus |= (q & 0x2) << (14 - 1); /* (C3) <-- q1 */
-env->fpus |= (q & 0x1) << (9 - 0);  /* (C1) <-- q0 */
-} else {
-env->fpus |= 0x400;  /* C2 <-- 1 */
-fptemp = pow(2.0, expdif - 50);
-fpsrcop = (st0 / st1) / fptemp;
-/* fpsrcop = integer obtained by chopping */
-fpsrcop = (fpsrcop < 0.0) ?
-  -(floor(fabs(fpsrcop))) : floor(fpsrcop);
-st0 -= (st1 * fpsrcop * fptemp);
 }
-ST0 = double_to_floatx80(env, st0);
+merge_exception_flags(env, old_flags);
 }
 
-void helper_fprem(CPUX86State *env)
+void helper_fprem1(CPUX86State *env)
 {
-double st0, st1, dblq, fpsrcop, fptemp;
-CPU_LDoubleU fpsrcop1, fptemp1;
-int expdif;
-signed long long int q;
-
-st0 = floatx80_to_double(env, ST0);
-st1 = floatx80_to_double(env, ST1);
-
-if (isinf(st0) || is

[PATCH v2 4/6] softfloat: do not set denominator high bit for floatx80 remainder

2020-06-08 Thread Joseph Myers

The floatx80 remainder implementation unnecessarily sets the high bit
of bSig explicitly.  By that point in the function, arguments that are
invalid, zero, infinity or NaN have already been handled and
subnormals have been through normalizeFloatx80Subnormal, so the high
bit will already be set.  Remove the unnecessary code.

Signed-off-by: Joseph Myers 
Reviewed-by: Richard Henderson 
---
 fpu/softfloat.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 9d43868e4c..1552241b5e 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5736,7 +5736,6 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 if ( aSig0 == 0 ) return a;
 normalizeFloatx80Subnormal( aSig0, &aExp, &aSig0 );
 }
-bSig |= UINT64_C(0x8000000000000000);
 zSign = aSign;
 expDiff = aExp - bExp;
 aSig1 = 0;
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH v2 1/6] softfloat: merge floatx80_mod and floatx80_rem

2020-06-08 Thread Joseph Myers

The m68k-specific softfloat code includes a function floatx80_mod that
is extremely similar to floatx80_rem, but computing the remainder
based on truncating the quotient toward zero rather than rounding it
to nearest integer.  This is also useful for emulating the x87 fprem
and fprem1 instructions.  Change the floatx80_rem implementation into
floatx80_modrem that can perform either operation, with both
floatx80_rem and floatx80_mod as thin wrappers available for all
targets.

There does not appear to be any use for the _mod operation for other
floating-point formats in QEMU (the only other architectures using
_rem at all are linux-user/arm/nwfpe, for FPA emulation, and openrisc,
for instructions that have been removed in the latest version of the
architecture), so no change is made to the code for other formats.

Signed-off-by: Joseph Myers 
Reviewed-by: Richard Henderson 
---
 fpu/softfloat.c | 49 ++--
 include/fpu/softfloat.h |  2 +
 target/m68k/softfloat.c | 83 -
 target/m68k/softfloat.h |  1 -
 4 files changed, 40 insertions(+), 95 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 6c8f2d597a..7b1ce7664f 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5682,10 +5682,13 @@ floatx80 floatx80_div(floatx80 a, floatx80 b, float_status *status)
 /*
 | Returns the remainder of the extended double-precision floating-point value
 | `a' with respect to the corresponding value `b'.  The operation is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic,
+| if 'mod' is false; if 'mod' is true, return the remainder based on truncating
+| the quotient toward zero instead.
 **/
 
-floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
+floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
+ float_status *status)
 {
 bool aSign, zSign;
 int32_t aExp, bExp, expDiff;
@@ -5731,7 +5734,7 @@ floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
 expDiff = aExp - bExp;
 aSig1 = 0;
 if ( expDiff < 0 ) {
-if ( expDiff < -1 ) return a;
+if ( mod || expDiff < -1 ) return a;
 shift128Right( aSig0, 0, 1, &aSig0, &aSig1 );
 expDiff = 0;
 }
@@ -5763,14 +5766,16 @@ floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
 term1 = 0;
 term0 = bSig;
 }
-sub128( term0, term1, aSig0, aSig1, &alternateASig0, &alternateASig1 );
-if (lt128( alternateASig0, alternateASig1, aSig0, aSig1 )
- || (eq128( alternateASig0, alternateASig1, aSig0, aSig1 )
-  && ( q & 1 ) )
-   ) {
-aSig0 = alternateASig0;
-aSig1 = alternateASig1;
-zSign = ! zSign;
+if (!mod) {
+sub128( term0, term1, aSig0, aSig1, &alternateASig0, &alternateASig1 );
+if (lt128( alternateASig0, alternateASig1, aSig0, aSig1 )
+|| (eq128( alternateASig0, alternateASig1, aSig0, aSig1 )
+&& ( q & 1 ) )
+) {
+aSig0 = alternateASig0;
+aSig1 = alternateASig1;
+zSign = ! zSign;
+}
 }
 return
 normalizeRoundAndPackFloatx80(
@@ -5778,6 +5783,28 @@ floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
 
 }
 
+/*
+| Returns the remainder of the extended double-precision floating-point value
+| `a' with respect to the corresponding value `b'.  The operation is performed
+| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+**/
+
+floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
+{
+return floatx80_modrem(a, b, false, status);
+}
+
+/*
+| Returns the remainder of the extended double-precision floating-point value
+| `a' with respect to the corresponding value `b', with the quotient truncated
+| toward zero.
+**/
+
+floatx80 floatx80_mod(floatx80 a, floatx80 b, float_status *status)
+{
+return floatx80_modrem(a, b, true, status);
+}
+
 /*
 | Returns the square root of the extended double-precision floating-point
 | value `a'.  The operation is performed according to the IEC/IEEE Standard
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 16ca697a73..bff6934d09 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -687,6 +687,8 @@ floatx80 floatx80_add(floatx80, floatx80,

[PATCH v2 3/6] softfloat: do not return pseudo-denormal from floatx80 remainder

2020-06-08 Thread Joseph Myers

The floatx80 remainder implementation sometimes returns the numerator
unchanged when the denominator is sufficiently larger than the
numerator.  But if the value to be returned unchanged is a
pseudo-denormal, that is incorrect.  Fix it to normalize the numerator
in that case.
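
For readers unfamiliar with the encoding: a pseudo-denormal has a biased exponent field of 0 but the explicit integer bit (bit 63 of the significand) set, and it denotes the same value as the encoding with exponent field 1 and the same significand. A minimal sketch of that normalization follows; x80_bits and both function names are hypothetical and do not correspond to softfloat types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical raw view of a floatx80: sign+exponent word and significand. */
typedef struct {
    uint16_t high; /* bit 15: sign, bits 14..0: biased exponent */
    uint64_t low;  /* 64-bit significand with explicit integer bit */
} x80_bits;

/* Exponent field 0 with the integer bit set is a pseudo-denormal. */
static bool is_pseudo_denormal(x80_bits v)
{
    return (v.high & 0x7fff) == 0 && (v.low >> 63) != 0;
}

/* Return the normalized encoding: same sign and significand, exponent 1. */
static x80_bits normalize_pseudo_denormal(x80_bits v)
{
    if (is_pseudo_denormal(v)) {
        v.high = (uint16_t)((v.high & 0x8000) | 1);
    }
    return v;
}
```

This mirrors what the patch does with packFloatx80(aSign, 1, aSig0) when the numerator would otherwise be returned unchanged.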

Signed-off-by: Joseph Myers 
Reviewed-by: Richard Henderson 
---
 fpu/softfloat.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 091847beb9..9d43868e4c 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5691,7 +5691,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
  float_status *status)
 {
 bool aSign, zSign;
-int32_t aExp, bExp, expDiff;
+int32_t aExp, bExp, expDiff, aExpOrig;
 uint64_t aSig0, aSig1, bSig;
 uint64_t q, term0, term1, alternateASig0, alternateASig1;
 
@@ -5700,7 +5700,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 return floatx80_default_nan(status);
 }
 aSig0 = extractFloatx80Frac( a );
-aExp = extractFloatx80Exp( a );
+aExpOrig = aExp = extractFloatx80Exp( a );
 aSign = extractFloatx80Sign( a );
 bSig = extractFloatx80Frac( b );
 bExp = extractFloatx80Exp( b );
@@ -5715,6 +5715,13 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 if ((uint64_t)(bSig << 1)) {
 return propagateFloatx80NaN(a, b, status);
 }
+if (aExp == 0 && aSig0 >> 63) {
+/*
+ * Pseudo-denormal argument must be returned in normalized
+ * form.
+ */
+return packFloatx80(aSign, 1, aSig0);
+}
 return a;
 }
 if ( bExp == 0 ) {
@@ -5734,7 +5741,16 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 expDiff = aExp - bExp;
 aSig1 = 0;
 if ( expDiff < 0 ) {
-if ( mod || expDiff < -1 ) return a;
+if ( mod || expDiff < -1 ) {
+if (aExp == 1 && aExpOrig == 0) {
+/*
+ * Pseudo-denormal argument must be returned in
+ * normalized form.
+ */
+return packFloatx80(aSign, aExp, aSig0);
+}
+return a;
+}
 shift128Right( aSig0, 0, 1, &aSig0, &aSig1 );
 expDiff = 0;
 }
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH] target/i386: reimplement f2xm1 using floatx80 operations

2020-06-11 Thread Joseph Myers

The x87 f2xm1 emulation is currently based around conversion to
double.  This is inherently unsuitable for a good emulation of any
floatx80 operation, even before considering that it is a particularly
naive implementation using double (computing with pow and then
subtracting 1 rather than attempting a better emulation using expm1).

Reimplement using the soft-float operations, including additions and
multiplications with higher precision where appropriate to limit
accumulation of errors.  I considered reusing some of the m68k code
for transcendental operations, but the instructions don't generally
correspond exactly to x87 operations (for example, m68k has 2^x and
e^x - 1, but not 2^x - 1); to avoid possible accumulation of errors
from applying multiple such operations each rounding to floatx80
precision, I wrote a direct implementation of 2^x - 1 instead.  It
would be possible in principle to make the implementation more
efficient by doing the intermediate operations directly with
significands, signs and exponents and not packing / unpacking floatx80
format for each operation, but that would make it significantly more
complicated and it's not clear that's worthwhile; the m68k emulation
doesn't try to do that.

A test is included with many randomly generated inputs.  The
assumption of the test is that the result in round-to-nearest mode
should always be one of the two closest floating-point numbers to the
mathematical value of 2^x - 1; the implementation aims to do somewhat
better than that (about 70 correct bits before rounding).  I haven't
investigated how accurate hardware is.

Signed-off-by: Joseph Myers 

---

This patch depends on at least some of my previous x87 emulation fixes
being present (probably only the ones in the recent pull request; I
don't think it depends on any of the most recent series fixing fprem
and fprem1).  I expect to make similar fixes for fyl2xp1, fyl2x and
fpatan.  (The other transcendental instructions (fcos, fptan, fsin,
fsincos) may follow, but as a lower priority, as I'm aiming at getting
reasonable glibc test results under QEMU and those trigonometric
instructions - with their documented semantics that they are defined
to do range reduction using a specific 66-bit approximation to pi -
aren't used in glibc.)

checkpatch.pl has its usual false-positive complaints about hex
floating-point constants in the testcase.  It also complains about
lines over 80 columns in that test; I can reformat the test if
desired, but it's not clear line length matters for such a randomly
generated table of test inputs and expected results.

---
 target/i386/fpu_helper.c |  385 +-
 tests/tcg/i386/test-i386-f2xm1.c | 1140 ++
 2 files changed, 1522 insertions(+), 3 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-f2xm1.c

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 0e531e3821..8f34ea9776 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -25,6 +25,7 @@
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 #include "fpu/softfloat.h"
+#include "fpu/softfloat-macros.h"
 
 #ifdef CONFIG_SOFTMMU
 #include "hw/irq.h"
@@ -836,12 +837,390 @@ void helper_fbst_ST0(CPUX86State *env, target_ulong ptr)
 merge_exception_flags(env, old_flags);
 }
 
+/* 128-bit significand of log(2).  */
+#define ln2_sig_high 0xb17217f7d1cf79abULL
+#define ln2_sig_low 0xc9e3b39803f2f6afULL
+
+/*
+ * Polynomial coefficients for an approximation to (2^x - 1) / x, on
+ * the interval [-1/64, 1/64].
+ */
+#define f2xm1_coeff_0 make_floatx80(0x3ffe, 0xb17217f7d1cf79acULL)
+#define f2xm1_coeff_0_low make_floatx80(0xbfbc, 0xd87edabf495b3762ULL)
+#define f2xm1_coeff_1 make_floatx80(0x3ffc, 0xf5fdeffc162c7543ULL)
+#define f2xm1_coeff_2 make_floatx80(0x3ffa, 0xe35846b82505fcc7ULL)
+#define f2xm1_coeff_3 make_floatx80(0x3ff8, 0x9d955b7dd273b899ULL)
+#define f2xm1_coeff_4 make_floatx80(0x3ff5, 0xaec3ff3c4ef4ac0cULL)
+#define f2xm1_coeff_5 make_floatx80(0x3ff2, 0xa184897c3a7f0de9ULL)
+#define f2xm1_coeff_6 make_floatx80(0x3fee, 0xffe634d0ec30d504ULL)
+#define f2xm1_coeff_7 make_floatx80(0x3feb, 0xb160111d2db515e4ULL)
+
+struct f2xm1_data {
+/*
+ * A value very close to a multiple of 1/32, such that 2^t and 2^t - 1
+ * are very close to exact floatx80 values.
+ */
+floatx80 t;
+/* The value of 2^t.  */
+floatx80 exp2;
+/* The value of 2^t - 1.  */
+floatx80 exp2m1;
+};
+
+static const struct f2xm1_data f2xm1_table[65] = {
+{ make_floatx80(0xbfff, 0x8000000000000000ULL),
+  make_floatx80(0x3ffe, 0x8000000000000000ULL),
+  make_floatx80(0xbffe, 0x8000000000000000ULL) },
+{ make_floatx80(0xbffe, 0xf800000000002e7eULL),
+  make_floatx80(0x3ffe, 0x82cd8698ac2b9160ULL),
+  make_floatx80(0xbffd, 0xfa64f2cea7a8dd40ULL) },
+{ make_floatx80(0xbffe, 0xefffffffffffe960ULL),
+  make_floatx80(0x3ffe, 0x85aac367cc488345ULL),

[PATCH v2] target/i386: correct fix for pcmpxstrx substring search

2020-06-12 Thread Joseph Myers

This corrects a bug introduced in my previous fix for SSE4.2 pcmpestri
/ pcmpestrm / pcmpistri / pcmpistrm substring search, commit
ae35eea7e4a9f21dd147406dfbcd0c4c6aaf2a60.

That commit fixed a bug that showed up in four GCC tests with one libc
implementation.  The tests in question generate random inputs to the
intrinsics and compare results to a C implementation, but they only
test 1024 possible random inputs, and when the tests use the cases of
those instructions that work with word rather than byte inputs, it's
easy to have problematic cases that show up much less frequently than
that.  Thus, testing with a different libc implementation, and so a
different random number generator, showed up a problem with the
previous patch.

When investigating the previous test failures, I found the description
of these instructions in the Intel manuals (starting from computing a
16x16 or 8x8 set of comparison results) confusing and hard to match up
with the more optimized implementation in QEMU, and referred to AMD
manuals which described the instructions in a different way.  Those
AMD descriptions are very explicit that the whole of the string being
searched for must be found in the other operand, not running off the
end of that operand; they say "If the prototype and the SUT are equal
in length, the two strings must be identical for the comparison to be
TRUE.".  However, that statement is incorrect.

In my previous commit message, I noted:

  The operation in this case is a search for a string (argument d to
  the helper) in another string (argument s to the helper); if a copy
  of d at a particular position would run off the end of s, the
  resulting output bit should be 0 whether or not the strings match in
  the region where they overlap, but the QEMU implementation was
  wrongly comparing only up to the point where s ends and counting it
  as a match if an initial segment of d matched a terminal segment of
  s.  Here, "run off the end of s" means that some byte of d would
  overlap some byte outside of s; thus, if d has zero length, it is
  considered to match everywhere, including after the end of s.

The description "some byte of d would overlap some byte outside of s"
is accurate only when understood to refer to overlapping some byte
*within the 16-byte operand* but at or after the zero terminator; it
is valid to run over the end of s if the end of s is the end of the
16-byte operand.  So the fix in the previous patch for the case of d
being empty was correct, but the other part of that patch was not
correct (as it never allowed partial matches even at the end of the
16-byte operand).  Nor was the code before the previous patch correct
for the case of d nonempty, as it would always have allowed partial
matches at the end of s.
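
The rule described above can be written as a simple, unoptimized reference model for the byte-element case. This is a sketch under the interpretation given in this message, not the QEMU helper; equal_ordered_ref is a hypothetical name, s must point to a full 16-byte operand, and len_s and len_d are the counts of valid bytes in s and d.

```c
#include <stdint.h>

/*
 * Hypothetical reference model of the substring search, bytes only.
 * Bit j of the result is set if d matches s starting at position j.
 * Positions past the end of the 16-byte operand are ignored; positions
 * inside the operand but at or after the end of s count as mismatches.
 */
static unsigned equal_ordered_ref(const uint8_t *s, int len_s,
                                  const uint8_t *d, int len_d)
{
    unsigned res = 0;

    for (int j = 0; j < 16; j++) {
        int match = 1;

        for (int i = 0; i < len_d && j + i < 16; i++) {
            if (j + i >= len_s || s[j + i] != d[i]) {
                match = 0;
                break;
            }
        }
        res |= (unsigned)match << j;
    }
    return res;
}
```

With this model an empty d matches at every position, a partial match at the end of a full 16-byte s is accepted, and the same partial match is rejected when s ends before the end of the operand, which corresponds to the three cases covered by the added test.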

Fix with a partial revert of my previous change, combined with
inserting a check for the special case of s having maximum length to
determine where it is necessary to check for matches.

In the added test, test 1 is for the case of empty strings, which
failed before my 2017 patch, test 2 is for the bug introduced by my
2017 patch and test 3 deals with the case where a match of an initial
segment at the end of the string is not valid when the string ends
before the end of the 16-byte operand (that is, the case that would be
broken by a simple revert of the non-empty-string part of my 2017
patch).

Signed-off-by: Joseph Myers 

---

Version 2: remove stray string constant from test that caused compiler
warning; adjust target for which QEMU_OPTS is set, which will
hopefully cause it to be effective for the test.
---
 target/i386/ops_sse.h|  4 ++--
 tests/tcg/i386/Makefile.target   |  3 +++
 tests/tcg/i386/test-i386-pcmpistri.c | 33 
 3 files changed, 38 insertions(+), 2 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-pcmpistri.c

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 4658768de2..c46fc592dc 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2076,10 +2076,10 @@ static inline unsigned pcmpxstrx(CPUX86State *env, Reg *d, Reg *s,
 res = (2 << upper) - 1;
 break;
 }
-for (j = valids - validd; j >= 0; j--) {
+for (j = valids == upper ? valids : valids - validd; j >= 0; j--) {
 res <<= 1;
 v = 1;
-for (i = validd; i >= 0; i--) {
+for (i = MIN(valids - j, validd); i >= 0; i--) {
 v &= (pcmp_val(s, ctrl, i + j) == pcmp_val(d, ctrl, i));
 }
 res |= v;
diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target
index 43ee2e181e..53efec0668 100644
--- a/tests/tcg/i386/Makefile.target
+++ b/tests/tcg/i386/Makefile.target
@@ -10,6 +10,9 @@ ALL_X86_TESTS=$(I386_SRCS:.c=)
 SKIP_I386_TESTS=test-i386-ssse3
 X86_64_TESTS:=$(filter test-i386-ssse3, $(ALL_X86_TESTS))
 
+test-i386-pcmpistri: CFLAGS +

Re: Deprecation/removal of nios2 target support

2024-04-18 Thread Joseph Myers

On Wed, 17 Apr 2024, Sandra Loosemore wrote:

> Therefore I'd like to mark Nios II as obsolete in GCC 14 now, and remove
> support from all toolchain components after the release is made.  I'm not sure
> there is an established process for obsoleting/removing support in other
> components; besides binutils, GDB, and GLIBC, there's QEMU, newlib/libgloss,
> and the Linux kernel.  But, we need to get the ball rolling somewhere.

CC:ing Arnd Bergmann regarding the obsolescence in the Linux kernel.

-- 
Joseph S. Myers
josmy...@redhat.com

79 matches
