Re: [Qemu-devel] [PATCH] ppc: Three floating point fixes
19.08.2019. 08.30, "David Gibson" је написао/ла: > > On Sun, Aug 18, 2019 at 10:59:01PM +0200, Aleksandar Markovic wrote: > > 18.08.2019. 10.10, "Richard Henderson" је > > написао/ла: > > > > > > On 8/16/19 11:59 PM, Aleksandar Markovic wrote: > > > >> From: "Paul A. Clarke" > > > ... > > > >> ISA 3.0B has xscvdpspn leaving its result in word 1 of the target > > > > register, > > > >> and mffprwz expecting its input to come from word 0 of the source > > > > register. > > > >> This sequence fails with QEMU, as a shift is required between those > > two > > > >> instructions. However, the hardware splats the result to both word 0 > > > > and > > > >> word 1 of its output register, so the shift is not necessary. > > > >> Expect a future revision of the ISA to specify this behavior. > > > >> > > > > > > > > Hmmm... Isn't this a gcc bug (using undocumented hardware feature), > > given > > > > everything you said here? > > > > > > The key here is "expect a future revision of the ISA to specify this > > behavior". > > > > > > It's clearly within IBM's purview to adjust the specification to document > > a > > > previously undocumented hardware feature. > > > > > > > By no means, yes, the key is in ISA documentation. But, the impression that > > full original commit message conveys is that the main reason for change is > > gcc behavior. If we accepted in general that gcc behavior determines QEMU > > behavior, I am afraid we would be on a very slippery slope - therefore I > > think the commit message (and possible code comment) should, in my opinion, > > mention ISA docs as the central reason for change. Paul, is there any > > tentative release date of the new ISA specification? > > It's not really a question of gcc behaviour, it's a question of actual > cpu behaviour versus ISA document. Which one qemu should follow is > somewhat debatable, but it sounds here like the ISA will be updated to > match the cpu, which weights it heavily in favour of mimicing the > actual cpu. > This sounds right to me. Aleksandar > -- > David Gibson| I'll have my music baroque, and my code > david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ > | _way_ _around_! > http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [PATCH] ppc: Three floating point fixes
On Sun, Aug 18, 2019 at 10:59:01PM +0200, Aleksandar Markovic wrote: > 18.08.2019. 10.10, "Richard Henderson" је > написао/ла: > > > > On 8/16/19 11:59 PM, Aleksandar Markovic wrote: > > >> From: "Paul A. Clarke" > > ... > > >> ISA 3.0B has xscvdpspn leaving its result in word 1 of the target > > > register, > > >> and mffprwz expecting its input to come from word 0 of the source > > > register. > > >> This sequence fails with QEMU, as a shift is required between those > two > > >> instructions. However, the hardware splats the result to both word 0 > > > and > > >> word 1 of its output register, so the shift is not necessary. > > >> Expect a future revision of the ISA to specify this behavior. > > >> > > > > > > Hmmm... Isn't this a gcc bug (using undocumented hardware feature), > given > > > everything you said here? > > > > The key here is "expect a future revision of the ISA to specify this > behavior". > > > > It's clearly within IBM's purview to adjust the specification to document > a > > previously undocumented hardware feature. > > > > By no means, yes, the key is in ISA documentation. But, the impression that > full original commit message conveys is that the main reason for change is > gcc behavior. If we accepted in general that gcc behavior determines QEMU > behavior, I am afraid we would be on a very slippery slope - therefore I > think the commit message (and possible code comment) should, in my opinion, > mention ISA docs as the central reason for change. Paul, is there any > tentative release date of the new ISA specification? It's not really a question of gcc behaviour, it's a question of actual cpu behaviour versus ISA document. Which one qemu should follow is somewhat debatable, but it sounds here like the ISA will be updated to match the cpu, which weights it heavily in favour of mimicing the actual cpu. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH] ppc: Three floating point fixes
On Fri, Aug 16, 2019 at 02:27:49PM -0500, Paul A. Clarke wrote: > From: "Paul A. Clarke" > > - target/ppc/fpu_helper.c: > - helper_todouble() was not properly converting INFINITY from 32 bit > float to 64 bit double. > - helper_todouble() was not properly converting any denormalized > 32 bit float to 64 bit double. > > - GCC, as of version 8 or so, takes advantage of the hardware's > implementation of the xscvdpspn instruction to optimize the following > sequence: > xscvdpspn vs0,vs1 > mffprwz r8,f0 > ISA 3.0B has xscvdpspn leaving its result in word 1 of the target register, > and mffprwz expecting its input to come from word 0 of the source register. > This sequence fails with QEMU, as a shift is required between those two > instructions. However, the hardware splats the result to both word 0 and > word 1 of its output register, so the shift is not necessary. > Expect a future revision of the ISA to specify this behavior. As well as addressing Richard's comments, I'd prefer to see each of the 3 fixes here in separate patches. The code changes may be tiny, but I find the fpu emulation code cryptic at the best of times, and making it clear which change is addressing which bug would help. > Signed-off-by: Paul A. Clarke > --- > target/ppc/fpu_helper.c | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c > index 5611cf0..82b5425 100644 > --- a/target/ppc/fpu_helper.c > +++ b/target/ppc/fpu_helper.c > @@ -62,13 +62,14 @@ uint64_t helper_todouble(uint32_t arg) > ret = (uint64_t)extract32(arg, 30, 2) << 62; > ret |= ((extract32(arg, 30, 1) ^ 1) * (uint64_t)7) << 59; > ret |= (uint64_t)extract32(arg, 0, 30) << 29; > +ret |= (0x7ffULL * (extract32(arg, 23, 8) == 0xff)) << 52; > } else { > /* Zero or Denormalized operand. */ > ret = (uint64_t)extract32(arg, 31, 1) << 63; > if (unlikely(abs_arg != 0)) { > /* Denormalized operand. */ > int shift = clz32(abs_arg) - 9; > -int exp = -126 - shift + 1023; > +int exp = -127 - shift + 1023; > ret |= (uint64_t)exp << 52; > ret |= abs_arg << (shift + 29); > } > @@ -2871,10 +2872,14 @@ void helper_xscvqpdp(CPUPPCState *env, uint32_t > opcode, > > uint64_t helper_xscvdpspn(CPUPPCState *env, uint64_t xb) > { > +uint64_t result; > + > float_status tstat = env->fp_status; > set_float_exception_flags(0, ); > > -return (uint64_t)float64_to_float32(xb, ) << 32; > +result = (uint64_t)float64_to_float32(xb, ); > +/* hardware replicates result to both words of the doubleword result. */ > +return (result << 32) | result; > } > > uint64_t helper_xscvspdpn(CPUPPCState *env, uint64_t xb) -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH] ppc: Three floating point fixes
18.08.2019. 10.10, "Richard Henderson" је написао/ла: > > On 8/16/19 11:59 PM, Aleksandar Markovic wrote: > >> From: "Paul A. Clarke" > ... > >> ISA 3.0B has xscvdpspn leaving its result in word 1 of the target > > register, > >> and mffprwz expecting its input to come from word 0 of the source > > register. > >> This sequence fails with QEMU, as a shift is required between those two > >> instructions. However, the hardware splats the result to both word 0 > > and > >> word 1 of its output register, so the shift is not necessary. > >> Expect a future revision of the ISA to specify this behavior. > >> > > > > Hmmm... Isn't this a gcc bug (using undocumented hardware feature), given > > everything you said here? > > The key here is "expect a future revision of the ISA to specify this behavior". > > It's clearly within IBM's purview to adjust the specification to document a > previously undocumented hardware feature. > By no means, yes, the key is in ISA documentation. But, the impression that full original commit message conveys is that the main reason for change is gcc behavior. If we accepted in general that gcc behavior determines QEMU behavior, I am afraid we would be on a very slippery slope - therefore I think the commit message (and possible code comment) should, in my opinion, mention ISA docs as the central reason for change. Paul, is there any tentative release date of the new ISA specification? Aleksandar > > r~
Re: [Qemu-devel] [PATCH] ppc: Three floating point fixes
On 8/16/19 11:59 PM, Aleksandar Markovic wrote: >> From: "Paul A. Clarke" ... >> ISA 3.0B has xscvdpspn leaving its result in word 1 of the target > register, >> and mffprwz expecting its input to come from word 0 of the source > register. >> This sequence fails with QEMU, as a shift is required between those two >> instructions. However, the hardware splats the result to both word 0 > and >> word 1 of its output register, so the shift is not necessary. >> Expect a future revision of the ISA to specify this behavior. >> > > Hmmm... Isn't this a gcc bug (using undocumented hardware feature), given > everything you said here? The key here is "expect a future revision of the ISA to specify this behavior". It's clearly within IBM's purview to adjust the specification to document a previously undocumented hardware feature. r~
Re: [Qemu-devel] [PATCH] ppc: Three floating point fixes
On 8/16/19 8:27 PM, Paul A. Clarke wrote: > From: "Paul A. Clarke" > > - target/ppc/fpu_helper.c: > - helper_todouble() was not properly converting INFINITY from 32 bit > float to 64 bit double. > - helper_todouble() was not properly converting any denormalized > 32 bit float to 64 bit double. These two would seem to be my fault: Fixes: 86c0cab11aa (target/ppc: Use non-arithmetic conversions for fp load/store) > diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c > index 5611cf0..82b5425 100644 > --- a/target/ppc/fpu_helper.c > +++ b/target/ppc/fpu_helper.c > @@ -62,13 +62,14 @@ uint64_t helper_todouble(uint32_t arg) > ret = (uint64_t)extract32(arg, 30, 2) << 62; > ret |= ((extract32(arg, 30, 1) ^ 1) * (uint64_t)7) << 59; > ret |= (uint64_t)extract32(arg, 0, 30) << 29; > +ret |= (0x7ffULL * (extract32(arg, 23, 8) == 0xff)) << 52; Since the overwrites bits set two lines above, I think this would be better as if (likely(abs_arg >= 0x0080)) { /* Normalized operand, or Inf or NaN. */ if (unlikely(extract32(arg, 23, 8) == 0xff)) { /* Inf or NaN. */ ... } else { /* Normalized operand. */ > } else { > /* Zero or Denormalized operand. */ > ret = (uint64_t)extract32(arg, 31, 1) << 63; > if (unlikely(abs_arg != 0)) { > /* Denormalized operand. */ > int shift = clz32(abs_arg) - 9; > -int exp = -126 - shift + 1023; > +int exp = -127 - shift + 1023; > ret |= (uint64_t)exp << 52; > ret |= abs_arg << (shift + 29); Hmm. The manual says -126, but it also shifts the fraction by a different amount, such that the implicit bit is 1. What I also don't see here is where the most significant bit is removed from the fraction, a-la "FRT[5:63] = frac[1:52]" in the manual. The soft-float code from which this was probably copied did this by shifting the fraction such that the msb overlaps the exponent, biasing the exponent by -1, and then using an add instead of an or to simultaneously remove the bias, swallow the msb, and include the lower bits unchanged. So it looks like this should be /* * Shift fraction so that the msb is in the implicit bit position. * Thus shift is in the range [1:23]. */ int shift = clz32(abs_arg) - 8; /* * The first 3 terms compute the float64 exponent. We then bias * this result by -1 so that we can swallow the implicit bit below. */ int exp = -126 - shift + 1023 - 1; ret |= (uint64_t)exp << 52; ret += (uint64_t)abs_arg << (52 - 23 + shift); Hmm. I see another bug in the existing code whereby abs_arg is not cast before shifting. This truncation is probably how you're seeing correct results with your patch, for denormals containing only one bit set, e.g. FLT_TRUE_MIN. It would be good to test other denormals, like 0x0005. > @@ -2871,10 +2872,14 @@ void helper_xscvqpdp(CPUPPCState *env, uint32_t > opcode, > > uint64_t helper_xscvdpspn(CPUPPCState *env, uint64_t xb) > { > +uint64_t result; > + > float_status tstat = env->fp_status; > set_float_exception_flags(0, ); > > -return (uint64_t)float64_to_float32(xb, ) << 32; > +result = (uint64_t)float64_to_float32(xb, ); > +/* hardware replicates result to both words of the doubleword result. */ > +return (result << 32) | result; > } This definitely should be a separate patch. The comment should include some language about this behaviour being required by a future ISA revision. r~
Re: [Qemu-devel] [PATCH] ppc: Three floating point fixes
17.08.2019. 00.59, "Aleksandar Markovic" је написао/ла: > > > 16.08.2019. 21.28, "Paul A. Clarke" је написао/ла: > > > > From: "Paul A. Clarke" > > > > - target/ppc/fpu_helper.c: > > - helper_todouble() was not properly converting INFINITY from 32 bit > > float to 64 bit double. > > - helper_todouble() was not properly converting any denormalized > > 32 bit float to 64 bit double. > > > > - GCC, as of version 8 or so, takes advantage of the hardware's > > implementation of the xscvdpspn instruction to optimize the following > > sequence: > > xscvdpspn vs0,vs1 > > mffprwz r8,f0 > > ISA 3.0B has xscvdpspn leaving its result in word 1 of the target register, > > and mffprwz expecting its input to come from word 0 of the source register. > > This sequence fails with QEMU, as a shift is required between those two > > instructions. However, the hardware splats the result to both word 0 and > > word 1 of its output register, so the shift is not necessary. > > Expect a future revision of the ISA to specify this behavior. > > > > Hmmm... Isn't this a gcc bug (using undocumented hardware feature), given everything you said here? > Paul, you are touching here a very sensitive area. If I were you, I would most likely propose a similar patch, given thr info you presented. However, in general, QEMU is a compiler-agnostic tool and should not depend on compiler features, let alone fix its bugs, and therefore I think you should more clearly spell out in your commit message that the code segment in question is a workaround for an outright GCC bug, and just a kind of "necessary evil" in given situation. Let us also wait for David possibly coming up with the final verdict wrt this issue. Also, this patch should be split into two or three (infinity conversions have nothing to do with xscvdpspn) - a patch deals with only one logical unit. Yours, Aleksandar > Sincerely, > Aleksandar > > > Signed-off-by: Paul A. Clarke > > --- > > target/ppc/fpu_helper.c | 9 +++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c > > index 5611cf0..82b5425 100644 > > --- a/target/ppc/fpu_helper.c > > +++ b/target/ppc/fpu_helper.c > > @@ -62,13 +62,14 @@ uint64_t helper_todouble(uint32_t arg) > > ret = (uint64_t)extract32(arg, 30, 2) << 62; > > ret |= ((extract32(arg, 30, 1) ^ 1) * (uint64_t)7) << 59; > > ret |= (uint64_t)extract32(arg, 0, 30) << 29; > > +ret |= (0x7ffULL * (extract32(arg, 23, 8) == 0xff)) << 52; > > } else { > > /* Zero or Denormalized operand. */ > > ret = (uint64_t)extract32(arg, 31, 1) << 63; > > if (unlikely(abs_arg != 0)) { > > /* Denormalized operand. */ > > int shift = clz32(abs_arg) - 9; > > -int exp = -126 - shift + 1023; > > +int exp = -127 - shift + 1023; > > ret |= (uint64_t)exp << 52; > > ret |= abs_arg << (shift + 29); > > } > > @@ -2871,10 +2872,14 @@ void helper_xscvqpdp(CPUPPCState *env, uint32_t opcode, > > > > uint64_t helper_xscvdpspn(CPUPPCState *env, uint64_t xb) > > { > > +uint64_t result; > > + > > float_status tstat = env->fp_status; > > set_float_exception_flags(0, ); > > > > -return (uint64_t)float64_to_float32(xb, ) << 32; > > +result = (uint64_t)float64_to_float32(xb, ); > > +/* hardware replicates result to both words of the doubleword result. */ > > +return (result << 32) | result; > > } > > > > uint64_t helper_xscvspdpn(CPUPPCState *env, uint64_t xb) > > -- > > 1.8.3.1 > > > >
Re: [Qemu-devel] [PATCH] ppc: Three floating point fixes
16.08.2019. 21.28, "Paul A. Clarke" је написао/ла: > > From: "Paul A. Clarke" > > - target/ppc/fpu_helper.c: > - helper_todouble() was not properly converting INFINITY from 32 bit > float to 64 bit double. > - helper_todouble() was not properly converting any denormalized > 32 bit float to 64 bit double. > > - GCC, as of version 8 or so, takes advantage of the hardware's > implementation of the xscvdpspn instruction to optimize the following > sequence: > xscvdpspn vs0,vs1 > mffprwz r8,f0 > ISA 3.0B has xscvdpspn leaving its result in word 1 of the target register, > and mffprwz expecting its input to come from word 0 of the source register. > This sequence fails with QEMU, as a shift is required between those two > instructions. However, the hardware splats the result to both word 0 and > word 1 of its output register, so the shift is not necessary. > Expect a future revision of the ISA to specify this behavior. > Hmmm... Isn't this a gcc bug (using undocumented hardware feature), given everything you said here? Sincerely, Aleksandar > Signed-off-by: Paul A. Clarke > --- > target/ppc/fpu_helper.c | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c > index 5611cf0..82b5425 100644 > --- a/target/ppc/fpu_helper.c > +++ b/target/ppc/fpu_helper.c > @@ -62,13 +62,14 @@ uint64_t helper_todouble(uint32_t arg) > ret = (uint64_t)extract32(arg, 30, 2) << 62; > ret |= ((extract32(arg, 30, 1) ^ 1) * (uint64_t)7) << 59; > ret |= (uint64_t)extract32(arg, 0, 30) << 29; > +ret |= (0x7ffULL * (extract32(arg, 23, 8) == 0xff)) << 52; > } else { > /* Zero or Denormalized operand. */ > ret = (uint64_t)extract32(arg, 31, 1) << 63; > if (unlikely(abs_arg != 0)) { > /* Denormalized operand. */ > int shift = clz32(abs_arg) - 9; > -int exp = -126 - shift + 1023; > +int exp = -127 - shift + 1023; > ret |= (uint64_t)exp << 52; > ret |= abs_arg << (shift + 29); > } > @@ -2871,10 +2872,14 @@ void helper_xscvqpdp(CPUPPCState *env, uint32_t opcode, > > uint64_t helper_xscvdpspn(CPUPPCState *env, uint64_t xb) > { > +uint64_t result; > + > float_status tstat = env->fp_status; > set_float_exception_flags(0, ); > > -return (uint64_t)float64_to_float32(xb, ) << 32; > +result = (uint64_t)float64_to_float32(xb, ); > +/* hardware replicates result to both words of the doubleword result. */ > +return (result << 32) | result; > } > > uint64_t helper_xscvspdpn(CPUPPCState *env, uint64_t xb) > -- > 1.8.3.1 > >
[Qemu-devel] [PATCH] ppc: Three floating point fixes
From: "Paul A. Clarke" - target/ppc/fpu_helper.c: - helper_todouble() was not properly converting INFINITY from 32 bit float to 64 bit double. - helper_todouble() was not properly converting any denormalized 32 bit float to 64 bit double. - GCC, as of version 8 or so, takes advantage of the hardware's implementation of the xscvdpspn instruction to optimize the following sequence: xscvdpspn vs0,vs1 mffprwz r8,f0 ISA 3.0B has xscvdpspn leaving its result in word 1 of the target register, and mffprwz expecting its input to come from word 0 of the source register. This sequence fails with QEMU, as a shift is required between those two instructions. However, the hardware splats the result to both word 0 and word 1 of its output register, so the shift is not necessary. Expect a future revision of the ISA to specify this behavior. Signed-off-by: Paul A. Clarke --- target/ppc/fpu_helper.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c index 5611cf0..82b5425 100644 --- a/target/ppc/fpu_helper.c +++ b/target/ppc/fpu_helper.c @@ -62,13 +62,14 @@ uint64_t helper_todouble(uint32_t arg) ret = (uint64_t)extract32(arg, 30, 2) << 62; ret |= ((extract32(arg, 30, 1) ^ 1) * (uint64_t)7) << 59; ret |= (uint64_t)extract32(arg, 0, 30) << 29; +ret |= (0x7ffULL * (extract32(arg, 23, 8) == 0xff)) << 52; } else { /* Zero or Denormalized operand. */ ret = (uint64_t)extract32(arg, 31, 1) << 63; if (unlikely(abs_arg != 0)) { /* Denormalized operand. */ int shift = clz32(abs_arg) - 9; -int exp = -126 - shift + 1023; +int exp = -127 - shift + 1023; ret |= (uint64_t)exp << 52; ret |= abs_arg << (shift + 29); } @@ -2871,10 +2872,14 @@ void helper_xscvqpdp(CPUPPCState *env, uint32_t opcode, uint64_t helper_xscvdpspn(CPUPPCState *env, uint64_t xb) { +uint64_t result; + float_status tstat = env->fp_status; set_float_exception_flags(0, ); -return (uint64_t)float64_to_float32(xb, ) << 32; +result = (uint64_t)float64_to_float32(xb, ); +/* hardware replicates result to both words of the doubleword result. */ +return (result << 32) | result; } uint64_t helper_xscvspdpn(CPUPPCState *env, uint64_t xb) -- 1.8.3.1