[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-05-28 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=406256

Carl Love  changed:

   What|Removed |Added

 Status|RESOLVED|CLOSED

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-05-28 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=406256

Carl Love  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|REPORTED|RESOLVED

--- Comment #13 from Carl Love  ---
The VEX and testsuite patch were committed. 

commit d2cbb78a151256290d490fcb7a805884d6406a7e
Author: Carl Love 
Date:   Tue May 28 11:33:00 2019 -0500

PPC64, Subnormal testcase changes

VEX patch fixed issues with generating subnormal results.

This patch adds a specific test case and updates the expected values for
the existing test case.

Update jm-vmx tests, add subnormal test case.

https://bugs.kde.org/show_bug.cgi?id=406256

commit 991db2a39bcbdbf5cdb4337684c29f96c63070a8
Author: Carl Love 
Date:   Tue May 28 11:26:13 2019 -0500

PPC64, fix issues with dnormal values in the vector fp instructions.

The result of the floating point instructions vmaddfp, vnmsubfp,
vaddfp, vsubfp, vmaxfp, vminfp, vrefp, vrsqrtefp, vcmpeqfp, vcmpeqfp,
vcmpgefp, vcmpgtfp are controlled by the setting of the NJ bit in
the VSCR register.  If VSCR[NJ] = 0; then denormalized values are
handled as specified by Java and the IEEE standard.  If the bit is
a 1, then the denormalized element in the vector is replaced with
a zero.

Valgrind was not properly handling the denormalized case for these
instructions.  This patch fixes the issue.

https://bugs.kde.org/show_bug.cgi?id=406256

Closing

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-05-27 Thread Julian Seward
https://bugs.kde.org/show_bug.cgi?id=406256

--- Comment #12 from Julian Seward  ---
> Updated patch to fix issues with dnormal values v5 (27.44 KB, patch)
> 2019-05-15 21:27 UTC, Carl Love   Details
> Update test case, add new test (1.81 MB, patch)
> 2019-05-15 21:28 UTC, Carl Love 

Carl, these look fine to land now.  Thank you for your patience with this.
I am particularly pleased that the supporting functions (is_NaN, etc) all
got vectorised.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-05-27 Thread Julian Seward
https://bugs.kde.org/show_bug.cgi?id=406256

Julian Seward  changed:

   What|Removed |Added

 Attachment #119920|0   |1
is obsolete||

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-05-27 Thread Julian Seward
https://bugs.kde.org/show_bug.cgi?id=406256

Julian Seward  changed:

   What|Removed |Added

 Attachment #119940|0   |1
is obsolete||

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-05-15 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=406256

--- Comment #11 from Carl Love  ---
Julian:


> https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.kde.org_sho
> w-5Fbug.cgi-3Fid-3D406256&d=DwIFaQ&c=jf_iaSHvJObTbx-
> siA1ZOg&r=RFEmMkZAk
> --
> _wFGN5tkM_A&m=5o2Sjz00Tqqxz4dOXWvuGwWsbia9aURAkghSmeY0Sm0&s=1rRXfsCpJ
> Dor_oQ1SSmOvBBYZkeJgD7aT4Iz_gYMxGk&e=
> 
> --- Comment #8 from Julian Seward  ---
> Thanks for the respin.  I have mostly only minor comments about
> it.  Is OK to
> land provided all the comments below are addressed, except for the
> one about
> vectorising negateVF32, which would be nice to fix if you can do so
> relatively
> easily, but is not essential.
> 
> Also, when landing, please split the patch into two parts: the
> implementation
> and the tests, and land the implementation first.

Done.  I had all the changes in one patch to make it easier to move
files around to 5 different machines for testing.

> 
> 
> --- a/VEX/priv/guest_ppc_toIR.c
> +++ b/VEX/priv/guest_ppc_toIR.c
> 
> is_Zero_Vector, is_Denorm_Vector and dnormV32_adj have vectorised
> nicely.  Is
> it also possible to do negateVF32 with vectors, rather lane
> by lane as at
> present?  Note, for a vector version of is_NaN, you can see more or
> less how
> to do it by looking at isNan() in host_ppc_isel.c.
> 
Renamed negateVF32(value) to negate_Vector( size, value) to make the
naming more consistent.  Also, structured it to generalize easily to
more vector sizes.  Currently just supporting vector of F32.  This
change requires creating the new function is_NaN_Vector(size, value) as
you eluded to above.  Again, structured the function to easily extend
to more vector sizes.

> 
> +static IRExpr* dnormV32_adj ( IRExpr* src )
> 
> nit: maybe rename this to be more consistent with your other vector-
> helper
> function names (is_Zero_Vector etc)
> 

Yea, consistency is a good thing.  Renamed dnormV32_adj() to
dnorm_adj_Vector().  Note did not restructure the function to easily
extend to other vector sizes.  Left that for the future.
> 
> +   assign ( VSCR_NJ_mask, binop( Iop_64HLtoV128,
> + unop( Iop_1Sto64,
> +   mkexpr( VSCR_NJ ) ) ,
> + unop( Iop_1Sto64,
> +   mkexpr( VSCR_NJ ) ) ) );
> 
> nit: VSCR_NJ isn't used past this point.  Change its type to Ity_I64
> and lift
> the 1Sto64 operation into that definition, so it isn't duplicated
> here.
> 

Yea, that would be better.  Fixed.

> 
> --- a/coregrind/m_dispatch/dispatch-ppc32-linux.S
> +++ b/coregrind/m_dispatch/dispatch-ppc32-linux.S
> 
>  LafterFP2:
> +   /* set host Vector Status Control Register bit NJ to zero
> +  to ensure the host generate subnormal results for the
> +  vector floating point instructions. */
> +mfvscr  16  /* Clear NJ bit */
> +vspltisw 9,0x1   /* 4x 0x0001 */
> +vspltisw 8,0x0   /* zero */
> +vsldoi   9,8,9,0x2   /* <<2*8 => 4x 0x0001
> */
> +vnor 9,9,9   /* 4x 0xFFFE */
> +   vand 16,16,9 /* Mask out NJ bit */
> +mtvscr  16
> 
> (1) Guard these with #ifdef HAS_ALTIVEC like the other Altivec stuff
> in
> this file.  Otherwise this will fail when run on a non-Altivec
> enabled target
> (do we still support any of those)?  And the same for the other to
> assembly
> files.


> (2) (As a check) is the above sequence runnable even on the lowest
> level
> Altivec subset?  Otherwise (again) it will fail at run time.
> 
> (3) I wasn't entirely clear what the changes to the post-run
> invariant checks
> are (after label "postamble:").  IIUC, they already do check that
> VSCR[NJ].host == 0, but the comments are wrong, and you've updated
> the
> comments, but not the code?  Can you clarify/re-check?

When I went in to fix up the code per you comments, I noticed that the
code I added to set VSCR[NJ] = 0 is not doing anything. I had missed
the code a bit lower:

   /* set host AltiVec control word to the default mode expected   
   by VEX-generated code. */
   ld 6,.tocent__vgPlain_machine_ppc64_has_VMX@toc(2)   
   ld 6,0(6)
cmpldi  6,0 
beq .LafterVMX2 

vspltisw 3,0x0  /* generate zero */ 
mtvscr  3    

which is forcing the entire VSCR register to zero.  The instruction
vtvscr moves the contents of the specified register to VSCR.  This is
actually what I had originally been looking for when I was trying to
figure out why Valgrind always generated subnormal results but natively
I wasn't getting subnormal results on BE.  This code t

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-05-15 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=406256

--- Comment #10 from Carl Love  ---
Created attachment 120090
  --> https://bugs.kde.org/attachment.cgi?id=120090&action=edit
Update test case, add new test

The test case patch that goes with the subnormal changes in VEX.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-05-15 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=406256

--- Comment #9 from Carl Love  ---
Created attachment 120089
  --> https://bugs.kde.org/attachment.cgi?id=120089&action=edit
Updated patch to fix issues with dnormal values v5

Updated the patch per latest comments from Julian.

Split patch into VEX patch and test case patch.

Renamed negateVF32(value) to negate_Vector( size, value), vectorized.

Created new function  is_NaN_Vector(size, value)

Renamed dnormV32_adj() to dnorm_adj_Vector().

Lifted VSCR_NJ, made it Ity_I64.

The various assembly routines:  Removed my new code to set VSCR[NJ]=0 as it is
redundant given that there is existing code that is already clearing the
register.  Updated the comments in the code to make it clear where VSCR[NJ] is
set to 0.  Updated comments with regard to the invarent check to make it clear
what is being checked and what to do based on the check.  There are now no
functional changes to the assembly functions as they already ensure the
VSCR[NJ] but is set to 0.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-05-14 Thread Julian Seward
https://bugs.kde.org/show_bug.cgi?id=406256

--- Comment #8 from Julian Seward  ---
Thanks for the respin.  I have mostly only minor comments about it.  Is OK to
land provided all the comments below are addressed, except for the one about
vectorising negateVF32, which would be nice to fix if you can do so relatively
easily, but is not essential.

Also, when landing, please split the patch into two parts: the implementation
and the tests, and land the implementation first.


--- a/VEX/priv/guest_ppc_toIR.c
+++ b/VEX/priv/guest_ppc_toIR.c

is_Zero_Vector, is_Denorm_Vector and dnormV32_adj have vectorised nicely.  Is
it also possible to do negateVF32 with vectors, rather lane by lane as at
present?  Note, for a vector version of is_NaN, you can see more or less how
to do it by looking at isNan() in host_ppc_isel.c.


+static IRExpr* dnormV32_adj ( IRExpr* src )

nit: maybe rename this to be more consistent with your other vector-helper
function names (is_Zero_Vector etc)


+   assign ( VSCR_NJ_mask, binop( Iop_64HLtoV128,
+ unop( Iop_1Sto64,
+   mkexpr( VSCR_NJ ) ) ,
+ unop( Iop_1Sto64,
+   mkexpr( VSCR_NJ ) ) ) );

nit: VSCR_NJ isn't used past this point.  Change its type to Ity_I64 and lift
the 1Sto64 operation into that definition, so it isn't duplicated here.


--- a/coregrind/m_dispatch/dispatch-ppc32-linux.S
+++ b/coregrind/m_dispatch/dispatch-ppc32-linux.S

 LafterFP2:
+   /* set host Vector Status Control Register bit NJ to zero
+  to ensure the host generate subnormal results for the
+  vector floating point instructions. */
+mfvscr  16  /* Clear NJ bit */
+vspltisw 9,0x1   /* 4x 0x0001 */
+vspltisw 8,0x0   /* zero */
+vsldoi   9,8,9,0x2   /* <<2*8 => 4x 0x0001 */
+vnor 9,9,9   /* 4x 0xFFFE */
+   vand 16,16,9 /* Mask out NJ bit */
+mtvscr  16

(1) Guard these with #ifdef HAS_ALTIVEC like the other Altivec stuff in
this file.  Otherwise this will fail when run on a non-Altivec enabled target
(do we still support any of those)?  And the same for the other to assembly
files.

(2) (As a check) is the above sequence runnable even on the lowest level
Altivec subset?  Otherwise (again) it will fail at run time.

(3) I wasn't entirely clear what the changes to the post-run invariant checks
are (after label "postamble:").  IIUC, they already do check that
VSCR[NJ].host == 0, but the comments are wrong, and you've updated the
comments, but not the code?  Can you clarify/re-check?


diff --git a/memcheck/tests/ppc32/vgcore.9687
b/memcheck/tests/ppc32/vgcore.9687
new file mode 100644

This should definitely not be in the patch!

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-05-09 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=406256

--- Comment #7 from Carl Love  ---
Created attachment 119940
  --> https://bugs.kde.org/attachment.cgi?id=119940&action=edit
Updated patch to fix issues with dnormal values v5

Updated patch after finding issues on Power 7.  The new assembly code was
re-written to remove the mtvsrd and mfvsrd so the code will run.  The specific
files are:

 coregrind/m_dispatch/dispatch-ppc32-linux.S  
 coregrind/m_dispatch/dispatch-ppc64be-linux.S  
 coregrind/m_dispatch/dispatch-ppc64le-linux.S 

The updated patch has been tested on Power 6, Power 7, Power 8LE, Power 8BE and
Power 9.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-05-08 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=406256

--- Comment #6 from Carl Love  ---
Additional testing shows that the  mtvsrd  and mfvsrd instructions which are
used in the assembly interface functions are not supported on P7 and earlier. 
Will need to rework current patch.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-05-08 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=406256

Carl Love  changed:

   What|Removed |Added

 Attachment #119918|0   |1
is obsolete||

--- Comment #5 from Carl Love  ---
Created attachment 119920
  --> https://bugs.kde.org/attachment.cgi?id=119920&action=edit
Updated patch to fix issues with dnormal values v4

Testing on P7 found that I needed to make sure the system supports altivec or
the subnormal_test will fail on illegal inst.  Updated the
subnormal_test.vgtest file

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-05-08 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=406256

Carl Love  changed:

   What|Removed |Added

 Attachment #119504|0   |1
is obsolete||

--- Comment #4 from Carl Love  ---
Created attachment 119918
  --> https://bugs.kde.org/attachment.cgi?id=119918&action=edit
Updated patch to fix issues with dnormal values v3

Updated the patch.  Moved the code to set VSCR[NJ] to the assembly routines for
ppc64le, ppc64be, ppc32.  Tested on P8 LE, P8 BE, P9.  Manually verified the
assembly code will jump to .invariant_violation if the VSCR[NJ] bit is set to 1
by replacing the "mfvscr 7" instruction with some instructions that sets
contents of register v7 to 0x01 which is what the value would be if the NJ
bit is set.

The routine dnormV32_adj() was rewritten to use vector Iops.  Fixed a couple of
bugs that I found in retesting the patch.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-04-19 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=406256

Carl Love  changed:

   What|Removed |Added

 Attachment #119258|0   |1
is obsolete||

--- Comment #3 from Carl Love  ---
Created attachment 119504
  --> https://bugs.kde.org/attachment.cgi?id=119504&action=edit
Updated patch to fix issues with dnormal values

Updated patch, needs review by Julian

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-04-08 Thread Will Schmidt
https://bugs.kde.org/show_bug.cgi?id=406256

--- Comment #2 from Will Schmidt  ---
I have no input into a different better spot for getRRegUniverse_PPC changes. 

A few other patch comments: 

+++ b/Makefile.all.am
the "-mvsx -maltivec" combo can be shrunk to just "-mvsx" .  (mvsx is a
superset that includes maltivec).

+   /* LE API requires NJ be set to 0. */
A few spots.. that API reference should prob be ABI. 

+  vrm[0] = ~(1 << (127 - 111)) & vrm[0];  // Clear NJ bit LE case
+  vrm[1] = ~(1 << (127 - 111)) & vrm[1];  // Clear NJ bit BE case
Preferably wrap this in some logic so we are not writing to reserved bits of
the vscr in the non-LE or non-BE cases. 


+static IRExpr* dnorm_adj ( IRExpr* value )
just wondering, should this be instead named dnorm_adj_int or some variation to
avoid conflict if we need to do anything similar with Longs or Shorts in the
future.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-04-05 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=406256

Carl Love  changed:

   What|Removed |Added

 CC||will_schm...@vnet.ibm.com

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

2019-04-05 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=406256

--- Comment #1 from Carl Love  ---
Created attachment 119258
  --> https://bugs.kde.org/attachment.cgi?id=119258&action=edit
Proposed fix for making the subnormal results track the VSCR[NJ] bit

The attachment is a proposed fix for the issue.  The Iops to implement the
various vector floating point instructions map to a subset of the vector
floating point instructions.  The underlying host hardware needs to run with
the VSCR[NJ] bit set to zero to generate the subnormal results.  The guest
state needs to then adjust the arguments/results of the vector floating point
instructions as needed based on the guest setting of VSCR[NJ].  Hence the need
to setup the host with VSCR[NJ] = 0.  This was done in
VEX/priv/host_ppc_defs.c, function getRRegUniverse_PPC ().  The function is
actually setting up the guest register state.  The function is called once as
part of the initialization process.  That is the right time to set the host
configuration.  Ideally there would be a host initialization function that
would be called for doing this but I don't see one.  Given that there is not
host specific function, I had to put the code into the guest setup function.  I
don't consider this to be the ideal place for the code.  I would be interested
in ideas on a more appropriate location for this code. ***

The host setup code requires the addition of the -mvsx and -maltivec command
line options to be set for Power 7.  These options are on by default when
compiling for Power 8, 9.  Hence the Makefile.all.am change.

Finally, the current regression tests do not seem to cover the subnormal cases
well enough.  I created an explicit subnormal test which runs with the VSCR[NJ]
bit set to zero and one.

Again, would appreciate feedback on where best to do the host initialization.

-- 
You are receiving this mail because:
You are watching all bug changes.