[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-10-07 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

Bill Schmidt  changed:

   What|Removed |Added

 Status|RESOLVED|CLOSED

--- Comment #20 from Bill Schmidt  ---
So, closing.

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-10-07 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

Bill Schmidt  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
  Known to work||7.4.0
 Resolution|--- |FIXED
  Known to fail|7.4.0   |

--- Comment #19 from Bill Schmidt  ---
Fixed everywhere.

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-10-07 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

--- Comment #18 from Bill Schmidt  ---
Author: wschmidt
Date: Mon Oct  7 20:50:05 2019
New Revision: 276678

URL: https://gcc.gnu.org/viewcvs?rev=276678=gcc=rev
Log:
[gcc]

2019-10-07  Bill Schmidt  

Backport from mainline
2019-10-01  Bill Schmidt  

PR target/91275
* config/rs6000/rs6000.c (rtx_is_swappable_p): Don't swap
vpmsumd.

[gcc/testsuite]

2019-10-07  Bill Schmidt  

Backport from mainline
2019-10-01  Bill Schmidt  

PR target/91275
* gcc.target/powerpc/pr91275.c: New.


Added:
branches/gcc-7-branch/gcc/testsuite/gcc.target/powerpc/pr91275.c
Modified:
branches/gcc-7-branch/gcc/ChangeLog
branches/gcc-7-branch/gcc/config/rs6000/rs6000.c
branches/gcc-7-branch/gcc/testsuite/ChangeLog

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-10-07 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

--- Comment #17 from Bill Schmidt  ---
Author: wschmidt
Date: Mon Oct  7 19:34:41 2019
New Revision: 276669

URL: https://gcc.gnu.org/viewcvs?rev=276669=gcc=rev
Log:
[gcc]

2019-10-07  Bill Schmidt  

Backport from mainline
2019-10-01  Bill Schmidt  

PR target/91275
* config/rs6000/rs6000-p8swap.c (rtx_is_swappable_p): Don't swap
vpmsumd.

[gcc/testsuite]

2019-10-07  Bill Schmidt  

Backport from mainline
2019-10-01  Bill Schmidt  

PR target/91275
* gcc.target/powerpc/pr91275.c: New.


Added:
branches/gcc-8-branch/gcc/testsuite/gcc.target/powerpc/pr91275.c
Modified:
branches/gcc-8-branch/gcc/ChangeLog
branches/gcc-8-branch/gcc/config/rs6000/rs6000-p8swap.c
branches/gcc-8-branch/gcc/testsuite/ChangeLog

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-10-07 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

--- Comment #16 from Bill Schmidt  ---
Author: wschmidt
Date: Mon Oct  7 18:23:20 2019
New Revision: 276667

URL: https://gcc.gnu.org/viewcvs?rev=276667=gcc=rev
Log:
[gcc]

2019-10-07  Bill Schmidt  

Backport from mainline
2019-10-01  Bill Schmidt  

PR target/91275
* config/rs6000/rs6000-p8swap.c (rtx_is_swappable_p): Don't swap
vpmsumd.

[gcc/testsuite]

2019-10-07  Bill Schmidt  

Backport from mainline
2019-10-01  Bill Schmidt  

PR target/91275
* gcc.target/powerpc/pr91275.c: New.


Added:
branches/gcc-9-branch/gcc/testsuite/gcc.target/powerpc/pr91275.c
Modified:
branches/gcc-9-branch/gcc/ChangeLog
branches/gcc-9-branch/gcc/config/rs6000/rs6000-p8swap.c
branches/gcc-9-branch/gcc/testsuite/ChangeLog

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-10-01 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

--- Comment #15 from Bill Schmidt  ---
Excellent!  I'm glad you have a workaround for the time being.

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-10-01 Thread cand at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

--- Comment #14 from Lauri Kasanen  ---
Inline asm works on the buggy versions. Thanks.

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-10-01 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

Bill Schmidt  changed:

   What|Removed |Added

  Known to work||10.0
  Known to fail||7.4.0, 8.3.0, 9.1.0

--- Comment #13 from Bill Schmidt  ---
Planning backports for 7, 8, 9.

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-10-01 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

--- Comment #12 from Bill Schmidt  ---
Committed to trunk with 276410.  Forgot to annotate the ChangeLog with the PR,
so here it is:

[gcc]

2019-10-01  Bill Schmidt  

* config/rs6000/rs6000-p8swap.c (rtx_is_swappable_p): Don't swap
vpmsumd.

[gcc/testsuite]

2019-10-01  Bill Schmidt  

* gcc.target/powerpc/pr91275.c: New.

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-10-01 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

--- Comment #11 from Bill Schmidt  ---
I tested #pragma target -mno-optimize-swaps, and it doesn't help.  The only
options that can be specified with #pragma target for Power are listed here: 
https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Function-Attributes.html#PowerPC-Function-Attributes

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-10-01 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

--- Comment #10 from Bill Schmidt  ---
I don't *believe* that "#pragma target" works with -mno-optimize-swaps, but you
could try it.  I think that mechanism only works for certain flags, but I
haven't tried that one. 

I think inline asm should kill swap optimization for just the computation
involving the vpmsumd.  (If it doesn't, that's another bug. ;-)

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-10-01 Thread cand at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

--- Comment #9 from Lauri Kasanen  ---
Can -mno-optimize-swaps be used per-function, in the code via some pragma?
Alternatively, does calling the instruction via inline asm prevent the
swapping?

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-10-01 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

--- Comment #8 from Segher Boessenkool  ---
(In reply to Lauri Kasanen from comment #7)
> Are you sure about the smaller ones? To me they should not care about 64-bit
> swaps,

"swappable" here means you can swap the low and high half on all inputs and
on the output, and that changes nothing.  This is true for all smaller vpmsumX.

> but clang lists them all as lane-sensitive:
> https://llvm.org/doxygen/PPCVSXSwapRemoval_8cpp_source.html

If you think LLVM has a bug, take that up with the LLVM people?

> Also, I'd like to ask if any other instructions added at the same time were
> affected, just in case; the test suite did not catch this.

We don't know about any.  The thing with vpmsumd is the mnemonic sounds like
it works on 64-bit data, but that isn't true for its output, which is double
width.

> Will the patch be backported to stable branches?

Yes, after a week or so, to all affected still open release branches.  That
looks to be GCC 7, 8, and 9.

> What would you recommend as
> a workaround for existing gcc versions?

-mno-optimize-swaps (an undocumented option) should work.  It will reduce
performance somewhat, and it can go away or change behaviour at any time
(that's why it is undocumented, users should not use it); but it will work
as a workaround for you.

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-10-01 Thread cand at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

--- Comment #7 from Lauri Kasanen  ---
Thanks for working on this. Per my experience, this is fast for gcc ;)

Are you sure about the smaller ones? To me they should not care about 64-bit
swaps, but clang lists them all as lane-sensitive:
https://llvm.org/doxygen/PPCVSXSwapRemoval_8cpp_source.html

Also, I'd like to ask if any other instructions added at the same time were
affected, just in case; the test suite did not catch this.

Will the patch be backported to stable branches? What would you recommend as a
workaround for existing gcc versions?

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-09-30 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

--- Comment #6 from Bill Schmidt  ---
Thanks for the report!  Sorry for taking so long to look at this.  Swap
optimization treats vpmsumd as swappable, but unlike vpmsumb, vpmsumh, and
vpmsumw, it is not.  Testing a patch.

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-09-10 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

Bill Schmidt  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |wschmidt at gcc dot 
gnu.org

--- Comment #5 from Bill Schmidt  ---
I'll have a look.

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-09-07 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

Segher Boessenkool  changed:

   What|Removed |Added

 CC||wschmidt at gcc dot gnu.org

--- Comment #4 from Segher Boessenkool  ---
It is caused by the "swaps" (p8swap) pass.  Before this pass we have:

5:  load r122
6:  r123 = swap r122
7:  r121 = swap r123
8:  load r125
9:  r126 = swap r125
10: r124 = swap r126
11: r117 = vpmsumd r121, r124
12: r127 = vec_select r117, 1  # this is the high dword, 0 in hardware
13: r128 = vec_select r117, 0  # this is the low dword, 1 in hardware
14: load r129
15: r5 = r127
16: r4 = r128
17: r3 = r129
18: call printf

The swaps pass replaces 7 and 10 by plain moves.

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-09-07 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

Segher Boessenkool  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-09-07
 Ever confirmed|0   |1

--- Comment #3 from Segher Boessenkool  ---
Does not fail with -mcpu=power9.

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-09-07 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

Segher Boessenkool  changed:

   What|Removed |Added

 Target|ppc64le |powerpc*-*-*
 CC||segher at gcc dot gnu.org
   Host|ppc64le |

--- Comment #2 from Segher Boessenkool  ---
Confirmed.

[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0

2019-07-31 Thread cand at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275

--- Comment #1 from Lauri Kasanen  ---
clang 7.0.0 outputs the expected values, aka the gcc -O0 ones, at all
optimization levels. (it calls the builtin __builtin_altivec_crypto_vpmsumd,
but no other changes)