[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 Bill Schmidt changed: What|Removed |Added Status|RESOLVED|CLOSED --- Comment #20 from Bill Schmidt --- So, closing.
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 Bill Schmidt changed: What|Removed |Added Status|NEW |RESOLVED Known to work||7.4.0 Resolution|--- |FIXED Known to fail|7.4.0 | --- Comment #19 from Bill Schmidt --- Fixed everywhere.
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 --- Comment #18 from Bill Schmidt --- Author: wschmidt Date: Mon Oct 7 20:50:05 2019 New Revision: 276678 URL: https://gcc.gnu.org/viewcvs?rev=276678=gcc=rev Log: [gcc] 2019-10-07 Bill Schmidt Backport from mainline 2019-10-01 Bill Schmidt PR target/91275 * config/rs6000/rs6000.c (rtx_is_swappable_p): Don't swap vpmsumd. [gcc/testsuite] 2019-10-07 Bill Schmidt Backport from mainline 2019-10-01 Bill Schmidt PR target/91275 * gcc.target/powerpc/pr91275.c: New. Added: branches/gcc-7-branch/gcc/testsuite/gcc.target/powerpc/pr91275.c Modified: branches/gcc-7-branch/gcc/ChangeLog branches/gcc-7-branch/gcc/config/rs6000/rs6000.c branches/gcc-7-branch/gcc/testsuite/ChangeLog
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 --- Comment #17 from Bill Schmidt --- Author: wschmidt Date: Mon Oct 7 19:34:41 2019 New Revision: 276669 URL: https://gcc.gnu.org/viewcvs?rev=276669=gcc=rev Log: [gcc] 2019-10-07 Bill Schmidt Backport from mainline 2019-10-01 Bill Schmidt PR target/91275 * config/rs6000/rs6000-p8swap.c (rtx_is_swappable_p): Don't swap vpmsumd. [gcc/testsuite] 2019-10-07 Bill Schmidt Backport from mainline 2019-10-01 Bill Schmidt PR target/91275 * gcc.target/powerpc/pr91275.c: New. Added: branches/gcc-8-branch/gcc/testsuite/gcc.target/powerpc/pr91275.c Modified: branches/gcc-8-branch/gcc/ChangeLog branches/gcc-8-branch/gcc/config/rs6000/rs6000-p8swap.c branches/gcc-8-branch/gcc/testsuite/ChangeLog
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 --- Comment #16 from Bill Schmidt --- Author: wschmidt Date: Mon Oct 7 18:23:20 2019 New Revision: 276667 URL: https://gcc.gnu.org/viewcvs?rev=276667=gcc=rev Log: [gcc] 2019-10-07 Bill Schmidt Backport from mainline 2019-10-01 Bill Schmidt PR target/91275 * config/rs6000/rs6000-p8swap.c (rtx_is_swappable_p): Don't swap vpmsumd. [gcc/testsuite] 2019-10-07 Bill Schmidt Backport from mainline 2019-10-01 Bill Schmidt PR target/91275 * gcc.target/powerpc/pr91275.c: New. Added: branches/gcc-9-branch/gcc/testsuite/gcc.target/powerpc/pr91275.c Modified: branches/gcc-9-branch/gcc/ChangeLog branches/gcc-9-branch/gcc/config/rs6000/rs6000-p8swap.c branches/gcc-9-branch/gcc/testsuite/ChangeLog
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 --- Comment #15 from Bill Schmidt --- Excellent! I'm glad you have a workaround for the time being.
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 --- Comment #14 from Lauri Kasanen --- Inline asm works on the buggy versions. Thanks.
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 Bill Schmidt changed: What|Removed |Added Known to work||10.0 Known to fail||7.4.0, 8.3.0, 9.1.0 --- Comment #13 from Bill Schmidt --- Planning backports for 7, 8, 9.
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 --- Comment #12 from Bill Schmidt --- Committed to trunk with 276410. Forgot to annotate the ChangeLog with the PR, so here it is: [gcc] 2019-10-01 Bill Schmidt * config/rs6000/rs6000-p8swap.c (rtx_is_swappable_p): Don't swap vpmsumd. [gcc/testsuite] 2019-10-01 Bill Schmidt * gcc.target/powerpc/pr91275.c: New.
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 --- Comment #11 from Bill Schmidt --- I tested #pragma target -mno-optimize-swaps, and it doesn't help. The only options that can be specified with #pragma target for Power are listed here: https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Function-Attributes.html#PowerPC-Function-Attributes
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 --- Comment #10 from Bill Schmidt --- I don't *believe* that "#pragma target" works with -mno-optimize-swaps, but you could try it. I think that mechanism only works for certain flags, but I haven't tried that one. I think inline asm should kill swap optimization for just the computation involving the vpmsumd. (If it doesn't, that's another bug. ;-)
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 --- Comment #9 from Lauri Kasanen --- Can -mno-optimize-swaps be used per-function, in the code via some pragma? Alternatively, does calling the instruction via inline asm prevent the swapping?
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 --- Comment #8 from Segher Boessenkool --- (In reply to Lauri Kasanen from comment #7) > Are you sure about the smaller ones? To me they should not care about 64-bit > swaps, "swappable" here means you can swap the low and high half on all inputs and on the output, and that changes nothing. This is true for all smaller vpmsumX. > but clang lists them all as lane-sensitive: > https://llvm.org/doxygen/PPCVSXSwapRemoval_8cpp_source.html If you think LLVM has a bug, take that up with the LLVM people? > Also, I'd like to ask if any other instructions added at the same time were > affected, just in case; the test suite did not catch this. We don't know about any. The thing with vpmsumd is the mnemonic sounds like it works on 64-bit data, but that isn't true for its output, which is double width. > Will the patch be backported to stable branches? Yes, after a week or so, to all affected still open release branches. That looks to be GCC 7, 8, and 9. > What would you recommend as > a workaround for existing gcc versions? -mno-optimize-swaps (an undocumented option) should work. It will reduce performance somewhat, and it can go away or change behaviour at any time (that's why it is undocumented, users should not use it); but it will work as a workaround for you.
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 --- Comment #7 from Lauri Kasanen --- Thanks for working on this. Per my experience, this is fast for gcc ;) Are you sure about the smaller ones? To me they should not care about 64-bit swaps, but clang lists them all as lane-sensitive: https://llvm.org/doxygen/PPCVSXSwapRemoval_8cpp_source.html Also, I'd like to ask if any other instructions added at the same time were affected, just in case; the test suite did not catch this. Will the patch be backported to stable branches? What would you recommend as a workaround for existing gcc versions?
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 --- Comment #6 from Bill Schmidt --- Thanks for the report! Sorry for taking so long to look at this. Swap optimization treats vpmsumd as swappable, but unlike vpmsumb, vpmsumh, and vpmsumw, it is not. Testing a patch.
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 Bill Schmidt changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |wschmidt at gcc dot gnu.org --- Comment #5 from Bill Schmidt --- I'll have a look.
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 Segher Boessenkool changed: What|Removed |Added CC||wschmidt at gcc dot gnu.org --- Comment #4 from Segher Boessenkool --- It is caused by the "swaps" (p8swap) pass. Before this pass we have: 5: load r122 6: r123 = swap r122 7: r121 = swap r123 8: load r125 9: r126 = swap r125 10: r124 = swap r126 11: r117 = vpmsumd r121, r124 12: r127 = vec_select r117, 1 # this is the high dword, 0 in hardware 13: r128 = vec_select r117, 0 # this is the low dword, 1 in hardware 14: load r129 15: r5 = r127 16: r4 = r128 17: r3 = r129 18: call printf The swaps pass replaces 7 and 10 by plain moves.
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 Segher Boessenkool changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-09-07 Ever confirmed|0 |1 --- Comment #3 from Segher Boessenkool --- Does not fail with -mcpu=power9.
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 Segher Boessenkool changed: What|Removed |Added Target|ppc64le |powerpc*-*-* CC||segher at gcc dot gnu.org Host|ppc64le | --- Comment #2 from Segher Boessenkool --- Confirmed.
[Bug target/91275] __builtin_crypto_vpmsumd gives different results -O[123] vs -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 --- Comment #1 from Lauri Kasanen --- clang 7.0.0 outputs the expected values, aka the gcc -O0 ones, at all optimization levels. (it calls the builtin __builtin_altivec_crypto_vpmsumd, but no other changes)