https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85181
Bug ID: 85181
Summary: Loading wrong source/dest registers for xviexpdp instruction with -O2 optimization
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: major
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: carll at gcc dot gnu.org
Target Milestone: ---

Created attachment 43830 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43830&action=edit
test case for xviexpdp instruction

With -O2 optimization, we load the data into the wrong VR registers. With -O0 we load and use the same VSR registers. The test case produces a wrong result with -O2 optimization, but the correct expected result is obtained with -O0. The compiler used is GCC 8 mainline, revision 258857.

The attached test code was extracted from the Valgrind test_isa_3_0.c.

On 4/2/2018 I compiled the test code on a Power 9 system with mainline GCC 8.0, revision 258857, at -O0:

    gcc -g -O0 -o test_xviexpdp test_xviexpdp.c
    objdump -S -d test_xviexpdp > test_xviexpdp.dump

I get the following generated code:

    100005d4:  50 81 22 39   addi     r9,r2,-32432
    100005d8:  98 4e 00 7c   lxvd2x   vs0,0,r9
    100005dc:  50 02 80 f1   xxswapd  vs12,vs0
    100005e0:  00 00 00 60   nop
    100005e4:  60 81 22 39   addi     r9,r2,-32416
    100005e8:  98 4e 00 7c   lxvd2x   vs0,0,r9
    100005ec:  50 02 60 f1   xxswapd  vs11,vs0
    100005f0:  00 00 00 60   nop
    100005f4:  80 81 22 39   addi     r9,r2,-32384
    100005f8:  98 4e 00 7c   lxvd2x   vs0,0,r9
    100005fc:  50 02 00 f0   xxswapd  vs0,vs0
    10000600:  c0 5f 0c f0   xviexpdp vs0,vs12,vs11

The result of running the code was correct. The data was loaded into vs12 and vs11, and these are the registers used in the xviexpdp instruction. I verified in gdb that vs12 and vs11 hold the expected values and that vs0 is correct after the instruction.

With -O2 we have:

    100004f8:  ce e8 1f 7c   lvx      v0,r31,r29
    100004fc:  ce f8 a0 7d   lvx      v13,0,r31
    10000500:  ce f0 3f 7c   lvx      v1,r31,r30
    10000504:  c0 0f 0d f0   xviexpdp vs0,vs13,vs1

Note that we load v13 and v1 (vs45 and vs33 in VSX register numbering), but the xviexpdp instruction then uses vs13 and vs1.
I verified that v13 and v1 hold the correct values, so we appear to be loading the wrong register set. I'm not sure why the optimized code loads the wrong registers; I don't see any issues with the inline assembly.

Note: with the same compiler, compiled the next day (4/3/2018) on the same machine with -O0, I get slightly different unoptimized code that doesn't work. I don't know what changed overnight, but it is really annoying that I can't reproduce things exactly.

    100005d0:  00 00 00 60   nop
    100005d4:  50 81 22 39   addi     r9,r2,-32432
    100005d8:  98 4e 00 7c   lxvd2x   vs0,0,r9
    100005dc:  51 02 20 f0   xxswapd  vs33,vs0        <- now using vs33, not vs12
    100005e0:  00 00 00 60   nop
    100005e4:  60 81 22 39   addi     r9,r2,-32416
    100005e8:  98 4e 00 7c   lxvd2x   vs0,0,r9
    100005ec:  51 02 a0 f1   xxswapd  vs45,vs0        <- now using vs45, not vs11
    100005f0:  00 00 00 60   nop
    100005f4:  80 81 22 39   addi     r9,r2,-32384
    100005f8:  98 4e 00 7c   lxvd2x   vs0,0,r9
    100005fc:  50 02 00 f0   xxswapd  vs0,vs0
    10000600:  91 04 00 f0   xxlor    vs32,vs0,vs0    <- now using vs32, not vs0
    10000604:  c0 6f 01 f0   xviexpdp vs0,vs1,vs13

Note that the loads target VS register indexes that are 32 higher than the ones used in the xviexpdp instruction. Today I get all zeros for the result, which is not correct. I don't know why things compile differently today, but the test case result is wrong. With -O2, the test generates the same wrong code as it did yesterday.

NOTE: this may not be the only instruction where this issue occurs. Still investigating.
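To make the register mismatch concrete, the following C sketch decodes the VSX operand fields directly from the instruction bytes that objdump printed in the listings above. It assumes the Power ISA 3.0 XX3-form layout (5-bit T/A/B fields at ISA bits 6..20, plus TX/AX/BX extension bits at bits 31/29/30 that select vs32..vs63, where the Altivec registers v0..v31 live) and the little-endian byte order objdump shows on ppc64le. The names decode_xx3, word_from_le_bytes, isa_bits and vr_to_vsr are hypothetical helpers for illustration, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Operand fields of a VSX XX3-form instruction (assumed Power ISA 3.0 layout).
   Each VSR operand is 6 bits: a 5-bit field plus one extension bit that
   selects the upper half of the VSX register file (vs32..vs63). */
typedef struct {
    unsigned opcd; /* primary opcode, ISA bits 0..5 */
    unsigned xo;   /* extended opcode, ISA bits 21..28 */
    unsigned xt;   /* target VSR:   32*TX + T */
    unsigned xa;   /* source A VSR: 32*AX + A */
    unsigned xb;   /* source B VSR: 32*BX + B */
} xx3_fields;

/* objdump on ppc64le prints the bytes in memory (little-endian) order,
   so "c0 5f 0c f0" is the 32-bit instruction word 0xf00c5fc0. */
static uint32_t word_from_le_bytes(const uint8_t b[4]) {
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* The Power ISA numbers bits from 0 (most significant) to 31 (least
   significant); extract `len` bits starting at ISA bit `start`. */
static unsigned isa_bits(uint32_t w, unsigned start, unsigned len) {
    assert(len >= 1 && len < 32 && start + len <= 32);
    return (unsigned)((w >> (32 - start - len)) & ((1u << len) - 1u));
}

static xx3_fields decode_xx3(uint32_t w) {
    xx3_fields f;
    f.opcd = isa_bits(w, 0, 6);
    f.xo   = isa_bits(w, 21, 8);
    f.xt   = 32 * isa_bits(w, 31, 1) + isa_bits(w, 6, 5);  /* TX:T */
    f.xa   = 32 * isa_bits(w, 29, 1) + isa_bits(w, 11, 5); /* AX:A */
    f.xb   = 32 * isa_bits(w, 30, 1) + isa_bits(w, 16, 5); /* BX:B */
    return f;
}

/* Altivec register vN overlays VSX register vs(N+32). */
static unsigned vr_to_vsr(unsigned vr) { return vr + 32; }
```

Decoding the -O0 word c0 5f 0c f0 this way gives xt=0, xa=12, xb=11, matching xviexpdp vs0,vs12,vs11. The -O2 word c0 0f 0d f0 gives xa=13 and xb=1 with AX=BX=0, i.e. vs13 and vs1, while the lvx loads wrote v13 and v1, which vr_to_vsr maps to vs45 and vs33. Likewise, the TX bit is set in the 4/3 xxswapd word 51 02 20 f0, so its target decodes to vs33.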