https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85181

            Bug ID: 85181
           Summary: Loading wrong source/dest registers for xviexpdp
                    instruction with -O2 optimization
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: carll at gcc dot gnu.org
  Target Milestone: ---

Created attachment 43830
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43830&action=edit
test case for xviexpdp instruction

With -O2 optimization, the data is loaded into VR registers, but the xviexpdp
instruction then reads the VSR registers with the same numbers, which are
different registers.  With -O0 we load and use the same VSR registers.  The
result of the test case is wrong with -O2, but the expected result was
obtained with -O0.

The compiler used is GCC 8 mainline Revision: 258857

The attached test code was extracted from the Valgrind test_isa_3_0.c. 

I compiled the test code on 4/2/2018 on a Power 9 system with mainline
GCC 8.0, Revision: 258857, at -O0 and disassembled it with:

gcc -g -O0 -o test_xviexpdp  test_xviexpdp.c
objdump -S -d test_xviexpdp > test_xviexpdp.dump

This gave the following generated code:

    100005d4:   50 81 22 39     addi    r9,r2,-32432                            
    100005d8:   98 4e 00 7c     lxvd2x  vs0,0,r9                                
    100005dc:   50 02 80 f1     xxswapd vs12,vs0                                
    100005e0:   00 00 00 60     nop                                             
    100005e4:   60 81 22 39     addi    r9,r2,-32416                            
    100005e8:   98 4e 00 7c     lxvd2x  vs0,0,r9                                
    100005ec:   50 02 60 f1     xxswapd vs11,vs0                                
    100005f0:   00 00 00 60     nop                                             
    100005f4:   80 81 22 39     addi    r9,r2,-32384                            
    100005f8:   98 4e 00 7c     lxvd2x  vs0,0,r9                                
    100005fc:   50 02 00 f0     xxswapd vs0,vs0                                 
    10000600:   c0 5f 0c f0     xviexpdp vs0,vs12,vs11

The result of running the code was correct.  The data was loaded into vs12 and
vs11, and these are the registers used by the xviexpdp instruction.

I verified in gdb that vs12 and vs11 have the expected values and vs0
is correct after the instruction.

With -O2 we have

    100004f8:   ce e8 1f 7c     lvx     v0,r31,r29                              
    100004fc:   ce f8 a0 7d     lvx     v13,0,r31                               
    10000500:   ce f0 3f 7c     lvx     v1,r31,r30                              
    10000504:   c0 0f 0d f0     xviexpdp vs0,vs13,vs1

Note, we are loading v13 and v1, then using vs13 and vs1.  The VR registers
v13 and v1 overlay vs45 and vs33, not vs13 and vs1.  I verified that v13 and
v1 have the correct values.  So, we appear to be loading the wrong register
set.  I am not sure why the optimized code loads the wrong registers.  I
don't see any issues with the inline assembly.

Note, when I compiled with the same compiler the next day, 4/3/2018, on the
same machine with -O0, I got slightly different unoptimized code that doesn't
work.  I don't know what changed overnight, but it is really annoying that I
can't reproduce things exactly.

    100005d0:   00 00 00 60     nop                             
    100005d4:   50 81 22 39     addi    r9,r2,-32432            
    100005d8:   98 4e 00 7c     lxvd2x  vs0,0,r9                
    100005dc:   51 02 20 f0     xxswapd vs33,vs0        Now using vs33 not vs12 
    100005e0:   00 00 00 60     nop                             
    100005e4:   60 81 22 39     addi    r9,r2,-32416            
    100005e8:   98 4e 00 7c     lxvd2x  vs0,0,r9                
    100005ec:   51 02 a0 f1     xxswapd vs45,vs0        Now using vs45 not vs11 
    100005f0:   00 00 00 60     nop                             
    100005f4:   80 81 22 39     addi    r9,r2,-32384            
    100005f8:   98 4e 00 7c     lxvd2x  vs0,0,r9                
    100005fc:   50 02 00 f0     xxswapd vs0,vs0                 
    10000600:   91 04 00 f0     xxlor   vs32,vs0,vs0    Now using vs32 not vs0  
    10000604:   c0 6f 01 f0     xviexpdp vs0,vs1,vs13 

Note, we load VS registers whose indexes are 32 more than those used in the
xviexpdp instruction (vs33, vs45 and vs32 are loaded; vs1, vs13 and vs0 are
used).  I get all zeros for the result today, which is not the correct
result.  I don't know why things are compiling differently today, but the
test case result is wrong.
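
The register mismatch can be checked directly from the instruction words in
the dumps above.  The following C sketch assumes the Power ISA XX3-form
layout (5-bit T, A and B fields, with the TX, BX and AX extension bits in the
low three bits of the word) and the architectural overlay of VR n on VSR
n+32.  The helper names are made up for illustration; they are not part of
GCC or binutils.

```c
#include <stdint.h>

/* Operands of a VSX XX3-form instruction as full VSR numbers (0..63). */
struct xx3_regs {
    unsigned xt;   /* target register */
    unsigned xa;   /* first source register */
    unsigned xb;   /* second source register */
};

/* Build the 32-bit instruction word from the four bytes in the order
   objdump prints them for a little-endian target (lowest address first). */
uint32_t word_from_le_bytes(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return (uint32_t)b0 | ((uint32_t)b1 << 8) |
           ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
}

/* Decode the register fields.  Each 6-bit VSR number is the 5-bit field
   plus 32 times its extension bit (TX, AX or BX). */
struct xx3_regs decode_xx3(uint32_t insn)
{
    struct xx3_regs r;
    unsigned t  = (insn >> 21) & 0x1f;
    unsigned a  = (insn >> 16) & 0x1f;
    unsigned b  = (insn >> 11) & 0x1f;
    unsigned ax = (insn >> 2) & 1;
    unsigned bx = (insn >> 1) & 1;
    unsigned tx = insn & 1;

    r.xt = 32 * tx + t;
    r.xa = 32 * ax + a;
    r.xb = 32 * bx + b;
    return r;
}

/* VR n (v0..v31) occupies the same storage as VSR n+32 (vs32..vs63). */
unsigned vr_to_vsr(unsigned vr)
{
    return vr + 32;
}
```

For the -O2 instruction bytes c0 0f 0d f0, decode_xx3 reports xt=0, xa=13,
xb=1, while vr_to_vsr maps the lvx targets v13 and v1 to vs45 and vs33.  For
the 4/3/2018 -O0 bytes c0 6f 01 f0 it reports xt=0, xa=1, xb=13, again with
all three extension bits clear.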

The test does generate the same wrong code when compiled with -O2 as it did
yesterday.  

NOTE, this may not be the only instruction where this issue is occurring. 
Still investigating.
