[Bug testsuite/101528] [11 regression] gcc.target/powerpc/int_128bit-runnable.c fails after r11-8743

2023-05-22 Thread cel at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101528

--- Comment #7 from Carl Love  ---
I recently committed a patch to fix the counts.

commit 5d336ae49528fde3904c9e5bfc83a450429b2961
Author: Carl Love 
Date:   Fri Mar 10 18:16:52 2023 -0500

rs6000: Fix test int_128bit-runnable.c instruction counts

The test reports two failures on Power 10LE:

FAIL: .../int_128bit-runnable.c scan-assembler-times \mvdivsq\M 1
FAIL: .../int_128bit-runnable.c scan-assembler-times \mvextsd2q\M 6

The current counts are :

  vdivsq   3
  vextsd2q 4


I tested mainline with the head at commit

commit 90685c365794e9afabc6cdc7eae7892ba5d2be3d (HEAD -> master, origin/trunk,
origin/master, origin/HEAD)
Author: Aldy Hernandez 
Date:   Mon May 22 20:15:19 2023 +0200

Implement some miscellaneous zero accessors for Value_Range.
...

Test gcc.target/powerpc/int_128bit-runnable.c is currently passing without any
regression failures.

I believe this bugzilla is ready to close.

[Bug testsuite/101528] [11 regression] gcc.target/powerpc/int_128bit-runnable.c fails after r11-8743

2023-05-19 Thread cel at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101528

Carl Love  changed:

   What|Removed |Added

 CC||cel at us dot ibm.com

--- Comment #6 from Carl Love  ---
I will look into this and see if the instruction counts have changed for some
reason.

[Bug c/108996] New: Proposal for adding DWARF call site information got GCC with -O0

2023-03-02 Thread cel at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108996

Bug ID: 108996
   Summary: Proposal for adding DWARF call site information got
GCC with -O0
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cel at us dot ibm.com
  Target Milestone: ---

On PowerPC, the address of the buffer used to return non-trivial values such as
structures is passed in register r3.  The value r3 holds on entry may be
overwritten in the body of the called function.  Thus the contents of r3 cannot
be used by GDB at function exit to find the address of the buffer.
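
A conceptual C sketch of this convention may help.  It assumes the usual
lowering of a memory-returned object into a hidden first argument; the names
f2_lowered and ret_buf are hypothetical and only model what the compiler does,
they are not taken from GCC or GDB.

```c
#include <stddef.h>

struct B { int b; };

/* Model of "B f2 (int i1, int i2)" when B is returned in memory:
   the caller allocates the buffer and passes its address as a hidden
   first argument (register r3 on PowerPC).  */
void
f2_lowered (struct B *ret_buf, int i1, int i2)
{
  /* The callee writes the result through the hidden pointer.  */
  ret_buf->b = i1 + i2;

  /* Nothing obliges the callee to keep the pointer in r3 afterwards;
     this assignment models the register being reused for other work.  */
  ret_buf = NULL;
  (void) ret_buf;
}
```

A caller would write "struct B b; f2_lowered (&b, 23, 100);" and then read
b.b.  A debugger stopped at the end of f2_lowered only sees the reused
register, which is why it needs the entry value recorded at the call site.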

GDB needs to have the value of r3 on entry to the function to be able to print
the return value of the function when the function exits.  GDB is able to get
the value of r3 from the caller at the time of the function call if the needed
DWARF information is available.  Currently the only way to get the needed DWARF
information is to compile with -fvar-tracking.  That option records much more
information than GDB needs, which may negatively impact the size and speed of
the binary when compiled with -O0.  We have not measured the exact cost; this
is a best guess.

GDB needs only a small subset of the information provided by -fvar-tracking.

Currently GDB on PowerPC will attempt to determine the value of r3 on entry. 
If the needed DWARF information is not available, GDB will print the message:

"Cannot resolve DW_OP_entry_value for the return value.  Try compiling
with -fvar-tracking."

The following is an example of how gdb is unable to print the return value.  It
is a stripped down version of the gdb testsuite test
gdb/testsuite/gdb.cp/non-trivial-retval.cc.

   class B
   {
   public:
     B () {}
     B (const B &obj);

     int b;
   };

   B::B(const B &obj)
   {
     b = obj.b;
   }

   B
   f2 (int i1, int i2)
   {
     B b;

     b.b = i1 + i2;

     return b;
   }

   int
   main (void)
   {
     int i1 = 23;
     int i2 = 100;

     B b = f2 (i1, i2);
     return 0;
   }


   # compile the program, without -fvar-tracking
   gcc -g -o non-trivial-retval   non-trivial-retval.cc

   # Run GDB


   gdb ./non-trivial-retval
   ...
   (gdb) break main
   Breakpoint 1 at 0x1744: file non-trivial-retval.cc, line 28.
   (gdb) r
   Starting program: /home/carll/GDB/binutils-gdb-current/gdb/testsuite/gdb.cp/non-trivial-retval
   [Thread debugging using libthread_db enabled]
   Using host libthread_db library "/lib64/libthread_db.so.1".

   Breakpoint 1, main () at non-trivial-retval.cc:28
   28 int i1 = 23;
   (gdb) n
   29 int i2 = 100;
   (gdb) n
   31 B b = f2 (i1, i2);
   (gdb) s
   f2 (i1=23, i2=100) at non-trivial-retval.cc:18
   18 B b;
   (gdb) finish
   warning: Cannot determine the function return value.   << Message to user
   Try compiling with -fvar-tracking.                     << Message to user
   Run till exit from #0  f2 (i1=23, i2=100) at non-trivial-retval.cc:18
   main () at non-trivial-retval.cc:32
   32 return 0;
   Value returned has type: B. Cannot determine contents   << GDB cannot determine return value
   (gdb) 


   When we compile with -fvar-tracking we can print the return value.

   # Compile with -fvar-tracking
   gcc -g -O0 -fvar-tracking -o non-trivial-retval   non-trivial-retval.cc

   # Run GDB

   gdb ./non-trivial-retval
   (gdb) break main
   Breakpoint 1 at 0x1730: file non-trivial-retval.cc, line 27.
   (gdb) r
   Starting program: /home/carll/GDB/binutils-gdb-current/gdb/testsuite/gdb.cp/non-trivial-retval
   [Thread debugging using libthread_db enabled]
   Using host libthread_db library "/lib64/libthread_db.so.1".

   Breakpoint 1, 0x1730 in main () at non-trivial-retval.cc:27
   27   {
   (gdb) s
   28 int i1 = 23;
   (gdb) s
   29 int i2 = 100;
   (gdb) s
   31 B b = f2 (i1, i2);
   (gdb) s
   0x16b4 in f2 (i1=i1@entry=23, i2=i2@entry=100) at non-trivial-retval.cc:17
   17   {
   (gdb) finish
   Run till exit from #0  0x16b4 in f2 (i1=i1@entry=23, i2=i2@entry=100) at non-trivial-retval.cc:17
   main () at non-trivial-retval.cc:32
   32 return 0;
   Value returned is $1 = {b = 123}   << GDB can print the return value

   Looking at the DWARF information, we need to compile with -g -O2
   -fvar-tracking to get the info we need:

   gcc -g -O0 -fvar-tracking -o non-trivial-retval   non-trivial-retval.cc

   dwarfdump non-trivial-retval > non-trivial-retval.dwarf

   ...
DW_AT_GNU_locviews0x0039
   < 2><0x00d6> DW_TA

[Bug target/101022] rs6000: __builtin_altivec_vcmpequt expands to wrong pattern

2021-06-10 Thread cel at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101022

Carl Love  changed:

   What|Removed |Added

 CC||cel at us dot ibm.com

--- Comment #3 from Carl Love  ---
I will work on the fix.

[Bug target/98092] [11 Regression] ICE in extract_insn, at recog.c:2315 (error: unrecognizable insn)

2021-01-22 Thread cel at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98092

--- Comment #2 from Carl Love  ---
Segher:

Yup, I saw the bugzilla.  Will take a look at it.

  Carl 

On Fri, 2021-01-22 at 18:49 +0000, segher at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98092 
> 
> Segher Boessenkool  changed:
> 
>    What|Removed |Added
> ----------------------------------------------------------------------------
>    Assignee|unassigned at gcc dot gnu.org |segher at gcc dot gnu.org
>

[Bug target/93449] PPC: Missing conversion builtin from vector to _Decimal128 and vice versa

2020-11-02 Thread cel at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93449

Carl Love  changed:

   What|Removed |Added

 CC||cel at us dot ibm.com

--- Comment #10 from Carl Love  ---
Patch approved and committed.

commit 05161256d3d2a598966ca1cf676fa0e427570f73 (HEAD -> master, origin/master,
origin/HEAD)
Author: Carl Love 
Date:   Mon Aug 31 16:12:31 2020 -0500

Add bcd builtings listed in appendix B of the ABI

2020-10-29  Carl Love  

gcc/
PR target/93449
* config/rs6000/altivec.h (__builtin_bcdadd, __builtin_bcdadd_lt,
__builtin_bcdadd_eq, __builtin_bcdadd_gt, __builtin_bcdadd_ofl,
__builtin_bcdadd_ov, __builtin_bcdsub, __builtin_bcdsub_lt,
__builtin_bcdsub_eq, __builtin_bcdsub_gt, __builtin_bcdsub_ofl,
__builtin_bcdsub_ov, __builtin_bcdinvalid, __builtin_bcdmul10,
__builtin_bcddiv10, __builtin_bcd2dfp, __builtin_bcdcmpeq,
__builtin_bcdcmpgt, __builtin_bcdcmplt, __builtin_bcdcmpge,
__builtin_bcdcmple): Add defines.
* config/rs6000/altivec.md: Add UNSPEC_BCDSHIFT.
(BCD_TEST): Add le, ge to code iterator.
Add VBCD mode iterator.
(bcd_test, *bcd_test2,
bcd_, bcd_): Add mode to
name.
Change iterator from V1TI to VBCD.
(*bcdinvalid_, bcdshift_v16qi): New define_insn.
(bcdinvalid_, bcdmul10_v16qi, bcddiv10_v16qi): New define.
* config/rs6000/dfp.md (dfp_denbcd_v16qi_inst): New define_insn.
(dfp_denbcd_v16qi): New define_expand.
* config/rs6000/rs6000-builtin.def (BU_P8V_MISC_1): New define.
(BCDADD): Replaced with BCDADD_V1TI and BCDADD_V16QI.
(BCDADD_LT): Replaced with BCDADD_LT_V1TI and BCDADD_LT_V16QI.
(BCDADD_EQ): Replaced with BCDADD_EQ_V1TI and BCDADD_EQ_V16QI.
(BCDADD_GT): Replaced with BCDADD_GT_V1TI and BCDADD_GT_V16QI.
(BCDADD_OV): Replaced with BCDADD_OV_V1TI and BCDADD_OV_V16QI.
(BCDSUB_V1TI, BCDSUB_V16QI, BCDSUB_LT_V1TI, BCDSUB_LT_V16QI,
BCDSUB_LE_V1TI, BCDSUB_LE_V16QI, BCDSUB_EQ_V1TI, BCDSUB_EQ_V16QI,
BCDSUB_GT_V1TI, BCDSUB_GT_V16QI, BCDSUB_GE_V1TI, BCDSUB_GE_V16QI,
BCDSUB_OV_V1TI, BCDSUB_OV_V16QI, BCDINVALID_V1TI, BCDINVALID_V16QI,
BCDMUL10_V16QI, BCDDIV10_V16QI, DENBCD_V16QI): New builtin
definitions.
(BCDADD, BCDADD_LT, BCDADD_EQ, BCDADD_GT, BCDADD_OV, BCDSUB,
BCDSUB_LT,
BCDSUB_LE, BCDSUB_EQ, BCDSUB_GT, BCDSUB_GE, BCDSUB_OV, BCDINVALID,
BCDMUL10, BCDDIV10, DENBCD): New overload definitions.
* config/rs6000/rs6000-call.c (P8V_BUILTIN_VEC_BCDADD,
P8V_BUILTIN_VEC_BCDADD_LT,
P8V_BUILTIN_VEC_BCDADD_EQ, P8V_BUILTIN_VEC_BCDADD_GT,
P8V_BUILTIN_VEC_BCDADD_OV,
P8V_BUILTIN_VEC_BCDINVALID, P9V_BUILTIN_VEC_BCDMUL10,
P8V_BUILTIN_VEC_DENBCD.
P8V_BUILTIN_VEC_BCDSUB, P8V_BUILTIN_VEC_BCDSUB_LT,
P8V_BUILTIN_VEC_BCDSUB_LE,
   P8V_BUILTIN_VEC_BCDSUB_EQ, P8V_BUILTIN_VEC_BCDSUB_GT,
P8V_BUILTIN_VEC_BCDSUB_GE,
P8V_BUILTIN_VEC_BCDSUB_OV): New overloaded specifications.
(CODE_FOR_bcdadd): Replaced with CODE_FOR_bcdadd_v16qi and
CODE_FOR_bcdadd_v1ti.
(CODE_FOR_bcdadd_lt): Replaced with CODE_FOR_bcdadd_lt_v16qi and
CODE_FOR_bcdadd_lt_v1ti.
(CODE_FOR_bcdadd_eq): Replaced with CODE_FOR_bcdadd_eq_v16qi and
CODE_FOR_bcdadd_eq_v1ti.
(CODE_FOR_bcdadd_gt): Replaced with CODE_FOR_bcdadd_gt_v16qi and
CODE_FOR_bcdadd_gt_v1ti.
(CODE_FOR_bcdsub): Replaced with CODE_FOR_bcdsub_v16qi and
CODE_FOR_bcdsub_v1ti.
(CODE_FOR_bcdsub_lt): Replaced with CODE_FOR_bcdsub_lt_v16qi and
CODE_FOR_bcdsub_lt_v1ti.
(CODE_FOR_bcdsub_eq): Replaced with CODE_FOR_bcdsub_eq_v16qi and
CODE_FOR_bcdsub_eq_v1ti.
(CODE_FOR_bcdsub_gt): Replaced with CODE_FOR_bcdsub_gt_v16qi and
CODE_FOR_bcdsub_gt_v1ti.
(rs6000_expand_ternop_builtin):  Add CODE_FOR_dfp_denbcd_v16qi to
else if.
* doc/extend.texi: Add documentation for new builtins.

gcc/testsuite/
* gcc.target/powerpc/bcd-2.c: Add include altivec.h.
* gcc.target/powerpc/bcd-3.c: Add include altivec.h.
* gcc.target/powerpc/bcd-4.c: New test.

[Bug target/85830] vec_popcntd is improperly defined in altivec.h

2020-08-27 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85830

--- Comment #4 from Carl Love  ---
Just remove 

  #define vec_popcntb __builtin_vec_vpopcntub
  #define vec_popcnth __builtin_vec_vpopcntuh
  #define vec_popcntw __builtin_vec_vpopcntuw
  #define vec_popcntd __builtin_vec_vpopcntud

from altivec.h.

We need to keep the definition for vec_popcnt as it is the currently defined
ABI builtin.
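
For code that currently calls vec_popcntd, a minimal sketch of the replacement
might look like the following.  The wrapper name popcount_u64x2 is
hypothetical, the vector path assumes a POWER8 or later target (detected here
with __POWER8_VECTOR__), and the scalar fallback using __builtin_popcountll is
only there so the sketch also builds on other hosts.

```c
#include <stdint.h>

#if defined(__POWER8_VECTOR__)
#include <altivec.h>
#endif

/* Count the set bits in each of two 64-bit elements.  */
void
popcount_u64x2 (const uint64_t in[2], uint64_t out[2])
{
#if defined(__POWER8_VECTOR__)
  /* Use the ABI-defined overloaded builtin rather than vec_popcntd.  */
  vector unsigned long long v = { in[0], in[1] };
  vector unsigned long long r = vec_popcnt (v);
  out[0] = r[0];
  out[1] = r[1];
#else
  /* Portable fallback (assumption: GCC or Clang builtin available).  */
  for (int i = 0; i < 2; i++)
    out[i] = (uint64_t) __builtin_popcountll (in[i]);
#endif
}
```

The overloaded vec_popcnt picks the element width from the argument type, so
the same call works for the char, short, int and long long variants listed in
the ABI.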

[Bug target/85830] vec_popcntd is improperly defined in altivec.h

2020-08-27 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85830

--- Comment #2 from Carl Love  ---
Hit the save button a little too fast and missed putting in everything I
intended to put in.  Let's try to get it all in.

(In reply to Carl Love from comment #1)
> The Power 64-Bit ELF V2 ABI specification revision 1.4 May 10, 2017 has the
> following builtins defined for popcount
>
VEC_POPCNT (ARG1)
  Purpose:      Returns a vector containing the number of bits set
                in each element of the input vector.
  Result value: The value of each element of the result is the
                number of bits set in the corresponding input
                element.

vector unsigned char vec_popcnt (vector signed char);
vector unsigned char vec_popcnt (vector unsigned char);
vector unsigned int vec_popcnt (vector signed int);
vector unsigned int vec_popcnt (vector unsigned int);
vector unsigned long long vec_popcnt (vector signed long long);
vector unsigned long long vec_popcnt (vector unsigned long long);
vector unsigned short vec_popcnt (vector signed short);
vector unsigned short vec_popcnt (vector unsigned short);

In section A.6. Deprecated Compatibility Functions we have listed:

>   vector signed char vec_vpopcnt (vector signed char);
>   vector unsigned char vec_vpopcnt (vector unsigned char);
>   vector unsigned int vec_vpopcnt (vector int);
>   vector signed long long vec_vpopcnt (vector signed long long);
>   vector unsigned long long vec_vpopcnt (vector unsigned long long);
>   vector unsigned short vec_vpopcnt (vector unsigned short);
>   vector int vec_vpopcnt (vector int);
>   vector short vec_vpopcnt (vector short);
>   vector signed char vec_vpopcntb (vector signed char);
>   vector unsigned char vec_vpopcntb (vector unsigned char);
>   vector signed long long vec_vpopcntd (vector signed long long);
>   vector unsigned long long vec_vpopcntd (vector unsigned long long);
>   vector unsigned short vec_vpopcnth (vector unsigned short);
>   vector short vec_vpopcnth (vector short);
>   vector unsigned int vec_vpopcntw (vector unsigned int);
>   vector int vec_vpopcntw (vector int);
> 
> The functions vec_popcntb, vec_popcnth, vec_popcntw, vec_popcntd do not
> appear in the ABI as supported or deprecated functions.
> 
> In altivec.h they are defined as:
> 
>   #define vec_popcnt __builtin_vec_vpopcntu
>   #define vec_popcntb __builtin_vec_vpopcntub
>   #define vec_popcnth __builtin_vec_vpopcntuh
>   #define vec_popcntw __builtin_vec_vpopcntuw
>   #define vec_popcntd __builtin_vec_vpopcntud
> 
> It does appear they should be removed from altivec.h.
> 
> The user should use the builtin vec_popcnt(a) where a is the unsigned long
> long or unsigned int as desired.  These builtins are supported in at least
> gcc version 8.3.1 and later.

[Bug target/85830] vec_popcntd is improperly defined in altivec.h

2020-08-27 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85830

Carl Love  changed:

   What|Removed |Added

 CC||cel at us dot ibm.com

--- Comment #1 from Carl Love  ---
The Power 64-Bit ELF V2 ABI specification revision 1.4 May 10, 2017 has the
following builtins defined for popcount

  vector signed char vec_vpopcnt (vector signed char);
  vector unsigned char vec_vpopcnt (vector unsigned char);
  vector unsigned int vec_vpopcnt (vector int);
  vector signed long long vec_vpopcnt (vector signed long long);
  vector unsigned long long vec_vpopcnt (vector unsigned long long);
  vector unsigned short vec_vpopcnt (vector unsigned short);
  vector int vec_vpopcnt (vector int);
  vector short vec_vpopcnt (vector short);
  vector signed char vec_vpopcntb (vector signed char);
  vector unsigned char vec_vpopcntb (vector unsigned char);
  vector signed long long vec_vpopcntd (vector signed long long);
  vector unsigned long long vec_vpopcntd (vector unsigned long long);
  vector unsigned short vec_vpopcnth (vector unsigned short);
  vector short vec_vpopcnth (vector short);
  vector unsigned int vec_vpopcntw (vector unsigned int);
  vector int vec_vpopcntw (vector int);

The functions vec_popcntb, vec_popcnth, vec_popcntw, vec_popcntd do not
appear in the ABI as supported or deprecated functions.

In altivec.h they are defined as:

  #define vec_popcnt __builtin_vec_vpopcntu
  #define vec_popcntb __builtin_vec_vpopcntub
  #define vec_popcnth __builtin_vec_vpopcntuh
  #define vec_popcntw __builtin_vec_vpopcntuw
  #define vec_popcntd __builtin_vec_vpopcntud

It does appear they should be removed from altivec.h.

The user should use the builtin vec_popcnt(a) where a is the unsigned long long
or unsigned int as desired.  These builtins are supported in at least
gcc version 8.3.1 and later.

[Bug target/93819] PPC64 builtin vec_rlnm() argument order is wrong.

2020-03-31 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93819

Carl Love  changed:

   What|Removed |Added

 Status|RESOLVED|CLOSED

--- Comment #8 from Carl Love  ---
Closing

[Bug target/93819] PPC64 builtin vec_rlnm() argument order is wrong.

2020-03-31 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93819

Carl Love  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #7 from Carl Love  ---
Patch submitted to mainline and backported to gcc 9 and 8.

[Bug target/91638] powerpc -mlong-double-NN (documentation) issues

2020-03-09 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91638

Carl Love  changed:

   What|Removed |Added

 CC||cel at us dot ibm.com

--- Comment #7 from Carl Love  ---
Patch approved by Segher with a few minor fixes to the patch.

Patch committed to mainline

commit 9439378f7a08cf9c8f524c9f3758a37d804ac106
Author: Carl Love 
Date:   Thu Mar 5 12:52:35 2020 -0600

rs6000: Fix -mlong-double documentation

gcc/ChangeLog

2020-03-09  Carl Love  

* config/rs6000/rs6000.opt: Update the description of the
command line option.

[Bug target/91638] powerpc -mlong-double-NN (documentation) issues

2020-03-06 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91638

--- Comment #6 from Carl Love  ---
Yea, I like that a bit better.  It is a bit shorter, mine was a bit verbose.

I updated the patch to print:

  -mlong-double-  Use -mlong-double-64 for 64-bit IEEE floating
  point format. Use -mlong-double-128 for 128-bit
  floating point format (either IEEE or IBM).

I will post the patch to the mailing list for review/acceptance on mainline.

[Bug target/91638] powerpc -mlong-double-NN (documentation) issues

2020-03-05 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91638

--- Comment #4 from Carl Love  ---
Created attachment 47982
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47982&action=edit
patch to update -mlong-double-NN description

Attached the proposed patch so I will not lose it.

[Bug target/91638] powerpc -mlong-double-NN (documentation) issues

2020-03-05 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91638

Carl Love  changed:

   What|Removed |Added

 CC||cel at us dot ibm.com

--- Comment #3 from Carl Love  ---
I have looked at what -mlong-double-64 and -mlong-double-128 do.  

The -mlong-double-64 tells GCC to use the 64-bit IEEE floating point format.

The -mlong-double-128 tells GCC to use 128-bit floating point format.  On
ppc64, there are actually two 128-bit floating point formats that are
supported.  So it looks to me like the user should also specify either
-mabi=ieeelongdouble to get the 128-bit IEEE floating point format or
-mabi=ibmlongdouble to get the IBM 128-bit floating point format.  
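
One way to check which format a given option combination selects is to look at
the long double parameters in <float.h>.  The following is a minimal sketch,
not part of the patch; the function name long_double_format is hypothetical,
and the mapping assumes the usual mantissa widths (53 bits for IEEE double,
106 for the IBM format, 113 for IEEE 128-bit).

```c
#include <float.h>

/* Describe the long double format the compiler selected, judged by the
   number of mantissa bits reported in <float.h>.  */
const char *
long_double_format (void)
{
  switch (LDBL_MANT_DIG)
    {
    case 53:
      return "64-bit IEEE (-mlong-double-64)";
    case 106:
      return "128-bit IBM (-mlong-double-128 -mabi=ibmlongdouble)";
    case 113:
      return "128-bit IEEE (-mlong-double-128 -mabi=ieeelongdouble)";
    case 64:
      return "x87 80-bit extended (not a PowerPC format)";
    default:
      return "unknown";
    }
}
```

Printing the returned string together with sizeof (long double) after building
with each option combination shows which format is in effect.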

I created a patch that updates the output of the gcc --target-help  command to
print the following:

  -mlong-double-  Use -mlong-double-64 for 64-bit IEEE floating
  point format. Use -mlong-double-128 and
  -mabi=ieeelongdouble for 128-bit IEEE floating
  point format. Use -mlong-double-128 and
  -mabi=ibmlongdouble for 128-bit IBM floating
  point format.

Please let me know if this description is accurate and better.  Suggestions on
improvements are always welcome.

[Bug c/93819] PPC64 builtin vec_rlnm() argument order is wrong.

2020-02-18 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93819

--- Comment #2 from Carl Love  ---
With the attached patch, the test program now runs as follows:

ABI says:
VEC_RLNM (ARG1, ARG2, ARG3)
ARG2 contains the shift count for each element in the low-order
byte, with other bytes zero.
ARG3 contains the mask begin and mask end for each element, with
the mask end in the low-order byte, the mask begin in the next
higher byte, and other bytes zero.

Vector int test case: mask begin = 0, mask end = 4, shift = 16

vec_arg1_int[0] = 0x12345678
vec_arg2_int[0] = 16 (0x10)
vec_arg3_int[0] = 4 (0x4)
vec_result_int[0] = 0x50000000
Int result matches expected result 0x50000000
vec_arg1_int[1] = 0x23456789
vec_arg2_int[1] = 16 (0x10)
vec_arg3_int[1] = 4 (0x4)
vec_result_int[1] = 0x60000000
Int result matches expected result 0x60000000
vec_arg1_int[2] = 0x3456789a
vec_arg2_int[2] = 16 (0x10)
vec_arg3_int[2] = 4 (0x4)
vec_result_int[2] = 0x78000000
Int result matches expected result 0x78000000
vec_arg1_int[3] = 0x456789ab
vec_arg2_int[3] = 16 (0x10)
vec_arg3_int[3] = 4 (0x4)
vec_result_int[3] = 0x88000000
Int result matches expected result 0x88000000
Vector long long int test case: mask begin = 0, mask end = 4, shift = 20

vec_arg1_di[0] = 0x123456789abcde00
vec_arg2_di[0] = 20 (0x14)
vec_arg3_di[0] = 4 (0x4)
vec_result_di[0] = 0x6000000000000000
Long long int result matches expected result 0x6000000000000000
vec_arg1_di[1] = 0x23456789abcdef11
vec_arg2_di[1] = 20 (0x14)
vec_arg3_di[1] = 4 (0x4)
vec_result_di[1] = 0x7800000000000000
Long long int result matches expected result 0x7800000000000000

[Bug c/93819] PPC64 builtin vec_rlnm() argument order is wrong.

2020-02-18 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93819

--- Comment #1 from Carl Love  ---
Created attachment 47873
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47873&action=edit
Patch to fix vec_vrlnm() functionality

The issue with the vec_rlnm() builtin is the order of the arguments in the
builtin macro definition in gcc/config/rs6000/altivec.h.  I have attached a
patch to fix the issue.  Here is the fix as well

--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -182,7 +182,7 @@
 #define vec_recipdiv __builtin_vec_recipdiv
 #define vec_rlmi __builtin_vec_rlmi
 #define vec_vrlnm __builtin_vec_rlnm
-#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))
+#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
 #define vec_rsqrt __builtin_vec_rsqrt
 #define vec_rsqrte __builtin_vec_rsqrte
 #define vec_signed __builtin_vec_vsigned

Basically the second and third arguments to the macro need to be reversed.
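
To see why the swap matters, here is a scalar sketch of one 32-bit element.
It is a model only, not the GCC implementation: the names rlnm32_model,
RLNM_CTL_OLD and RLNM_CTL_NEW are hypothetical, and the control-word layout
(shift in the low byte, mask end in the next byte, mask begin in the byte
above that, bits numbered from the left) is assumed from the ABI text and the
macro in the patch.

```c
#include <stdint.h>

/* Control-word packing used by the vec_rlnm macro before and after the fix.
   b is the shift element (ARG2), c is the mask element (ARG3).  */
#define RLNM_CTL_OLD(b, c) (((b) << 8) | (c))
#define RLNM_CTL_NEW(b, c) (((c) << 8) | (b))

/* Rotate a 32-bit value left by n bits (n taken modulo 32).  */
static uint32_t
rotl32 (uint32_t x, unsigned n)
{
  n &= 31;
  return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Scalar model of one word element: rotate left, then AND with a mask
   covering bits mb..me, where bit 0 is the most significant bit.  */
uint32_t
rlnm32_model (uint32_t src, uint32_t ctl)
{
  unsigned shift = ctl & 0x1f;          /* low-order byte */
  unsigned me = (ctl >> 8) & 0x1f;      /* mask end */
  unsigned mb = (ctl >> 16) & 0x1f;     /* mask begin */
  uint32_t from_mb = UINT32_C (0xffffffff) >> mb;         /* bits mb..31 */
  uint32_t to_me = UINT32_C (0xffffffff) << (31 - me);    /* bits 0..me */
  uint32_t mask = (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);

  return rotl32 (src, shift) & mask;
}
```

With src = 0x12345678, shift = 16 and mask = 4 (begin 0, end 4), the new
packing gives rlnm32_model (src, RLNM_CTL_NEW (16u, 4u)) == 0x50000000, while
the old packing rotates by 4 and masks bits 0 through 16, giving 0x23450000.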

[Bug c/93819] New: PPC64 builtin vec_rlnm() argument order is wrong.

2020-02-18 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93819

Bug ID: 93819
   Summary: PPC64 builtin vec_rlnm() argument order is wrong.
   Product: gcc
   Version: 7.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cel at us dot ibm.com
  Target Milestone: ---

Created attachment 47872
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47872&action=edit
test program to demonstrate the bug.

The API for the PPC 64 vec_rlnm() builtin says:

VEC_RLNM (ARG1, ARG2, ARG3)
Purpose:
  Rotates each element of a vector left; then intersects (AND) it with a mask.

  Result value:
  Each element of vector ARG1 is rotated left; then intersected (AND) with a
  mask specified by ARG2 and ARG3.

  ARG2 contains the shift count for each element in the low-order byte, with
  other bytes zero.

  ARG3 contains the mask begin and mask end for each element, with the mask
  end in the low-order byte, the mask begin in the next higher byte, and other
  bytes zero.

  vector unsigned int vec_rlnm (vector unsigned int, vector unsigned int,
                                vector unsigned int);

  vector unsigned long long vec_rlnm (vector unsigned long long,
                                      vector unsigned long long,
                                      vector unsigned long long);

However the current implementation has the shift value in argument 3 and the
mask information in argument 2.

The following is the output from a test program:


ABI says:
VEC_RLNM (ARG1, ARG2, ARG3)
ARG2 contains the shift count for each element in the low-order
byte, with other bytes zero.
ARG3 contains the mask begin and mask end for each element, with
the mask end in the low-order byte, the mask begin in the next
higher byte, and other bytes zero.

Vector int test case: mask begin = 0, mask end = 4, shift = 16

vec_arg1_int[0] = 0x12345678
vec_arg2_int[0] = 16 (0x10)
vec_arg3_int[0] = 4 (0x4)
vec_result_int[0] = 0x23450000
ERROR: Int result does not match expected result 0x50000000
vec_arg1_int[1] = 0x23456789
vec_arg2_int[1] = 16 (0x10)
vec_arg3_int[1] = 4 (0x4)
vec_result_int[1] = 0x34560000
ERROR: Int result does not match expected result 0x60000000
vec_arg1_int[2] = 0x3456789a
vec_arg2_int[2] = 16 (0x10)
vec_arg3_int[2] = 4 (0x4)
vec_result_int[2] = 0x45678000
ERROR: Int result does not match expected result 0x78000000
vec_arg1_int[3] = 0x456789ab
vec_arg2_int[3] = 16 (0x10)
vec_arg3_int[3] = 4 (0x4)
vec_result_int[3] = 0x56788000
ERROR: Int result does not match expected result 0x88000000
Vector long long int test case: mask begin = 0, mask end = 4, shift = 20

vec_arg1_di[0] = 0x123456789abcde00
vec_arg2_di[0] = 20 (0x14)
vec_arg3_di[0] = 4 (0x4)
vec_result_di[0] = 0x2345600000000000
ERROR: Long long int result does not match expected result 0x6000000000000000
vec_arg1_di[1] = 0x23456789abcdef11
vec_arg2_di[1] = 20 (0x14)
vec_arg3_di[1] = 4 (0x4)
vec_result_di[1] = 0x3456780000000000
ERROR: Long long int result does not match expected result 0x7800000000000000

If we look at vec_result_int[0] = 0x23450000, the input vec_arg1_int[0] =
0x12345678 was rotated by 0x4 (the value in arg3, not arg2) and then ANDed with
a mask covering bit 0 (counting bits from left to right) through bit 16; all
other bits were set to zero.  The expected result is also given above.

Attached is the test program which was compiled with the following command:

gcc -g -mcpu=power9 check-builtin-vec_rlnm-runnable.c -o
check-builtin-vec_rlnm-runnable

[Bug debug/93206] non-delegitimized UNSPEC generated for C program on PowerPc with current mainline GCC tree

2020-01-09 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93206

--- Comment #11 from Carl Love  ---
Created attachment 47626
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47626&action=edit
311r.dwarf2 file for v4si and the v2di test case

The attached file was generated with the #if in the test program set to 1 to
include the test for v4si and the second test for the v2di case.  This is the
dwarf2 dump file, which contains note 149, which is incorrect.

[Bug debug/93206] non-delegitimized UNSPEC generated for C program on PowerPc with current mainline GCC tree

2020-01-09 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93206

--- Comment #10 from Carl Love  ---
Created attachment 47625
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47625&action=edit
310r.nothrow for both tests v4si and v2di

The attached file was generated with the #if in the test program set to 1 to
include the test for v4si and the second test for the v2di case.  This is the
dump file preceding the 311r.dwarf2 file, which contains note 149, which is
incorrect.

[Bug debug/93206] non-delegitimized UNSPEC generated for C program on PowerPc with current mainline GCC tree

2020-01-09 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93206

--- Comment #9 from Carl Love  ---
Created attachment 47624
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47624&action=edit
311r.dwarf2 file for just the v2di test case

The attached file was generated with the #if in the test program set to 0 so
only the second test for the v2di case is done.  This is the dwarf2 dump file,
which contains note 111 for UNSPEC_FOO, which is incorrect.

[Bug debug/93206] non-delegitimized UNSPEC generated for C program on PowerPc with current mainline GCC tree

2020-01-09 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93206

--- Comment #8 from Carl Love  ---
Created attachment 47623
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47623&action=edit
310r.nothrow for just the __builtin_vec_foo_v2di test case

The attached file was generated with the #if in the test program set to 0 so
only the second test for the v2di case is done.  This is the dump file
preceding the 311r.dwarf2 file, which contains note 111 for UNSPEC_FOO.

[Bug debug/93206] non-delegitimized UNSPEC generated for C program on PowerPc with current mainline GCC tree

2020-01-09 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93206

--- Comment #5 from Carl Love  ---
I am puzzled.  When both test cases are included, which are identical other
than the data size, the notes are correct for the second test case but not the
first.  When we remove the first test case, all of a sudden the notes are no
longer correct for the second case.  This seems very inconsistent to me.  We
should either get the notes correct for both cases or for neither.  This
really seems like something isn't working right and could be fixed.

[Bug debug/93206] non-delegitimized UNSPEC generated for C program on PowerPc with current mainline GCC tree

2020-01-09 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93206

--- Comment #3 from Carl Love  ---
The initial bug report states that the bug moves around depending on the test
case.  If the test case only consists of the test for the V2DI case, you get
the error that Bill was specifically stating, i.e. UNSPEC_FOO.  This is done by
setting the #if define in the test case to 0.  If you set the #if define to 1
to include both test cases, then the bug moves to UNSPEC_VSX_SET.  Perhaps this
was not as clear as it could have been in the initial post.  I tried to
describe this behaviour in the hope it might help to find and fix the bug
correctly in all cases.

[Bug c/93206] New: non-delegitimized UNSPEC generated for C program on PowerPc with current mainline GCC tree

2020-01-08 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93206

Bug ID: 93206
   Summary: non-delegitimized UNSPEC generated for C program on
PowerPc with current mainline GCC tree
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cel at us dot ibm.com
  Target Milestone: ---

Created attachment 47616
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47616&action=edit
GCC src tree patch with test case

Issue 61879 appears to be similar, as does 81280 but for different platforms.

The following GCC patch demonstrates a bug in the GCC variable tracking logic
on the PowerPC platform.  It was seen on a Power 9 processor but is probably
independent of the processor type.

The bug was found as part of some internal development.  The key parts of
the work were backported to demonstrate the issue on current mainline as
of commit

r279927 | danglin | 2020-01-06 17:48:42 -0600 (Mon, 06 Jan 2020) | 6 lines

* config/pa/pa.md: Revert change to use ordered_comparison_operator
instead of cmpib_comparison_operator in cmpib patterns.
* config/pa/predicates.md (cmpib_comparison_operator): Revert removal
of cmpib_comparison_operator.  Revise comment.

With this patch applied, gcc generates the following warning during
compilation.

gcc -g -mcpu=future -O2 var-tracking-bug-test.c -o var-tracking-bug-test
var-tracking-bug-test.c: In function ‘main’:
var-tracking-bug-test.c:4:5: note: non-delegitimized UNSPEC UNSPEC_FOO (170) found in variable location
    4 | int main ()
      |     ^~~~
var-tracking-bug-test.c:4:5: note: non-delegitimized UNSPEC UNSPEC_FOO (170) found in variable location

The test case must be compiled with -O2 to get the message.  If you use
-O2 and -fno-var-tracking the note is not printed.

Note, the patch adds RTL support for a V4SI and a V2DI future instruction.

The following is from the dwarf2out phase in dump file
var-tracking-bug-test.c.311r.dwarf2

  (expr_list (use (reg:DI 2 2))
(expr_list:DI (use (reg:DI 3 3))
(expr_list:DI (use (reg:DI 4 4))
(expr_list:DI (use (reg:DI 5 5))
(nil))
(note/c 149 34 115 (var_location result (lshiftrt:DI (unspec:DI [
(unspec:V4SI [
(unspec:V4SI [
(const_vector:V4SI [
(const_int 0 [0]) repeated x4
])
(const_int -1 [0xffffffffffffffff])
(const_int 0 [0])
] UNSPEC_VSX_SET)
(const_int -1 [0xffffffffffffffff])
(const_int 1 [0x1])
] UNSPEC_VSX_SET)
(const_int 1 [0x1])
] UNSPEC_FOO)

The var_location note contains "(lshiftrt:DI (unspec:DI", which, as I
understand it, is what triggers the message in the compiler output.  So in
this case the issue has to do with the V4SI test case.

If you comment out the entire test for
result = __builtin_vec_foo_v4si (wi_src, 1); the error then shows up as:

(expr_list (use (reg:DI 2 2))
(expr_list:DI (use (reg:DI 3 3))
(expr_list:DI (use (reg:DI 4 4))
(expr_list:DI (use (reg:DI 5 5))
(nil))
(note/c 111 36 105 (var_location result (lshiftrt:DI (unspec:DI [
(const_vector:V2DI [
(const_int -1 [0x]) repeated x2
])
(const_int 1 [0x1])
] UNSPEC_FOO)

in the V2DI test.

[Bug target/84371] test case gcc.target/powerpc/builtins-3.c fails on power9

2018-02-19 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84371

--- Comment #2 from Carl Love  ---
Will:

Here is the bug report I just got from Peter.  From our Sametime
conversation, it sounds like you have addressed these in a recent update.
Take a look; it may be that Peter needs to update his tree.

 Carl 

On Mon, 2018-02-19 at 16:22 +, bergner at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84371
> 
> Peter Bergner  changed:
> 
>    What|Removed |Added
> ----
>     URL||https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00937.html
>      CC||bergner at gcc dot gnu.org
>

[Bug target/81158] [8 regression] Many test case failures starting with r249424

2017-06-22 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81158

--- Comment #2 from Carl Love  ---
On Thu, 2017-06-22 at 21:06 +, wschmidt at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81158
> 
> --- Comment #1 from Bill Schmidt  ---
> I expect this is probably due to the changes to rs6000_gimple_fold_builtin.
> 

Bill:

FAIL: gcc.target/powerpc/builtins-3.c scan-assembler-times vmulesh 1
FAIL: gcc.target/powerpc/builtins-3.c scan-assembler-times vmuleuh 1
FAIL: gcc.target/powerpc/builtins-3.c scan-assembler-times vmulosh 1
FAIL: gcc.target/powerpc/builtins-3.c scan-assembler-times vmulouh 1

Unfortunately, this is due to my screw up.  The patch to fix the
vec_mule and vec_mulo was missing the test case update.  I made the
change to the test case but didn't get it included in the patch.  I have
since fixed the above failures, commit 249572.  The commit was tested on
Power 8 BE, LE and Power 7 to make sure they are all working now.

The other tests listed, gcc.c-torture/execute/* and gcc.dg/vect/*, I have
never touched, so I don't think I am responsible for those.  :-)

Carl

[Bug target/79039] builtins-3-p9.c fails with -m32

2017-01-10 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79039

--- Comment #2 from Carl Love  ---
On Tue, 2017-01-10 at 14:29 +, wschmidt at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79039
> 
> Bill Schmidt  changed:
> 
>What|Removed |Added
> 
>  CC||carll at gcc dot gnu.org,
>||wschmidt at gcc dot gnu.org
> 
> --- Comment #1 from Bill Schmidt  ---
> CCing Carl.  Carl, be sure to use long long instead of long so that you get 64
> bits on both -m32 and -m64.
> 
Bill:

Yes, I see I just have "long" for the last test case.  I am compiling
the code on willow5 and want to make sure I can reproduce the bug with
-m32 and then fix it.  I know we didn't test this patch with -m32.  For
the popcnt patch that I just posted for review, I did do the testing on
-m32 and -m64.  I would have expected to see the failure on -m32 but
didn't, so I am a little paranoid, again, that my testing/environment
isn't working as well as expected.

  Carl

[Bug target/68752] PowerPC: vector reciprocal square root estimate missed optimisations

2016-10-10 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68752

--- Comment #4 from Carl Love  ---
I do not seem to have permission to change the status of the bug.  

Anton, can you recheck the issue and close it if you agree it is no longer an
issue?  Thanks.

  Carl

[Bug target/68752] PowerPC: vector reciprocal square root estimate missed optimisations

2016-10-10 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68752

--- Comment #3 from Carl Love  ---
I investigated the issue using GCC 6.1.  The t1() function from file
recip-vec-sqrtf.c is as follows:

void t1(void)
{
  int i;

  for (i = 0; i < 4; i++)
    r[i] = a[i] / sqrtf (b[i]);
}

Compiling with the options specified in the bug, plus -S to emit the
assembly code, produces the following for the loop:

 -O2 -ffast-math -ftree-vectorize -mcpu=power7 -mrecip -fno-common  -S 

.file "recip-vec-sqrtf.c"
.section ".toc","aw"
.section ".text"
.machine power7
.align 2
.p2align 4,,15
.globl t1
.section ".opd","aw"
.align 3
t1:
.quad .L.t1,.TOC.@tocbase
.previous
.type t1, @function
.L.t1:
addis 9,2,.LANCHOR0@toc@ha
addis 10,2,.LC0@toc@ha
addi 9,9,.LANCHOR0@toc@l
addi 10,10,.LC0@toc@l
lxvd2x 12,0,9
lxvd2x 11,0,10
li 10,32
lxvd2x 10,9,10
li 10,16
xvrsqrtesp 0,12
xvmsubasp 12,12,11
xvmulsp 9,0,0
xvnmsubasp 11,12,9
xvmulsp 0,0,11
xvmulsp 0,0,10
stxvd2x 0,9,10
blr

The expected vector instructions are there.  It does not appear that any scalar
instructions are being generated for the computation as implied in the
bugzilla.

The other loops for the test cases in attachment 36938 are similar in that they
appear to only be generating the expected vector instructions.  The bugzilla
was filed before GCC 6 was released.  It appears the issue has since been
fixed.

[Bug target/68752] PowerPC: vector reciprocal square root estimate missed optimisations

2016-10-10 Thread cel at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68752

Carl Love  changed:

   What|Removed |Added

 CC||cel at us dot ibm.com

--- Comment #2 from Carl Love  ---
I investigated the issue using GCC 6.1.  The t1() function from file
recip-vec-sqrtf.c is as follows:

void t1(void)
{
  int i;

  for (i = 0; i < 4; i++)
    r[i] = a[i] / sqrtf (b[i]);
}

Compiling with the options specified in the bug, plus -S to emit the
assembly code, produces the following for the loop:

 -O2 -ffast-math -ftree-vectorize -mcpu=power7 -mrecip -fno-common  -S 

.file "recip-vec-sqrtf.c"
.section ".toc","aw"
.section ".text"
.machine power7
.align 2
.p2align 4,,15
.globl t1
.section ".opd","aw"
.align 3
t1:
.quad .L.t1,.TOC.@tocbase
.previous
.type t1, @function
.L.t1:
addis 9,2,.LANCHOR0@toc@ha
addis 10,2,.LC0@toc@ha
addi 9,9,.LANCHOR0@toc@l
addi 10,10,.LC0@toc@l
lxvd2x 12,0,9
lxvd2x 11,0,10
li 10,32
lxvd2x 10,9,10
li 10,16
xvrsqrtesp 0,12
xvmsubasp 12,12,11
xvmulsp 9,0,0
xvnmsubasp 11,12,9
xvmulsp 0,0,11
xvmulsp 0,0,10
stxvd2x 0,9,10
blr

The expected vector instructions are there.  It does not appear that any scalar
instructions are being generated for the computation as implied in the
bugzilla.

The other loops for the test cases in attachment 36938 are similar in that they
appear to only be generating the expected vector instructions.  The bugzilla
was filed before GCC 6 was released.  It appears the issue has since been
fixed.