Re: [Patch v2] Enable math functions linking with static library for LTO

2019-08-22 Thread luoxhu

Hi Richard,

On 2019/8/13 17:10, Richard Biener wrote:

On Tue, Aug 13, 2019 at 4:22 AM luoxhu  wrote:


Hi Richard,

On 2019/8/12 16:51, Richard Biener wrote:

On Mon, Aug 12, 2019 at 8:50 AM luoxhu  wrote:


Hi Richard,
Thanks for your comments, updated the v2 patch as below:
1. Define and use builtin_with_linkage_p.
2. Add comments.
3. Add a testcase.

In LTO mode, if a static library and a dynamic library contain the same
function and both libraries are passed as arguments, the linker will link
the function from the dynamic library regardless of their order.  This patch
outputs the LTO symbol node as UNDEF if the BUILT_IN_NORMAL function FNDECL
is a math function, so that the function in the static library is linked
first if it precedes the dynamic library on the command line.
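
To illustrate why the symbol has to show up as undefined, here is a toy
model of library scanning.  It assumes the usual rule that a linker pulls a
member out of a static archive only if that member defines a symbol that is
already undefined when the archive is visited.  Library and resolve are
hypothetical names; this is a sketch of the ordering effect, not how ld or
the LTO plugin is implemented.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model only: a library is a name, a kind, and the symbols it defines.
struct Library
{
  std::string name;
  bool is_static;
  std::set<std::string> defines;
};

// Return the name of the library that ends up providing SYM, or "" if none.
// KNOWN_UNDEF says whether SYM is already listed as undefined while the
// command line is scanned (the situation the patch creates for math builtins).
// Assumption: a static archive contributes a member only for a symbol that is
// undefined when the archive is visited; a reference that appears later
// (after LTO code generation) can only be satisfied by a shared library.
std::string
resolve (const std::vector<Library> &libs, const std::string &sym,
         bool known_undef)
{
  assert (!sym.empty ());
  for (const Library &lib : libs)
    {
      if (!lib.defines.count (sym))
        continue;
      if (lib.is_static && !known_undef)
        continue;  // The archive member is never pulled in.
      return lib.name;
    }
  return "";
}
```

In this model, with the static library listed first, the symbol comes from
the shared library when it is not known to be undefined, and from the static
library when it is.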


Comments below


gcc/ChangeLog

  2019-08-12  Xiong Hu Luo  

  PR lto/91287
  * builtins.c (builtin_with_linkage_p): New function.
  * builtins.h (builtin_with_linkage_p): New function.
  * symtab.c (write_symbol): Use builtin_with_linkage_p.
  * lto-streamer-out.c (symtab_node::output_to_lto_symbol_table_p):
  Likewise.

gcc/testsuite/ChangeLog

  2019-08-12  Xiong Hu Luo  

  PR lto/91287
  * gcc.dg/pr91287.c: New testcase.
---
   gcc/builtins.c | 89 ++
   gcc/builtins.h |  2 +
   gcc/lto-streamer-out.c |  4 +-
   gcc/symtab.c   | 13 -
   gcc/testsuite/gcc.dg/pr91287.c | 40 +++
   5 files changed, 145 insertions(+), 3 deletions(-)
   create mode 100644 gcc/testsuite/gcc.dg/pr91287.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 695a9d191af..f4dea941a27 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -11244,3 +11244,92 @@ target_char_cst_p (tree t, char *p)
 *p = (char)tree_to_uhwi (t);
 return true;
   }
+
+/* Return true if DECL is a specified builtin math function.  These functions
+   should have symbol in symbol table to provide linkage with faster version of
+   libraries.  */


The comment should read like

/* Return true if the builtin DECL is implemented in a standard library.
   Otherwise returns false which doesn't guarantee it is not (thus the list
   of handled builtins below may be incomplete).  */


+bool
+builtin_with_linkage_p (tree decl)
+{
+  if (!decl)
+return false;


Omit this check please.


+  if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
+switch (DECL_FUNCTION_CODE (decl))
+{
+  CASE_FLT_FN (BUILT_IN_ACOS):
+  CASE_FLT_FN (BUILT_IN_ACOSH):
+  CASE_FLT_FN (BUILT_IN_ASIN):
+  CASE_FLT_FN (BUILT_IN_ASINH):
+  CASE_FLT_FN (BUILT_IN_ATAN):
+  CASE_FLT_FN (BUILT_IN_ATANH):
+  CASE_FLT_FN (BUILT_IN_ATAN2):
+  CASE_FLT_FN (BUILT_IN_CBRT):
+  CASE_FLT_FN (BUILT_IN_CEIL):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_CEIL):
+  CASE_FLT_FN (BUILT_IN_COPYSIGN):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_COPYSIGN):
+  CASE_FLT_FN (BUILT_IN_COS):
+  CASE_FLT_FN (BUILT_IN_COSH):
+  CASE_FLT_FN (BUILT_IN_ERF):
+  CASE_FLT_FN (BUILT_IN_ERFC):
+  CASE_FLT_FN (BUILT_IN_EXP):
+  CASE_FLT_FN (BUILT_IN_EXP2):
+  CASE_FLT_FN (BUILT_IN_EXPM1):
+  CASE_FLT_FN (BUILT_IN_FABS):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS):
+  CASE_FLT_FN (BUILT_IN_FDIM):
+  CASE_FLT_FN (BUILT_IN_FLOOR):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FLOOR):
+  CASE_FLT_FN (BUILT_IN_FMA):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA):
+  CASE_FLT_FN (BUILT_IN_FMAX):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMAX):
+  CASE_FLT_FN (BUILT_IN_FMIN):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMIN):
+  CASE_FLT_FN (BUILT_IN_FMOD):
+  CASE_FLT_FN (BUILT_IN_FREXP):
+  CASE_FLT_FN (BUILT_IN_HYPOT):
+  CASE_FLT_FN (BUILT_IN_ILOGB):
+  CASE_FLT_FN (BUILT_IN_LDEXP):
+  CASE_FLT_FN (BUILT_IN_LGAMMA):
+  CASE_FLT_FN (BUILT_IN_LLRINT):
+  CASE_FLT_FN (BUILT_IN_LLROUND):
+  CASE_FLT_FN (BUILT_IN_LOG):
+  CASE_FLT_FN (BUILT_IN_LOG10):
+  CASE_FLT_FN (BUILT_IN_LOG1P):
+  CASE_FLT_FN (BUILT_IN_LOG2):
+  CASE_FLT_FN (BUILT_IN_LOGB):
+  CASE_FLT_FN (BUILT_IN_LRINT):
+  CASE_FLT_FN (BUILT_IN_LROUND):
+  CASE_FLT_FN (BUILT_IN_MODF):
+  CASE_FLT_FN (BUILT_IN_NAN):
+  CASE_FLT_FN (BUILT_IN_NEARBYINT):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_NEARBYINT):
+  CASE_FLT_FN (BUILT_IN_NEXTAFTER):
+  CASE_FLT_FN (BUILT_IN_NEXTTOWARD):
+  CASE_FLT_FN (BUILT_IN_POW):
+  CASE_FLT_FN (BUILT_IN_REMAINDER):
+  CASE_FLT_FN (BUILT_IN_REMQUO):
+  CASE_FLT_FN (BUILT_IN_RINT):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_RINT):
+  CASE_FLT_FN (BUILT_IN_ROUND):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_ROUND):
+  CASE_FLT_FN (BUILT_IN_SCALBLN):
+  CASE_FLT_FN (BUILT_IN_SCALBN):
+  CASE_FLT_FN (BUILT_IN_SIN):
+  CASE_FLT_FN (BUILT_IN_SINH):
+  CASE_FLT_FN (BUILT_IN_SINCOS):
+  CASE_FLT_FN (BUILT_IN_SQRT

Re: [PATCH] Add MD Function type check for builtin_md vectorize

2019-08-21 Thread luoxhu

On 2019/8/21 15:40, Richard Biener wrote:

On Tue, 20 Aug 2019, Xiong Hu Luo wrote:


The DECL_MD_FUNCTION_CODE added in r274404 (PR 91421) by rsandifo requires
DECL to be a BUILT_IN_MD class built-in.  With LTO the assert fires, because
r274411 (PR 91287) outputs some math function symbols to the object.
This patch checks the built-in class before calling the builtin_md vectorize hook.
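
The failure mode and the guard can be sketched outside of GCC.  The types
and function names below (Decl, md_function_code, md_function_p,
md_code_or_none) are hypothetical stand-ins rather than GCC's tree API; only
the idea of testing the built-in class before using a checked accessor is
taken from the description above.

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC's tree nodes; not the real data structures.
enum BuiltInClass
{
  NOT_BUILT_IN,
  BUILT_IN_FRONTEND,
  BUILT_IN_MD,
  BUILT_IN_NORMAL
};

struct Decl
{
  BuiltInClass built_in_class;
  unsigned function_code;
};

// Checked accessor: only meaningful for machine-specific (MD) built-ins.
unsigned
md_function_code (const Decl &decl)
{
  assert (decl.built_in_class == BUILT_IN_MD);
  return decl.function_code;
}

// Predicate in the spirit of the proposed DECL_MD_FUNCTION_P.
bool
md_function_p (const Decl &decl)
{
  return decl.built_in_class == BUILT_IN_MD;
}

// Caller-side guard: only reach the checked accessor for MD built-ins.
// Returns -1 when DECL is not an MD built-in.
long
md_code_or_none (const Decl &decl)
{
  return md_function_p (decl) ? (long) md_function_code (decl) : -1;
}
```

A normal math built-in passed to md_code_or_none takes the -1 path instead
of tripping the assert inside the accessor.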


I think Richard fixed this already.


Thanks. It was fixed by Richard's r274524 already. Please ignore this
patch.

Xionghu



Richard.


gcc/ChangeLog

2019-08-21  Xiong Hu Luo  

* tree-vect-stmts.c (vectorizable_call): Check callee built-in type.
* gcc/tree.h (DECL_MD_FUNCTION_P): New function.
---
  gcc/tree-vect-stmts.c |  2 +-
  gcc/tree.h| 12 
  2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 1e2dfe5d22d..ef947f20d63 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -3376,7 +3376,7 @@ vectorizable_call (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
if (cfn != CFN_LAST)
fndecl = targetm.vectorize.builtin_vectorized_function
  (cfn, vectype_out, vectype_in);
-  else if (callee)
+  else if (callee && DECL_MD_FUNCTION_P (callee))
fndecl = targetm.vectorize.builtin_md_vectorized_function
  (callee, vectype_out, vectype_in);
  }
diff --git a/gcc/tree.h b/gcc/tree.h
index b910c5cb475..8cce89e5cf3 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3905,6 +3905,18 @@ DECL_MD_FUNCTION_CODE (const_tree decl)
return fndecl.function_code;
  }
  
+/* Return true if decl is a FUNCTION_DECL with built-in class BUILT_IN_MD.
+   Otherwise return false.  */
+inline bool
+DECL_MD_FUNCTION_P (const_tree decl)
+{
+  const tree_function_decl &fndecl = FUNCTION_DECL_CHECK (decl)->function_decl;
+  if (fndecl.built_in_class == BUILT_IN_MD)
+return true;
+  else
+return false;
+}
+
  /* Return the frontend-specific built-in function that DECL represents,
 given that it is known to be a FUNCTION_DECL with built-in class
 BUILT_IN_FRONTEND.  */







Re: [Patch v2] Enable math functions linking with static library for LTO

2019-08-12 Thread luoxhu



On 2019/8/13 10:22, luoxhu wrote:
diff --git a/gcc/testsuite/gcc.dg/pr91287.c b/gcc/testsuite/gcc.dg/pr91287.c
new file mode 100644
index 000..c816e0537aa
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr91287.c
@@ -0,0 +1,40 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2" } */


You don't use -flto here so the testcase doesn't exercise any of the patched
code.  Does it work when you add -flto here?  That is, do
scan-symbol[-not] properly use gcc-nm or the linker plugin?


-flto is needed here to check that the patch is correct; that was my mistake,
thanks for catching it.  atan2 exists in pr91287.o even without LTO, because
pr91287.o contains the instruction "bl atan2".
After adding -flto the test case also works, as the symbol is written to pr91287.o.
PS: the other changes are updated in the attached patch.
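
For reference, a test of this shape might look as follows.  This is only a
sketch: the body of the real pr91287.c is not shown in this thread, the
function name angle is made up, and the dg-final line assumes that
scan-symbol takes the symbol name as its argument.

```cpp
/* { dg-do assemble } */
/* { dg-options "-O2 -flto" } */

#include <math.h>

/* Hypothetical test body; the real pr91287.c is not quoted in full here.  */
double
angle (double y, double x)
{
  return atan2 (y, x);
}

/* Assumed directive usage: check that the object's symbol table refers to
   atan2.  */
/* { dg-final { scan-symbol "atan2" } } */
```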

What's more, this test case depends on this patch:
https://www.sourceware.org/ml/binutils/2019-08/msg00113.html

Otherwise nm will report an error (using the plugin or a local gcc-nm is OK):

PASS: gcc.dg/pr91287.c (test for excess errors)
ERROR: gcc.dg/pr91287.c: error executing dg-final: /usr/bin/nm: pr91287.o: plugin needed to handle lto object
UNRESOLVED: gcc.dg/pr91287.c: error executing dg-final: /usr/bin/nm: pr91287.o: plugin needed to handle lto object


Xionghu



Re: [Patch v2] Enable math functions linking with static library for LTO

2019-08-12 Thread luoxhu

Hi Richard,

On 2019/8/12 16:51, Richard Biener wrote:

On Mon, Aug 12, 2019 at 8:50 AM luoxhu  wrote:


Hi Richard,
Thanks for your comments, updated the v2 patch as below:
1. Define and use builtin_with_linkage_p.
2. Add comments.
3. Add a testcase.

In LTO mode, if a static library and a dynamic library contain the same
function and both libraries are passed as arguments, the linker will link
the function from the dynamic library regardless of their order.  This patch
outputs the LTO symbol node as UNDEF if the BUILT_IN_NORMAL function FNDECL
is a math function, so that the function in the static library is linked
first if it precedes the dynamic library on the command line.


Comments below


gcc/ChangeLog

 2019-08-12  Xiong Hu Luo  

 PR lto/91287
 * builtins.c (builtin_with_linkage_p): New function.
 * builtins.h (builtin_with_linkage_p): New function.
 * symtab.c (write_symbol): Use builtin_with_linkage_p.
 * lto-streamer-out.c (symtab_node::output_to_lto_symbol_table_p):
 Likewise.

gcc/testsuite/ChangeLog

 2019-08-12  Xiong Hu Luo  

 PR lto/91287
 * gcc.dg/pr91287.c: New testcase.
---
  gcc/builtins.c | 89 ++
  gcc/builtins.h |  2 +
  gcc/lto-streamer-out.c |  4 +-
  gcc/symtab.c   | 13 -
  gcc/testsuite/gcc.dg/pr91287.c | 40 +++
  5 files changed, 145 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/pr91287.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 695a9d191af..f4dea941a27 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -11244,3 +11244,92 @@ target_char_cst_p (tree t, char *p)
*p = (char)tree_to_uhwi (t);
return true;
  }
+
+/* Return true if DECL is a specified builtin math function.  These functions
+   should have symbol in symbol table to provide linkage with faster version of
+   libraries.  */


The comment should read like

/* Return true if the builtin DECL is implemented in a standard library.
   Otherwise returns false which doesn't guarantee it is not (thus the list
   of handled builtins below may be incomplete).  */


+bool
+builtin_with_linkage_p (tree decl)
+{
+  if (!decl)
+return false;


Omit this check please.


+  if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
+switch (DECL_FUNCTION_CODE (decl))
+{
+  CASE_FLT_FN (BUILT_IN_ACOS):
+  CASE_FLT_FN (BUILT_IN_ACOSH):
+  CASE_FLT_FN (BUILT_IN_ASIN):
+  CASE_FLT_FN (BUILT_IN_ASINH):
+  CASE_FLT_FN (BUILT_IN_ATAN):
+  CASE_FLT_FN (BUILT_IN_ATANH):
+  CASE_FLT_FN (BUILT_IN_ATAN2):
+  CASE_FLT_FN (BUILT_IN_CBRT):
+  CASE_FLT_FN (BUILT_IN_CEIL):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_CEIL):
+  CASE_FLT_FN (BUILT_IN_COPYSIGN):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_COPYSIGN):
+  CASE_FLT_FN (BUILT_IN_COS):
+  CASE_FLT_FN (BUILT_IN_COSH):
+  CASE_FLT_FN (BUILT_IN_ERF):
+  CASE_FLT_FN (BUILT_IN_ERFC):
+  CASE_FLT_FN (BUILT_IN_EXP):
+  CASE_FLT_FN (BUILT_IN_EXP2):
+  CASE_FLT_FN (BUILT_IN_EXPM1):
+  CASE_FLT_FN (BUILT_IN_FABS):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS):
+  CASE_FLT_FN (BUILT_IN_FDIM):
+  CASE_FLT_FN (BUILT_IN_FLOOR):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FLOOR):
+  CASE_FLT_FN (BUILT_IN_FMA):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA):
+  CASE_FLT_FN (BUILT_IN_FMAX):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMAX):
+  CASE_FLT_FN (BUILT_IN_FMIN):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMIN):
+  CASE_FLT_FN (BUILT_IN_FMOD):
+  CASE_FLT_FN (BUILT_IN_FREXP):
+  CASE_FLT_FN (BUILT_IN_HYPOT):
+  CASE_FLT_FN (BUILT_IN_ILOGB):
+  CASE_FLT_FN (BUILT_IN_LDEXP):
+  CASE_FLT_FN (BUILT_IN_LGAMMA):
+  CASE_FLT_FN (BUILT_IN_LLRINT):
+  CASE_FLT_FN (BUILT_IN_LLROUND):
+  CASE_FLT_FN (BUILT_IN_LOG):
+  CASE_FLT_FN (BUILT_IN_LOG10):
+  CASE_FLT_FN (BUILT_IN_LOG1P):
+  CASE_FLT_FN (BUILT_IN_LOG2):
+  CASE_FLT_FN (BUILT_IN_LOGB):
+  CASE_FLT_FN (BUILT_IN_LRINT):
+  CASE_FLT_FN (BUILT_IN_LROUND):
+  CASE_FLT_FN (BUILT_IN_MODF):
+  CASE_FLT_FN (BUILT_IN_NAN):
+  CASE_FLT_FN (BUILT_IN_NEARBYINT):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_NEARBYINT):
+  CASE_FLT_FN (BUILT_IN_NEXTAFTER):
+  CASE_FLT_FN (BUILT_IN_NEXTTOWARD):
+  CASE_FLT_FN (BUILT_IN_POW):
+  CASE_FLT_FN (BUILT_IN_REMAINDER):
+  CASE_FLT_FN (BUILT_IN_REMQUO):
+  CASE_FLT_FN (BUILT_IN_RINT):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_RINT):
+  CASE_FLT_FN (BUILT_IN_ROUND):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_ROUND):
+  CASE_FLT_FN (BUILT_IN_SCALBLN):
+  CASE_FLT_FN (BUILT_IN_SCALBN):
+  CASE_FLT_FN (BUILT_IN_SIN):
+  CASE_FLT_FN (BUILT_IN_SINH):
+  CASE_FLT_FN (BUILT_IN_SINCOS):
+  CASE_FLT_FN (BUILT_IN_SQRT):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_SQRT):
+  CASE_FLT_FN (BUILT_IN_TAN):
+  CASE_FLT_FN (BUILT_IN_TANH):
+  CASE_FLT_FN

[Patch v2] Enable math functions linking with static library for LTO

2019-08-12 Thread luoxhu
Hi Richard,
Thanks for your comments, updated the v2 patch as below:
1. Define and use builtin_with_linkage_p.
2. Add comments.
3. Add a testcase.

In LTO mode, if a static library and a dynamic library contain the same
function and both libraries are passed as arguments, the linker will link
the function from the dynamic library regardless of their order.  This patch
outputs the LTO symbol node as UNDEF if the BUILT_IN_NORMAL function FNDECL
is a math function, so that the function in the static library is linked
first if it precedes the dynamic library on the command line.
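
The decision described above can be modelled in a few lines.  This sketch
assumes that built-ins are otherwise left out of the LTO symbol table; the
enum BuiltinCode and both function names are hypothetical and much shorter
than the real list in the patch.

```cpp
// Hypothetical, heavily shortened stand-in for GCC's built-in function codes.
enum BuiltinCode
{
  FN_ATAN2,
  FN_SQRT,
  FN_POW,
  FN_MEMCPY
};

// Toy counterpart of builtin_with_linkage_p: true for built-ins known to have
// an implementation in a math library.  A false result does not prove that no
// library implements the function; the list may be incomplete.
bool
has_library_linkage (BuiltinCode code)
{
  switch (code)
    {
    case FN_ATAN2:
    case FN_SQRT:
    case FN_POW:
      return true;
    default:
      break;
    }
  return false;
}

// Toy counterpart of the symbol-table decision.  Assumption: ordinary
// functions are always listed, built-ins only when they need linkage.
// CODE is ignored when IS_BUILTIN is false.
bool
listed_in_lto_symbol_table (bool is_builtin, BuiltinCode code)
{
  if (!is_builtin)
    return true;
  return has_library_linkage (code);
}
```

FN_MEMCPY returning false here shows the "may be incomplete" caveat: the
predicate only promises something for the functions it lists.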

gcc/ChangeLog

2019-08-12  Xiong Hu Luo  

PR lto/91287
* builtins.c (builtin_with_linkage_p): New function.
* builtins.h (builtin_with_linkage_p): New function.
* symtab.c (write_symbol): Use builtin_with_linkage_p.
* lto-streamer-out.c (symtab_node::output_to_lto_symbol_table_p):
Likewise.

gcc/testsuite/ChangeLog

2019-08-12  Xiong Hu Luo  

PR lto/91287
* gcc.dg/pr91287.c: New testcase.
---
 gcc/builtins.c | 89 ++
 gcc/builtins.h |  2 +
 gcc/lto-streamer-out.c |  4 +-
 gcc/symtab.c   | 13 -
 gcc/testsuite/gcc.dg/pr91287.c | 40 +++
 5 files changed, 145 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr91287.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 695a9d191af..f4dea941a27 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -11244,3 +11244,92 @@ target_char_cst_p (tree t, char *p)
   *p = (char)tree_to_uhwi (t);
   return true;
 }
+
+/* Return true if DECL is a specified builtin math function.  These functions
+   should have symbol in symbol table to provide linkage with faster version of
+   libraries.  */
+
+bool
+builtin_with_linkage_p (tree decl)
+{
+  if (!decl)
+return false;
+  if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
+switch (DECL_FUNCTION_CODE (decl))
+{
+  CASE_FLT_FN (BUILT_IN_ACOS):
+  CASE_FLT_FN (BUILT_IN_ACOSH):
+  CASE_FLT_FN (BUILT_IN_ASIN):
+  CASE_FLT_FN (BUILT_IN_ASINH):
+  CASE_FLT_FN (BUILT_IN_ATAN):
+  CASE_FLT_FN (BUILT_IN_ATANH):
+  CASE_FLT_FN (BUILT_IN_ATAN2):
+  CASE_FLT_FN (BUILT_IN_CBRT):
+  CASE_FLT_FN (BUILT_IN_CEIL):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_CEIL):
+  CASE_FLT_FN (BUILT_IN_COPYSIGN):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_COPYSIGN):
+  CASE_FLT_FN (BUILT_IN_COS):
+  CASE_FLT_FN (BUILT_IN_COSH):
+  CASE_FLT_FN (BUILT_IN_ERF):
+  CASE_FLT_FN (BUILT_IN_ERFC):
+  CASE_FLT_FN (BUILT_IN_EXP):
+  CASE_FLT_FN (BUILT_IN_EXP2):
+  CASE_FLT_FN (BUILT_IN_EXPM1):
+  CASE_FLT_FN (BUILT_IN_FABS):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS):
+  CASE_FLT_FN (BUILT_IN_FDIM):
+  CASE_FLT_FN (BUILT_IN_FLOOR):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FLOOR):
+  CASE_FLT_FN (BUILT_IN_FMA):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA):
+  CASE_FLT_FN (BUILT_IN_FMAX):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMAX):
+  CASE_FLT_FN (BUILT_IN_FMIN):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMIN):
+  CASE_FLT_FN (BUILT_IN_FMOD):
+  CASE_FLT_FN (BUILT_IN_FREXP):
+  CASE_FLT_FN (BUILT_IN_HYPOT):
+  CASE_FLT_FN (BUILT_IN_ILOGB):
+  CASE_FLT_FN (BUILT_IN_LDEXP):
+  CASE_FLT_FN (BUILT_IN_LGAMMA):
+  CASE_FLT_FN (BUILT_IN_LLRINT):
+  CASE_FLT_FN (BUILT_IN_LLROUND):
+  CASE_FLT_FN (BUILT_IN_LOG):
+  CASE_FLT_FN (BUILT_IN_LOG10):
+  CASE_FLT_FN (BUILT_IN_LOG1P):
+  CASE_FLT_FN (BUILT_IN_LOG2):
+  CASE_FLT_FN (BUILT_IN_LOGB):
+  CASE_FLT_FN (BUILT_IN_LRINT):
+  CASE_FLT_FN (BUILT_IN_LROUND):
+  CASE_FLT_FN (BUILT_IN_MODF):
+  CASE_FLT_FN (BUILT_IN_NAN):
+  CASE_FLT_FN (BUILT_IN_NEARBYINT):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_NEARBYINT):
+  CASE_FLT_FN (BUILT_IN_NEXTAFTER):
+  CASE_FLT_FN (BUILT_IN_NEXTTOWARD):
+  CASE_FLT_FN (BUILT_IN_POW):
+  CASE_FLT_FN (BUILT_IN_REMAINDER):
+  CASE_FLT_FN (BUILT_IN_REMQUO):
+  CASE_FLT_FN (BUILT_IN_RINT):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_RINT):
+  CASE_FLT_FN (BUILT_IN_ROUND):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_ROUND):
+  CASE_FLT_FN (BUILT_IN_SCALBLN):
+  CASE_FLT_FN (BUILT_IN_SCALBN):
+  CASE_FLT_FN (BUILT_IN_SIN):
+  CASE_FLT_FN (BUILT_IN_SINH):
+  CASE_FLT_FN (BUILT_IN_SINCOS):
+  CASE_FLT_FN (BUILT_IN_SQRT):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_SQRT):
+  CASE_FLT_FN (BUILT_IN_TAN):
+  CASE_FLT_FN (BUILT_IN_TANH):
+  CASE_FLT_FN (BUILT_IN_TGAMMA):
+  CASE_FLT_FN (BUILT_IN_TRUNC):
+  CASE_FLT_FN_FLOATN_NX (BUILT_IN_TRUNC):
+   return true;
+  default:
+   break;
+}
+  return false;
+}
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 1ffb491d785..91cbd81be48 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -151,4 +151,6 @@ extern internal_fn replacement_internal_fn (gcall *);
 extern void warn_string_no_nul 

[PATCH v3] Missed function specialization + partial devirtualization

2019-07-30 Thread luoxhu
This patch aims to fix PR69678, which is caused by performance issues in PGO
indirect call profiling.
The bug that profiling data never worked was fixed by Martin's pull
back of the topN patches; performance improved by ~1% (GEOMEAN).
Still, the default profile currently generates only a SINGLE indirect target,
one that is called more than 75% of the time.  This patch makes use of MULTIPLE
indirect targets in the LTO-WPA and LTO-LTRANS stages; as a result, function
specialization, profiling, partial devirtualization, inlining and
cloning can be done based on them.
Performance improves from 0.70 sec to 0.38 sec on simple tests.
Details are:
  1.  PGO with topn is enabled by default now, but only one indirect
  target edge is generated in the ipa-profile pass, so add variables to enable
  multiple speculative edges through the passes: speculative_id records the
  index of the direct edge bound to the indirect edge, and num_of_ics records
  how many direct edges the indirect edge owns.  Postpone gimple_ic to
  ipa-profile, as in the default case, since the inline pass decides whether
  it is beneficial to transform the indirect call.
  2.  Enable multiple indirect call target analysis in the LTO WPA/LTRANS
  stages, with full profile support in the ipa passes and cgraph_edge functions.
  speculative_id can be set by make_speculative when multiple targets are bound
  to one indirect edge, and it is cloned when the edge is cloned.  speculative_id
  is streamed out and streamed in by LTO like lto_stmt_uid.
  3.  Add one in-module testcase and two cross-module testcases.
  4.  Bootstrap and regression test passed on Power8-LE.
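
As a source-level picture of what speculating on multiple targets amounts
to, the following hand-written sketch compares a plain indirect call with a
version guarded by two pointer checks.  The functions are made up for
illustration; the compiler performs the equivalent rewrite on its internal
representation, not on source code.

```cpp
int add_one (int x) { return x + 1; }
int twice (int x) { return 2 * x; }
int negate (int x) { return -x; }

// Plain indirect call: the callee is unknown, so nothing can be inlined.
int
call_indirect (int (*fn) (int), int x)
{
  return fn (x);
}

// Hand-written equivalent of speculating on two profiled targets: direct
// calls (which can then be inlined or specialized) guarded by pointer
// comparisons, with the original indirect call kept as the fallback.
int
call_speculated (int (*fn) (int), int x)
{
  if (fn == add_one)
    return add_one (x);
  if (fn == twice)
    return twice (x);
  return fn (x);
}
```

With a single speculative target only the first guard would exist, and every
call to the second hot target would still go through the pointer.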

v3 Changes:
 1. Rebase to trunk.
 2. Use speculative_id to track and search the reference node matched
 with the direct edge's callee for multiple targets.  This eliminates
 the earlier strstr workaround.  Actually, it is the caller's
 responsibility to handle the direct edges mapped to the same indirect edge.
 speculative_call_info will still return one of the direct edges
 specified, so this mostly reuses the current IPA edge processing framework.
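
The lookup by speculative_id can be pictured with simplified records.  Ref,
DirectEdge and find_reference below are hypothetical names; GCC's
cgraph_edge and ipa_ref carry far more state, and the real matching also has
to deal with statements that are not streamed.

```cpp
#include <vector>

// Hypothetical, simplified records for illustration only.
struct Ref
{
  unsigned lto_stmt_uid;    // Which call statement the reference belongs to.
  unsigned speculative_id;  // Which speculative target of that statement.
  const char *referred;     // Name of the referenced function.
};

struct DirectEdge
{
  unsigned lto_stmt_uid;
  unsigned speculative_id;
};

// Return the reference paired with EDGE, or nullptr if there is none.
// Matching on (statement, speculative_id) avoids comparing function names.
const Ref *
find_reference (const std::vector<Ref> &refs, const DirectEdge &edge)
{
  for (const Ref &ref : refs)
    if (ref.lto_stmt_uid == edge.lto_stmt_uid
        && ref.speculative_id == edge.speculative_id)
      return &ref;
  return nullptr;
}
```

A name-based match is fragile when one name is a substring of another, for
example a function and a clone with a suffix; an integer id has no such
ambiguity.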

gcc/ChangeLog

2019-07-31  Xiong Hu Luo  

PR ipa/69678
* cgraph.c (symbol_table::create_edge): Init speculative_id.
(cgraph_edge::make_speculative): Add param for setting speculative_id.
(cgraph_edge::speculative_call_info): Find reference by
speculative_id for multiple indirect targets.
(cgraph_edge::resolve_speculation): Decrease the speculations
for indirect edge, drop its speculative flag if no direct target
is left.
(cgraph_edge::redirect_call_stmt_to_callee): Likewise.
(cgraph_node::verify_node): Don't report error if speculative
edge does not include statement.
* cgraph.h (struct indirect_target_info): New struct.
(indirect_call_targets): New vector variable.
(num_of_ics): New variable.
(make_speculative): Add param for setting speculative_id.
(speculative_id): New variable.
* cgraphclones.c (cgraph_node::create_clone): Clone speculative_id.
* ipa-inline.c (inline_small_functions): Add iterator update.
* ipa-profile.c (ipa_profile_generate_summary): Add indirect
multiple targets logic.
(ipa_profile): Likewise.
* ipa-ref.h (speculative_id): New variable.
* ipa.c (process_references): Fix typo.
* lto-cgraph.c (lto_output_edge): Add indirect multiple targets
logic.  Stream out speculative_id.
(input_edge): Likewise.
* predict.c (dump_prediction): Remove edges count assert to be
precise.
* symtab.c (symtab_node::create_reference): Init speculative_id.
(symtab_node::clone_references): Clone speculative_id.
(symtab_node::clone_referring): Clone speculative_id.
(symtab_node::clone_reference): Clone speculative_id.
(symtab_node::clear_stmts_in_references): Clear speculative_id.
* tree-inline.c (copy_bb): Duplicate all the speculative edges
if indirect call contains multiple speculative targets.
* tree-profile.c (gimple_gen_ic_profiler): Use the new variable
__gcov_indirect_call.counters and __gcov_indirect_call.callee.
(gimple_gen_ic_func_profiler): Likewise.
(pass_ipa_tree_profile::gate): Fix comment typos.
* value-prof.c  (gimple_ic_transform): Handle topn case.
Fix comment typos.

gcc/testsuite/ChangeLog

2019-07-31  Xiong Hu Luo  

PR ipa/69678
* gcc.dg/tree-prof/indir-call-prof-topn.c: New testcase.
* gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: New testcase.
* gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c: New testcase.
* gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: New testcase.
---
 gcc/cgraph.c  |  70 +-
 gcc/cgraph.h  |  28 ++-
 gcc/cgraphclones.c|   1 +
 gcc/ipa-inline.c  |  15 +-
 gcc/ipa-profile.c

Re: [PATCH v4] Generalize get_most_common_single_value to return k_th value & count

2019-07-17 Thread luoxhu

Hi Martin,

On 2019/7/17 15:55, Martin Liška wrote:

On 7/17/19 7:44 AM, luoxhu wrote:

Hi Martin,
Thanks for your review, v4 Changes as below:
  1. Use decrease bubble sort.
BTW, I have a question about hist->hvalue.counters[2], when will it become
  -1, please? Thanks.  Currently, if it is -1, the function will return false.


Hi.

Thanks for that. I made minor changes to your patch, please see the
attachment.
-1 is a value that we use for an invalidated histogram. That happens when you
need to fit more values during instrumentation than you have counters in the
histogram.
It helps to make builds of software reproducible.
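
A toy model of that invalidation rule, assuming a histogram with two slots.
The Histogram layout, the choice of count[0] as the marker, and the helper
names are all made up for this sketch; libgcov's real counters and update
routine differ.

```cpp
// Toy histogram with room for two tracked values; not libgcov's real layout.
const int SLOTS = 2;

struct Histogram
{
  long total = 0;
  long value[SLOTS] = {0, 0};
  long count[SLOTS] = {0, 0};  // count[0] == -1 marks the histogram invalid.
};

bool
valid_p (const Histogram &h)
{
  return h.count[0] != -1;
}

// Record one observation of V.  If V needs a slot and none is free, the
// histogram is invalidated instead of evicting an existing entry.
void
record (Histogram &h, long v)
{
  if (!valid_p (h))
    return;
  h.total++;
  // First look for a slot that already tracks V.
  for (int i = 0; i < SLOTS; i++)
    if (h.count[i] > 0 && h.value[i] == v)
      {
        h.count[i]++;
        return;
      }
  // Otherwise claim the first free slot.
  for (int i = 0; i < SLOTS; i++)
    if (h.count[i] == 0)
      {
        h.value[i] = v;
        h.count[i] = 1;
        return;
      }
  h.count[0] = -1;  // More distinct values than slots.
}
```

A consumer would then test for the marker first and decline to use the
histogram, which is what the early return on -1 in the patch does.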

Thanks for your patience with many tiny fixes.  I will install the updated
patch to trunk.

Xionghu



Martin





Re: [PATCH v4] Generalize get_most_common_single_value to return k_th value & count

2019-07-16 Thread luoxhu
Currently get_most_common_single_value can only return the most common
histogram value.  Sort the values after reading them from disk, so that the
function can return the nth value later.  Rename it to get_nth_most_common_value.
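
A standalone sketch of the idea, assuming the flat layout
[total, value1, count1, value2, count2, ...] with four pairs.  counter_t and
TOPN stand in for gcov_type and GCOV_TOPN_VALUES, and nth_most_common is a
hypothetical helper, not the signature of get_nth_most_common_value.

```cpp
#include <utility>

typedef long long counter_t;  // Stand-in for gcov_type.
const unsigned TOPN = 4;      // Stand-in for GCOV_TOPN_VALUES.

// Sort the (value, count) pairs stored after COUNTERS[0] by decreasing count,
// breaking ties by decreasing value, using a bubble sort that stops early
// once a pass makes no swap.
void
sort_pairs (counter_t *counters)
{
  bool swapped = true;
  for (unsigned i = 0; i < TOPN - 1 && swapped; i++)
    {
      swapped = false;
      for (unsigned j = 0; j < TOPN - 1 - i; j++)
        {
          // p[0], p[1] are pair j; p[2], p[3] are pair j + 1.
          counter_t *p = &counters[2 * j + 1];
          if (p[1] < p[3] || (p[1] == p[3] && p[0] < p[2]))
            {
              std::swap (p[0], p[2]);
              std::swap (p[1], p[3]);
              swapped = true;
            }
        }
    }
}

// Fetch the N-th most common pair (N = 0 is the most common) from a sorted
// array.  Returns false if N is out of range or the slot is unused.
bool
nth_most_common (const counter_t *counters, unsigned n,
                 counter_t *value, counter_t *count)
{
  if (n >= TOPN || counters[2 * n + 2] <= 0)
    return false;
  *value = counters[2 * n + 1];
  *count = counters[2 * n + 2];
  return true;
}
```

Because the array is sorted once after loading, each later lookup is a plain
index computation.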

Hi Martin,
Thanks for your review, v4 Changes as below:
 1. Use decrease bubble sort.
BTW, I have a question about hist->hvalue.counters[2], when will it become
 -1, please? Thanks.  Currently, if it is -1, the function will return false.

gcc/ChangeLog:

2019-07-15  Xiong Hu Luo  

* ipa-profile.c (get_most_common_single_value): Use
get_nth_most_common_value.
* profile.c (sort_hist_value): New function.
(compute_value_histograms): Call sort_hist_value to sort the
values after loading from disk.
* value-prof.c (get_most_common_single_value): Rename to ...
get_nth_most_common_value.  Add input params n, return
the n_th value and count.
(gimple_divmod_fixed_value_transform): Use
get_nth_most_common_value.
(gimple_ic_transform): Likewise.
(gimple_stringops_transform): Likewise.
* value-prof.h (get_most_common_single_value): Add input params
n, default to 0.
---
 gcc/ipa-profile.c |  4 ++--
 gcc/profile.c | 44 +++
 gcc/value-prof.c  | 53 ---
 gcc/value-prof.h  |  9 
 4 files changed, 73 insertions(+), 37 deletions(-)

diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
index 1fb939b73d0..970dba39c80 100644
--- a/gcc/ipa-profile.c
+++ b/gcc/ipa-profile.c
@@ -192,8 +192,8 @@ ipa_profile_generate_summary (void)
  if (h)
{
  gcov_type val, count, all;
- if (get_most_common_single_value (NULL, "indirect call",
-   h, &val, &count, &all))
+ if (get_nth_most_common_value (NULL, "indirect call", h,
+ &val, &count, &all))
{
  struct cgraph_edge * e = node->get_edge (stmt);
  if (e && !e->indirect_unknown_callee)
diff --git a/gcc/profile.c b/gcc/profile.c
index 441cb8eb183..ae21b1192a0 100644
--- a/gcc/profile.c
+++ b/gcc/profile.c
@@ -743,6 +743,44 @@ compute_branch_probabilities (unsigned cfg_checksum, unsigned lineno_checksum)
   free_aux_for_blocks ();
 }
 
+  /* Sort the histogram value and count for TOPN and INDIR_CALL type.  */
+
+static bool
+sort_hist_value (histogram_value hist)
+{
+
+  if (hist->hvalue.counters[2] == -1)
+return false;
+
+  gcc_assert (hist->type == HIST_TYPE_TOPN_VALUES
+ || hist->type == HIST_TYPE_INDIR_CALL);
+
+  gcc_assert (hist->n_counters == GCOV_TOPN_VALUES_COUNTERS);
+
+  unsigned i, j;
+  bool swapped = true;
+  /* Hist value is organized as:
+ [counter0 value1 counter1 value2 counter2 value3 counter3 value4 counter4]
+ Use decrease bubble sort to rearrange it.  The sort starts from <value1,
+ counter1> and compares counter first.  If counter is same, compares the
+ value, exchange it if small to keep stable.  */
+  for (i = 0; i < GCOV_TOPN_VALUES - 1 && swapped; i++)
+{
+  swapped = false;
+  for (j = 0; j < GCOV_TOPN_VALUES - 1 - i; j++)
+   {
+ gcov_type *p = &hist->hvalue.counters[2 * j + 1];
+ if (p[1] < p[3] || (p[1] == p[3] && p[0] < p[2]))
+   {
+ std::swap (p[0], p[2]);
+ std::swap (p[1], p[3]);
+ swapped = true;
+   }
+   }
+}
+
+  return true;
+}
 /* Load value histograms values whose description is stored in VALUES array
from .gcda file.  
 
@@ -808,6 +846,12 @@ compute_value_histograms (histogram_values values, unsigned cfg_checksum,
 else
   hist->hvalue.counters[j] = 0;
 
+  if (hist->type == HIST_TYPE_TOPN_VALUES
+ || hist->type == HIST_TYPE_INDIR_CALL)
+   {
+ sort_hist_value (hist);
+   }
+
   /* Time profiler counter is not related to any statement,
  so that we have to read the counter and set the value to
  the corresponding call graph node.  */
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index 32e6ddd8165..759458868a8 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -713,45 +713,38 @@ gimple_divmod_fixed_value (gassign *stmt, tree value, profile_probability prob,
   return tmp2;
 }
 
-/* Return most common value of TOPN_VALUE histogram.  If
-   there's a unique value, return true and set VALUE and COUNT
+/* Return the n-th value count of TOPN_VALUE histogram.  If
+   there's a value, return true and set VALUE and COUNT
arguments.  */
 
 bool
-get_most_common_single_value (gimple *stmt, const char *counter_type,
- histogram_value hist,
- gcov_type *value, gcov_type *count,
- gcov_type *all)
+get_nth_most_common_value (gimple *stmt, const char *counter_type,
+

Re: [PATCH v3] Generalize get_most_common_single_value to return k_th value & count

2019-07-16 Thread luoxhu

Currently get_most_common_single_value can only return the most common
histogram value.  Add a qsort so that the function can return the nth value.
Rename it to get_nth_most_common_value.

v3 Changes:
 1. Move sort to profile.c after loading values from disk.  Simplify
get_nth_most_common_value.
 2. Make qsort stable with value check if count is same.
 3. Other comments from v2.
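
The tie-breaking rule from item 2 can be shown in isolation.  This sketch
uses std::sort with a comparison function instead of GCC's vec::qsort, and
ValueCount, more_common and sort_by_frequency are made-up names; only the
ordering (count first, value on ties) follows the description above.

```cpp
#include <algorithm>
#include <vector>

struct ValueCount
{
  long long value;
  long long count;
};

// Strict weak ordering: larger count first, larger value first on ties.
// Ordering equal counts by value makes the result independent of the input
// order, even though the sort itself is not a stable sort.
bool
more_common (const ValueCount &a, const ValueCount &b)
{
  if (a.count != b.count)
    return a.count > b.count;
  return a.value > b.value;
}

void
sort_by_frequency (std::vector<ValueCount> &pairs)
{
  std::sort (pairs.begin (), pairs.end (), more_common);
}
```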

gcc/ChangeLog:

2019-07-15  Xiong Hu Luo  

* ipa-profile.c (get_most_common_single_value): Use
get_nth_most_common_value.
* profile.c (struct value_count_t): New struct.
(cmp_counts): New function.
(sort_hist_value): New function.
(compute_value_histograms): Call sort_hist_value to sort the
values after loading from disk.
* value-prof.c (get_most_common_single_value): Rename to ...
get_nth_most_common_value.  Add input params n, return
the n_th value and count.
(gimple_divmod_fixed_value_transform): Use
get_nth_most_common_value.
(gimple_ic_transform): Likewise.
(gimple_stringops_transform): Likewise.
* value-prof.h (get_most_common_single_value): Add input params
n, default to 0.
---
 gcc/ipa-profile.c |  4 +--
 gcc/profile.c | 74 +++
 gcc/value-prof.c  | 53 +++--
 gcc/value-prof.h  |  9 +++---
 4 files changed, 103 insertions(+), 37 deletions(-)

diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
index 1fb939b73d0..970dba39c80 100644
--- a/gcc/ipa-profile.c
+++ b/gcc/ipa-profile.c
@@ -192,8 +192,8 @@ ipa_profile_generate_summary (void)
  if (h)
{
  gcov_type val, count, all;
- if (get_most_common_single_value (NULL, "indirect call",
-   h, &val, &count, &all))
+ if (get_nth_most_common_value (NULL, "indirect call", h,
+ &val, &count, &all))
{
  struct cgraph_edge * e = node->get_edge (stmt);
  if (e && !e->indirect_unknown_callee)
diff --git a/gcc/profile.c b/gcc/profile.c
index 441cb8eb183..54780b44859 100644
--- a/gcc/profile.c
+++ b/gcc/profile.c
@@ -743,6 +743,74 @@ compute_branch_probabilities (unsigned cfg_checksum, unsigned lineno_checksum)
   free_aux_for_blocks ();
 }

+struct value_count_t
+{
+  gcov_type value;
+  gcov_type count;
+};
+
+static int
+cmp_counts (const void *v1, const void *v2)
+{
+  const value_count_t *h1 = (const value_count_t *) v1;
+  const value_count_t *h2 = (const value_count_t *) v2;
+  if (h1->count < h2->count)
+return 1;
+  if (h1->count > h2->count)
+return -1;
+  if (h1->count == h2->count)
+{
+  if (h1->value < h2->value)
+   return 1;
+  if (h1->value > h2->value)
+   return -1;
+}
+  /* There may be two entries with same count as well as value very unlikely
+ in a multi-threaded instrumentation.  But the memory layout of the {value,
+ count} tuple can be different.  The function will return K-th most
+ common value.  */
+  return 0;
+}
+
+/* Sort the histogram value and count for TOPN and INDIR_CALL type.  */
+
+static bool
+sort_hist_value (histogram_value hist)
+{
+  auto_vec<value_count_t> value_vec;
+  struct value_count_t temp;
+  unsigned i;
+
+  if (hist->hvalue.counters[2] == -1)
+return false;
+
+  gcc_assert (hist->type == HIST_TYPE_TOPN_VALUES
+ || hist->type == HIST_TYPE_INDIR_CALL);
+
+  gcc_assert (hist->n_counters == GCOV_TOPN_VALUES_COUNTERS);
+
+  for (i = 0; i < GCOV_TOPN_VALUES; i++)
+{
+  gcov_type v = hist->hvalue.counters[2 * i + 1];
+  gcov_type c = hist->hvalue.counters[2 * i + 2];
+
+  temp.value = v;
+  temp.count = c;
+
+  value_vec.safe_push (temp);
+}
+
+  value_vec.qsort (cmp_counts);
+
+  gcc_assert (value_vec.length () == GCOV_TOPN_VALUES);
+
+  for (i = 0; i < GCOV_TOPN_VALUES; i++)
+{
+  hist->hvalue.counters[2 * i + 1] = value_vec[i].value;
+  hist->hvalue.counters[2 * i + 2] = value_vec[i].count;
+}
+  return true;
+}
 /* Load value histograms values whose description is stored in VALUES array
from .gcda file.

@@ -808,6 +876,12 @@ compute_value_histograms (histogram_values values, unsigned cfg_checksum,
 else
   hist->hvalue.counters[j] = 0;

+  if (hist->type == HIST_TYPE_TOPN_VALUES
+ || hist->type == HIST_TYPE_INDIR_CALL)
+   {
+ sort_hist_value (hist);
+   }
+
   /* Time profiler counter is not related to any statement,
  so that we have to read the counter and set the value to
  the corresponding call graph node.  */
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index 32e6ddd8165..97e4ae18ba3 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -713,45 +713,38 @@ gimple_divmod_fixed_value (gassign *stmt, tree value, profile_probability prob,
   

Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-24 Thread luoxhu




On 2019/6/24 10:34, luoxhu wrote:

Hi Honza,
Thanks very much for so many useful comments.
As a newbie to GCC, I am not sure whether my questions are described clearly
enough.  Thanks for your patience in advance.  :)


On 2019/6/20 21:47, Jan Hubicka wrote:

Hi,
some comments on the ipa part of the patch
(and thanks for working on it - this was on my TODO list for years)


diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index de82316d4b1..0d373a67d1b 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
  fprintf (dump_file, "Introduced new external node "
   "(%s) and turned into root of the clone tree.\n",
   node->dump_name ());
+  node->profile_id = first_clone->profile_id;
  }
    else if (dump_file)
  fprintf (dump_file, "Introduced new external node "


This is independent of the rest of changes.  Do you have example where
this matters? The inline clones are created in ipa-inline while
ipa-profile is run before it, so I can not think of such a scenario.
I see you also copy profile_id from function to clone.  I would like to
know why you needed that.

Also you mention that you hit some ICEs. If fixes are independent of
rest of your changes, send them separately.


I copy the profile_id to the cloned node because in LTO ltrans there is no
references or referrings info for the specialized/cloned node, so it is
difficult to track the node's reference in
cgraph_edge::speculative_call_info.  I use it mainly for debugging now.

I will remove it and split the patches in a later version so that the ICE
fixes are separate.



@@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
    int i;
    cgraph_edge *e2;
    cgraph_edge *e = this;
+  cgraph_node *referred_node;
    if (!e->indirect_unknown_callee)
  for (e2 = e->caller->indirect_calls;
@@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,

  && ((ref->stmt && ref->stmt == e->call_stmt)
  || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
    {
-    reference = ref;
-    break;
+    if (e2->indirect_info && e2->indirect_info->num_of_ics)
+  {
+    referred_node = dyn_cast <cgraph_node *> (ref->referred);
+    if (strstr (e->callee->name (), referred_node->name ()))
+  {
+    reference = ref;
+    break;
+  }
+  }
+    else
+  {
+    reference = ref;
+    break;
+  }
    }


This function is intended to return everything related to the
speculative call, so if you add multiple direct targets, I would expect
it to take an auto_vec of cgraph_nodes for direct and an auto_vec of
references.


So will the signature become
cgraph_edge::speculative_call_info (auto_vec *direct,
     cgraph_edge *,
     auto_vec *reference)

This seems to touch a lot of code, so maybe it should be split into another
patch.  And will the order of direct and reference in each auto_vec be
strictly mapped for iteration convenience?
My second question: "this" is a direct edge that will be pushed to the
auto_vec "direct", so how can it get its next direct edge here?  From
e->caller->callees?


There may be some misunderstanding here.  There should be only one direct
edge, but there can be multiple references.

For example: two indirect edges on one single statement x = p(3);
the first speculative edge is main -> one;
the second speculative edge is main -> two.
direct->call_stmt is: x_10 = p_3 (3);

The calling code in ipa-inline-transform.c:
for (e = node->callees; e; e = next)
  {
 next = e->next_callee;
 e->redirect_call_stmt_to_callee ();
  }

redirect_call_stmt_to_callee will call
e->speculative_call_info(e, e2, ref).

When e is "main -> one" and is being redirected, the returned auto_vec of
references will have length 2.
So the mapping should be 1:N instead of N:N (one direct edge will find N
reference nodes, but only one of them is correct, so we need to iterate to
find it).
e2 is the indirect call (e->caller->indirect_calls).  Its speculative flag can
only be set to false once all indirect targets have been redirected through
"next = e->next_callee".  Otherwise the next speculative edge could not finish
its redirect, because e2 would no longer be speculative in the next iteration.
As a result, we may still need similar logic that checks the length of the
returned references and only sets "e2->speculative = false;" when the length
is 1, which means all direct targets have been redirected.




    /* Speculative edge always consist of all three components - direct edge,

@@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
   in the functions inlined through it.  */
  }
    edge->count += e2->count;
-  edge->speculative = false;
+  if (edge->indirect_info && edge->indirect_in

Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-23 Thread luoxhu

Hi Honza,
Thank you very much for the many useful comments.
As a newcomer to GCC, I am not sure whether I have described my questions
clearly enough.  Thanks in advance for your patience.  :)


On 2019/6/20 21:47, Jan Hubicka wrote:

Hi,
some comments on the ipa part of the patch
(and thanks for working on it - this was on my TODO list for years)


diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index de82316d4b1..0d373a67d1b 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
fprintf (dump_file, "Introduced new external node "
 "(%s) and turned into root of the clone tree.\n",
 node->dump_name ());
+  node->profile_id = first_clone->profile_id;
  }
else if (dump_file)
  fprintf (dump_file, "Introduced new external node "


This is independent of the rest of changes.  Do you have example where
this matters? The inline clones are created in ipa-inline while
ipa-profile is run before it, so I can not think of such a scenario.
I see you also copy profile_id from function to clone.  I would like to
know why you needed that.

Also you mention that you hit some ICEs. If fixes are independent of
rest of your changes, send them separately.


I copy the profile_id to the cloned node because in LTO ltrans there is no
references or referrings info for the specialized/cloned node, so it is
difficult to track the node's reference in
cgraph_edge::speculative_call_info.  I use it mainly for debugging now.

I will remove it and split the patches in a later version so that the ICE
fixes are separate.




@@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
int i;
cgraph_edge *e2;
cgraph_edge *e = this;
+  cgraph_node *referred_node;
  
if (!e->indirect_unknown_callee)

  for (e2 = e->caller->indirect_calls;
@@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge *,
&& ((ref->stmt && ref->stmt == e->call_stmt)
|| (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
{
-   reference = ref;
-   break;
+   if (e2->indirect_info && e2->indirect_info->num_of_ics)
+ {
+   referred_node = dyn_cast <cgraph_node *> (ref->referred);
+   if (strstr (e->callee->name (), referred_node->name ()))
+ {
+   reference = ref;
+   break;
+ }
+ }
+   else
+ {
+   reference = ref;
+   break;
+ }
}


This function is intended to return everything related to the
speculative call, so if you add multiple direct targets, I would expect
it to take an auto_vec of cgraph_nodes for direct and an auto_vec of
references.


So will the signature become
cgraph_edge::speculative_call_info (auto_vec *direct,
cgraph_edge *,
auto_vec *reference)

This seems to touch a lot of code, so maybe it should be split into another
patch.  And will the order of direct and reference in each auto_vec be
strictly mapped for iteration convenience?
My second question: "this" is a direct edge that will be pushed to the
auto_vec "direct", so how can it get its next direct edge here?  From
e->caller->callees?



  
/* Speculative edge always consist of all three components - direct edge,

@@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
   in the functions inlined through it.  */
  }
edge->count += e2->count;
-  edge->speculative = false;
+  if (edge->indirect_info && edge->indirect_info->num_of_ics)
+{
+  edge->indirect_info->num_of_ics--;
+  if (edge->indirect_info->num_of_ics == 0)
+   edge->speculative = false;
+}
+  else
+edge->speculative = false;
e2->speculative = false;
ref->remove_reference ();
if (e2->indirect_unknown_callee || e2->inline_failed)


This function should turn speculative call into direct call to DECL, so
I think it should remove all the other direct calls associated with stmt
and the indirect one.

There are now two cases - in the first case you want to turn the speculative
call into a direct call or give up on speculation completely, while in the
other case you want to remove only one of the speculations.

I guess we want to have resolve_speculation(decl) for first and
remove_one_speculation(edge) for the second case?
The second case would be useful for the code below handling type
mismatches and also for inline when one of speculative targets seems not
useful to bother with.


So the logic will be:

if (edge->indirect_info->num_of_ics > 1)
cgraph_edge::resolve_speculation (tree callee_decl);
else
remove_one_speculation(edge);

cgraph_edge::resolve_speculation will call edge->speculative_call_info (e2,
edge, ref) internally.  At that point, will e2 and ref contain only one
direct target?




@@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
  e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt,
 

Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-20 Thread luoxhu

Hi Martin,

On 2019/6/20 09:59, luoxhu wrote:



On 2019/6/19 20:18, Martin Liška wrote:

On 6/19/19 10:56 AM, Martin Liška wrote:
Thank you very much for the numbers. Today, I'm going to prepare the 
generalization of single-value counter to track N values.


Ok, here's a patch candidate that does tracking of most common N values.
For your test-case I can see:

pr69678.gcda:    01a9:  18:COUNTERS indirect_call 9 counts
pr69678.gcda:   0: 35000 1868707024 17500 969338501 17500 0 0 0
pr69678.gcda:   8: 0

So for now, you'll need to generalize get_most_common_single_value to return
N most common values.

Eventually we'll need to rename the counter as it won't be tracking just
a single value any longer. I can take care of it.

Can you please verify that the patch candidate works for you?

Thanks, the profile data seems good, I will try it.  I need to rebase my patch
onto trunk first, as there are many conflicts with your previous patch.


The patch works perfectly for me; a lot of duplicate code can be removed based
on it.  I hope you can upstream it soon.  :)
BTW, I don't need to call the get_most_common_single_value function to access
the histogram values and counters; I will loop over them directly one by one.

Thanks
Xionghu





Thanks,
Martin





Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-19 Thread luoxhu




On 2019/6/19 20:18, Martin Liška wrote:

On 6/19/19 10:56 AM, Martin Liška wrote:

Thank you very much for the numbers. Today, I'm going to prepare the 
generalization of single-value counter to track N values.


Ok, here's a patch candidate that does tracking of most common N values. For
your test-case I can see:

pr69678.gcda:    01a9:  18:COUNTERS indirect_call 9 counts
pr69678.gcda:   0: 35000 1868707024 17500 969338501 17500 0 0 0
pr69678.gcda:   8: 0

So for now, you'll need to generalize get_most_common_single_value to return
N most common values.

Eventually we'll need to rename the counter as it won't be tracking just a
single value any longer. I can take care of it.

Can you please verify that the patch candidate works for you?

Thanks, the profile data seems good, I will try it.  I need to rebase my patch
onto trunk first, as there are many conflicts with your previous patch.



Thanks,
Martin





Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-19 Thread luoxhu

Hi Martin,

On 2019/6/18 18:21, Martin Liška wrote:

On 6/18/19 3:45 AM, Xiong Hu Luo wrote:

 6.2.  SPEC2017 peakrate:
 523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r (+13.33%);
 525.x264_r (-5.29%).


Can you please elaborate what are the key indirect call promotions that are
needed to achieve such a significant speed up? Are we talking about calls to
virtual functions or C-style indirect calls?


For benchmark 511.povray_r, povray_r.wpa.069i.profile_estimate shows that no
speculation or indirect call promotion happened before my patch:

994 171 indirect calls trained.
995 0 (0.00%) have common target.
996 0 (0.00%) targets was not found.
997 0 (0.00%) targets had parameter count mismatch.
998 0 (0.00%) targets was not in polymorphic call target list.
999 0 (0.00%) speculations seems useless.
   1000 0 (0.00%) speculations produced.


After applying my patch:

   1259 171 indirect calls trained.
   1260 60 (35.09%) have common target.
   1261 41 (23.98%) targets was not found.
   1262 0 (0.00%) targets had parameter count mismatch.
   1263 0 (0.00%) targets was not in polymorphic call target list.
   1264 57 (33.33%) speculations seems useless.
   1265 5 (2.92%) speculations produced.

The indirect call conversions below take effect.  As all of these calls are
in hot functions, performance improves a lot through the combined
optimizations of the later ipa/inline/clone stages.

ls *.*i.* | xargs grep "Expanding speculative call" 

povray_r.ltrans5.076i.inline:Expanding speculative call of create_ray.constprop/75445 -> Inside_CSG_Intersection/76219 count: 291083 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of create_ray.constprop/75445 -> Inside_Plane/76221 count: 387811 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of initialize_ray_container_state_tree/54575 -> Inside_CSG_Intersection/75997 count: 3784081 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of initialize_ray_container_state_tree/54575 -> Inside_Plane/76062 count: 5041557 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> All_CSG_Intersect_Intersections/76183 count: 8983544 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> All_Sphere_Intersections/76184 count: 31488162 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> Inside_Plane/76197 count: 19044626 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of All_CSG_Intersect_Intersections/9843 -> All_Sphere_Intersections/76011 count: 22068935 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of All_CSG_Intersect_Intersections/9843 -> Inside_Plane/76031 count: 13347702 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of block_light_source/26304 -> All_CSG_Intersect_Intersections/76130 count: 5434215 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of block_light_source/26304 -> All_Sphere_Intersections/76139 count: 19047432 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of block_light_source/26304 -> Inside_Plane/76134 count: 11520241 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of Inside_CSG_Union/9845 -> Inside_Plane/76081 count: 830538 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of All_CSG_Union_Intersections/9842 -> All_Plane_Intersections/76049 count: 1636158 (adjusted)




Thanks,
Martin





Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-18 Thread luoxhu

Hi Martin,

On 2019/6/18 17:34, Martin Liška wrote:

On 6/18/19 11:02 AM, luoxhu wrote:

Hi,

On 2019/6/18 13:51, Martin Liška wrote:

On 6/18/19 3:45 AM, Xiong Hu Luo wrote:

Hello.

Thank you for the interest in the area.


This patch aims to fix PR69678, which is caused by PGO indirect call profiling
bugs.  Currently the default instrumentation function can only find an
indirect call target that accounts for more than 50% of the calls, and it
returns an incorrect count.

Can you please explain what you mean by 'an incorrect count number returned'?


For the test case indir-call-topn.c, there are two indirect call targets, "one"
and "two".  With trunk code the profiling data is as shown below.  Trunk
includes your patch, which switches count[0] and count[2].  count[0] is used in
ipa-profile, but only the top1 format is supported; my patch adds support for
the topn format.  Without your patch count[0] was 0, which is incorrect.  With
your fix count[0] is 35000, which is better but still not correct: in fact,
"one" runs 17500 times and "two" runs the other 17500 times.

indir-call-topn.gcda:   22:    01a9:  18:COUNTERS indirect_call 9 counts
indir-call-topn.gcda:   24:   0: *35000 1868707024 0* 0 0 0 0 0

Running with "--param indir-call-topn-profile=1" gives the profile data below.
My patch is based on this profile result and optimizes multiple indirect
targets.  Performance improves a lot on this testcase and on some SPEC2017
benchmarks (LLVM has already supported this for several years...).

indir-call-topn.gcda:   26:    01b1:  18:COUNTERS indirect_call_topn 9 counts
indir-call-topn.gcda:   28:   0: *0 969338501 17500 1868707024 17500* 0 0 0


test case indir-call-topn.c:

#include <stdio.h>

typedef int (*fptr) (int);
int
one (int a)
{
   return 1;
}

int
two (int a)
{
   return 0;
}

fptr table[] = {&one, &two};

int
main()
{
   int i, x;
   fptr p = &one;

   one (3);

   for (i = 0; i < 35000; i++)
     {
       x = (*p) (3);
       p = table[x];
     }
   printf ("done:%d\n", x);
}


I've got it. So it's a situation where you have a distribution equal to 50%
and 50%. Note that it's the only valid situation where both edges will be
>= 50%. That's the threshold for which we speculatively devirtualize edges.
That said, you don't need a generic topn counter, but probably only a top2
counter which can be generalized from the single-value counter type. I'm
saying that because I removed the TOPN, mainly due to:
https://github.com/gcc-mirror/gcc/commit/5cb221f2b9c268df47c97b4837230b15e65f9c14#diff-d003c64ae14449d86df03508de98bde7L179

which is an over-complicated profiling function. And the changes that I've
done recently are motivated by preserving stable builds. That's achieved by
noticing that a single-value counter can't handle all seen values.


Actually, the algorithm of function __gcov_one_value_profiler_body in
libgcc/libgcov-profiler.c has a functional issue when profiling the testcase
I provided.


__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
                                int use_atomic)
{
  if (value == counters[1])
    counters[2]++;
  else if (counters[2] == 0)
    {
      counters[2] = 1;
      counters[1] = value;
    }
  else
    counters[2]--;

  if (use_atomic)
    __atomic_fetch_add (&counters[0], 1, __ATOMIC_RELAXED);
  else
    counters[0]++;
}

Function "one" is 1868707024 and function "two" is 969338501.  With the loop
running from 0 to (35000-1):


     value  counters[0]  counters[1]  counters[2]
1868707024            1   1868707024            1
 969338501            2   1868707024            0
1868707024            3   1868707024            1
 969338501            4   1868707024            0
1868707024            5   1868707024            1
...
 969338501        35000   1868707024            0

In the end, counters[] is [35000, 1868707024, 0].
In ipa-profile.c and value-prof.c, counters[0] is the total number of times
the statement executed, and counters[2] is the number of times the target in
counters[1] was called, which is 0 here.
counters[2] should not be 0 in fact, because it makes prob 0 (it was expected
to be 50%, right?).  With this prob, ipa-profile fails to create a speculative
edge and to promote the indirect call later.  I think this is the reason why
topn was introduced by Rong Xu in 2014 (8ceaa1e) and later reimplemented in
LLVM.  Without topn re-enabled, there is definitely a bug here.


dump-profile: indir-call-topn.fb.gcc.wpa.069i.profile_estimate
  1 Histogram:5
  2   35001: time:2 (8.70) size:2 (8.00)
  3   35000: time:19 (91.30) size:7 (36.00)
  4   17500: time:4 (100.00) size:2 (44.00)
  5   1: time:0 (100.00) size:0 (44.0

Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-18 Thread luoxhu

Hi,

On 2019/6/18 13:51, Martin Liška wrote:

On 6/18/19 3:45 AM, Xiong Hu Luo wrote:

Hello.

Thank you for the interest in the area.


This patch aims to fix PR69678, which is caused by PGO indirect call profiling
bugs.  Currently the default instrumentation function can only find an
indirect call target that accounts for more than 50% of the calls, and it
returns an incorrect count.

Can you please explain what you mean by 'an incorrect count number returned'?


For the test case indir-call-topn.c, there are two indirect call targets,
"one" and "two".  With trunk code the profiling data is as shown below.
Trunk includes your patch, which switches count[0] and count[2].  count[0]
is used in ipa-profile, but only the top1 format is supported; my patch
adds support for the topn format.  Without your patch count[0] was 0,
which is incorrect.  With your fix count[0] is 35000, which is better but
still not correct: in fact, "one" runs 17500 times and "two" runs the
other 17500 times.


indir-call-topn.gcda:   22:    01a9:  18:COUNTERS indirect_call 9 counts
indir-call-topn.gcda:   24:   0: *35000 1868707024 0* 0 0 0 0 0


Running with "--param indir-call-topn-profile=1" gives the profile data
below.  My patch is based on this profile result and optimizes multiple
indirect targets.  Performance improves a lot on this testcase and on
some SPEC2017 benchmarks (LLVM has already supported this for several
years...).


indir-call-topn.gcda:   26:    01b1:  18:COUNTERS indirect_call_topn 9 counts
indir-call-topn.gcda:   28:   0: *0 969338501 17500 1868707024 17500* 0 0 0



test case indir-call-topn.c:

#include <stdio.h>

typedef int (*fptr) (int);
int
one (int a)
{
  return 1;
}

int
two (int a)
{
  return 0;
}

fptr table[] = {&one, &two};

int
main()
{
  int i, x;
  fptr p = &one;

  one (3);

  for (i = 0; i < 35000; i++)
    {
      x = (*p) (3);
      p = table[x];
    }
  printf ("done:%d\n", x);
}




  This patch
leverages the "--param indir-call-topn-profile=1" and enables multiple indirect

Note that I removed indir-call-topn-profile last week, so the patch will not
apply on current trunk. However, I can help you adapt single-value counters
to support tracking of multiple values.


It would be very useful if you could help me track multiple values in a
similar way on trunk.  I will rebase onto your code once topn is ready
again.  Actually topn is more general and includes top1, so I thought top1
should be removed instead of topn, although topn takes longer than top1
during profile-generate.





targets profiling and use in LTO-WPA and LTO-LTRANS stage, as a result, function
specialization, profiling, partial devirtualization, inlining and cloning could
be done successfully based on it.

This decision is definitely a big question for Honza.


Performance can be improved 3x (1.7 sec -> 0.4 sec) on simple tests.
Details are:
   1.  When doing PGO with indir-call-topn-profile, the gcda data format is not
   supported in the ipa-profile pass,

If you take a look at gcc/ipa-profile.c:195 you can see how the probability
is propagated to IPA passes. Why is that not sufficient?


The current code only supports a single indirect target.  I need to track
multiple indirect targets and create multiple speculative edges on a single
indirect call statement.


What's more, many ICEs happened in later stages because of the
single-speculative-target design; part of this patch solves the ICEs in
handling multiple speculative target edges.



Thanks

Xionghu



Martin


so add variables to pass the information
   through passes, and postpone gimple_ic to ipa-profile like default as inline
   pass will decide whether it is benefit to transform indirect call.
   2.  Enable LTO WPA/LTRANS stage multiple indirect call targets analysis for
   profile full support in ipa passes and cgraph_edge functions.
   3.  Fix various hidden speculative call ICEs exposed after enabling this
   feature when running SPEC2017.
   4.  Add 1 in module testcase and 2 cross module testcases.
   5.  TODOs:
 5.1.  Some reference info will be dropped from WPA to LTRANS, so
 reference checks will be difficult in LTRANS; the strstr needs to be
 replaced with a reference comparison.
 5.2.  Some duplicate code needs to be removed as top1 and topn share the
 same logic.
 Actually the top1 related logic could be eliminated totally as topn
 includes it.
 5.3.  Splitting the patch may be needed as it is too big, but I am not sure
 how many pieces would be reasonable.
   6.  Performance result for ppc64le:
 6.1.  Representative test: indir-call-prof-topn.c runtime improved from
 1.7s to 0.4s.
 6.2.  SPEC2017 peakrate:
 523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r (+13.33%);
 525.x264_r (-5.29%).
 No big changes in other benchmarks.
 Option: -Ofast -mcpu=power8
 PASS1_OPTIMIZE: -fprofile-generate --param indir-call-topn-profile=1 -flto
 PASS2_OPTIMIZE: 

Re: *Ping* Re: [PATCH] PR c/43673 - Incorrect warning in dfp printf.

2019-05-20 Thread luoxhu

Ping for GCC-10.


Thanks

Xionghu

On 2019/3/4 09:13, Xiong Hu Luo wrote:

Ping:
https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01949.html

Thanks
Xionghu

On 2019/2/26 AM9:13, luo...@linux.ibm.com wrote:

From: Xiong Hu Luo 

The dfp printf/scanf length modifiers for Ha/HA, Da/DA and DDa/DDA are not
set properly, which causes an incorrect warning:
"use of 'D' length modifier with 'a' type character".

Regression-tested on powerpc64le-linux.  OK for trunk and gcc-8?

gcc/c-family/ChangeLog:

2019-02-25  Xiong Hu Luo  

PR c/43673
* c-format.c (print_char_table, scan_char_table): Replace BADLEN with
TEX_D32, TEX_D64 or TEX_D128.

gcc/testsuite/ChangeLog:

2019-02-25  Xiong Hu Luo  

PR c/43673
* gcc.dg/format/dfp-printf-1.c: New test.
* gcc.dg/format/dfp-scanf-1.c: Likewise.
---
  gcc/c-family/c-format.c|  4 ++--
  gcc/testsuite/gcc.dg/format/dfp-printf-1.c | 28 ++--
  gcc/testsuite/gcc.dg/format/dfp-scanf-1.c  | 22 --
  3 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index 9b48ee3..af33ef9 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -674,7 +674,7 @@ static const format_char_info print_char_table[] =
{ "n",   1, STD_C89, { T89_I,   T99_SC,  T89_S,   T89_L,   T9L_LL,  BADLEN,  T99_SST, T99_PD,  T99_IM,  BADLEN,  BADLEN,  BADLEN }, "",  "W",  NULL },
/* C99 conversion specifiers.  */
{ "F",   0, STD_C99, { T99_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  BADLEN,  BADLEN,  BADLEN,  TEX_D32, TEX_D64, TEX_D128 }, "-wp0 +#'I", "",   NULL },
-  { "aA",  0, STD_C99, { T99_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp0 +#",   "",   NULL },
+  { "aA",  0, STD_C99, { T99_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  BADLEN,  BADLEN,  BADLEN,  TEX_D32, TEX_D64,  TEX_D128 }, "-wp0 +#",   "",   NULL },
/* X/Open conversion specifiers.  */
{ "C",   0, STD_EXT, { TEX_WI,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-w","",   NULL },
{ "S",   1, STD_EXT, { TEX_W,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp",   "R",  NULL },
@@ -847,7 +847,7 @@ static const format_char_info scan_char_table[] =
{ "n", 1, STD_C89, { T89_I,   T99_SC,  T89_S,   T89_L,   T9L_LL,  BADLEN,  T99_SST, T99_PD,  T99_IM,  BADLEN,  BADLEN,  BADLEN }, "", "W",   NULL },
/* C99 conversion specifiers.  */
{ "F",   1, STD_C99, { T99_F,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  BADLEN,  BADLEN,  BADLEN,  TEX_D32, TEX_D64, TEX_D128 }, "*w'",  "W",   NULL },
-  { "aA",   1, STD_C99, { T99_F,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "*w'",  "W",   NULL },
+  { "aA",   1, STD_C99, { T99_F,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  BADLEN,  BADLEN,  BADLEN,  TEX_D32,  TEX_D64,  TEX_D128 }, "*w'",  "W",   NULL },
/* X/Open conversion specifiers.  */
{ "C", 1, STD_EXT, { TEX_W,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "*mw",   "W",   NULL },
{ "S", 1, STD_EXT, { TEX_W,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "*amw",  "W",   NULL },
diff --git a/gcc/testsuite/gcc.dg/format/dfp-printf-1.c b/gcc/testsuite/gcc.dg/format/dfp-printf-1.c
index e92f161..a290895 100644
--- a/gcc/testsuite/gcc.dg/format/dfp-printf-1.c
+++ b/gcc/testsuite/gcc.dg/format/dfp-printf-1.c
@@ -17,6 +17,8 @@ foo (_Decimal32 x, _Decimal64 y, _Decimal128 z, int i, unsigned int j,
  
/* Check lack of warnings for valid usage.  */
  
+  printf ("%Ha\n", x);
+  printf ("%HA\n", x);
printf ("%Hf\n", x);
printf ("%HF\n", x);
printf ("%He\n", x);
@@ -24,6 +26,8 @@ foo (_Decimal32 x, _Decimal64 y, _Decimal128 z, int i, unsigned int j,
printf ("%Hg\n", x);
printf ("%HG\n", x);
  
+  printf ("%Da\n", y);
+  printf ("%DA\n", y);
printf ("%Df\n", y);
printf ("%DF\n", y);
printf ("%De\n", y);
@@ -31,6 +35,8 @@ foo (_Decimal32 x, _Decimal64 y, _Decimal128 z, int i, unsigned int j,
printf ("%Dg\n", y);
printf ("%DG\n", y);
  
+  printf ("%DDa\n", z);
+  printf ("%DDA\n", z);
printf ("%DDf\n", z);
printf ("%DDF\n", z);
printf ("%DDe\n", z);
@@ -43,12 +49,16 @@ foo (_Decimal32 x, _Decimal64 y, _Decimal128 z, int i, unsigned int j,
  
/* Check warnings for type mismatches.  */
  
+  printf ("%Ha\n", y);	/* { dg-warning "expects argument" "bad use of %H" } */
+  printf ("%HA\n", y);   /* { dg-warning "expects argument" "bad use of %H" } */
printf ("%Hf\n", y);  /* { dg-warning "expects argument" "bad use of %H" } */
printf ("%HF\n", y);  /* { dg-warning "expects argument" "bad use 

[PATCH] backport r257541, r259936, r260294, r260623, r261098, r261333, r268585.

2019-04-04 Thread luoxhu
From: Xiong Hu Luo 

These patches are follow-up changes to r25 for the testcases
vsx-vector-6*.c.  Backport them to update file names and fix regressions
for GCC 7 on power9.

Regression tested on power7-be, power8-be, power8-le, power9.

gcc/ChangeLog:

2019-04-03  Xiong Hu Luo 

backport from trunk r260623.

2018-05-23  Segher Boessenkool  

* doc/sourcebuild.texi (Endianness): New subsubsection.

gcc/testsuite/ChangeLog:

2019-04-03  Xiong Hu Luo 

backport from trunk r257541.

2018-02-07  Will Schmidt  

* gcc.target/powerpc/vsx-vector-6-le.c:  Update CPU target.
* gcc.target/powerpc/vsx-vector-6-le.p9.c:  New.

backport from trunk r259936.

2018-05-04 Carl Love  
* gcc.target/powerpc/vsx-vector-6.h (foo): Add test for vec_max,
vec_trunc.
* gcc.target/powerpc/vsx-vector-6-le.c (dg-final): Update xvcmpeqdp,
xvcmpgtdp, xvcmpgedp counts. Add xxsel counts.
* gcc.target/powerpc/vsx-vector-6-be.c (dg-final): Update xvcmpgtdp,
xvcmpgedp counts. Add xxsel counts.

backport from trunk r260294.

2018-05-16 Carl Love  
* gcc.target/powerpc/vsx-vector-6-be.c: Remove file.
* gcc.target/powerpc/vsx-vector-6-be.p7.c: New test file.
* gcc.target/powerpc/vsx-vector-6-be.p8.c: New test file.
* gcc.target/powerpc/vsx-vector-6-le.c (dg-final): Update counts for
xvcmpeqdp., xvcmpgtdp., xvcmpgedp., xxlxor, xvrdpi.

backport from trunk r260623.

2018-05-23  Segher Boessenkool  

* lib/target-supports.exp (check_effective_target_be): New.
(check_effective_target_le): New.

backport from part of trunk r261097.

2018-06-01  Carl Love  

* gcc.target/powerpc/altivec-7-be.c: Delete file.
* gcc.target/powerpc/altivec-7-le.c: Delete file.
* gcc.target/powerpc/vsx-7-be.c: Remove file.

backport from trunk r261098.

2018-06-01  Carl Love  

Commit 260294 on 2018-05-16 by Carl Love was supposed to add the
following files.

* gcc.target/powerpc/vsx-vector-6-be.p7.c: New test file.
* gcc.target/powerpc/vsx-vector-6-be.p8.c: New test file.

backport from trunk r261333.

2018-06-08  Carl Love  

* gcc.target/powerpc/vsx-vector-6-be.p7.c: Rename this file to
vsx-vector-6.p7.c.
* gcc.target/powerpc/vsx-vector-6-le.p9.c: Rename this file to
vsx-vector-6.p9.c.
* gcc.target/powerpc/vsx-vector-6-be.p8.c: Move instruction counts
for BE system that are different than for an LE system from this file
into vsx-vector-6-le.c using be target qualifier.  Remove this file.
* gcc.target/powerpc/vsx-vector-6-le.c: Add le qualifiers as needed for
the various instruction counts.  Rename file to vsx-vector-6.p8.c.

backport from trunk r268585.

2019-02-06  Bill Seurer  

* gcc.target/powerpc/vsx-vector-6.p7.c: Update instruction
counts and target.
* gcc.target/powerpc/vsx-vector-6.p8.c: Update instruction
counts and target.
* gcc.target/powerpc/vsx-vector-6.p9.c: Update instruction
counts and target.
---
 gcc/doc/sourcebuild.texi   | 10 
 gcc/testsuite/gcc.target/powerpc/altivec-7-be.c| 30 
 gcc/testsuite/gcc.target/powerpc/altivec-7-le.c| 37 ---
 gcc/testsuite/gcc.target/powerpc/vsx-7-be.c| 50 
 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c | 31 -
 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c | 32 -
 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h| 14 +-
 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p7.c | 42 +
 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p8.c | 54 ++
 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p9.c | 39 
 gcc/testsuite/lib/target-supports.exp  | 16 +++
 11 files changed, 173 insertions(+), 182 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-7-be.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-7-le.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-7-be.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p7.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p8.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p9.c

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index c7bb4b7..f0e9bb8 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1273,6 +1273,16 @@ By convention, keywords ending in @code{_nocache} can 
also include options
 specified for the particular test in an earlier @code{dg-options} or
 @code{dg-add-options} 

[PATCH] backport r268834 from mainline to gcc-7-branch

2019-03-05 Thread luoxhu
From: Xiong Hu Luo 

Backport r268834 of "Add support for the vec_sbox_be, vec_cipher_be etc."
from mainline to gcc-7-branch.

Regression-tested on Linux POWER8 LE.  The backport patch for gcc-8-branch
has already been approved and committed.  OK for gcc-7-branch?

gcc/ChangeLog:
2019-03-05  Xiong Hu Luo  

Backport of r268834 from mainline to gcc-7-branch.
2019-02-13  Xiong Hu Luo  

* config/rs6000/altivec.h (vec_sbox_be, vec_cipher_be,
vec_cipherlast_be, vec_ncipher_be, vec_ncipherlast_be): New #defines.
* config/rs6000/crypto.md (CR_vqdi): New define_mode_iterator.
(crypto_vsbox_<mode>, crypto_<CR_insn>_<mode>): New define_insns.
* config/rs6000/rs6000-builtin.def (VSBOX_BE): New BU_CRYPTO_1.
(VCIPHER_BE, VCIPHERLAST_BE, VNCIPHER_BE, VNCIPHERLAST_BE):
New BU_CRYPTO_2.
* config/rs6000/rs6000.c (builtin_function_type)
: New switch options.
* doc/extend.texi (vec_sbox_be, vec_cipher_be, vec_cipherlast_be,
vec_ncipher_be, vec_ncipherlast_be): New builtin functions.

gcc/testsuite/ChangeLog:
2019-03-05  Xiong Hu Luo  

Backport of r268834 from mainline to gcc-7-branch.
2019-01-23  Xiong Hu Luo  

* gcc.target/powerpc/crypto-builtin-1.c
(crypto1_be, crypto2_be, crypto3_be, crypto4_be, crypto5_be):
New testcases.
---
 gcc/config/rs6000/altivec.h|  5 +++
 gcc/config/rs6000/crypto.md| 17 ++
 gcc/config/rs6000/rs6000-builtin.def   | 19 ---
 gcc/config/rs6000/rs6000.c |  5 +++
 gcc/doc/extend.texi| 13 
 .../gcc.target/powerpc/crypto-builtin-1.c  | 38 ++
 6 files changed, 79 insertions(+), 18 deletions(-)

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index e04c3a5..a89e4a0 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -388,6 +388,11 @@
 #define vec_vsubuqm __builtin_vec_vsubuqm
 #define vec_vupkhsw __builtin_vec_vupkhsw
 #define vec_vupklsw __builtin_vec_vupklsw
+#define vec_sbox_be __builtin_crypto_vsbox_be
+#define vec_cipher_be __builtin_crypto_vcipher_be
+#define vec_cipherlast_be __builtin_crypto_vcipherlast_be
+#define vec_ncipher_be __builtin_crypto_vncipher_be
+#define vec_ncipherlast_be __builtin_crypto_vncipherlast_be
 #endif
 
 #ifdef __POWER9_VECTOR__
diff --git a/gcc/config/rs6000/crypto.md b/gcc/config/rs6000/crypto.md
index 5892f891..316f5aa 100644
--- a/gcc/config/rs6000/crypto.md
+++ b/gcc/config/rs6000/crypto.md
@@ -48,6 +48,9 @@
 ;; Iterator for VSHASIGMAD/VSHASIGMAW
 (define_mode_iterator CR_hash [V4SI V2DI])
 
+;; Iterator for VSBOX/VCIPHER/VNCIPHER/VCIPHERLAST/VNCIPHERLAST
+(define_mode_iterator CR_vqdi [V16QI V2DI])
+
 ;; Iterator for the other crypto functions
 (define_int_iterator CR_code   [UNSPEC_VCIPHER
UNSPEC_VNCIPHER
@@ -60,10 +63,10 @@
  (UNSPEC_VNCIPHERLAST "vncipherlast")])
 
 ;; 2 operand crypto instructions
-(define_insn "crypto_<CR_insn>"
-  [(set (match_operand:V2DI 0 "register_operand" "=v")
-   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")
- (match_operand:V2DI 2 "register_operand" "v")]
+(define_insn "crypto_<CR_insn>_<mode>"
+  [(set (match_operand:CR_vqdi 0 "register_operand" "=v")
+   (unspec:CR_vqdi [(match_operand:CR_vqdi 1 "register_operand" "v")
+ (match_operand:CR_vqdi 2 "register_operand" "v")]
 CR_code))]
   "TARGET_CRYPTO"
   "<CR_insn> %0,%1,%2"
@@ -90,9 +93,9 @@
   [(set_attr "type" "vecperm")])
 
 ;; 1 operand crypto instruction
-(define_insn "crypto_vsbox"
-  [(set (match_operand:V2DI 0 "register_operand" "=v")
-   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")]
+(define_insn "crypto_vsbox_<mode>"
+  [(set (match_operand:CR_vqdi 0 "register_operand" "=v")
+   (unspec:CR_vqdi [(match_operand:CR_vqdi 1 "register_operand" "v")]
 UNSPEC_VSBOX))]
   "TARGET_CRYPTO"
   "vsbox %0,%1"
diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index 2cc07c6..ff134eb 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2233,13 +2233,22 @@ BU_FLOAT128_1 (FABSQ,   "fabsq",   CONST, 
abskf2)
 BU_FLOAT128_2 (COPYSIGNQ,  "copysignq",   CONST, copysignkf3)
 
 /* 1 argument crypto functions.  */
-BU_CRYPTO_1 (VSBOX,"vsbox",  CONST, crypto_vsbox)
+BU_CRYPTO_1 (VSBOX,"vsbox",  CONST, crypto_vsbox_v2di)
+BU_CRYPTO_1 (VSBOX_BE, "vsbox_be",   CONST, crypto_vsbox_v16qi)
 
 /* 2 argument crypto functions.  */
-BU_CRYPTO_2 (VCIPHER,  "vcipher",CONST, crypto_vcipher)
-BU_CRYPTO_2 (VCIPHERLAST,  "vcipherlast",CONST, crypto_vcipherlast)
-BU_CRYPTO_2 (VNCIPHER, "vncipher",   CONST, crypto_vncipher)
-BU_CRYPTO_2 (VNCIPHERLAST, 

[PATCH v3] luoxhu - backport r250477, r255555, r257253 and r258137

2019-03-04 Thread luoxhu
From: Xiong Hu Luo 

This is a backport of r250477, r255555, r257253 and r258137 from trunk to
gcc-7-branch to support the built-in functions
vec_extract_fp_from_shorth, vec_extract_fp_from_shortl,
vec_extract_fp32_from_shorth, vec_extract_fp32_from_shortl, etc.
These patches were already on trunk before GCC 8 branched.  r257253 and
r258137 adjust dependent testcases that require VSX support and need to be
merged as well to avoid regressions.

The discussion for the patch r250477 that went into trunk is:
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00624.html
The discussion for the patch r255555 that went into trunk is:
https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00394.html
VSX support for patch r257253 and r258137:
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02391.html
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01506.html

Regression-tested on Linux POWER8 LE.

2019-02-28  Xiong Hu Luo 

Backport from trunk r250477.

2017-07-24  Carl Love  

* config/rs6000/rs6000-c.c: Add support for built-in functions
vector float vec_extract_fp32_from_shorth (vector unsigned short);
vector float vec_extract_fp32_from_shortl (vector unsigned short);
* config/rs6000/altivec.h (vec_extract_fp_from_shorth,
vec_extract_fp_from_shortl): Add defines for the two builtins.
* config/rs6000/rs6000-builtin.def (VEXTRACT_FP_FROM_SHORTH,
VEXTRACT_FP_FROM_SHORTL): Add BU_P9V_OVERLOAD_1 and BU_P9V_VSX_1
new builtins.
* config/rs6000/vsx.md (vsx_xvcvhpsp): Add define_insn.
(vextract_fp_from_shorth, vextract_fp_from_shortl): Add define_expands.
* doc/extend.texi: Update the built-in documentation file for the
new built-in function.

Backport from trunk r255555.

2017-12-11  Carl Love  

* config/rs6000/altivec.h (vec_extract_fp32_from_shorth,
vec_extract_fp32_from_shortl): Add #defines.
* config/rs6000/rs6000-builtin.def (VSLDOI_2DI): Add macro expansion.
* config/rs6000/rs6000-c.c (ALTIVEC_BUILTIN_VEC_UNPACKH,
ALTIVEC_BUILTIN_VEC_UNPACKL, ALTIVEC_BUILTIN_VEC_AND,
ALTIVEC_BUILTIN_VEC_SLD, ALTIVEC_BUILTIN_VEC_SRL,
ALTIVEC_BUILTIN_VEC_SRO, ALTIVEC_BUILTIN_VEC_SLD,
ALTIVEC_BUILTIN_VEC_SLL): Add expansions.
* doc/extend.texi: Add documentation for the added builtins.

gcc/testsuite/ChangeLog:

2019-02-28  Xiong Hu Luo 

Backport from trunk r250477.

2017-07-24  Carl Love  

* gcc.target/powerpc/builtins-3-p9-runnable.c: Add new test file for
the new built-ins.

Backport from trunk r255555.

2017-12-11  Carl Love  
* gcc.target/powerpc/altivec-7.c: Renamed altivec-7.h.
* gcc.target/powerpc/altivec-7.h (main): Add testcases for vec_unpackl.
Add dg-final tests for the instructions generated.
* gcc.target/powerpc/altivec-7-be.c: New file to test on big endian.
* gcc.target/powerpc/altivec-7-le.c: New file to test on little endian.
* gcc.target/powerpc/altivec-13.c (foo): Add vec_sld, vec_srl,
 vec_sro testcases. Add dg-final tests for the instructions generated.
* gcc.target/powerpc/builtins-3-p8.c (test_vsi_packs_vui,
test_vsi_packs_vsi, test_vsi_packs_vssi, test_vsi_packs_vusi,
test_vsi_packsu-vssi, test_vsi_packsu-vusi, test_vsi_packsu-vsll,
test_vsi_packsu-vull, test_vsi_packsu-vsi, test_vsi_packsu-vui): Add
testcases. Add dg-final tests for new instructions.
* gcc.target/powerpc/p8vector-builtin-2.c (vbschar_eq, vbchar_eq,
vuchar_eq, vbint_eq, vsint_eq, viint_eq, vuint_eq, vbool_eq, vbint_ne,
vsint_ne, vuint_ne, vbool_ne, vsign_ne, vuns_ne, vbshort_ne): Add
tests.
Add dg-final instruction tests.
* gcc.target/powerpc/vsx-vector-6.c: Renamed vsx-vector-6.h.
* gcc.target/powerpc/vsx-vector-6.h (vec_andc,vec_nmsub, vec_nmadd,
vec_or, vec_nor, vec_andc, vec_or, vec_andc, vec_msums): Add tests.
Add dg-final tests for the generated instructions.
* gcc.target/powerpc/builtins-3.c (test_sll_vsc_vsc_vsuc,
test_sll_vuc_vuc, test_sll_vsi_vsi_vuc, test_sll_vui_vui_vuc,
test_sll_vbll_vull, test_sll_vbll_vbll_vus, test_sll_vp_vp_vuc,
test_sll_vssi_vssi_vuc, test_sll_vusi_vusi_vuc, test_slo_vsc_vsc_vsc,
test_slo_vuc_vuc_vsc, test_slo_vsi_vsi_vsc, test_slo_vsi_vsi_vuc,
test_slo_vui_vui_vsc, test_slo_vui_vui_vuc, test_slo_vsll_slo_vsll_vsc,
test_slo_vsll_slo_vsll_vuc, test_slo_vull_slo_vull_vsc,
test_slo_vull_slo_vull_vuc, test_slo_vp_vp_vsc, test_slo_vp_vp_vuc,
test_slo_vssi_vssi_vsc, test_slo_vssi_vssi_vuc, test_slo_vusi_vusi_vsc,
test_slo_vusi_vusi_vuc, test_slo_vusi_vusi_vuc, test_slo_vf_vf_vsc,
test_slo_vf_vf_vuc, test_cmpb_float): Add tests.

Backport from trunk r257253.

2018-01-31  Will Schmidt  

* gcc.target/powerpc/altivec-13.c: Remove VSX-requiring 

[PATCH] PR c/43673 - Incorrect warning in dfp printf.

2019-02-25 Thread luoxhu
From: Xiong Hu Luo 

The DFP printf/scanf entries for Ha/HA, Da/DA and DDa/DDA are not set
properly, which causes an incorrect warning:
"use of 'D' length modifier with 'a' type character".

Regression-tested on powerpc64le-linux, OK for trunk and gcc-8?
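
To illustrate the table change, here is a minimal sketch in C of a format-check lookup keyed by conversion character and length modifier. It is a simplified model, not GCC's actual format_char_info layout: the names len_mod, arg_kind, conv_row, rows_before, rows_after and expected_kind are hypothetical, and BADLEN here is just an enum value standing in for "this length modifier is rejected".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a format-check table.  Only the
   DFP length modifiers H, D and DD (plus "none") are modelled.  */
enum len_mod { LEN_NONE, LEN_H, LEN_D, LEN_DD, LEN_COUNT };

/* Expected argument kind; BADLEN means the combination is rejected.  */
enum arg_kind { BADLEN, K_DOUBLE, K_D32, K_D64, K_D128 };

struct conv_row
{
  const char *chars;               /* conversion characters sharing a row */
  enum arg_kind kinds[LEN_COUNT];  /* expected kind per length modifier */
};

/* Before the fix: the "aA" row rejects H, D and DD.  */
static const struct conv_row rows_before[] = {
  { "fFeEgG", { K_DOUBLE, K_D32, K_D64, K_D128 } },
  { "aA",     { K_DOUBLE, BADLEN, BADLEN, BADLEN } },
};

/* After the fix: the "aA" row accepts the same DFP kinds as "F".  */
static const struct conv_row rows_after[] = {
  { "fFeEgG", { K_DOUBLE, K_D32, K_D64, K_D128 } },
  { "aA",     { K_DOUBLE, K_D32, K_D64, K_D128 } },
};

/* Return the argument kind expected for CONV with length modifier LEN,
   or BADLEN if the table rejects the combination.  */
static enum arg_kind
expected_kind (const struct conv_row *rows, size_t n, char conv,
               enum len_mod len)
{
  if (conv == '\0')
    return BADLEN;
  for (size_t i = 0; i < n; i++)
    if (strchr (rows[i].chars, conv) != NULL)
      return rows[i].kinds[len];
  return BADLEN;
}
```

In this model, expected_kind (rows_before, 2, 'a', LEN_D) yields BADLEN, which corresponds to the spurious warning, while the same lookup on rows_after yields K_D64.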

gcc/c-family/ChangeLog:

2019-02-25  Xiong Hu Luo  

PR c/43673
* c-format.c (print_char_table, scan_char_table): Replace BADLEN with
TEX_D32, TEX_D64 or TEX_D128.

gcc/testsuite/ChangeLog:

2019-02-25  Xiong Hu Luo  

PR c/43673
* gcc.dg/format/dfp-printf-1.c: New test.
* gcc.dg/format/dfp-scanf-1.c: Likewise.
---
 gcc/c-family/c-format.c|  4 ++--
 gcc/testsuite/gcc.dg/format/dfp-printf-1.c | 28 ++--
 gcc/testsuite/gcc.dg/format/dfp-scanf-1.c  | 22 --
 3 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index 9b48ee3..af33ef9 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -674,7 +674,7 @@ static const format_char_info print_char_table[] =
   { "n",   1, STD_C89, { T89_I,   T99_SC,  T89_S,   T89_L,   T9L_LL,  BADLEN,  
T99_SST, T99_PD,  T99_IM,  BADLEN,  BADLEN,  BADLEN }, "",  "W",  NULL 
},
   /* C99 conversion specifiers.  */
   { "F",   0, STD_C99, { T99_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  
BADLEN,  BADLEN,  BADLEN,  TEX_D32, TEX_D64, TEX_D128 }, "-wp0 +#'I", "",   
NULL },
-  { "aA",  0, STD_C99, { T99_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp0 +#",   "",   NULL 
},
+  { "aA",  0, STD_C99, { T99_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  
BADLEN,  BADLEN,  BADLEN,  TEX_D32, TEX_D64,  TEX_D128 }, "-wp0 +#",   "",   
NULL },
   /* X/Open conversion specifiers.  */
   { "C",   0, STD_EXT, { TEX_WI,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-w","",   NULL 
},
   { "S",   1, STD_EXT, { TEX_W,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp",   "R",  NULL 
},
@@ -847,7 +847,7 @@ static const format_char_info scan_char_table[] =
   { "n", 1, STD_C89, { T89_I,   T99_SC,  T89_S,   T89_L,   T9L_LL,  
BADLEN,  T99_SST, T99_PD,  T99_IM,  BADLEN,  BADLEN,  BADLEN }, "", "W",   
NULL },
   /* C99 conversion specifiers.  */
   { "F",   1, STD_C99, { T99_F,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  
BADLEN,  BADLEN,  BADLEN,  TEX_D32, TEX_D64, TEX_D128 }, "*w'",  "W",   NULL },
-  { "aA",   1, STD_C99, { T99_F,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD, 
 BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "*w'",  "W",   NULL },
+  { "aA",   1, STD_C99, { T99_F,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD, 
 BADLEN,  BADLEN,  BADLEN,  TEX_D32,  TEX_D64,  TEX_D128 }, "*w'",  "W",   NULL 
},
   /* X/Open conversion specifiers.  */
   { "C", 1, STD_EXT, { TEX_W,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "*mw",   "W",   
NULL },
   { "S", 1, STD_EXT, { TEX_W,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "*amw",  "W",   
NULL },
diff --git a/gcc/testsuite/gcc.dg/format/dfp-printf-1.c 
b/gcc/testsuite/gcc.dg/format/dfp-printf-1.c
index e92f161..a290895 100644
--- a/gcc/testsuite/gcc.dg/format/dfp-printf-1.c
+++ b/gcc/testsuite/gcc.dg/format/dfp-printf-1.c
@@ -17,6 +17,8 @@ foo (_Decimal32 x, _Decimal64 y, _Decimal128 z, int i, 
unsigned int j,
 
   /* Check lack of warnings for valid usage.  */
 
+  printf ("%Ha\n", x);
+  printf ("%HA\n", x);
   printf ("%Hf\n", x);
   printf ("%HF\n", x);
   printf ("%He\n", x);
@@ -24,6 +26,8 @@ foo (_Decimal32 x, _Decimal64 y, _Decimal128 z, int i, 
unsigned int j,
   printf ("%Hg\n", x);
   printf ("%HG\n", x);
 
+  printf ("%Da\n", y);
+  printf ("%DA\n", y);
   printf ("%Df\n", y);
   printf ("%DF\n", y);
   printf ("%De\n", y);
@@ -31,6 +35,8 @@ foo (_Decimal32 x, _Decimal64 y, _Decimal128 z, int i, 
unsigned int j,
   printf ("%Dg\n", y);
   printf ("%DG\n", y);
 
+  printf ("%DDa\n", z);
+  printf ("%DDA\n", z);
   printf ("%DDf\n", z);
   printf ("%DDF\n", z);
   printf ("%DDe\n", z);
@@ -43,12 +49,16 @@ foo (_Decimal32 x, _Decimal64 y, _Decimal128 z, int i, 
unsigned int j,
 
   /* Check warnings for type mismatches.  */
 
+  printf ("%Ha\n", y); /* { dg-warning "expects argument" "bad use of %H" } */
+  printf ("%HA\n", y); /* { dg-warning "expects argument" "bad use of %H" } */
   printf ("%Hf\n", y); /* { dg-warning "expects argument" "bad use of %H" } */
   printf ("%HF\n", y); /* { dg-warning "expects argument" "bad use of %H" } */
   printf ("%He\n", y); /* { dg-warning "expects argument" "bad use of %H" } */
   printf ("%HE\n", y); /* { dg-warning "expects argument" "bad use of %H" } */
   printf ("%Hg\n", y); /* { dg-warning "expects argument" "bad use of %H" } */
   

[PATCH] luoxhu - backport from trunk r255555, r257253 and r258137

2019-02-18 Thread luoxhu
From: Xiong Hu Luo 

This is a backport of r255555, r257253 and r258137 from trunk to gcc-7-branch.
These patches were already on trunk before GCC 8 branched.  In total, 5 files
needed manual conflict resolution due to code changes for r255555.  r257253 and
r258137 adjust dependent testcases that require VSX support and need to be
merged as well to avoid regressions.

The discussion for the patch r255555 that went into trunk is:
https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00394.html
VSX support for patch r257253 and r258137:
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02391.html
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01506.html

gcc/ChangeLog:

2019-01-14  Luo Xiong Hu  

Backport from trunk. Manually resolve 3 files:
* config/rs6000/altivec.h (vec_extract_fp32_from_shorth,
vec_extract_fp32_from_shortl): Resolve new #defines.
* config/rs6000/rs6000-c.c (ALTIVEC_BUILTIN_VEC_SLD): Resolve
new expansions.
* doc/extend.texi: (vec_sld, vec_sll, vec_srl, vec_sro,
vec_unpackh, vec_unpackl, test_vsi_packsu_vssi_vssi, vec_packsu,
vec_cmpne): Resolve new documentation.
2017-12-11  Carl Love  

* config/rs6000/altivec.h (vec_extract_fp32_from_shorth,
vec_extract_fp32_from_shortl): Add #defines.
* config/rs6000/rs6000-builtin.def (VSLDOI_2DI): Add macro expansion.
* config/rs6000/rs6000-c.c (ALTIVEC_BUILTIN_VEC_UNPACKH,
ALTIVEC_BUILTIN_VEC_UNPACKL, ALTIVEC_BUILTIN_VEC_AND,
ALTIVEC_BUILTIN_VEC_SLD, ALTIVEC_BUILTIN_VEC_SRL,
ALTIVEC_BUILTIN_VEC_SRO, ALTIVEC_BUILTIN_VEC_SLD,
ALTIVEC_BUILTIN_VEC_SLL): Add expansions.
* doc/extend.texi: Add documentation for the added builtins.

gcc/testsuite/ChangeLog:

2019-01-14  Luo Xiong Hu  

Backport from trunk r255555. Manually resolve 2 files:
* gcc.target/powerpc/builtins-3-p8.c (test_vsi_packs_vusi,
test_vsi_packsu-vssi, test_vsi_packsu-vusi, test_vsi_packsu-vsll,
test_vsi_packsu-vull, test_vsi_packsu-vsi, test_vsi_packsu-vui):
Resolve new cases.
* gcc.target/powerpc/builtins-3.c (test_sll_vsc_vsc_vsuc,
test_sll_vuc_vuc, test_sll_vsi_vsi_vuc, test_sll_vui_vui_vuc,
test_sll_vbll_vull, test_sll_vbll_vbll_vus, test_sll_vp_vp_vuc,
test_sll_vssi_vssi_vuc, test_sll_vusi_vusi_vuc, test_slo_vsc_vsc_vsc,
test_slo_vuc_vuc_vsc, test_slo_vsi_vsi_vsc, test_slo_vsi_vsi_vuc,
test_slo_vui_vui_vsc, test_slo_vui_vui_vuc, test_slo_vp_vp_vsc,
test_slo_vp_vp_vuc, test_slo_vssi_vssi_vsc, test_slo_vssi_vssi_vuc,
test_slo_vusi_vusi_vsc, test_slo_vusi_vusi_vuc, test_slo_vusi_vusi_vuc,
test_slo_vf_vf_vsc, test_slo_vf_vf_vuc, test_cmpb_float): Resolve
new cases.
2017-12-11  Carl Love  

* gcc.target/powerpc/altivec-7.c: Renamed altivec-7.h.
* gcc.target/powerpc/altivec-7.h (main): Add testcases for vec_unpackl.
Add dg-final tests for the instructions generated.
* gcc.target/powerpc/altivec-7-be.c: New file to test on big endian.
* gcc.target/powerpc/altivec-7-le.c: New file to test on little endian.
* gcc.target/powerpc/altivec-13.c (foo): Add vec_sld, vec_srl,
 vec_sro testcases. Add dg-final tests for the instructions generated.
* gcc.target/powerpc/builtins-3-p8.c (test_vsi_packs_vui,
test_vsi_packs_vsi, test_vsi_packs_vssi, test_vsi_packs_vusi,
test_vsi_packsu-vssi, test_vsi_packsu-vusi, test_vsi_packsu-vsll,
test_vsi_packsu-vull, test_vsi_packsu-vsi, test_vsi_packsu-vui): Add
testcases. Add dg-final tests for new instructions.
* gcc.target/powerpc/p8vector-builtin-2.c (vbschar_eq, vbchar_eq,
vuchar_eq, vbint_eq, vsint_eq, viint_eq, vuint_eq, vbool_eq, vbint_ne,
vsint_ne, vuint_ne, vbool_ne, vsign_ne, vuns_ne, vbshort_ne): Add
tests.
Add dg-final instruction tests.
* gcc.target/powerpc/vsx-vector-6.c: Renamed vsx-vector-6.h.
* gcc.target/powerpc/vsx-vector-6.h (vec_andc,vec_nmsub, vec_nmadd,
vec_or, vec_nor, vec_andc, vec_or, vec_andc, vec_msums): Add tests.
Add dg-final tests for the generated instructions.
* gcc.target/powerpc/builtins-3.c (test_sll_vsc_vsc_vsuc,
test_sll_vuc_vuc, test_sll_vsi_vsi_vuc, test_sll_vui_vui_vuc,
test_sll_vbll_vull, test_sll_vbll_vbll_vus, test_sll_vp_vp_vuc,
test_sll_vssi_vssi_vuc, test_sll_vusi_vusi_vuc, test_slo_vsc_vsc_vsc,
test_slo_vuc_vuc_vsc, test_slo_vsi_vsi_vsc, test_slo_vsi_vsi_vuc,
test_slo_vui_vui_vsc, test_slo_vui_vui_vuc, test_slo_vsll_slo_vsll_vsc,
test_slo_vsll_slo_vsll_vuc, test_slo_vull_slo_vull_vsc,
test_slo_vull_slo_vull_vuc, test_slo_vp_vp_vsc, test_slo_vp_vp_vuc,
test_slo_vssi_vssi_vsc, test_slo_vssi_vssi_vuc, test_slo_vusi_vusi_vsc,
test_slo_vusi_vusi_vuc, test_slo_vusi_vusi_vuc, test_slo_vf_vf_vsc,
test_slo_vf_vf_vuc, test_cmpb_float): Add tests.

Backport 

[PATCH 2/2] fix comments typo.

2019-01-23 Thread luoxhu
From: Xiong Hu Luo 

Committed in r268229.
---
gcc/ChangeLog

2019-01-24  Xiong Hu Luo  

* tree-ssa-dom.c (test_for_singularity): fix a comment typo.
* vr-values.c (find_case_label_ranges): fix a comment typo.
---
 gcc/tree-ssa-dom.c | 2 +-
 gcc/vr-values.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 458f711..12647e7 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1929,7 +1929,7 @@ test_for_singularity (gimple *stmt, gcond *dummy_cond,
 
3- Very simple redundant store elimination is performed.
 
-   4- We can simpify a condition to a constant or from a relational
+   4- We can simplify a condition to a constant or from a relational
   condition to an equality condition.  */
 
 edge
diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index f4058ea..a734ef9 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -2597,7 +2597,7 @@ find_case_label_ranges (gswitch *stmt, value_range *vr, 
size_t *min_idx1,
 
   take_default = !find_case_label_range (stmt, min, max, &i, &j);
 
-  /* Set second range to emtpy.  */
+  /* Set second range to empty.  */
   *min_idx2 = 1;
   *max_idx2 = 0;
 
-- 
2.7.4



[PATCH 1/2] fix tab alignment issue.

2019-01-23 Thread luoxhu
From: Xiong Hu Luo 

Committed in r268228.
---
ChangeLog

2019-01-24  Xiong Hu Luo  
* ChangeLog: replace space with tab.
* MAINTAINERS: delete 1 tab to keep alignment.
---
 ChangeLog   | 4 ++--
 MAINTAINERS | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 60ff3e0..8a5d078 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -21,9 +21,9 @@
 
* MAINTAINERS (Write After Approval): Add myself.
 
- 2019-01-16  Xiong Hu Luo 
+2019-01-16  Xiong Hu Luo 
 
- * MAINTAINERS (Write After Approval): Add myself.
+   * MAINTAINERS (Write After Approval): Add myself.
 
 2019-01-03  Rainer Orth  
 
diff --git a/MAINTAINERS b/MAINTAINERS
index 860ba32..0c362aa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -484,7 +484,7 @@ Manuel López-Ibáñez 

 Carl Love  
 Martin v. Löwis

 H.J. Lu
-Xiong Hu Luo   

+Xiong Hu Luo   
 Christophe Lyon
 Luis Machado   
 Ziga Mahkovec  
-- 
2.7.4



[PATCH] rs6000: Add support for the vec_sbox_be, vec_cipher_be etc. builtins.

2019-01-23 Thread luoxhu
From: Xiong Hu Luo 

The five new builtins vec_sbox_be, vec_cipher_be, vec_cipherlast_be,
vec_ncipher_be and vec_ncipherlast_be only accept vector unsigned char
parameters.  Add new define_insns crypto_vsbox_<mode> and
crypto_<CR_insn>_<mode> to handle them, where the new mode iterator CR_vqdi
expands to V2DI (vector unsigned long long) for the builtins without the _be
suffix and to V16QI (vector unsigned char) for the builtins with the _be
suffix.
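
As an illustration of the naming only: one define_insn written with the CR_vqdi mode iterator yields one pattern per mode, with the lowercased mode name appended, which is where the names crypto_vsbox_v2di and crypto_vsbox_v16qi used in rs6000-builtin.def come from. A minimal sketch of that expansion in C follows; the helper expand_pattern_name and the array cr_vqdi_modes are hypothetical and only model the string substitution, not how the machine-description generators are implemented.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Modes listed in the CR_vqdi iterator from the patch.  */
static const char *const cr_vqdi_modes[] = { "V16QI", "V2DI" };

/* Hypothetical helper: write PREFIX followed by the lowercased MODE
   into OUT.  Return 0 on success, -1 if OUT is too small.  */
static int
expand_pattern_name (const char *prefix, const char *mode,
                     char *out, size_t out_size)
{
  size_t plen = strlen (prefix);
  size_t mlen = strlen (mode);

  if (plen + mlen + 1 > out_size)
    return -1;

  memcpy (out, prefix, plen);
  for (size_t i = 0; i < mlen; i++)
    out[plen + i] = (char) tolower ((unsigned char) mode[i]);
  out[plen + mlen] = '\0';
  return 0;
}
```

Calling expand_pattern_name ("crypto_vsbox_", cr_vqdi_modes[i], ...) for each mode produces the two pattern names that the VSBOX_BE and VSBOX builtin definitions refer to.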

---
gcc/ChangeLog

2019-01-23  Xiong Hu Luo  

* gcc/config/rs6000/altivec.h (vec_sbox_be, vec_cipher_be,
vec_cipherlast_be, vec_ncipher_be, vec_ncipherlast_be): New #defines.
* gcc/config/rs6000/crypto.md (CR_vqdi): New define_mode_iterator.
(crypto_vsbox_<mode>, crypto_<CR_insn>_<mode>): New define_insns.
* gcc/config/rs6000/rs6000-builtin.def (VSBOX_BE): New BU_CRYPTO_1.
(VCIPHER_BE, VCIPHERLAST_BE, VNCIPHER_BE, VNCIPHERLAST_BE):
New BU_CRYPTO_2.
* gcc/config/rs6000/rs6000.c (builtin_function_type)
: New switch options.
* gcc/doc/extend.texi (vec_sbox_be, vec_cipher_be, vec_cipherlast_be,
vec_ncipher_be, vec_ncipherlast_be): New builtin functions.

gcc/testsuite/ChangeLog

2019-01-23  Xiong Hu Luo  

* gcc/testsuite/gcc.target/powerpc/crypto-builtin-1.c
(crypto1_be, crypto2_be, crypto3_be, crypto4_be, crypto5_be):
New testcases.
---
 gcc/config/rs6000/altivec.h|  5 +++
 gcc/config/rs6000/crypto.md| 17 +-
 gcc/config/rs6000/rs6000-builtin.def   | 19 +---
 gcc/config/rs6000/rs6000.c |  5 +++
 gcc/doc/extend.texi| 13 
 .../gcc.target/powerpc/crypto-builtin-1.c  | 36 +++---
 6 files changed, 78 insertions(+), 17 deletions(-)

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index bf29d46..d66ae7c 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -418,6 +418,11 @@
 #define vec_vupkhsw __builtin_vec_vupkhsw
 #define vec_vupklsw __builtin_vec_vupklsw
 #define vec_revb __builtin_vec_revb
+#define vec_sbox_be __builtin_crypto_vsbox_be
+#define vec_cipher_be __builtin_crypto_vcipher_be
+#define vec_cipherlast_be __builtin_crypto_vcipherlast_be
+#define vec_ncipher_be __builtin_crypto_vncipher_be
+#define vec_ncipherlast_be __builtin_crypto_vncipherlast_be
 #endif
 
 #ifdef __POWER9_VECTOR__
diff --git a/gcc/config/rs6000/crypto.md b/gcc/config/rs6000/crypto.md
index 2ee3e3a..b9917b0 100644
--- a/gcc/config/rs6000/crypto.md
+++ b/gcc/config/rs6000/crypto.md
@@ -48,6 +48,9 @@
 ;; Iterator for VSHASIGMAD/VSHASIGMAW
 (define_mode_iterator CR_hash [V4SI V2DI])
 
+;; Iterator for VSBOX/VCIPHER/VNCIPHER/VCIPHERLAST/VNCIPHERLAST
+(define_mode_iterator CR_vqdi [V16QI V2DI])
+
 ;; Iterator for the other crypto functions
 (define_int_iterator CR_code   [UNSPEC_VCIPHER
UNSPEC_VNCIPHER
@@ -60,10 +63,10 @@
  (UNSPEC_VNCIPHERLAST "vncipherlast")])
 
 ;; 2 operand crypto instructions
-(define_insn "crypto_<CR_insn>"
-  [(set (match_operand:V2DI 0 "register_operand" "=v")
-   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")
- (match_operand:V2DI 2 "register_operand" "v")]
+(define_insn "crypto_<CR_insn>_<mode>"
+  [(set (match_operand:CR_vqdi 0 "register_operand" "=v")
+   (unspec:CR_vqdi [(match_operand:CR_vqdi 1 "register_operand" "v")
+ (match_operand:CR_vqdi 2 "register_operand" "v")]
 CR_code))]
   "TARGET_CRYPTO"
   "<CR_insn> %0,%1,%2"
@@ -90,9 +93,9 @@
   [(set_attr "type" "vecperm")])
 
 ;; 1 operand crypto instruction
-(define_insn "crypto_vsbox"
-  [(set (match_operand:V2DI 0 "register_operand" "=v")
-   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")]
+(define_insn "crypto_vsbox_<mode>"
+  [(set (match_operand:CR_vqdi 0 "register_operand" "=v")
+   (unspec:CR_vqdi [(match_operand:CR_vqdi 1 "register_operand" "v")]
 UNSPEC_VSBOX))]
   "TARGET_CRYPTO"
   "vsbox %0,%1"
diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index 60b3bd0..0a2bdb7 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2418,13 +2418,22 @@ BU_P9_OVERLOAD_2 (CMPRB2,   "byte_in_either_range")
 BU_P9_OVERLOAD_2 (CMPEQB,  "byte_in_set")
 
 /* 1 argument crypto functions.  */
-BU_CRYPTO_1 (VSBOX,"vsbox",  CONST, crypto_vsbox)
+BU_CRYPTO_1 (VSBOX,"vsbox",  CONST, crypto_vsbox_v2di)
+BU_CRYPTO_1 (VSBOX_BE, "vsbox_be",   CONST, crypto_vsbox_v16qi)
 
 /* 2 argument crypto functions.  */
-BU_CRYPTO_2 (VCIPHER,  "vcipher",CONST, crypto_vcipher)
-BU_CRYPTO_2 (VCIPHERLAST,  "vcipherlast",CONST, crypto_vcipherlast)
-BU_CRYPTO_2 (VNCIPHER, "vncipher",   CONST, crypto_vncipher)
-BU_CRYPTO_2 (VNCIPHERLAST, "vncipherlast",   CONST, crypto_vncipherlast)

[PATCH] luoxhu - backport from trunk r255555:

2019-01-22 Thread luoxhu
From: carll 

backport from trunk to gcc-7-branch.

gcc/ChangeLog:

2017-12-11  Carl Love  

* config/rs6000/altivec.h (vec_extract_fp32_from_shorth,
vec_extract_fp32_from_shortl): Add #defines.
* config/rs6000/rs6000-builtin.def (VSLDOI_2DI): Add macro expansion.
* config/rs6000/rs6000-c.c (ALTIVEC_BUILTIN_VEC_UNPACKH,
ALTIVEC_BUILTIN_VEC_UNPACKL, ALTIVEC_BUILTIN_VEC_AND,
ALTIVEC_BUILTIN_VEC_SLD, ALTIVEC_BUILTIN_VEC_SRL,
ALTIVEC_BUILTIN_VEC_SRO, ALTIVEC_BUILTIN_VEC_SLD,
ALTIVEC_BUILTIN_VEC_SLL): Add expansions.
* doc/extend.texi: Add documentation for the added builtins.

gcc/testsuite/ChangeLog:

2017-12-11  Carl Love  
* gcc.target/powerpc/altivec-7.c: Renamed altivec-7.h.
* gcc.target/powerpc/altivec-7.h (main): Add testcases for vec_unpackl.
Add dg-final tests for the instructions generated.
* gcc.target/powerpc/altivec-7-be.c: New file to test on big endian.
* gcc.target/powerpc/altivec-7-le.c: New file to test on little endian.
* gcc.target/powerpc/altivec-13.c (foo): Add vec_sld, vec_srl,
 vec_sro testcases. Add dg-final tests for the instructions generated.
* gcc.target/powerpc/builtins-3-p8.c (test_vsi_packs_vui,
test_vsi_packs_vsi, test_vsi_packs_vssi, test_vsi_packs_vusi,
test_vsi_packsu-vssi, test_vsi_packsu-vusi, test_vsi_packsu-vsll,
test_vsi_packsu-vull, test_vsi_packsu-vsi, test_vsi_packsu-vui): Add
testcases. Add dg-final tests for new instructions.
* gcc.target/powerpc/p8vector-builtin-2.c (vbschar_eq, vbchar_eq,
vuchar_eq, vbint_eq, vsint_eq, viint_eq, vuint_eq, vbool_eq, vbint_ne,
vsint_ne, vuint_ne, vbool_ne, vsign_ne, vuns_ne, vbshort_ne): Add
tests.
Add dg-final instruction tests.
* gcc.target/powerpc/vsx-vector-6.c: Renamed vsx-vector-6.h.
* gcc.target/powerpc/vsx-vector-6.h (vec_andc,vec_nmsub, vec_nmadd,
vec_or, vec_nor, vec_andc, vec_or, vec_andc, vec_msums): Add tests.
Add dg-final tests for the generated instructions.
* gcc.target/powerpc/builtins-3.c (test_sll_vsc_vsc_vsuc,
test_sll_vuc_vuc, test_sll_vsi_vsi_vuc, test_sll_vui_vui_vuc,
test_sll_vbll_vull, test_sll_vbll_vbll_vus, test_sll_vp_vp_vuc,
test_sll_vssi_vssi_vuc, test_sll_vusi_vusi_vuc, test_slo_vsc_vsc_vsc,
test_slo_vuc_vuc_vsc, test_slo_vsi_vsi_vsc, test_slo_vsi_vsi_vuc,
test_slo_vui_vui_vsc, test_slo_vui_vui_vuc, test_slo_vsll_slo_vsll_vsc,
test_slo_vsll_slo_vsll_vuc, test_slo_vull_slo_vull_vsc,
test_slo_vull_slo_vull_vuc, test_slo_vp_vp_vsc, test_slo_vp_vp_vuc,
test_slo_vssi_vssi_vsc, test_slo_vssi_vssi_vuc, test_slo_vusi_vusi_vsc,
test_slo_vusi_vusi_vuc, test_slo_vusi_vusi_vuc, test_slo_vf_vf_vsc,
test_slo_vf_vf_vuc, test_cmpb_float): Add tests.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@255555 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/config/rs6000/altivec.h|   3 +
 gcc/config/rs6000/rs6000-builtin.def   |   1 +
 gcc/config/rs6000/rs6000-c.c   |  38 +
 gcc/doc/extend.texi|  48 +-
 gcc/testsuite/gcc.target/powerpc/altivec-13.c  |  69 -
 gcc/testsuite/gcc.target/powerpc/altivec-7-be.c|  35 +
 gcc/testsuite/gcc.target/powerpc/altivec-7-le.c|  36 +
 gcc/testsuite/gcc.target/powerpc/altivec-7.c   |  46 --
 gcc/testsuite/gcc.target/powerpc/altivec-7.h   |  50 ++
 gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c   |  79 +-
 gcc/testsuite/gcc.target/powerpc/builtins-3.c  | 168 -
 .../gcc.target/powerpc/p8vector-builtin-2.c|  83 +-
 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c |  31 
 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c |  32 
 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.c|  81 --
 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h| 157 +++
 16 files changed, 825 insertions(+), 132 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-7-be.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-7-le.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-7.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-7.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index e04c3a5..b8df599 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -421,6 +421,9 @@
 #define vec_insert_exp __builtin_vec_insert_exp
 #define vec_test_data_class __builtin_vec_test_data_class
 
+#define 
