Re: [Patch v2] Enable math functions linking with static library for LTO
Hi Richard, On 2019/8/13 17:10, Richard Biener wrote: On Tue, Aug 13, 2019 at 4:22 AM luoxhu wrote: Hi Richard, On 2019/8/12 16:51, Richard Biener wrote: On Mon, Aug 12, 2019 at 8:50 AM luoxhu wrote: Hi Richard, Thanks for your comments, updated the v2 patch as below: 1. Define and use builtin_with_linkage_p. 2. Add comments. 3. Add a testcase. In LTO mode, if static library and dynamic library contains same function and both libraries are passed as arguments, linker will link the function in dynamic library no matter the sequence. This patch will output LTO symbol node as UNDEF if BUILT_IN_NORMAL function FNDECL is a math function, then the function in static library will be linked first if its sequence is ahead of the dynamic library. Comments below gcc/ChangeLog 2019-08-12 Xiong Hu Luo PR lto/91287 * builtins.c (builtin_with_linkage_p): New function. * builtins.h (builtin_with_linkage_p): New function. * symtab.c (write_symbol): Use builtin_with_linkage_p. * lto-streamer-out.c (symtab_node::output_to_lto_symbol_table_p): Likewise. gcc/testsuite/ChangeLog 2019-08-12 Xiong Hu Luo PR lto/91287 * gcc.dg/pr91287.c: New testcase. --- gcc/builtins.c | 89 ++ gcc/builtins.h | 2 + gcc/lto-streamer-out.c | 4 +- gcc/symtab.c | 13 - gcc/testsuite/gcc.dg/pr91287.c | 40 +++ 5 files changed, 145 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr91287.c diff --git a/gcc/builtins.c b/gcc/builtins.c index 695a9d191af..f4dea941a27 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -11244,3 +11244,92 @@ target_char_cst_p (tree t, char *p) *p = (char)tree_to_uhwi (t); return true; } + +/* Return true if DECL is a specified builtin math function. These functions + should have symbol in symbol table to provide linkage with faster version of + libraries. */ The comment should read like /* Return true if the builtin DECL is implemented in a standard library. 
Otherwise returns false which doesn't guarantee it is not (thus the list of handled builtins below may be incomplete). */ +bool +builtin_with_linkage_p (tree decl) +{ + if (!decl) +return false; Omit this check please. + if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL) +switch (DECL_FUNCTION_CODE (decl)) +{ + CASE_FLT_FN (BUILT_IN_ACOS): + CASE_FLT_FN (BUILT_IN_ACOSH): + CASE_FLT_FN (BUILT_IN_ASIN): + CASE_FLT_FN (BUILT_IN_ASINH): + CASE_FLT_FN (BUILT_IN_ATAN): + CASE_FLT_FN (BUILT_IN_ATANH): + CASE_FLT_FN (BUILT_IN_ATAN2): + CASE_FLT_FN (BUILT_IN_CBRT): + CASE_FLT_FN (BUILT_IN_CEIL): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_CEIL): + CASE_FLT_FN (BUILT_IN_COPYSIGN): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_COPYSIGN): + CASE_FLT_FN (BUILT_IN_COS): + CASE_FLT_FN (BUILT_IN_COSH): + CASE_FLT_FN (BUILT_IN_ERF): + CASE_FLT_FN (BUILT_IN_ERFC): + CASE_FLT_FN (BUILT_IN_EXP): + CASE_FLT_FN (BUILT_IN_EXP2): + CASE_FLT_FN (BUILT_IN_EXPM1): + CASE_FLT_FN (BUILT_IN_FABS): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS): + CASE_FLT_FN (BUILT_IN_FDIM): + CASE_FLT_FN (BUILT_IN_FLOOR): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FLOOR): + CASE_FLT_FN (BUILT_IN_FMA): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA): + CASE_FLT_FN (BUILT_IN_FMAX): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMAX): + CASE_FLT_FN (BUILT_IN_FMIN): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMIN): + CASE_FLT_FN (BUILT_IN_FMOD): + CASE_FLT_FN (BUILT_IN_FREXP): + CASE_FLT_FN (BUILT_IN_HYPOT): + CASE_FLT_FN (BUILT_IN_ILOGB): + CASE_FLT_FN (BUILT_IN_LDEXP): + CASE_FLT_FN (BUILT_IN_LGAMMA): + CASE_FLT_FN (BUILT_IN_LLRINT): + CASE_FLT_FN (BUILT_IN_LLROUND): + CASE_FLT_FN (BUILT_IN_LOG): + CASE_FLT_FN (BUILT_IN_LOG10): + CASE_FLT_FN (BUILT_IN_LOG1P): + CASE_FLT_FN (BUILT_IN_LOG2): + CASE_FLT_FN (BUILT_IN_LOGB): + CASE_FLT_FN (BUILT_IN_LRINT): + CASE_FLT_FN (BUILT_IN_LROUND): + CASE_FLT_FN (BUILT_IN_MODF): + CASE_FLT_FN (BUILT_IN_NAN): + CASE_FLT_FN (BUILT_IN_NEARBYINT): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_NEARBYINT): + CASE_FLT_FN (BUILT_IN_NEXTAFTER): + CASE_FLT_FN 
(BUILT_IN_NEXTTOWARD): + CASE_FLT_FN (BUILT_IN_POW): + CASE_FLT_FN (BUILT_IN_REMAINDER): + CASE_FLT_FN (BUILT_IN_REMQUO): + CASE_FLT_FN (BUILT_IN_RINT): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_RINT): + CASE_FLT_FN (BUILT_IN_ROUND): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_ROUND): + CASE_FLT_FN (BUILT_IN_SCALBLN): + CASE_FLT_FN (BUILT_IN_SCALBN): + CASE_FLT_FN (BUILT_IN_SIN): + CASE_FLT_FN (BUILT_IN_SINH): + CASE_FLT_FN (BUILT_IN_SINCOS): + CASE_FLT_FN (BUILT_IN_SQRT
Re: [PATCH] Add MD Function type check for builtin_md vectorize
On 2019/8/21 15:40, Richard Biener wrote:
On Tue, 20 Aug 2019, Xiong Hu Luo wrote:

The DECL_MD_FUNCTION_CODE added in r274404 (PR 91421) by rsandifo requires DECL to be a BUILT_IN_MD class built-in. Asserts fire under LTO because the patch r274411 (PR 91287) outputs some math function symbols to the object. This patch checks the built-in class of the callee before doing builtin_md vectorization.

I think Richard fixed this already.

Thanks. It was fixed by Richard's r274524 already. Please ignore this patch.

Xionghu

Richard.

gcc/ChangeLog

2019-08-21  Xiong Hu Luo

	* tree-vect-stmts.c (vectorizable_call): Check callee built-in type.
	* gcc/tree.h (DECL_MD_FUNCTION_P): New function.
---
 gcc/tree-vect-stmts.c |  2 +-
 gcc/tree.h            | 12 ++++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 1e2dfe5d22d..ef947f20d63 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -3376,7 +3376,7 @@ vectorizable_call (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 	  if (cfn != CFN_LAST)
 	    fndecl = targetm.vectorize.builtin_vectorized_function
 	      (cfn, vectype_out, vectype_in);
-	  else if (callee)
+	  else if (callee && DECL_MD_FUNCTION_P (callee))
 	    fndecl = targetm.vectorize.builtin_md_vectorized_function
 	      (callee, vectype_out, vectype_in);
 	}
diff --git a/gcc/tree.h b/gcc/tree.h
index b910c5cb475..8cce89e5cf3 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3905,6 +3905,18 @@ DECL_MD_FUNCTION_CODE (const_tree decl)
   return fndecl.function_code;
 }
 
+/* Return true if decl is a FUNCTION_DECL with built-in class BUILT_IN_MD.
+   Otherwise return false.  */
+inline bool
+DECL_MD_FUNCTION_P (const_tree decl)
+{
+  const tree_function_decl &fndecl = FUNCTION_DECL_CHECK (decl)->function_decl;
+  if (fndecl.built_in_class == BUILT_IN_MD)
+    return true;
+  else
+    return false;
+}
+
 /* Return the frontend-specific built-in function that DECL represents,
    given that it is known to be a FUNCTION_DECL with built-in class
    BUILT_IN_FRONTEND.  */
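The guard discussed in this message can be illustrated outside of GCC. The following is only a minimal sketch: function_decl_model, md_function_p, md_function_code and md_code_or_minus_one are hypothetical stand-ins, not GCC's real tree accessors. It shows why the class has to be tested before an accessor that asserts on it is called.

```cpp
#include <cassert>

// Simplified stand-ins for GCC's declaration data; NOT the real definitions.
enum built_in_class_model
{
  NOT_BUILT_IN,
  BUILT_IN_FRONTEND,
  BUILT_IN_MD,
  BUILT_IN_NORMAL
};

struct function_decl_model
{
  built_in_class_model klass;
  unsigned function_code;
};

// Analogue of an accessor that is only valid for BUILT_IN_MD declarations:
// it asserts on the class, much like the failure described in the thread.
inline unsigned
md_function_code (const function_decl_model &decl)
{
  assert (decl.klass == BUILT_IN_MD);
  return decl.function_code;
}

// Analogue of the proposed predicate: true only for BUILT_IN_MD.
inline bool
md_function_p (const function_decl_model &decl)
{
  return decl.klass == BUILT_IN_MD;
}

// Guarded use: consult the predicate first, so the asserting accessor is
// never reached for a normal (for example math) built-in.
inline long
md_code_or_minus_one (const function_decl_model &decl)
{
  return md_function_p (decl) ? (long) md_function_code (decl) : -1;
}
```

With this shape, a BUILT_IN_NORMAL declaration simply takes the fallback path instead of tripping the assertion.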
Re: [Patch v2] Enable math functions linking with static library for LTO
On 2019/8/13 10:22, luoxhu wrote:

diff --git a/gcc/testsuite/gcc.dg/pr91287.c b/gcc/testsuite/gcc.dg/pr91287.c
new file mode 100644
index 000..c816e0537aa
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr91287.c
@@ -0,0 +1,40 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2" } */

You don't use -flto here so the testcase doesn't exercise any of the patched code. Does it work when you add -flto here? That is, do scan-symbol[-not] properly use gcc-nm or the linker plugin?

-flto is needed here to check the patch's correctness; my mistake, thanks for catching it. atan2 exists in pr91287.o even without LTO, because pr91287.o contains the instruction "bl atan2". After adding -flto the case also works, as the symbol is written to pr91287.o.

PS: the other changes are updated in the attached patch.

What's more, this test case depends on this patch: https://www.sourceware.org/ml/binutils/2019-08/msg00113.html. Otherwise nm reports an error (using the plugin or a local gcc-nm is OK):

PASS: gcc.dg/pr91287.c (test for excess errors)
ERROR: gcc.dg/pr91287.c: error executing dg-final: /usr/bin/nm: pr91287.o: plugin needed to handle lto object
UNRESOLVED: gcc.dg/pr91287.c: error executing dg-final: /usr/bin/nm: pr91287.o: plugin needed to handle lto object

Xionghu
Re: [Patch v2] Enable math functions linking with static library for LTO
Hi Richard, On 2019/8/12 16:51, Richard Biener wrote: On Mon, Aug 12, 2019 at 8:50 AM luoxhu wrote: Hi Richard, Thanks for your comments, updated the v2 patch as below: 1. Define and use builtin_with_linkage_p. 2. Add comments. 3. Add a testcase. In LTO mode, if static library and dynamic library contains same function and both libraries are passed as arguments, linker will link the function in dynamic library no matter the sequence. This patch will output LTO symbol node as UNDEF if BUILT_IN_NORMAL function FNDECL is a math function, then the function in static library will be linked first if its sequence is ahead of the dynamic library. Comments below gcc/ChangeLog 2019-08-12 Xiong Hu Luo PR lto/91287 * builtins.c (builtin_with_linkage_p): New function. * builtins.h (builtin_with_linkage_p): New function. * symtab.c (write_symbol): Use builtin_with_linkage_p. * lto-streamer-out.c (symtab_node::output_to_lto_symbol_table_p): Likewise. gcc/testsuite/ChangeLog 2019-08-12 Xiong Hu Luo PR lto/91287 * gcc.dg/pr91287.c: New testcase. --- gcc/builtins.c | 89 ++ gcc/builtins.h | 2 + gcc/lto-streamer-out.c | 4 +- gcc/symtab.c | 13 - gcc/testsuite/gcc.dg/pr91287.c | 40 +++ 5 files changed, 145 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr91287.c diff --git a/gcc/builtins.c b/gcc/builtins.c index 695a9d191af..f4dea941a27 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -11244,3 +11244,92 @@ target_char_cst_p (tree t, char *p) *p = (char)tree_to_uhwi (t); return true; } + +/* Return true if DECL is a specified builtin math function. These functions + should have symbol in symbol table to provide linkage with faster version of + libraries. */ The comment should read like /* Return true if the builtin DECL is implemented in a standard library. Otherwise returns false which doesn't guarantee it is not (thus the list of handled builtins below may be incomplete). 
*/ +bool +builtin_with_linkage_p (tree decl) +{ + if (!decl) +return false; Omit this check please. + if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL) +switch (DECL_FUNCTION_CODE (decl)) +{ + CASE_FLT_FN (BUILT_IN_ACOS): + CASE_FLT_FN (BUILT_IN_ACOSH): + CASE_FLT_FN (BUILT_IN_ASIN): + CASE_FLT_FN (BUILT_IN_ASINH): + CASE_FLT_FN (BUILT_IN_ATAN): + CASE_FLT_FN (BUILT_IN_ATANH): + CASE_FLT_FN (BUILT_IN_ATAN2): + CASE_FLT_FN (BUILT_IN_CBRT): + CASE_FLT_FN (BUILT_IN_CEIL): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_CEIL): + CASE_FLT_FN (BUILT_IN_COPYSIGN): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_COPYSIGN): + CASE_FLT_FN (BUILT_IN_COS): + CASE_FLT_FN (BUILT_IN_COSH): + CASE_FLT_FN (BUILT_IN_ERF): + CASE_FLT_FN (BUILT_IN_ERFC): + CASE_FLT_FN (BUILT_IN_EXP): + CASE_FLT_FN (BUILT_IN_EXP2): + CASE_FLT_FN (BUILT_IN_EXPM1): + CASE_FLT_FN (BUILT_IN_FABS): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS): + CASE_FLT_FN (BUILT_IN_FDIM): + CASE_FLT_FN (BUILT_IN_FLOOR): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FLOOR): + CASE_FLT_FN (BUILT_IN_FMA): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA): + CASE_FLT_FN (BUILT_IN_FMAX): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMAX): + CASE_FLT_FN (BUILT_IN_FMIN): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMIN): + CASE_FLT_FN (BUILT_IN_FMOD): + CASE_FLT_FN (BUILT_IN_FREXP): + CASE_FLT_FN (BUILT_IN_HYPOT): + CASE_FLT_FN (BUILT_IN_ILOGB): + CASE_FLT_FN (BUILT_IN_LDEXP): + CASE_FLT_FN (BUILT_IN_LGAMMA): + CASE_FLT_FN (BUILT_IN_LLRINT): + CASE_FLT_FN (BUILT_IN_LLROUND): + CASE_FLT_FN (BUILT_IN_LOG): + CASE_FLT_FN (BUILT_IN_LOG10): + CASE_FLT_FN (BUILT_IN_LOG1P): + CASE_FLT_FN (BUILT_IN_LOG2): + CASE_FLT_FN (BUILT_IN_LOGB): + CASE_FLT_FN (BUILT_IN_LRINT): + CASE_FLT_FN (BUILT_IN_LROUND): + CASE_FLT_FN (BUILT_IN_MODF): + CASE_FLT_FN (BUILT_IN_NAN): + CASE_FLT_FN (BUILT_IN_NEARBYINT): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_NEARBYINT): + CASE_FLT_FN (BUILT_IN_NEXTAFTER): + CASE_FLT_FN (BUILT_IN_NEXTTOWARD): + CASE_FLT_FN (BUILT_IN_POW): + CASE_FLT_FN (BUILT_IN_REMAINDER): + CASE_FLT_FN (BUILT_IN_REMQUO): + 
CASE_FLT_FN (BUILT_IN_RINT): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_RINT): + CASE_FLT_FN (BUILT_IN_ROUND): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_ROUND): + CASE_FLT_FN (BUILT_IN_SCALBLN): + CASE_FLT_FN (BUILT_IN_SCALBN): + CASE_FLT_FN (BUILT_IN_SIN): + CASE_FLT_FN (BUILT_IN_SINH): + CASE_FLT_FN (BUILT_IN_SINCOS): + CASE_FLT_FN (BUILT_IN_SQRT): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_SQRT): + CASE_FLT_FN (BUILT_IN_TAN): + CASE_FLT_FN (BUILT_IN_TANH): + CASE_FLT_FN
[Patch v2] Enable math functions linking with static library for LTO
Hi Richard, Thanks for your comments, updated the v2 patch as below: 1. Define and use builtin_with_linkage_p. 2. Add comments. 3. Add a testcase. In LTO mode, if static library and dynamic library contains same function and both libraries are passed as arguments, linker will link the function in dynamic library no matter the sequence. This patch will output LTO symbol node as UNDEF if BUILT_IN_NORMAL function FNDECL is a math function, then the function in static library will be linked first if its sequence is ahead of the dynamic library. gcc/ChangeLog 2019-08-12 Xiong Hu Luo PR lto/91287 * builtins.c (builtin_with_linkage_p): New function. * builtins.h (builtin_with_linkage_p): New function. * symtab.c (write_symbol): Use builtin_with_linkage_p. * lto-streamer-out.c (symtab_node::output_to_lto_symbol_table_p): Likewise. gcc/testsuite/ChangeLog 2019-08-12 Xiong Hu Luo PR lto/91287 * gcc.dg/pr91287.c: New testcase. --- gcc/builtins.c | 89 ++ gcc/builtins.h | 2 + gcc/lto-streamer-out.c | 4 +- gcc/symtab.c | 13 - gcc/testsuite/gcc.dg/pr91287.c | 40 +++ 5 files changed, 145 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr91287.c diff --git a/gcc/builtins.c b/gcc/builtins.c index 695a9d191af..f4dea941a27 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -11244,3 +11244,92 @@ target_char_cst_p (tree t, char *p) *p = (char)tree_to_uhwi (t); return true; } + +/* Return true if DECL is a specified builtin math function. These functions + should have symbol in symbol table to provide linkage with faster version of + libraries. 
*/ + +bool +builtin_with_linkage_p (tree decl) +{ + if (!decl) +return false; + if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL) +switch (DECL_FUNCTION_CODE (decl)) +{ + CASE_FLT_FN (BUILT_IN_ACOS): + CASE_FLT_FN (BUILT_IN_ACOSH): + CASE_FLT_FN (BUILT_IN_ASIN): + CASE_FLT_FN (BUILT_IN_ASINH): + CASE_FLT_FN (BUILT_IN_ATAN): + CASE_FLT_FN (BUILT_IN_ATANH): + CASE_FLT_FN (BUILT_IN_ATAN2): + CASE_FLT_FN (BUILT_IN_CBRT): + CASE_FLT_FN (BUILT_IN_CEIL): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_CEIL): + CASE_FLT_FN (BUILT_IN_COPYSIGN): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_COPYSIGN): + CASE_FLT_FN (BUILT_IN_COS): + CASE_FLT_FN (BUILT_IN_COSH): + CASE_FLT_FN (BUILT_IN_ERF): + CASE_FLT_FN (BUILT_IN_ERFC): + CASE_FLT_FN (BUILT_IN_EXP): + CASE_FLT_FN (BUILT_IN_EXP2): + CASE_FLT_FN (BUILT_IN_EXPM1): + CASE_FLT_FN (BUILT_IN_FABS): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS): + CASE_FLT_FN (BUILT_IN_FDIM): + CASE_FLT_FN (BUILT_IN_FLOOR): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FLOOR): + CASE_FLT_FN (BUILT_IN_FMA): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA): + CASE_FLT_FN (BUILT_IN_FMAX): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMAX): + CASE_FLT_FN (BUILT_IN_FMIN): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMIN): + CASE_FLT_FN (BUILT_IN_FMOD): + CASE_FLT_FN (BUILT_IN_FREXP): + CASE_FLT_FN (BUILT_IN_HYPOT): + CASE_FLT_FN (BUILT_IN_ILOGB): + CASE_FLT_FN (BUILT_IN_LDEXP): + CASE_FLT_FN (BUILT_IN_LGAMMA): + CASE_FLT_FN (BUILT_IN_LLRINT): + CASE_FLT_FN (BUILT_IN_LLROUND): + CASE_FLT_FN (BUILT_IN_LOG): + CASE_FLT_FN (BUILT_IN_LOG10): + CASE_FLT_FN (BUILT_IN_LOG1P): + CASE_FLT_FN (BUILT_IN_LOG2): + CASE_FLT_FN (BUILT_IN_LOGB): + CASE_FLT_FN (BUILT_IN_LRINT): + CASE_FLT_FN (BUILT_IN_LROUND): + CASE_FLT_FN (BUILT_IN_MODF): + CASE_FLT_FN (BUILT_IN_NAN): + CASE_FLT_FN (BUILT_IN_NEARBYINT): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_NEARBYINT): + CASE_FLT_FN (BUILT_IN_NEXTAFTER): + CASE_FLT_FN (BUILT_IN_NEXTTOWARD): + CASE_FLT_FN (BUILT_IN_POW): + CASE_FLT_FN (BUILT_IN_REMAINDER): + CASE_FLT_FN (BUILT_IN_REMQUO): + CASE_FLT_FN 
(BUILT_IN_RINT): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_RINT): + CASE_FLT_FN (BUILT_IN_ROUND): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_ROUND): + CASE_FLT_FN (BUILT_IN_SCALBLN): + CASE_FLT_FN (BUILT_IN_SCALBN): + CASE_FLT_FN (BUILT_IN_SIN): + CASE_FLT_FN (BUILT_IN_SINH): + CASE_FLT_FN (BUILT_IN_SINCOS): + CASE_FLT_FN (BUILT_IN_SQRT): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_SQRT): + CASE_FLT_FN (BUILT_IN_TAN): + CASE_FLT_FN (BUILT_IN_TANH): + CASE_FLT_FN (BUILT_IN_TGAMMA): + CASE_FLT_FN (BUILT_IN_TRUNC): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_TRUNC): + return true; + default: + break; +} + return false; +} diff --git a/gcc/builtins.h b/gcc/builtins.h index 1ffb491d785..91cbd81be48 100644 --- a/gcc/builtins.h +++ b/gcc/builtins.h @@ -151,4 +151,6 @@ extern internal_fn replacement_internal_fn (gcall *); extern void warn_string_no_nul
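The link-order behaviour described in the cover text of this patch can be sketched as a toy model. This is an assumption-laden illustration of the stated outcome only, not of how the linker or the LTO plugin is implemented; library_model, providing_library and the library name libfast.a are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one library on the link line.
struct library_model
{
  std::string name;
  bool is_static;
  std::vector<std::string> defined_symbols;
};

static bool
defines (const library_model &lib, const std::string &symbol)
{
  for (const std::string &s : lib.defined_symbols)
    if (s == symbol)
      return true;
  return false;
}

// Return the name of the library that ends up providing SYMBOL, or "" if
// none does.  EMITTED_AS_UNDEF models whether the LTO symbol table lists
// the built-in as an undefined symbol.
std::string
providing_library (const std::vector<library_model> &link_line,
                   const std::string &symbol, bool emitted_as_undef)
{
  for (const library_model &lib : link_line)
    {
      if (!defines (lib, symbol))
        continue;
      // Model of the behaviour before the patch: with no UNDEF entry the
      // static library is passed over, whatever its position.
      if (lib.is_static && !emitted_as_undef)
        continue;
      return lib.name;
    }
  return "";
}
```

In this model, a static library listed ahead of the dynamic one wins only once the symbol is visible as UNDEF, which is the effect the patch description aims for.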
[PATCH v3] Missed function specialization + partial devirtualization
This patch aims to fix PR69678, which is caused by performance issues in PGO indirect call profiling. The bug that profiling data never worked was fixed by Martin's pull back of the topN patches, which gave a GEOMEAN improvement of about 1%. Still, the default profile currently generates only a SINGLE indirect target, one that is called more than 75% of the time. This patch makes use of MULTIPLE indirect targets in the LTO-WPA and LTO-LTRANS stages; as a result, function specialization, profiling, partial devirtualization, inlining and cloning can be done successfully based on them. Performance improves from 0.70 sec to 0.38 sec on simple tests.

Details are:
1. PGO with topn is enabled by default now, but only one indirect target edge is generated in the ipa-profile pass, so add variables to enable multiple speculative edges through the passes: speculative_id records the index of the direct edge bound to the indirect edge, and num_of_ics records how many direct edges are owned by the indirect edge. Postpone gimple_ic to ipa-profile, as in the default case, because the inline pass decides whether it is beneficial to transform the indirect call.
2. Enable multiple indirect call target analysis in the LTO WPA/LTRANS stages for full profile support in the ipa passes and cgraph_edge functions. speculative_id can be set through make_speculative when multiple targets are bound to one indirect edge, and it is cloned if a new edge is cloned. speculative_id is streamed out and streamed in by LTO like lto_stmt_uid.
3. Add 1 in-module testcase and 2 cross-module testcases.
4. Bootstrap and regression tests passed on Power8-LE.

v3 Changes:
1. Rebase to trunk.
2. Use speculative_id to track and search for the reference node matched with the direct edge's callee for multiple targets. This eliminates the earlier strstr workaround. Actually, it is the caller's responsibility to handle the direct edges mapped to the same indirect edge. speculative_call_info will still return one of the direct edges specified, which mostly leverages the current IPA edge processing framework.
gcc/ChangeLog

2019-07-31  Xiong Hu Luo

	PR ipa/69678
	* cgraph.c (symbol_table::create_edge): Init speculative_id.
	(cgraph_edge::make_speculative): Add param for setting speculative_id.
	(cgraph_edge::speculative_call_info): Find reference by speculative_id
	for multiple indirect targets.
	(cgraph_edge::resolve_speculation): Decrease the speculations for
	indirect edge, drop its speculative flag if no direct target is left.
	(cgraph_edge::redirect_call_stmt_to_callee): Likewise.
	(cgraph_node::verify_node): Don't report error if speculative edge
	does not include statement.
	* cgraph.h (struct indirect_target_info): New struct.
	(indirect_call_targets): New vector variable.
	(num_of_ics): New variable.
	(make_speculative): Add param for setting speculative_id.
	(speculative_id): New variable.
	* cgraphclones.c (cgraph_node::create_clone): Clone speculative_id.
	* ipa-inline.c (inline_small_functions): Add iterator update.
	* ipa-profile.c (ipa_profile_generate_summary): Add indirect multiple
	targets logic.
	(ipa_profile): Likewise.
	* ipa-ref.h (speculative_id): New variable.
	* ipa.c (process_references): Fix typo.
	* lto-cgraph.c (lto_output_edge): Add indirect multiple targets
	logic.  Stream out speculative_id.
	(input_edge): Likewise.
	* predict.c (dump_prediction): Remove edges count assert to be
	precise.
	* symtab.c (symtab_node::create_reference): Init speculative_id.
	(symtab_node::clone_references): Clone speculative_id.
	(symtab_node::clone_referring): Clone speculative_id.
	(symtab_node::clone_reference): Clone speculative_id.
	(symtab_node::clear_stmts_in_references): Clear speculative_id.
	* tree-inline.c (copy_bb): Duplicate all the speculative edges if
	indirect call contains multiple speculative targets.
	* tree-profile.c (gimple_gen_ic_profiler): Use the new variable
	__gcov_indirect_call.counters and __gcov_indirect_call.callee.
	(gimple_gen_ic_func_profiler): Likewise.
	(pass_ipa_tree_profile::gate): Fix comment typos.
	* value-prof.c (gimple_ic_transform): Handle topn case.
	Fix comment typos.

gcc/testsuite/ChangeLog

2019-07-31  Xiong Hu Luo

	PR ipa/69678
	* gcc.dg/tree-prof/indir-call-prof-topn.c: New testcase.
	* gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: New testcase.
	* gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c: New testcase.
	* gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: New testcase.
---
 gcc/cgraph.c | 70 +-
 gcc/cgraph.h | 28 ++-
 gcc/cgraphclones.c| 1 +
 gcc/ipa-inline.c | 15 +-
 gcc/ipa-profile.c
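As a rough illustration of what speculating on multiple indirect call targets amounts to, here is a hand-written source-level equivalent. It is only a sketch of the idea, not what the compiler emits; the names one and two are borrowed from the related discussion of this patch, while three and call_speculated are hypothetical.

```cpp
#include <cassert>

typedef int (*callee_t) (int);

int one (int x) { return x + 1; }
int two (int x) { return x + 2; }
int three (int x) { return x + 3; }

// Source-level picture of an indirect call p (arg) after speculation on two
// profiled targets: each guarded branch is a direct call that later passes
// could inline or specialize, and the final call keeps the indirect path
// for any target the profile did not cover.
int
call_speculated (callee_t p, int arg)
{
  if (p == one)
    return one (arg);   /* first speculative direct edge */
  if (p == two)
    return two (arg);   /* second speculative direct edge */
  return p (arg);       /* remaining indirect edge */
}
```

With a single speculative target only the first branch would exist, so calls through two would always take the indirect fallback.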
Re: [PATCH v4] Generalize get_most_common_single_value to return k_th value & count
Hi Martin,

On 2019/7/17 15:55, Martin Liška wrote:
On 7/17/19 7:44 AM, luoxhu wrote:

Hi Martin,
Thanks for your review, v4 Changes as below:
1. Use decrease bubble sort.
BTW, I have a question about hist->hvalue.counters[2], when will it become -1, please? Thanks. Currently, if it is -1, the function will return false.

Hi.
Thanks for that. I made some minor changes to your patch, please see the attachment. -1 is the value that we use for an invalidated histogram. That happens when you need to fit more values during instrumentation than you have counters in the histogram. It helps to make builds of software reproducible.

Thanks for your patience with many tiny fixes. I will install the updated patch to trunk.

Xionghu

Martin
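The role of the -1 marker can be sketched with a small stand-alone lookup. This is a simplified model under stated assumptions: the counters are taken to be laid out as [total, value1, count1, value2, count2, ...] and already sorted by decreasing count, and nth_most_common and gcov_type_model are hypothetical names, not the GCC function or type.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef int64_t gcov_type_model;

// Look up the N-th most common entry of a histogram stored as
// [total, value1, count1, value2, count2, ...].  Returns false when the
// histogram is marked invalid (counters[2] == -1) or N is out of range.
bool
nth_most_common (const std::vector<gcov_type_model> &counters, unsigned n,
                 gcov_type_model *value, gcov_type_model *count,
                 gcov_type_model *all)
{
  if (counters.size () < 3 || counters[2] == -1)
    return false;
  if (2 * (std::size_t) n + 2 >= counters.size ())
    return false;

  *all = counters[0];
  *value = counters[2 * n + 1];
  *count = counters[2 * n + 2];
  return true;
}
```

A caller would ask for n = 0 to get the most common value and larger n for the following ones, and would treat a false return as "no usable profile".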
Re: [PATCH v4] Generalize get_most_common_single_value to return k_th value & count
Currently get_most_common_single_value can only return the maximum histogram entry. Add a sort after reading from disk, so that it can return the nth value in later use. Rename it to get_nth_most_common_value.

Hi Martin,
Thanks for your review, v4 Changes as below:
1. Use decrease bubble sort.
BTW, I have a question about hist->hvalue.counters[2], when will it become -1, please? Thanks. Currently, if it is -1, the function will return false.

gcc/ChangeLog:

2019-07-15  Xiong Hu Luo

	* ipa-profile.c (get_most_common_single_value): Use
	get_nth_most_common_value.
	* profile.c (sort_hist_value): New function.
	(compute_value_histograms): Call sort_hist_value to sort the
	values after loading from disk.
	* value-prof.c (get_most_common_single_value): Rename to ...
	get_nth_most_common_value.  Add input params n, return
	the n_th value and count.
	(gimple_divmod_fixed_value_transform): Use
	get_nth_most_common_value.
	(gimple_ic_transform): Likewise.
	(gimple_stringops_transform): Likewise.
	* value-prof.h (get_most_common_single_value): Add input params
	n, default to 0.
---
 gcc/ipa-profile.c | 4 ++--
 gcc/profile.c | 44 +++
 gcc/value-prof.c | 53 ---
 gcc/value-prof.h | 9
 4 files changed, 73 insertions(+), 37 deletions(-)

diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
index 1fb939b73d0..970dba39c80 100644
--- a/gcc/ipa-profile.c
+++ b/gcc/ipa-profile.c
@@ -192,8 +192,8 @@ ipa_profile_generate_summary (void)
 		  if (h)
 		    {
 		      gcov_type val, count, all;
-		      if (get_most_common_single_value (NULL, "indirect call",
-							h, &val, &count, &all))
+		      if (get_nth_most_common_value (NULL, "indirect call", h,
+						     &val, &count, &all))
 			{
 			  struct cgraph_edge * e = node->get_edge (stmt);
 			  if (e && !e->indirect_unknown_callee)
diff --git a/gcc/profile.c b/gcc/profile.c
index 441cb8eb183..ae21b1192a0 100644
--- a/gcc/profile.c
+++ b/gcc/profile.c
@@ -743,6 +743,44 @@ compute_branch_probabilities (unsigned cfg_checksum, unsigned lineno_checksum)
   free_aux_for_blocks ();
 }
 
+/* Sort the histogram value and count for TOPN and INDIR_CALL type.  */
+
+static bool
+sort_hist_value (histogram_value hist)
+{
+
+  if (hist->hvalue.counters[2] == -1)
+    return false;
+
+  gcc_assert (hist->type == HIST_TYPE_TOPN_VALUES
+	      || hist->type == HIST_TYPE_INDIR_CALL);
+
+  gcc_assert (hist->n_counters == GCOV_TOPN_VALUES_COUNTERS);
+
+  unsigned i, j;
+  bool swapped = true;
+  /* Hist value is organized as:
+     [counter0 value1 counter1 value2 counter2 value3 counter3 value4 counter4]
+     Use decreasing bubble sort to rearrange it.  The sort starts from <value1,
+     counter1> and compares counter first, If counter is same, compares the
+     value, exchange it if small to keep stable.  */
+  for (i = 0; i < GCOV_TOPN_VALUES - 1 && swapped; i++)
+    {
+      swapped = false;
+      for (j = 0; j < GCOV_TOPN_VALUES - 1 - i; j++)
+	{
+	  gcov_type *p = &hist->hvalue.counters[2 * j + 1];
+	  if (p[1] < p[3] || (p[1] == p[3] && p[0] < p[2]))
+	    {
+	      std::swap (p[0], p[2]);
+	      std::swap (p[1], p[3]);
+	      swapped = true;
+	    }
+	}
+    }
+
+  return true;
+}
 /* Load value histograms values whose description is stored in VALUES array
    from .gcda file.
@@ -808,6 +846,12 @@ compute_value_histograms (histogram_values values, unsigned cfg_checksum,
 	    else
 	      hist->hvalue.counters[j] = 0;
 
+      if (hist->type == HIST_TYPE_TOPN_VALUES
+	  || hist->type == HIST_TYPE_INDIR_CALL)
+	{
+	  sort_hist_value (hist);
+	}
+
       /* Time profiler counter is not related to any statement,
 	 so that we have to read the counter and set the value to
 	 the corresponding call graph node.  */
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index 32e6ddd8165..759458868a8 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -713,45 +713,38 @@ gimple_divmod_fixed_value (gassign *stmt, tree value, profile_probability prob,
   return tmp2;
 }
 
-/* Return most common value of TOPN_VALUE histogram.  If
-   there's a unique value, return true and set VALUE and COUNT
+/* Return the n-th value count of TOPN_VALUE histogram.  If
+   there's a value, return true and set VALUE and COUNT
    arguments.  */
 bool
-get_most_common_single_value (gimple *stmt, const char *counter_type,
-			      histogram_value hist,
-			      gcov_type *value, gcov_type *count,
-			      gcov_type *all)
+get_nth_most_common_value (gimple *stmt, const char *counter_type,
+
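The in-place decreasing bubble sort from this patch can be tried outside of GCC with a small stand-alone sketch. It assumes the same flat layout, [total, value1, count1, value2, count2, ...], but works on a std::vector of any odd length; sort_topn_decreasing is a hypothetical name and the fixed GCOV_TOPN_VALUES bound is replaced by the vector size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Sort the (value, count) pairs that follow the leading total counter so
// that counts decrease; pairs with equal counts are ordered by decreasing
// value.  The pairs are swapped in place inside the flat array.
void
sort_topn_decreasing (std::vector<int64_t> &counters)
{
  assert (counters.size () % 2 == 1);
  const std::size_t pairs = (counters.size () - 1) / 2;

  bool swapped = true;
  for (std::size_t i = 0; i + 1 < pairs && swapped; i++)
    {
      swapped = false;
      for (std::size_t j = 0; j + 1 + i < pairs; j++)
        {
          // p[0], p[1] are pair j; p[2], p[3] are pair j + 1.
          int64_t *p = &counters[2 * j + 1];
          if (p[1] < p[3] || (p[1] == p[3] && p[0] < p[2]))
            {
              std::swap (p[0], p[2]);
              std::swap (p[1], p[3]);
              swapped = true;
            }
        }
    }
}
```

The early exit through swapped keeps the cost low for input that is already sorted, which matters little for four pairs but mirrors the structure of the patch.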
Re: [PATCH v3] Generalize get_most_common_single_value to return k_th value & count
Currently get_most_common_single_value can only return the maximum histogram entry. Add qsort to enable this function to return the nth value. Rename it to get_nth_most_common_value.

v3 Changes:
1. Move sort to profile.c after loading values from disk. Simplify get_nth_most_common_value.
2. Make qsort stable with value check if count is same.
3. Other comments from v2.

gcc/ChangeLog:

2019-07-15  Xiong Hu Luo

	* ipa-profile.c (get_most_common_single_value): Use
	get_nth_most_common_value.
	* profile.c (struct value_count_t): New struct.
	(cmp_counts): New function.
	(sort_hist_value): New function.
	(compute_value_histograms): Call sort_hist_value to sort the
	values after loading from disk.
	* value-prof.c (get_most_common_single_value): Rename to ...
	get_nth_most_common_value.  Add input params n, return
	the n_th value and count.
	(gimple_divmod_fixed_value_transform): Use
	get_nth_most_common_value.
	(gimple_ic_transform): Likewise.
	(gimple_stringops_transform): Likewise.
	* value-prof.h (get_most_common_single_value): Add input params
	n, default to 0.
---
 gcc/ipa-profile.c | 4 +--
 gcc/profile.c | 74 +++
 gcc/value-prof.c | 53 +++--
 gcc/value-prof.h | 9 +++---
 4 files changed, 103 insertions(+), 37 deletions(-)

diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
index 1fb939b73d0..970dba39c80 100644
--- a/gcc/ipa-profile.c
+++ b/gcc/ipa-profile.c
@@ -192,8 +192,8 @@ ipa_profile_generate_summary (void)
 		  if (h)
 		    {
 		      gcov_type val, count, all;
-		      if (get_most_common_single_value (NULL, "indirect call",
-							h, &val, &count, &all))
+		      if (get_nth_most_common_value (NULL, "indirect call", h,
+						     &val, &count, &all))
 			{
 			  struct cgraph_edge * e = node->get_edge (stmt);
 			  if (e && !e->indirect_unknown_callee)
diff --git a/gcc/profile.c b/gcc/profile.c
index 441cb8eb183..54780b44859 100644
--- a/gcc/profile.c
+++ b/gcc/profile.c
@@ -743,6 +743,74 @@ compute_branch_probabilities (unsigned cfg_checksum, unsigned lineno_checksum)
   free_aux_for_blocks ();
 }
 
+struct value_count_t
+{
+  gcov_type value;
+  gcov_type count;
+};
+
+static int
+cmp_counts (const void *v1, const void *v2)
+{
+  const value_count_t *h1 = (const value_count_t *) v1;
+  const value_count_t *h2 = (const value_count_t *) v2;
+  if (h1->count < h2->count)
+    return 1;
+  if (h1->count > h2->count)
+    return -1;
+  if (h1->count == h2->count)
+    {
+      if (h1->value < h2->value)
+	return 1;
+      if (h1->value > h2->value)
+	return -1;
+    }
+  /* There may be two entries with same count as well as value very unlikely
+     in a multi-threaded instrumentation.  But the memory layout of the {value,
+     count} tuple can be different.  The function will return K-th most
+     common value.  */
+  return 0;
+}
+
+/* Sort the histogram value and count for TOPN and INDIR_CALL type.  */
+
+static bool
+sort_hist_value (histogram_value hist)
+{
+  auto_vec<struct value_count_t> value_vec;
+  struct value_count_t temp;
+  unsigned i;
+
+  if (hist->hvalue.counters[2] == -1)
+    return false;
+
+  gcc_assert (hist->type == HIST_TYPE_TOPN_VALUES
+	      || hist->type == HIST_TYPE_INDIR_CALL);
+
+  gcc_assert (hist->n_counters == GCOV_TOPN_VALUES_COUNTERS);
+
+  for (i = 0; i < GCOV_TOPN_VALUES; i++)
+    {
+      gcov_type v = hist->hvalue.counters[2 * i + 1];
+      gcov_type c = hist->hvalue.counters[2 * i + 2];
+
+      temp.value = v;
+      temp.count = c;
+
+      value_vec.safe_push (temp);
+    }
+
+  value_vec.qsort (cmp_counts);
+
+  gcc_assert (value_vec.length () == GCOV_TOPN_VALUES);
+
+  for (i = 0; i < GCOV_TOPN_VALUES; i++)
+    {
+      hist->hvalue.counters[2 * i + 1] = value_vec[i].value;
+      hist->hvalue.counters[2 * i + 2] = value_vec[i].count;
+    }
+  return true;
+}
 /* Load value histograms values whose description is stored in VALUES array
    from .gcda file.
@@ -808,6 +876,12 @@ compute_value_histograms (histogram_values values, unsigned cfg_checksum,
 	    else
 	      hist->hvalue.counters[j] = 0;
 
+      if (hist->type == HIST_TYPE_TOPN_VALUES
+	  || hist->type == HIST_TYPE_INDIR_CALL)
+	{
+	  sort_hist_value (hist);
+	}
+
       /* Time profiler counter is not related to any statement,
 	 so that we have to read the counter and set the value to
 	 the corresponding call graph node.  */
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index 32e6ddd8165..97e4ae18ba3 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -713,45 +713,38 @@ gimple_divmod_fixed_value (gassign *stmt, tree value, profile_probability prob,
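The ordering that the comparator in this version establishes, decreasing count with ties broken by decreasing value, can be reproduced with the standard library. This is a sketch of the ordering only: it swaps GCC's auto_vec and qsort for std::vector and std::sort, and value_count, more_common and sort_by_count are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct value_count
{
  int64_t value;
  int64_t count;
};

// Strict weak ordering: higher count first, then higher value first, so
// equal-count entries always end up in the same relative order.
bool
more_common (const value_count &a, const value_count &b)
{
  if (a.count != b.count)
    return a.count > b.count;
  return a.value > b.value;
}

void
sort_by_count (std::vector<value_count> &entries)
{
  std::sort (entries.begin (), entries.end (), more_common);
}
```

Breaking ties on the value makes the result independent of the input order, which is the property the "stable with value check" item in the change list is after.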
Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
On 2019/6/24 10:34, luoxhu wrote: Hi Honza, Thanks very much to get so many useful comments from you. As a newbie to GCC, not sure whether my questions are described clearly enough. Thanks for your patience in advance. :) On 2019/6/20 21:47, Jan Hubicka wrote: Hi, some comments on the ipa part of the patch (and thanks for working on it - this was on my TODO list for years) diff --git a/gcc/cgraph.c b/gcc/cgraph.c index de82316d4b1..0d373a67d1b 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl) fprintf (dump_file, "Introduced new external node " "(%s) and turned into root of the clone tree.\n", node->dump_name ()); + node->profile_id = first_clone->profile_id; } else if (dump_file) fprintf (dump_file, "Introduced new external node " This is independent of the rest of changes. Do you have example where this matters? The inline clones are created in ipa-inline while ipa-profile is run before it, so I can not think of such a scenario. I see you also copy profile_id from function to clone. I would like to know why you needed that. Also you mention that you hit some ICEs. If fixes are independent of rest of your changes, send them separately. I copy the profile_id for cloned node as when in LTO ltrans, there is no references or referrings info for the specialized node/cloned node, so it is difficult to track the node's reference in cgraph_edge::speculative_call_info. I use it mainly for debug purpose now. Will remove it and split the patches in later version to include ICE fixes. 
@@ -1110,6 +,7 @@ cgraph_edge::speculative_call_info (cgraph_edge *, int i; cgraph_edge *e2; cgraph_edge *e = this; + cgraph_node *referred_node; if (!e->indirect_unknown_callee) for (e2 = e->caller->indirect_calls; @@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge *, && ((ref->stmt && ref->stmt == e->call_stmt) || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid))) { - reference = ref; - break; + if (e2->indirect_info && e2->indirect_info->num_of_ics) + { + referred_node = dyn_cast (ref->referred); + if (strstr (e->callee->name (), referred_node->name ())) + { + reference = ref; + break; + } + } + else + { + reference = ref; + break; + } } This function is intended to return everything related to the speculative call, so if you add multiple direct targets, i would expect it to tage auto_vec of cgraph_nodes for direct and auto_vec of references. So will the signature becomes cgraph_edge::speculative_call_info (auto_vec *direct, cgraph_edge *, auto_vec *reference) Seems a lot of code related to it, maybe should split to another patch. And will the sequence of direct and reference in each auto_vec be strictly mapped for iteration convenience? Second question is "this" is a direct edge will be pushed to auto_vec "direct", how can it get its next direct edge here? From e->caller->callees? There maybe some misunderstanding here. The direct should be one edge only, but reference could be multiple. For example: two indirect edge on one single statement x = p(3); the first speculative edge is main -> one; the second speculative edge 2 is main -> two. direct->call_stmt is: x_10 = p_3 (3); call code in ipa-inline-transform.c: for (e = node->callees; e; e = next) { next = e->next_callee; e->redirect_call_stmt_to_callee (); } redirect_call_stmt_to_callee will call e->speculative_call_info(e, e2, ref). When e is “main -> one" being redirected, The returned auto_vec reference length will be 2. So the map should be 1:N instead of N:N. 
(One direct edge will find N reference nodes, but only one of them is correct, so we need to iterate to find it.) e2, the indirect call (from e->caller->indirect_calls), can only have its speculative flag cleared once all indirect targets have been redirected via "next=e->next_callee". Otherwise, the next speculative edge could not finish its redirect, because e2 would no longer be speculative in the next iteration. As a result, we may still need similar logic that checks the returned reference length and only sets "e2->speculative = false;" when the length is 1, which means all direct targets have been redirected. /* Speculative edge always consist of all three components - direct edge, @@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl) in the functions inlined through it. */ } edge->count += e2->count; - edge->speculative = false; + if (edge->indirect_info && edge->indirect_in
Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
Hi Honza, Thanks very much to get so many useful comments from you. As a newbie to GCC, not sure whether my questions are described clearly enough. Thanks for your patience in advance. :) On 2019/6/20 21:47, Jan Hubicka wrote: Hi, some comments on the ipa part of the patch (and thanks for working on it - this was on my TODO list for years) diff --git a/gcc/cgraph.c b/gcc/cgraph.c index de82316d4b1..0d373a67d1b 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl) fprintf (dump_file, "Introduced new external node " "(%s) and turned into root of the clone tree.\n", node->dump_name ()); + node->profile_id = first_clone->profile_id; } else if (dump_file) fprintf (dump_file, "Introduced new external node " This is independent of the rest of changes. Do you have example where this matters? The inline clones are created in ipa-inline while ipa-profile is run before it, so I can not think of such a scenario. I see you also copy profile_id from function to clone. I would like to know why you needed that. Also you mention that you hit some ICEs. If fixes are independent of rest of your changes, send them separately. I copy the profile_id for cloned node as when in LTO ltrans, there is no references or referrings info for the specialized node/cloned node, so it is difficult to track the node's reference in cgraph_edge::speculative_call_info. I use it mainly for debug purpose now. Will remove it and split the patches in later version to include ICE fixes. 
@@ -1110,6 +,7 @@ cgraph_edge::speculative_call_info (cgraph_edge *, int i; cgraph_edge *e2; cgraph_edge *e = this; + cgraph_node *referred_node; if (!e->indirect_unknown_callee) for (e2 = e->caller->indirect_calls; @@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge *, && ((ref->stmt && ref->stmt == e->call_stmt) || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid))) { - reference = ref; - break; + if (e2->indirect_info && e2->indirect_info->num_of_ics) + { + referred_node = dyn_cast (ref->referred); + if (strstr (e->callee->name (), referred_node->name ())) + { + reference = ref; + break; + } + } + else + { + reference = ref; + break; + } } This function is intended to return everything related to the speculative call, so if you add multiple direct targets, i would expect it to tage auto_vec of cgraph_nodes for direct and auto_vec of references. So will the signature becomes cgraph_edge::speculative_call_info (auto_vec *direct, cgraph_edge *, auto_vec *reference) Seems a lot of code related to it, maybe should split to another patch. And will the sequence of direct and reference in each auto_vec be strictly mapped for iteration convenience? Second question is "this" is a direct edge will be pushed to auto_vec "direct", how can it get its next direct edge here? From e->caller->callees? /* Speculative edge always consist of all three components - direct edge, @@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl) in the functions inlined through it. 
*/ } edge->count += e2->count; - edge->speculative = false; + if (edge->indirect_info && edge->indirect_info->num_of_ics) +{ + edge->indirect_info->num_of_ics--; + if (edge->indirect_info->num_of_ics == 0) + edge->speculative = false; +} + else +edge->speculative = false; e2->speculative = false; ref->remove_reference (); if (e2->indirect_unknown_callee || e2->inline_failed) This function should turn speculative call into direct call to DECL, so I think it should remove all the other direct calls associated with stmt and the indirect one. There are now two cases - in first case you want to turn speculative call into direct call or give up on especulation completely, while in other case you want to only remove one of speculations. I guess we want to have resolve_speculation(decl) for first and remove_one_speculation(edge) for the second case? The second case would be useful for the code below handling type mismatches and also for inline when one of speculative targets seems not useful to bother with. So the logic will be: if (edge->indirect_info->num_of_ics > 1) cgraph_edge::resolve_speculation (tree callee_decl); else remove_one_speculation(edge); cgraph_edge::resolve_speculation will call edge->speculative_call_info (e2, edge, ref) internally, at this time, e2 and ref will only contains one direct target? @@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void) e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt,
Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
Hi Martin,

On 2019/6/20 09:59, luoxhu wrote:
On 2019/6/19 20:18, Martin Liška wrote:
On 6/19/19 10:56 AM, Martin Liška wrote:
Thank you very much for the numbers. Today, I'm going to prepare the generalization of single-value counter to track N values.

Ok, here's a patch candidate that does tracking of most common N values. For your test-case I can see:

pr69678.gcda: 01a9: 18:COUNTERS indirect_call 9 counts
pr69678.gcda: 0: 35000 1868707024 17500 969338501 17500 0 0 0
pr69678.gcda: 8: 0

So for now, you'll need to generalize get_most_common_single_value to return N most common values. Eventually we'll need to rename the counter as it won't be tracking just a single value any longer. I can take care of it. Can you please verify that the patch candidate works for you?

Thanks, the profile data seems good, I will try it. I need to rebase my patch to trunk first, as there are many conflicts with your previous patch.

The patch works perfectly for me; lots of duplicate code can be removed based on it. Hope you can upstream it soon. :) BTW, I don't need to call the get_most_common_single_value function to access the histogram values and counters; I will loop over them directly, one by one.

Thanks
Xionghu

Thanks,
Martin
Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
On 2019/6/19 20:18, Martin Liška wrote:
On 6/19/19 10:56 AM, Martin Liška wrote:
Thank you very much for the numbers. Today, I'm going to prepare the generalization of single-value counter to track N values.

Ok, here's a patch candidate that does tracking of most common N values. For your test-case I can see:

pr69678.gcda: 01a9: 18:COUNTERS indirect_call 9 counts
pr69678.gcda: 0: 35000 1868707024 17500 969338501 17500 0 0 0
pr69678.gcda: 8: 0

So for now, you'll need to generalize get_most_common_single_value to return N most common values. Eventually we'll need to rename the counter as it won't be tracking just a single value any longer. I can take care of it. Can you please verify that the patch candidate works for you?

Thanks, the profile data seems good, I will try it. I need to rebase my patch to trunk first, as there are many conflicts with your previous patch.

Thanks,
Martin
Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
Hi Martin,

On 2019/6/18 18:21, Martin Liška wrote:
On 6/18/19 3:45 AM, Xiong Hu Luo wrote:
6.2. SPEC2017 peakrate: 523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r (+13.33%); 525.x264_r (-5.29%).

Can you please elaborate what are the key indirect call promotions that are needed to achieve such a significant speed up? Are we talking about calls to virtual functions or C-style indirect calls?

For benchmark 511.povray_r, without my patch no speculation or indirect call promotion happens, as shown in povray_r.wpa.069i.profile_estimate:

171 indirect calls trained.
0 (0.00%) have common target.
0 (0.00%) targets was not found.
0 (0.00%) targets had parameter count mismatch.
0 (0.00%) targets was not in polymorphic call target list.
0 (0.00%) speculations seems useless.
0 (0.00%) speculations produced.

After applying my patch:

171 indirect calls trained.
60 (35.09%) have common target.
41 (23.98%) targets was not found.
0 (0.00%) targets had parameter count mismatch.
0 (0.00%) targets was not in polymorphic call target list.
57 (33.33%) speculations seems useless.
5 (2.92%) speculations produced.

The indirect call conversions below then take effect. Because all of these calls are to hot functions, performance improves a lot in combination with the later ipa/inline/clone optimizations.

ls *.*i.* | xargs grep "Expanding speculative call"
povray_r.ltrans5.076i.inline:Expanding speculative call of create_ray.constprop/75445 -> Inside_CSG_Intersection/76219 count: 291083 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of create_ray.constprop/75445 -> Inside_Plane/76221 count: 387811 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of initialize_ray_container_state_tree/54575 -> Inside_CSG_Intersection/75997 count: 3784081 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of initialize_ray_container_state_tree/54575 -> Inside_Plane/76062 count: 5041557 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> All_CSG_Intersect_Intersections/76183 count: 8983544 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> All_Sphere_Intersections/76184 count: 31488162 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> Inside_Plane/76197 count: 19044626 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of All_CSG_Intersect_Intersections/9843 -> All_Sphere_Intersections/76011 count: 22068935 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of All_CSG_Intersect_Intersections/9843 -> Inside_Plane/76031 count: 13347702 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of block_light_source/26304 -> All_CSG_Intersect_Intersections/76130 count: 5434215 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of block_light_source/26304 -> All_Sphere_Intersections/76139 count: 19047432 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of block_light_source/26304 -> Inside_Plane/76134 count: 11520241 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of Inside_CSG_Union/9845 -> Inside_Plane/76081 count: 830538 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of All_CSG_Union_Intersections/9842 -> All_Plane_Intersections/76049 count: 1636158 (adjusted)

Thanks,
Martin
Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
Hi Martin, On 2019/6/18 17:34, Martin Liška wrote: On 6/18/19 11:02 AM, luoxhu wrote: Hi, On 2019/6/18 13:51, Martin Liška wrote: On 6/18/19 3:45 AM, Xiong Hu Luo wrote: Hello. Thank you for the interest in the area. This patch aims to fix PR69678 caused by PGO indirect call profiling bugs. Currently the default instrument function can only find the indirect function that called more than 50% with an incorrect count number returned. Can you please explain what you mean by 'an incorrect count number returned'? For a test case indir-call-topn.c, it include 2 indirect calls "one" and "two". the profiling data is as below with trunk code (including your patch, count[0] and count[2] is switched by your code, the count[0] is used in ipa-profile but only support the top1 format, my patch adds the support for the topn format. count[0] was incorrect as WITHOUT your patch it is 0, things getting better with your fix as the count[0] is 35000, but still not correct, in fact, "one" is running 17500 times, and "two" is running the other 17500 times): indir-call-topn.gcda: 22: 01a9: 18:COUNTERS indirect_call 9 counts indir-call-topn.gcda: 24: 0: *35000 1868707024 0* 0 0 0 0 0 Running with the "--param indir-call-topn-profile=1" will give below profile data, My patch is based on this profile result and do the optimization for multiple indirect targets, performance can get much improve on this testcase and SPEC2017 for some benchmarks(LLVM already support this several years ago...). indir-call-topn.gcda: 26: 01b1: 18:COUNTERS indirect_call_topn 9 counts indir-call-topn.gcda: 28: 0: *0 969338501 17500 1868707024 17500* 0 0 0 test case indir-call-topn.c: #include typedef int (*fptr) (int); int one (int a) { return 1; } int two (int a) { return 0; } fptr table[] = {, }; int main() { int i, x; fptr p = one (3); for (i = 0; i < 35000; i++) { x = (*p) (3); p = table[x]; } printf ("done:%d\n", x); } I've got it. So it's situation where you have distribution equal to 50% and 50%. 
Note that it's the only valid situation where both edges with be >= 50%. That's the threshold for which we speculatively devirtualize edges. That said, you don't need generic topn counter, but a probably only a top2 counter which can be generalized from single-value counter type. I'm saying that because I removed the TOPN, mainly due to: https://github.com/gcc-mirror/gcc/commit/5cb221f2b9c268df47c97b4837230b15e65f9c14#diff-d003c64ae14449d86df03508de98bde7L179 which is over-complicated profiling function. And the changes that I've done recently are motivated to preserve a stable builds. That's achieved by noticing that a single-value counter can't handle all seen values. Actually, the algorithm of function __gcov_one_value_profiler_body in libgcc/libgcov-profiler.c has functionality issue when profiling the testcase I provide. 118 __gcov_one_value_profiler_body (gcov_type *counters, gcov_type value, 119 int use_atomic) 120 { 121 if (value == counters[1]) 122 counters[2]++; 123 else if (counters[2] == 0) 124 { 125 counters[2] = 1; 126 counters[1] = value; 127 } 128 else 129 counters[2]--; 130 131 if (use_atomic) 132 __atomic_fetch_add ([0], 1, __ATOMIC_RELAXED); 133 else 134 counters[0]++; 135 } function "one" is 1868707024, function "two" is 969338501. Loop running from 0->(35000-1): value counters[0]counters[1] counters[2] 18687070241 1868707024 1 9693385012 1868707024 0 18687070243 1868707024 1 9693385014 1868707024 0 18687070245 1868707024 1 ... 969338501 350001868707024 0 Finally, counters[] return value is [35000, 1868707024, 0]. In ipa-profile.c and value-prof.c, counters[0] is the statement that executed all, counters[2] is the indirect call that counters[1] executed which is 0 here. This counters[2] shouldn't be 0 in fact, which means prob is 0(It was expected to be 50%, right?). This prob will cause ipa-profile fail to create speculative edge and do indirect call later. 
I think this is the reason why topn was introduced by Rong Xu in 2014 (8ceaa1e) and reimplemented that in LLVM later. There was definitely a bug here before re-enable topn. dump-profile: indir-call-topn.fb.gcc.wpa.069i.profile_estimate 1 Histogram:5 2 35001: time:2 (8.70) size:2 (8.00) 3 35000: time:19 (91.30) size:7 (36.00) 4 17500: time:4 (100.00) size:2 (44.00) 5 1: time:0 (100.00) size:0 (44.0
Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
Hi, On 2019/6/18 13:51, Martin Liška wrote: On 6/18/19 3:45 AM, Xiong Hu Luo wrote: Hello. Thank you for the interest in the area. This patch aims to fix PR69678 caused by PGO indirect call profiling bugs. Currently the default instrument function can only find the indirect function that called more than 50% with an incorrect count number returned. Can you please explain what you mean by 'an incorrect count number returned'? For a test case indir-call-topn.c, it include 2 indirect calls "one" and "two". the profiling data is as below with trunk code (including your patch, count[0] and count[2] is switched by your code, the count[0] is used in ipa-profile but only support the top1 format, my patch adds the support for the topn format. count[0] was incorrect as WITHOUT your patch it is 0, things getting better with your fix as the count[0] is 35000, but still not correct, in fact, "one" is running 17500 times, and "two" is running the other 17500 times): indir-call-topn.gcda: 22: 01a9: 18:COUNTERS indirect_call 9 counts indir-call-topn.gcda: 24: 0: *35000 1868707024 0* 0 0 0 0 0 Running with the "--param indir-call-topn-profile=1" will give below profile data, My patch is based on this profile result and do the optimization for multiple indirect targets, performance can get much improve on this testcase and SPEC2017 for some benchmarks(LLVM already support this several years ago...). 
indir-call-topn.gcda: 26: 01b1: 18:COUNTERS indirect_call_topn 9 counts indir-call-topn.gcda: 28: 0: *0 969338501 17500 1868707024 17500* 0 0 0 test case indir-call-topn.c: #include typedef int (*fptr) (int); int one (int a) { return 1; } int two (int a) { return 0; } fptr table[] = {, }; int main() { int i, x; fptr p = one (3); for (i = 0; i < 35000; i++) { x = (*p) (3); p = table[x]; } printf ("done:%d\n", x); } This patch leverages the "--param indir-call-topn-profile=1" and enables multiple indirect Note that I've remove indir-call-topn-profile last week, the patch will not apply on current trunk. However, I can help you how to adapt single-value counters to support tracking of multiple values. It will be very useful if you help me to track multiple values similarly on trunk code. I will rebase to your code once topn is ready again. Actually topn is more general and top1 is included in, I thought that top1 should be removed instead of topn, though topn will consume longer time than top1 in profile-generate. targets profiling and use in LTO-WPA and LTO-LTRANS stage, as a result, function specialization, profiling, partial devirtualization, inlining and cloning could be done successfully based on it. This decision is definitely big question for Honza? Performance can get improved 3x (1.7 sec -> 0.4 sec) on simple tests. Details are: 1. When do PGO with indir-call-topn-profile, the gcda data format is not supported in ipa-profile pass, If you take a look at gcc/ipa-profile.c:195 you can see how the probability is propagated to IPA passes. Why is that not sufficient? Current code only support single indirect target, I need track multiple indirect targets and create multiple speculative edges on single indirect call statement. What's more, many ICEs happened in later stage due to single speculative target design, part of this patch is to solve the ICEs of multiple speculative target edges handling. 
Thanks Xionghu Martin so add variables to pass the information through passes, and postpone gimple_ic to ipa-profile like default as inline pass will decide whether it is benefit to transform indirect call. 2. Enable LTO WPA/LTRANS stage multiple indirect call targets analysis for profile full support in ipa passes and cgraph_edge functions. 3. Fix various hidden speculative call ICEs exposed after enabling this feature when running SPEC2017. 4. Add 1 in module testcase and 2 cross module testcases. 5. TODOs: 5.1. Some reference info will be dropped from WPA to LTRANS, so reference check will be difficult in LTRANS, need replace the strstr with reference compare. 5.2. Some duplicate code need be removed as top1 and topn share same logic. Actually top1 related logic could be eliminated totally as topn includes it. 5.3. Split patch maybe needed as too big but not sure how many would be reasonable. 6. Performance result for ppc64le: 6.1. Representative test: indir-call-prof-topn.c runtime improved from 1.7s to 0.4s. 6.2. SPEC2017 peakrate: 523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r (+13.33%); 525.x264_r (-5.29%). No big changes of other benchmarks. Option: -Ofast -mcpu=power8 PASS1_OPTIMIZE: -fprofile-generate --param indir-call-topn-profile=1 -flto PASS2_OPTIMIZE:
Re: *Ping* Re: [PATCH] PR c/43673 - Incorrect warning in dfp printf.
Ping for GCC-10. Thanks Xionghu On 2019/3/4 09:13, Xiong Hu Luo wrote: Ping: https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01949.html Thanks Xionghu On 2019/2/26 AM9:13, luo...@linux.ibm.com wrote: From: Xiong Hu Luo dfp printf/scanf of Ha/HA, Da/DA and DDa/DDA is not set properly, cause incorrect warning happens: "use of 'D' length modifier with 'a' type character". Regression-tested on powerpc64le-linux, OK for trunk and gcc-8? gcc/c-family/ChangeLog: 2019-02-25 Xiong Hu Luo PR c/43673 * c-format.c (print_char_table, scanf_char_table): Replace BADLEN with TEX_D32, TEX_D64 or TEX_D128. gcc/testsuit/ChangeLog: 2019-02-25 Xiong Hu Luo PR c/43673 * gcc.dg/format-dfp-printf-1.c: New test. * gcc.dg/format-dfp-scanf-1.c: Likewise. --- gcc/c-family/c-format.c| 4 ++-- gcc/testsuite/gcc.dg/format/dfp-printf-1.c | 28 ++-- gcc/testsuite/gcc.dg/format/dfp-scanf-1.c | 22 -- 3 files changed, 48 insertions(+), 6 deletions(-) diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c index 9b48ee3..af33ef9 100644 --- a/gcc/c-family/c-format.c +++ b/gcc/c-family/c-format.c @@ -674,7 +674,7 @@ static const format_char_info print_char_table[] = { "n", 1, STD_C89, { T89_I, T99_SC, T89_S, T89_L, T9L_LL, BADLEN, T99_SST, T99_PD, T99_IM, BADLEN, BADLEN, BADLEN }, "", "W", NULL }, /* C99 conversion specifiers. */ { "F", 0, STD_C99, { T99_D, BADLEN, BADLEN, T99_D, BADLEN, T99_LD, BADLEN, BADLEN, BADLEN, TEX_D32, TEX_D64, TEX_D128 }, "-wp0 +#'I", "", NULL }, - { "aA", 0, STD_C99, { T99_D, BADLEN, BADLEN, T99_D, BADLEN, T99_LD, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-wp0 +#", "", NULL }, + { "aA", 0, STD_C99, { T99_D, BADLEN, BADLEN, T99_D, BADLEN, T99_LD, BADLEN, BADLEN, BADLEN, TEX_D32, TEX_D64, TEX_D128 }, "-wp0 +#", "", NULL }, /* X/Open conversion specifiers. 
*/ { "C", 0, STD_EXT, { TEX_WI, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-w","", NULL }, { "S", 1, STD_EXT, { TEX_W, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-wp", "R", NULL }, @@ -847,7 +847,7 @@ static const format_char_info scan_char_table[] = { "n", 1, STD_C89, { T89_I, T99_SC, T89_S, T89_L, T9L_LL, BADLEN, T99_SST, T99_PD, T99_IM, BADLEN, BADLEN, BADLEN }, "", "W", NULL }, /* C99 conversion specifiers. */ { "F", 1, STD_C99, { T99_F, BADLEN, BADLEN, T99_D, BADLEN, T99_LD, BADLEN, BADLEN, BADLEN, TEX_D32, TEX_D64, TEX_D128 }, "*w'", "W", NULL }, - { "aA", 1, STD_C99, { T99_F, BADLEN, BADLEN, T99_D, BADLEN, T99_LD, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "*w'", "W", NULL }, + { "aA", 1, STD_C99, { T99_F, BADLEN, BADLEN, T99_D, BADLEN, T99_LD, BADLEN, BADLEN, BADLEN, TEX_D32, TEX_D64, TEX_D128 }, "*w'", "W", NULL }, /* X/Open conversion specifiers. */ { "C", 1, STD_EXT, { TEX_W, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "*mw", "W", NULL }, { "S", 1, STD_EXT, { TEX_W, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "*amw", "W", NULL }, diff --git a/gcc/testsuite/gcc.dg/format/dfp-printf-1.c b/gcc/testsuite/gcc.dg/format/dfp-printf-1.c index e92f161..a290895 100644 --- a/gcc/testsuite/gcc.dg/format/dfp-printf-1.c +++ b/gcc/testsuite/gcc.dg/format/dfp-printf-1.c @@ -17,6 +17,8 @@ foo (_Decimal32 x, _Decimal64 y, _Decimal128 z, int i, unsigned int j, /* Check lack of warnings for valid usage. 
*/ + printf ("%Ha\n", x); + printf ("%HA\n", x); printf ("%Hf\n", x); printf ("%HF\n", x); printf ("%He\n", x); @@ -24,6 +26,8 @@ foo (_Decimal32 x, _Decimal64 y, _Decimal128 z, int i, unsigned int j, printf ("%Hg\n", x); printf ("%HG\n", x); + printf ("%Da\n", y); + printf ("%DA\n", y); printf ("%Df\n", y); printf ("%DF\n", y); printf ("%De\n", y); @@ -31,6 +35,8 @@ foo (_Decimal32 x, _Decimal64 y, _Decimal128 z, int i, unsigned int j, printf ("%Dg\n", y); printf ("%DG\n", y); + printf ("%DDa\n", z); + printf ("%DDA\n", z); printf ("%DDf\n", z); printf ("%DDF\n", z); printf ("%DDe\n", z); @@ -43,12 +49,16 @@ foo (_Decimal32 x, _Decimal64 y, _Decimal128 z, int i, unsigned int j, /* Check warnings for type mismatches. */ + printf ("%Ha\n", y); /* { dg-warning "expects argument" "bad use of %H" } */ + printf ("%HA\n", y); /* { dg-warning "expects argument" "bad use of %H" } */ printf ("%Hf\n", y); /* { dg-warning "expects argument" "bad use of %H" } */ printf ("%HF\n", y); /* { dg-warning "expects argument" "bad use
[PATCH] backport r257541, r259936, r260294, r260623, r261098, r261333, r268585.
From: Xiong Hu Luo These patches are followed changes for r25 on testcases vsx-vector-6*.c. backport them to update file names and fix regressions for GCC7 on power9. Regression tested on power7-be, power8-be, power8-le, power9. gcc/ChangeLog: 2019-04-03 Xiong Hu Luo backport from trunk r260623. 2018-05-23 Segher Boessenkool * doc/sourcebuild.texi (Endianness): New subsubsection. gcc/testsuite/ChangeLog: 2019-04-03 Xiong Hu Luo backport from trunk r257541. 2018-02-07 Will Schmidt * gcc.target/powerpc/vsx-vector-6-le.c: Update CPU target. * gcc.target/powerpc/vsx-vector-6-le.p9.c: New. backport from trunk r259936. 2018-05-04 Carl Love * gcc.target/powerpc/vsx-vector-6.h (foo): Add test for vec_max, vec_trunc. * gcc.target/powerpc/vsx-vector-6-le.c (dg-final): Update xvcmpeqdp, xvcmpgtdp, xvcmpgedp counts. Add xxsel counts. * gcc.target/powerpc/vsx-vector-6-be.c (dg-final): Update xvcmpgtdp, xvcmpgedp counts. Add xxsel counts. backport from trunk r260294. 2018-05-16 Carl Love * gcc.target/powerpc/vsx-vector-6-be.c: Remove file. * gcc.target/powerpc/vsx-vector-6-be.p7.c: New test file. * gcc.target/powerpc/vsx-vector-6-be.p8.c: New test file. * gcc.target/powerpc/vsx-vector-6-le.c (dg-final): Update counts for xvcmpeqdp., xvcmpgtdp., xvcmpgedp., xxlxor, xvrdpi. backport from trunk r260623. 2018-05-23 Segher Boessenkool * lib/target-supports.exp (check_effective_target_be): New. (check_effective_target_le): New. backport from part of trunk r261097. 2018-06-01 Carl Love * gcc.target/powerpc/altivec-7-be.c: Delete file. * gcc.target/powerpc/altivec-7-le.c: Delete file. * gcc.target/powerpc/vsx-7-be.c: Remove file. backport from trunk r261098. 2018-06-01 Carl Love Commit 260294 on 2018-05-16 by Carl Love was supposed to add the following files. * gcc.target/powerpc/vsx-vector-6-be.p7.c: New test file. * gcc.target/powerpc/vsx-vector-6-be.p8.c: New test file. backport from trunk r261333. 
2018-06-08 Carl Love * gcc.target/powerpc/vsx-vector-6-be.p7.c: Rename this file to vsx-vector-6.p7.c. * gcc.target/powerpc/vsx-vector-6-le.p9.c: Rename this file to vsx-vector-6.p9.c. * gcc.target/powerpc/vsx-vector-6-be.p8.c: Move instruction counts for BE system that are different then for an LE system from this file into vsx-vector-6-le.c using be target qualifier. Remove this file. * gcc.target/powerpc/vsx-vector-6-le.c: Add le qualifiers as needed for the various instruction counts. Rename file to vsx-vector-6.p8.c. backport from trunk r268585. 2019-02-06 Bill Seurer * gcc.target/powerpc/vsx-vector-6.p7.c: Update instruction counts and target. * gcc.target/powerpc/vsx-vector-6.p8.c: Update instruction counts and target. * gcc.target/powerpc/vsx-vector-6.p9.c: Update instruction counts and target. --- gcc/doc/sourcebuild.texi | 10 gcc/testsuite/gcc.target/powerpc/altivec-7-be.c| 30 gcc/testsuite/gcc.target/powerpc/altivec-7-le.c| 37 --- gcc/testsuite/gcc.target/powerpc/vsx-7-be.c| 50 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c | 31 - gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c | 32 - gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h| 14 +- gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p7.c | 42 + gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p8.c | 54 ++ gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p9.c | 39 gcc/testsuite/lib/target-supports.exp | 16 +++ 11 files changed, 173 insertions(+), 182 deletions(-) delete mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-7-be.c delete mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-7-le.c delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-7-be.c delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p7.c create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p8.c create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p9.c 
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index c7bb4b7..f0e9bb8 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -1273,6 +1273,16 @@ By convention, keywords ending in @code{_nocache} can also include options specified for the particular test in an earlier @code{dg-options} or @code{dg-add-options}
[PATCH] backport r268834 from mainline to gcc-7-branch
From: Xiong Hu Luo

Backport r268834 of "Add support for the vec_sbox_be, vec_cipher_be etc." from mainline to gcc-7-branch. Regression-tested on Linux POWER8 LE. The backport patch for gcc-8-branch has already been approved and committed. OK for gcc-7-branch?

gcc/ChangeLog:

2019-03-05 Xiong Hu Luo

Backport of r268834 from mainline to gcc-7-branch.
2019-02-13 Xiong Hu Luo

* config/rs6000/altivec.h (vec_sbox_be, vec_cipher_be, vec_cipherlast_be, vec_ncipher_be, vec_ncipherlast_be): New #defines.
* config/rs6000/crypto.md (CR_vqdi): New define_mode_iterator.
(crypto_vsbox_, crypto__): New define_insns.
* config/rs6000/rs6000-builtin.def (VSBOX_BE): New BU_CRYPTO_1.
(VCIPHER_BE, VCIPHERLAST_BE, VNCIPHER_BE, VNCIPHERLAST_BE): New BU_CRYPTO_2.
* config/rs6000/rs6000.c (builtin_function_type) : New switch options.
* doc/extend.texi (vec_sbox_be, vec_cipher_be, vec_cipherlast_be, vec_ncipher_be, vec_ncipherlast_be): New builtin functions.

gcc/testsuite/ChangeLog:

2019-03-05 Xiong Hu Luo

Backport of r268834 from mainline to gcc-7-branch.
2019-01-23 Xiong Hu Luo

* gcc.target/powerpc/crypto-builtin-1.c (crypto1_be, crypto2_be, crypto3_be, crypto4_be, crypto5_be): New testcases.
--- gcc/config/rs6000/altivec.h| 5 +++ gcc/config/rs6000/crypto.md| 17 ++ gcc/config/rs6000/rs6000-builtin.def | 19 --- gcc/config/rs6000/rs6000.c | 5 +++ gcc/doc/extend.texi| 13 .../gcc.target/powerpc/crypto-builtin-1.c | 38 ++ 6 files changed, 79 insertions(+), 18 deletions(-) diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h index e04c3a5..a89e4a0 100644 --- a/gcc/config/rs6000/altivec.h +++ b/gcc/config/rs6000/altivec.h @@ -388,6 +388,11 @@ #define vec_vsubuqm __builtin_vec_vsubuqm #define vec_vupkhsw __builtin_vec_vupkhsw #define vec_vupklsw __builtin_vec_vupklsw +#define vec_sbox_be __builtin_crypto_vsbox_be +#define vec_cipher_be __builtin_crypto_vcipher_be +#define vec_cipherlast_be __builtin_crypto_vcipherlast_be +#define vec_ncipher_be __builtin_crypto_vncipher_be +#define vec_ncipherlast_be __builtin_crypto_vncipherlast_be #endif #ifdef __POWER9_VECTOR__ diff --git a/gcc/config/rs6000/crypto.md b/gcc/config/rs6000/crypto.md index 5892f891..316f5aa 100644 --- a/gcc/config/rs6000/crypto.md +++ b/gcc/config/rs6000/crypto.md @@ -48,6 +48,9 @@ ;; Iterator for VSHASIGMAD/VSHASIGMAW (define_mode_iterator CR_hash [V4SI V2DI]) +;; Iterator for VSBOX/VCIPHER/VNCIPHER/VCIPHERLAST/VNCIPHERLAST +(define_mode_iterator CR_vqdi [V16QI V2DI]) + ;; Iterator for the other crypto functions (define_int_iterator CR_code [UNSPEC_VCIPHER UNSPEC_VNCIPHER @@ -60,10 +63,10 @@ (UNSPEC_VNCIPHERLAST "vncipherlast")]) ;; 2 operand crypto instructions -(define_insn "crypto_" - [(set (match_operand:V2DI 0 "register_operand" "=v") - (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v") - (match_operand:V2DI 2 "register_operand" "v")] +(define_insn "crypto__" + [(set (match_operand:CR_vqdi 0 "register_operand" "=v") + (unspec:CR_vqdi [(match_operand:CR_vqdi 1 "register_operand" "v") + (match_operand:CR_vqdi 2 "register_operand" "v")] CR_code))] "TARGET_CRYPTO" " %0,%1,%2" @@ -90,9 +93,9 @@ [(set_attr "type" "vecperm")]) ;; 1 operand crypto instruction 
-(define_insn "crypto_vsbox" - [(set (match_operand:V2DI 0 "register_operand" "=v") - (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")] +(define_insn "crypto_vsbox_" + [(set (match_operand:CR_vqdi 0 "register_operand" "=v") + (unspec:CR_vqdi [(match_operand:CR_vqdi 1 "register_operand" "v")] UNSPEC_VSBOX))] "TARGET_CRYPTO" "vsbox %0,%1" diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index 2cc07c6..ff134eb 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -2233,13 +2233,22 @@ BU_FLOAT128_1 (FABSQ, "fabsq", CONST, abskf2) BU_FLOAT128_2 (COPYSIGNQ, "copysignq", CONST, copysignkf3) /* 1 argument crypto functions. */ -BU_CRYPTO_1 (VSBOX,"vsbox", CONST, crypto_vsbox) +BU_CRYPTO_1 (VSBOX,"vsbox", CONST, crypto_vsbox_v2di) +BU_CRYPTO_1 (VSBOX_BE, "vsbox_be", CONST, crypto_vsbox_v16qi) /* 2 argument crypto functions. */ -BU_CRYPTO_2 (VCIPHER, "vcipher",CONST, crypto_vcipher) -BU_CRYPTO_2 (VCIPHERLAST, "vcipherlast",CONST, crypto_vcipherlast) -BU_CRYPTO_2 (VNCIPHER, "vncipher", CONST, crypto_vncipher) -BU_CRYPTO_2 (VNCIPHERLAST,
[PATCH v3] luoxhu - backport r250477, r255555, r257253 and r258137
From: Xiong Hu Luo

This is a backport of r250477, r255555, r257253 and r258137 from trunk to gcc-7-branch to support the built-in functions vec_extract_fp_from_shorth, vec_extract_fp_from_shortl, vec_extract_fp32_from_shorth, vec_extract_fp32_from_shortl, etc. The patches were already on trunk before GCC 8 branched. r257253 and r258137 adjust dependent testcases that require VSX support; they need to be merged as well to avoid regressions.

The discussion for the patch r250477 that went into trunk is:
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00624.html
The discussion for the patch r255555 that went into trunk is:
https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00394.html
VSX support for patch r257253 and r258137:
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02391.html
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01506.html

Regression-tested on Linux POWER8 LE.

2019-02-28  Xiong Hu Luo

	Backport from trunk r250477.
	2017-07-24  Carl Love

	* config/rs6000/rs6000-c.c: Add support for built-in functions
	vector float vec_extract_fp32_from_shorth (vector unsigned short);
	vector float vec_extract_fp32_from_shortl (vector unsigned short);
	* config/rs6000/altivec.h (vec_extract_fp_from_shorth,
	vec_extract_fp_from_shortl): Add defines for the two builtins.
	* config/rs6000/rs6000-builtin.def (VEXTRACT_FP_FROM_SHORTH,
	VEXTRACT_FP_FROM_SHORTL): Add BU_P9V_OVERLOAD_1 and BU_P9V_VSX_1
	new builtins.
	* config/rs6000/vsx.md (vsx_xvcvhpsp): Add define_insn.
	(vextract_fp_from_shorth, vextract_fp_from_shortl): Add define_expands.
	* doc/extend.texi: Update the built-in documentation file for the
	new built-in function.

	Backport from trunk r255555.
	2017-12-11  Carl Love

	* config/rs6000/altivec.h (vec_extract_fp32_from_shorth,
	vec_extract_fp32_from_shortl): Add #defines.
	* config/rs6000/rs6000-builtin.def (VSLDOI_2DI): Add macro expansion.
	* config/rs6000/rs6000-c.c (ALTIVEC_BUILTIN_VEC_UNPACKH,
	ALTIVEC_BUILTIN_VEC_UNPACKL, ALTIVEC_BUILTIN_VEC_AND,
	ALTIVEC_BUILTIN_VEC_SLD, ALTIVEC_BUILTIN_VEC_SRL,
	ALTIVEC_BUILTIN_VEC_SRO, ALTIVEC_BUILTIN_VEC_SLD,
	ALTIVEC_BUILTIN_VEC_SLL): Add expansions.
	* doc/extend.texi: Add documentation for the added builtins.

gcc/testsuite/ChangeLog:

2019-02-28  Xiong Hu Luo

	Backport from trunk r250477.
	2017-07-24  Carl Love

	* gcc.target/powerpc/builtins-3-p9-runnable.c: Add new test file for
	the new built-ins.

	Backport from trunk r255555.
	2017-12-11  Carl Love

	* gcc.target/powerpc/altivec-7.c: Renamed altivec-7.h.
	* gcc.target/powerpc/altivec-7.h (main): Add testcases for vec_unpackl.
	Add dg-final tests for the instructions generated.
	* gcc.target/powerpc/altivec-7-be.c: New file to test on big endian.
	* gcc.target/powerpc/altivec-7-le.c: New file to test on little endian.
	* gcc.target/powerpc/altivec-13.c (foo): Add vec_sld, vec_srl,
	vec_sro testcases. Add dg-final tests for the instructions generated.
	* gcc.target/powerpc/builtins-3-p8.c (test_vsi_packs_vui,
	test_vsi_packs_vsi, test_vsi_packs_vssi, test_vsi_packs_vusi,
	test_vsi_packsu-vssi, test_vsi_packsu-vusi, test_vsi_packsu-vsll,
	test_vsi_packsu-vull, test_vsi_packsu-vsi, test_vsi_packsu-vui): Add
	testcases. Add dg-final tests for new instructions.
	* gcc.target/powerpc/p8vector-builtin-2.c (vbschar_eq, vbchar_eq,
	vuchar_eq, vbint_eq, vsint_eq, viint_eq, vuint_eq, vbool_eq, vbint_ne,
	vsint_ne, vuint_ne, vbool_ne, vsign_ne, vuns_ne, vbshort_ne): Add tests.
	Add dg-final instruction tests.
	* gcc.target/powerpc/vsx-vector-6.c: Renamed vsx-vector-6.h.
	* gcc.target/powerpc/vsx-vector-6.h (vec_andc,vec_nmsub, vec_nmadd,
	vec_or, vec_nor, vec_andc, vec_or, vec_andc, vec_msums): Add tests.
	Add dg-final tests for the generated instructions.
* gcc.target/powerpc/builtins-3.c (test_sll_vsc_vsc_vsuc, test_sll_vuc_vuc, test_sll_vsi_vsi_vuc, test_sll_vui_vui_vuc, test_sll_vbll_vull, test_sll_vbll_vbll_vus, test_sll_vp_vp_vuc, test_sll_vssi_vssi_vuc, test_sll_vusi_vusi_vuc, test_slo_vsc_vsc_vsc, test_slo_vuc_vuc_vsc, test_slo_vsi_vsi_vsc, test_slo_vsi_vsi_vuc, test_slo_vui_vui_vsc, test_slo_vui_vui_vuc, test_slo_vsll_slo_vsll_vsc, test_slo_vsll_slo_vsll_vuc, test_slo_vull_slo_vull_vsc, test_slo_vull_slo_vull_vuc, test_slo_vp_vp_vsc, test_slo_vp_vp_vuc, test_slo_vssi_vssi_vsc, test_slo_vssi_vssi_vuc, test_slo_vusi_vusi_vsc, test_slo_vusi_vusi_vuc, test_slo_vusi_vusi_vuc, test_slo_vf_vf_vsc, test_slo_vf_vf_vuc, test_cmpb_float): Add tests. Backport from trunk r257253. 2018-01-31 Will Schmidt * gcc.target/powerpc/altivec-13.c: Remove VSX-requiring
[PATCH] PR c/43673 - Incorrect warning in dfp printf.
From: Xiong Hu Luo

The DFP printf/scanf length modifiers for Ha/HA, Da/DA and DDa/DDA are not set properly, which causes an incorrect warning: "use of 'D' length modifier with 'a' type character".

Regression-tested on powerpc64le-linux, OK for trunk and gcc-8?

gcc/c-family/ChangeLog:

2019-02-25  Xiong Hu Luo

	PR c/43673
	* c-format.c (print_char_table, scan_char_table): Replace BADLEN with
	TEX_D32, TEX_D64 or TEX_D128.

gcc/testsuite/ChangeLog:

2019-02-25  Xiong Hu Luo

	PR c/43673
	* gcc.dg/format/dfp-printf-1.c: New test.
	* gcc.dg/format/dfp-scanf-1.c: Likewise.
---
 gcc/c-family/c-format.c                    | 4 ++--
 gcc/testsuite/gcc.dg/format/dfp-printf-1.c | 28 ++--
 gcc/testsuite/gcc.dg/format/dfp-scanf-1.c  | 22 --
 3 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index 9b48ee3..af33ef9 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -674,7 +674,7 @@ static const format_char_info print_char_table[] =
 { "n", 1, STD_C89, { T89_I, T99_SC, T89_S, T89_L, T9L_LL, BADLEN, T99_SST, T99_PD, T99_IM, BADLEN, BADLEN, BADLEN }, "", "W", NULL },
 /* C99 conversion specifiers. */
 { "F", 0, STD_C99, { T99_D, BADLEN, BADLEN, T99_D, BADLEN, T99_LD, BADLEN, BADLEN, BADLEN, TEX_D32, TEX_D64, TEX_D128 }, "-wp0 +#'I", "", NULL },
- { "aA", 0, STD_C99, { T99_D, BADLEN, BADLEN, T99_D, BADLEN, T99_LD, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-wp0 +#", "", NULL },
+ { "aA", 0, STD_C99, { T99_D, BADLEN, BADLEN, T99_D, BADLEN, T99_LD, BADLEN, BADLEN, BADLEN, TEX_D32, TEX_D64, TEX_D128 }, "-wp0 +#", "", NULL },
 /* X/Open conversion specifiers.
*/ { "C", 0, STD_EXT, { TEX_WI, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-w","", NULL }, { "S", 1, STD_EXT, { TEX_W, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-wp", "R", NULL }, @@ -847,7 +847,7 @@ static const format_char_info scan_char_table[] = { "n", 1, STD_C89, { T89_I, T99_SC, T89_S, T89_L, T9L_LL, BADLEN, T99_SST, T99_PD, T99_IM, BADLEN, BADLEN, BADLEN }, "", "W", NULL }, /* C99 conversion specifiers. */ { "F", 1, STD_C99, { T99_F, BADLEN, BADLEN, T99_D, BADLEN, T99_LD, BADLEN, BADLEN, BADLEN, TEX_D32, TEX_D64, TEX_D128 }, "*w'", "W", NULL }, - { "aA", 1, STD_C99, { T99_F, BADLEN, BADLEN, T99_D, BADLEN, T99_LD, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "*w'", "W", NULL }, + { "aA", 1, STD_C99, { T99_F, BADLEN, BADLEN, T99_D, BADLEN, T99_LD, BADLEN, BADLEN, BADLEN, TEX_D32, TEX_D64, TEX_D128 }, "*w'", "W", NULL }, /* X/Open conversion specifiers. */ { "C", 1, STD_EXT, { TEX_W, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "*mw", "W", NULL }, { "S", 1, STD_EXT, { TEX_W, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "*amw", "W", NULL }, diff --git a/gcc/testsuite/gcc.dg/format/dfp-printf-1.c b/gcc/testsuite/gcc.dg/format/dfp-printf-1.c index e92f161..a290895 100644 --- a/gcc/testsuite/gcc.dg/format/dfp-printf-1.c +++ b/gcc/testsuite/gcc.dg/format/dfp-printf-1.c @@ -17,6 +17,8 @@ foo (_Decimal32 x, _Decimal64 y, _Decimal128 z, int i, unsigned int j, /* Check lack of warnings for valid usage. 
*/ + printf ("%Ha\n", x); + printf ("%HA\n", x); printf ("%Hf\n", x); printf ("%HF\n", x); printf ("%He\n", x); @@ -24,6 +26,8 @@ foo (_Decimal32 x, _Decimal64 y, _Decimal128 z, int i, unsigned int j, printf ("%Hg\n", x); printf ("%HG\n", x); + printf ("%Da\n", y); + printf ("%DA\n", y); printf ("%Df\n", y); printf ("%DF\n", y); printf ("%De\n", y); @@ -31,6 +35,8 @@ foo (_Decimal32 x, _Decimal64 y, _Decimal128 z, int i, unsigned int j, printf ("%Dg\n", y); printf ("%DG\n", y); + printf ("%DDa\n", z); + printf ("%DDA\n", z); printf ("%DDf\n", z); printf ("%DDF\n", z); printf ("%DDe\n", z); @@ -43,12 +49,16 @@ foo (_Decimal32 x, _Decimal64 y, _Decimal128 z, int i, unsigned int j, /* Check warnings for type mismatches. */ + printf ("%Ha\n", y); /* { dg-warning "expects argument" "bad use of %H" } */ + printf ("%HA\n", y); /* { dg-warning "expects argument" "bad use of %H" } */ printf ("%Hf\n", y); /* { dg-warning "expects argument" "bad use of %H" } */ printf ("%HF\n", y); /* { dg-warning "expects argument" "bad use of %H" } */ printf ("%He\n", y); /* { dg-warning "expects argument" "bad use of %H" } */ printf ("%HE\n", y); /* { dg-warning "expects argument" "bad use of %H" } */ printf ("%Hg\n", y); /* { dg-warning "expects argument" "bad use of %H" } */
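As a rough illustration of the c-format.c change discussed in this patch, the following is a minimal sketch of how a per-conversion table of length modifiers decides whether a combination such as `%Da` is accepted. It is not the real GCC data structure: the enum `len_mod`, the struct `conv_row`, the tables `before_fix` and `after_fix`, and the function `lookup_type` are all hypothetical names, and the real table has twelve length-modifier columns rather than the four modelled here. A NULL entry plays the role of BADLEN.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of the length-modifier columns in
   c-format.c: no modifier, H (_Decimal32), D (_Decimal64), DD (_Decimal128). */
enum len_mod { LEN_NONE, LEN_H, LEN_D, LEN_DD, LEN_COUNT };

struct conv_row {
  const char *chars;            /* conversion characters sharing this row */
  const char *types[LEN_COUNT]; /* expected argument type; NULL models BADLEN */
};

/* Before the patch: the "aA" row rejects every DFP length modifier. */
static const struct conv_row before_fix[] = {
  { "fF", { "double", "_Decimal32", "_Decimal64", "_Decimal128" } },
  { "aA", { "double", NULL, NULL, NULL } },
};

/* After the patch: the "aA" row accepts the same DFP modifiers as "fF". */
static const struct conv_row after_fix[] = {
  { "fF", { "double", "_Decimal32", "_Decimal64", "_Decimal128" } },
  { "aA", { "double", "_Decimal32", "_Decimal64", "_Decimal128" } },
};

/* Return the expected argument type for conversion CONV with length
   modifier LEN, or NULL if the combination is rejected (which is what
   would trigger the "use of 'D' length modifier" style warning). */
static const char *
lookup_type (const struct conv_row *table, size_t n, char conv,
             enum len_mod len)
{
  for (size_t i = 0; i < n; i++)
    if (strchr (table[i].chars, conv) != NULL)
      return table[i].types[len];
  return NULL;
}
```

With these tables, `lookup_type (before_fix, 2, 'a', LEN_D)` returns NULL, which corresponds to the spurious warning, while `lookup_type (after_fix, 2, 'a', LEN_D)` returns "_Decimal64".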
[PATCH] luoxhu - backport from trunk r255555, r257253 and r258137
From: Xiong Hu Luo

This is a backport of r255555, r257253 and r258137 from trunk to gcc-7-branch. The patches were already on trunk before GCC 8 branched. In total, 5 files need manual conflict resolution due to code changes for r255555. r257253 and r258137 adjust dependent testcases that require VSX support; they need to be merged as well to avoid regressions.

The discussion for the patch r255555 that went into trunk is:
https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00394.html
VSX support for patch r257253 and r258137:
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02391.html
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01506.html

gcc/ChangeLog:

2019-01-14  Luo Xiong Hu

	Backport from trunk.
	Manually resolve 3 files:
	* config/rs6000/altivec.h (vec_extract_fp32_from_shorth,
	vec_extract_fp32_from_shortl): Resolve new #defines.
	* config/rs6000/rs6000-c.c (ALTIVEC_BUILTIN_VEC_SLD): Resolve new
	expansions.
	* doc/extend.texi: (vec_sld, vec_sll, vec_srl, vec_sro, vec_unpackh,
	vec_unpackl, test_vsi_packsu_vssi_vssi, vec_packsu, vec_cmpne): Resolve
	new documentation.

	2017-12-11  Carl Love

	* config/rs6000/altivec.h (vec_extract_fp32_from_shorth,
	vec_extract_fp32_from_shortl): Add #defines.
	* config/rs6000/rs6000-builtin.def (VSLDOI_2DI): Add macro expansion.
	* config/rs6000/rs6000-c.c (ALTIVEC_BUILTIN_VEC_UNPACKH,
	ALTIVEC_BUILTIN_VEC_UNPACKL, ALTIVEC_BUILTIN_VEC_AND,
	ALTIVEC_BUILTIN_VEC_SLD, ALTIVEC_BUILTIN_VEC_SRL,
	ALTIVEC_BUILTIN_VEC_SRO, ALTIVEC_BUILTIN_VEC_SLD,
	ALTIVEC_BUILTIN_VEC_SLL): Add expansions.
	* doc/extend.texi: Add documentation for the added builtins.

gcc/testsuite/ChangeLog:

2019-01-14  Luo Xiong Hu

	Backport from trunk r255555.
	Manually resolve 2 files:
	* gcc.target/powerpc/builtins-3-p8.c (test_vsi_packs_vusi,
	test_vsi_packsu-vssi, test_vsi_packsu-vusi, test_vsi_packsu-vsll,
	test_vsi_packsu-vull, test_vsi_packsu-vsi, test_vsi_packsu-vui):
	Resolve new cases.
* gcc.target/powerpc/builtins-3.c (test_sll_vsc_vsc_vsuc, test_sll_vuc_vuc, test_sll_vsi_vsi_vuc, test_sll_vui_vui_vuc, test_sll_vbll_vull, test_sll_vbll_vbll_vus, test_sll_vp_vp_vuc, test_sll_vssi_vssi_vuc, test_sll_vusi_vusi_vuc, test_slo_vsc_vsc_vsc, test_slo_vuc_vuc_vsc, test_slo_vsi_vsi_vsc, test_slo_vsi_vsi_vuc, test_slo_vui_vui_vsc, test_slo_vui_vui_vuc, test_slo_vp_vp_vsc, test_slo_vp_vp_vuc, test_slo_vssi_vssi_vsc, test_slo_vssi_vssi_vuc, test_slo_vusi_vusi_vsc, test_slo_vusi_vusi_vuc, test_slo_vusi_vusi_vuc, test_slo_vf_vf_vsc, test_slo_vf_vf_vuc, test_cmpb_float): Resolve new cases. 2017-12-11 Carl Love * gcc.target/powerpc/altivec-7.c: Renamed altivec-7.h. * gcc.target/powerpc/altivec-7.h (main): Add testcases for vec_unpackl. Add dg-final tests for the instructions generated. * gcc.target/powerpc/altivec-7-be.c: New file to test on big endian. * gcc.target/powerpc/altivec-7-le.c: New file to test on little endian. * gcc.target/powerpc/altivec-13.c (foo): Add vec_sld, vec_srl, vec_sro testcases. Add dg-final tests for the instructions generated. * gcc.target/powerpc/builtins-3-p8.c (test_vsi_packs_vui, test_vsi_packs_vsi, test_vsi_packs_vssi, test_vsi_packs_vusi, test_vsi_packsu-vssi, test_vsi_packsu-vusi, test_vsi_packsu-vsll, test_vsi_packsu-vull, test_vsi_packsu-vsi, test_vsi_packsu-vui): Add testcases. Add dg-final tests for new instructions. * gcc.target/powerpc/p8vector-builtin-2.c (vbschar_eq, vbchar_eq, vuchar_eq, vbint_eq, vsint_eq, viint_eq, vuint_eq, vbool_eq, vbint_ne, vsint_ne, vuint_ne, vbool_ne, vsign_ne, vuns_ne, vbshort_ne): Add tests. Add dg-final instruction tests. * gcc.target/powerpc/vsx-vector-6.c: Renamed vsx-vector-6.h. * gcc.target/powerpc/vsx-vector-6.h (vec_andc,vec_nmsub, vec_nmadd, vec_or, vec_nor, vec_andc, vec_or, vec_andc, vec_msums): Add tests. Add dg-final tests for the generated instructions. 
* gcc.target/powerpc/builtins-3.c (test_sll_vsc_vsc_vsuc, test_sll_vuc_vuc, test_sll_vsi_vsi_vuc, test_sll_vui_vui_vuc, test_sll_vbll_vull, test_sll_vbll_vbll_vus, test_sll_vp_vp_vuc, test_sll_vssi_vssi_vuc, test_sll_vusi_vusi_vuc, test_slo_vsc_vsc_vsc, test_slo_vuc_vuc_vsc, test_slo_vsi_vsi_vsc, test_slo_vsi_vsi_vuc, test_slo_vui_vui_vsc, test_slo_vui_vui_vuc, test_slo_vsll_slo_vsll_vsc, test_slo_vsll_slo_vsll_vuc, test_slo_vull_slo_vull_vsc, test_slo_vull_slo_vull_vuc, test_slo_vp_vp_vsc, test_slo_vp_vp_vuc, test_slo_vssi_vssi_vsc, test_slo_vssi_vssi_vuc, test_slo_vusi_vusi_vsc, test_slo_vusi_vusi_vuc, test_slo_vusi_vusi_vuc, test_slo_vf_vf_vsc, test_slo_vf_vf_vuc, test_cmpb_float): Add tests. Backport
[PATCH 2/2] fix comments typo.
From: Xiong Hu Luo

Committed in r268229.
---
gcc/ChangeLog

2019-01-24  Xiong Hu Luo

	* tree-ssa-dom.c (test_for_singularity): Fix a comment typo.
	* vr-values.c (find_case_label_ranges): Fix a comment typo.
---
 gcc/tree-ssa-dom.c | 2 +-
 gcc/vr-values.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 458f711..12647e7 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1929,7 +1929,7 @@ test_for_singularity (gimple *stmt, gcond *dummy_cond,
 
    3- Very simple redundant store elimination is performed.
 
-   4- We can simpify a condition to a constant or from a relational
+   4- We can simplify a condition to a constant or from a relational
       condition to an equality condition.  */
 
 edge
diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index f4058ea..a734ef9 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -2597,7 +2597,7 @@ find_case_label_ranges (gswitch *stmt, value_range *vr, size_t *min_idx1,
 
   take_default = !find_case_label_range (stmt, min, max, &i, &j);
 
-  /* Set second range to emtpy.  */
+  /* Set second range to empty.  */
   *min_idx2 = 1;
   *max_idx2 = 0;
 
-- 
2.7.4
[PATCH 1/2] fix tab alignment issue.
From: Xiong Hu Luo

Committed in r268228.
---
ChangeLog

2019-01-24  Xiong Hu Luo

	* ChangeLog: Replace space with tab.
	* MAINTAINERS: Delete 1 tab to keep alignment.
---
 ChangeLog   | 4 ++--
 MAINTAINERS | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 60ff3e0..8a5d078 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -21,9 +21,9 @@
 
 	* MAINTAINERS (Write After Approval): Add myself.
 
- 2019-01-16  Xiong Hu Luo
+2019-01-16  Xiong Hu Luo
 
-    * MAINTAINERS (Write After Approval): Add myself.
+	* MAINTAINERS (Write After Approval): Add myself.
 
 2019-01-03  Rainer Orth
diff --git a/MAINTAINERS b/MAINTAINERS
index 860ba32..0c362aa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -484,7 +484,7 @@ Manuel López-Ibáñez
 Carl Love
 Martin v. Löwis
 H.J. Lu
-Xiong Hu Luo
+Xiong Hu Luo
 Christophe Lyon
 Luis Machado
 Ziga Mahkovec
-- 
2.7.4
[PATCH] rs6000: Add support for the vec_sbox_be, vec_cipher_be etc. builtins.
From: Xiong Hu Luo

The 5 new builtins vec_sbox_be, vec_cipher_be, vec_cipherlast_be, vec_ncipher_be and vec_ncipherlast_be only support vector unsigned char parameters. Add new instruction patterns crypto_vsbox_<mode> and crypto_<CR_insn>_<mode> to handle them, where the new mode iterator CR_vqdi expands to V2DI (vector unsigned long long) for the builtins without the _be suffix and to V16QI (vector unsigned char) for the builtins with the _be suffix.
---
gcc/ChangeLog

2019-01-23  Xiong Hu Luo

	* gcc/config/rs6000/altivec.h (vec_sbox_be, vec_cipher_be,
	vec_cipherlast_be, vec_ncipher_be, vec_ncipherlast_be): New #defines.
	* gcc/config/rs6000/crypto.md (CR_vqdi): New define_mode_iterator.
	(crypto_vsbox_<mode>, crypto_<CR_insn>_<mode>): New define_insns.
	* gcc/config/rs6000/rs6000-builtin.def (VSBOX_BE): New BU_CRYPTO_1.
	(VCIPHER_BE, VCIPHERLAST_BE, VNCIPHER_BE, VNCIPHERLAST_BE): New
	BU_CRYPTO_2.
	* gcc/config/rs6000/rs6000.c (builtin_function_type) : New switch
	options.
	* gcc/doc/extend.texi (vec_sbox_be, vec_cipher_be, vec_cipherlast_be,
	vec_ncipher_be, vec_ncipherlast_be): New builtin functions.

gcc/testsuite/ChangeLog

2019-01-23  Xiong Hu Luo

	* gcc/testsuite/gcc.target/powerpc/crypto-builtin-1.c (crypto1_be,
	crypto2_be, crypto3_be, crypto4_be, crypto5_be): New testcases.
--- gcc/config/rs6000/altivec.h| 5 +++ gcc/config/rs6000/crypto.md| 17 +- gcc/config/rs6000/rs6000-builtin.def | 19 +--- gcc/config/rs6000/rs6000.c | 5 +++ gcc/doc/extend.texi| 13 .../gcc.target/powerpc/crypto-builtin-1.c | 36 +++--- 6 files changed, 78 insertions(+), 17 deletions(-) diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h index bf29d46..d66ae7c 100644 --- a/gcc/config/rs6000/altivec.h +++ b/gcc/config/rs6000/altivec.h @@ -418,6 +418,11 @@ #define vec_vupkhsw __builtin_vec_vupkhsw #define vec_vupklsw __builtin_vec_vupklsw #define vec_revb __builtin_vec_revb +#define vec_sbox_be __builtin_crypto_vsbox_be +#define vec_cipher_be __builtin_crypto_vcipher_be +#define vec_cipherlast_be __builtin_crypto_vcipherlast_be +#define vec_ncipher_be __builtin_crypto_vncipher_be +#define vec_ncipherlast_be __builtin_crypto_vncipherlast_be #endif #ifdef __POWER9_VECTOR__ diff --git a/gcc/config/rs6000/crypto.md b/gcc/config/rs6000/crypto.md index 2ee3e3a..b9917b0 100644 --- a/gcc/config/rs6000/crypto.md +++ b/gcc/config/rs6000/crypto.md @@ -48,6 +48,9 @@ ;; Iterator for VSHASIGMAD/VSHASIGMAW (define_mode_iterator CR_hash [V4SI V2DI]) +;; Iterator for VSBOX/VCIPHER/VNCIPHER/VCIPHERLAST/VNCIPHERLAST +(define_mode_iterator CR_vqdi [V16QI V2DI]) + ;; Iterator for the other crypto functions (define_int_iterator CR_code [UNSPEC_VCIPHER UNSPEC_VNCIPHER @@ -60,10 +63,10 @@ (UNSPEC_VNCIPHERLAST "vncipherlast")]) ;; 2 operand crypto instructions -(define_insn "crypto_" - [(set (match_operand:V2DI 0 "register_operand" "=v") - (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v") - (match_operand:V2DI 2 "register_operand" "v")] +(define_insn "crypto__" + [(set (match_operand:CR_vqdi 0 "register_operand" "=v") + (unspec:CR_vqdi [(match_operand:CR_vqdi 1 "register_operand" "v") + (match_operand:CR_vqdi 2 "register_operand" "v")] CR_code))] "TARGET_CRYPTO" " %0,%1,%2" @@ -90,9 +93,9 @@ [(set_attr "type" "vecperm")]) ;; 1 operand crypto instruction 
-(define_insn "crypto_vsbox" - [(set (match_operand:V2DI 0 "register_operand" "=v") - (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")] +(define_insn "crypto_vsbox_" + [(set (match_operand:CR_vqdi 0 "register_operand" "=v") + (unspec:CR_vqdi [(match_operand:CR_vqdi 1 "register_operand" "v")] UNSPEC_VSBOX))] "TARGET_CRYPTO" "vsbox %0,%1" diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index 60b3bd0..0a2bdb7 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -2418,13 +2418,22 @@ BU_P9_OVERLOAD_2 (CMPRB2, "byte_in_either_range") BU_P9_OVERLOAD_2 (CMPEQB, "byte_in_set") /* 1 argument crypto functions. */ -BU_CRYPTO_1 (VSBOX,"vsbox", CONST, crypto_vsbox) +BU_CRYPTO_1 (VSBOX,"vsbox", CONST, crypto_vsbox_v2di) +BU_CRYPTO_1 (VSBOX_BE, "vsbox_be", CONST, crypto_vsbox_v16qi) /* 2 argument crypto functions. */ -BU_CRYPTO_2 (VCIPHER, "vcipher",CONST, crypto_vcipher) -BU_CRYPTO_2 (VCIPHERLAST, "vcipherlast",CONST, crypto_vcipherlast) -BU_CRYPTO_2 (VNCIPHER, "vncipher", CONST, crypto_vncipher) -BU_CRYPTO_2 (VNCIPHERLAST, "vncipherlast", CONST, crypto_vncipherlast)
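To make the effect of the CR_vqdi mode iterator more concrete, here is a small sketch in plain C of the naming scheme: one pattern written against the iterator yields one named pattern per mode, which is where the `_v2di` and `_v16qi` suffixes in rs6000-builtin.def come from. This only models the name expansion, not what the machine-description generator actually does internally; the array `cr_vqdi_modes` and the function `pattern_name` are hypothetical, and the two-operand names (for example `crypto_vcipher_v2di`) are presumed by analogy with the `crypto_vsbox_v2di` and `crypto_vsbox_v16qi` names visible in the diff.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Modes listed in the CR_vqdi iterator, lowercased the way they appear
   in the generated pattern names.  */
static const char *const cr_vqdi_modes[] = { "v16qi", "v2di" };

/* Build the pattern name for instruction stem INSN and the mode at
   MODE_IDX, e.g. ("vsbox", 1) gives "crypto_vsbox_v2di".
   Returns 0 on success, -1 if MODE_IDX is out of range or BUF is too
   small to hold the name.  */
static int
pattern_name (char *buf, size_t len, const char *insn, size_t mode_idx)
{
  size_t n_modes = sizeof cr_vqdi_modes / sizeof cr_vqdi_modes[0];
  if (mode_idx >= n_modes)
    return -1;

  int n = snprintf (buf, len, "crypto_%s_%s", insn, cr_vqdi_modes[mode_idx]);
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

Under these assumptions, the existing VSBOX builtin maps to the name built from index 1 (`crypto_vsbox_v2di`) and the new VSBOX_BE builtin to the name built from index 0 (`crypto_vsbox_v16qi`), so both builtins share a single instruction template.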
[PATCH] luoxhu - backport from trunk r255555:
From: carll backport from trunk to gcc-7-branch. gcc/ChangeLog: 2017-12-11 Carl Love * config/rs6000/altivec.h (vec_extract_fp32_from_shorth, vec_extract_fp32_from_shortl]): Add #defines. * config/rs6000/rs6000-builtin.def (VSLDOI_2DI): Add macro expansion. * config/rs6000/rs6000-c.c (ALTIVEC_BUILTIN_VEC_UNPACKH, ALTIVEC_BUILTIN_VEC_UNPACKL, ALTIVEC_BUILTIN_VEC_AND, ALTIVEC_BUILTIN_VEC_SLD, ALTIVEC_BUILTIN_VEC_SRL, ALTIVEC_BUILTIN_VEC_SRO, ALTIVEC_BUILTIN_VEC_SLD, ALTIVEC_BUILTIN_VEC_SLL): Add expansions. * doc/extend.texi: Add documentation for the added builtins. gcc/testsuite/ChangeLog: 2017-12-11 Carl Love * gcc.target/powerpc/altivec-7.c: Renamed altivec-7.h. * gcc.target/powerpc/altivec-7.h (main): Add testcases for vec_unpackl. Add dg-final tests for the instructions generated. * gcc.target/powerpc/altivec-7-be.c: New file to test on big endian. * gcc.target/powerpc/altivec-7-le.c: New file to test on little endian. * gcc.target/powerpc/altivec-13.c (foo): Add vec_sld, vec_srl, vec_sro testcases. Add dg-final tests for the instructions generated. * gcc.target/powerpc/builtins-3-p8.c (test_vsi_packs_vui, test_vsi_packs_vsi, test_vsi_packs_vssi, test_vsi_packs_vusi, test_vsi_packsu-vssi, test_vsi_packsu-vusi, test_vsi_packsu-vsll, test_vsi_packsu-vull, test_vsi_packsu-vsi, test_vsi_packsu-vui): Add testcases. Add dg-final tests for new instructions. * gcc.target/powerpc/p8vector-builtin-2.c (vbschar_eq, vbchar_eq, vuchar_eq, vbint_eq, vsint_eq, viint_eq, vuint_eq, vbool_eq, vbint_ne, vsint_ne, vuint_ne, vbool_ne, vsign_ne, vuns_ne, vbshort_ne): Add tests. Add dg-final instruction tests. * gcc.target/powerpc/vsx-vector-6.c: Renamed vsx-vector-6.h. * gcc.target/powerpc/vsx-vector-6.h (vec_andc,vec_nmsub, vec_nmadd, vec_or, vec_nor, vec_andc, vec_or, vec_andc, vec_msums): Add tests. Add dg-final tests for the generated instructions. 
* gcc.target/powerpc/builtins-3.c (test_sll_vsc_vsc_vsuc, test_sll_vuc_vuc, test_sll_vsi_vsi_vuc, test_sll_vui_vui_vuc, test_sll_vbll_vull, test_sll_vbll_vbll_vus, test_sll_vp_vp_vuc, test_sll_vssi_vssi_vuc, test_sll_vusi_vusi_vuc, test_slo_vsc_vsc_vsc, test_slo_vuc_vuc_vsc, test_slo_vsi_vsi_vsc, test_slo_vsi_vsi_vuc, test_slo_vui_vui_vsc, test_slo_vui_vui_vuc, test_slo_vsll_slo_vsll_vsc, test_slo_vsll_slo_vsll_vuc, test_slo_vull_slo_vull_vsc, test_slo_vull_slo_vull_vuc, test_slo_vp_vp_vsc, test_slo_vp_vp_vuc, test_slo_vssi_vssi_vsc, test_slo_vssi_vssi_vuc, test_slo_vusi_vusi_vsc, test_slo_vusi_vusi_vuc, test_slo_vusi_vusi_vuc, test_slo_vf_vf_vsc, test_slo_vf_vf_vuc, test_cmpb_float): Add tests. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@25 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/config/rs6000/altivec.h| 3 + gcc/config/rs6000/rs6000-builtin.def | 1 + gcc/config/rs6000/rs6000-c.c | 38 + gcc/doc/extend.texi| 48 +- gcc/testsuite/gcc.target/powerpc/altivec-13.c | 69 - gcc/testsuite/gcc.target/powerpc/altivec-7-be.c| 35 + gcc/testsuite/gcc.target/powerpc/altivec-7-le.c| 36 + gcc/testsuite/gcc.target/powerpc/altivec-7.c | 46 -- gcc/testsuite/gcc.target/powerpc/altivec-7.h | 50 ++ gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c | 79 +- gcc/testsuite/gcc.target/powerpc/builtins-3.c | 168 - .../gcc.target/powerpc/p8vector-builtin-2.c| 83 +- gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c | 31 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c | 32 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.c| 81 -- gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h| 157 +++ 16 files changed, 825 insertions(+), 132 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-7-be.c create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-7-le.c delete mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-7.c create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-7.h create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c create mode 100644 
gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.c create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h index e04c3a5..b8df599 100644 --- a/gcc/config/rs6000/altivec.h +++ b/gcc/config/rs6000/altivec.h @@ -421,6 +421,9 @@ #define vec_insert_exp __builtin_vec_insert_exp #define vec_test_data_class __builtin_vec_test_data_class +#define