Re: [PATCH 3/5] IPA ICF pass
On Thu, Nov 13, 2014 at 2:17 PM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, Oct 15, 2014 at 10:03 AM, Martin Liška mli...@suse.cz wrote: Hello There's final version of the patch I'm going to commit tomorrow in the morning (CEST). Thank you Honza for the review. Martin This caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63856 This also caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64693 -- H.J.
Re: [PATCH 3/5] IPA ICF pass
On Wed, Oct 15, 2014 at 10:03 AM, Martin Liška mli...@suse.cz wrote: Hello There's final version of the patch I'm going to commit tomorrow in the morning (CEST). Thank you Honza for the review. Martin This caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63856 -- H.J.
Re: [PATCH 3/5] IPA ICF pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63747 is likely caused by this patch. compare_gimple_switch does not check CASE_LOW and CASE_HIGH, resulting merging functions not identical. Interestingly in the first a few versions of this patch CASE_LOW/HIGH were checked. But last versions only checks CASE_LABEL. What was the intention? It sounds like an accidental change. CASE_LOW/CASE_HIGH needs to be compared ;) Honza Thanks, Joey On Thu, Oct 23, 2014 at 5:18 AM, Jiong Wang wong.kwongyuan.to...@gmail.com wrote: PR 63574 ICE building libjava (segfault) on arm-linux-gnueabihf is caused by this commit. from the backtrace, the ICF pass is trying to compare two label tree node without type info. while looks like compare_operand expect the type info always be not empty before invoking func_checker::compatible_types_p. I think we should add a similiar t1/t2 check at the start of func_checker::compatible_types_p, just like what has been done at the head of func_checker::compare_operand. And I verified if we add that check, the crash gone away. Regards, Jiong 2014-10-15 18:03 GMT+01:00 Martin Liška mli...@suse.cz: On 10/14/2014 06:04 PM, Jan Hubicka wrote: diff --git a/gcc/cgraph.h b/gcc/cgraph.h index fb41b01..2de98b4 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -172,6 +172,12 @@ public: /* Dump referring in list to FILE. */ void dump_referring (FILE *); + /* Get number of references for this node. */ + inline unsigned get_references_count (void) + { +return ref_list.references ? ref_list.references-length () : 0; + } Probably better called num_references() (like we have num_edge in basic-block.h) @@ -8068,6 +8069,19 @@ it may significantly increase code size (see @option{--param ipcp-unit-growth=@var{value}}). This flag is enabled by default at @option{-O3}. +@item -fipa-icf +@opindex fipa-icf +Perform Identical Code Folding for functions and read-only variables. +The optimization reduces code size and may disturb unwind stacks by replacing +a function by equivalent one with a different name. The optimization works +more effectively with link time optimization enabled. + +Nevertheless the behavior is similar to Gold Linker ICF optimization, GCC ICF +works on different levels and thus the optimizations are not same - there are +equivalences that are found only by GCC and equivalences found only by Gold. + +This flag is enabled by default at @option{-O2}. ... and -Os? +case ARRAY_REF: +case ARRAY_RANGE_REF: + { + x1 = TREE_OPERAND (t1, 0); + x2 = TREE_OPERAND (t2, 0); + y1 = TREE_OPERAND (t1, 1); + y2 = TREE_OPERAND (t2, 1); + + if (!compare_operand (array_ref_low_bound (t1), + array_ref_low_bound (t2))) + return return_false_with_msg (); + if (!compare_operand (array_ref_element_size (t1), + array_ref_element_size (t2))) + return return_false_with_msg (); + if (!compare_operand (x1, x2)) + return return_false_with_msg (); + return compare_operand (y1, y2); + } No need for {...} if there are no local vars. +bool +func_checker::compare_function_decl (tree t1, tree t2) +{ + bool ret = false; + + if (t1 == t2) +return true; + + symtab_node *n1 = symtab_node::get (t1); + symtab_node *n2 = symtab_node::get (t2); + + if (m_ignored_source_nodes != NULL m_ignored_target_nodes != NULL) +{ + ret = m_ignored_source_nodes-contains (n1) +m_ignored_target_nodes-contains (n2); + + if (ret) + return true; +} + + /* If function decl is WEAKREF, we compare targets. */ + cgraph_node *f1 = cgraph_node::get (t1); + cgraph_node *f2 = cgraph_node::get (t2); + + if(f1 f2 f1-weakref f2-weakref) +ret = f1-alias_target == f2-alias_target; + + return ret; Comparing aliases is bit more complicated than just handling weakrefs. I have patch for symtab_node::equivalent_address_p somewhre in queue. lets just drop the fancy stuff for the moment and compare f1f2 for equivalence. + ret = compare_decl (t1, t2); Why functions are not compared with compare_decl while variables are? + + return return_with_debug (ret); +} + +void +func_checker::parse_labels (sem_bb *bb) +{ + for (gimple_stmt_iterator gsi = gsi_start_bb (bb-bb); !gsi_end_p (gsi); + gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + + if (gimple_code (stmt) == GIMPLE_LABEL) + { + tree t = gimple_label_label (stmt); + gcc_assert (TREE_CODE (t) == LABEL_DECL); + + m_label_bb_map.put (t, bb-bb-index); + } +} +} + +/* Basic block equivalence comparison function that returns true if +
Re: [PATCH 3/5] IPA ICF pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63747 is likely caused by this patch. compare_gimple_switch does not check CASE_LOW and CASE_HIGH, resulting merging functions not identical. Interestingly in the first a few versions of this patch CASE_LOW/HIGH were checked. But last versions only checks CASE_LABEL. What was the intention? Thanks, Joey On Thu, Oct 23, 2014 at 5:18 AM, Jiong Wang wong.kwongyuan.to...@gmail.com wrote: PR 63574 ICE building libjava (segfault) on arm-linux-gnueabihf is caused by this commit. from the backtrace, the ICF pass is trying to compare two label tree node without type info. while looks like compare_operand expect the type info always be not empty before invoking func_checker::compatible_types_p. I think we should add a similiar t1/t2 check at the start of func_checker::compatible_types_p, just like what has been done at the head of func_checker::compare_operand. And I verified if we add that check, the crash gone away. Regards, Jiong 2014-10-15 18:03 GMT+01:00 Martin Liška mli...@suse.cz: On 10/14/2014 06:04 PM, Jan Hubicka wrote: diff --git a/gcc/cgraph.h b/gcc/cgraph.h index fb41b01..2de98b4 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -172,6 +172,12 @@ public: /* Dump referring in list to FILE. */ void dump_referring (FILE *); + /* Get number of references for this node. */ + inline unsigned get_references_count (void) + { +return ref_list.references ? ref_list.references-length () : 0; + } Probably better called num_references() (like we have num_edge in basic-block.h) @@ -8068,6 +8069,19 @@ it may significantly increase code size (see @option{--param ipcp-unit-growth=@var{value}}). This flag is enabled by default at @option{-O3}. +@item -fipa-icf +@opindex fipa-icf +Perform Identical Code Folding for functions and read-only variables. +The optimization reduces code size and may disturb unwind stacks by replacing +a function by equivalent one with a different name. The optimization works +more effectively with link time optimization enabled. + +Nevertheless the behavior is similar to Gold Linker ICF optimization, GCC ICF +works on different levels and thus the optimizations are not same - there are +equivalences that are found only by GCC and equivalences found only by Gold. + +This flag is enabled by default at @option{-O2}. ... and -Os? +case ARRAY_REF: +case ARRAY_RANGE_REF: + { + x1 = TREE_OPERAND (t1, 0); + x2 = TREE_OPERAND (t2, 0); + y1 = TREE_OPERAND (t1, 1); + y2 = TREE_OPERAND (t2, 1); + + if (!compare_operand (array_ref_low_bound (t1), + array_ref_low_bound (t2))) + return return_false_with_msg (); + if (!compare_operand (array_ref_element_size (t1), + array_ref_element_size (t2))) + return return_false_with_msg (); + if (!compare_operand (x1, x2)) + return return_false_with_msg (); + return compare_operand (y1, y2); + } No need for {...} if there are no local vars. +bool +func_checker::compare_function_decl (tree t1, tree t2) +{ + bool ret = false; + + if (t1 == t2) +return true; + + symtab_node *n1 = symtab_node::get (t1); + symtab_node *n2 = symtab_node::get (t2); + + if (m_ignored_source_nodes != NULL m_ignored_target_nodes != NULL) +{ + ret = m_ignored_source_nodes-contains (n1) +m_ignored_target_nodes-contains (n2); + + if (ret) + return true; +} + + /* If function decl is WEAKREF, we compare targets. */ + cgraph_node *f1 = cgraph_node::get (t1); + cgraph_node *f2 = cgraph_node::get (t2); + + if(f1 f2 f1-weakref f2-weakref) +ret = f1-alias_target == f2-alias_target; + + return ret; Comparing aliases is bit more complicated than just handling weakrefs. I have patch for symtab_node::equivalent_address_p somewhre in queue. lets just drop the fancy stuff for the moment and compare f1f2 for equivalence. + ret = compare_decl (t1, t2); Why functions are not compared with compare_decl while variables are? + + return return_with_debug (ret); +} + +void +func_checker::parse_labels (sem_bb *bb) +{ + for (gimple_stmt_iterator gsi = gsi_start_bb (bb-bb); !gsi_end_p (gsi); + gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + + if (gimple_code (stmt) == GIMPLE_LABEL) + { + tree t = gimple_label_label (stmt); + gcc_assert (TREE_CODE (t) == LABEL_DECL); + + m_label_bb_map.put (t, bb-bb-index); + } +} +} + +/* Basic block equivalence comparison function that returns true if + basic blocks BB1 and BB2 (from functions FUNC1 and FUNC2) correspond. + + In general, a collection of equivalence dictionaries is built for types + like SSA names, declarations (VAR_DECL, PARM_DECL, ..). This infrastructure + is utilized by
Re: [PATCH 3/5] IPA ICF pass
PR 63574 ICE building libjava (segfault) on arm-linux-gnueabihf is caused by this commit. from the backtrace, the ICF pass is trying to compare two label tree node without type info. while looks like compare_operand expect the type info always be not empty before invoking func_checker::compatible_types_p. I think we should add a similiar t1/t2 check at the start of func_checker::compatible_types_p, just like what has been done at the head of func_checker::compare_operand. And I verified if we add that check, the crash gone away. Regards, Jiong 2014-10-15 18:03 GMT+01:00 Martin Liška mli...@suse.cz: On 10/14/2014 06:04 PM, Jan Hubicka wrote: diff --git a/gcc/cgraph.h b/gcc/cgraph.h index fb41b01..2de98b4 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -172,6 +172,12 @@ public: /* Dump referring in list to FILE. */ void dump_referring (FILE *); + /* Get number of references for this node. */ + inline unsigned get_references_count (void) + { +return ref_list.references ? ref_list.references-length () : 0; + } Probably better called num_references() (like we have num_edge in basic-block.h) @@ -8068,6 +8069,19 @@ it may significantly increase code size (see @option{--param ipcp-unit-growth=@var{value}}). This flag is enabled by default at @option{-O3}. +@item -fipa-icf +@opindex fipa-icf +Perform Identical Code Folding for functions and read-only variables. +The optimization reduces code size and may disturb unwind stacks by replacing +a function by equivalent one with a different name. The optimization works +more effectively with link time optimization enabled. + +Nevertheless the behavior is similar to Gold Linker ICF optimization, GCC ICF +works on different levels and thus the optimizations are not same - there are +equivalences that are found only by GCC and equivalences found only by Gold. + +This flag is enabled by default at @option{-O2}. ... and -Os? +case ARRAY_REF: +case ARRAY_RANGE_REF: + { + x1 = TREE_OPERAND (t1, 0); + x2 = TREE_OPERAND (t2, 0); + y1 = TREE_OPERAND (t1, 1); + y2 = TREE_OPERAND (t2, 1); + + if (!compare_operand (array_ref_low_bound (t1), + array_ref_low_bound (t2))) + return return_false_with_msg (); + if (!compare_operand (array_ref_element_size (t1), + array_ref_element_size (t2))) + return return_false_with_msg (); + if (!compare_operand (x1, x2)) + return return_false_with_msg (); + return compare_operand (y1, y2); + } No need for {...} if there are no local vars. +bool +func_checker::compare_function_decl (tree t1, tree t2) +{ + bool ret = false; + + if (t1 == t2) +return true; + + symtab_node *n1 = symtab_node::get (t1); + symtab_node *n2 = symtab_node::get (t2); + + if (m_ignored_source_nodes != NULL m_ignored_target_nodes != NULL) +{ + ret = m_ignored_source_nodes-contains (n1) +m_ignored_target_nodes-contains (n2); + + if (ret) + return true; +} + + /* If function decl is WEAKREF, we compare targets. */ + cgraph_node *f1 = cgraph_node::get (t1); + cgraph_node *f2 = cgraph_node::get (t2); + + if(f1 f2 f1-weakref f2-weakref) +ret = f1-alias_target == f2-alias_target; + + return ret; Comparing aliases is bit more complicated than just handling weakrefs. I have patch for symtab_node::equivalent_address_p somewhre in queue. lets just drop the fancy stuff for the moment and compare f1f2 for equivalence. + ret = compare_decl (t1, t2); Why functions are not compared with compare_decl while variables are? + + return return_with_debug (ret); +} + +void +func_checker::parse_labels (sem_bb *bb) +{ + for (gimple_stmt_iterator gsi = gsi_start_bb (bb-bb); !gsi_end_p (gsi); + gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + + if (gimple_code (stmt) == GIMPLE_LABEL) + { + tree t = gimple_label_label (stmt); + gcc_assert (TREE_CODE (t) == LABEL_DECL); + + m_label_bb_map.put (t, bb-bb-index); + } +} +} + +/* Basic block equivalence comparison function that returns true if + basic blocks BB1 and BB2 (from functions FUNC1 and FUNC2) correspond. + + In general, a collection of equivalence dictionaries is built for types + like SSA names, declarations (VAR_DECL, PARM_DECL, ..). This infrastructure + is utilized by every statement-by-stament comparison function. */ + +bool +func_checker::compare_bb (sem_bb *bb1, sem_bb *bb2) +{ + unsigned i; + gimple_stmt_iterator gsi1, gsi2; + gimple s1, s2; + + if (bb1-nondbg_stmt_count != bb2-nondbg_stmt_count + || bb1-edge_count != bb2-edge_count) +return return_false (); + + gsi1 = gsi_start_bb (bb1-bb); + gsi2 = gsi_start_bb (bb2-bb); + + for (i = 0; i bb1-nondbg_stmt_count; i++) +
Re: [PATCH 3/5] IPA ICF pass
On 10/14/2014 06:04 PM, Jan Hubicka wrote: diff --git a/gcc/cgraph.h b/gcc/cgraph.h index fb41b01..2de98b4 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -172,6 +172,12 @@ public: /* Dump referring in list to FILE. */ void dump_referring (FILE *); + /* Get number of references for this node. */ + inline unsigned get_references_count (void) + { +return ref_list.references ? ref_list.references-length () : 0; + } Probably better called num_references() (like we have num_edge in basic-block.h) @@ -8068,6 +8069,19 @@ it may significantly increase code size (see @option{--param ipcp-unit-growth=@var{value}}). This flag is enabled by default at @option{-O3}. +@item -fipa-icf +@opindex fipa-icf +Perform Identical Code Folding for functions and read-only variables. +The optimization reduces code size and may disturb unwind stacks by replacing +a function by equivalent one with a different name. The optimization works +more effectively with link time optimization enabled. + +Nevertheless the behavior is similar to Gold Linker ICF optimization, GCC ICF +works on different levels and thus the optimizations are not same - there are +equivalences that are found only by GCC and equivalences found only by Gold. + +This flag is enabled by default at @option{-O2}. ... and -Os? +case ARRAY_REF: +case ARRAY_RANGE_REF: + { + x1 = TREE_OPERAND (t1, 0); + x2 = TREE_OPERAND (t2, 0); + y1 = TREE_OPERAND (t1, 1); + y2 = TREE_OPERAND (t2, 1); + + if (!compare_operand (array_ref_low_bound (t1), + array_ref_low_bound (t2))) + return return_false_with_msg (); + if (!compare_operand (array_ref_element_size (t1), + array_ref_element_size (t2))) + return return_false_with_msg (); + if (!compare_operand (x1, x2)) + return return_false_with_msg (); + return compare_operand (y1, y2); + } No need for {...} if there are no local vars. +bool +func_checker::compare_function_decl (tree t1, tree t2) +{ + bool ret = false; + + if (t1 == t2) +return true; + + symtab_node *n1 = symtab_node::get (t1); + symtab_node *n2 = symtab_node::get (t2); + + if (m_ignored_source_nodes != NULL m_ignored_target_nodes != NULL) +{ + ret = m_ignored_source_nodes-contains (n1) +m_ignored_target_nodes-contains (n2); + + if (ret) + return true; +} + + /* If function decl is WEAKREF, we compare targets. */ + cgraph_node *f1 = cgraph_node::get (t1); + cgraph_node *f2 = cgraph_node::get (t2); + + if(f1 f2 f1-weakref f2-weakref) +ret = f1-alias_target == f2-alias_target; + + return ret; Comparing aliases is bit more complicated than just handling weakrefs. I have patch for symtab_node::equivalent_address_p somewhre in queue. lets just drop the fancy stuff for the moment and compare f1f2 for equivalence. + ret = compare_decl (t1, t2); Why functions are not compared with compare_decl while variables are? + + return return_with_debug (ret); +} + +void +func_checker::parse_labels (sem_bb *bb) +{ + for (gimple_stmt_iterator gsi = gsi_start_bb (bb-bb); !gsi_end_p (gsi); + gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + + if (gimple_code (stmt) == GIMPLE_LABEL) + { + tree t = gimple_label_label (stmt); + gcc_assert (TREE_CODE (t) == LABEL_DECL); + + m_label_bb_map.put (t, bb-bb-index); + } +} +} + +/* Basic block equivalence comparison function that returns true if + basic blocks BB1 and BB2 (from functions FUNC1 and FUNC2) correspond. + + In general, a collection of equivalence dictionaries is built for types + like SSA names, declarations (VAR_DECL, PARM_DECL, ..). This infrastructure + is utilized by every statement-by-stament comparison function. */ + +bool +func_checker::compare_bb (sem_bb *bb1, sem_bb *bb2) +{ + unsigned i; + gimple_stmt_iterator gsi1, gsi2; + gimple s1, s2; + + if (bb1-nondbg_stmt_count != bb2-nondbg_stmt_count + || bb1-edge_count != bb2-edge_count) +return return_false (); + + gsi1 = gsi_start_bb (bb1-bb); + gsi2 = gsi_start_bb (bb2-bb); + + for (i = 0; i bb1-nondbg_stmt_count; i++) +{ + if (is_gimple_debug (gsi_stmt (gsi1))) + gsi_next_nondebug (gsi1); + + if (is_gimple_debug (gsi_stmt (gsi2))) + gsi_next_nondebug (gsi2); + + s1 = gsi_stmt (gsi1); + s2 = gsi_stmt (gsi2); + + int eh1 = lookup_stmt_eh_lp_fn + (DECL_STRUCT_FUNCTION (m_source_func_decl), s1); + int eh2 = lookup_stmt_eh_lp_fn + (DECL_STRUCT_FUNCTION (m_target_func_decl), s2); + + if (eh1 != eh2) + return return_false_with_msg (EH regions are different); + + if (gimple_code (s1) != gimple_code (s2)) + return return_false_with_msg (gimple codes are different); + + switch (gimple_code (s1)) + { + case GIMPLE_CALL: + if (!compare_gimple_call
Re: [PATCH 3/5] IPA ICF pass
diff --git a/gcc/cgraph.h b/gcc/cgraph.h index fb41b01..2de98b4 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -172,6 +172,12 @@ public: /* Dump referring in list to FILE. */ void dump_referring (FILE *); + /* Get number of references for this node. */ + inline unsigned get_references_count (void) + { +return ref_list.references ? ref_list.references-length () : 0; + } Probably better called num_references() (like we have num_edge in basic-block.h) @@ -8068,6 +8069,19 @@ it may significantly increase code size (see @option{--param ipcp-unit-growth=@var{value}}). This flag is enabled by default at @option{-O3}. +@item -fipa-icf +@opindex fipa-icf +Perform Identical Code Folding for functions and read-only variables. +The optimization reduces code size and may disturb unwind stacks by replacing +a function by equivalent one with a different name. The optimization works +more effectively with link time optimization enabled. + +Nevertheless the behavior is similar to Gold Linker ICF optimization, GCC ICF +works on different levels and thus the optimizations are not same - there are +equivalences that are found only by GCC and equivalences found only by Gold. + +This flag is enabled by default at @option{-O2}. ... and -Os? +case ARRAY_REF: +case ARRAY_RANGE_REF: + { + x1 = TREE_OPERAND (t1, 0); + x2 = TREE_OPERAND (t2, 0); + y1 = TREE_OPERAND (t1, 1); + y2 = TREE_OPERAND (t2, 1); + + if (!compare_operand (array_ref_low_bound (t1), + array_ref_low_bound (t2))) + return return_false_with_msg (); + if (!compare_operand (array_ref_element_size (t1), + array_ref_element_size (t2))) + return return_false_with_msg (); + if (!compare_operand (x1, x2)) + return return_false_with_msg (); + return compare_operand (y1, y2); + } No need for {...} if there are no local vars. +bool +func_checker::compare_function_decl (tree t1, tree t2) +{ + bool ret = false; + + if (t1 == t2) +return true; + + symtab_node *n1 = symtab_node::get (t1); + symtab_node *n2 = symtab_node::get (t2); + + if (m_ignored_source_nodes != NULL m_ignored_target_nodes != NULL) +{ + ret = m_ignored_source_nodes-contains (n1) + m_ignored_target_nodes-contains (n2); + + if (ret) + return true; +} + + /* If function decl is WEAKREF, we compare targets. */ + cgraph_node *f1 = cgraph_node::get (t1); + cgraph_node *f2 = cgraph_node::get (t2); + + if(f1 f2 f1-weakref f2-weakref) +ret = f1-alias_target == f2-alias_target; + + return ret; Comparing aliases is bit more complicated than just handling weakrefs. I have patch for symtab_node::equivalent_address_p somewhre in queue. lets just drop the fancy stuff for the moment and compare f1f2 for equivalence. + ret = compare_decl (t1, t2); Why functions are not compared with compare_decl while variables are? + + return return_with_debug (ret); +} + +void +func_checker::parse_labels (sem_bb *bb) +{ + for (gimple_stmt_iterator gsi = gsi_start_bb (bb-bb); !gsi_end_p (gsi); + gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + + if (gimple_code (stmt) == GIMPLE_LABEL) + { + tree t = gimple_label_label (stmt); + gcc_assert (TREE_CODE (t) == LABEL_DECL); + + m_label_bb_map.put (t, bb-bb-index); + } +} +} + +/* Basic block equivalence comparison function that returns true if + basic blocks BB1 and BB2 (from functions FUNC1 and FUNC2) correspond. + + In general, a collection of equivalence dictionaries is built for types + like SSA names, declarations (VAR_DECL, PARM_DECL, ..). This infrastructure + is utilized by every statement-by-stament comparison function. */ + +bool +func_checker::compare_bb (sem_bb *bb1, sem_bb *bb2) +{ + unsigned i; + gimple_stmt_iterator gsi1, gsi2; + gimple s1, s2; + + if (bb1-nondbg_stmt_count != bb2-nondbg_stmt_count + || bb1-edge_count != bb2-edge_count) +return return_false (); + + gsi1 = gsi_start_bb (bb1-bb); + gsi2 = gsi_start_bb (bb2-bb); + + for (i = 0; i bb1-nondbg_stmt_count; i++) +{ + if (is_gimple_debug (gsi_stmt (gsi1))) + gsi_next_nondebug (gsi1); + + if (is_gimple_debug (gsi_stmt (gsi2))) + gsi_next_nondebug (gsi2); + + s1 = gsi_stmt (gsi1); + s2 = gsi_stmt (gsi2); + + int eh1 = lookup_stmt_eh_lp_fn + (DECL_STRUCT_FUNCTION (m_source_func_decl), s1); + int eh2 = lookup_stmt_eh_lp_fn + (DECL_STRUCT_FUNCTION (m_target_func_decl), s2); + + if (eh1 != eh2) + return return_false_with_msg (EH regions are different); + + if (gimple_code (s1) != gimple_code (s2)) + return return_false_with_msg (gimple codes are different); + + switch (gimple_code (s1)) + { + case GIMPLE_CALL: +
Re: [PATCH 3/5] IPA ICF pass
On 10/11/2014 10:19 AM, Jan Hubicka wrote: After few days of measurement and tuning, I was able to get numbers to the following shape: Execution times (seconds) phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 1412 kB ( 0%) ggc phase opt and generate : 27.83 (59%) usr 0.66 (19%) sys 28.52 (37%) wall 1028813 kB (24%) ggc phase stream in : 16.90 (36%) usr 0.63 (18%) sys 17.60 (23%) wall 3246453 kB (76%) ggc phase stream out: 2.76 ( 6%) usr 2.19 (63%) sys 31.34 (40%) wall 2 kB ( 0%) ggc callgraph optimization : 0.36 ( 1%) usr 0.00 ( 0%) sys 0.35 ( 0%) wall 40 kB ( 0%) ggc ipa dead code removal : 3.31 ( 7%) usr 0.01 ( 0%) sys 3.25 ( 4%) wall 0 kB ( 0%) ggc ipa virtual call target : 3.69 ( 8%) usr 0.03 ( 1%) sys 3.80 ( 5%) wall 21 kB ( 0%) ggc ipa devirtualization: 0.12 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall 13704 kB ( 0%) ggc ipa cp : 1.11 ( 2%) usr 0.07 ( 2%) sys 1.17 ( 2%) wall 188558 kB ( 4%) ggc ipa inlining heuristics : 8.17 (17%) usr 0.14 ( 4%) sys 8.27 (11%) wall 494738 kB (12%) ggc ipa comdats : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc ipa lto gimple in : 1.86 ( 4%) usr 0.40 (11%) sys 2.20 ( 3%) wall 537970 kB (13%) ggc ipa lto gimple out : 0.19 ( 0%) usr 0.08 ( 2%) sys 0.27 ( 0%) wall 2 kB ( 0%) ggc ipa lto decl in : 12.20 (26%) usr 0.37 (11%) sys 12.64 (16%) wall 2441687 kB (57%) ggc ipa lto decl out: 2.51 ( 5%) usr 0.21 ( 6%) sys 2.71 ( 3%) wall 0 kB ( 0%) ggc ipa lto constructors in : 0.13 ( 0%) usr 0.02 ( 1%) sys 0.17 ( 0%) wall 15692 kB ( 0%) ggc ipa lto constructors out: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc ipa lto cgraph I/O : 0.54 ( 1%) usr 0.09 ( 3%) sys 0.63 ( 1%) wall 407182 kB (10%) ggc ipa lto decl merge : 1.34 ( 3%) usr 0.00 ( 0%) sys 1.34 ( 2%) wall 8220 kB ( 0%) ggc ipa lto cgraph merge: 1.00 ( 2%) usr 0.00 ( 0%) sys 1.00 ( 1%) wall 14605 kB ( 0%) ggc whopr wpa : 0.92 ( 2%) usr 0.00 ( 0%) sys 0.89 ( 1%) wall 1 kB ( 0%) ggc whopr wpa I/O : 0.01 ( 0%) usr 1.90 (55%) sys 28.31 (37%) wall 0 kB ( 0%) ggc whopr partitioning : 2.81 ( 6%) usr 0.01 ( 0%) sys 2.83 ( 4%) wall 4943 kB ( 0%) ggc ipa reference : 1.34 ( 3%) usr 0.00 ( 0%) sys 1.35 ( 2%) wall 0 kB ( 0%) ggc ipa profile : 0.20 ( 0%) usr 0.01 ( 0%) sys 0.21 ( 0%) wall 0 kB ( 0%) ggc ipa pure const : 1.62 ( 3%) usr 0.00 ( 0%) sys 1.63 ( 2%) wall 0 kB ( 0%) ggc ipa icf : 2.65 ( 6%) usr 0.02 ( 1%) sys 2.68 ( 3%) wall 1352 kB ( 0%) ggc inline parameters : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc tree SSA rewrite: 0.11 ( 0%) usr 0.01 ( 0%) sys 0.08 ( 0%) wall 18919 kB ( 0%) ggc tree SSA other : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc tree SSA incremental: 0.24 ( 1%) usr 0.01 ( 0%) sys 0.32 ( 0%) wall 11325 kB ( 0%) ggc tree operand scan : 0.15 ( 0%) usr 0.02 ( 1%) sys 0.18 ( 0%) wall 116283 kB ( 3%) ggc dominance frontiers : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc dominance computation : 0.13 ( 0%) usr 0.01 ( 0%) sys 0.16 ( 0%) wall 0 kB ( 0%) ggc varconst: 0.01 ( 0%) usr 0.02 ( 1%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc loop fini : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc unaccounted todo: 0.55 ( 1%) usr 0.00 ( 0%) sys 0.56 ( 1%) wall 0 kB ( 0%) ggc TOTAL : 47.49 3.4877.46 4276682 kB and I was able to reduce function bodies loaded in WPA to 35% (from previous 55%). The main problem 35% means that 35% of all function bodies are compared with something else? That feels pretty high. but overall numbers are not so terrible. Currently, the pass is able to merge 32K functions. As you know, we group functions to so called classes. According to stats, average non-singular class size contains at the end of comparison 7.39 candidates and we have 5K such functions. Because we load body for each candidate in such groups, it gives us minimum number of loaded bodies: 37K. As we load 70K function, we have still place to improve. But I guess WPA body-less comparison is quite efficient. with speed was hidden in work list for congruence classes, where hash_set was used. I chose the data structure to support delete operation, but it was really slow. Thus, hash_set was replaced with linked list and a flag is used to identify if a set is
Re: [PATCH 3/5] IPA ICF pass
35% means that 35% of all function bodies are compared with something else? That feels pretty high. but overall numbers are not so terrible. Currently, the pass is able to merge 32K functions. As you know, we group functions to so called classes. According to stats, average non-singular class size contains at the end of comparison 7.39 candidates and we have 5K such functions. Because we load body for each candidate in such groups, it gives us minimum number of loaded bodies: 37K. As we load 70K function, we have still place to improve. But I guess WPA body-less comparison is quite efficient. OK, that seems resonable. with speed was hidden in work list for congruence classes, where hash_set was used. I chose the data structure to support delete operation, but it was really slow. Thus, hash_set was replaced with linked list and a flag is used to identify if a set is removed or not. Interesting, I would not expect bottleneck in a congruence solving :) The problem was just the hash_set that showed to be slow data structure for a set of operations needed in congruence solving. I have no clue who complicated can it be to implement release_body function to an operation that really releases the memory? I suppose one can keep the caches from streamer and free trees read. Freeing gimple statemnts, cfg should be relatively easy. Lets however first try to tune the implementation rather than try to this hack implemented. Explicit ggc_free calls traditionally tended to cause some negative reactions wrt memory fragmentation concerns. Agree with suggested approach. In future we actually may keep the duplicated functions in WPA memory and use corresponding body whenever the function is inlined to avoid disturbing debug info more than needed. Honza Markus' problem with -fprofile-use has been removed, IPA-ICF is preceding devirtualization pass. I hope it is fine? Yes, I think devirtualization should actually work better with identical virutal methods merged. We just need to be sure it sees through the newly introduced aliases (there should be no thunks for virutal methods) Thanks, Martin Honza
Re: [PATCH 3/5] IPA ICF pass
On 10/11/2014 02:05 AM, Martin Liška wrote: On 09/26/2014 09:46 PM, Jan Hubicka wrote: Hi, this is on ipa-icf-gimple.c @@ -2827,11 +2829,19 @@ cgraph_node::verify_node (void) { if (verify_edge_corresponds_to_fndecl (e, decl)) { - error (edge points to wrong declaration:); - debug_tree (e-callee-decl); - fprintf (stderr, Instead of:); - debug_tree (decl); - error_found = true; + /* The edge can be redirected in WPA by IPA ICF. +Following check really ensures that it's +not the case. */ + + cgraph_node *current_node = cgraph_node::get (decl); + if (!current_node || !current_node-icf_merged) I would move this into verify_edge_corresponds_to_fndecl. diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c new file mode 100644 index 000..7031eaa --- /dev/null +++ b/gcc/ipa-icf-gimple.c @@ -0,0 +1,384 @@ +/* Interprocedural Identical Code Folding pass + Copyright (C) 2014 Free Software Foundation, Inc. + + Contributed by Jan Hubicka hubi...@ucw.cz and Martin Liska mli...@suse.cz + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +http://www.gnu.org/licenses/. */ Please add toplevel comment about what the code does and how to use it. +namespace ipa_icf { + +/* Basic block equivalence comparison function that returns true if + basic blocks BB1 and BB2 (from functions FUNC1 and FUNC2) correspond. */ ... to each other? I would add short comment that as comparsion goes you build voclabulary of equivalences of variables/ssanames etc. So people reading the code do not get lost at very beggining. + +bool +func_checker::compare_bb (sem_bb *bb1, sem_bb *bb2) +{ + unsigned i; + gimple_stmt_iterator gsi1, gsi2; + gimple s1, s2; + + if (bb1-nondbg_stmt_count != bb2-nondbg_stmt_count + || bb1-edge_count != bb2-edge_count) +return RETURN_FALSE (); The UPPERCASE looks ugly. I see that RETURN_FALSE is a warpper for return_false_with_msg that outputs line and file information. I would make it lowercase even if it is macro. You may consider using CXX_MEM_STAT_INFO style default argument to avoid function macro completely. Probably not big win given that it won't save you from preprocesor mess. + + gsi1 = gsi_start_bb (bb1-bb); + gsi2 = gsi_start_bb (bb2-bb); + + for (i = 0; i bb1-nondbg_stmt_count; i++) +{ + if (is_gimple_debug (gsi_stmt (gsi1))) + gsi_next_nondebug (gsi1); + + if (is_gimple_debug (gsi_stmt (gsi2))) + gsi_next_nondebug (gsi2); + + s1 = gsi_stmt (gsi1); + s2 = gsi_stmt (gsi2); + + if (gimple_code (s1) != gimple_code (s2)) + return RETURN_FALSE_WITH_MSG (gimple codes are different); I think you need to compare EH here. Consider case where one unit is compiled with -fno-exception and thus all EH regions are removed, while other function has EH regions in it. Those are not equivalent. EH region is obtained by lookup_stmt_eh and then you need to comapre them for match as you do with gimple_resx_regoin. + t1 = gimple_call_fndecl (s1); + t2 = gimple_call_fndecl (s2); + + /* Function pointer variables are not supported yet. */ They seems to be, compare_operand seems just right. + +/* Verifies for given GIMPLEs S1 and S2 that + label statements are semantically equivalent. */ + +bool +func_checker::compare_gimple_label (gimple g1, gimple g2) +{ + if (m_ignore_labels) +return true; + + tree t1 = gimple_label_label (g1); + tree t2 = gimple_label_label (g2); + + return compare_tree_ssa_label (t1, t2); +} I would expect the main BB loop to record BB in which label belongs to and the BB assciatio neing checked here. Otherwise I do not see how switch statements are compared to not have different permutations of targets. Also note that one BB may have multiple labels in them and they are equivalent. Also I would punt on occurence of FORCED_LABEL. Those are tricky as they may be passed around and compared for address and no one really defines what should happen. Better to avoid those. Hi. I will remove this support in the pass. + +/* Verifies for given
Re: [PATCH 3/5] IPA ICF pass
After few days of measurement and tuning, I was able to get numbers to the following shape: Execution times (seconds) phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall1412 kB ( 0%) ggc phase opt and generate : 27.83 (59%) usr 0.66 (19%) sys 28.52 (37%) wall 1028813 kB (24%) ggc phase stream in : 16.90 (36%) usr 0.63 (18%) sys 17.60 (23%) wall 3246453 kB (76%) ggc phase stream out: 2.76 ( 6%) usr 2.19 (63%) sys 31.34 (40%) wall 2 kB ( 0%) ggc callgraph optimization : 0.36 ( 1%) usr 0.00 ( 0%) sys 0.35 ( 0%) wall 40 kB ( 0%) ggc ipa dead code removal : 3.31 ( 7%) usr 0.01 ( 0%) sys 3.25 ( 4%) wall 0 kB ( 0%) ggc ipa virtual call target : 3.69 ( 8%) usr 0.03 ( 1%) sys 3.80 ( 5%) wall 21 kB ( 0%) ggc ipa devirtualization: 0.12 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall 13704 kB ( 0%) ggc ipa cp : 1.11 ( 2%) usr 0.07 ( 2%) sys 1.17 ( 2%) wall 188558 kB ( 4%) ggc ipa inlining heuristics : 8.17 (17%) usr 0.14 ( 4%) sys 8.27 (11%) wall 494738 kB (12%) ggc ipa comdats : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc ipa lto gimple in : 1.86 ( 4%) usr 0.40 (11%) sys 2.20 ( 3%) wall 537970 kB (13%) ggc ipa lto gimple out : 0.19 ( 0%) usr 0.08 ( 2%) sys 0.27 ( 0%) wall 2 kB ( 0%) ggc ipa lto decl in : 12.20 (26%) usr 0.37 (11%) sys 12.64 (16%) wall 2441687 kB (57%) ggc ipa lto decl out: 2.51 ( 5%) usr 0.21 ( 6%) sys 2.71 ( 3%) wall 0 kB ( 0%) ggc ipa lto constructors in : 0.13 ( 0%) usr 0.02 ( 1%) sys 0.17 ( 0%) wall 15692 kB ( 0%) ggc ipa lto constructors out: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc ipa lto cgraph I/O : 0.54 ( 1%) usr 0.09 ( 3%) sys 0.63 ( 1%) wall 407182 kB (10%) ggc ipa lto decl merge : 1.34 ( 3%) usr 0.00 ( 0%) sys 1.34 ( 2%) wall8220 kB ( 0%) ggc ipa lto cgraph merge: 1.00 ( 2%) usr 0.00 ( 0%) sys 1.00 ( 1%) wall 14605 kB ( 0%) ggc whopr wpa : 0.92 ( 2%) usr 0.00 ( 0%) sys 0.89 ( 1%) wall 1 kB ( 0%) ggc whopr wpa I/O : 0.01 ( 0%) usr 1.90 (55%) sys 28.31 (37%) wall 0 kB ( 0%) ggc whopr partitioning : 2.81 ( 6%) usr 0.01 ( 0%) sys 2.83 ( 4%) wall4943 kB ( 0%) ggc ipa reference : 1.34 ( 3%) usr 0.00 ( 0%) sys 1.35 ( 2%) wall 0 kB ( 0%) ggc ipa profile : 0.20 ( 0%) usr 0.01 ( 0%) sys 0.21 ( 0%) wall 0 kB ( 0%) ggc ipa pure const : 1.62 ( 3%) usr 0.00 ( 0%) sys 1.63 ( 2%) wall 0 kB ( 0%) ggc ipa icf : 2.65 ( 6%) usr 0.02 ( 1%) sys 2.68 ( 3%) wall1352 kB ( 0%) ggc inline parameters : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc tree SSA rewrite: 0.11 ( 0%) usr 0.01 ( 0%) sys 0.08 ( 0%) wall 18919 kB ( 0%) ggc tree SSA other : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc tree SSA incremental: 0.24 ( 1%) usr 0.01 ( 0%) sys 0.32 ( 0%) wall 11325 kB ( 0%) ggc tree operand scan : 0.15 ( 0%) usr 0.02 ( 1%) sys 0.18 ( 0%) wall 116283 kB ( 3%) ggc dominance frontiers : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc dominance computation : 0.13 ( 0%) usr 0.01 ( 0%) sys 0.16 ( 0%) wall 0 kB ( 0%) ggc varconst: 0.01 ( 0%) usr 0.02 ( 1%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc loop fini : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc unaccounted todo: 0.55 ( 1%) usr 0.00 ( 0%) sys 0.56 ( 1%) wall 0 kB ( 0%) ggc TOTAL : 47.49 3.4877.46 4276682 kB and I was able to reduce function bodies loaded in WPA to 35% (from previous 55%). The main problem 35% means that 35% of all function bodies are compared with something else? That feels pretty high. but overall numbers are not so terrible. with speed was hidden in work list for congruence classes, where hash_set was used. I chose the data structure to support delete operation, but it was really slow. Thus, hash_set was replaced with linked list and a flag is used to identify if a set is removed or not. Interesting, I would not expect bottleneck in a congruence solving :) I have no clue who complicated can it be to implement release_body function to an operation that really releases the memory? I suppose one can keep the caches from streamer and free trees read. Freeing gimple statemnts, cfg should be relatively easy. Lets however first try to tune the implementation rather than try to this hack implemented. Explicit ggc_free calls
Re: [PATCH 3/5] IPA ICF pass
+/* Verifies for given GIMPLEs S1 and S2 that + goto statements are semantically equivalent. */ + +bool +func_checker::compare_gimple_goto (gimple g1, gimple g2) +{ + tree dest1, dest2; + + dest1 = gimple_goto_dest (g1); + dest2 = gimple_goto_dest (g2); + + if (TREE_CODE (dest1) != TREE_CODE (dest2) || TREE_CODE (dest1) != SSA_NAME) +return false; + + return compare_operand (dest1, dest2); You probably need to care only about indirect gotos, the direct ones are checked by CFG compare. So is the condtional jump. It looks that this code is visited quite rare. Hmm, perhaps it is called only for indirect calls, because all others are not represented as statements. + +/* Verifies for given GIMPLEs S1 and S2 that ASM statements are equivalent. + For the beginning, the pass only supports equality for + '__asm__ __volatile__ (, , , memory)'. */ + +bool +func_checker::compare_gimple_asm (gimple g1, gimple g2) +{ + if (gimple_asm_volatile_p (g1) != gimple_asm_volatile_p (g2)) +return false; + + if (gimple_asm_ninputs (g1) || gimple_asm_ninputs (g2)) +return false; + + if (gimple_asm_noutputs (g1) || gimple_asm_noutputs (g2)) +return false; + + if (gimple_asm_nlabels (g1) || gimple_asm_nlabels (g2)) +return false; + + if (gimple_asm_nclobbers (g1) != gimple_asm_nclobbers (g2)) +return false; + + for (unsigned i = 0; i gimple_asm_nclobbers (g1); i++) +{ + tree clobber1 = TREE_VALUE (gimple_asm_clobber_op (g1, i)); + tree clobber2 = TREE_VALUE (gimple_asm_clobber_op (g2, i)); + + if (!operand_equal_p (clobber1, clobber2, OEP_ONLY_CONST)) + return false; +} + Even asm statements with no inputs or outputs can differ by the actual asm statement. Compare it too. Comparing inputs/outputs/labels should be very easy to do. Compare all gimple_asm_n* for equivalency. This makes fully sense, but I don't understand what kind of operands do you mean? You can look some other code dealing with gimple asm statements. You can just compare gimple_op for 0 gimple_num_ops and be ready to deal with TREE_LIST as described bellow. Honza At the end walk operands and watch the case they are TREE_LIST. THen compare TREE_VALUE (op) of the list for operand_equal_p and TREE_VALUE (TREE_PURPOSE (op)) for equivalency (those are the constraints) If they are not (clobbers are not, those are just strings), operand_equal_p should do. + return true; +} + +} // ipa_icf namespace Otherwise I think ipa-gimple-icf is quite fine now. Please send updated version and I think it can go to mainline before the actual ipa-icf. I renamed both files and put them to a newly created namespace ipa_icf_gimple. Thank you, Martin
Re: [PATCH 3/5] IPA ICF pass
Hello. Yeah, you are right. But even Richard advised me to put it to a single place. Maybe we are a bit more strict than it would be necessary. But I hope that's fine ;) OK, lets do extra checking now and play with this incrementally. Good point. Do you mean cases like, foo (alias_foo) and bar (alias_bar). If we prove that foo equals to bar, can we also merge aliases? I am curious if such comparison can really save something? Are there any interesting cases? What probably matters is that you recognize the equivalence to know that uses of alias_foo can be merged with uses alias_bar. Similarly for thunks. Again something to do incrementally I guess. Honza Martin +case INTEGER_CST: + { + ret = types_are_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2)) + wi::to_offset (t1) == wi::to_offset (t2); tree_int_cst_equal +case FIELD_DECL: + { + tree fctx1 = DECL_FCONTEXT (t1); + tree fctx2 = DECL_FCONTEXT (t2); DECL_FCONTEXT has no semantic meaning; so you can skip comparing it. + + tree offset1 = DECL_FIELD_OFFSET (t1); + tree offset2 = DECL_FIELD_OFFSET (t2); + + tree bit_offset1 = DECL_FIELD_BIT_OFFSET (t1); + tree bit_offset2 = DECL_FIELD_BIT_OFFSET (t2); + + ret = compare_operand (fctx1, fctx2) + compare_operand (offset1, offset2) + compare_operand (bit_offset1, bit_offset2); You probably want to compare type here? +case CONSTRUCTOR: + { + unsigned len1 = vec_safe_length (CONSTRUCTOR_ELTS (t1)); + unsigned len2 = vec_safe_length (CONSTRUCTOR_ELTS (t2)); + + if (len1 != len2) +return false; + + for (unsigned i = 0; i len1; i++) +if (!sem_variable::equals (CONSTRUCTOR_ELT (t1, i)-value, + CONSTRUCTOR_ELT (t2, i)-value)) + return false; You want to compare -index, too. +case INTEGER_CST: + return func_checker::types_are_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2), + true) +wi::to_offset (t1) == wi::to_offset (t2); again ;) This is where I stopped for now. Generally the patch seems OK to me with few of these details fixed. Honza
Re: [PATCH 3/5] IPA ICF pass
On 09/28/2014 04:20 AM, Jan Hubicka wrote: Hi. Thank you Markus for presenting numbers, it corresponds with I measured. If I see correctly, IPA ICF pass takes about 7 seconds, the rest is distributed in verifier (not interesting for release version of the compiler) and 'phase opt and generate'. No idea what can make the difference? phase opt and generate just combine all the optimization times together, so it is same 7 seconds as in the ICF pass :) 1GB of function bodies just to elimnate 2-3% of code seems quite alot. Do you have any idea how many of those turns out to be different? It would be nice to be able to release the duplicate bodies from memory after the equivalency was stablished Honza Martin Hello. After few days of measurement and tuning, I was able to get numbers to the following shape: Execution times (seconds) phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 1412 kB ( 0%) ggc phase opt and generate : 27.83 (59%) usr 0.66 (19%) sys 28.52 (37%) wall 1028813 kB (24%) ggc phase stream in : 16.90 (36%) usr 0.63 (18%) sys 17.60 (23%) wall 3246453 kB (76%) ggc phase stream out: 2.76 ( 6%) usr 2.19 (63%) sys 31.34 (40%) wall 2 kB ( 0%) ggc callgraph optimization : 0.36 ( 1%) usr 0.00 ( 0%) sys 0.35 ( 0%) wall 40 kB ( 0%) ggc ipa dead code removal : 3.31 ( 7%) usr 0.01 ( 0%) sys 3.25 ( 4%) wall 0 kB ( 0%) ggc ipa virtual call target : 3.69 ( 8%) usr 0.03 ( 1%) sys 3.80 ( 5%) wall 21 kB ( 0%) ggc ipa devirtualization: 0.12 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall 13704 kB ( 0%) ggc ipa cp : 1.11 ( 2%) usr 0.07 ( 2%) sys 1.17 ( 2%) wall 188558 kB ( 4%) ggc ipa inlining heuristics : 8.17 (17%) usr 0.14 ( 4%) sys 8.27 (11%) wall 494738 kB (12%) ggc ipa comdats : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc ipa lto gimple in : 1.86 ( 4%) usr 0.40 (11%) sys 2.20 ( 3%) wall 537970 kB (13%) ggc ipa lto gimple out : 0.19 ( 0%) usr 0.08 ( 2%) sys 0.27 ( 0%) wall 2 kB ( 0%) ggc ipa lto decl in : 12.20 (26%) usr 0.37 (11%) sys 12.64 (16%) wall 2441687 kB (57%) ggc ipa lto decl out: 2.51 ( 5%) usr 0.21 ( 6%) sys 2.71 ( 3%) wall 0 kB ( 0%) ggc ipa lto constructors in : 0.13 ( 0%) usr 0.02 ( 1%) sys 0.17 ( 0%) wall 15692 kB ( 0%) ggc ipa lto constructors out: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc ipa lto cgraph I/O : 0.54 ( 1%) usr 0.09 ( 3%) sys 0.63 ( 1%) wall 407182 kB (10%) ggc ipa lto decl merge : 1.34 ( 3%) usr 0.00 ( 0%) sys 1.34 ( 2%) wall 8220 kB ( 0%) ggc ipa lto cgraph merge: 1.00 ( 2%) usr 0.00 ( 0%) sys 1.00 ( 1%) wall 14605 kB ( 0%) ggc whopr wpa : 0.92 ( 2%) usr 0.00 ( 0%) sys 0.89 ( 1%) wall 1 kB ( 0%) ggc whopr wpa I/O : 0.01 ( 0%) usr 1.90 (55%) sys 28.31 (37%) wall 0 kB ( 0%) ggc whopr partitioning : 2.81 ( 6%) usr 0.01 ( 0%) sys 2.83 ( 4%) wall 4943 kB ( 0%) ggc ipa reference : 1.34 ( 3%) usr 0.00 ( 0%) sys 1.35 ( 2%) wall 0 kB ( 0%) ggc ipa profile : 0.20 ( 0%) usr 0.01 ( 0%) sys 0.21 ( 0%) wall 0 kB ( 0%) ggc ipa pure const : 1.62 ( 3%) usr 0.00 ( 0%) sys 1.63 ( 2%) wall 0 kB ( 0%) ggc ipa icf : 2.65 ( 6%) usr 0.02 ( 1%) sys 2.68 ( 3%) wall 1352 kB ( 0%) ggc inline parameters : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc tree SSA rewrite: 0.11 ( 0%) usr 0.01 ( 0%) sys 0.08 ( 0%) wall 18919 kB ( 0%) ggc tree SSA other : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc tree SSA incremental: 0.24 ( 1%) usr 0.01 ( 0%) sys 0.32 ( 0%) wall 11325 kB ( 0%) ggc tree operand scan : 0.15 ( 0%) usr 0.02 ( 1%) sys 0.18 ( 0%) wall 116283 kB ( 3%) ggc dominance frontiers : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc dominance computation : 0.13 ( 0%) usr 0.01 ( 0%) sys 0.16 ( 0%) wall 0 kB ( 0%) ggc varconst: 0.01 ( 0%) usr 0.02 ( 1%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc loop fini : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc unaccounted todo: 0.55 ( 1%) usr 0.00 ( 0%) sys 0.56 ( 1%) wall 0 kB ( 0%) ggc TOTAL : 47.49 3.4877.46 4276682 kB and I was able to reduce function bodies loaded in WPA to 35% (from previous 55%). The main problem with speed was hidden in work list for congruence classes, where hash_set was used. I chose the data structure to support delete operation, but it was really slow. Thus, hash_set was replaced with
Re: [PATCH 3/5] IPA ICF pass
On 09/28/2014 03:20 AM, Jan Hubicka wrote: Hi. Thank you Markus for presenting numbers, it corresponds with I measured. If I see correctly, IPA ICF pass takes about 7 seconds, the rest is distributed in verifier (not interesting for release version of the compiler) and 'phase opt and generate'. No idea what can make the difference? phase opt and generate just combine all the optimization times together, so it is same 7 seconds as in the ICF pass :) 1GB of function bodies just to elimnate 2-3% of code seems quite alot. Do you have any idea how many of those turns out to be different? It would be nice to be able to release the duplicate bodies from memory after the equivalency was stablished Honza Martin (I resend the message, my mail client was a bit confused, please do _not_ reply to faktur...@foxlink.cz) Hello. After few days of measurement and tuning, I was able to get numbers to the following shape: Execution times (seconds) phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 1412 kB ( 0%) ggc phase opt and generate : 27.83 (59%) usr 0.66 (19%) sys 28.52 (37%) wall 1028813 kB (24%) ggc phase stream in : 16.90 (36%) usr 0.63 (18%) sys 17.60 (23%) wall 3246453 kB (76%) ggc phase stream out: 2.76 ( 6%) usr 2.19 (63%) sys 31.34 (40%) wall 2 kB ( 0%) ggc callgraph optimization : 0.36 ( 1%) usr 0.00 ( 0%) sys 0.35 ( 0%) wall 40 kB ( 0%) ggc ipa dead code removal : 3.31 ( 7%) usr 0.01 ( 0%) sys 3.25 ( 4%) wall 0 kB ( 0%) ggc ipa virtual call target : 3.69 ( 8%) usr 0.03 ( 1%) sys 3.80 ( 5%) wall 21 kB ( 0%) ggc ipa devirtualization: 0.12 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall 13704 kB ( 0%) ggc ipa cp : 1.11 ( 2%) usr 0.07 ( 2%) sys 1.17 ( 2%) wall 188558 kB ( 4%) ggc ipa inlining heuristics : 8.17 (17%) usr 0.14 ( 4%) sys 8.27 (11%) wall 494738 kB (12%) ggc ipa comdats : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc ipa lto gimple in : 1.86 ( 4%) usr 0.40 (11%) sys 2.20 ( 3%) wall 537970 kB (13%) ggc ipa lto gimple out : 0.19 ( 0%) usr 0.08 ( 2%) sys 0.27 ( 0%) wall 2 kB ( 0%) ggc ipa lto decl in : 12.20 (26%) usr 0.37 (11%) sys 12.64 (16%) wall 2441687 kB (57%) ggc ipa lto decl out: 2.51 ( 5%) usr 0.21 ( 6%) sys 2.71 ( 3%) wall 0 kB ( 0%) ggc ipa lto constructors in : 0.13 ( 0%) usr 0.02 ( 1%) sys 0.17 ( 0%) wall 15692 kB ( 0%) ggc ipa lto constructors out: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc ipa lto cgraph I/O : 0.54 ( 1%) usr 0.09 ( 3%) sys 0.63 ( 1%) wall 407182 kB (10%) ggc ipa lto decl merge : 1.34 ( 3%) usr 0.00 ( 0%) sys 1.34 ( 2%) wall 8220 kB ( 0%) ggc ipa lto cgraph merge: 1.00 ( 2%) usr 0.00 ( 0%) sys 1.00 ( 1%) wall 14605 kB ( 0%) ggc whopr wpa : 0.92 ( 2%) usr 0.00 ( 0%) sys 0.89 ( 1%) wall 1 kB ( 0%) ggc whopr wpa I/O : 0.01 ( 0%) usr 1.90 (55%) sys 28.31 (37%) wall 0 kB ( 0%) ggc whopr partitioning : 2.81 ( 6%) usr 0.01 ( 0%) sys 2.83 ( 4%) wall 4943 kB ( 0%) ggc ipa reference : 1.34 ( 3%) usr 0.00 ( 0%) sys 1.35 ( 2%) wall 0 kB ( 0%) ggc ipa profile : 0.20 ( 0%) usr 0.01 ( 0%) sys 0.21 ( 0%) wall 0 kB ( 0%) ggc ipa pure const : 1.62 ( 3%) usr 0.00 ( 0%) sys 1.63 ( 2%) wall 0 kB ( 0%) ggc ipa icf : 2.65 ( 6%) usr 0.02 ( 1%) sys 2.68 ( 3%) wall 1352 kB ( 0%) ggc inline parameters : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc tree SSA rewrite: 0.11 ( 0%) usr 0.01 ( 0%) sys 0.08 ( 0%) wall 18919 kB ( 0%) ggc tree SSA other : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc tree SSA incremental: 0.24 ( 1%) usr 0.01 ( 0%) sys 0.32 ( 0%) wall 11325 kB ( 0%) ggc tree operand scan : 0.15 ( 0%) usr 0.02 ( 1%) sys 0.18 ( 0%) wall 116283 kB ( 3%) ggc dominance frontiers : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc dominance computation : 0.13 ( 0%) usr 0.01 ( 0%) sys 0.16 ( 0%) wall 0 kB ( 0%) ggc varconst: 0.01 ( 0%) usr 0.02 ( 1%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc loop fini : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc unaccounted todo: 0.55 ( 1%) usr 0.00 ( 0%) sys 0.56 ( 1%) wall 0 kB ( 0%) ggc TOTAL : 47.49 3.4877.46 4276682 kB and I was able to reduce function bodies loaded in WPA to 35% (from previous 55%). The main problem with speed was hidden in work list for congruence classes, where hash_set was used. I chose
Re: [PATCH 3/5] IPA ICF pass
On 09/26/2014 09:46 PM, Jan Hubicka wrote: Hi, this is on ipa-icf-gimple.c @@ -2827,11 +2829,19 @@ cgraph_node::verify_node (void) { if (verify_edge_corresponds_to_fndecl (e, decl)) { - error (edge points to wrong declaration:); - debug_tree (e-callee-decl); - fprintf (stderr, Instead of:); - debug_tree (decl); - error_found = true; + /* The edge can be redirected in WPA by IPA ICF. + Following check really ensures that it's + not the case. */ + + cgraph_node *current_node = cgraph_node::get (decl); + if (!current_node || !current_node-icf_merged) I would move this into verify_edge_corresponds_to_fndecl. diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c new file mode 100644 index 000..7031eaa --- /dev/null +++ b/gcc/ipa-icf-gimple.c @@ -0,0 +1,384 @@ +/* Interprocedural Identical Code Folding pass + Copyright (C) 2014 Free Software Foundation, Inc. + + Contributed by Jan Hubicka hubi...@ucw.cz and Martin Liska mli...@suse.cz + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +http://www.gnu.org/licenses/. */ Please add toplevel comment about what the code does and how to use it. +namespace ipa_icf { + +/* Basic block equivalence comparison function that returns true if + basic blocks BB1 and BB2 (from functions FUNC1 and FUNC2) correspond. */ ... to each other? I would add short comment that as comparsion goes you build voclabulary of equivalences of variables/ssanames etc. So people reading the code do not get lost at very beggining. + +bool +func_checker::compare_bb (sem_bb *bb1, sem_bb *bb2) +{ + unsigned i; + gimple_stmt_iterator gsi1, gsi2; + gimple s1, s2; + + if (bb1-nondbg_stmt_count != bb2-nondbg_stmt_count + || bb1-edge_count != bb2-edge_count) +return RETURN_FALSE (); The UPPERCASE looks ugly. I see that RETURN_FALSE is a warpper for return_false_with_msg that outputs line and file information. I would make it lowercase even if it is macro. You may consider using CXX_MEM_STAT_INFO style default argument to avoid function macro completely. Probably not big win given that it won't save you from preprocesor mess. + + gsi1 = gsi_start_bb (bb1-bb); + gsi2 = gsi_start_bb (bb2-bb); + + for (i = 0; i bb1-nondbg_stmt_count; i++) +{ + if (is_gimple_debug (gsi_stmt (gsi1))) + gsi_next_nondebug (gsi1); + + if (is_gimple_debug (gsi_stmt (gsi2))) + gsi_next_nondebug (gsi2); + + s1 = gsi_stmt (gsi1); + s2 = gsi_stmt (gsi2); + + if (gimple_code (s1) != gimple_code (s2)) + return RETURN_FALSE_WITH_MSG (gimple codes are different); I think you need to compare EH here. Consider case where one unit is compiled with -fno-exception and thus all EH regions are removed, while other function has EH regions in it. Those are not equivalent. EH region is obtained by lookup_stmt_eh and then you need to comapre them for match as you do with gimple_resx_regoin. + t1 = gimple_call_fndecl (s1); + t2 = gimple_call_fndecl (s2); + + /* Function pointer variables are not supported yet. */ They seems to be, compare_operand seems just right. + +/* Verifies for given GIMPLEs S1 and S2 that + label statements are semantically equivalent. */ + +bool +func_checker::compare_gimple_label (gimple g1, gimple g2) +{ + if (m_ignore_labels) +return true; + + tree t1 = gimple_label_label (g1); + tree t2 = gimple_label_label (g2); + + return compare_tree_ssa_label (t1, t2); +} I would expect the main BB loop to record BB in which label belongs to and the BB assciatio neing checked here. Otherwise I do not see how switch statements are compared to not have different permutations of targets. Also note that one BB may have multiple labels in them and they are equivalent. Also I would punt on occurence of FORCED_LABEL. Those are tricky as they may be passed around and compared for address and no one really defines what should happen. Better to avoid those. Hi. I will
Re: [PATCH 3/5] IPA ICF pass
On 09/26/2014 11:27 PM, Jan Hubicka wrote: diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c new file mode 100644 index 000..f3472fe --- /dev/null +++ b/gcc/ipa-icf.c @@ -0,0 +1,2841 @@ +/* Interprocedural Identical Code Folding pass + Copyright (C) 2014 Free Software Foundation, Inc. + + Contributed by Jan Hubicka hubi...@ucw.cz and Martin Liska mli...@suse.cz + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +http://www.gnu.org/licenses/. */ + +/* Interprocedural Identical Code Folding for functions and + read-only variables. + + The goal of this transformation is to discover functions and read-only + variables which do have exactly the same semantics. (or value) + + In case of functions, + we could either create a virtual clone or do a simple function wrapper + that will call equivalent function. If the function is just locally visible, + all function calls can be redirected. For read-only variables, we create + aliases if possible. + + Optimization pass arranges as follows: The optimization pass is arranged as follows: (I guess) I also wonder if the gimple equality code should be in ipa_icf namespace, it is intended to be shared with tail merging pass, so what about just calling it gimple_sem_equality? +/* Verification function for edges E1 and E2. */ + +bool +func_checker::compare_edge (edge e1, edge e2) +{ + if (e1-flags != e2-flags) +return false; In future we may want to experiment with checking that edge probabilities with profile feedback match and refuse to merge BBs with different outgoing probabilities (i.e. +-5%). Just add it as TODO there, please. + +/* Return true if types are compatible from perspective of ICF. */ +bool func_checker::types_are_compatible_p (tree t1, tree t2, Perhaps dropping _are_ would make sense, so we do not have two names for essentially same thing. +bool compare_polymorphic, +bool first_argument) +{ + if (TREE_CODE (t1) != TREE_CODE (t2)) +return RETURN_FALSE_WITH_MSG (different tree types); + + if (!types_compatible_p (t1, t2)) +return RETURN_FALSE_WITH_MSG (types are not compatible); + + if (get_alias_set (t1) != get_alias_set (t2)) +return RETURN_FALSE_WITH_MSG (alias sets are different); You do not need to compare alias sets except for memory operations IMO. Hello. Yeah, you are right. But even Richard advised me to put it to a single place. Maybe we are a bit more strict than it would be necessary. But I hope that's fine ;) + + /* We call contains_polymorphic_type_p with this pointer type. */ + if (first_argument TREE_CODE (t1) == POINTER_TYPE) +{ + t1 = TREE_TYPE (t1); + t2 = TREE_TYPE (t2); +} + + if (compare_polymorphic + (contains_polymorphic_type_p (t1) || contains_polymorphic_type_p (t2))) +{ + if (!contains_polymorphic_type_p (t1) || !contains_polymorphic_type_p (t2)) +return RETURN_FALSE_WITH_MSG (one type is not polymorphic); + + if (TYPE_MAIN_VARIANT (t1) != TYPE_MAIN_VARIANT (t2)) +return RETURN_FALSE_WITH_MSG (type variants are different for + polymorphic type); I added types_must_be_same_for_odr (t1,t2) for you here. +/* Fast equality function based on knowledge known in WPA. */ + +bool +sem_function::equals_wpa (sem_item *item) +{ + gcc_assert (item-type == FUNC); + + m_compared_func = static_castsem_function * (item); + + if (arg_types.length () != m_compared_func-arg_types.length ()) +return RETURN_FALSE_WITH_MSG (different number of arguments); + + /* Checking types of arguments. */ + for (unsigned i = 0; i arg_types.length (); i++) +{ + /* This guard is here for function pointer with attributes (pr59927.c). */ + if (!arg_types[i] || !m_compared_func-arg_types[i]) +return RETURN_FALSE_WITH_MSG (NULL argument type); + + if (!func_checker::types_are_compatible_p (arg_types[i], + m_compared_func-arg_types[i], + true, i == 0)) +return RETURN_FALSE_WITH_MSG (argument type is different); +} + + /* Result type checking. */ + if (!func_checker::types_are_compatible_p (result_type, + m_compared_func-result_type)) +return RETURN_FALSE_WITH_MSG (result types are different); You may want to compare ECF flags, such as nothrow/const/pure. We do not want to merge
Re: [PATCH 3/5] IPA ICF pass
On 2014.09.27 at 01:27 +0200, Jan Hubicka wrote: While a plain Firefox -flto build works fine. LTO/PGO build fails with: lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540 0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*) ../../gcc/gcc/ipa-utils.c:540 0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*) ../../gcc/gcc/ipa-icf.c:753 0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int) ../../gcc/gcc/ipa-icf.c:2706 0xf1c1f4 ipa_icf::sem_item_optimizer::execute() ../../gcc/gcc/ipa-icf.c:2098 0xf1d3f1 ipa_icf_driver ../../gcc/gcc/ipa-icf.c:2784 0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*) ../../gcc/gcc/ipa-icf.c:2831 The pass is also very memory hungry (from 3GB without ICF to 4GB during libxul link), while the code size savings are in the 1% range. Thnks for checking. I was just thinking about doing that myself. Would you mind posting -ftime-report of firefox WPA stage? (without ICF) Execution times (seconds) phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1412 kB ( 0%) ggc phase opt and generate : 58.38 (63%) usr 2.00 (47%) sys 60.37 (40%) wall 403069 kB (12%) ggc phase stream in : 30.24 (33%) usr 0.97 (23%) sys 33.90 (22%) wall 2944210 kB (88%) ggc phase stream out: 4.29 ( 5%) usr 1.32 (31%) sys 57.32 (38%) wall 0 kB ( 0%) ggc phase finalize : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall 0 kB ( 0%) ggc garbage collection : 3.68 ( 4%) usr 0.00 ( 0%) sys 3.68 ( 2%) wall 0 kB ( 0%) ggc callgraph optimization : 0.50 ( 1%) usr 0.00 ( 0%) sys 0.50 ( 0%) wall 166 kB ( 0%) ggc ipa dead code removal : 6.91 ( 7%) usr 0.08 ( 2%) sys 7.25 ( 5%) wall 0 kB ( 0%) ggc ipa virtual call target : 7.08 ( 8%) usr 0.04 ( 1%) sys 6.93 ( 5%) wall 0 kB ( 0%) ggc ipa devirtualization: 0.27 ( 0%) usr 0.00 ( 0%) sys 0.27 ( 0%) wall 10365 kB ( 0%) ggc ipa cp : 1.81 ( 2%) usr 0.06 ( 1%) sys 3.40 ( 2%) wall 173701 kB ( 5%) ggc ipa inlining heuristics : 16.60 (18%) usr 0.27 ( 6%) sys 17.48 (12%) wall 532704 kB (16%) ggc ipa comdats : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall 0 kB ( 0%) ggc ipa lto gimple out : 0.21 ( 0%) usr 0.04 ( 1%) sys 0.97 ( 1%) wall 0 kB ( 0%) ggc ipa lto decl in : 18.29 (20%) usr 0.54 (13%) sys 18.96 (12%) wall 2226088 kB (66%) ggc ipa lto decl out: 3.93 ( 4%) usr 0.13 ( 3%) sys 4.06 ( 3%) wall 0 kB ( 0%) ggc ipa lto constructors in : 0.24 ( 0%) usr 0.03 ( 1%) sys 0.59 ( 0%) wall 14226 kB ( 0%) ggc ipa lto constructors out: 0.08 ( 0%) usr 0.04 ( 1%) sys 0.15 ( 0%) wall 0 kB ( 0%) ggc ipa lto cgraph I/O : 0.89 ( 1%) usr 0.12 ( 3%) sys 1.02 ( 1%) wall 364151 kB (11%) ggc ipa lto decl merge : 2.14 ( 2%) usr 0.01 ( 0%) sys 2.14 ( 1%) wall 8196 kB ( 0%) ggc ipa lto cgraph merge: 1.59 ( 2%) usr 0.00 ( 0%) sys 1.60 ( 1%) wall 12716 kB ( 0%) ggc whopr wpa : 1.54 ( 2%) usr 0.03 ( 1%) sys 1.55 ( 1%) wall 1 kB ( 0%) ggc whopr wpa I/O : 0.04 ( 0%) usr 1.11 (26%) sys 52.10 (34%) wall 0 kB ( 0%) ggc whopr partitioning : 5.02 ( 5%) usr 0.01 ( 0%) sys 5.03 ( 3%) wall 4938 kB ( 0%) ggc ipa reference : 2.04 ( 2%) usr 0.02 ( 0%) sys 2.08 ( 1%) wall 0 kB ( 0%) ggc ipa profile : 0.32 ( 0%) usr 0.00 ( 0%) sys 0.33 ( 0%) wall 0 kB ( 0%) ggc ipa pure const : 2.43 ( 3%) usr 0.02 ( 0%) sys 2.49 ( 2%) wall 0 kB ( 0%) ggc tree STMT verifier : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc callgraph verifier : 16.31 (18%) usr 1.69 (39%) sys 17.96 (12%) wall 0 kB ( 0%) ggc dominance computation : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc varconst: 0.01 ( 0%) usr 0.03 ( 1%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc unaccounted todo: 0.69 ( 1%) usr 0.00 ( 0%) sys 0.69 ( 0%) wall 0 kB ( 0%) ggc TOTAL : 92.91 4.29 151.73 3348693 kB Extra diagnostic checks enabled; compiler may run slowly. Configure with --enable-checking=release to disable checks. (with ICF) Execution times (seconds) phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1412 kB ( 0%) ggc phase opt and generate : 82.70 (70%) usr 3.31 (53%) sys 86.17 (45%) wall 1468975 kB (33%) ggc phase stream in : 30.46 (26%) usr 1.02 (16%) sys 31.48 (16%) wall 2944210 kB (67%) ggc phase stream out: 4.52 ( 4%) usr 1.90 (30%) sys 73.47 (38%) wall 12 kB ( 0%) ggc phase finalize : 0.00 ( 0%) usr 0.00 (
Re: [PATCH 3/5] IPA ICF pass
On 2014.09.27 at 07:59 +0200, Markus Trippelsdorf wrote: It seems that in this case we reject too many of equality candidates? It think the original numbers was about 4-5% but later some equivalences was disabled because of devirt/aliasing issues. Do you compare it with gold ICF enabled? There are quite few obvious improvements to the analysis that can be done, but I guess we need to analyze the interesting cases one by one. Forgot to post the binary size numbers (in bytes): | gold's icf off | gold's icf on | --+++ gcc's icf off |79793880|74881040| --+-+ gcc's icf on |78043608|73612800| --+++ -- Markus
Re: [PATCH 3/5] IPA ICF pass
On 09/27/2014 01:27 AM, Jan Hubicka wrote: While a plain Firefox -flto build works fine. LTO/PGO build fails with: lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540 0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*) ../../gcc/gcc/ipa-utils.c:540 0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*) ../../gcc/gcc/ipa-icf.c:753 0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int) ../../gcc/gcc/ipa-icf.c:2706 0xf1c1f4 ipa_icf::sem_item_optimizer::execute() ../../gcc/gcc/ipa-icf.c:2098 0xf1d3f1 ipa_icf_driver ../../gcc/gcc/ipa-icf.c:2784 0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*) ../../gcc/gcc/ipa-icf.c:2831 The pass is also very memory hungry (from 3GB without ICF to 4GB during libxul link), while the code size savings are in the 1% range. The majority of the problem are groups of candidates that are built according to hash. The hash value is based on a number of arguments, number of BB, number of gimple statements and types of these statements. It groups function into classes. In WPA (before a body of any function is loaded) I get following histogram: Dump after WPA based types groups Congruence classes: 97204 (unique hash values: 88725), with total: 191457 items Class size histogram [num of members]: number of classe number of classess [1]: 86453 classes [2]: 5680 classes [3]: 1541 classes [4]: 915 classes [5]: 446 classes [6]: 346 classes [7]: 200 classes [8]: 181 classes [9]: 154 classes [10]: 109 classes [11]: 87 classes [12]: 87 classes [13]: 68 classes [14]: 58 classes [15]: 58 classes [16]: 41 classes [17]: 25 classes [18]: 33 classes [19]: 28 classes [20]: 25 classes [21]: 19 classes [22]: 30 classes [23]: 24 classes [24]: 33 classes [25]: 17 classes [26]: 15 classes [27]: 10 classes [28]: 13 classes [29]: 18 classes [30]: 10 classes It means that each class with more than one member needs to be iterated and these functions are compared. And yes, there's the root of the problem. I have to load function body to process deep function comparison. As you can see, we have almost 200k function, where more than half each situated in a group with more that one member. So that 1GB extra memory usage is caused by these bodies: Init called for 105004 items (54.84%). Memory footprint can be significantly reduced if one can load the body and release it and the memory is freed. I asked Honza about it, but it looks GGC mechanism cannot be easily forced to release it. Thnks for checking. I was just thinking about doing that myself. Would you mind posting -ftime-report of firefox WPA stage? It seems that in this case we reject too many of equality candidates? It think the original numbers was about 4-5% but later some equivalences was disabled because of devirt/aliasing issues. Do you compare it with gold ICF enabled? There are quite few obvious improvements to the analysis that can be done, but I guess we need to analyze the interesting cases one by one. You are right, the number were quite promising, but during the time, I had to reduce the aggressivity of the pass. As Honza said, it can be improved step-by-step. One thing that Martin can try is to hook into lto-symtab and try to check that the COMDAT functions that are known to be same pass the equality check. I suppose we will learn interesting things this way. Good point, I will try it. Martin I think the patch adds quite important infrastructure for gimple semantic equality checking and function merging. I went through the majority of code and I think it is mostly ready to mainline (i.e. cleaner than what we have in tree-ssa-tailmerge) so hope we can finish the review process next week. We will need to get better cost/benefits ratio to enable it for -O2 that is someting I would really like to see for 5.0, but it seems to be easier to handle this incrementally Thank you for the review, Martin Honza
Re: [PATCH 3/5] IPA ICF pass
On 09/27/2014 07:59 AM, Markus Trippelsdorf wrote: On 2014.09.27 at 01:27 +0200, Jan Hubicka wrote: While a plain Firefox -flto build works fine. LTO/PGO build fails with: lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540 0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*) ../../gcc/gcc/ipa-utils.c:540 0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*) ../../gcc/gcc/ipa-icf.c:753 0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int) ../../gcc/gcc/ipa-icf.c:2706 0xf1c1f4 ipa_icf::sem_item_optimizer::execute() ../../gcc/gcc/ipa-icf.c:2098 0xf1d3f1 ipa_icf_driver ../../gcc/gcc/ipa-icf.c:2784 0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*) ../../gcc/gcc/ipa-icf.c:2831 The pass is also very memory hungry (from 3GB without ICF to 4GB during libxul link), while the code size savings are in the 1% range. Thnks for checking. I was just thinking about doing that myself. Would you mind posting -ftime-report of firefox WPA stage? (without ICF) Execution times (seconds) phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall1412 kB ( 0%) ggc phase opt and generate : 58.38 (63%) usr 2.00 (47%) sys 60.37 (40%) wall 403069 kB (12%) ggc phase stream in : 30.24 (33%) usr 0.97 (23%) sys 33.90 (22%) wall 2944210 kB (88%) ggc phase stream out: 4.29 ( 5%) usr 1.32 (31%) sys 57.32 (38%) wall 0 kB ( 0%) ggc phase finalize : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall 0 kB ( 0%) ggc garbage collection : 3.68 ( 4%) usr 0.00 ( 0%) sys 3.68 ( 2%) wall 0 kB ( 0%) ggc callgraph optimization : 0.50 ( 1%) usr 0.00 ( 0%) sys 0.50 ( 0%) wall 166 kB ( 0%) ggc ipa dead code removal : 6.91 ( 7%) usr 0.08 ( 2%) sys 7.25 ( 5%) wall 0 kB ( 0%) ggc ipa virtual call target : 7.08 ( 8%) usr 0.04 ( 1%) sys 6.93 ( 5%) wall 0 kB ( 0%) ggc ipa devirtualization: 0.27 ( 0%) usr 0.00 ( 0%) sys 0.27 ( 0%) wall 10365 kB ( 0%) ggc ipa cp : 1.81 ( 2%) usr 0.06 ( 1%) sys 3.40 ( 2%) wall 173701 kB ( 5%) ggc ipa inlining heuristics : 16.60 (18%) usr 0.27 ( 6%) sys 17.48 (12%) wall 532704 kB (16%) ggc ipa comdats : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall 0 kB ( 0%) ggc ipa lto gimple out : 0.21 ( 0%) usr 0.04 ( 1%) sys 0.97 ( 1%) wall 0 kB ( 0%) ggc ipa lto decl in : 18.29 (20%) usr 0.54 (13%) sys 18.96 (12%) wall 2226088 kB (66%) ggc ipa lto decl out: 3.93 ( 4%) usr 0.13 ( 3%) sys 4.06 ( 3%) wall 0 kB ( 0%) ggc ipa lto constructors in : 0.24 ( 0%) usr 0.03 ( 1%) sys 0.59 ( 0%) wall 14226 kB ( 0%) ggc ipa lto constructors out: 0.08 ( 0%) usr 0.04 ( 1%) sys 0.15 ( 0%) wall 0 kB ( 0%) ggc ipa lto cgraph I/O : 0.89 ( 1%) usr 0.12 ( 3%) sys 1.02 ( 1%) wall 364151 kB (11%) ggc ipa lto decl merge : 2.14 ( 2%) usr 0.01 ( 0%) sys 2.14 ( 1%) wall8196 kB ( 0%) ggc ipa lto cgraph merge: 1.59 ( 2%) usr 0.00 ( 0%) sys 1.60 ( 1%) wall 12716 kB ( 0%) ggc whopr wpa : 1.54 ( 2%) usr 0.03 ( 1%) sys 1.55 ( 1%) wall 1 kB ( 0%) ggc whopr wpa I/O : 0.04 ( 0%) usr 1.11 (26%) sys 52.10 (34%) wall 0 kB ( 0%) ggc whopr partitioning : 5.02 ( 5%) usr 0.01 ( 0%) sys 5.03 ( 3%) wall4938 kB ( 0%) ggc ipa reference : 2.04 ( 2%) usr 0.02 ( 0%) sys 2.08 ( 1%) wall 0 kB ( 0%) ggc ipa profile : 0.32 ( 0%) usr 0.00 ( 0%) sys 0.33 ( 0%) wall 0 kB ( 0%) ggc ipa pure const : 2.43 ( 3%) usr 0.02 ( 0%) sys 2.49 ( 2%) wall 0 kB ( 0%) ggc tree STMT verifier : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc callgraph verifier : 16.31 (18%) usr 1.69 (39%) sys 17.96 (12%) wall 0 kB ( 0%) ggc dominance computation : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc varconst: 0.01 ( 0%) usr 0.03 ( 1%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc unaccounted todo: 0.69 ( 1%) usr 0.00 ( 0%) sys 0.69 ( 0%) wall 0 kB ( 0%) ggc TOTAL : 92.91 4.29 151.73 3348693 kB Extra diagnostic checks enabled; compiler may run slowly. Configure with --enable-checking=release to disable checks. (with ICF) Execution times (seconds) phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall1412 kB ( 0%) ggc phase opt and generate : 82.70 (70%) usr 3.31 (53%) sys 86.17 (45%) wall 1468975 kB (33%) ggc phase stream in : 30.46 (26%) usr 1.02 (16%) sys 31.48 (16%) wall 2944210 kB (67%) ggc phase stream out: 4.52 ( 4%) usr
Re: [PATCH 3/5] IPA ICF pass
On 09/27/2014 09:47 AM, Markus Trippelsdorf wrote: On 2014.09.27 at 07:59 +0200, Markus Trippelsdorf wrote: It seems that in this case we reject too many of equality candidates? It think the original numbers was about 4-5% but later some equivalences was disabled because of devirt/aliasing issues. Do you compare it with gold ICF enabled? There are quite few obvious improvements to the analysis that can be done, but I guess we need to analyze the interesting cases one by one. Forgot to post the binary size numbers (in bytes): | gold's icf off | gold's icf on | --+++ gcc's icf off |79793880|74881040| --+-+ gcc's icf on |78043608|73612800| --+++ Thanks once more! Gold ICF is quite strong, I will verify what functions are not caught by IPA ICF. These data present that IPA ICF can reduce the binary by 2.19%. I know that it's quite a small improvement, but if you realize that the pass can reduce just the size of .text (and slightly related sections). There are stats about libxul.so (please ignore last 3 columns): Section name Start Size in BSizePortion Disk read in B Disk read Sec. portion 0 0 0.00 B 0.00% 0 0.00 B 0.00% .note.gnu.build-i512 36 36.00 B 0.00% 0 0.00 B 0.00% .dynsym 552 8119279.29 KB 0.08% 0 0.00 B 0.00% .dynstr81744 9085988.73 KB 0.09% 0 0.00 B 0.00% .hash 172608 2175221.24 KB 0.02% 0 0.00 B 0.00% .gnu.version 1943606766 6.61 KB 0.01% 0 0.00 B 0.00% .gnu.version_d201128 56 56.00 B 0.00% 0 0.00 B 0.00% .gnu.version_r2011841216 1.19 KB 0.00% 0 0.00 B 0.00% .rela.dyn 202400 8198208 7.82 MB 8.56% 0 0.00 B 0.00% .rela.plt8400608 7027268.62 KB 0.07% 0 0.00 B 0.00% .init8470880 26 26.00 B 0.00% 0 0.00 B 0.00% .plt 8470912 4686445.77 KB 0.05% 0 0.00 B 0.00% .text85177763901433337.21 MB 40.72% 0 0.00 B 0.00% .fini 47532112 9 9.00 B 0.00% 0 0.00 B 0.00% .rodata 475322881525856014.55 MB 15.93% 0 0.00 B 0.00% .eh_frame 62790848 6203564 5.92 MB 6.47% 0 0.00 B 0.00% .eh_frame_hdr 68994412 1088012 1.04 MB 1.14% 0 0.00 B 0.00% .tbss 70082560 4 4.00 B 0.00% 0 0.00 B 0.00% .dynamic700825601104 1.08 KB 0.00% 0 0.00 B 0.00% .got700836641384 1.35 KB 0.00% 0 0.00 B 0.00% .got.plt70085048 2344822.90 KB 0.02% 0 0.00 B 0.00% .data 70108544 811616 792.59 KB 0.85% 0 0.00 B 0.00% .jcr70920160 8 8.00 B 0.00% 0 0.00 B 0.00% .tm_clone_table 70920168 0 0.00 B 0.00% 0 0.00 B 0.00% .fini_array 70920168 8 8.00 B 0.00% 0 0.00 B 0.00% .init_array 70920176 16 16.00 B 0.00% 0 0.00 B 0.00% .data.rel.ro.loca 70920192 3938880 3.76 MB 4.11% 0 0.00 B 0.00% .data.rel.ro74859072 269216 262.91 KB 0.28% 0 0.00 B 0.00% .bss75128320 1844246 1.76 MB 1.92% 0 0.00 B 0.00% .debug_line 75128288 517517.00 B 0.00% 0 0.00 B 0.00% .debug_info 75128805 817817.00 B 0.00% 0 0.00 B 0.00% .debug_abbrev 75129622 438438.00 B 0.00% 0
Re: [PATCH 3/5] IPA ICF pass
Hi. Thank you Markus for presenting numbers, it corresponds with I measured. If I see correctly, IPA ICF pass takes about 7 seconds, the rest is distributed in verifier (not interesting for release version of the compiler) and 'phase opt and generate'. No idea what can make the difference? phase opt and generate just combine all the optimization times together, so it is same 7 seconds as in the ICF pass :) 1GB of function bodies just to elimnate 2-3% of code seems quite alot. Do you have any idea how many of those turns out to be different? It would be nice to be able to release the duplicate bodies from memory after the equivalency was stablished Honza Martin
Re: [PATCH 3/5] IPA ICF pass
On 07/17/2014 05:05 PM, Martin Liška wrote: On 07/06/2014 12:53 AM, Jan Hubicka wrote: On Fri, 20 Jun 2014, Trevor Saunders wrote: +@item -fipa-icf +@opindex fipa-icf +Perform Identical Code Folding for functions and read-only variables. I would perhaps explicitly say that the optimizations reduce code size and may disturb unwind stacks by replacing a function by equivalent one with different name. +Behavior is similar to Gold Linker ICF optimization. Symbols proved Perhaps tell a bit more here. The optimization works more effectively with link time optimization enabled and that the Gold and GCC ICF works on different levels and thus are not equivalent optimizations - there are equivallences that are found only by GCC and equivalences found only by Gold. +as semantically equivalent are redirected to corresponding symbol. The pass +sensitively decides for usage of alias, thunk or local redirection. +This flag is enabled by default at @option{-O2}. Probably at -Os too. I found this a bit hard to read/understand. Perhaps first describe what it does and then, before This flag is enabled... note that This is similar to the ICF optimization performed by the Gold linker. Symbols proved (plural) vs to corresponding symbol seems to miss an an a as in a corresponding symbol. Alas, how is that one determined? Is this more ...are merged into one, from the user's perspective? What does it mean to sensitively decide for usage of alias, thunk, or local redirection? I think this is just a technical detail of the implementation. I would not put that into user manual. It means that for some functions you can make alias, for others you need thunk (so addresses stay different) Gerald Hello, there's updated version of patch that newly uses devirtualization machinery to identify polymorphic types that can potentially break ICF (There are such examples in Firefox). Apart from that, I did many small updates, incorporated Trevor's comments and I tried to improve documentation entry for the pass. Patch has been tested for Firefox and Inkscape with LTO. Thanks, Martin Hello. After couple of weeks I spent with fixing new issues connected to the pass: 1) Inliner failed in case I created a thunk and release body of a function. In such situation we need to preserve DECL_ARGUMENTS. I added new argument for: cgraph_node::release_body. 2) Awkward error was hidden in libstdc++ test for trees, there were two functions having one argument that differs in one sub-template. Thank to Richard who helped me to fix alias set accuracy. 3) There was missing comparison for FIELD_DECLS (DECL_FIELD_BIT_OFFSET) which caused me miscompilation. 4) After discussion with Honza, we introduced new cgraph_node flag called icf_merged. The flag helps to fix verifier in cgraph_node::verify. Current version of the patch can bootstrap on x86_64-linux. With following patch applied, there's not testcase regression. I tried to build Firefox, Inkscape, GIMP and Chromium with LTO and patch applied and no regression has been observed. Moreover, I discussed with Richard and the pass is capable of playing role in tree-ssa-tail-merge (according to first experiments). It can replace current usage of value numbering. I hope we can apply the patch to the mainline in a short-term time window? Thank you, Martin From 53d20d0b0c209b50d385ee8d85d5a7ed4594d477 Mon Sep 17 00:00:00 2001 From: mliska mli...@suse.cz Date: Fri, 26 Sep 2014 13:51:47 +0200 Subject: [PATCH 1/3] IPA ICF: patch1 --- gcc/Makefile.in |2 + gcc/cgraph.c | 20 +- gcc/cgraph.h |2 + gcc/cgraphunit.c |2 +- gcc/common.opt | 12 + gcc/doc/invoke.texi | 16 +- gcc/ipa-icf-gimple.c | 384 +++ gcc/ipa-icf.c| 2841 ++ gcc/ipa-icf.h| 803 ++ gcc/lto-cgraph.c |2 + gcc/lto-section-in.c |3 +- gcc/lto-streamer.h |1 + gcc/opts.c |6 + gcc/passes.def |1 + gcc/timevar.def |1 + gcc/tree-pass.h |1 + 16 files changed, 4089 insertions(+), 8 deletions(-) create mode 100644 gcc/ipa-icf-gimple.c create mode 100644 gcc/ipa-icf.c create mode 100644 gcc/ipa-icf.h diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 3dd9d8f..8d02425 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1265,6 +1265,8 @@ OBJS = \ ipa-profile.o \ ipa-prop.o \ ipa-pure-const.o \ + ipa-icf.o \ + ipa-icf-gimple.o \ ipa-reference.o \ ipa-ref.o \ ipa-utils.o \ diff --git a/gcc/cgraph.c b/gcc/cgraph.c index fdcaf79..439db49 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -1913,6 +1913,8 @@ cgraph_node::dump (FILE *f) fprintf (f, only_called_at_exit); if (tm_clone) fprintf (f, tm_clone); + if (icf_merged) +fprintf (f, icf_merged); if (DECL_STATIC_CONSTRUCTOR (decl)) fprintf (f, static_constructor (priority:%i), get_init_priority ()); if (DECL_STATIC_DESTRUCTOR (decl))
Re: [PATCH 3/5] IPA ICF pass
On 2014.09.26 at 14:20 +0200, Martin Liška wrote: After couple of weeks I spent with fixing new issues connected to the pass: 1) Inliner failed in case I created a thunk and release body of a function. In such situation we need to preserve DECL_ARGUMENTS. I added new argument for: cgraph_node::release_body. 2) Awkward error was hidden in libstdc++ test for trees, there were two functions having one argument that differs in one sub-template. Thank to Richard who helped me to fix alias set accuracy. 3) There was missing comparison for FIELD_DECLS (DECL_FIELD_BIT_OFFSET) which caused me miscompilation. 4) After discussion with Honza, we introduced new cgraph_node flag called icf_merged. The flag helps to fix verifier in cgraph_node::verify. Current version of the patch can bootstrap on x86_64-linux. With following patch applied, there's not testcase regression. I tried to build Firefox, Inkscape, GIMP and Chromium with LTO and patch applied and no regression has been observed. While a plain Firefox -flto build works fine. LTO/PGO build fails with: lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540 0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*) ../../gcc/gcc/ipa-utils.c:540 0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*) ../../gcc/gcc/ipa-icf.c:753 0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int) ../../gcc/gcc/ipa-icf.c:2706 0xf1c1f4 ipa_icf::sem_item_optimizer::execute() ../../gcc/gcc/ipa-icf.c:2098 0xf1d3f1 ipa_icf_driver ../../gcc/gcc/ipa-icf.c:2784 0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*) ../../gcc/gcc/ipa-icf.c:2831 The pass is also very memory hungry (from 3GB without ICF to 4GB during libxul link), while the code size savings are in the 1% range. -- Markus
Re: [PATCH 3/5] IPA ICF pass
Hi, this is on ipa-icf-gimple.c @@ -2827,11 +2829,19 @@ cgraph_node::verify_node (void) { if (verify_edge_corresponds_to_fndecl (e, decl)) { - error (edge points to wrong declaration:); - debug_tree (e-callee-decl); - fprintf (stderr, Instead of:); - debug_tree (decl); - error_found = true; + /* The edge can be redirected in WPA by IPA ICF. +Following check really ensures that it's +not the case. */ + + cgraph_node *current_node = cgraph_node::get (decl); + if (!current_node || !current_node-icf_merged) I would move this into verify_edge_corresponds_to_fndecl. diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c new file mode 100644 index 000..7031eaa --- /dev/null +++ b/gcc/ipa-icf-gimple.c @@ -0,0 +1,384 @@ +/* Interprocedural Identical Code Folding pass + Copyright (C) 2014 Free Software Foundation, Inc. + + Contributed by Jan Hubicka hubi...@ucw.cz and Martin Liska mli...@suse.cz + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +http://www.gnu.org/licenses/. */ Please add toplevel comment about what the code does and how to use it. +namespace ipa_icf { + +/* Basic block equivalence comparison function that returns true if + basic blocks BB1 and BB2 (from functions FUNC1 and FUNC2) correspond. */ ... to each other? I would add short comment that as comparsion goes you build voclabulary of equivalences of variables/ssanames etc. So people reading the code do not get lost at very beggining. + +bool +func_checker::compare_bb (sem_bb *bb1, sem_bb *bb2) +{ + unsigned i; + gimple_stmt_iterator gsi1, gsi2; + gimple s1, s2; + + if (bb1-nondbg_stmt_count != bb2-nondbg_stmt_count + || bb1-edge_count != bb2-edge_count) +return RETURN_FALSE (); The UPPERCASE looks ugly. I see that RETURN_FALSE is a warpper for return_false_with_msg that outputs line and file information. I would make it lowercase even if it is macro. You may consider using CXX_MEM_STAT_INFO style default argument to avoid function macro completely. Probably not big win given that it won't save you from preprocesor mess. + + gsi1 = gsi_start_bb (bb1-bb); + gsi2 = gsi_start_bb (bb2-bb); + + for (i = 0; i bb1-nondbg_stmt_count; i++) +{ + if (is_gimple_debug (gsi_stmt (gsi1))) + gsi_next_nondebug (gsi1); + + if (is_gimple_debug (gsi_stmt (gsi2))) + gsi_next_nondebug (gsi2); + + s1 = gsi_stmt (gsi1); + s2 = gsi_stmt (gsi2); + + if (gimple_code (s1) != gimple_code (s2)) + return RETURN_FALSE_WITH_MSG (gimple codes are different); I think you need to compare EH here. Consider case where one unit is compiled with -fno-exception and thus all EH regions are removed, while other function has EH regions in it. Those are not equivalent. EH region is obtained by lookup_stmt_eh and then you need to comapre them for match as you do with gimple_resx_regoin. + t1 = gimple_call_fndecl (s1); + t2 = gimple_call_fndecl (s2); + + /* Function pointer variables are not supported yet. */ They seems to be, compare_operand seems just right. + +/* Verifies for given GIMPLEs S1 and S2 that + label statements are semantically equivalent. */ + +bool +func_checker::compare_gimple_label (gimple g1, gimple g2) +{ + if (m_ignore_labels) +return true; + + tree t1 = gimple_label_label (g1); + tree t2 = gimple_label_label (g2); + + return compare_tree_ssa_label (t1, t2); +} I would expect the main BB loop to record BB in which label belongs to and the BB assciatio neing checked here. Otherwise I do not see how switch statements are compared to not have different permutations of targets. Also note that one BB may have multiple labels in them and they are equivalent. Also I would punt on occurence of FORCED_LABEL. Those are tricky as they may be passed around and compared for address and no one really defines what should happen. Better to avoid those. + +/* Verifies for given GIMPLEs S1 and S2 that + switch statements are semantically equivalent. */ + +bool +func_checker::compare_gimple_switch (gimple g1, gimple
Re: [PATCH 3/5] IPA ICF pass
diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c new file mode 100644 index 000..f3472fe --- /dev/null +++ b/gcc/ipa-icf.c @@ -0,0 +1,2841 @@ +/* Interprocedural Identical Code Folding pass + Copyright (C) 2014 Free Software Foundation, Inc. + + Contributed by Jan Hubicka hubi...@ucw.cz and Martin Liska mli...@suse.cz + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +http://www.gnu.org/licenses/. */ + +/* Interprocedural Identical Code Folding for functions and + read-only variables. + + The goal of this transformation is to discover functions and read-only + variables which do have exactly the same semantics. (or value) + + In case of functions, + we could either create a virtual clone or do a simple function wrapper + that will call equivalent function. If the function is just locally visible, + all function calls can be redirected. For read-only variables, we create + aliases if possible. + + Optimization pass arranges as follows: The optimization pass is arranged as follows: (I guess) I also wonder if the gimple equality code should be in ipa_icf namespace, it is intended to be shared with tail merging pass, so what about just calling it gimple_sem_equality? +/* Verification function for edges E1 and E2. */ + +bool +func_checker::compare_edge (edge e1, edge e2) +{ + if (e1-flags != e2-flags) +return false; In future we may want to experiment with checking that edge probabilities with profile feedback match and refuse to merge BBs with different outgoing probabilities (i.e. +-5%). Just add it as TODO there, please. + +/* Return true if types are compatible from perspective of ICF. */ +bool func_checker::types_are_compatible_p (tree t1, tree t2, Perhaps dropping _are_ would make sense, so we do not have two names for essentially same thing. +bool compare_polymorphic, +bool first_argument) +{ + if (TREE_CODE (t1) != TREE_CODE (t2)) +return RETURN_FALSE_WITH_MSG (different tree types); + + if (!types_compatible_p (t1, t2)) +return RETURN_FALSE_WITH_MSG (types are not compatible); + + if (get_alias_set (t1) != get_alias_set (t2)) +return RETURN_FALSE_WITH_MSG (alias sets are different); You do not need to compare alias sets except for memory operations IMO. + + /* We call contains_polymorphic_type_p with this pointer type. */ + if (first_argument TREE_CODE (t1) == POINTER_TYPE) +{ + t1 = TREE_TYPE (t1); + t2 = TREE_TYPE (t2); +} + + if (compare_polymorphic + (contains_polymorphic_type_p (t1) || contains_polymorphic_type_p (t2))) +{ + if (!contains_polymorphic_type_p (t1) || !contains_polymorphic_type_p (t2)) + return RETURN_FALSE_WITH_MSG (one type is not polymorphic); + + if (TYPE_MAIN_VARIANT (t1) != TYPE_MAIN_VARIANT (t2)) + return RETURN_FALSE_WITH_MSG (type variants are different for + polymorphic type); I added types_must_be_same_for_odr (t1,t2) for you here. +/* Fast equality function based on knowledge known in WPA. */ + +bool +sem_function::equals_wpa (sem_item *item) +{ + gcc_assert (item-type == FUNC); + + m_compared_func = static_castsem_function * (item); + + if (arg_types.length () != m_compared_func-arg_types.length ()) +return RETURN_FALSE_WITH_MSG (different number of arguments); + + /* Checking types of arguments. */ + for (unsigned i = 0; i arg_types.length (); i++) +{ + /* This guard is here for function pointer with attributes (pr59927.c). */ + if (!arg_types[i] || !m_compared_func-arg_types[i]) + return RETURN_FALSE_WITH_MSG (NULL argument type); + + if (!func_checker::types_are_compatible_p (arg_types[i], + m_compared_func-arg_types[i], + true, i == 0)) + return RETURN_FALSE_WITH_MSG (argument type is different); +} + + /* Result type checking. */ + if (!func_checker::types_are_compatible_p (result_type, + m_compared_func-result_type)) +return RETURN_FALSE_WITH_MSG (result types are different); You may want to compare ECF flags, such as nothrow/const/pure. We do not want to merge non-pure function into pure as it may not be pure in the context it is used. Do you compare attributes? I think optimize attribute needs to be matched. + /* Checking function arguments. */ attributes. + tree decl1 = DECL_ATTRIBUTES
Re: [PATCH 3/5] IPA ICF pass
While a plain Firefox -flto build works fine. LTO/PGO build fails with: lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540 0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*) ../../gcc/gcc/ipa-utils.c:540 0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*) ../../gcc/gcc/ipa-icf.c:753 0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int) ../../gcc/gcc/ipa-icf.c:2706 0xf1c1f4 ipa_icf::sem_item_optimizer::execute() ../../gcc/gcc/ipa-icf.c:2098 0xf1d3f1 ipa_icf_driver ../../gcc/gcc/ipa-icf.c:2784 0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*) ../../gcc/gcc/ipa-icf.c:2831 The pass is also very memory hungry (from 3GB without ICF to 4GB during libxul link), while the code size savings are in the 1% range. Thnks for checking. I was just thinking about doing that myself. Would you mind posting -ftime-report of firefox WPA stage? It seems that in this case we reject too many of equality candidates? It think the original numbers was about 4-5% but later some equivalences was disabled because of devirt/aliasing issues. Do you compare it with gold ICF enabled? There are quite few obvious improvements to the analysis that can be done, but I guess we need to analyze the interesting cases one by one. One thing that Martin can try is to hook into lto-symtab and try to check that the COMDAT functions that are known to be same pass the equality check. I suppose we will learn interesting things this way. I think the patch adds quite important infrastructure for gimple semantic equality checking and function merging. I went through the majority of code and I think it is mostly ready to mainline (i.e. cleaner than what we have in tree-ssa-tailmerge) so hope we can finish the review process next week. We will need to get better cost/benefits ratio to enable it for -O2 that is someting I would really like to see for 5.0, but it seems to be easier to handle this incrementally Honza
Re: [PATCH 3/5] IPA ICF pass
On Fri, 20 Jun 2014, Trevor Saunders wrote: +@item -fipa-icf +@opindex fipa-icf +Perform Identical Code Folding for functions and read-only variables. +Behavior is similar to Gold Linker ICF optimization. Symbols proved +as semantically equivalent are redirected to corresponding symbol. The pass +sensitively decides for usage of alias, thunk or local redirection. +This flag is enabled by default at @option{-O2}. I found this a bit hard to read/understand. Perhaps first describe what it does and then, before This flag is enabled... note that This is similar to the ICF optimization performed by the Gold linker. Symbols proved (plural) vs to corresponding symbol seems to miss an an a as in a corresponding symbol. Alas, how is that one determined? Is this more ...are merged into one, from the user's perspective? What does it mean to sensitively decide for usage of alias, thunk, or local redirection? Gerald
Re: [PATCH 3/5] IPA ICF pass
On Fri, 20 Jun 2014, Trevor Saunders wrote: +@item -fipa-icf +@opindex fipa-icf +Perform Identical Code Folding for functions and read-only variables. I would perhaps explicitly say that the optimizations reduce code size and may disturb unwind stacks by replacing a function by equivalent one with different name. +Behavior is similar to Gold Linker ICF optimization. Symbols proved Perhaps tell a bit more here. The optimization works more effectively with link time optimization enabled and that the Gold and GCC ICF works on different levels and thus are not equivalent optimizations - there are equivallences that are found only by GCC and equivalences found only by Gold. +as semantically equivalent are redirected to corresponding symbol. The pass +sensitively decides for usage of alias, thunk or local redirection. +This flag is enabled by default at @option{-O2}. Probably at -Os too. I found this a bit hard to read/understand. Perhaps first describe what it does and then, before This flag is enabled... note that This is similar to the ICF optimization performed by the Gold linker. Symbols proved (plural) vs to corresponding symbol seems to miss an an a as in a corresponding symbol. Alas, how is that one determined? Is this more ...are merged into one, from the user's perspective? What does it mean to sensitively decide for usage of alias, thunk, or local redirection? I think this is just a technical detail of the implementation. I would not put that into user manual. It means that for some functions you can make alias, for others you need thunk (so addresses stay different) Gerald
Re: [PATCH 3/5] IPA ICF pass
diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c new file mode 100644 index 000..8a13dca --- /dev/null +++ b/gcc/ipa-icf.c +/* Itializes internal structures according to given number of initialize + if (is_a_helpercgraph_node *::test (node)) shouldn't you just use is_acgraph_node* (node) ? +sem_item_optimizer::filter_removed_items (void) +{ + vec sem_item * filtered; + filtered.create (m_items.length()); use auto_vec here? + m_items.release (); + + for (unsigned int i = 0; i filtered.length(); i++) +m_items.safe_push (filtered[i]); hrm, maybe adding vec::swap makes sense. + if (c-members.length() 1) + { + vec sem_item * new_vector; + new_vector.create (c-members.length ()); same comment about auto_vec and swap. +bool +sem_item_optimizer::release_split_map (__attribute__((__unused__)) congruence_class * +const cls, +__attribute__((__unused__)) bitmap const b, that one is definitly used +__attribute__((__unused__)) traverse_split_pair *pair) Why can't you just leave the arguments unnamed to fix the warning? +sem_item_optimizer::do_congruence_step_for_index (congruence_class *cls, +unsigned int index) +{ + hash_map congruence_class *, bitmap *split_map = + new hash_map congruence_class *, bitmap (); why aren't you putting the hash_map on the stack? in any case it looks like you fail to delete it when your done with it. diff --git a/gcc/ipa-icf.h b/gcc/ipa-icf.h new file mode 100644 index 000..d328dd6 --- /dev/null +++ b/gcc/ipa-icf.h @@ -0,0 +1,743 @@ +/* Prints string STRING to a FILE with a given number of SPACE_COUNT. */ +#define FPUTS_SPACES(file, space_count, string) \ + do \ + { \ +fprintf (file, %*s string, space_count, ); \ seems like you could do this with a static inline function. +/* Prints a MESSAGE to dump_file if exists. */ +#define SE_DUMP_MESSAGE(message) \ + do \ + { \ +if (dump_file (dump_flags TDF_DETAILS)) \ + fprintf (dump_file, debug message: %s (%s:%u)\n, message, __func__, __LINE__); \ a inline function that used the builtins like the memory statis stuff might be slightly less ugly imho +/* Logs a MESSAGE to dump_file if exists and returns false. */ +#define SE_EXIT_FALSE_WITH_MSG(message) \ + do \ + { \ +if (dump_file (dump_flags TDF_DETAILS)) \ + fprintf (dump_file, false returned: '%s' (%s:%u)\n, message, __func__, __LINE__); \ +return false; \ + } \ + while (false); ugh macros that effect control flow, instead maybe define a inline function that does the logging and then returns the value you want your function to return so you can write if (whatever) return SE_LOG (NULL, something or other); ? +/* Forward declaration for sem_func class. */ +class sem_item; comment is wrong, and imho useless. + /* Initializes internal structures according to given number of + source and target SSA names. The number of source names is SSA_SOURCE, + respectively SSA_TARGET. */ + func_checker (unsigned ssa_source, unsigned sss_target); ssa_target? + /* Source to target edge map. */ + hash_map edge, edge *m_edge_map; is there a reason to not embedd these in the object? you seem to create them in the ctor and delete them in the dtor, so I expect they have teh same lifetime. +/* Basic block struct for sematic equality pass. */ semantic? +typedef struct sem_bb you don't need the typedef in C++ + /* Item type. */ + enum sem_item_type type; loose the enum keyword since you don't need it? +class sem_function: public sem_item +{ + /* COMPARED_FUNC is a function that we compare to. */ + sem_function *m_compared_func; this feels like a weird place for this, would func_checker maybe make more sense as a place to put it? +class sem_variable: public sem_item +{ + /* Initializes references to other semantic functions/variables. */ + inline virtual void init_refs () iirc defining with in a class definition implies inline. +typedef struct congruence_class_group +{ + hashval_t hash; + sem_item_type type; + vec congruence_class * classes; +} congruence_class_group_t; lose the typedef? + /* Returns true if a congruence class CLS is presented in worklist. */ s/presented/present/ ? Trev
Re: [PATCH 3/5] IPA ICF pass
On 06/26/14 12:46, Jan Hubicka wrote: So you've added this at -O2, what is the general compile-time impact? Would it make more sense to instead have it be part of -O3, particularly since ICF is rarely going to improve performance (sans icache issues). I think code size optimization not sacrifying any (or too much of) performance are generally very welcome at -O2. Compared to LLVM and Microsoft compilers we are on code bloat side at -O2. I'm not so much worried about runtime performance here, but compile-time performance. ICF would seem like a general win as we're likely going to be more icache efficient. So, just to be clear, if the compile-time impact is minimal, then I'll fully support -O2, but my worry is that it won't be minimal :( Is returning TRUE really the conservatively correct thing to do in the absence of aliasing information? Isn't that case really I don't know in which case the proper return value is FALSE? I think with -fno-strict-aliasing the set should be 0 (Richi?) and thus we can compare for equality. We probably can be on agressive side and let 0 alias set prevail the non-0. But that can be done incrementally. I'd think it should be zero in the -fno-strict-aliasing case. My concern was that in the -fno-strict-aliasing case we seems to always assume the objects are the same. Is that the safe thing to do? That hints that the comment for the function needs tweaking. It really doesn't say anything about what the return values really mean. There are few, like we can ignore weak or visibility attribute because we do produce alias with proper visibility anyway. My plan is to start removing those attributes from declarations once they are turned into suitable representation in symbol table (or for attributes like const/noreturn/pure where we have explicit decl flags). This will make our life bit easier later, too. We probably then can whitelist some attributes, but I would deal with this later. Sure, I don't mind going with the conservative approach and iterating to remove some of the limitations. Yep, there are no resonable orders on it. If function starts same in source code they ought to end up same here. Plan was to first match for exact equality and then play with smarter tricks here. Yea, that's fine too. It's conservatively correct and we may find that there just isn't much to gain by doing hte dfs walk to build indices and such for the CFG. I guess ultimately I just want someone to look at that issue and evaluate if what we're doing now is good enough or if we're missing most of the benefit because of something like bb index stability or something dumb like that. Yep, the pass has grown up to be rather long. The gimple equality testing is the main body of work, so perhaps doing this in separate file is good idea. I'd certainly like to see that happen both because of its size and because I think those bits are useful in other contexts. Overall, most of the stuff looks quite reasonable. jeff
Re: [PATCH 3/5] IPA ICF pass
On 06/24/2014 10:31 PM, Jeff Law wrote: On 06/13/14 04:44, mliska wrote: Hello, this is core of IPA ICF patchset. It adds new pass and registers all needed stuff related to a newly introduced interprocedural optimization. Algorithm description: In LGEN, we visit all read-only variables and functions. For each symbol, a hash value based on e.g. number of arguments, number of BB, GIMPLE CODES is computed (similar hash is computed for read-only variables). This kind of information is streamed for LTO. In WPA, we build congruence classes for all symbols having a same hash value. For functions, these classes are subdivided in WPA by argument type comparison. Each reference (a call or a variable reference) to another semantic item candidate is marked and stored for further congruence class reduction (similar algorithm as Value Numbering: www.cs.ucr.edu/~gupta/teaching/553-07/Papers/value.pdf). For every congruence class of functions with more than one semantic function, we load function body. Having this information, we can process complete semantic function equality and subdivide such congruence class. Read-only variable class members are also deeply compared. After that, we process Value numbering algorithm to do a final subdivision. Finally, all items belonging to a congruence class with more than one item are merged. Martin Changelog: 2014-06-13 Martin Liska mli...@suse.cz Jan Hubicka hubi...@ucw.cz * Makefile.in: New pass object file added. * common.opt: New -fipa-icf flag introduced. * doc/invoke.texi: Documentation enhanced for the pass. * lto-section-in.c: New LTO section for a summary created by IPA-ICF. * lto-streamer.h: New section name introduced. * opts.c: Optimization is added to -O2. * passes.def: New pass added. * timevar.def: New time var for IPA-ICF. * tree-pass.h: Pass construction function. * ipa-icf.h: New pass header file added. * ipa-icf.c: New pass source file added. Hi Jeff, I must agree that the implementation of the patch is quite big. Suggested split makes sense, I'll do it. You'll note many of my comments are do you need to You may in fact be handling that stuff correctly, they're just things I'd like you to verify are properly handled. If they're properly handled just say so :-) At a high level, I think this needs to be broken down a bit more. We've got two high level concepts in ipa-icf. One is all the equivalence testing the other is using that information for the icf optimization. Splitting out the equivalence testing seems like a good thing to do as there's other contexts where it would be useful. Overall I think you're on the right path and we just need to iterate a bit on this part of the patchset. @@ -7862,6 +7863,14 @@ it may significantly increase code size (see @option{--param ipcp-unit-growth=@var{value}}). This flag is enabled by default at @option{-O3}. +@item -fipa-icf +@opindex fipa-icf +Perform Identical Code Folding for functions and read-only variables. +Behavior is similar to Gold Linker ICF optimization. Symbols proved +as semantically equivalent are redirected to corresponding symbol. The pass +sensitively decides for usage of alias, thunk or local redirection. +This flag is enabled by default at @option{-O2}. So you've added this at -O2, what is the general compile-time impact? Would it make more sense to instead have it be part of -O3, particularly since ICF is rarely going to improve performance (sans icache issues). This was Honza's idea to put the optimization for -O2, I'll measure compile-time impact. + +/* Interprocedural Identical Code Folding for functions and + read-only variables. + + The goal of this transformation is to discover functions and read-only + variables which do have exactly the same semantics. + + In case of functions, + we could either create a virtual clone or do a simple function wrapper + that will call equivalent function. If the function is just locally visible, + all function calls can be redirected. For read-only variables, we create + aliases if possible. + + Optimization pass arranges as follows: + 1) All functions and read-only variables are visited and internal + data structure, either sem_function or sem_variables is created. + 2) For every symbol from the previoues step, VAR_DECL and FUNCTION_DECL are + saved and matched to corresponding sem_items. s/previoues/previous/ + 3) These declaration are ignored for equality check and are solved + by Value Numbering algorithm published by Alpert, Zadeck in 1992. + 4) We compute hash value for each symbol. + 5) Congruence classes are created based on hash value. If hash value are + equal, equals function is called and symbols are deeply compared. + We must prove that all SSA names, declarations and other items + correspond. + 6) Value Numbering is executed for these classes.
Re: [PATCH 3/5] IPA ICF pass
Jeff, thanks for review! I did some passes over the patch before it got to the ML, I am happy to have independent opinion. +@item -fipa-icf +@opindex fipa-icf +Perform Identical Code Folding for functions and read-only variables. +Behavior is similar to Gold Linker ICF optimization. Symbols proved +as semantically equivalent are redirected to corresponding symbol. The pass +sensitively decides for usage of alias, thunk or local redirection. +This flag is enabled by default at @option{-O2}. So you've added this at -O2, what is the general compile-time impact? Would it make more sense to instead have it be part of -O3, particularly since ICF is rarely going to improve performance (sans icache issues). I think code size optimization not sacrifying any (or too much of) performance are generally very welcome at -O2. Compared to LLVM and Microsoft compilers we are on code bloat side at -O2. http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html has some numbers for -O2 GGC/LLVM. I believe this is result of tunning for relatively small benchamrks (SPECS) and I hope we could revisit -O2 for code size considerations for 4.10 somewhat. If tuned well, ICF has no reason to be expnesive wrt compile time. So lets shoot for that. The considerable donwside of enabling ICF IMO should be only disturbing effect on debug info. + return true; +} Isn't this really checking for equivalence? do correspond seems awkward here. The function turns the names equivalent on first invocation for a given name and later checks that this tentative equivalence holds. Not sure what is best name for it (originaly it was verify that did not sound right to me either) + +/* Verification function for edges E1 and E2. */ + +bool +func_checker::compare_edge (edge e1, edge e2) +{ + if (e1-flags != e2-flags) +return false; Presumably there's no flags we can safely ignore. So absolute equality seems reasonable here. Yep +/* Compare two types if are same aliases in case of strict aliasing + is enabled. */ +bool +sem_item::compare_for_aliasing (tree t1, tree t2) +{ + if (flag_strict_aliasing) +{ + alias_set_type s1 = get_deref_alias_set (TREE_TYPE (t1)); + alias_set_type s2 = get_deref_alias_set (TREE_TYPE (t2)); + + return s1 == s2; +} + + return true; +} Is returning TRUE really the conservatively correct thing to do in the absence of aliasing information? Isn't that case really I don't know in which case the proper return value is FALSE? I think with -fno-strict-aliasing the set should be 0 (Richi?) and thus we can compare for equality. We probably can be on agressive side and let 0 alias set prevail the non-0. But that can be done incrementally. We also need to match type inheritance equality for polymorphic types. I will add function for that into ipa-devirt. +/* References independent hash function. */ + +hashval_t +sem_function::get_hash (void) +{ + if(!hash) +{ + hash = 177454; /* Random number for function type. */ + + hash = iterative_hash_object (arg_count, hash); + hash = iterative_hash_object (bb_count, hash); + hash = iterative_hash_object (edge_count, hash); + hash = iterative_hash_object (cfg_checksum, hash); Does CFG_CHECKSUM encompass the bb/edge counts? It is one used by profiling code to match profiles, so it should. +SE_EXIT_FALSE(); + + if (!equals_wpa (item)) +return false; + + /* Checking function arguments. */ + tree decl1 = DECL_ATTRIBUTES (decl); + tree decl2 = DECL_ATTRIBUTES (compared_func-decl); So are there any attributes we can safely ignore? Probably not. However, we ought to handle the case where the attributes appear in different orders. There are few, like we can ignore weak or visibility attribute because we do produce alias with proper visibility anyway. My plan is to start removing those attributes from declarations once they are turned into suitable representation in symbol table (or for attributes like const/noreturn/pure where we have explicit decl flags). This will make our life bit easier later, too. We probably then can whitelist some attributes, but I would deal with this later. +/* Returns cgraph_node. */ + +struct cgraph_node * +sem_function::get_node (void) +{ + return cgraph (node); +} + +/* Initialize semantic item by info reachable during LTO WPA phase. */ + +void +sem_function::init_wpa (void) +{ + parse_tree_args (); +} inline? Worth or not worth the headache? We ought to autoinline simple wrappers even at -Os (for size) (I am not agains explicit inline keywords here tough) + +bool +sem_function::compare_bb (sem_bb_t *bb1, sem_bb_t *bb2, tree func1, tree func2) So this routine walks down the gimple statements and compares them for equality. Would it make sense to have the equality testing in gimple? That way if someone adds a new gimple code the places they need to
Re: [PATCH 3/5] IPA ICF pass
On 06/13/14 04:44, mliska wrote: Hello, this is core of IPA ICF patchset. It adds new pass and registers all needed stuff related to a newly introduced interprocedural optimization. Algorithm description: In LGEN, we visit all read-only variables and functions. For each symbol, a hash value based on e.g. number of arguments, number of BB, GIMPLE CODES is computed (similar hash is computed for read-only variables). This kind of information is streamed for LTO. In WPA, we build congruence classes for all symbols having a same hash value. For functions, these classes are subdivided in WPA by argument type comparison. Each reference (a call or a variable reference) to another semantic item candidate is marked and stored for further congruence class reduction (similar algorithm as Value Numbering: www.cs.ucr.edu/~gupta/teaching/553-07/Papers/value.pdf). For every congruence class of functions with more than one semantic function, we load function body. Having this information, we can process complete semantic function equality and subdivide such congruence class. Read-only variable class members are also deeply compared. After that, we process Value numbering algorithm to do a final subdivision. Finally, all items belonging to a congruence class with more than one item are merged. Martin Changelog: 2014-06-13 Martin Liska mli...@suse.cz Jan Hubicka hubi...@ucw.cz * Makefile.in: New pass object file added. * common.opt: New -fipa-icf flag introduced. * doc/invoke.texi: Documentation enhanced for the pass. * lto-section-in.c: New LTO section for a summary created by IPA-ICF. * lto-streamer.h: New section name introduced. * opts.c: Optimization is added to -O2. * passes.def: New pass added. * timevar.def: New time var for IPA-ICF. * tree-pass.h: Pass construction function. * ipa-icf.h: New pass header file added. * ipa-icf.c: New pass source file added. You'll note many of my comments are do you need to You may in fact be handling that stuff correctly, they're just things I'd like you to verify are properly handled. If they're properly handled just say so :-) At a high level, I think this needs to be broken down a bit more. We've got two high level concepts in ipa-icf. One is all the equivalence testing the other is using that information for the icf optimization. Splitting out the equivalence testing seems like a good thing to do as there's other contexts where it would be useful. Overall I think you're on the right path and we just need to iterate a bit on this part of the patchset. @@ -7862,6 +7863,14 @@ it may significantly increase code size (see @option{--param ipcp-unit-growth=@var{value}}). This flag is enabled by default at @option{-O3}. +@item -fipa-icf +@opindex fipa-icf +Perform Identical Code Folding for functions and read-only variables. +Behavior is similar to Gold Linker ICF optimization. Symbols proved +as semantically equivalent are redirected to corresponding symbol. The pass +sensitively decides for usage of alias, thunk or local redirection. +This flag is enabled by default at @option{-O2}. So you've added this at -O2, what is the general compile-time impact? Would it make more sense to instead have it be part of -O3, particularly since ICF is rarely going to improve performance (sans icache issues). + +/* Interprocedural Identical Code Folding for functions and + read-only variables. + + The goal of this transformation is to discover functions and read-only + variables which do have exactly the same semantics. + + In case of functions, + we could either create a virtual clone or do a simple function wrapper + that will call equivalent function. If the function is just locally visible, + all function calls can be redirected. For read-only variables, we create + aliases if possible. + + Optimization pass arranges as follows: + 1) All functions and read-only variables are visited and internal + data structure, either sem_function or sem_variables is created. + 2) For every symbol from the previoues step, VAR_DECL and FUNCTION_DECL are + saved and matched to corresponding sem_items. s/previoues/previous/ + 3) These declaration are ignored for equality check and are solved + by Value Numbering algorithm published by Alpert, Zadeck in 1992. + 4) We compute hash value for each symbol. + 5) Congruence classes are created based on hash value. If hash value are + equal, equals function is called and symbols are deeply compared. + We must prove that all SSA names, declarations and other items + correspond. + 6) Value Numbering is executed for these classes. At the end of the process + all symbol members in remaining classes can be mrged. s/mrged/merged. + 7) Merge operation creates alias in case of read-only variables. For + callgraph