Re: [PATCH 3/5] IPA ICF pass

2015-01-20 Thread H.J. Lu
On Thu, Nov 13, 2014 at 2:17 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Wed, Oct 15, 2014 at 10:03 AM, Martin Liška mli...@suse.cz wrote:

 Hello

 There's final version of the patch I'm going to commit tomorrow in the
 morning (CEST).
 Thank you Honza for the review.

 Martin

 This caused:

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63856


This also caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64693

-- 
H.J.


Re: [PATCH 3/5] IPA ICF pass

2014-11-13 Thread H.J. Lu
On Wed, Oct 15, 2014 at 10:03 AM, Martin Liška mli...@suse.cz wrote:

 Hello

 There's final version of the patch I'm going to commit tomorrow in the
 morning (CEST).
 Thank you Honza for the review.

 Martin

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63856

-- 
H.J.


Re: [PATCH 3/5] IPA ICF pass

2014-11-06 Thread Jan Hubicka
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63747 is likely caused by
 this patch. compare_gimple_switch does not check CASE_LOW and
 CASE_HIGH, resulting merging functions not identical.
 
 Interestingly in the first a few versions of this patch CASE_LOW/HIGH
 were checked. But last versions only checks CASE_LABEL. What was the
 intention?

It sounds like an accidental change.  CASE_LOW/CASE_HIGH needs to be
 compared ;)

Honza
 
 Thanks,
 Joey
 
 On Thu, Oct 23, 2014 at 5:18 AM, Jiong Wang
 wong.kwongyuan.to...@gmail.com wrote:
  PR 63574 ICE building libjava (segfault) on arm-linux-gnueabihf is
  caused by this commit.
 
  from the backtrace, the ICF pass is trying to compare two label tree
  node without type info.
 
  while looks like compare_operand expect the type info always be not
  empty before invoking func_checker::compatible_types_p.
 
  I think we should add a similiar t1/t2 check at the start of
  func_checker::compatible_types_p, just
  like what has been done at the head of func_checker::compare_operand.
 
  And I verified if we add that check, the crash gone away.
 
  Regards,
  Jiong
 
 
  2014-10-15 18:03 GMT+01:00 Martin Liška mli...@suse.cz:
  On 10/14/2014 06:04 PM, Jan Hubicka wrote:
 
  diff --git a/gcc/cgraph.h b/gcc/cgraph.h
  index fb41b01..2de98b4 100644
  --- a/gcc/cgraph.h
  +++ b/gcc/cgraph.h
  @@ -172,6 +172,12 @@ public:
  /* Dump referring in list to FILE.  */
  void dump_referring (FILE *);
 
  +  /* Get number of references for this node.  */
  +  inline unsigned get_references_count (void)
  +  {
  +return ref_list.references ? ref_list.references-length () : 0;
  +  }
 
 
  Probably better called num_references() (like we have num_edge in
  basic-block.h)
 
  @@ -8068,6 +8069,19 @@ it may significantly increase code size
(see @option{--param ipcp-unit-growth=@var{value}}).
This flag is enabled by default at @option{-O3}.
 
  +@item -fipa-icf
  +@opindex fipa-icf
  +Perform Identical Code Folding for functions and read-only variables.
  +The optimization reduces code size and may disturb unwind stacks by
  replacing
  +a function by equivalent one with a different name. The optimization
  works
  +more effectively with link time optimization enabled.
  +
  +Nevertheless the behavior is similar to Gold Linker ICF optimization,
  GCC ICF
  +works on different levels and thus the optimizations are not same -
  there are
  +equivalences that are found only by GCC and equivalences found only by
  Gold.
  +
  +This flag is enabled by default at @option{-O2}.
 
  ... and -Os?
 
  +case ARRAY_REF:
  +case ARRAY_RANGE_REF:
  +  {
  +   x1 = TREE_OPERAND (t1, 0);
  +   x2 = TREE_OPERAND (t2, 0);
  +   y1 = TREE_OPERAND (t1, 1);
  +   y2 = TREE_OPERAND (t2, 1);
  +
  +   if (!compare_operand (array_ref_low_bound (t1),
  + array_ref_low_bound (t2)))
  + return return_false_with_msg ();
  +   if (!compare_operand (array_ref_element_size (t1),
  + array_ref_element_size (t2)))
  + return return_false_with_msg ();
  +   if (!compare_operand (x1, x2))
  + return return_false_with_msg ();
  +   return compare_operand (y1, y2);
  +  }
 
 
  No need for {...} if there are no local vars.
 
  +bool
  +func_checker::compare_function_decl (tree t1, tree t2)
  +{
  +  bool ret = false;
  +
  +  if (t1 == t2)
  +return true;
  +
  +  symtab_node *n1 = symtab_node::get (t1);
  +  symtab_node *n2 = symtab_node::get (t2);
  +
  +  if (m_ignored_source_nodes != NULL  m_ignored_target_nodes != NULL)
  +{
  +  ret = m_ignored_source_nodes-contains (n1)
  +m_ignored_target_nodes-contains (n2);
  +
  +  if (ret)
  +   return true;
  +}
  +
  +  /* If function decl is WEAKREF, we compare targets.  */
  +  cgraph_node *f1 = cgraph_node::get (t1);
  +  cgraph_node *f2 = cgraph_node::get (t2);
  +
  +  if(f1  f2  f1-weakref  f2-weakref)
  +ret = f1-alias_target == f2-alias_target;
  +
  +  return ret;
 
 
  Comparing aliases is bit more complicated than just handling weakrefs. I
  have
  patch for symtab_node::equivalent_address_p somewhre in queue.  lets just
  drop
  the fancy stuff for the moment and compare f1f2 for equivalence.
 
  +  ret = compare_decl (t1, t2);
 
 
  Why functions are not compared with compare_decl while variables are?
 
  +
  +  return return_with_debug (ret);
  +}
  +
  +void
  +func_checker::parse_labels (sem_bb *bb)
  +{
  +  for (gimple_stmt_iterator gsi = gsi_start_bb (bb-bb); !gsi_end_p
  (gsi);
  +   gsi_next (gsi))
  +{
  +  gimple stmt = gsi_stmt (gsi);
  +
  +  if (gimple_code (stmt) == GIMPLE_LABEL)
  +   {
  + tree t = gimple_label_label (stmt);
  + gcc_assert (TREE_CODE (t) == LABEL_DECL);
  +
  + m_label_bb_map.put (t, bb-bb-index);
  +   }
  +}
  +}
  +
  +/* Basic block equivalence comparison function that returns true if
  +  

Re: [PATCH 3/5] IPA ICF pass

2014-11-05 Thread Joey Ye
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63747 is likely caused by
this patch. compare_gimple_switch does not check CASE_LOW and
CASE_HIGH, resulting merging functions not identical.

Interestingly in the first a few versions of this patch CASE_LOW/HIGH
were checked. But last versions only checks CASE_LABEL. What was the
intention?

Thanks,
Joey

On Thu, Oct 23, 2014 at 5:18 AM, Jiong Wang
wong.kwongyuan.to...@gmail.com wrote:
 PR 63574 ICE building libjava (segfault) on arm-linux-gnueabihf is
 caused by this commit.

 from the backtrace, the ICF pass is trying to compare two label tree
 node without type info.

 while looks like compare_operand expect the type info always be not
 empty before invoking func_checker::compatible_types_p.

 I think we should add a similiar t1/t2 check at the start of
 func_checker::compatible_types_p, just
 like what has been done at the head of func_checker::compare_operand.

 And I verified if we add that check, the crash gone away.

 Regards,
 Jiong


 2014-10-15 18:03 GMT+01:00 Martin Liška mli...@suse.cz:
 On 10/14/2014 06:04 PM, Jan Hubicka wrote:

 diff --git a/gcc/cgraph.h b/gcc/cgraph.h
 index fb41b01..2de98b4 100644
 --- a/gcc/cgraph.h
 +++ b/gcc/cgraph.h
 @@ -172,6 +172,12 @@ public:
 /* Dump referring in list to FILE.  */
 void dump_referring (FILE *);

 +  /* Get number of references for this node.  */
 +  inline unsigned get_references_count (void)
 +  {
 +return ref_list.references ? ref_list.references-length () : 0;
 +  }


 Probably better called num_references() (like we have num_edge in
 basic-block.h)

 @@ -8068,6 +8069,19 @@ it may significantly increase code size
   (see @option{--param ipcp-unit-growth=@var{value}}).
   This flag is enabled by default at @option{-O3}.

 +@item -fipa-icf
 +@opindex fipa-icf
 +Perform Identical Code Folding for functions and read-only variables.
 +The optimization reduces code size and may disturb unwind stacks by
 replacing
 +a function by equivalent one with a different name. The optimization
 works
 +more effectively with link time optimization enabled.
 +
 +Nevertheless the behavior is similar to Gold Linker ICF optimization,
 GCC ICF
 +works on different levels and thus the optimizations are not same -
 there are
 +equivalences that are found only by GCC and equivalences found only by
 Gold.
 +
 +This flag is enabled by default at @option{-O2}.

 ... and -Os?

 +case ARRAY_REF:
 +case ARRAY_RANGE_REF:
 +  {
 +   x1 = TREE_OPERAND (t1, 0);
 +   x2 = TREE_OPERAND (t2, 0);
 +   y1 = TREE_OPERAND (t1, 1);
 +   y2 = TREE_OPERAND (t2, 1);
 +
 +   if (!compare_operand (array_ref_low_bound (t1),
 + array_ref_low_bound (t2)))
 + return return_false_with_msg ();
 +   if (!compare_operand (array_ref_element_size (t1),
 + array_ref_element_size (t2)))
 + return return_false_with_msg ();
 +   if (!compare_operand (x1, x2))
 + return return_false_with_msg ();
 +   return compare_operand (y1, y2);
 +  }


 No need for {...} if there are no local vars.

 +bool
 +func_checker::compare_function_decl (tree t1, tree t2)
 +{
 +  bool ret = false;
 +
 +  if (t1 == t2)
 +return true;
 +
 +  symtab_node *n1 = symtab_node::get (t1);
 +  symtab_node *n2 = symtab_node::get (t2);
 +
 +  if (m_ignored_source_nodes != NULL  m_ignored_target_nodes != NULL)
 +{
 +  ret = m_ignored_source_nodes-contains (n1)
 +m_ignored_target_nodes-contains (n2);
 +
 +  if (ret)
 +   return true;
 +}
 +
 +  /* If function decl is WEAKREF, we compare targets.  */
 +  cgraph_node *f1 = cgraph_node::get (t1);
 +  cgraph_node *f2 = cgraph_node::get (t2);
 +
 +  if(f1  f2  f1-weakref  f2-weakref)
 +ret = f1-alias_target == f2-alias_target;
 +
 +  return ret;


 Comparing aliases is bit more complicated than just handling weakrefs. I
 have
 patch for symtab_node::equivalent_address_p somewhre in queue.  lets just
 drop
 the fancy stuff for the moment and compare f1f2 for equivalence.

 +  ret = compare_decl (t1, t2);


 Why functions are not compared with compare_decl while variables are?

 +
 +  return return_with_debug (ret);
 +}
 +
 +void
 +func_checker::parse_labels (sem_bb *bb)
 +{
 +  for (gimple_stmt_iterator gsi = gsi_start_bb (bb-bb); !gsi_end_p
 (gsi);
 +   gsi_next (gsi))
 +{
 +  gimple stmt = gsi_stmt (gsi);
 +
 +  if (gimple_code (stmt) == GIMPLE_LABEL)
 +   {
 + tree t = gimple_label_label (stmt);
 + gcc_assert (TREE_CODE (t) == LABEL_DECL);
 +
 + m_label_bb_map.put (t, bb-bb-index);
 +   }
 +}
 +}
 +
 +/* Basic block equivalence comparison function that returns true if
 +   basic blocks BB1 and BB2 (from functions FUNC1 and FUNC2) correspond.
 +
 +   In general, a collection of equivalence dictionaries is built for
 types
 +   like SSA names, declarations (VAR_DECL, PARM_DECL, ..). This
 infrastructure
 +   is utilized by 

Re: [PATCH 3/5] IPA ICF pass

2014-10-22 Thread Jiong Wang
PR 63574 ICE building libjava (segfault) on arm-linux-gnueabihf is
caused by this commit.

from the backtrace, the ICF pass is trying to compare two label tree
node without type info.

while looks like compare_operand expect the type info always be not
empty before invoking func_checker::compatible_types_p.

I think we should add a similiar t1/t2 check at the start of
func_checker::compatible_types_p, just
like what has been done at the head of func_checker::compare_operand.

And I verified if we add that check, the crash gone away.

Regards,
Jiong


2014-10-15 18:03 GMT+01:00 Martin Liška mli...@suse.cz:
 On 10/14/2014 06:04 PM, Jan Hubicka wrote:

 diff --git a/gcc/cgraph.h b/gcc/cgraph.h
 index fb41b01..2de98b4 100644
 --- a/gcc/cgraph.h
 +++ b/gcc/cgraph.h
 @@ -172,6 +172,12 @@ public:
 /* Dump referring in list to FILE.  */
 void dump_referring (FILE *);

 +  /* Get number of references for this node.  */
 +  inline unsigned get_references_count (void)
 +  {
 +return ref_list.references ? ref_list.references-length () : 0;
 +  }


 Probably better called num_references() (like we have num_edge in
 basic-block.h)

 @@ -8068,6 +8069,19 @@ it may significantly increase code size
   (see @option{--param ipcp-unit-growth=@var{value}}).
   This flag is enabled by default at @option{-O3}.

 +@item -fipa-icf
 +@opindex fipa-icf
 +Perform Identical Code Folding for functions and read-only variables.
 +The optimization reduces code size and may disturb unwind stacks by
 replacing
 +a function by equivalent one with a different name. The optimization
 works
 +more effectively with link time optimization enabled.
 +
 +Nevertheless the behavior is similar to Gold Linker ICF optimization,
 GCC ICF
 +works on different levels and thus the optimizations are not same -
 there are
 +equivalences that are found only by GCC and equivalences found only by
 Gold.
 +
 +This flag is enabled by default at @option{-O2}.

 ... and -Os?

 +case ARRAY_REF:
 +case ARRAY_RANGE_REF:
 +  {
 +   x1 = TREE_OPERAND (t1, 0);
 +   x2 = TREE_OPERAND (t2, 0);
 +   y1 = TREE_OPERAND (t1, 1);
 +   y2 = TREE_OPERAND (t2, 1);
 +
 +   if (!compare_operand (array_ref_low_bound (t1),
 + array_ref_low_bound (t2)))
 + return return_false_with_msg ();
 +   if (!compare_operand (array_ref_element_size (t1),
 + array_ref_element_size (t2)))
 + return return_false_with_msg ();
 +   if (!compare_operand (x1, x2))
 + return return_false_with_msg ();
 +   return compare_operand (y1, y2);
 +  }


 No need for {...} if there are no local vars.

 +bool
 +func_checker::compare_function_decl (tree t1, tree t2)
 +{
 +  bool ret = false;
 +
 +  if (t1 == t2)
 +return true;
 +
 +  symtab_node *n1 = symtab_node::get (t1);
 +  symtab_node *n2 = symtab_node::get (t2);
 +
 +  if (m_ignored_source_nodes != NULL  m_ignored_target_nodes != NULL)
 +{
 +  ret = m_ignored_source_nodes-contains (n1)
 +m_ignored_target_nodes-contains (n2);
 +
 +  if (ret)
 +   return true;
 +}
 +
 +  /* If function decl is WEAKREF, we compare targets.  */
 +  cgraph_node *f1 = cgraph_node::get (t1);
 +  cgraph_node *f2 = cgraph_node::get (t2);
 +
 +  if(f1  f2  f1-weakref  f2-weakref)
 +ret = f1-alias_target == f2-alias_target;
 +
 +  return ret;


 Comparing aliases is bit more complicated than just handling weakrefs. I
 have
 patch for symtab_node::equivalent_address_p somewhre in queue.  lets just
 drop
 the fancy stuff for the moment and compare f1f2 for equivalence.

 +  ret = compare_decl (t1, t2);


 Why functions are not compared with compare_decl while variables are?

 +
 +  return return_with_debug (ret);
 +}
 +
 +void
 +func_checker::parse_labels (sem_bb *bb)
 +{
 +  for (gimple_stmt_iterator gsi = gsi_start_bb (bb-bb); !gsi_end_p
 (gsi);
 +   gsi_next (gsi))
 +{
 +  gimple stmt = gsi_stmt (gsi);
 +
 +  if (gimple_code (stmt) == GIMPLE_LABEL)
 +   {
 + tree t = gimple_label_label (stmt);
 + gcc_assert (TREE_CODE (t) == LABEL_DECL);
 +
 + m_label_bb_map.put (t, bb-bb-index);
 +   }
 +}
 +}
 +
 +/* Basic block equivalence comparison function that returns true if
 +   basic blocks BB1 and BB2 (from functions FUNC1 and FUNC2) correspond.
 +
 +   In general, a collection of equivalence dictionaries is built for
 types
 +   like SSA names, declarations (VAR_DECL, PARM_DECL, ..). This
 infrastructure
 +   is utilized by every statement-by-stament comparison function.  */
 +
 +bool
 +func_checker::compare_bb (sem_bb *bb1, sem_bb *bb2)
 +{
 +  unsigned i;
 +  gimple_stmt_iterator gsi1, gsi2;
 +  gimple s1, s2;
 +
 +  if (bb1-nondbg_stmt_count != bb2-nondbg_stmt_count
 +  || bb1-edge_count != bb2-edge_count)
 +return return_false ();
 +
 +  gsi1 = gsi_start_bb (bb1-bb);
 +  gsi2 = gsi_start_bb (bb2-bb);
 +
 +  for (i = 0; i  bb1-nondbg_stmt_count; i++)
 + 

Re: [PATCH 3/5] IPA ICF pass

2014-10-15 Thread Martin Liška

On 10/14/2014 06:04 PM, Jan Hubicka wrote:

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index fb41b01..2de98b4 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -172,6 +172,12 @@ public:
/* Dump referring in list to FILE.  */
void dump_referring (FILE *);

+  /* Get number of references for this node.  */
+  inline unsigned get_references_count (void)
+  {
+return ref_list.references ? ref_list.references-length () : 0;
+  }


Probably better called num_references() (like we have num_edge in basic-block.h)

@@ -8068,6 +8069,19 @@ it may significantly increase code size
  (see @option{--param ipcp-unit-growth=@var{value}}).
  This flag is enabled by default at @option{-O3}.

+@item -fipa-icf
+@opindex fipa-icf
+Perform Identical Code Folding for functions and read-only variables.
+The optimization reduces code size and may disturb unwind stacks by replacing
+a function by equivalent one with a different name. The optimization works
+more effectively with link time optimization enabled.
+
+Nevertheless the behavior is similar to Gold Linker ICF optimization, GCC ICF
+works on different levels and thus the optimizations are not same - there are
+equivalences that are found only by GCC and equivalences found only by Gold.
+
+This flag is enabled by default at @option{-O2}.

... and -Os?

+case ARRAY_REF:
+case ARRAY_RANGE_REF:
+  {
+   x1 = TREE_OPERAND (t1, 0);
+   x2 = TREE_OPERAND (t2, 0);
+   y1 = TREE_OPERAND (t1, 1);
+   y2 = TREE_OPERAND (t2, 1);
+
+   if (!compare_operand (array_ref_low_bound (t1),
+ array_ref_low_bound (t2)))
+ return return_false_with_msg ();
+   if (!compare_operand (array_ref_element_size (t1),
+ array_ref_element_size (t2)))
+ return return_false_with_msg ();
+   if (!compare_operand (x1, x2))
+ return return_false_with_msg ();
+   return compare_operand (y1, y2);
+  }


No need for {...} if there are no local vars.

+bool
+func_checker::compare_function_decl (tree t1, tree t2)
+{
+  bool ret = false;
+
+  if (t1 == t2)
+return true;
+
+  symtab_node *n1 = symtab_node::get (t1);
+  symtab_node *n2 = symtab_node::get (t2);
+
+  if (m_ignored_source_nodes != NULL  m_ignored_target_nodes != NULL)
+{
+  ret = m_ignored_source_nodes-contains (n1)
+m_ignored_target_nodes-contains (n2);
+
+  if (ret)
+   return true;
+}
+
+  /* If function decl is WEAKREF, we compare targets.  */
+  cgraph_node *f1 = cgraph_node::get (t1);
+  cgraph_node *f2 = cgraph_node::get (t2);
+
+  if(f1  f2  f1-weakref  f2-weakref)
+ret = f1-alias_target == f2-alias_target;
+
+  return ret;


Comparing aliases is bit more complicated than just handling weakrefs. I have
patch for symtab_node::equivalent_address_p somewhre in queue.  lets just drop
the fancy stuff for the moment and compare f1f2 for equivalence.

+  ret = compare_decl (t1, t2);


Why functions are not compared with compare_decl while variables are?

+
+  return return_with_debug (ret);
+}
+
+void
+func_checker::parse_labels (sem_bb *bb)
+{
+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb-bb); !gsi_end_p (gsi);
+   gsi_next (gsi))
+{
+  gimple stmt = gsi_stmt (gsi);
+
+  if (gimple_code (stmt) == GIMPLE_LABEL)
+   {
+ tree t = gimple_label_label (stmt);
+ gcc_assert (TREE_CODE (t) == LABEL_DECL);
+
+ m_label_bb_map.put (t, bb-bb-index);
+   }
+}
+}
+
+/* Basic block equivalence comparison function that returns true if
+   basic blocks BB1 and BB2 (from functions FUNC1 and FUNC2) correspond.
+
+   In general, a collection of equivalence dictionaries is built for types
+   like SSA names, declarations (VAR_DECL, PARM_DECL, ..). This infrastructure
+   is utilized by every statement-by-stament comparison function.  */
+
+bool
+func_checker::compare_bb (sem_bb *bb1, sem_bb *bb2)
+{
+  unsigned i;
+  gimple_stmt_iterator gsi1, gsi2;
+  gimple s1, s2;
+
+  if (bb1-nondbg_stmt_count != bb2-nondbg_stmt_count
+  || bb1-edge_count != bb2-edge_count)
+return return_false ();
+
+  gsi1 = gsi_start_bb (bb1-bb);
+  gsi2 = gsi_start_bb (bb2-bb);
+
+  for (i = 0; i  bb1-nondbg_stmt_count; i++)
+{
+  if (is_gimple_debug (gsi_stmt (gsi1)))
+   gsi_next_nondebug (gsi1);
+
+  if (is_gimple_debug (gsi_stmt (gsi2)))
+   gsi_next_nondebug (gsi2);
+
+  s1 = gsi_stmt (gsi1);
+  s2 = gsi_stmt (gsi2);
+
+  int eh1 = lookup_stmt_eh_lp_fn
+   (DECL_STRUCT_FUNCTION (m_source_func_decl), s1);
+  int eh2 = lookup_stmt_eh_lp_fn
+   (DECL_STRUCT_FUNCTION (m_target_func_decl), s2);
+
+  if (eh1 != eh2)
+   return return_false_with_msg (EH regions are different);
+
+  if (gimple_code (s1) != gimple_code (s2))
+   return return_false_with_msg (gimple codes are different);
+
+  switch (gimple_code (s1))
+   {
+   case GIMPLE_CALL:
+ if (!compare_gimple_call 

Re: [PATCH 3/5] IPA ICF pass

2014-10-14 Thread Jan Hubicka
 diff --git a/gcc/cgraph.h b/gcc/cgraph.h
 index fb41b01..2de98b4 100644
 --- a/gcc/cgraph.h
 +++ b/gcc/cgraph.h
 @@ -172,6 +172,12 @@ public:
/* Dump referring in list to FILE.  */
void dump_referring (FILE *);
  
 +  /* Get number of references for this node.  */
 +  inline unsigned get_references_count (void)
 +  {
 +return ref_list.references ? ref_list.references-length () : 0;
 +  }

Probably better called num_references() (like we have num_edge in basic-block.h)
 @@ -8068,6 +8069,19 @@ it may significantly increase code size
  (see @option{--param ipcp-unit-growth=@var{value}}).
  This flag is enabled by default at @option{-O3}.
  
 +@item -fipa-icf
 +@opindex fipa-icf
 +Perform Identical Code Folding for functions and read-only variables.
 +The optimization reduces code size and may disturb unwind stacks by replacing
 +a function by equivalent one with a different name. The optimization works
 +more effectively with link time optimization enabled.
 +
 +Nevertheless the behavior is similar to Gold Linker ICF optimization, GCC ICF
 +works on different levels and thus the optimizations are not same - there are
 +equivalences that are found only by GCC and equivalences found only by Gold.
 +
 +This flag is enabled by default at @option{-O2}.
... and -Os?
 +case ARRAY_REF:
 +case ARRAY_RANGE_REF:
 +  {
 + x1 = TREE_OPERAND (t1, 0);
 + x2 = TREE_OPERAND (t2, 0);
 + y1 = TREE_OPERAND (t1, 1);
 + y2 = TREE_OPERAND (t2, 1);
 +
 + if (!compare_operand (array_ref_low_bound (t1),
 +   array_ref_low_bound (t2)))
 +   return return_false_with_msg ();
 + if (!compare_operand (array_ref_element_size (t1),
 +   array_ref_element_size (t2)))
 +   return return_false_with_msg ();
 + if (!compare_operand (x1, x2))
 +   return return_false_with_msg ();
 + return compare_operand (y1, y2);
 +  }

No need for {...} if there are no local vars.
 +bool
 +func_checker::compare_function_decl (tree t1, tree t2)
 +{
 +  bool ret = false;
 +
 +  if (t1 == t2)
 +return true;
 +
 +  symtab_node *n1 = symtab_node::get (t1);
 +  symtab_node *n2 = symtab_node::get (t2);
 +
 +  if (m_ignored_source_nodes != NULL  m_ignored_target_nodes != NULL)
 +{
 +  ret = m_ignored_source_nodes-contains (n1)
 +  m_ignored_target_nodes-contains (n2);
 +
 +  if (ret)
 + return true;
 +}
 +
 +  /* If function decl is WEAKREF, we compare targets.  */
 +  cgraph_node *f1 = cgraph_node::get (t1);
 +  cgraph_node *f2 = cgraph_node::get (t2);
 +
 +  if(f1  f2  f1-weakref  f2-weakref)
 +ret = f1-alias_target == f2-alias_target;
 +
 +  return ret;

Comparing aliases is bit more complicated than just handling weakrefs. I have
patch for symtab_node::equivalent_address_p somewhre in queue.  lets just drop
the fancy stuff for the moment and compare f1f2 for equivalence.
 +  ret = compare_decl (t1, t2);

Why functions are not compared with compare_decl while variables are?
 +
 +  return return_with_debug (ret);
 +}
 +
 +void
 +func_checker::parse_labels (sem_bb *bb)
 +{
 +  for (gimple_stmt_iterator gsi = gsi_start_bb (bb-bb); !gsi_end_p (gsi);
 +   gsi_next (gsi))
 +{
 +  gimple stmt = gsi_stmt (gsi);
 +
 +  if (gimple_code (stmt) == GIMPLE_LABEL)
 + {
 +   tree t = gimple_label_label (stmt);
 +   gcc_assert (TREE_CODE (t) == LABEL_DECL);
 +
 +   m_label_bb_map.put (t, bb-bb-index);
 + }
 +}
 +}
 +
 +/* Basic block equivalence comparison function that returns true if
 +   basic blocks BB1 and BB2 (from functions FUNC1 and FUNC2) correspond.
 +
 +   In general, a collection of equivalence dictionaries is built for types
 +   like SSA names, declarations (VAR_DECL, PARM_DECL, ..). This 
 infrastructure
 +   is utilized by every statement-by-stament comparison function.  */
 +
 +bool
 +func_checker::compare_bb (sem_bb *bb1, sem_bb *bb2)
 +{
 +  unsigned i;
 +  gimple_stmt_iterator gsi1, gsi2;
 +  gimple s1, s2;
 +
 +  if (bb1-nondbg_stmt_count != bb2-nondbg_stmt_count
 +  || bb1-edge_count != bb2-edge_count)
 +return return_false ();
 +
 +  gsi1 = gsi_start_bb (bb1-bb);
 +  gsi2 = gsi_start_bb (bb2-bb);
 +
 +  for (i = 0; i  bb1-nondbg_stmt_count; i++)
 +{
 +  if (is_gimple_debug (gsi_stmt (gsi1)))
 + gsi_next_nondebug (gsi1);
 +
 +  if (is_gimple_debug (gsi_stmt (gsi2)))
 + gsi_next_nondebug (gsi2);
 +
 +  s1 = gsi_stmt (gsi1);
 +  s2 = gsi_stmt (gsi2);
 +
 +  int eh1 = lookup_stmt_eh_lp_fn
 + (DECL_STRUCT_FUNCTION (m_source_func_decl), s1);
 +  int eh2 = lookup_stmt_eh_lp_fn
 + (DECL_STRUCT_FUNCTION (m_target_func_decl), s2);
 +
 +  if (eh1 != eh2)
 + return return_false_with_msg (EH regions are different);
 +
 +  if (gimple_code (s1) != gimple_code (s2))
 + return return_false_with_msg (gimple codes are different);
 +
 +  switch (gimple_code (s1))
 + {
 + case GIMPLE_CALL:
 + 

Re: [PATCH 3/5] IPA ICF pass

2014-10-13 Thread Martin Liška

On 10/11/2014 10:19 AM, Jan Hubicka wrote:


After few days of measurement and tuning, I was able to get numbers to the 
following shape:
Execution times (seconds)
  phase setup :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall 
   1412 kB ( 0%) ggc
  phase opt and generate  :  27.83 (59%) usr   0.66 (19%) sys  28.52 (37%) wall 
1028813 kB (24%) ggc
  phase stream in :  16.90 (36%) usr   0.63 (18%) sys  17.60 (23%) wall 
3246453 kB (76%) ggc
  phase stream out:   2.76 ( 6%) usr   2.19 (63%) sys  31.34 (40%) wall 
  2 kB ( 0%) ggc
  callgraph optimization  :   0.36 ( 1%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall 
 40 kB ( 0%) ggc
  ipa dead code removal   :   3.31 ( 7%) usr   0.01 ( 0%) sys   3.25 ( 4%) wall 
  0 kB ( 0%) ggc
  ipa virtual call target :   3.69 ( 8%) usr   0.03 ( 1%) sys   3.80 ( 5%) wall 
 21 kB ( 0%) ggc
  ipa devirtualization:   0.12 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall 
  13704 kB ( 0%) ggc
  ipa cp  :   1.11 ( 2%) usr   0.07 ( 2%) sys   1.17 ( 2%) wall 
 188558 kB ( 4%) ggc
  ipa inlining heuristics :   8.17 (17%) usr   0.14 ( 4%) sys   8.27 (11%) wall 
 494738 kB (12%) ggc
  ipa comdats :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall 
  0 kB ( 0%) ggc
  ipa lto gimple in   :   1.86 ( 4%) usr   0.40 (11%) sys   2.20 ( 3%) wall 
 537970 kB (13%) ggc
  ipa lto gimple out  :   0.19 ( 0%) usr   0.08 ( 2%) sys   0.27 ( 0%) wall 
  2 kB ( 0%) ggc
  ipa lto decl in :  12.20 (26%) usr   0.37 (11%) sys  12.64 (16%) wall 
2441687 kB (57%) ggc
  ipa lto decl out:   2.51 ( 5%) usr   0.21 ( 6%) sys   2.71 ( 3%) wall 
  0 kB ( 0%) ggc
  ipa lto constructors in :   0.13 ( 0%) usr   0.02 ( 1%) sys   0.17 ( 0%) wall 
  15692 kB ( 0%) ggc
  ipa lto constructors out:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall 
  0 kB ( 0%) ggc
  ipa lto cgraph I/O  :   0.54 ( 1%) usr   0.09 ( 3%) sys   0.63 ( 1%) wall 
 407182 kB (10%) ggc
  ipa lto decl merge  :   1.34 ( 3%) usr   0.00 ( 0%) sys   1.34 ( 2%) wall 
   8220 kB ( 0%) ggc
  ipa lto cgraph merge:   1.00 ( 2%) usr   0.00 ( 0%) sys   1.00 ( 1%) wall 
  14605 kB ( 0%) ggc
  whopr wpa   :   0.92 ( 2%) usr   0.00 ( 0%) sys   0.89 ( 1%) wall 
  1 kB ( 0%) ggc
  whopr wpa I/O   :   0.01 ( 0%) usr   1.90 (55%) sys  28.31 (37%) wall 
  0 kB ( 0%) ggc
  whopr partitioning  :   2.81 ( 6%) usr   0.01 ( 0%) sys   2.83 ( 4%) wall 
   4943 kB ( 0%) ggc
  ipa reference   :   1.34 ( 3%) usr   0.00 ( 0%) sys   1.35 ( 2%) wall 
  0 kB ( 0%) ggc
  ipa profile :   0.20 ( 0%) usr   0.01 ( 0%) sys   0.21 ( 0%) wall 
  0 kB ( 0%) ggc
  ipa pure const  :   1.62 ( 3%) usr   0.00 ( 0%) sys   1.63 ( 2%) wall 
  0 kB ( 0%) ggc
  ipa icf :   2.65 ( 6%) usr   0.02 ( 1%) sys   2.68 ( 3%) wall 
   1352 kB ( 0%) ggc
  inline parameters   :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall 
  0 kB ( 0%) ggc
  tree SSA rewrite:   0.11 ( 0%) usr   0.01 ( 0%) sys   0.08 ( 0%) wall 
  18919 kB ( 0%) ggc
  tree SSA other  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall 
  0 kB ( 0%) ggc
  tree SSA incremental:   0.24 ( 1%) usr   0.01 ( 0%) sys   0.32 ( 0%) wall 
  11325 kB ( 0%) ggc
  tree operand scan   :   0.15 ( 0%) usr   0.02 ( 1%) sys   0.18 ( 0%) wall 
 116283 kB ( 3%) ggc
  dominance frontiers :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall 
  0 kB ( 0%) ggc
  dominance computation   :   0.13 ( 0%) usr   0.01 ( 0%) sys   0.16 ( 0%) wall 
  0 kB ( 0%) ggc
  varconst:   0.01 ( 0%) usr   0.02 ( 1%) sys   0.01 ( 0%) wall 
  0 kB ( 0%) ggc
  loop fini   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall 
  0 kB ( 0%) ggc
  unaccounted todo:   0.55 ( 1%) usr   0.00 ( 0%) sys   0.56 ( 1%) wall 
  0 kB ( 0%) ggc
  TOTAL :  47.49 3.4877.46
4276682 kB

and I was able to reduce function bodies loaded in WPA to 35% (from previous 
55%). The main problem


35% means that 35% of all function bodies are compared with something else? 
That feels pretty high.
but overall numbers are not so terrible.


Currently, the pass is able to merge 32K functions. As you know, we group 
functions to so called classes.
According to stats, average non-singular class size contains at the end of 
comparison 7.39 candidates and we
have 5K such functions. Because we load body for each candidate in such groups, 
it gives us minimum number
of loaded bodies: 37K. As we load 70K function, we have still place to improve. 
But I guess WPA body-less
comparison is quite efficient.




with speed was hidden in work list for congruence classes, where hash_set was 
used. I chose the data
structure to support delete operation, but it was really slow. Thus, hash_set 
was replaced with linked list
and a flag is used to identify if a set is 

Re: [PATCH 3/5] IPA ICF pass

2014-10-13 Thread Jan Hubicka
 
 35% means that 35% of all function bodies are compared with something else? 
 That feels pretty high.
 but overall numbers are not so terrible.
 
 Currently, the pass is able to merge 32K functions. As you know, we group 
 functions to so called classes.
 According to stats, average non-singular class size contains at the end of 
 comparison 7.39 candidates and we
 have 5K such functions. Because we load body for each candidate in such 
 groups, it gives us minimum number
 of loaded bodies: 37K. As we load 70K function, we have still place to 
 improve. But I guess WPA body-less
 comparison is quite efficient.

OK, that seems resonable.
 
 
 with speed was hidden in work list for congruence classes, where hash_set 
 was used. I chose the data
 structure to support delete operation, but it was really slow. Thus, 
 hash_set was replaced with linked list
 and a flag is used to identify if a set is removed or not.
 
 Interesting, I would not expect bottleneck in a congruence solving :)
 
 The problem was just the hash_set that showed to be slow data structure for a 
 set of operations needed
 in congruence solving.
 
 
 I have no clue who complicated can it be to implement release_body function 
 to an operation that
 really releases the memory?
 
 I suppose one can keep the caches from streamer and free trees read.  Freeing
 gimple statemnts, cfg should be relatively easy.
 
 Lets however first try to tune the implementation rather than try to this 
 hack
 implemented. Explicit ggc_free calls traditionally tended to cause some 
 negative
 reactions wrt memory fragmentation concerns.
 
 Agree with suggested approach.

In future we actually may keep the duplicated functions in WPA memory and use 
corresponding body
whenever the function is inlined to avoid disturbing debug info more than 
needed.

Honza
 
 
 
 Markus' problem with -fprofile-use has been removed, IPA-ICF is preceding 
 devirtualization pass. I hope it is fine?
 
 Yes, I think devirtualization should actually work better with identical
 virutal methods merged.  We just need to be sure it sees through the newly
 introduced aliases (there should be no thunks for virutal methods)
 
 Thanks,
 Martin
 
 
 Honza
 


Re: [PATCH 3/5] IPA ICF pass

2014-10-13 Thread Martin Liška

On 10/11/2014 02:05 AM, Martin Liška wrote:

On 09/26/2014 09:46 PM, Jan Hubicka wrote:

Hi,
this is on ipa-icf-gimple.c

@@ -2827,11 +2829,19 @@ cgraph_node::verify_node (void)
{
  if (verify_edge_corresponds_to_fndecl (e, decl))
{
- error (edge points to wrong declaration:);
- debug_tree (e-callee-decl);
- fprintf (stderr, Instead of:);
- debug_tree (decl);
- error_found = true;
+ /* The edge can be redirected in WPA by IPA 
ICF.
+Following check really ensures that it's
+not the case.  */
+
+ cgraph_node *current_node = cgraph_node::get 
(decl);
+ if (!current_node || 
!current_node-icf_merged)

I would move this into verify_edge_corresponds_to_fndecl.

diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
new file mode 100644
index 000..7031eaa
--- /dev/null
+++ b/gcc/ipa-icf-gimple.c
@@ -0,0 +1,384 @@
+/* Interprocedural Identical Code Folding pass
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Jan Hubicka hubi...@ucw.cz and Martin Liska 
mli...@suse.cz
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+http://www.gnu.org/licenses/.  */

Please add toplevel comment about what the code does and how to use it.

+namespace ipa_icf {
+
+/* Basic block equivalence comparison function that returns true if
+   basic blocks BB1 and BB2 (from functions FUNC1 and FUNC2) correspond.  */
... to each other?
I would add short comment that as comparsion goes you build voclabulary
of equivalences of variables/ssanames etc.
So people reading the code do not get lost at very beggining.

+
+bool
+func_checker::compare_bb (sem_bb *bb1, sem_bb *bb2)
+{
+  unsigned i;
+  gimple_stmt_iterator gsi1, gsi2;
+  gimple s1, s2;
+
+  if (bb1-nondbg_stmt_count != bb2-nondbg_stmt_count
+  || bb1-edge_count != bb2-edge_count)
+return RETURN_FALSE ();

The UPPERCASE looks ugly.  I see that RETURN_FALSE is a warpper for 
return_false_with_msg
that outputs line and file information.

I would make it lowercase even if it is macro. You may consider using
CXX_MEM_STAT_INFO style default argument to avoid function macro completely.
Probably not big win given that it won't save you from preprocesor mess.
+
+  gsi1 = gsi_start_bb (bb1-bb);
+  gsi2 = gsi_start_bb (bb2-bb);
+
+  for (i = 0; i  bb1-nondbg_stmt_count; i++)
+{
+  if (is_gimple_debug (gsi_stmt (gsi1)))
+   gsi_next_nondebug (gsi1);
+
+  if (is_gimple_debug (gsi_stmt (gsi2)))
+   gsi_next_nondebug (gsi2);
+
+  s1 = gsi_stmt (gsi1);
+  s2 = gsi_stmt (gsi2);
+
+  if (gimple_code (s1) != gimple_code (s2))
+   return RETURN_FALSE_WITH_MSG (gimple codes are different);

I think you need to compare EH here.  Consider case where one unit
is compiled with -fno-exception and thus all EH regions are removed,
while other function has EH regions in it.  Those are not equivalent.

EH region is obtained by lookup_stmt_eh and then you need to comapre
them for match as you do with gimple_resx_regoin.

+  t1 = gimple_call_fndecl (s1);
+  t2 = gimple_call_fndecl (s2);
+
+  /* Function pointer variables are not supported yet.  */

They seems to be, compare_operand seems just right.

+
+/* Verifies for given GIMPLEs S1 and S2 that
+   label statements are semantically equivalent.  */
+
+bool
+func_checker::compare_gimple_label (gimple g1, gimple g2)
+{
+  if (m_ignore_labels)
+return true;
+
+  tree t1 = gimple_label_label (g1);
+  tree t2 = gimple_label_label (g2);
+
+  return compare_tree_ssa_label (t1, t2);
+}

I would expect the main BB loop to record BB in which label belongs to
and the BB assciatio neing checked here.
Otherwise I do not see how switch statements are compared to not have
different permutations of targets. Also note that one BB may have
multiple labels in them and they are equivalent.

Also I would punt on occurence of FORCED_LABEL. Those are tricky as they
may be passed around and compared for address and no one really defines
what should happen.  Better to avoid those.


Hi.

I will remove this support in the pass.



+
+/* Verifies for given 

Re: [PATCH 3/5] IPA ICF pass

2014-10-11 Thread Jan Hubicka
 
 After few days of measurement and tuning, I was able to get numbers to the 
 following shape:
 Execution times (seconds)
  phase setup :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) 
 wall1412 kB ( 0%) ggc
  phase opt and generate  :  27.83 (59%) usr   0.66 (19%) sys  28.52 (37%) 
 wall 1028813 kB (24%) ggc
  phase stream in :  16.90 (36%) usr   0.63 (18%) sys  17.60 (23%) 
 wall 3246453 kB (76%) ggc
  phase stream out:   2.76 ( 6%) usr   2.19 (63%) sys  31.34 (40%) 
 wall   2 kB ( 0%) ggc
  callgraph optimization  :   0.36 ( 1%) usr   0.00 ( 0%) sys   0.35 ( 0%) 
 wall  40 kB ( 0%) ggc
  ipa dead code removal   :   3.31 ( 7%) usr   0.01 ( 0%) sys   3.25 ( 4%) 
 wall   0 kB ( 0%) ggc
  ipa virtual call target :   3.69 ( 8%) usr   0.03 ( 1%) sys   3.80 ( 5%) 
 wall  21 kB ( 0%) ggc
  ipa devirtualization:   0.12 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) 
 wall   13704 kB ( 0%) ggc
  ipa cp  :   1.11 ( 2%) usr   0.07 ( 2%) sys   1.17 ( 2%) 
 wall  188558 kB ( 4%) ggc
  ipa inlining heuristics :   8.17 (17%) usr   0.14 ( 4%) sys   8.27 (11%) 
 wall  494738 kB (12%) ggc
  ipa comdats :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) 
 wall   0 kB ( 0%) ggc
  ipa lto gimple in   :   1.86 ( 4%) usr   0.40 (11%) sys   2.20 ( 3%) 
 wall  537970 kB (13%) ggc
  ipa lto gimple out  :   0.19 ( 0%) usr   0.08 ( 2%) sys   0.27 ( 0%) 
 wall   2 kB ( 0%) ggc
  ipa lto decl in :  12.20 (26%) usr   0.37 (11%) sys  12.64 (16%) 
 wall 2441687 kB (57%) ggc
  ipa lto decl out:   2.51 ( 5%) usr   0.21 ( 6%) sys   2.71 ( 3%) 
 wall   0 kB ( 0%) ggc
  ipa lto constructors in :   0.13 ( 0%) usr   0.02 ( 1%) sys   0.17 ( 0%) 
 wall   15692 kB ( 0%) ggc
  ipa lto constructors out:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) 
 wall   0 kB ( 0%) ggc
  ipa lto cgraph I/O  :   0.54 ( 1%) usr   0.09 ( 3%) sys   0.63 ( 1%) 
 wall  407182 kB (10%) ggc
  ipa lto decl merge  :   1.34 ( 3%) usr   0.00 ( 0%) sys   1.34 ( 2%) 
 wall8220 kB ( 0%) ggc
  ipa lto cgraph merge:   1.00 ( 2%) usr   0.00 ( 0%) sys   1.00 ( 1%) 
 wall   14605 kB ( 0%) ggc
  whopr wpa   :   0.92 ( 2%) usr   0.00 ( 0%) sys   0.89 ( 1%) 
 wall   1 kB ( 0%) ggc
  whopr wpa I/O   :   0.01 ( 0%) usr   1.90 (55%) sys  28.31 (37%) 
 wall   0 kB ( 0%) ggc
  whopr partitioning  :   2.81 ( 6%) usr   0.01 ( 0%) sys   2.83 ( 4%) 
 wall4943 kB ( 0%) ggc
  ipa reference   :   1.34 ( 3%) usr   0.00 ( 0%) sys   1.35 ( 2%) 
 wall   0 kB ( 0%) ggc
  ipa profile :   0.20 ( 0%) usr   0.01 ( 0%) sys   0.21 ( 0%) 
 wall   0 kB ( 0%) ggc
  ipa pure const  :   1.62 ( 3%) usr   0.00 ( 0%) sys   1.63 ( 2%) 
 wall   0 kB ( 0%) ggc
  ipa icf :   2.65 ( 6%) usr   0.02 ( 1%) sys   2.68 ( 3%) 
 wall1352 kB ( 0%) ggc
  inline parameters   :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) 
 wall   0 kB ( 0%) ggc
  tree SSA rewrite:   0.11 ( 0%) usr   0.01 ( 0%) sys   0.08 ( 0%) 
 wall   18919 kB ( 0%) ggc
  tree SSA other  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
 wall   0 kB ( 0%) ggc
  tree SSA incremental:   0.24 ( 1%) usr   0.01 ( 0%) sys   0.32 ( 0%) 
 wall   11325 kB ( 0%) ggc
  tree operand scan   :   0.15 ( 0%) usr   0.02 ( 1%) sys   0.18 ( 0%) 
 wall  116283 kB ( 3%) ggc
  dominance frontiers :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) 
 wall   0 kB ( 0%) ggc
  dominance computation   :   0.13 ( 0%) usr   0.01 ( 0%) sys   0.16 ( 0%) 
 wall   0 kB ( 0%) ggc
  varconst:   0.01 ( 0%) usr   0.02 ( 1%) sys   0.01 ( 0%) 
 wall   0 kB ( 0%) ggc
  loop fini   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) 
 wall   0 kB ( 0%) ggc
  unaccounted todo:   0.55 ( 1%) usr   0.00 ( 0%) sys   0.56 ( 1%) 
 wall   0 kB ( 0%) ggc
  TOTAL :  47.49 3.4877.46
 4276682 kB
 
 and I was able to reduce function bodies loaded in WPA to 35% (from previous 
 55%). The main problem

35% means that 35% of all function bodies are compared with something else? 
That feels pretty high.
but overall numbers are not so terrible.

 with speed was hidden in work list for congruence classes, where hash_set was 
 used. I chose the data
 structure to support delete operation, but it was really slow. Thus, hash_set 
 was replaced with linked list
 and a flag is used to identify if a set is removed or not.

Interesting, I would not expect bottleneck in a congruence solving :)
 
 I have no clue who complicated can it be to implement release_body function 
 to an operation that
 really releases the memory?

I suppose one can keep the caches from streamer and free trees read.  Freeing
gimple statemnts, cfg should be relatively easy. 

Lets however first try to tune the implementation rather than try to this hack
implemented. Explicit ggc_free calls 

Re: [PATCH 3/5] IPA ICF pass

2014-10-11 Thread Jan Hubicka
  +/* Verifies for given GIMPLEs S1 and S2 that
  +   goto statements are semantically equivalent.  */
  +
  +bool
  +func_checker::compare_gimple_goto (gimple g1, gimple g2)
  +{
  +  tree dest1, dest2;
  +
  +  dest1 = gimple_goto_dest (g1);
  +  dest2 = gimple_goto_dest (g2);
  +
  +  if (TREE_CODE (dest1) != TREE_CODE (dest2) || TREE_CODE (dest1) != 
  SSA_NAME)
  +return false;
  +
  +  return compare_operand (dest1, dest2);
  
  You probably need to care only about indirect gotos, the direct ones are 
  checked by
  CFG compare.  So is the condtional jump.
 
 It looks that this code is visited quite rare.

Hmm, perhaps it is called only for indirect calls, because all others are not 
represented as statements.
 
  +
  +/* Verifies for given GIMPLEs S1 and S2 that ASM statements are equivalent.
  +   For the beginning, the pass only supports equality for
  +   '__asm__ __volatile__ (, , , memory)'.  */
  +
  +bool
  +func_checker::compare_gimple_asm (gimple g1, gimple g2)
  +{
  +  if (gimple_asm_volatile_p (g1) != gimple_asm_volatile_p (g2))
  +return false;
  +
  +  if (gimple_asm_ninputs (g1) || gimple_asm_ninputs (g2))
  +return false;
  +
  +  if (gimple_asm_noutputs (g1) || gimple_asm_noutputs (g2))
  +return false;
  +
  +  if (gimple_asm_nlabels (g1) || gimple_asm_nlabels (g2))
  +return false;
  +
  +  if (gimple_asm_nclobbers (g1) != gimple_asm_nclobbers (g2))
  +return false;
  +
  +  for (unsigned i = 0; i  gimple_asm_nclobbers (g1); i++)
  +{
  +  tree clobber1 = TREE_VALUE (gimple_asm_clobber_op (g1, i));
  +  tree clobber2 = TREE_VALUE (gimple_asm_clobber_op (g2, i));
  +
  +  if (!operand_equal_p (clobber1, clobber2, OEP_ONLY_CONST))
  +   return false;
  +}
  +
  
  Even asm statements with no inputs or outputs can differ by the actual
  asm statement. Compare it too.
  
  Comparing inputs/outputs/labels should be very easy to do.
  
  Compare all gimple_asm_n* for equivalency.
 
 This makes fully sense, but I don't understand what kind of operands do you 
 mean?

You can look some other code dealing with gimple asm statements.  You can just 
compare
gimple_op for 0 gimple_num_ops and be ready to deal with TREE_LIST as 
described
bellow. 

Honza
 
  At the end walk operands and watch the case they are TREE_LIST.
  THen compare TREE_VALUE (op) of the list for operand_equal_p
  and TREE_VALUE (TREE_PURPOSE (op)) for equivalency
  (those are the constraints)
  
  If they are not (clobbers are not, those are just strings), operand_equal_p
  should do.
  
  +  return true;
  +}
  +
  +} // ipa_icf namespace
  
  Otherwise I think ipa-gimple-icf is quite fine now.
  Please send updated version and I think it can go to mainline before the 
  actual ipa-icf.
 
 I renamed both files and put them to a newly created namespace ipa_icf_gimple.
 
 Thank you,
 Martin


Re: [PATCH 3/5] IPA ICF pass

2014-10-11 Thread Jan Hubicka
 
 Hello.
 
 Yeah, you are right. But even Richard advised me to put it to a single place. 
 Maybe we are a bit
 more strict than it would be necessary. But I hope that's fine ;)

OK, lets do extra checking now and play with this incrementally.
 
 Good point. Do you mean cases like, foo (alias_foo) and bar (alias_bar). If 
 we prove that foo equals to bar, can we also merge aliases?
 I am curious if such comparison can really save something? Are there any 
 interesting cases?

What probably matters is that you recognize the equivalence to know that uses 
of alias_foo can be merged with uses alias_bar.
Similarly for thunks.  Again something to do incrementally I guess.

Honza
 
 Martin
 
 
  +case INTEGER_CST:
  +  {
  +  ret = types_are_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2))
  + wi::to_offset  (t1) == wi::to_offset  (t2);
  
tree_int_cst_equal
  
  +case FIELD_DECL:
  +  {
  +  tree fctx1 = DECL_FCONTEXT (t1);
  +  tree fctx2 = DECL_FCONTEXT (t2);
  
  DECL_FCONTEXT has no semantic meaning; so you can skip comparing it.
  +
  +  tree offset1 = DECL_FIELD_OFFSET (t1);
  +  tree offset2 = DECL_FIELD_OFFSET (t2);
  +
  +  tree bit_offset1 = DECL_FIELD_BIT_OFFSET (t1);
  +  tree bit_offset2 = DECL_FIELD_BIT_OFFSET (t2);
  +
  +  ret = compare_operand (fctx1, fctx2)
  + compare_operand (offset1, offset2)
  + compare_operand (bit_offset1, bit_offset2);
  
  You probably want to compare type here?
  +case CONSTRUCTOR:
  +  {
  +  unsigned len1 = vec_safe_length (CONSTRUCTOR_ELTS (t1));
  +  unsigned len2 = vec_safe_length (CONSTRUCTOR_ELTS (t2));
  +
  +  if (len1 != len2)
  +return false;
  +
  +  for (unsigned i = 0; i  len1; i++)
  +if (!sem_variable::equals (CONSTRUCTOR_ELT (t1, i)-value,
  +   CONSTRUCTOR_ELT (t2, i)-value))
  +  return false;
  
  You want to compare -index, too.
  +case INTEGER_CST:
  +  return func_checker::types_are_compatible_p (TREE_TYPE (t1), 
  TREE_TYPE (t2),
  +   true)
  +wi::to_offset (t1) == wi::to_offset (t2);
  again ;)
  
  This is where I stopped for now.  Generally the patch seems OK to me with 
  few of these
  details fixed.
  
  Honza
  


Re: [PATCH 3/5] IPA ICF pass

2014-10-10 Thread Fakturace
On 09/28/2014 04:20 AM, Jan Hubicka wrote:

 Hi.

 Thank you Markus for presenting numbers, it corresponds with I measured. If 
 I see correctly, IPA ICF pass takes about 7 seconds,
 the rest is distributed in verifier (not interesting for release version of 
 the compiler) and 'phase opt and generate'. No idea
 what can make the difference?

 phase opt and generate just combine all the optimization times together, so it
 is same 7 seconds as in the ICF pass :)
 1GB of function bodies just to elimnate 2-3% of code seems quite alot. Do you
 have any idea how many of those turns out to be different?
 It would be nice to be able to release the duplicate bodies from memory after
 the equivalency was stablished

 Honza


 Martin

Hello.

After few days of measurement and tuning, I was able to get numbers to the 
following shape:
Execution times (seconds)
 phase setup :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall  
  1412 kB ( 0%) ggc
 phase opt and generate  :  27.83 (59%) usr   0.66 (19%) sys  28.52 (37%) wall 
1028813 kB (24%) ggc
 phase stream in :  16.90 (36%) usr   0.63 (18%) sys  17.60 (23%) wall 
3246453 kB (76%) ggc
 phase stream out:   2.76 ( 6%) usr   2.19 (63%) sys  31.34 (40%) wall  
 2 kB ( 0%) ggc
 callgraph optimization  :   0.36 ( 1%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall  
40 kB ( 0%) ggc
 ipa dead code removal   :   3.31 ( 7%) usr   0.01 ( 0%) sys   3.25 ( 4%) wall  
 0 kB ( 0%) ggc
 ipa virtual call target :   3.69 ( 8%) usr   0.03 ( 1%) sys   3.80 ( 5%) wall  
21 kB ( 0%) ggc
 ipa devirtualization:   0.12 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall  
 13704 kB ( 0%) ggc
 ipa cp  :   1.11 ( 2%) usr   0.07 ( 2%) sys   1.17 ( 2%) wall  
188558 kB ( 4%) ggc
 ipa inlining heuristics :   8.17 (17%) usr   0.14 ( 4%) sys   8.27 (11%) wall  
494738 kB (12%) ggc
 ipa comdats :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall  
 0 kB ( 0%) ggc
 ipa lto gimple in   :   1.86 ( 4%) usr   0.40 (11%) sys   2.20 ( 3%) wall  
537970 kB (13%) ggc
 ipa lto gimple out  :   0.19 ( 0%) usr   0.08 ( 2%) sys   0.27 ( 0%) wall  
 2 kB ( 0%) ggc
 ipa lto decl in :  12.20 (26%) usr   0.37 (11%) sys  12.64 (16%) wall 
2441687 kB (57%) ggc
 ipa lto decl out:   2.51 ( 5%) usr   0.21 ( 6%) sys   2.71 ( 3%) wall  
 0 kB ( 0%) ggc
 ipa lto constructors in :   0.13 ( 0%) usr   0.02 ( 1%) sys   0.17 ( 0%) wall  
 15692 kB ( 0%) ggc
 ipa lto constructors out:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall  
 0 kB ( 0%) ggc
 ipa lto cgraph I/O  :   0.54 ( 1%) usr   0.09 ( 3%) sys   0.63 ( 1%) wall  
407182 kB (10%) ggc
 ipa lto decl merge  :   1.34 ( 3%) usr   0.00 ( 0%) sys   1.34 ( 2%) wall  
  8220 kB ( 0%) ggc
 ipa lto cgraph merge:   1.00 ( 2%) usr   0.00 ( 0%) sys   1.00 ( 1%) wall  
 14605 kB ( 0%) ggc
 whopr wpa   :   0.92 ( 2%) usr   0.00 ( 0%) sys   0.89 ( 1%) wall  
 1 kB ( 0%) ggc
 whopr wpa I/O   :   0.01 ( 0%) usr   1.90 (55%) sys  28.31 (37%) wall  
 0 kB ( 0%) ggc
 whopr partitioning  :   2.81 ( 6%) usr   0.01 ( 0%) sys   2.83 ( 4%) wall  
  4943 kB ( 0%) ggc
 ipa reference   :   1.34 ( 3%) usr   0.00 ( 0%) sys   1.35 ( 2%) wall  
 0 kB ( 0%) ggc
 ipa profile :   0.20 ( 0%) usr   0.01 ( 0%) sys   0.21 ( 0%) wall  
 0 kB ( 0%) ggc
 ipa pure const  :   1.62 ( 3%) usr   0.00 ( 0%) sys   1.63 ( 2%) wall  
 0 kB ( 0%) ggc
 ipa icf :   2.65 ( 6%) usr   0.02 ( 1%) sys   2.68 ( 3%) wall  
  1352 kB ( 0%) ggc
 inline parameters   :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall  
 0 kB ( 0%) ggc
 tree SSA rewrite:   0.11 ( 0%) usr   0.01 ( 0%) sys   0.08 ( 0%) wall  
 18919 kB ( 0%) ggc
 tree SSA other  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall  
 0 kB ( 0%) ggc
 tree SSA incremental:   0.24 ( 1%) usr   0.01 ( 0%) sys   0.32 ( 0%) wall  
 11325 kB ( 0%) ggc
 tree operand scan   :   0.15 ( 0%) usr   0.02 ( 1%) sys   0.18 ( 0%) wall  
116283 kB ( 3%) ggc
 dominance frontiers :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall  
 0 kB ( 0%) ggc
 dominance computation   :   0.13 ( 0%) usr   0.01 ( 0%) sys   0.16 ( 0%) wall  
 0 kB ( 0%) ggc
 varconst:   0.01 ( 0%) usr   0.02 ( 1%) sys   0.01 ( 0%) wall  
 0 kB ( 0%) ggc
 loop fini   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall  
 0 kB ( 0%) ggc
 unaccounted todo:   0.55 ( 1%) usr   0.00 ( 0%) sys   0.56 ( 1%) wall  
 0 kB ( 0%) ggc
 TOTAL :  47.49 3.4877.46
4276682 kB

and I was able to reduce function bodies loaded in WPA to 35% (from previous 
55%). The main problem
with speed was hidden in work list for congruence classes, where hash_set was 
used. I chose the data
structure to support delete operation, but it was really slow. Thus, hash_set 
was replaced with 

Re: [PATCH 3/5] IPA ICF pass

2014-10-10 Thread Martin Liška
On 09/28/2014 03:20 AM, Jan Hubicka wrote:

 Hi.

 Thank you Markus for presenting numbers, it corresponds with I measured. If 
 I see correctly, IPA ICF pass takes about 7 seconds,
 the rest is distributed in verifier (not interesting for release version of 
 the compiler) and 'phase opt and generate'. No idea
 what can make the difference?
 
 phase opt and generate just combine all the optimization times together, so it
 is same 7 seconds as in the ICF pass :)
 1GB of function bodies just to elimnate 2-3% of code seems quite alot. Do you
 have any idea how many of those turns out to be different?
 It would be nice to be able to release the duplicate bodies from memory after
 the equivalency was stablished
 
 Honza
 

 Martin

(I resend the message, my mail client was a bit confused, please do _not_ reply 
to faktur...@foxlink.cz)

Hello.

After few days of measurement and tuning, I was able to get numbers to the 
following shape:
Execution times (seconds)
 phase setup :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall  
  1412 kB ( 0%) ggc
 phase opt and generate  :  27.83 (59%) usr   0.66 (19%) sys  28.52 (37%) wall 
1028813 kB (24%) ggc
 phase stream in :  16.90 (36%) usr   0.63 (18%) sys  17.60 (23%) wall 
3246453 kB (76%) ggc
 phase stream out:   2.76 ( 6%) usr   2.19 (63%) sys  31.34 (40%) wall  
 2 kB ( 0%) ggc
 callgraph optimization  :   0.36 ( 1%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall  
40 kB ( 0%) ggc
 ipa dead code removal   :   3.31 ( 7%) usr   0.01 ( 0%) sys   3.25 ( 4%) wall  
 0 kB ( 0%) ggc
 ipa virtual call target :   3.69 ( 8%) usr   0.03 ( 1%) sys   3.80 ( 5%) wall  
21 kB ( 0%) ggc
 ipa devirtualization:   0.12 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall  
 13704 kB ( 0%) ggc
 ipa cp  :   1.11 ( 2%) usr   0.07 ( 2%) sys   1.17 ( 2%) wall  
188558 kB ( 4%) ggc
 ipa inlining heuristics :   8.17 (17%) usr   0.14 ( 4%) sys   8.27 (11%) wall  
494738 kB (12%) ggc
 ipa comdats :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall  
 0 kB ( 0%) ggc
 ipa lto gimple in   :   1.86 ( 4%) usr   0.40 (11%) sys   2.20 ( 3%) wall  
537970 kB (13%) ggc
 ipa lto gimple out  :   0.19 ( 0%) usr   0.08 ( 2%) sys   0.27 ( 0%) wall  
 2 kB ( 0%) ggc
 ipa lto decl in :  12.20 (26%) usr   0.37 (11%) sys  12.64 (16%) wall 
2441687 kB (57%) ggc
 ipa lto decl out:   2.51 ( 5%) usr   0.21 ( 6%) sys   2.71 ( 3%) wall  
 0 kB ( 0%) ggc
 ipa lto constructors in :   0.13 ( 0%) usr   0.02 ( 1%) sys   0.17 ( 0%) wall  
 15692 kB ( 0%) ggc
 ipa lto constructors out:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall  
 0 kB ( 0%) ggc
 ipa lto cgraph I/O  :   0.54 ( 1%) usr   0.09 ( 3%) sys   0.63 ( 1%) wall  
407182 kB (10%) ggc
 ipa lto decl merge  :   1.34 ( 3%) usr   0.00 ( 0%) sys   1.34 ( 2%) wall  
  8220 kB ( 0%) ggc
 ipa lto cgraph merge:   1.00 ( 2%) usr   0.00 ( 0%) sys   1.00 ( 1%) wall  
 14605 kB ( 0%) ggc
 whopr wpa   :   0.92 ( 2%) usr   0.00 ( 0%) sys   0.89 ( 1%) wall  
 1 kB ( 0%) ggc
 whopr wpa I/O   :   0.01 ( 0%) usr   1.90 (55%) sys  28.31 (37%) wall  
 0 kB ( 0%) ggc
 whopr partitioning  :   2.81 ( 6%) usr   0.01 ( 0%) sys   2.83 ( 4%) wall  
  4943 kB ( 0%) ggc
 ipa reference   :   1.34 ( 3%) usr   0.00 ( 0%) sys   1.35 ( 2%) wall  
 0 kB ( 0%) ggc
 ipa profile :   0.20 ( 0%) usr   0.01 ( 0%) sys   0.21 ( 0%) wall  
 0 kB ( 0%) ggc
 ipa pure const  :   1.62 ( 3%) usr   0.00 ( 0%) sys   1.63 ( 2%) wall  
 0 kB ( 0%) ggc
 ipa icf :   2.65 ( 6%) usr   0.02 ( 1%) sys   2.68 ( 3%) wall  
  1352 kB ( 0%) ggc
 inline parameters   :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall  
 0 kB ( 0%) ggc
 tree SSA rewrite:   0.11 ( 0%) usr   0.01 ( 0%) sys   0.08 ( 0%) wall  
 18919 kB ( 0%) ggc
 tree SSA other  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall  
 0 kB ( 0%) ggc
 tree SSA incremental:   0.24 ( 1%) usr   0.01 ( 0%) sys   0.32 ( 0%) wall  
 11325 kB ( 0%) ggc
 tree operand scan   :   0.15 ( 0%) usr   0.02 ( 1%) sys   0.18 ( 0%) wall  
116283 kB ( 3%) ggc
 dominance frontiers :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall  
 0 kB ( 0%) ggc
 dominance computation   :   0.13 ( 0%) usr   0.01 ( 0%) sys   0.16 ( 0%) wall  
 0 kB ( 0%) ggc
 varconst:   0.01 ( 0%) usr   0.02 ( 1%) sys   0.01 ( 0%) wall  
 0 kB ( 0%) ggc
 loop fini   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall  
 0 kB ( 0%) ggc
 unaccounted todo:   0.55 ( 1%) usr   0.00 ( 0%) sys   0.56 ( 1%) wall  
 0 kB ( 0%) ggc
 TOTAL :  47.49 3.4877.46
4276682 kB

and I was able to reduce function bodies loaded in WPA to 35% (from previous 
55%). The main problem
with speed was hidden in work list for congruence classes, where hash_set was 
used. I chose 

Re: [PATCH 3/5] IPA ICF pass

2014-10-10 Thread Martin Liška
On 09/26/2014 09:46 PM, Jan Hubicka wrote:
 Hi,
 this is on ipa-icf-gimple.c
 
 @@ -2827,11 +2829,19 @@ cgraph_node::verify_node (void)
   {
 if (verify_edge_corresponds_to_fndecl (e, decl))
   {
 -   error (edge points to wrong declaration:);
 -   debug_tree (e-callee-decl);
 -   fprintf (stderr, Instead of:);
 -   debug_tree (decl);
 -   error_found = true;
 +   /* The edge can be redirected in WPA by IPA 
 ICF.
 +  Following check really ensures that it's
 +  not the case.  */
 +
 +   cgraph_node *current_node = cgraph_node::get 
 (decl);
 +   if (!current_node || 
 !current_node-icf_merged)
 
 I would move this into verify_edge_corresponds_to_fndecl.
 
 diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
 new file mode 100644
 index 000..7031eaa
 --- /dev/null
 +++ b/gcc/ipa-icf-gimple.c
 @@ -0,0 +1,384 @@
 +/* Interprocedural Identical Code Folding pass
 +   Copyright (C) 2014 Free Software Foundation, Inc.
 +
 +   Contributed by Jan Hubicka hubi...@ucw.cz and Martin Liska 
 mli...@suse.cz
 +
 +This file is part of GCC.
 +
 +GCC is free software; you can redistribute it and/or modify it under
 +the terms of the GNU General Public License as published by the Free
 +Software Foundation; either version 3, or (at your option) any later
 +version.
 +
 +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
 +WARRANTY; without even the implied warranty of MERCHANTABILITY or
 +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 +for more details.
 +
 +You should have received a copy of the GNU General Public License
 +along with GCC; see the file COPYING3.  If not see
 +http://www.gnu.org/licenses/.  */
 
 Please add toplevel comment about what the code does and how to use it.
 
 +namespace ipa_icf {
 +
 +/* Basic block equivalence comparison function that returns true if
 +   basic blocks BB1 and BB2 (from functions FUNC1 and FUNC2) correspond.  */
 ... to each other?
 I would add short comment that as comparsion goes you build voclabulary
 of equivalences of variables/ssanames etc.
 So people reading the code do not get lost at very beggining.
 
 +
 +bool
 +func_checker::compare_bb (sem_bb *bb1, sem_bb *bb2)
 +{
 +  unsigned i;
 +  gimple_stmt_iterator gsi1, gsi2;
 +  gimple s1, s2;
 +
 +  if (bb1-nondbg_stmt_count != bb2-nondbg_stmt_count
 +  || bb1-edge_count != bb2-edge_count)
 +return RETURN_FALSE ();
 
 The UPPERCASE looks ugly.  I see that RETURN_FALSE is a warpper for 
 return_false_with_msg
 that outputs line and file information.
 
 I would make it lowercase even if it is macro. You may consider using
 CXX_MEM_STAT_INFO style default argument to avoid function macro completely.
 Probably not big win given that it won't save you from preprocesor mess.
 +
 +  gsi1 = gsi_start_bb (bb1-bb);
 +  gsi2 = gsi_start_bb (bb2-bb);
 +
 +  for (i = 0; i  bb1-nondbg_stmt_count; i++)
 +{
 +  if (is_gimple_debug (gsi_stmt (gsi1)))
 + gsi_next_nondebug (gsi1);
 +
 +  if (is_gimple_debug (gsi_stmt (gsi2)))
 + gsi_next_nondebug (gsi2);
 +
 +  s1 = gsi_stmt (gsi1);
 +  s2 = gsi_stmt (gsi2);
 +
 +  if (gimple_code (s1) != gimple_code (s2))
 + return RETURN_FALSE_WITH_MSG (gimple codes are different);
 
 I think you need to compare EH here.  Consider case where one unit
 is compiled with -fno-exception and thus all EH regions are removed,
 while other function has EH regions in it.  Those are not equivalent.
 
 EH region is obtained by lookup_stmt_eh and then you need to comapre
 them for match as you do with gimple_resx_regoin.
 
 +  t1 = gimple_call_fndecl (s1);
 +  t2 = gimple_call_fndecl (s2);
 +
 +  /* Function pointer variables are not supported yet.  */
 
 They seems to be, compare_operand seems just right.
 
 +
 +/* Verifies for given GIMPLEs S1 and S2 that
 +   label statements are semantically equivalent.  */
 +
 +bool
 +func_checker::compare_gimple_label (gimple g1, gimple g2)
 +{
 +  if (m_ignore_labels)
 +return true;
 +
 +  tree t1 = gimple_label_label (g1);
 +  tree t2 = gimple_label_label (g2);
 +
 +  return compare_tree_ssa_label (t1, t2);
 +}
 
 I would expect the main BB loop to record BB in which label belongs to
 and the BB assciatio neing checked here.
 Otherwise I do not see how switch statements are compared to not have
 different permutations of targets. Also note that one BB may have
 multiple labels in them and they are equivalent.
 
 Also I would punt on occurence of FORCED_LABEL. Those are tricky as they
 may be passed around and compared for address and no one really defines
 what should happen.  Better to avoid those.

Hi.

I will 

Re: [PATCH 3/5] IPA ICF pass

2014-10-10 Thread Martin Liška
On 09/26/2014 11:27 PM, Jan Hubicka wrote:
 diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
 new file mode 100644
 index 000..f3472fe
 --- /dev/null
 +++ b/gcc/ipa-icf.c
 @@ -0,0 +1,2841 @@
 +/* Interprocedural Identical Code Folding pass
 +   Copyright (C) 2014 Free Software Foundation, Inc.
 +
 +   Contributed by Jan Hubicka hubi...@ucw.cz and Martin Liska 
 mli...@suse.cz
 +
 +This file is part of GCC.
 +
 +GCC is free software; you can redistribute it and/or modify it under
 +the terms of the GNU General Public License as published by the Free
 +Software Foundation; either version 3, or (at your option) any later
 +version.
 +
 +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
 +WARRANTY; without even the implied warranty of MERCHANTABILITY or
 +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 +for more details.
 +
 +You should have received a copy of the GNU General Public License
 +along with GCC; see the file COPYING3.  If not see
 +http://www.gnu.org/licenses/.  */
 +
 +/* Interprocedural Identical Code Folding for functions and
 +   read-only variables.
 +
 +   The goal of this transformation is to discover functions and read-only
 +   variables which do have exactly the same semantics.
 (or value)
 +
 +   In case of functions,
 +   we could either create a virtual clone or do a simple function wrapper
 +   that will call equivalent function. If the function is just locally 
 visible,
 +   all function calls can be redirected. For read-only variables, we create
 +   aliases if possible.
 +
 +   Optimization pass arranges as follows:
 
 The optimization pass is arranged as follows: (I guess)
 
 I also wonder if the gimple equality code should be in ipa_icf namespace, it 
 is intended
 to be shared with tail merging pass, so what about just calling it 
 gimple_sem_equality?
 
 +/* Verification function for edges E1 and E2.  */
 +
 +bool
 +func_checker::compare_edge (edge e1, edge e2)
 +{
 +  if (e1-flags != e2-flags)
 +return false;
 
 In future we may want to experiment with checking that edge probabilities with
 profile feedback match and refuse to merge BBs with different outgoing 
 probabilities
 (i.e. +-5%).
 Just add it as TODO there, please.
 +
 +/* Return true if types are compatible from perspective of ICF.  */
 +bool func_checker::types_are_compatible_p (tree t1, tree t2,
 
 Perhaps dropping _are_ would make sense, so we do not have two names
 for essentially same thing.
 +bool compare_polymorphic,
 +bool first_argument)
 +{
 +  if (TREE_CODE (t1) != TREE_CODE (t2))
 +return RETURN_FALSE_WITH_MSG (different tree types);
 +
 +  if (!types_compatible_p (t1, t2))
 +return RETURN_FALSE_WITH_MSG (types are not compatible);
 +
 +  if (get_alias_set (t1) != get_alias_set (t2))
 +return RETURN_FALSE_WITH_MSG (alias sets are different);
 
 You do not need to compare alias sets except for memory operations IMO.

Hello.

Yeah, you are right. But even Richard advised me to put it to a single place. 
Maybe we are a bit
more strict than it would be necessary. But I hope that's fine ;)

 +
 +  /* We call contains_polymorphic_type_p with this pointer type.  */
 +  if (first_argument  TREE_CODE (t1) == POINTER_TYPE)
 +{
 +  t1 = TREE_TYPE (t1);
 +  t2 = TREE_TYPE (t2);
 +}
 +
 +  if (compare_polymorphic
 +   (contains_polymorphic_type_p (t1) || contains_polymorphic_type_p 
 (t2)))
 +{
 +  if (!contains_polymorphic_type_p (t1) || !contains_polymorphic_type_p 
 (t2))
 +return RETURN_FALSE_WITH_MSG (one type is not polymorphic);
 +
 +  if (TYPE_MAIN_VARIANT (t1) != TYPE_MAIN_VARIANT (t2))
 +return RETURN_FALSE_WITH_MSG (type variants are different for 
 +  polymorphic type);
 
 I added types_must_be_same_for_odr (t1,t2) for you here.
 +/* Fast equality function based on knowledge known in WPA.  */
 +
 +bool
 +sem_function::equals_wpa (sem_item *item)
 +{
 +  gcc_assert (item-type == FUNC);
 +
 +  m_compared_func = static_castsem_function * (item);
 +
 +  if (arg_types.length () != m_compared_func-arg_types.length ())
 +return RETURN_FALSE_WITH_MSG (different number of arguments);
 +
 +  /* Checking types of arguments.  */
 +  for (unsigned i = 0; i  arg_types.length (); i++)
 +{
 +  /* This guard is here for function pointer with attributes 
 (pr59927.c).  */
 +  if (!arg_types[i] || !m_compared_func-arg_types[i])
 +return RETURN_FALSE_WITH_MSG (NULL argument type);
 +
 +  if (!func_checker::types_are_compatible_p (arg_types[i],
 +  m_compared_func-arg_types[i],
 +  true, i == 0))
 +return RETURN_FALSE_WITH_MSG (argument type is different);
 +}
 +
 +  /* Result type checking.  */
 +  if (!func_checker::types_are_compatible_p (result_type,
 +  m_compared_func-result_type))
 +return RETURN_FALSE_WITH_MSG (result types are different);
 
 You may want to compare ECF flags, such as nothrow/const/pure.  We do not
 want to merge 

Re: [PATCH 3/5] IPA ICF pass

2014-09-27 Thread Markus Trippelsdorf
On 2014.09.27 at 01:27 +0200, Jan Hubicka wrote:
  While a plain Firefox -flto build works fine. LTO/PGO build fails with:
  
  lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540
  0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*)
  ../../gcc/gcc/ipa-utils.c:540
  0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*)
  ../../gcc/gcc/ipa-icf.c:753
  0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int)
  ../../gcc/gcc/ipa-icf.c:2706
  0xf1c1f4 ipa_icf::sem_item_optimizer::execute()
  ../../gcc/gcc/ipa-icf.c:2098
  0xf1d3f1 ipa_icf_driver
  ../../gcc/gcc/ipa-icf.c:2784
  0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*)
  ../../gcc/gcc/ipa-icf.c:2831
  
  
  The pass is also very memory hungry (from 3GB without ICF to 4GB during
  libxul link), while the code size savings are in the 1% range.
 
 Thnks for checking. I was just thinking about doing that myself.  Would
 you mind posting -ftime-report of firefox WPA stage?

(without ICF)
Execution times (seconds)
 phase setup :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall  
  1412 kB ( 0%) ggc
 phase opt and generate  :  58.38 (63%) usr   2.00 (47%) sys  60.37 (40%) wall  
403069 kB (12%) ggc
 phase stream in :  30.24 (33%) usr   0.97 (23%) sys  33.90 (22%) wall 
2944210 kB (88%) ggc
 phase stream out:   4.29 ( 5%) usr   1.32 (31%) sys  57.32 (38%) wall  
 0 kB ( 0%) ggc
 phase finalize  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall  
 0 kB ( 0%) ggc
 garbage collection  :   3.68 ( 4%) usr   0.00 ( 0%) sys   3.68 ( 2%) wall  
 0 kB ( 0%) ggc
 callgraph optimization  :   0.50 ( 1%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall  
   166 kB ( 0%) ggc
 ipa dead code removal   :   6.91 ( 7%) usr   0.08 ( 2%) sys   7.25 ( 5%) wall  
 0 kB ( 0%) ggc
 ipa virtual call target :   7.08 ( 8%) usr   0.04 ( 1%) sys   6.93 ( 5%) wall  
 0 kB ( 0%) ggc
 ipa devirtualization:   0.27 ( 0%) usr   0.00 ( 0%) sys   0.27 ( 0%) wall  
 10365 kB ( 0%) ggc
 ipa cp  :   1.81 ( 2%) usr   0.06 ( 1%) sys   3.40 ( 2%) wall  
173701 kB ( 5%) ggc
 ipa inlining heuristics :  16.60 (18%) usr   0.27 ( 6%) sys  17.48 (12%) wall  
532704 kB (16%) ggc
 ipa comdats :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall  
 0 kB ( 0%) ggc
 ipa lto gimple out  :   0.21 ( 0%) usr   0.04 ( 1%) sys   0.97 ( 1%) wall  
 0 kB ( 0%) ggc
 ipa lto decl in :  18.29 (20%) usr   0.54 (13%) sys  18.96 (12%) wall 
2226088 kB (66%) ggc
 ipa lto decl out:   3.93 ( 4%) usr   0.13 ( 3%) sys   4.06 ( 3%) wall  
 0 kB ( 0%) ggc
 ipa lto constructors in :   0.24 ( 0%) usr   0.03 ( 1%) sys   0.59 ( 0%) wall  
 14226 kB ( 0%) ggc
 ipa lto constructors out:   0.08 ( 0%) usr   0.04 ( 1%) sys   0.15 ( 0%) wall  
 0 kB ( 0%) ggc
 ipa lto cgraph I/O  :   0.89 ( 1%) usr   0.12 ( 3%) sys   1.02 ( 1%) wall  
364151 kB (11%) ggc
 ipa lto decl merge  :   2.14 ( 2%) usr   0.01 ( 0%) sys   2.14 ( 1%) wall  
  8196 kB ( 0%) ggc
 ipa lto cgraph merge:   1.59 ( 2%) usr   0.00 ( 0%) sys   1.60 ( 1%) wall  
 12716 kB ( 0%) ggc
 whopr wpa   :   1.54 ( 2%) usr   0.03 ( 1%) sys   1.55 ( 1%) wall  
 1 kB ( 0%) ggc
 whopr wpa I/O   :   0.04 ( 0%) usr   1.11 (26%) sys  52.10 (34%) wall  
 0 kB ( 0%) ggc
 whopr partitioning  :   5.02 ( 5%) usr   0.01 ( 0%) sys   5.03 ( 3%) wall  
  4938 kB ( 0%) ggc
 ipa reference   :   2.04 ( 2%) usr   0.02 ( 0%) sys   2.08 ( 1%) wall  
 0 kB ( 0%) ggc
 ipa profile :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall  
 0 kB ( 0%) ggc
 ipa pure const  :   2.43 ( 3%) usr   0.02 ( 0%) sys   2.49 ( 2%) wall  
 0 kB ( 0%) ggc
 tree STMT verifier  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall  
 0 kB ( 0%) ggc
 callgraph verifier  :  16.31 (18%) usr   1.69 (39%) sys  17.96 (12%) wall  
 0 kB ( 0%) ggc
 dominance computation   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall  
 0 kB ( 0%) ggc
 varconst:   0.01 ( 0%) usr   0.03 ( 1%) sys   0.05 ( 0%) wall  
 0 kB ( 0%) ggc
 unaccounted todo:   0.69 ( 1%) usr   0.00 ( 0%) sys   0.69 ( 0%) wall  
 0 kB ( 0%) ggc
 TOTAL :  92.91 4.29   151.73
3348693 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

(with ICF)
Execution times (seconds)
 phase setup :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall  
  1412 kB ( 0%) ggc
 phase opt and generate  :  82.70 (70%) usr   3.31 (53%) sys  86.17 (45%) wall 
1468975 kB (33%) ggc
 phase stream in :  30.46 (26%) usr   1.02 (16%) sys  31.48 (16%) wall 
2944210 kB (67%) ggc
 phase stream out:   4.52 ( 4%) usr   1.90 (30%) sys  73.47 (38%) wall  
12 kB ( 0%) ggc
 phase finalize  :   0.00 ( 0%) usr   0.00 ( 

Re: [PATCH 3/5] IPA ICF pass

2014-09-27 Thread Markus Trippelsdorf
On 2014.09.27 at 07:59 +0200, Markus Trippelsdorf wrote:
 
  It seems that in this case we reject too many of equality candidates?
  It think the original numbers was about 4-5% but later some equivalences was
  disabled because of devirt/aliasing issues. Do you compare it with gold ICF
  enabled? There are quite few obvious improvements to the analysis that can
  be done, but I guess we need to analyze the interesting cases one by one.

Forgot to post the binary size numbers (in bytes):

  | gold's icf off | gold's icf on  |
--+++
gcc's icf off |79793880|74881040|
--+-+
gcc's icf on  |78043608|73612800|
--+++

-- 
Markus


Re: [PATCH 3/5] IPA ICF pass

2014-09-27 Thread Martin Liška
On 09/27/2014 01:27 AM, Jan Hubicka wrote:
 While a plain Firefox -flto build works fine. LTO/PGO build fails with:

 lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540
 0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*)
 ../../gcc/gcc/ipa-utils.c:540
 0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*)
 ../../gcc/gcc/ipa-icf.c:753
 0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int)
 ../../gcc/gcc/ipa-icf.c:2706
 0xf1c1f4 ipa_icf::sem_item_optimizer::execute()
 ../../gcc/gcc/ipa-icf.c:2098
 0xf1d3f1 ipa_icf_driver
 ../../gcc/gcc/ipa-icf.c:2784
 0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*)
 ../../gcc/gcc/ipa-icf.c:2831


 The pass is also very memory hungry (from 3GB without ICF to 4GB during
 libxul link), while the code size savings are in the 1% range.


The majority of the problem are groups of candidates that are built according 
to hash.
The hash value is based on a number of arguments, number of BB, number of 
gimple statements and types of these statements.
It groups function into classes. In WPA (before a body of any function is 
loaded) I get following histogram:

Dump after WPA based types groups
Congruence classes: 97204 (unique hash values: 88725), with total: 191457 items
Class size histogram [num of members]: number of classe number of classess
[1]: 86453 classes
[2]: 5680 classes
[3]: 1541 classes
[4]: 915 classes
[5]: 446 classes
[6]: 346 classes
[7]: 200 classes
[8]: 181 classes
[9]: 154 classes
[10]: 109 classes
[11]: 87 classes
[12]: 87 classes
[13]: 68 classes
[14]: 58 classes
[15]: 58 classes
[16]: 41 classes
[17]: 25 classes
[18]: 33 classes
[19]: 28 classes
[20]: 25 classes
[21]: 19 classes
[22]: 30 classes
[23]: 24 classes
[24]: 33 classes
[25]: 17 classes
[26]: 15 classes
[27]: 10 classes
[28]: 13 classes
[29]: 18 classes
[30]: 10 classes

It means that each class with more than one member needs to be iterated and 
these functions are compared. And yes, there's the root of the problem.
I have to load function body to process deep function comparison. As you can 
see, we have almost 200k function, where more than half each situated
in a group with more that one member. So that 1GB extra memory usage is caused 
by these bodies:

Init called for 105004 items (54.84%).

Memory footprint can be significantly reduced if one can load the body and 
release it and the memory is freed. I asked Honza about it, but it looks
GGC mechanism cannot be easily forced to release it.

 
 Thnks for checking. I was just thinking about doing that myself.  Would
 you mind posting -ftime-report of firefox WPA stage?
 
 It seems that in this case we reject too many of equality candidates?
 It think the original numbers was about 4-5% but later some equivalences was
 disabled because of devirt/aliasing issues. Do you compare it with gold ICF
 enabled? There are quite few obvious improvements to the analysis that can
 be done, but I guess we need to analyze the interesting cases one by one.

You are right, the number were quite promising, but during the time, I had to
reduce the aggressivity of the pass. As Honza said, it can be improved 
step-by-step.

 
 One thing that Martin can try is to hook into lto-symtab and try to check
 that the COMDAT functions that are known to be same pass the equality check.
 I suppose we will learn interesting things this way.
 
Good point, I will try it.

Martin


 I think the patch adds quite important infrastructure for gimple semantic
 equality checking and function merging. I went through the majority of code 
 and
 I think it is mostly ready to mainline (i.e. cleaner than what we have in
 tree-ssa-tailmerge) so hope we can finish the review process next week.
 We will need to get better cost/benefits ratio to enable it for -O2 that is
 someting I would really like to see for 5.0, but it seems to be easier to
 handle this incrementally

Thank you for the review,
Martin

 
 Honza
 



Re: [PATCH 3/5] IPA ICF pass

2014-09-27 Thread Martin Liška
On 09/27/2014 07:59 AM, Markus Trippelsdorf wrote:
 On 2014.09.27 at 01:27 +0200, Jan Hubicka wrote:
 While a plain Firefox -flto build works fine. LTO/PGO build fails with:

 lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540
 0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*)
 ../../gcc/gcc/ipa-utils.c:540
 0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*)
 ../../gcc/gcc/ipa-icf.c:753
 0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int)
 ../../gcc/gcc/ipa-icf.c:2706
 0xf1c1f4 ipa_icf::sem_item_optimizer::execute()
 ../../gcc/gcc/ipa-icf.c:2098
 0xf1d3f1 ipa_icf_driver
 ../../gcc/gcc/ipa-icf.c:2784
 0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*)
 ../../gcc/gcc/ipa-icf.c:2831


 The pass is also very memory hungry (from 3GB without ICF to 4GB during
 libxul link), while the code size savings are in the 1% range.

 Thnks for checking. I was just thinking about doing that myself.  Would
 you mind posting -ftime-report of firefox WPA stage?
 
 (without ICF)
 Execution times (seconds)
  phase setup :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
 wall1412 kB ( 0%) ggc
  phase opt and generate  :  58.38 (63%) usr   2.00 (47%) sys  60.37 (40%) 
 wall  403069 kB (12%) ggc
  phase stream in :  30.24 (33%) usr   0.97 (23%) sys  33.90 (22%) 
 wall 2944210 kB (88%) ggc
  phase stream out:   4.29 ( 5%) usr   1.32 (31%) sys  57.32 (38%) 
 wall   0 kB ( 0%) ggc
  phase finalize  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) 
 wall   0 kB ( 0%) ggc
  garbage collection  :   3.68 ( 4%) usr   0.00 ( 0%) sys   3.68 ( 2%) 
 wall   0 kB ( 0%) ggc
  callgraph optimization  :   0.50 ( 1%) usr   0.00 ( 0%) sys   0.50 ( 0%) 
 wall 166 kB ( 0%) ggc
  ipa dead code removal   :   6.91 ( 7%) usr   0.08 ( 2%) sys   7.25 ( 5%) 
 wall   0 kB ( 0%) ggc
  ipa virtual call target :   7.08 ( 8%) usr   0.04 ( 1%) sys   6.93 ( 5%) 
 wall   0 kB ( 0%) ggc
  ipa devirtualization:   0.27 ( 0%) usr   0.00 ( 0%) sys   0.27 ( 0%) 
 wall   10365 kB ( 0%) ggc
  ipa cp  :   1.81 ( 2%) usr   0.06 ( 1%) sys   3.40 ( 2%) 
 wall  173701 kB ( 5%) ggc
  ipa inlining heuristics :  16.60 (18%) usr   0.27 ( 6%) sys  17.48 (12%) 
 wall  532704 kB (16%) ggc
  ipa comdats :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) 
 wall   0 kB ( 0%) ggc
  ipa lto gimple out  :   0.21 ( 0%) usr   0.04 ( 1%) sys   0.97 ( 1%) 
 wall   0 kB ( 0%) ggc
  ipa lto decl in :  18.29 (20%) usr   0.54 (13%) sys  18.96 (12%) 
 wall 2226088 kB (66%) ggc
  ipa lto decl out:   3.93 ( 4%) usr   0.13 ( 3%) sys   4.06 ( 3%) 
 wall   0 kB ( 0%) ggc
  ipa lto constructors in :   0.24 ( 0%) usr   0.03 ( 1%) sys   0.59 ( 0%) 
 wall   14226 kB ( 0%) ggc
  ipa lto constructors out:   0.08 ( 0%) usr   0.04 ( 1%) sys   0.15 ( 0%) 
 wall   0 kB ( 0%) ggc
  ipa lto cgraph I/O  :   0.89 ( 1%) usr   0.12 ( 3%) sys   1.02 ( 1%) 
 wall  364151 kB (11%) ggc
  ipa lto decl merge  :   2.14 ( 2%) usr   0.01 ( 0%) sys   2.14 ( 1%) 
 wall8196 kB ( 0%) ggc
  ipa lto cgraph merge:   1.59 ( 2%) usr   0.00 ( 0%) sys   1.60 ( 1%) 
 wall   12716 kB ( 0%) ggc
  whopr wpa   :   1.54 ( 2%) usr   0.03 ( 1%) sys   1.55 ( 1%) 
 wall   1 kB ( 0%) ggc
  whopr wpa I/O   :   0.04 ( 0%) usr   1.11 (26%) sys  52.10 (34%) 
 wall   0 kB ( 0%) ggc
  whopr partitioning  :   5.02 ( 5%) usr   0.01 ( 0%) sys   5.03 ( 3%) 
 wall4938 kB ( 0%) ggc
  ipa reference   :   2.04 ( 2%) usr   0.02 ( 0%) sys   2.08 ( 1%) 
 wall   0 kB ( 0%) ggc
  ipa profile :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) 
 wall   0 kB ( 0%) ggc
  ipa pure const  :   2.43 ( 3%) usr   0.02 ( 0%) sys   2.49 ( 2%) 
 wall   0 kB ( 0%) ggc
  tree STMT verifier  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) 
 wall   0 kB ( 0%) ggc
  callgraph verifier  :  16.31 (18%) usr   1.69 (39%) sys  17.96 (12%) 
 wall   0 kB ( 0%) ggc
  dominance computation   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) 
 wall   0 kB ( 0%) ggc
  varconst:   0.01 ( 0%) usr   0.03 ( 1%) sys   0.05 ( 0%) 
 wall   0 kB ( 0%) ggc
  unaccounted todo:   0.69 ( 1%) usr   0.00 ( 0%) sys   0.69 ( 0%) 
 wall   0 kB ( 0%) ggc
  TOTAL :  92.91 4.29   151.73
 3348693 kB
 Extra diagnostic checks enabled; compiler may run slowly.
 Configure with --enable-checking=release to disable checks.
 
 (with ICF)
 Execution times (seconds)
  phase setup :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
 wall1412 kB ( 0%) ggc
  phase opt and generate  :  82.70 (70%) usr   3.31 (53%) sys  86.17 (45%) 
 wall 1468975 kB (33%) ggc
  phase stream in :  30.46 (26%) usr   1.02 (16%) sys  31.48 (16%) 
 wall 2944210 kB (67%) ggc
  phase stream out:   4.52 ( 4%) usr 

Re: [PATCH 3/5] IPA ICF pass

2014-09-27 Thread Martin Liška
On 09/27/2014 09:47 AM, Markus Trippelsdorf wrote:
 On 2014.09.27 at 07:59 +0200, Markus Trippelsdorf wrote:

 It seems that in this case we reject too many of equality candidates?
 It think the original numbers was about 4-5% but later some equivalences was
 disabled because of devirt/aliasing issues. Do you compare it with gold ICF
 enabled? There are quite few obvious improvements to the analysis that can
 be done, but I guess we need to analyze the interesting cases one by one.
 
 Forgot to post the binary size numbers (in bytes):
 
   | gold's icf off | gold's icf on  |
 --+++
 gcc's icf off |79793880|74881040|
 --+-+
 gcc's icf on  |78043608|73612800|
 --+++
 

Thanks once more!

Gold ICF is quite strong, I will verify what functions are not caught by IPA 
ICF.
These data present that IPA ICF can reduce the binary by 2.19%. I know that 
it's quite a small improvement,
but if you realize that the pass can reduce just the size of .text (and 
slightly related sections). There are
stats about libxul.so (please ignore last 3 columns):

Section name   Start   Size in BSizePortion Disk 
read in B   Disk read   Sec. portion
   0   0  0.00 B  0.00% 
 0  0.00 B  0.00%
.note.gnu.build-i512  36 36.00 B  0.00% 
 0  0.00 B  0.00%
.dynsym  552   8119279.29 KB  0.08% 
 0  0.00 B  0.00%
.dynstr81744   9085988.73 KB  0.09% 
 0  0.00 B  0.00%
.hash 172608   2175221.24 KB  0.02% 
 0  0.00 B  0.00%
.gnu.version  1943606766 6.61 KB  0.01% 
 0  0.00 B  0.00%
.gnu.version_d201128  56 56.00 B  0.00% 
 0  0.00 B  0.00%
.gnu.version_r2011841216 1.19 KB  0.00% 
 0  0.00 B  0.00%
.rela.dyn 202400 8198208 7.82 MB  8.56% 
 0  0.00 B  0.00%
.rela.plt8400608   7027268.62 KB  0.07% 
 0  0.00 B  0.00%
.init8470880  26 26.00 B  0.00% 
 0  0.00 B  0.00%
.plt 8470912   4686445.77 KB  0.05% 
 0  0.00 B  0.00%
.text85177763901433337.21 MB 40.72% 
 0  0.00 B  0.00%
.fini   47532112   9  9.00 B  0.00% 
 0  0.00 B  0.00%
.rodata 475322881525856014.55 MB 15.93% 
 0  0.00 B  0.00%
.eh_frame   62790848 6203564 5.92 MB  6.47% 
 0  0.00 B  0.00%
.eh_frame_hdr   68994412 1088012 1.04 MB  1.14% 
 0  0.00 B  0.00%
.tbss   70082560   4  4.00 B  0.00% 
 0  0.00 B  0.00%
.dynamic700825601104 1.08 KB  0.00% 
 0  0.00 B  0.00%
.got700836641384 1.35 KB  0.00% 
 0  0.00 B  0.00%
.got.plt70085048   2344822.90 KB  0.02% 
 0  0.00 B  0.00%
.data   70108544  811616   792.59 KB  0.85% 
 0  0.00 B  0.00%
.jcr70920160   8  8.00 B  0.00% 
 0  0.00 B  0.00%
.tm_clone_table 70920168   0  0.00 B  0.00% 
 0  0.00 B  0.00%
.fini_array 70920168   8  8.00 B  0.00% 
 0  0.00 B  0.00%
.init_array 70920176  16 16.00 B  0.00% 
 0  0.00 B  0.00%
.data.rel.ro.loca   70920192 3938880 3.76 MB  4.11% 
 0  0.00 B  0.00%
.data.rel.ro74859072  269216   262.91 KB  0.28% 
 0  0.00 B  0.00%
.bss75128320 1844246 1.76 MB  1.92% 
 0  0.00 B  0.00%
.debug_line 75128288 517517.00 B  0.00% 
 0  0.00 B  0.00%
.debug_info 75128805 817817.00 B  0.00% 
 0  0.00 B  0.00%
.debug_abbrev   75129622 438438.00 B  0.00% 
 0   

Re: [PATCH 3/5] IPA ICF pass

2014-09-27 Thread Jan Hubicka
 
 Hi.
 
 Thank you Markus for presenting numbers, it corresponds with I measured. If I 
 see correctly, IPA ICF pass takes about 7 seconds,
 the rest is distributed in verifier (not interesting for release version of 
 the compiler) and 'phase opt and generate'. No idea
 what can make the difference?

phase opt and generate just combine all the optimization times together, so it
is same 7 seconds as in the ICF pass :)
1GB of function bodies just to elimnate 2-3% of code seems quite alot. Do you
have any idea how many of those turns out to be different?
It would be nice to be able to release the duplicate bodies from memory after
the equivalency was stablished

Honza

 
 Martin


Re: [PATCH 3/5] IPA ICF pass

2014-09-26 Thread Martin Liška

On 07/17/2014 05:05 PM, Martin Liška wrote:


On 07/06/2014 12:53 AM, Jan Hubicka wrote:

On Fri, 20 Jun 2014, Trevor Saunders wrote:

+@item -fipa-icf
+@opindex fipa-icf
+Perform Identical Code Folding for functions and read-only variables.

I would perhaps explicitly say that the optimizations reduce code size
and may disturb unwind stacks by replacing a function by equivalent
one with different name.

+Behavior is similar to Gold Linker ICF optimization. Symbols proved

Perhaps tell a bit more here. The optimization works more effectively with link
time optimization enabled and that the Gold and GCC ICF works on different
levels and thus are not equivalent optimizations - there are equivallences that
are found only by GCC and equivalences found only by Gold.


+as semantically equivalent are redirected to corresponding symbol. The pass

+sensitively decides for usage of alias, thunk or local redirection.
+This flag is enabled by default at @option{-O2}.

Probably at -Os too.

I found this a bit hard to read/understand.

Perhaps first describe what it does and then, before This flag is
enabled... note that This is similar to the ICF optimization performed
by the Gold linker.
Symbols proved (plural) vs to corresponding symbol seems to miss
an an a as in a corresponding symbol.  Alas, how is that one
determined?  Is this more ...are merged into one, from the user's
perspective?

What does it mean to sensitively decide for usage of alias, thunk,
or local redirection?

I think this is just a technical detail of the implementation.  I would not put 
that
into user manual.  It means that for some functions you can make alias, for 
others
you need thunk (so addresses stay different)

Gerald


Hello,
there's updated version of patch that newly uses devirtualization machinery 
to identify polymorphic types that can potentially break ICF (There are such 
examples in Firefox).

Apart from that, I did many small updates, incorporated Trevor's comments and I 
tried to improve documentation entry for the pass.
Patch has been tested for Firefox and Inkscape with LTO.

Thanks,
Martin


Hello.

After couple of weeks I spent with fixing new issues connected to the pass:
1) Inliner failed in case I created a thunk and release body of a function. In 
such situation we need to preserve DECL_ARGUMENTS. I added new argument for: 
cgraph_node::release_body.
2) Awkward error was hidden in libstdc++ test for trees, there were two 
functions having one argument that differs in one sub-template. Thank to 
Richard who helped me to fix alias set accuracy.
3) There was missing comparison for FIELD_DECLS (DECL_FIELD_BIT_OFFSET) which 
caused me miscompilation.
4) After discussion with Honza, we introduced new cgraph_node flag called 
icf_merged. The flag helps to fix verifier in cgraph_node::verify.

Current version of the patch can bootstrap on x86_64-linux. With following 
patch applied, there's not testcase regression.
I tried to build Firefox, Inkscape, GIMP and Chromium with LTO and patch 
applied and no regression has been observed.

Moreover, I discussed with Richard and the pass is capable of playing role in 
tree-ssa-tail-merge (according to first experiments). It can replace current 
usage of value numbering.

I hope we can apply the patch to the mainline in a short-term time window?

Thank you,
Martin

From 53d20d0b0c209b50d385ee8d85d5a7ed4594d477 Mon Sep 17 00:00:00 2001
From: mliska mli...@suse.cz
Date: Fri, 26 Sep 2014 13:51:47 +0200
Subject: [PATCH 1/3] IPA ICF: patch1

---
 gcc/Makefile.in  |2 +
 gcc/cgraph.c |   20 +-
 gcc/cgraph.h |2 +
 gcc/cgraphunit.c |2 +-
 gcc/common.opt   |   12 +
 gcc/doc/invoke.texi  |   16 +-
 gcc/ipa-icf-gimple.c |  384 +++
 gcc/ipa-icf.c| 2841 ++
 gcc/ipa-icf.h|  803 ++
 gcc/lto-cgraph.c |2 +
 gcc/lto-section-in.c |3 +-
 gcc/lto-streamer.h   |1 +
 gcc/opts.c   |6 +
 gcc/passes.def   |1 +
 gcc/timevar.def  |1 +
 gcc/tree-pass.h  |1 +
 16 files changed, 4089 insertions(+), 8 deletions(-)
 create mode 100644 gcc/ipa-icf-gimple.c
 create mode 100644 gcc/ipa-icf.c
 create mode 100644 gcc/ipa-icf.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 3dd9d8f..8d02425 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1265,6 +1265,8 @@ OBJS = \
 	ipa-profile.o \
 	ipa-prop.o \
 	ipa-pure-const.o \
+	ipa-icf.o \
+	ipa-icf-gimple.o \
 	ipa-reference.o \
 	ipa-ref.o \
 	ipa-utils.o \
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index fdcaf79..439db49 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -1913,6 +1913,8 @@ cgraph_node::dump (FILE *f)
 fprintf (f,  only_called_at_exit);
   if (tm_clone)
 fprintf (f,  tm_clone);
+  if (icf_merged)
+fprintf (f,  icf_merged);
   if (DECL_STATIC_CONSTRUCTOR (decl))
 fprintf (f, static_constructor (priority:%i), get_init_priority ());
   if (DECL_STATIC_DESTRUCTOR (decl))

Re: [PATCH 3/5] IPA ICF pass

2014-09-26 Thread Markus Trippelsdorf
On 2014.09.26 at 14:20 +0200, Martin Liška wrote:
 After couple of weeks I spent with fixing new issues connected to the
 pass: 1) Inliner failed in case I created a thunk and release body of
 a function. In such situation we need to preserve DECL_ARGUMENTS. I
 added new argument for: cgraph_node::release_body.  2) Awkward error
 was hidden in libstdc++ test for trees, there were two functions
 having one argument that differs in one sub-template. Thank to Richard
 who helped me to fix alias set accuracy.  3) There was missing
 comparison for FIELD_DECLS (DECL_FIELD_BIT_OFFSET) which caused me
 miscompilation.  4) After discussion with Honza, we introduced new
 cgraph_node flag called icf_merged. The flag helps to fix verifier in
 cgraph_node::verify.
 
 Current version of the patch can bootstrap on x86_64-linux. With
 following patch applied, there's not testcase regression.  I tried to
 build Firefox, Inkscape, GIMP and Chromium with LTO and patch applied
 and no regression has been observed.

While a plain Firefox -flto build works fine. LTO/PGO build fails with:

lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540
0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*)
../../gcc/gcc/ipa-utils.c:540
0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*)
../../gcc/gcc/ipa-icf.c:753
0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int)
../../gcc/gcc/ipa-icf.c:2706
0xf1c1f4 ipa_icf::sem_item_optimizer::execute()
../../gcc/gcc/ipa-icf.c:2098
0xf1d3f1 ipa_icf_driver
../../gcc/gcc/ipa-icf.c:2784
0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*)
../../gcc/gcc/ipa-icf.c:2831


The pass is also very memory hungry (from 3GB without ICF to 4GB during
libxul link), while the code size savings are in the 1% range.

-- 
Markus


Re: [PATCH 3/5] IPA ICF pass

2014-09-26 Thread Jan Hubicka
Hi,
this is on ipa-icf-gimple.c

@@ -2827,11 +2829,19 @@ cgraph_node::verify_node (void)
{
  if (verify_edge_corresponds_to_fndecl (e, decl))
{
- error (edge points to wrong declaration:);
- debug_tree (e-callee-decl);
- fprintf (stderr, Instead of:);
- debug_tree (decl);
- error_found = true;
+ /* The edge can be redirected in WPA by IPA 
ICF.
+Following check really ensures that it's
+not the case.  */
+
+ cgraph_node *current_node = cgraph_node::get 
(decl);
+ if (!current_node || 
!current_node-icf_merged)

I would move this into verify_edge_corresponds_to_fndecl.

diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
new file mode 100644
index 000..7031eaa
--- /dev/null
+++ b/gcc/ipa-icf-gimple.c
@@ -0,0 +1,384 @@
+/* Interprocedural Identical Code Folding pass
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Jan Hubicka hubi...@ucw.cz and Martin Liska 
mli...@suse.cz
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+http://www.gnu.org/licenses/.  */

Please add toplevel comment about what the code does and how to use it.

+namespace ipa_icf {
+
+/* Basic block equivalence comparison function that returns true if
+   basic blocks BB1 and BB2 (from functions FUNC1 and FUNC2) correspond.  */
... to each other?
I would add short comment that as comparsion goes you build voclabulary
of equivalences of variables/ssanames etc.
So people reading the code do not get lost at very beggining.

+
+bool
+func_checker::compare_bb (sem_bb *bb1, sem_bb *bb2)
+{
+  unsigned i;
+  gimple_stmt_iterator gsi1, gsi2;
+  gimple s1, s2;
+
+  if (bb1-nondbg_stmt_count != bb2-nondbg_stmt_count
+  || bb1-edge_count != bb2-edge_count)
+return RETURN_FALSE ();

The UPPERCASE looks ugly.  I see that RETURN_FALSE is a warpper for 
return_false_with_msg
that outputs line and file information.

I would make it lowercase even if it is macro. You may consider using
CXX_MEM_STAT_INFO style default argument to avoid function macro completely.
Probably not big win given that it won't save you from preprocesor mess.
+
+  gsi1 = gsi_start_bb (bb1-bb);
+  gsi2 = gsi_start_bb (bb2-bb);
+
+  for (i = 0; i  bb1-nondbg_stmt_count; i++)
+{
+  if (is_gimple_debug (gsi_stmt (gsi1)))
+   gsi_next_nondebug (gsi1);
+
+  if (is_gimple_debug (gsi_stmt (gsi2)))
+   gsi_next_nondebug (gsi2);
+
+  s1 = gsi_stmt (gsi1);
+  s2 = gsi_stmt (gsi2);
+
+  if (gimple_code (s1) != gimple_code (s2))
+   return RETURN_FALSE_WITH_MSG (gimple codes are different);

I think you need to compare EH here.  Consider case where one unit
is compiled with -fno-exception and thus all EH regions are removed,
while other function has EH regions in it.  Those are not equivalent.

EH region is obtained by lookup_stmt_eh and then you need to comapre
them for match as you do with gimple_resx_regoin.

+  t1 = gimple_call_fndecl (s1);
+  t2 = gimple_call_fndecl (s2);
+
+  /* Function pointer variables are not supported yet.  */

They seems to be, compare_operand seems just right.

+
+/* Verifies for given GIMPLEs S1 and S2 that
+   label statements are semantically equivalent.  */
+
+bool
+func_checker::compare_gimple_label (gimple g1, gimple g2)
+{
+  if (m_ignore_labels)
+return true;
+
+  tree t1 = gimple_label_label (g1);
+  tree t2 = gimple_label_label (g2);
+
+  return compare_tree_ssa_label (t1, t2);
+}

I would expect the main BB loop to record BB in which label belongs to
and the BB assciatio neing checked here.
Otherwise I do not see how switch statements are compared to not have
different permutations of targets. Also note that one BB may have
multiple labels in them and they are equivalent.

Also I would punt on occurence of FORCED_LABEL. Those are tricky as they
may be passed around and compared for address and no one really defines
what should happen.  Better to avoid those.

+
+/* Verifies for given GIMPLEs S1 and S2 that
+   switch statements are semantically equivalent.  */
+
+bool
+func_checker::compare_gimple_switch (gimple g1, gimple 

Re: [PATCH 3/5] IPA ICF pass

2014-09-26 Thread Jan Hubicka
 diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
 new file mode 100644
 index 000..f3472fe
 --- /dev/null
 +++ b/gcc/ipa-icf.c
 @@ -0,0 +1,2841 @@
 +/* Interprocedural Identical Code Folding pass
 +   Copyright (C) 2014 Free Software Foundation, Inc.
 +
 +   Contributed by Jan Hubicka hubi...@ucw.cz and Martin Liska 
 mli...@suse.cz
 +
 +This file is part of GCC.
 +
 +GCC is free software; you can redistribute it and/or modify it under
 +the terms of the GNU General Public License as published by the Free
 +Software Foundation; either version 3, or (at your option) any later
 +version.
 +
 +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
 +WARRANTY; without even the implied warranty of MERCHANTABILITY or
 +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 +for more details.
 +
 +You should have received a copy of the GNU General Public License
 +along with GCC; see the file COPYING3.  If not see
 +http://www.gnu.org/licenses/.  */
 +
 +/* Interprocedural Identical Code Folding for functions and
 +   read-only variables.
 +
 +   The goal of this transformation is to discover functions and read-only
 +   variables which do have exactly the same semantics.
(or value)
 +
 +   In case of functions,
 +   we could either create a virtual clone or do a simple function wrapper
 +   that will call equivalent function. If the function is just locally 
 visible,
 +   all function calls can be redirected. For read-only variables, we create
 +   aliases if possible.
 +
 +   Optimization pass arranges as follows:

The optimization pass is arranged as follows: (I guess)

I also wonder if the gimple equality code should be in ipa_icf namespace, it is 
intended
to be shared with tail merging pass, so what about just calling it 
gimple_sem_equality?

 +/* Verification function for edges E1 and E2.  */
 +
 +bool
 +func_checker::compare_edge (edge e1, edge e2)
 +{
 +  if (e1-flags != e2-flags)
 +return false;

In future we may want to experiment with checking that edge probabilities with
profile feedback match and refuse to merge BBs with different outgoing 
probabilities
(i.e. +-5%).
Just add it as TODO there, please.
 +
 +/* Return true if types are compatible from perspective of ICF.  */
 +bool func_checker::types_are_compatible_p (tree t1, tree t2,

Perhaps dropping _are_ would make sense, so we do not have two names
for essentially same thing.
 +bool compare_polymorphic,
 +bool first_argument)
 +{
 +  if (TREE_CODE (t1) != TREE_CODE (t2))
 +return RETURN_FALSE_WITH_MSG (different tree types);
 +
 +  if (!types_compatible_p (t1, t2))
 +return RETURN_FALSE_WITH_MSG (types are not compatible);
 +
 +  if (get_alias_set (t1) != get_alias_set (t2))
 +return RETURN_FALSE_WITH_MSG (alias sets are different);

You do not need to compare alias sets except for memory operations IMO.
 +
 +  /* We call contains_polymorphic_type_p with this pointer type.  */
 +  if (first_argument  TREE_CODE (t1) == POINTER_TYPE)
 +{
 +  t1 = TREE_TYPE (t1);
 +  t2 = TREE_TYPE (t2);
 +}
 +
 +  if (compare_polymorphic
 +   (contains_polymorphic_type_p (t1) || contains_polymorphic_type_p 
 (t2)))
 +{
 +  if (!contains_polymorphic_type_p (t1) || !contains_polymorphic_type_p 
 (t2))
 + return RETURN_FALSE_WITH_MSG (one type is not polymorphic);
 +
 +  if (TYPE_MAIN_VARIANT (t1) != TYPE_MAIN_VARIANT (t2))
 + return RETURN_FALSE_WITH_MSG (type variants are different for 
 +   polymorphic type);

I added types_must_be_same_for_odr (t1,t2) for you here.
 +/* Fast equality function based on knowledge known in WPA.  */
 +
 +bool
 +sem_function::equals_wpa (sem_item *item)
 +{
 +  gcc_assert (item-type == FUNC);
 +
 +  m_compared_func = static_castsem_function * (item);
 +
 +  if (arg_types.length () != m_compared_func-arg_types.length ())
 +return RETURN_FALSE_WITH_MSG (different number of arguments);
 +
 +  /* Checking types of arguments.  */
 +  for (unsigned i = 0; i  arg_types.length (); i++)
 +{
 +  /* This guard is here for function pointer with attributes 
 (pr59927.c).  */
 +  if (!arg_types[i] || !m_compared_func-arg_types[i])
 + return RETURN_FALSE_WITH_MSG (NULL argument type);
 +
 +  if (!func_checker::types_are_compatible_p (arg_types[i],
 +   m_compared_func-arg_types[i],
 +   true, i == 0))
 + return RETURN_FALSE_WITH_MSG (argument type is different);
 +}
 +
 +  /* Result type checking.  */
 +  if (!func_checker::types_are_compatible_p (result_type,
 +  m_compared_func-result_type))
 +return RETURN_FALSE_WITH_MSG (result types are different);

You may want to compare ECF flags, such as nothrow/const/pure.  We do not
want to merge non-pure function into pure as it may not be pure in the context
it is used.

Do you compare attributes? I think optimize attribute needs to be matched.
 +  /* Checking function arguments.  */
attributes.
 +  tree decl1 = DECL_ATTRIBUTES 

Re: [PATCH 3/5] IPA ICF pass

2014-09-26 Thread Jan Hubicka
 While a plain Firefox -flto build works fine. LTO/PGO build fails with:
 
 lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540
 0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*)
 ../../gcc/gcc/ipa-utils.c:540
 0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*)
 ../../gcc/gcc/ipa-icf.c:753
 0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int)
 ../../gcc/gcc/ipa-icf.c:2706
 0xf1c1f4 ipa_icf::sem_item_optimizer::execute()
 ../../gcc/gcc/ipa-icf.c:2098
 0xf1d3f1 ipa_icf_driver
 ../../gcc/gcc/ipa-icf.c:2784
 0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*)
 ../../gcc/gcc/ipa-icf.c:2831
 
 
 The pass is also very memory hungry (from 3GB without ICF to 4GB during
 libxul link), while the code size savings are in the 1% range.

Thnks for checking. I was just thinking about doing that myself.  Would
you mind posting -ftime-report of firefox WPA stage?

It seems that in this case we reject too many of equality candidates?
It think the original numbers was about 4-5% but later some equivalences was
disabled because of devirt/aliasing issues. Do you compare it with gold ICF
enabled? There are quite few obvious improvements to the analysis that can
be done, but I guess we need to analyze the interesting cases one by one.

One thing that Martin can try is to hook into lto-symtab and try to check
that the COMDAT functions that are known to be same pass the equality check.
I suppose we will learn interesting things this way.

I think the patch adds quite important infrastructure for gimple semantic
equality checking and function merging. I went through the majority of code and
I think it is mostly ready to mainline (i.e. cleaner than what we have in
tree-ssa-tailmerge) so hope we can finish the review process next week.
We will need to get better cost/benefits ratio to enable it for -O2 that is
someting I would really like to see for 5.0, but it seems to be easier to
handle this incrementally

Honza


Re: [PATCH 3/5] IPA ICF pass

2014-07-05 Thread Gerald Pfeifer
On Fri, 20 Jun 2014, Trevor Saunders wrote:
 +@item -fipa-icf
 +@opindex fipa-icf
 +Perform Identical Code Folding for functions and read-only variables.
 +Behavior is similar to Gold Linker ICF optimization. Symbols proved
 +as semantically equivalent are redirected to corresponding symbol. The pass
 +sensitively decides for usage of alias, thunk or local redirection.
 +This flag is enabled by default at @option{-O2}.

I found this a bit hard to read/understand.

Perhaps first describe what it does and then, before This flag is
enabled... note that This is similar to the ICF optimization performed
by the Gold linker.

Symbols proved (plural) vs to corresponding symbol seems to miss
an an a as in a corresponding symbol.  Alas, how is that one 
determined?  Is this more ...are merged into one, from the user's
perspective?

What does it mean to sensitively decide for usage of alias, thunk,
or local redirection?

Gerald


Re: [PATCH 3/5] IPA ICF pass

2014-07-05 Thread Jan Hubicka
 On Fri, 20 Jun 2014, Trevor Saunders wrote:
  +@item -fipa-icf
  +@opindex fipa-icf
  +Perform Identical Code Folding for functions and read-only variables.

I would perhaps explicitly say that the optimizations reduce code size
and may disturb unwind stacks by replacing a function by equivalent
one with different name.
  +Behavior is similar to Gold Linker ICF optimization. Symbols proved

Perhaps tell a bit more here. The optimization works more effectively with link
time optimization enabled and that the Gold and GCC ICF works on different
levels and thus are not equivalent optimizations - there are equivallences that
are found only by GCC and equivalences found only by Gold.

 +as semantically equivalent are redirected to corresponding symbol. The pass
  +sensitively decides for usage of alias, thunk or local redirection.
  +This flag is enabled by default at @option{-O2}.
Probably at -Os too.
 
 I found this a bit hard to read/understand.
 
 Perhaps first describe what it does and then, before This flag is
 enabled... note that This is similar to the ICF optimization performed
 by the Gold linker.

 
 Symbols proved (plural) vs to corresponding symbol seems to miss
 an an a as in a corresponding symbol.  Alas, how is that one 
 determined?  Is this more ...are merged into one, from the user's
 perspective?
 
 What does it mean to sensitively decide for usage of alias, thunk,
 or local redirection?

I think this is just a technical detail of the implementation.  I would not put 
that
into user manual.  It means that for some functions you can make alias, for 
others
you need thunk (so addresses stay different)
 
 Gerald


Re: [PATCH 3/5] IPA ICF pass

2014-07-01 Thread Trevor Saunders
 diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
 new file mode 100644
 index 000..8a13dca
 --- /dev/null
 +++ b/gcc/ipa-icf.c
 +/* Itializes internal structures according to given number of

initialize

 +  if (is_a_helpercgraph_node *::test (node))

shouldn't you just use is_acgraph_node* (node) ?

 +sem_item_optimizer::filter_removed_items (void)
 +{
 +  vec sem_item * filtered;
 +  filtered.create (m_items.length());

use auto_vec here?

 +  m_items.release ();
 +
 +  for (unsigned int i = 0; i  filtered.length(); i++)
 +m_items.safe_push (filtered[i]);

hrm, maybe adding vec::swap makes sense.

 +   if (c-members.length()  1)
 + {
 +   vec sem_item * new_vector;
 +   new_vector.create (c-members.length ());

same comment about auto_vec and swap.

 +bool
 +sem_item_optimizer::release_split_map (__attribute__((__unused__)) 
 congruence_class *
 +const cls,
 +__attribute__((__unused__)) bitmap const 
 b,

that one is definitly used

 +__attribute__((__unused__)) 
 traverse_split_pair *pair)

Why can't you just leave the arguments unnamed to fix the warning?

 +sem_item_optimizer::do_congruence_step_for_index (congruence_class *cls,
 +unsigned int index)
 +{
 +  hash_map congruence_class *, bitmap *split_map =
 + new hash_map congruence_class *, bitmap ();

why aren't you putting the hash_map on  the stack? in any case it looks
like you fail to delete it when your done with it.

 diff --git a/gcc/ipa-icf.h b/gcc/ipa-icf.h
 new file mode 100644
 index 000..d328dd6
 --- /dev/null
 +++ b/gcc/ipa-icf.h
 @@ -0,0 +1,743 @@
 +/* Prints string STRING to a FILE with a given number of SPACE_COUNT.  */
 +#define FPUTS_SPACES(file, space_count, string) \
 +  do \
 +  { \
 +fprintf (file, %*s string, space_count,  ); \

seems like you could do this with a static inline function.

 +/* Prints a MESSAGE to dump_file if exists.  */
 +#define SE_DUMP_MESSAGE(message) \
 +  do \
 +  { \
 +if (dump_file  (dump_flags  TDF_DETAILS)) \
 +  fprintf (dump_file,   debug message: %s (%s:%u)\n, message, 
 __func__, __LINE__); \

a inline function that used the builtins like the memory statis stuff
might be slightly less ugly imho

 +/* Logs a MESSAGE to dump_file if exists and returns false.  */
 +#define SE_EXIT_FALSE_WITH_MSG(message) \
 +  do \
 +  { \
 +if (dump_file  (dump_flags  TDF_DETAILS)) \
 +  fprintf (dump_file,   false returned: '%s' (%s:%u)\n, message, 
 __func__, __LINE__); \
 +return false; \
 +  } \
 +  while (false);

ugh macros that effect control flow, instead maybe define a inline
function that does the logging and then returns the value you want your
function to return so you can write

if (whatever)
  return SE_LOG (NULL, something or other);
?

 +/* Forward declaration for sem_func class.  */
 +class sem_item;

comment is wrong, and imho useless.

 +  /* Initializes internal structures according to given number of
 + source and target SSA names. The number of source names is SSA_SOURCE,
 + respectively SSA_TARGET.  */
 +  func_checker (unsigned ssa_source, unsigned sss_target);

ssa_target?

 +  /* Source to target edge map.  */
 +  hash_map edge, edge *m_edge_map;

is there a reason to not embedd these in the object? you seem to create
them in the ctor and delete them in the dtor, so I expect they have teh
same lifetime.

 +/* Basic block struct for sematic equality pass.  */

semantic?

 +typedef struct sem_bb

you don't need the typedef in C++

 +  /* Item type.  */
 +  enum sem_item_type type;

 loose the enum keyword since you don't need it?

 +class sem_function: public sem_item
 +{
 +  /* COMPARED_FUNC is a function that we compare to.  */
 +  sem_function *m_compared_func;

this feels like a weird place for this, would func_checker maybe make
more sense as a place to put it?

 +class sem_variable: public sem_item
 +{
 +  /* Initializes references to other semantic functions/variables.  */
 +  inline virtual void init_refs ()

iirc defining with in a class definition implies inline.

 +typedef struct congruence_class_group
 +{
 +  hashval_t hash;
 +  sem_item_type type;
 +  vec congruence_class * classes;
 +} congruence_class_group_t;

lose the typedef?

 +  /* Returns true if a congruence class CLS is presented in worklist.  */

s/presented/present/ ?

Trev



Re: [PATCH 3/5] IPA ICF pass

2014-06-30 Thread Jeff Law

On 06/26/14 12:46, Jan Hubicka wrote:

So you've added this at -O2, what is the general compile-time

impact? Would it make more sense to instead have it be part of -O3,
particularly since ICF is rarely going to improve performance (sans
icache issues).


I think code size optimization not sacrifying any (or too much of) performance 
are
generally very welcome at -O2.  Compared to LLVM and Microsoft compilers we are
on code bloat side at -O2.
I'm not so much worried about runtime performance here, but compile-time 
performance.  ICF would seem like a general win as we're likely going to 
be more icache efficient.


So, just to be clear, if the compile-time impact is minimal, then I'll 
fully support -O2, but my worry is that it won't be minimal :(




Is returning TRUE really the conservatively correct thing to do in
the absence of aliasing information?  Isn't that case really I
don't know in which case the proper return value is FALSE?


I think with -fno-strict-aliasing the set should be 0 (Richi?) and thus we can
compare for equality.  We probably can be on agressive side and let 0 alias
set prevail the non-0.  But that can be done incrementally.
I'd think it should be zero in the -fno-strict-aliasing case.  My 
concern was that in the -fno-strict-aliasing case we seems to always 
assume the objects are the same.  Is that the safe thing to do?


That hints that the comment for the function needs tweaking.  It really 
doesn't say anything about what the return values really mean.




There are few, like we can ignore weak or visibility attribute because we do
produce alias with proper visibility anyway.  My plan is to start removing those
attributes from declarations once they are turned into suitable representation
in symbol table (or for attributes like const/noreturn/pure where we have 
explicit
decl flags).  This will make our life bit easier later, too.

We probably then can whitelist some attributes, but I would deal with this 
later.
Sure, I don't mind going with the conservative approach and iterating to 
remove some of the limitations.




Yep, there are no resonable orders on it.  If function starts same in source 
code they ought
to end up same here.  Plan was to first match for exact equality and then play 
with
smarter tricks here.
Yea, that's fine too.  It's conservatively correct and we may find that 
there just isn't much to gain by doing hte dfs walk to build indices and 
such for the CFG.  I guess ultimately I just want someone to look at 
that issue and evaluate if what we're doing now is good enough or if 
we're missing most of the benefit because of something like bb index 
stability or something dumb like that.




Yep, the pass has grown up to be rather long. The gimple equality testing is the
main body of work, so perhaps doing this in separate file is good idea.
I'd certainly like to see that happen both because of its size and 
because I think those bits are useful in other contexts.


Overall, most of the stuff looks quite reasonable.

jeff



Re: [PATCH 3/5] IPA ICF pass

2014-06-26 Thread Martin Liška


On 06/24/2014 10:31 PM, Jeff Law wrote:

On 06/13/14 04:44, mliska wrote:

Hello,
this is core of IPA ICF patchset. It adds new pass and registers all needed 
stuff related to a newly introduced interprocedural optimization.

Algorithm description:
   In LGEN, we visit all read-only variables and functions. For each symbol, a 
hash value based on e.g. number of arguments,
   number of BB, GIMPLE CODES is computed (similar hash is computed for 
read-only variables). This kind of information is streamed
   for LTO.

   In WPA, we build congruence classes for all symbols having a same hash 
value. For functions, these classes are subdivided in WPA by argument type 
comparison. Each reference (a call or a variable reference) to another semantic 
item candidate is marked and stored for further congruence class reduction 
(similar algorithm as Value Numbering: 
www.cs.ucr.edu/~gupta/teaching/553-07/Papers/value.pdf).

   For every congruence class of functions with more than one semantic 
function, we load function body. Having this information, we can
   process complete semantic function equality and subdivide such congruence 
class. Read-only variable class members are also deeply compared.

   After that, we process Value numbering algorithm to do a final subdivision. 
Finally, all items belonging to a congruence class with more than one
   item are merged.

Martin

Changelog:

2014-06-13  Martin Liska  mli...@suse.cz
Jan Hubicka  hubi...@ucw.cz

* Makefile.in: New pass object file added.
* common.opt: New -fipa-icf flag introduced.
* doc/invoke.texi: Documentation enhanced for the pass.
* lto-section-in.c: New LTO section for a summary created by IPA-ICF.
* lto-streamer.h: New section name introduced.
* opts.c: Optimization is added to -O2.
* passes.def: New pass added.
* timevar.def: New time var for IPA-ICF.
* tree-pass.h: Pass construction function.
* ipa-icf.h: New pass header file added.
* ipa-icf.c: New pass source file added.

Hi Jeff,
   I must agree that the implementation of the patch is quite big. Suggested 
split makes sense, I'll do it.

You'll note many of my comments are do you need to   You may in fact be 
handling that stuff correctly, they're just things I'd like you to verify are properly 
handled.  If they're properly handled just say so :-)

At a high level, I think this needs to be broken down a bit more. We've got two 
high level concepts in ipa-icf.  One is all the equivalence testing the other 
is using that information for the icf optimization.

Splitting out the equivalence testing seems like a good thing to do as there's 
other contexts where it would be useful.

Overall I think you're on the right path and we just need to iterate a bit on 
this part of the patchset.



@@ -7862,6 +7863,14 @@ it may significantly increase code size
  (see @option{--param ipcp-unit-growth=@var{value}}).
  This flag is enabled by default at @option{-O3}.

+@item -fipa-icf
+@opindex fipa-icf
+Perform Identical Code Folding for functions and read-only variables.
+Behavior is similar to Gold Linker ICF optimization. Symbols proved
+as semantically equivalent are redirected to corresponding symbol. The pass
+sensitively decides for usage of alias, thunk or local redirection.
+This flag is enabled by default at @option{-O2}.

So you've added this at -O2, what is the general compile-time impact? Would it 
make more sense to instead have it be part of -O3, particularly since ICF is 
rarely going to improve performance (sans icache issues).

This was Honza's idea to put the optimization for -O2, I'll measure 
compile-time impact.




+
+/* Interprocedural Identical Code Folding for functions and
+   read-only variables.
+
+   The goal of this transformation is to discover functions and read-only
+   variables which do have exactly the same semantics.
+
+   In case of functions,
+   we could either create a virtual clone or do a simple function wrapper
+   that will call equivalent function. If the function is just locally visible,
+   all function calls can be redirected. For read-only variables, we create
+   aliases if possible.
+
+   Optimization pass arranges as follows:
+   1) All functions and read-only variables are visited and internal
+  data structure, either sem_function or sem_variables is created.
+   2) For every symbol from the previoues step, VAR_DECL and FUNCTION_DECL are
+  saved and matched to corresponding sem_items.

s/previoues/previous/


+   3) These declaration are ignored for equality check and are solved
+  by Value Numbering algorithm published by Alpert, Zadeck in 1992.
+   4) We compute hash value for each symbol.
+   5) Congruence classes are created based on hash value. If hash value are
+  equal, equals function is called and symbols are deeply compared.
+  We must prove that all SSA names, declarations and other items
+  correspond.
+   6) Value Numbering is executed for these classes. 

Re: [PATCH 3/5] IPA ICF pass

2014-06-26 Thread Jan Hubicka
Jeff,
thanks for review! I did some passes over the patch before it got to the ML, I 
am
happy to have independent opinion. 
 +@item -fipa-icf
 +@opindex fipa-icf
 +Perform Identical Code Folding for functions and read-only variables.
 +Behavior is similar to Gold Linker ICF optimization. Symbols proved
 +as semantically equivalent are redirected to corresponding symbol. The pass
 +sensitively decides for usage of alias, thunk or local redirection.
 +This flag is enabled by default at @option{-O2}.
 So you've added this at -O2, what is the general compile-time
 impact? Would it make more sense to instead have it be part of -O3,
 particularly since ICF is rarely going to improve performance (sans
 icache issues).

I think code size optimization not sacrifying any (or too much of) performance 
are
generally very welcome at -O2.  Compared to LLVM and Microsoft compilers we are
on code bloat side at -O2.

http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html
has some numbers for -O2 GGC/LLVM.

I believe this is result of tunning for relatively small benchamrks (SPECS) and
I hope we could revisit -O2 for code size considerations for 4.10 somewhat.  If
tuned well, ICF has no reason to be expnesive wrt compile time. So lets shoot
for that.  The considerable donwside of enabling ICF IMO should be only
disturbing effect on debug info.
 +  return true;
 +}
 Isn't this really checking for equivalence? do correspond seems
 awkward here.

The function turns the names equivalent on first invocation for a given name
and later checks that this tentative equivalence holds.

Not sure what is best name for it (originaly it was verify that did not sound
right to me either)
 
 +
 +/* Verification function for edges E1 and E2.  */
 +
 +bool
 +func_checker::compare_edge (edge e1, edge e2)
 +{
 +  if (e1-flags != e2-flags)
 +return false;
 Presumably there's no flags we can safely ignore.  So absolute
 equality seems reasonable here.

Yep
 +/* Compare two types if are same aliases in case of strict aliasing
 +   is enabled.  */
 +bool
 +sem_item::compare_for_aliasing (tree t1, tree t2)
 +{
 +  if (flag_strict_aliasing)
 +{
 +  alias_set_type s1 = get_deref_alias_set (TREE_TYPE (t1));
 +  alias_set_type s2 = get_deref_alias_set (TREE_TYPE (t2));
 +
 +  return s1 == s2;
 +}
 +
 +  return true;
 +}
 Is returning TRUE really the conservatively correct thing to do in
 the absence of aliasing information?  Isn't that case really I
 don't know in which case the proper return value is FALSE?

I think with -fno-strict-aliasing the set should be 0 (Richi?) and thus we can
compare for equality.  We probably can be on agressive side and let 0 alias
set prevail the non-0.  But that can be done incrementally.

We also need to match type inheritance equality for polymorphic types. I will
add function for that into ipa-devirt.

 +/* References independent hash function.  */
 +
 +hashval_t
 +sem_function::get_hash (void)
 +{
 +  if(!hash)
 +{
 +  hash = 177454; /* Random number for function type.  */
 +
 +  hash = iterative_hash_object (arg_count, hash);
 +  hash = iterative_hash_object (bb_count, hash);
 +  hash = iterative_hash_object (edge_count, hash);
 +  hash = iterative_hash_object (cfg_checksum, hash);
 Does CFG_CHECKSUM encompass the bb/edge counts?

It is one used by profiling code to match profiles, so it should.
 +SE_EXIT_FALSE();
 +
 +  if (!equals_wpa (item))
 +return false;
 +
 +  /* Checking function arguments.  */
 +  tree decl1 = DECL_ATTRIBUTES (decl);
 +  tree decl2 = DECL_ATTRIBUTES (compared_func-decl);
 So are there any attributes we can safely ignore?  Probably not.
 However, we ought to handle the case where the attributes appear in
 different orders.

There are few, like we can ignore weak or visibility attribute because we do
produce alias with proper visibility anyway.  My plan is to start removing those
attributes from declarations once they are turned into suitable representation
in symbol table (or for attributes like const/noreturn/pure where we have 
explicit
decl flags).  This will make our life bit easier later, too.

We probably then can whitelist some attributes, but I would deal with this 
later.
 +/* Returns cgraph_node.  */
 +
 +struct cgraph_node *
 +sem_function::get_node (void)
 +{
 +  return cgraph (node);
 +}
 +
 +/* Initialize semantic item by info reachable during LTO WPA phase.  */
 +
 +void
 +sem_function::init_wpa (void)
 +{
 +  parse_tree_args ();
 +}
 inline? Worth or not worth the headache?

We ought to autoinline simple wrappers even at -Os (for size)
(I am not agains explicit inline keywords here tough)
 
 
 +
 +bool
 +sem_function::compare_bb (sem_bb_t *bb1, sem_bb_t *bb2, tree func1, tree 
 func2)
 So this routine walks down the gimple statements and compares them
 for equality.  Would it make sense to have the equality testing in
 gimple?  That way if someone adds a new gimple code the places they
 need to 

Re: [PATCH 3/5] IPA ICF pass

2014-06-24 Thread Jeff Law

On 06/13/14 04:44, mliska wrote:

Hello,
this is core of IPA ICF patchset. It adds new pass and registers all needed 
stuff related to a newly introduced interprocedural optimization.

Algorithm description:
   In LGEN, we visit all read-only variables and functions. For each symbol, a 
hash value based on e.g. number of arguments,
   number of BB, GIMPLE CODES is computed (similar hash is computed for 
read-only variables). This kind of information is streamed
   for LTO.

   In WPA, we build congruence classes for all symbols having a same hash 
value. For functions, these classes are subdivided in WPA by argument type 
comparison. Each reference (a call or a variable reference) to another semantic 
item candidate is marked and stored for further congruence class reduction 
(similar algorithm as Value Numbering:  
www.cs.ucr.edu/~gupta/teaching/553-07/Papers/value.pdf).

   For every congruence class of functions with more than one semantic 
function, we load function body. Having this information, we can
   process complete semantic function equality and subdivide such congruence 
class. Read-only variable class members are also deeply compared.

   After that, we process Value numbering algorithm to do a final subdivision. 
Finally, all items belonging to a congruence class with more than one
   item are merged.

Martin

Changelog:

2014-06-13  Martin Liska  mli...@suse.cz
Jan Hubicka  hubi...@ucw.cz

* Makefile.in: New pass object file added.
* common.opt: New -fipa-icf flag introduced.
* doc/invoke.texi: Documentation enhanced for the pass.
* lto-section-in.c: New LTO section for a summary created by IPA-ICF.
* lto-streamer.h: New section name introduced.
* opts.c: Optimization is added to -O2.
* passes.def: New pass added.
* timevar.def: New time var for IPA-ICF.
* tree-pass.h: Pass construction function.
* ipa-icf.h: New pass header file added.
* ipa-icf.c: New pass source file added.
You'll note many of my comments are do you need to   You may in 
fact be handling that stuff correctly, they're just things I'd like you 
to verify are properly handled.  If they're properly handled just say so :-)


At a high level, I think this needs to be broken down a bit more.  We've 
got two high level concepts in ipa-icf.  One is all the equivalence 
testing the other is using that information for the icf optimization.


Splitting out the equivalence testing seems like a good thing to do as 
there's other contexts where it would be useful.


Overall I think you're on the right path and we just need to iterate a 
bit on this part of the patchset.




@@ -7862,6 +7863,14 @@ it may significantly increase code size
  (see @option{--param ipcp-unit-growth=@var{value}}).
  This flag is enabled by default at @option{-O3}.

+@item -fipa-icf
+@opindex fipa-icf
+Perform Identical Code Folding for functions and read-only variables.
+Behavior is similar to Gold Linker ICF optimization. Symbols proved
+as semantically equivalent are redirected to corresponding symbol. The pass
+sensitively decides for usage of alias, thunk or local redirection.
+This flag is enabled by default at @option{-O2}.
So you've added this at -O2, what is the general compile-time impact? 
Would it make more sense to instead have it be part of -O3, particularly 
since ICF is rarely going to improve performance (sans icache issues).




+
+/* Interprocedural Identical Code Folding for functions and
+   read-only variables.
+
+   The goal of this transformation is to discover functions and read-only
+   variables which do have exactly the same semantics.
+
+   In case of functions,
+   we could either create a virtual clone or do a simple function wrapper
+   that will call equivalent function. If the function is just locally visible,
+   all function calls can be redirected. For read-only variables, we create
+   aliases if possible.
+
+   Optimization pass arranges as follows:
+   1) All functions and read-only variables are visited and internal
+  data structure, either sem_function or sem_variables is created.
+   2) For every symbol from the previoues step, VAR_DECL and FUNCTION_DECL are
+  saved and matched to corresponding sem_items.

s/previoues/previous/


+   3) These declaration are ignored for equality check and are solved
+  by Value Numbering algorithm published by Alpert, Zadeck in 1992.
+   4) We compute hash value for each symbol.
+   5) Congruence classes are created based on hash value. If hash value are
+  equal, equals function is called and symbols are deeply compared.
+  We must prove that all SSA names, declarations and other items
+  correspond.
+   6) Value Numbering is executed for these classes. At the end of the process
+  all symbol members in remaining classes can be mrged.

s/mrged/merged.




+   7) Merge operation creates alias in case of read-only variables. For
+  callgraph