Re: [PATCH 2/2, PR 63814] Do not re-create expanded artificial thunks

2014-12-04 Thread Jan Hubicka
 
 I did not really like it either (although I must say that in general I
 really dislike the need to call expand_thunk but telling it to not

Yeah, I am thinking about simply teaching inliner to handle them.  Given the 
issues
with extra copies for values passed by reference it may be better solution.

 really expand anything if possible).  So here is another approach,
 which works by delaying expansion of thunks but only after all
 redirections within create_clone are done.  This has the advantage
 that the new method expand_all_artificial_thunks is not exposed to
 every (indirect) user of cgraph_node::create_clone (as opposed to
 requiring them all to call that function at some point after making
 any redirections at all).  However, it also has the disadvantage that
 in the unlikeliest of circumstances (strongly connected components in
 IPA-CP value dependencies connected by thunks and these thunks ceasing
 to be assembly thunks in the course of cloning at the same time),
 IPA-CP might still create extra gimple-expanded thunks.

This is because you expand the thunks during creating and lose track of them?
 
 The alternative not suffering from this problem would be either
 exposing expand_all_artificial_thunks to all users and requiring them
 to call it, with IPA-CP doing it for all created clones at the very
 end of propagation, or teaching the infrastructure and pass manager
 about this and then expanding them at some convenient time
 (before/after inlining?).  However, the first option seems to be too
 ugly for very little benefit (but I am willing to implement that) and
 the second does not seem to be stage3 material.
 
 What do you think?  Bootstrapped and tested on x86_64-linux.

Hmm, I think thunks in SCC regions are rather rare and small.  Lets go with this
and try to look whether we can't get thunks more regular in inliner.
 
 Martin
 
 
 2014-12-02  Martin Jambor  mjam...@suse.cz
 
   * cgraph.h (cgraph_node): New method expand_all_artificial_thunks.
   (cgraph_edge): New method redirect_callee_duplicating_thunks.
   * cgraphclones.c (duplicate_thunk_for_node): Donot expand newly
   created thunks.
   (redirect_edge_duplicating_thunks): Turned into edge method
   redirect_callee_duplicating_thunks.
   (cgraph_node::expand_all_artificial_thunks): New method.
   (create_clone): Call expand_all_artificial_thunks.
   * ipa-cp.c (perhaps_add_new_callers): Call
   redirect_callee_duplicating_thunks instead of redirect_callee.
   Also call expand_all_artificial_thunks.

OK
Honza
 
 Index: src/gcc/cgraph.h
 ===
 --- src.orig/gcc/cgraph.h
 +++ src/gcc/cgraph.h
 @@ -908,6 +908,10 @@ public:
   thunks that are not lowered.  */
bool expand_thunk (bool output_asm_thunks, bool force_gimple_thunk);
  
 +  /*  Call expand_thunk on all callers that are thunks and if analyze those
 +  nodes that were expanded.  */
 +  void expand_all_artificial_thunks ();
 +
/* Assemble thunks and aliases associated to node.  */
void assemble_thunks_and_aliases (void);
  
 @@ -1477,6 +1481,12 @@ struct GTY((chain_next (%h.next_caller
   call expression.  */
void redirect_callee (cgraph_node *n);
  
 +  /* If the edge does not lead to a thunk, simply redirect it to N.  
 Otherwise
 + create one or more equivalent thunks for N and redirect E to the first 
 in
 + the chain.  Note that it is then necessary to call
 + n-expand_all_artificial_thunks once all callers are redirected.  */
 +  void redirect_callee_duplicating_thunks (cgraph_node *n);
 +
/* Make an indirect edge with an unknown callee an ordinary edge leading to
   CALLEE.  DELTA is an integer constant that is to be added to the this
   pointer (first parameter) to compensate for skipping
 Index: src/gcc/cgraphclones.c
 ===
 --- src.orig/gcc/cgraphclones.c
 +++ src/gcc/cgraphclones.c
 @@ -370,28 +370,47 @@ duplicate_thunk_for_node (cgraph_node *t
 CGRAPH_FREQ_BASE);
e-call_stmt_cannot_inline_p = true;
symtab-call_edge_duplication_hooks (thunk-callees, e);
 -  if (new_thunk-expand_thunk (false, false))
 -{
 -  new_thunk-thunk.thunk_p = false;
 -  new_thunk-analyze ();
 -}
 -
symtab-call_cgraph_duplication_hooks (thunk, new_thunk);
return new_thunk;
  }
  
  /* If E does not lead to a thunk, simply redirect it to N.  Otherwise create
 one or more equivalent thunks for N and redirect E to the first in the
 -   chain.  */
 +   chain.  Note that it is then necessary to call
 +   n-expand_all_artificial_thunks once all callers are redirected.  */
  
  void
 -redirect_edge_duplicating_thunks (cgraph_edge *e, cgraph_node *n)
 +cgraph_edge::redirect_callee_duplicating_thunks (cgraph_node *n)
  {
 -  cgraph_node *orig_to = e-callee-ultimate_alias_target ();
 +  cgraph_node 

Re: [PATCH 2/2, PR 63814] Do not re-create expanded artificial thunks

2014-12-02 Thread Martin Jambor
Hi,

On Mon, Dec 01, 2014 at 10:43:19PM +0100, Jan Hubicka wrote:
  On Fri, Nov 21, 2014 at 08:18:12PM +0100, Martin Jambor wrote:
   Hi,
   
   when debugging PR 63814 I noticed that when cgraph_node::create_clone
   was using redirect_edge_duplicating_thunks to redirect two edges to a
   thunk of a clone, two thunks were created, one for each edge.  The
   reason is that even though duplicate_thunk_for_node attempts to locate
   an already created thunk, it does so by looking for a caller with
   thunk.thunk_p set and the previously created one does not have it set
   because (on i686) expand_thunk has expanded the thunk to gimple and
   cleared the flag.
   
   This patch fixes the issue by marking such expanded thunks with yet
   another flag and then uses the flag to identify such expanded thunks.
   Bootstrapped and tested on x86_64-linux and i686-linux.  Honza, do you
   think this is a good approach?  Is the patch OK for trunk?
 
 Hmm, I was sort of hoping to drop the whole .thunk structure into summary and
 do not keep it for functions with Gimple body, so this change will imply the
 need to keep this datastructure for thunks that are already expanded.
 moreover we do not stream the flag, so not all expanded thunks are handled,
 only those that was introduced during WPA.
 
 It may be cleaner to simply donot expand thunks proactively before all
 the clonning is finished?
 

I did not really like it either (although I must say that in general I
really dislike the need to call expand_thunk but telling it to not
really expand anything if possible).  So here is another approach,
which works by delaying expansion of thunks but only after all
redirections within create_clone are done.  This has the advantage
that the new method expand_all_artificial_thunks is not exposed to
every (indirect) user of cgraph_node::create_clone (as opposed to
requiring them all to call that function at some point after making
any redirections at all).  However, it also has the disadvantage that
in the unlikeliest of circumstances (strongly connected components in
IPA-CP value dependencies connected by thunks and these thunks ceasing
to be assembly thunks in the course of cloning at the same time),
IPA-CP might still create extra gimple-expanded thunks.

The alternative not suffering from this problem would be either
exposing expand_all_artificial_thunks to all users and requiring them
to call it, with IPA-CP doing it for all created clones at the very
end of propagation, or teaching the infrastructure and pass manager
about this and then expanding them at some convenient time
(before/after inlining?).  However, the first option seems to be too
ugly for very little benefit (but I am willing to implement that) and
the second does not seem to be stage3 material.

What do you think?  Bootstrapped and tested on x86_64-linux.

Martin


2014-12-02  Martin Jambor  mjam...@suse.cz

* cgraph.h (cgraph_node): New method expand_all_artificial_thunks.
(cgraph_edge): New method redirect_callee_duplicating_thunks.
* cgraphclones.c (duplicate_thunk_for_node): Donot expand newly
created thunks.
(redirect_edge_duplicating_thunks): Turned into edge method
redirect_callee_duplicating_thunks.
(cgraph_node::expand_all_artificial_thunks): New method.
(create_clone): Call expand_all_artificial_thunks.
* ipa-cp.c (perhaps_add_new_callers): Call
redirect_callee_duplicating_thunks instead of redirect_callee.
Also call expand_all_artificial_thunks.

Index: src/gcc/cgraph.h
===
--- src.orig/gcc/cgraph.h
+++ src/gcc/cgraph.h
@@ -908,6 +908,10 @@ public:
  thunks that are not lowered.  */
   bool expand_thunk (bool output_asm_thunks, bool force_gimple_thunk);
 
+  /*  Call expand_thunk on all callers that are thunks and if analyze those
+  nodes that were expanded.  */
+  void expand_all_artificial_thunks ();
+
   /* Assemble thunks and aliases associated to node.  */
   void assemble_thunks_and_aliases (void);
 
@@ -1477,6 +1481,12 @@ struct GTY((chain_next (%h.next_caller
  call expression.  */
   void redirect_callee (cgraph_node *n);
 
+  /* If the edge does not lead to a thunk, simply redirect it to N.  Otherwise
+ create one or more equivalent thunks for N and redirect E to the first in
+ the chain.  Note that it is then necessary to call
+ n-expand_all_artificial_thunks once all callers are redirected.  */
+  void redirect_callee_duplicating_thunks (cgraph_node *n);
+
   /* Make an indirect edge with an unknown callee an ordinary edge leading to
  CALLEE.  DELTA is an integer constant that is to be added to the this
  pointer (first parameter) to compensate for skipping
Index: src/gcc/cgraphclones.c
===
--- src.orig/gcc/cgraphclones.c
+++ src/gcc/cgraphclones.c
@@ -370,28 +370,47 @@ 

Re: [PATCH 2/2, PR 63814] Do not re-create expanded artificial thunks

2014-12-01 Thread Martin Jambor
Ping.

Thx,

Martin


On Fri, Nov 21, 2014 at 08:18:12PM +0100, Martin Jambor wrote:
 Hi,
 
 when debugging PR 63814 I noticed that when cgraph_node::create_clone
 was using redirect_edge_duplicating_thunks to redirect two edges to a
 thunk of a clone, two thunks were created, one for each edge.  The
 reason is that even though duplicate_thunk_for_node attempts to locate
 an already created thunk, it does so by looking for a caller with
 thunk.thunk_p set and the previously created one does not have it set
 because (on i686) expand_thunk has expanded the thunk to gimple and
 cleared the flag.
 
 This patch fixes the issue by marking such expanded thunks with yet
 another flag and then uses the flag to identify such expanded thunks.
 Bootstrapped and tested on x86_64-linux and i686-linux.  Honza, do you
 think this is a good approach?  Is the patch OK for trunk?
 
 Thanks,
 
 Martin
 
 
 2014-11-21  Martin Jambor  mjam...@suse.cz
 
   * cgraph.h (cgraph_thunk_info): Converted thunk_p to a bit-field.
   Added new flag expanded_thunk_p.
   * cgraphunit.c (expand_thunk): Set expanded_thunk_p when appropriate.
   * cgraphclones.c (duplicate_thunk_for_node): Also re-use an expanded
   thunk if available.
 
 Index: src/gcc/cgraph.h
 ===
 --- src.orig/gcc/cgraph.h
 +++ src/gcc/cgraph.h
 @@ -552,7 +552,9 @@ struct GTY(()) cgraph_thunk_info {
bool virtual_offset_p;
bool add_pointer_bounds_args;
/* Set to true when alias node is thunk.  */
 -  bool thunk_p;
 +  unsigned thunk_p : 1;
 +  /* Set when this is an already expanded thunk.  */
 +  unsigned expanded_thunk_p : 1;
  };
  
  /* Information about the function collected locally.
 Index: src/gcc/cgraphclones.c
 ===
 --- src.orig/gcc/cgraphclones.c
 +++ src/gcc/cgraphclones.c
 @@ -311,7 +311,7 @@ duplicate_thunk_for_node (cgraph_node *t
  
cgraph_edge *cs;
for (cs = node-callers; cs; cs = cs-next_caller)
 -if (cs-caller-thunk.thunk_p
 +if ((cs-caller-thunk.thunk_p || cs-caller-thunk.expanded_thunk_p)
cs-caller-thunk.this_adjusting == thunk-thunk.this_adjusting
cs-caller-thunk.fixed_offset == thunk-thunk.fixed_offset
cs-caller-thunk.virtual_offset_p == thunk-thunk.virtual_offset_p
 Index: src/gcc/cgraphunit.c
 ===
 --- src.orig/gcc/cgraphunit.c
 +++ src/gcc/cgraphunit.c
 @@ -1504,6 +1504,7 @@ cgraph_node::expand_thunk (bool output_a
set_cfun (NULL);
TREE_ASM_WRITTEN (thunk_fndecl) = 1;
thunk.thunk_p = false;
 +  thunk.expanded_thunk_p = true;
analyzed = false;
  }
else
 @@ -1686,6 +1687,7 @@ cgraph_node::expand_thunk (bool output_a
/* Since we want to emit the thunk, we explicitly mark its name as
referenced.  */
thunk.thunk_p = false;
 +  thunk.expanded_thunk_p = true;
lowered = true;
bitmap_obstack_release (NULL);
  }


Re: [PATCH 2/2, PR 63814] Do not re-create expanded artificial thunks

2014-12-01 Thread Jan Hubicka
 On Fri, Nov 21, 2014 at 08:18:12PM +0100, Martin Jambor wrote:
  Hi,
  
  when debugging PR 63814 I noticed that when cgraph_node::create_clone
  was using redirect_edge_duplicating_thunks to redirect two edges to a
  thunk of a clone, two thunks were created, one for each edge.  The
  reason is that even though duplicate_thunk_for_node attempts to locate
  an already created thunk, it does so by looking for a caller with
  thunk.thunk_p set and the previously created one does not have it set
  because (on i686) expand_thunk has expanded the thunk to gimple and
  cleared the flag.
  
  This patch fixes the issue by marking such expanded thunks with yet
  another flag and then uses the flag to identify such expanded thunks.
  Bootstrapped and tested on x86_64-linux and i686-linux.  Honza, do you
  think this is a good approach?  Is the patch OK for trunk?

Hmm, I was sort of hoping to drop the whole .thunk structure into summary and
do not keep it for functions with Gimple body, so this change will imply the
need to keep this datastructure for thunks that are already expanded.
moreover we do not stream the flag, so not all expanded thunks are handled,
only those that was introduced during WPA.

It may be cleaner to simply donot expand thunks proactively before all
the clonning is finished?

Honza
  
  Thanks,
  
  Martin
  
  
  2014-11-21  Martin Jambor  mjam...@suse.cz
  
  * cgraph.h (cgraph_thunk_info): Converted thunk_p to a bit-field.
  Added new flag expanded_thunk_p.
  * cgraphunit.c (expand_thunk): Set expanded_thunk_p when appropriate.
  * cgraphclones.c (duplicate_thunk_for_node): Also re-use an expanded
  thunk if available.
  
  Index: src/gcc/cgraph.h
  ===
  --- src.orig/gcc/cgraph.h
  +++ src/gcc/cgraph.h
  @@ -552,7 +552,9 @@ struct GTY(()) cgraph_thunk_info {
 bool virtual_offset_p;
 bool add_pointer_bounds_args;
 /* Set to true when alias node is thunk.  */
  -  bool thunk_p;
  +  unsigned thunk_p : 1;
  +  /* Set when this is an already expanded thunk.  */
  +  unsigned expanded_thunk_p : 1;
   };
   
   /* Information about the function collected locally.
  Index: src/gcc/cgraphclones.c
  ===
  --- src.orig/gcc/cgraphclones.c
  +++ src/gcc/cgraphclones.c
  @@ -311,7 +311,7 @@ duplicate_thunk_for_node (cgraph_node *t
   
 cgraph_edge *cs;
 for (cs = node-callers; cs; cs = cs-next_caller)
  -if (cs-caller-thunk.thunk_p
  +if ((cs-caller-thunk.thunk_p || cs-caller-thunk.expanded_thunk_p)
   cs-caller-thunk.this_adjusting == thunk-thunk.this_adjusting
   cs-caller-thunk.fixed_offset == thunk-thunk.fixed_offset
   cs-caller-thunk.virtual_offset_p == thunk-thunk.virtual_offset_p
  Index: src/gcc/cgraphunit.c
  ===
  --- src.orig/gcc/cgraphunit.c
  +++ src/gcc/cgraphunit.c
  @@ -1504,6 +1504,7 @@ cgraph_node::expand_thunk (bool output_a
 set_cfun (NULL);
 TREE_ASM_WRITTEN (thunk_fndecl) = 1;
 thunk.thunk_p = false;
  +  thunk.expanded_thunk_p = true;
 analyzed = false;
   }
 else
  @@ -1686,6 +1687,7 @@ cgraph_node::expand_thunk (bool output_a
 /* Since we want to emit the thunk, we explicitly mark its name as
   referenced.  */
 thunk.thunk_p = false;
  +  thunk.expanded_thunk_p = true;
 lowered = true;
 bitmap_obstack_release (NULL);
   }


[PATCH 2/2, PR 63814] Do not re-create expanded artificial thunks

2014-11-21 Thread Martin Jambor
Hi,

when debugging PR 63814 I noticed that when cgraph_node::create_clone
was using redirect_edge_duplicating_thunks to redirect two edges to a
thunk of a clone, two thunks were created, one for each edge.  The
reason is that even though duplicate_thunk_for_node attempts to locate
an already created thunk, it does so by looking for a caller with
thunk.thunk_p set and the previously created one does not have it set
because (on i686) expand_thunk has expanded the thunk to gimple and
cleared the flag.

This patch fixes the issue by marking such expanded thunks with yet
another flag and then uses the flag to identify such expanded thunks.
Bootstrapped and tested on x86_64-linux and i686-linux.  Honza, do you
think this is a good approach?  Is the patch OK for trunk?

Thanks,

Martin


2014-11-21  Martin Jambor  mjam...@suse.cz

* cgraph.h (cgraph_thunk_info): Converted thunk_p to a bit-field.
Added new flag expanded_thunk_p.
* cgraphunit.c (expand_thunk): Set expanded_thunk_p when appropriate.
* cgraphclones.c (duplicate_thunk_for_node): Also re-use an expanded
thunk if available.

Index: src/gcc/cgraph.h
===
--- src.orig/gcc/cgraph.h
+++ src/gcc/cgraph.h
@@ -552,7 +552,9 @@ struct GTY(()) cgraph_thunk_info {
   bool virtual_offset_p;
   bool add_pointer_bounds_args;
   /* Set to true when alias node is thunk.  */
-  bool thunk_p;
+  unsigned thunk_p : 1;
+  /* Set when this is an already expanded thunk.  */
+  unsigned expanded_thunk_p : 1;
 };
 
 /* Information about the function collected locally.
Index: src/gcc/cgraphclones.c
===
--- src.orig/gcc/cgraphclones.c
+++ src/gcc/cgraphclones.c
@@ -311,7 +311,7 @@ duplicate_thunk_for_node (cgraph_node *t
 
   cgraph_edge *cs;
   for (cs = node-callers; cs; cs = cs-next_caller)
-if (cs-caller-thunk.thunk_p
+if ((cs-caller-thunk.thunk_p || cs-caller-thunk.expanded_thunk_p)
 cs-caller-thunk.this_adjusting == thunk-thunk.this_adjusting
 cs-caller-thunk.fixed_offset == thunk-thunk.fixed_offset
 cs-caller-thunk.virtual_offset_p == thunk-thunk.virtual_offset_p
Index: src/gcc/cgraphunit.c
===
--- src.orig/gcc/cgraphunit.c
+++ src/gcc/cgraphunit.c
@@ -1504,6 +1504,7 @@ cgraph_node::expand_thunk (bool output_a
   set_cfun (NULL);
   TREE_ASM_WRITTEN (thunk_fndecl) = 1;
   thunk.thunk_p = false;
+  thunk.expanded_thunk_p = true;
   analyzed = false;
 }
   else
@@ -1686,6 +1687,7 @@ cgraph_node::expand_thunk (bool output_a
   /* Since we want to emit the thunk, we explicitly mark its name as
 referenced.  */
   thunk.thunk_p = false;
+  thunk.expanded_thunk_p = true;
   lowered = true;
   bitmap_obstack_release (NULL);
 }