Re: [PATCH, devirtualization] Intraprocedural devirtualization pass

2011-11-02 Thread Richard Guenther
On Tue, Nov 1, 2011 at 11:06 PM, Martin Jambor mjam...@suse.cz wrote:
 Hi,

 the patch below is the second (and last) revived type-based
 devirtualization patch that did not make it to 4.6.  It deals with
 virtual calls from the function in which the there is also the object
 declaration:

 void foo()
 {
  A a;

  a.foo ();
 }

 Normally, the front-end would do the devirtualization on its own,
 however, early-inlining can create these situations too.  Since there
 is nothing interprocedural going on, the current inlining and IPA-CP
 devirtualization bits are of no help.  We do not do type-based
 devirtualization in OBJ_TYPE_REF folding either, because the necessary
 type-changing checks might make it quite expensive.

But in the above case - a is not address-taken - we do not need
type-based devirtualization.  So, can you explain why we do not
forward the proper vtable from the vtable store that would lead to
the known-type info?

Thanks,
Richard.

  Thus, this patch
 introduces a new pass to do that.

 The patch basically piggy-tails on the intraprocedural
 devirtualization mechanism, trying to construct a known-type jump
 function for all objects in OBJ_TYPE_REF calls and then immediately
 using it like we do in IPA-CP.

 The original patch was doing this as a part of
 pass_rebuild_cgraph_edges.  Honza did not like this idea so I made it
 a separate pass.  First, I scheduled it after
 pass_rebuild_cgraph_edges and was only traversing indirect edges,
 avoiding a sweep over all of the IL.  Unfortunately, this does not
 work in one scenario.  If the newly-known destination of a virtual
 call is known not to throw, we may have to purge some EH CFG edges and
 potentially basic blocks.  If these basic blocks contain calls
 (typically calls to object destructors), we may end up having stale
 call edges in the call graph... and our current approach to that
 problem is to call pass_rebuild_cgraph_edges.  I think that I was not
 running into this problem when the mechanism was a part of that pass
 just because of pure luck.  Anyway, this is why I eventually opted for
 a sweep over the statements.

 My best guess is that it is probably worth it, but only because the
 overhead should be still fairly low.  The pass triggers quite a number
 of times when building libstdc++ and it can speed up an artificial
 testcase which I will attach from over 20 seconds to 7s on my older
 desktop - it is very similar to the one I posted with the previous
 patch but this time the object constructors must not get early inlined
 but the function multiply_matrices has to.  Currently I have problems
 compiling Firefox even without LTO so I don't have any numbers from it
 either.  IIRC, Honza did not see this patch trigger there when he
 tried the ancient version almost a year go.  On the other hand, Maxim
 claimed that the impact can be noticeable on some code base he is
 concerned about.

 I have successfully bootstrapped and tested the patch on x86_64-linux.
 What do you think, should we include this in trunk?

 Thanks,

 Martin


 2011-10-31  Martin Jambor  mjam...@suse.cz

        * ipa-cp.c (ipa_value_from_known_type_jfunc): Moved to...
        * ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, exported and
        updated all callers.
        (intraprocedural_devirtualization): New function.
        (gate_intra_devirtualization): Likewise.
        (pass_intra_devirt): New pass.
        * ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declared.
        * passes.c (init_optimization_passes): Schedule pass_intra_devirt.
        * tree-pass.h (pass_intra_devirt): Declare.

        * testsuite/g++.dg/ipa/imm-devirt-1.C: New test.
        * testsuite/g++.dg/ipa/imm-devirt-2.C: Likewise.


 Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C
 ===
 --- /dev/null
 +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C
 @@ -0,0 +1,62 @@
 +/* Verify that virtual calls are folded even when a typecast to an
 +   ancestor is involved along the way.  */
 +/* { dg-do run } */
 +/* { dg-options -O2 -fdump-tree-devirt  } */
 +
 +extern C void abort (void);
 +
 +class A
 +{
 +public:
 +  int data;
 +  virtual int foo (int i);
 +};
 +
 +
 +class B : public A
 +{
 +public:
 +  __attribute__ ((noinline)) B();
 +  virtual int foo (int i);
 +};
 +
 +int __attribute__ ((noinline)) A::foo (int i)
 +{
 +  return i + 1;
 +}
 +
 +int __attribute__ ((noinline)) B::foo (int i)
 +{
 +  return i + 2;
 +}
 +
 +int __attribute__ ((noinline,noclone)) get_input(void)
 +{
 +  return 1;
 +}
 +
 +__attribute__ ((noinline)) B::B()
 +{
 +}
 +
 +static inline int middleman_1 (class A *obj, int i)
 +{
 +  return obj-foo (i);
 +}
 +
 +static inline int middleman_2 (class B *obj, int i)
 +{
 +  return middleman_1 (obj, i);
 +}
 +
 +int main (int argc, char *argv[])
 +{
 +  class B b;
 +
 +  if (middleman_2 (b, get_input ()) != 3)
 +    abort ();
 +  return 0;
 +}
 +
 +/* { dg-final { scan-tree-dump Immediately 

Re: [PATCH, devirtualization] Intraprocedural devirtualization pass

2011-11-02 Thread Martin Jambor
Hi,

On Wed, Nov 02, 2011 at 11:02:48AM +0100, Richard Guenther wrote:
 On Tue, Nov 1, 2011 at 11:06 PM, Martin Jambor mjam...@suse.cz wrote:
  Hi,
 
  the patch below is the second (and last) revived type-based
  devirtualization patch that did not make it to 4.6.  It deals with
  virtual calls from the function in which the there is also the object
  declaration:
 
  void foo()
  {
   A a;
 
   a.foo ();
  }
 
  Normally, the front-end would do the devirtualization on its own,
  however, early-inlining can create these situations too.  Since there
  is nothing interprocedural going on, the current inlining and IPA-CP
  devirtualization bits are of no help.  We do not do type-based
  devirtualization in OBJ_TYPE_REF folding either, because the necessary
  type-changing checks might make it quite expensive.
 
 But in the above case - a is not address-taken - we do not need
 type-based devirtualization.  So, can you explain why we do not
 forward the proper vtable from the vtable store that would lead to
 the known-type info?
 

We can only do that when the object constructor is early-inlined.  If
it is only slightly bigger, or in a different compilation unit, the
store ends up in a different function (or unit) and we cannot do it.
LTO is not necessary with this approach and we still can be quite
helpful if foo is in the same TU and we can inline it.

Martin


 Thanks,
 Richard.
 
   Thus, this patch
  introduces a new pass to do that.
 
  The patch basically piggy-tails on the intraprocedural
  devirtualization mechanism, trying to construct a known-type jump
  function for all objects in OBJ_TYPE_REF calls and then immediately
  using it like we do in IPA-CP.
 
  The original patch was doing this as a part of
  pass_rebuild_cgraph_edges.  Honza did not like this idea so I made it
  a separate pass.  First, I scheduled it after
  pass_rebuild_cgraph_edges and was only traversing indirect edges,
  avoiding a sweep over all of the IL.  Unfortunately, this does not
  work in one scenario.  If the newly-known destination of a virtual
  call is known not to throw, we may have to purge some EH CFG edges and
  potentially basic blocks.  If these basic blocks contain calls
  (typically calls to object destructors), we may end up having stale
  call edges in the call graph... and our current approach to that
  problem is to call pass_rebuild_cgraph_edges.  I think that I was not
  running into this problem when the mechanism was a part of that pass
  just because of pure luck.  Anyway, this is why I eventually opted for
  a sweep over the statements.
 
  My best guess is that it is probably worth it, but only because the
  overhead should be still fairly low.  The pass triggers quite a number
  of times when building libstdc++ and it can speed up an artificial
  testcase which I will attach from over 20 seconds to 7s on my older
  desktop - it is very similar to the one I posted with the previous
  patch but this time the object constructors must not get early inlined
  but the function multiply_matrices has to.  Currently I have problems
  compiling Firefox even without LTO so I don't have any numbers from it
  either.  IIRC, Honza did not see this patch trigger there when he
  tried the ancient version almost a year go.  On the other hand, Maxim
  claimed that the impact can be noticeable on some code base he is
  concerned about.
 
  I have successfully bootstrapped and tested the patch on x86_64-linux.
  What do you think, should we include this in trunk?
 
  Thanks,
 
  Martin
 
 
  2011-10-31  Martin Jambor  mjam...@suse.cz
 
         * ipa-cp.c (ipa_value_from_known_type_jfunc): Moved to...
         * ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, exported and
         updated all callers.
         (intraprocedural_devirtualization): New function.
         (gate_intra_devirtualization): Likewise.
         (pass_intra_devirt): New pass.
         * ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declared.
         * passes.c (init_optimization_passes): Schedule pass_intra_devirt.
         * tree-pass.h (pass_intra_devirt): Declare.
 
         * testsuite/g++.dg/ipa/imm-devirt-1.C: New test.
         * testsuite/g++.dg/ipa/imm-devirt-2.C: Likewise.
 
 
  Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C
  ===
  --- /dev/null
  +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C
  @@ -0,0 +1,62 @@
  +/* Verify that virtual calls are folded even when a typecast to an
  +   ancestor is involved along the way.  */
  +/* { dg-do run } */
  +/* { dg-options -O2 -fdump-tree-devirt  } */
  +
  +extern C void abort (void);
  +
  +class A
  +{
  +public:
  +  int data;
  +  virtual int foo (int i);
  +};
  +
  +
  +class B : public A
  +{
  +public:
  +  __attribute__ ((noinline)) B();
  +  virtual int foo (int i);
  +};
  +
  +int __attribute__ ((noinline)) A::foo (int i)
  +{
  +  return i + 1;
  +}
  +
  +int __attribute__ ((noinline)) B::foo (int i)
 

Re: [PATCH, devirtualization] Intraprocedural devirtualization pass

2011-11-02 Thread Richard Guenther
On Wed, Nov 2, 2011 at 12:12 PM, Martin Jambor mjam...@suse.cz wrote:
 Hi,

 On Wed, Nov 02, 2011 at 11:02:48AM +0100, Richard Guenther wrote:
 On Tue, Nov 1, 2011 at 11:06 PM, Martin Jambor mjam...@suse.cz wrote:
  Hi,
 
  the patch below is the second (and last) revived type-based
  devirtualization patch that did not make it to 4.6.  It deals with
  virtual calls from the function in which the there is also the object
  declaration:
 
  void foo()
  {
   A a;
 
   a.foo ();
  }
 
  Normally, the front-end would do the devirtualization on its own,
  however, early-inlining can create these situations too.  Since there
  is nothing interprocedural going on, the current inlining and IPA-CP
  devirtualization bits are of no help.  We do not do type-based
  devirtualization in OBJ_TYPE_REF folding either, because the necessary
  type-changing checks might make it quite expensive.

 But in the above case - a is not address-taken - we do not need
 type-based devirtualization.  So, can you explain why we do not
 forward the proper vtable from the vtable store that would lead to
 the known-type info?


 We can only do that when the object constructor is early-inlined.  If
 it is only slightly bigger, or in a different compilation unit, the
 store ends up in a different function (or unit) and we cannot do it.
 LTO is not necessary with this approach and we still can be quite
 helpful if foo is in the same TU and we can inline it.

So we are using that fishy calls-cannot-change-object-types logic here?
Even if the actual call is the constructor?

Well, the cases seem to be pretty obscure - do we really want to add
another walk over the entire IL for that?  Can't you piggy-back on
some other transform for that?  Like for example FREs eliminate() walk,
which already does some of the devirt work?

Thanks,
Richard.

 Martin


 Thanks,
 Richard.

   Thus, this patch
  introduces a new pass to do that.
 
  The patch basically piggy-tails on the intraprocedural
  devirtualization mechanism, trying to construct a known-type jump
  function for all objects in OBJ_TYPE_REF calls and then immediately
  using it like we do in IPA-CP.
 
  The original patch was doing this as a part of
  pass_rebuild_cgraph_edges.  Honza did not like this idea so I made it
  a separate pass.  First, I scheduled it after
  pass_rebuild_cgraph_edges and was only traversing indirect edges,
  avoiding a sweep over all of the IL.  Unfortunately, this does not
  work in one scenario.  If the newly-known destination of a virtual
  call is known not to throw, we may have to purge some EH CFG edges and
  potentially basic blocks.  If these basic blocks contain calls
  (typically calls to object destructors), we may end up having stale
  call edges in the call graph... and our current approach to that
  problem is to call pass_rebuild_cgraph_edges.  I think that I was not
  running into this problem when the mechanism was a part of that pass
  just because of pure luck.  Anyway, this is why I eventually opted for
  a sweep over the statements.
 
  My best guess is that it is probably worth it, but only because the
  overhead should be still fairly low.  The pass triggers quite a number
  of times when building libstdc++ and it can speed up an artificial
  testcase which I will attach from over 20 seconds to 7s on my older
  desktop - it is very similar to the one I posted with the previous
  patch but this time the object constructors must not get early inlined
  but the function multiply_matrices has to.  Currently I have problems
  compiling Firefox even without LTO so I don't have any numbers from it
  either.  IIRC, Honza did not see this patch trigger there when he
  tried the ancient version almost a year go.  On the other hand, Maxim
  claimed that the impact can be noticeable on some code base he is
  concerned about.
 
  I have successfully bootstrapped and tested the patch on x86_64-linux.
  What do you think, should we include this in trunk?
 
  Thanks,
 
  Martin
 
 
  2011-10-31  Martin Jambor  mjam...@suse.cz
 
         * ipa-cp.c (ipa_value_from_known_type_jfunc): Moved to...
         * ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, exported 
  and
         updated all callers.
         (intraprocedural_devirtualization): New function.
         (gate_intra_devirtualization): Likewise.
         (pass_intra_devirt): New pass.
         * ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declared.
         * passes.c (init_optimization_passes): Schedule pass_intra_devirt.
         * tree-pass.h (pass_intra_devirt): Declare.
 
         * testsuite/g++.dg/ipa/imm-devirt-1.C: New test.
         * testsuite/g++.dg/ipa/imm-devirt-2.C: Likewise.
 
 
  Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C
  ===
  --- /dev/null
  +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C
  @@ -0,0 +1,62 @@
  +/* Verify that virtual calls are folded even when a typecast to an
  +   ancestor is 

[PATCH, devirtualization] Intraprocedural devirtualization pass

2011-11-01 Thread Martin Jambor
Hi,

the patch below is the second (and last) revived type-based
devirtualization patch that did not make it to 4.6.  It deals with
virtual calls from the function in which the there is also the object
declaration:

void foo()
{
  A a;

  a.foo ();
}

Normally, the front-end would do the devirtualization on its own,
however, early-inlining can create these situations too.  Since there
is nothing interprocedural going on, the current inlining and IPA-CP
devirtualization bits are of no help.  We do not do type-based
devirtualization in OBJ_TYPE_REF folding either, because the necessary
type-changing checks might make it quite expensive.  Thus, this patch
introduces a new pass to do that.

The patch basically piggy-tails on the intraprocedural
devirtualization mechanism, trying to construct a known-type jump
function for all objects in OBJ_TYPE_REF calls and then immediately
using it like we do in IPA-CP.

The original patch was doing this as a part of
pass_rebuild_cgraph_edges.  Honza did not like this idea so I made it
a separate pass.  First, I scheduled it after
pass_rebuild_cgraph_edges and was only traversing indirect edges,
avoiding a sweep over all of the IL.  Unfortunately, this does not
work in one scenario.  If the newly-known destination of a virtual
call is known not to throw, we may have to purge some EH CFG edges and
potentially basic blocks.  If these basic blocks contain calls
(typically calls to object destructors), we may end up having stale
call edges in the call graph... and our current approach to that
problem is to call pass_rebuild_cgraph_edges.  I think that I was not
running into this problem when the mechanism was a part of that pass
just because of pure luck.  Anyway, this is why I eventually opted for
a sweep over the statements.

My best guess is that it is probably worth it, but only because the
overhead should be still fairly low.  The pass triggers quite a number
of times when building libstdc++ and it can speed up an artificial
testcase which I will attach from over 20 seconds to 7s on my older
desktop - it is very similar to the one I posted with the previous
patch but this time the object constructors must not get early inlined
but the function multiply_matrices has to.  Currently I have problems
compiling Firefox even without LTO so I don't have any numbers from it
either.  IIRC, Honza did not see this patch trigger there when he
tried the ancient version almost a year go.  On the other hand, Maxim
claimed that the impact can be noticeable on some code base he is
concerned about.

I have successfully bootstrapped and tested the patch on x86_64-linux.
What do you think, should we include this in trunk?

Thanks,

Martin


2011-10-31  Martin Jambor  mjam...@suse.cz

* ipa-cp.c (ipa_value_from_known_type_jfunc): Moved to...
* ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, exported and
updated all callers.
(intraprocedural_devirtualization): New function.
(gate_intra_devirtualization): Likewise.
(pass_intra_devirt): New pass.
* ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declared.
* passes.c (init_optimization_passes): Schedule pass_intra_devirt.
* tree-pass.h (pass_intra_devirt): Declare.

* testsuite/g++.dg/ipa/imm-devirt-1.C: New test.
* testsuite/g++.dg/ipa/imm-devirt-2.C: Likewise.


Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C
===
--- /dev/null
+++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C
@@ -0,0 +1,62 @@
+/* Verify that virtual calls are folded even when a typecast to an
+   ancestor is involved along the way.  */
+/* { dg-do run } */
+/* { dg-options -O2 -fdump-tree-devirt  } */
+
+extern C void abort (void);
+
+class A
+{
+public:
+  int data;
+  virtual int foo (int i);
+};
+
+
+class B : public A
+{
+public:
+  __attribute__ ((noinline)) B();
+  virtual int foo (int i);
+};
+
+int __attribute__ ((noinline)) A::foo (int i)
+{
+  return i + 1;
+}
+
+int __attribute__ ((noinline)) B::foo (int i)
+{
+  return i + 2;
+}
+
+int __attribute__ ((noinline,noclone)) get_input(void)
+{
+  return 1;
+}
+
+__attribute__ ((noinline)) B::B()
+{
+}
+
+static inline int middleman_1 (class A *obj, int i)
+{
+  return obj-foo (i);
+}
+
+static inline int middleman_2 (class B *obj, int i)
+{
+  return middleman_1 (obj, i);
+}
+
+int main (int argc, char *argv[])
+{
+  class B b;
+
+  if (middleman_2 (b, get_input ()) != 3)
+abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump Immediately devirtualizing call.*into.*B::foo 
 devirt  } } */
+/* { dg-final { cleanup-tree-dump devirt } } */
Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C
===
--- /dev/null
+++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C
@@ -0,0 +1,95 @@
+/* Verify that virtual calls are folded even when a typecast to an
+   ancestor is involved along the way.  */
+/* { dg-do run }