Re: [PATCH, devirtualization] Intraprocedural devirtualization pass
On Tue, Nov 1, 2011 at 11:06 PM, Martin Jambor mjam...@suse.cz wrote: Hi, the patch below is the second (and last) revived type-based devirtualization patch that did not make it to 4.6. It deals with virtual calls from the function in which the there is also the object declaration: void foo() { A a; a.foo (); } Normally, the front-end would do the devirtualization on its own, however, early-inlining can create these situations too. Since there is nothing interprocedural going on, the current inlining and IPA-CP devirtualization bits are of no help. We do not do type-based devirtualization in OBJ_TYPE_REF folding either, because the necessary type-changing checks might make it quite expensive. But in the above case - a is not address-taken - we do not need type-based devirtualization. So, can you explain why we do not forward the proper vtable from the vtable store that would lead to the known-type info? Thanks, Richard. Thus, this patch introduces a new pass to do that. The patch basically piggy-tails on the intraprocedural devirtualization mechanism, trying to construct a known-type jump function for all objects in OBJ_TYPE_REF calls and then immediately using it like we do in IPA-CP. The original patch was doing this as a part of pass_rebuild_cgraph_edges. Honza did not like this idea so I made it a separate pass. First, I scheduled it after pass_rebuild_cgraph_edges and was only traversing indirect edges, avoiding a sweep over all of the IL. Unfortunately, this does not work in one scenario. If the newly-known destination of a virtual call is known not to throw, we may have to purge some EH CFG edges and potentially basic blocks. If these basic blocks contain calls (typically calls to object destructors), we may end up having stale call edges in the call graph... and our current approach to that problem is to call pass_rebuild_cgraph_edges. I think that I was not running into this problem when the mechanism was a part of that pass just because of pure luck. Anyway, this is why I eventually opted for a sweep over the statements. My best guess is that it is probably worth it, but only because the overhead should be still fairly low. The pass triggers quite a number of times when building libstdc++ and it can speed up an artificial testcase which I will attach from over 20 seconds to 7s on my older desktop - it is very similar to the one I posted with the previous patch but this time the object constructors must not get early inlined but the function multiply_matrices has to. Currently I have problems compiling Firefox even without LTO so I don't have any numbers from it either. IIRC, Honza did not see this patch trigger there when he tried the ancient version almost a year go. On the other hand, Maxim claimed that the impact can be noticeable on some code base he is concerned about. I have successfully bootstrapped and tested the patch on x86_64-linux. What do you think, should we include this in trunk? Thanks, Martin 2011-10-31 Martin Jambor mjam...@suse.cz * ipa-cp.c (ipa_value_from_known_type_jfunc): Moved to... * ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, exported and updated all callers. (intraprocedural_devirtualization): New function. (gate_intra_devirtualization): Likewise. (pass_intra_devirt): New pass. * ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declared. * passes.c (init_optimization_passes): Schedule pass_intra_devirt. * tree-pass.h (pass_intra_devirt): Declare. * testsuite/g++.dg/ipa/imm-devirt-1.C: New test. * testsuite/g++.dg/ipa/imm-devirt-2.C: Likewise. Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C === --- /dev/null +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C @@ -0,0 +1,62 @@ +/* Verify that virtual calls are folded even when a typecast to an + ancestor is involved along the way. */ +/* { dg-do run } */ +/* { dg-options -O2 -fdump-tree-devirt } */ + +extern C void abort (void); + +class A +{ +public: + int data; + virtual int foo (int i); +}; + + +class B : public A +{ +public: + __attribute__ ((noinline)) B(); + virtual int foo (int i); +}; + +int __attribute__ ((noinline)) A::foo (int i) +{ + return i + 1; +} + +int __attribute__ ((noinline)) B::foo (int i) +{ + return i + 2; +} + +int __attribute__ ((noinline,noclone)) get_input(void) +{ + return 1; +} + +__attribute__ ((noinline)) B::B() +{ +} + +static inline int middleman_1 (class A *obj, int i) +{ + return obj-foo (i); +} + +static inline int middleman_2 (class B *obj, int i) +{ + return middleman_1 (obj, i); +} + +int main (int argc, char *argv[]) +{ + class B b; + + if (middleman_2 (b, get_input ()) != 3) + abort (); + return 0; +} + +/* { dg-final { scan-tree-dump Immediately
Re: [PATCH, devirtualization] Intraprocedural devirtualization pass
Hi, On Wed, Nov 02, 2011 at 11:02:48AM +0100, Richard Guenther wrote: On Tue, Nov 1, 2011 at 11:06 PM, Martin Jambor mjam...@suse.cz wrote: Hi, the patch below is the second (and last) revived type-based devirtualization patch that did not make it to 4.6. It deals with virtual calls from the function in which the there is also the object declaration: void foo() { A a; a.foo (); } Normally, the front-end would do the devirtualization on its own, however, early-inlining can create these situations too. Since there is nothing interprocedural going on, the current inlining and IPA-CP devirtualization bits are of no help. We do not do type-based devirtualization in OBJ_TYPE_REF folding either, because the necessary type-changing checks might make it quite expensive. But in the above case - a is not address-taken - we do not need type-based devirtualization. So, can you explain why we do not forward the proper vtable from the vtable store that would lead to the known-type info? We can only do that when the object constructor is early-inlined. If it is only slightly bigger, or in a different compilation unit, the store ends up in a different function (or unit) and we cannot do it. LTO is not necessary with this approach and we still can be quite helpful if foo is in the same TU and we can inline it. Martin Thanks, Richard. Thus, this patch introduces a new pass to do that. The patch basically piggy-tails on the intraprocedural devirtualization mechanism, trying to construct a known-type jump function for all objects in OBJ_TYPE_REF calls and then immediately using it like we do in IPA-CP. The original patch was doing this as a part of pass_rebuild_cgraph_edges. Honza did not like this idea so I made it a separate pass. First, I scheduled it after pass_rebuild_cgraph_edges and was only traversing indirect edges, avoiding a sweep over all of the IL. Unfortunately, this does not work in one scenario. If the newly-known destination of a virtual call is known not to throw, we may have to purge some EH CFG edges and potentially basic blocks. If these basic blocks contain calls (typically calls to object destructors), we may end up having stale call edges in the call graph... and our current approach to that problem is to call pass_rebuild_cgraph_edges. I think that I was not running into this problem when the mechanism was a part of that pass just because of pure luck. Anyway, this is why I eventually opted for a sweep over the statements. My best guess is that it is probably worth it, but only because the overhead should be still fairly low. The pass triggers quite a number of times when building libstdc++ and it can speed up an artificial testcase which I will attach from over 20 seconds to 7s on my older desktop - it is very similar to the one I posted with the previous patch but this time the object constructors must not get early inlined but the function multiply_matrices has to. Currently I have problems compiling Firefox even without LTO so I don't have any numbers from it either. IIRC, Honza did not see this patch trigger there when he tried the ancient version almost a year go. On the other hand, Maxim claimed that the impact can be noticeable on some code base he is concerned about. I have successfully bootstrapped and tested the patch on x86_64-linux. What do you think, should we include this in trunk? Thanks, Martin 2011-10-31 Martin Jambor mjam...@suse.cz * ipa-cp.c (ipa_value_from_known_type_jfunc): Moved to... * ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, exported and updated all callers. (intraprocedural_devirtualization): New function. (gate_intra_devirtualization): Likewise. (pass_intra_devirt): New pass. * ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declared. * passes.c (init_optimization_passes): Schedule pass_intra_devirt. * tree-pass.h (pass_intra_devirt): Declare. * testsuite/g++.dg/ipa/imm-devirt-1.C: New test. * testsuite/g++.dg/ipa/imm-devirt-2.C: Likewise. Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C === --- /dev/null +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C @@ -0,0 +1,62 @@ +/* Verify that virtual calls are folded even when a typecast to an + ancestor is involved along the way. */ +/* { dg-do run } */ +/* { dg-options -O2 -fdump-tree-devirt } */ + +extern C void abort (void); + +class A +{ +public: + int data; + virtual int foo (int i); +}; + + +class B : public A +{ +public: + __attribute__ ((noinline)) B(); + virtual int foo (int i); +}; + +int __attribute__ ((noinline)) A::foo (int i) +{ + return i + 1; +} + +int __attribute__ ((noinline)) B::foo (int i)
Re: [PATCH, devirtualization] Intraprocedural devirtualization pass
On Wed, Nov 2, 2011 at 12:12 PM, Martin Jambor mjam...@suse.cz wrote: Hi, On Wed, Nov 02, 2011 at 11:02:48AM +0100, Richard Guenther wrote: On Tue, Nov 1, 2011 at 11:06 PM, Martin Jambor mjam...@suse.cz wrote: Hi, the patch below is the second (and last) revived type-based devirtualization patch that did not make it to 4.6. It deals with virtual calls from the function in which the there is also the object declaration: void foo() { A a; a.foo (); } Normally, the front-end would do the devirtualization on its own, however, early-inlining can create these situations too. Since there is nothing interprocedural going on, the current inlining and IPA-CP devirtualization bits are of no help. We do not do type-based devirtualization in OBJ_TYPE_REF folding either, because the necessary type-changing checks might make it quite expensive. But in the above case - a is not address-taken - we do not need type-based devirtualization. So, can you explain why we do not forward the proper vtable from the vtable store that would lead to the known-type info? We can only do that when the object constructor is early-inlined. If it is only slightly bigger, or in a different compilation unit, the store ends up in a different function (or unit) and we cannot do it. LTO is not necessary with this approach and we still can be quite helpful if foo is in the same TU and we can inline it. So we are using that fishy calls-cannot-change-object-types logic here? Even if the actual call is the constructor? Well, the cases seem to be pretty obscure - do we really want to add another walk over the entire IL for that? Can't you piggy-back on some other transform for that? Like for example FREs eliminate() walk, which already does some of the devirt work? Thanks, Richard. Martin Thanks, Richard. Thus, this patch introduces a new pass to do that. The patch basically piggy-tails on the intraprocedural devirtualization mechanism, trying to construct a known-type jump function for all objects in OBJ_TYPE_REF calls and then immediately using it like we do in IPA-CP. The original patch was doing this as a part of pass_rebuild_cgraph_edges. Honza did not like this idea so I made it a separate pass. First, I scheduled it after pass_rebuild_cgraph_edges and was only traversing indirect edges, avoiding a sweep over all of the IL. Unfortunately, this does not work in one scenario. If the newly-known destination of a virtual call is known not to throw, we may have to purge some EH CFG edges and potentially basic blocks. If these basic blocks contain calls (typically calls to object destructors), we may end up having stale call edges in the call graph... and our current approach to that problem is to call pass_rebuild_cgraph_edges. I think that I was not running into this problem when the mechanism was a part of that pass just because of pure luck. Anyway, this is why I eventually opted for a sweep over the statements. My best guess is that it is probably worth it, but only because the overhead should be still fairly low. The pass triggers quite a number of times when building libstdc++ and it can speed up an artificial testcase which I will attach from over 20 seconds to 7s on my older desktop - it is very similar to the one I posted with the previous patch but this time the object constructors must not get early inlined but the function multiply_matrices has to. Currently I have problems compiling Firefox even without LTO so I don't have any numbers from it either. IIRC, Honza did not see this patch trigger there when he tried the ancient version almost a year go. On the other hand, Maxim claimed that the impact can be noticeable on some code base he is concerned about. I have successfully bootstrapped and tested the patch on x86_64-linux. What do you think, should we include this in trunk? Thanks, Martin 2011-10-31 Martin Jambor mjam...@suse.cz * ipa-cp.c (ipa_value_from_known_type_jfunc): Moved to... * ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, exported and updated all callers. (intraprocedural_devirtualization): New function. (gate_intra_devirtualization): Likewise. (pass_intra_devirt): New pass. * ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declared. * passes.c (init_optimization_passes): Schedule pass_intra_devirt. * tree-pass.h (pass_intra_devirt): Declare. * testsuite/g++.dg/ipa/imm-devirt-1.C: New test. * testsuite/g++.dg/ipa/imm-devirt-2.C: Likewise. Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C === --- /dev/null +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C @@ -0,0 +1,62 @@ +/* Verify that virtual calls are folded even when a typecast to an + ancestor is
[PATCH, devirtualization] Intraprocedural devirtualization pass
Hi, the patch below is the second (and last) revived type-based devirtualization patch that did not make it to 4.6. It deals with virtual calls from the function in which the there is also the object declaration: void foo() { A a; a.foo (); } Normally, the front-end would do the devirtualization on its own, however, early-inlining can create these situations too. Since there is nothing interprocedural going on, the current inlining and IPA-CP devirtualization bits are of no help. We do not do type-based devirtualization in OBJ_TYPE_REF folding either, because the necessary type-changing checks might make it quite expensive. Thus, this patch introduces a new pass to do that. The patch basically piggy-tails on the intraprocedural devirtualization mechanism, trying to construct a known-type jump function for all objects in OBJ_TYPE_REF calls and then immediately using it like we do in IPA-CP. The original patch was doing this as a part of pass_rebuild_cgraph_edges. Honza did not like this idea so I made it a separate pass. First, I scheduled it after pass_rebuild_cgraph_edges and was only traversing indirect edges, avoiding a sweep over all of the IL. Unfortunately, this does not work in one scenario. If the newly-known destination of a virtual call is known not to throw, we may have to purge some EH CFG edges and potentially basic blocks. If these basic blocks contain calls (typically calls to object destructors), we may end up having stale call edges in the call graph... and our current approach to that problem is to call pass_rebuild_cgraph_edges. I think that I was not running into this problem when the mechanism was a part of that pass just because of pure luck. Anyway, this is why I eventually opted for a sweep over the statements. My best guess is that it is probably worth it, but only because the overhead should be still fairly low. The pass triggers quite a number of times when building libstdc++ and it can speed up an artificial testcase which I will attach from over 20 seconds to 7s on my older desktop - it is very similar to the one I posted with the previous patch but this time the object constructors must not get early inlined but the function multiply_matrices has to. Currently I have problems compiling Firefox even without LTO so I don't have any numbers from it either. IIRC, Honza did not see this patch trigger there when he tried the ancient version almost a year go. On the other hand, Maxim claimed that the impact can be noticeable on some code base he is concerned about. I have successfully bootstrapped and tested the patch on x86_64-linux. What do you think, should we include this in trunk? Thanks, Martin 2011-10-31 Martin Jambor mjam...@suse.cz * ipa-cp.c (ipa_value_from_known_type_jfunc): Moved to... * ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, exported and updated all callers. (intraprocedural_devirtualization): New function. (gate_intra_devirtualization): Likewise. (pass_intra_devirt): New pass. * ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declared. * passes.c (init_optimization_passes): Schedule pass_intra_devirt. * tree-pass.h (pass_intra_devirt): Declare. * testsuite/g++.dg/ipa/imm-devirt-1.C: New test. * testsuite/g++.dg/ipa/imm-devirt-2.C: Likewise. Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C === --- /dev/null +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C @@ -0,0 +1,62 @@ +/* Verify that virtual calls are folded even when a typecast to an + ancestor is involved along the way. */ +/* { dg-do run } */ +/* { dg-options -O2 -fdump-tree-devirt } */ + +extern C void abort (void); + +class A +{ +public: + int data; + virtual int foo (int i); +}; + + +class B : public A +{ +public: + __attribute__ ((noinline)) B(); + virtual int foo (int i); +}; + +int __attribute__ ((noinline)) A::foo (int i) +{ + return i + 1; +} + +int __attribute__ ((noinline)) B::foo (int i) +{ + return i + 2; +} + +int __attribute__ ((noinline,noclone)) get_input(void) +{ + return 1; +} + +__attribute__ ((noinline)) B::B() +{ +} + +static inline int middleman_1 (class A *obj, int i) +{ + return obj-foo (i); +} + +static inline int middleman_2 (class B *obj, int i) +{ + return middleman_1 (obj, i); +} + +int main (int argc, char *argv[]) +{ + class B b; + + if (middleman_2 (b, get_input ()) != 3) +abort (); + return 0; +} + +/* { dg-final { scan-tree-dump Immediately devirtualizing call.*into.*B::foo devirt } } */ +/* { dg-final { cleanup-tree-dump devirt } } */ Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C === --- /dev/null +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C @@ -0,0 +1,95 @@ +/* Verify that virtual calls are folded even when a typecast to an + ancestor is involved along the way. */ +/* { dg-do run }