Re: [PATCH][v6] GIMPLE store merging pass
On Mon, 24 Oct 2016, Kyrill Tkachov wrote: > > On 21/10/16 19:37, Richard Biener wrote: > > On October 21, 2016 3:29:15 PM GMT+02:00, Kyrill Tkachov > >wrote: > > > Hi Richard, > > > > > > On 21/10/16 13:37, Richard Biener wrote: > > > > On Tue, 18 Oct 2016, Kyrill Tkachov wrote: > > > > > > > > > Hi Richard, > > > > > > > > > > This patch is a merge of [1] and [2] and implements the manual > > > merging of > > > > > bitfields > > > > > as outlined in [1] but actually makes it work on BYTES_BIG_ENDIAN > > > too. > > > > > It caused me a lot of headeache because the bit offset is counted > > >from the > > > > > most significant bit > > > > > in the byte, even though BITS_BIG_ENDIAN was 0 (BITS_BIG_ENDIAN > > > looks > > > > > irrelevant for store merging > > > > > anyway as it's just used to described RTL extract operations). > > > > > I've included ASCII diagrams of the steps in the merging algorithm. > > > > Heh, thanks. > > > > > > > > > Bootstrapped and tested on arm-none-linux-gnueabihf, > > > aarch64-none-linux-gnu, > > > > > x86_64-unknown-linux-gnu. > > > > > Also tested on aarch64_be-none-elf. > > > > > > > > > > How does this version look now? > > > > Mostly good. For > > > > > > > > +bool > > > > +pass_store_merging::terminate_all_aliasing_chains (tree dest, tree > > > base, > > > > + gimple *stmt) > > > > +{ > > > > ... > > > > + /* Check if the assignment destination (BASE) is part of a store > > > chain. > > > > + This is to catch non-constant stores to destinations that may > > > be > > > > part > > > > + of a chain. */ > > > > + if (base) > > > > +{ > > > > + chain_info = m_stores.get (base); > > > > + if (chain_info) > > > > + { > > > > + struct store_immediate_info *info; > > > > + unsigned int i; > > > > + FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info) > > > > + { > > > > + if (refs_may_alias_p (info->dest, dest)) > > > > + { > > > > > > > > I suppose the chain is not yet sorted in any way? > > > > > > > > At least for 'dest' which do not have a known constant offset we > > > > could do > > > > > > > > if (base) > > > >terminate_and_release_chain (base); > > > Do you mean when get_inner_reference returns non-NULL for POFFSET? > > Yes. > > > > > Or do you think we should try to look into dest in this function? > > > > > > > to speed things up? IIRC we do not terminate chains early in > > > > this phase when we have enough stores to form a group, so > > > > writing a testcase that triggers quadraticness would be as simple > > > > as having > > > > > > > > char a[100]; > > > > > > > > void foo () > > > > { > > > >a[0] = 1; > > > >a[1] = 2; > > > > > > > >a[999] = 3; > > > > } > > > > > > > > ? > > > > > > > > so I think you probably want to limit the number of stores you > > > > ever put onto a chain and if you reach that limit, terminate > > > > and release it? Like just choose 16 or 64? (and experiment > > > > with the above kind of testcases) > > > I was initially thinking of imposing such a limit as well but > > > later I thought we'd want to extend the output code to be able to emit > > > a memcpy (or memset) call for large regions, so detecting the largest > > > possible > > > regions would be needed. > > But then we need a better data structure here to avoid the quadraticness. > > For the testcase: > > char a[100]; > > void foo () > { > a[0] = 1; > a[1] = 2; > > a[999] = 3; > } > > and similar we don't have quadratic behaviour because constant stores to a > constant offset > from base (i.e. stores valid for merging) get automatically inserted into the > chain without > doing the alias checking (we keep track of their program order so that > overlapping stores > are merged properly). Since pushing into a vec is not a linear operation we're > fine. Good. > Once I teach it to quickly terminate chains when encountering stores to base > with a variable > offset (like you suggest above) there is only one case of quadraticness: > We walk the stores in the chain (causing quadratic behaviour) when among > the valid mergeable stores we have interspersed stores to a constant offset > from base but that > do not store a constant value e.g. a[5] = x; > So the input needs to be something like: > > char a[16000]; > > void > foo (char x) > { > a[0] = 1; a[8000 + 0] = x; > a[1] = 1; a[8000 + 1] = x; > a[2] = 1; a[8000 + 2] = x; > a[3] = 1; a[8000 + 3] = x; > a[4] = 1; a[8000 + 4] = x; > ... > a[7999] = 1; a[8000 + 7999] = x; > } > > I've reproduced a slowdown (with store merging taking a disproportional > time in -ftime-report) with this case and I've verified that limiting the > maximum number of statements in the chain fixes it. OK - as said, the algorithm trivially looks quadratic in some cases and we should
Re: [PATCH][v6] GIMPLE store merging pass
On 21/10/16 19:37, Richard Biener wrote: On October 21, 2016 3:29:15 PM GMT+02:00, Kyrill Tkachovwrote: Hi Richard, On 21/10/16 13:37, Richard Biener wrote: On Tue, 18 Oct 2016, Kyrill Tkachov wrote: Hi Richard, This patch is a merge of [1] and [2] and implements the manual merging of bitfields as outlined in [1] but actually makes it work on BYTES_BIG_ENDIAN too. It caused me a lot of headeache because the bit offset is counted >from the most significant bit in the byte, even though BITS_BIG_ENDIAN was 0 (BITS_BIG_ENDIAN looks irrelevant for store merging anyway as it's just used to described RTL extract operations). I've included ASCII diagrams of the steps in the merging algorithm. Heh, thanks. Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu, x86_64-unknown-linux-gnu. Also tested on aarch64_be-none-elf. How does this version look now? Mostly good. For +bool +pass_store_merging::terminate_all_aliasing_chains (tree dest, tree base, + gimple *stmt) +{ ... + /* Check if the assignment destination (BASE) is part of a store chain. + This is to catch non-constant stores to destinations that may be part + of a chain. */ + if (base) +{ + chain_info = m_stores.get (base); + if (chain_info) + { + struct store_immediate_info *info; + unsigned int i; + FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info) + { + if (refs_may_alias_p (info->dest, dest)) + { I suppose the chain is not yet sorted in any way? At least for 'dest' which do not have a known constant offset we could do if (base) terminate_and_release_chain (base); Do you mean when get_inner_reference returns non-NULL for POFFSET? Yes. Or do you think we should try to look into dest in this function? to speed things up? IIRC we do not terminate chains early in this phase when we have enough stores to form a group, so writing a testcase that triggers quadraticness would be as simple as having char a[100]; void foo () { a[0] = 1; a[1] = 2; a[999] = 3; } ? so I think you probably want to limit the number of stores you ever put onto a chain and if you reach that limit, terminate and release it? Like just choose 16 or 64? (and experiment with the above kind of testcases) I was initially thinking of imposing such a limit as well but later I thought we'd want to extend the output code to be able to emit a memcpy (or memset) call for large regions, so detecting the largest possible regions would be needed. But then we need a better data structure here to avoid the quadraticness. For the testcase: char a[100]; void foo () { a[0] = 1; a[1] = 2; a[999] = 3; } and similar we don't have quadratic behaviour because constant stores to a constant offset from base (i.e. stores valid for merging) get automatically inserted into the chain without doing the alias checking (we keep track of their program order so that overlapping stores are merged properly). Since pushing into a vec is not a linear operation we're fine. Once I teach it to quickly terminate chains when encountering stores to base with a variable offset (like you suggest above) there is only one case of quadraticness: We walk the stores in the chain (causing quadratic behaviour) when among the valid mergeable stores we have interspersed stores to a constant offset from base but that do not store a constant value e.g. a[5] = x; So the input needs to be something like: char a[16000]; void foo (char x) { a[0] = 1; a[8000 + 0] = x; a[1] = 1; a[8000 + 1] = x; a[2] = 1; a[8000 + 2] = x; a[3] = 1; a[8000 + 3] = x; a[4] = 1; a[8000 + 4] = x; ... a[7999] = 1; a[8000 + 7999] = x; } I've reproduced a slowdown (with store merging taking a disproportional time in -ftime-report) with this case and I've verified that limiting the maximum number of statements in the chain fixes it. But that is not implemented yet (though I have experimented with it) so I can add a limit here. Should I just hardcode a limit or should I make it into a --param (MAX_STMTS_IN_STORE_MERGING_CHAIN or something)? A param is preferred. Thanks. I'll introduce the limit but, as mentioned above, the simple case of inserting multiple stores is not quadratic. Kyrill + bit_off = byte_off << LOG2_BITS_PER_UNIT; + if (!wi::neg_p (bit_off) && wi::fits_shwi_p (bit_off)) + { + bitpos += bit_off.to_shwi (); + I think you want bit_off += bitpos before the fits_shwi check otherwise this add may still overflow. + base_addr = copy_node (base_addr); + TREE_OPERAND (base_addr, 1) + = build_zero_cst (TREE_TYPE (TREE_OPERAND ( + base_addr,
Re: [PATCH][v6] GIMPLE store merging pass
On October 21, 2016 3:29:15 PM GMT+02:00, Kyrill Tkachovwrote: >Hi Richard, > >On 21/10/16 13:37, Richard Biener wrote: >> On Tue, 18 Oct 2016, Kyrill Tkachov wrote: >> >>> Hi Richard, >>> >>> This patch is a merge of [1] and [2] and implements the manual >merging of >>> bitfields >>> as outlined in [1] but actually makes it work on BYTES_BIG_ENDIAN >too. >>> It caused me a lot of headeache because the bit offset is counted >from the >>> most significant bit >>> in the byte, even though BITS_BIG_ENDIAN was 0 (BITS_BIG_ENDIAN >looks >>> irrelevant for store merging >>> anyway as it's just used to described RTL extract operations). >>> I've included ASCII diagrams of the steps in the merging algorithm. >> Heh, thanks. >> >>> Bootstrapped and tested on arm-none-linux-gnueabihf, >aarch64-none-linux-gnu, >>> x86_64-unknown-linux-gnu. >>> Also tested on aarch64_be-none-elf. >>> >>> How does this version look now? >> Mostly good. For >> >> +bool >> +pass_store_merging::terminate_all_aliasing_chains (tree dest, tree >base, >> + gimple *stmt) >> +{ >> ... >> + /* Check if the assignment destination (BASE) is part of a store >chain. >> + This is to catch non-constant stores to destinations that may >be >> part >> + of a chain. */ >> + if (base) >> +{ >> + chain_info = m_stores.get (base); >> + if (chain_info) >> + { >> + struct store_immediate_info *info; >> + unsigned int i; >> + FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info) >> + { >> + if (refs_may_alias_p (info->dest, dest)) >> + { >> >> I suppose the chain is not yet sorted in any way? >> >> At least for 'dest' which do not have a known constant offset we >> could do >> >> if (base) >> terminate_and_release_chain (base); > >Do you mean when get_inner_reference returns non-NULL for POFFSET? Yes. >Or do you think we should try to look into dest in this function? > >> to speed things up? IIRC we do not terminate chains early in >> this phase when we have enough stores to form a group, so >> writing a testcase that triggers quadraticness would be as simple >> as having >> >> char a[100]; >> >> void foo () >> { >> a[0] = 1; >> a[1] = 2; >> >> a[999] = 3; >> } >> >> ? >> >> so I think you probably want to limit the number of stores you >> ever put onto a chain and if you reach that limit, terminate >> and release it? Like just choose 16 or 64? (and experiment >> with the above kind of testcases) > >I was initially thinking of imposing such a limit as well but >later I thought we'd want to extend the output code to be able to emit >a memcpy (or memset) call for large regions, so detecting the largest >possible >regions would be needed. But then we need a better data structure here to avoid the quadraticness. But that is not implemented yet (though I have >experimented >with it) so I can add a limit here. Should I just hardcode a limit or >should I make it >into a --param (MAX_STMTS_IN_STORE_MERGING_CHAIN or something)? A param is preferred. >> >> + bit_off = byte_off << LOG2_BITS_PER_UNIT; >> + if (!wi::neg_p (bit_off) && wi::fits_shwi_p >(bit_off)) >> + { >> + bitpos += bit_off.to_shwi (); >> + >> >> I think you want bit_off += bitpos before the fits_shwi check >> otherwise this add may still overflow. >> >> + base_addr = copy_node (base_addr); >> + TREE_OPERAND (base_addr, 1) >> + = build_zero_cst (TREE_TYPE (TREE_OPERAND ( >> + base_addr, >1))); >> >> I'd prefer >> >>base_addr = build2 (MEM_REF, ...); >> >> here. > >Thanks for the feedback, >Kyrill > >> >> Thanks, >> Richard. >> >>> Thanks, >>> Kyrill >>> >>> [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00573.html >>> [2] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00572.html >>> >>> 2016-10-18 Kyrylo Tkachov >>> >>> PR middle-end/22141 >>> * Makefile.in (OBJS): Add gimple-ssa-store-merging.o. >>> * common.opt (fstore-merging): New Optimization option. >>> * opts.c (default_options_table): Add entry for >>> OPT_ftree_store_merging. >>> * fold-const.h (can_native_encode_type_p): Declare prototype. >>> * fold-const.c (can_native_encode_type_p): Define. >>> * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define. >>> * passes.def: Insert pass_tree_store_merging. >>> * tree-pass.h (make_pass_store_merging): Declare extern >>> prototype. >>> * gimple-ssa-store-merging.c: New file. >>> * doc/invoke.texi (Optimization Options): Document >>> -fstore-merging. >>> >>> 2016-10-18 Kyrylo Tkachov >>> Jakub Jelinek >>> Andrew Pinski
Re: [PATCH][v6] GIMPLE store merging pass
Hi Richard, On 21/10/16 13:37, Richard Biener wrote: On Tue, 18 Oct 2016, Kyrill Tkachov wrote: Hi Richard, This patch is a merge of [1] and [2] and implements the manual merging of bitfields as outlined in [1] but actually makes it work on BYTES_BIG_ENDIAN too. It caused me a lot of headeache because the bit offset is counted from the most significant bit in the byte, even though BITS_BIG_ENDIAN was 0 (BITS_BIG_ENDIAN looks irrelevant for store merging anyway as it's just used to described RTL extract operations). I've included ASCII diagrams of the steps in the merging algorithm. Heh, thanks. Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu, x86_64-unknown-linux-gnu. Also tested on aarch64_be-none-elf. How does this version look now? Mostly good. For +bool +pass_store_merging::terminate_all_aliasing_chains (tree dest, tree base, + gimple *stmt) +{ ... + /* Check if the assignment destination (BASE) is part of a store chain. + This is to catch non-constant stores to destinations that may be part + of a chain. */ + if (base) +{ + chain_info = m_stores.get (base); + if (chain_info) + { + struct store_immediate_info *info; + unsigned int i; + FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info) + { + if (refs_may_alias_p (info->dest, dest)) + { I suppose the chain is not yet sorted in any way? At least for 'dest' which do not have a known constant offset we could do if (base) terminate_and_release_chain (base); Do you mean when get_inner_reference returns non-NULL for POFFSET? Or do you think we should try to look into dest in this function? to speed things up? IIRC we do not terminate chains early in this phase when we have enough stores to form a group, so writing a testcase that triggers quadraticness would be as simple as having char a[100]; void foo () { a[0] = 1; a[1] = 2; a[999] = 3; } ? so I think you probably want to limit the number of stores you ever put onto a chain and if you reach that limit, terminate and release it? Like just choose 16 or 64? (and experiment with the above kind of testcases) I was initially thinking of imposing such a limit as well but later I thought we'd want to extend the output code to be able to emit a memcpy (or memset) call for large regions, so detecting the largest possible regions would be needed. But that is not implemented yet (though I have experimented with it) so I can add a limit here. Should I just hardcode a limit or should I make it into a --param (MAX_STMTS_IN_STORE_MERGING_CHAIN or something)? + bit_off = byte_off << LOG2_BITS_PER_UNIT; + if (!wi::neg_p (bit_off) && wi::fits_shwi_p (bit_off)) + { + bitpos += bit_off.to_shwi (); + I think you want bit_off += bitpos before the fits_shwi check otherwise this add may still overflow. + base_addr = copy_node (base_addr); + TREE_OPERAND (base_addr, 1) + = build_zero_cst (TREE_TYPE (TREE_OPERAND ( + base_addr, 1))); I'd prefer base_addr = build2 (MEM_REF, ...); here. Thanks for the feedback, Kyrill Thanks, Richard. Thanks, Kyrill [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00573.html [2] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00572.html 2016-10-18 Kyrylo TkachovPR middle-end/22141 * Makefile.in (OBJS): Add gimple-ssa-store-merging.o. * common.opt (fstore-merging): New Optimization option. * opts.c (default_options_table): Add entry for OPT_ftree_store_merging. * fold-const.h (can_native_encode_type_p): Declare prototype. * fold-const.c (can_native_encode_type_p): Define. * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define. * passes.def: Insert pass_tree_store_merging. * tree-pass.h (make_pass_store_merging): Declare extern prototype. * gimple-ssa-store-merging.c: New file. * doc/invoke.texi (Optimization Options): Document -fstore-merging. 2016-10-18 Kyrylo Tkachov Jakub Jelinek Andrew Pinski PR middle-end/22141 PR rtl-optimization/23684 * gcc.c-torture/execute/pr22141-1.c: New test. * gcc.c-torture/execute/pr22141-2.c: Likewise. * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging. * gcc.target/aarch64/ldp_stp_4.c: Likewise. * gcc.dg/store_merging_1.c: New test. * gcc.dg/store_merging_2.c: Likewise. * gcc.dg/store_merging_3.c: Likewise. * gcc.dg/store_merging_4.c: Likewise. * gcc.dg/store_merging_5.c: Likewise. * gcc.dg/store_merging_6.c: Likewise. * gcc.dg/store_merging_7.c: Likewise. *
Re: [PATCH][v6] GIMPLE store merging pass
On Tue, 18 Oct 2016, Kyrill Tkachov wrote: > Hi Richard, > > This patch is a merge of [1] and [2] and implements the manual merging of > bitfields > as outlined in [1] but actually makes it work on BYTES_BIG_ENDIAN too. > It caused me a lot of headeache because the bit offset is counted from the > most significant bit > in the byte, even though BITS_BIG_ENDIAN was 0 (BITS_BIG_ENDIAN looks > irrelevant for store merging > anyway as it's just used to described RTL extract operations). > I've included ASCII diagrams of the steps in the merging algorithm. Heh, thanks. > Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu, > x86_64-unknown-linux-gnu. > Also tested on aarch64_be-none-elf. > > How does this version look now? Mostly good. For +bool +pass_store_merging::terminate_all_aliasing_chains (tree dest, tree base, + gimple *stmt) +{ ... + /* Check if the assignment destination (BASE) is part of a store chain. + This is to catch non-constant stores to destinations that may be part + of a chain. */ + if (base) +{ + chain_info = m_stores.get (base); + if (chain_info) + { + struct store_immediate_info *info; + unsigned int i; + FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info) + { + if (refs_may_alias_p (info->dest, dest)) + { I suppose the chain is not yet sorted in any way? At least for 'dest' which do not have a known constant offset we could do if (base) terminate_and_release_chain (base); to speed things up? IIRC we do not terminate chains early in this phase when we have enough stores to form a group, so writing a testcase that triggers quadraticness would be as simple as having char a[100]; void foo () { a[0] = 1; a[1] = 2; a[999] = 3; } ? so I think you probably want to limit the number of stores you ever put onto a chain and if you reach that limit, terminate and release it? Like just choose 16 or 64? (and experiment with the above kind of testcases) + bit_off = byte_off << LOG2_BITS_PER_UNIT; + if (!wi::neg_p (bit_off) && wi::fits_shwi_p (bit_off)) + { + bitpos += bit_off.to_shwi (); + I think you want bit_off += bitpos before the fits_shwi check otherwise this add may still overflow. + base_addr = copy_node (base_addr); + TREE_OPERAND (base_addr, 1) + = build_zero_cst (TREE_TYPE (TREE_OPERAND ( + base_addr, 1))); I'd prefer base_addr = build2 (MEM_REF, ...); here. Thanks, Richard. > Thanks, > Kyrill > > [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00573.html > [2] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00572.html > > 2016-10-18 Kyrylo Tkachov> > PR middle-end/22141 > * Makefile.in (OBJS): Add gimple-ssa-store-merging.o. > * common.opt (fstore-merging): New Optimization option. > * opts.c (default_options_table): Add entry for > OPT_ftree_store_merging. > * fold-const.h (can_native_encode_type_p): Declare prototype. > * fold-const.c (can_native_encode_type_p): Define. > * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define. > * passes.def: Insert pass_tree_store_merging. > * tree-pass.h (make_pass_store_merging): Declare extern > prototype. > * gimple-ssa-store-merging.c: New file. > * doc/invoke.texi (Optimization Options): Document > -fstore-merging. > > 2016-10-18 Kyrylo Tkachov > Jakub Jelinek > Andrew Pinski > > PR middle-end/22141 > PR rtl-optimization/23684 > * gcc.c-torture/execute/pr22141-1.c: New test. > * gcc.c-torture/execute/pr22141-2.c: Likewise. > * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging. > * gcc.target/aarch64/ldp_stp_4.c: Likewise. > * gcc.dg/store_merging_1.c: New test. > * gcc.dg/store_merging_2.c: Likewise. > * gcc.dg/store_merging_3.c: Likewise. > * gcc.dg/store_merging_4.c: Likewise. > * gcc.dg/store_merging_5.c: Likewise. > * gcc.dg/store_merging_6.c: Likewise. > * gcc.dg/store_merging_7.c: Likewise. > * gcc.target/i386/pr22141.c: Likewise. > * gcc.target/i386/pr34012.c: Add -fno-store-merging to dg-options. > * g++.dg/init/new17.C: Likewise. > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
[PATCH][v6] GIMPLE store merging pass
Hi Richard, This patch is a merge of [1] and [2] and implements the manual merging of bitfields as outlined in [1] but actually makes it work on BYTES_BIG_ENDIAN too. It caused me a lot of headeache because the bit offset is counted from the most significant bit in the byte, even though BITS_BIG_ENDIAN was 0 (BITS_BIG_ENDIAN looks irrelevant for store merging anyway as it's just used to described RTL extract operations). I've included ASCII diagrams of the steps in the merging algorithm. Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu, x86_64-unknown-linux-gnu. Also tested on aarch64_be-none-elf. How does this version look now? Thanks, Kyrill [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00573.html [2] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00572.html 2016-10-18 Kyrylo TkachovPR middle-end/22141 * Makefile.in (OBJS): Add gimple-ssa-store-merging.o. * common.opt (fstore-merging): New Optimization option. * opts.c (default_options_table): Add entry for OPT_ftree_store_merging. * fold-const.h (can_native_encode_type_p): Declare prototype. * fold-const.c (can_native_encode_type_p): Define. * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define. * passes.def: Insert pass_tree_store_merging. * tree-pass.h (make_pass_store_merging): Declare extern prototype. * gimple-ssa-store-merging.c: New file. * doc/invoke.texi (Optimization Options): Document -fstore-merging. 2016-10-18 Kyrylo Tkachov Jakub Jelinek Andrew Pinski PR middle-end/22141 PR rtl-optimization/23684 * gcc.c-torture/execute/pr22141-1.c: New test. * gcc.c-torture/execute/pr22141-2.c: Likewise. * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging. * gcc.target/aarch64/ldp_stp_4.c: Likewise. * gcc.dg/store_merging_1.c: New test. * gcc.dg/store_merging_2.c: Likewise. * gcc.dg/store_merging_3.c: Likewise. * gcc.dg/store_merging_4.c: Likewise. * gcc.dg/store_merging_5.c: Likewise. * gcc.dg/store_merging_6.c: Likewise. * gcc.dg/store_merging_7.c: Likewise. * gcc.target/i386/pr22141.c: Likewise. * gcc.target/i386/pr34012.c: Add -fno-store-merging to dg-options. * g++.dg/init/new17.C: Likewise. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index d6e44e4b77990b9cd48ce31f657e48c44191ade1..02632ea70483224623ac7cb2b97d18686d3a4a3c 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1296,6 +1296,7 @@ OBJS = \ gimple-ssa-isolate-paths.o \ gimple-ssa-nonnull-compare.o \ gimple-ssa-split-paths.o \ + gimple-ssa-store-merging.o \ gimple-ssa-strength-reduction.o \ gimple-ssa-sprintf.o \ gimple-streamer-in.o \ diff --git a/gcc/common.opt b/gcc/common.opt index 6f24f568d313bbf8f35e8809639ba4d976b57765..f5dc86b6376ded8f2f77313d75f41572eb0e77ad 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1463,6 +1463,10 @@ fstrict-volatile-bitfields Common Report Var(flag_strict_volatile_bitfields) Init(-1) Optimization Force bitfield accesses to match their type width. +fstore-merging +Common Report Var(flag_store_merging) Optimization +Merge adjacent stores. + fguess-branch-probability Common Report Var(flag_guess_branch_prob) Optimization Enable guessing of branch probabilities. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 795ad1b63a914de50113cd83a1eb7094306f0634..85cedf87cf135e5eb7d53a6ebfd308e19dec7830 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -404,7 +404,7 @@ Objective-C and Objective-C++ Dialects}. -fsingle-precision-constant -fsplit-ivs-in-unroller @gol -fsplit-paths @gol -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol --fstdarg-opt -fstrict-aliasing @gol +-fstdarg-opt -fstore-merging -fstrict-aliasing @gol -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol @@ -415,8 +415,8 @@ Objective-C and Objective-C++ Dialects}. -ftree-loop-vectorize @gol -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol --ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol --ftree-vectorize -ftree-vrp -funconstrained-commons @gol +-ftree-switch-conversion -ftree-tail-merge @gol +-ftree-ter -ftree-vectorize -ftree-vrp -funconstrained-commons @gol -funit-at-a-time -funroll-all-loops -funroll-loops @gol -funsafe-math-optimizations -funswitch-loops @gol -fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol @@ -6668,6 +6668,7 @@ compilation time. -fsplit-wide-types @gol -fssa-backprop @gol -fssa-phiopt @gol +-fstore-merging @gol -ftree-bit-ccp @gol -ftree-ccp @gol -ftree-ch @gol @@ -8010,6 +8011,13 @@ Perform scalar replacement of aggregates. This pass replaces structure references with