Re: [PATCH] PR target/50038 fix: redundant zero extensions removal
2011-12-15 Enkovich Ilya ilya.enkov...@intel.com PR target/50038 * implicit-zee.c: Delete. * ree.c: New file. * Makefile.in: Replace implicit-zee.c with ree.c. * config/i386/i386.c (ix86_option_override_internal): Set flag_ree for 32 bit platform. * common.opt (fzee): Ignored. (free): New. * passes.c (init_optimization_passes): Replace pass_implicit_zee with pass_ree. * tree-pass.h (pass_implicit_zee): Delete. (pass_ree): New. * timevar.def (TV_ZEE): Delete. (TV_REE): New. * doc/invoke.texi: Add -free description. The patch is OK (modulo the following nits) but I cannot approve the pure x86 part. So you can install it without this part (i.e. just rename flag_zee to flag_ree in ix86_option_override_internal with a corresponding ChangeLog) if you also install at least one testcase (e.g. for the x86-64 architecture) that is fixed by the patch in the testsuite. Then repost the pure x86 part with at least one testcase for the x86 architecture and ask for approval by a x86 maintainer. Now on to the couple of nits: @@ -6708,6 +6708,14 @@ Perform a number of minor optimizations that are relatively expensive. Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}. +@item -free +@opindex free +Attempt to remove redundant extension instructions. This is especially The double space after period is also meant for the .texi files. +helpful for x86-64 architecture which implicitly zero-extend in 64-bit ...for the x86-64 architecture... zero-extends ... +registers after writing to their lower 32-bit half. -- Eric Botcazou
Re: [PATCH] PR target/50038 fix: redundant zero extensions removal
Here is a patch wich introduces new pass 'ree' based on pass 'implicit_zee' as was discussed above. Thanks. 2011-11-22 Enkovich Ilya ilya.enkov...@intel.com PR target/50038 * implicit-zee.c: Delete. * ree.c: New file. * Makefile.in: Replace implicit-zee.c with ree.c. * i386.c (ix86_option_override_internal): Set flag_ree for 32 bit platform. * config/i386/i386.c (ix86_option_override_internal): ... * common.opt (fzee): Ignored. (free): New. * passes.c (init_optimization_passes): Replace pass_implicit_zee with pass_ree. * tree-pass.h (pass_implicit_zee): Delete. (pass_ree): New. * timevar.def (TV_ZEE): Delete. (TV_REE): New. It would be nice to add something to doc/invoke.texi about -free. The patch is mostly OK, but a few changes are required: +/* Problem Description : + + This pass is intended to remove redundant extension instructions. + Such instructions appeare for different reasons. We expect some of appear without terminal 'e'. + them due to implicit zero-extend in 64-bit registers after writing to zero-extension + their lower 32-bit half (as in x86_64 arch). (e.g. for the x86-64 architecture). + Another possible reason is a type cast which follows load (for instance + register restore) which can be combined into single instruction in the + most cases. Another possible reason is a type cast which follows a load (for instance a register restore) and which can be combined into a single instruction, and for which earlier local passes, e.g. the combiner, weren't able to optimize. + extension instruction that could possibly be redundant. Such extension double space after the period + For example, in x86_64, implicit zero-extensions are captured with For example, for the x86-64 architecture, implicit... + Architectures like x86_64 support conditional moves whose semantics for x86-64 + Basic ZEE pass reported reduction of the dynamic instruction count of a Let's use the wording The original redundant zero-extension elimination pass + The most performance gain from REE pass in addition to ZEE pass is expected The additional performance gain with the enhanced pass is mostly expected... +/* This structure is used to hold data about candidate for + elimination. */ + +typedef struct ext_cand +{ + rtx insn; + const_rtx ext_expr; No need to repeat the ext_ prefix, const_rtx expr; is fine. + enum machine_mode src_mode; +} *ext_cand_t; + +static alloc_pool ext_cand_pool; +/* Carry information about extensions while walking the RTL. */ + +DEF_VEC_P(ext_cand_t); +DEF_VEC_ALLOC_P(ext_cand_t, heap); The combination of a pool with a heap-allocated vector of pointers looks a little convoluted. Can't you use a vector of objects (DEF_VEC_O) directly? +/* Returns the merge code status for INSN. */ + +static enum insn_merge_code +get_insn_status (rtx insn) I know this was in the original file, but in the head comment of functions, this should be /* Return the merge code status for INSN. */ +/* Sets the merge code status of INSN to CODE. */ /* Set the merge code status of INSN to CODE. */ and so on. +/* Given a insn (CURR_INSN) and a pointer to the SET rtx (ORIG_SET) + that needs to be modified, this code modifies the SET rtx to a + new SET rtx that extends the right hand expression into a + register (NEWREG) on the left hand side. Note that multiple + assumptions are made about the nature of the set that needs + to be true for this to work and is called from merge_def_and_ext. + + Original : + (set (reg a) (expression)) + + Transform : + (set (reg a) (extend (expression))) + + Special Cases : + If the expression is a constant or another extend directly + assign it to the register. */ + +static bool +combine_set_extend (ext_cand_t cand, rtx curr_insn, rtx *orig_set) The head comment is outdated: NEWREG is gone and CAND has appeared. + /* Merge constants by directly moving the constant into the + register under some conditions. */ + + if (GET_CODE (orig_src) == CONST_INT + HOST_BITS_PER_WIDE_INT = GET_MODE_BITSIZE (dst_mode)) +{ + if (INTVAL (orig_src) = 0) + new_set = gen_rtx_SET (VOIDmode, newreg, orig_src); + else if (GET_CODE (orig_src) == ZERO_EXTEND) + { + /* Zero-extending a negative SImode integer into DImode +makes it a positive integer. Convert the given negative +integer into the appropriate integer when zero-extended. */ + + delta_width = HOST_BITS_PER_WIDE_INT - GET_MODE_BITSIZE (SImode); + mask = (~(unsigned HOST_WIDE_INT) 0) delta_width; + val = INTVAL (orig_src); + val = val mask; + new_const_int = gen_rtx_CONST_INT (VOIDmode, val); + new_set = gen_rtx_SET (VOIDmode, newreg, new_const_int); + } + else if (GET_CODE (orig_src) == SIGN_EXTEND) +
Re: [PATCH] PR target/50038 fix: redundant zero extensions removal
Ping 2011/11/22 Ilya Enkovich enkovich@gmail.com: 2011/11/11 Eric Botcazou ebotca...@adacore.com: I have already signed copyright agreement with the FSF. Will I need the separate one for this particular commit? No, if your contributions are already covered by a copyright agreement with the FSF, nothing more needs to be done. -- Eric Botcazou Hello, Here is a patch wich introduces new pass 'ree' based on pass 'implicit_zee' as was discussed above. Could you please review it? Bootstrapped and checked on linux-x86_64. Thanks, Ilya --- 2011-11-22 Enkovich Ilya ilya.enkov...@intel.com PR target/50038 * implicit-zee.c: Delete. * ree.c: New file. * Makefile.in: Replace implicit-zee.c with ree.c. * i386.c (ix86_option_override_internal): Set flag_ree for 32 bit platform. * common.opt (fzee): Ignored. (free): New. * passes.c (init_optimization_passes): Replace pass_implicit_zee with pass_ree. * tree-pass.h (pass_implicit_zee): Delete. (pass_ree): New. * timevar.def (TV_ZEE): Delete. (TV_REE): New.
Re: [PATCH] PR target/50038 fix: redundant zero extensions removal
Eric, I will follow up while Ilya is on vacation. I can see only one patch along the dicussion so I will use it, making changes to follow phase renaming and documentation? I am covered by FSF agreement too, on the same Intel's list as Ilya. regards, Sergos On Fri, Nov 11, 2011 at 2:12 PM, Eric Botcazou ebotca...@adacore.com wrote: I have already signed copyright agreement with the FSF. Will I need the separate one for this particular commit? No, if your contributions are already covered by a copyright agreement with the FSF, nothing more needs to be done. -- Eric Botcazou
Re: [PATCH] PR target/50038 fix: redundant zero extensions removal
Great! I'll be back with patch covering all non functional changes. Will it be OK to have everything in one patch (including current functional changes) or I should split it? Let's also rename the file while we are at it. I'd suggest Redundant Extension Elimination for the name of the pass, so ree.c for the filename (suggestions for a more descriptive name welcome). So we rename implicit-zee.c into ree.c and add a new header along these lines: /* Redundant Extension Elimination pass for the GNU compiler. Copyright (C) 2010-2011 Free Software Foundation, Inc. Contributed by you Based on the Redundant Zero-extension elimination pass contributed by Sriraman Tallam (tmsri...@google.com) and Silvius Rus (r...@google.com). This file is part of GCC. [...] The general comment must be adjusted: Problem Description extended, How does this pass work adjusted and so on. Hardcoded references to zero-extensions and specific pair of modes, both in the comment and the function names, must be eliminated. Since implicit-zee.c has essentially no revision history (the original commit + a patch of mine to fix rough edges), let's pretend we start from scratch, so the ChangeLog will be * implicit-zee.c: Delete. * ree.c: New file. [...] and you post the complete file. I'll do the review. Last but not least, since this is a significant contribution, you need to have a copyright assignment on file with the FSF in order for us to accept it. -- Eric Botcazou
Re: [PATCH] PR target/50038 fix: redundant zero extensions removal
Hello Eric, 2011/11/11 Eric Botcazou ebotca...@adacore.com: Great! I'll be back with patch covering all non functional changes. Will it be OK to have everything in one patch (including current functional changes) or I should split it? Let's also rename the file while we are at it. I'd suggest Redundant Extension Elimination for the name of the pass, so ree.c for the filename (suggestions for a more descriptive name welcome). So we rename implicit-zee.c into ree.c and add a new header along these lines: /* Redundant Extension Elimination pass for the GNU compiler. Copyright (C) 2010-2011 Free Software Foundation, Inc. Contributed by you Based on the Redundant Zero-extension elimination pass contributed by Sriraman Tallam (tmsri...@google.com) and Silvius Rus (r...@google.com). This file is part of GCC. [...] The general comment must be adjusted: Problem Description extended, How does this pass work adjusted and so on. Hardcoded references to zero-extensions and specific pair of modes, both in the comment and the function names, must be eliminated. Since implicit-zee.c has essentially no revision history (the original commit + a patch of mine to fix rough edges), let's pretend we start from scratch, so the ChangeLog will be * implicit-zee.c: Delete. * ree.c: New file. [...] and you post the complete file. I'll do the review. Last but not least, since this is a significant contribution, you need to have a copyright assignment on file with the FSF in order for us to accept it. -- Eric Botcazou Thanks for tips! I'll do these changes after my two weeks vacation or my colleges will cover me. I have already signed copyright agreement with the FSF. Will I need the separate one for this particular commit? Thanks Ilya
Re: [PATCH] PR target/50038 fix: redundant zero extensions removal
I have already signed copyright agreement with the FSF. Will I need the separate one for this particular commit? No, if your contributions are already covered by a copyright agreement with the FSF, nothing more needs to be done. -- Eric Botcazou
Re: [PATCH] PR target/50038 fix: redundant zero extensions removal
Initial aim of the pass was to remove zero extentions redundant due to implicit zero extention in x64. But implementation actually uses generic approach and seems like a mini-combiner. Pass may combine two zero extends or combine zero extend with a constant as a special case but in other cases we just try to merge two instructions and then check we have corresponding template. It can be easily adopted to remove all redundant extensions. So, byte add in the example will be merged with zxero extend only if we have explicit template for it in machine model. OK. In this particular test case combiner may also help because we have byte memory load and extend on combiner pass. But due to some reason it does not merge them. In combiner dump I see (insn 39 38 40 4 (set (reg/v:QI 81 [ xr ]) (mem:QI (reg/v/f:DI 111 [ ImageInPtr ]) [0 MEM[base: ImageInPtr_29, offset: 0B]+0 S1 A8])) 1.c:9 66 {*movqi_internal} (nil)) (insn 43 42 44 4 (parallel [ (set (reg:SI 116 [ xr ]) (zero_extend:SI (reg/v:QI 81 [ xr ]))) (clobber (reg:CC 17 flags)) ]) 1.c:11 121 {*zero_extendqisi2_movzbl_and} (expr_list:REG_DEAD (reg/v:QI 81 [ xr ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil The pseudo-register (reg/v/f:DI 111) is changed between insn 39 and insn 43 so can_combine_p returns 0. -- Eric Botcazou
Re: [PATCH] PR target/50038 fix: redundant zero extensions removal
So, what about the patch? I think since we already have zee patch it would be great to use it as more general optimization. I tested it on EEMBC 2.0 on Atom and it showed 1% performance gain in geomean on 32 bit which is really good for such simple optimization. For OOO archs patch is not so critical but still makes code cleaner The patch cannot be accepted as-is since it doesn't update a single bit of the documentation present in implicit-zee.c. The authors have made the effort of thoroughly documenting their code so it shouldn't be wasted. Therefore, at a minimum, the documentation must be overhauled the same way the code will be. I agree that the numbers are encouraging. Moreover, the narrow specialization of the pass was critized when it was added so a generalization will probably be welcome. So, unless other developers object, let's do it, but correctly, that is to say, let's rename the pass, eliminate all the hardcoded references to implicit zero-extensions in the code and turn it into a generic elimination of redundant extensions pass. -- Eric Botcazou
Re: [PATCH] PR target/50038 fix: redundant zero extensions removal
Hello Eric, Thanks for review! 2011/11/10 Eric Botcazou ebotca...@adacore.com: So, what about the patch? I think since we already have zee patch it would be great to use it as more general optimization. I tested it on EEMBC 2.0 on Atom and it showed 1% performance gain in geomean on 32 bit which is really good for such simple optimization. For OOO archs patch is not so critical but still makes code cleaner The patch cannot be accepted as-is since it doesn't update a single bit of the documentation present in implicit-zee.c. The authors have made the effort of thoroughly documenting their code so it shouldn't be wasted. Therefore, at a minimum, the documentation must be overhauled the same way the code will be. I agree that the numbers are encouraging. Moreover, the narrow specialization of the pass was critized when it was added so a generalization will probably be welcome. So, unless other developers object, let's do it, but correctly, that is to say, let's rename the pass, eliminate all the hardcoded references to implicit zero-extensions in the code and turn it into a generic elimination of redundant extensions pass. Great! I'll be back with patch covering all non functional changes. Will it be OK to have everything in one patch (including current functional changes) or I should split it? -- Eric Botcazou Thanks, Ilya
Re: [PATCH] PR target/50038 fix: redundant zero extensions removal
Hello guys, So, what about the patch? I think since we already have zee patch it would be great to use it as more general optimization. I tested it on EEMBC 2.0 on Atom and it showed 1% performance gain in geomean on 32 bit which is really good for such simple optimization. For OOO archs patch is not so critical but still makes code cleaner Thanks, Ilya 2011/11/6 Ilya Enkovich enkovich@gmail.com: 2011/11/6 Richard Guenther richard.guent...@gmail.com: On Sun, Nov 6, 2011 at 11:50 AM, Ilya Enkovich enkovich@gmail.com wrote: Hello, 2011/11/5 Eric Botcazou ebotca...@adacore.com: Here is a patch which fixes redundant zero extensions problem. Issue is resolved by expanding implicit_zee pass functionality to cover zero and sign extends of different modes. Could please someone review it? Could you explain the undelying idea? The current strategy of implicit-zee.c is exposed at length at the beginning of the file, but here's a summary: 1. On some architectures (typically x86-64), implicity zero-extensions are applied when instructions operate in selected sub-word modes (SImode here): addl edi,eax has an implicit zero-extension for %rax. 2. Because of 1, the second instruction in sequences like: (set (reg:SI x) (plus:SI (reg:SI z1) (reg:SI z2))) (set (reg:DI x) (zero_extend:DI (reg:SI x))) is redundant. 3. The pass recognizes this and transforms the above sequence into: (set (reg:DI x) (zero_extend:DI (plus:SI (reg:SI z1) (reg:SI z2 and the machine description knows how to translate this into an 'addl'. You're proposing extending this to other modes and other architectures, for example QImode on x86. But does addb %dl, %al modify the entire %eax register on x86? In other words, are you really after implicit (zero-)extensions or after something else, like global elimination of redundant extensions? Initial aim of the pass was to remove zero extentions redundant due to implicit zero extention in x64. But implementation actually uses generic approach and seems like a mini-combiner. Pass may combine two zero extends or combine zero extend with a constant as a special case but in other cases we just try to merge two instructions and then check we have corresponding template. It can be easily adopted to remove all redundant extensions. So, byte add in the example will be merged with zxero extend only if we have explicit template for it in machine model. What's the effect of the patch on the testcase in the PR in terms of insns at the RTL level? Why doesn't the combiner already optimize it? The patch helps to remove two zero extends from RTL in the test from PR. I believe zee pass was introduced after postreload pass because we should have additional memory instructions by that time and therefore more opportunities for optimization after combiner work. In this particular test case combiner may also help because we have byte memory load and extend on combiner pass. But due to some reason it does not merge them. In combiner dump I see (insn 39 38 40 4 (set (reg/v:QI 81 [ xr ]) (mem:QI (reg/v/f:DI 111 [ ImageInPtr ]) [0 MEM[base: ImageInPtr_29, offset: 0B]+0 S1 A8])) 1.c:9 66 {*movqi_internal} (nil)) (insn 43 42 44 4 (parallel [ (set (reg:SI 116 [ xr ]) (zero_extend:SI (reg/v:QI 81 [ xr ]))) (clobber (reg:CC 17 flags)) ]) 1.c:11 121 {*zero_extendqisi2_movzbl_and} (expr_list:REG_DEAD (reg/v:QI 81 [ xr ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil and Trying 39 - 43: With no additional information. Well, I bet it's because of the CC clobber which is there because of the use of TARGET_ZERO_EXTEND_WITH_AND. Where does that insn get generated? By combine itself? This insn is generated by expand: (insn 43 42 44 6 (parallel [ (set (reg:SI 116) (zero_extend:SI (reg/v:QI 81 [ xr ]))) (clobber (reg:CC 17 flags)) ]) 1.c:11 -1 (nil)) Richard. Enhancing implicit-zee.c to address missed optimizations like the one reported in target/50038 might well be the best approach, but the strategy shift must be clearly exposed and discussed. The reported numbers are certainly impressive. -- Eric Botcazou Ilya
Re: [PATCH] PR target/50038 fix: redundant zero extensions removal
Hello, 2011/11/5 Eric Botcazou ebotca...@adacore.com: Here is a patch which fixes redundant zero extensions problem. Issue is resolved by expanding implicit_zee pass functionality to cover zero and sign extends of different modes. Could please someone review it? Could you explain the undelying idea? The current strategy of implicit-zee.c is exposed at length at the beginning of the file, but here's a summary: 1. On some architectures (typically x86-64), implicity zero-extensions are applied when instructions operate in selected sub-word modes (SImode here): addl edi,eax has an implicit zero-extension for %rax. 2. Because of 1, the second instruction in sequences like: (set (reg:SI x) (plus:SI (reg:SI z1) (reg:SI z2))) (set (reg:DI x) (zero_extend:DI (reg:SI x))) is redundant. 3. The pass recognizes this and transforms the above sequence into: (set (reg:DI x) (zero_extend:DI (plus:SI (reg:SI z1) (reg:SI z2 and the machine description knows how to translate this into an 'addl'. You're proposing extending this to other modes and other architectures, for example QImode on x86. But does addb %dl, %al modify the entire %eax register on x86? In other words, are you really after implicit (zero-)extensions or after something else, like global elimination of redundant extensions? Initial aim of the pass was to remove zero extentions redundant due to implicit zero extention in x64. But implementation actually uses generic approach and seems like a mini-combiner. Pass may combine two zero extends or combine zero extend with a constant as a special case but in other cases we just try to merge two instructions and then check we have corresponding template. It can be easily adopted to remove all redundant extensions. So, byte add in the example will be merged with zxero extend only if we have explicit template for it in machine model. What's the effect of the patch on the testcase in the PR in terms of insns at the RTL level? Why doesn't the combiner already optimize it? The patch helps to remove two zero extends from RTL in the test from PR. I believe zee pass was introduced after postreload pass because we should have additional memory instructions by that time and therefore more opportunities for optimization after combiner work. In this particular test case combiner may also help because we have byte memory load and extend on combiner pass. But due to some reason it does not merge them. In combiner dump I see (insn 39 38 40 4 (set (reg/v:QI 81 [ xr ]) (mem:QI (reg/v/f:DI 111 [ ImageInPtr ]) [0 MEM[base: ImageInPtr_29, offset: 0B]+0 S1 A8])) 1.c:9 66 {*movqi_internal} (nil)) (insn 43 42 44 4 (parallel [ (set (reg:SI 116 [ xr ]) (zero_extend:SI (reg/v:QI 81 [ xr ]))) (clobber (reg:CC 17 flags)) ]) 1.c:11 121 {*zero_extendqisi2_movzbl_and} (expr_list:REG_DEAD (reg/v:QI 81 [ xr ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil and Trying 39 - 43: With no additional information. Enhancing implicit-zee.c to address missed optimizations like the one reported in target/50038 might well be the best approach, but the strategy shift must be clearly exposed and discussed. The reported numbers are certainly impressive. -- Eric Botcazou Ilya
Re: [PATCH] PR target/50038 fix: redundant zero extensions removal
On Sun, Nov 6, 2011 at 11:50 AM, Ilya Enkovich enkovich@gmail.com wrote: Hello, 2011/11/5 Eric Botcazou ebotca...@adacore.com: Here is a patch which fixes redundant zero extensions problem. Issue is resolved by expanding implicit_zee pass functionality to cover zero and sign extends of different modes. Could please someone review it? Could you explain the undelying idea? The current strategy of implicit-zee.c is exposed at length at the beginning of the file, but here's a summary: 1. On some architectures (typically x86-64), implicity zero-extensions are applied when instructions operate in selected sub-word modes (SImode here): addl edi,eax has an implicit zero-extension for %rax. 2. Because of 1, the second instruction in sequences like: (set (reg:SI x) (plus:SI (reg:SI z1) (reg:SI z2))) (set (reg:DI x) (zero_extend:DI (reg:SI x))) is redundant. 3. The pass recognizes this and transforms the above sequence into: (set (reg:DI x) (zero_extend:DI (plus:SI (reg:SI z1) (reg:SI z2 and the machine description knows how to translate this into an 'addl'. You're proposing extending this to other modes and other architectures, for example QImode on x86. But does addb %dl, %al modify the entire %eax register on x86? In other words, are you really after implicit (zero-)extensions or after something else, like global elimination of redundant extensions? Initial aim of the pass was to remove zero extentions redundant due to implicit zero extention in x64. But implementation actually uses generic approach and seems like a mini-combiner. Pass may combine two zero extends or combine zero extend with a constant as a special case but in other cases we just try to merge two instructions and then check we have corresponding template. It can be easily adopted to remove all redundant extensions. So, byte add in the example will be merged with zxero extend only if we have explicit template for it in machine model. What's the effect of the patch on the testcase in the PR in terms of insns at the RTL level? Why doesn't the combiner already optimize it? The patch helps to remove two zero extends from RTL in the test from PR. I believe zee pass was introduced after postreload pass because we should have additional memory instructions by that time and therefore more opportunities for optimization after combiner work. In this particular test case combiner may also help because we have byte memory load and extend on combiner pass. But due to some reason it does not merge them. In combiner dump I see (insn 39 38 40 4 (set (reg/v:QI 81 [ xr ]) (mem:QI (reg/v/f:DI 111 [ ImageInPtr ]) [0 MEM[base: ImageInPtr_29, offset: 0B]+0 S1 A8])) 1.c:9 66 {*movqi_internal} (nil)) (insn 43 42 44 4 (parallel [ (set (reg:SI 116 [ xr ]) (zero_extend:SI (reg/v:QI 81 [ xr ]))) (clobber (reg:CC 17 flags)) ]) 1.c:11 121 {*zero_extendqisi2_movzbl_and} (expr_list:REG_DEAD (reg/v:QI 81 [ xr ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil and Trying 39 - 43: With no additional information. Well, I bet it's because of the CC clobber which is there because of the use of TARGET_ZERO_EXTEND_WITH_AND. Where does that insn get generated? By combine itself? Richard. Enhancing implicit-zee.c to address missed optimizations like the one reported in target/50038 might well be the best approach, but the strategy shift must be clearly exposed and discussed. The reported numbers are certainly impressive. -- Eric Botcazou Ilya
Re: [PATCH] PR target/50038 fix: redundant zero extensions removal
2011/11/6 Richard Guenther richard.guent...@gmail.com: On Sun, Nov 6, 2011 at 11:50 AM, Ilya Enkovich enkovich@gmail.com wrote: Hello, 2011/11/5 Eric Botcazou ebotca...@adacore.com: Here is a patch which fixes redundant zero extensions problem. Issue is resolved by expanding implicit_zee pass functionality to cover zero and sign extends of different modes. Could please someone review it? Could you explain the undelying idea? The current strategy of implicit-zee.c is exposed at length at the beginning of the file, but here's a summary: 1. On some architectures (typically x86-64), implicity zero-extensions are applied when instructions operate in selected sub-word modes (SImode here): addl edi,eax has an implicit zero-extension for %rax. 2. Because of 1, the second instruction in sequences like: (set (reg:SI x) (plus:SI (reg:SI z1) (reg:SI z2))) (set (reg:DI x) (zero_extend:DI (reg:SI x))) is redundant. 3. The pass recognizes this and transforms the above sequence into: (set (reg:DI x) (zero_extend:DI (plus:SI (reg:SI z1) (reg:SI z2 and the machine description knows how to translate this into an 'addl'. You're proposing extending this to other modes and other architectures, for example QImode on x86. But does addb %dl, %al modify the entire %eax register on x86? In other words, are you really after implicit (zero-)extensions or after something else, like global elimination of redundant extensions? Initial aim of the pass was to remove zero extentions redundant due to implicit zero extention in x64. But implementation actually uses generic approach and seems like a mini-combiner. Pass may combine two zero extends or combine zero extend with a constant as a special case but in other cases we just try to merge two instructions and then check we have corresponding template. It can be easily adopted to remove all redundant extensions. So, byte add in the example will be merged with zxero extend only if we have explicit template for it in machine model. What's the effect of the patch on the testcase in the PR in terms of insns at the RTL level? Why doesn't the combiner already optimize it? The patch helps to remove two zero extends from RTL in the test from PR. I believe zee pass was introduced after postreload pass because we should have additional memory instructions by that time and therefore more opportunities for optimization after combiner work. In this particular test case combiner may also help because we have byte memory load and extend on combiner pass. But due to some reason it does not merge them. In combiner dump I see (insn 39 38 40 4 (set (reg/v:QI 81 [ xr ]) (mem:QI (reg/v/f:DI 111 [ ImageInPtr ]) [0 MEM[base: ImageInPtr_29, offset: 0B]+0 S1 A8])) 1.c:9 66 {*movqi_internal} (nil)) (insn 43 42 44 4 (parallel [ (set (reg:SI 116 [ xr ]) (zero_extend:SI (reg/v:QI 81 [ xr ]))) (clobber (reg:CC 17 flags)) ]) 1.c:11 121 {*zero_extendqisi2_movzbl_and} (expr_list:REG_DEAD (reg/v:QI 81 [ xr ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil and Trying 39 - 43: With no additional information. Well, I bet it's because of the CC clobber which is there because of the use of TARGET_ZERO_EXTEND_WITH_AND. Where does that insn get generated? By combine itself? This insn is generated by expand: (insn 43 42 44 6 (parallel [ (set (reg:SI 116) (zero_extend:SI (reg/v:QI 81 [ xr ]))) (clobber (reg:CC 17 flags)) ]) 1.c:11 -1 (nil)) Richard. Enhancing implicit-zee.c to address missed optimizations like the one reported in target/50038 might well be the best approach, but the strategy shift must be clearly exposed and discussed. The reported numbers are certainly impressive. -- Eric Botcazou Ilya
Re: [PATCH] PR target/50038 fix: redundant zero extensions removal
Here is a patch which fixes redundant zero extensions problem. Issue is resolved by expanding implicit_zee pass functionality to cover zero and sign extends of different modes. Could please someone review it? Could you explain the undelying idea? The current strategy of implicit-zee.c is exposed at length at the beginning of the file, but here's a summary: 1. On some architectures (typically x86-64), implicity zero-extensions are applied when instructions operate in selected sub-word modes (SImode here): addl edi,eax has an implicit zero-extension for %rax. 2. Because of 1, the second instruction in sequences like: (set (reg:SI x) (plus:SI (reg:SI z1) (reg:SI z2))) (set (reg:DI x) (zero_extend:DI (reg:SI x))) is redundant. 3. The pass recognizes this and transforms the above sequence into: (set (reg:DI x) (zero_extend:DI (plus:SI (reg:SI z1) (reg:SI z2 and the machine description knows how to translate this into an 'addl'. You're proposing extending this to other modes and other architectures, for example QImode on x86. But does addb %dl, %al modify the entire %eax register on x86? In other words, are you really after implicit (zero-)extensions or after something else, like global elimination of redundant extensions? What's the effect of the patch on the testcase in the PR in terms of insns at the RTL level? Why doesn't the combiner already optimize it? Enhancing implicit-zee.c to address missed optimizations like the one reported in target/50038 might well be the best approach, but the strategy shift must be clearly exposed and discussed. The reported numbers are certainly impressive. -- Eric Botcazou
[PATCH] PR target/50038 fix: redundant zero extensions removal
Hi, Here is a patch which fixes redundant zero extensions problem. Issue is resolved by expanding implicit_zee pass functionality to cover zero and sign extends of different modes. Could please someone review it? Bootstrapped and checked on linux-x86_64. Thanks, Ilya --- 2011-11-01 Enkovich Ilya ilya.enkov...@intel.com PR target/50038 * implicit-zee.c (ext_cand): New. (ext_cand_pool): Likewise. (add_ext_candidate): New. (zee_init): New. (zee_cleanup): New. (get_reg_di): Removed. (combine_set_zero_extend): Get extend candidate as new parameter. Now handle sign extend cases and other modes. (transform_ifelse): Likewise. (merge_def_and_ze): Likewise. (combine_reaching_defs): Change parameter type. (zero_extend_info): Changed insn_list type. (add_removable_zero_extend): Relaxed mode and code filter. (find_removable_zero_extends): Changed return type. (find_and_remove_ze): Var type changes. (rest_of_handle_zee): Init and cleanup added. * i386.c (ix86_option_override_internal): set flag_zee for 32 bit platform. PR50038.diff Description: Binary data