Re: [PATCH] PR target/50038 fix: redundant zero extensions removal

2011-12-16 Thread Eric Botcazou
 2011-12-15  Enkovich Ilya  ilya.enkov...@intel.com

   PR target/50038
   * implicit-zee.c: Delete.
   * ree.c: New file.
   * Makefile.in: Replace implicit-zee.c with ree.c.
   * config/i386/i386.c (ix86_option_override_internal): Set
   flag_ree for 32 bit platform.
   * common.opt (fzee): Ignored.
   (free): New.
   * passes.c (init_optimization_passes): Replace pass_implicit_zee
   with pass_ree.
   * tree-pass.h (pass_implicit_zee): Delete.
   (pass_ree): New.
   * timevar.def (TV_ZEE): Delete.
   (TV_REE): New.
   * doc/invoke.texi: Add -free description.

The patch is OK (modulo the following nits) but I cannot approve the pure x86 
part.  So you can install it without this part (i.e. just rename flag_zee to 
flag_ree in ix86_option_override_internal with a corresponding ChangeLog) if 
you also install at least one testcase (e.g. for the x86-64 architecture) that 
is fixed by the patch in the testsuite.  Then repost the pure x86 part with 
at least one testcase for the x86 architecture and ask for approval by a x86 
maintainer.

Now on to the couple of nits:

@@ -6708,6 +6708,14 @@ Perform a number of minor optimizations that are 
relatively expensive.
 
 Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}.
 
+@item -free
+@opindex free
+Attempt to remove redundant extension instructions. This is especially

The double space after period is also meant for the .texi files.

+helpful for x86-64 architecture which implicitly zero-extend in 64-bit

...for the x86-64 architecture... zero-extends ...

+registers after writing to their lower 32-bit half.

-- 
Eric Botcazou


Re: [PATCH] PR target/50038 fix: redundant zero extensions removal

2011-12-12 Thread Eric Botcazou
 Here is a patch wich introduces new pass 'ree' based on pass
 'implicit_zee' as was discussed above.

Thanks.

 2011-11-22  Enkovich Ilya  ilya.enkov...@intel.com

   PR target/50038
   * implicit-zee.c: Delete.
   * ree.c: New file.
   * Makefile.in: Replace implicit-zee.c with ree.c.
   * i386.c (ix86_option_override_internal): Set flag_ree for
   32 bit platform.

* config/i386/i386.c (ix86_option_override_internal): ...

   * common.opt (fzee): Ignored.
   (free): New.
   * passes.c (init_optimization_passes): Replace pass_implicit_zee
   with pass_ree.
   * tree-pass.h (pass_implicit_zee): Delete.
   (pass_ree): New.
   * timevar.def (TV_ZEE): Delete.
   (TV_REE): New.

It would be nice to add something to doc/invoke.texi about -free.


The patch is mostly OK, but a few changes are required:

+/* Problem Description :
+   
+   This pass is intended to remove redundant extension instructions.
+   Such instructions appeare for different reasons.  We expect some of

appear without terminal 'e'.

+   them due to implicit zero-extend in 64-bit registers after writing to

zero-extension

+   their lower 32-bit half (as in x86_64 arch).

(e.g. for the x86-64 architecture).

+ Another possible reason is a type cast which follows load (for instance
+ register restore) which can be combined into single instruction in the
+ most cases.

Another possible reason is a type cast which follows a load (for instance
a register restore) and which can be combined into a single instruction, and
for which earlier local passes, e.g. the combiner, weren't able to optimize.

+   extension instruction that could possibly be redundant. Such extension

double space after the period

+   For example, in x86_64, implicit zero-extensions are captured with

For example, for the x86-64 architecture, implicit...

+   Architectures like x86_64 support conditional moves whose semantics for

x86-64

+   Basic ZEE pass reported reduction of the dynamic instruction count of a

Let's use the wording The original redundant zero-extension elimination pass

+   The most performance gain from REE pass in addition to ZEE pass is expected

The additional performance gain with the enhanced pass is mostly expected...


+/* This structure is used to hold data about candidate for
+   elimination.  */
+
+typedef struct ext_cand
+{
+  rtx insn;
+  const_rtx ext_expr;

No need to repeat the ext_ prefix, const_rtx expr; is fine.

+  enum machine_mode src_mode;
+} *ext_cand_t;
+
+static alloc_pool ext_cand_pool;

+/* Carry information about extensions while walking the RTL.  */
+
+DEF_VEC_P(ext_cand_t);
+DEF_VEC_ALLOC_P(ext_cand_t, heap);

The combination of a pool with a heap-allocated vector of pointers looks a 
little convoluted.  Can't you use a vector of objects (DEF_VEC_O) directly?


+/* Returns the merge code status for INSN.  */
+
+static enum insn_merge_code
+get_insn_status (rtx insn)

I know this was in the original file, but in the head comment of functions, 
this should be

/* Return the merge code status for INSN.  */

+/* Sets the merge code status of INSN to CODE.  */

/* Set the merge code status of INSN to CODE.  */

and so on.


+/* Given a insn (CURR_INSN) and a pointer to the SET rtx (ORIG_SET)
+   that needs to be modified, this code modifies the SET rtx to a
+   new SET rtx that extends the right hand expression into a
+   register (NEWREG) on the left hand side.  Note that multiple
+   assumptions are made about the nature of the set that needs
+   to be true for this to work and is called from merge_def_and_ext.
+
+   Original :
+   (set (reg a) (expression))
+
+   Transform :
+   (set (reg a) (extend (expression)))
+
+   Special Cases :
+   If the expression is a constant or another extend directly
+   assign it to the register.  */
+
+static bool
+combine_set_extend (ext_cand_t cand, rtx curr_insn, rtx *orig_set)

The head comment is outdated: NEWREG is gone and CAND has appeared.


+  /* Merge constants by directly moving the constant into the
+ register under some conditions.  */
+
+  if (GET_CODE (orig_src) == CONST_INT
+   HOST_BITS_PER_WIDE_INT = GET_MODE_BITSIZE (dst_mode))
+{
+  if (INTVAL (orig_src) = 0)
+   new_set = gen_rtx_SET (VOIDmode, newreg, orig_src);
+  else if (GET_CODE (orig_src) == ZERO_EXTEND)
+   {
+ /* Zero-extending a negative SImode integer into DImode
+makes it a positive integer.  Convert the given negative
+integer into the appropriate integer when zero-extended.  */
+
+ delta_width = HOST_BITS_PER_WIDE_INT - GET_MODE_BITSIZE (SImode);
+ mask = (~(unsigned HOST_WIDE_INT) 0)  delta_width;
+ val = INTVAL (orig_src);
+ val = val  mask;
+ new_const_int = gen_rtx_CONST_INT (VOIDmode, val);
+ new_set = gen_rtx_SET (VOIDmode, newreg, new_const_int);
+   }
+  else if (GET_CODE (orig_src) == SIGN_EXTEND)
+   

Re: [PATCH] PR target/50038 fix: redundant zero extensions removal

2011-11-30 Thread Ilya Enkovich
Ping

2011/11/22 Ilya Enkovich enkovich@gmail.com:
 2011/11/11 Eric Botcazou ebotca...@adacore.com:
 I have already signed copyright agreement with the FSF. Will I need the 
 separate one for this particular commit? No, if your contributions are 
 already covered by a copyright agreement with the FSF, nothing more needs 
 to be done. -- Eric Botcazou
 Hello,

 Here is a patch wich introduces new pass 'ree' based on pass
 'implicit_zee' as was discussed above. Could you please review it?

 Bootstrapped and checked on linux-x86_64.

 Thanks,
 Ilya
 ---
 2011-11-22  Enkovich Ilya  ilya.enkov...@intel.com

        PR target/50038
        * implicit-zee.c: Delete.
        * ree.c: New file.
        * Makefile.in: Replace implicit-zee.c with ree.c.
        * i386.c (ix86_option_override_internal): Set flag_ree for
        32 bit platform.
        * common.opt (fzee): Ignored.
        (free): New.
        * passes.c (init_optimization_passes): Replace pass_implicit_zee
        with pass_ree.
        * tree-pass.h (pass_implicit_zee): Delete.
        (pass_ree): New.
        * timevar.def (TV_ZEE): Delete.
        (TV_REE): New.



Re: [PATCH] PR target/50038 fix: redundant zero extensions removal

2011-11-16 Thread Sergey Ostanevich
Eric,

I will follow up while Ilya is on vacation.
I can see only one patch along the dicussion so I will use it, making
changes to follow phase renaming and documentation?

I am covered by FSF agreement too, on the same Intel's list as Ilya.

regards,
Sergos


On Fri, Nov 11, 2011 at 2:12 PM, Eric Botcazou ebotca...@adacore.com wrote:
 I have already signed copyright agreement with the FSF. Will I need
 the separate one for this particular commit?

 No, if your contributions are already covered by a copyright agreement with 
 the
 FSF, nothing more needs to be done.

 --
 Eric Botcazou



Re: [PATCH] PR target/50038 fix: redundant zero extensions removal

2011-11-11 Thread Eric Botcazou
 Great! I'll be back with patch covering all non functional changes.
 Will it be OK to have everything in one patch (including current
 functional changes) or I should split it?

Let's also rename the file while we are at it.  I'd suggest Redundant Extension 
Elimination for the name of the pass, so ree.c for the filename (suggestions 
for a more descriptive name welcome).  So we rename implicit-zee.c into ree.c 
and add a new header along these lines:

/* Redundant Extension Elimination pass for the GNU compiler.
   Copyright (C) 2010-2011 Free Software Foundation, Inc.
   Contributed by you

   Based on the Redundant Zero-extension elimination pass contributed by
   Sriraman Tallam (tmsri...@google.com) and Silvius Rus (r...@google.com).

   This file is part of GCC.
[...]

The general comment must be adjusted: Problem Description extended, How does 
this pass work adjusted and so on.  Hardcoded references to zero-extensions 
and specific pair of modes, both in the comment and the function names, must 
be eliminated.

Since implicit-zee.c has essentially no revision history (the original commit + 
a patch of mine to fix rough edges), let's pretend we start from scratch, so 
the ChangeLog will be


* implicit-zee.c: Delete.
* ree.c: New file.
[...]

and you post the complete file.  I'll do the review.

Last but not least, since this is a significant contribution, you need to have 
a copyright assignment on file with the FSF in order for us to accept it.

-- 
Eric Botcazou


Re: [PATCH] PR target/50038 fix: redundant zero extensions removal

2011-11-11 Thread Ilya Enkovich
Hello Eric,

2011/11/11 Eric Botcazou ebotca...@adacore.com:
 Great! I'll be back with patch covering all non functional changes.
 Will it be OK to have everything in one patch (including current
 functional changes) or I should split it?

 Let's also rename the file while we are at it.  I'd suggest Redundant 
 Extension
 Elimination for the name of the pass, so ree.c for the filename (suggestions
 for a more descriptive name welcome).  So we rename implicit-zee.c into ree.c
 and add a new header along these lines:

 /* Redundant Extension Elimination pass for the GNU compiler.
   Copyright (C) 2010-2011 Free Software Foundation, Inc.
   Contributed by you

   Based on the Redundant Zero-extension elimination pass contributed by
   Sriraman Tallam (tmsri...@google.com) and Silvius Rus (r...@google.com).

   This file is part of GCC.
 [...]

 The general comment must be adjusted: Problem Description extended, How 
 does
 this pass work adjusted and so on.  Hardcoded references to zero-extensions
 and specific pair of modes, both in the comment and the function names, must
 be eliminated.

 Since implicit-zee.c has essentially no revision history (the original commit 
 +
 a patch of mine to fix rough edges), let's pretend we start from scratch, so
 the ChangeLog will be


        * implicit-zee.c: Delete.
        * ree.c: New file.
 [...]

 and you post the complete file.  I'll do the review.

 Last but not least, since this is a significant contribution, you need to have
 a copyright assignment on file with the FSF in order for us to accept it.

 --
 Eric Botcazou


Thanks for tips! I'll do these changes after my two weeks vacation or
my colleges will cover me.

I have already signed copyright agreement with the FSF. Will I need
the separate one for this particular commit?

Thanks
Ilya


Re: [PATCH] PR target/50038 fix: redundant zero extensions removal

2011-11-11 Thread Eric Botcazou
 I have already signed copyright agreement with the FSF. Will I need
 the separate one for this particular commit?

No, if your contributions are already covered by a copyright agreement with the 
FSF, nothing more needs to be done.

-- 
Eric Botcazou


Re: [PATCH] PR target/50038 fix: redundant zero extensions removal

2011-11-10 Thread Eric Botcazou
 Initial aim of the pass was to remove zero extentions redundant due to
 implicit zero extention in x64. But implementation actually uses
 generic approach and seems like a mini-combiner. Pass may combine two
 zero extends or combine zero extend with a constant as a special case
 but in other cases we just try to merge two instructions and then
 check we have corresponding template. It can be easily adopted to
 remove all redundant extensions. So, byte add in the example will be
 merged with zxero extend only if we have explicit template for it in
 machine model.

OK.

 In this particular test case combiner may also help because we have
 byte memory load and extend on combiner pass. But due to some reason
 it does not merge them. In combiner dump I see

 (insn 39 38 40 4 (set (reg/v:QI 81 [ xr ])
 (mem:QI (reg/v/f:DI 111 [ ImageInPtr ]) [0 MEM[base:
 ImageInPtr_29, offset: 0B]+0 S1 A8])) 1.c:9 66 {*movqi_internal}
  (nil))

 (insn 43 42 44 4 (parallel [
 (set (reg:SI 116 [ xr ])
 (zero_extend:SI (reg/v:QI 81 [ xr ])))
 (clobber (reg:CC 17 flags))
 ]) 1.c:11 121 {*zero_extendqisi2_movzbl_and}
  (expr_list:REG_DEAD (reg/v:QI 81 [ xr ])
 (expr_list:REG_UNUSED (reg:CC 17 flags)
 (nil

The pseudo-register (reg/v/f:DI 111) is changed between insn 39 and insn 43 so 
can_combine_p returns 0.

-- 
Eric Botcazou


Re: [PATCH] PR target/50038 fix: redundant zero extensions removal

2011-11-10 Thread Eric Botcazou
 So, what about the patch? I think since we already have zee patch it
 would be great to use it as more general optimization. I tested it on
 EEMBC 2.0 on Atom and it showed 1% performance gain in geomean on 32
 bit which is really good for such simple optimization. For OOO archs
 patch is not so critical but still makes code cleaner

The patch cannot be accepted as-is since it doesn't update a single bit of the 
documentation present in implicit-zee.c.  The authors have made the effort of 
thoroughly documenting their code so it shouldn't be wasted.  Therefore, at a 
minimum, the documentation must be overhauled the same way the code will be.

I agree that the numbers are encouraging.  Moreover, the narrow specialization 
of the pass was critized when it was added so a generalization will probably 
be welcome.  So, unless other developers object, let's do it, but correctly, 
that is to say, let's rename the pass, eliminate all the hardcoded references 
to implicit zero-extensions in the code and turn it into a generic elimination 
of redundant extensions pass.

-- 
Eric Botcazou


Re: [PATCH] PR target/50038 fix: redundant zero extensions removal

2011-11-10 Thread Ilya Enkovich
Hello Eric,

Thanks for review!

2011/11/10 Eric Botcazou ebotca...@adacore.com:
 So, what about the patch? I think since we already have zee patch it
 would be great to use it as more general optimization. I tested it on
 EEMBC 2.0 on Atom and it showed 1% performance gain in geomean on 32
 bit which is really good for such simple optimization. For OOO archs
 patch is not so critical but still makes code cleaner

 The patch cannot be accepted as-is since it doesn't update a single bit of the
 documentation present in implicit-zee.c.  The authors have made the effort of
 thoroughly documenting their code so it shouldn't be wasted.  Therefore, at a
 minimum, the documentation must be overhauled the same way the code will be.

 I agree that the numbers are encouraging.  Moreover, the narrow specialization
 of the pass was critized when it was added so a generalization will probably
 be welcome.  So, unless other developers object, let's do it, but correctly,
 that is to say, let's rename the pass, eliminate all the hardcoded references
 to implicit zero-extensions in the code and turn it into a generic elimination
 of redundant extensions pass.

Great! I'll be back with patch covering all non functional changes.
Will it be OK to have everything in one patch (including current
functional changes) or I should split it?


 --
 Eric Botcazou


Thanks,
Ilya


Re: [PATCH] PR target/50038 fix: redundant zero extensions removal

2011-11-09 Thread Ilya Enkovich
Hello guys,

So, what about the patch? I think since we already have zee patch it
would be great to use it as more general optimization. I tested it on
EEMBC 2.0 on Atom and it showed 1% performance gain in geomean on 32
bit which is really good for such simple optimization. For OOO archs
patch is not so critical but still makes code cleaner

Thanks,
Ilya

2011/11/6 Ilya Enkovich enkovich@gmail.com:
 2011/11/6 Richard Guenther richard.guent...@gmail.com:
 On Sun, Nov 6, 2011 at 11:50 AM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 Hello,

 2011/11/5 Eric Botcazou ebotca...@adacore.com:
 Here is a patch which fixes redundant zero extensions problem. Issue
 is resolved by expanding implicit_zee pass functionality to cover zero
 and sign extends of different modes. Could please someone review it?

 Could you explain the undelying idea?  The current strategy of 
 implicit-zee.c
 is exposed at length at the beginning of the file, but here's a summary:

  1. On some architectures (typically x86-64), implicity zero-extensions are
 applied when instructions operate in selected sub-word modes (SImode here):

  addl edi,eax

 has an implicit zero-extension for %rax.

  2. Because of 1, the second instruction in sequences like:

  (set (reg:SI x) (plus:SI (reg:SI z1) (reg:SI z2)))
  (set (reg:DI x) (zero_extend:DI (reg:SI x)))

 is redundant.

  3. The pass recognizes this and transforms the above sequence into:

  (set (reg:DI x) (zero_extend:DI (plus:SI (reg:SI z1) (reg:SI z2

 and the machine description knows how to translate this into an 'addl'.


 You're proposing extending this to other modes and other architectures, for
 example QImode on x86.  But does

  addb %dl, %al

 modify the entire %eax register on x86?  In other words, are you really 
 after
 implicit (zero-)extensions or after something else, like global 
 elimination of
 redundant extensions?
 Initial aim of the pass was to remove zero extentions redundant due to
 implicit zero extention in x64. But implementation actually uses
 generic approach and seems like a mini-combiner. Pass may combine two
 zero extends or combine zero extend with a constant as a special case
 but in other cases we just try to merge two instructions and then
 check we have corresponding template. It can be easily adopted to
 remove all redundant extensions. So, byte add in the example will be
 merged with zxero extend only if we have explicit template for it in
 machine model.


 What's the effect of the patch on the testcase in the PR in terms of insns 
 at
 the RTL level?  Why doesn't the combiner already optimize it?
 The patch helps to remove two zero extends from RTL in the test from
 PR. I believe zee pass was introduced after postreload pass because we
 should have additional memory instructions by that time and therefore
 more opportunities for optimization after combiner work.

 In this particular test case combiner may also help because we have
 byte memory load and extend on combiner pass. But due to some reason
 it does not merge them. In combiner dump I see

 (insn 39 38 40 4 (set (reg/v:QI 81 [ xr ])
        (mem:QI (reg/v/f:DI 111 [ ImageInPtr ]) [0 MEM[base:
 ImageInPtr_29, offset: 0B]+0 S1 A8])) 1.c:9 66 {*movqi_internal}
     (nil))

 (insn 43 42 44 4 (parallel [
            (set (reg:SI 116 [ xr ])
                (zero_extend:SI (reg/v:QI 81 [ xr ])))
            (clobber (reg:CC 17 flags))
        ]) 1.c:11 121 {*zero_extendqisi2_movzbl_and}
     (expr_list:REG_DEAD (reg/v:QI 81 [ xr ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil

 and

 Trying 39 - 43:

 With no additional information.

 Well, I bet it's because of the CC clobber which is there
 because of the use of TARGET_ZERO_EXTEND_WITH_AND.
 Where does that insn get generated?  By combine itself?

 This insn is generated by expand:

 (insn 43 42 44 6 (parallel [
            (set (reg:SI 116)
                (zero_extend:SI (reg/v:QI 81 [ xr ])))
            (clobber (reg:CC 17 flags))
        ]) 1.c:11 -1
     (nil))


 Richard.

 Enhancing implicit-zee.c to address missed optimizations like the one 
 reported
 in target/50038 might well be the best approach, but the strategy shift 
 must be
 clearly exposed and discussed.  The reported numbers are certainly 
 impressive.

 --
 Eric Botcazou


 Ilya





Re: [PATCH] PR target/50038 fix: redundant zero extensions removal

2011-11-06 Thread Ilya Enkovich
Hello,

2011/11/5 Eric Botcazou ebotca...@adacore.com:
 Here is a patch which fixes redundant zero extensions problem. Issue
 is resolved by expanding implicit_zee pass functionality to cover zero
 and sign extends of different modes. Could please someone review it?

 Could you explain the undelying idea?  The current strategy of implicit-zee.c
 is exposed at length at the beginning of the file, but here's a summary:

  1. On some architectures (typically x86-64), implicity zero-extensions are
 applied when instructions operate in selected sub-word modes (SImode here):

  addl edi,eax

 has an implicit zero-extension for %rax.

  2. Because of 1, the second instruction in sequences like:

  (set (reg:SI x) (plus:SI (reg:SI z1) (reg:SI z2)))
  (set (reg:DI x) (zero_extend:DI (reg:SI x)))

 is redundant.

  3. The pass recognizes this and transforms the above sequence into:

  (set (reg:DI x) (zero_extend:DI (plus:SI (reg:SI z1) (reg:SI z2

 and the machine description knows how to translate this into an 'addl'.


 You're proposing extending this to other modes and other architectures, for
 example QImode on x86.  But does

  addb %dl, %al

 modify the entire %eax register on x86?  In other words, are you really after
 implicit (zero-)extensions or after something else, like global elimination of
 redundant extensions?
Initial aim of the pass was to remove zero extentions redundant due to
implicit zero extention in x64. But implementation actually uses
generic approach and seems like a mini-combiner. Pass may combine two
zero extends or combine zero extend with a constant as a special case
but in other cases we just try to merge two instructions and then
check we have corresponding template. It can be easily adopted to
remove all redundant extensions. So, byte add in the example will be
merged with zxero extend only if we have explicit template for it in
machine model.


 What's the effect of the patch on the testcase in the PR in terms of insns at
 the RTL level?  Why doesn't the combiner already optimize it?
The patch helps to remove two zero extends from RTL in the test from
PR. I believe zee pass was introduced after postreload pass because we
should have additional memory instructions by that time and therefore
more opportunities for optimization after combiner work.

In this particular test case combiner may also help because we have
byte memory load and extend on combiner pass. But due to some reason
it does not merge them. In combiner dump I see

(insn 39 38 40 4 (set (reg/v:QI 81 [ xr ])
(mem:QI (reg/v/f:DI 111 [ ImageInPtr ]) [0 MEM[base:
ImageInPtr_29, offset: 0B]+0 S1 A8])) 1.c:9 66 {*movqi_internal}
 (nil))

(insn 43 42 44 4 (parallel [
(set (reg:SI 116 [ xr ])
(zero_extend:SI (reg/v:QI 81 [ xr ])))
(clobber (reg:CC 17 flags))
]) 1.c:11 121 {*zero_extendqisi2_movzbl_and}
 (expr_list:REG_DEAD (reg/v:QI 81 [ xr ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil

and

Trying 39 - 43:

With no additional information.

 Enhancing implicit-zee.c to address missed optimizations like the one reported
 in target/50038 might well be the best approach, but the strategy shift must 
 be
 clearly exposed and discussed.  The reported numbers are certainly impressive.

 --
 Eric Botcazou


Ilya


Re: [PATCH] PR target/50038 fix: redundant zero extensions removal

2011-11-06 Thread Richard Guenther
On Sun, Nov 6, 2011 at 11:50 AM, Ilya Enkovich enkovich@gmail.com wrote:
 Hello,

 2011/11/5 Eric Botcazou ebotca...@adacore.com:
 Here is a patch which fixes redundant zero extensions problem. Issue
 is resolved by expanding implicit_zee pass functionality to cover zero
 and sign extends of different modes. Could please someone review it?

 Could you explain the undelying idea?  The current strategy of implicit-zee.c
 is exposed at length at the beginning of the file, but here's a summary:

  1. On some architectures (typically x86-64), implicity zero-extensions are
 applied when instructions operate in selected sub-word modes (SImode here):

  addl edi,eax

 has an implicit zero-extension for %rax.

  2. Because of 1, the second instruction in sequences like:

  (set (reg:SI x) (plus:SI (reg:SI z1) (reg:SI z2)))
  (set (reg:DI x) (zero_extend:DI (reg:SI x)))

 is redundant.

  3. The pass recognizes this and transforms the above sequence into:

  (set (reg:DI x) (zero_extend:DI (plus:SI (reg:SI z1) (reg:SI z2

 and the machine description knows how to translate this into an 'addl'.


 You're proposing extending this to other modes and other architectures, for
 example QImode on x86.  But does

  addb %dl, %al

 modify the entire %eax register on x86?  In other words, are you really after
 implicit (zero-)extensions or after something else, like global elimination 
 of
 redundant extensions?
 Initial aim of the pass was to remove zero extentions redundant due to
 implicit zero extention in x64. But implementation actually uses
 generic approach and seems like a mini-combiner. Pass may combine two
 zero extends or combine zero extend with a constant as a special case
 but in other cases we just try to merge two instructions and then
 check we have corresponding template. It can be easily adopted to
 remove all redundant extensions. So, byte add in the example will be
 merged with zxero extend only if we have explicit template for it in
 machine model.


 What's the effect of the patch on the testcase in the PR in terms of insns at
 the RTL level?  Why doesn't the combiner already optimize it?
 The patch helps to remove two zero extends from RTL in the test from
 PR. I believe zee pass was introduced after postreload pass because we
 should have additional memory instructions by that time and therefore
 more opportunities for optimization after combiner work.

 In this particular test case combiner may also help because we have
 byte memory load and extend on combiner pass. But due to some reason
 it does not merge them. In combiner dump I see

 (insn 39 38 40 4 (set (reg/v:QI 81 [ xr ])
        (mem:QI (reg/v/f:DI 111 [ ImageInPtr ]) [0 MEM[base:
 ImageInPtr_29, offset: 0B]+0 S1 A8])) 1.c:9 66 {*movqi_internal}
     (nil))

 (insn 43 42 44 4 (parallel [
            (set (reg:SI 116 [ xr ])
                (zero_extend:SI (reg/v:QI 81 [ xr ])))
            (clobber (reg:CC 17 flags))
        ]) 1.c:11 121 {*zero_extendqisi2_movzbl_and}
     (expr_list:REG_DEAD (reg/v:QI 81 [ xr ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil

 and

 Trying 39 - 43:

 With no additional information.

Well, I bet it's because of the CC clobber which is there
because of the use of TARGET_ZERO_EXTEND_WITH_AND.
Where does that insn get generated?  By combine itself?

Richard.

 Enhancing implicit-zee.c to address missed optimizations like the one 
 reported
 in target/50038 might well be the best approach, but the strategy shift must 
 be
 clearly exposed and discussed.  The reported numbers are certainly 
 impressive.

 --
 Eric Botcazou


 Ilya



Re: [PATCH] PR target/50038 fix: redundant zero extensions removal

2011-11-06 Thread Ilya Enkovich
2011/11/6 Richard Guenther richard.guent...@gmail.com:
 On Sun, Nov 6, 2011 at 11:50 AM, Ilya Enkovich enkovich@gmail.com wrote:
 Hello,

 2011/11/5 Eric Botcazou ebotca...@adacore.com:
 Here is a patch which fixes redundant zero extensions problem. Issue
 is resolved by expanding implicit_zee pass functionality to cover zero
 and sign extends of different modes. Could please someone review it?

 Could you explain the undelying idea?  The current strategy of 
 implicit-zee.c
 is exposed at length at the beginning of the file, but here's a summary:

  1. On some architectures (typically x86-64), implicity zero-extensions are
 applied when instructions operate in selected sub-word modes (SImode here):

  addl edi,eax

 has an implicit zero-extension for %rax.

  2. Because of 1, the second instruction in sequences like:

  (set (reg:SI x) (plus:SI (reg:SI z1) (reg:SI z2)))
  (set (reg:DI x) (zero_extend:DI (reg:SI x)))

 is redundant.

  3. The pass recognizes this and transforms the above sequence into:

  (set (reg:DI x) (zero_extend:DI (plus:SI (reg:SI z1) (reg:SI z2

 and the machine description knows how to translate this into an 'addl'.


 You're proposing extending this to other modes and other architectures, for
 example QImode on x86.  But does

  addb %dl, %al

 modify the entire %eax register on x86?  In other words, are you really 
 after
 implicit (zero-)extensions or after something else, like global elimination 
 of
 redundant extensions?
 Initial aim of the pass was to remove zero extentions redundant due to
 implicit zero extention in x64. But implementation actually uses
 generic approach and seems like a mini-combiner. Pass may combine two
 zero extends or combine zero extend with a constant as a special case
 but in other cases we just try to merge two instructions and then
 check we have corresponding template. It can be easily adopted to
 remove all redundant extensions. So, byte add in the example will be
 merged with zxero extend only if we have explicit template for it in
 machine model.


 What's the effect of the patch on the testcase in the PR in terms of insns 
 at
 the RTL level?  Why doesn't the combiner already optimize it?
 The patch helps to remove two zero extends from RTL in the test from
 PR. I believe zee pass was introduced after postreload pass because we
 should have additional memory instructions by that time and therefore
 more opportunities for optimization after combiner work.

 In this particular test case combiner may also help because we have
 byte memory load and extend on combiner pass. But due to some reason
 it does not merge them. In combiner dump I see

 (insn 39 38 40 4 (set (reg/v:QI 81 [ xr ])
        (mem:QI (reg/v/f:DI 111 [ ImageInPtr ]) [0 MEM[base:
 ImageInPtr_29, offset: 0B]+0 S1 A8])) 1.c:9 66 {*movqi_internal}
     (nil))

 (insn 43 42 44 4 (parallel [
            (set (reg:SI 116 [ xr ])
                (zero_extend:SI (reg/v:QI 81 [ xr ])))
            (clobber (reg:CC 17 flags))
        ]) 1.c:11 121 {*zero_extendqisi2_movzbl_and}
     (expr_list:REG_DEAD (reg/v:QI 81 [ xr ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil

 and

 Trying 39 - 43:

 With no additional information.

 Well, I bet it's because of the CC clobber which is there
 because of the use of TARGET_ZERO_EXTEND_WITH_AND.
 Where does that insn get generated?  By combine itself?

This insn is generated by expand:

(insn 43 42 44 6 (parallel [
(set (reg:SI 116)
(zero_extend:SI (reg/v:QI 81 [ xr ])))
(clobber (reg:CC 17 flags))
]) 1.c:11 -1
 (nil))


 Richard.

 Enhancing implicit-zee.c to address missed optimizations like the one 
 reported
 in target/50038 might well be the best approach, but the strategy shift 
 must be
 clearly exposed and discussed.  The reported numbers are certainly 
 impressive.

 --
 Eric Botcazou


 Ilya




Re: [PATCH] PR target/50038 fix: redundant zero extensions removal

2011-11-05 Thread Eric Botcazou
 Here is a patch which fixes redundant zero extensions problem. Issue
 is resolved by expanding implicit_zee pass functionality to cover zero
 and sign extends of different modes. Could please someone review it?

Could you explain the undelying idea?  The current strategy of implicit-zee.c 
is exposed at length at the beginning of the file, but here's a summary:

 1. On some architectures (typically x86-64), implicity zero-extensions are 
applied when instructions operate in selected sub-word modes (SImode here):

  addl edi,eax

has an implicit zero-extension for %rax.

 2. Because of 1, the second instruction in sequences like:

  (set (reg:SI x) (plus:SI (reg:SI z1) (reg:SI z2)))
  (set (reg:DI x) (zero_extend:DI (reg:SI x)))

is redundant.

 3. The pass recognizes this and transforms the above sequence into:

  (set (reg:DI x) (zero_extend:DI (plus:SI (reg:SI z1) (reg:SI z2

and the machine description knows how to translate this into an 'addl'.


You're proposing extending this to other modes and other architectures, for 
example QImode on x86.  But does

  addb %dl, %al

modify the entire %eax register on x86?  In other words, are you really after 
implicit (zero-)extensions or after something else, like global elimination of 
redundant extensions?

What's the effect of the patch on the testcase in the PR in terms of insns at 
the RTL level?  Why doesn't the combiner already optimize it?

Enhancing implicit-zee.c to address missed optimizations like the one reported 
in target/50038 might well be the best approach, but the strategy shift must be 
clearly exposed and discussed.  The reported numbers are certainly impressive.

-- 
Eric Botcazou


[PATCH] PR target/50038 fix: redundant zero extensions removal

2011-11-01 Thread Ilya Enkovich
Hi,

Here is a patch which fixes redundant zero extensions problem. Issue
is resolved by expanding implicit_zee pass functionality to cover zero
and sign extends of different modes. Could please someone review it?

Bootstrapped and checked on linux-x86_64.

Thanks,
Ilya
---
2011-11-01  Enkovich Ilya  ilya.enkov...@intel.com

PR target/50038
* implicit-zee.c (ext_cand): New.
(ext_cand_pool): Likewise.
(add_ext_candidate): New.
(zee_init): New.
(zee_cleanup): New.
(get_reg_di): Removed.
(combine_set_zero_extend): Get extend candidate as new parameter.
Now handle sign extend cases and other modes.
(transform_ifelse): Likewise.
(merge_def_and_ze): Likewise.
(combine_reaching_defs): Change parameter type.
(zero_extend_info): Changed insn_list type.
(add_removable_zero_extend): Relaxed mode and code filter.
(find_removable_zero_extends): Changed return type.
(find_and_remove_ze): Var type changes.
(rest_of_handle_zee): Init and cleanup added.

* i386.c (ix86_option_override_internal): set flag_zee for
32 bit platform.


PR50038.diff
Description: Binary data