Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2016-03-06 Thread Ramana Radhakrishnan
Hi Prathamesh,

Could you split out the ARM specific portions into a separate patch
please in a patch series?

>@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (bool 
>@var{unsignedp}, machine_mode @var{mode}, @var{rtx}, @var{rtx}, rtx 
>*@var{quot}, rtx *@var{rem})
>Expand divmod libfunc
>@end deftypefn
>

This could do with some more detail here with respect to the
conditions when a port needs
to define this particular macro.

Based on the discussion it would be good to have an assert for
TARGET_IDIV in the hook
for the ARM port.

i.e gcc_assert (!TARGET_IDIV);


>
># Return 1 if the target supports divmod
>
>proc check_effective_target_divmod { } {
>if { [istarget arm*-*-*] } {
>   return 1
>}
>return 0
>}
>

This will fail test cases in a configuration where a div instruction
exists. If you are looking to
check the transformation - I think you need to check for __ARM_ARCH_EXT_IDIV__
here similar to the way in which we test for FMA or __ARM_FEATURE_UNALIGNED.

Could you please repost with the changes so that I can take another look ?

Otherwise I think this is something we should queue up for GCC 7.

thanks,
Ramana


Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2016-01-31 Thread Richard Henderson

On 01/29/2016 12:37 AM, Richard Biener wrote:

To workaround this, I defined a new hook expand_divmod_libfunc, which
targets must override for expanding call to target-specific dimovd.
The "default" hook default_expand_divmod_libfunc() expands call to
libgcc2.c:__udivmoddi4() since that's the only "generic" divmod
available.
Is this a reasonable approach ?


Hum.  How do they get to expand/generate it today?  That said, I'm
no expert in this area.

A simpler solution may be to not do the transform if there is no
instruction for divmod.


Are we certain that the libcall is a win for any target?

I would have expected a default of

q = x / y
r = x - (q * y)

to be most efficient on modern machines.  Even more so on targets like ARM that 
have multiply-and-subtract instructions.


I suppose there's the case of the really tiny embedded chips that don't have 
multiply patterns either.  So that this default expansion still results in 
multiple libcalls.


I do like the transformation, because for machines that don't have a divmod 
instruction, being able to strength-reduce a mod operation to a multiply 
operation is a nice win.



r~


Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2016-01-31 Thread Jim Wilson
On Sun, Jan 31, 2016 at 2:15 PM, Richard Henderson  wrote:
> On 01/29/2016 12:37 AM, Richard Biener wrote:
>>>
>>> To workaround this, I defined a new hook expand_divmod_libfunc, which
>>> targets must override for expanding call to target-specific dimovd.
>>> The "default" hook default_expand_divmod_libfunc() expands call to
>>> libgcc2.c:__udivmoddi4() since that's the only "generic" divmod
>>> available.
>>> Is this a reasonable approach ?
>>
>>
>> Hum.  How do they get to expand/generate it today?  That said, I'm
>> no expert in this area.
>>
>> A simpler solution may be to not do the transform if there is no
>> instruction for divmod.
>
>
> Are we certain that the libcall is a win for any target?
>
> I would have expected a default of
>
> q = x / y
> r = x - (q * y)
>
> to be most efficient on modern machines.  Even more so on targets like ARM
> that have multiply-and-subtract instructions.
>
> I suppose there's the case of the really tiny embedded chips that don't have
> multiply patterns either.  So that this default expansion still results in
> multiple libcalls.
>
> I do like the transformation, because for machines that don't have a divmod
> instruction, being able to strength-reduce a mod operation to a multiply
> operation is a nice win.
>
>
> r~


Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2016-01-31 Thread Jim Wilson
On Sun, Jan 31, 2016 at 8:43 PM, Jim Wilson  wrote:
>> Are we certain that the libcall is a win for any target?
>> I would have expected a default of
>> q = x / y
>> r = x - (q * y)
>> to be most efficient on modern machines.  Even more so on targets like ARM
>> that have multiply-and-subtract instructions.

If there is a div insn, then yes, gcc will emit a div and a multiply.
However, a div insn is a relatively recent addition to the 32-bit ARM
architecture.  Without the div insn, we get a div libcall and a mod
libcall.  That means two libcalls, both of which are likely
implemented by calling the divmod libcall and returning the desired
part of the result.  One call to a divmod libcall is clearly more
efficient than two calls to a divmod libcall.  So that makes the
transformation useful.

Prathamesh's patch has a number of conditions required to trigger the
optimization, such as a divmod insn, or a lack of a div insn and the
presence of a divmod libcall.

Jim


Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2016-01-31 Thread Jim Wilson
On Fri, Jan 29, 2016 at 12:09 AM, Richard Biener  wrote:
> I wonder if rather than introducing a target hook ports could use
> a define_expand expanding to a libcall for this case?

Of the two divmod libcall APIs, one requires a stack temporary, which
would be awkward to allocate in a define_expand.  Though we could have
expand_twoval_binop implement the libgcc udivmoddi4 API which requires
a stack temp, and then add an ARM divmod expander that implements the
ARM API which has a double-wide result.  That sounds like it could
work.

Jim


Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2016-01-29 Thread Richard Biener
On Thu, 28 Jan 2016, Jim Wilson wrote:

> On Thu, Jan 28, 2016 at 5:37 AM, Richard Biener  wrote:
> >> To workaround this, I defined a new hook expand_divmod_libfunc, which
> >> targets must override for expanding call to target-specific dimovd.
> >> The "default" hook default_expand_divmod_libfunc() expands call to
> >> libgcc2.c:__udivmoddi4() since that's the only "generic" divmod
> >> available.
> >> Is this a reasonable approach ?
> >
> > Hum.  How do they get to expand/generate it today?  That said, I'm
> > no expert in this area.
> 
> Currently, the only place where a divmod libfunc can be called is in
> expand_divmod in expmed.c, which can return either the div or mod
> result, but not both.  If this is called for the mod result, and there
> is no div insn, and no mod insn, and no mod libfunc, then it will call
> the divmod libfunc to generate the mod result.  This is exactly the
> case where the ARM port needs it, as this code was written for the
> arm.
> 
> There are 3 targets that define a divmod libfunc: arm, c6x, and spu.
> The arm port is OK, because expand_divmod does the right thing for
> arm, using the arm divmod calling convention.  The c6x port is OK
> because it defines mod insns and libfuncs, and hence the divmod
> libfunc will never be called and is redundant.  The spu port is also
> OK, because it defines mod libcalls, and hence the divmod libfunc will
> never be called, and is likewise redundant.  Both the c6x and spu
> ports have their own divmod library functions in
> libgcc/config/$target.  The divmod library functions are called by the
> div and mod library functions, so they are necessary, they are just
> never directly called.  Both the c6x and spu port uses the current
> libgcc __udivmoddi4 calling convention with a pointer to the mod
> result, which is different and incompatible to the ARM convention of
> returning a double size result that contains div and mod.
> 
> WIth Prathamesh's patch to add support to the tree optimizers to
> create divmod operations, the c6x and spu ports break.  The divmod
> libfuncs are no longer redundant, and will be called, except with the
> wrong ABI, so we need to extend the divmod support to handle multiple
> ABIs.  This is why Prathamesh added the target hook for the divmod
> libcall, so the target can specify the ABI used by its divmod
> libcalls.  Prathamesh has correct support for ARM (current code), and
> apparently correct code for c6x and spu (libgcc udivmodsi4).

Thanks for the explanation.

I wonder if rather than introducing a target hook ports could use
a define_expand expanding to a libcall for this case?

That said, I've reviewed the GIMPLE parts of the patch and am
happy with that.  I'd prefer if somebody with more RTL-expansion
fu would review the rest of the patch.

> > A simpler solution may be to not do the transform if there is no
> > instruction for divmod.
> 
> This prevents the optimization from happening on ARM, which has divmod
> libfuncs but no divmod insn.  We want the optimization to happen
> there, as if we need both div and mod results, then calling the divmod
> libfunc is faster than calling both the div and mod libfuncs.

Ok, understood.  It also affects all > word mode ops on all targets
I think (not sure if we have TImode divmod in libgcc...).

Richard.

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2016-01-28 Thread Jim Wilson
On Thu, Jan 28, 2016 at 5:37 AM, Richard Biener  wrote:
>> To workaround this, I defined a new hook expand_divmod_libfunc, which
>> targets must override for expanding call to target-specific dimovd.
>> The "default" hook default_expand_divmod_libfunc() expands call to
>> libgcc2.c:__udivmoddi4() since that's the only "generic" divmod
>> available.
>> Is this a reasonable approach ?
>
> Hum.  How do they get to expand/generate it today?  That said, I'm
> no expert in this area.

Currently, the only place where a divmod libfunc can be called is in
expand_divmod in expmed.c, which can return either the div or mod
result, but not both.  If this is called for the mod result, and there
is no div insn, and no mod insn, and no mod libfunc, then it will call
the divmod libfunc to generate the mod result.  This is exactly the
case where the ARM port needs it, as this code was written for the
arm.

There are 3 targets that define a divmod libfunc: arm, c6x, and spu.
The arm port is OK, because expand_divmod does the right thing for
arm, using the arm divmod calling convention.  The c6x port is OK
because it defines mod insns and libfuncs, and hence the divmod
libfunc will never be called and is redundant.  The spu port is also
OK, because it defines mod libcalls, and hence the divmod libfunc will
never be called, and is likewise redundant.  Both the c6x and spu
ports have their own divmod library functions in
libgcc/config/$target.  The divmod library functions are called by the
div and mod library functions, so they are necessary, they are just
never directly called.  Both the c6x and spu port uses the current
libgcc __udivmoddi4 calling convention with a pointer to the mod
result, which is different and incompatible to the ARM convention of
returning a double size result that contains div and mod.

WIth Prathamesh's patch to add support to the tree optimizers to
create divmod operations, the c6x and spu ports break.  The divmod
libfuncs are no longer redundant, and will be called, except with the
wrong ABI, so we need to extend the divmod support to handle multiple
ABIs.  This is why Prathamesh added the target hook for the divmod
libcall, so the target can specify the ABI used by its divmod
libcalls.  Prathamesh has correct support for ARM (current code), and
apparently correct code for c6x and spu (libgcc udivmodsi4).

> A simpler solution may be to not do the transform if there is no
> instruction for divmod.

This prevents the optimization from happening on ARM, which has divmod
libfuncs but no divmod insn.  We want the optimization to happen
there, as if we need both div and mod results, then calling the divmod
libfunc is faster than calling both the div and mod libfuncs.

Jim


Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2016-01-28 Thread Richard Biener
On Wed, 27 Jan 2016, Prathamesh Kulkarni wrote:

> On 11 November 2015 at 19:04, Richard Biener  wrote:
> > On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote:
> >
> >> On 11 November 2015 at 16:03, Richard Biener  wrote:
> >> > On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 10 November 2015 at 20:11, Richard Biener  wrote:
> >> >> > On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote:
> >> >> >
> >> >> >> On 4 November 2015 at 20:35, Richard Biener  
> >> >> >> wrote:
> >> >> >> >
> >> >> >> > Btw, did you investigate code gen differences on x86_64/i586?  That
> >> >> >> > target expands all divisions/modulo ops via divmod, relying on CSE
> >> >> >> > solely as the HW always computes both div and mod (IIRC).
> >> >> >> x86_64 has optab_handler for divmod defined, so the transform won't
> >> >> >> take place on x86.
> >> >> >
> >> >> > Ok.
> >> >> >
> >> >> >> > +
> >> >> >> > +gassign *assign_stmt = gimple_build_assign 
> >> >> >> > (gimple_assign_lhs
> >> >> >> > (use_stmt), rhs);
> >> >> >> > +gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> >> >> >> >
> >> >> >> > Ick.  Please use
> >> >> >> >
> >> >> >> > gimple_set_rhs_from_tree (use_stmt, res);
> >> >> >> Um there doesn't seem to be gimple_set_rhs_from_tree.
> >> >> >> I used gimple_assign_set_rhs_from_tree which requires gsi for 
> >> >> >> use_stmt.
> >> >> >> Is that OK ?
> >> >> >
> >> >> > Yes.
> >> >> >
> >> >> >> > update_stmt (use_stmt);
> >> >> >> > if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> >> >> >> >   cfg_changed = true;
> >> >> >> >
> >> >> >> > +  free_dominance_info (CDI_DOMINATORS);
> >> >> >> >
> >> >> >> > do not free dominators.
> >> >> >>
> >> >> >> I have done the suggested changes in the attached patch.
> >> >> >> I have a few questions:
> >> >> >>
> >> >> >> a) Does the change to insert DIVMOD call before topmost div or mod
> >> >> >> stmt with matching operands
> >> >> >> look correct ?
> >> >> >
> >> >> > +  /* Insert call-stmt just before the topmost div/mod stmt.
> >> >> > + top_bb dominates all other basic blocks containing div/mod stms
> >> >> > + so, the topmost stmt would be the first div/mod stmt with 
> >> >> > matching
> >> >> > operands
> >> >> > + in top_bb.  */
> >> >> > +
> >> >> > +  gcc_assert (top_bb != 0);
> >> >> > +  gimple_stmt_iterator gsi;
> >> >> > +  for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next
> >> >> > ())
> >> >> > +{
> >> >> > +  gimple *g = gsi_stmt (gsi);
> >> >> > +  if (is_gimple_assign (g)
> >> >> > + && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR
> >> >> > +|| gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR)
> >> >> > + && operand_equal_p (op1, gimple_assign_rhs1 (g), 0)
> >> >> > + && operand_equal_p (op2, gimple_assign_rhs2 (g), 0))
> >> >> > +   break;
> >> >> >
> >> >> > Looks overly complicated to me.  Just remember "topmost" use_stmt
> >> >> > alongside top_bb (looks like you'll no longer need top_bb if you
> >> >> > retail top_stmt).  And then do
> >> >> >
> >> >> >gsi = gsi_for_stmt (top_stmt);
> >> >> >
> >> >> > and insert before that.
> >> >> Thanks, done in this patch. Does it look OK ?
> >> >> IIUC gimple_uid (stmt1) < gimple_uid (stmt2) can be used to check if
> >> >> stmt1 occurs before stmt2
> >> >> only if stmt1 and stmt2 are in the same basic block ?
> >> >> >
> >> >> >> b) Handling constants - I dropped handling constants in the attached
> >> >> >> patch. IIUC we don't want
> >> >> >> to enable this transform if there's a specialized expansion for some
> >> >> >> constants for div or mod ?
> >> >> >
> >> >> > See expand_divmod which has lots of special cases for constant 
> >> >> > operands
> >> >> > not requiring target support for div or mod.
> >> >> Thanks, would it be OK if I do this in follow up patch ?
> >> >
> >> > Well, just not handle them like in your patch is fine.
> >> >
> >> >> >
> >> >> >> I suppose this would also be target dependent and require a target 
> >> >> >> hook ?
> >> >> >> For instance arm defines modsi3 pattern to expand mod when 2nd 
> >> >> >> operand
> >> >> >> is constant and <= 0 or power of 2,
> >> >> >> while for other cases goes the expand_divmod() route to generate call
> >> >> >> to __aeabi_idivmod libcall.
> >> >> >
> >> >> > Ok, so it lacks a signed mod instruction.
> >> >> >
> >> >> >> c) Gating the divmod transform -
> >> >> >> I tried gating it on checks for optab_handlers on div and mod, 
> >> >> >> however
> >> >> >> this doesn't enable transform for arm cortex-a9
> >> >> >> anymore (cortex-a9 doesn't have hardware instructions for integer 
> >> >> >> div and mod).
> >> >> >> IIUC for cortex-a9,
> >> >> >> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing 
> >> >> >> because
> >> >> >> HAVE_divsi3 is 0.
> >> >> >> However optab_handler (smod_optab, SImode) matches since 
> >> >> >> optab_handler

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2016-01-27 Thread Prathamesh Kulkarni
On 11 November 2015 at 19:04, Richard Biener  wrote:
> On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote:
>
>> On 11 November 2015 at 16:03, Richard Biener  wrote:
>> > On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote:
>> >
>> >> On 10 November 2015 at 20:11, Richard Biener  wrote:
>> >> > On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 4 November 2015 at 20:35, Richard Biener  wrote:
>> >> >> >
>> >> >> > Btw, did you investigate code gen differences on x86_64/i586?  That
>> >> >> > target expands all divisions/modulo ops via divmod, relying on CSE
>> >> >> > solely as the HW always computes both div and mod (IIRC).
>> >> >> x86_64 has optab_handler for divmod defined, so the transform won't
>> >> >> take place on x86.
>> >> >
>> >> > Ok.
>> >> >
>> >> >> > +
>> >> >> > +gassign *assign_stmt = gimple_build_assign 
>> >> >> > (gimple_assign_lhs
>> >> >> > (use_stmt), rhs);
>> >> >> > +gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
>> >> >> >
>> >> >> > Ick.  Please use
>> >> >> >
>> >> >> > gimple_set_rhs_from_tree (use_stmt, res);
>> >> >> Um there doesn't seem to be gimple_set_rhs_from_tree.
>> >> >> I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt.
>> >> >> Is that OK ?
>> >> >
>> >> > Yes.
>> >> >
>> >> >> > update_stmt (use_stmt);
>> >> >> > if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> >> >> >   cfg_changed = true;
>> >> >> >
>> >> >> > +  free_dominance_info (CDI_DOMINATORS);
>> >> >> >
>> >> >> > do not free dominators.
>> >> >>
>> >> >> I have done the suggested changes in the attached patch.
>> >> >> I have a few questions:
>> >> >>
>> >> >> a) Does the change to insert DIVMOD call before topmost div or mod
>> >> >> stmt with matching operands
>> >> >> look correct ?
>> >> >
>> >> > +  /* Insert call-stmt just before the topmost div/mod stmt.
>> >> > + top_bb dominates all other basic blocks containing div/mod stms
>> >> > + so, the topmost stmt would be the first div/mod stmt with matching
>> >> > operands
>> >> > + in top_bb.  */
>> >> > +
>> >> > +  gcc_assert (top_bb != 0);
>> >> > +  gimple_stmt_iterator gsi;
>> >> > +  for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next
>> >> > ())
>> >> > +{
>> >> > +  gimple *g = gsi_stmt (gsi);
>> >> > +  if (is_gimple_assign (g)
>> >> > + && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR
>> >> > +|| gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR)
>> >> > + && operand_equal_p (op1, gimple_assign_rhs1 (g), 0)
>> >> > + && operand_equal_p (op2, gimple_assign_rhs2 (g), 0))
>> >> > +   break;
>> >> >
>> >> > Looks overly complicated to me.  Just remember "topmost" use_stmt
>> >> > alongside top_bb (looks like you'll no longer need top_bb if you
>> >> > retail top_stmt).  And then do
>> >> >
>> >> >gsi = gsi_for_stmt (top_stmt);
>> >> >
>> >> > and insert before that.
>> >> Thanks, done in this patch. Does it look OK ?
>> >> IIUC gimple_uid (stmt1) < gimple_uid (stmt2) can be used to check if
>> >> stmt1 occurs before stmt2
>> >> only if stmt1 and stmt2 are in the same basic block ?
>> >> >
>> >> >> b) Handling constants - I dropped handling constants in the attached
>> >> >> patch. IIUC we don't want
>> >> >> to enable this transform if there's a specialized expansion for some
>> >> >> constants for div or mod ?
>> >> >
>> >> > See expand_divmod which has lots of special cases for constant operands
>> >> > not requiring target support for div or mod.
>> >> Thanks, would it be OK if I do this in follow up patch ?
>> >
>> > Well, just not handle them like in your patch is fine.
>> >
>> >> >
>> >> >> I suppose this would also be target dependent and require a target 
>> >> >> hook ?
>> >> >> For instance arm defines modsi3 pattern to expand mod when 2nd operand
>> >> >> is constant and <= 0 or power of 2,
>> >> >> while for other cases goes the expand_divmod() route to generate call
>> >> >> to __aeabi_idivmod libcall.
>> >> >
>> >> > Ok, so it lacks a signed mod instruction.
>> >> >
>> >> >> c) Gating the divmod transform -
>> >> >> I tried gating it on checks for optab_handlers on div and mod, however
>> >> >> this doesn't enable transform for arm cortex-a9
>> >> >> anymore (cortex-a9 doesn't have hardware instructions for integer div 
>> >> >> and mod).
>> >> >> IIUC for cortex-a9,
>> >> >> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because
>> >> >> HAVE_divsi3 is 0.
>> >> >> However optab_handler (smod_optab, SImode) matches since optab_handler
>> >> >> only checks for existence of pattern
>> >> >> (and not whether the pattern gets matched).
>> >> >> I suppose we should enable the transform only if the divmod, div, and
>> >> >> mod pattern do not match rather than checking
>> >> >> if the patterns exist via optab_handler ? For a general x % y, modsi3
>> >> >> would fail to match but 

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2015-11-11 Thread Prathamesh Kulkarni
On 10 November 2015 at 20:11, Richard Biener  wrote:
> On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote:
>
>> On 4 November 2015 at 20:35, Richard Biener  wrote:
>> >
>> > Btw, did you investigate code gen differences on x86_64/i586?  That
>> > target expands all divisions/modulo ops via divmod, relying on CSE
>> > solely as the HW always computes both div and mod (IIRC).
>> x86_64 has optab_handler for divmod defined, so the transform won't
>> take place on x86.
>
> Ok.
>
>> > +
>> > +gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs
>> > (use_stmt), rhs);
>> > +gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
>> >
>> > Ick.  Please use
>> >
>> > gimple_set_rhs_from_tree (use_stmt, res);
>> Um there doesn't seem to be gimple_set_rhs_from_tree.
>> I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt.
>> Is that OK ?
>
> Yes.
>
>> > update_stmt (use_stmt);
>> > if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> >   cfg_changed = true;
>> >
>> > +  free_dominance_info (CDI_DOMINATORS);
>> >
>> > do not free dominators.
>>
>> I have done the suggested changes in the attached patch.
>> I have a few questions:
>>
>> a) Does the change to insert DIVMOD call before topmost div or mod
>> stmt with matching operands
>> look correct ?
>
> +  /* Insert call-stmt just before the topmost div/mod stmt.
> + top_bb dominates all other basic blocks containing div/mod stms
> + so, the topmost stmt would be the first div/mod stmt with matching
> operands
> + in top_bb.  */
> +
> +  gcc_assert (top_bb != 0);
> +  gimple_stmt_iterator gsi;
> +  for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next
> ())
> +{
> +  gimple *g = gsi_stmt (gsi);
> +  if (is_gimple_assign (g)
> + && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR
> +|| gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR)
> + && operand_equal_p (op1, gimple_assign_rhs1 (g), 0)
> + && operand_equal_p (op2, gimple_assign_rhs2 (g), 0))
> +   break;
>
> Looks overly complicated to me.  Just remember "topmost" use_stmt
> alongside top_bb (looks like you'll no longer need top_bb if you
> retail top_stmt).  And then do
>
>gsi = gsi_for_stmt (top_stmt);
>
> and insert before that.
Thanks, done in this patch. Does it look OK ?
IIUC gimple_uid (stmt1) < gimple_uid (stmt2) can be used to check if
stmt1 occurs before stmt2
only if stmt1 and stmt2 are in the same basic block ?
>
>> b) Handling constants - I dropped handling constants in the attached
>> patch. IIUC we don't want
>> to enable this transform if there's a specialized expansion for some
>> constants for div or mod ?
>
> See expand_divmod which has lots of special cases for constant operands
> not requiring target support for div or mod.
Thanks, would it be OK if I do this in follow up patch ?
>
>> I suppose this would also be target dependent and require a target hook ?
>> For instance arm defines modsi3 pattern to expand mod when 2nd operand
>> is constant and <= 0 or power of 2,
>> while for other cases goes the expand_divmod() route to generate call
>> to __aeabi_idivmod libcall.
>
> Ok, so it lacks a signed mod instruction.
>
>> c) Gating the divmod transform -
>> I tried gating it on checks for optab_handlers on div and mod, however
>> this doesn't enable transform for arm cortex-a9
>> anymore (cortex-a9 doesn't have hardware instructions for integer div and 
>> mod).
>> IIUC for cortex-a9,
>> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because
>> HAVE_divsi3 is 0.
>> However optab_handler (smod_optab, SImode) matches since optab_handler
>> only checks for existence of pattern
>> (and not whether the pattern gets matched).
>> I suppose we should enable the transform only if the divmod, div, and
>> mod pattern do not match rather than checking
>> if the patterns exist via optab_handler ? For a general x % y, modsi3
>> would fail to match but optab_handler(smod_optab, SImode ) still
>> says it's matched.
>
> Ah, of course.  Querying for an optab handler is just a cheap
> guesstimate...  Not sure how to circumvent this best (sub-target
> enablement of patterns).  RTL expansion just goes ahead (of course)
> and sees if expansion eventually fails.  Richard?
>
>> Should we define a new target hook combine_divmod, which returns true
>> if transforming to divmod is desirable for that
>> target ?
>> The default definition could be:
>> bool default_combine_divmod (enum machine_mode mode, tree op1, tree op2)
>> {
>>   // check for optab_handlers for div/mod/divmod and libfunc for divmod
>> }
>>
>> And for arm, it could be over-ridden to return false if op2 is
>> constant and <= 0 or power of 2.
>> I am not really sure if this is a good idea since I am replicating
>> information from modsi3 pattern.
>> Any change to the pattern may require corresponding change to the hook :/
>
> Yeah, I don't think that is desirable.  Ideally 

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2015-11-11 Thread Richard Biener
On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote:

> On 10 November 2015 at 20:11, Richard Biener  wrote:
> > On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote:
> >
> >> On 4 November 2015 at 20:35, Richard Biener  wrote:
> >> >
> >> > Btw, did you investigate code gen differences on x86_64/i586?  That
> >> > target expands all divisions/modulo ops via divmod, relying on CSE
> >> > solely as the HW always computes both div and mod (IIRC).
> >> x86_64 has optab_handler for divmod defined, so the transform won't
> >> take place on x86.
> >
> > Ok.
> >
> >> > +
> >> > +gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs
> >> > (use_stmt), rhs);
> >> > +gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> >> >
> >> > Ick.  Please use
> >> >
> >> > gimple_set_rhs_from_tree (use_stmt, res);
> >> Um there doesn't seem to be gimple_set_rhs_from_tree.
> >> I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt.
> >> Is that OK ?
> >
> > Yes.
> >
> >> > update_stmt (use_stmt);
> >> > if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> >> >   cfg_changed = true;
> >> >
> >> > +  free_dominance_info (CDI_DOMINATORS);
> >> >
> >> > do not free dominators.
> >>
> >> I have done the suggested changes in the attached patch.
> >> I have a few questions:
> >>
> >> a) Does the change to insert DIVMOD call before topmost div or mod
> >> stmt with matching operands
> >> look correct ?
> >
> > +  /* Insert call-stmt just before the topmost div/mod stmt.
> > + top_bb dominates all other basic blocks containing div/mod stms
> > + so, the topmost stmt would be the first div/mod stmt with matching
> > operands
> > + in top_bb.  */
> > +
> > +  gcc_assert (top_bb != 0);
> > +  gimple_stmt_iterator gsi;
> > +  for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next
> > ())
> > +{
> > +  gimple *g = gsi_stmt (gsi);
> > +  if (is_gimple_assign (g)
> > + && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR
> > +|| gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR)
> > + && operand_equal_p (op1, gimple_assign_rhs1 (g), 0)
> > + && operand_equal_p (op2, gimple_assign_rhs2 (g), 0))
> > +   break;
> >
> > Looks overly complicated to me.  Just remember "topmost" use_stmt
> > alongside top_bb (looks like you'll no longer need top_bb if you
> > retail top_stmt).  And then do
> >
> >gsi = gsi_for_stmt (top_stmt);
> >
> > and insert before that.
> Thanks, done in this patch. Does it look OK ?
> IIUC gimple_uid (stmt1) < gimple_uid (stmt2) can be used to check if
> stmt1 occurs before stmt2
> only if stmt1 and stmt2 are in the same basic block ?
> >
> >> b) Handling constants - I dropped handling constants in the attached
> >> patch. IIUC we don't want
> >> to enable this transform if there's a specialized expansion for some
> >> constants for div or mod ?
> >
> > See expand_divmod which has lots of special cases for constant operands
> > not requiring target support for div or mod.
> Thanks, would it be OK if I do this in follow up patch ?

Well, just not handle them like in your patch is fine.

> >
> >> I suppose this would also be target dependent and require a target hook ?
> >> For instance arm defines modsi3 pattern to expand mod when 2nd operand
> >> is constant and <= 0 or power of 2,
> >> while for other cases goes the expand_divmod() route to generate call
> >> to __aeabi_idivmod libcall.
> >
> > Ok, so it lacks a signed mod instruction.
> >
> >> c) Gating the divmod transform -
> >> I tried gating it on checks for optab_handlers on div and mod, however
> >> this doesn't enable transform for arm cortex-a9
> >> anymore (cortex-a9 doesn't have hardware instructions for integer div and 
> >> mod).
> >> IIUC for cortex-a9,
> >> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because
> >> HAVE_divsi3 is 0.
> >> However optab_handler (smod_optab, SImode) matches since optab_handler
> >> only checks for existence of pattern
> >> (and not whether the pattern gets matched).
> >> I suppose we should enable the transform only if the divmod, div, and
> >> mod pattern do not match rather than checking
> >> if the patterns exist via optab_handler ? For a general x % y, modsi3
> >> would fail to match but optab_handler(smod_optab, SImode ) still
> >> says it's matched.
> >
> > Ah, of course.  Querying for an optab handler is just a cheap
> > guesstimate...  Not sure how to circumvent this best (sub-target
> > enablement of patterns).  RTL expansion just goes ahead (of course)
> > and sees if expansion eventually fails.  Richard?
> >
> >> Should we define a new target hook combine_divmod, which returns true
> >> if transforming to divmod is desirable for that
> >> target ?
> >> The default definition could be:
> >> bool default_combine_divmod (enum machine_mode mode, tree op1, tree op2)
> >> {
> >>   // check for optab_handlers for div/mod/divmod and libfunc for divmod
> >> 

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2015-11-11 Thread Prathamesh Kulkarni
On 11 November 2015 at 16:03, Richard Biener  wrote:
> On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote:
>
>> On 10 November 2015 at 20:11, Richard Biener  wrote:
>> > On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote:
>> >
>> >> On 4 November 2015 at 20:35, Richard Biener  wrote:
>> >> >
>> >> > Btw, did you investigate code gen differences on x86_64/i586?  That
>> >> > target expands all divisions/modulo ops via divmod, relying on CSE
>> >> > solely as the HW always computes both div and mod (IIRC).
>> >> x86_64 has optab_handler for divmod defined, so the transform won't
>> >> take place on x86.
>> >
>> > Ok.
>> >
>> >> > +
>> >> > +gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs
>> >> > (use_stmt), rhs);
>> >> > +gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
>> >> >
>> >> > Ick.  Please use
>> >> >
>> >> > gimple_set_rhs_from_tree (use_stmt, res);
>> >> Um there doesn't seem to be gimple_set_rhs_from_tree.
>> >> I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt.
>> >> Is that OK ?
>> >
>> > Yes.
>> >
>> >> > update_stmt (use_stmt);
>> >> > if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> >> >   cfg_changed = true;
>> >> >
>> >> > +  free_dominance_info (CDI_DOMINATORS);
>> >> >
>> >> > do not free dominators.
>> >>
>> >> I have done the suggested changes in the attached patch.
>> >> I have a few questions:
>> >>
>> >> a) Does the change to insert DIVMOD call before topmost div or mod
>> >> stmt with matching operands
>> >> look correct ?
>> >
>> > +  /* Insert call-stmt just before the topmost div/mod stmt.
>> > + top_bb dominates all other basic blocks containing div/mod stms
>> > + so, the topmost stmt would be the first div/mod stmt with matching
>> > operands
>> > + in top_bb.  */
>> > +
>> > +  gcc_assert (top_bb != 0);
>> > +  gimple_stmt_iterator gsi;
>> > +  for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next
>> > ())
>> > +{
>> > +  gimple *g = gsi_stmt (gsi);
>> > +  if (is_gimple_assign (g)
>> > + && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR
>> > +|| gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR)
>> > + && operand_equal_p (op1, gimple_assign_rhs1 (g), 0)
>> > + && operand_equal_p (op2, gimple_assign_rhs2 (g), 0))
>> > +   break;
>> >
>> > Looks overly complicated to me.  Just remember "topmost" use_stmt
>> > alongside top_bb (looks like you'll no longer need top_bb if you
>> > retail top_stmt).  And then do
>> >
>> >gsi = gsi_for_stmt (top_stmt);
>> >
>> > and insert before that.
>> Thanks, done in this patch. Does it look OK ?
>> IIUC gimple_uid (stmt1) < gimple_uid (stmt2) can be used to check if
>> stmt1 occurs before stmt2
>> only if stmt1 and stmt2 are in the same basic block ?
>> >
>> >> b) Handling constants - I dropped handling constants in the attached
>> >> patch. IIUC we don't want
>> >> to enable this transform if there's a specialized expansion for some
>> >> constants for div or mod ?
>> >
>> > See expand_divmod which has lots of special cases for constant operands
>> > not requiring target support for div or mod.
>> Thanks, would it be OK if I do this in follow up patch ?
>
> Well, just not handle them like in your patch is fine.
>
>> >
>> >> I suppose this would also be target dependent and require a target hook ?
>> >> For instance arm defines modsi3 pattern to expand mod when 2nd operand
>> >> is constant and <= 0 or power of 2,
>> >> while for other cases goes the expand_divmod() route to generate call
>> >> to __aeabi_idivmod libcall.
>> >
>> > Ok, so it lacks a signed mod instruction.
>> >
>> >> c) Gating the divmod transform -
>> >> I tried gating it on checks for optab_handlers on div and mod, however
>> >> this doesn't enable transform for arm cortex-a9
>> >> anymore (cortex-a9 doesn't have hardware instructions for integer div and 
>> >> mod).
>> >> IIUC for cortex-a9,
>> >> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because
>> >> HAVE_divsi3 is 0.
>> >> However optab_handler (smod_optab, SImode) matches since optab_handler
>> >> only checks for existence of pattern
>> >> (and not whether the pattern gets matched).
>> >> I suppose we should enable the transform only if the divmod, div, and
>> >> mod pattern do not match rather than checking
>> >> if the patterns exist via optab_handler ? For a general x % y, modsi3
>> >> would fail to match but optab_handler(smod_optab, SImode ) still
>> >> says it's matched.
>> >
>> > Ah, of course.  Querying for an optab handler is just a cheap
>> > guesstimate...  Not sure how to circumvent this best (sub-target
>> > enablement of patterns).  RTL expansion just goes ahead (of course)
>> > and sees if expansion eventually fails.  Richard?
>> >
>> >> Should we define a new target hook combine_divmod, which returns true
>> >> if transforming to divmod is desirable for that
>> >> target ?
>> >> The 

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2015-11-11 Thread Richard Biener
On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote:

> On 11 November 2015 at 16:03, Richard Biener  wrote:
> > On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote:
> >
> >> On 10 November 2015 at 20:11, Richard Biener  wrote:
> >> > On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 4 November 2015 at 20:35, Richard Biener  wrote:
> >> >> >
> >> >> > Btw, did you investigate code gen differences on x86_64/i586?  That
> >> >> > target expands all divisions/modulo ops via divmod, relying on CSE
> >> >> > solely as the HW always computes both div and mod (IIRC).
> >> >> x86_64 has optab_handler for divmod defined, so the transform won't
> >> >> take place on x86.
> >> >
> >> > Ok.
> >> >
> >> >> > +
> >> >> > +gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs
> >> >> > (use_stmt), rhs);
> >> >> > +gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> >> >> >
> >> >> > Ick.  Please use
> >> >> >
> >> >> > gimple_set_rhs_from_tree (use_stmt, res);
> >> >> Um there doesn't seem to be gimple_set_rhs_from_tree.
> >> >> I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt.
> >> >> Is that OK ?
> >> >
> >> > Yes.
> >> >
> >> >> > update_stmt (use_stmt);
> >> >> > if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> >> >> >   cfg_changed = true;
> >> >> >
> >> >> > +  free_dominance_info (CDI_DOMINATORS);
> >> >> >
> >> >> > do not free dominators.
> >> >>
> >> >> I have done the suggested changes in the attached patch.
> >> >> I have a few questions:
> >> >>
> >> >> a) Does the change to insert DIVMOD call before topmost div or mod
> >> >> stmt with matching operands
> >> >> look correct ?
> >> >
> >> > +  /* Insert call-stmt just before the topmost div/mod stmt.
> >> > + top_bb dominates all other basic blocks containing div/mod stms
> >> > + so, the topmost stmt would be the first div/mod stmt with matching
> >> > operands
> >> > + in top_bb.  */
> >> > +
> >> > +  gcc_assert (top_bb != 0);
> >> > +  gimple_stmt_iterator gsi;
> >> > +  for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next
> >> > ())
> >> > +{
> >> > +  gimple *g = gsi_stmt (gsi);
> >> > +  if (is_gimple_assign (g)
> >> > + && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR
> >> > +|| gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR)
> >> > + && operand_equal_p (op1, gimple_assign_rhs1 (g), 0)
> >> > + && operand_equal_p (op2, gimple_assign_rhs2 (g), 0))
> >> > +   break;
> >> >
> >> > Looks overly complicated to me.  Just remember "topmost" use_stmt
> >> > alongside top_bb (looks like you'll no longer need top_bb if you
> >> > retail top_stmt).  And then do
> >> >
> >> >gsi = gsi_for_stmt (top_stmt);
> >> >
> >> > and insert before that.
> >> Thanks, done in this patch. Does it look OK ?
> >> IIUC gimple_uid (stmt1) < gimple_uid (stmt2) can be used to check if
> >> stmt1 occurs before stmt2
> >> only if stmt1 and stmt2 are in the same basic block ?
> >> >
> >> >> b) Handling constants - I dropped handling constants in the attached
> >> >> patch. IIUC we don't want
> >> >> to enable this transform if there's a specialized expansion for some
> >> >> constants for div or mod ?
> >> >
> >> > See expand_divmod which has lots of special cases for constant operands
> >> > not requiring target support for div or mod.
> >> Thanks, would it be OK if I do this in follow up patch ?
> >
> > Well, just not handle them like in your patch is fine.
> >
> >> >
> >> >> I suppose this would also be target dependent and require a target hook 
> >> >> ?
> >> >> For instance arm defines modsi3 pattern to expand mod when 2nd operand
> >> >> is constant and <= 0 or power of 2,
> >> >> while for other cases goes the expand_divmod() route to generate call
> >> >> to __aeabi_idivmod libcall.
> >> >
> >> > Ok, so it lacks a signed mod instruction.
> >> >
> >> >> c) Gating the divmod transform -
> >> >> I tried gating it on checks for optab_handlers on div and mod, however
> >> >> this doesn't enable transform for arm cortex-a9
> >> >> anymore (cortex-a9 doesn't have hardware instructions for integer div 
> >> >> and mod).
> >> >> IIUC for cortex-a9,
> >> >> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because
> >> >> HAVE_divsi3 is 0.
> >> >> However optab_handler (smod_optab, SImode) matches since optab_handler
> >> >> only checks for existence of pattern
> >> >> (and not whether the pattern gets matched).
> >> >> I suppose we should enable the transform only if the divmod, div, and
> >> >> mod pattern do not match rather than checking
> >> >> if the patterns exist via optab_handler ? For a general x % y, modsi3
> >> >> would fail to match but optab_handler(smod_optab, SImode ) still
> >> >> says it's matched.
> >> >
> >> > Ah, of course.  Querying for an optab handler is just a cheap
> >> > guesstimate...  Not sure how to circumvent this best (sub-target

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2015-11-10 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote:
>> c) Gating the divmod transform -
>> I tried gating it on checks for optab_handlers on div and mod, however
>> this doesn't enable transform for arm cortex-a9
>> anymore (cortex-a9 doesn't have hardware instructions for integer div
>> and mod).
>> IIUC for cortex-a9,
>> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because
>> HAVE_divsi3 is 0.
>> However optab_handler (smod_optab, SImode) matches since optab_handler
>> only checks for existence of pattern
>> (and not whether the pattern gets matched).
>> I suppose we should enable the transform only if the divmod, div, and
>> mod pattern do not match rather than checking
>> if the patterns exist via optab_handler ? For a general x % y, modsi3
>> would fail to match but optab_handler(smod_optab, SImode ) still
>> says it's matched.
>
> Ah, of course.  Querying for an optab handler is just a cheap
> guesstimate...  Not sure how to circumvent this best (sub-target
> enablement of patterns).  RTL expansion just goes ahead (of course)
> and sees if expansion eventually fails.  Richard?

Yeah, FAILs in expanders are kind of awkward these days.  The ARM
modsi3 could be a bit more helpful by having a predicate that only
accepts power-of-2 integers instead of any const_int_operand,
which would at least avoid having to generate insns.

I did wonder once whether the generators should provide a way of
automatically fusing separate .md constructs into a single optab.
That might avoid more complicated FAILs and could be used to
automatically generate information for the tree level.

>> Should we define a new target hook combine_divmod, which returns true
>> if transforming to divmod is desirable for that
>> target ?
>> The default definition could be:
>> bool default_combine_divmod (enum machine_mode mode, tree op1, tree op2)
>> {
>>   // check for optab_handlers for div/mod/divmod and libfunc for divmod
>> }
>> 
>> And for arm, it could be over-ridden to return false if op2 is
>> constant and <= 0 or power of 2.
>> I am not really sure if this is a good idea since I am replicating
>> information from modsi3 pattern.
>> Any change to the pattern may require corresponding change to the hook :/
>
> Yeah, I don't think that is desirable.  Ideally we'd have a way
> to query HAVE_* for CODE_FOR_* which would mean target-insns.def
> support for all div/mod/divmod patterns(?) and queries...
>
> Not sure if what would be enough though.

This kind of stuff should really go through optabs rather than
target-insns.def.  target-insns.def is more for cases where there's
no distinction between operand modes.

> Note that the divmod check is equally flawed.
>
> I think with the above I'd enable the transform when
>
> +  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
> +  || (optab_libfunc (divmod_optab, mode) != NULL_RTX
>&& optab_handler ([su]div_optab, mode) == CODE_FOR_nothing))
> +return false;
>
> so we either will have a divmod instruction (hopefully not sub-target
> disabled for us) or a libfunc for divmod and for sure no HW divide
> instruction (HW mod can be emulated by HW divide but not the other
> way around).

Sounds good to me FWIW.

Thanks,
Richard



Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2015-11-10 Thread Richard Biener
On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote:

> On 4 November 2015 at 20:35, Richard Biener  wrote:
> >
> > Btw, did you investigate code gen differences on x86_64/i586?  That
> > target expands all divisions/modulo ops via divmod, relying on CSE
> > solely as the HW always computes both div and mod (IIRC).
> x86_64 has optab_handler for divmod defined, so the transform won't
> take place on x86.

Ok.

> > +
> > +gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs
> > (use_stmt), rhs);
> > +gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> >
> > Ick.  Please use
> >
> > gimple_set_rhs_from_tree (use_stmt, res);
> Um there doesn't seem to be gimple_set_rhs_from_tree.
> I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt.
> Is that OK ?

Yes.

> > update_stmt (use_stmt);
> > if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> >   cfg_changed = true;
> >
> > +  free_dominance_info (CDI_DOMINATORS);
> >
> > do not free dominators.
> 
> I have done the suggested changes in the attached patch.
> I have a few questions:
> 
> a) Does the change to insert DIVMOD call before topmost div or mod
> stmt with matching operands
> look correct ?

+  /* Insert call-stmt just before the topmost div/mod stmt.
+ top_bb dominates all other basic blocks containing div/mod stms
+ so, the topmost stmt would be the first div/mod stmt with matching 
operands
+ in top_bb.  */
+
+  gcc_assert (top_bb != 0);
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next 
())
+{
+  gimple *g = gsi_stmt (gsi);
+  if (is_gimple_assign (g)
+ && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR
+|| gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR)
+ && operand_equal_p (op1, gimple_assign_rhs1 (g), 0)
+ && operand_equal_p (op2, gimple_assign_rhs2 (g), 0))
+   break;

Looks overly complicated to me.  Just remember "topmost" use_stmt
alongside top_bb (looks like you'll no longer need top_bb if you
retail top_stmt).  And then do

   gsi = gsi_for_stmt (top_stmt);

and insert before that.

> b) Handling constants - I dropped handling constants in the attached
> patch. IIUC we don't want
> to enable this transform if there's a specialized expansion for some
> constants for div or mod ?

See expand_divmod which has lots of special cases for constant operands
not requiring target support for div or mod.

> I suppose this would also be target dependent and require a target hook ?
> For instance arm defines modsi3 pattern to expand mod when 2nd operand
> is constant and <= 0 or power of 2,
> while for other cases goes the expand_divmod() route to generate call
> to __aeabi_idivmod libcall.

Ok, so it lacks a signed mod instruction.

> c) Gating the divmod transform -
> I tried gating it on checks for optab_handlers on div and mod, however
> this doesn't enable transform for arm cortex-a9
> anymore (cortex-a9 doesn't have hardware instructions for integer div and 
> mod).
> IIUC for cortex-a9,
> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because
> HAVE_divsi3 is 0.
> However optab_handler (smod_optab, SImode) matches since optab_handler
> only checks for existence of pattern
> (and not whether the pattern gets matched).
> I suppose we should enable the transform only if the divmod, div, and
> mod pattern do not match rather than checking
> if the patterns exist via optab_handler ? For a general x % y, modsi3
> would fail to match but optab_handler(smod_optab, SImode ) still
> says it's matched.

Ah, of course.  Querying for an optab handler is just a cheap
guesstimate...  Not sure how to circumvent this best (sub-target
enablement of patterns).  RTL expansion just goes ahead (of course)
and sees if expansion eventually fails.  Richard?

> Should we define a new target hook combine_divmod, which returns true
> if transforming to divmod is desirable for that
> target ?
> The default definition could be:
> bool default_combine_divmod (enum machine_mode mode, tree op1, tree op2)
> {
>   // check for optab_handlers for div/mod/divmod and libfunc for divmod
> }
> 
> And for arm, it could be over-ridden to return false if op2 is
> constant and <= 0 or power of 2.
> I am not really sure if this is a good idea since I am replicating
> information from modsi3 pattern.
> Any change to the pattern may require corresponding change to the hook :/

Yeah, I don't think that is desirable.  Ideally we'd have a way
to query HAVE_* for CODE_FOR_* which would mean target-insns.def
support for all div/mod/divmod patterns(?) and queries...

Not sure if what would be enough though.

Note that the divmod check is equally flawed.

I think with the above I'd enable the transform when

+  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
+  || (optab_libfunc (divmod_optab, mode) != NULL_RTX
   && optab_handler ([su]div_optab, mode) == CODE_FOR_nothing))
+   

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2015-11-09 Thread Prathamesh Kulkarni
On 4 November 2015 at 20:35, Richard Biener  wrote:
> On Wed, 4 Nov 2015, Prathamesh Kulkarni wrote:
>
>> On 2 November 2015 at 18:31, Richard Biener  wrote:
>> > On Mon, 2 Nov 2015, Prathamesh Kulkarni wrote:
>> >
>> >> On 2 November 2015 at 13:20, Prathamesh Kulkarni
>> >>  wrote:
>> >> > On 30 October 2015 at 15:57, Richard Biener 
>> >> >  wrote:
>> >> >> On Fri, Oct 30, 2015 at 8:39 AM, Prathamesh Kulkarni
>> >> >>  wrote:
>> >> >>> Hi,
>> >> >>> I have attached revamped version of Kugan's patch
>> >> >>> (https://gcc.gnu.org/ml/gcc/2013-06/msg00100.html) for PR43721.
>> >> >>> Test-case: http://pastebin.com/QMfpXLD9
>> >> >>> divmod pass dump: http://pastebin.com/yMY1ikCp
>> >> >>> Assembly: http://pastebin.com/kk2HZpvA
>> >> >>>
>> >> >>> The approach I took is similar to sincos pass, which involves two 
>> >> >>> parts:
>> >> >>> a) divmod pass that transforms:
>> >> >>> t1 = a / b;
>> >> >>> t2 = a % b;
>> >> >>> to the following sequence:
>> >> >>> complex_tmp = DIVMOD (a, b);
>> >> >>> t1 = REALPART_EXPR(a);
>> >> >>> t2 = IMAGPART_EXPR(b);
>> >> >>>
>> >> >>> b) DIVMOD(a, b) is represented as an internal-fn and is expanded by
>> >> >>> expand_DIVMOD().
>> >> >>> I am not sure if I have done this correctly. I was somehow looking to
>> >> >>> reuse expand_divmod() but am not able to figure out how to do that
>> >> >>> (AFAIU expand_divmod() strictly returns either the quotient or
>> >> >>> remainder but never both which is what I want), so ended up with
>> >> >>> manually expanding to rtx.
>> >> >>>
>> >> >>> While going through the sincos pass in execute_cse_sincos_1, I didn't
>> >> >>> understand the following:
>> >> >>>  if (gimple_purge_dead_eh_edges (gimple_bb (stmt)))
>> >> >>>   cfg_changed = true;
>> >> >>> Um why is the call to gimple_purge_dead_eh_edges necessary here?
>> >> >>
>> >> >> The call replacement might no longer throw.
>> >> >>
>> >> >>> A silly question, what options to pass to gcc to print statistics ? I
>> >> >>> added statistics_counter_event to keep track of number of calls
>> >> >>> inserted/used but not sure how to print them :/
>> >> >>
>> >> >> -fdump-tree-XXX-stats or -fdump-statistics-stats
>> >> > Thanks, I can see it now -;)
>> >> >>
>> >> >>> I would be grateful for suggestions for improving the patch.
>> >> >>
>> >> >> First of all new functions need comments (expand_DIVMOD).
>> >> >>
>> >> >> Second - what's the reasoning for the pass placement seen?  I think
>> >> >> the transform is a lowering one, improving RTL expansion.  The
>> >> >> DIVMOD representation is harmful for GIMPLE optimizers.
>> >> >>
>> >> >> Third - I'd have integrated this with an existing pass - we have 
>> >> >> another
>> >> >> lowering / RTL expansion kind pass which is pass_optimize_widening_mul.
>> >> >> Please piggy-back on it.
>> >> >>
>> >> >> You seem to do the transform unconditionally even if the target has
>> >> >> instructions for division and modulus but no instruction for divmod
>> >> >> in which case you'll end up emitting a library call in the end instead
>> >> >> of a div and mod instruction.  I think the transform should be gated on
>> >> >> missing div/mod instructions or the availability of a divmod one.
>> >> >>
>> >> >> + if (is_gimple_assign (stmt)
>> >> >> + && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == 
>> >> >> tcc_binary)
>> >> >> +   {
>> >> >> + if (gimple_assign_rhs_code (stmt) == TRUNC_DIV_EXPR)
>> >> >> +   cfg_changed |= execute_divmod_1 (stmt);
>> >> >>
>> >> >> you can directly check gimple_assign_rhs_code.  I'd check for 
>> >> >> TRUNC_MOD_EXPR
>> >> >> which seems to be less common and thus should cut down compile-time.
>> >> >>
>> >> >> +  /* Case 2: When both are in top_bb */
>> >> >> +  else if (op1_def_in_bb && op2_def_in_bb)
>> >> >> +{
>> >> >> +  /* FIXME: better way to compare iterators?.  */
>> >> >> +  gimple_stmt_iterator gsi = get_later_stmt (top_bb,
>> >> >> def_stmt_op1, def_stmt_op2);
>> >> >>
>> >> >> You can't use get_later_stmt, it's a vectorizer speciality.  To do
>> >> >> this test efficiently
>> >> >> you need stmt UIDs computed.
>> >> >>
>> >> >> I miss a testcase (or two).
>> >> > I have tried to address your comments in the attached patch.
>> >> oops, unsigned uid_no = 0; should be outside the loop in
>> >> pass_optimize_widening_mul::execute.
>> >> Done in this version of the patch.
>> >>
>> >> Thanks,
>> >> Prathamesh
>> >> > Could you please review it for me ?
>> >
>> > For the testcases you need a new target check for divmod.
>> Sorry I didn't understand.
>> Should I add sth similar to /* {  dg-require-effective-target divmod } */ ?
>
> Yes.
>
>> >
>> >> > I have a few questions:
>> >> >
>> >> > a) Is the check for availability of divmod correct ?
>> >
>> > You simply want optab_handler (tab, mode) != 

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2015-11-04 Thread Richard Biener
On Wed, 4 Nov 2015, Prathamesh Kulkarni wrote:

> On 2 November 2015 at 18:31, Richard Biener  wrote:
> > On Mon, 2 Nov 2015, Prathamesh Kulkarni wrote:
> >
> >> On 2 November 2015 at 13:20, Prathamesh Kulkarni
> >>  wrote:
> >> > On 30 October 2015 at 15:57, Richard Biener  
> >> > wrote:
> >> >> On Fri, Oct 30, 2015 at 8:39 AM, Prathamesh Kulkarni
> >> >>  wrote:
> >> >>> Hi,
> >> >>> I have attached revamped version of Kugan's patch
> >> >>> (https://gcc.gnu.org/ml/gcc/2013-06/msg00100.html) for PR43721.
> >> >>> Test-case: http://pastebin.com/QMfpXLD9
> >> >>> divmod pass dump: http://pastebin.com/yMY1ikCp
> >> >>> Assembly: http://pastebin.com/kk2HZpvA
> >> >>>
> >> >>> The approach I took is similar to sincos pass, which involves two 
> >> >>> parts:
> >> >>> a) divmod pass that transforms:
> >> >>> t1 = a / b;
> >> >>> t2 = a % b;
> >> >>> to the following sequence:
> >> >>> complex_tmp = DIVMOD (a, b);
> >> >>> t1 = REALPART_EXPR(a);
> >> >>> t2 = IMAGPART_EXPR(b);
> >> >>>
> >> >>> b) DIVMOD(a, b) is represented as an internal-fn and is expanded by
> >> >>> expand_DIVMOD().
> >> >>> I am not sure if I have done this correctly. I was somehow looking to
> >> >>> reuse expand_divmod() but am not able to figure out how to do that
> >> >>> (AFAIU expand_divmod() strictly returns either the quotient or
> >> >>> remainder but never both which is what I want), so ended up with
> >> >>> manually expanding to rtx.
> >> >>>
> >> >>> While going through the sincos pass in execute_cse_sincos_1, I didn't
> >> >>> understand the following:
> >> >>>  if (gimple_purge_dead_eh_edges (gimple_bb (stmt)))
> >> >>>   cfg_changed = true;
> >> >>> Um why is the call to gimple_purge_dead_eh_edges necessary here?
> >> >>
> >> >> The call replacement might no longer throw.
> >> >>
> >> >>> A silly question, what options to pass to gcc to print statistics ? I
> >> >>> added statistics_counter_event to keep track of number of calls
> >> >>> inserted/used but not sure how to print them :/
> >> >>
> >> >> -fdump-tree-XXX-stats or -fdump-statistics-stats
> >> > Thanks, I can see it now -;)
> >> >>
> >> >>> I would be grateful for suggestions for improving the patch.
> >> >>
> >> >> First of all new functions need comments (expand_DIVMOD).
> >> >>
> >> >> Second - what's the reasoning for the pass placement seen?  I think
> >> >> the transform is a lowering one, improving RTL expansion.  The
> >> >> DIVMOD representation is harmful for GIMPLE optimizers.
> >> >>
> >> >> Third - I'd have integrated this with an existing pass - we have another
> >> >> lowering / RTL expansion kind pass which is pass_optimize_widening_mul.
> >> >> Please piggy-back on it.
> >> >>
> >> >> You seem to do the transform unconditionally even if the target has
> >> >> instructions for division and modulus but no instruction for divmod
> >> >> in which case you'll end up emitting a library call in the end instead
> >> >> of a div and mod instruction.  I think the transform should be gated on
> >> >> missing div/mod instructions or the availability of a divmod one.
> >> >>
> >> >> + if (is_gimple_assign (stmt)
> >> >> + && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == 
> >> >> tcc_binary)
> >> >> +   {
> >> >> + if (gimple_assign_rhs_code (stmt) == TRUNC_DIV_EXPR)
> >> >> +   cfg_changed |= execute_divmod_1 (stmt);
> >> >>
> >> >> you can directly check gimple_assign_rhs_code.  I'd check for 
> >> >> TRUNC_MOD_EXPR
> >> >> which seems to be less common and thus should cut down compile-time.
> >> >>
> >> >> +  /* Case 2: When both are in top_bb */
> >> >> +  else if (op1_def_in_bb && op2_def_in_bb)
> >> >> +{
> >> >> +  /* FIXME: better way to compare iterators?.  */
> >> >> +  gimple_stmt_iterator gsi = get_later_stmt (top_bb,
> >> >> def_stmt_op1, def_stmt_op2);
> >> >>
> >> >> You can't use get_later_stmt, it's a vectorizer speciality.  To do
> >> >> this test efficiently
> >> >> you need stmt UIDs computed.
> >> >>
> >> >> I miss a testcase (or two).
> >> > I have tried to address your comments in the attached patch.
> >> oops, unsigned uid_no = 0; should be outside the loop in
> >> pass_optimize_widening_mul::execute.
> >> Done in this version of the patch.
> >>
> >> Thanks,
> >> Prathamesh
> >> > Could you please review it for me ?
> >
> > For the testcases you need a new target check for divmod.
> Sorry I didn't understand.
> Should I add sth similar to /* {  dg-require-effective-target divmod } */ ?

Yes.

> >
> >> > I have a few questions:
> >> >
> >> > a) Is the check for availability of divmod correct ?
> >
> > You simply want optab_handler (tab, mode) != CODE_FOR_nothing
> >
> > The libfunc is always available.
> Um I am probably missing something, I thought libfuncs are initialized
> by init_optabs()
> but the target can override/delete that with the target 

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2015-11-03 Thread Prathamesh Kulkarni
On 2 November 2015 at 18:31, Richard Biener  wrote:
> On Mon, 2 Nov 2015, Prathamesh Kulkarni wrote:
>
>> On 2 November 2015 at 13:20, Prathamesh Kulkarni
>>  wrote:
>> > On 30 October 2015 at 15:57, Richard Biener  
>> > wrote:
>> >> On Fri, Oct 30, 2015 at 8:39 AM, Prathamesh Kulkarni
>> >>  wrote:
>> >>> Hi,
>> >>> I have attached revamped version of Kugan's patch
>> >>> (https://gcc.gnu.org/ml/gcc/2013-06/msg00100.html) for PR43721.
>> >>> Test-case: http://pastebin.com/QMfpXLD9
>> >>> divmod pass dump: http://pastebin.com/yMY1ikCp
>> >>> Assembly: http://pastebin.com/kk2HZpvA
>> >>>
>> >>> The approach I took is similar to sincos pass, which involves two parts:
>> >>> a) divmod pass that transforms:
>> >>> t1 = a / b;
>> >>> t2 = a % b;
>> >>> to the following sequence:
>> >>> complex_tmp = DIVMOD (a, b);
>> >>> t1 = REALPART_EXPR(a);
>> >>> t2 = IMAGPART_EXPR(b);
>> >>>
>> >>> b) DIVMOD(a, b) is represented as an internal-fn and is expanded by
>> >>> expand_DIVMOD().
>> >>> I am not sure if I have done this correctly. I was somehow looking to
>> >>> reuse expand_divmod() but am not able to figure out how to do that
>> >>> (AFAIU expand_divmod() strictly returns either the quotient or
>> >>> remainder but never both which is what I want), so ended up with
>> >>> manually expanding to rtx.
>> >>>
>> >>> While going through the sincos pass in execute_cse_sincos_1, I didn't
>> >>> understand the following:
>> >>>  if (gimple_purge_dead_eh_edges (gimple_bb (stmt)))
>> >>>   cfg_changed = true;
>> >>> Um why is the call to gimple_purge_dead_eh_edges necessary here?
>> >>
>> >> The call replacement might no longer throw.
>> >>
>> >>> A silly question, what options to pass to gcc to print statistics ? I
>> >>> added statistics_counter_event to keep track of number of calls
>> >>> inserted/used but not sure how to print them :/
>> >>
>> >> -fdump-tree-XXX-stats or -fdump-statistics-stats
>> > Thanks, I can see it now -;)
>> >>
>> >>> I would be grateful for suggestions for improving the patch.
>> >>
>> >> First of all new functions need comments (expand_DIVMOD).
>> >>
>> >> Second - what's the reasoning for the pass placement seen?  I think
>> >> the transform is a lowering one, improving RTL expansion.  The
>> >> DIVMOD representation is harmful for GIMPLE optimizers.
>> >>
>> >> Third - I'd have integrated this with an existing pass - we have another
>> >> lowering / RTL expansion kind pass which is pass_optimize_widening_mul.
>> >> Please piggy-back on it.
>> >>
>> >> You seem to do the transform unconditionally even if the target has
>> >> instructions for division and modulus but no instruction for divmod
>> >> in which case you'll end up emitting a library call in the end instead
>> >> of a div and mod instruction.  I think the transform should be gated on
>> >> missing div/mod instructions or the availability of a divmod one.
>> >>
>> >> + if (is_gimple_assign (stmt)
>> >> + && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == 
>> >> tcc_binary)
>> >> +   {
>> >> + if (gimple_assign_rhs_code (stmt) == TRUNC_DIV_EXPR)
>> >> +   cfg_changed |= execute_divmod_1 (stmt);
>> >>
>> >> you can directly check gimple_assign_rhs_code.  I'd check for 
>> >> TRUNC_MOD_EXPR
>> >> which seems to be less common and thus should cut down compile-time.
>> >>
>> >> +  /* Case 2: When both are in top_bb */
>> >> +  else if (op1_def_in_bb && op2_def_in_bb)
>> >> +{
>> >> +  /* FIXME: better way to compare iterators?.  */
>> >> +  gimple_stmt_iterator gsi = get_later_stmt (top_bb,
>> >> def_stmt_op1, def_stmt_op2);
>> >>
>> >> You can't use get_later_stmt, it's a vectorizer speciality.  To do
>> >> this test efficiently
>> >> you need stmt UIDs computed.
>> >>
>> >> I miss a testcase (or two).
>> > I have tried to address your comments in the attached patch.
>> oops, unsigned uid_no = 0; should be outside the loop in
>> pass_optimize_widening_mul::execute.
>> Done in this version of the patch.
>>
>> Thanks,
>> Prathamesh
>> > Could you please review it for me ?
>
> For the testcases you need a new target check for divmod.
Sorry I didn't understand.
Should I add sth similar to /* {  dg-require-effective-target divmod } */ ?
>
>> > I have a few questions:
>> >
>> > a) Is the check for availability of divmod correct ?
>
> You simply want optab_handler (tab, mode) != CODE_FOR_nothing
>
> The libfunc is always available.
Um I am probably missing something, I thought libfuncs are initialized
by init_optabs()
but the target can override/delete that with the target hook
targetm.init_libfuncs () ?
On AArch64 optab_libfunc (sdivmod_optab, SImode) returned NULL_RTX.
>
> When looking at TRUNC_DIV_EXPR you should also exclude
> the case where TYPE_OVERFLOW_TRAPS (type) as that should
> expand using the [su]divv optabs (no trapping overflow
> divmod optab 

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2015-11-02 Thread Richard Biener
On Mon, 2 Nov 2015, Prathamesh Kulkarni wrote:

> On 2 November 2015 at 13:20, Prathamesh Kulkarni
>  wrote:
> > On 30 October 2015 at 15:57, Richard Biener  
> > wrote:
> >> On Fri, Oct 30, 2015 at 8:39 AM, Prathamesh Kulkarni
> >>  wrote:
> >>> Hi,
> >>> I have attached revamped version of Kugan's patch
> >>> (https://gcc.gnu.org/ml/gcc/2013-06/msg00100.html) for PR43721.
> >>> Test-case: http://pastebin.com/QMfpXLD9
> >>> divmod pass dump: http://pastebin.com/yMY1ikCp
> >>> Assembly: http://pastebin.com/kk2HZpvA
> >>>
> >>> The approach I took is similar to sincos pass, which involves two parts:
> >>> a) divmod pass that transforms:
> >>> t1 = a / b;
> >>> t2 = a % b;
> >>> to the following sequence:
> >>> complex_tmp = DIVMOD (a, b);
> >>> t1 = REALPART_EXPR(a);
> >>> t2 = IMAGPART_EXPR(b);
> >>>
> >>> b) DIVMOD(a, b) is represented as an internal-fn and is expanded by
> >>> expand_DIVMOD().
> >>> I am not sure if I have done this correctly. I was somehow looking to
> >>> reuse expand_divmod() but am not able to figure out how to do that
> >>> (AFAIU expand_divmod() strictly returns either the quotient or
> >>> remainder but never both which is what I want), so ended up with
> >>> manually expanding to rtx.
> >>>
> >>> While going through the sincos pass in execute_cse_sincos_1, I didn't
> >>> understand the following:
> >>>  if (gimple_purge_dead_eh_edges (gimple_bb (stmt)))
> >>>   cfg_changed = true;
> >>> Um why is the call to gimple_purge_dead_eh_edges necessary here?
> >>
> >> The call replacement might no longer throw.
> >>
> >>> A silly question, what options to pass to gcc to print statistics ? I
> >>> added statistics_counter_event to keep track of number of calls
> >>> inserted/used but not sure how to print them :/
> >>
> >> -fdump-tree-XXX-stats or -fdump-statistics-stats
> > Thanks, I can see it now -;)
> >>
> >>> I would be grateful for suggestions for improving the patch.
> >>
> >> First of all new functions need comments (expand_DIVMOD).
> >>
> >> Second - what's the reasoning for the pass placement seen?  I think
> >> the transform is a lowering one, improving RTL expansion.  The
> >> DIVMOD representation is harmful for GIMPLE optimizers.
> >>
> >> Third - I'd have integrated this with an existing pass - we have another
> >> lowering / RTL expansion kind pass which is pass_optimize_widening_mul.
> >> Please piggy-back on it.
> >>
> >> You seem to do the transform unconditionally even if the target has
> >> instructions for division and modulus but no instruction for divmod
> >> in which case you'll end up emitting a library call in the end instead
> >> of a div and mod instruction.  I think the transform should be gated on
> >> missing div/mod instructions or the availability of a divmod one.
> >>
> >> + if (is_gimple_assign (stmt)
> >> + && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == 
> >> tcc_binary)
> >> +   {
> >> + if (gimple_assign_rhs_code (stmt) == TRUNC_DIV_EXPR)
> >> +   cfg_changed |= execute_divmod_1 (stmt);
> >>
> >> you can directly check gimple_assign_rhs_code.  I'd check for 
> >> TRUNC_MOD_EXPR
> >> which seems to be less common and thus should cut down compile-time.
> >>
> >> +  /* Case 2: When both are in top_bb */
> >> +  else if (op1_def_in_bb && op2_def_in_bb)
> >> +{
> >> +  /* FIXME: better way to compare iterators?.  */
> >> +  gimple_stmt_iterator gsi = get_later_stmt (top_bb,
> >> def_stmt_op1, def_stmt_op2);
> >>
> >> You can't use get_later_stmt, it's a vectorizer speciality.  To do
> >> this test efficiently
> >> you need stmt UIDs computed.
> >>
> >> I miss a testcase (or two).
> > I have tried to address your comments in the attached patch.
> oops, unsigned uid_no = 0; should be outside the loop in
> pass_optimize_widening_mul::execute.
> Done in this version of the patch.
> 
> Thanks,
> Prathamesh
> > Could you please review it for me ?

For the testcases you need a new target check for divmod.

> > I have a few questions:
> >
> > a) Is the check for availability of divmod correct ?

You simply want optab_handler (tab, mode) != CODE_FOR_nothing

The libfunc is always available.

When looking at TRUNC_DIV_EXPR you should also exclude
the case where TYPE_OVERFLOW_TRAPS (type) as that should
expand using the [su]divv optabs (no trapping overflow
divmod optab exists).

> > b) Assume TRUNC_DIV_EXPR stmt is in bb0 and TRUNC_DIV_MOD in bb1.
> > I choose to transform if bb0 dominates bb1 or bb1 dominates bb0 (or bb0 == 
> > bb1).
> > However I wonder if we should also check that if bb0 dominates bb1
> > then bb1 should
> > post-dominate bb0 ?

Depends on how expensive divmod is compared to div or mod.  An alternative
transform would be to look whether [us]mod optabs are available and if not
implement modulo via div and subtract, CSEing the division itself.  Not
sure if any target that 

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2015-11-02 Thread Prathamesh Kulkarni
On 2 November 2015 at 13:20, Prathamesh Kulkarni
 wrote:
> On 30 October 2015 at 15:57, Richard Biener  
> wrote:
>> On Fri, Oct 30, 2015 at 8:39 AM, Prathamesh Kulkarni
>>  wrote:
>>> Hi,
>>> I have attached revamped version of Kugan's patch
>>> (https://gcc.gnu.org/ml/gcc/2013-06/msg00100.html) for PR43721.
>>> Test-case: http://pastebin.com/QMfpXLD9
>>> divmod pass dump: http://pastebin.com/yMY1ikCp
>>> Assembly: http://pastebin.com/kk2HZpvA
>>>
>>> The approach I took is similar to sincos pass, which involves two parts:
>>> a) divmod pass that transforms:
>>> t1 = a / b;
>>> t2 = a % b;
>>> to the following sequence:
>>> complex_tmp = DIVMOD (a, b);
>>> t1 = REALPART_EXPR(a);
>>> t2 = IMAGPART_EXPR(b);
>>>
>>> b) DIVMOD(a, b) is represented as an internal-fn and is expanded by
>>> expand_DIVMOD().
>>> I am not sure if I have done this correctly. I was somehow looking to
>>> reuse expand_divmod() but am not able to figure out how to do that
>>> (AFAIU expand_divmod() strictly returns either the quotient or
>>> remainder but never both which is what I want), so ended up with
>>> manually expanding to rtx.
>>>
>>> While going through the sincos pass in execute_cse_sincos_1, I didn't
>>> understand the following:
>>>  if (gimple_purge_dead_eh_edges (gimple_bb (stmt)))
>>>   cfg_changed = true;
>>> Um why is the call to gimple_purge_dead_eh_edges necessary here?
>>
>> The call replacement might no longer throw.
>>
>>> A silly question, what options to pass to gcc to print statistics ? I
>>> added statistics_counter_event to keep track of number of calls
>>> inserted/used but not sure how to print them :/
>>
>> -fdump-tree-XXX-stats or -fdump-statistics-stats
> Thanks, I can see it now -;)
>>
>>> I would be grateful for suggestions for improving the patch.
>>
>> First of all new functions need comments (expand_DIVMOD).
>>
>> Second - what's the reasoning for the pass placement seen?  I think
>> the transform is a lowering one, improving RTL expansion.  The
>> DIVMOD representation is harmful for GIMPLE optimizers.
>>
>> Third - I'd have integrated this with an existing pass - we have another
>> lowering / RTL expansion kind pass which is pass_optimize_widening_mul.
>> Please piggy-back on it.
>>
>> You seem to do the transform unconditionally even if the target has
>> instructions for division and modulus but no instruction for divmod
>> in which case you'll end up emitting a library call in the end instead
>> of a div and mod instruction.  I think the transform should be gated on
>> missing div/mod instructions or the availability of a divmod one.
>>
>> + if (is_gimple_assign (stmt)
>> + && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == 
>> tcc_binary)
>> +   {
>> + if (gimple_assign_rhs_code (stmt) == TRUNC_DIV_EXPR)
>> +   cfg_changed |= execute_divmod_1 (stmt);
>>
>> you can directly check gimple_assign_rhs_code.  I'd check for TRUNC_MOD_EXPR
>> which seems to be less common and thus should cut down compile-time.
>>
>> +  /* Case 2: When both are in top_bb */
>> +  else if (op1_def_in_bb && op2_def_in_bb)
>> +{
>> +  /* FIXME: better way to compare iterators?.  */
>> +  gimple_stmt_iterator gsi = get_later_stmt (top_bb,
>> def_stmt_op1, def_stmt_op2);
>>
>> You can't use get_later_stmt, it's a vectorizer speciality.  To do
>> this test efficiently
>> you need stmt UIDs computed.
>>
>> I miss a testcase (or two).
> I have tried to address your comments in the attached patch.
oops, unsigned uid_no = 0; should be outside the loop in
pass_optimize_widening_mul::execute.
Done in this version of the patch.

Thanks,
Prathamesh
> Could you please review it for me ?
>
> I have a few questions:
>
> a) Is the check for availability of divmod correct ?
>
> b) Assume TRUNC_DIV_EXPR stmt is in bb0 and TRUNC_DIV_MOD in bb1.
> I choose to transform if bb0 dominates bb1 or bb1 dominates bb0 (or bb0 == 
> bb1).
> However I wonder if we should also check that if bb0 dominates bb1
> then bb1 should
> post-dominate bb0 ?
> For instance the transform gets applied to test-case in pr43721-2.c.
> bb containing rem = x % y doesn't post-dominate bb containing quot = x / y;
> I wonder if we want to perform the transform for this case since control flow
> may not always reach from quot = x / y to rem = x % y ?
>
> c) A silly queston, I wonder if vec stmts  in convert_to_divmod() 
> will
> have at most 2 entries (assuming TRUNC_MOD_EXPR stmt is also added in
> the vector) ?
> I assume that cse will take place before widening_mul pass executes
> (since pass_fre/pass_fre executes earlier), and there will be no
> duplicate gimple stmts ?
> So vec stmts would contain at most 2 entries - gimple stmt
> with subcode  TRUNC_MOD_EXPR
> and gimple stmt with subcode TRUNC_DIV_EXPR having same operands.
> There won't be another gimple stmt with subcode TRUNC_DIV_EXPR with
> same 

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2015-11-01 Thread Prathamesh Kulkarni
On 30 October 2015 at 15:57, Richard Biener  wrote:
> On Fri, Oct 30, 2015 at 8:39 AM, Prathamesh Kulkarni
>  wrote:
>> Hi,
>> I have attached revamped version of Kugan's patch
>> (https://gcc.gnu.org/ml/gcc/2013-06/msg00100.html) for PR43721.
>> Test-case: http://pastebin.com/QMfpXLD9
>> divmod pass dump: http://pastebin.com/yMY1ikCp
>> Assembly: http://pastebin.com/kk2HZpvA
>>
>> The approach I took is similar to sincos pass, which involves two parts:
>> a) divmod pass that transforms:
>> t1 = a / b;
>> t2 = a % b;
>> to the following sequence:
>> complex_tmp = DIVMOD (a, b);
>> t1 = REALPART_EXPR(a);
>> t2 = IMAGPART_EXPR(b);
>>
>> b) DIVMOD(a, b) is represented as an internal-fn and is expanded by
>> expand_DIVMOD().
>> I am not sure if I have done this correctly. I was somehow looking to
>> reuse expand_divmod() but am not able to figure out how to do that
>> (AFAIU expand_divmod() strictly returns either the quotient or
>> remainder but never both which is what I want), so ended up with
>> manually expanding to rtx.
>>
>> While going through the sincos pass in execute_cse_sincos_1, I didn't
>> understand the following:
>>  if (gimple_purge_dead_eh_edges (gimple_bb (stmt)))
>>   cfg_changed = true;
>> Um why is the call to gimple_purge_dead_eh_edges necessary here?
>
> The call replacement might no longer throw.
>
>> A silly question, what options to pass to gcc to print statistics ? I
>> added statistics_counter_event to keep track of number of calls
>> inserted/used but not sure how to print them :/
>
> -fdump-tree-XXX-stats or -fdump-statistics-stats
Thanks, I can see it now -;)
>
>> I would be grateful for suggestions for improving the patch.
>
> First of all new functions need comments (expand_DIVMOD).
>
> Second - what's the reasoning for the pass placement seen?  I think
> the transform is a lowering one, improving RTL expansion.  The
> DIVMOD representation is harmful for GIMPLE optimizers.
>
> Third - I'd have integrated this with an existing pass - we have another
> lowering / RTL expansion kind pass which is pass_optimize_widening_mul.
> Please piggy-back on it.
>
> You seem to do the transform unconditionally even if the target has
> instructions for division and modulus but no instruction for divmod
> in which case you'll end up emitting a library call in the end instead
> of a div and mod instruction.  I think the transform should be gated on
> missing div/mod instructions or the availability of a divmod one.
>
> + if (is_gimple_assign (stmt)
> + && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == 
> tcc_binary)
> +   {
> + if (gimple_assign_rhs_code (stmt) == TRUNC_DIV_EXPR)
> +   cfg_changed |= execute_divmod_1 (stmt);
>
> you can directly check gimple_assign_rhs_code.  I'd check for TRUNC_MOD_EXPR
> which seems to be less common and thus should cut down compile-time.
>
> +  /* Case 2: When both are in top_bb */
> +  else if (op1_def_in_bb && op2_def_in_bb)
> +{
> +  /* FIXME: better way to compare iterators?.  */
> +  gimple_stmt_iterator gsi = get_later_stmt (top_bb,
> def_stmt_op1, def_stmt_op2);
>
> You can't use get_later_stmt, it's a vectorizer speciality.  To do
> this test efficiently
> you need stmt UIDs computed.
>
> I miss a testcase (or two).
I have tried to address your comments in the attached patch.
Could you please review it for me ?

I have a few questions:

a) Is the check for availability of divmod correct ?

b) Assume TRUNC_DIV_EXPR stmt is in bb0 and TRUNC_DIV_MOD in bb1.
I choose to transform if bb0 dominates bb1 or bb1 dominates bb0 (or bb0 == bb1).
However I wonder if we should also check that if bb0 dominates bb1
then bb1 should
post-dominate bb0 ?
For instance the transform gets applied to test-case in pr43721-2.c.
bb containing rem = x % y doesn't post-dominate bb containing quot = x / y;
I wonder if we want to perform the transform for this case since control flow
may not always reach from quot = x / y to rem = x % y ?

c) A silly queston, I wonder if vec stmts  in convert_to_divmod() will
have at most 2 entries (assuming TRUNC_MOD_EXPR stmt is also added in
the vector) ?
I assume that cse will take place before widening_mul pass executes
(since pass_fre/pass_fre executes earlier), and there will be no
duplicate gimple stmts ?
So vec stmts would contain at most 2 entries - gimple stmt
with subcode  TRUNC_MOD_EXPR
and gimple stmt with subcode TRUNC_DIV_EXPR having same operands.
There won't be another gimple stmt with subcode TRUNC_DIV_EXPR with
same operands ?

d) I am not sure if I have correctly done expansion of divmod to rtx
in expand_DIVMOD() (I fear I may be missing
many checks that are in expmed.c:expand_divmod).

Thanks,
Prathamesh
>
> Richard.
>
>
>> Thank you,
>> Prathamesh
2015-11-02  Prathamesh Kulkarni  
Kugan Vivekanandarajah   

 

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call

2015-10-30 Thread Richard Biener
On Fri, Oct 30, 2015 at 8:39 AM, Prathamesh Kulkarni
 wrote:
> Hi,
> I have attached revamped version of Kugan's patch
> (https://gcc.gnu.org/ml/gcc/2013-06/msg00100.html) for PR43721.
> Test-case: http://pastebin.com/QMfpXLD9
> divmod pass dump: http://pastebin.com/yMY1ikCp
> Assembly: http://pastebin.com/kk2HZpvA
>
> The approach I took is similar to sincos pass, which involves two parts:
> a) divmod pass that transforms:
> t1 = a / b;
> t2 = a % b;
> to the following sequence:
> complex_tmp = DIVMOD (a, b);
> t1 = REALPART_EXPR(a);
> t2 = IMAGPART_EXPR(b);
>
> b) DIVMOD(a, b) is represented as an internal-fn and is expanded by
> expand_DIVMOD().
> I am not sure if I have done this correctly. I was somehow looking to
> reuse expand_divmod() but am not able to figure out how to do that
> (AFAIU expand_divmod() strictly returns either the quotient or
> remainder but never both which is what I want), so ended up with
> manually expanding to rtx.
>
> While going through the sincos pass in execute_cse_sincos_1, I didn't
> understand the following:
>  if (gimple_purge_dead_eh_edges (gimple_bb (stmt)))
>   cfg_changed = true;
> Um why is the call to gimple_purge_dead_eh_edges necessary here?

The call replacement might no longer throw.

> A silly question, what options to pass to gcc to print statistics ? I
> added statistics_counter_event to keep track of number of calls
> inserted/used but not sure how to print them :/

-fdump-tree-XXX-stats or -fdump-statistics-stats

> I would be grateful for suggestions for improving the patch.

First of all new functions need comments (expand_DIVMOD).

Second - what's the reasoning for the pass placement seen?  I think
the transform is a lowering one, improving RTL expansion.  The
DIVMOD representation is harmful for GIMPLE optimizers.

Third - I'd have integrated this with an existing pass - we have another
lowering / RTL expansion kind pass which is pass_optimize_widening_mul.
Please piggy-back on it.

You seem to do the transform unconditionally even if the target has
instructions for division and modulus but no instruction for divmod
in which case you'll end up emitting a library call in the end instead
of a div and mod instruction.  I think the transform should be gated on
missing div/mod instructions or the availability of a divmod one.

+ if (is_gimple_assign (stmt)
+ && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_binary)
+   {
+ if (gimple_assign_rhs_code (stmt) == TRUNC_DIV_EXPR)
+   cfg_changed |= execute_divmod_1 (stmt);

you can directly check gimple_assign_rhs_code.  I'd check for TRUNC_MOD_EXPR
which seems to be less common and thus should cut down compile-time.

+  /* Case 2: When both are in top_bb */
+  else if (op1_def_in_bb && op2_def_in_bb)
+{
+  /* FIXME: better way to compare iterators?.  */
+  gimple_stmt_iterator gsi = get_later_stmt (top_bb,
def_stmt_op1, def_stmt_op2);

You can't use get_later_stmt, it's a vectorizer speciality.  To do
this test efficiently
you need stmt UIDs computed.

I miss a testcase (or two).

Richard.


> Thank you,
> Prathamesh