Re: [Ping] [Patch C++] RFC Fix PR57958

2014-04-18 Thread Dinar Temirbulatov
Hello,
Here is another version of the fix. This time, I added a
complete_type_or_else call just before aggregate_value_p. Bootstrapped
and tested on x86_64-pc-linux-gnu with no new regressions. OK to apply
to trunk?
 Thanks, Dinar.


On Mon, Apr 7, 2014 at 11:47 PM, Jason Merrill ja...@redhat.com wrote:
 On 04/07/2014 03:46 PM, Jason Merrill wrote:

 I guess we need to call complete_type before aggregate_value_p.


 complete_type_or_else, actually.

 Jason


2014-04-18  Dinar Temirbulatov  <dtemirbula...@gmail.com>

PR c++/57958
* semantics.c (apply_deduced_return_type): Complete non-void type
before estimating whether the type is aggregate.


fix.patch
Description: Binary data


Re: [Ping] [Patch C++] RFC Fix PR57958

2014-04-18 Thread Dinar Temirbulatov
I found a typo in the testcase header and fixed it. OK to apply to trunk?
thanks, Dinar.

On Fri, Apr 18, 2014 at 1:13 PM, Dinar Temirbulatov
dtemirbula...@gmail.com wrote:
 Hello,
 Here is another version of the fix. This time, I added a
 complete_type_or_else call just before aggregate_value_p. Bootstrapped
 and tested on x86_64-pc-linux-gnu with no new regressions. OK to apply
 to trunk?
  Thanks, Dinar.


 On Mon, Apr 7, 2014 at 11:47 PM, Jason Merrill ja...@redhat.com wrote:
 On 04/07/2014 03:46 PM, Jason Merrill wrote:

 I guess we need to call complete_type before aggregate_value_p.


 complete_type_or_else, actually.

 Jason


2014-04-18  Dinar Temirbulatov  <dtemirbula...@gmail.com>

PR c++/57958
* semantics.c (apply_deduced_return_type): Complete non-void type
before estimating whether the type is aggregate.


fix.patch
Description: Binary data


[Ping] [Patch C++] RFC Fix PR57958

2014-04-07 Thread Dinar Temirbulatov
Hi,
I revised the patch from
http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00083.html.
This change fixes PR57958 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57958).
For this code:
  auto fn = [] (const FooData x) {
    return (x);
  };

  {
    FooData a;
    fn(a);
  }
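For reference, a self-contained variant of the reduced example might look like
the sketch below; the definition of FooData is not given in the thread, so a
small class with a user-provided copy constructor and destructor is assumed
here, purely to reproduce the shape of the dumps that follow.

struct FooData
{
  int value;
  FooData () : value (0) {}
  FooData (const FooData &other) : value (other.value) {} /* non-trivial copy */
  ~FooData () {}                                          /* non-trivial dtor */
};

int main (int, char **)
{
  /* The lambda's deduced (auto) return type resolves to the class type
     FooData, so returning x needs a constructor call for the return value.  */
  auto fn = [] (const FooData x) {
    return (x);
  };

  {
    FooData a;
    fn (a);
  }
  return 0;
}

Compiling a file like this with -std=c++11 -fdump-tree-gimple should produce
dumps of the kind shown below.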
The current version of trunk generates the following gimple:

main(int, char**)::<lambda(const FooData)> (const struct __lambda0
* const __closure, const struct Foo & x)
{
  struct Foo D.2089;

  FooData::Foo (&D.2089, x);
  try
    {
      return <retval>;
    }
  finally
    {
      FooData::~Foo (&D.2089);
      D.2089 = {CLOBBER};
    }
}

And with the fix applied, the generated gimple looks like this:

main(int, char**)::<lambda(const FooData)> (const struct __lambda0
* const __closure, const struct Foo & x)
{
  FooData::Foo (&<retval>, x);
  return <retval>;
}

Looking at the code that I think caused the issue (cp/typeck.c:8600):
  /* We can't initialize a register from a AGGR_INIT_EXPR.  */
  else if (! cfun->returns_struct
           && TREE_CODE (retval) == TARGET_EXPR
           && TREE_CODE (TREE_OPERAND (retval, 1)) == AGGR_INIT_EXPR)
    retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
                     TREE_OPERAND (retval, 0));
It is not clear to me what the purpose of constructing a COMPOUND_EXPR here
is; the comment says "We can't initialize a register from a AGGR_INIT_EXPR.",
but how could we determine on the front-end side what is going to end up in
a register and what is not? The code looks like a hack for some middle-end
issue from a while back. I hoped to trigger a code-generation problem by
removing the code and running the regression testsuite on x86_64-pc-linux-gnu,
but no regressions occurred. The attached patch was tested on
x86_64-pc-linux-gnu with no new regressions. OK to apply to trunk?
 Thanks in advance, Dinar.


PR57958.patch
Description: Binary data


[Patch C++] PR57958 RFC

2014-04-02 Thread Dinar Temirbulatov
Hi,
The following change fixes gimple production for lambda functions. In the
patch I assumed that constructing a COMPOUND_EXPR for the return value of an
auto-return-type function that resolves to a CLASS_TYPE_P is wrong. Tested by
applying to trunk on x86_64-pc-linux-gnu with no new regressions.
Thanks, Dinar.


fix1.patch
Description: Binary data


[Ping] [Google] Fix profiledbootstrap failure

2013-08-01 Thread Dinar Temirbulatov
Ping?
Hi,
Here is the patch, tested by profiledbootstrap. OK for google gcc-4.8?
thanks, Dinar.


profiledbootstrap-fix1.patch
Description: Binary data


Re: [Google] Fix profiledbootstrap failure

2013-07-31 Thread Dinar Temirbulatov
Hi,
Here is the patch, tested by profiledbootstrap. OK for google gcc-4.8?
thanks, Dinar.

On Wed, Jul 31, 2013 at 12:01 AM, Rong Xu x...@google.com wrote:
 Will do.

 The patch was in gcc-4_7 by Dehao.

 r194713 | dehao | 2012-12-24 16:49:06 -0800 (Mon, 24 Dec 2012) | 5 lines

 Fix the profiled bootstrap:

 1. Set the default value of gcov-debug to be 0.
 2. Merge profile summaries from different instrumented binaries.

 On Tue, Jul 30, 2013 at 12:58 PM, Xinliang David Li davi...@google.com 
 wrote:
 Ok. Rong, can you help commit the parameter default setting patch?

 thanks,

 David

 On Tue, Jul 30, 2013 at 12:48 PM, Rong Xu x...@google.com wrote:
 We have seen the issue before. It does fail the profile bootstrap as it
 reads a wrong gcda file.
 I thought it had been fixed. (The fix was as David mentioned, setting
 the default value of the parameter to 0).

 -Rong

 On Tue, Jul 30, 2013 at 12:02 PM, Xinliang David Li davi...@google.com 
 wrote:
 I need to understand why this affects profile bootstrap -- is this due
 to file name conflict?

 The fix is wrong -- please do not remove the parameter. If it is a
 problem, a better fix is to change the default parameter value to 0.

 David


 On Tue, Jul 30, 2013 at 11:56 AM, Teresa Johnson tejohn...@google.com 
 wrote:
 cc'ing Rong and David since this came from LIPO support.

 The patch as-is removes the one use of PARAM_GCOV_DEBUG (which is on
 by default) without removing the parameter itself. What is the failure
 mode you see from this code?

 Thanks,
 Teresa

 On Tue, Jul 30, 2013 at 11:50 AM, Dinar Temirbulatov
 di...@kugelworks.com wrote:
 Hello

 This change allows profiledbootstrap to complete on the google gcc-4.8
 branch; tested with make bootstrap with no new regressions.  OK for
 google 4.8?
thanks, Dinar.



 --
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


profiledbootstrap-fix1.patch
Description: Binary data


[Google] Fix profiledbootstrap failure

2013-07-30 Thread Dinar Temirbulatov
Hello

This change allows profiledbootstrap to complete on the google gcc-4.8
branch; tested with make bootstrap with no new regressions.  OK for
google 4.8?
   thanks, Dinar.


profiledbootstrap-fix.patch
Description: Binary data


Re: [Google] Fix profiledbootstrap failure

2013-07-30 Thread Dinar Temirbulatov
 I need to understand why this affects profile bootstrap -- is this due
 to file name conflict?
Yes, it is simple: during profiledbootstrap on the x86_64 platform, while
compiling the PIC version of libiberty, the compiler picks up an incorrect
profile (the non-PIC version) from the current directory, and
profiledbootstrap fails.
Here is log:
if [ x-fpic != x ]; then \
  /home/dinar/bugz/x86_64/build-gcc-4_8-svn-orig/./prev-gcc/xgcc
-B/home/dinar/bugz/x86_64/bisect/build-gcc-4_8-svn-orig/./prev-gcc/
-B/tmp/x86_64-unknown-linux-gnu/bin/
-B/tmp/x86_64-unknown-linux-gnu/bin/
-B/tmp/x86_64-unknown-linux-gnu/lib/ -isystem
/tmp/x86_64-unknown-linux-gnu/include -isystem
/tmp/x86_64-unknown-linux-gnu/sys-include -c -DHAVE_CONFIG_H -g -O2
-fprofile-use  -I. -I../../gcc-4_8-svn-orig/libiberty/../include  -W
-Wall -Wwrite-strings -Wc++-compat -Wstrict-prototypes -pedantic
-fpic ../../gcc-4_8-svn-orig/libiberty/dwarfnames.c -o
pic/dwarfnames.o; \
else true; fi
yes
../../gcc-4_8-svn-orig/libiberty/dwarfnames.c:75:0: error: the control
flow of function ‘get_DW_CFA_name’ does not match its profile data
(counter ‘arcs’) [-Werror=coverage-mismatch]

 ^
../../gcc-4_8-svn-orig/libiberty/dwarfnames.c:75:0: note: use
-Wno-error=coverage-mismatch to tolerate the mismatch but performance
may drop if the function is hot
../../gcc-4_8-svn-orig/libiberty/dwarfnames.c: In function ‘get_DW_ATE_name’:
../../gcc-4_8-svn-orig/libiberty/dwarfnames.c:75:0: error: the control
flow of function ‘get_DW_ATE_name’ does not match its profile data
(counter ‘arcs’) [-Werror=coverage-mismatch]
../../gcc-4_8-svn-orig/libiberty/dwarfnames.c:75:0: note: use
-Wno-error=coverage-mismatch to tolerate the mismatch but performance
may drop if the function is hot
../../gcc-4_8-svn-orig/libiberty/dwarfnames.c: In function ‘get_DW_OP_name’:
../../gcc-4_8-svn-orig/libiberty/dwarfnames.c:75:0: error: the control
flow of function ‘get_DW_OP_name’ does not match its profile data
(counter ‘arcs’) [-Werror=coverage-mismatch]
../../gcc-4_8-svn-orig/libiberty/dwarfnames.c:75:0: note: use
-Wno-error=coverage-mismatch to tolerate the mismatch but performance
may drop if the function is hot
thanks, Dinar.

On Tue, Jul 30, 2013 at 11:02 PM, Xinliang David Li davi...@google.com wrote:
 I need to understand why this affects profile bootstrap -- is this due
 to file name conflict?

 The fix is wrong -- please do not remove the parameter. If it is a
 problem, a better fix is to change the default parameter value to 0.

 David


 On Tue, Jul 30, 2013 at 11:56 AM, Teresa Johnson tejohn...@google.com wrote:
 cc'ing Rong and David since this came from LIPO support.

 The patch as-is removes the one use of PARAM_GCOV_DEBUG (which is on
 by default) without removing the parameter itself. What is the failure
 mode you see from this code?

 Thanks,
 Teresa

 On Tue, Jul 30, 2013 at 11:50 AM, Dinar Temirbulatov
 di...@kugelworks.com wrote:
 Hello

 This change allows profiledbootstrap to complete on the google gcc-4.8
 branch; tested with make bootstrap with no new regressions.  OK for
 google 4.8?
thanks, Dinar.



 --
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: Fix PR57268

2013-06-03 Thread Dinar Temirbulatov
 I'd guess you don't want to flush on DEBUG_INSN_Ps, because then you'd flush
 differently between -g and -g0.  So perhaps something like:
Yes, if I skip flushing on DEBUG_INSN_Ps, then the dependence lists
are the same under -g0 and -g. I bootstrapped Jakub's change on
x86_64-linux with no errors.
thanks, Dinar.

On Sat, Jun 1, 2013 at 4:55 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Sat, Jun 01, 2013 at 10:11:24AM +0200, Jakub Jelinek wrote:
 On Sat, Jun 01, 2013 at 08:39:58AM +0400, Dinar Temirbulatov wrote:
  I am investigating the problem.

 I'd guess you don't want to flush on DEBUG_INSN_Ps, because then you'd flush
 differently between -g and -g0.  So perhaps something like:

 Now bootstrapped/regtested on x86_64-linux and i686-linux.  I see you've
 already reverted in the mean time, so ok for trunk this way?

 2013-06-01  Jakub Jelinek  <ja...@redhat.com>

 PR rtl-optimization/57268
 * sched-deps.c (sched_analyze_2): Don't flush_pending_lists
 if DEBUG_INSN_P (insn).

 Reapply
 2013-05-31  Dinar Temirbulatov  <di...@kugelworks.com>

 PR rtl-optimization/57268
 * sched-deps.c (sched_analyze_2): Flush dependence lists if
 the sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH.

 --- gcc/sched-deps.c    (revision 199576)
 +++ gcc/sched-deps.c    (revision 199575)
 @@ -2690,8 +2690,15 @@

             /* Always add these dependencies to pending_reads, since
                this insn may be followed by a write.  */
 -           if (!deps->readonly)
 -             add_insn_mem_dependence (deps, true, insn, x);
 +           if (!deps->readonly)
 +             {
 +               if ((deps->pending_read_list_length
 +                    + deps->pending_write_list_length)
 +                   > MAX_PENDING_LIST_LENGTH
 +                   && !DEBUG_INSN_P (insn))
 +                 flush_pending_lists (deps, insn, true, true);
 +               add_insn_mem_dependence (deps, true, insn, x);
 +             }

             sched_analyze_2 (deps, XEXP (x, 0), insn);


 Jakub


Updated MAINTAINERS

2013-05-31 Thread Dinar Temirbulatov
Hi,
I updated the MAINTAINERS list with the following change (patch attached).
  thanks, Dinar.


MAINTAINERS.patch
Description: Binary data


Re: Fix PR57268

2013-05-31 Thread Dinar Temirbulatov
Hi,
Oh, this is my mistake; I should have bootstrapped the compiler. I am
investigating the problem.
  thanks, Dinar.

On Sat, Jun 1, 2013 at 7:50 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Wed, May 29, 2013 at 9:36 AM, Dinar Temirbulatov
 di...@kugelworks.com wrote:
 Here is the corrected version of the change. Also, I think I need
 write-after-approval access to commit the change.
 thanks, Dinar,

 PR rtl-optimization/57268
 * sched-deps.c (sched_analyze_2): Flush dependence lists if
 the sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH.

 On Wed, May 29, 2013 at 6:49 PM, Jeff Law l...@redhat.com wrote:
 On 05/29/2013 06:52 AM, Steven Bosscher wrote:

 Hello,

 On Wed, May 29, 2013 at 2:01 PM, Dinar Temirbulatov wrote:

 Hi,
 I noticed that the scheduler created a long dependence list, about ~9000
 elements, and slowed compilation down by about an hour. The attached
 patch flushes the dependence list in case it is longer than
 MAX_PENDING_LIST_LENGTH. Tested with the GCC testsuite on x86_64-linux-gnu
 with C and C++ enabled. OK for trunk?
 thanks, Dinar.

 2013-05-28 Dinar Temirbulatov dinar at kugelworks dot com

  PR rtl-optimization/57268
  * sched-deps.c (sched_analyze_2): Flush dependence list
 when it is longer than MAX_PENDING_LIST_LENGTH.



  * sched-deps.c (sched_analyze_2): Flush dependence lists if
  the sum of the read and write lists exceeds
 MAX_PENDING_LIST_LENGTH.



   if (!deps->readonly)
 -  add_insn_mem_dependence (deps, true, insn, x);
 +  {
 +   if ((deps->pending_read_list_length +
 deps->pending_write_list_length)
 +    > MAX_PENDING_LIST_LENGTH)
 + flush_pending_lists (deps, insn, true, true);
 +add_insn_mem_dependence (deps, true, insn, x);
 +}


 The flush_pending_lists, add_insn_mem_dependence and } lines are
 not indented correctly. The if (...+...) line is too long (max. 80
 characters per line). The GCC style would be

  if (!deps->readonly)
    {
      if ((deps->pending_read_list_length
           + deps->pending_write_list_length)
          > MAX_PENDING_LIST_LENGTH)
        flush_pending_lists (deps, insn, true, true);
      add_insn_mem_dependence (deps, true, insn, x);
    }

 (The aesthetics of GCC code style is a matter for debate, but not here
 and now ;-)

 And just to be clear, with Steven's suggested changes, this patch is OK.

 jeff

 This caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57494

 --
 H.J.


Fix PR57268

2013-05-29 Thread Dinar Temirbulatov
Hi,
I noticed that the scheduler created a long dependence list, about ~9000
elements, and slowed compilation down by about an hour. The attached
patch flushes the dependence list in case it is longer than
MAX_PENDING_LIST_LENGTH. Tested with the GCC testsuite on x86_64-linux-gnu
with C and C++ enabled. OK for trunk?
   thanks, Dinar.

2013-05-28 Dinar Temirbulatov di...@kugelworks.com

PR rtl-optimization/57268
* sched-deps.c (sched_analyze_2): Flush dependence list
when it is longer than MAX_PENDING_LIST_LENGTH.


fix.patch
Description: Binary data


Re: Fix PR57268

2013-05-29 Thread Dinar Temirbulatov
* sched-deps.c (sched_analyze_2): Flush dependence lists if
the sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH.

On Wed, May 29, 2013 at 6:49 PM, Jeff Law l...@redhat.com wrote:
 On 05/29/2013 06:52 AM, Steven Bosscher wrote:

 Hello,

 On Wed, May 29, 2013 at 2:01 PM, Dinar Temirbulatov wrote:

 Hi,
 I noticed that the scheduler created a long dependence list, about ~9000
 elements, and slowed compilation down by about an hour. The attached
 patch flushes the dependence list in case it is longer than
 MAX_PENDING_LIST_LENGTH. Tested with the GCC testsuite on x86_64-linux-gnu
 with C and C++ enabled. OK for trunk?
 thanks, Dinar.

 2013-05-28 Dinar Temirbulatov dinar at kugelworks dot com

  PR rtl-optimization/57268
  * sched-deps.c (sched_analyze_2): Flush dependence list
 when it is longer than MAX_PENDING_LIST_LENGTH.



  * sched-deps.c (sched_analyze_2): Flush dependence lists if
  the sum of the read and write lists exceeds
 MAX_PENDING_LIST_LENGTH.



   if (!deps->readonly)
 -  add_insn_mem_dependence (deps, true, insn, x);
 +  {
 +   if ((deps->pending_read_list_length +
 deps->pending_write_list_length)
 +    > MAX_PENDING_LIST_LENGTH)
 + flush_pending_lists (deps, insn, true, true);
 +add_insn_mem_dependence (deps, true, insn, x);
 +}


 The flush_pending_lists, add_insn_mem_dependence and } lines are
 not indented correctly. The if (...+...) line is too long (max. 80
 characters per line). The GCC style would be

  if (!deps->readonly)
    {
      if ((deps->pending_read_list_length
           + deps->pending_write_list_length)
          > MAX_PENDING_LIST_LENGTH)
        flush_pending_lists (deps, insn, true, true);
      add_insn_mem_dependence (deps, true, insn, x);
    }

 (The aesthetics of GCC code style is a matter for debate, but not here
 and now ;-)

 And just to be clear, with Steven's suggested changes, this patch is OK.

 jeff


fix1.patch
Description: Binary data


Re: Fix PR57268

2013-05-29 Thread Dinar Temirbulatov
Here is the corrected version of the change. Also, I think I need
write-after-approval access to commit the change.
thanks, Dinar,

PR rtl-optimization/57268
* sched-deps.c (sched_analyze_2): Flush dependence lists if
the sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH.

On Wed, May 29, 2013 at 6:49 PM, Jeff Law l...@redhat.com wrote:
 On 05/29/2013 06:52 AM, Steven Bosscher wrote:

 Hello,

 On Wed, May 29, 2013 at 2:01 PM, Dinar Temirbulatov wrote:

 Hi,
 I noticed that the scheduler created a long dependence list, about ~9000
 elements, and slowed compilation down by about an hour. The attached
 patch flushes the dependence list in case it is longer than
 MAX_PENDING_LIST_LENGTH. Tested with the GCC testsuite on x86_64-linux-gnu
 with C and C++ enabled. OK for trunk?
 thanks, Dinar.

 2013-05-28 Dinar Temirbulatov dinar at kugelworks dot com

  PR rtl-optimization/57268
  * sched-deps.c (sched_analyze_2): Flush dependence list
 when it is longer than MAX_PENDING_LIST_LENGTH.



  * sched-deps.c (sched_analyze_2): Flush dependence lists if
  the sum of the read and write lists exceeds
 MAX_PENDING_LIST_LENGTH.



   if (!deps->readonly)
 -  add_insn_mem_dependence (deps, true, insn, x);
 +  {
 +   if ((deps->pending_read_list_length +
 deps->pending_write_list_length)
 +    > MAX_PENDING_LIST_LENGTH)
 + flush_pending_lists (deps, insn, true, true);
 +add_insn_mem_dependence (deps, true, insn, x);
 +}


 The flush_pending_lists, add_insn_mem_dependence and } lines are
 not indented correctly. The if (...+...) line is too long (max. 80
 characters per line). The GCC style would be

  if (!deps->readonly)
    {
      if ((deps->pending_read_list_length
           + deps->pending_write_list_length)
          > MAX_PENDING_LIST_LENGTH)
        flush_pending_lists (deps, insn, true, true);
      add_insn_mem_dependence (deps, true, insn, x);
    }

 (The aesthetics of GCC code style is a matter for debate, but not here
 and now ;-)

 And just to be clear, with Steven's suggested changes, this patch is OK.

 jeff


RFC [Patch, arm] arm_reload_out_hi with strh instruction

2012-12-04 Thread Dinar Temirbulatov
Hi,
While investigating a defect, I found that for HImode the reload pass
generated three QImode instructions, but on modern ARM machines that
could be done with just one strh instruction. The attached patch adds
that capability.
  thanks, Dinar.


reload_out_hi.patch
Description: Binary data


[ARM] fix for PR49423

2012-09-25 Thread Dinar Temirbulatov
Hi Ramana,
Here is an obvious fix for PR49423: I just added a pool range for the
arm_zero_extendqisi2, arm_zero_extendqisi2_v6, arm_zero_extendhisi2,
and arm_zero_extendhisi2_v6 patterns.
thanks, Dinar.


PR49423.patch
Description: Binary data


Re: divide 64-bit by constant for 32-bit target machines

2012-06-15 Thread Dinar Temirbulatov
Hi, Richard,
How about if I add umul_highpart_di to libgcc and use it instead of
expanding the high-part multiplication directly, or add my own helper
function, with the pre-shift, the post-shift, the 64-bit constant, and
the 64-bit operand as function parameters, for division at optimization
levels below -O3?
 thanks, Dinar.

On Fri, Jun 15, 2012 at 12:12 PM, Richard Earnshaw rearn...@arm.com wrote:
 On 14/06/12 19:46, Dinar Temirbulatov wrote:
 Hi,
 OK for trunk?
                 thanks, Dinar.


 I'm still not comfortable about the code bloat that this is likely to
 incur at -O2.

 R.

 On Tue, Jun 12, 2012 at 11:00 AM, Paolo Bonzini bonz...@gnu.org wrote:
 On 12/06/2012 08:52, Dinar Temirbulatov wrote:
 is safe?  That is, that the underflows cannot produce a wrong result?

 [snip]

 Thanks very much!

 Paolo

 ChangeLog.txt


 2012-06-14  Dinar Temirbulatov  <dtemirbula...@gmail.com>
          Alexey Kravets  <mr.kayr...@gmail.com>
          Paolo Bonzini  <bonz...@gnu.org>

      * config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the integer
      double-word division operation.
      * config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the
      integer double-word division operation for 32-bit targets.
      * gcc/expmed.c (expand_mult_highpart_optab): Allow generating the
      higher multiplication product for unsigned double-word integers using
      32-bit wide registers.

 30.patch






Re: divide 64-bit by constant for 32-bit target machines

2012-06-14 Thread Dinar Temirbulatov
Hi,
OK for trunk?
thanks, Dinar.

On Tue, Jun 12, 2012 at 11:00 AM, Paolo Bonzini bonz...@gnu.org wrote:
 On 12/06/2012 08:52, Dinar Temirbulatov wrote:
 is safe?  That is, that the underflows cannot produce a wrong result?

 [snip]

 Thanks very much!

 Paolo
2012-06-14  Dinar Temirbulatov  <dtemirbula...@gmail.com>
            Alexey Kravets  <mr.kayr...@gmail.com>
            Paolo Bonzini  <bonz...@gnu.org>

* config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the integer
double-word division operation.
* config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the
integer double-word division operation for 32-bit targets.
* gcc/expmed.c (expand_mult_highpart_optab): Allow generating the
higher multiplication product for unsigned double-word integers using
32-bit wide registers.


30.patch
Description: Binary data


Re: divide 64-bit by constant for 32-bit target machines

2012-06-08 Thread Dinar Temirbulatov
Hi, Paolo.
Here is the new version of the patch. I have tested this version with the
GCC testsuite only on i686, with no new regressions so far. MIPS and ARM
tests are in progress.

One strange thing I noticed:

 No need for this gen_reg_rtx, either, by passing a NULL_RTX target below.

 +      carry_result = expand_shift (LSHIFT_EXPR, mode, carry, BITS_PER_WORD, 
 carry_result, 1);
 +
 +      /* Adding 0x1 as carry here if required.  */

 Oops, a remnant of 32-bit specific code.


is that I had to add a convert_to_mode () to DImode after
emit_store_flag_force (), since emit_store_flag_force () returns the
carry in SImode, and without the convert_to_mode () call the compiler
fails with this error:

Breakpoint 2, simplify_subreg (outermode=SImode, op=0x756cdf20,
innermode=DImode, byte=0) at
../../gcc-20120418-1/gcc/simplify-rtx.c:5423
5423  gcc_assert (GET_MODE (op) == innermode
(gdb) bt
#0  simplify_subreg (outermode=SImode, op=0x756cdf20,
innermode=DImode, byte=0) at
../../gcc-20120418-1/gcc/simplify-rtx.c:5423
#1  0x00aea223 in simplify_gen_subreg (outermode=SImode,
op=0x756cdf20, innermode=DImode, byte=0) at
../../gcc-20120418-1/gcc/simplify-rtx.c:5763
#2  0x00733c99 in operand_subword (op=0x756cdf20,
offset=0, validate_address=1, mode=DImode) at
../../gcc-20120418-1/gcc/emit-rtl.c:1427
#3  0x00733cc6 in operand_subword_force (op=0x756cdf20,
offset=0, mode=DImode) at ../../gcc-20120418-1/gcc/emit-rtl.c:1440
#4  0x00a016b3 in expand_binop (mode=DImode,
binoptab=0x195f580, op0=0x756cdf20, op1=0x7583d670,
target=0x756cdfa0, unsignedp=1, methods=OPTAB_DIRECT)
at ../../gcc-20120418-1/gcc/optabs.c:1779
#5  0x007525af in expand_shift_1 (code=LSHIFT_EXPR,
mode=DImode, shifted=0x756cdf20, amount=0x7583d670,
target=0x0, unsignedp=1)
at ../../gcc-20120418-1/gcc/expmed.c:2273
#6  0x007526b6 in expand_shift (code=LSHIFT_EXPR, mode=DImode,
shifted=0x756cdf20, amount=32, target=0x0, unsignedp=1) at
../../gcc-20120418-1/gcc/expmed.c:2318
#7  0x007563e6 in expand_mult_highpart_optab (mode=DImode,
op0=0x756cdcc0, op1=0x756b1e00, target=0x0, unsignedp=1,
max_cost=188)
at ../../gcc-20120418-1/gcc/expmed.c:3581
#8  0x00756747 in expand_mult_highpart (mode=DImode,
op0=0x756cdcc0, op1=0x756b1e00, target=0x0, unsignedp=1,
max_cost=188)
at ../../gcc-20120418-1/gcc/expmed.c:3654

  thanks, Dinar.


30.patch
Description: Binary data


Re: divide 64-bit by constant for 32-bit target machines

2012-06-07 Thread Dinar Temirbulatov
Hi,
Here is new version of patch based up on Paolo review, again tested on
arm-7l, mips-32r2 (74k), i686 without new regressions.
  thanks, Dinar.

On Sat, May 26, 2012 at 4:45 PM, Paolo Bonzini bonz...@gnu.org wrote:
 On 26/05/2012 14:35, Paolo Bonzini wrote:
        /* We have to return

             z2 + ((u0 + u1) >> GET_MODE_BITSIZE (word_mode)).

           u0 + u1 are the upper two words of the three-word
           intermediate result and they could have up to
           2 * GET_MODE_BITSIZE (word_mode) + 1 bits of precision.
           We compute the extra bit by checking for carry, and add
           1 << GET_MODE_BITSIZE (word_mode) to z2 if there is carry.  */

 Oops, GET_MODE_BITSIZE (word_mode) is more concisely BITS_PER_WORD.

  +      tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, 
  OPTAB_LIB_WIDEN);
  +      if (!tmp)
  +             return 0;
          /* We have to return z2 + (tmp >> 32).  We need
  +      /* Checking for overflow.  */
 This is not overflow, it's carry (see above).

  +      c = gen_reg_rtx (mode);
  +      c1 = gen_reg_rtx (mode);
  +      cres = gen_reg_rtx (mode);
  +
  +      emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1);
  +      emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1);
  +      result = expand_binop (mode, ior_optab, c, c1, cres, 1, 
  OPTAB_LIB_WIDEN);
  +      if (!result)
  +           return 0;
  +
  +      ccst = gen_reg_rtx (mode);
  +      ccst = expand_shift (LSHIFT_EXPR, mode, cres, 32, ccst, 1);
 This 32 should be GET_MODE_BITSIZE (word_mode).

 Here, too.

 Paolo

2012-06-07  Dinar Temirbulatov  <dtemirbula...@gmail.com>
            Alexey Kravets  <mr.kayr...@gmail.com>

* config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the integer
double-word division operation.
* config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the
integer double-word division operation for 32-bit targets.
* gcc/expmed.c (expand_mult_highpart_optab): Allow generating the
higher multiplication product for unsigned double-word integers using
32-bit wide registers.


28.patch
Description: Binary data


Re: divide 64-bit by constant for 32-bit target machines

2012-05-25 Thread Dinar Temirbulatov
Hi,
I have replaced expand_mult to expand_widening_mult and removed
all direct references to DImode, SImode modes in the
expand_mult_highpart_optab funtion. The attached patch was tested on
arm-7l, mips-32r2 (74k), i686 without new regressions. Richard, do you
think it is ready now?
thanks, Dinar.

On Tue, May 22, 2012 at 7:45 PM, Richard Henderson r...@redhat.com wrote:
 On 05/22/12 07:05, Dinar Temirbulatov wrote:
  +  if ((size - 1 > BITS_PER_WORD
  +        && BITS_PER_WORD == 32 && mode == DImode)

 Do note that this patch will not go in with hard-coded
 SImode and DImode references.

 Which, really, you do not even need.

   GET_MODE_BITSIZE (mode) == 2*BITS_PER_WORD

 is what you wanted for testing for double-word-ness,
 and word_mode is a good substitute for SImode here.

 +      /* Splitting the 64-bit constant for the higher and the lower parts.  
 */
 +      y0 = gen_rtx_CONST_INT (DImode, d & UINT32_MAX);
 +      y1 = gen_rtx_CONST_INT (DImode, d >> 32);

 Use gen_int_mode.

 +      x1 = convert_to_mode (DImode, x1, 1);
 +      x0 = convert_to_mode (DImode, x0, 1);
 +
 +      /* Splitting the 64-bit constant for the higher and the lower parts.  
 */
 +      y0 = gen_rtx_CONST_INT (DImode, d & UINT32_MAX);
 +      y1 = gen_rtx_CONST_INT (DImode, d >> 32);
 +
 +      z2 = gen_reg_rtx (DImode);
 +      u0 = gen_reg_rtx (DImode);
 +
 +      /* Unsigned multiplication of the higher multiplier part
 +      and the higher constant part.  */
 +      z2 = expand_mult(DImode, x1, y1, z2, 1);
 +      /* Unsigned multiplication of the lower multiplier part
 +         and the higher constant part.  */
 +      u0 = expand_mult(DImode, x0, y1, u0, 1);

 I'm fairly sure you really want to be using expand_widening_mult
 here, rather than using convert_to_mode first.  While combine may
 be able to re-generate a widening multiply out of this sequence,
 there's no sense making it work too hard.



 r~
2012-05-25  Dinar Temirbulatov  <dtemirbula...@gmail.com>
            Alexey Kravets  <mr.kayr...@gmail.com>

* config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the integer
double-word division operation.
* config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the
integer double-word division operation for 32-bit targets.
* gcc/expmed.c (expand_mult_highpart_optab): Allow generating the
higher multiplication product for unsigned double-word integers using
32-bit wide registers.


22.patch
Description: Binary data


Re: divide 64-bit by constant for 32-bit target machines

2012-05-22 Thread Dinar Temirbulatov
Hi,
Here is the new version of the patch. I have fixed two errors in the
previous version. First, on mips32 the compiler could not expand the
division and terminated with an ICE; this change fixed the issue:
  /* Extracting the higher part of the sum */
  tmp = gen_highpart (SImode, tmp);
  tmp = force_reg (SImode, tmp);
+  tmp = force_reg (SImode, tmp);
+  tmp = convert_to_mode (DImode, tmp, 1);

Second, on i686 and mips32r2 I found that the overflow check in the
multiplication was not working; for example,
0xfe34b4190a392b23/257 produced a wrong result. The following change
resolved the issue:
   -emit_store_flag_force (c, GT, u0, tmp, DImode, 1, -1);
   +emit_store_flag_force (c, GT, u0, tmp, DImode, 1, 1);
I tested this new version of the patch on -none-linux-gnu with arm-7l,
mips-32r2 (74k), and i686 with no new regressions.

On Thu, May 3, 2012 at 5:40 PM, Richard Earnshaw rearn...@arm.com wrote:
 On 03/05/12 11:27, Dinar Temirbulatov wrote:


 This clearly needs more work.

 Comments:  Need to end with a period and two spaces.
 Binary Operators: Need to be surrounded with white space.
Sorry for this; I hope I resolved these issues in the new version.

 As Andrew Haley has already said, some documentation of the algorithm is
 needed.
General documentation for the algorithm can be found at
gmplib.org/~tege/divcnst-pldi94.pdf.
The multiplication to get the higher 128 bits was developed by me and Alexey
Kravets; I have attached a C version of the algorithm.

 Why is this restricted to BITS_PER_WORD == 32?
I am checking here that we are generating code for a 32-bit target with
32-bit-wide general-purpose registers; with 64-bit-wide registers there
is usually an instruction to get the higher 64 bits of a 64x64-bit
multiplication cheaply.


 Costs: This code clearly needs to understand the relative cost of doing
 division this way; there is a limit to the amount of inline expansion
 that we should tolerate, particularly at -O2 and it's not clear that
 this will be much better than a library call if we don't have a widening
 multiply operation (as is true for older ARM chips, and presumably some
 other CPUs).  In essence, you need to work out the cost of a divide
 instruction (just as rtx costs for this) and the approximate cost of the
 expanded algorithm.
Added cost calculation.

 Another issue that worries me is the number of live DImode values; on
 machines with few registers this could cause excessive spilling to start
 happening.
The cost calculation is supposed to take this into account.

 I also wonder whether it would be beneficial to generate custom
 functions for division by specific constants (using this algorithm) and
 then call those functions rather than the standard lib-call.  On ELF
 systems the functions can marked as hidden and put into common sections
 so that we only end up with one instance of each function in a program.
Yes, I think it is a good approach; I could redo my patch with such an
intrinsic-function implementation, with the pre-shift, post-shift, and
64-bit constant as function parameters.

 Finally, do you have a copyright assignment with the FSF?  We can't use
 this code without one.
Yes, I do have a copyright assignment with the FSF.
I am also in the process of implementing signed 64-bit integer division by
constant.

thanks, Dinar.


18.patch
Description: Binary data
unsigned long long mul(unsigned long long a, unsigned long long b)
{
  unsigned long long x1 = a >> 32;
  unsigned long long x0 = a & 0xffffffff;
  unsigned long long y1 = b >> 32;
  unsigned long long y0 = b & 0xffffffff;
  unsigned long long z2, z0, tmp, u0, u1;
  unsigned char c = 0;
  z2 = x1 * y1;
  z0 = x0 * y0;
  u0 = x0 * y1 + (z0 >> 32);
  u1 = x1 * y0;
  tmp = (u0 + u1);
  c = (tmp < u0) || (tmp < u1);
  return z2 + (tmp >> 32) + (((unsigned long long)c) << 32);
}
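As a quick sanity check (not part of the submission), the routine above can
be compared against a native 128-bit multiply; the harness below restates it
and assumes a GCC or Clang host that provides unsigned __int128.

#include <cassert>
#include <cstdint>
#include <cstdio>

/* Restatement of mul () above: high 64 bits of a 64x64-bit product built
   from 32-bit halves, with an explicit carry for the middle sum.  */
static uint64_t
mul_high (uint64_t a, uint64_t b)
{
  uint64_t x1 = a >> 32, x0 = a & 0xffffffff;
  uint64_t y1 = b >> 32, y0 = b & 0xffffffff;
  uint64_t z2 = x1 * y1;
  uint64_t z0 = x0 * y0;
  uint64_t u0 = x0 * y1 + (z0 >> 32);
  uint64_t u1 = x1 * y0;
  uint64_t tmp = u0 + u1;
  uint64_t c = (tmp < u0) || (tmp < u1);   /* carry out of the middle sum */
  return z2 + (tmp >> 32) + (c << 32);
}

int main ()
{
  /* A few values taken from the thread plus some edge cases.  */
  const uint64_t samples[] = { 0, 1, 257, 0xfe34b4190a392b23ull,
                               2023346444509052928ull,
                               18317463604061229328ull, ~0ull };
  for (uint64_t a : samples)
    for (uint64_t b : samples)
      assert (mul_high (a, b)
              == (uint64_t) (((unsigned __int128) a * b) >> 64));
  puts ("mul_high matches __int128 on the sampled inputs");
  return 0;
}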



Re: divide 64-bit by constant for 32-bit target machines

2012-05-03 Thread Dinar Temirbulatov
Hi,
Here is an updated version of the patch. I added comments describing the algorithm.

 Hi Dinar.  I'm afraid it gives the wrong results for some dividends
  * 82625484914982912 / 2023346444509052928: gives 4096, should be zero
  * 18317463604061229328 / 2023346444509052928: gives 4109, should be 9
  * 12097415865295708879 / 4545815675034402816: gives 130, should be 2
  * 18195490362097456014 / 6999635335417036800: gives 10, should be 2
Oh, I had used a signed multiplication instead of an unsigned one, and that
was the reason for the errors above; I fixed that typo.
Tested on arm-7l with no new regressions.
Ok for trunk?

thanks, Dinar.


14.patch
Description: Binary data


divide 64-bit by constant for 32-bit target machines

2012-04-20 Thread Dinar Temirbulatov
Hi,
Here is the patch that adds support for dividing 64-bit values by a constant
on 32-bit target machines. This patch was tested on arm-7a with no new
regressions. Also, I am not sure how to avoid, for example, i686 targets,
since the div operation there is fast compared to other targets, and
libc/sysdeps/wordsize-32/divdi3.c __udivdi3 showed better performance than
my implementation on the compiler side; it is not clear to me from looking
at the udiv_cost[speed][SImode] value.
  thanks, Dinar.


11.patch
Description: Binary data


Re: MIPS Fix PR18141

2011-09-23 Thread Dinar Temirbulatov
Hi Richard,
This version of the patch showed no regressions on mipsel-unknown-linux-gnu.
 Thanks, Dinar.


On Wed, Sep 21, 2011 at 4:59 PM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 Dinar Temirbulatov dtemirbula...@gmail.com writes:
  @@ -14696,7 +14696,11 @@ mips_avoid_hazard (rtx after, rtx insn,
        clobber hi and lo.  */
     if (*hilo_delay < 2 && reg_set_p (lo_reg, pattern))
       nops = 2 - *hilo_delay;
  -  else if (*delayed_reg != 0 && reg_referenced_p (*delayed_reg, pattern))
  +  else if ((*delayed_reg != 0 && reg_referenced_p (*delayed_reg, pattern))
  +          && !((GET_MODE (*delayed_reg) == DFmode && set_after != 0
  +             && (set = single_set (insn)) != NULL_RTX && GET_MODE
  (SET_DEST(set)) == DFmode
  +             && XINT((XEXP (pattern, 1)), 1) == UNSPEC_LOAD_HIGH
  +             && XINT((XEXP (*set_after, 1)), 1) == UNSPEC_LOAD_LOW)))

 This isn't safe because the patterns might not be UNSPECs (so XINT (...)
 would be meaningless).  It's better to check the insn code instead.
 Something like:

   else if (*delayed_reg != 0
            && reg_referenced_p (*delayed_reg, pattern)
            && !(recog_memoized (insn) == CODE_FOR_load_highdf
                 && recog_memoized (*set_after) == CODE_FOR_load_lowdf))

 (untested).  Note that *set_after should always be nonnull if
 *delayed_reg is.

 Looks good otherwise.

 Richard

diff -ruNp gcc-20110912-orig/gcc/config/mips/mips.c gcc-20110912-fixed/gcc/config/mips/mips.c
--- gcc-20110912-orig/gcc/config/mips/mips.c	2011-09-12 17:22:27.576457121 +0400
+++ gcc-20110912-fixed/gcc/config/mips/mips.c	2011-09-23 14:24:45.379778834 +0400
@@ -14659,20 +14659,20 @@ mips_orphaned_high_part_p (htab_t htab,
INSN and a previous instruction, avoid it by inserting nops after
instruction AFTER.
 
-   *DELAYED_REG and *HILO_DELAY describe the hazards that apply at
-   this point.  If *DELAYED_REG is non-null, INSN must wait a cycle
-   before using the value of that register.  *HILO_DELAY counts the
-   number of instructions since the last hilo hazard (that is,
-   the number of instructions since the last MFLO or MFHI).
+   *DELAYED_REG, *SET_AFTER and *HILO_DELAY describe the hazards that 
+   apply at this point.  If *DELAYED_REG and *SET_AFTER is non-null, 
+   INSN must wait a cycle before using the value of that register.  
+   *HILO_DELAY counts the number of instructions since the last hilo hazard 
+   (that is, the number of instructions since the last MFLO or MFHI).
 
-   After inserting nops for INSN, update *DELAYED_REG and *HILO_DELAY
-   for the next instruction.
+   After inserting nops for INSN, update *DELAYED_REG, *SET_AFTER
+   and *HILO_DELAY for the next instruction.
 
LO_REG is an rtx for the LO register, used in dependence checking.  */
 
 static void
 mips_avoid_hazard (rtx after, rtx insn, int *hilo_delay,
-		   rtx *delayed_reg, rtx lo_reg)
+		   rtx *delayed_reg, rtx lo_reg, rtx *set_after)
 {
   rtx pattern, set;
   int nops, ninsns;
@@ -14696,7 +14696,9 @@ mips_avoid_hazard (rtx after, rtx insn,
  clobber hi and lo.  */
   if (*hilo_delay < 2 && reg_set_p (lo_reg, pattern))
 nops = 2 - *hilo_delay;
-  else if (*delayed_reg != 0 && reg_referenced_p (*delayed_reg, pattern))
+  else if ((*delayed_reg != 0 && set_after != 0 && reg_referenced_p (*delayed_reg, pattern))
+	 && !(recog_memoized (insn) == CODE_FOR_load_highdf
+	 && recog_memoized (*set_after) == CODE_FOR_load_lowdf))
 nops = 1;
   else
 nops = 0;
@@ -14710,6 +14712,7 @@ mips_avoid_hazard (rtx after, rtx insn,
   /* Set up the state for the next instruction.  */
   *hilo_delay += ninsns;
   *delayed_reg = 0;
+  *set_after = 0;
   if (INSN_CODE (insn) >= 0)
 switch (get_attr_hazard (insn))
   {
@@ -14724,6 +14727,7 @@ mips_avoid_hazard (rtx after, rtx insn,
 	set = single_set (insn);
 	gcc_assert (set);
 	*delayed_reg = SET_DEST (set);
+	*set_after = insn;
 	break;
   }
 }
@@ -14736,7 +14740,7 @@ mips_avoid_hazard (rtx after, rtx insn,
 static void
 mips_reorg_process_insns (void)
 {
-  rtx insn, last_insn, subinsn, next_insn, lo_reg, delayed_reg;
+  rtx insn, last_insn, subinsn, next_insn, lo_reg, delayed_reg, set_after;
   int hilo_delay;
   htab_t htab;
 
@@ -14811,7 +14815,7 @@ mips_reorg_process_insns (void)
 			INSN_CODE (subinsn) = CODE_FOR_nop;
 		  }
 		mips_avoid_hazard (last_insn, subinsn, hilo_delay,
-   delayed_reg, lo_reg);
+   delayed_reg, lo_reg, set_after);
 		  }
 	  last_insn = insn;
 	}
@@ -14832,7 +14836,7 @@ mips_reorg_process_insns (void)
 	  else
 		{
 		  mips_avoid_hazard (last_insn, insn, hilo_delay,
- delayed_reg, lo_reg);
+ delayed_reg, lo_reg, set_after);
 		  last_insn = insn;
 		}
 	}


ChangeLog
Description: Binary data


Re: MIPS Fix PR18141

2011-09-23 Thread Dinar Temirbulatov
Hi,
I found a typo in the patch: instead of checking *set_after != 0, it was
checking set_after != 0. Here is the corrected version of the patch. I
retested the patch without the typo on mipsel-unknown-linux-gnu with no
new regressions.
  thanks, Dinar.


On Fri, Sep 23, 2011 at 3:11 PM, Dinar Temirbulatov
dtemirbula...@gmail.com wrote:
 hi, Richard,
 This version of the patch showed no regressions on mipsel-unknown-linux-gnu.
                         Thanks, Dinar.


 On Wed, Sep 21, 2011 at 4:59 PM, Richard Sandiford
 rdsandif...@googlemail.com wrote:
 Dinar Temirbulatov dtemirbula...@gmail.com writes:
  @@ -14696,7 +14696,11 @@ mips_avoid_hazard (rtx after, rtx insn,
        clobber hi and lo.  */
     if (*hilo_delay < 2 && reg_set_p (lo_reg, pattern))
       nops = 2 - *hilo_delay;
  -  else if (*delayed_reg != 0 && reg_referenced_p (*delayed_reg, pattern))
  +  else if ((*delayed_reg != 0 && reg_referenced_p (*delayed_reg, pattern))
  +          && !((GET_MODE (*delayed_reg) == DFmode && set_after != 0
  +             && (set = single_set (insn)) != NULL_RTX && GET_MODE
  (SET_DEST(set)) == DFmode
  +             && XINT((XEXP (pattern, 1)), 1) == UNSPEC_LOAD_HIGH
  +             && XINT((XEXP (*set_after, 1)), 1) == UNSPEC_LOAD_LOW)))

 This isn't safe because the patterns might not be UNSPECs (so XINT (...)
 would be meaningless).  It's better to check the insn code instead.
 Something like:

   else if (*delayed_reg != 0
            && reg_referenced_p (*delayed_reg, pattern)
            && !(recog_memoized (insn) == CODE_FOR_load_highdf
                 && recog_memoized (*set_after) == CODE_FOR_load_lowdf))

 (untested).  Note that *set_after should always be nonnull if
 *delayed_reg is.

 Looks good otherwise.

 Richard


diff -ruNp gcc-20110912-orig/gcc/config/mips/mips.c gcc-20110912-fixed/gcc/config/mips/mips.c
--- gcc-20110912-orig/gcc/config/mips/mips.c	2011-09-12 17:22:27.576457121 +0400
+++ gcc-20110912-fixed/gcc/config/mips/mips.c	2011-09-23 18:59:33.329771992 +0400
@@ -14659,20 +14659,20 @@ mips_orphaned_high_part_p (htab_t htab,
INSN and a previous instruction, avoid it by inserting nops after
instruction AFTER.
 
-   *DELAYED_REG and *HILO_DELAY describe the hazards that apply at
-   this point.  If *DELAYED_REG is non-null, INSN must wait a cycle
-   before using the value of that register.  *HILO_DELAY counts the
-   number of instructions since the last hilo hazard (that is,
-   the number of instructions since the last MFLO or MFHI).
+   *DELAYED_REG, *SET_AFTER and *HILO_DELAY describe the hazards that 
+   apply at this point.  If *DELAYED_REG and *SET_AFTER is non-null, 
+   INSN must wait a cycle before using the value of that register.  
+   *HILO_DELAY counts the number of instructions since the last hilo hazard 
+   (that is, the number of instructions since the last MFLO or MFHI).
 
-   After inserting nops for INSN, update *DELAYED_REG and *HILO_DELAY
-   for the next instruction.
+   After inserting nops for INSN, update *DELAYED_REG, *SET_AFTER
+   and *HILO_DELAY for the next instruction.
 
LO_REG is an rtx for the LO register, used in dependence checking.  */
 
 static void
 mips_avoid_hazard (rtx after, rtx insn, int *hilo_delay,
-		   rtx *delayed_reg, rtx lo_reg)
+		   rtx *delayed_reg, rtx lo_reg, rtx *set_after)
 {
   rtx pattern, set;
   int nops, ninsns;
@@ -14696,7 +14696,9 @@ mips_avoid_hazard (rtx after, rtx insn,
  clobber hi and lo.  */
   if (*hilo_delay < 2 && reg_set_p (lo_reg, pattern))
 nops = 2 - *hilo_delay;
-  else if (*delayed_reg != 0 && reg_referenced_p (*delayed_reg, pattern))
+  else if ((*delayed_reg != 0 && *set_after != 0 && reg_referenced_p (*delayed_reg, pattern))
+	 && !(recog_memoized (insn) == CODE_FOR_load_highdf
+	 && recog_memoized (*set_after) == CODE_FOR_load_lowdf))
 nops = 1;
   else
 nops = 0;
@@ -14710,6 +14712,7 @@ mips_avoid_hazard (rtx after, rtx insn,
   /* Set up the state for the next instruction.  */
   *hilo_delay += ninsns;
   *delayed_reg = 0;
+  *set_after = 0;
   if (INSN_CODE (insn) >= 0)
 switch (get_attr_hazard (insn))
   {
@@ -14724,6 +14727,7 @@ mips_avoid_hazard (rtx after, rtx insn,
 	set = single_set (insn);
 	gcc_assert (set);
 	*delayed_reg = SET_DEST (set);
+	*set_after = insn;
 	break;
   }
 }
@@ -14736,7 +14740,7 @@ mips_avoid_hazard (rtx after, rtx insn,
 static void
 mips_reorg_process_insns (void)
 {
-  rtx insn, last_insn, subinsn, next_insn, lo_reg, delayed_reg;
+  rtx insn, last_insn, subinsn, next_insn, lo_reg, delayed_reg, set_after;
   int hilo_delay;
   htab_t htab;
 
@@ -14811,7 +14815,7 @@ mips_reorg_process_insns (void)
 			INSN_CODE (subinsn) = CODE_FOR_nop;
 		  }
 		mips_avoid_hazard (last_insn, subinsn, hilo_delay,
-   delayed_reg, lo_reg);
+   delayed_reg, lo_reg, set_after);
 		  }
 	  last_insn = insn;
 	}
@@ -14832,7 +14836,7 @@ mips_reorg_process_insns (void)
 	  else
 		{
 		  mips_avoid_hazard (last_insn, insn, hilo_delay,
- delayed_reg