[Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness

2019-10-11 wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575

Wilco  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #21 from Wilco  ---
Fixed by r276887 and other recent multiply improvements. The only CPU that can
still trigger stack pushes is cortex-a8 with -O2/-O3, for both Arm and Thumb:

mov ip, r0
mul r3, ip, r3
mla r1, r2, r1, r3
push    {lr}
umull   r0, lr, r0, r2
add r1, r1, lr
ldr pc, [sp], #4

All other CPUs and optimization options generate the optimal:

muls    r3, r0, r3
mla r1, r2, r1, r3
umull   r0, r2, r0, r2
add r1, r1, r2
bx  lr

So I consider this fixed.

2019-09-09 wilco at gcc dot gnu.org

--- Comment #20 from Wilco  ---
(In reply to Wilco from comment #19)
> (In reply to Christophe Lyon from comment #18)
> > This is still wrong with current trunk.
> 
> I don't see it happening since expansion of DImode instructions improved.
> The only case that uses an extra register is -mcpu=cortex-a9/-mcpu=cortex-a5
> with -O2 -mthumb:
> 
>   mul r3, r0, r3
>   push    {r4}
>   mov r4, r1
>   umull   r0, r1, r0, r2
>   mla r2, r2, r4, r3
>   ldr r4, [sp], #4
>   add r1, r1, r2
>   bx  lr
> 
> I don't think we should expect perfect register allocation in severely
> constrained cases like this - scheduling can increase register pressure.

Interestingly this will be fixed by
https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00576.html:

mul r3, r0, r3
mov ip, r1
umull   r0, r1, r0, r2
mla ip, r2, ip, r3
add r1, r1, ip
bx  lr

With r12 (ip) as an extra temporary, r4 no longer needs to be saved and restored.

2019-07-09 wilco at gcc dot gnu.org

Wilco  changed:

   What|Removed |Added

 CC||wilco at gcc dot gnu.org

--- Comment #19 from Wilco  ---
(In reply to Christophe Lyon from comment #18)
> This is still wrong with current trunk.

I don't see it happening since expansion of DImode instructions improved. The
only case that uses an extra register is -mcpu=cortex-a9/-mcpu=cortex-a5 with
-O2 -mthumb:

mul r3, r0, r3
push    {r4}
mov r4, r1
umull   r0, r1, r0, r2
mla r2, r2, r4, r3
ldr r4, [sp], #4
add r1, r1, r2
bx  lr

I don't think we should expect perfect register allocation in severely
constrained cases like this - scheduling can increase register pressure.

2019-07-09 clyon at gcc dot gnu.org

Christophe Lyon  changed:

   What|Removed |Added

 CC||clyon at gcc dot gnu.org

--- Comment #18 from Christophe Lyon  ---
This is still wrong with current trunk.

2018-01-17 ktkachov at gcc dot gnu.org

--- Comment #17 from ktkachov at gcc dot gnu.org ---
As mentioned in the PR, sched1 exposes this problem.

2018-01-17 ktkachov at gcc dot gnu.org

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||janis at gcc dot gnu.org

--- Comment #16 from ktkachov at gcc dot gnu.org ---
*** Bug 49678 has been marked as a duplicate of this bug. ***

2015-03-26 ktkachov at gcc dot gnu.org

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

Version|4.2.1   |5.0

--- Comment #15 from ktkachov at gcc dot gnu.org ---
Updating version, as this still affects trunk.


2015-02-12 ktkachov at gcc dot gnu.org

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #14 from ktkachov at gcc dot gnu.org ---
Vlad, do you have any insight on this? The difference in scheduling is only the
order between a mult and an add, but the register allocation looks like the
underlying cause.


2014-11-17 ktkachov at gcc dot gnu.org

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #13 from ktkachov at gcc dot gnu.org ---
So I see this regression still, but only for some -mcpu options.
For example for -mcpu=cortex-a15 we get:
mul r3, r0, r3
strd    r4, [sp, #-8]!
umull   r4, r5, r0, r2
mla r1, r2, r1, r3
mov r0, r4
add r5, r1, r5
mov r1, r5
ldrd    r4, [sp]
add sp, sp, #8

whereas for cortex-a7 we get:
mul r3, r0, r3
mla r3, r2, r1, r3
umull   r0, r1, r0, r2
add r1, r3, r1


I think the problem here is reload.
If I look at the postreload dump, for the 'bad' RTL I see:
r0(SI) := r0(SI)
r3(SI) := r0(SI) * r3(SI)
r4(DI) := r0(SI) * r2(SI) // with zero extension
r1(SI) := r2(SI) * r1(SI) + r3(SI)
r5(SI) := r1(SI) + r5(SI)
r0(DI) := r4(DI)

whereas for the good one I see:
r0(SI) := r0(SI)
r3(SI) := r0(SI) * r3(SI)
r3(SI) := r2(SI) * r1(SI) + r3(SI)
r0(DI) := r0(SI) * r2(SI) // with zero extension
r1(SI) := r3(SI) + r1(SI)
r0(DI) := r0(DI)

In the good one the final insn is eliminated because it is dead, whereas in the
bad one the final DImode move is split into two moves.

Sched1 changed the order of the mult and mult-accumulate, but it's the register
allocator that causes the bad codegen.


2014-02-13 bernd.edlinger at hotmail dot de

Bernd Edlinger bernd.edlinger at hotmail dot de changed:

   What|Removed |Added

 CC||bernd.edlinger at hotmail dot de

--- Comment #11 from Bernd Edlinger bernd.edlinger at hotmail dot de ---
The test case fails on current trunk:

longfunc:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
mul r3, r0, r3
push    {r4, r5}
umull   r4, r5, r0, r2
mla r1, r2, r1, r3
mov r0, r4
add r5, r5, r1
mov r1, r5
pop {r4, r5}
bx  lr
.size   longfunc, .-longfunc
.ident  "GCC: (GNU) 4.9.0 20140209 (experimental)"


2014-02-13 bernd.edlinger at hotmail dot de

--- Comment #12 from Bernd Edlinger bernd.edlinger at hotmail dot de ---
$ gcc -v

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/ed/gnu/arm-linux-gnueabihf/libexec/gcc/armv7l-unknown-linux-gnueabihf/4.9.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-4.9-20140209/configure
--prefix=/home/ed/gnu/arm-linux-gnueabihf
--enable-languages=c,c++,objc,obj-c++,fortran,ada,go --with-arch=armv7-a
--with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard
Thread model: posix
gcc version 4.9.0 20140209 (experimental) (GCC)


2013-05-29 ktkachov at gcc dot gnu.org

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 CC||ktkachov at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #10 from ktkachov at gcc dot gnu.org ---
(In reply to jules from comment #9)
> This appears to have regressed on mainline. I now get the following assembly
> output for the test case added by Maxim:
> 
> longfunc:
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 0, uses_anonymous_args = 0
> @ link register save eliminated.
> stmfd   sp!, {r4, r5}
> umull   r4, r5, r0, r2
> mul r3, r0, r3
> mla r1, r2, r1, r3
> mov r0, r4
> add r1, r1, r5
> ldmfd   sp!, {r4, r5}
> bx  lr

Current trunk (r199375) gives the following, so I think this can be closed:

longfunc:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
mul r3, r0, r3
mla r3, r2, r1, r3
umull   r0, r1, r0, r2
add r1, r3, r1
bx  lr


2011-09-20 jules at gcc dot gnu.org

jules at gcc dot gnu.org changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 CC||jules at gcc dot gnu.org
 Resolution|FIXED   |

--- Comment #9 from jules at gcc dot gnu.org 2011-09-20 19:03:43 UTC ---
This appears to have regressed on mainline. I now get the following assembly
output for the test case added by Maxim:

longfunc:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
stmfd   sp!, {r4, r5}
umull   r4, r5, r0, r2
mul r3, r0, r3
mla r1, r2, r1, r3
mov r0, r4
add r1, r1, r5
ldmfd   sp!, {r4, r5}
bx  lr


2010-08-18 mkuvyrkov at gcc dot gnu dot org

--- Comment #7 from mkuvyrkov at gcc dot gnu dot org  2010-08-18 10:34 ---
Subject: Bug 42575

Author: mkuvyrkov
Date: Wed Aug 18 10:34:02 2010
New Revision: 163334

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=163334
Log:
gcc/
PR rtl-optimization/42575
* optabs.c (expand_doubleword_mult): Generate new pseudos to shorten
live ranges.

gcc/testsuite/
PR rtl-optimization/42575
* gcc.target/arm/pr42575.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/arm/pr42575.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/optabs.c
trunk/gcc/testsuite/ChangeLog


2010-08-18 mkuvyrkov at gcc dot gnu dot org

--- Comment #8 from mkuvyrkov at gcc dot gnu dot org  2010-08-18 10:43 ---
Bernd did all the heavy lifting for this patch.  The above patch fixes the last
piece of the problem: an extra move when compiling for ARMv7-A.


mkuvyrkov at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||FIXED


2010-07-29 bernds at gcc dot gnu dot org

--- Comment #6 from bernds at gcc dot gnu dot org  2010-07-29 12:40 ---
Subject: Bug 42575

Author: bernds
Date: Thu Jul 29 12:39:57 2010
New Revision: 162678

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162678
Log:
PR rtl-optimization/42575
* dce.c (word_dce_process_block): Renamed from byte_dce_process_block.
Argument AU removed.  All callers changed.  Ignore artificial refs.
Use return value of df_word_lr_simulate_defs to decide whether an insn
is necessary.
(fast_dce): Rename arg to WORD_LEVEL.
(run_word_dce): Renamed from rest_of_handle_fast_byte_dce.  No longer
static.
(pass_fast_rtl_byte_dce): Delete.
* dce.h (run_word_dce): Declare.
* df-core.c (df_print_word_regset): Renamed from df_print_byteregset.
All callers changed.  Simplify code to only deal with two-word regs.
* df.h (DF_WORD_LR): Renamed from DF_BYTE_LR.
(DF_WORD_LR_BB_INFO): Renamed from DF_BYTE_LR_BB_INFO.
(DF_WORD_LR_IN): Renamed from DF_BYTE_LR_IN.
(DF_WORD_LR_OUT): Renamed from DF_BYTE_LR_OUT.
(struct df_word_lr_bb_info): Renamed from df_byte_lr_bb_info.
(df_word_lr_mark_ref): Declare.
(df_word_lr_add_problem, df_word_lr_mark_ref, df_word_lr_simulate_defs,
df_word_lr_simulate_uses): Declare or rename from byte variants.
(df_byte_lr_simulate_artificial_refs_at_top,
df_byte_lr_simulate_artificial_refs_at_end, df_byte_lr_get_regno_start,
df_byte_lr_get_regno_len, df_compute_accessed_bytes): Delete
declarations.
(df_word_lr_get_bb_info): Rename from df_byte_lr_get_bb_info.
(enum df_mm): Delete.
* df-byte-scan.c: Delete file.
* df-problems.c (df_word_lr_problem_data): Renamed from
df_byte_lr_problem_data, all members deleted except for
WORD_LR_BITMAPS, which is renamed from BYTE_LR_BITMAPS.  Uses changed.
(df_word_lr_expand_bitmap, df_byte_lr_simulate_artificial_refs_at_top,
df_byte_lr_simulate_artificial_refs_at_end, df_byte_lr_get_regno_start,
df_byte_lr_get_regno_len, df_byte_lr_check_regs,
df_byte_lr_confluence_0): Delete functions.
(df_word_lr_free_bb_info): Renamed from df_byte_lr_free_bb_info; all
callers changed.
(df_word_lr_alloc): Renamed from df_byte_lr_alloc; all callers changed.
Don't initialize members that were deleted, don't try to discover data
about registers.  Ignore hard regs.
(df_word_lr_reset): Renamed from df_byte_lr_reset; all callers changed.
(df_word_lr_mark_ref): New function.
(df_word_lr_bb_local_compute): Renamed from
df_byte_bb_lr_local_compute; all callers changed.  Use
df_word_lr_mark_ref.  Assert that artificial refs don't include
pseudos.  Ignore hard registers.
(df_word_lr_local_compute): Renamed from df_byte_lr_local_compute.
Assert that exit block uses don't contain pseudos.
(df_word_lr_init): Renamed from df_byte_lr_init; all callers changed.
(df_word_lr_confluence_n): Renamed from df_byte_lr_confluence_n; all
callers changed.  Ignore hard regs.
(df_word_lr_transfer_function): Renamed from
df_byte_lr_transfer_function; all callers changed.
(df_word_lr_free): Renamed from df_byte_lr_free; all callers changed.
(df_word_lr_top_dump): Renamed from df_byte_lr_top_dump; all callers
changed.
(df_word_lr_bottom_dump): Renamed from df_byte_lr_bottom_dump; all
callers changed.
(problem_WORD_LR): Renamed from problem_BYTE_LR; uses changed;
confluence operator 0 set to NULL.
(df_word_lr_add_problem): Renamed from df_byte_lr_add_problem; all
callers changed.
(df_word_lr_simulate_defs): Renamed from df_byte_lr_simulate_defs.
Return bool, true if bitmap changed or insn otherwise necessary.
All callers changed.  Simplify using df_word_lr_mark_ref.
(df_word_lr_simulate_uses): Renamed from df_byte_lr_simulate_uses;
all callers changed.  Simplify using df_word_lr_mark_ref.
* lower-subreg.c: Include dce.h.
(decompose_multiword_subregs): Call run_word_dce if df available.
* Makefile.in (lower-subreg.o): Adjust dependencies.
(df-byte-scan.o): Delete.
* timevar.def (TV_DF_WORD_LR): Renamed from TV_DF_BYTE_LR.

Removed:
trunk/gcc/df-byte-scan.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/Makefile.in
trunk/gcc/dce.c
trunk/gcc/dce.h
trunk/gcc/df-core.c
trunk/gcc/df-problems.c
trunk/gcc/df.h
trunk/gcc/lower-subreg.c
trunk/gcc/timevar.def


2010-02-22 drow at gcc dot gnu dot org

--- Comment #5 from drow at gcc dot gnu dot org  2010-02-22 21:06 ---
(In reply to comment #3)
> * What is the purpose of insn 12 here?  It looks to me like this is dead code,
> since r5 is restored in insn 38 (although, not knowing ARM so well, I may be
> wrong).

I couldn't figure this out either.  Where did it come from - was it so late
that we never DCE'd it, or does something bizarre claim to be dependent on the
value?

> Note how the sched1 pass has switched the two insns around. The register
> allocator now decides to use two new registers here, because r0 and r3 are
> both live. After RA, sched2 switches insn 9 and insn 10 again, and r2 and r3
> become available in insn 10 -- but this is too late.
> 
> Question for the ARM maintainer now is: Why does sched1 want to swap insns 9
> and 10, when sched2 wants to swap them back again?

I'm guessing, but presumably we want to separate the mul from the mla because
they're dependent; the umull isn't.  But I don't know what would swap them back
again and that's probably the crux.


2010-02-08 steven at gcc dot gnu dot org

--- Comment #3 from steven at gcc dot gnu dot org  2010-02-08 10:47 ---
Trunk today produces this (with -dAP hacked to print slim RTL):

.file   "t.c"
.text
.align  2
.global longfunc
.type   longfunc, %function
longfunc:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
@ basic block 2
@8 ip:SI=r2:SI*r1:SI
@  REG_DEAD: r1:SI
mul ip, r2, r1  @ 8 *arm_mulsi3/2   [length = 4]
@   35 {[--sp:SI]=unspec[r4:SI] 2;use r5:SI;}
@  REG_DEAD: r5:SI
@  REG_DEAD: r4:SI
@  REG_FRAME_RELATED_EXPR: sequence
stmfd   sp!, {r4, r5}   @ 35*push_multi [length = 4]
@9 r1:SI=r0:SI*r3:SI+ip:SI
@  REG_DEAD: ip:SI
@  REG_DEAD: r3:SI
@  REG_DEAD: r0:SI
mla r1, r0, r3, ip  @ 9 *mulsi3addsi/2  [length = 4]
@   10 r4:DI=zero_extend(r2:SI)*zero_extend(r0:SI)
@  REG_DEAD: r2:SI
umull   r4, r5, r2, r0  @ 10*umulsidi3_nov6 [length = 4]
@   11 r1:SI=r1:SI+r5:SI
@  REG_DEAD: r5:SI
add r1, r1, r5  @ 11*arm_addsi3/1   [length = 4]
@   12 r5:SI=r1:SI
mov r5, r1  @ 12*arm_movsi_insn/1   [length = 4]
@   31 r0:SI=r4:SI
mov r0, r4  @ 31*arm_movsi_insn/1   [length = 4]
@   38 unspec/v{return;}
ldmfd   sp!, {r4, r5}
bx  lr
.size   longfunc, .-longfunc
.ident  "GCC: (GNU) 4.5.0 20100208 (experimental) [trunk revision 156595]"

Questions for those who know ARM:

* What is the purpose of insn 12 here?  It looks to me like this is dead code,
since r5 is restored in insn 38 (although, not knowing ARM so well, I may be
wrong).


* After combine we have these two insns:

9 r138:SI=r142:SI*r3:SI+r139:SI
  REG_DEAD: r3:SI
  REG_DEAD: r139:SI
   10 r137:DI=zero_extend(r144:SI)*zero_extend(r142:SI)
  REG_DEAD: r144:SI
  REG_DEAD: r142:SI

which translate to the mla insn and to the umull insn that uses r4 and r5:

@   10 r4:DI=zero_extend(r2:SI)*zero_extend(r0:SI)
@  REG_DEAD: r2:SI
umull   r4, r5, r2, r0  @ 10*umulsidi3_nov6 [length = 4]
@9 r1:SI=r0:SI*r3:SI+ip:SI
@  REG_DEAD: ip:SI
@  REG_DEAD: r3:SI
@  REG_DEAD: r0:SI
mla r1, r0, r3, ip  @ 9 *mulsi3addsi/2  [length = 4]

Note how the sched1 pass has switched the two insns around. The register
allocator now decides to use two new registers here, because r0 and r3 are both
live. After RA, sched2 switches insn 9 and insn 10 again, and r2 and r3 become
available in insn 10 -- but this is too late.

Question for the ARM maintainer now is: Why does sched1 want to swap insns 9
and 10, when sched2 wants to swap them back again?

(Note, btw, how wrong the REG_DEAD notes are: r0 dies in insn 9 and is used in
insn 10, because the sched2 pass fails to update the notes when it moves insn 9
before insn 10. But that's a separate issue...)


* If I compile with -fno-schedule-insns, I still don't get the optimal code:

mul ip, r2, r1
str r4, [sp, #-4]!
mla r1, r0, r3, ip
umull   r3, r4, r2, r0
add r1, r1, r4
mov r4, r1
mov r0, r3
ldmfd   sp!, {r4}
bx  lr

This time the compiler chooses to use r3:DI in the umull, instead of r2:DI (that
is, r2 and r3). I am guessing this may be a target REG_ALLOC_ORDER issue, where
r3 comes before r2. That's another thing for a target maintainer to look into.
If IRA selected r2:DI, you would also lose the save/restore of r4 and get
the perfect code of comment #2.


So two issues:
1. Why does the sched1 pass schedule insn 10 before insn 9?
2. With -fno-schedule-insns, why does IRA prefer (r3,r4) over (r2,r3)?


2010-02-08 steven at gcc dot gnu dot org

--- Comment #4 from steven at gcc dot gnu dot org  2010-02-08 10:51 ---
Add an ARM guy to the CC:


steven at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||ramana at gcc dot gnu dot org


2010-01-04 ramana at gcc dot gnu dot org

--- Comment #2 from ramana at gcc dot gnu dot org  2010-01-04 10:54 ---
Confirmed. With trunk I get:

longfunc:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
mul r1, r2, r1
mla r1, r0, r3, r1
stmfd   sp!, {r4, r5}
umull   r4, r5, r2, r0
add r1, r1, r5
mov r0, r4
mov r5, r1
ldmfd   sp!, {r4, r5}
bx  lr

r4 and r5 need not be used here; you could do with just r2 and r3 instead of
r4 and r5, i.e.:
mul r1, r2, r1
mla r1, r0, r3, r1
umull   r2, r3, r2, r0
add r1, r1, r3
mov r0, r2
bx  lr


ramana at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
  Component|target  |rtl-optimization
 Ever Confirmed|0   |1
   Keywords||missed-optimization, ra
   Last reconfirmed date|-00-00 00:00:00 |2010-01-04 10:54:28
Summary|arm-eabi-gcc 4.2.1 64-bit multiply weirdness |arm-eabi-gcc 64-bit multiply weirdness

