[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-21 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

--- Comment #20 from Segher Boessenkool  ---
(In reply to Wilco from comment #19)
> Well insn_cost() uses COSTS_N_INSNS (1) for instructions with unknown (zero)
> costs. That's a reasonable default and gives more accurate cost comparisons,
> eg. 0 + 4 vs 0 + 8 now becomes 4 + 4 vs 4 + 8.

You're talking about the default insn_cost, which uses pattern_cost to do the
work; pattern_cost returns COSTS_N_INSNS (1) if there is a single SET (or a
single SET and a single SET of COMPARE), and set_src_cost of that SET says it
is cost 0.  It does not say cost 0 for a PARALLEL of two SETs, like you have
here.

A target-specific insn_cost can return a saner cost for all insns, including
for those that are a "real" PARALLEL.

(COSTS_N_INSNS (1) isn't minimal of course, it is 4 like you already point
out; it is the default insn cost).

It never is clear that 0+4 is better than 0+8, because the 0's usually are not
the cost of the same insn.

> With those changes I think there will be far fewer cases with unknown costs.

I'm not sure what changes you propose.

> There will be cases where target insn_cost needs to be improved to more
> accurately model complex instructions, but that will result in even better
> code.

The arm port does not _have_ the insn_cost hook implemented; doing that will
probably help, sure.

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-21 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

--- Comment #19 from Wilco  ---
(In reply to Segher Boessenkool from comment #16)
> (In reply to Wilco from comment #14)
> > Note there is also an issue with costs, if the cost is zero then it seems to
> > behave like infinite cost.
> 
> 0 means unknown cost.  Any known cost is treated as at least as good as
> unknown cost.
> 
> > It would be better to properly cost an
> > instruction sequence given there is a new interface for this now.
> 
> Yup.
> 
> > If the
> > cost is still zero, it could get the default cost rather than being treated
> > like infinite cost, resulting in non-optimal replacements.
> 
> Treating unknown cost as minimal cost is still non-optimal.  It might work
> better for your case of course.  But it means for example that combine will
> modify parallels less often.

Well insn_cost() uses COSTS_N_INSNS (1) for instructions with unknown (zero)
costs. That's a reasonable default and gives more accurate cost comparisons,
eg. 0 + 4 vs 0 + 8 now becomes 4 + 4 vs 4 + 8.

With insn_cost() the ldm's get cost 8+8, so the comparison is 4 + 16 vs 8 + 8,
and thus the substitution will happen (though it's not clear it's useful for
the selected CPU).

With those changes I think there will be far fewer cases with unknown costs.
There will be cases where target insn_cost needs to be improved to more
accurately model complex instructions, but that will result in even better
code.

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-21 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

Segher Boessenkool  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |segher at gcc dot 
gnu.org

--- Comment #18 from Segher Boessenkool  ---
I have a patch.

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-20 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

--- Comment #17 from Segher Boessenkool  ---
Please do the combine dumps with details enabled; these are pretty useless.
(-fdump-rtl-combine-all)

A C testcase would be very helpful, too (or some magic configure command
to run on some cfarm machine).

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-20 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

--- Comment #16 from Segher Boessenkool  ---
(In reply to Wilco from comment #14)
> Note there is also an issue with costs, if the cost is zero then it seems to
> behave like infinite cost.

0 means unknown cost.  Any known cost is treated as at least as good as
unknown cost.

> It would be better to properly cost an
> instruction sequence given there is a new interface for this now.

Yup.

> If the
> cost is still zero, it could get the default cost rather than being treated
> like infinite cost, resulting in non-optimal replacements.

Treating unknown cost as minimal cost is still non-optimal.  It might work
better for your case of course.  But it means for example that combine will
modify parallels less often.

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-20 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

--- Comment #15 from Segher Boessenkool  ---
(In reply to Wilco from comment #13)
> It seems the safest way
> to split an instruction is to place the new instructions next to each other.

combine can only place new insns at i2 and i3, in either order.  There is
code to figure out which way (if any) works; apparently it doesn't work here.

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-20 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

--- Comment #14 from Wilco  ---
Note there is also an issue with costs, if the cost is zero then it seems to
behave like infinite cost. It would be better to properly cost an instruction
sequence given there is a new interface for this now. If the cost is still
zero, it could get the default cost rather than being treated like infinite
cost, resulting in non-optimal replacements.

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-20 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

Wilco  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-08-20
 CC||wilco at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #13 from Wilco  ---
(In reply to nsz from comment #12)
> the wrong string seems to be caused by a missing ldm

Yes what happens is that it decides to combine instruction 150 and 190 and
split an ldm2:

insn_cost 4 for   150: r197:SI=sfp:SI-0x24
...
insn_cost 0 for   190: {r0:SI=[r197:SI];r1:SI=[r197:SI+0x4];}


allowing combination of insns 150 and 190
original costs 4 + 0 = 0
replacement costs 8 + 8 = 16
modifying insn i2   150: r0:SI=[sfp:SI-0x24]
deferring rescan insn with uid = 150.
modifying insn i3   190: r1:SI=[sfp:SI-0x20]
deferring rescan insn with uid = 190.

There are lots of instructions between 150 and 190, including another ldm2
which writes r0:

150: r0:SI=[sfp:SI-0x24]
...
163: {r0:SI=[r206:SI];r1:SI=[r206:SI+0x4];}
...
190: r1:SI=[sfp:SI-0x20]

The combination ends up corrupting r0 because one of the loads is lifted across
many instructions without checking liveness. It seems the safest way to split
an instruction is to place the new instructions next to each other.

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-20 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

--- Comment #12 from nsz at gcc dot gnu.org ---
the wrong string seems to be caused by a missing ldm

good main:
...
mov r4, #0
str r4, [sp, #32]
mov r2, #2
str r2, [sp, #36]
add r2, sp, #40
str r2, [sp, #4]
str r4, [sp, #8]
ldm ip, {r0, r1}/// load r0 (content is 'hell')
str r0, [sp, #40]   /// store the right r0
strhr1, [sp, #44]   @ movhi
ldr r2, [sp, #66]   @ unaligned
str r2, [sp, #46]   @ unaligned
ldrhr2, [sp, #70]   @ unaligned
strhr2, [sp, #50]   @ unaligned
add r2, sp, #72
ldm r2, {r0, r1}
str r0, [sp, #52]
strhr1, [sp, #56]   @ movhi
mov r1, r3
add r0, sp, #4
bl  option_stopwatch_a.5061

bad main:
...
mov r4, #0
str r4, [sp, #32]
mov r2, #2
str r2, [sp, #36]
add r2, sp, #40
str r2, [sp, #4]
str r4, [sp, #8]
ldr r1, [sp, #64]
str r0, [sp, #40]   /// store a bad r0 (content is 'godd')
strhr1, [sp, #44]   @ movhi
ldr r2, [sp, #66]   @ unaligned
str r2, [sp, #46]   @ unaligned
ldrhr2, [sp, #70]   @ unaligned
strhr2, [sp, #50]   @ unaligned
ldr r1, [sp, #76]
str r0, [sp, #52]
strhr1, [sp, #56]   @ movhi
mov r1, r3
add r0, sp, #4
bl  option_stopwatch_a.5061

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-20 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

--- Comment #11 from nsz at gcc dot gnu.org ---
Created attachment 44562
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44562=edit
bad rtl dump of combine pass

attached good and bad dump of combine, i also tried to look at the array
contents, it seems to go from (good):

 hello hola! goddag

to (bad):

 goddo hola! goddag

in the last option_stopwatch_a call in main (even though the array 'a' seems to
be 'hello' before the call).

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-20 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||nsz at gcc dot gnu.org

--- Comment #10 from nsz at gcc dot gnu.org ---
Created attachment 44561
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44561=edit
good rtl dump of combine pass

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-13 Thread andrey.y.guskov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

--- Comment #9 from Andrey Guskov  ---
OK.

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-13 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

--- Comment #8 from Segher Boessenkool  ---
So is it worse code, better code, is the testcase broken / suboptimal?

The haswell problem seems to be completely unrelated, so open a separate PR
please.

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-13 Thread andrey.y.guskov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

Andrey Guskov  changed:

   What|Removed |Added

 CC||andrey.y.guskov at intel dot 
com

--- Comment #7 from Andrey Guskov  ---
On Haswell, r263067 is also responsible for that:

spawn -ignore SIGHUP /work/gcc/xgcc -B/work/gcc/
/source/gcc/testsuite/gcc.target/i386/avx-cvt-2.c -fno-diagnostics-show-caret
-fdiagnostics-color=never -O3 -mavx -mno-avx2 -mtune=generic
-fdump-tree-vect-details -ffat-lto-objects -S -o avx-cvt-2.s
PASS: gcc.target/i386/avx-cvt-2.c (test for excess errors)
PASS: gcc.target/i386/avx-cvt-2.c scan-tree-dump-times vect "vectorized 1 loops
in function" 6
PASS: gcc.target/i386/avx-cvt-2.c scan-assembler
vcvttpd2dq(y[^\n\r]*%xmm|[^\n\r]*xmm[^\n\r]*YMMWORD PTR)
PASS: gcc.target/i386/avx-cvt-2.c scan-assembler vcvtdq2ps[^\n\r]*ymm
FAIL: gcc.target/i386/avx-cvt-2.c scan-assembler
vcvtps2pd[^\n\r]*(%xmm[^\n\r]*%ymm|ymm[^\n\r]*xmm)

Option set:
-with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-shared
--enable-host-shared --enable-clocale=gnu --enable-cloog-backend=isl
--enable-languages=c,c++,fortran,jit,lto -with-arch=haswell --with-cpu=haswell

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-10 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

--- Comment #6 from Segher Boessenkool  ---
So, what is happening at all?  What is different during/after combine, etc.?

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-02 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

--- Comment #5 from Christophe Lyon  ---
I think in the "ok" version we have:
add ip, sp, #60
...
ldm ip, {r0, r1}
...
add r2, sp, #72
ldm r2, {r0, r1}

in the "ko" version we have:
ldr r1, [sp, #64]
...
ldr r1, [sp, #76]

So in the "ko" version we do not load r0

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-02 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

--- Comment #4 from Christophe Lyon  ---
Created attachment 44488
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44488=edit
Good code

This is with r263197 and r263067 (your patch) reverted

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-02 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

--- Comment #3 from Christophe Lyon  ---
Created attachment 44487
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44487=edit
Wrong code generated

This is with trunk @r263197

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-02 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

--- Comment #2 from Christophe Lyon  ---
gfortran.log contains:
STOP 4
STOP 4
STOP 4
before the execution fails

I'll regenerate the 2 asm files.

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-02 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

--- Comment #1 from Segher Boessenkool  ---
Could you trace this down to some bad code generated, at least?

[Bug target/86771] [9 Regression] gfortran.dg/actual_array_constructor_1.f90 fails on arm after combine 2 insns to 2 insns patch

2018-08-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86771

Richard Biener  changed:

   What|Removed |Added

   Keywords||wrong-code
   Target Milestone|--- |9.0
Summary|gfortran.dg/actual_array_co |[9 Regression]
   |nstructor_1.f90 fails on|gfortran.dg/actual_array_co
   |arm after combine 2 insns   |nstructor_1.f90 fails on
   |to 2 insns patch|arm after combine 2 insns
   ||to 2 insns patch