[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-02-16 wilco.dijkstra at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

--- Comment #49 from Wilco  ---
AArch64 does this:

(define_expand "vec_store_lanesoi"
  [(set (match_operand:OI 0 "aarch64_simd_struct_operand" "=Utv")
        (unspec:OI [(match_operand:OI 1 "register_operand" "w")
                    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                   UNSPEC_ST2))]
  "TARGET_SIMD"
{
  if (BYTES_BIG_ENDIAN)
    {
      rtx tmp = gen_reg_rtx (OImode);
      rtx mask = aarch64_reverse_mask (mode, );
      emit_insn (gen_aarch64_rev_reglistoi (tmp, operands[1], mask));
      emit_insn (gen_aarch64_simd_st2 (operands[0], tmp));
    }
  else
    emit_insn (gen_aarch64_simd_st2 (operands[0], operands[1]));
  DONE;
})

ARM seems to be missing the swap:

(define_expand "vec_store_lanesoi"
  [(set (match_operand:OI 0 "neon_struct_operand")
        (unspec:OI [(match_operand:OI 1 "s_register_operand")
                    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                   UNSPEC_VST2))]
  "TARGET_NEON")

So clearly looks like a backend issue.
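A small C model can help show why the missing permute matters. It assumes, purely for illustration, that on big-endian the compiler's vector lane k sits in architectural lane `layout[k]` of the register, and that ST2/VST2.32 interleaves architectural lanes in order. The helpers are hypothetical, not GCC or Arm APIs: `store_permuted` stands in for the AArch64 expander (the `aarch64_rev_reglistoi` permute followed by `st2`), and `store_direct` stands in for the ARM expander, which hands operand 1 to the store unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NLANES 4 /* 32-bit lanes in a 128-bit vector */

/* Architectural ST2 / VST2.32 model: memory gets lane k of r0, then lane k of r1. */
static void st2(int mem[2 * NLANES], const int r0[NLANES], const int r1[NLANES])
{
    for (int k = 0; k < NLANES; k++) {
        mem[2 * k] = r0[k];
        mem[2 * k + 1] = r1[k];
    }
}

/* Build a register image: compiler lane k is placed in architectural lane layout[k]. */
static void to_reg(int reg[NLANES], const int v[NLANES], const int layout[NLANES])
{
    for (int k = 0; k < NLANES; k++) {
        assert(layout[k] >= 0 && layout[k] < NLANES);
        reg[layout[k]] = v[k];
    }
}

/* ARM-style expander: the registers go to the store unchanged. */
static void store_direct(int mem[2 * NLANES], const int r0[NLANES], const int r1[NLANES])
{
    st2(mem, r0, r1);
}

/* AArch64-style expander: undo the lane layout first (the tbl step), then store. */
static void store_permuted(int mem[2 * NLANES], const int r0[NLANES],
                           const int r1[NLANES], const int layout[NLANES])
{
    int t0[NLANES], t1[NLANES];
    for (int k = 0; k < NLANES; k++) {
        t0[k] = r0[layout[k]];
        t1[k] = r1[layout[k]];
    }
    st2(mem, t0, t1);
}

/* True if mem holds the element-wise interleaving of v0 and v1 (what the loop wants). */
static bool is_interleaved(const int mem[2 * NLANES], const int v0[NLANES], const int v1[NLANES])
{
    int want[2 * NLANES];
    st2(want, v0, v1);
    return memcmp(mem, want, sizeof want) == 0;
}
```

With `layout = {3, 2, 1, 0}` and the first-iteration vectors {0, 1, 2, 3} and {1, 2, 3, 4}, `store_permuted` writes 0 1 1 2 2 3 3 4 while `store_direct` writes 3 4 2 3 1 2 0 1; with the identity layout the two agree. The actual armeb register layout is not established here, so this only illustrates the shape of the problem.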

2018-02-16 jakub at gcc dot gnu.org

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #48 from Jakub Jelinek  ---
Can someone familiar with ARM please take this over?  Either this is a backend
bug, or a bug in the vectorizer part specific to only ARM/AArch64 (no other
target has STORE_LANES stuff).  This is a P1 (for now), so it would be nice to
get it resolved soon.

2018-02-16 wilco.dijkstra at arm dot com

--- Comment #47 from Wilco  ---

(In reply to Jakub Jelinek from comment #46)
> Wonder if that:
>   vect_array.11[0] = vect_vec_iv_.7_45;
>   vect_array.11[1] = vect__4.8_48;
> on armeb shouldn't have been [1] and [0] instead, otherwise we end up with:
> (insn 35 37 38 5 (set (subreg:V4SI (reg:OI 155 [ vect_array.11 ]) 0)
> (reg:V4SI 110 [ vect_vec_iv_.7 ])) "pr82518.c":8 939 {*neon_movv4si}
>  (nil))
> (insn 38 35 41 5 (set (subreg:V4SI (reg:OI 155 [ vect_array.11 ]) 16)
> (plus:V4SI (reg:V4SI 110 [ vect_vec_iv_.7 ])
> (reg:V4SI 171))) "pr82518.c":8 998 {*addv4si3_neon}
>  (nil))
> (insn 41 38 39 5 (set (reg:V4SI 110 [ vect_vec_iv_.7 ])
> (plus:V4SI (reg:V4SI 110 [ vect_vec_iv_.7 ])
> (reg:V4SI 169))) 998 {*addv4si3_neon}
>  (nil))
> (insn 39 41 43 5 (set (mem:OI (post_inc:SI (reg:SI 152 [ ivtmp.31 ])) [2
> MEM[(int *)vectp_p.9_49]+0 S32 A32])
> (unspec:OI [
> (reg:OI 155 [ vect_array.11 ])
> (unspec:V4SI [
> (const_int 0 [0])
> ] UNSPEC_VSTRUCTDUMMY)
> ] UNSPEC_VST2)) "pr82518.c":8 2396 {neon_vst2v4si}
>  (expr_list:REG_INC (reg:SI 152 [ ivtmp.31 ])
> (nil)))
> where pseudo 110 is the vect_vec_iv_.7_45 ({i, i + 1, i + 2, i + 3}) and
> insn 38 adds {1, 1, 1, 1} to that.  It really depends on what exactly the
> neon_vst2v4si instruction does on armeb.
> vmov.i32 q10, #4  @ v4si
> vmov.i32 q9, #1  @ v4si
> ...
> vldr d16, .L19
> vldr d17, .L19+8
> .L4:
> vadd.i32 q11, q8, q9
> vst1.64 {d16-d17}, [sp:64]
> vadd.i32 q8, q8, q10
> vstr d22, [sp, #16]
> vstr d23, [sp, #24]
> vld1.64 {d22-d25}, [sp:64]
> vst2.32 {d22-d25}, [r3]!
> If it works like on armel, except the elements of the vectors are
> byte-swapped, then it should be [1] and [0].

The vst2 works on little-endian, but on big-endian the lane numbering is
complex since all data is still treated as 64-bit quantities.

The stores and vld1.64 have no effect on data layout, so everything is still
64-bit data in 64-bit registers. The vst2.32 can only be used in big-endian if
the data is lane-swapped first. AArch64 in big-endian does this:

.L26:
mov v2.16b, v0.16b
add v3.4s, v0.4s, v6.4s
add v0.4s, v0.4s, v7.4s
tbl v4.16b, {v2.16b}, v1.16b
tbl v5.16b, {v3.16b}, v1.16b
st2 {v4.4s - v5.4s}, [x2], 32
cmp x2, x3
bne .L26

2018-02-16 jakub at gcc dot gnu.org

--- Comment #46 from Jakub Jelinek  ---
Wonder if that:
  vect_array.11[0] = vect_vec_iv_.7_45;
  vect_array.11[1] = vect__4.8_48;
on armeb shouldn't have been [1] and [0] instead, otherwise we end up with:
(insn 35 37 38 5 (set (subreg:V4SI (reg:OI 155 [ vect_array.11 ]) 0)
(reg:V4SI 110 [ vect_vec_iv_.7 ])) "pr82518.c":8 939 {*neon_movv4si}
 (nil))
(insn 38 35 41 5 (set (subreg:V4SI (reg:OI 155 [ vect_array.11 ]) 16)
(plus:V4SI (reg:V4SI 110 [ vect_vec_iv_.7 ])
(reg:V4SI 171))) "pr82518.c":8 998 {*addv4si3_neon}
 (nil))
(insn 41 38 39 5 (set (reg:V4SI 110 [ vect_vec_iv_.7 ])
(plus:V4SI (reg:V4SI 110 [ vect_vec_iv_.7 ])
(reg:V4SI 169))) 998 {*addv4si3_neon}
 (nil))
(insn 39 41 43 5 (set (mem:OI (post_inc:SI (reg:SI 152 [ ivtmp.31 ])) [2
MEM[(int *)vectp_p.9_49]+0 S32 A32])
(unspec:OI [
(reg:OI 155 [ vect_array.11 ])
(unspec:V4SI [
(const_int 0 [0])
] UNSPEC_VSTRUCTDUMMY)
] UNSPEC_VST2)) "pr82518.c":8 2396 {neon_vst2v4si}
 (expr_list:REG_INC (reg:SI 152 [ ivtmp.31 ])
(nil)))
where pseudo 110 is the vect_vec_iv_.7_45 ({i, i + 1, i + 2, i + 3}) and
insn 38 adds {1, 1, 1, 1} to that.  It really depends on what exactly the
neon_vst2v4si instruction does on armeb.
vmov.i32 q10, #4  @ v4si
vmov.i32 q9, #1  @ v4si
...
vldr d16, .L19
vldr d17, .L19+8
.L4:
vadd.i32 q11, q8, q9
vst1.64 {d16-d17}, [sp:64]
vadd.i32 q8, q8, q10
vstr d22, [sp, #16]
vstr d23, [sp, #24]
vld1.64 {d22-d25}, [sp:64]
vst2.32 {d22-d25}, [r3]!
If it works like on armel, except the elements of the vectors are byte-swapped,
then it should be [1] and [0].

2018-02-16 jakub at gcc dot gnu.org

--- Comment #45 from Jakub Jelinek  ---
Note the vectorized loop is pretty much the same on arm little-endian,
  # vect_vec_iv_.6_33 = PHI <{ 0, 1, 2, 3 }(4), vect_vec_iv_.6_34(5)>
  # ivtmp.12_14 = PHI 
  vectp_p.8_37 = (int[8] *) ivtmp.12_14;
  vect_vec_iv_.6_34 = vect_vec_iv_.6_33 + { 4, 4, 4, 4 };
  vect__4.7_36 = vect_vec_iv_.6_33 + { 1, 1, 1, 1 };
  vect_array.10[0] = vect_vec_iv_.6_33;
  vect_array.10[1] = vect__4.7_36;
  MEM[(int *)vectp_p.8_37] = STORE_LANES (vect_array.10);
  ivtmp.12_23 = ivtmp.12_14 + 32;
  if (ivtmp.12_23 != _54)
goto ; [83.33%]
  else
goto ; [16.67%]
for which we emit:
vmov.i32 q12, #4  @ v4si
vmov.i32 q9, #1  @ v4si
...
vldr d16, .L13
vldr d17, .L13+8
.L4:
vmov q10, q8  @ v4si
vadd.i32 q11, q8, q9
vadd.i32 q8, q8, q12
vst2.32 {d20-d23}, [r3]!
cmp r3, r2
bne .L4

vst2.32 seems to be documented to do 32-bit interleaving, so if qN registers
overlap d{2*N} and d{2*N+1} registers, I guess this does the right thing.
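The guess can be checked against a little-endian model of the register file. This sketch assumes that qN aliases d(2N) (lanes 0-1) and d(2N+1) (lanes 2-3), and that the four-register form `vst2.32 {dD-dD+3}` pairs D[D+r] with D[D+2+r] for r = 0, 1, which is our reading of the Arm architecture manual's VST2 pseudocode. `NeonRegs`, `set_q` and `vst2_32_four` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian model: 32 D registers, each holding two 32-bit lanes. */
typedef struct {
    uint32_t d[32][2];
} NeonRegs;

/* Write a 4-lane value into qN, which aliases d(2N) (lanes 0-1) and d(2N+1) (lanes 2-3). */
static void set_q(NeonRegs *regs, int n, const uint32_t v[4])
{
    assert(n >= 0 && n < 16);
    regs->d[2 * n][0] = v[0];
    regs->d[2 * n][1] = v[1];
    regs->d[2 * n + 1][0] = v[2];
    regs->d[2 * n + 1][1] = v[3];
}

/* vst2.32 {dD-dD+3}: for r = 0, 1 interleave the lanes of D[D+r] with D[D+2+r]. */
static void vst2_32_four(const NeonRegs *regs, int d, uint32_t mem[8])
{
    assert(d >= 0 && d + 3 < 32);
    int i = 0;
    for (int r = 0; r < 2; r++)
        for (int e = 0; e < 2; e++) {
            mem[i++] = regs->d[d + r][e];
            mem[i++] = regs->d[d + 2 + r][e];
        }
}
```

With q10 = {0, 1, 2, 3} (`vect_vec_iv_.6_33` on the first iteration) and q11 = q10 + { 1, 1, 1, 1 }, the model stores 0 1 1 2 2 3 3 4, i.e. the p[i].x, p[i].y sequence, so under these assumptions the little-endian code does the right thing.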

2018-02-16 jakub at gcc dot gnu.org

--- Comment #44 from Jakub Jelinek  ---
Maybe -O3 -mcpu=cortex-a9 -mfpu=neon-fp16 -mfloat-abi=hard is needed.
With that I certainly see the #c42 loop vectorized.

On x86_64 we get in *.optimized:
   [local count: 567644349]:
  # vect_vec_iv_.4_33 = PHI <{ 0, 1, 2, 3, 4, 5, 6, 7 }(4),
vect_vec_iv_.4_34(5)>
  # ivtmp.10_14 = PHI 
  vect_vec_iv_.4_34 = vect_vec_iv_.4_33 + { 8, 8, 8, 8, 8, 8, 8, 8 };
  vect__4.5_36 = vect_vec_iv_.4_33 + { 1, 1, 1, 1, 1, 1, 1, 1 };
  vect_inter_high_39 = VEC_PERM_EXPR ;
  vect_inter_low_40 = VEC_PERM_EXPR ;
  _86 = (void *) ivtmp.10_14;
  MEM[base: _86, offset: 0B] = vect_inter_high_39;
  MEM[base: _86, offset: 32B] = vect_inter_low_40;
  ivtmp.10_23 = ivtmp.10_14 + 64;
  if (ivtmp.10_23 != _90)
goto ; [83.33%]
  else
goto ; [16.67%]
which doesn't look optimal either; in this case I'd say it would be better to have
two IVs bumped by { 8, ... 8 } in each iteration, one starting with
{ 0, 1, 1, 2, 2, 3, 3, 4 } and another with
{ 4, 5, 5, 6, 6, 7, 7, 8 }, or just one IV with { 4, ... 4 } added to it for the
second store, and avoid both VEC_PERM_EXPRs in that case.
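The suggested alternative can be sanity-checked with plain arrays standing in for the 8 x 32-bit vectors of the x86_64 dump. This is a scalar sketch, not vectorizer output: `fill_two_iv` (a hypothetical name) keeps two pre-interleaved induction vectors starting at { 0, 1, 1, 2, 2, 3, 3, 4 } and { 4, 5, 5, 6, 6, 7, 7, 8 }, bumps both by 8 per iteration, and is compared against the scalar loop.

```c
#include <assert.h>

#define VL 8 /* 32-bit lanes per vector, as in the x86_64 dump */

/* Reference: the scalar loop p[i].x = i; p[i].y = i + 1, viewed as an int array. */
static void fill_scalar(int *mem, int nstructs)
{
    for (int i = 0; i < nstructs; i++) {
        mem[2 * i] = i;
        mem[2 * i + 1] = i + 1;
    }
}

/* Suggested scheme: two already-interleaved IVs, each bumped by 8 per iteration.
   nstructs must be a multiple of 8 (two 8-lane stores cover 8 structs). */
static void fill_two_iv(int *mem, int nstructs)
{
    int lo[VL] = {0, 1, 1, 2, 2, 3, 3, 4};
    int hi[VL] = {4, 5, 5, 6, 6, 7, 7, 8};
    assert(nstructs % VL == 0);
    for (int base = 0; base < 2 * nstructs; base += 2 * VL) {
        for (int k = 0; k < VL; k++) {
            mem[base + k] = lo[k];      /* first 32-byte store */
            mem[base + VL + k] = hi[k]; /* second 32-byte store */
        }
        for (int k = 0; k < VL; k++) { /* IV += { 8, ..., 8 } */
            lo[k] += 8;
            hi[k] += 8;
        }
    }
}
```

The second IV is always the first plus { 4, ... 4 }, which corresponds to the single-IV variant; neither form needs a permute.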

On armeb with the above options I see:
   [local count: 504572758]:
  # vect_vec_iv_.7_45 = PHI <{ 0, 1, 2, 3 }(4), vect_vec_iv_.7_46(5)>
  # ivtmp.31_128 = PHI 
  vectp_p.9_49 = (int[8] *) ivtmp.31_128;
  vect_vec_iv_.7_46 = vect_vec_iv_.7_45 + { 4, 4, 4, 4 };
  vect__4.8_48 = vect_vec_iv_.7_45 + { 1, 1, 1, 1 };
  vect_array.11[0] = vect_vec_iv_.7_45;
  vect_array.11[1] = vect__4.8_48;
  MEM[(int *)vectp_p.9_49] = STORE_LANES (vect_array.11);
  ivtmp.31_129 = ivtmp.31_128 + 32;
  if (ivtmp.31_129 != _133)
goto ; [83.33%]
  else
goto ; [16.67%]
which looks wrong to me (because vect_vec_iv_.7_45 and vect__4.8_48 really
should be interleaved when stored into MEM[(int *)vectp_p.9_49]), but I really
don't know what exactly the STORE_LANES does.

2018-02-16 rguenth at gcc dot gnu.org

--- Comment #43 from Richard Biener  ---
(In reply to Wilco from comment #42)
> Cut down example:
> 
> typedef struct { int x, y; } X;
> 
> void f (X *p, int n)
> {
>   for (int i = 0; i < n; i++)
>   { p[i].x = i;
> p[i].y = i + 1;
>   }
> }

Can't reproduce your assembler with -O3 -mcpu=cortex-a9 -mfpu=neon-fp16
[-fno-vect-cost-model] [-mthumb]

Without -fno-vect-cost-model we don't vectorize anything.  With we only
SLP vectorize and that using V1SI vector types (huh).

For the loop case:

t.c:6:3: note: Build SLP for _3->x = i_15;
t.c:6:3: note: Build SLP for _3->y = _4;
t.c:6:3: note: vect_is_simple_use: operand i_15
t.c:6:3: note: def_stmt: i_15 = PHI <0(5), _4(6)>
t.c:6:3: note: type of def: induction
t.c:6:3: note: vect_is_simple_use: operand _4
t.c:6:3: note: def_stmt: _4 = i_15 + 1;
t.c:6:3: note: type of def: internal
t.c:6:3: note: Build SLP failed: different types

ok, known missed handling of SLP induction.

t.c:6:3: note: ==> examining statement: _3->x = i_15;
t.c:6:3: note: vect_is_simple_use: operand i_15
t.c:6:3: note: def_stmt: i_15 = PHI <0(5), _4(6)>
t.c:6:3: note: type of def: induction
t.c:6:3: note: no array mode for DI[2]
permutaion op not supported by target.

so we don't support interleaving either.  Not sure why it talks about DI[2]
instead of SI[2].

2018-02-15 wilco.dijkstra at arm dot com

--- Comment #42 from Wilco  ---
Cut down example:

typedef struct { int x, y; } X;

void f (X *p, int n)
{
  for (int i = 0; i < n; i++)
  { p[i].x = i;
p[i].y = i + 1;
  }
}
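To run the cut-down example as an execution test, a checking driver along these lines could be used. `check_f` and the use of `calloc` are our own choices, not part of the report; on an affected armeb NEON configuration, a miscompiled store-lanes loop would be expected to make `check_f` return false once `n` is large enough for the vectorized path to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct { int x, y; } X;

/* The cut-down loop from the report.  */
void f (X *p, int n)
{
  for (int i = 0; i < n; i++)
    {
      p[i].x = i;
      p[i].y = i + 1;
    }
}

/* Run f on n elements and compare every field with the scalar semantics.  */
bool check_f (int n)
{
  assert (n >= 0);
  X *p = calloc (n > 0 ? (size_t) n : 1, sizeof *p);
  bool ok = p != NULL;
  if (ok)
    {
      f (p, n);
      for (int i = 0; i < n; i++)
        if (p[i].x != i || p[i].y != i + 1)
          ok = false;
    }
  free (p);
  return ok;
}
```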

2018-02-15 wilco.dijkstra at arm dot com

Wilco  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco.dijkstra at arm dot com

--- Comment #41 from Wilco  ---
I'm guessing it's this (64-bit loads reading 32-bit data):

vldr d16, .L39
vldr d17, .L39+8
.L5:
vsub.i32 q10, q11, q8
add r3, r3, #1
vsub.i32 q9, q8, q11


.L39:
.word   1
.word   2
.word   3
.word   4
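The concern can be illustrated with a host-independent model of the literal-pool load. It assumes that `vldr` reads each 8-byte chunk as one 64-bit value in the target byte order and that 32-bit lane 0 is bits 31:0 of the D register; `store_words` and `lanes_after_vldr` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Serialize 32-bit words to bytes in the given byte order (nonzero = big-endian). */
static void store_words(uint8_t *bytes, const uint32_t *words, int n, int big_endian)
{
    assert(n >= 0);
    for (int w = 0; w < n; w++)
        for (int b = 0; b < 4; b++) {
            int shift = big_endian ? 8 * (3 - b) : 8 * b;
            bytes[4 * w + b] = (uint8_t) (words[w] >> shift);
        }
}

/* Model a 64-bit VLDR: read 8 bytes as one doubleword in the given byte order,
   then split it into two 32-bit lanes (lane 0 = bits 31:0). */
static void lanes_after_vldr(const uint8_t *bytes, int big_endian, uint32_t lanes[2])
{
    uint64_t dw = 0;
    for (int b = 0; b < 8; b++) {
        int shift = big_endian ? 8 * (7 - b) : 8 * b;
        dw |= (uint64_t) bytes[b] << shift;
    }
    lanes[0] = (uint32_t) dw;
    lanes[1] = (uint32_t) (dw >> 32);
}
```

For the `.L39` pool (1, 2, 3, 4) this model gives d16 = {2, 1} and d17 = {4, 3} on big-endian, versus {1, 2} and {3, 4} on little-endian: the 32-bit lanes within each D register come out swapped.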

2018-02-09 aldyh at gcc dot gnu.org

Aldy Hernandez  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-02-09
     Ever confirmed|0                           |1

--- Comment #40 from Aldy Hernandez  ---
*sigh*  Confirmed with:

$ sudo yum install qemu
$ git clone https://git.linaro.org/toolchain/abe.git
$ (cd abe; git checkout stable)
$ mkdir build-cortex-a9
$ cd build-cortex-a9
$ ../abe/configure
$ ../abe/abe.sh --target armeb-linux-gnueabihf --set languages=fortran \
    --build gcc,glibc,binutils gcc=gcc.git~master --set cpu=cortex-a9

I avoided --build all because gdb fails to build with --set cpu=cortex-a9.

Then I ran tests like this:

../abe/abe.sh --target armeb-linux-gnueabihf \
    --set runtestflags=execute.exp=in-pack.f90 --build gcc --check gcc \
    gcc=gcc.git~master --disable update

which actually passed, because there doesn't seem to be a way to pass
-mfpu= to either the abe.sh configury, or the "runtestflags=" line.

So... then you do:

$ find . -name gfortran.log
./builds/x86_64-unknown-linux-gnu/armeb-linux-gnueabihf/gcc.git~master-stage2/gcc/testsuite/gfortran/gfortran.log

and fish out the gcc command line to run manually on the command line with
-mcpu=cortex-a9 -mfpu=neon-fp16 -O3 -o i-wanna-kill-myself.exe

And finally:

$ qemu-armeb -cpu any -R 0 \
    -L /home/cygnus/aldyh/bld/arm-hell/build-cortex-a9/sysroots/armeb-linux-gnueabihf \
    i-wanna-kill-myself.exe
Program aborted. Backtrace:
qemu: uncaught target signal 6 (Aborted) - core dumped

I've verified with -O2 or without -mfpu=neon-fp16 succeeds, so the test isn't
just dying for all flags :).

Sigh (have I said that before?).  I will now see if I can reduce the test and
see where we are dying.

p.s. gcc.git~master directory seems to have a pretty up to date upstream GCC
(as of today). I'll double check dropping in an actual FSF gcc doesn't change
the results.

2018-02-07 clyon at gcc dot gnu.org

--- Comment #39 from Christophe Lyon  ---
Maybe we can demote this from P1?
I'm sure armeb is getting a lot of attention, given other bug reports.

2018-02-07 clyon at gcc dot gnu.org

--- Comment #38 from Christophe Lyon  ---
(In reply to Aldy Hernandez from comment #37)
> (In reply to Christophe Lyon from comment #31)
> > Created attachment 43352 [details]
> > Reduced testcase
> > 
> > I commented out most calls, since abort() is called from csub4.
> 
> Can you also remove the csub8, isub4, and isub8 unused functions as well?
> 
> I see you've commented out this in csub4:
> 
> !!  if (any(bb /= b)) call abort
> 
> I assume this is irrelevant to the failure?
> 
> Can you also verify that after these changes you have a revision of GCC for
> which this reduced testcase succeeds (regardless of the vect cost model
> rabbit hole), and a revision of GCC for which this fails?
> 
> I'm trying to make sure all this removing of stuff didn't cause an
> unconditional abort.

I don't speak Fortran, but I thought the program did:
main-> call csub4-> call abort if some condition
In my testing, removing all calls but csub4 from main is sufficient to make the
program fail, and then it seems the first call to abort in csub4 is taken too.

What would it change to remove csub8/isub4/isub8? (except not generating dead
code, which is irrelevant to the current problem)


> 
> Also, is this only reproducible with -g?
I don't know, it's added by the torture harness. I wouldn't expect this to
change code generation, though.

> 
> BTW, no need to include the assembly.  I should be able to generate it with
> a cross ./cc1.

2018-02-07 aldyh at gcc dot gnu.org

--- Comment #37 from Aldy Hernandez  ---
(In reply to Christophe Lyon from comment #31)
> Created attachment 43352 [details]
> Reduced testcase
> 
> I commented out most calls, since abort() is called from csub4.

Can you also remove the csub8, isub4, and isub8 unused functions as well?

I see you've commented out this in csub4:

!!  if (any(bb /= b)) call abort

I assume this is irrelevant to the failure?

Can you also verify that after these changes you have a revision of GCC for
which this reduced testcase succeeds (regardless of the vect cost model rabbit
hole), and a revision of GCC for which this fails?

I'm trying to make sure all this removing of stuff didn't cause an
unconditional abort.

Also, is this only reproducible with -g?

BTW, no need to include the assembly.  I should be able to generate it with a
cross ./cc1.

2018-02-07 clyon at gcc dot gnu.org

--- Comment #36 from Christophe Lyon  ---
The attachments were generated with trunk r257076

2018-02-07 clyon at gcc dot gnu.org

--- Comment #35 from Christophe Lyon  ---
Created attachment 43356
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43356=edit
execution traces for armeb

2018-02-07 clyon at gcc dot gnu.org

--- Comment #34 from Christophe Lyon  ---
Created attachment 43355
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43355=edit
execution traces for arm

I have removed the logs/traces for -O1/-O2/-Os/etc... and kept only -O3 -g

2018-02-07 clyon at gcc dot gnu.org

--- Comment #33 from Christophe Lyon  ---
Created attachment 43354
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43354=edit
assembly for armeb (big-endian)

2018-02-07 clyon at gcc dot gnu.org

--- Comment #32 from Christophe Lyon  ---
Created attachment 43353
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43353=edit
assembly for arm (little-endian)

2018-02-07 clyon at gcc dot gnu.org

--- Comment #31 from Christophe Lyon  ---
Created attachment 43352
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43352=edit
Reduced testcase

I commented out most calls, since abort() is called from csub4.

2018-02-06 aldyh at gcc dot gnu.org

--- Comment #30 from Aldy Hernandez  ---
(In reply to Christophe Lyon from comment #29)
> I still haven't found a commit where the test passes with
> -fno-vect-cost-model (before -O3).
> 
> I went back to r193053 (Nov 1, 2012), where I was able to build GCC but the
> test fails.
> With a revision 1 month earlier, GCC fails to build.
> I tried with earlier revisions, but soon reached a point where
> "-mfloat-abi=hard and VFP" is not implemented.

Ok, bisecting isn't getting us anywhere.  I apologize for making you go
through all this in vain.

Could you further reduce the testcase as I suggested in comment 12, and perhaps
we can start looking at the assembly to see where things are going wrong?

2018-02-06 clyon at gcc dot gnu.org

--- Comment #29 from Christophe Lyon  ---
I still haven't found a commit where the test passes with -fno-vect-cost-model
(before -O3).

I went back to r193053 (Nov 1, 2012), where I was able to build GCC but the
test fails.
With a revision 1 month earlier, GCC fails to build.
I tried with earlier revisions, but soon reached a point where "-mfloat-abi=hard
and VFP" is not implemented.

2018-02-06 clyon at gcc dot gnu.org

--- Comment #28 from Christophe Lyon  ---
It's possible that my bisect script got confused by the fact that GCC started
ICEing at -O2 on this test at r197671.

Investigating.

2018-02-06 aldyh at gcc dot gnu.org

--- Comment #27 from Aldy Hernandez  ---
(In reply to Christophe Lyon from comment #26)
> I've manually built or tried to build several revisions:
> * 197671: build OK, test fails to run at -fno-vect-cost-model -O3 -g
> * 197669: same (!)
> * 197815: GCC fails to build
> * 197816: same
> * 197900: same
> 
> So although I still see the test failing, I don't understand why bisect
> said the guilty commit would be between 197671 and 197815, given that the
> test fails at 197669

What's the first revision before 197669 that shows the test succeeding?

2018-02-06 clyon at gcc dot gnu.org

--- Comment #26 from Christophe Lyon  ---
I've manually built or tried to build several revisions:
* 197671: build OK, test fails to run at -fno-vect-cost-model -O3 -g
* 197669: same (!)
* 197815: GCC fails to build
* 197816: same
* 197900: same

So although I still see the test failing, I don't understand why bisect
said the guilty commit would be between 197671 and 197815, given that the test
fails at 197669

2018-02-05 aldyh at gcc dot gnu.org

--- Comment #25 from Aldy Hernandez  ---

> Aldy - the easiest thing for now would be to unilaterally relax the alignment
> test in Handle_Store_Double and see if that allows you to get further with your
> tests.


We're debugging past each other :).  I was already doing that, but it only
mildly helps:

pc: 8358,  ldm  r3, {r0, r1, r2, r3}
pc: 835c,  stm  ip, {r0, r1, r2, r3}
pc: 8360,  sub  r3, fp, #40 ; 0x28
pc: 8364,  mov  r0, r3
pc: 8368,  blx  0x0698
 pc changed to 8a00
pc: 8a00, Thumb instr: f890|f000   emulate as: e5d0f000  ldrb   pc, [r0]   
; 
 pc changed to a

I'm done mucking around with the simulator.  I'll file a GDB/sim PR for the
alignment issue though.  Thanks.

> 
> (But yes, I agree, a reduced testcase would be a much better help than all this
> mucking about in the simulator).

Yes please.  Christophe?

2018-02-05 aldyh at gcc dot gnu.org

--- Comment #24 from Aldy Hernandez  ---
(In reply to Richard Earnshaw from comment #22)
> (In reply to Nick Clifton from comment #21)
> > Hi Aldy,
> > 
> > >>> instruction. :-(  Looking at the code in Handle_Store_Double() in
> > >>> sim/arm/armemu.c, I think that the reason is probably because the address
> > >>> for the store is not double word aligned.  Which leads me to wonder,
> > >>> what value is stored in r5 when the STRD instruction is being executed ?
> > 
> > 
> > >> => 0x8c24 : strd r2, [r5, #12]
> > >> (gdb) info reg r5
> > >> r5 0x1b6e8  112360
> > 
> > >> ...which is 64 bit aligned.
> > 
> > But, as you have just discovered, (r5 + 12) is not 64-bit aligned...
> 
> But from ARMv7 onwards it only has to be 4-byte aligned, which it is.  And
> this code was built for cortex-a9, which is ARMv7-a.

In that case, unless I'm missing something, the simulator looks wrong.

The unalignment occurs in initialise_monitor_files() here:

openfiles[0].handle = monitor_stdin;

(gdb) p [0].handle
$14 = (int *) 0x1b6f4 
(gdb) p/x (unsigned int)$14 % 4
$15 = 0x0
(gdb) p/x (unsigned int)$14 % 8
$16 = 0x4

So openfiles[0].handle is aligned to 4 bytes, but not to 8.  For that matter,
 is 4 byte aligned only.  And Richard says that is ok.

So, why is Handle_Store_Double() unilaterally barfing on non 64-bit alignment?

  /* The address must be aligned on a 8 byte boundary.  */
  if (addr & 0x7)
    {
#ifdef ABORTS
      ARMul_DATAABORT (addr);
#else
      ARMul_UndefInstr (state, instr);
#endif
      return;
    }
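An architecture-aware version of that check might look like the sketch below. It encodes only what is stated in this thread (the simulator's current 8-byte rule, and 4-byte alignment being sufficient from ARMv7 onwards); `strd_address_ok` and its `is_armv7_or_later` flag are hypothetical, and a real fix would need to hook into however the simulator records the target architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if an STRD to addr should be accepted.
   Pre-ARMv7: keep the simulator's current 8-byte rule.
   ARMv7 and later: word (4-byte) alignment is enough.  */
static bool strd_address_ok(uint32_t addr, bool is_armv7_or_later)
{
    uint32_t mask = is_armv7_or_later ? 0x3u : 0x7u;
    return (addr & mask) == 0;
}
```

With the values from the gdb session, 0x1b6e8 (r5) passes either way, while 0x1b6f4 (r5 + 12) is rejected under the 8-byte rule but accepted under the ARMv7 rule.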

2018-02-05 nickc at redhat dot com

--- Comment #23 from Nick Clifton  ---
Hi Guys,

>> But, as you have just discovered, (r5 + 12) is not 64-bit aligned...
> 
> But from ARMv7 onwards it only has to be 4-byte aligned, which it is.  And
> this code was built for cortex-a9, which is ARMv7-a.

Ok, so this is a simulator bug.

Aldy - the easiest thing for now would be to unilaterally relax the alignment
test in Handle_Store_Double and see if that allows you to get further with your
tests.

(But yes, I agree, a reduced testcase would be a much better help than all this
mucking about in the simulator).

Cheers
  Nick

2018-02-05 rearnsha at gcc dot gnu.org

--- Comment #22 from Richard Earnshaw  ---
(In reply to Nick Clifton from comment #21)
> Hi Aldy,
> 
> >>> instruction. :-(  Looking at the code in Handle_Store_Double() in 
> >>> sim/arm/armemu.c, I think that the reason is probably because the address
> >>> for the store is not double word aligned.  Which leads me to wonder,
> >>> what value is stored in r5 when the STRD instruction is being executed ?
> 
> 
> >> => 0x8c24 : strd r2, [r5, #12]
> >> (gdb) info reg r5
> >> r5 0x1b6e8  112360
> 
> >> ...which is 64 bit aligned.
> 
> But, as you have just discovered, (r5 + 12) is not 64-bit aligned...

But from ARMv7 onwards it only has to be 4-byte aligned, which it is.  And this
code was built for cortex-a9, which is ARMv7-a.

2018-02-05 nickc at redhat dot com

--- Comment #21 from Nick Clifton  ---
Hi Aldy,

>>> instruction. :-(  Looking at the code in Handle_Store_Double() in 
>>> sim/arm/armemu.c, I think that the reason is probably because the address
>>> for the store is not double word aligned.  Which leads me to wonder,
>>> what value is stored in r5 when the STRD instruction is being executed ?


>> => 0x8c24 : strd r2, [r5, #12]
>> (gdb) info reg r5
>> r5 0x1b6e8  112360

>> ...which is 64 bit aligned.

But, as you have just discovered, (r5 + 12) is not 64-bit aligned...

2018-02-05 nickc at redhat dot com

--- Comment #20 from Nick Clifton  ---
Hi Aldy,

>>> for the store is not double word aligned.  Which leads me to wonder,
>>> what value is stored in r5 when the STRD instruction is being executed ?
>>
>> 1: x/i $pc
>> => 0x8c24 : strd r2, [r5, #12]
>> (gdb) info reg r5
>> r5 0x1b6e8  112360

>> ...which is 64 bit aligned.

Hmm, curious.  OK - my next recommendation would be to add some printf()s
to the simulator to find out a) if Handle_Store_Double() really is being
called, or if the abort is happening somewhere else.  Plus, if it is being
called, then b) where inside that function the abort is happening.  Maybe
the store operations are triggering a memory access failure.

Cheers
  Nick

2018-02-05 aldyh at gcc dot gnu.org

--- Comment #19 from Aldy Hernandez  ---
(In reply to Richard Earnshaw from comment #16)
> (In reply to Nick Clifton from comment #13)
> > Hi Aldy,
> > 
> > 
> > > pc: 8ca4, instr: e1c520fc
> > > pc: 4, instr: ea00089b
> > > 
> > > I took a peek at the executable being run with "/my-arm-build/objdump -D
> > > the-executable.exe", and I see we are failing in 
> > > initialise_monitor_handles(). 
> > > This suggests we're failing during the start-up code:
> > 
> > > 8ca4:   e1c520fc    strd r2, [r5, #12]
> > > 
> > > It seems that last store is corrupting things and making us jump to a PC of
> > > 4???
> > 
> > Address 4 is the "undefined instruction" vector.  If the simulator thinks
> > that the instruction is illegal/unknown then it will branch to address 4
> > and start executing from there.  (Or else it loads the value stored at 
> > address 4 and starts executing from that address.  I forget which).
> > 
> > So, this basically means that the simulator does not like that STRD 
> > instruction. :-(  Looking at the code in Handle_Store_Double() in 
> > sim/arm/armemu.c, I think that the reason is probably because the address
> > for the store is not double word aligned.  Which leads me to wonder,
> > what value is stored in r5 when the STRD instruction is being executed ?
> 
> You wouldn't take the undef vector for an alignment issue: that would take a
> data abort.
> 
> Sounds like your simulator is built for an older architecture, that doesn't
> have strd (ie it's pre-armv5te).

Actually, Nick is correct.  The simulator stops around here:

  /* The address must be aligned on a 8 byte boundary.  */
  if (addr & 0x7)
    {
#ifdef ABORTS
      ARMul_DATAABORT (addr);
#else
      ARMul_UndefInstr (state, instr);
#endif

(gdb) p/x addr
$26 = 0x1b6f4
(gdb) p/x addr & 7
$27 = 0x4

and ARMul_UndefInstr() is:

void
ARMul_UndefInstr (ARMul_State * state, ARMword instr ATTRIBUTE_UNUSED)
{
  ARMul_Abort (state, ARMul_UndefinedInstrV);
}

#define ARMUndefinedInstrV 4L
...
#define ARMul_UndefinedInstrV ARMUndefinedInstrV

The 4 is not exactly intuitive...
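The 4 is the offset of the undefined-instruction entry in the classic ARM exception vector table, while a data abort goes to 0x10, which is why an alignment fault was expected to show up as a data abort. The sketch below restates the table as a C enum and mirrors the `#ifdef ABORTS` branch of Handle_Store_Double(); the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Low-vector offsets of the classic ARM exception vector table. */
enum arm_vector {
    VEC_RESET = 0x00,
    VEC_UNDEF = 0x04,
    VEC_SVC = 0x08,
    VEC_PREFETCH_ABORT = 0x0C,
    VEC_DATA_ABORT = 0x10,
    VEC_IRQ = 0x18,
    VEC_FIQ = 0x1C
};

/* Mirror of the simulator's misaligned-STRD path: with ABORTS defined it raises
   a data abort, otherwise it treats the access as an undefined instruction. */
static enum arm_vector vector_for_misaligned_strd(bool aborts_defined)
{
    return aborts_defined ? VEC_DATA_ABORT : VEC_UNDEF;
}
```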

I still think someone with access to a working environment should reduce or
debug this further :(.

2018-02-05 aldyh at gcc dot gnu.org

--- Comment #18 from Aldy Hernandez  ---
(In reply to Christophe Lyon from comment #17)
> (In reply to Aldy Hernandez from comment #12)
> 
> > along with the isub8 subroutine, and continue chopping things similarly
> > upward until you get to the abort that fails.  Then see if you can chop
> > non-dependent things from the top down until you get to the smallest block
> > that has no problem before 197671 and a problem after 197815 (with -O3 -g
> > -fno-vect-cost-model as suggested before).
> > 
> Hmmm, does -O3 override -fno-vect-cost-model?
> 
> When I run the testsuite with qemu/-fno-vect-cost-model (as target
> board), my logs show:
> [...]
> /home/christophe.lyon/src/GCC/sources/gcc-fsf/r197671/gcc/testsuite/gfortran.fortran-torture/execute/in-pack.f90
> -fno-vect-cost-model -fno-diagnostics-show-caret -w -O3 -g  []
> 
> so I may not have been testing what I thought :(

-fno-vect-cost-model should take effect regardless of the -O3 that is also
specified; that is, the order of the flags shouldn't matter.

BTW, I only mentioned -O3 -fno-vect-cost-model -g because that is what was
suggested by Jakub in comment #3.

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-02-05 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

--- Comment #17 from Christophe Lyon  ---
(In reply to Aldy Hernandez from comment #12)

> along with the isub8 subroutine, and continue chopping things similarly
> upward until you get to the abort that fails.  Then see if you can chop
> non-dependent things from the top down until you get to the smallest block
> that has no problem before 197671 and a problem after 197815 (with -O3 -g
> -fno-vect-cost-model as suggested before).
> 
Hmmm, does -O3 override -fno-vect-cost-model?

When I run the testsuite with qemu/-fno-vect-cost-model (as target board),
my logs show:
[...]
/home/christophe.lyon/src/GCC/sources/gcc-fsf/r197671/gcc/testsuite/gfortran.fortran-torture/execute/in-pack.f90
-fno-vect-cost-model -fno-diagnostics-show-caret -w -O3 -g  []

so I may not have been testing what I thought :(

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-02-05 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

--- Comment #16 from Richard Earnshaw  ---
(In reply to Nick Clifton from comment #13)
> Hi Aldy,
> 
> 
> > pc: 8ca4, instr: e1c520fc
> > pc: 4, instr: ea00089b
> > 
> > I took a peek at the executable being run with "/my-arm-build/objdump -D
> > the-executable.exe", and I see we are failing in 
> > initialise_monitor_handles(). 
> > This suggests we're failing during the start-up code:
> 
> > 8ca4:   e1c520fc    strd    r2, [r5, #12]
> > 
> > It seems that last store is corrupting things and making us jump to a PC of
> > 4???
> 
> Address 4 is the "undefined instruction" vector.  If the simulator thinks
> that the instruction is illegal/unknown then it will branch to address 4
> and start executing from there.  (Or else it loads the value stored at 
> address 4 and starts executing from that address.  I forget which).
> 
> So, this basically means that the simulator does not like that STRD 
> instruction. :-(  Looking at the code in Handle_Store_Double() in 
> sim/arm/armemu.c, I think that the reason is probably because the address
> for the store is not double word aligned.  Which leads me to wonder,
> what value is stored in r5 when the STRD instruction is being executed ?

You wouldn't take the undef vector for an alignment issue: that would take a
data abort.

Sounds like your simulator is built for an older architecture that doesn't
have strd (i.e. it's pre-ARMv5TE).

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-02-05 Thread aldyh at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

--- Comment #15 from Aldy Hernandez  ---
(In reply to Aldy Hernandez from comment #14)
> (In reply to Nick Clifton from comment #13)
> > Hi Aldy,
> > 
> > 
> > > pc: 8ca4, instr: e1c520fc
> > > pc: 4, instr: ea00089b
> > > 
> > > I took a peek at the executable being run with "/my-arm-build/objdump -D
> > > the-executable.exe", and I see we are failing in 
> > > initialise_monitor_handles(). 
> > > This suggests we're failing during the start-up code:
> > 
> > > 8ca4:   e1c520fc    strd    r2, [r5, #12]
> > > 
> > > It seems that last store is corrupting things and making us jump to a PC
> > > of 4???
> > 
> > Address 4 is the "undefined instruction" vector.  If the simulator thinks
> > that the instruction is illegal/unknown then it will branch to address 4
> > and start executing from there.  (Or else it loads the value stored at 
> > address 4 and starts executing from that address.  I forget which).
> > 
> > So, this basically means that the simulator does not like that STRD 
> > instruction. :-(  Looking at the code in Handle_Store_Double() in 
> > sim/arm/armemu.c, I think that the reason is probably because the address
> > for the store is not double word aligned.  Which leads me to wonder,
> > what value is stored in r5 when the STRD instruction is being executed ?
> 
> 1: x/i $pc
> => 0x8c24 : strd    r2, [r5, #12]
> (gdb) info reg r5
> r5 0x1b6e8  112360
> (gdb) x/4x 0x1b6e8
> 0x1b6e8 :0x  0x0001  0x0001
> 0x
> 
> ...which is 64-bit aligned.

And if you're curious what the 12 offset points to:

(gdb) x/4x $r5 + 12
0x1b6f4 :0x  0x  0x0001 
0x00
00
(gdb) x/4x $r5 + 0x12
0x1b6fa :  0x  0x0001  0x 
0x00
00

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-02-05 Thread aldyh at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

--- Comment #14 from Aldy Hernandez  ---
(In reply to Nick Clifton from comment #13)
> Hi Aldy,
> 
> 
> > pc: 8ca4, instr: e1c520fc
> > pc: 4, instr: ea00089b
> > 
> > I took a peek at the executable being run with "/my-arm-build/objdump -D
> > the-executable.exe", and I see we are failing in 
> > initialise_monitor_handles(). 
> > This suggests we're failing during the start-up code:
> 
> > 8ca4:   e1c520fc    strd    r2, [r5, #12]
> > 
> > It seems that last store is corrupting things and making us jump to a PC of
> > 4???
> 
> Address 4 is the "undefined instruction" vector.  If the simulator thinks
> that the instruction is illegal/unknown then it will branch to address 4
> and start executing from there.  (Or else it loads the value stored at 
> address 4 and starts executing from that address.  I forget which).
> 
> So, this basically means that the simulator does not like that STRD 
> instruction. :-(  Looking at the code in Handle_Store_Double() in 
> sim/arm/armemu.c, I think that the reason is probably because the address
> for the store is not double word aligned.  Which leads me to wonder,
> what value is stored in r5 when the STRD instruction is being executed ?

1: x/i $pc
=> 0x8c24 : strd    r2, [r5, #12]
(gdb) info reg r5
r5 0x1b6e8  112360
(gdb) x/4x 0x1b6e8
0x1b6e8 :0x  0x0001  0x0001
0x

...which is 64-bit aligned.

The above maps to the source: newlib/libc/sys/arm/syscalls.c

  for (i = 0; i < MAX_OPEN_FILES; i ++)
    openfiles[i].handle = -1;

  openfiles[0].handle = monitor_stdin;


> > Should I run the dejagnu tests with -mcpu= or whatever, or is the
> > --with-cpu=cortex-a9 configury flag enough?
> 
> Be paranoid - add the option. :-)

No difference :(.

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-02-05 Thread nickc at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

--- Comment #13 from Nick Clifton  ---
Hi Aldy,


> pc: 8ca4, instr: e1c520fc
> pc: 4, instr: ea00089b
> 
> I took a peek at the executable being run with "/my-arm-build/objdump -D
> the-executable.exe", and I see we are failing in 
> initialise_monitor_handles(). 
> This suggests we're failing during the start-up code:

> 8ca4:   e1c520fc    strd    r2, [r5, #12]
> 
> It seems that last store is corrupting things and making us jump to a PC of
> 4???

Address 4 is the "undefined instruction" vector.  If the simulator thinks
that the instruction is illegal/unknown then it will branch to address 4
and start executing from there.  (Or else it loads the value stored at 
address 4 and starts executing from that address.  I forget which).

So, this basically means that the simulator does not like that STRD 
instruction. :-(  Looking at the code in Handle_Store_Double() in 
sim/arm/armemu.c, I think that the reason is probably because the address
for the store is not double word aligned.  Which leads me to wonder,
what value is stored in r5 when the STRD instruction is being executed ?




> Am I running the simulator correctly?

Yes.

>  Does it require a special flag for
> cortex-a9?  

No.

> Is the cortex-a9 CPU even handled by the simulator?

Yes.

> Should I run the dejagnu tests with -mcpu= or whatever, or is the
> --with-cpu=cortex-a9 configury flag enough?

Be paranoid - add the option. :-)


> Does the arm newlib/libgloss/whatever code have instructions that aren't
> handled by the GDB simulator?

No.  Well, not in the assembler parts of it.  The possible exceptions to this
are the memory manipulation functions (memcpy, strlen, etc.) in
newlib/libc/sys/arm/, which tend to be very tightly coded and are often
updated to take advantage of new instructions as they are added to the ISA.

Of course the C parts of these libraries might use unsupported instructions
if gcc generates them.  But if you have configured gcc correctly (and I think
that you have) then this should not be an issue.

Cheers
  Nick

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-02-05 Thread aldyh at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

--- Comment #12 from Aldy Hernandez  ---
(In reply to Christophe Lyon from comment #11)
> My setup uses armeb-none-linux-gnueabihf (as opposed to armeb-eabi as you
> report). I have never tried armeb-eabi.
> 
> I am also using qemu as simulator (in user mode, not in system mode).
> 
> The failure in initialise_monitor_handles indicates a problem in the startup
> code, while initializing the semi-hosting interface.
> 
> Can you try qemu?

Ughhh, off-list I've found that you build glibc/kernel and all that jazz.  This
is going to take forever if we (ahem I) rebuild the world in order to
reproduce.

In comment #6 you mention that the regression started between r197671 and
r197815.  Could you reduce the Fortran testcase to the absolute minimum, so
that we can compare the assembly before r197671 and after r197815?

Perhaps start chopping off:

  i8 = (/(i,i=1,5)/)
  call isub8(i8(5:1:-1),5)
  ii8 = (/(5-i+1,i=1,5)/)
  if (any(ii8 /= i8)) call abort

along with the isub8 subroutine, and continue chopping things similarly upward
until you get to the abort that fails.  Then see if you can chop non-dependent
things from the top down until you get to the smallest block that has no
problem before 197671 and a problem after 197815 (with -O3 -g
-fno-vect-cost-model as suggested before).

Once you have a reduced testcase, perhaps we could discern something from the
assembly or gimple dumps.

Again, please try to get the testcase as small as possible.  That will
exponentially improve the chances of things getting fixed :).

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-02-05 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

--- Comment #11 from Christophe Lyon  ---
My setup uses armeb-none-linux-gnueabihf (as opposed to armeb-eabi as you
report). I have never tried armeb-eabi.

I am also using qemu as simulator (in user mode, not in system mode).

The failure in initialise_monitor_handles indicates a problem in the startup
code, while initializing the semi-hosting interface.

Can you try qemu?

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-02-04 Thread aldyh at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

--- Comment #10 from Aldy Hernandez  ---
I'm having some trouble reproducing this bug.  I'm a little rusty on cross
builds, so perhaps someone can lend a hand.

I have a set of combined sources which I'm using to build a toolchain like
this:

~/src/combined/configure --host=x86_64-linux --build=x86_64-linux \
  --target=armeb-eabi --with-mode=arm --with-cpu=cortex-a9 --with-fpu=neon-fp16 \
  --enable-languages=fortran --disable-libgomp --disable-libsanitizer \
  --disable-werror

I then do:

cd gcc && make check-fortran \
  RUNTESTFLAGS="execute.exp=in-pack.f90 --target_board=arm-sim"

=== gfortran tests ===

Schedule of variations:
arm-sim

Running target arm-sim
Using /usr/share/dejagnu/baseboards/arm-sim.exp as board description file for target.
Using /usr/share/dejagnu/config/sim.exp as generic interface file for target.
Using /usr/share/dejagnu/baseboards/basic-sim.exp as board description file for target.
Using /home/cygnus/aldyh/src/combined/gcc/testsuite/config/default.exp as tool-and-target-specific interface file.
Running /home/cygnus/aldyh/src/combined/gcc/testsuite/gfortran.fortran-torture/execute/execute.exp ...
FAIL: gfortran.fortran-torture/execute/in-pack.f90 execution,  -O0 
FAIL: gfortran.fortran-torture/execute/in-pack.f90 execution,  -O1 
...

I get failures for everything; in fact, every execution test fails, even
simple C tests.  The tests fail without any message, just a plain execution
failure, so I had to dig into the simulator.

I patched the gdb simulator to trace the instructions to see where we are
dying:

diff --git a/sim/arm/wrapper.c b/sim/arm/wrapper.c
index bc1a043..4492b19 100644
--- a/sim/arm/wrapper.c
+++ b/sim/arm/wrapper.c
@@ -53,7 +53,7 @@ int stop_simulator;
 #include "dis-asm.h"
 
 /* TODO: Tracing should be converted to common tracing module.  */
-int trace = 0;
+int trace = 1;
 int disas = 0;
 int trace_funcs = 0;

This presumably gives us some tracing in gcc/testsuite/*/*.log, and the GCC
testsuite log file now shows:

pc: 8c8c, instr: e3530014
pc: 8c90, instr: 1afb
pc: 8c94, instr: e5952000
pc: 8c98, instr: e3a03000
pc: 8c9c, instr: e5856014
pc: 8ca0, instr: e5853018
pc: 8ca4, instr: e1c520fc
pc: 4, instr: ea00089b

I took a peek at the executable being run with "/my-arm-build/objdump -D
the-executable.exe", and I see we are failing in initialise_monitor_handles(). 
This suggests we're failing during the start-up code:

8c8c:   e3530014    cmp     r3, #20
8c90:   1afb        bne     8c84
8c94:   e5952000    ldr     r2, [r5]
8c98:   e3a03000    mov     r3, #0
8c9c:   e5856014    str     r6, [r5, #20]
8ca0:   e5853018    str     r3, [r5, #24]
8ca4:   e1c520fc    strd    r2, [r5, #12]

It seems that last store is corrupting things and making us jump to a PC of
4???

Before I bark up the wrong trees, I have some questions.

Am I running the simulator correctly?  Does it require a special flag for
cortex-a9?  

Is the cortex-a9 CPU even handled by the simulator?

Should I run the dejagnu tests with -mcpu= or whatever, or is the
--with-cpu=cortex-a9 configury flag enough?

Does the arm newlib/libgloss/whatever code have instructions that aren't
handled by the GDB simulator?

I don't want to dig too deep into this, only to find out that our simulator,
newlib, or whatever cannot handle cortex-a9 + neon-fp16.

For that matter, is this bug reproducible on a more generic Arm variant that
*IS* supported by gdb?

Sorry for the barrage of questions, but this is a P1, and there doesn't seem to
be an easy way to reproduce this.

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-01-31 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

--- Comment #9 from Christophe Lyon  ---
(In reply to Aldy Hernandez from comment #7)
> Hi Nick!  Hi all!
> 
> Do we have a way of testing armeb, either through a simulator or through
> some aarch64 with magic flags?
> 

Please note that the bug appears on armeb (i.e. AArch32 big-endian), and not on
aarch64.

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-01-31 Thread nickc at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

--- Comment #8 from Nick Clifton  ---
Hi Aldy,

> Do we have a way of testing armeb, either through a simulator or through
> some aarch64 with magic flags?

GDB has an ARM simulator which is OK unless you need to test some of the newer
features like scalable vector instructions.  Just compile your code as normal
and then build an armeb targeted version of gdb.

Cheers
  Nick

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-01-31 Thread aldyh at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

Aldy Hernandez  changed:

   What|Removed |Added

 CC||aldyh at gcc dot gnu.org,
   ||nickc at gcc dot gnu.org

--- Comment #7 from Aldy Hernandez  ---
Hi Nick!  Hi all!

Do we have a way of testing armeb, either through a simulator or through some
aarch64 with magic flags?

If anyone has a hint on how to reproduce this, I'll gladly take a stab at
reproducing, bisecting, debugging, etc.  Whatever it takes to inch this forward
:).

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-01-25 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

--- Comment #6 from Christophe Lyon  ---
My bisect script cannot find the commit that introduced the problem with
-fno-vect-cost-model, because the build was broken for quite some time.
The regression seems to have been introduced between r197671 and r197815.

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-01-17 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

--- Comment #5 from Christophe Lyon  ---
So far, my bisect has been unsuccessful, because my bisect script thinks the
guilty commit is one of the exit-code-125 ones, and there are too many.

I should probably re-try manually.

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-01-11 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

--- Comment #4 from Christophe Lyon  ---
Indeed it looks like the testcase has been failing with -fno-vect-cost-model
for a very long time.

Trying to find the 'good' starting point for a bisect.

(I'm using qemu, I have no such board either)

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-01-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
Ping, can you bisect with -O3 -g -fno-vect-cost-model?  I don't have access to
any armeb box, so no idea what to even look at.

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2018-01-08 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P1
 CC||law at redhat dot com

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2017-10-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

Richard Biener  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org
   Target Milestone|--- |8.0

--- Comment #2 from Richard Biener  ---
So I guess it was already broken before that revision when -fno-vect-cost-model
is added.

[Bug tree-optimization/82518] [8 regression] gfortran.fortran-torture/execute/in-pack.f90 fails on armeb since r252917

2017-10-11 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518

Christophe Lyon  changed:

   What|Removed |Added

 Target||armeb

--- Comment #1 from Christophe Lyon  ---
r252917 was a fix for PR tree-optimization/82220