[Bug target/61578] [4.9 regression] Code size increase for ARM thumb compared to 4.8.x when compiling with -Os

2016-06-13 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61578

--- Comment #44 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Mon Jun 13 10:03:30 2016
New Revision: 237371

URL: https://gcc.gnu.org/viewcvs?rev=237371&root=gcc&view=rev
Log:
Backport from Mainline
2015-09-25  Vladimir Makarov  <vmaka...@redhat.com>

PR target/61578
* lra-constraints.c (match_reload): Check presence of the input pseudo
  in the output pseudo.

Modified:
branches/ARM/embedded-5-branch/gcc/ChangeLog.arm
branches/ARM/embedded-5-branch/gcc/lra-constraints.c

[Bug target/61578] [4.9 regression] Code size increase for ARM thumb compared to 4.8.x when compiling with -Os

2016-06-13 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61578

--- Comment #43 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Mon Jun 13 09:58:34 2016
New Revision: 237369

URL: https://gcc.gnu.org/viewcvs?rev=237369&root=gcc&view=rev
Log:
Backport from Mainline
2015-09-01  Vladimir Makarov  <vmaka...@redhat.com>

PR target/61578
* lra-lives.c (process_bb_lives): Process move pseudos with the
  same value for copies and preferences
* lra-constraints.c (match_reload): Create match reload pseudo
  with the same value from single dying input pseudo.

Modified:
branches/ARM/embedded-5-branch/gcc/ChangeLog.arm
branches/ARM/embedded-5-branch/gcc/lra-constraints.c
branches/ARM/embedded-5-branch/gcc/lra-lives.c

[Bug target/79237] [5/6/7 Regression] ARMv7-M ICE in extract_constrain_insn, insn does not satisfy its constraints

2017-01-26 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79237

--- Comment #3 from avieira at gcc dot gnu.org ---
Hi,

My outstanding patch for PR71607 fixes this ICE too. I am currently retesting
it after some comments upstream and should be posting a new version soon.

Cheers,
Andre

[Bug rtl-optimization/77499] New: Regression after code-hoisting, due to combine pass failing to evaluate known value range

2016-09-06 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77499

Bug ID: 77499
   Summary: Regression after code-hoisting, due to combine pass
failing to evaluate known value range
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

Hello,

We are seeing a regression for arm-none-eabi on a Cortex-M7. This regression
was observed after Biener's and Bosscher's GIMPLE code-hoisting patch
(https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00360.html). The example below
will illustrate the regression:

$ cat t.c
unsigned short foo (unsigned short x, int c, int d, int e)
{
  unsigned short y;

  while (c > d)
    {
      if (c % 3)
        {
          x = (x >> 1) ^ 0xB121U;
        }
      else
        x = x >> 1;
      c -= e;
    }
  return x;
}

Comparing:
$ arm-none-eabi-gcc -mcpu=cortex-m7 -mthumb -O2 -S t.c 
vs
$ arm-none-eabi-gcc -mcpu=cortex-m7 -mthumb -O2 -S t.c -fno-code-hoisting

Will illustrate that the code-hoisting version has an extra zero_extension of
HImode to SImode. After some digging I found out that during the combine phase,
the no-code-hoisting version is able to recognize that the
'last_set_nonzero_bits' are 0x7fff whereas for the code-hoisted version it
seems to have lost this knowledge.

Looking at the graph dump for the no-code-hoisting t.c.246r.ud_dce.dot I see
the following insns:
23: r125:SI=r113:SI 0>>0x1
24: r111:SI=zero_extend(r125:SI#0)
27: r128:SI=r111:SI^r131:SI
28: r113:SI=zero_extend(r128:SI#0)

These are all in the same basic block. 

For the code-hoisting version we have:

BB A:
...
12: r116:SI=r112:SI 0>>0x1
13: r112:SI=zero_extend(r116:SI#0)
...
BB B:
27: r127:SI=r112:SI^r129:SI
28: r112:SI=zero_extend(r127:SI#0)


From what I have observed while debugging the combine pass, combine will first
combine instructions 23 and 24.
Here combine is able to optimize away the zero_extend, because in
'reg_nonzero_bits_for_combine' the reg_stat[113] has its 'last_set_value' set to
'r0' (the unsigned short argument) and its corresponding
'last_set_nonzero_bits' set to 0xffff, which means the zero_extend is
pointless. The code-hoisting version also combines 12 and 13, optimizing away
the zero_extend.

However, the next set of instructions is where things get tricky. In the
no-code-hoisting case it will end up combining all 4 instructions one by one
from the top down and it will end up figuring out that the last zero_extend is
also not required. For the code-hoisting case, when it tries to combine 28 with
27 (remember they are not in the same basic block as 12 and 13, so it will never
try to combine all 4), it will eventually try to evaluate the nonzero bits based
on r112 and see that the last_set_value for r112 is:
(lshiftrt:SI (clobber:SI (const_int 0 [0]))
(const_int 1 [0x1]))

The last_set_nonzero_bits will be 0x7fffffff, instead of the expected
0x7fff. This looks like the result of the code in 'record_value_for_reg' in
combine.c that sits below the comment:

  /* The value being assigned might refer to X (like in "x++;").  In that
 case, we must replace it with (clobber (const_int 0)) to prevent
 infinite loops.  */

Given that 12 and 13 were combined into:
r112:SI=r112:SI 0>>0x1

This seems to be invalidating the last_set_value and thus leaving us with a
weaker 'last_set_nonzero_bits' which isn't enough to optimize away the
zero_extend.
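
The reasoning above can be sketched with a toy model of the "nonzero bits" masks. This is only an illustration under stated assumptions, not the code in combine.c: the helper names (nz_lshiftrt, nz_xor_const, zero_extend_hi_redundant) and the two case functions are hypothetical, and the model assumes that a value combine knows nothing about has all 32 bits possibly set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: bit i of a mask is set if bit i of the value might be 1.
   All names here are illustrative; none of them exist in combine.c.  */
#define UNKNOWN_SI 0xffffffffu /* nothing known about a 32-bit value */
#define HI_MASK    0x0000ffffu /* bits kept by a HImode zero_extend */

/* A logical right shift moves the possibly-set bits down.  */
uint32_t nz_lshiftrt(uint32_t nz, unsigned n) { return nz >> n; }

/* XOR with a constant can set any bit that is set in either operand.  */
uint32_t nz_xor_const(uint32_t nz, uint32_t c) { return nz | c; }

/* A zero_extend from HImode is redundant if no upper bit can be set.  */
bool zero_extend_hi_redundant(uint32_t nz) { return (nz & ~HI_MASK) == 0; }

/* No code hoisting: the chain starts at the unsigned short argument.  */
bool no_hoisting_case(void)
{
  uint32_t shifted = nz_lshiftrt(HI_MASK, 1);       /* 0x7fff */
  uint32_t xored = nz_xor_const(shifted, 0xB121u);  /* fits in 16 bits */
  return zero_extend_hi_redundant(xored);
}

/* Code hoisting: the shift source was replaced by (clobber (const_int 0)),
   modelled here as a completely unknown value.  */
bool hoisting_case(void)
{
  uint32_t shifted = nz_lshiftrt(UNKNOWN_SI, 1);    /* 0x7fffffff */
  uint32_t xored = nz_xor_const(shifted, 0xB121u);  /* upper bits may be set */
  return zero_extend_hi_redundant(xored);
}
```

In this model the no-hoisting chain proves that the final zero_extend is redundant, while the hoisted chain cannot, which matches the behaviour described above.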

Any clue on how to "fix" this?

Cheers,
Andre

PS: I am not sure I completely understand the way the last_set_value stuff
works for pseudos in combine, but it looks to me like basic blocks are visited
in top-down order, and the instructions within each block are again visited in
top-down order. Each pseudo will have its 'last_set_value' according to the last
time that register was seen being set, without any regard to loops or proper
dataflow analysis. Can anyone explain to me how this doesn't go horribly wrong?

[Bug rtl-optimization/77499] Regression after code-hoisting, due to combine pass failing to evaluate known value range

2016-09-06 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77499

avieira at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org

--- Comment #1 from avieira at gcc dot gnu.org ---
Adding Richard, since this was exposed after Richard's code-hoisting patch and
Segher because I believe the root of the problem might be related to the way
combine works.

[Bug tree-optimization/77498] [7 regression] Performance drop after r239414 on spec2000/172mgrid

2016-09-06 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77498

avieira at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |arm-none-eabi
                 CC|                            |avieira at gcc dot gnu.org

--- Comment #2 from avieira at gcc dot gnu.org ---
I am observing some regressions for arm-none-eabi on a Cortex-M0+ for a popular
embedded benchmark following this patch.

I believe register pressure might also be the root cause of this given the
significant increase of loads and registers from and to the stack. Though I
need to have a better look.

Passing the option -fno-code-hoisting brings the performance numbers back up.

[Bug rtl-optimization/77499] [7 Regression] Regression after code-hoisting, due to combine pass failing to evaluate known value range

2016-09-07 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77499

--- Comment #7 from avieira at gcc dot gnu.org ---
if-convert is a no go here, for the reason Andrew pointed out, sorry missed
that comment!

So I don't know... The only thing I can think of is better "value-range"-like
analysis for combine, but that might be too costly?

The fact is that for the code-hoisting to work here, the pseudo for r112 has to
be shared among both code-paths, so unless you add an extra move:

BB0:
r112:SI = r0:SI

BB 1:
...
r116:SI=r112:SI 0>>0x1
rNEW:SI=zero_extend(r116:SI#0)
...
if CC goto BB2 else BB Extra
BB 2:
r127:SI=rNEW:SI^r129:SI
r112:SI=zero_extend(r127:SI#0)
if LOOP: goto BB1 else BB exit
BB EXTRA:
r112:SI=rNEW:SI
if LOOP: goto BB1 else BB exit

And you end up with an extra move rather than a zero_extend. But maybe the move
can be optimized away in later stages? Or maybe put in the same conditional
execution block as the XOR...

[Bug rtl-optimization/77499] [7 Regression] Regression after code-hoisting, due to combine pass failing to evaluate known value range

2016-09-07 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77499

--- Comment #9 from avieira at gcc dot gnu.org ---

> > So I dont know... Only thing I can think of is better "value-range"-like
> > analysis for combine, but that might be too costly?
> 
> So we are not really looking for combine to combine the shift stmt
> with the xor stmt?  Because combine doesn't consider that because of
> the multi-use.

AFAIK, combine will not combine the shift and xor because they are in different
basic blocks. The multi-use prevents it from tracking the origin of r112 back
to a point where it knows that its higher bits are all 0.

> > 
> > And you end up with an extra move rather than a zero_extend. But maybe the 
> > move
> > can be optimized away in later stages? Or maybe put in the same conditional
> > execution block as the XOR...
> 
> Well, we run into a general issue of the RTL combiner -- fwprop and
> ree are other passes that are supposed to remove extensions in some
> cases.
> 
> Really, the user could have written the code in a way CSEing the
> shift himself -- it's unfortunate that we now fail to optimize the
> non-CSEd source but that can only be a reason to enhance downstream
> passes.

True, if say the unused 'y' I left in there for some odd reason were used to
CSE (x >> 1) outside the if-then-else, then you would end up with the
zero_extend in both -fcode-hoisting and -fno-code-hoisting.
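
To make the hand-CSEd variant concrete, here is a sketch of the two source shapes side by side. The function foo is the test case from the original report; foo_cse is a hypothetical rewrite that uses 'y' to hold the shared shift, written only to illustrate the point and not taken from the bug report.

```c
/* Original shape from the report: the shift appears in both arms.  */
unsigned short foo(unsigned short x, int c, int d, int e)
{
  while (c > d)
    {
      if (c % 3)
        x = (x >> 1) ^ 0xB121U;
      else
        x = x >> 1;
      c -= e;
    }
  return x;
}

/* Hypothetical hand-CSEd variant: 'y' holds the shift shared by both arms,
   which is roughly what code hoisting does to the original at GIMPLE level.  */
unsigned short foo_cse(unsigned short x, int c, int d, int e)
{
  unsigned short y;

  while (c > d)
    {
      y = x >> 1;
      x = (c % 3) ? (y ^ 0xB121U) : y;
      c -= e;
    }
  return x;
}
```

Both functions compute the same result; the difference is only in where the shift is written, which is what determines whether combine sees the shift and the xor in the same basic block.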

[Bug rtl-optimization/77499] [7 Regression] Regression after code-hoisting, due to combine pass failing to evaluate known value range

2016-09-07 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77499

--- Comment #6 from avieira at gcc dot gnu.org ---
> so we are talking about the uxthne insn (I don't know arm / thumb very well).

Yes, the uxthne is the "zero_extend" that is otherwise optimized away if you
turn off code-hoisting.

This is because the way the code gets transformed leads to:
r112:SI=r112:SI 0>>0x1, this is the combination of instructions 12 and 13 in my
example earlier. r112 is also the first operand of the xor instruction and
because of the way combine does its "nonzero bit analysis" it always looks at
the last set value for each pseudo. For r112 here, that's an infinite loop and
so it will not be able to recognize that r112 originated from r0, thus losing
the information that it is at most an unsigned short, leading to the decision
not to get rid of the zero_extend.

I'll have a look at if-convert.

[Bug rtl-optimization/77499] [7 Regression] Regression after code-hoisting, due to combine pass failing to evaluate known value range

2016-09-08 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77499

--- Comment #11 from avieira at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #10)
> That is what nonzero_bits etc. is about.  We could do much better nowadays
> with the generic DF framework.
> 
I am not familiar with the generic DF framework, could you point me to it?


> Is code hoisting making the code better at all here?  (At RTL level)

Not as is, but I was hoping that if the zero_extend gets removed, we could end
up with:

        movw    r6, #45345
.L4:
        smull   r5, r4, r7, r1
        lsrs    r0, r0, #1
        sub     r4, r4, r1, asr #31
-       eor     r5, r0, r6
        add     r4, r4, r4, lsl #1
        cmp     r1, r4
        sub     r1, r1, r3
        it      ne
-       uxthne  r0, r5
+       eorne   r0, r0, r6
        cmp     r2, r1
        blt     .L4

So compared to the no-code-hoisting case it would realize it needs to do the
same shift in both cases and only do it once.

[Bug rtl-optimization/77499] [7 Regression] Regression after code-hoisting, due to combine pass failing to evaluate known value range

2016-09-13 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77499

avieira at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kugan at gcc dot gnu.org

--- Comment #12 from avieira at gcc dot gnu.org ---
I heard Kugan was working on getting rid of superfluous zero_extends. Adding
him to the watch list.

@Kugan: Could your work help this case? And when do you plan to have it
submitted?

[Bug target/71607] [5/6/7 Regression] [ARM] ice due to forbidden enabled attribute dependency on instruction operands

2016-10-06 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71607

avieira at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #7 from avieira at gcc dot gnu.org ---
Got a patch up for review on gcc-patches that fixes this, see
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00377.html

[Bug bootstrap/77695] [7 Regression] bootstrap failure due to undeclared hook_uint_uintp_false

2016-09-23 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77695

avieira at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |avieira at gcc dot gnu.org

--- Comment #4 from avieira at gcc dot gnu.org ---
Sorry about that and thank you for the fix. I'm curious as to why my aarch64
bootstrap didn't pick this up. It was with an earlier version (2 months ago),
but I don't see why that would make a difference in this case.

Anyhow, again sorry for breaking the world.

Cheers,
Andre

[Bug debug/77773] [7/6 regression] Segfault when compiling __simd64_float16_t using arm-none-eabi-g++ with debug information

2016-09-28 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77773

--- Comment #1 from avieira at gcc dot gnu.org ---
When I said without errors I meant without segfaulting. It will print out the
following error for version 5 if you don't include '-mfpu=neon':
t.c:1:9: error: '__simd64_float16_t' does not name a type
 typedef __simd64_float16_t float16x4_t;

[Bug debug/77773] New: [7/6 regression] Segfault when compiling __simd64_float16_t using arm-none-eabi-g++ with debug information

2016-09-28 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77773

Bug ID: 77773
   Summary: [7/6 regression] Segfault when compiling
__simd64_float16_t using arm-none-eabi-g++ with debug
information
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

Hello,

When compiling the following:
$ cat t.c
typedef __simd64_float16_t float16x4_t;


with:
$ arm-none-eabi-g++ -S t.c -mfloat-abi=hard -march=armv7-a -g
t.c:1:28: internal compiler error: Segmentation fault

0xd33a1f crash_signal
src/gcc/gcc/toplev.c:336
0x881b8f tree_class_check
src/gcc/gcc/tree.h:3148
0x881b8f c_pretty_printer::simple_type_specifier(tree_node*)
src/gcc/gcc/c-family/c-pretty-print.c:351
0x7ce46e cxx_pretty_printer::simple_type_specifier(tree_node*)
src/gcc/gcc/cp/cxx-pretty-print.c:1324
0x884dec pp_c_specifier_qualifier_list(c_pretty_printer*, tree_node*)
src/gcc/gcc/c-family/c-pretty-print.c:478
0x884dde pp_c_specifier_qualifier_list(c_pretty_printer*, tree_node*)
src/gcc/gcc/c-family/c-pretty-print.c:474
0x7ccbe2 pp_cxx_type_specifier_seq
src/gcc/gcc/cp/cxx-pretty-print.c:1379
0x6b4cd4 dump_type
src/gcc/gcc/cp/error.c:467
0x6be905 dump_type_prefix
src/gcc/gcc/cp/error.c:811
0x6b26b2 dump_simple_decl
src/gcc/gcc/cp/error.c:970
0x6b2e00 dump_decl
src/gcc/gcc/cp/error.c:1057
0x6beaf1 decl_as_string(tree_node*, int)
src/gcc/gcc/cp/error.c:2882
0x6beb1f decl_as_dwarf_string(tree_node*, int)
src/gcc/gcc/cp/error.c:2871
0x59a171 cxx_dwarf_name
src/gcc/gcc/cp/cp-lang.c:119
0x97f8be type_tag
src/gcc/gcc/dwarf2out.c:19191
0x9a1369 gen_array_type_die
src/gcc/gcc/dwarf2out.c:19367
0x9a1369 gen_type_die_with_usage
src/gcc/gcc/dwarf2out.c:23080
0x9a1c8b gen_type_die
src/gcc/gcc/dwarf2out.c:23142
0x9ab9d7 modified_type_die
src/gcc/gcc/dwarf2out.c:11469
0x9abf9c add_type_attribute
src/gcc/gcc/dwarf2out.c:19123

Removing -g makes it compile without errors.

[Bug debug/77773] [6/7 regression] Segfault when compiling __simd64_float16_t using arm-none-eabi-g++ with debug information

2016-09-28 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77773

--- Comment #3 from avieira at gcc dot gnu.org ---
Just to make it clear:

The command I showed without the '-g' used to error on gcc-5, but it doesn't
on 6 and 7:
$ gcc-5/arm-none-eabi-g++ -S t.c -mfloat-abi=hard -march=armv7-a
t.c:1:9: error: '__simd64_float16_t' does not name a type
 typedef __simd64_float16_t float16x4_t;
$ gcc-6/arm-none-eabi-g++ -S t.c -mfloat-abi=hard -march=armv7-a
$ gcc-7/arm-none-eabi-g++ -S t.c -mfloat-abi=hard -march=armv7-a

Adding -mfpu=neon to gcc-5 gets rid of the error:
$ gcc-5/arm-none-eabi-g++ -S t.c -mfloat-abi=hard -march=armv7-a -mfpu=neon

Adding -mfpu=neon to either gcc-6 or 7 makes no difference to either
compilation, with or without '-g'.

[Bug target/78255] New: [5/6/7 regression] Indirect sibling call causing wrong code generation for ARM

2016-11-08 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78255

Bug ID: 78255
   Summary: [5/6/7 regression] Indirect sibling call causing wrong
code generation for ARM
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

As first reported by Andrew on
https://bugs.launchpad.net/gcc-arm-embedded/+bug/1616992

To reproduce on trunk:
$ cat test.c
#include <string.h>
struct table_s
{
void (*fun0)
( void );
void (*fun1)
( void );
void (*fun2)
( void );
void (*fun3)
( void );
void (*fun4)
( void );
void (*fun5)
( void );
void (*fun6)
( void );
void (*fun7)
( void );
} table;

void callback0(){__asm("mov r0, r0 \n\t");}
void callback1(){__asm("mov r0, r0 \n\t");}
void callback2(){__asm("mov r0, r0 \n\t");}
void callback3(){__asm("mov r0, r0 \n\t");}
void callback4(){__asm("mov r0, r0 \n\t");}

void test(void) {
memset(&table, 0, sizeof table);

asm volatile ("" : : : "r3");

table.fun0 = callback0;
table.fun1 = callback1;
table.fun2 = callback2;
table.fun3 = callback3;
table.fun4 = callback4;
table.fun0();
}

$ arm-none-eabi-gcc -S -O2 -mthumb -mcpu=cortex-m3 test.c
$ cat test.s
...
ldr r5, .L8+4
ldr r3, .L8+8
ldr r0, .L8+12
ldr r1, .L8+16
ldr r2, .L8+20
str r5, [r4]
str r0, [r4, #4]
str r1, [r4, #8]
str r2, [r4, #12]
str r3, [r4, #16]
pop {r3, r4, r5, lr}
bx  r3  @ indirect register sibling call
...

As reported, we see that r3 is "restored" before being used to do the sibling
call. So it will no longer contain the address of the call.

I believe this is because 'arm_get_frame_offsets' is called to determine
whether we can safely use 'r3' to align the stack using the function
'any_sibcall_could_use_r3'. This is done before the address of the sibcall is
assigned a hard register, so 'any_sibcall_could_use_r3' returns 'false' and we
push and pop 'r3' in the pro- and epilogue.

[Bug target/78255] [5/6/7 regression] Indirect sibling call causing wrong code generation for ARM

2016-11-09 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78255

--- Comment #1 from avieira at gcc dot gnu.org ---
OK I think I assigned the blame to the wrong function, I think it is the
responsibility of 'is_indirect_tailcall_p' to catch this. Though I believe the
last time it is called during the postreload pass, the call rtx still has a
symbolref in it and only later in the pass is it replaced with a register. Too
late for this function to catch it and after that 'reload_completed' is set to
true and 'arm_get_frame_offsets' only returns the precomputed offsets.

I have a workaround where I add a use clause to the sibling patterns, which
seems to work, but I am not entirely sure why it works and I am not sure it is
the right approach either.

[Bug target/78255] [5/6/7 regression] Indirect sibling call causing wrong code generation for ARM

2016-11-22 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78255

--- Comment #2 from avieira at gcc dot gnu.org ---
The approach I had doesn't work; it ICEs elsewhere...

At this time I am not sure how to fix this without disabling indirect tail calls
for the current function if any sibcall is done within it.  This might be too
big a hammer... If anyone has any tips they are very welcome.

[Bug target/69538] gcc.dg/torture/stackalign/builtin-apply-4.c fails with flto for aarch32 targets with single precision FPU

2016-11-17 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69538

--- Comment #6 from avieira at gcc dot gnu.org ---
I had a look at this and after some digging I found the bug is not due to LTO,
but rather with "local" functions. If you make bar static you will end up with
the same faulty behavior.

After some more digging I found myself going through the 'untyped_call' code in
arm.md. Here I found both 'untyped_call' and 'untyped_return' had not been
adjusted to be able to cope with HardFP ABI's.  I wrote a patch to mend this,
needs a bit more work, but I think it's correct and I might put it on
gcc-patches at a later time.

However, when I started thinking of how I was going to "fix" this wrong-code
generation, I realized that due to the way untyped_call's and untyped_return's
are constructed and the nature of '__builtin_return' and '__builtin_apply', you
do not know which registers are actually used to return the values, you only
know it might be 'r0-r4' and 'd0-d7'. So even though I know the call-site would
expect a return of type 'double' in 'r0-r1', because this is a local function
(aka 'ARM_PCS_AAPCS_LOCAL') and the target does not support double precision,
there is no way for me to know in which of the registers the function is
actually returning, so I don't know what registers to move to 'r0-r1'.

So I don't think we can get this builtin to work for single precision
VFPs without compromising on the way we use local function returns.

[Bug rtl-optimization/78255] [5/6/7 regression] Indirect sibling call causing wrong code generation for ARM

2016-12-09 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78255

--- Comment #12 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Fri Dec  9 16:46:42 2016
New Revision: 243494

URL: https://gcc.gnu.org/viewcvs?rev=243494&root=gcc&view=rev
Log:
PR78255: Make postreload aware of NO_FUNCTION_CSE

gcc/ChangeLog:
2016-12-09  Andre Vieira <andre.simoesdiasvie...@arm.com>

PR rtl-optimization/78255
* gcc/postreload.c (reload_cse_simplify): Do not CSE a function if
NO_FUNCTION_CSE is true.

gcc/testsuite/ChangeLog:
2016-12-09  Andre Vieira <andre.simoesdiasvie...@arm.com>

PR rtl-optimization/78255
* gcc.target/aarch64/pr78255.c: New.
* gcc.target/arm/pr78255-1.c: New.
* gcc.target/arm/pr78255-2.c: New.

Added:
trunk/gcc/testsuite/gcc.target/aarch64/pr78255.c
trunk/gcc/testsuite/gcc.target/arm/pr78255-1.c
trunk/gcc/testsuite/gcc.target/arm/pr78255-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/postreload.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/78255] [5/6/7 regression] Indirect sibling call causing wrong code generation for ARM

2016-12-09 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78255

--- Comment #13 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Fri Dec  9 17:22:20 2016
New Revision: 243496

URL: https://gcc.gnu.org/viewcvs?rev=243496&root=gcc&view=rev
Log:
PR78255: Make postreload aware of NO_FUNCTION_CSE

gcc/ChangeLog.arm:
2016-12-09 Andre Vieira <andre.simoesdiasvie...@arm.com>

Backport from mainline
2016-12-09 Andre Vieira <andre.simoesdiasvie...@arm.com>

PR rtl-optimization/78255
* gcc/postreload.c (reload_cse_simplify): Do not CSE a function if
NO_FUNCTION_CSE is true.

gcc/testsuite/ChangeLog.arm:
2016-12-09 Andre Vieira <andre.simoesdiasvie...@arm.com>

Backport from mainline
2016-12-09 Andre Vieira <andre.simoesdiasvie...@arm.com>

PR rtl-optimization/78255
* gcc.target/aarch64/pr78255.c: New.
* gcc.target/arm/pr78255-1.c: New.
* gcc.target/arm/pr78255-2.c: New.

Added:
branches/ARM/embedded-6-branch/gcc/testsuite/gcc.target/aarch64/pr78255.c
branches/ARM/embedded-6-branch/gcc/testsuite/gcc.target/arm/pr78255-1.c
branches/ARM/embedded-6-branch/gcc/testsuite/gcc.target/arm/pr78255-2.c
Modified:
branches/ARM/embedded-6-branch/gcc/ChangeLog.arm
branches/ARM/embedded-6-branch/gcc/postreload.c
branches/ARM/embedded-6-branch/gcc/testsuite/ChangeLog.arm

[Bug rtl-optimization/78255] [5/6 regression] Indirect sibling call causing wrong code generation for ARM

2017-01-11 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78255

--- Comment #15 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Wed Jan 11 15:08:25 2017
New Revision: 244319

URL: https://gcc.gnu.org/viewcvs?rev=244319&root=gcc&view=rev
Log:
PR78255: Make postreload aware of NO_FUNCTION_CSE

gcc/ChangeLog:
2017-01-11  Andre Vieira <andre.simoesdiasvie...@arm.com>

Backport from mainline
2016-12-09  Andre Vieira <andre.simoesdiasvie...@arm.com>

PR rtl-optimization/78255
* gcc/postreload.c (reload_cse_simplify): Do not CSE a function if
NO_FUNCTION_CSE is true.

gcc/testsuite/ChangeLog:
2017-01-11  Andre Vieira <andre.simoesdiasvie...@arm.com>

Backport from mainline
2016-12-20  Andre Vieira  <andre.simoesdiasvie...@arm.com>

* gcc.target/arm/pr78255-2.c: Fix to work for targets
that do not optimize for tailcall.

2017-01-11  Andre Vieira <andre.simoesdiasvie...@arm.com>

Backport from mainline
2016-12-09  Andre Vieira <andre.simoesdiasvie...@arm.com>

PR rtl-optimization/78255
* gcc.target/aarch64/pr78255.c: New.
* gcc.target/arm/pr78255-1.c: New.
* gcc.target/arm/pr78255-2.c: New.

Added:
branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/pr78255.c
branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/pr78255-1.c
branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/pr78255-2.c
Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/postreload.c
branches/gcc-5-branch/gcc/testsuite/ChangeLog

[Bug target/78255] [5/6/7 regression] Indirect sibling call causing wrong code generation for ARM

2016-12-01 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78255

avieira at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wdijkstr at arm dot com

--- Comment #4 from avieira at gcc dot gnu.org ---
OK so after some extra debugging and digging I found that the postreload pass
is basically turning the direct sibcall into an indirect sibcall. It takes cost
into consideration, but does this only looking at the operands of the call,
i.e. the cost of a symbolref vs the cost of a register. It does not take into
consideration that it is doing a call. This doesn't seem like a good idea to
me.
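
As a rough illustration of the concern, the sketch below contrasts a decision that only compares operand costs with one that also considers whether the operand is a call target. It is a toy model under stated assumptions, not the code in postreload.c: the struct, both function names and the "not more expensive" comparison are hypothetical. The only name taken from GCC is NO_FUNCTION_CSE, the target macro named in the later commit log for this PR, modelled here as a plain boolean parameter.

```c
#include <stdbool.h>

/* Hypothetical summary of one replacement candidate.  */
struct candidate
{
  int cost_symbol;   /* assumed cost of keeping the symbol_ref operand */
  int cost_register; /* assumed cost of a register holding the same address */
  bool is_call;      /* true if the operand is the target of a call */
};

/* Operand-only decision: use the register whenever it is not more
   expensive, regardless of what kind of insn the operand sits in.  */
bool replace_operand_only(const struct candidate *c)
{
  return c->cost_register <= c->cost_symbol;
}

/* Call-aware decision: never turn a direct call into an indirect one when
   the target says calling a constant address is at least as good.  */
bool replace_call_aware(const struct candidate *c, bool no_function_cse)
{
  if (c->is_call && no_function_cse)
    return false;
  return replace_operand_only(c);
}
```

With the operand-only rule a cheap register wins even for a sibcall target, which is how a direct sibcall can become an indirect one; the call-aware variant leaves such calls alone.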

Apart from that, I am now looking into letting arm_get_frame_offsets
recalculate the offsets and registers to push and pop past reload_completed. I
am not convinced this is entirely safe yet...

[Bug target/71607] [5/6/7 Regression] [ARM] ice due to forbidden enabled attribute dependency on instruction operands

2016-12-05 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71607

--- Comment #8 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Mon Dec  5 17:36:03 2016
New Revision: 243266

URL: https://gcc.gnu.org/viewcvs?rev=243266&root=gcc&view=rev
Log:
[ARM] PR71607: New approach to arm_disable_literal_pool

gcc/ChangeLog.arm:
2016-12-05  Andre Vieira  <andre.simoesdiasvie...@arm.com>

PR target/71607
* config/arm/arm.md (use_literal_pool): Removes.
(64-bit immediate split): No longer takes cost into consideration
if 'arm_disable_literal_pool' is enabled.
* config/arm/arm.c (arm_use_blocks_for_constant_p): New.
(TARGET_USE_BLOCKS_FOR_CONSTANT_P): Define.
(arm_max_const_double_inline_cost): Remove use of
arm_disable_literal_pool.
* config/arm/vfp.md (no_literal_pool_df_immediate): New.
(no_literal_pool_sf_immediate): New.

gcc/testsuite/ChangeLog.arm:
2016-12-05  Andre Vieira  <andre.simoesdiasvie...@arm.com>
Thomas Preud'homme  <thomas.preudho...@arm.com>

PR target/71607
* gcc.target/arm/thumb2-slow-flash-data.c: Renamed to ...
* gcc.target/arm/thumb2-slow-flash-data-1.c: ... this.
* gcc.target/arm/thumb2-slow-flash-data-2.c: New.
* gcc.target/arm/thumb2-slow-flash-data-3.c: New.
* gcc.target/arm/thumb2-slow-flash-data-4.c: New.
* gcc.target/arm/thumb2-slow-flash-data-5.c: New.


Added:
branches/ARM/embedded-6-branch/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-1.c
branches/ARM/embedded-6-branch/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-2.c
branches/ARM/embedded-6-branch/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-3.c
branches/ARM/embedded-6-branch/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-4.c
branches/ARM/embedded-6-branch/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-5.c
Removed:
branches/ARM/embedded-6-branch/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c
Modified:
branches/ARM/embedded-6-branch/gcc/ChangeLog.arm
branches/ARM/embedded-6-branch/gcc/config/arm/arm.c
branches/ARM/embedded-6-branch/gcc/config/arm/arm.md
branches/ARM/embedded-6-branch/gcc/config/arm/vfp.md
branches/ARM/embedded-6-branch/gcc/testsuite/ChangeLog.arm

[Bug rtl-optimization/78255] [5/6 regression] Indirect sibling call causing wrong code generation for ARM

2017-01-09 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78255

--- Comment #14 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Mon Jan  9 09:58:54 2017
New Revision: 244220

URL: https://gcc.gnu.org/viewcvs?rev=244220&root=gcc&view=rev
Log:
PR78255: Make postreload aware of NO_FUNCTION_CSE

gcc/ChangeLog:
2017-01-09  Andre Vieira <andre.simoesdiasvie...@arm.com>

Backport from mainline
2016-12-09  Andre Vieira <andre.simoesdiasvie...@arm.com>

PR rtl-optimization/78255
* gcc/postreload.c (reload_cse_simplify): Do not CSE a function if
NO_FUNCTION_CSE is true.

gcc/testsuite/ChangeLog:
2017-01-09  Andre Vieira <andre.simoesdiasvie...@arm.com>

Backport from mainline
2016-12-20  Andre Vieira  <andre.simoesdiasvie...@arm.com>

* gcc.target/arm/pr78255-2.c: Fix to work for targets
that do not optimize for tailcall.

2017-01-09  Andre Vieira <andre.simoesdiasvie...@arm.com>

Backport from mainline
2016-12-09  Andre Vieira <andre.simoesdiasvie...@arm.com>

PR rtl-optimization/78255
* gcc.target/aarch64/pr78255.c: New.
* gcc.target/arm/pr78255-1.c: New.
* gcc.target/arm/pr78255-2.c: New.


Added:
branches/gcc-6-branch/gcc/testsuite/gcc.target/aarch64/pr78255.c
branches/gcc-6-branch/gcc/testsuite/gcc.target/arm/pr78255-1.c
branches/gcc-6-branch/gcc/testsuite/gcc.target/arm/pr78255-2.c
Modified:
branches/gcc-6-branch/gcc/ChangeLog
branches/gcc-6-branch/gcc/postreload.c
branches/gcc-6-branch/gcc/testsuite/ChangeLog

[Bug target/83009] gcc.target/aarch64/store_v2vec_lanes.c fails with -mabi=ilp32

2018-05-24 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83009

--- Comment #8 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Thu May 24 08:53:39 2018
New Revision: 260635

URL: https://gcc.gnu.org/viewcvs?rev=260635&root=gcc&view=rev
Log:
PR target/83009: Relax strict address checking for store pair lanes

The operand constraint for the memory address of store/load pair lanes was
strict: it only allowed hardware registers as memory addresses.  We want
to relax that so that these patterns can be used by combine.  During register
allocation the register constraint will ensure the correct register is chosen.

gcc
2018-05-24  Andre Vieira  <andre.simoesdiasvie...@arm.com>

PR target/83009
* config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): Make
address check not strict.

gcc/testsuite
2018-05-24  Andre Vieira  <andre.simoesdiasvie...@arm.com>

PR target/83009
* gcc/target/aarch64/store_v2vec_lanes.c: Add extra tests.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/predicates.md
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c

[Bug target/83009] gcc.target/aarch64/store_v2vec_lanes.c fails with -mabi=ilp32

2018-05-30 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83009

--- Comment #10 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Wed May 30 15:59:14 2018
New Revision: 260957

URL: https://gcc.gnu.org/viewcvs?rev=260957=gcc=rev
Log:
Reverting r260635

gcc
2018-05-30  Andre Vieira  

2018-05-24  Andre Vieira  

PR target/83009
Revert:
* config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): Make
address check not strict.

gcc/testsuite
2018-05-30  Andre Vieira  

2018-05-24  Andre Vieira  
Revert
PR target/83009
* gcc/target/aarch64/store_v2vec_lanes.c: Add extra tests.

Modified:
trunk/gcc/config/aarch64/predicates.md
trunk/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c

[Bug target/83009] gcc.target/aarch64/store_v2vec_lanes.c fails with -mabi=ilp32

2018-05-29 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83009

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from avieira at gcc dot gnu.org ---
I believe my patch fixes this.

[Bug lto/86366] [9 regression] gcc.dg/profile-dir-3.c fails starting with r262251

2018-07-02 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86366

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org

--- Comment #2 from avieira at gcc dot gnu.org ---
Hi Martin,

We have also seen profile-dir-1.gcda fail on aarch64-none-linux-gnu and
arm-none-linux-gnueabihf, as well as profile-dir-3.gcda, recently.

I am assuming this is all related.

Cheers,
Andre

[Bug target/86487] [7/8/9 Regression] insn does not satisfy its constraints on arm big-endian

2018-07-26 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86487

--- Comment #2 from avieira at gcc dot gnu.org ---
I am having quite a lot of trouble understanding what is going wrong, or maybe
I should say, what parts are going right.

I believe it tries to match the fifth alternative for anddi3_insn here which
is:
'' 'r' 'De'
This fails because of the early clobber, rightfully so because:
(insn 13 11 14 2 (set (reg:DI 0 r0 [125])
(and:DI (reg:DI 1 r1 [+-4 ])
(const_int 1 [0x1]))) "../t.c":3 79 {*anddi3_insn}
 (nil))

DI r0 overlaps with DI r1, since a DImode value needs two consecutive GPRs.

I decided to debug reload to find out why it had picked r1, and I found that
'get_hard_regno' first picks r2 for (subreg:DI (SI 122)) in the same
instruction. If we go up we see:
(insn 10 9 11 2 (set (reg:SI 2 r2 [122])
(xor:SI (reg:SI 0 r0 [orig:123 a ] [123])
(const_int 1 [0x1]))) "../t.c":3 111 {*arm_xorsi3}
 (nil))

Then 'get_hard_regno' invokes 'subreg_regno_offset', which returns
'nregs_xmode - nregs_ymode' as the offset in big endian for paradoxical subregs
with offset 0, where xmode is the inner mode and ymode is the outer mode. That
is '-1' in our case (and always negative). So I believe reload is now seeing
'r1-r2' as the register pair for that first 'and' operand and 'r0-r1' as the
destination operand.
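
The arithmetic above can be sketched as follows. This is a simplified model and
not GCC's 'subreg_regno_offset': the helper names 'paradoxical_be_offset',
'outer_first_regno' and 'ranges_overlap' are made up for illustration, and the
register counts (1 for SImode, 2 for DImode) are the ones assumed in this case.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model (an assumption, not GCC's code) of the offset described
   for a big-endian paradoxical subreg with offset 0:
   nregs_xmode - nregs_ymode, where xmode is the inner mode and ymode is
   the outer mode.  */
static int
paradoxical_be_offset (int nregs_xmode, int nregs_ymode)
{
  /* A paradoxical subreg has an outer mode wider than its inner mode.  */
  assert (nregs_ymode > nregs_xmode);
  return nregs_xmode - nregs_ymode;
}

/* First hard register of the outer value, given the inner hard register.  */
static int
outer_first_regno (int inner_regno, int nregs_xmode, int nregs_ymode)
{
  return inner_regno + paradoxical_be_offset (nregs_xmode, nregs_ymode);
}

/* True if the register ranges [a, a + n) and [b, b + n) share a register.  */
static bool
ranges_overlap (int a, int b, int n)
{
  return a < b + n && b < a + n;
}
```

With the SImode value in r2, 'outer_first_regno (2, 1, 2)' gives 1, i.e. the
pair 'r1-r2', and 'ranges_overlap (0, 1, 2)' is true for the destination pair
'r0-r1', which is the overlap that the early clobber rejects.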

At first I was thinking this was a middle-end issue, specifically for
paradoxical subregs. However, I also saw a bit of AArch64 big endian assembly
that used 'odd' registers to represent DI register pairs (V2DI).

Given the comment in 'subreg_regno_offset':
  /* If this is a big endian paradoxical subreg, which uses more
 actual hard registers than the original register, we must
 return a negative offset so that we find the proper highpart
 of the register.

 We assume that the ordering of registers within a multi-register
 value has a consistent endianness: if bytes and register words
 have different endianness, the hard registers that make up a
 multi-register value must be at least word-sized.  */

It made me start to think that GCC expects register pairs in big endian to be
"called" by their Least Significant Register (LSR) and to be counted back from
there. So '[r1, r0]' to be called (DI r1). I am not entirely sure about this
though...

I tried changing the arm back-end to only accept DI mode register pairs if the
register is odd. That fixed this case but broke a lot of other things. I am
thinking another way to fix it is to adapt Arm's 's_register_operand' to not
accept paradoxical subregs in big endian, but I would first like to understand
how the middle end expects/sees/generates register pairs if
'REG_WORDS_BIG_ENDIAN' is true.

[Bug target/86487] [7/8/9 Regression] insn does not satisfy its constraints on arm big-endian

2018-07-26 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86487

--- Comment #3 from avieira at gcc dot gnu.org ---
@Vlad: I added you to this ticket to see if maybe you can shed some light on
how GCC's register allocator deals with register pairs in big endian. I am
struggling to figure out how all of this works together; see the previous
comment.

Thanks in advance!

[Bug fortran/25829] [F03] Asynchronous IO support

2018-07-31 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25829

--- Comment #46 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Tue Jul 31 08:42:21 2018
New Revision: 263082

URL: https://gcc.gnu.org/viewcvs?rev=263082=gcc=rev
Log:
Reverting 'AsyncI/O patch committed' as it is breaking bare-metal builds.

2018-07-31  Andre Vieira  

Revert 'AsyncI/O patch committed'
2018-07-25  Nicolas Koenig  
Thomas Koenig 

PR fortran/25829
* gfortran.texi: Add description of asynchronous I/O.
* trans-decl.c (gfc_finish_var_decl): Treat asynchronous variables
as volatile.
* trans-io.c (gfc_build_io_library_fndecls): Rename st_wait to
st_wait_async and change argument spec from ".X" to ".w".
(gfc_trans_wait): Pass ID argument via reference.

2018-07-31  Andre Vieira  

Revert 'AsyncI/O patch committed'
2018-07-25  Nicolas Koenig  
Thomas Koenig 

PR fortran/25829
* gfortran.dg/f2003_inquire_1.f03: Add write statement.
* gfortran.dg/f2003_io_1.f03: Add wait statement.

2018-07-31  Andre Vieira  

Revert 'AsyncI/O patch committed'
2018-07-25  Nicolas Koenig  
Thomas Koenig 

PR fortran/25829
* Makefile.am: Add async.c to gfor_io_src.
Add async.h to gfor_io_headers.
* Makefile.in: Regenerated.
* gfortran.map: Add _gfortran_st_wait_async.
* io/async.c: New file.
* io/async.h: New file.
* io/close.c: Include async.h.
(st_close): Call async_wait for an asynchronous unit.
* io/file_pos.c (st_backspace): Likewise.
(st_endfile): Likewise.
(st_rewind): Likewise.
(st_flush): Likewise.
* io/inquire.c: Add handling for asynchronous PENDING
and ID arguments.
* io/io.h (st_parameter_dt): Add async bit.
(st_parameter_wait): Correct.
(gfc_unit): Add au pointer.
(st_wait_async): Add prototype.
(transfer_array_inner): Likewise.
(st_write_done_worker): Likewise.
* io/open.c: Include async.h.
(new_unit): Initialize asynchronous unit.
* io/transfer.c (async_opt): New struct.
(wrap_scalar_transfer): New function.
(transfer_integer): Call wrap_scalar_transfer to do the work.
(transfer_real): Likewise.
(transfer_real_write): Likewise.
(transfer_character): Likewise.
(transfer_character_wide): Likewise.
(transfer_complex): Likewise.
(transfer_array_inner): New function.
(transfer_array): Call transfer_array_inner.
(transfer_derived): Call wrap_scalar_transfer.
(data_transfer_init): Check for asynchronous I/O.
Perform a wait operation on any pending asynchronous I/O
if the data transfer is synchronous. Copy PDT and enqueue
thread for data transfer.
(st_read_done_worker): New function.
(st_read_done): Enqueue transfer or call st_read_done_worker.
(st_write_done_worker): New function.
(st_write_done): Enqueue transfer or call st_read_done_worker.
(st_wait): Document as no-op for compatibility reasons.
(st_wait_async): New function.
* io/unit.c (insert_unit): Use macros LOCK, UNLOCK and TRYLOCK;
add NOTE where necessary.
(get_gfc_unit): Likewise.
(init_units): Likewise.
(close_unit_1): Likewise. Call async_close if asynchronous.
(close_unit): Use macros LOCK and UNLOCK.
(finish_last_advance_record): Likewise.
(newunit_alloc): Likewise.
* io/unix.c (find_file): Likewise.
(flush_all_units_1): Likewise.
(flush_all_units): Likewise.
* libgfortran.h (generate_error_common): Add prototype.
* runtime/error.c: Include io.h and async.h.
(generate_error_common): New function.

2018-07-31  Andre Vieira  

Revert 'AsyncI/O patch committed'.
2018-07-25  Nicolas Koenig  
Thomas Koenig 

PR fortran/25829
* testsuite/libgomp.fortran/async_io_1.f90: New test.
* testsuite/libgomp.fortran/async_io_2.f90: New test.
* testsuite/libgomp.fortran/async_io_3.f90: New test.
* testsuite/libgomp.fortran/async_io_4.f90: New test.
* testsuite/libgomp.fortran/async_io_5.f90: New test.
* testsuite/libgomp.fortran/async_io_6.f90: New test.
* testsuite/libgomp.fortran/async_io_7.f90: New test.


Removed:
trunk/libgfortran/io/async.c
trunk/libgfortran/io/async.h
trunk/libgomp/testsuite/libgomp.fortran/async_io_1.f90
trunk/libgomp/testsuite/libgomp.fortran/async_io_2.f90
trunk/libgomp/testsuite/libgomp.fortran/async_io_3.f90
trunk/libgomp/testsuite/libgomp.fortran/async_io_4.f90
trunk/libgomp/testsuite/libgomp.fortran/async_io_5.f90
trunk/libgomp/testsuite/libgomp.fortran/async_io_6.f90
trunk/libgomp/tests

[Bug target/86487] [7/8/9 Regression] insn does not satisfy its constraints on arm big-endian

2018-07-16 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86487

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-07-16
 CC||avieira at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from avieira at gcc dot gnu.org ---
Confirmed with a local build.

[Bug target/83009] gcc.target/aarch64/store_v2vec_lanes.c fails with -mabi=ilp32

2018-07-19 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83009

--- Comment #11 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Thu Jul 19 14:03:21 2018
New Revision: 262881

URL: https://gcc.gnu.org/viewcvs?rev=262881=gcc=rev
Log:
[AArch64][PATCH 2/2] PR target/83009: Relax strict address checking for store
pair lanes

gcc/ChangeLog
2018-07-19  Andre Vieira  

PR target/83009
* config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): Make
address check not strict.

gcc/testsuite/ChangeLog
2018-07-19  Andre Vieira  

PR target/83009
* gcc/target/aarch64/store_v2vec_lanes.c: Add extra tests.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/predicates.md
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c

[Bug target/83009] gcc.target/aarch64/store_v2vec_lanes.c fails with -mabi=ilp32

2018-04-11 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83009

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||avieira at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |avieira at gcc dot gnu.org

[Bug target/83009] gcc.target/aarch64/store_v2vec_lanes.c fails with -mabi=ilp32

2018-04-11 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83009

--- Comment #5 from avieira at gcc dot gnu.org ---
I have been looking at this and the problem does indeed lie with the register
not being a hard reg because aarch64_mem_pair_lanes_operand invokes
aarch64_legitimate_address_p with 1 for the strict_p argument.

Changing that to 0 yields the desired results for this testcase. It is also
worth noting that this is not only an ilp32 issue: because of the strict check
we would also miss cases where the argument hard register is not successfully
combined into the load/store, for instance if the argument in the test function
were a pointer to the pointer we are addressing.

I will proceed to run tests now. If someone knows why "strict_p" was being
set to 1, please let me know; I am unfamiliar with this code and fear this
might be too big a hammer.
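
To illustrate the difference between the two settings, here is a toy model of
a strict versus non-strict base register check. It is only a sketch under
stated assumptions: 'base_reg_ok_p' and 'FIRST_PSEUDO' are hypothetical names,
the value 64 is made up, and the real aarch64_legitimate_address_p does far
more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register numbering: numbers below FIRST_PSEUDO are hard
   registers, everything from FIRST_PSEUDO upwards is a pseudo that has not
   been allocated yet.  The value 64 is made up for this sketch.  */
enum { FIRST_PSEUDO = 64 };

/* Toy base-register check.  With strict_p set only hard registers are
   accepted; without it pseudos are accepted too, on the assumption that
   register allocation will later pick a valid hard register for them.
   For simplicity every hard register is treated as a valid base.  */
static bool
base_reg_ok_p (int regno, bool strict_p)
{
  assert (regno >= 0);
  if (regno < FIRST_PSEUDO)
    return true;
  return !strict_p;
}
```

Before register allocation, for example when combine tries the pattern, the
address is still a pseudo, so in this model only the non-strict check accepts
it.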

[Bug target/83009] gcc.target/aarch64/store_v2vec_lanes.c fails with -mabi=ilp32

2018-04-16 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83009

--- Comment #7 from avieira at gcc dot gnu.org ---
Bootstrap and regression testing look good. I'll put the patch up on the ML
when we enter stage 1 again.

[Bug target/86487] [7/8/9 Regression] insn does not satisfy its constraints on arm big-endian

2018-11-16 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86487

--- Comment #5 from avieira at gcc dot gnu.org ---
I can confirm the ICE no longer occurs, but I am not entirely convinced the
issue was "fixed" by this. I fear the underlying fault is still there and is
simply hidden now.

[Bug target/88224] Wrong Cortex-R7 and Cortex-R8 FPU configuration

2018-12-14 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88224

--- Comment #3 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Fri Dec 14 09:04:24 2018
New Revision: 267124

URL: https://gcc.gnu.org/viewcvs?rev=267124=gcc=rev
Log:
PR target/88224: Fix FPU configuration of Cortex-R7 and Cortex-R8

gcc/
2018-12-14  Andre Vieira  

Backport from mainline
PR target/88224
* config/arm/arm-cpus.in (armv7-r): Add FP16conv configurations.
(cortex-r7, cortex-r8): Update fpu and add new configuration.
* doc/invoke.texi (armv7-r): Add two new vfp options.
(nofp.dp): Add cortex-r7 and cortex-r8 to the list of targets that
support this option.


Modified:
branches/gcc-8-branch/gcc/ChangeLog
branches/gcc-8-branch/gcc/config/arm/arm-cpus.in
branches/gcc-8-branch/gcc/doc/invoke.texi

[Bug target/88224] Wrong Cortex-R7 and Cortex-R8 FPU configuration

2018-12-14 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88224

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from avieira at gcc dot gnu.org ---
Fixed on trunk and gcc-8.

[Bug target/88224] Wrong Cortex-R7 and Cortex-R8 FPU configuration

2018-11-29 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88224

--- Comment #2 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Thu Nov 29 10:20:13 2018
New Revision: 266612

URL: https://gcc.gnu.org/viewcvs?rev=266612=gcc=rev
Log:
[PATCH] [Arm] Fix fpu configurations for Cortex-R7 and Cortex-R8

gcc/ChangeLog:
2018-11-29  Andre Vieira  

PR target/88224
* config/arm/arm-cpus.in (armv7-r): Add FP16conv configurations.
(cortex-r7, cortex-r8): Update default and add new configuration.
* doc/invoke.texi (armv7-r): Add two new vfp options.
(nofp.dp): Add cortex-r7 and cortex-r8 to the list of targets that
support this option.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/arm-cpus.in
trunk/gcc/doc/invoke.texi

[Bug target/88224] New: Wrong Cortex-R7 and Cortex-R8 FPU configuration

2018-11-27 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88224

Bug ID: 88224
   Summary: Wrong Cortex-R7 and Cortex-R8 FPU configuration
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

The Cortex-R7 and Cortex-R8 TRM's* indicate that both CPUs can be configured
with one of the following FPU options:
1) No FPU
2) Single precision-only VFPv3, with 16 double-precision registers and with
FP16 conversion instructions extension
3) Single and double-precision VFPv3, with 16 double-precision registers and
with FP16 conversion instructions extension.

Currently GCC configures R7 and R8 without FP16 conversion instructions when
using -mcpu=cortex-r7/cortex-r8 and it does not offer the single-precision only
configuration (i.e. no +nofp.dp).


*) https://static.docs.arm.com/ddi0458/c/DDI0458C_cortex_r7_r0p1_trm.pdf
https://static.docs.arm.com/100400/0001/arm_cortexr8_mpcore_processor_trm_100400_0001_03_en.pdf

[Bug target/86487] [7/8/9 Regression] insn does not satisfy its constraints on arm big-endian

2018-12-19 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86487

--- Comment #7 from avieira at gcc dot gnu.org ---
Hi,

This one sort of fell through the cracks on me. With help from Vlad and Richard
S. I managed to track the issue to uses_hard_regs_p and the way it handles
paradoxical subregs (or fails to). I have a patch for this, which I will rebase
and test. I'll give your new testcase a whirl, Oliver, thanks!

Cheers,
Andre

[Bug target/86487] [7/8/9 Regression] insn does not satisfy its constraints on arm big-endian

2018-12-19 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86487

--- Comment #8 from avieira at gcc dot gnu.org ---
Oliver,

Your new example doesn't seem to be hitting the same issue as the first one.
The first failure was being caused by paradoxical subregs, the second one
doesn't have paradoxical subregs.

I'll try to investigate it.

[Bug target/86487] [7/8/9 Regression] insn does not satisfy its constraints on arm big-endian

2019-02-20 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86487

--- Comment #12 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Wed Feb 20 14:11:43 2019
New Revision: 269039

URL: https://gcc.gnu.org/viewcvs?rev=269039=gcc=rev
Log:
[GCC] PR target/86487: fix the way 'uses_hard_regs_p' handles paradoxical
subregs

gcc/ChangeLog:
2019-02-20 Andre Vieira  

PR target/86487
* lra-constraints.c(uses_hard_regs_p): Fix handling of
paradoxical SUBREGS.

gcc/testsuite/ChangeLog:
2019-02-20 Andre Vieira  

PR target/86487
* gcc.target/arm/pr86487.c: New.

Added:
trunk/gcc/testsuite/gcc.target/arm/pr86487.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/lra-constraints.c
trunk/gcc/testsuite/ChangeLog

[Bug target/86487] [7/8/9 Regression] insn does not satisfy its constraints on arm big-endian

2019-01-31 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86487

--- Comment #10 from avieira at gcc dot gnu.org ---
Hi Vlad,

I don't think it is a duplicate. I believe this PR is caused by an issue with
'uses_hard_regs_p' and paradoxical subregs. I proposed a patch in
https://gcc.gnu.org/ml/gcc-patches/2019-01/msg00307.html , though it has a
mistake: I forgot to add '|| SUBREG_P (x)' to the 'if (REG_P (x))' line, since x
can now be a subreg. I haven't had much time lately, but I am now running the
last bootstrap: arm and aarch64 are done, x86 is in progress.

I can't reproduce this on GCC 9 but I can on 8 and earlier and the latent bug
is still there on 9. So I believe we should fix it regardless.

Once the bootstrap is done I'll post the fixed patch + testcase (really only
useful for gcc-8) on the mailing list.

Cheers,
Andre

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2019-08-16 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 91460, which changed state.

Bug 91460 Summary: gcc -mpreferred-vector-width=256 is slower than 
-mpreferred-vector-width=128 for some loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91460

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

[Bug tree-optimization/88915] Try smaller vectorisation factors in scalar fallback

2019-08-16 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88915

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||skpgkp2 at gmail dot com

--- Comment #4 from avieira at gcc dot gnu.org ---
*** Bug 91460 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/91460] gcc -mpreferred-vector-width=256 is slower than -mpreferred-vector-width=128 for some loops

2019-08-16 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91460

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||avieira at gcc dot gnu.org
 Resolution|--- |DUPLICATE

--- Comment #4 from avieira at gcc dot gnu.org ---
Yes, this looks like a duplicate of PR 88915. I'll mark it as such.

*** This bug has been marked as a duplicate of bug 88915 ***

[Bug tree-optimization/92347] [10 Regression] ICE in vect_get_vec_def_for_operand_1, at tree-vect-stmts.c:1537

2019-11-11 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92347

--- Comment #4 from avieira at gcc dot gnu.org ---
The second case seems to be because vectorizable_simd_clone_call is inserting
values and phi-nodes on the epilogue's preheader edge which use a value defined
in the main loop's preheader edge (created by the main loop's call to
vectorizable_simd_clone_call). However, this definition does not dominate the
use, as the main loop may have been skipped.

Not entirely sure what the best action is here, I didn't get enough time to
figure out what these values represent.

[Bug tree-optimization/92347] [10 Regression] ICE in vect_get_vec_def_for_operand_1, at tree-vect-stmts.c:1537

2019-11-11 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92347

--- Comment #3 from avieira at gcc dot gnu.org ---
I had a look at the first testcase. I think the problem is I was setting the
epilogue's safelen to the loop's safelen, after the loop->safelen had been
cleared, as we do this after vectorization. Removing that update and letting
the epilogue keep the original safelen seems to solve the first ICE. The second
is something different, looking at that now.

[Bug tree-optimization/92429] [10 Regression] ICE in vect_transform_stmt, at tree-vect-stmts.c:10918

2019-11-11 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92429

--- Comment #2 from avieira at gcc dot gnu.org ---
I had a look at this. The ICE occurs because 'vectorizable_condition' does
not know how to handle a constant cond_expr.

The reason this cond_expr is constant in the epilogue is because
'simplify_replace_tree' folds the replacement and the replacement in this case
is:

"_34 < 0" where "_34 = _33 * _33", and fold-const is able to assert that _34 is
therefore always positive or zero and can fold the check to false. The question
now is, why was the original 'cond_expr' that we copied over not folded? I
suspect it's because of the -fno-tree-fre.

If we want this to work I suggest we either:
1) teach 'vectorizable_condition' to deal with constant cond_exprs
2) change 'simplify_replace_tree' to optionally fold.

I don't like 2) much because this doesn't guarantee we don't fold elsewhere. 
If we want the vectorizer to accept loop code in sub-optimal format I suggest
we do 1).
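
As a small illustration of why that check folds to false, the snippet below
mirrors "_34 = _33 * _33; _34 < 0" in plain C. It is a sketch rather than the
testcase from this PR: the function name 'square_is_negative' and the input
bound are made up, and it assumes the multiplication happens in a signed type
of at least 32 bits, where overflow is undefined.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors "_34 = _33 * _33; _34 < 0".  The caller must keep the product in
   range: signed overflow is undefined in C, which is what allows a compiler
   to treat the square as non-negative and fold the comparison to false.  */
static bool
square_is_negative (int x)
{
  /* 46340 * 46340 = 2147395600, which fits in a 32-bit int.  */
  assert (x >= -46340 && x <= 46340);
  int sq = x * x;
  return sq < 0;
}
```

For every input in the allowed range the function returns false, so a
cond_expr built on such a comparison ends up with a constant condition.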

[Bug tree-optimization/92347] [10 Regression] ICE in vect_get_vec_def_for_operand_1, at tree-vect-stmts.c:1537

2019-11-11 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92347

--- Comment #5 from avieira at gcc dot gnu.org ---
Not quite sure the third case has anything to do with epilogue vectorization
though... It still manifests itself with it turned off. Seems to be a lack of
"folding" again.

I think it would be useful to split testcases 2 and 3 into two new PRs, as they
are unrelated to 1.

[Bug tree-optimization/92347] [10 Regression] ICE in vect_get_vec_def_for_operand_1, at tree-vect-stmts.c:1537

2019-11-11 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92347

--- Comment #7 from avieira at gcc dot gnu.org ---
Thank you!

[Bug tree-optimization/92460] [10 Regression] ICE: verify_ssa failed (error: definition in block 13 does not dominate use in block 22)

2019-11-11 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92460

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-11-11
 Ever confirmed|0   |1

--- Comment #1 from avieira at gcc dot gnu.org ---
The ICE seems to be because vectorizable_simd_clone_call is inserting values
and phi-nodes on the epilogue's preheader edge which use a value defined in
the main loop's preheader edge (created by the main loop's call to
vectorizable_simd_clone_call). However, this definition does not dominate the
use, as the main loop may have been skipped.

Not entirely sure what the best action is here, I didn't get enough time to
figure out what these values represent. I suspect this is not caused by my
changes, though; rather it was a latent issue that now shows up because I
turned on epilogue vectorization.
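
The dominance problem can be shown on a tiny hand-written CFG. This is a
sketch only: the block names, the edge table and the helpers
'reachable_avoiding' and 'dominates' are hypothetical and are not taken from
the vectorizer, and the CFG is a simplification of what the testcase produces.

```c
#include <assert.h>
#include <stdbool.h>

enum { ENTRY, GUARD, MAIN_PREHEADER, MAIN_LOOP, EPILOGUE_PREHEADER,
       NUM_BLOCKS };

/* Hypothetical CFG: the guard can skip the main loop and jump straight to
   the epilogue's preheader.  */
static const bool edge[NUM_BLOCKS][NUM_BLOCKS] = {
  [ENTRY][GUARD] = true,
  [GUARD][MAIN_PREHEADER] = true,
  [GUARD][EPILOGUE_PREHEADER] = true,
  [MAIN_PREHEADER][MAIN_LOOP] = true,
  [MAIN_LOOP][MAIN_LOOP] = true,
  [MAIN_LOOP][EPILOGUE_PREHEADER] = true,
};

/* Depth-first search: can TARGET be reached from BB without passing
   through AVOID?  */
static bool
reachable_avoiding (int bb, int target, int avoid, bool *visited)
{
  if (bb == avoid || visited[bb])
    return false;
  if (bb == target)
    return true;
  visited[bb] = true;
  for (int succ = 0; succ < NUM_BLOCKS; succ++)
    if (edge[bb][succ] && reachable_avoiding (succ, target, avoid, visited))
      return true;
  return false;
}

/* DEF dominates USE if every path from ENTRY to USE passes through DEF.  */
static bool
dominates (int def, int use)
{
  assert (def >= 0 && def < NUM_BLOCKS && use >= 0 && use < NUM_BLOCKS);
  if (def == use)
    return true;
  bool visited[NUM_BLOCKS] = { false };
  return !reachable_avoiding (ENTRY, use, def, visited);
}
```

In this model 'dominates (MAIN_PREHEADER, EPILOGUE_PREHEADER)' is false because
of the edge from the guard that skips the main loop, while
'dominates (GUARD, EPILOGUE_PREHEADER)' is true.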

[Bug tree-optimization/92317] [10 Regression] ICE in slpeel_duplicate_current_defs_from_edges, at tree-vect-loop-manip.c:960 since r277569

2019-11-06 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92317

--- Comment #3 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Wed Nov  6 11:22:35 2019
New Revision: 277877

URL: https://gcc.gnu.org/viewcvs?rev=277877=gcc=rev
Log:
[vect] PR92317: fix skip_epilogue creation for epilogues

gcc/ChangeLog:
2019-11-06  Andre Vieira  

PR tree-optimization/92317
* tree-vect-loop-manip.c (slpeel_update_phi_nodes_for_guard2): Also
update phi's with constant phi arguments.

gcc/testsuite/ChangeLog:
2019-11-06  Andre Vieira  

PR tree-optimization/92317
* gcc/testsuite/g++.dg/opt/pr92317.C: New test.


Added:
trunk/gcc/testsuite/g++.dg/opt/pr92317.C
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-loop-manip.c

[Bug tree-optimization/92317] [10 Regression] ICE in slpeel_duplicate_current_defs_from_edges, at tree-vect-loop-manip.c:960 since r277569

2019-11-06 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92317

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from avieira at gcc dot gnu.org ---
I believe that patch fixes the issue.

[Bug tree-optimization/92317] [10 Regression] ICE in slpeel_duplicate_current_defs_from_edges, at tree-vect-loop-manip.c:960 since r277569

2019-11-01 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92317

--- Comment #1 from avieira at gcc dot gnu.org ---
Confirmed. It seems get_loop_copy is returning NULL. I'm looking into it.

[Bug tree-optimization/92317] [10 Regression] ICE in slpeel_duplicate_current_defs_from_edges, at tree-vect-loop-manip.c:960 since r277569

2019-11-01 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92317

--- Comment #2 from avieira at gcc dot gnu.org ---
Actually, upon a second look it has nothing to do with that; that get_loop_body
doesn't make much sense there anyway. I believe that should have just been
'loop', as slpeel_tree_duplicate_loop_to_edge_cfg creates a copy of LOOP from
LOOP if LOOP == SCALAR_LOOP. The problem here lies with using SCALAR_LOOP for
an epilogue... not quite sure what is wrong though.

[Bug tree-optimization/92351] [10 Regression] Wrong code with -O3 -march=skylake since r277569

2019-11-08 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92351

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from avieira at gcc dot gnu.org ---
I believe the committed patch fixes this.

[Bug tree-optimization/92351] [10 Regression] Wrong code with -O3 -march=skylake since r277569

2019-11-08 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92351

--- Comment #3 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Fri Nov  8 13:52:56 2019
New Revision: 277974

URL: https://gcc.gnu.org/viewcvs?rev=277974=gcc=rev
Log:
[vect] PR 92351: When peeling for alignment make alignment of epilogues unknown

gcc/ChangeLog:
2019-11-08  Andre Vieira  

PR tree-optimization/92351
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): When we are
peeling the main loop for alignment, make sure to set the misalignment
of the epilogue's data references to DR_MISALIGNMENT_UNKNOWN.

gcc/testsuite/ChangeLog:
2019-11-08  Andre Vieira  

PR tree-optimization/92351
* gcc.dg/vect/vect-peel-2.c: Disable epilogue vectorization and
split the source of this test to...
* gcc.dg/vect/vect-peel-2-src.c: ... This.
* gcc.dg/vect/vect-peel-2-epilogues.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/vect/vect-peel-2-epilogues.c
trunk/gcc/testsuite/gcc.dg/vect/vect-peel-2-src.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/vect/vect-peel-2.c
trunk/gcc/tree-vect-data-refs.c

[Bug tree-optimization/91573] Vectorization failure for a loop to do multiply-add because SLP loads unnecessarily require permutation

2019-12-12 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91573

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||avieira at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #7 from avieira at gcc dot gnu.org ---
This now vectorizes for aarch64 and x86_64 with avx2 and avx512. Closing this
ticket.

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2019-12-12 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 91573, which changed state.

Bug 91573 Summary: Vectorization failure for a loop to do multiply-add because 
SLP loads unnecessarily require permutation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91573

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2019-12-12 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 91573, which changed state.

Bug 91573 Summary: Vectorization failure for a loop to do multiply-add because 
SLP loads unnecessarily require permutation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91573

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/88915] Try smaller vectorisation factors in scalar fallback

2019-10-29 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88915

--- Comment #5 from avieira at gcc dot gnu.org ---
Author: avieira
Date: Tue Oct 29 13:15:46 2019
New Revision: 277569

URL: https://gcc.gnu.org/viewcvs?rev=277569=gcc=rev
Log:
[vect]PR 88915: Vectorize epilogues when versioning loops

gcc/ChangeLog:
2019-10-29  Andre Vieira  

PR 88915
* tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration.
* tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter
and make the valueize function pointer also take a void pointer.
* gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap
around vn_valueize, to call it without a context.
(process_bb): Use vn_valueize_wrapper instead of vn_valueize.
* tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos.
(~_loop_vec_info): Release epilogue_vinfos.
(vect_analyze_loop_costing): Use knowledge of main VF to estimate
number of iterations of epilogue.
(vect_analyze_loop_2): Adapt to analyse main loop for all supported
vector sizes when vect-epilogues-nomask=1.  Also keep track of lowest
versioning threshold needed for main loop.
(vect_analyze_loop): Likewise.
(find_in_mapping): New helper function.
(update_epilogue_loop_vinfo): New function.
(vect_transform_loop): When vectorizing epilogues re-use analysis done
on main loop and call update_epilogue_loop_vinfo to update it.
* tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert
stmts on loop preheader edge.
(vect_do_peeling): Enable skip-vectors when doing loop versioning if
we decided to vectorize epilogues.  Update epilogues NITERS and
construct ADVANCE to update epilogues data references where needed.
* tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos.
(vect_do_peeling, vect_update_inits_of_drs,
 determine_peel_for_niter, vect_analyze_loop): Add or update
declarations.
* tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already
created loop_vec_info's for epilogues when available.  Otherwise analyse
epilogue separately.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-ssa-loop-niter.c
trunk/gcc/tree-ssa-loop-niter.h
trunk/gcc/tree-ssa-sccvn.c
trunk/gcc/tree-vect-loop-manip.c
trunk/gcc/tree-vect-loop.c
trunk/gcc/tree-vectorizer.c
trunk/gcc/tree-vectorizer.h

[Bug target/86487] [8 Regression] insn does not satisfy its constraints on arm big-endian

2020-01-27 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86487

--- Comment #15 from avieira at gcc dot gnu.org ---
Jeff seems to have backported this to gcc-8 already, so I guess we can close
this?

[Bug tree-optimization/92429] [10 Regression] ICE in vect_transform_stmt, at tree-vect-stmts.c:10918

2020-01-09 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92429

--- Comment #5 from avieira at gcc dot gnu.org ---
Hi Martin,

Sorry about that, forgot to check it after I got back from holidays. I wrote up
a patch, actually going with solution 2) (fixes both issues locally).

Just running more tests now to make sure I didn't break anything else.

[Bug tree-optimization/92429] [10 Regression] ICE in vect_transform_stmt, at tree-vect-stmts.c:10918

2020-01-16 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92429

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from avieira at gcc dot gnu.org ---
I believe this is fixed, closing.

[Bug target/94445] gcc.target/arm/cmse/cmse-15.c fails for cortex-m33

2020-04-02 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94445

--- Comment #5 from avieira at gcc dot gnu.org ---
Yeah...

So far I have checked that 'gimplify_call_expr' creates the right gimple, and
up until 'gimplify_modify_expr' I can verify it does by using
gimple_call_fntype.

Though at expansion time, the 'gimple_call_fntype (stmt)' of '_5 = s_bar_p_2(D)
(); [tail call]' now has the attribute ...

So it must go wrong somewhere between gimplification and expansion, but that's
a big window and dump files won't help us :(

[Bug target/94445] gcc.target/arm/cmse/cmse-15.c fails for cortex-m33

2020-04-02 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94445

--- Comment #4 from avieira at gcc dot gnu.org ---
Yeah...

So far I have checked that 'gimplify_call_expr' creates the right gimple, and
up until 'gimplify_modify_expr' I can verify it does by using
gimple_call_fntype.

Though at expansion time, the 'gimple_call_fntype (stmt)' of '_5 = s_bar_p_2(D)
(); [tail call]' now has the attribute ...

So it must go wrong somewhere between gimplification and expansion, but that's
a big window and dump files won't help us :(

[Bug target/94445] gcc.target/arm/cmse/cmse-15.c fails for cortex-m33

2020-04-02 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94445

--- Comment #6 from avieira at gcc dot gnu.org ---
I have also identified that this only goes wrong at -O2 or higher, and that it
happens sometime between tailcall optimization passes 1 and 2. But there are
loads of passes in between.

[Bug target/94445] gcc.target/arm/cmse/cmse-15.c fails for cortex-m33

2020-04-02 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94445

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2020-04-02

--- Comment #1 from avieira at gcc dot gnu.org ---
Hi Christophe,

This looks to me like an issue of not building distinct types for the ns_foo_t
and s_bar_t function types.

When I first wrote this code I tested for this and it was working, so I am
wondering whether changes have been made in the way we create types in the
c-frontend.

I am trying to find out how all this works again, it's been a while...

[Bug target/94445] gcc.target/arm/cmse/cmse-15.c fails for cortex-m33

2020-04-02 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94445

--- Comment #2 from avieira at gcc dot gnu.org ---
start_decl seems to be doing the right thing, investigation continues...

[Bug target/94814] [8 Regression] ICE: RTL check: expected code 'const_int', have 'reg' in output_3367, at config/aarch64/atomics.md:755

2020-04-28 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94814

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
 CC||avieira at gcc dot gnu.org

--- Comment #2 from avieira at gcc dot gnu.org ---
I believe this is fixed with the above backport.

[Bug tree-optimization/91246] vectorization failure for a small loop to search array element

2020-03-18 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91246

--- Comment #5 from avieira at gcc dot gnu.org ---
I have posted a prototype on the mailing list
https://gcc.gnu.org/pipermail/gcc-patches/2020-March/541908.html

This is really just a prototype to investigate the code-gen impact. I don't
expect to commit it as is; the open question is whether it makes sense to do
something like this.

[Bug target/96795] New: MVE: issue with polymorphism and integer promotion

2020-08-26 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96795

Bug ID: 96795
   Summary: MVE: issue with polymorphism and integer promotion
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

An example of this issue can be observed when trying to compile:

#include <arm_mve.h>
uint16x8_t foo (uint16x8_t a, int16_t b)
{
  return vaddq (a, (b<<3));
}

This will lead to an __ARM_undef being selected.

I believe this is because __ARM_mve_coerce only accepts one type for scalar
parameters and should have accepted the same range of types for scalar as is
done in __ARM_mve_typeid.

A workaround for this is to cast (b<<3) to uint16_t.
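
The promotion behind this can be reproduced without MVE. The following is a
minimal sketch in plain C11, assuming a platform where int16_t is short (and so
a distinct type from int). IS_TYPE and the three functions are hypothetical
names used only for illustration; they are not part of arm_mve.h. The sketch
shows that the type seen by a type-generic selection for (b<<3) is int, and
that the cast restores a 16-bit type:

```c
#include <stdint.h>

/* Hypothetical helper: 1 if expression x has exactly type T, else 0.
   The controlling expression of _Generic is not evaluated. */
#define IS_TYPE(x, T) _Generic((x), T: 1, default: 0)

/* b is promoted to int before the shift, so the result has type int. */
int shift_is_int(int16_t b)
{
  return IS_TYPE(b << 3, int);
}

/* ... and therefore it does not have type int16_t. */
int shift_is_int16(int16_t b)
{
  return IS_TYPE(b << 3, int16_t);
}

/* The workaround from the report: cast the result back to uint16_t. */
int cast_is_uint16(int16_t b)
{
  return IS_TYPE((uint16_t)(b << 3), uint16_t);
}
```

If a polymorphic macro only accepts one scalar type per vector type, the int
result falls through to the default branch, which would be consistent with the
__ARM_undef selection described above.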

[Bug target/95646] arm-none-eabi function attribute 'cmse_nonsecure_entry' wipes register values with -Os

2020-06-15 Thread avieira at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95646

avieira at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2020-06-15
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from avieira at gcc dot gnu.org ---
Reproduced and confirmed.  This is because we treat HI_REGS specially in Thumb-1
when optimizing for size.  I have a fix ready, just doing some testing.

[Bug target/97327] -mcpu=cortex-m55 warns without -mfloat-abi=hard or -march=armv8.1-m.main

2020-10-13 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97327

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org

--- Comment #2 from avieira at gcc dot gnu.org ---
The last two should conflict though right? I never quite understood this
warning to be fair. My personal preference would be to warn for any invocation
where both -mcpu and -march are passed, but I understand that for legacy
reasons that might be undesirable.

Though yeah -mcpu=cortex-m55 with a -mfloat-abi=soft should not warn for
anything obviously.

[Bug target/97327] -mcpu=cortex-m55 warns without -mfloat-abi=hard or -march=armv8.1-m.main

2020-10-13 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97327

--- Comment #4 from avieira at gcc dot gnu.org ---
-mcpu=cortex-m55+nomve should be equivalent to -march=armv8.1-m.main+dsp

[Bug target/97327] -mcpu=cortex-m55 warns without -mfloat-abi=hard or -march=armv8.1-m.main

2020-10-13 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97327

--- Comment #5 from avieira at gcc dot gnu.org ---
Your other one:
-mcpu=cortex-m55+nomve -march=armv8.1-m.main+mve -mfloat-abi=softfp
This has cpu without mve and arch with mve.

Another fun caveat to look at is in:
-mcpu=cortex-m55 -mfloat-abi=soft
float-abi=soft disables vector instructions, so it makes sense to remove mve.fp
and fp.dp/fp. However, we must make sure that +mve is still passed to the
assembler because +mve enables new scalar shift instructions.

If we want to be in-sync with legacy though I don't think we even need to look
at all these complicated cases, since it seems that in the past we ignored fp
extensions. Take for instance:
arm-none-eabi-gcc -mcpu=cortex-m7 -march=armv7e-m -mfloat-abi=hard
arm-none-eabi-gcc -mcpu=cortex-m7 -march=armv7e-m+fp -mfloat-abi=hard 
arm-none-eabi-gcc -mcpu=cortex-m7+nofp -march=armv7e-m  -mfloat-abi=soft
arm-none-eabi-gcc -mcpu=cortex-m7+nofp -march=armv7e-m+fp

None of these give the warning, so maybe the solution is to ignore MVE as well
as the FP extension when checking for this? There is a bit in the warning code 
that says:
  /* And if the target ISA lacks floating point, ignore any
     extensions that depend on that.  */
  if (!bitmap_bit_p (target->isa, isa_bit_vfpv2))
    bitmap_and_compl (isa_delta, isa_delta, isa_all_fpbits);

Maybe we need to 'ignore any extension that depends on mve'? But I don't quite
understand how this works with the case where we do have isa_bit_vfpv2...
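
To make the idea concrete, here is a toy model of that check in plain C. It is
only a sketch under assumptions: the feature bits, the ALL_MVEBITS mask, the
ignore_mve switch and the function name are all hypothetical, and the
difference between the cpu and arch feature sets is assumed to be a simple
XOR. The real code works on GCC bitmaps, not on a uint32_t.

```c
#include <stdint.h>

/* Hypothetical feature bits; the real ISA bitmaps are much larger. */
#define BIT_VFPV2   (UINT32_C(1) << 0)
#define BIT_FP_DP   (UINT32_C(1) << 1)
#define BIT_MVE     (UINT32_C(1) << 2)
#define BIT_MVE_FP  (UINT32_C(1) << 3)
#define BIT_DSP     (UINT32_C(1) << 4)

#define ALL_FPBITS  (BIT_VFPV2 | BIT_FP_DP)
#define ALL_MVEBITS (BIT_MVE | BIT_MVE_FP)   /* assumed mask, not in GCC */

/* Returns nonzero if the cpu and arch feature sets still differ after the
   ignored extensions have been masked out of the delta. */
int cpu_arch_conflict(uint32_t cpu_isa, uint32_t arch_isa,
                      uint32_t target_isa, int ignore_mve)
{
  uint32_t isa_delta = cpu_isa ^ arch_isa;

  /* Mirrors the quoted snippet: without floating point in the target,
     ignore extensions that depend on it. */
  if (!(target_isa & BIT_VFPV2))
    isa_delta &= ~ALL_FPBITS;

  /* The suggestion under discussion: treat MVE the same way. */
  if (ignore_mve)
    isa_delta &= ~ALL_MVEBITS;

  return isa_delta != 0;
}
```

With a cpu set that carries the FP and MVE bits, an arch set with only DSP and
a target without vfpv2, this model reports a conflict unless the MVE bits are
also masked.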

For Srinath's sake it would be good to agree on what the behaviour should be
and then work towards that. I personally don't have a strong feeling about this
other than: passing '-mcpu=cortex-m55' shouldn't give warnings ... since well
that's insane :P

[Bug target/96914] missing MVE intrinsics

2020-10-05 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96914

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org

--- Comment #5 from avieira at gcc dot gnu.org ---
Hi Christophe,

The docs are right and so are you, those instructions should only have a signed
variant, as the hardware instructions also only support .S suffixes or, in the
case of vmlaldavax, do not support the cross 'X' variant with unsigned
datatypes.

[Bug target/93053] [9 Regression] libgcc build failure with old binutils on aarch64

2021-01-22 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93053

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org
 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #17 from avieira at gcc dot gnu.org ---
I believe this has been fixed on all relevant branches.

[Bug target/95646] [GCC 9/10] arm-none-eabi function attribute 'cmse_nonsecure_entry' wipes register values with -Os

2021-01-25 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95646

avieira at gcc dot gnu.org changed:

   What|Removed |Added

Summary|arm-none-eabi function  |[GCC 9/10] arm-none-eabi
   |attribute   |function attribute
   |'cmse_nonsecure_entry'  |'cmse_nonsecure_entry'
   |wipes register values with  |wipes register values with
   |-Os |-Os

--- Comment #4 from avieira at gcc dot gnu.org ---
Changed title to reflect that this still needs backports to GCC 9 and 10.

[Bug target/97528] [9/10 Regression] ICE in decompose_automod_address, at rtlanal.c:6298 (arm-linux-gnueabihf)

2021-02-01 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97528

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org

--- Comment #7 from avieira at gcc dot gnu.org ---
Hi,

I am seeing this same fault cause a wrong-code gen on gcc-9 with the code
below:


#include <arm_neon.h>

void foo(uint16_t *dest, uint16x8_t a, unsigned long long stride)
{
  int i = 3;
  stride >>= 1;
  do {
    vst1_u16(dest, vget_low_u16(a));
    dest += stride;
    i = i - 1;
  } while (i != 0);
}

leading to:
foo:
vst1.16 {d0}, [r0], r0
vst1.16 {d0}, [r0], r0
vst1.16 {d0}, [r0]
bx  lr

which is obviously wrong. Can we backport this to gcc-9?
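
For comparison, a portable scalar sketch of what the loop is meant to do. This
is an illustration only: foo_reference is a hypothetical name, and the `low`
array stands in for vget_low_u16(a), i.e. the four low 16-bit lanes. Each store
should land `stride >> 1` elements after the previous one, whereas the
post-index register in the assembly above is r0, so the address is added to
itself instead.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for the NEON loop in the report. */
void foo_reference(uint16_t *dest, const uint16_t low[4],
                   unsigned long long stride)
{
  int i = 3;
  stride >>= 1;
  do {
    /* Models vst1_u16: store four 16-bit lanes at dest. */
    memcpy(dest, low, 4 * sizeof(uint16_t));
    /* Advance by `stride` elements, not by the address itself. */
    dest += stride;
    i = i - 1;
  } while (i != 0);
}
```

Calling it with a zeroed buffer and a stride of 16 writes the four lanes at
element offsets 0, 8 and 16 and leaves everything in between untouched.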

[Bug target/97528] [9/10 Regression] ICE in decompose_automod_address, at rtlanal.c:6298 (arm-linux-gnueabihf)

2021-02-03 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97528

--- Comment #12 from avieira at gcc dot gnu.org ---
@jakub: backported to gcc-8 and gcc-9. OK to close this?

[Bug tree-optimization/100981] ICE in info_for_reduction, at tree-vect-loop.c:4897

2021-06-09 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100981

--- Comment #5 from avieira at gcc dot gnu.org ---
Yeah that works. Ran it as is, no abort, ran it with s/ne/eq/ and it aborts.

[Bug tree-optimization/100981] ICE in info_for_reduction, at tree-vect-loop.c:4897

2021-06-09 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100981

--- Comment #6 from avieira at gcc dot gnu.org ---
FYI Tamar asked me to make sure the instructions were being generated. I
checked and they were, but not being used as it decides to inline MAIN__ and
inlining seems to break (as in not apply/missed opportunity) the complex
optimization.

So for this specific test I'd use -fno-inline, it executes the fcmla
instructions that way and it runs fine.

[Bug middle-end/98974] [11 Regression] ICE in vectorizable_condition after STMT_VINFO_VEC_STMTS

2021-02-08 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98974

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from avieira at gcc dot gnu.org ---
That should fix it.

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2021-02-08 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 98974, which changed state.

Bug 98974 Summary: [11 Regression] ICE in vectorizable_condition after 
STMT_VINFO_VEC_STMTS
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98974

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug middle-end/98974] New: ICE in vectorizable_condition after STMT_VINFO_VEC_STMTS

2021-02-05 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98974

Bug ID: 98974
   Summary: ICE in vectorizable_condition after
STMT_VINFO_VEC_STMTS
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

Hi,

After
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b05d5563f4be13b4a0d0951375a82adf483973c0
we found vectorizable_condition to ICE when autovectorizing for SVE.

The reduced fortran testcase is an example of this:
$ cat foo.F90
 module module_foobar
  integer,parameter :: fp_kind = selected_real_kind(15)
   contains
   subroutine foobar( foo, ix ,jx ,kx,iy,ky)
 real, dimension( ix, kx, jx )  :: foo
 real(fp_kind), dimension( iy, ky, 3 ) :: bar, baz
   j_loop: do j=jts,enddo
   do k=0,ky
  do i=0,iy
if ( baz(i,k,1) > 0. ) then
  bar(i,k,1) = 0
endif
foo(i,nk,j) = baz0 *  bar(i,k,1)
  enddo
   enddo
   enddo j_loop
 end
end

And the following command will cause it to ICE:
$ gfortran  -Ofast -mcpu=neoverse-v1 foo.F90 -S

I have debugged this and I believe the issue is that before Richi's change
vectorizable_condition used to set vec_oprnds0 to vec_cond_lhs for each copy.
Now it is collected for all copies at the same time. However, when calling
vect_get_loop_mask we pass vec_num * ncopies as the nvectors parameter, where
vec_num has been set to the length of vec_oprnds0. I believe that because we
are now doing all ncopies at the same time we no longer need to multiply it by
ncopies.
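
The arithmetic behind that suspicion can be sketched as a toy model in C. This
is not GCC code: the function names and the `per_copy` parameter are
hypothetical, and the model only encodes the hypothesis stated above, namely
that vec_num used to count the operands of one copy and now counts the operands
of all copies.

```c
/* Before the change: operands are collected per copy, so the total number
   of vectors is vec_num * ncopies. */
unsigned nvectors_before(unsigned per_copy, unsigned ncopies)
{
  unsigned vec_num = per_copy;
  return vec_num * ncopies;
}

/* After the change, without a fix: vec_num already covers all copies, so
   multiplying by ncopies again over-counts. */
unsigned nvectors_after_unfixed(unsigned per_copy, unsigned ncopies)
{
  unsigned vec_num = per_copy * ncopies;
  return vec_num * ncopies;
}

/* After the change, with the proposed fix: pass vec_num on its own. */
unsigned nvectors_after_fixed(unsigned per_copy, unsigned ncopies)
{
  unsigned vec_num = per_copy * ncopies;
  return vec_num;
}
```

With one operand per copy and two copies, the first and last variants agree on
2 while the unfixed variant asks for 4.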

I'll be posting a patch for this soon.

[Bug middle-end/98974] ICE in vectorizable_condition after STMT_VINFO_VEC_STMTS

2021-02-05 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98974

--- Comment #1 from avieira at gcc dot gnu.org ---
The testcase above issues a warning, around do j=jts,enddo

To use it as a testcase in my patch I'd like to get rid of it so if someone
proficient in Fortran knows a way to get rid of it that'd be great!

[Bug rtl-optimization/98791] [10 Regression] ICE in paradoxical_subreg_p (in ira) with SVE

2021-03-15 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from avieira at gcc dot gnu.org ---
Closing now as backport is done.

[Bug rtl-optimization/98791] [11 Regression] ICE in paradoxical_subreg_p (in ira) with SVE

2021-03-08 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791

avieira at gcc dot gnu.org changed:

   What|Removed |Added

  Known to work|10.2.1  |
  Known to fail||10.2.1
 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #6 from avieira at gcc dot gnu.org ---
Hi Jeffrey,

I was leaving this open to remind me to backport the fix to gcc-10. I see the
ticket falsely claims it works for gcc-10. Reopening for backport.

[Bug rtl-optimization/98791] [10 Regression] ICE in paradoxical_subreg_p (in ira) with SVE

2021-03-08 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791

--- Comment #8 from avieira at gcc dot gnu.org ---
Aye my bad there, Thanks for the change.

[Bug target/86487] [8 Regression] insn does not satisfy its constraints on arm big-endian

2021-02-22 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86487

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #17 from avieira at gcc dot gnu.org ---
Closing as it has been backported to 8 and 7 is closed.

[Bug target/98657] [11 Regression] SVE: ICE (unrecognizable insn) with shift at -O3 -msve-vector-bits=256

2021-02-19 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98657

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #4 from avieira at gcc dot gnu.org ---
That should have fixed it. Closing.

[Bug tree-optimization/98726] [10/11 Regression] SVE: tree check: expected integer_cst, have poly_int_cst in to_wide, at tree.h:5984

2021-02-19 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98726

--- Comment #8 from avieira at gcc dot gnu.org ---
Also at some point we should figure out why the vectorizer is generating this
much code for this example, where I think it should be able to optimize it to:

a = 22;
b &= c;

[Bug tree-optimization/98726] [10/11 Regression] SVE: tree check: expected integer_cst, have poly_int_cst in to_wide, at tree.h:5984

2021-02-19 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98726

--- Comment #7 from avieira at gcc dot gnu.org ---
I'm looking at this and I have a feeling there is a disconnect on how some
passes define VECTOR_CST and how the expand pass handles it.

So the problem here seems to lie with the V4SImode VECTOR_CST at expand time:

{ POLY_INT_CST [24, 16], POLY_INT_CST [25, 16], POLY_INT_CST [26, 16],
POLY_INT_CST [27, 16] }

The problem seems to be that const_vector_from_tree only adds the first
VECTOR_CST_NPATTERNS * VECTOR_CST_NELTS_PER_PATTERN and this has:
 VECTOR_CST_NPATTERNS: 1
 VECTOR_CST_NELTS_PER_PATTERN: 3

The mode however dictates 4 elements (constant-sized V4SImode). So
rtx_vector_builder::build adds the first three and then tries to derive the
fourth (even though it is right there), at this point it fails as it uses
wi::sub and that doesn't seem to work for POLY_INT's.
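
The derivation of that fourth element can be sketched as a toy model in C. It
assumes a single pattern with three encoded elements, where later elements
continue the series with step = elt[2] - elt[1], and it represents
POLY_INT_CST [c0, c1] as a plain pair of coefficients. The `poly` type and all
function names are hypothetical; this is not the rtx_vector_builder code.

```c
/* Hypothetical model of POLY_INT_CST [c0, c1]. */
typedef struct { long c0, c1; } poly;

/* Coefficient-wise subtraction, the operation a poly-aware step needs. */
poly poly_sub(poly a, poly b)
{
  poly r = { a.c0 - b.c0, a.c1 - b.c1 };
  return r;
}

/* base + n * step, again coefficient-wise. */
poly poly_add_scaled(poly base, poly step, long n)
{
  poly r = { base.c0 + n * step.c0, base.c1 + n * step.c1 };
  return r;
}

/* Decode element i from a one-pattern, three-elements-per-pattern encoding. */
poly decode_elt(const poly enc[3], unsigned i)
{
  if (i < 3)
    return enc[i];                        /* explicitly encoded */
  poly step = poly_sub(enc[2], enc[1]);   /* derive the series step */
  return poly_add_scaled(enc[2], step, (long) i - 2);
}
```

For the encoding { [24, 16], [25, 16], [26, 16] } the step is [1, 0], so
element 3 decodes to [27, 16], which matches the fourth element listed above.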

This is where I started investigating how it should work. I looked at cases of
actual patterns involving POLY_INT's, like:
{ POLY_INT_CST [8, 8], POLY_INT_CST [9, 8], POLY_INT_CST [10, 8], ... }

These have a VLA mode, so because there is no constant element number
rtx_vector_builder::build uses the 'encoded_nelts' which are again the
VECTOR_CST_NPATTERNS * VECTOR_CST_NELTS_PER_PATTERN elements and never needs to
derive a step.

I also looked at how a VECTOR_CST with N random integers is built and there it
seems VECTOR_CST_NPATTERNS * VECTOR_CST_NELTS_PER_PATTERN describe the full
length of the VECTOR_CST.

At this point I don't know whether the construction of the VECTOR_CST is wrong,
or whether the building is, I just know there seems to be a disconnect.

There are a variety of things that we could do:
1) Change how the VECTOR_CST is being created so that VECTOR_CST_NPATTERNS *
VECTOR_CST_NELTS_PER_PATTERN == GET_MODE_NUNITS (m_mode).is_constant ()
for constant sized modes.
2) Change const_vector_from_tree to check whether a POLY_INT VECTOR_CST has a
constant sized mode, construct the RTVEC_ELT itself and use
rtx_vector_builder::build(rtvec v)
3) Teach rtx_vector_builder::step and apply_step how to deal with POLY_INT's

Out of all of these, 2 is my favourite. Though we should aim to look at 1 too,
because I have seen a big discrepancy in how these VECTOR_CST's are formed.
I've also seen:
{1, 1, 1, 1, 1, 1, 1, 1} being described using:
VECTOR_CST_NPATTERNS: 1
VECTOR_CST_NELTS_PER_PATTERN: 3

Which is unnecessary... {1, ...} would have sufficed with both NPATTERNS and
NELTS_PER_PATTERN set to 1 for instance, or make it so they multiply to 8.
Unless we want this flexibility?

[Bug tree-optimization/103977] [12 Regression] ice in try_vectorize_loop_1 since r12-6420-gd3ff7420e941931d32ce2e332e7968fe67ba20af

2022-01-12 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103977

--- Comment #5 from avieira at gcc dot gnu.org ---
Posted a fix on ML:
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588237.html

Sorry for the breakage, wrong assumption on my part :(
