[Bug c++/113687] -Warray-bounds is not emitted inside class method

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113687

Richard Biener  changed:

   What|Removed |Added

 Blocks||56456
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-02-01

--- Comment #2 from Richard Biener  ---
(In reply to Andrew Pinski from comment #1)
> The warning only happens if the vague linkage function is used. and IIRC
> that is by design.

Yeah, we try to avoid diagnosing things on "dead" code and here the whole
functions are dead.  IIRC even -fanalyzer runs after cgraph removes unreachable
functions.

It would still be nice to diagnose these kinds of trivial cases.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56456
[Bug 56456] [meta-bug] bogus/missing -Warray-bounds

[Bug testsuite/113685] [14 regression] gcc.dg/vect/vect-117.c fails profile checking with Invalid sum after r14-4089-gd45ddc2c04e471

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113685

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-02-01
 CC||hubicka at gcc dot gnu.org
 Ever confirmed|0   |1
   Target Milestone|--- |14.0
   Keywords||testsuite-fail
Summary|[14 regression] xxx fails   |[14 regression]
   |after yyy   |gcc.dg/vect/vect-117.c
   ||fails profile checking with
   ||Invalid sum after
   ||r14-4089-gd45ddc2c04e471
 Status|UNCONFIRMED |NEW

--- Comment #1 from Richard Biener  ---
As said in the other PR, this is more for Honza who thought checking we do not
end with invalid profiles for all vect testcases is a good thing ;)

Btw, the wrong count pops up in DOM3:

t.c.203t.dom3:;;   Invalid sum of incoming counts 138435014 (estimated locally,
freq 3.0936), should be 134239200 (estimated locally, freq 2.)

so it seems to be a jump threading issue.  It's gone with -fno-thread-jumps.

Very likely a latent issue, but of course the change triggering this does
have an effect on jump threading.

Confirmed.

[Bug target/113684] Cross compiler without assembler and linker should assume that all assembler and linker features are available

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113684

--- Comment #3 from Richard Biener  ---
I usually have a cross assembler/linker around, as they are easy to build.

[Bug tree-optimization/110176] [11/12/13 Regression] wrong code at -Os and above on x86_64-linux-gnu since r11-2446

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110176

Richard Biener  changed:

   What|Removed |Added

  Known to work||14.0
Summary|[11/12/13/14 Regression]|[11/12/13 Regression] wrong
   |wrong code at -Os and above |code at -Os and above on
   |on x86_64-linux-gnu since   |x86_64-linux-gnu since
   |r11-2446|r11-2446

--- Comment #11 from Richard Biener  ---
Fixed on trunk so far.

[Bug target/113641] [13/14 regression] 510.parest_r with PGO at O2 slower than GCC 12 (7% on Zen 3&2, 4% on CascadeLake) since r13-4272-g8caf155a3d6e23

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113641

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |13.3

[Bug rtl-optimization/113546] [13/14 Regression] aarch64: bootstrap-debug-lean broken with -fcompare-debug failure since r13-2921-gf1adf45b17f7f1

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113546

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |13.3

[Bug testsuite/113611] [14 Regression] gcc.dg/pr110279-1.c fails on cross build since gcc-14-5779-g746344dd538

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113611

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug target/113542] [14 Regression] gcc.target/arm/bics_3.c regression after change for pr111267

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113542

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug target/111170] [13/14 regression] Malformed manifest does not allow to run gcc on Windows XP (Accessing a corrupted shared library) since r13-6552-gd11e088210a551

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |13.3

[Bug rtl-optimization/110390] [13/14 regression] ICE on valid code on x86_64-linux-gnu with sel-scheduling: in av_set_could_be_blocked_by_bookkeeping_p, at sel-sched.cc:3609 since r13-3596-ge7310e24b1

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110390

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |13.3

[Bug target/105275] [12/13/14 regression] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |12.4

[Bug debug/92444] [11/12/13/14 regression] gcc generates wrong debug information at -O2 and -O3 since r10-4122-gf658ad3002a0af

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92444

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |11.5

[Bug tree-optimization/113681] [14 Regression] ICE in tree_profiling, at tree-profile.cc:803 since r14-6201-gf0a90c7d7333fc

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113681

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P4

[Bug tree-optimization/113681] [14 Regression] ICE in tree_profiling, at tree-profile.cc:803 since r14-6201-gf0a90c7d7333fc

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113681

Richard Biener  changed:

   What|Removed |Added

   Keywords||error-recovery
   Target Milestone|--- |14.0

[Bug rtl-optimization/113682] Branches in branchless binary search rather than cmov/csel/csinc

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
  Component|other   |rtl-optimization
Version|unknown |14.0
 Target||aarch64, x86_64-*-*

--- Comment #1 from Richard Biener  ---
Since there's a loop exit involved (and the loop has multiple exits)
if-conversion is made difficult here.

You could try rotating the loop manually, producing a do { } while loop with
a "nicer" exit condition and see whether that helps.
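
As a rough illustration of the manual rotation suggested above, here is a
minimal sketch in C.  It is not the test case from the PR; the function name
lower_bound_rotated and its signature are made up for this example.  The
loop is written as a guarded do { } while with a single exit test on len,
and the update of base is a select rather than a branch.  Whether GCC then
emits cmov/csel is not guaranteed and would need checking on the target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the PR: return the index of the
   first element that is >= key in the sorted array arr[0..n).  */
size_t
lower_bound_rotated (const int *arr, size_t n, int key)
{
  const int *base = arr;
  size_t len = n;

  assert (arr != NULL || n == 0);
  if (len == 0)
    return 0;

  /* Manually rotated loop: the guard runs once, the exit test sits at
     the bottom and only involves len.  */
  if (len > 1)
    do
      {
        size_t half = len / 2;
        /* A select instead of a branch; a candidate for if-conversion.  */
        base = (base[half - 1] < key) ? base + half : base;
        len -= half;
      }
    while (len > 1);

  return (size_t) (base - arr) + (base[0] < key);
}
```

The loop body has no early exit, so the only control flow left inside it is
the back edge.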

[Bug tree-optimization/110176] [11/12/13/14 Regression] wrong code at -Os and above on x86_64-linux-gnu since r11-2446

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110176

--- Comment #9 from Richard Biener  ---
With all VARYING we simplify

i_19 = (int) _2;
_6 = (int) _5;
Value numbering stmt = _7 = _6 <= i_19;
Applying pattern match.pd:6775, gimple-match-4.cc:1795
Match-and-simplified _6 <= i_19 to 1

where _5 is _Bool and _2 is unsigned int.  We match

 zext <= (int) 4294967295u

note that I see

Value numbering stmt = _2 = f$0_25;
Setting value number of _2 to 4294967295 (changed)
Value numbering stmt = i_19 = (int) _2;
Match-and-simplified (int) _2 to -1
RHS (int) _2 simplified to -1 
Not changing value number of i_19 from VARYING to -1
Making available beyond BB6 i_19 for value i_19

so it's odd we see the constant here, but ... we go

  (if (TREE_CODE (@10) == INTEGER_CST
   && INTEGRAL_TYPE_P (TREE_TYPE (@00))
   && !int_fits_type_p (@10, TREE_TYPE (@00)))
   (with
{
  tree min = lower_bound_in_type (TREE_TYPE (@10), TREE_TYPE (@00));
  tree max = upper_bound_in_type (TREE_TYPE (@10), TREE_TYPE (@00));
  bool above = integer_nonzerop (const_binop (LT_EXPR, type, max,
@10));
  bool below = integer_nonzerop (const_binop (LT_EXPR, type, @10,
min));
}
(if (above || below)

failing to see that we deal with a relational compare and a sign-change.

The original code from fold-const.cc had only INTEGER_TYPE support,
r6-4300-gf6c1575958f7bf made it cover all integral types (it half-way
supported BOOLEAN_TYPE already).  But the issue was latent I think.
One notable difference was that I think get_unwidened made sure to
convert a constant to the wider type while here we have @10 != @1
and the conversion not applied.  We're doing it correctly in earlier code:

/* ???  The special-casing of INTEGER_CST conversion was in the original
   code and here to avoid a spurious overflow flag on the resulting
   constant which fold_convert produces.  */
(if (TREE_CODE (@1) == INTEGER_CST)

using @1 instead of @10.

Correcting that avoids the pattern from triggering in this wrong way.
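
To make the sign-change problem concrete, here is a small C sketch of the
comparison described above.  The function names are invented for this
example.  It assumes GCC's documented behavior of reducing an out-of-range
unsigned-to-int conversion modulo 2^N, so (int) 4294967295u is -1.  The
source-level compare is signed and yields 0, while reasoning about the
unconverted constant 4294967295u as an upper bound yields 1, which is the
wrong answer the pattern produced.

```c
#include <assert.h>
#include <stdbool.h>

/* The comparison as it appears after the conversions: both operands are
   int, so 4294967295u has become -1 (GCC's implementation-defined
   conversion).  */
int
compare_after_sign_change (bool b, unsigned int u)
{
  return (int) b <= (int) u;
}

/* What you get when the sign change is ignored and the zero-extended
   bool is compared against the unconverted unsigned constant.  */
int
compare_ignoring_sign_change (bool b, unsigned int u)
{
  return (unsigned int) b <= u;
}
```

For u == 4294967295u the two functions disagree for both values of b.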

[Bug ipa/111444] [14 Regression] Wrong code at -O2/3/s on x86_64-gnu since r14-3226-gd073e2d75d9

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111444

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #12 from Richard Biener  ---
Fixed.

[Bug middle-end/113680] Missed optimization: Redundant cmp/test instructions when comparing (x - y) > 0

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113680

Richard Biener  changed:

   What|Removed |Added

  Component|rtl-optimization|middle-end
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-01-31
   Keywords||easyhack,
   ||missed-optimization
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
I don't think we have or had a (a - b) CMP 0 simplification pattern which
this seems to be about.  We have a +- CST CMP CST'.

Note the reverse, a < b ->  (a - b) < 0 isn't valid.
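
A short C sketch of why the reverse direction is invalid: a - b can
overflow, and once it does the sign of the difference no longer reflects
the ordering.  The function names are made up for this example, and the
wrapped difference is obtained with the GCC/Clang builtin
__builtin_sub_overflow so the example itself has no undefined behavior.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* The plain comparison.  */
bool
less_direct (int a, int b)
{
  return a < b;
}

/* (a - b) < 0, with the subtraction wrapping on overflow.  The builtin
   stores the wrapped result in diff; its overflow flag is ignored here
   on purpose.  */
bool
less_via_wrapped_difference (int a, int b)
{
  int diff;
  (void) __builtin_sub_overflow (a, b, &diff);
  return diff < 0;
}
```

With a == INT_MIN and b == 1 the direct comparison is true, but the wrapped
difference is INT_MAX, so the rewritten form is false.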

[Bug tree-optimization/113134] gcc does not version loops with early break conditions that don't have side-effects

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #20 from Richard Biener  ---
I think we want split_loop () to handle this case.  That means extending it to
handle loops with multiple exits.  OTOH after loop rotation to

  if (i_21 == 1001)
goto ; [1.00%]
  else
goto ; [99.00%]

   [local count: 1004539166]:
  i_18 = i_21 + 1;
  if (N_13(D) > i_18)
goto ; [94.50%]
  else
goto ; [5.50%]

it could also be IVCANON's job to rewrite the exit test so the bound is
loop invariant and it becomes a single exit.

There's another recent PR where an exit condition like i < N && i < M
should become i < MIN(N,M).
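
The rewrite mentioned above can be sketched in C as follows.  Both
functions are hypothetical and only count iterations; the point is that the
second form has a single exit test against a loop-invariant bound.

```c
#include <assert.h>

/* Loop with a compound exit condition, i.e. two exit tests.  */
unsigned int
count_two_exits (unsigned int n, unsigned int m)
{
  unsigned int i = 0;
  while (i < n && i < m)
    ++i;
  return i;
}

/* Same trip count with one exit test against MIN (n, m), computed once
   before the loop.  */
unsigned int
count_min_bound (unsigned int n, unsigned int m)
{
  unsigned int bound = n < m ? n : m;
  unsigned int i = 0;
  while (i < bound)
    ++i;
  return i;
}
```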

[Bug tree-optimization/113630] [11/12/13 Regression] -fno-strict-aliasing introduces out-of-bounds memory access

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113630

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
Summary|[11/12/13/14 Regression]|[11/12/13 Regression]
   |-fno-strict-aliasing|-fno-strict-aliasing
   |introduces out-of-bounds|introduces out-of-bounds
   |memory access   |memory access
  Known to work||14.0

--- Comment #7 from Richard Biener  ---
Fixed on trunk so far.

[Bug ipa/111444] [14 Regression] Wrong code at -O2/3/s on x86_64-gnu since r14-3226-gd073e2d75d9

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111444

--- Comment #10 from Richard Biener  ---
Hmm, I have another fix.

[Bug ipa/111444] [14 Regression] Wrong code at -O2/3/s on x86_64-gnu since r14-3226-gd073e2d75d9

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111444

--- Comment #9 from Richard Biener  ---
(In reply to Richard Biener from comment #8)
> The best fix would likely be to pre-insert all the IPA-CP known constants
> instead of trying to discover them "late".
> 
> I'm testing the easy fix for now.

Hmm.  gcc.dg/ipa/pr92497-1.c FAILs because of that.  We get

__attribute__((noinline))
int bar.constprop (struct a a)
{
  intD.6 a$aD.2808;
  intD.6 D.2807;
  struct a aD.2806;
  intD.6 _4;

   [local count: 1073741824]:
  # .MEM_5 = VDEF <.MEM_2(D)>
  aD.2806 = aD.2800;
  # VUSE <.MEM_5>
  a$a_3 = aD.2806.aD.2769;

here and thus translate through the aggregate copy - the result should then
be put on aD.2806 but of course only with .MEM_5.

Maybe we can and should always use the default def here but I'm slightly
uneasy with the ref adjustment, esp. since we're going to record
for the saved operands (if those exist - the path where it goes wrong
isn't translated).

[Bug ipa/111444] [14 Regression] Wrong code at -O2/3/s on x86_64-gnu since r14-3226-gd073e2d75d9

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111444

--- Comment #8 from Richard Biener  ---
OK, so the issue is that we're recording the IPA result with the wrong VUSE
since we're calling vn_reference_lookup_2 with !data->last_vuse_ptr but
data->finish (vr->set, vr->base_set, v) inserts a hashtable entry with
data->last_vuse.  Note it's somewhat unexpected that vn_reference_lookup_2
performs hashtable insertion which is what causes the issue.  It's also
not as easy as using the updated vuse since if we're coming from translation
through a memcpy that would be wrong.  In fact we probably want to avoid
doing any insertion if there's something fishy going on (!data->last_vuse_ptr).

The best fix would likely be to pre-insert all the IPA-CP known constants
instead of trying to discover them "late".

I'm testing the easy fix for now.

[Bug tree-optimization/113670] ICE with vectors in named registers and -fno-vect-cost-model

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113670

Richard Biener  changed:

   What|Removed |Added

  Known to fail|14.0|
   Target Milestone|--- |14.0
 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
  Known to work||14.0

--- Comment #5 from Richard Biener  ---
Fixed for trunk.

[Bug tree-optimization/113678] SLP misses up vec_concat

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-31
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
I think the SLP tree we discover is sound:

t2.c:11:14: note:   node 0x5db76f0 (max_nunits=8, refcnt=2) vector(8) char
t2.c:11:14: note:   op template: *a_7(D) = _1;
t2.c:11:14: note:   stmt 0 *a_7(D) = _1;
t2.c:11:14: note:   stmt 1 MEM[(char *)a_7(D) + 1B] = _2;
t2.c:11:14: note:   stmt 2 MEM[(char *)a_7(D) + 2B] = _3;
t2.c:11:14: note:   stmt 3 MEM[(char *)a_7(D) + 3B] = _4;
t2.c:11:14: note:   stmt 4 MEM[(char *)a_7(D) + 4B] = _1;
t2.c:11:14: note:   stmt 5 MEM[(char *)a_7(D) + 5B] = _2;
t2.c:11:14: note:   stmt 6 MEM[(char *)a_7(D) + 6B] = _3;
t2.c:11:14: note:   stmt 7 MEM[(char *)a_7(D) + 7B] = _4;
t2.c:11:14: note:   children 0x5db7778
t2.c:11:14: note:   node 0x5db7778 (max_nunits=8, refcnt=2) vector(8) char
t2.c:11:14: note:   op template: _1 = *b_6(D);
t2.c:11:14: note:   stmt 0 _1 = *b_6(D);
t2.c:11:14: note:   stmt 1 _2 = MEM[(char *)b_6(D) + 1B];
t2.c:11:14: note:   stmt 2 _3 = MEM[(char *)b_6(D) + 2B];
t2.c:11:14: note:   stmt 3 _4 = MEM[(char *)b_6(D) + 3B];
t2.c:11:14: note:   stmt 4 _1 = *b_6(D);
t2.c:11:14: note:   stmt 5 _2 = MEM[(char *)b_6(D) + 1B];
t2.c:11:14: note:   stmt 6 _3 = MEM[(char *)b_6(D) + 2B];
t2.c:11:14: note:   stmt 7 _4 = MEM[(char *)b_6(D) + 3B];
t2.c:11:14: note:   load permutation { 0 1 2 3 0 1 2 3 }

the issue is as so often

t2.c:11:14: note:   ==> examining statement: _1 = *b_6(D);
t2.c:11:14: missed:   BB vectorization with gaps at the end of a load is not
supported
t2.c:3:19: missed:   not vectorized: relevant stmt not supported: _1 = *b_6(D);
t2.c:11:14: note:   Building vector operands of 0x5db7778 from scalars instead

where we are not applying much non-ad-hoc work to deal with those
"out-of-bound" accesses.  The obvious choice here would be to do
a single vector(4) load instead.
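
For reference, the SLP dump above is consistent with scalar code of the
following shape.  This is a reconstruction from the dump, not the test case
attached to the PR, and the function names are invented.  The second
function is a hand-written version of the "single vector(4) load" idea,
using memcpy for the 4-byte load and assuming a and b do not overlap.

```c
#include <assert.h>
#include <string.h>

/* Eight byte stores fed by four byte loads, each value used twice
   (load permutation { 0 1 2 3 0 1 2 3 }).  */
void
store_repeated_quad (char *a, const char *b)
{
  a[0] = b[0];
  a[1] = b[1];
  a[2] = b[2];
  a[3] = b[3];
  a[4] = b[0];
  a[5] = b[1];
  a[6] = b[2];
  a[7] = b[3];
}

/* Hand-written equivalent: one 4-byte load, stored twice.  Only b[0..3]
   is read, so nothing past the end of the load group is touched.  */
void
store_repeated_quad_by_hand (char *a, const char *b)
{
  char quad[4];
  assert (a != NULL && b != NULL);
  memcpy (quad, b, 4);
  memcpy (a, quad, 4);
  memcpy (a + 4, quad, 4);
}
```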

[Bug tree-optimization/113677] Missing `VEC_PERM_EXPR <{a, CST}, CST, {0, 1, 2, ...}>` optimization

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113677

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-31
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #3 from Richard Biener  ---
Yeah, most of the code in forwprop/match doesn't deal with the "new" permutes
where the result isn't the same length as the inputs.

[Bug tree-optimization/113676] [12 Regression] Miscompilation tree-vrp __builtin_unreachable

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113676

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-*
Summary|[11/12 Regression]  |[12 Regression]
   |Miscompilation tree-vrp |Miscompilation tree-vrp
   |__builtin_unreachable   |__builtin_unreachable

--- Comment #1 from Richard Biener  ---
Needs -std=c++20.  I can't reproduce locally.

[Bug c++/113674] [11/12/13/14 Regression] [[____attr____]] causes internal compiler error: in decl_attributes, at attribs.cc:776

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113674

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-01-31

[Bug tree-optimization/113673] [12/13/14 Regression] ICE: verify_flow_info failed: BB 5 cannot throw but has an EH edge with -Os -finstrument-functions -fnon-call-exceptions -ftrapv

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113673

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

--- Comment #2 from Richard Biener  ---
Looks like an issue in bswap with regard to EH.

[Bug regression/113672] [14 Regression] FAIL: g++.dg/pch/line-map-3.C -g -I. -Dwith_PCH (test for excess errors)

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113672

Richard Biener  changed:

   What|Removed |Added

   Keywords||testsuite-fail
   Target Milestone|--- |14.0

[Bug tree-optimization/113670] ICE with vectors in named registers and -fno-vect-cost-model

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113670

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2024-01-31
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #3 from Richard Biener  ---
I'll hunt it down.

[Bug middle-end/113669] -fsanitize=undefined failed to check a signed integer overflow

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113669

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-31
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #2 from Richard Biener  ---
So confirmed.

[Bug go/113668] [14 Regression] libgo soname bump needed for the GCC 14 release?

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113668

Richard Biener  changed:

   What|Removed |Added

   Keywords||ABI
 CC||rguenth at gcc dot gnu.org
   Target Milestone|--- |14.0

[Bug d/113667] [14 Regression] libgphobos symbols missing

2024-01-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113667

Richard Biener  changed:

   What|Removed |Added

   Keywords||ABI
   Priority|P3  |P1
   Target Milestone|--- |14.0

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2024-01-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395

--- Comment #13 from Richard Biener  ---
(In reply to JuzheZhong from comment #12)
> OK. It seems it has data dependency issue:
> 
> missed:   not vectorized, possible dependence between data-refs a[i_15] and
> a[_4]
> 
> a[i_15] = _3;  STMT 1
> _4 = i_15 + 2;
> _5 = a[_4];STMT 2
> 
> STMT2 should not depend on STMT1.
> 
> It's recognized as dependency in vect_analyze_data_ref_dependence.
> 
> Is is reasonable to fix it in vect_analyze_data_ref_dependence ?

t2.c:4:21: note:   dependence distance  = 1.
t2.c:7:12: missed:   not vectorized, possible dependence between data-refs
a[i_15] and a[_4]
t2.c:4:21: missed:  bad data dependence.

so there's a cross iteration dependence with distance 1 - that's

(compute_affine_dependence
  ref_a: a[i_15], stmt_a: a[i_15] = _3;
  ref_b: a[_4], stmt_b: _5 = a[_4];
(analyze_overlapping_iterations
  (chrec_a = {0, +, 2}_1)
  (chrec_b = {2, +, 2}_1)
(analyze_siv_subscript 
(analyze_subscript_affine_affine
  (overlaps_a = [1 + 1 * x_1])
  (overlaps_b = [0 + 1 * x_1]))
) 
  (overlap_iterations_a = [1 + 1 * x_1])
  (overlap_iterations_b = [0 + 1 * x_1])) 
(build_classic_dist_vector
  dist_vector = (1 
  )
)
)

a read-after-write of a[i+2] after storing to a[i+1] in program order.
This would be fine with a VF of 1 only, but we are not really considering
that (a pure SLP vectorization w/o unrolling).  Instead we start with the
assumption of classical vectorization using interleaving which has a
minimal VF of the number of lanes of the vector type with the largest
number of lanes as determined by vect_analyze_data_refs.
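
To see where the distance of 1 comes from, here is a hypothetical
reduction in C.  The loop shape is inferred from the dumps (step 2, store
to a[i], load from a[i + 2]); it is not the actual t2.c, and the element
type double is an assumption.  The two index helpers restate chrec_a and
chrec_b: the element stored in iteration x + 1 is the one loaded in
iteration x.

```c
#include <assert.h>

/* Assumed loop shape: two statements per iteration, step 2.  */
void
kernel (double *a, int n)
{
  assert (a != 0);
  for (int i = 0; i + 2 < n; i += 2)
    {
      a[i] = a[i + 1] * a[i];
      a[i + 1] = a[i + 2] * a[i + 1];
    }
}

/* Index stored to by "a[i] = ..." in iteration x: chrec_a = {0, +, 2}_1.  */
int
store_index (int x)
{
  return 2 * x;
}

/* Index loaded by "... = a[i + 2]" in iteration x: chrec_b = {2, +, 2}_1.  */
int
load_index (int x)
{
  return 2 * x + 2;
}
```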

We can delay this all a bit but then the SLP build will fail anyway:

t2.c:4:21: missed:   Build SLP failed: different interleaving chains in one
node _5 = a[_4];

which is because we do

t2.c:4:21: note:   === vect_analyze_data_ref_accesses ===
t2.c:4:21: note:   Detected interleaving load a[i_15] and a[_1]
t2.c:4:21: note:   Detected interleaving store a[i_15] and a[_1]
t2.c:4:21: note:   Detected interleaving load of size 2
t2.c:4:21: note:_2 = a[i_15];
t2.c:4:21: note:tem_10 = a[_1];
t2.c:4:21: note:   Detected single element interleaving a[_4] step 16

that is, we are splitting the chain because of the intermediate store
(that's kind-of OK-ish, heuristically it works for more cases).

We'd usually handle the VF == 1 cases also during BB vectorization on
the loop body, but we're only doing that when there was if-conversion
and the later stand-alone BB vectorization is after predictive commoning
which wrecks the loop.  We should move predcom after BB vect for that.

That said, this PR is quite elaborate and it will touch some key design
issues in the vectorizer.  I'd rather finally finish getting us to
work on the SLP representation only before touching all these delicate
things.  The following allows the analysis to proceed a bit longer
with VF == 1.  Not adjusting min_vf early might have issues, but the
change might work as-is and possibly allow some cases to be loop vectorized
with SLP and a low VF that we currently fail to vectorize.

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index f592aeb8028..b16b4664e7b 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -589,7 +589,7 @@ vect_analyze_data_ref_dependence (struct
data_dependence_relation *ddr,
}

   unsigned int abs_dist = abs (dist);
-  if (abs_dist >= 2 && abs_dist < *max_vf)
+  if (abs_dist >= 1 && abs_dist < *max_vf)
{
  /* The dependence distance requires reduction of the maximal
 vectorization factor.  */
@@ -4946,7 +4955,7 @@ vect_analyze_data_refs (vec_info *vinfo, poly_uint64
*min_vf, bool *fatal)
   /* Adjust the minimal vectorization factor according to the
 vector type.  */
   vf = TYPE_VECTOR_SUBPARTS (vectype);
-  *min_vf = upper_bound (*min_vf, vf);
+  //*min_vf = upper_bound (*min_vf, vf);

   /* Leave the BB vectorizer to pick the vector type later, based on
 the final dataref group size and SLP node size.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 30b90d99925..7eab3d4bebc 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2719,7 +2719,7 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool
,
   opt_result ok = opt_result::success ();
   int res;
   unsigned int max_vf = MAX_VECTORIZATION_FACTOR;
-  poly_uint64 min_vf = 2;
+  poly_uint64 min_vf = 1;
   loop_vec_info orig_loop_vinfo = NULL;

   /* If we are dealing with an epilogue then orig_loop_vinfo points to the

[Bug ipa/111444] [14 Regression] Wrong code at -O2/3/s on x86_64-gnu since r14-3226-gd073e2d75d9

2024-01-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111444

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #7 from Richard Biener  ---
I will have a look then.

[Bug ipa/113665] [11/12/13/14 regression] Regular for Loop results in Endless Loop with -O2 since r11-4987-g602c6cfc79ce4a

2024-01-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113665

--- Comment #9 from Richard Biener  ---
(In reply to Jan Hubicka from comment #8)
> > Honza - ICF seems to fixup points-to sets when merging variables, so there
> > should be a way to kill off flow-sensitive info inside prevailing bodies
> > as well.  But would that happen before inlining the body?  Can you work
> > on that?  I think comparing ranges would weaken ICF unnecessarily?
> 
> AFAIK ICF does no changes to winning function body. It basically relies
> on the fact that early optimizations are local and thus arrive to same
> solutions for most of metadata. So only really easy fix is to make it
> match value ranges, too.  I will check how much that fire in practice -
> I can only think of split funtions to diverge, which is probably not
> that bad in practice.

But is it possible to add a local transform stage and would that also affect
which body we inline?  But yes, inlining the original body would be so
much better ...

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2024-01-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395

--- Comment #9 from Richard Biener  ---
(In reply to JuzheZhong from comment #8)
> Hi, Richard.
> 
> Now, I find the time to GCC vectorization optimization.
> 
> I find this case:
> 
>   _2 = a[_1];
>   ...
>   a[i_16] = _4;
>   ,,,
>   _7 = a[_1];---> This load should be eliminated and re-use _2.
> 
> Am I right ?
> 
> Could you guide me which pass should do this CSE optimization ?
> 
> Thanks.

In principle it's value-numbering.  The reason it doesn't do this is
the compile-time cost of doing full data-ref analysis.  In principle it's
as "easy" as hooking that up into vn_reference_lookup_3 as part of the
early work therein to disambiguate more defs.

If we choose to refrain from valueizing any of the SSA uses we could
cache both the data references and the dependence resolution.

One could also think of doing very simple recognition of these
single index expressions and / or integrating this with other cases.
IIRC there are some warranting SCEV processing / niter analysis as well,
for example to figure out that

 for (int i = 0; i < 128; ++i)
   a[i] = 1;
 return a[5];

returns 1.
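
A small C sketch of the load/store/load case from the quoted question,
showing why reusing the first load needs a dependence check.  The function
names and the scalar parameters i, j and v are invented for the example.
Reusing the first load is only correct when the store index can be shown
to differ from the load index.

```c
#include <assert.h>

/* What the source says: load, store, load again.  */
int
sum_with_reload (int *a, int i, int j, int v)
{
  assert (a != 0);
  int first = a[j];
  a[i] = v;
  int second = a[j];	/* Equals first only if i != j.  */
  return first + second;
}

/* What CSE of the second load would compute.  */
int
sum_reusing_first_load (int *a, int i, int j, int v)
{
  int first = a[j];
  a[i] = v;
  return first + first;
}
```

The two agree whenever i != j and can differ when i == j, which is the
fact the data-ref analysis would have to establish.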

[Bug tree-optimization/113659] [14 Regression] ICE Segmentation fault since r14-8355-g02e683894942da

2024-01-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113659

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #5 from Richard Biener  ---
Fixed.

[Bug target/113059] [14 regression] fftw fails tests for -O3 -m32 -march=znver2 since r14-6210-ge44ed92dbbe9d4

2024-01-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113059

--- Comment #21 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #19)
> Created attachment 57258 [details]
> gcc14-pr113059.patch
> 
> So in patch form like this.  Untested so far.

LGTM.

[Bug target/113059] [14 regression] fftw fails tests for -O3 -m32 -march=znver2 since r14-6210-ge44ed92dbbe9d4

2024-01-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113059

--- Comment #17 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #16)
> The question is revert what exactly?
> If we revert r14-6210, we get back the other P1.  Or do you mean revert
> r14-5355?
> I guess another option is move the vzeroupper pass one pass later, i.e.
> after pass_gcse.

I think moving mdreorg passes as late as possible esp. when they don't play
well with DF/notes is a good thing.  Maybe even after pass_rtl_dse2 and
thus after shrink-wrapping?

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-01-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #29 from Richard Biener  ---
(In reply to Hongtao Liu from comment #28)
> I saw we already maskoff integral modes for vector mask in store_constructor
> 
>   /* Use sign-extension for uniform boolean vectors with
>  integer modes and single-bit mask entries.
>  Effectively "vec_duplicate" for bitmasks.  */
>   if (elt_size == 1
>   && !TREE_SIDE_EFFECTS (exp)
>   && VECTOR_BOOLEAN_TYPE_P (type)
>   && SCALAR_INT_MODE_P (TYPE_MODE (type))
>   && (elt = uniform_vector_p (exp))
>   && !VECTOR_TYPE_P (TREE_TYPE (elt)))
> {
>   rtx op0 = force_reg (TYPE_MODE (TREE_TYPE (elt)),
>expand_normal (elt));
>   rtx tmp = gen_reg_rtx (mode);
>   convert_move (tmp, op0, 0);
> 
>   /* Ensure no excess bits are set.
>  GCN needs this for nunits < 64.
>  x86 needs this for nunits < 8.  */
>   auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
>   if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
> tmp = expand_binop (mode, and_optab, tmp,
> GEN_INT ((1 << nunits) - 1), target,
> true, OPTAB_WIDEN);
>   if (tmp != target)
> emit_move_insn (target, tmp);
>   break;
> }

But that's just for CONSTRUCTORs, we got the VIEW_CONVERT_EXPR path for
VECTOR_CSTs.  But yeah, that _might_ argue we should perform the same
masking for VECTOR_CST expansion as well, instead of trying to fixup
in do_compare_and_jump?
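
The masking in the quoted store_constructor code can be modeled with plain
integers.  This is a sketch with made-up function names, assuming an
integer-mode mask wider than the number of vector elements, for example 4
elements in an 8-bit mode.  Sign-extending "all true" sets every bit of the
mode, so a compare against the canonical mask only works once the excess
bits are cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Clear the bits above nunits, as (1 << nunits) - 1 does in the quoted
   code.  nunits must be smaller than the width of unsigned int.  */
unsigned int
mask_excess_bits (unsigned int bits, unsigned int nunits)
{
  assert (nunits < sizeof (unsigned int) * 8);
  return bits & ((1u << nunits) - 1u);
}

/* Compare two masks while ignoring the excess bits of both.  */
bool
masks_equal (unsigned int x, unsigned int y, unsigned int nunits)
{
  return mask_excess_bits (x, nunits) == mask_excess_bits (y, nunits);
}
```

For nunits == 4, 0xff and 0x0f represent the same mask, but a raw integer
compare treats them as different.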

[Bug tree-optimization/113659] [14 Regression] ICE Segmentation fault since r14-8355-g02e683894942da

2024-01-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113659

--- Comment #3 from Richard Biener  ---
So the issue is similar to gcc.c-torture/execute/20150611-1.c, this time
the main exit ends in a path without a virtual use (__builtin_unreachable ()).
We can do the same as we do for the alternate exits here.

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-01-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #27 from Richard Biener  ---
(In reply to Hongtao Liu from comment #25)
> (In reply to Tamar Christina from comment #24)
> > Just to avoid confusion, are you still working on this one Richi?
> 
> I'm working on a patch to add a target hook as #c18 mentioned.

Not sure a target hook was suggested - I think it was suggested that
do_compare_and_jump always masks excess bits for integer mode vector masks?

[Bug ipa/113665] [11/12/13/14 regression] Regular for Loop results in Endless Loop with -O2

2024-01-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113665

Richard Biener  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org
   Priority|P3  |P2

--- Comment #6 from Richard Biener  ---
Honza - ICF seems to fixup points-to sets when merging variables, so there
should be a way to kill off flow-sensitive info inside prevailing bodies
as well.  But would that happen before inlining the body?  Can you work
on that?  I think comparing ranges would weaken ICF unnecessarily?

[Bug ipa/113665] [11/12/13/14 regression] Regular for Loop results in Endless Loop with -O2

2024-01-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113665

--- Comment #5 from Richard Biener  ---
Well, ICF figures out the other parts of the partially inlined test() are equal
and I think they are.  The

if (i >= S){
return false;
}

tests are inlined and eliminated (I think correctly so).  -fno-partial-inlining
also avoids the issue.

The issue is that ICF doesn't wipe (or compare) range info so we get after
inlining:

   [local count: 10737416]:
  goto ; [100.00%]

   [local count: 1063004409]:
  # RANGE [irange] long unsigned int [0, 591] NONZERO 0x3ff
  _5 = (long unsigned int) i_2;
  # RANGE [irange] unsigned int [0, 287] NONZERO 0x1ff
  _11 = (unsigned int) _5;
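
As a rough illustration of what "wiping or comparing" range info would have to guarantee, here is a minimal sketch in C. It assumes a toy range type and a conservative merge (union of the two ranges, or "unknown" if either side is unknown); the type and function names are hypothetical and do not correspond to GCC's irange API. The bounds 287 and 591 are borrowed from the dump above purely as sample values.

```c
#include <assert.h>

/* Toy model of flow-sensitive range info; KNOWN == 0 means "any value".  */
struct value_range { unsigned long lo, hi; int known; };

struct value_range
make_range (unsigned long lo, unsigned long hi)
{
  struct value_range r = { lo, hi, 1 };
  assert (lo <= hi);
  return r;
}

struct value_range
unknown_range (void)
{
  struct value_range r = { 0, 0, 0 };
  return r;
}

/* Conservative merge: the result must be valid for both merged bodies.  */
struct value_range
merge_ranges (struct value_range a, struct value_range b)
{
  if (!a.known || !b.known)
    return unknown_range ();
  return make_range (a.lo < b.lo ? a.lo : b.lo,
                     a.hi > b.hi ? a.hi : b.hi);
}

int
range_contains_p (struct value_range r, unsigned long v)
{
  return !r.known || (r.lo <= v && v <= r.hi);
}
```

Keeping only one body's narrower range would claim that a value such as 400 cannot occur, while the merged range still admits it.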

[Bug tree-optimization/113664] False positive warnings with -fno-strict-overflow (-Warray-bounds, -Wstringop-overflow)

2024-01-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113664

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-30
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #4 from Richard Biener  ---
Confirmed.  As usual it's jump-threading related where we isolate, in the
-Warray-bounds case

MEM[(char *)1B] = 48;

we inline 'f' and then, when s == dot == NULL your code dereferences both
NULL and NULL + 1.

So the diagnostic messages leave a lot to be desired but in the end they
point to a problem in your code which is a guard against a NULL 's'.

The jump threading is different with -fwrapv-pointer, in particular without
it we just get the NULL dereference which we seem to ignore during
array-bound diagnostics.

We later isolate the paths as unreachable but that happens after the
diagnostic.

[Bug tree-optimization/113659] [14 Regression] ICE Segmentation fault since r14-8355-g02e683894942da

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113659

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2024-01-30
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

[Bug tree-optimization/113659] [14 Regression] ICE Segmentation fault since r14-8355-g02e683894942da

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113659

--- Comment #1 from Richard Biener  ---
I will have a look.

[Bug debug/113562] [14 Regression] FAIL: gcc.dg/guality/pr54796.c

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113562

--- Comment #4 from Richard Biener  ---
(In reply to Richard Biener from comment #3)
> Just to put it somewhere I ran dwlocstat on cc1plus before/after the
> offending change and it looks almost the same.  We go from
> 
> cov%     samples       cumul
> 0..10    1280217/38%   1280217/38%
> 11..20   55668/1%      1335885/40%
> 21..30   68004/2%      1403889/42%
> 31..40   70774/2%      1474663/44%
> 41..50   75554/2%      1550217/46%
> 51..60   91816/2%      1642033/49%
> 61..70   101139/3%     1743172/52%
> 71..80   135281/4%     1878453/56%
> 81..90   198470/5%     2076923/62%
> 91..100  1233822/37%   3310745/100%
> 
> to
> 
> cov%     samples       cumul
> 0..10    1280197/38%   1280197/38%
> 11..20   55669/1%      1335866/40%
> 21..30   68014/2%      1403880/42%
> 31..40   70773/2%      1474653/44%
> 41..50   75542/2%      1550195/46%
> 51..60   91800/2%      1641995/49%
> 61..70   101133/3%     1743128/52%
> 71..80   135259/4%     1878387/56%
> 81..90   198496/5%     2076883/62%
> 91..100  1233844/37%   3310727/100%

And with up-to-date elfutils to avoid some DWARF5 issues

cov%     samples       cumul
0..10    1280347/38%   1280347/38%
11..20   55720/1%      1336067/40%
21..30   68040/2%      1404107/42%
31..40   70805/2%      1474912/44%
41..50   75585/2%      1550497/46%
51..60   91850/2%      1642347/49%
61..70   101224/3%     1743571/52%
71..80   135406/4%     1878977/56%
81..90   198509/5%     2077486/62%
91..100  1233880/37%   3311366/100%

to

cov%     samples       cumul
0..10    1280327/38%   1280327/38%
11..20   55721/1%      1336048/40%
21..30   68050/2%      1404098/42%
31..40   70804/2%      1474902/44%
41..50   75573/2%      1550475/46%
51..60   91834/2%      1642309/49%
61..70   101218/3%     1743527/52%
71..80   135384/4%     1878911/56%
81..90   198535/5%     2077446/62%
91..100  1233902/37%   3311348/100%

[Bug debug/113562] [14 Regression] FAIL: gcc.dg/guality/pr54796.c

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113562

--- Comment #3 from Richard Biener  ---
Just to put it somewhere I ran dwlocstat on cc1plus before/after the offending
change and it looks almost the same.  We go from

cov%     samples       cumul
0..10    1280217/38%   1280217/38%
11..20   55668/1%      1335885/40%
21..30   68004/2%      1403889/42%
31..40   70774/2%      1474663/44%
41..50   75554/2%      1550217/46%
51..60   91816/2%      1642033/49%
61..70   101139/3%     1743172/52%
71..80   135281/4%     1878453/56%
81..90   198470/5%     2076923/62%
91..100  1233822/37%   3310745/100%

to

cov%     samples       cumul
0..10    1280197/38%   1280197/38%
11..20   55669/1%      1335866/40%
21..30   68014/2%      1403880/42%
31..40   70773/2%      1474653/44%
41..50   75542/2%      1550195/46%
51..60   91800/2%      1641995/49%
61..70   101133/3%     1743128/52%
71..80   135259/4%     1878387/56%
81..90   198496/5%     2076883/62%
91..100  1233844/37%   3310727/100%

[Bug rtl-optimization/113597] [14 Regression] aarch64: Significant code quality regression since r14-8346-ga98d5130a6dcff

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113597

Richard Biener  changed:

   What|Removed |Added

  Attachment #57214|0   |1
is obsolete||

--- Comment #13 from Richard Biener  ---
Created attachment 57252
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57252&action=edit
prototype fix

Note when I extended the patch to also cover a PARM_DECL base to extend
coverage I see

FAIL: gcc.dg/torture/pr70421.c   -O1  execution test
FAIL: gcc.dg/torture/pr70421.c   -O2  execution test
FAIL: gcc.dg/torture/pr70421.c   -O3 -g  execution test
FAIL: gcc.dg/torture/pr70421.c   -Os  execution test
FAIL: gcc.dg/torture/pr70421.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/pr70421.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test

on x86_64.  It seems that arg_base_value isn't the correct thing to use
but it eventually should have been unique_base_value (UNIQUE_BASE_VALUE_ARGP)?
I'm not sure whether all the different unique base values mean we'll not
be able to derive exactly those classes from MEM_EXPRs.

[Bug tree-optimization/113622] [11/12/13 Regression] ICE with vectors in named registers

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113622

Richard Biener  changed:

   What|Removed |Added

  Known to work||14.0
Summary|[11/12/13/14 Regression]|[11/12/13 Regression] ICE
   |ICE with vectors in named   |with vectors in named
   |registers   |registers

--- Comment #19 from Richard Biener  ---
Should be fixed on trunk, not sure to what extent backporting is suitable.

[Bug target/113652] [14 regression] Failed bootstrap on ppc unrecognized opcode: `lfiwzx' with -mcpu=7450

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113652

Richard Biener  changed:

   What|Removed |Added

 Target||powerpc
   Target Milestone|--- |14.0

--- Comment #1 from Richard Biener  ---
What's the version of binutils you are using?

[Bug middle-end/113651] The GCC optimizer performs poorly on a very simple code snippet.

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113651

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-01-29
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Confirmed.   This is a missed phiopt (or operation sinking) of

  if (r.1_90 < 0)
goto ; [41.00%]
  else
goto ; [59.00%]

   [local count: 391324129]:
  _91 = _89 ^ 79764919;

   [local count: 954449104]:
  # prephitmp_92 = PHI <_91(6), _89(5)>

to something like

  if (r.1_90 < 0)
goto ; [41.00%]
  else
goto ; [59.00%]

   [local count: 391324129]:

   [local count: 954449104]:
  # prephitmp_91 = PHI <79764919(6), 0(5)>
  _92 = _89 ^ prephitmp_91;

on some archs the conditional constant might be generated by a
conditional add of 79764919 to zero.

Whether this is better suited for GIMPLE or RTL if-conversion remains to be
seen.

That splitting the expression helps is just luck.
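
At the source level, the two shapes discussed above correspond roughly to the following. This is a minimal sketch, not the reporter's test case; the function names are hypothetical and the only value taken from the comment is the constant 79764919.

```c
#include <assert.h>
#include <stdint.h>

#define XOR_CONSTANT UINT32_C (79764919)

/* Shape before the transform: the XOR sits inside the conditional.  */
uint32_t
xor_in_branch (uint32_t x, int32_t r)
{
  if (r < 0)
    x = x ^ XOR_CONSTANT;
  return x;
}

/* Shape after sinking: the branch only selects a constant (the PHI),
   and a single unconditional XOR follows the join.  */
uint32_t
xor_after_join (uint32_t x, int32_t r)
{
  uint32_t k = r < 0 ? XOR_CONSTANT : 0;
  return x ^ k;
}

/* Small self-check that both shapes compute the same value.  */
void
check_xor_forms_agree (void)
{
  static const int32_t rs[] = { -5, -1, 0, 1, 7 };
  for (unsigned i = 0; i < sizeof rs / sizeof rs[0]; i++)
    assert (xor_in_branch (0x1234u, rs[i]) == xor_after_join (0x1234u, rs[i]));
}
```

The second form leaves only a conditional constant to materialize, which is what a conditional move or similar instruction could provide on targets that have one.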

[Bug c/113650] __builtin_nonlocal_goto ICEs when passed 0 as arguments

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113650

--- Comment #1 from Richard Biener  ---
I don't think these are supposed to be used by the user ...

[Bug tree-optimization/113622] [11/12/13/14 Regression] ICE with vectors in named registers

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113622

--- Comment #16 from Richard Biener  ---
typedef double __attribute__ ((vector_size (16))) vec;

void
test (void)
{
  register vec a asm("xmm1"), b asm("xmm2"), c asm("xmm3");
  for (int i = 0; i < 2; i++)
c[i] = a[i] < b[i] ? 0.1 : 0.2;
}

also ICEs with -O0 -msse.

[Bug tree-optimization/113622] [11/12/13/14 Regression] ICE with vectors in named registers

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113622

--- Comment #15 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #11)
> I think it is most important we don't ICE and generate correct code.  I
> doubt this is used too much in real-world code, otherwise it would have been
> reported years ago, so how efficient it will be is less important.

We do spill on the read side already.  On the write side the ICE is because
of r0-71337-g1e188d1e130034.  Note we're spilling parts of bitpos to offset:

  /* Otherwise, split it up.  */
  if (offset)
{
  /* Avoid returning a negative bitpos as this may wreak havoc later.  */
  if (!bit_offset.to_shwi (pbitpos) || maybe_lt (*pbitpos, 0))
{
  *pbitpos = num_trailing_bits (bit_offset.force_shwi ());
  poly_offset_int bytes = bits_to_bytes_round_down (bit_offset);
  offset = size_binop (PLUS_EXPR, offset,
   build_int_cst (sizetype, bytes.force_shwi ()));
}

  *poffset = offset;

but it can also be large positive when the bit amount doesn't fit a HWI.
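
The splitting quoted above can be modelled for the constant case as follows. This is a minimal sketch, assuming 8-bit units and plain 64-bit integers; GCC itself works on poly_offset_int, and the names split_bit_offset, split_bytes and split_bits are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Split BIT_OFFSET into a byte count rounded toward negative infinity
   and a trailing bit count in the range [0, 7].  */
void
split_bit_offset (int64_t bit_offset, int64_t *bytes, int *trailing_bits)
{
  int64_t q = bit_offset / 8;   /* C division truncates toward zero.  */
  int64_t r = bit_offset % 8;
  if (r < 0)
    {
      /* Adjust so the remainder is never negative.  */
      r += 8;
      q -= 1;
    }
  assert (q * 8 + r == bit_offset);
  *bytes = q;
  *trailing_bits = (int) r;
}

/* Convenience wrappers returning one component each.  */
int64_t
split_bytes (int64_t bit_offset)
{
  int64_t bytes;
  int bits;
  split_bit_offset (bit_offset, &bytes, &bits);
  return bytes;
}

int
split_bits (int64_t bit_offset)
{
  int64_t bytes;
  int bits;
  split_bit_offset (bit_offset, &bytes, &bits);
  return bits;
}
```

A bit offset of -3 thus becomes byte offset -1 plus 5 trailing bits, so the remaining bit position is never negative.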

The flow of 'to' expansion is a bit awkward, but the following properly
spills in case of variable offset and non-MEM_P:

diff --git a/gcc/expr.cc b/gcc/expr.cc
index ee822c11dce..f54d0b1474e 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6061,6 +6061,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
to_rtx = adjust_address (to_rtx, BLKmode, 0);
}

+  rtx stemp = NULL_RTX, old_to_rtx = NULL_RTX;
   if (offset != 0)
{
  machine_mode address_mode;
@@ -6070,9 +6071,24 @@ expand_assignment (tree to, tree from, bool nontemporal)
{
  /* We can get constant negative offsets into arrays with broken
 user code.  Translate this to a trap instead of ICEing.  */
- gcc_assert (TREE_CODE (offset) == INTEGER_CST);
- expand_builtin_trap ();
- to_rtx = gen_rtx_MEM (BLKmode, const0_rtx);
+ if (TREE_CODE (offset) == INTEGER_CST)
+   {
+ expand_builtin_trap ();
+ to_rtx = gen_rtx_MEM (BLKmode, const0_rtx);
+   }
+ /* Else spill for variable offset to the destination.  */
+ else
+   {
+ gcc_assert (!TREE_CODE (from) == CALL_EXPR
+ && COMPLETE_TYPE_P (TREE_TYPE (from))
+ && (TREE_CODE (TYPE_SIZE (TREE_TYPE (from)))
+ != INTEGER_CST));
+ stemp = assign_stack_temp (GET_MODE (to_rtx),
+GET_MODE_SIZE (GET_MODE (to_rtx)));
+ emit_move_insn (stemp, to_rtx);
+ old_to_rtx = to_rtx;
+ to_rtx = stemp;
+   }
}

  offset_rtx = expand_expr (offset, NULL_RTX, VOIDmode, EXPAND_SUM);
@@ -6305,6 +6321,9 @@ expand_assignment (tree to, tree from, bool nontemporal)
  bitregion_start, bitregion_end,
  mode1, from, get_alias_set (to),
  nontemporal, reversep);
+ /* Move the temporary storage back to the non-MEM_P.  */
+ if (stemp)
+   emit_move_insn (old_to_rtx, stemp);
}

   if (result)

[Bug tree-optimization/113622] [11/12/13/14 Regression] ICE with vectors in named registers

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113622

--- Comment #10 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #8)
> Guess for an rvalue (if even that crashes) we want to expand it to some
> permutation or whole vector shift which moves the indexed elements first and
> then extract it, for lvalue we need to insert it similarly.

If we can we should match this up with .VEC_SET / .VEC_EXTRACT, otherwise
we should go "simple" and spill.

diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 7e2392ecd38..e94f292dd38 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -104,7 +104,8 @@ gimple_expand_vec_set_extract_expr (struct function *fun,
   machine_mode outermode = TYPE_MODE (TREE_TYPE (view_op0));
   machine_mode extract_mode = TYPE_MODE (TREE_TYPE (ref));

-  if (auto_var_in_fn_p (view_op0, fun->decl)
+  if ((auto_var_in_fn_p (view_op0, fun->decl)
+  || DECL_HARD_REGISTER (view_op0))
  && !TREE_ADDRESSABLE (view_op0)
  && ((!is_extract && can_vec_set_var_idx_p (outermode))
  || (is_extract

ensures the former and fixes the ICE on x86_64 on trunk.  The comment#5
testcase then results in the following loop:

.L3:
movslq  %eax, %rdx
vmovaps %zmm2, -56(%rsp)
vmovaps %zmm0, -120(%rsp)
vmovss  -120(%rsp,%rdx,4), %xmm4
vmovss  -56(%rsp,%rdx,4), %xmm3
vcmpltss        %xmm4, %xmm3, %xmm3
vpbroadcastd    %eax, %zmm4
addl    $1, %eax
vpcmpd  $0, %zmm7, %zmm4, %k1
vblendvps       %xmm3, %xmm5, %xmm6, %xmm3
vbroadcastss    %xmm3, %zmm1{%k1}
cmpl    $8, %eax
jne     .L3

this isn't optimal of course, for optimality we need vectorization.  But
we still need to avoid the ICEs since vectorization can be disabled.  That
said, I'm quite sure in code using hard registers people are not doing
such stupid things so I wonder how important it is to avoid "regressing"
the vectorization here.

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization

--- Comment #1 from Richard Biener  ---
Did you try with -fprofile-partial-training (is that default on?  it probably
should ...).  Can you please try training with the rate data instead of train
to rule out a mismatch?

[Bug c/113631] FAIL: gcc.dg/pr7356.c, fix still fails with #pragma

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113631

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-29
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
Version|unknown |14.0

--- Comment #1 from Richard Biener  ---
:1:2: error: expected ';' before 'typedef'
1 | a

a.h:1:9: error: expected '=', ',', ';', 'asm' or '__attribute__' before
'#pragma'
1 | #pragma message "foo"
  | ^~~

as it's a different message it's likely using a different location to
highlight the issue.  In general it's difficult to tell whether pointing
to the first token sequence in the #included file or the last token
before the #include directive is better here.

Of course the pragma location should underline either #pragma or the whole
#pragma, not just 'message'.

Btw, same issue without the #include:

a
#pragma message "foo"

vs.

a
typedef int b;

I'm not sure it makes sense to special case the situation we've switched
files?

[Bug c++/113644] [14 regression] ICE when building libcxxabi-16.0.6 since r14-6520

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113644

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug tree-optimization/113630] [11/12/13/14 Regression] -fno-strict-aliasing introduces out-of-bounds memory access

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113630

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #5 from Richard Biener  ---
(In reply to Andrew Pinski from comment #2)
> Confirmed.
> 
> I really think what PRE does is correct here since we have an aliasing set
> of 0 for both. Now what is incorrect is hoist_adjacent_loads which cannot do
> either of any of the aliasing sets are 0 ...
> 
> 
> 
> I think even the function below is valid for non-strict aliasing:
> ```
> int __attribute__((noipa,noinline))
> f(struct S *p, int c, int d)
> {
>   int r;
>   if (c)
> {
> r = ((struct M*)p)->a;
> }
>   else
> r = ((struct M*)p)->b;
>   return r;
> }
> ```
> 
> That is hoist_adjacent_loads is broken for non-strict-aliasing in general
> and has been since 4.8.0 when it was added (r0-117275-g372a6eb8d991eb).

It looks like it relies on

  /* The zeroth operand of the two component references must be
 identical.  It is not sufficient to compare get_base_address of
 the two references, because this could allow for different
 elements of the same array in the two trees.  It is not safe to
 assume that the existence of one array element implies the
 existence of a different one.  */
  if (!operand_equal_p (TREE_OPERAND (ref1, 0), TREE_OPERAND (ref2, 0), 0))
continue;

for the correctness test.  Note the MEM accesses are of size sizeof (struct M).

With -fno-strict-aliasing we're not wiping that detail so I think it _is_
a bug in PRE that it merges the two accesses.

I'll have a more detailed look.

[Bug tree-optimization/113630] [11/12/13/14 Regression] -fno-strict-aliasing introduces out-of-bounds memory access

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113630

--- Comment #4 from Richard Biener  ---
(In reply to Andrew Pinski from comment #3)
> Note LLVM produces decent code here by only using one load:
> ```
> xor eax, eax
> test esi, esi
> sete al
> mov eax, dword ptr [rdi + 4*rax]
> ```
> 
> Maybe GCC could do the same ...

IIRC there are duplicate bugs about this - phiprop does kind-of the reverse.
The sink pass can now sink two identical stores but doesn't try sinking
a "compatible" store by introducing a PHI for the address.

  /* ??? We could handle differing SSA uses in the LHS by inserting
 PHIs for them.  */
  else if (! operand_equal_p (gimple_assign_lhs (first_store),
  gimple_assign_lhs (def), 0)
   || (gimple_clobber_p (first_store)
   != gimple_clobber_p (def)))

[Bug target/113625] Interesting behavior with and without -mcpu=generic

2024-01-28 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113625

--- Comment #1 from Richard Biener  ---
Other targets (x86_64) default to -mtune=generic.  Maybe configure time
selection somehow interferes with this on aarch64?

[Bug tree-optimization/113622] [11/12/13/14 Regression] ICE with vectors in named registers

2024-01-28 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113622

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #9 from Richard Biener  ---
I will have a look.

[Bug target/113618] [14 Regression] AArch64: memmove idiom regression

2024-01-28 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113618

--- Comment #2 from Richard Biener  ---
It might be good to recognize this pattern in strlenopt or a related pass.

A purely local transform would turn it into

memcpy (temp, a, 64);
memmove (b, a, 64);

relying on DSE to eliminate the copy to temp if possible.  Not sure if
that possibly would be a bad transform if copying to temp is required.

stp q30, q31, [sp]
ldp q30, q31, [sp]

why is CSE not able to catch this?
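
For reference, the equivalence such a transform would rely on can be checked with a small C program. This is a minimal sketch, assuming the idiom in question is a 64-byte copy through a local temporary; the function names are hypothetical and the original test case is not reproduced here.

```c
#include <assert.h>
#include <string.h>

enum { BLOCK = 64 };

/* Assumed idiom: copy through a temporary, so overlap between source
   and destination is harmless.  */
void
copy_via_temp (unsigned char *dst, const unsigned char *src)
{
  unsigned char temp[BLOCK];
  memcpy (temp, src, BLOCK);
  memcpy (dst, temp, BLOCK);
}

/* Returns nonzero if the idiom and memmove give the same result when
   the destination starts SHIFT bytes after the source.  */
int
idiom_matches_memmove_p (int shift)
{
  unsigned char x[2 * BLOCK], y[2 * BLOCK];
  assert (shift >= 0 && shift <= BLOCK);
  for (int i = 0; i < 2 * BLOCK; i++)
    x[i] = y[i] = (unsigned char) i;
  copy_via_temp (x + shift, x);
  memmove (y + shift, y, BLOCK);
  return memcmp (x, y, sizeof x) == 0;
}
```

Both versions must behave as if the source were read completely before anything is written, which is exactly what memmove guarantees.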

[Bug debug/103047] Inconsistent arguments ordering for inlined subroutine

2024-01-28 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103047

Richard Biener  changed:

   What|Removed |Added

  Known to work||14.0
   Target Milestone|--- |14.0
 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #3 from Richard Biener  ---
Fixed for GCC 14.

[Bug debug/29461] inconsistent variable output

2024-01-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29461

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Richard Biener  ---
 <2><6c>: Abbrev Number: 8 (DW_TAG_formal_parameter)
<6d>   DW_AT_name: s_p
<71>   DW_AT_decl_file   : 1
<72>   DW_AT_decl_line   : 3
<73>   DW_AT_decl_column : 19
<74>   DW_AT_type: <0x8a>
<78>   DW_AT_location: 2 byte block: 91 58  (DW_OP_fbreg: -40)
 <2><7b>: Abbrev Number: 9 (DW_TAG_variable)
<7c>   DW_AT_name: ss
<7f>   DW_AT_decl_file   : 1
<80>   DW_AT_decl_line   : 5
<81>   DW_AT_decl_column : 20
<82>   DW_AT_type: <0x46>
<86>   DW_AT_location: 2 byte block: 91 68  (DW_OP_fbreg: -24)

so we can and do now make those equal.  With -O1 we have

 <2><6c>: Abbrev Number: 9 (DW_TAG_formal_parameter)
<6d>   DW_AT_name: s_p
<71>   DW_AT_decl_file   : 1
<72>   DW_AT_decl_line   : 3
<73>   DW_AT_decl_column : 19
<74>   DW_AT_type: <0xc0>
<78>   DW_AT_location: 0x12 (location list)
<7c>   DW_AT_GNU_locviews: 0xc
 <2><80>: Abbrev Number: 10 (DW_TAG_variable)
<81>   DW_AT_name: ss
<84>   DW_AT_decl_file   : 1
<85>   DW_AT_decl_line   : 5
<86>   DW_AT_decl_column : 20
<87>   DW_AT_type: <0x46>
<8b>   DW_AT_location: 0x2b (location list)
<8f>   DW_AT_GNU_locviews: 0x25

0012 v000 v000 views at 000c for:
  0008 (DW_OP_reg5 (rdi))
0017 v000 v000 views at 000e for:
 0008 0012 (DW_OP_reg3 (rbx))
001c v000 v000 views at 0010 for:
 0012 0013 (DW_OP_entry_value: (DW_OP_reg5 (rdi)); DW_OP_stack_value)
0024 

002b v001 v000 views at 0025 for:
 0004 0008 (DW_OP_reg5 (rdi))
0030 v000 v000 views at 0027 for:
 0008 0012 (DW_OP_reg3 (rbx))
0035 v000 v000 views at 0029 for:
 0012 0013 (DW_OP_entry_value: (DW_OP_reg5 (rdi)); DW_OP_stack_value)
003d 

which is nearly equivalent and I suppose "correct" in that we're not
showing it live during the prologue before the declaration/assignment

func2:
.LVL0:
.LFB0:
.file 1 "t.c"
.loc 1 4 1 view -0
.cfi_startproc
.loc 1 4 1 is_stmt 0 view .LVU1
pushq   %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq    %rdi, %rbx
.loc 1 5 3 is_stmt 1 view .LVU2
.LVL1:
.loc 1 6 3 view .LVU3
call    func

-O0 vs -O is also -fno-var-tracking vs. -fvar-tracking of course.

[Bug debug/27672] C frontend does not generate line information for multi-line conditions

2024-01-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=27672

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #4 from Richard Biener  ---
Yes, this particular form seems fixed.

[Bug ada/26827] "GNAT BUG DETECTED" on compile GPS 1.3.1/gtkada

2024-01-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26827

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-26
 CC||dkm at gcc dot gnu.org
  Component|debug   |ada
 Status|UNCONFIRMED |WAITING
 Ever confirmed|0   |1

--- Comment #6 from Richard Biener  ---
Is this still an issue?

[Bug debug/103047] Inconsistent arguments ordering for inlined subroutine

2024-01-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103047

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2024-01-26
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
  Known to fail||13.2.1
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from Richard Biener  ---
Confirmed, still happens.  But maybe this is also a gdb issue as we have for
the similar case

static inline int foo (int a, int b)
{
  volatile int x = a + b;
  return x;
}

int main()
{
  int c = 1;
  int d = 2;
  int res = foo (c, d);
  return res;
}


 <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
<2e>   DW_AT_external: 1
<2e>   DW_AT_name: (indirect string, offset: 0x6f): main
...
 <2><79>: Abbrev Number: 5 (DW_TAG_inlined_subroutine)
<7a>   DW_AT_abstract_origin: <0xca>
<7e>   DW_AT_entry_pc: 0
<86>   DW_AT_GNU_entry_view: 4
<87>   DW_AT_low_pc  : 0
<8f>   DW_AT_high_pc : 0xc
<97>   DW_AT_call_file   : 1
<98>   DW_AT_call_line   : 11
<99>   DW_AT_call_column : 13
 <3><9a>: Abbrev Number: 6 (DW_TAG_formal_parameter)
<9b>   DW_AT_abstract_origin: <0xe1>
<9f>   DW_AT_location: 0x27 (location list)
   DW_AT_GNU_locviews: 0x25
 <3>: Abbrev Number: 6 (DW_TAG_formal_parameter)
   DW_AT_abstract_origin: <0xd7>
   DW_AT_location: 0x4d (location list)
   DW_AT_GNU_locviews: 0x4b
...
 <1>: Abbrev Number: 10 (DW_TAG_subprogram)
   DW_AT_name: foo
   DW_AT_decl_file   : 1
   DW_AT_decl_line   : 1
   DW_AT_decl_column : 19
   DW_AT_prototyped  : 1
   DW_AT_type: <0xbe>
   DW_AT_inline  : 3(declared as inline and inlined)
 <2>: Abbrev Number: 11 (DW_TAG_formal_parameter)
   DW_AT_name: a
   DW_AT_decl_file   : 1
   DW_AT_decl_line   : 1
   DW_AT_decl_column : 28
   DW_AT_type: <0xbe>
 <2>: Abbrev Number: 11 (DW_TAG_formal_parameter)
   DW_AT_name: b
   DW_AT_decl_file   : 1
   DW_AT_decl_line   : 1
   DW_AT_decl_column : 35
   DW_AT_type: <0xbe>

so it could look at the actual function for determining the order.

The order of the formal parameters is reversed because the fake
scope BLOCK the inliner adds has those as variables in that reverse order.
We output them via decls_for_scope.

static gimple *
setup_one_parameter (copy_body_data *id, tree p, tree value, tree fn,
 basic_block bb, tree *vars)
{ 
...
  /* Declare this new variable.  */
  DECL_CHAIN (var) = *vars;
  *vars = var;


I have a patch.
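
Why prepending to the chain reverses the order can be seen with a toy linked list. This is a minimal sketch with hypothetical types and names; it only mirrors the two assignments quoted from setup_one_parameter and makes no claim about what the actual patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a decl with a DECL_CHAIN-like link.  */
struct decl { char name; struct decl *chain; };

/* Same shape as the quoted code: the new variable becomes the head.  */
void
chain_prepend (struct decl *var, struct decl **vars)
{
  var->chain = *vars;
  *vars = var;
}

/* Alternative that preserves declaration order by adding at the tail.  */
void
chain_append (struct decl *var, struct decl **vars)
{
  var->chain = NULL;
  while (*vars)
    vars = &(*vars)->chain;
  *vars = var;
}

/* Insert parameters 'a' then 'b' and report which one ends up first.  */
char
first_param_name (int append)
{
  struct decl a = { 'a', NULL }, b = { 'b', NULL };
  struct decl *vars = NULL;
  if (append)
    {
      chain_append (&a, &vars);
      chain_append (&b, &vars);
    }
  else
    {
      chain_prepend (&a, &vars);
      chain_prepend (&b, &vars);
    }
  assert (vars != NULL);
  return vars->name;
}
```

With prepending, the parameter processed last ends up first in the chain, which matches the reversed DW_TAG_formal_parameter order in the dump above.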

[Bug debug/23551] dwarf records for inlines appear incomplete

2024-01-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23551

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|REOPENED|RESOLVED

--- Comment #21 from Richard Biener  ---
There is PR103047 for that now.

[Bug debug/19954] Compiler emits incomplete structure type

2024-01-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19954

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
  Known to work||13.2.1, 7.5.0
 Status|NEW |RESOLVED

--- Comment #4 from Richard Biener  ---
(gdb) n
6float* pt = d1.getData(1); /* set breakpoint here */
(gdb) ptype d1
type = class Derived1 {
  private:
int mySize;
int myId;
float *myPointer;

  public:
Derived1(int);
~Derived1();
virtual int getId(void);
virtual float * getData(int);
}

works now as also verified in PR12385.  Verified with GCC 7 and GCC 13.

[Bug debug/14169] Unneeded base types output in dwarf2

2024-01-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14169

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed|2005-12-28 06:11:40 |2024-1-26

--- Comment #2 from Richard Biener  ---
Re-confirmed.  With -fno-eliminate-unused-debug-types we output everything.

I think we never prune unused base types, nor do we prune unused namespaces.

[Bug tree-optimization/113539] [14 Regression] perlbench miscompiled on aarch64 since r14-8223-g1c1853a70f

2024-01-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113539

--- Comment #8 from Richard Biener  ---
Does this still happen after r14-8413-g578c7b91f418eb?

[Bug tree-optimization/113467] [14 regression] libgcrypt-1.10.3 is miscompiled

2024-01-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113467

--- Comment #22 from Richard Biener  ---
Is this fixed meanwhile?

[Bug c/85800] A miscompilation bug with unsigned char

2024-01-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85800

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |INVALID

--- Comment #6 from Richard Biener  ---
Yeah.  I think we have enough duplicates that show cases where conditional
equivalence propagation introduces these issues.  Here it's already present in
the source.

[Bug target/113615] New: internal compiler error: in extract_insn, at recog.cc:2812

2024-01-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113615

Bug ID: 113615
   Summary: internal compiler error: in extract_insn, at
recog.cc:2812
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

I'm seeing a lot of ICEs like this when running libgomp testsuite with
offloading for gfx1030.

/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:
In function 'accum_._omp_fn.1':
/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38:
error: unrecognizable insn:
(insn 108 107 109 6 (set (reg:V8SF 849)
(unspec:V8SF [
(reg:V8SF 844 [ vect__43.12_106 ]) repeated x2
(const_int 1 [0x1])
] UNSPEC_PLUS_DPP_SHR))
"/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90":22:29
discrim 1 -1
 (nil))
during RTL pass: vregs
/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38:
internal compiler error: in extract_insn, at recog.cc:2812

other ones:

(insn 93 92 94 7 (set (reg:V64DF 805)
(unspec:V64DF [
(reg:V64DF 802 [ vect__31.53_89 ])
(const_int 1 [0x1])
] UNSPEC_MOV_DPP_SHR))
"/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/target_data-3.f90":51:41
-1

[Bug tree-optimization/113602] ICE: in vn_reference_maybe_forwprop_address, at tree-ssa-sccvn.cc:1426 with invalid _BitInt() register asm with -O2 -fno-tree-loop-optimize

2024-01-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113602

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Richard Biener  ---
Fixed.

[Bug c++/113612] [13/14 Regression] ICE: SIGSEGV in get_template_info (pt.cc:378) or tree_check (tree.h:3611) with invalid -fpreprocessed

2024-01-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113612

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P4

[Bug tree-optimization/113602] ICE: in vn_reference_maybe_forwprop_address, at tree-ssa-sccvn.cc:1426 with invalid _BitInt() register asm with -O2 -fno-tree-loop-optimize

2024-01-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113602

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener  ---
(gdb) p tem.last ()
$2 = (vn_reference_op_struct &) @0x7fffc820: {opcode = VAR_DECL, 
  clique = 0, base = 0, reverse = 0, align = 0, off = {coeffs = {-1}}, 
  type = , op0 = , 
  op1 = , op2 = }
(gdb) p debug_vn_reference_ops (tem)
{array_ref<_4,0,1>,view_convert_expr,r}
(gdb) p debug_generic_expr (addr)
_CONVERT_EXPR(r)[_4]

We're valueizing

MEM  [(_BitInt(503) *)vectp.5_18]

trying to forward the vectp.5_18 def

vectp.5_18 = _CONVERT_EXPR(r)[_4];

but we're not anticipating this shape of a non-invariant ADDR_EXPR.  We
do wrap all VAR_DECLs but DECL_HARD_REGISTER inside a MEM_REF but then
a DECL_HARD_REGISTER shouldn't be addressable so the IL is actually
invalid, generated by vectorization (but not diagnosed by IL checking).

I'm not sure to what extent we should try to paper over this though ...

The following works for me:

diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index ae55bf6aa48..f37734b5340 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -1182,7 +1182,12 @@ dr_analyze_innermost (innermost_loop_behavior *drb, tree ref,
   base = TREE_OPERAND (base, 0);
 }
   else
-base = build_fold_addr_expr (base);
+{
+  if (may_be_nonaddressable_p (base))
+   return opt_result::failure_at (stmt,
+  "failed: base not addressable.\n");
+  base = build_fold_addr_expr (base);
+}

   if (in_loop)
 {

[Bug target/113600] [14 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600

--- Comment #3 from Richard Biener  ---
I'll note that esp. two-lane reductions (or in general two-lane BB
vectorization) are hardly profitable on modern x86 uarchs unless the vectorized
code is interleaved with other non-vectorized code that can execute at the same
time.  Vectorizing two lanes only makes them dependent on each other, while
when not vectorized, modern uarchs have no difficulty executing them in
parallel (but without the tied dependences).  It's only when there's sufficient
benefit, aka more lanes, approaching the issue width or the number of available
ports for the ops, or the whole SLP mostly consisting of loads/stores, that BB
vectorization is going to be profitable.  Note the cost model only ever looks
at the stmts participating in the vectorization, not the "surrounding" code,
and it would be difficult to include that since the schedule on GIMPLE isn't
even close to what we get later.  The reduction op is also a serialization
point on the scalar side of course, whether that means that BB reductions
with two lanes are possibly better candidates than grouped BB stores with
two lanes is another question.

The BB reduction op itself is costed properly.

So the 525.x264_r case might be loop vectorization, OTOH the epilogue
cost is hardly ever a knob that decides whether a vectorization is profitable.

I think we need to figure out what exactly gets slower (and hope it's not
scattered all over the place).
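
For reference, the shape under discussion can be sketched as below.  This is a hypothetical two-lane BB reduction, not code from 525.x264_r; the name dot2 is made up, and whether GCC actually SLP-vectorizes it depends on target and flags.

```c
/* Hypothetical two-lane basic-block reduction: two independent
   multiplies feeding a single add.  As scalar code both multiplies can
   issue in parallel; an SLP-vectorized form ties them into one vector
   multiply followed by a horizontal reduction.  */
double
dot2 (const double *a, const double *b)
{
  double lane0 = a[0] * b[0];
  double lane1 = a[1] * b[1];
  /* The reduction op: a serialization point on the scalar side too.  */
  return lane0 + lane1;
}
```

With only two lanes the vector form saves at most one multiply and adds a lane-extract or shuffle for the final add, which is why the surrounding code matters so much for profitability.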

[Bug middle-end/113596] Stack memory leakage caused by inline alloca

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113596

--- Comment #10 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #9)
> Created attachment 57215 [details]
> gcc14-pr113596.patch
> 
> Untested patch to do that.
> The disadvantage of doing that is that it may penalize inline calls which
> just use VLAs, because calls_alloca covers even those functions.  For simple:
> static inline __attribute__((always_inline)) void
> foo (int n)
> {
>   char p[n];
>   bar (p, n);
> }
> the fab1 pass actually removes redundant pair of stack_save/stack_restore,
> but bet if it would be something like
> { call (); { char p[n]; bar (p, n); } call (); } then it wouldn't.
> Anyway, this isn't a regression, so I think it is stage1 material for GCC 15.

Most definitely.  We can make ->calls_alloca more precise though of course
we usually also do not want to inline functions with VLAs.  IIRC a VLA
forces a frame pointer for the caller then.

[Bug rtl-optimization/113597] [14 Regression] aarch64: Significant code quality regression since r14-8346-ga98d5130a6dcff

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113597

--- Comment #12 from Richard Biener  ---
Created attachment 57214
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57214&action=edit
prototype fix

The attached prototype fixes the testcase for me.

[Bug rtl-optimization/113597] [14 Regression] aarch64: Significant code quality regression since r14-8346-ga98d5130a6dcff

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113597

--- Comment #11 from Richard Biener  ---
In DSE the only difference is

 fbt (0x751a1a50: (plus:DI (reg/v/f:DI 117 [ u ])
-(reg:DI 146 [ _44 ]))) == (address 0)
+(reg:DI 146 [ _44 ]))) == (nil)
 fbt (0x7700b3c0: (reg/f:DI 64 sfp)) == (address:DI -3)
-bac false
+bac true

that's for

(mem:BLK (reg/f:DI 64 sfp) [0  A8])

vs

(mem:V4SF (plus:DI (reg/v/f:DI 117 [ u ])
(reg:DI 146 [ _44 ])) [0 MEM <__Float32x4_t> [(float * {ref-all})_42]+0
S16 A32])

from

#0  0x02ff3796 in scan_reads (insn_info=0x5e5b680, gen=0x5ec2338, 
kill=0x5ec2358) at /space/rguenther/src/gcc/gcc/dse.cc:3156
#1  0x02ff39b1 in dse_step3_scan (bb=)
at /space/rguenther/src/gcc/gcc/dse.cc:3238

processing

(insn 62 61 64 5 (set (reg:V4SF 147 [ MEM <__Float32x4_t> [(float *
{ref-all})_42] ])
(mem:V4SF (plus:DI (reg/v/f:DI 117 [ u ])
(reg:DI 146 [ _44 ])) [0 MEM <__Float32x4_t> [(float *
{ref-all})_42]+0 S16 A32])) "include/arm_neon.h":12531:36 1274
{*aarch64_simd_movv4sf}
 (expr_list:REG_DEAD (reg:DI 146 [ _44 ])
(nil)))

In this case we have _44 pointing to NONLOCAL only.  It got arg_base_value
as base value (from the MEM_EXPR and that points-to set we could
eventually derive this very same base term as well).

But I'll note that (mem:BLK (reg/f:DI 64 sfp) [0  A8]) is artificial,
generated by DSE get_group_info via record_store on

(insn 13 12 14 2 (set (mem/c:V2x16QI (reg/f:DI 119) [0 +0 S32 A128])
(unspec:V2x16QI [
(reg:V16QI 121) repeated x2
] UNSPEC_STP)) "t.cc":12:10 discrim 1 92 {*store_pair_16}
 (nil))

which is figured to be const_or_frame_p () based.  That notably
lacks a MEM_EXPR (though the bare MEM means only base_alias_check would
ever be able to disambiguate here).

[Bug other/113575] [14 Regression] memory hog building insn-opinit.o (i686-linux-gnu -> riscv64-linux-gnu)

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113575

--- Comment #13 from Richard Biener  ---
(In reply to Robin Dapp from comment #12)
> Created attachment 57209 [details]
> Tentative
> 
> I tested the attached "fix".  On my machine with 13.2 host compiler it
> reduced the build time for insn-opinit.cc from > 4 mins to < 2 mins and the
> memory usage from >1G to 600ish M.  I didn't observe 3.5G before, though.
> 
> For now I just went with an arbitrary threshold of 5000 patterns and
> splitting into 10 functions.  After testing on x86 and aarch64 I realized
> that both have <3000 patterns so right now it would only split riscv's init
> function.
> 
> Or rather the other way, i.e. splitting into fixed-size chunks (of 1000)
> instead?

Yeah, I'd simplify it by doing exactly that.
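
A minimal sketch of the fixed-size variant, assuming a chunk size of 1000 as suggested in the quoted comment (the function name num_init_chunks is hypothetical and not part of the generator):

```c
/* Number of init functions needed when num_patterns patterns are split
   into chunks of at most chunk_size each (ceiling division).
   chunk_size must be nonzero.  */
unsigned
num_init_chunks (unsigned num_patterns, unsigned chunk_size)
{
  return (num_patterns + chunk_size - 1) / chunk_size;
}
```

With fixed-size chunks no separate threshold is needed: a target with fewer than 1000 patterns simply ends up with a single function.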

[Bug rtl-optimization/113597] [14 Regression] aarch64: Significant code quality regression since r14-8346-ga98d5130a6dcff

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113597

--- Comment #10 from Richard Biener  ---
Created attachment 57212
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57212&action=edit
patch for debugging

Btw, I've used the attached to investigate other issues with the change.  It
will show the outcome of base_alias_check and find_base_term in dumps.

One issue is that we're much more dependent on MEM_EXPRs being present.

Before figuring there wouldn't be many important regressions, the idea was,
instead of doing find_base_term, to have a known base value recorded in the
MEM_ATTRs and, as the only important ones should be the special ones for
argument frame and stack-based, to represent that by an enum (rather than
the other possibility of using ADDRESS).  I'll also note that for spill
slots we get around to using spill_slot_decl and set_mem_attrs_for_spill.

I've not yet convinced myself that the other special bases we have really
form a completely separate memory class.  But if they do then accesses
should do something similar there (but mind scheduling of frame related
instructions ...).

Argument stack slots are one important class, set up by init_alias_analysis.
But those are also backed by regular decls at times (but not always)?

assign_stack_temp "allocated" memory is another class, we're reusing
slots during RTL expansion and they get (even if shared) a specific
alias set.  I don't think we ever release those temps and say re-use
the space for spilling so assigning a different decl to each slot
should eventually work.

[Bug rtl-optimization/113597] [14 Regression] aarch64: Significant code quality regression since r14-8346-ga98d5130a6dcff

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113597

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
 Ever confirmed|0   |1
   Last reconfirmed||2024-01-25
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

[Bug rtl-optimization/113597] [14 Regression] aarch64: Significant code quality regression since r14-8346-ga98d5130a6dcff

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113597

--- Comment #1 from Richard Biener  ---
I will have a look - but can you explain for me what I see?  I suppose the
testcase was reduced from something?

Is the assembly diff complete?  That is, do we really have more fmla or
are they just moved?

+ stp   q31, q31, [sp, 256]

that's a store?  A paired store?  Aka, the sequence fills a stack(?)
region with replications of q31?

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #15 from Richard Biener  ---
(In reply to Richard Sandiford from comment #13)
> I don't think there's any principle that upper bits must be zero.
> How do we end up with a pattern that depends on that being the case?

I think the problem is the cbranch pattern which looks at all of the
QImode mask - but of course it doesn't know it's really V4BImode it's
working on ...

If there's no principle that the upper bits should be zero I think we
need a way for the target to say so.
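
To make the concern concrete, here is a small model in plain C, not GCC internals: a 4-lane boolean mask kept in an 8-bit container, standing in for V4BImode held in QImode.  Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Looks at the whole container, like a branch on the full QImode
   value: padding bits above the lanes influence the result.  */
bool
any_lane_set_naive (uint8_t mask)
{
  return mask != 0;
}

/* Looks only at the low nlanes bits (nlanes must be in 1..8), so the
   content of the padding bits does not matter.  */
bool
any_lane_set (uint8_t mask, unsigned nlanes)
{
  uint8_t valid = (uint8_t) ((1u << nlanes) - 1);
  return (mask & valid) != 0;
}
```

If the upper bits are not guaranteed to be zero, only the second form is correct; for example a container value of 0xF0 has no lane set when there are 4 lanes, yet the naive test reports true.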

[Bug middle-end/113596] Stack memory leakage caused by inline alloca

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113596

--- Comment #7 from Richard Biener  ---
In theory, if somebody really wanted it, we could replace alloca with
__builtin_stack_save/restore during inlining (not sure if it would
simply work, and be efficient, by just putting save at the start of the
function and restore at the end).

We could also warn when (forced-)inlining a function calling alloca.
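
A hypothetical reproducer shape for the situation described in this PR, assuming a glibc-style <alloca.h>; the names use_scratch and sum_scratch are made up for illustration:

```c
#include <alloca.h>
#include <string.h>

/* Each call allocates n bytes with alloca (n should be nonzero).  As an
   out-of-line call the memory is released when use_scratch returns;
   once force-inlined it stays allocated until the caller returns.  */
static inline __attribute__((always_inline)) unsigned
use_scratch (unsigned n)
{
  unsigned char *p = alloca (n);
  memset (p, 1, n);
  unsigned sum = 0;
  for (unsigned i = 0; i < n; ++i)
    sum += p[i];
  return sum;
}

/* After inlining, stack usage can grow by roughly n bytes per
   iteration, since nothing restores the stack pointer in the loop.  */
unsigned
sum_scratch (unsigned iterations, unsigned n)
{
  unsigned total = 0;
  for (unsigned i = 0; i < iterations; ++i)
    total += use_scratch (n);
  return total;
}
```

Wrapping the inlined body in a stack save at its start and a restore at its end, as suggested above, would bound the growth to a single allocation.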

[Bug tree-optimization/113592] missed partial sum optimization in vectorizer

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113592

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-*

--- Comment #4 from Richard Biener  ---
The vectorizer for the original testcase generates

  # vect_sum_20.8_49 = PHI 
...
  vect__9.20_68 = vect__5.12_55 * vect__8.16_61;
  vect__9.20_69 = vect__5.12_56 * vect__8.17_63;
  vect__9.20_70 = vect__5.12_57 * vect__8.18_65;
  vect__9.20_71 = vect__5.12_58 * vect__8.19_67;
  _9 = _5 * _8;
  vect_sum_16.21_72 = vect__9.20_68 + vect_sum_20.8_49;
  vect_sum_16.21_73 = vect__9.20_69 + vect_sum_16.21_72;
  vect_sum_16.21_74 = vect__9.20_70 + vect_sum_16.21_73;
  vect_sum_16.21_75 = vect__9.20_71 + vect_sum_16.21_74;
  sum_16 = _9 + sum_20;

the adds are from the optimization to reduce the number of reduction IVs
(we could alternatively keep them independent with 4 IVs and handle the
reducing in the epilogue).  This is to reduce register pressure.

But this also shows that, if the issue isn't the multiple IVs, this could
be handled by reassoc + FMA forming, given the vectorizer itself doesn't
produce FMAs here.
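
The two reduction schemes can be modelled in scalar C, with each scalar accumulator standing in for one vector accumulator.  This is only a sketch with hypothetical names, it is not the testcase from the PR, and n is assumed to be a multiple of 4.

```c
/* One reduction IV: the four products are added in a single dependent
   chain, as in the vect_sum_16.21_72 .. _75 sequence above.  */
double
dot_chained (const double *a, const double *b, int n)
{
  double sum = 0.0;
  for (int i = 0; i < n; i += 4)
    {
      sum = a[i] * b[i] + sum;
      sum = a[i + 1] * b[i + 1] + sum;
      sum = a[i + 2] * b[i + 2] + sum;
      sum = a[i + 3] * b[i + 3] + sum;
    }
  return sum;
}

/* Four independent IVs: shorter dependence chains, more live
   registers, and the combining is done once after the loop.  */
double
dot_independent (const double *a, const double *b, int n)
{
  double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
  for (int i = 0; i < n; i += 4)
    {
      s0 += a[i] * b[i];
      s1 += a[i + 1] * b[i + 1];
      s2 += a[i + 2] * b[i + 2];
      s3 += a[i + 3] * b[i + 3];
    }
  return (s0 + s1) + (s2 + s3);
}
```

Each statement in the chained form is a multiply feeding an add, which is the pattern FMA forming looks for.  The two functions may differ in rounding for general floating-point inputs because the additions are associated differently.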

[Bug tree-optimization/113590] The vectorizer introduces signed overflow

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113590

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
   Last reconfirmed||2024-01-25
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Confirmed.  Should be reasonably easy to fix - we either move all induction
variable updates to the latch or compute them with unsigned arithmetic
(we usually prefer an empty latch).
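
The second option can be sketched at the source level as follows.  The helper name is hypothetical, and the conversion back to int relies on GCC's documented modulo behaviour for out-of-range values, which ISO C leaves implementation-defined.

```c
/* Advance an induction variable in unsigned arithmetic, where
   wraparound is well defined, instead of a signed add that would be
   undefined on overflow.  */
int
iv_next_wrapping (int iv, int step)
{
  return (int) ((unsigned) iv + (unsigned) step);
}
```

This way an extra IV update executed before the exit test cannot introduce undefined behaviour, even if its result is never used.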

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583

--- Comment #8 from Richard Biener  ---
(In reply to JuzheZhong from comment #7)
>
> But I wonder if we see it is beneficial on some boards, could you teach us
> how we can enable vectorization for such case according to uarchs ?

If you figure out how to optimally vectorize this for a given uarch I can
definitely guide you.

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

Richard Biener  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org

--- Comment #11 from Richard Biener  ---
(In reply to Hongtao Liu from comment #8)
> maybe 
> 
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 1fd957288d4..6d321f9baef 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -8035,6 +8035,9 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len,
>unsigned int extract_elts = extract_bytes * elts_per_byte;
>for (unsigned int i = 0; i < extract_elts; ++i)
> {
> + /* Don't encode any bit beyond the range of the vector.  */
> + if (first_elt + i > count)
> +   break;

Hmm.  I think that VECTOR_CST_ELT should have ICEd for out-of-bound
element queries but it seems to make up elements for us here.  Richard?

But yes, we do

  unsigned int extract_elts = extract_bytes * elts_per_byte;

and since native_encode_* and native_interpret_* operate on bytes we have
difficulties dealing with bit-precision entities with padding.

There's either the possibility to fail encoding when that happens or
do something else.  Note that RTL expansion will do

case VECTOR_CST:
  {
tree tmp = NULL_TREE; 
if (VECTOR_MODE_P (mode))
  return const_vector_from_tree (exp);
scalar_int_mode int_mode;
if (is_int_mode (mode, &int_mode))
  {
tree type_for_mode = lang_hooks.types.type_for_mode (int_mode, 1);
if (type_for_mode)
  tmp = fold_unary_loc (loc, VIEW_CONVERT_EXPR,
type_for_mode, exp);

which I think should always succeed (otherwise it falls back to expanding
a CTOR).  That means failing to encode/interpret might get into
store_constructor which I think will zero a register destination and thus
fill padding with zeros.

So yeah, something like this looks OK, but I think instead of only
testing against 'count' we should also test against TYPE_VECTOR_SUBPARTS
(that might be variable, so with known_gt).

Would be interesting to see whether this fixes the issue without the
now installed patch.
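
As a standalone model of the encoding problem (plain C, not the native_encode_vector_part implementation; the function name and the LSB-first bit order are assumptions), packing a vector of one-bit elements while never touching bits beyond the element count could look like:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack count one-bit elements into out, LSB first.  Bits beyond count
   in the last byte are left as zero padding.  Returns the number of
   bytes written, or 0 if out is too small (fail rather than
   truncate).  */
size_t
encode_bool_vector (const unsigned char *elts, unsigned count,
                    uint8_t *out, size_t out_len)
{
  size_t nbytes = ((size_t) count + 7) / 8;
  if (nbytes > out_len)
    return 0;
  memset (out, 0, nbytes);
  for (unsigned i = 0; i < count; ++i)
    if (elts[i])
      out[i / 8] |= (uint8_t) (1u << (i % 8));
  return nbytes;
}
```

Here the loop bound is the element count rather than bytes times elements-per-byte, so a 4-element all-ones vector encodes as 0x0f instead of picking up made-up elements in the upper bits.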

[Bug target/105275] [12/13/14 regression] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718

2024-01-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275

--- Comment #4 from Richard Biener  ---
Since this was a costing change I wonder if we identified the code change
responsible and thus have a testcase?  I realize that for maximum assurance
one would need to have a debug counter for switching the patch on/off to
have it apply more selectively (possibly per SLP attempt rather than
per cost hook invocation which would be even more tricky to do).

Feeding another parameter to the hook via a new flag in the vinfo might
be possible (and set that from a dbg_cnt call) for example.

[Bug ipa/113520] ICE with mismatched types with LTO (tree check: expected array_type, have integer_type in array_ref_low_bound)

2024-01-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113520

--- Comment #9 from Richard Biener  ---
(In reply to Jan Hubicka from comment #8)
> I think the ipa-cp summaries should be used only when types match. At least
> Martin added type streaming for all the jump functions.  So we are missing
> some check?

I don't think this applies here, we're having


foo ([5]);

with b being int vs int[], so it's not about the argument types matching
or the type of the JF but instead the value effectively changing during
streaming due to varpool node "merging".

As said elsewhere we avoid the issue by preserving the type of possibly
merged decls by wrapping it with a MEM_REF (for rvalues a V_C_E would
be possible as well).  And we unwrap it later when possible (but that's
of course optional).

I think any summary streaming referencing decls subject to WPA merging
needs to do the same - it's not possible to recover after the fact since
the original type is lost (for the ARRAY_REF case it might be possible
to infer a type that would be good enough of course).

[Bug c++/113581] Ignoring GCC unroll loop annotation for loops with increment in condition

2024-01-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113581

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2024-01-24
 Status|UNCONFIRMED |NEW

--- Comment #2 from Richard Biener  ---
Confirmed.  The reason is we're seeing

 :
i.2_3 = i;
i = i.2_3 + 1;
_4 = i.2_3 <= 2;
D.2811 = .ANNOTATE (_4, 1, 16);
retval.1 = D.2811;
if (retval.1 != 0)
  goto ; [INV]
else
  goto ; [INV]

and the .ANNOTATE does not directly precede the condition; the assign
is in the way.

The FE generates

if (<>>) goto ;
else goto ;

for some reason this forces an extra temporary via voidify_wrapper_expr ()
during gimplification.

Possibly the frontend simply lacks knowledge of ANNOTATE_EXPR when
checking whether it needs this cleanup_point at all (but I see
it's too simplistic, checking for side-effects only).

We could walk simple assigns like those of course, but the extra
temporary looks superfluous.
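
A source shape consistent with the dump above would be the following.  This is a hypothetical reconstruction, not the reporter's testcase: the unroll count 16 and the i++ <= 2 condition are read off the .ANNOTATE call and the compare, and the function name is made up.  The PR is against the C++ front end; the same source is also valid C.

```c
/* Loop whose condition has a side effect (the post-increment), with an
   unroll request attached via the pragma.  */
int
count_iterations (void)
{
  int i = 0;
  int n = 0;
#pragma GCC unroll 16
  while (i++ <= 2)
    ++n;
  return n;
}
```

The side effect in the condition is what makes the front end wrap it in a cleanup point, which in turn produces the extra temporary between the .ANNOTATE and the branch.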

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-01-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #5 from Richard Biener  ---
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index fe631252dc2..28ad03e0b8a 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -991,8 +991,12 @@ vec_init_loop_exit_info (class loop *loop)
{
  tree may_be_zero = niter_desc.may_be_zero;
  if ((integer_zerop (may_be_zero)
-  || integer_nonzerop (may_be_zero)
-  || COMPARISON_CLASS_P (may_be_zero))
+  /* As we are handling may_be_zero that's not false by
+ rewriting niter to may_be_zero ? 0 : niter we require
+ an empty latch.  */
+  || (exit->src == single_pred (loop->latch)
+  && (integer_nonzerop (may_be_zero)
+  || COMPARISON_CLASS_P (may_be_zero))))
  && (!candidate
  || dominated_by_p (CDI_DOMINATORS, exit->src,
 candidate->src)))

fixes it, I'm testing this.
