[Bug c++/98975] Infinite loop produces no assembly (including returning) with -O3

2021-02-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98975

--- Comment #5 from Andrew Pinski  ---
Note that C and C++ differ here. In C, the behavior is undefined only if the
caller uses the return value, while in C++ it is undefined at the point of return.

[Bug target/98977] [x86] Failure to optimize consecutive sub flags usage

2021-02-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98977

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug lto/96591] [8/9/10/11 Regression] ICE with -flto=auto and -O1: tree code ‘typename_type’ is not supported in LTO streams

2021-02-05 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96591

Jason Merrill  changed:

   What|Removed |Added

  Component|c++ |lto
   Assignee|jason at gcc dot gnu.org   |unassigned at gcc dot gnu.org
 Status|ASSIGNED|NEW

--- Comment #5 from Jason Merrill  ---
(In reply to Arseny Solokha from comment #4)
> Is it somehow related to PR83997? Maybe even a duplicate?

No, not a duplicate.

Reduced a bit more:

struct builtin_simd
{
  using type [[gnu::vector_size(sizeof(scalar_t) * length)]] = scalar_t;
};

struct simd_traits
{
  using scalar_type = int;

  template 
  using rebind = typename X::type;
};

template 
constexpr simd_t fill(typename simd_traits::scalar_type const scalar)
{
  return simd_t{scalar};
}

using score_type = typename builtin_simd::type;
// Uncommenting this makes it work:
// const simd_traits::scalar_type n = 8;
score_type data[1]{fill(8)};

The difference from uncommenting that line seems to be that then
free_lang_data_in_type is called for simd_traits::scalar_type.  So the problem
seems to be that find_decls_types isn't finding scalar_type in the vector in
the array.  So changing component to LTO and unassigning myself.  Feel free to
change it back if it seems appropriate.

[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves

2021-02-05 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981

--- Comment #3 from Jim Wilson  ---
I suppose cost model problems could explain why combine didn't do the
optimization.  I didn't have a chance to look at that.

I still think there is a fundamental problem with how we represent SImode
operations, but again cost model problems could explain why my experiments to
fix that didn't work as expected.  I probably didn't look at that when I was
experimenting with riscv.md changes.

Your patch does look useful, but setting cost to 1 for MULT is wrong, and would
be just as wrong for DIV.  That is OK for PLUS, MINUS, and NEG though.  I think
a better option is to set *total = 0 and return false.  That gives no extra
cost to the sign extend, and recurses to get the proper cost for the operation
underneath.  That would work for MUL and DIV.  I found code in the rs6000 port
that does this.
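
The difference between the two approaches can be sketched with a toy cost walk
that follows the same contract as a target rtx-cost hook: the hook fills in
*total and returns true when that value is final, or false when the caller
should go on to add the operand costs. Everything below (the Expr type, the
per-operation costs, and all function names) is a hypothetical stand-in for
illustration, not GCC code.

```cpp
#include <utility>
#include <vector>

// Toy stand-ins for RTL codes; not GCC's real definitions.
enum class Code { Reg, Plus, Mult, SignExtend };

struct Expr {
    Code code;
    std::vector<Expr> ops;  // operands of this expression
};

// Hypothetical instruction costs, chosen only so MULT is dearer than PLUS.
int base_cost(Code c) {
    switch (c) {
    case Code::Plus:       return 1;
    case Code::Mult:       return 4;
    case Code::SignExtend: return 1;
    case Code::Reg:        return 0;
    }
    return 0;
}

// Hook contract: true means *total is final, false means "add operand costs".
using CostHook = bool (*)(const Expr&, int*);

int total_cost(const Expr& e, CostHook hook) {
    int total = 0;
    if (hook(e, &total))
        return total;
    for (const Expr& op : e.ops)
        total += total_cost(op, hook);
    return total;
}

bool extends_non_reg(const Expr& e) {
    return e.code == Code::SignExtend && !e.ops.empty()
           && e.ops[0].code != Code::Reg;
}

// Flat variant: one instruction for the whole sign-extended operation.
bool flat_hook(const Expr& e, int* total) {
    if (extends_non_reg(e)) { *total = 1; return true; }
    *total = base_cost(e.code);
    return false;
}

// Recursing variant: the extend is free, the inner operation is costed.
bool recursing_hook(const Expr& e, int* total) {
    if (extends_non_reg(e)) { *total = 0; return false; }
    *total = base_cost(e.code);
    return false;
}

Expr reg() { return Expr{Code::Reg, {}}; }
Expr binop(Code c) { return Expr{c, {reg(), reg()}}; }
Expr sign_extend(Expr inner) { return Expr{Code::SignExtend, {std::move(inner)}}; }
```

With these made-up numbers, the flat variant prices a sign-extended multiply
the same as a sign-extended add, while the recursing variant keeps the
multiply's own cost and still charges nothing extra for the extension.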

[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves

2021-02-05 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #2 from Kito Cheng  ---
Here is a quick patch that fixes part of this issue; the cause seems to be that
our cost model is imprecise. I still need to run benchmarks to make sure that
performance and code size do not regress.

find_max_i32:
  lui     a4,%hi(.LANCHOR0)
  addi    a4,a4,%lo(.LANCHOR0)
  addi    a3,a4,1024
  addi    a6,a4,400
  li      a0,0
.L3:
  lw      a5,0(a4)
  lw      a2,0(a3)
  addi    a4,a4,4
  addi    a3,a3,4
  addw    a1,a5,a2
  addw    a5,a5,a2
  bge     a1,a0,.L2
  mv      a5,a0
.L2:
  sext.w  a0,a5
  bne     a4,a6,.L3
  ret



Patch:
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index d489717b2a5..b8c9f7200ce 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -1879,6 +1879,15 @@ riscv_rtx_costs (rtx x, machine_mode mode, int
outer_code, int opno ATTRIBUTE_UN
}
   /* Fall through.  */
 case SIGN_EXTEND:
+  if (TARGET_64BIT && !REG_P (XEXP (x, 0)))
+   {
+ int code = GET_CODE (XEXP (x, 0));
+ if (code == PLUS || code == MINUS || code == NEG || code == MULT)
+   {
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
+   }
   *total = riscv_extend_cost (XEXP (x, 0), GET_CODE (x) == ZERO_EXTEND);
   return false;

[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves

2021-02-05 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson  ---
The extra move instruction is a side effect of how the riscv64 toolchain
handles 32-bit arithmetic.  We lie to the compiler and tell it that we have
instructions that produce 32-bit results.  In fact, we only have instructions
that produce 64-bit sign-extended 32-bit results.  The lie means that the RTL
has some insns with SImode output and some instructions with DImode outputs,
and sometimes we end up with nop moves to convert between the modes.  In this
case, it is peephole2 after regalloc that notices a SImode add followed by a
sign-extend, and converts it to a sign-extending 32-bit add followed by a move,
but can't eliminate the move because we already did register allocation.

This same problem is also why we get the unnecessary sext after the label, as
peephole can't fix that.
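
A small model may make the point about 64-bit sign-extended 32-bit results
concrete. The two functions below are hypothetical C++ stand-ins for the addw
and sext.w instructions, written only to show that applying sext.w to an addw
result changes nothing; they assume the usual two's-complement narrowing that
GCC and Clang implement.

```cpp
#include <cstdint>

// Model of RV64 "addw": add the low 32 bits, then sign-extend to 64 bits.
std::int64_t addw(std::int64_t a, std::int64_t b) {
    std::uint32_t low =
        static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b);
    // Narrowing to int32_t is assumed to wrap (two's complement).
    return static_cast<std::int32_t>(low);
}

// Model of "sext.w": sign-extend the low 32 bits of a 64-bit register.
std::int64_t sext_w(std::int64_t a) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a));
}
```

Because addw already leaves a sign-extended value in the register, a following
sext.w of that value is a no-op, which is why the extra extension and move in
the generated code are avoidable in principle.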

This problem has been on my todo list for a few years, and I have ideas of how
to fix it, but I have no idea when I will have time to try to fix it. I did
document it for the RISC-V International Code Speed Optimization task group.
https://github.com/riscv/riscv-code-speed-optimization/blob/main/projects/gcc-optimizations.adoc
This one is the first one in the list.

[Bug target/98981] New: gcc-10.2 for RISC-V has extraneous register moves

2021-02-05 Thread brian.grayson at sifive dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981

Bug ID: 98981
   Summary: gcc-10.2 for RISC-V has extraneous register moves
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: brian.grayson at sifive dot com
  Target Milestone: ---

gcc is inserting an unnecessary register-register move for a simple max-style
operation:

int a[256], b[256];
int32_t find_max_i32() {
  int32_t xme = 0, sc=0;
  for (int32_t i = 0; i < 100; i++) {
    if ((sc=a[i]+b[i]) > xme) xme=sc;
  }
  return xme;
}

This is from the SPECint2006 benchmark HMMER, in P7Viterbi(), hence the
variable names sc and xme from the original source.

Under these flags:
-march=rv64imafdc -mcmodel=medany -mabi=lp64d -O3

I get this disassembly for the loop:

.L5:
  lw  a5,0(a4)
  lw  a2,0(a3)
  addi  a4,a4,4
  addi  a3,a3,4
  addw  a2,a5,a2
  mv  a5,a2  <--- unnecessary move
  bge a2,a0,.L4
  mv  a5,a0
.L4:
  sext.w  a0,a5
  bne a4,a1,.L5

If the addw targets a5, and the bge compares a5 to a0, the mv could be removed.
In fact, if the variable types are changed to int64_t, that's exactly what
happens:

.L13:
  ld  a5,0(a4)
  ld  a2,0(a3)
  addi  a4,a4,8
  addi  a3,a3,8
  add a5,a5,a2
  bgeu  a0,a5,.L12
  mv  a0,a5
.L12:
  bne a4,a1,.L13

[Bug c++/82952] Hang compiling with g++ -fsanitize=undefined -Wduplicated-branches

2021-02-05 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82952

Marek Polacek  changed:

   What|Removed |Added

 CC||sshannin at gmail dot com

--- Comment #8 from Marek Polacek  ---
*** Bug 98980 has been marked as a duplicate of this bug. ***

[Bug c++/98980] Very slow compilation with -Wduplicated-branches and ubsan

2021-02-05 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98980

Marek Polacek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE
 CC||mpolacek at gcc dot gnu.org

--- Comment #1 from Marek Polacek  ---
Dup.

*** This bug has been marked as a duplicate of bug 82952 ***

[Bug c++/98980] New: Very slow compilation with -Wduplicated-branches and ubsan

2021-02-05 Thread sshannin at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98980

Bug ID: 98980
   Summary: Very slow compilation with -Wduplicated-branches and
ubsan
   Product: gcc
   Version: 9.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sshannin at gmail dot com
  Target Milestone: ---

Created attachment 50136
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50136&action=edit
the code

I have attached a heavily reduced example which encounters excessively slow
compilation.  I was not able to remove the stringstream include without the
degenerate behavior disappearing, unfortunately; I hope that's ok.

It seems to be mostly exponential in the number of operator<< invocations, with
a couple interesting behaviors
- s/long/int/g on the code allows it to compile almost instantly
- removing any of the longs being streamed seems to halve the time, but so does
replacing the variable with a literal
- removing any of the string literals also halves the time.

Compiled with flags:
/toolchain14/bin/g++ -std=c++2a -Wduplicated-branches -c -fsanitize=undefined
-o dup.o dup.cpp

/toolchain14/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/toolchain14/bin/g++
COLLECT_LTO_WRAPPER=/toolchain14/libexec/gcc/x86_64-pc-linux-gnu/9.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc_9_1_0/configure --prefix=/toolchain14
--enable-languages=c,c++,fortran --enable-lto --disable-plugin
--program-suffix=-9.1.0 --disable-multilib
Thread model: posix
gcc version 9.1.0 (GCC)

[Bug fortran/98979] New: [11 regression] ICE in several tests cases after r11-7112

2021-02-05 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98979

Bug ID: 98979
   Summary: [11 regression] ICE in several tests cases after
r11-7112
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:9a4d32f85ccebc0ee4b24e6d9d7a4f11c04d7146, r11-7112

previous run: g:f743fe231663e32d52db987650d0ec3381a777af, r11-7111: 75 failures
this run: g:9a4d32f85ccebc0ee4b24e6d9d7a4f11c04d7146, r11-7112: 89 failures

FAIL: gfortran.dg/goacc/array-with-dt-2.f90   -O  (internal compiler error)
FAIL: gfortran.dg/goacc/array-with-dt-2.f90   -O  (test for excess errors)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
-DACC_MEM_SHARED=1 -foffload=disable  -O0  (internal compiler error)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
-DACC_MEM_SHARED=1 -foffload=disable  -O0  (test for excess errors)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
-DACC_MEM_SHARED=1 -foffload=disable  -O1  (internal compiler error)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
-DACC_MEM_SHARED=1 -foffload=disable  -O1  (test for excess errors)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
-DACC_MEM_SHARED=1 -foffload=disable  -O2  (internal compiler error)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
-DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for excess errors)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  (internal compiler error)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  (internal compiler error)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  (test for excess errors)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
-DACC_MEM_SHARED=1 -foffload=disable  -Os  (internal compiler error)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
-DACC_MEM_SHARED=1 -foffload=disable  -Os  (test for excess errors)


/home/seurer/gcc/git/gcc-test/gcc/testsuite/gfortran.dg/goacc/array-with-dt-2.f90:8:34:
internal compiler error: Segmentation fault
0x10c21d1b crash_signal
/home/seurer/gcc/git/gcc-test/gcc/toplev.c:327
0x10404f18 gfc_conv_scalarized_array_ref
/home/seurer/gcc/git/gcc-test/gcc/fortran/trans-array.c:3570
0x10406913 gfc_conv_array_ref(gfc_se*, gfc_array_ref*, gfc_expr*, locus*)
/home/seurer/gcc/git/gcc-test/gcc/fortran/trans-array.c:3721
0x1045ad07 gfc_conv_variable
/home/seurer/gcc/git/gcc-test/gcc/fortran/trans-expr.c:2998
0x10453f6b gfc_conv_expr(gfc_se*, gfc_expr*)
/home/seurer/gcc/git/gcc-test/gcc/fortran/trans-expr.c:8886
0x104614bb gfc_conv_expr_reference(gfc_se*, gfc_expr*, bool)
/home/seurer/gcc/git/gcc-test/gcc/fortran/trans-expr.c:8986
0x104a01d7 gfc_trans_omp_array_section
/home/seurer/gcc/git/gcc-test/gcc/fortran/trans-openmp.c:2157
0x104aaf1f gfc_trans_omp_clauses
/home/seurer/gcc/git/gcc-test/gcc/fortran/trans-openmp.c:3151
0x104bace7 gfc_trans_oacc_executable_directive
/home/seurer/gcc/git/gcc-test/gcc/fortran/trans-openmp.c:3984
0x104bace7 gfc_trans_oacc_directive(gfc_code*)
/home/seurer/gcc/git/gcc-test/gcc/fortran/trans-openmp.c:6124
0x103fab57 trans_code
/home/seurer/gcc/git/gcc-test/gcc/fortran/trans.c:2216
0x1043d48f gfc_generate_function_code(gfc_namespace*)
/home/seurer/gcc/git/gcc-test/gcc/fortran/trans-decl.c:6880
0x103fb56b gfc_generate_code(gfc_namespace*)
/home/seurer/gcc/git/gcc-test/gcc/fortran/trans.c:2272
0x1037df87 translate_all_program_units
/home/seurer/gcc/git/gcc-test/gcc/fortran/parse.c:6351
0x1037df87 gfc_parse_file()
/home/seurer/gcc/git/gcc-test/gcc/fortran/parse.c:6620
0x103efa1f gfc_be_parse_file
/home/seurer/gcc/git/gcc-test/gcc/fortran/f95-lang.c:212

[Bug c++/95888] [9/10/11 Regression] Regression in 9.3. GCC freezes when compiling code using boost::poly_collection::segment

2021-02-05 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95888

--- Comment #5 from Marek Polacek  ---
Simplified test:

template  class A {
  A(int, int);
  template  friend class A;
  friend T;
};

template struct B {
  template struct C {
A begin() { return {1, 0}; }
  };
  template
  C fn();
};

int
main ()
{
  B b;
  b.fn().begin();
}

[Bug libstdc++/98978] Consider packing _M_Engaged in the tail padding of T in optional<>

2021-02-05 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98978

--- Comment #3 from andysem at mail dot ru ---
(In reply to Jonathan Wakely from comment #1)
> This would be an ABI break, and so not going to happen.

Is there no way to improve standard components implementation? I'd imagine you
could provide the new implementation in the new version inline namespace and
still support the old ABI for backward compatibility.

(In reply to Jonathan Wakely from comment #2)
> If we were going to do this, we could also make std::optional<bool> occupy a
> single byte, using one bit for the value and one for the engaged flag.

This would be more problematic as emplace(), value() and operator*() need to
return T&, which would not be possible.
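
To illustrate the objection, here is a minimal sketch of a one-byte optional
for bool. The class name and its members are hypothetical and not part of
libstdc++. Since the value is a single bit, the accessor can only return it by
value; there is no bool object inside the byte that a bool& could refer to.

```cpp
#include <cstdint>

// Hypothetical one-byte optional for bool: bit 0 = engaged, bit 1 = value.
class PackedOptionalBool {
    std::uint8_t bits_ = 0;

public:
    bool has_value() const { return (bits_ & 1u) != 0; }

    // Returns a copy. A bool& is impossible because the value is one bit.
    // The caller is expected to check has_value() first.
    bool value() const { return (bits_ & 2u) != 0; }

    void emplace(bool v) { bits_ = static_cast<std::uint8_t>(v ? 3u : 1u); }
    void reset() { bits_ = 0; }
};

static_assert(sizeof(PackedOptionalBool) == 1, "fits in a single byte");
```

This is the same trade-off std::vector<bool> makes with its proxy references,
and it is why such a layout cannot satisfy the std::optional interface as
specified.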

[Bug fortran/95682] [9/10/11 Regression] Default assignment fails with allocatable array of deferred-length strings

2021-02-05 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95682

--- Comment #2 from anlauf at gcc dot gnu.org ---
Adding some printout after initializing the t1%x(:),

  do i = 1, size(t1%x)
 print *, len_trim (t1%x(i)), t1%x(i)
  end do

I get for gcc-8:

   5 three 
   5 three 
   5 three 

and for 9,10,11:

   3 one   
   3 two   
   5 three 

That's not a typical regression, but rather wrong code replaced by other wrong
code.

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #15 from Jakub Jelinek  ---
The needed permutations for this boil down to
typedef int V __attribute__((vector_size (16)));
typedef int W __attribute__((vector_size (32)));

#ifdef __clang__
V f1 (V x) { return __builtin_shufflevector (x, x, 1, 1, 3, 3); }
V f2 (V x, V y) { return __builtin_shufflevector (x, y, 1, 5, 3, 7); }
V f3 (V x, V y) { return __builtin_shufflevector (x, y, 0, 5, 2, 7); }
#ifdef __AVX2__
W f4 (W x, W y) { return __builtin_shufflevector (x, y, 1, 9, 3, 11, 5, 13, 7,
15); }
W f5 (W x, W y) { return __builtin_shufflevector (x, y, 0, 9, 2, 11, 4, 13, 6,
15); }
W f6 (W x) { return __builtin_shufflevector (x, x, 1, 1, 3, 3, 5, 5, 7, 7); }
#endif
V f7 (V x) { return __builtin_shufflevector (x, x, 1, 3, 2, 3); }
V f8 (V x) { return __builtin_shufflevector (x, x, 0, 2, 2, 3); }
V f9 (V x, V y) { return __builtin_shufflevector (x, y, 0, 4, 1, 5); }
#else
V f1 (V x) { return __builtin_shuffle (x, (V) { 1, 1, 3, 3 }); }
V f2 (V x, V y) { return __builtin_shuffle (x, y, (V) { 1, 5, 3, 7 }); }
V f3 (V x, V y) { return __builtin_shuffle (x, y, (V) { 0, 5, 2, 7 }); }
#ifdef __AVX2__
W f4 (W x, W y) { return __builtin_shuffle (x, y, (W) { 1, 9, 3, 11, 5, 13, 7,
15 }); }
W f5 (W x, W y) { return __builtin_shuffle (x, y, (W) { 0, 9, 2, 11, 4, 13, 6,
15 }); }
W f6 (W x, W y) { return __builtin_shuffle (x, (W) { 1, 1, 3, 3, 5, 5, 7, 7 });
}
#endif
V f7 (V x) { return __builtin_shuffle (x, (V) { 1, 3, 2, 3 }); }
V f8 (V x) { return __builtin_shuffle (x, (V) { 0, 2, 2, 3 }); }
V f9 (V x, V y) { return __builtin_shuffle (x, y, (V) { 0, 4, 1, 5 }); }
#endif

With -msse2, LLVM emits 2 x pshufd $237 + punpckldq for f2 and pshufd $237 +
pshufd $232 + punpckldq, we give up or emit very large code.
With -msse4, we handle everything, and f1/f3 are the same/comparable, but for
f2 we emit 2 x pshufb (with memory operands) + por while
LLVM emits pshufd $245 + pblendw $204.
With -mavx2, the f2 inefficiency remains, and for f4 we emit 2x vpshufb with
memory operands + vpor while LLVM emits vpermilps $245 + vblendps $170.
f6-f9 are all insns that we handle through a single insn and that plus f3 are
the roadblocks to build the f2 and f4 permutations more efficiently.

[Bug c++/82235] Copy ctor is not found for copying array of an object when it's marked explicit

2021-02-05 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82235

Marek Polacek  changed:

   What|Removed |Added

 CC||mpolacek at gcc dot gnu.org

--- Comment #4 from Marek Polacek  ---
Some debugging notes.

We're synthesizing the Bar::Bar(const Bar&) constructor in 
do_build_copy_constructor which creates a list of Bar's fields along with their
initializers.  Here we have

  m D.2398->m

because we're initializing m from m of a copy.  We pass this list down to
finish_mem_initializers, which passes each pair to perform_member_init. 
perform_member_init sees that we're initializing an array so creates a
VEC_INIT_EXPR via build_vec_init_expr.

VEC_INIT_EXPRs are expanded in cp_gimplify_expr, so we call build_vec_init to
do so.  We're initializing an array from another array and so do

4508   else if (type_build_ctor_call (type))
4509 elt_init = build_aggr_init (to, from, 0, complain)

where to = *D.2445 and from = *D.2447.

and build_aggr_init has

1822   if (init && init != void_type_node
1823   && TREE_CODE (init) != TREE_LIST
1824   && !(TREE_CODE (init) == TARGET_EXPR
1825&& TARGET_EXPR_DIRECT_INIT_P (init))
1826   && !DIRECT_LIST_INIT_P (init))
1827 flags |= LOOKUP_ONLYCONVERTING;

and init is an indirect_ref so we set L_O, never realizing that in this case we
don't want to set L_O.

I suppose we could introduce VEC_INIT_EXPR_DIRECT_INIT_P, set it in
perform_member_init, and then use it in cp_gimplify_expr to let build_aggr_init
know not to set L_O.  Because cp_gimplify_expr can't know in what context the
VEC_INIT_EXPR was created.
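
For readers less familiar with the distinction involved, the following sketch
shows it at the language level. It assumes that L_O (LOOKUP_ONLYCONVERTING)
restricts lookup to non-explicit constructors, the way copy-initialization
does; Foo is a hypothetical type, not the test case from the bug.

```cpp
#include <type_traits>

// Hypothetical type with an explicit copy constructor.
struct Foo {
    Foo() = default;
    explicit Foo(const Foo&) {}
};

// Direct-initialization, Foo b(a), considers explicit constructors.
constexpr bool direct_init_ok =
    std::is_constructible<Foo, const Foo&>::value;

// Copy-initialization, Foo b = a, considers only converting constructors.
constexpr bool copy_init_ok =
    std::is_convertible<const Foo&, Foo>::value;

static_assert(direct_init_ok, "explicit copy ctor is usable for direct-init");
static_assert(!copy_init_ok, "explicit copy ctor is skipped for copy-init");
```

An implicitly defined copy constructor direct-initializes each member, so the
element-wise copy of an array member should take the first path, not the
second.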

[Bug c++/98232] [9 Regression] ICE when compiling libreoffice

2021-02-05 Thread ht990332 at gmx dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98232

--- Comment #7 from Hussam Al-Tayeb  ---
The patch in bug 95719 fixes the ICE. Can you please backport it to the gcc-9
branch?
Also we need some methodology for followup patches so they are marked as
candidates for stable branches as well.

In this case https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0ddb93ce77374004 is
the initial patch which was applied to 10 and 9 branches.
The follow-up patch
https://gcc.gnu.org/g:554eb7d2e1ef5660d6a8e1c12ee1d751a70bbf31 was only applied
in gcc-10 branch but not gcc-9 branch.

[Bug lto/83997] ICE with alias template and attribute

2021-02-05 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83997

Jason Merrill  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jason at gcc dot gnu.org
 Status|NEW |ASSIGNED

[Bug libstdc++/98978] Consider packing _M_Engaged in the tail padding of T in optional<>

2021-02-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98978

--- Comment #2 from Jonathan Wakely  ---
If we were going to do this, we could also make std::optional<bool> occupy a
single byte, using one bit for the value and one for the engaged flag.

[Bug libstdc++/98978] Consider packing _M_Engaged in the tail padding of T in optional<>

2021-02-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98978

Jonathan Wakely  changed:

   What|Removed |Added

   Keywords||ABI

--- Comment #1 from Jonathan Wakely  ---
This would be an ABI break, and so not going to happen.

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #14 from Jakub Jelinek  ---
WIP that implements that.  Except that we need some permutation expansion
improvements, both for the SSE2 V4SImode permutation cases and for AVX2
V8SImode permutation cases.

--- gcc/config/i386/sse.md.jj   2021-02-05 14:32:44.175463716 +0100
+++ gcc/config/i386/sse.md  2021-02-05 18:49:29.621590903 +0100
@@ -12458,7 +12458,7 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "")])

-(define_insn "ashr3"
+(define_insn "ashr3"
   [(set (match_operand:VI248_AVX512BW_AVX512VL 0 "register_operand" "=v,v")
(ashiftrt:VI248_AVX512BW_AVX512VL
  (match_operand:VI248_AVX512BW_AVX512VL 1 "nonimmediate_operand"
"v,vm")
@@ -12472,6 +12472,125 @@
(const_string "0")))
(set_attr "mode" "")])

+(define_expand "ashr3"
+  [(set (match_operand:VI248_AVX512BW 0 "register_operand")
+   (ashiftrt:VI248_AVX512BW
+ (match_operand:VI248_AVX512BW 1 "nonimmediate_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_AVX512F")
+
+(define_expand "ashrv4di3"
+  [(set (match_operand:V4DI 0 "register_operand")
+   (ashiftrt:V4DI
+ (match_operand:V4DI 1 "nonimmediate_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_AVX2"
+{
+  if (!TARGET_AVX512VL)
+{
+  if (CONST_INT_P (operands[2]) && UINTVAL (operands[2]) >= 63)
+   {
+ rtx zero = force_reg (V4DImode, CONST0_RTX (V4DImode));
+ emit_insn (gen_avx2_gtv4di3 (operands[0], zero, operands[1]));
+ DONE;
+   }
+  if (operands[2] == const0_rtx)
+   {
+ emit_move_insn (operands[0], operands[1]);
+ DONE;
+   }
+  if (CONST_INT_P (operands[2]))
+   {
+ vec_perm_builder sel (8, 8, 1);
+ sel.quick_grow (8);
+ rtx arg0, arg1;
+ rtx op1 = lowpart_subreg (V8SImode, operands[1], V4DImode);
+ rtx target = gen_reg_rtx (V8SImode);
+ if (INTVAL (operands[2]) > 32)
+   {
+ arg0 = gen_reg_rtx (V8SImode);
+ arg1 = gen_reg_rtx (V8SImode);
+ emit_insn (gen_ashrv8si3 (arg1, op1, GEN_INT (31)));
+ emit_insn (gen_ashrv8si3 (arg0, op1,
+   GEN_INT (INTVAL (operands[2]) - 32)));
+ sel[0] = 1;
+ sel[1] = 9;
+ sel[2] = 3;
+ sel[3] = 11;
+ sel[4] = 5;
+ sel[5] = 13;
+ sel[6] = 7;
+ sel[7] = 15;
+   }
+ else if (INTVAL (operands[2]) == 32)
+   {
+ arg0 = op1;
+ arg1 = gen_reg_rtx (V8SImode);
+ emit_insn (gen_ashrv8si3 (arg1, op1, GEN_INT (31)));
+ sel[0] = 1;
+ sel[1] = 9;
+ sel[2] = 3;
+ sel[3] = 11;
+ sel[4] = 5;
+ sel[5] = 13;
+ sel[6] = 7;
+ sel[7] = 15;
+   }
+ else
+   {
+ arg0 = gen_reg_rtx (V2DImode);
+ arg1 = gen_reg_rtx (V4SImode);
+ emit_insn (gen_lshrv2di3 (arg0, operands[1], operands[2]));
+ emit_insn (gen_ashrv4si3 (arg1, op1, operands[2]));
+ arg0 = lowpart_subreg (V4SImode, arg0, V2DImode);
+ sel[0] = 0;
+ sel[1] = 9;
+ sel[2] = 2;
+ sel[3] = 11;
+ sel[4] = 4;
+ sel[5] = 13;
+ sel[6] = 6;
+ sel[7] = 15;
+   }
+ vec_perm_indices indices (sel, 2, 8);
+ bool ok = targetm.vectorize.vec_perm_const (V8SImode, target,
+ arg0, arg1, indices);
+ gcc_assert (ok);
+ emit_move_insn (operands[0],
+ lowpart_subreg (V4DImode, target, V8SImode));
+ DONE;
+   }
+
+  rtx zero = force_reg (V4DImode, CONST0_RTX (V4DImode));
+  rtx zero_or_all_ones = gen_reg_rtx (V4DImode);
+  emit_insn (gen_avx2_gtv4di3 (zero_or_all_ones, zero, operands[1]));
+  rtx lshr_res = gen_reg_rtx (V4DImode);
+  emit_insn (gen_lshrv4di3 (lshr_res, operands[1], operands[2]));
+  rtx ashl_res = gen_reg_rtx (V4DImode);
+  rtx amount;
+  if (TARGET_64BIT)
+   {
+ amount = gen_reg_rtx (DImode);
+ emit_insn (gen_subdi3 (amount, force_reg (DImode, GEN_INT (64)),
+operands[2]));
+   }
+  else
+   {
+ rtx temp = gen_reg_rtx (SImode);
+ emit_insn (gen_subsi3 (temp, force_reg (SImode, GEN_INT (64)),
+lowpart_subreg (SImode, operands[2],
+DImode)));
+ amount = gen_reg_rtx (V4SImode);
+ emit_insn (gen_vec_setv4si_0 (amount, CONST0_RTX (V4SImode),
+   temp));
+   }
+  amount = lowpart_subreg (DImode, amount, GET_MODE (amount));
+  emit_insn (gen_ashlv4di

[Bug libstdc++/98978] New: Consider packing _M_Engaged in the tail padding of T in optional<>

2021-02-05 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98978

Bug ID: 98978
   Summary: Consider packing _M_Engaged in the tail padding of T
in optional<>
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andysem at mail dot ru
  Target Milestone: ---

Using std::optional with some types may considerably increase object sizes
since it adds alignof(T) bytes worth of overhead. Sometimes it is possible to
avoid this overhead if the flag indicating presence of the stored value
(_M_Engaged in libstdc++ sources) is placed in the tail padding of the T
object. This can be done if std::optional constructs an object of a type that
derives from T, which has an additional bool data member that is initialized to
true upon construction. The below code roughly illustrates the idea:

template< typename T >
struct _Optional_payload_base
{
  struct _PresentT : T
  {
    const bool _M_Engaged = true;

    // Forwarding ctors and other members
  };

  static constexpr size_t engaged_offset = offsetof(_PresentT, _M_Engaged);

  struct _AbsentT
  {
    unsigned char _M_Offset[engaged_offset];
    const bool _M_Engaged = false;
  };

  union _Storage
  {
    _AbsentT _M_Empty;
    _PresentT _M_Value;

    _Storage() : _M_Empty() {}

    // Forwarding ctors and other members
  };

  _Storage _M_payload;

  bool is_engaged() const noexcept
  {
    return *reinterpret_cast< const bool* >(
      reinterpret_cast< const unsigned char* >(&_M_payload) + engaged_offset);
  }
};

The above relies on some implementation details, such as:

- offsetof works for the type T. It does for many types in gcc, beyond what is
required by the C++ standard. Maybe there is a way to avoid offsetof, I just
didn't immediately see it.
- The location of _M_Engaged in both _PresentT and _AbsentT is the same. This
is a property of the target ABI, and AFAICS it should be true at least on x86
psABI and I think Microsoft ABI.

The above will only work for non-final class types. For other types, and where
the above requirements don't hold true, the current code with a separate
_M_Engaged flag would work.
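
Whether the flag actually lands in the tail padding can be checked with a few
sizeof comparisons. The sketch below uses hypothetical types (Payload,
WithFlag, SeparateFlag) and makes no claim about any particular ABI. On
Itanium-ABI targets one would expect the derived layout to reuse the padding
of a non-POD base, but the code only reports what the compiler in use does.

```cpp
#include <cstddef>

// Hypothetical payload. The user-provided constructor makes it non-POD for
// layout purposes, and it typically ends with several bytes of tail padding.
struct Payload {
    Payload() {}
    long long big = 0;
    char small = 0;
};

// Flag placed in a derived class, as _PresentT does in the proposal.
struct WithFlag : Payload {
    bool engaged = true;
};

// Flag stored as a separate member next to the payload.
struct SeparateFlag {
    Payload value;
    bool engaged = true;
};

// True when the derived layout costs no extra space over the payload itself.
constexpr bool flag_fits_in_tail_padding() {
    return sizeof(WithFlag) == sizeof(Payload);
}

// Bytes saved by the derived layout compared with a separate member.
constexpr std::size_t bytes_saved() {
    return sizeof(SeparateFlag) - sizeof(WithFlag);
}
```

If flag_fits_in_tail_padding() is false for a given type or target, the
proposal's fallback to a separate flag applies.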

[Bug c++/93788] Segfault caused by infinite loop in cc1plus

2021-02-05 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93788

Marek Polacek  changed:

   What|Removed |Added

 CC||zhan3299 at purdue dot edu

--- Comment #3 from Marek Polacek  ---
*** Bug 98972 has been marked as a duplicate of this bug. ***

[Bug c++/98972] internal compiler error: Segmentation fault signal terminated program cc1plus

2021-02-05 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98972

Marek Polacek  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||mpolacek at gcc dot gnu.org
 Resolution|--- |DUPLICATE

--- Comment #4 from Marek Polacek  ---
Looks like a dup.

*** This bug has been marked as a duplicate of bug 93788 ***

[Bug c++/98972] internal compiler error: Segmentation fault signal terminated program cc1plus

2021-02-05 Thread zhan3299 at purdue dot edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98972

--- Comment #3 from Zhuo Zhang  ---
I reduced the test-case, and the simplest test-case should be:

--- crash1.cc starts ---
constexpr p([](register const signed struct s;
--- crash1.cc ends ---

The bug is also reproduced on the commit
8d0737d8f4b10bffe0411507ad2dc21ba7679883.

Hope it can help. Thanks.

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #13 from Jakub Jelinek  ---
Looking at what other compilers emit for this, ICC seems to be completely
broken, it emits logical right shifts instead of arithmetic right shift, and
LLVM trunk emits for >> 63 what this patch emits, for >> 17 it emits
  vpsrad    $17, %xmm0, %xmm1
  vpsrlq    $17, %xmm0, %xmm0
  vpblendd  $10, %xmm1, %xmm0, %xmm0
instead of
  vpxor     %xmm1, %xmm1, %xmm1
  vpcmpgtq  %xmm0, %xmm1, %xmm1
  vpsrlq    $17, %xmm0, %xmm0
  vpsllq    $47, %xmm1, %xmm1
  vpor      %xmm1, %xmm0, %xmm0
the patch emits.  For >> 47 it emits:
  vpsrad    $31, %xmm0, %xmm1
  vpsrad    $15, %xmm0, %xmm0
  vpshufd   $245, %xmm0, %xmm0
  vpblendd  $10, %xmm1, %xmm0, %xmm0
etc.
So, in summary, for >> 63 with SSE4.2 I think what the patch does looks best,
for >> 63 and SSE2 we can emit psrad $31 instead and permute the odd elements
into even ones (i.e. __builtin_shuffle ((v4si) x >> 31, { 1, 1, 3, 3 })).
For >> cst where cst < 32, do a psrad and psrlq by that cst and permute such
that we get the even SI elts from the psrlq result and odd from psrad result.
For >> 32, do a psrad $31 and permute to get the even SI elts from odd elts of
the source and odd SI elts from odd results of psrad $31.
For >> cst where cst > 32, do psrad $31 and psrad $(cst-32) and permute
such that even SI elts come from odd elts of the latter and odd elts come from
odd elts of the former.
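
The three constant-shift cases can be checked with a scalar model of a single
64-bit lane, treated as an even (low) and an odd (high) 32-bit element. This
is only a sketch of the arithmetic, not the expander code. The function names
are hypothetical, and it assumes that >> on a negative signed value is an
arithmetic shift, as it is with GCC and Clang.

```cpp
#include <cstdint>

// Models psrad on one 32-bit element (arithmetic shift assumed).
static std::int32_t sra32(std::uint32_t v, unsigned n) {
    return static_cast<std::int32_t>(v) >> n;
}

// 64-bit arithmetic right shift built from 32-bit pieces, for 0 <= n < 64.
std::int64_t ashr64_via_32(std::int64_t x, unsigned n) {
    std::uint64_t ux = static_cast<std::uint64_t>(x);
    std::uint32_t hi = static_cast<std::uint32_t>(ux >> 32);  // odd element
    std::uint32_t res_lo, res_hi;
    if (n < 32) {
        // Even element from the psrlq result, odd element from psrad.
        res_lo = static_cast<std::uint32_t>(ux >> n);
        res_hi = static_cast<std::uint32_t>(sra32(hi, n));
    } else if (n == 32) {
        // Even element is the source's odd element, odd is psrad $31.
        res_lo = hi;
        res_hi = static_cast<std::uint32_t>(sra32(hi, 31));
    } else {
        // Even element from psrad $(n-32), odd element from psrad $31.
        res_lo = static_cast<std::uint32_t>(sra32(hi, n - 32));
        res_hi = static_cast<std::uint32_t>(sra32(hi, 31));
    }
    return static_cast<std::int64_t>(
        (static_cast<std::uint64_t>(res_hi) << 32) | res_lo);
}

// Compares the model against the native 64-bit shift for every count.
bool matches_native(std::int64_t x) {
    for (unsigned n = 0; n < 64; ++n)
        if (ashr64_via_32(x, n) != (x >> n))
            return false;
    return true;
}
```

For a shift by 47, for instance, the model takes the low half from the high
element shifted by 15 and the high half from the sign mask, matching the
psrad $15 / psrad $31 pair in the LLVM output quoted above.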

[Bug c++/98947] [10 Regression] Incorrect warning when using a ternary operator to select one of two volatile variables to write to

2021-02-05 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98947

Marek Polacek  changed:

   What|Removed |Added

Summary|[10/11 Regression]  |[10 Regression] Incorrect
   |Incorrect warning when  |warning when using a
   |using a ternary operator to |ternary operator to select
   |select one of two volatile  |one of two volatile
   |variables to write to   |variables to write to

--- Comment #4 from Marek Polacek  ---
Fixed on trunk so far.

[Bug c++/98947] [10/11 Regression] Incorrect warning when using a ternary operator to select one of two volatile variables to write to

2021-02-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98947

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Marek Polacek :

https://gcc.gnu.org/g:7a18bc4ae62081021f4fd90d591a588cac931f77

commit r11-7126-g7a18bc4ae62081021f4fd90d591a588cac931f77
Author: Marek Polacek 
Date:   Wed Feb 3 17:57:22 2021 -0500

c++: Fix bogus -Wvolatile warning in C++20 [PR98947]

Since most of volatile is deprecated in C++20, we are required to warn
for compound assignments to volatile variables and so on.  But here we
have

  volatile int x, y, z;
  (b ? x : y) = 1;

and we shouldn't warn, because simple assignments like x = 24; should
not provoke the warning when they are a discarded-value expression.

We warn here because when ?: is used as an lvalue, we transform it in
cp_build_modify_expr/COND_EXPR from (a ? b : c) = rhs to

  (a ? (b = rhs) : (c = rhs))

and build_conditional_expr then calls mark_lvalue_use for the new
artificial assignments, which then evokes the warning.  The calls
to mark_lvalue_use were added in r160289 to suppress warnings in
Wunused-var-10.c, but looks like they're no longer needed.

To warn on

(b ? (x = 2) : y) = 1;
(b ? x : (y = 5)) = 1;

I've tweaked a check in mark_use/MODIFY_EXPR.

I'd argue this is a regression because GCC 9 doesn't warn.

gcc/cp/ChangeLog:

PR c++/98947
* call.c (build_conditional_expr_1): Don't call mark_lvalue_use
on arg2/arg3.
* expr.c (mark_use) <case MODIFY_EXPR>: Don't check read_p when
issuing the -Wvolatile warning.  Only set TREE_THIS_VOLATILE if
a warning was emitted.

gcc/testsuite/ChangeLog:

PR c++/98947
* g++.dg/cpp2a/volatile5.C: New test.
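
A minimal sketch of the construct the commit message talks about, written as a
small function so it can be compiled and run.  The function name store_one is
made up for illustration; the commit message only shows the two statements.
Per the report, this simple assignment through a conditional lvalue is the form
that should not trigger -Wvolatile, whereas a compound assignment such as
x += 1 on a volatile is the kind of use C++20 deprecates.

```cpp
volatile int x, y;

// Simple assignment through a conditional lvalue.  The commit message says
// the front end rewrites (b ? x : y) = 1 into (b ? (x = 1) : (y = 1)).
int store_one(bool b) {
    (b ? x : y) = 1;   // should not warn: simple assignment, value discarded
    return b ? x : y;  // read back whichever variable was selected
}
```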

[Bug c++/96462] [10 Regression] ICE in tree check: expected identifier_node, have bit_not_expr in find_namespace_slot, at cp/name-lookup.c:97

2021-02-05 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96462

Marek Polacek  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
Summary|[10/11 Regression] ICE in   |[10 Regression] ICE in
   |tree check: expected|tree check: expected
   |identifier_node, have   |identifier_node, have
   |bit_not_expr in |bit_not_expr in
   |find_namespace_slot, at |find_namespace_slot, at
   |cp/name-lookup.c:97 |cp/name-lookup.c:97

--- Comment #6 from Marek Polacek  ---
Fixed in GCC 11.

[Bug c++/96462] [10/11 Regression] ICE in tree check: expected identifier_node, have bit_not_expr in find_namespace_slot, at cp/name-lookup.c:97

2021-02-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96462

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Marek Polacek :

https://gcc.gnu.org/g:1cbc10d894494c34987d1f42f955e7843457ee38

commit r11-7125-g1cbc10d894494c34987d1f42f955e7843457ee38
Author: Marek Polacek 
Date:   Thu Feb 4 12:53:59 2021 -0500

c++: Fix ICE with invalid using enum [PR96462]

Here we ICE in finish_nonmember_using_decl -> lookup_using_decl ->
... -> find_namespace_slot because "name" is not an IDENTIFIER_NODE.
It is a BIT_NOT_EXPR because this broken test uses

  using E::~E; // SCOPE::NAME

A using-decl can't refer to a destructor, and lookup_using_decl already
checks that in the class member case.  But in C++17, we do the "enum
scope is the enclosing scope" block, and so scope gets set to ::, and
we go into the NAMESPACE_DECL block.  In C++20 we don't do it, we go
to the ENUMERAL_TYPE block.

I resorted to hoisting the check along with a diagnostic tweak: we
don't want to print "~E names destructor".

gcc/cp/ChangeLog:

PR c++/96462
* name-lookup.c (lookup_using_decl): Hoist the destructor check.

gcc/testsuite/ChangeLog:

PR c++/96462
* g++.dg/cpp2a/using-enum-8.C: New test.

[Bug target/98931] [11 Regression] arm: Assembly fails with "branch out of range or not a multiple of 2" since r11-2012

2021-02-05 Thread akrl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98931

--- Comment #12 from akrl at gcc dot gnu.org ---
Right, LE is 4 bytes. Good catch, thanks.

[Bug target/98931] [11 Regression] arm: Assembly fails with "branch out of range or not a multiple of 2" since r11-2012

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98931

--- Comment #11 from Jakub Jelinek  ---
Isn't the normal length of short le lr, 1b 4 bytes rather than 2?

[Bug tree-optimization/97236] [8 Regression] g:e93428a8b056aed83a7678 triggers vlc miscompile

2021-02-05 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97236

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

  Known to work||9.3.1
 CC||ktkachov at gcc dot gnu.org
Summary|[8/9 Regression]|[8 Regression]
   |g:e93428a8b056aed83a7678|g:e93428a8b056aed83a7678
   |triggers vlc miscompile |triggers vlc miscompile

--- Comment #14 from ktkachov at gcc dot gnu.org ---
Fixed for GCC 9.4

[Bug tree-optimization/97236] [8/9 Regression] g:e93428a8b056aed83a7678 triggers vlc miscompile

2021-02-05 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97236

--- Comment #13 from ktkachov at gcc dot gnu.org ---
*** Bug 98949 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/98949] gcc-9.3 aarch64 -ftree-vectorize generates wrong code

2021-02-05 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98949

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #5 from ktkachov at gcc dot gnu.org ---
Dup. The patch fixing PR 97236 has been backported to the GCC 9 branch for GCC
9.4

*** This bug has been marked as a duplicate of bug 97236 ***

[Bug tree-optimization/97236] [8/9 Regression] g:e93428a8b056aed83a7678 triggers vlc miscompile

2021-02-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97236

--- Comment #12 from CVS Commits  ---
The releases/gcc-9 branch has been updated by Kyrylo Tkachov:

https://gcc.gnu.org/g:97b668f9a8c6ec565c278a60e7d1492a6932e409

commit r9-9224-g97b668f9a8c6ec565c278a60e7d1492a6932e409
Author: Matthias Klose 
Date:   Tue Oct 6 13:41:37 2020 +0200

Backport fix for PR/tree-optimization/97236 - fix bad use of
VMAT_CONTIGUOUS

This avoids using VMAT_CONTIGUOUS with single-element interleaving
when using V1mode vectors.  Instead keep VMAT_ELEMENTWISE but
continue to avoid load-lanes and gathers.

2020-10-01  Richard Biener  

PR tree-optimization/97236
* tree-vect-stmts.c (get_group_load_store_type): Keep
VMAT_ELEMENTWISE for single-element vectors.

* gcc.dg/vect/pr97236.c: New testcase.

(cherry picked from commit 1ab88985631dd2c5a5e3b5c0dce47cf8b6ed2f82)

Re: [Bug target/98931] [11 Regression] arm: Assembly fails with "branch out of range or not a multiple of 2" since r11-2012

2021-02-05 Thread Andrea Corallo via Gcc-bugs
Following suggestions I'm testing the attached emitting the following
for long branches where LE cannot cover:

subs    lr, #1
bmi     .L2

From 0cd38cb29829b48f96e8e060e7a875f49236b67b Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Wed, 3 Feb 2021 15:21:54 +0100
Subject: [PATCH] arm: Add low overhead loop address range check [PR98931]

2021-02-05  Andrea Corallo  

* config/arm/thumb2.md: Generate alternative sequence for long
range branches.
---
 gcc/config/arm/thumb2.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index bd53bf320de..a8327066bfe 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1719,7 +1719,18 @@
   (set (reg:SI LR_REGNUM)
(plus:SI (reg:SI LR_REGNUM) (const_int -1)))])]
   "TARGET_32BIT && TARGET_HAVE_LOB"
-  "le\t%|lr, %l0")
+  "*
+  if (get_attr_length (insn) == 2)
+return \"le\\t%|lr, %l0\";
+  else
+return \"subs\\t%|lr, #1\;bmi\\t%l0\";
+  "
+  [(set (attr "length")
+(if_then_else
+(lt (minus (pc) (match_dup 0)) (const_int 1024))
+   (const_int 2)
+   (const_int 6)))
+   (set_attr "type" "branch")])
 
 (define_expand "doloop_begin"
   [(match_operand 0 "" "")
-- 
2.20.1



[Bug target/98931] [11 Regression] arm: Assembly fails with "branch out of range or not a multiple of 2" since r11-2012

2021-02-05 Thread andrea.corallo at arm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98931

--- Comment #10 from Andrea Corallo  ---
Following suggestions I'm testing the attached emitting the following
for long branches where LE cannot cover:

subs    lr, #1
bmi     .L2
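
The selection the posted patch makes can be sketched as a plain function.
This is an assumption-laden illustration only: LoopEnd and pick_loop_end are
hypothetical names, and the 1024 threshold is simply the constant from the
"length" attribute in the patch as posted, not a statement about the actual
range of the LE instruction.

```cpp
// Which loop-end sequence to emit (hypothetical enum for this sketch).
enum class LoopEnd { Le, SubsBmi };

// Mirrors (lt (minus (pc) (match_dup 0)) (const_int 1024)) from the patch:
// use the short "le lr, label" form when the label is close enough,
// otherwise fall back to "subs lr, #1; bmi label".
LoopEnd pick_loop_end(long pc, long label) {
    return (pc - label) < 1024 ? LoopEnd::Le : LoopEnd::SubsBmi;
}
```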

[Bug c++/95719] [10/11 Regression] ICE in lookup_vfn_in_binfo at gcc/cp/class.c:2459 since r11-954-g0ddb93ce77374004

2021-02-05 Thread ht990332 at gmx dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95719

Hussam Al-Tayeb  changed:

   What|Removed |Added

 CC||ht990332 at gmx dot com

--- Comment #6 from Hussam Al-Tayeb  ---
gcc-9 branch also has a backport of
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0ddb93ce77374004 which caused
the regression.
Can the fix for this bug be backported to the gcc-9 branch please?

[Bug sanitizer/98920] [10/11 Regression] uses regexec without support for REG_STARTEND with -fsanitize=address

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98920

--- Comment #10 from Jakub Jelinek  ---
Ugh, that is quite misdesigned then...

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #12 from Jakub Jelinek  ---
V4DImode arithmetic right shifts would be (untested):
--- gcc/config/i386/sse.md.jj   2021-02-05 14:32:44.175463716 +0100
+++ gcc/config/i386/sse.md  2021-02-05 15:24:37.942026401 +0100
@@ -12458,7 +12458,7 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "")])

-(define_insn "ashr3"
+(define_insn "ashr3"
   [(set (match_operand:VI248_AVX512BW_AVX512VL 0 "register_operand" "=v,v")
(ashiftrt:VI248_AVX512BW_AVX512VL
  (match_operand:VI248_AVX512BW_AVX512VL 1 "nonimmediate_operand"
"v,vm")
@@ -12472,6 +12472,67 @@
(const_string "0")))
(set_attr "mode" "")])

+(define_expand "ashr3"
+  [(set (match_operand:VI248_AVX512BW 0 "register_operand")
+   (ashiftrt:VI248_AVX512BW
+ (match_operand:VI248_AVX512BW 1 "nonimmediate_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_AVX512F")
+
+(define_expand "ashrv4di3"
+  [(set (match_operand:V4DI 0 "register_operand")
+   (ashiftrt:V4DI
+ (match_operand:V4DI 1 "nonimmediate_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_AVX2"
+{
+  if (!TARGET_AVX512VL)
+{
+  if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 63)
+   {
+ rtx zero = force_reg (V4DImode, CONST0_RTX (V4DImode));
+ emit_insn (gen_avx2_gtv4di3 (operands[0], zero, operands[1]));
+ DONE;
+   }
+  if (operands[2] == const0_rtx)
+   {
+ emit_move_insn (operands[0], operands[1]);
+ DONE;
+   }
+
+  rtx zero = force_reg (V4DImode, CONST0_RTX (V4DImode));
+  rtx zero_or_all_ones = gen_reg_rtx (V4DImode);
+  emit_insn (gen_avx2_gtv4di3 (zero_or_all_ones, zero, operands[1]));
+  rtx lshr_res = gen_reg_rtx (V4DImode);
+  emit_insn (gen_lshrv4di3 (lshr_res, operands[1], operands[2]));
+  rtx ashl_res = gen_reg_rtx (V4DImode);
+  rtx amount;
+  if (CONST_INT_P (operands[2]))
+   amount = GEN_INT (64 - INTVAL (operands[2]));
+  else if (TARGET_64BIT)
+   {
+ amount = gen_reg_rtx (DImode);
+ emit_insn (gen_subdi3 (amount, force_reg (DImode, GEN_INT (64)),
+operands[2]));
+   }
+  else
+   {
+ rtx temp = gen_reg_rtx (SImode);
+ emit_insn (gen_subsi3 (temp, force_reg (SImode, GEN_INT (64)),
+lowpart_subreg (SImode, operands[2],
+DImode)));
+ amount = gen_reg_rtx (V4SImode);
+ emit_insn (gen_vec_setv4si_0 (amount, CONST0_RTX (V4SImode),
+   temp));
+   }
+  if (!CONST_INT_P (operands[2]))
+   amount = lowpart_subreg (DImode, amount, GET_MODE (amount));
+  emit_insn (gen_ashlv4di3 (ashl_res, zero_or_all_ones, amount));
+  emit_insn (gen_iorv4di3 (operands[0], lshr_res, ashl_res));
+  DONE;
+}
+})
+
 (define_insn "3"
   [(set (match_operand:VI248_AVX512BW_2 0 "register_operand" "=v,v")
(any_lshift:VI248_AVX512BW_2

Trying 3 different routines, one returning >> 63 of a V4DImode vector, another
one >> 17 and another one >> var, the differences with -mavx2 are:
-   vextracti128    $0x1, %ymm0, %xmm1
-   vmovq   %xmm0, %rax
-   vpextrq $1, %xmm0, %rcx
-   cqto
-   vmovq   %xmm1, %rax
-   sarq    $63, %rcx
-   sarq    $63, %rax
-   vmovq   %rdx, %xmm3
-   movq    %rax, %rsi
-   vpextrq $1, %xmm1, %rax
-   vpinsrq $1, %rcx, %xmm3, %xmm0
-   sarq    $63, %rax
-   vmovq   %rsi, %xmm2
-   vpinsrq $1, %rax, %xmm2, %xmm1
-   vinserti128 $0x1, %xmm1, %ymm0, %ymm0
+   vmovdqa %ymm0, %ymm1
+   vpxor   %xmm0, %xmm0, %xmm0
+   vpcmpgtq    %ymm1, %ymm0, %ymm0

-   vmovq   %xmm0, %rax
-   vextracti128    $0x1, %ymm0, %xmm1
-   vpextrq $1, %xmm0, %rcx
-   sarq    $17, %rax
-   sarq    $17, %rcx
-   movq    %rax, %rdx
-   vmovq   %xmm1, %rax
-   sarq    $17, %rax
-   vmovq   %rdx, %xmm3
-   movq    %rax, %rsi
-   vpextrq $1, %xmm1, %rax
-   vpinsrq $1, %rcx, %xmm3, %xmm0
-   sarq    $17, %rax
-   vmovq   %rsi, %xmm2
-   vpinsrq $1, %rax, %xmm2, %xmm1
-   vinserti128 $0x1, %xmm1, %ymm0, %ymm0
+   vpxor   %xmm1, %xmm1, %xmm1
+   vpcmpgtq    %ymm0, %ymm1, %ymm1
+   vpsrlq  $17, %ymm0, %ymm0
+   vpsllq  $47, %ymm1, %ymm1
+   vpor    %ymm1, %ymm0, %ymm0

and

-   movl    %edi, %ecx
-   vmovq   %xmm0, %rax
-   vextracti128    $0x1, %ymm0, %xmm1
-   sarq    %cl, %rax
-   vpextrq $1, %xmm0, %rsi
-   movq    %rax, %rdx
-   vmovq   %xmm1, %rax
-   sarq    %cl, %rsi
-   sarq    %cl, %rax
-   vmovq   %rdx, %xmm3
-   movq    %rax, %rdi
-   vpextrq $1, %xmm1, %rax
-   vpinsrq $1, %rsi, %xmm3, %xmm0
-   sarq    %cl, %rax
+   vpxor   %xmm1, %xmm1, %xmm1
+   movslq  %edi,

[Bug c++/98232] [9 Regression] ICE when compiling libreoffice

2021-02-05 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98232

--- Comment #6 from Martin Liška  ---
(In reply to Hussam Al-Tayeb from comment #4)
> (In reply to Martin Liška from comment #3)
> > Run build system with in a verbose mode (V=1 or VERBOSE=1), or so.
> > And then for the problematic TU do -E, which will save pre-processed source
> > file instead of the object file.
> 
> Can you please tell me what to type for -E? And what is a TU?

You need to display the full command line with which vcl/workben/vcldemo.cxx is
compiled.
In order to do that, run: make V=1 VERBOSE=1

then take the command line and append '-E'.
Then attach the pre-processed source file, which will be in the '-o .o' file.

[Bug c++/98232] [9 Regression] ICE when compiling libreoffice

2021-02-05 Thread ht990332 at gmx dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98232

--- Comment #5 from Hussam Al-Tayeb  ---
I also found this https://bugzilla.redhat.com/show_bug.cgi?id=1858036

[Bug sanitizer/98920] [10/11 Regression] uses regexec without support for REG_STARTEND with -fsanitize=address

2021-02-05 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98920

--- Comment #9 from Florian Weimer  ---
(In reply to Jakub Jelinek from comment #8)
> Even if it does, exporting regexec@@GLIBC_2.3.4 from libsanitizer when glibc
> doesn't support that symbol looks wrong.

I think all the interceptors use unversioned symbols, so this doesn't matter.

Yes, it's quite broken, but when fixing this, you might as well go with the
flow …

[Bug c++/98232] [9 Regression] ICE when compiling libreoffice

2021-02-05 Thread ht990332 at gmx dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98232

--- Comment #4 from Hussam Al-Tayeb  ---
(In reply to Martin Liška from comment #3)
> Run build system with in a verbose mode (V=1 or VERBOSE=1), or so.
> And then for the problematic TU do -E, which will save pre-processed source
> file instead of the object file.

Can you please tell me what to type for -E? And what is a TU?

[Bug target/98977] New: [x86] Failure to optimize consecutive sub flags usage

2021-02-05 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98977

Bug ID: 98977
   Summary: [x86] Failure to optimize consecutive sub flags usage
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

extern bool z, c;

uint8_t f(uint8_t dest, uint8_t src)
{
    uint8_t res = dest - src;
    z = !res;
    c = src > dest;
    return res;
}

With -O3, LLVM outputs this:

f(unsigned char, unsigned char):
  mov eax, edi
  sub al, sil
  sete byte ptr [rip + z]
  setb byte ptr [rip + c]
  ret

GCC outputs this:

f(unsigned char, unsigned char):
  mov eax, edi
  sub al, sil
  sete BYTE PTR z[rip]
  cmp dil, sil
  setb BYTE PTR c[rip]
  ret

It seems desirable to eliminate the `cmp`, unless there's some weird flag stall
thing I'm not aware of.
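
For comparison, the same result and flags can be expressed with a single
overflow-checked subtraction.  This is only a sketch: f_builtin is a made-up
name, z and c are defined here rather than declared extern so the snippet is
self-contained, and __builtin_sub_overflow is a GCC/Clang builtin rather than
standard C++.  No claim is made about which instructions any particular
compiler version emits for it.

```cpp
#include <cstdint>

bool z, c;

// Subtract once and take the borrow from the builtin: it returns true when
// dest - src does not fit in uint8_t, which for unsigned operands is exactly
// src > dest.
std::uint8_t f_builtin(std::uint8_t dest, std::uint8_t src) {
    std::uint8_t res;
    c = __builtin_sub_overflow(dest, src, &res);
    z = !res;
    return res;
}
```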

[Bug analyzer/98969] [11 Regression] ICE: Segmentation fault (in print_mem_ref)

2021-02-05 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98969

David Malcolm  changed:

   What|Removed |Added

   Assignee|msebor at gcc dot gnu.org  |dmalcolm at gcc dot gnu.org

--- Comment #6 from David Malcolm  ---
Mine; the analyzer shouldn't ICE by constructing malformed trees.

Also, the leak diagnostic is arguably a false positive in that
  (struct TYPE_14__ *) _round_2_cb_n_0
is still effectively reachable by the caller after the function returns.

[Bug driver/98943] [11 Regression] gcc driver does not fail on unknown files: tricks configure scripts to recognize /W4 and -diag-disable 1,2,3,4 options

2021-02-05 Thread nathan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98943

Nathan Sidwell  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #10 from Nathan Sidwell  ---
 6606b852bfa 2021-02-04 | driver: error for nonexistent linker inputs [PR
98943]

I think that's sufficient, but please reopen if it is not.

[Bug driver/98943] [11 Regression] gcc driver does not fail on unknown files: tricks configure scripts to recognize /W4 and -diag-disable 1,2,3,4 options

2021-02-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98943

--- Comment #9 from CVS Commits  ---
The master branch has been updated by Nathan Sidwell :

https://gcc.gnu.org/g:6606b852bfa866c19375a7c5e9cb94776a28bd94

commit r11-7124-g6606b852bfa866c19375a7c5e9cb94776a28bd94
Author: Nathan Sidwell 
Date:   Thu Feb 4 08:16:17 2021 -0800

driver: error for nonexistent linker inputs [PR 98943]

We used to check all unknown input files, even when passing them to a
compiler.  But that caused problems.  However, not erroring out on
non-existent would-be-linker inputs confuses configure machinery that
probes the compiler to see if it accepts various inputs.  This
restores the access check for things that are thought to be linker
input files, when we're not linking.  (If we are linking, we presume
the linker will error out of its own accord.)

PR driver/98943
gcc/
* gcc.c (driver::maybe_run_linker): Check for input file
accessibility if not linking.
gcc/testsuite/
* c-c++-common/pr98943.c: New.

[Bug analyzer/98969] [11 Regression] ICE: Segmentation fault (in print_mem_ref)

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98969

--- Comment #5 from Jakub Jelinek  ---
Yeah, it seems the analyzer looked through the cast, so either it shouldn't, or
it needs to re-add the cast there.

As for print_mem_ref, if we wanted to protect it from bogus MEM_REF creation
(not sure whether we want to), the right change IMHO would be to set access_type
to NULL_TREE if TREE_TYPE (arg) doesn't satisfy POINTER_TYPE_P, and in the spots
that use access_type treat a NULL access_type as an unknown access type, e.g.
access_cast should be true if access_type is NULL, and char_cast should be true
too.

[Bug c++/98975] Infinite loop produces no assembly (including returning) with -O3

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98975

--- Comment #4 from Jakub Jelinek  ---
The only thing that should be fixed is whatever code invokes the UB.
There is no bug on the compiler side, you essentially end up with
__builtin_unreachable (); in place of the loop.
You can use -fsanitize=unreachable to get a runtime diagnostic instead when the
UB is turned into __builtin_unreachable ().

[Bug c++/98972] internal compiler error: Segmentation fault signal terminated program cc1plus

2021-02-05 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98972

Martin Liška  changed:

   What|Removed |Added

 Status|WAITING |NEW

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

Jakub Jelinek  changed:

   What|Removed |Added

 CC||uros at gcc dot gnu.org

--- Comment #11 from Jakub Jelinek  ---
For V2DImode arithmetic right shift, I think it would be something like:
--- gcc/config/i386/sse.md.jj   2021-01-27 11:50:09.168981297 +0100
+++ gcc/config/i386/sse.md  2021-02-05 14:32:44.175463716 +0100
@@ -20313,10 +20313,55 @@ (define_expand "ashrv2di3"
(ashiftrt:V2DI
  (match_operand:V2DI 1 "register_operand")
  (match_operand:DI 2 "nonmemory_operand")))]
-  "TARGET_XOP || TARGET_AVX512VL"
+  "TARGET_SSE4_2"
 {
   if (!TARGET_AVX512VL)
 {
+  if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 63)
+   {
+ rtx zero = force_reg (V2DImode, CONST0_RTX (V2DImode));
+ emit_insn (gen_sse4_2_gtv2di3 (operands[0], zero, operands[1]));
+ DONE;
+   }
+  if (operands[2] == const0_rtx)
+   {
+ emit_move_insn (operands[0], operands[1]);
+ DONE;
+   }
+  if (!TARGET_XOP)
+   {
+ rtx zero = force_reg (V2DImode, CONST0_RTX (V2DImode));
+ rtx zero_or_all_ones = gen_reg_rtx (V2DImode);
+ emit_insn (gen_sse4_2_gtv2di3 (zero_or_all_ones, zero, operands[1]));
+ rtx lshr_res = gen_reg_rtx (V2DImode);
+ emit_insn (gen_lshrv2di3 (lshr_res, operands[1], operands[2]));
+ rtx ashl_res = gen_reg_rtx (V2DImode);
+ rtx amount;
+ if (CONST_INT_P (operands[2]))
+   amount = GEN_INT (64 - INTVAL (operands[2]));
+ else if (TARGET_64BIT)
+   {
+ amount = gen_reg_rtx (DImode);
+ emit_insn (gen_subdi3 (amount, force_reg (DImode, GEN_INT (64)),
+operands[2]));
+   }
+ else
+   {
+ rtx temp = gen_reg_rtx (SImode);
+ emit_insn (gen_subsi3 (temp, force_reg (SImode, GEN_INT (64)),
+lowpart_subreg (SImode, operands[2],
+DImode)));
+ amount = gen_reg_rtx (V4SImode);
+ emit_insn (gen_vec_setv4si_0 (amount, CONST0_RTX (V4SImode),
+   temp));
+   }
+ if (!CONST_INT_P (operands[2]))
+   amount = lowpart_subreg (DImode, amount, GET_MODE (amount));
+ emit_insn (gen_ashlv2di3 (ashl_res, zero_or_all_ones, amount));
+ emit_insn (gen_iorv2di3 (operands[0], lshr_res, ashl_res));
+ DONE;
+   }
+
   rtx reg = gen_reg_rtx (V2DImode);
   rtx par;
   bool negate = false;
plus adjusting the cost computation to hint that at least the non-63 arithmetic
right V2DImode shifts are more expensive.

Even if in the end the V2DImode arithmetic right shifts turn out to be more
expensive than scalar code (though that would surprise me, at least for the
>> 63 case), I think V4DImode for TARGET_AVX2 should always be beneficial
(I haven't tried to adjust the expander for that yet).
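
What the non-XOP path of the proposed expander computes can be sketched per
64-bit lane in scalar C++.  The function name ashr64_via_mask is made up, the
local names follow the rtx variables in the patch, and the mapping of each line
to an instruction is an assumption based on the assembly quoted elsewhere in
this thread.  Shift counts are assumed to be in the range 0..63.

```cpp
#include <cstdint>

// Arithmetic right shift of one DI lane built from a sign mask, a logical
// right shift, a left shift and an OR (assumes 0 <= n <= 63).
std::uint64_t ashr64_via_mask(std::uint64_t x, unsigned n) {
    // Compare "0 > x" (pcmpgtq): all-ones when the lane is negative.
    std::uint64_t zero_or_all_ones = (x >> 63) ? ~std::uint64_t{0} : 0;
    if (n == 0) return x;                  // plain move
    if (n == 63) return zero_or_all_ones;  // the compare alone is the result
    std::uint64_t lshr_res = x >> n;                         // psrlq
    std::uint64_t ashl_res = zero_or_all_ones << (64 - n);   // psllq
    return lshr_res | ashl_res;                              // por
}
```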

[Bug c++/98975] Infinite loop produces no assembly (including returning) with -O3

2021-02-05 Thread e.meissner at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98975

--- Comment #3 from Emil Meissner  ---
(In reply to Jakub Jelinek from comment #2)
> And the bug is?  The code always invokes undefined behavior, so anything can
> happen.

Whilst that is true, shouldn't it still be fixed, given (possible) security
implications?

[Bug lto/98971] LTO removes __patchable_function_entries

2021-02-05 Thread gagomes at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98971

--- Comment #5 from Gabriel F. T. Gomes  ---
(In reply to Martin Liška from comment #4)
>
> Well, the intermediate object contains just LTO bytecode, that's why you
> can't see the section. You can use -ffat-lto-objects in order to generate
> both assembly and LTO bytecode.

Indeed. Thanks again! :)

[Bug c++/98976] New: [coroutines] co_return in a switch statement doesn't make a generic lambda non-constexpr

2021-02-05 Thread mail+gnu at tzik dot jp via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98976

Bug ID: 98976
   Summary: [coroutines] co_return in a switch statement doesn't
make a generic lambda non-constexpr
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mail+gnu at tzik dot jp
  Target Milestone: ---

In the repro case below, the lambda is wrongly treated as constexpr and its
co_return causes a compile error, as a coroutine cannot be constexpr.
https://wandbox.org/permlink/y4pEMCNki1ndzJYI

The gcc here is the trunk version as of today, and the command was:
$ g++ -c -std=c++20 failcase.cc
--- failcase.cc
#include <coroutine>

struct future {
  struct promise_type {
    std::suspend_always initial_suspend() noexcept { return {}; }
    std::suspend_always final_suspend() noexcept { return {}; }
    void unhandled_exception() {}
    future get_return_object() { return {}; }
    void return_void() {}
  };
};

void failcase() {
  auto foo = [](auto&&) -> future {
    switch (42) {
      case 42:
        co_return;
    }
  };
  foo(1);
}

The error message was:

prog.cc: In instantiation of 'failcase():: [with auto:1 =
int]':
prog.cc:20:9:   required from here
prog.cc:17:9: error: 'co_return' cannot be used in a 'constexpr' function
   17 | co_return;
  | ^

[Bug c++/98975] Infinite loop produces no assembly (including returning) with -O3

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98975

Jakub Jelinek  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 CC||jakub at gcc dot gnu.org
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Jakub Jelinek  ---
And the bug is?  The code always invokes undefined behavior, so anything can
happen.

[Bug c++/98975] Infinite loop produces no assembly (including returning) with -O3

2021-02-05 Thread e.meissner at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98975

--- Comment #1 from Emil Meissner  ---
The code in the attachment, compiled with `g++ file.cpp -o bug -O3 -std=c++20`,
produces no assembly for either the `main` or the `bsort` function (i.e. not
even a `ret` instruction), ultimately resulting in a segmentation fault when run.

The code has an intentional bug in it, where instead of comparing `j <
std::size(arr)` we compare `i < std::size(arr)`. I couldn't further simplify
the example.

Compiling with -O2 and -O1 produces the expected infinite loop.

I suspect this may be exploitable.
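
The attachment itself is not reproduced in this thread, so the following is
only a guessed reconstruction of the kind of loop described, with the inner
condition written correctly.  The name bsort comes from the report; the
exchange-sort body, the std::array parameter and the use of arr.size() instead
of std::size(arr) are assumptions made for this sketch.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Simple exchange sort.  The report describes an inner loop whose condition
// tested `i` instead of `j`; with that mistake `j` grows without bound and
// the array is presumably indexed out of range, which is undefined behavior.
template <std::size_t N>
void bsort(std::array<int, N>& arr) {
    for (std::size_t i = 0; i < arr.size(); ++i) {
        for (std::size_t j = i + 1; j < arr.size(); ++j) {  // correct: tests j
            if (arr[j] < arr[i]) std::swap(arr[i], arr[j]);
        }
    }
}
```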

[Bug c++/98972] internal compiler error: Segmentation fault signal terminated program cc1plus

2021-02-05 Thread zhan3299 at purdue dot edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98972

--- Comment #2 from Zhuo Zhang  ---
(In reply to Martin Liška from comment #1)
> Thank you for the report. Actually, it's an invalid code and we do have a
> lot of error recovery ICEs.
> Or do you have an original test-case that is a valid C++ code?

Hi, thanks for your prompt reply. I don't think I have a valid C++ test case,
as this one was generated by a fuzzer.

[Bug sanitizer/98920] [10/11 Regression] uses regexec without support for REG_STARTEND with -fsanitize=address

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98920

--- Comment #8 from Jakub Jelinek  ---
Even if it does, exporting regexec@@GLIBC_2.3.4 from libsanitizer when glibc
doesn't support that symbol looks wrong.

[Bug c++/98975] New: Infinite loop produces no assembly (including returning) with -O3

2021-02-05 Thread e.meissner at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98975

Bug ID: 98975
   Summary: Infinite loop produces no assembly (including
returning) with -O3
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e.meissner at seznam dot cz
  Target Milestone: ---

Created attachment 50134
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50134&action=edit
Code producing the bug

[Bug lto/98971] LTO removes __patchable_function_entries

2021-02-05 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98971

--- Comment #4 from Martin Liška  ---
> The only difference now is that the intermediate object doesn't have a
> __patchable_function_entries section, but that's OK as far as I can tell.

Well, the intermediate object contains just LTO bytecode, that's why you can't
see the section. You can use -ffat-lto-objects in order to generate both
assembly and LTO bytecode.

[Bug sanitizer/98920] [10/11 Regression] uses regexec without support for REG_STARTEND with -fsanitize=address

2021-02-05 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98920

Florian Weimer  changed:

   What|Removed |Added

 CC||fw at gcc dot gnu.org

--- Comment #7 from Florian Weimer  ---
I think libsanitizer falls back to a version-less lookup if the version cannot
be found. Therefore, if the glibc baseline is after 2.3.4, the version-less
lookup will find the unversioned symbol, which has the right behavior.

I don't see any architecture that has two regexec symbols, but does not use
GLIBC_2.3.4 for the most recent symbol, based on this command in the glibc
source tree:

git grep -c ' regexec F' | grep :2$ | cut -d: -f1 | xargs grep ' regexec F'

A comment in the interceptor might make sense, though.

[Bug lto/98971] LTO removes __patchable_function_entries

2021-02-05 Thread gagomes at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98971

--- Comment #3 from Gabriel F. T. Gomes  ---
(In reply to Martin Liška from comment #2)
>
> @Gabriel: Is it intended behavior?

That's what I expected, yes! Thank you.

The only difference now is that the intermediate object doesn't have a
__patchable_function_entries section, but that's OK as far as I can tell.

With:

$ gcc libtesta.c -fPIC -fpatchable-function-entry=4,2 -flto -c -o libtesta.o
$ gcc libtesta.o -flto -shared -o libtesta.so

now I get:

$ readelf --sections libtesta.o | grep __patchable
$ readelf --sections libtesta.so | grep __patchable
  [22] __patchable_[...] PROGBITS 4020  3020

Cheers!

[Bug sanitizer/98920] [10/11 Regression] uses regexec without support for REG_STARTEND with -fsanitize=address

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98920

--- Comment #6 from Jakub Jelinek  ---
Well, it is not about what arches you care about, but what arches we support in
libsanitizer/configure.tgt (from *-linux*).
So, riscv64, aarch64, mips, arm, s390*, sparc*, powerpc*, x86.
So it is desirable to get it right for all these.
Thus, I think you want to use GLIBC_2.3.4 for Linux except on arm, riscv*,
x86_64 -mx32, powerpc64le and aarch64.
So you need to look up SANITIZE* macros for all of these...

[Bug tree-optimization/98855] [11 Regression] botan XTEA is 100% slower on znver2 since r11-4428-g4a369d199bf2f34e

2021-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98855

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #10 from Richard Biener  ---
Fixed.

[Bug tree-optimization/98855] [11 Regression] botan XTEA is 100% slower on znver2 since r11-4428-g4a369d199bf2f34e

2021-02-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98855

--- Comment #9 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:63538886d1f7fc7cbf066b4c2d6d7fd4da537259

commit r11-7123-g63538886d1f7fc7cbf066b4c2d6d7fd4da537259
Author: Richard Biener 
Date:   Fri Feb 5 09:54:00 2021 +0100

tree-optimization/98855 - redo BB vectorization costing

The following attempts to account for the fact that BB vectorization
regions now can span multiple loop levels and that an unprofitable
inner loop vectorization shouldn't be offset by a profitable
outer loop vectorization to make it overall profitable.

For now I've implemented a heuristic based on the premise that
vectorization should be profitable even if loops may not be entered
or if they iterate any number of times.  Especially the first
assumption then requires that stmts directly belonging to loop A
need to be costed separately from stmts belonging to another loop
which also simplifies the implementation.

On x86 the added testcase has in the outer loop

t.c:38:20: note: Cost model analysis for part in loop 1:
  Vector cost: 56
  Scalar cost: 192

and the inner loop

t.c:38:20: note: Cost model analysis for part in loop 2:
  Vector cost: 132
  Scalar cost: 48

and thus the vectorization is considered not profitable
(note the same would happen in case the 2nd cost were for
a loop outer to the 1st costing).
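
As a rough illustration of the heuristic, and not the code in tree-vect-slp.c, the per-loop decision can be modelled as below. The struct and function names are hypothetical, and the comparison (a part is rejected when its vector cost exceeds its scalar cost) is an assumption about the exact threshold.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of the costs reported for the part of a BB
   vectorization region that belongs to one loop.  */
struct loop_part_cost {
    unsigned vector_cost;
    unsigned scalar_cost;
};

/* Per-loop costing: every part has to be profitable on its own.  */
bool region_profitable_per_loop(const struct loop_part_cost *parts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (parts[i].vector_cost > parts[i].scalar_cost)
            return false;
    return true;
}

/* For contrast: summing over the whole region lets a profitable loop
   offset an unprofitable one.  */
bool region_profitable_summed(const struct loop_part_cost *parts, size_t n)
{
    unsigned vector_cost = 0;
    unsigned scalar_cost = 0;
    for (size_t i = 0; i < n; i++) {
        vector_cost += parts[i].vector_cost;
        scalar_cost += parts[i].scalar_cost;
    }
    return vector_cost <= scalar_cost;
}
```

Fed with the two parts from the dump (56 vs. 192 and 132 vs. 48), the summed variant would accept the region (188 vs. 240) while the per-loop variant rejects it because of loop 2.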

Future enhancements may consider static knowledge of whether
a loop is always entered which would allow some inefficiency
in the vectorization of its loop header.  Likewise stmts only
reachable from a loop exit can be treated this way.

2021-02-05  Richard Biener  

PR tree-optimization/98855
* tree-vectorizer.h (add_stmt_cost): New overload.
* tree-vect-slp.c (li_cost_vec_cmp): New.
(vect_bb_slp_scalar_cost): Cost individual loop regions
separately.  Account for the scalar instance root stmt.

* g++.dg/vect/slp-pr98855.cc: New testcase.

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #10 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #9)
> For arithmetic >> (element_precision - 1) one can just use
> {,v}pxor + {,v}pcmpgtq, as in instead of return vec >> 63; do return vec < 0;
> (in C++-ish way), aka VEC_COND_EXPR vec < 0, { all ones }, { 0 }
> For other arithmetic shifts by scalar constant, perhaps one can replace
> return vec >> 17; with return (vectype) ((uvectype) vec >> 17) | ((vec < 0)
> << (64 - 17));
> - it will actually work even for non-constant scalar shift amounts because
> {,v}psllq treats shift counts > 63 as 0.

OK, so that yields

poly_double_le2:
.LFB0:
.cfi_startproc
vmovdqu (%rsi), %xmm0
vpxor   %xmm1, %xmm1, %xmm1
vpalignr $8, %xmm0, %xmm0, %xmm2
vpcmpgtq %xmm2, %xmm1, %xmm1
vpand   .LC0(%rip), %xmm1, %xmm1
vpsllq  $1, %xmm0, %xmm0
vpxor   %xmm1, %xmm0, %xmm0
vmovdqu %xmm0, (%rdi)
ret

when I feed the following to SLP2 directly:

void __GIMPLE (ssa,guessed_local(1073741824),startwith("slp"))
poly_double_le2 (unsigned char * out, const unsigned char * in)
{
  long unsigned int carry;
  long unsigned int _1;
  long unsigned int _2;
  long unsigned int _3;
  long unsigned int _4;
  long unsigned int _5;
  long unsigned int _6;
  __int128 unsigned _9;
  long unsigned int _14;
  long unsigned int _15;
  long int _18;
  long int _19;
  long unsigned int _20;

  __BB(2,guessed_local(1073741824)):
  _9 = __MEM <__int128 unsigned, 8> ((char *)in_8(D));
  _14 = __BIT_FIELD_REF  (_9, 64u, 64u);
  _18 = (long int) _14;
  _1 = _18 < 0l ? _Literal (unsigned long) -1ul : 0ul;
  carry_10 = _1 & 135ul;
  _2 = _14 << 1;
  _15 = __BIT_FIELD_REF  (_9, 64u, 0u);
  _19 = (long int) _15;
  _20 = _19 < 0l ? _Literal (unsigned long) -1ul : 0ul;
  _3 = _20 & 1ul;
  _4 = _2 ^ _3;
  _5 = _15 << 1;
  _6 = _5 ^ carry_10;
  __MEM  ((char *)out_11(D)) = _6;
  __MEM  ((char *)out_11(D) + _Literal (char *) 8) = _4;
  return;

}

with

   [local count: 1073741824]:
  _9 = MEM <__int128 unsigned> [(char *)in_8(D)];
  _12 = VIEW_CONVERT_EXPR(_9);
  _7 = VEC_PERM_EXPR <_12, _12, { 1, 0 }>;
  vect__18.1_25 = VIEW_CONVERT_EXPR(_7);
  vect_carry_10.3_28 = .VCOND (vect__18.1_25, { 0, 0 }, { 135, 1 }, { 0, 0 },
108);
  vect__5.0_13 = _12 << 1;
  vect__6.4_29 = vect__5.0_13 ^ vect_carry_10.3_28;
  MEM  [(char *)out_11(D)] = vect__6.4_29;
  return;

in .optimized

The latency of the data is at least 7 instructions that way, compared to
4 in the not vectorized code (guess I could try Intel iaca on it).

So if that's indeed the best we can do then it's not profitable (btw,
with the above the vectorizer's conclusion is also "not profitable", but
only due to excessive costing of constants for the condition vectorization).

Simple asm replacement of the kernel results in

AES-128/XTS 292740 key schedule/sec; 0.00 ms/op 11571 cycles/op (2 ops in 0 ms)
AES-128/XTS encrypt buffer size 1024 bytes: 765.571 MiB/sec 4.62 cycles/byte
(382.79 MiB in 500.00 ms)
AES-128/XTS decrypt buffer size 1024 bytes: 767.064 MiB/sec 4.61 cycles/byte
(382.79 MiB in 499.03 ms)

compared to

AES-128/XTS 283527 key schedule/sec; 0.00 ms/op 11932 cycles/op (2 ops in 0 ms)
AES-128/XTS encrypt buffer size 1024 bytes: 768.446 MiB/sec 4.60 cycles/byte
(384.22 MiB in 500.00 ms)
AES-128/XTS decrypt buffer size 1024 bytes: 769.292 MiB/sec 4.60 cycles/byte
(384.22 MiB in 499.45 ms)

so that's indeed no improvement.  Bigger block sizes also contain vector
code but that's not exercised by the botan speed measurement.

[Bug sanitizer/98920] [10/11 Regression] uses regexec without support for REG_STARTEND with -fsanitize=address

2021-02-05 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98920

--- Comment #5 from Martin Liška  ---
(In reply to Jakub Jelinek from comment #3)
> I'm not sure if your patch is correct.
> glibc has the system of earliest symbol versions, and so on certain
> architectures GLIBC_2.3.4 symver will not appear at all.
> Given:
> ./sysdeps/mach/hurd/shlib-versions:DEFAULT GLIBC_2.2.6
> ./sysdeps/unix/sysv/linux/csky/shlib-versions:DEFAULT GLIBC_2.29
> ./sysdeps/unix/sysv/linux/microblaze/shlib-versions:DEFAULT GLIBC_2.18
> ./sysdeps/unix/sysv/linux/arc/shlib-versions:DEFAULT GLIBC_2.32
> ./sysdeps/unix/sysv/linux/m68k/coldfire/shlib-versions:DEFAULT GLIBC_2.4
> ./sysdeps/unix/sysv/linux/arm/shlib-versions:DEFAULT GLIBC_2.4
> ./sysdeps/unix/sysv/linux/s390/s390-64/shlib-versions:DEFAULT GLIBC_2.2
> ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT GLIBC_2.27
> ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT GLIBC_2.27
> ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT GLIBC_2.33
> ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT GLIBC_2.33
> ./sysdeps/unix/sysv/linux/x86_64/x32/shlib-versions:# DEFAULT Earliest symbol set
> ./sysdeps/unix/sysv/linux/x86_64/x32/shlib-versions:DEFAULT GLIBC_2.16
> ./sysdeps/unix/sysv/linux/x86_64/64/shlib-versions:# DEFAULT Earliest symbol set
> ./sysdeps/unix/sysv/linux/x86_64/64/shlib-versions:DEFAULT GLIBC_2.2.5
> ./sysdeps/unix/sysv/linux/powerpc/powerpc64/shlib-versions:DEFAULT GLIBC_2.17
> ./sysdeps/unix/sysv/linux/powerpc/powerpc64/shlib-versions:DEFAULT GLIBC_2.3
> ./sysdeps/unix/sysv/linux/nios2/shlib-versions:DEFAULT GLIBC_2.21
> ./sysdeps/unix/sysv/linux/aarch64/shlib-versions:DEFAULT GLIBC_2.17
> and the limited list of arches supported by libsanitizer, I'd say at least
> riscv*, powerpc64le and aarch64 (and maybe x86-64 -mx32 if supported) are
> affected.

Thank you for this. You are right, my patch is not correct. So for the archs we
care about we should do:

x86_64 - require GLIBC_2.3.4
ppc64 - require GLIBC_2.3.4

aarch64, ppc64le, x32 and riscv* are newer than 2.3.4, so a default
non-versioned symbol should be fine.

Am I right?

[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass

2021-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973

--- Comment #8 from Richard Biener  ---
(In reply to Richard Biener from comment #6)
> Btw, -fgcse-sm is not enabled by default anywhere (same applies to
> -fgcse-las), we should consider removing these optimizations (though
> -fgcse-las at least sounds useful and I wonder why it is not enabled).
> GCSE store-motion should be re-implemented on GIMPLE, replacing the sink
> pass (there were previous attempts at implementing SSU-PRE).
> 
> A comment in store-motion.c claims
> 
> /* This pass implements downward store motion.
>As of May 1, 2009, the pass is not enabled by default on any target,
>but bootstrap completes on ia64 and x86_64 with the pass enabled.  */
> 
> I'm checking whether enabling it by default still bootstraps & tests OK on
> x86-64 (also enabling gcse-las at the same time).

It does.  Extra FAILs are

FAIL: c-c++-common/guality/Og-dce-2.c  -Og  line 17 ptr->a == 1
FAIL: c-c++-common/guality/Og-dce-2.c  -Og -flto line 17 ptr->a == 1

[Bug debug/98656] [9/10 Regression] switchlower_O0 drops line number of switch

2021-02-05 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98656

Martin Liška  changed:

   What|Removed |Added

  Known to work||11.0
Summary|[9/10/11 Regression]|[9/10 Regression]
   |switchlower_O0 drops line   |switchlower_O0 drops line
   |number of switch|number of switch

--- Comment #6 from Martin Liška  ---
Fixed on master so far.

[Bug debug/98656] [9/10/11 Regression] switchlower_O0 drops line number of switch

2021-02-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98656

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Martin Liska :

https://gcc.gnu.org/g:4ede02a5f2af1205434f0e05aaaeff762b24e329

commit r11-7122-g4ede02a5f2af1205434f0e05aaaeff762b24e329
Author: Tom de Vries 
Date:   Fri Feb 5 10:36:38 2021 +0100

debug: fix switch lowering debug info

gcc/ChangeLog:

PR debug/98656
* tree-switch-conversion.c (jump_table_cluster::emit): Add loc
argument.
(bit_test_cluster::emit): Reuse location_t for newly created
gswitch statement.
(switch_decision_tree::try_switch_expansion): Preserve
location_t.
* tree-switch-conversion.h: Change function signatures.

[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek  ---
Started with r11-4122-g06729598b0dc10dbe60545f21c2214ad66a5a3db

[Bug lto/98971] LTO removes __patchable_function_entries

2021-02-05 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98971

--- Comment #2 from Martin Liška  ---
Created attachment 50133
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50133&action=edit
Tentative patch

As seen, the flag -fpatchable-function-entry is properly marked as Optimization.
However, its argument is parsed early and stored into the following tuple:

; How many NOP insns to place at each function entry by default
Variable
HOST_WIDE_INT function_entry_patch_area_size

; And how far the real asm entry point is into this area
Variable
HOST_WIDE_INT function_entry_patch_area_start

That does not work with set_current_function where per-function arguments are
restored. My tentative patch fixes that.
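
For readers unfamiliar with the option: -fpatchable-function-entry=N,M asks for N NOPs per function, M of which are placed before the entry point. A small sketch of that arithmetic follows. The names are hypothetical, and treating M > N as invalid is an assumption about how the option is validated.

```c
#include <stdbool.h>

/* Hypothetical description of where the NOPs end up relative to the
   function's entry point.  */
struct patch_area_layout {
    unsigned nops_before_entry;
    unsigned nops_after_entry;
};

/* size corresponds to function_entry_patch_area_size (N),
   start to function_entry_patch_area_start (M).
   Returns false when start exceeds size (assumed to be invalid).  */
bool patch_area_layout(unsigned size, unsigned start,
                       struct patch_area_layout *out)
{
    if (start > size)
        return false;
    out->nops_before_entry = start;
    out->nops_after_entry = size - start;
    return true;
}
```

So a setting of 4,1 would give one NOP before the entry point and three after it, 10,5 would give five on each side, and 0,0 would give none.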

The following example works now:

$ cat pr98971.c
int
testa7(void)
{
  return 7;
}

int
__attribute__((patchable_function_entry(10,5)))
testa77(void)
{
  return 77;
}

#pragma GCC optimize("patchable-function-entry=0,0")

int
testa_no(void)
{
  return 1234;
}

$ cat pr98971-2.c
int
testa8(void)
{
  return 8;
}

$ gcc pr98971.c -fPIC -fpatchable-function-entry=4,1 -flto -c
$ gcc pr98971-2.c -fPIC -flto -c
$ gcc pr98971.o pr98971-2.o -flto -shared -o x.so
$ objdump -d x.so
...
0650 :
 650:   f3 0f 1e fa             endbr64
 654:   e9 77 ff ff ff          jmp    5d0
 659:   90                      nop

065a :
 65a:   90                      nop
 65b:   90                      nop
 65c:   90                      nop
 65d:   55                      push   %rbp
 65e:   48 89 e5                mov    %rsp,%rbp
 661:   b8 07 00 00 00          mov    $0x7,%eax
 666:   5d                      pop    %rbp
 667:   c3                      ret
 668:   90                      nop
 669:   90                      nop
 66a:   90                      nop
 66b:   90                      nop
 66c:   90                      nop

066d :
 66d:   90                      nop
 66e:   90                      nop
 66f:   90                      nop
 670:   90                      nop
 671:   90                      nop
 672:   55                      push   %rbp
 673:   48 89 e5                mov    %rsp,%rbp
 676:   b8 4d 00 00 00          mov    $0x4d,%eax
 67b:   5d                      pop    %rbp
 67c:   c3                      ret

067d :
 67d:   55                      push   %rbp
 67e:   48 89 e5                mov    %rsp,%rbp
 681:   b8 d2 04 00 00          mov    $0x4d2,%eax
 686:   5d                      pop    %rbp
 687:   c3                      ret

0688 :
 688:   55                      push   %rbp
 689:   48 89 e5                mov    %rsp,%rbp
 68c:   b8 08 00 00 00          mov    $0x8,%eax
 691:   5d                      pop    %rbp
 692:   c3                      ret

@Gabriel: Is it intended behavior?

[Bug middle-end/98974] [11 Regression] ICE in vectorizable_condition after STMT_VINFO_VEC_STMTS

2021-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98974

--- Comment #3 from Richard Biener  ---
(In reply to avieira from comment #1)
> The testcase above issues a warning, around do j=jts,enddo
> 
> To use it as a testcase in my patch I'd like to get rid of it so if someone
> proficient in Fortran knows a way to get rid of it that'd be great!

The following still reproduces the issue for me and is more valid.

module module_foobar
  integer,parameter :: fp_kind = selected_real_kind(15)
contains
 subroutine foobar( foo, ix ,jx ,kx,iy,ky)
   real, dimension( ix, kx, jx )  :: foo
   real(fp_kind), dimension( iy, ky, 3 ) :: bar, baz
   do k=1,ky
  do i=1,iy
if ( baz(i,k,1) > 0. ) then
  bar(i,k,1) = 0
endif
foo(i,nk,j) = baz0 *  bar(i,k,1)
  enddo
   enddo
 end
end

[Bug rtl-optimization/98782] [11 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2021-02-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

Tamar Christina  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org
Summary|IRA artificially creating   |[11 Regression] Bad
   |spills due to BB|interaction between IPA
   |frequencies |frequences and IRA
   ||resulting in spills due to
   ||changes in BB frequencies

--- Comment #3 from Tamar Christina  ---
Hi,

Since we are in stage-4 I'd like to put all our ducks in a row and see what the
options are at this point.

IRA, as you can imagine, is huge and quite complex; the more I investigate the
problem the more I realize that there isn't a spot fix for this issue.  It will
require a lot more work in IRA and understanding parts of it that I don't fully
understand yet.

But one thing is clear: there is a severe interaction between IPA predictions
and IRA under conditions where there is high register pressure *and* a function
call.

The problem is that the changes introduced in
g:1118a3ff9d3ad6a64bba25dc01e7703325e23d92 make local changes, i.e. they affect
only some BBs and not others, whereas any spot fix in IRA would be globally
scoped.

I was investigating whether the issue could be solved by having IRA treat the
recursive inlined function in exchange2 as one region instead of doing live
range splitting.  And yes, using -fira-region=one does make a difference, but
only a small one: it recovers about 33% of the regression.  However, doing this
has the disadvantage that regions that previously would not count in the live
range of the call are now counted, so you regress spilling in those cases.
This is why this flag can only recover 33% of the regression; it introduces
some of its own.

The second alternative I tried as a spot fix is to be able to specify a weight
for the CALL_FREQ for use during situations of high reg pressure and call live
ranges.  The "hack" looks like this:

index 4fe019b2367..674e6ca7a48 100644
--- a/gcc/caller-save.c
+++ b/gcc/caller-save.c
@@ -425,6 +425,7 @@ setup_save_areas (void)
  || find_reg_note (insn, REG_NORETURN, NULL))
continue;
   freq = REG_FREQ_FROM_BB (BLOCK_FOR_INSN (insn));
+  freq = freq * (param_ira_call_freq_weight / 100.f);
   REG_SET_TO_HARD_REG_SET (hard_regs_to_save,
   &chain->live_throughout);
   used_regs = insn_callee_abi (insn).full_reg_clobbers ();
diff --git a/gcc/ira-lives.c b/gcc/ira-lives.c
index 4ba29dcadf4..6e2699e5a7d 100644
--- a/gcc/ira-lives.c
+++ b/gcc/ira-lives.c
@@ -1392,7 +1392,7 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
   it was saved on previous call in the same basic
   block and the hard register was not mentioned
   between the two calls.  */
-   ALLOCNO_CALL_FREQ (a) += freq / 3;
+   ALLOCNO_CALL_FREQ (a) += (freq * (param_ira_call_freq_weight / 100.0f));
diff --git a/gcc/params.opt b/gcc/params.opt
index cfed980a4d2..39d5cae9f31 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -321,6 +321,11 @@ Max size of conflict table in MB.
 Common Joined UInteger Var(param_ira_max_loops_num) Init(100) Param Optimization
 Max loops number for regional RA.

+-param=ira-call-freq-weight=
+Common Joined UInteger Var(param_ira_call_freq_weight) Init(100) Param Optimization
+Scale to be applied to the weighting of the frequencies of allocations live across
+a call.
+
 -param=iv-always-prune-cand-set-bound=
 Common Joined UInteger Var(param_iv_always_prune_cand_set_bound) Init(10) Param Optimization
 If number of candidates in the set is smaller, we always try to remove unused ivs during its optimization.

And if we look at the changes in the frequency between the good and bad case,
the prediction changes by approx. 40%.  So using --param
ira-call-freq-weight=40 recovers about 60% of the regression.  The issue this
global change introduces, however, is that IRA seems to start preferring
callee-saves.  That is not in itself an issue, but at the boundary of a region
it will then emit moves from temp to callee-saves to carry live values to the
next region.
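
To make the scaling concrete, here is a stand-alone sketch of the arithmetic in the hack. The helper names are hypothetical and the function is only a model of the expressions in the diff, not GCC code.

```c
/* Scale a call frequency by a percentage weight, mirroring
   freq * (param_ira_call_freq_weight / 100.f) from the diff.  */
int scale_call_freq(int freq, unsigned weight_percent)
{
    return (int)(freq * (weight_percent / 100.0f));
}

/* Increment added to ALLOCNO_CALL_FREQ in ira-lives.c before the hack.  */
int call_freq_increment_original(int freq)
{
    return freq / 3;
}

/* Increment added to ALLOCNO_CALL_FREQ in ira-lives.c with the hack.  */
int call_freq_increment_hack(int freq, unsigned weight_percent)
{
    return scale_call_freq(freq, weight_percent);
}
```

With a frequency of 1000, a weight of 40 yields 400 and the default of 100 yields 1000. Note that, as the diff is quoted here, the ira-lives.c site previously added freq / 3, so the default weight of 100 does not reproduce the old increment at that site.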

This is completely unneeded; enabling the late register renaming pass
(-frename-registers) removes these superfluous moves and we recover 66% of the
regression.  But this is just a big hack.  The obvious disadvantage here, since
again it's a global change, is that it pushes all caller saves to be spilled
before the function call.  And indeed, before the recursive call there now is a
massive amount of spilling happening.

But it is something that would be "safe" to do at this point in the GCC
development cycle.

The last and preferred approach, if you a

[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass

2021-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973

--- Comment #6 from Richard Biener  ---
Btw, -fgcse-sm is not enabled by default anywhere (same applies to -fgcse-las),
we should consider removing these optimizations (though -fgcse-las at least
sounds useful and I wonder why it is not enabled).  GCSE store-motion should be
re-implemented on GIMPLE, replacing the sink pass (there were previous
attempts at implementing SSU-PRE).

A comment in store-motion.c claims

/* This pass implements downward store motion.
   As of May 1, 2009, the pass is not enabled by default on any target,
   but bootstrap completes on ia64 and x86_64 with the pass enabled.  */

I'm checking whether enabling it by default still bootstraps & tests OK on
x86-64 (also enabling gcse-las at the same time).

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek  ---
For arithmetic >> (element_precision - 1) one can just use
{,v}pxor + {,v}pcmpgtq, as in instead of return vec >> 63; do return vec < 0;
(in C++-ish way), aka VEC_COND_EXPR vec < 0, { all ones }, { 0 }
For other arithmetic shifts by scalar constant, perhaps one can replace
return vec >> 17; with return (vectype) ((uvectype) vec >> 17) | ((vec < 0) <<
(64 - 17));
- it will actually work even for non-constant scalar shift amounts because
{,v}psllq treats shift counts > 63 as 0.
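
The two identities can be checked on scalars. The sketch below works on a single 64-bit element instead of a vector, uses hypothetical function names, and restricts the shift count to 1..63 because shifting a 64-bit value by 64 is undefined in C (the psllq behaviour mentioned above has no scalar counterpart).

```c
#include <stdint.h>

/* All-ones when x is negative, zero otherwise: the scalar analogue of
   the vector comparison vec < 0.  */
uint64_t sign_mask(int64_t x)
{
    return x < 0 ? UINT64_MAX : 0;
}

/* x >> 63 (arithmetic) expressed as a comparison.  */
int64_t sra63_via_compare(int64_t x)
{
    return x < 0 ? -1 : 0;
}

/* Arithmetic x >> n for 1 <= n <= 63, built from a logical shift plus
   the sign mask shifted into the vacated high bits.  */
uint64_t sra_via_logical(int64_t x, unsigned n)
{
    uint64_t logical = (uint64_t)x >> n;
    uint64_t high = sign_mask(x) << (64 - n);
    return logical | high;
}

/* Reference arithmetic shift that avoids right-shifting a negative
   signed value: for negative x, ~x is non-negative.  */
uint64_t sra_reference(int64_t x, unsigned n)
{
    uint64_t u = (uint64_t)x;
    return x < 0 ? ~(~u >> n) : u >> n;
}
```

For every count in 1..63, sra_via_logical should agree with sra_reference, and sra63_via_compare should agree with the reference at count 63.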

[Bug tree-optimization/98932] Wrong output with -O3 on aarch64

2021-02-05 Thread kristian.klausen at scoutdi dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98932

--- Comment #7 from Kristian  ---
Thanks!

Yes, I agree. We are however bound to earlier versions due to CUDA-dependency
on the NVIDIA Jetson-platforms:
https://github.com/OE4T/meta-tegra/wiki/Compatibility-notes

Hopefully NVIDIA will update their CUDA-libraries in due time. 

Best,
Kristian

[Bug fortran/98890] ICE on reference to module function

2021-02-05 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98890

Tobias Burnus  changed:

   What|Removed |Added

 CC||burnus at gcc dot gnu.org
   Keywords||ice-on-invalid-code

--- Comment #3 from Tobias Burnus  ---
Likewise for the following, which uses an assignment:

  implicit none
contains
  real function bar(x)
real :: x(2,2)
bar = bar  ! OK
bar = baz  ! ERROR: function name not reference
bar = get_funptr() ! ERROR: proc-pointer returning function

bar = bar * x(1,1) ! OK
bar = baz * x(1,1) ! error - as above but as operator
bar = get_funptr() * x(1,1) ! likewise
  end function bar  
  function get_funptr() result(ptr)
procedure(bar), pointer :: ptr
ptr => bar
  end
  real function baz(x) result(bazr)
real :: x(2,2)
bazr=x(1,1)
  end function baz  
end module foo

  * * *

I am not sure whether the problem is that expr_type == EXPR_VARIABLE instead of
expr_type == EXPR_FUNCTION, or whether the proper fix is to add, inside both
resolve_ordinary_assign() and resolve_operator(), a check like:

  symbol_attribute rhs_attr = gfc_expr_attr (rhs);
  if (rhs_attr.function && ...)
{
  gfc_error ("Unexpected function name at %L", &rhs->where);
  return false;
}
  if (rhs_attr.proc_pointer)
{
  gfc_error ("Unexpected procedure pointer at %L", &rhs->where);
  return false;
}

where "..." detects that the rhs may be used as result name in this context.

This check always confuses me. And a quick try failed:

I tried rhs_attr – but it is identical for 'bar' and 'baz'; and also
'sym->result = sym' is the same (if changing 'baz' to use no result variable).
I also thought about the namespace but thanks to BLOCK and contained procedures
(which may access their parent's result variable) it is not that simple.

 * * *

I have not checked but, e.g., for 'call foo(baz)' a similar issue may pop up.
Probably not affected, but still to check: proc_pointer_comp (should be
resolved already at parse time?) and derived-type procedures returning proc
pointers (same check as for other functions).

[Bug tree-optimization/98932] Wrong output with -O3 on aarch64

2021-02-05 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98932

--- Comment #6 from Martin Liška  ---
(In reply to Kristian from comment #5)
> Thanks for such a swift response! Looking forward to testing the patches for
> 8.x.

You're welcome. Just a note, please try to use the latest GCC release (version
10). GCC 8 will go out of support quite soon, and with a newer release you are
more likely to receive backports for serious bugs.

[Bug tree-optimization/98932] Wrong output with -O3 on aarch64

2021-02-05 Thread kristian.klausen at scoutdi dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98932

--- Comment #5 from Kristian  ---
Thanks for such a swift response! Looking forward to testing the patches for
8.x.

[Bug testsuite/98325] [11 regression] gcc.dg/pr25376.c fails after r11-5027

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98325

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org
 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Jakub Jelinek  ---
Fixed, verified in gcc-testresults archive too.

[Bug middle-end/98974] [11 Regression] ICE in vectorizable_condition after STMT_VINFO_VEC_STMTS

2021-02-05 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98974

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2021-02-05
  Known to fail||11.0
 Status|UNCONFIRMED |NEW
 CC||ktkachov at gcc dot gnu.org
Summary|ICE in  |[11 Regression] ICE in
   |vectorizable_condition  |vectorizable_condition
   |after STMT_VINFO_VEC_STMTS  |after STMT_VINFO_VEC_STMTS
   Priority|P3  |P1
   Target Milestone|--- |11.0
 Ever confirmed|0   |1
 Target||aarch64
  Known to work||10.2.1

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Confirmed. This affects building 521.wrf_r from SPEC2017 with LTO

[Bug tree-optimization/98949] gcc-9.3 aarch64 -ftree-vectorize generates wrong code

2021-02-05 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98949

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #4 from ktkachov at gcc dot gnu.org ---
I can confirm that the commit
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=1ab88985631dd2c5a5e3b5c0dce47cf8b6ed2f82
from PR97236 fixes the abort here.

[Bug middle-end/98974] ICE in vectorizable_condition after STMT_VINFO_VEC_STMTS

2021-02-05 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98974

--- Comment #1 from avieira at gcc dot gnu.org ---
The testcase above issues a warning, around do j=jts,enddo

To use it as a testcase in my patch I'd like to get rid of it so if someone
proficient in Fortran knows a way to get rid of it that'd be great!

[Bug middle-end/98974] New: ICE in vectorizable_condition after STMT_VINFO_VEC_STMTS

2021-02-05 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98974

Bug ID: 98974
   Summary: ICE in vectorizable_condition after
STMT_VINFO_VEC_STMTS
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

Hi,

After
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b05d5563f4be13b4a0d0951375a82adf483973c0
we found vectorizable_condition to ICE when autovectorizing for SVE.

The reduced fortran testcase is an example of this:
$ cat foo.F90
 module module_foobar
  integer,parameter :: fp_kind = selected_real_kind(15)
   contains
   subroutine foobar( foo, ix ,jx ,kx,iy,ky)
 real, dimension( ix, kx, jx )  :: foo
 real(fp_kind), dimension( iy, ky, 3 ) :: bar, baz
   j_loop: do j=jts,enddo
   do k=0,ky
  do i=0,iy
if ( baz(i,k,1) > 0. ) then
  bar(i,k,1) = 0
endif
foo(i,nk,j) = baz0 *  bar(i,k,1)
  enddo
   enddo
   enddo j_loop
 end
end

And the following command will cause it to ICE:
$ gfortran  -Ofast -mcpu=neoverse-v1 foo.F90 -S

I have debugged this and I believe the issue is that before Richi's change
vectorizable_condition used to set vec_oprnds0 to vec_cond_lhs for each copy.
Now it is collected for all copies at the same time. However, when calling
vect_get_loop_mask we pass vec_num * ncopies as the nvectors parameter, where
vec_num has been set to the length of vec_oprnds0. I believe that because we
are now doing all ncopies at the same time we no longer need to multiply it by
ncopies.
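
The suspected miscount can be modelled with plain arithmetic. This is only a sketch of the hypothesis described above, with hypothetical function names; it does not reproduce the vectorizer code.

```c
/* Mask count as requested today: vec_num * ncopies.  */
unsigned nvectors_multiplied(unsigned vec_num, unsigned ncopies)
{
    return vec_num * ncopies;
}

/* Mask count under the proposed fix: vec_num already covers all copies.  */
unsigned nvectors_direct(unsigned vec_num)
{
    return vec_num;
}

/* Length of vec_oprnds0 when the operands of all copies are collected
   at once (assumption: the same number of vectors for each copy).  */
unsigned vec_num_all_copies(unsigned vectors_per_copy, unsigned ncopies)
{
    return vectors_per_copy * ncopies;
}
```

With 2 vectors per copy and ncopies = 2, the old per-copy scheme asks for 2 * 2 = 4 masks. Once vec_oprnds0 holds all copies, vec_num is already 4, so multiplying again would ask for 8, while passing vec_num directly gives the expected 4.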

I'll be posting a patch for this soon.

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-02-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #8 from Richard Biener  ---
Exploring more options I noticed there's no arithmetic vector V2DI right shift,
so vectorizing

  uint64_t carry = (uint64_t)(((int64_t)W[1]) >> 63) & (uint64_t)135;
  W[1] = (W[1] << 1) ^ ((uint64_t)(((int64_t)W[0]) >> 63) & (uint64_t)1);
  W[0] = (W[0] << 1) ^ carry;

didn't work out.  But V2DI >> CST with CST > 31 can be implemented with
VPSRAD and then doing PMOVSXDQ after shuffling the high shifted part into
low position.

Maybe there's something more clever for the special case of >> 63 even.
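
For reference, the three statements above form a self-contained kernel once wrapped in a function. In the sketch below the function name is hypothetical, and the arithmetic shifts by 63 are written as 0 - (w >> 63), which produces the same all-ones or zero mask without shifting a signed value.

```c
#include <stdint.h>

/* Double a 128-bit value held as two little-endian 64-bit words
   (W[0] low, W[1] high), folding the bit shifted out of the top back
   in with the constant 135 used in the snippet above.  */
void poly_double_words(uint64_t W[2])
{
    /* Equivalent to (uint64_t)((int64_t)w >> 63): all-ones if the top
       bit of w is set, zero otherwise.  */
    uint64_t hi_mask = (uint64_t)0 - (W[1] >> 63);
    uint64_t lo_mask = (uint64_t)0 - (W[0] >> 63);

    uint64_t carry = hi_mask & (uint64_t)135;
    W[1] = (W[1] << 1) ^ (lo_mask & (uint64_t)1);
    W[0] = (W[0] << 1) ^ carry;
}
```

Both masks are computed from the old values of W[0] and W[1] before either word is updated, which matches the ordering of the original statements.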

As said, I'm just checking whether "optimal" vectorization of the kernel would
solve the issue.  But I guess pipelines are wide enough so the original scalar
code effectively executes "vectorized".

[Bug middle-end/98465] [11 Regression] Bogus -Wstringop-overread with -std=gnu++20 -O2 and std::string::insert

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98465

--- Comment #28 from Jakub Jelinek  ---
Actually tested version.

The above testcase with [2, INF] range doesn't really make much sense,
but an adjusted testcase where n has [0, 2] range doesn't warn anymore, like
the one with constant 2.

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index b57ff339990..69336a32bc6 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -731,6 +731,10 @@ namespace std
 # define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
 #endif

+#if _GLIBCXX_HAS_BUILTIN(__builtin_object_size)
+# define _GLIBCXX_HAVE_BUILTIN_OBJECT_SIZE 1
+#endif
+
 #undef _GLIBCXX_HAS_BUILTIN

 #if _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED && __cplusplus >= 201402L
diff --git a/libstdc++-v3/include/bits/basic_string.tcc b/libstdc++-v3/include/bits/basic_string.tcc
index 5beda8b829b..bc6e0b98186 100644
--- a/libstdc++-v3/include/bits/basic_string.tcc
+++ b/libstdc++-v3/include/bits/basic_string.tcc
@@ -477,7 +477,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  if (__s + __len2 <= __p + __len1)
this->_S_move(__p, __s, __len2);
+#if defined(_GLIBCXX_HAVE_BUILTIN_OBJECT_SIZE) && defined(__OPTIMIZE__)
+ /* Help the optimizers rule out impossible cases and
+get rid of false positive warnings at the same time.
+If we know the maximum size of the __s object and
+it is shorter than 2 * __len2 - __len1, then
+__s >= __p + __len1 case is impossible.  */
+ else if (!(__builtin_constant_p(__builtin_object_size(__s, 0)
+ < ((2 * __len2 - __len1)
+* sizeof(_CharT)))
+&& (__builtin_object_size(__s, 0)
+< (2 * __len2 - __len1) * sizeof(_CharT)))
+  && __s >= __p + __len1)
+#else
  else if (__s >= __p + __len1)
+#endif
this->_S_copy(__p, __s + __len2 - __len1, __len2);
  else
{
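
The guard added by the patch can be read as a small predicate. The sketch below models it with ordinary parameters: `comparison_is_constant` stands in for the result of __builtin_constant_p and `object_size` for __builtin_object_size(__s, 0), which yields (size_t)-1 when the size is unknown. The function name is hypothetical, and the sketch assumes 2 * len2 >= len1 so the unsigned subtraction does not wrap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true when the "__s >= __p + __len1" branch still has to be
   considered, false when the known object size rules it out.  */
bool copy_branch_possible(bool comparison_is_constant, size_t object_size,
                          size_t len1, size_t len2, size_t char_size)
{
    size_t needed = (2 * len2 - len1) * char_size;
    bool provably_too_small = comparison_is_constant && object_size < needed;
    return !provably_too_small;
}
```

When the object size is unknown, object_size is the maximum size_t value, the comparison is false, and the branch is kept, so the original behaviour is preserved in that case.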

[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass

2021-02-05 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973

--- Comment #5 from Andreas Krebbel  ---
Created attachment 50132
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50132&action=edit
RTL dump from store motion pass

[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass

2021-02-05 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973

--- Comment #4 from Andreas Krebbel  ---
The update of global variable c is moved out of the loop. Due to that c stays
at 8 although it should be counted down to 2.

[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass

2021-02-05 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973

--- Comment #3 from Andreas Krebbel  ---
Created attachment 50131
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50131&action=edit
RTL GCSE dump without -fgcse-sm

[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass

2021-02-05 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973

--- Comment #2 from Andreas Krebbel  ---
Created attachment 50130
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50130&action=edit
RTL GCSE dump with -fgcse-sm

[Bug debug/98656] [9/10/11 Regression] switchlower_O0 drops line number of switch

2021-02-05 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98656

Martin Liška  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |marxin at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #4 from Martin Liška  ---
I'm going to test the patch and install it.

[Bug target/98957] [11 Regression] [x86] Odd code generation for 8-bit right shift

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98957

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Jakub Jelinek  ---
Fixed.

[Bug c++/98967] warning to spot recursive include graph

2021-02-05 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98967

Eric Gallager  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96842
 CC||egallager at gcc dot gnu.org

--- Comment #1 from Eric Gallager  ---
Fixing bug 96842 would also help here (not exactly the same thing, but serves a
similar purpose)

[Bug target/98957] [11 Regression] [x86] Odd code generation for 8-bit right shift

2021-02-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98957

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:37876976b0511ec96741f638f160874f2added0e

commit r11-7121-g37876976b0511ec96741f638f160874f2added0e
Author: Jakub Jelinek 
Date:   Fri Feb 5 10:39:03 2021 +0100

i386: Fix up TARGET_QIMODE_MATH for many AMD CPU tunings [PR98957]

As written in the PR, TARGET_QIMODE_MATH was meant to be set for all
tunings and it was the case for GCC <= 7, but as the number of
PROCESSOR_* enumerators grew, some AMD tunings (which are at the end
of the list) over time got enumerators with values >= 32 and
TARGET_QIMODE_MATH became disabled for them, in GCC 8 for 2
tunings, in GCC 9 for 7 tunings, in GCC 10 for 8 tunings, and
on the trunk for 11 tunings.

The following patch fixes it by using uhwis rather than uints
and gives them also symbolic names.

2021-02-05  Jakub Jelinek  

PR target/98957
* config/i386/i386-options.c (m_NONE, m_ALL): Define.
* config/i386/x86-tune.def (X86_TUNE_BRANCH_PREDICTION_HINTS,
X86_TUNE_PROMOTE_QI_REGS): Use m_NONE instead of 0U.
(X86_TUNE_QIMODE_MATH): Use m_ALL instead of ~0U.

[Bug c/60759] improve -Wlogical-op

2021-02-05 Thread vincent-gcc at vinc17 dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60759

--- Comment #7 from Vincent Lefèvre  ---
(In reply to Manuel López-Ibáñez from comment #6)
> I believe this is on purpose to avoid too much noise.  The warning in GCC
> needs to be smarter about types and macros and avoid early folding.

Well, for the constant-logical-operand case, the warning on X || Y should be on
"true" constants X and Y (which is stricter than what __builtin_constant_p
regards as constants). I don't think that there would be much noise in this
case, or this could be a separate option like clang's
-Wconstant-logical-operand, which can thus easily be enabled or disabled.

[Bug c++/97878] [8/9/10 Regression] ICE in cxx_eval_outermost_constant_expr, at cp/constexpr.c:6825

2021-02-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97878

Jakub Jelinek  changed:

   What|Removed |Added

Summary|[8/9/10/11 Regression] ICE in cxx_eval_outermost_constant_expr, at cp/constexpr.c:6825 |[8/9/10 Regression] ICE in cxx_eval_outermost_constant_expr, at cp/constexpr.c:6825

--- Comment #6 from Jakub Jelinek  ---
Fixed on the trunk so far.

[Bug c++/97878] [8/9/10/11 Regression] ICE in cxx_eval_outermost_constant_expr, at cp/constexpr.c:6825

2021-02-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97878

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:b229baa75ce4627d1bd38f2d3dcd91af1a7071db

commit r11-7120-gb229baa75ce4627d1bd38f2d3dcd91af1a7071db
Author: Jakub Jelinek 
Date:   Fri Feb 5 10:22:07 2021 +0100

c++: Fix ICE with structured binding initialized to incomplete array [PR97878]

We ICE on the following testcase, for incomplete array a on auto [b] { a };
without giving any kind of diagnostics, with auto [c] = a; during
error-recovery.  The problem is that we get too far through check_initializer
and e.g. store_init_value -> constexpr stuff can't deal with incomplete array
types.

As the type of the structured binding artificial variable is always deduced,
I think it is easiest to diagnose this early, even if they have array types
we'll need their deduced type to be complete rather than just its element
type.

2021-02-05  Jakub Jelinek  

PR c++/97878
* decl.c (check_array_initializer): For structured bindings, require
the array type to be complete.

* g++.dg/cpp1z/decomp54.C: New test.
