Re: Fix testsuite/g++.dg/cpp1y/constexpr-66093.C execution failure...

2021-01-04 Thread Alexandre Oliva
On Jan  4, 2021, Mike Stump  wrote:

> Oh, It's possible the comments were stripped from the bug report or
> initial email.  Never mind then about the comment part.

Thanks, I'm installing the initial patch.  Please let me know in case I
mistook the above as approval thereof.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


robustify vxworks glimits.h overriding

2021-01-04 Thread Alexandre Oliva


The glimits.h overriding used in gcc/config/t-vxworks was fragile: the
intermediate file would already be there in a rebuild, and so the
adjustments would not be made, so the generated limits.h would miss
them, causing limits-width-[12] tests to fail on that target.

While changing it, I also replaced the modern $(cmd) shell syntax with
the more portable `cmd` construct.

Regstrapped on x86_64-linux-gnu, also tested on x-arm-wrs-vxworks7r2.
I'm checking it in unless there are objections in the next 24 hours.


for  gcc/ChangeLog

* Makefile.in (T_GLIMITS_H): New.
(stmp-int-hdrs): Depend on it, use it.
* config/t-vxworks (T_GLIMITS_H): Override it.
(vxw-glimits.h): New.
---
 gcc/Makefile.in  |9 ++---
 gcc/config/t-vxworks |   33 ++---
 2 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 6847022f1082e..310b556c3fe38 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -449,6 +449,9 @@ USER_H = $(srcdir)/ginclude/float.h \
 USER_H_INC_NEXT_PRE = @user_headers_inc_next_pre@
 USER_H_INC_NEXT_POST = @user_headers_inc_next_post@
 
+# Enable target overriding of this fragment, as in config/t-vxworks.
+T_GLIMITS_H = $(srcdir)/glimits.h
+
 # The GCC to use for compiling crt*.o.
 # Usually the one we just built.
 # Don't use this as a dependency--use $(GCC_PASSES).
@@ -3075,7 +3078,7 @@ gcov-tool$(exeext): $(GCOV_TOOL_OBJS) $(LIBDEPS)
 # be rebuilt.
 
 # Build the include directories.
-stmp-int-hdrs: $(STMP_FIXINC) $(USER_H) fixinc_list
+stmp-int-hdrs: $(STMP_FIXINC) $(T_GLIMITS_H) $(USER_H) fixinc_list
 # Copy in the headers provided with gcc.
 #
 # The sed command gets just the last file name component;
@@ -3129,9 +3132,9 @@ stmp-int-hdrs: $(STMP_FIXINC) $(USER_H) fixinc_list
  multi_dir=`echo $${ml} | sed -e 's/^[^;]*;//'`; \
  fix_dir=include-fixed$${multi_dir}; \
  if $(LIMITS_H_TEST) ; then \
-   cat $(srcdir)/limitx.h $(srcdir)/glimits.h $(srcdir)/limity.h > 
tmp-xlimits.h; \
+   cat $(srcdir)/limitx.h $(T_GLIMITS_H) $(srcdir)/limity.h > 
tmp-xlimits.h; \
  else \
-   cat $(srcdir)/glimits.h > tmp-xlimits.h; \
+   cat $(T_GLIMITS_H) > tmp-xlimits.h; \
  fi; \
  $(mkinstalldirs) $${fix_dir}; \
  chmod a+rx $${fix_dir} || true; \
diff --git a/gcc/config/t-vxworks b/gcc/config/t-vxworks
index 221f53ce3112a..e60fd31bf0943 100644
--- a/gcc/config/t-vxworks
+++ b/gcc/config/t-vxworks
@@ -43,29 +43,16 @@ $(INSTALL_HEADERS_DIR): install-stdint.h
 LIMITS_H_TEST = true
 STMP_FIXINC = stmp-fixinc
 
-# VxWorks system environments have been GCC based for a long time and we need
-# to make sure that our files and the system ones use distinct macro names to
-# protect against recursive inclusions.  We achieve this by temporarily
-# substituting the headers used by stmp-int-headers with alternative versions
-# where we add some version indication in the inclusion-protection macro
+# VxWorks system environments have been GCC based for a long time and
+# we need to make sure that our files and the system ones use distinct
+# macro names to protect against recursive inclusions.  We achieve
+# this by modifying the GLIMITS_H fragment that goes into limits.h
+# with some version indication in the inclusion-protection macro
 # names.
 
-# Before the standard stmp-int-headers operations take place, arrange to
-# copy the current version of the relevant header files locally, generate
-# the alternate version and replace the original version with ours:
+T_GLIMITS_H = vxw-glimits.h
 
-stmp-int-hdrs: subst-glimits.h
-
-subst-%.h:
-   cp -p $(srcdir)/$*.h orig-$*.h
-   ID=$$(echo $(BASEVER_c) | sed -e 's/\./_/g'); \
-   sed -e "s/_LIMITS_H__/_LIMITS_H__$${ID}_/" < $(srcdir)/$*.h > $@
-   cp $@ $(srcdir)/$*.h
-
-# Then arrange to restore the original versions after the standard
-# operations have taken place:
-
-INSTALL_HEADERS += restore-glimits.h
-
-restore-glimits.h: stmp-int-hdrs
-   cp -p orig-glimits.h $(srcdir)/glimits.h
+vxw-glimits.h: $(srcdir)/glimits.h
+   ID=`echo $(BASEVER_c) | sed -e 's/\./_/g'` && \
+   sed -e "s/_LIMITS_H__/_LIMITS_H__$${ID}_/" < $(srcdir)/glimits.h > $@T
+   mv $@T $@

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


use -mfpu=neon for arm/simd/vmmla_1.c

2021-01-04 Thread Alexandre Oliva


On some of our arm targets, we get various -mfpu flags implicitly or
explicitly passed to the compiler during test runs.  The target
options pushed in arm_neon.h that affect vmmlaq_s32 set isa_bit_neon,
but the caller doesn't have that bit set, so arm_can_inline_p rejects
the attempt to inline it, and the test fails.

An explicit -mfpu=neon would address the compile problem, but cause
the assembler to reject the generated code.

So this patch adds -mfpu=auto to the test, overriding any implicit
flags with the fpu implied by the arch.

Regstrapped on x86_64-linux-gnu, also tested on x-arm-wrs-vxworks7r2.
Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.target/arm/simd/vmmla_1.c: Pass -mfpu=auto.
---
 gcc/testsuite/gcc.target/arm/simd/vmmla_1.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/simd/vmmla_1.c 
b/gcc/testsuite/gcc.target/arm/simd/vmmla_1.c
index aeb4a359e3351..d33ebf361d436 100644
--- a/gcc/testsuite/gcc.target/arm/simd/vmmla_1.c
+++ b/gcc/testsuite/gcc.target/arm/simd/vmmla_1.c
@@ -1,6 +1,6 @@
 /* { dg-do assemble } */
 /* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
-/* { dg-options "-save-temps -O2 -march=armv8.2-a+i8mm -mfloat-abi=hard" } */
+/* { dg-options "-save-temps -O2 -march=armv8.2-a+i8mm -mfpu=auto 
-mfloat-abi=hard" } */
 
 #include "arm_neon.h"
 

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


add alignment to enable store merging in strict-alignment targets

2021-01-04 Thread Alexandre Oliva


In g++.dg/opt/store-merging-2.C, the natural alignment of types T and
S is a single byte, so we shouldn't expect store merging on
strict-alignment platforms.  Indeed, without something like the
adjust-alignment pass to bump up the alignment of the automatic
variable, as in GCC 10, the optimization does not occur.

This patch adjusts the test so that the required alignment is
expressly stated, and so we don't rely on its accidentally being there
to get the desired optimization.

Regstrapped on x86_64-linux-gnu, also tested on x-arm-wrs-vxworks7r2.
Ok to install?


for  gcc/testsuite/ChangeLog

* g++.dg/opt/store-merging-2.C: Add the required alignment.
---
 gcc/testsuite/g++.dg/opt/store-merging-2.C |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/opt/store-merging-2.C 
b/gcc/testsuite/g++.dg/opt/store-merging-2.C
index 3c17033764b75..b1ad7018a1789 100644
--- a/gcc/testsuite/g++.dg/opt/store-merging-2.C
+++ b/gcc/testsuite/g++.dg/opt/store-merging-2.C
@@ -4,7 +4,9 @@
 // { dg-options "-O2 -flifetime-dse=2 -fdump-tree-store-merging-details" }
 // { dg-final { scan-tree-dump "New sequence of 2 stores to replace old one of 
3 stores" "store-merging" } }
 
-struct T { char a[128]; };
+/* The alignment is necessary for store-merging to take place on
+   strict-alignment targets.  */
+struct __attribute__ ((__aligned__ (4))) T { char a[128]; };
 struct S { S () : a () { a.a[12] = 0; a.a[13] = 1; a.a[14] = 0; a.a[15] = 6; } 
T a; };
 void foo (S &);
 void bar (void) { S s; foo (s); }

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


calibrate intervals to avoid zero in futures poll test

2021-01-04 Thread Alexandre Oliva


We get occasional failures of 30_threads/future/members/poll.cc
on some platforms whose high resolution clock doesn't have such a high
resolution; wait_for_0 ends up as 0, and then some asserts fail as
intervals measured as longer than zero are tested for less than
several times zero.

This patch adds some calibration in the iteration count to set a
measurable base time interval with some additional margin.

Regstrapped on x86_64-linux-gnu, and also tested on
x-arm-wrs-vxworks7r2.  Ok to install?


for  libstdc++-v3/ChangeLog

* testsuite/30_threads/future/members/poll.cc: Calibrate
iteration count.
---
 .../testsuite/30_threads/future/members/poll.cc|   33 +++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/30_threads/future/members/poll.cc 
b/libstdc++-v3/testsuite/30_threads/future/members/poll.cc
index fff9bea899c90..7b41411a54386 100644
--- a/libstdc++-v3/testsuite/30_threads/future/members/poll.cc
+++ b/libstdc++-v3/testsuite/30_threads/future/members/poll.cc
@@ -25,7 +25,7 @@
 #include 
 #include 
 
-const int iterations = 200;
+int iterations = 200;
 
 using namespace std;
 
@@ -45,10 +45,41 @@ int main()
   promise p;
   future f = p.get_future();
 
+ start_over:
   auto start = chrono::high_resolution_clock::now();
   for(int i = 0; i < iterations; i++)
 f.wait_for(chrono::seconds(0));
   auto stop = chrono::high_resolution_clock::now();
+
+  /* We've run too few iterations for the clock resolution.
+ Attempt to calibrate it.  */
+  if (start == stop)
+{
+  /* Loop until the clock advances, so that start is right after a
+time increment.  */
+  do
+   start = chrono::high_resolution_clock::now();
+  while (start == stop);
+  int i = 0;
+  /* Now until the clock advances again, so that stop is right
+after another time increment.  */
+  do
+   {
+ f.wait_for(chrono::seconds(0));
+ stop = chrono::high_resolution_clock::now();
+ i++;
+   }
+  while (start == stop);
+  /* Got for some 10 cycles, but we're already past that and still
+get into the calibration loop, double the iteration count and
+try again.  */
+  if (iterations < i * 10)
+   iterations = i * 10;
+  else
+   iterations *= 2;
+  goto start_over;
+}
+
   double wait_for_0 = print("wait_for(0s)", stop - start);
 
   start = chrono::high_resolution_clock::now();


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]

2021-01-04 Thread Uros Bizjak via Gcc-patches
On Tue, Jan 5, 2021 at 8:04 AM Uros Bizjak  wrote:
> >
> > +(define_split
> > +  [(set (match_operand:SI 0 "register_operand")
> > +(zero_extend:SI
> > +  (not:HI
> > +(subreg:HI
> > +  (unspec:SI
> > +[(match_operand:V16QI 1 "register_operand")]
> > +UNSPEC_MOVMSK) 0]
> > +  "TARGET_SSE2"
> > +  [(set (match_dup 2)
> > +(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
> > +   (set (match_dup 0)
> > +(match_dup 3))]
>
> Just write:
>
> (set (match_dup 0)
> (xor:SI (match_dup 2)(const_int 65535))

BTW: This could be a universal combine splitter to simplify

unsigned int foo (unsigned short z)
{
return (unsigned short)~z;
}

Trying 7 -> 8:
   7: r87:HI=~r88:SI#0
 REG_DEAD r88:SI
   8: r86:SI=zero_extend(r87:HI)
 REG_DEAD r87:HI
Failed to match this instruction:
(set (reg:SI 86)
   (zero_extend:SI (not:HI (subreg:HI (reg:SI 88) 0

But combine does not "split" to one insns.

Uros.


Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]

2021-01-04 Thread Uros Bizjak via Gcc-patches
On Tue, Jan 5, 2021 at 7:30 AM Hongtao Liu  wrote:
>
> On Mon, Jan 4, 2021 at 4:59 PM Hongtao Liu  wrote:
> >
> > On Mon, Jan 4, 2021 at 4:49 PM Jakub Jelinek  wrote:
> > >
> > > On Mon, Jan 04, 2021 at 01:56:44PM +0800, Hongtao Liu via Gcc-patches 
> > > wrote:
> > > > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > > > +  [(set (match_operand:SI 0 "register_operand")
> > > > +   (zero_extend:SI (subreg:HI (unspec:SI
> > > > + [(match_operand:V16QI 1 "register_operand")]
> > > > +  UNSPEC_MOVMSK) 0)))]
> > >
> > > Also, please fix up formatting.  Should be:
> > > (zero_extend:SI
> > >   (subreg:HI
> > > (unspec:SI
> > >   [(match_operand:V16QI 1 "register_operand")]
> > >   UNSPEC_MOVMSK) 0)))]
> > > I think.
> > >
> > > Jakub
> > >
> >
> > Yes, thanks for the review both, and happy new year!
> >
> > --
> > BR,
> > Hongtao
>
> Sorry for the bother, this is an incremental patch to split
> (zero_extend:SI (not:HI (subreg:HI (pmovmskb result:SI to
>
> pmovmskb%xmm0, %eax
> -   notl%eax
> -   movzwl  %ax, %eax
> +   xorl$65535, %eax
>
>
> The patch is below, regtestes and bootstrapped on x86_64-linux-gnu{-m32,}.
>   Ok for trunk?
>
> The following patch adds define_insn_and_split to optimize
>
>vpmovmskb   %xmm0, %eax
> -   movzwl  %ax, %eax
> notl%eax
>
> and combine splitter to optimize
>
> pmovmskb%xmm0, %eax
> -   notl%eax
> -   movzwl  %ax, %eax
> +   xorl$65535, %eax
>
> gcc/ChangeLog
> PR target/98461
> * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
> define_insn_and_split for zero_extend of subreg HI of pmovskb
> result.
> (*sse2_pmovskb_zexthisi): Add new combine splitters for
> zero_extend of not of subreg HI of pmovskb result.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/sse-pr98461-2.c: New test.
> ---
>  gcc/config/i386/sse.md| 32 +++
>  .../gcc.target/i386/sse2-pr98461-2.c  | 25 +++
>  2 files changed, 57 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index d84103807ff..4fcff0800c0 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -16099,6 +16099,38 @@ (define_insn "*sse2_pmovmskb_ext"
> (set_attr "prefix" "maybe_vex")
> (set_attr "mode" "SI")])
>
> +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> +  [(set (match_operand:SI 0 "register_operand")
> +(zero_extend:SI
> +  (subreg:HI
> +(unspec:SI
> +  [(match_operand:V16QI 1 "register_operand")]
> +  UNSPEC_MOVMSK) 0)))]
> +  "TARGET_SSE2 && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
> +
> +(define_split
> +  [(set (match_operand:SI 0 "register_operand")
> +(zero_extend:SI
> +  (not:HI
> +(subreg:HI
> +  (unspec:SI
> +[(match_operand:V16QI 1 "register_operand")]
> +UNSPEC_MOVMSK) 0]
> +  "TARGET_SSE2"
> +  [(set (match_dup 2)
> +(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
> +   (set (match_dup 0)
> +(match_dup 3))]

Just write:

(set (match_dup 0)
(xor:SI (match_dup 2)(const_int 65535))

Uros.

>
> +{
> +  operands[2] = gen_reg_rtx (SImode);
> +  operands[3] = gen_int_mode ((HOST_WIDE_INT_1 << 16) - 1, SImode);
> +  operands[3] = gen_rtx_XOR (SImode, operands[2], operands[3]);
> +})
> +
>  (define_split
>[(set (match_operand:SI 0 "register_operand")
>  (unspec:SI
> diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> new file mode 100644
> index 000..330272c69bc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> @@ -0,0 +1,25 @@
> +/* PR target/98461 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
> +/* { dg-final { scan-assembler-times "\tpmovmskb\t" 3 } } */
> +/* { dg-final { scan-assembler-not "\tmovzwl" } } */
> +/* { dg-final { scan-assembler-times "\tnotl" 1 } } *
> +/* { dg-final { scan-assembler-times "\txorl" 1 } } */
> +
> +#include 
> +
> +unsigned int movemask_not1(__m128i logical) {
> +  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
> +  return ~res;
> +}
> +
> +unsigned int movemask_not2(__m128i logical) {
> +  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
> +  res = ~res;
> +  return res;
> +}
> +
> +unsigned int movemask_zero_extend(__m128i logical) {
> +  unsigned int res = _mm_movemask_epi8(logical);
> +  return res & 0x;
> +}
> --
> 2.18.1
>
>
> --
> BR,
> Hongtao


Re: [PATCH] Restore input_location after recursive expand_call_inline

2021-01-04 Thread Bernd Edlinger



On 1/4/21 10:23 PM, Jeff Law wrote:
> 
> 
> On 1/4/21 1:12 PM, Bernd Edlinger wrote:
>> Hi,
>>
>> I spotted a place where input_location is clobbered accidentally.
>>
>> That is in a recursive call to expand_call_inline.  The input_location
>> is usually restored by goto egress in this function.
>>
>> Additionally the return value of the recursive expand call is thrown
>> away, which does not look like a good idea.
>>
>> Although this causes no problems ATM, I wanted to fix it anyway.
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>
>> 0001-Restore-input_location-after-recursive-expand_call_i.patch
>>
>> From 88b963bba7b32972abf0ea44a01c03d643d7c6ca Mon Sep 17 00:00:00 2001
>> From: Bernd Edlinger 
>> Date: Mon, 4 Jan 2021 11:35:31 +0100
>> Subject: [PATCH] Restore input_location after recursive expand_call_inline
>>
>> This is just a precautionary fix.
>>
>> 2021-01-04  Bernd Edlinger  
>>
>>  * tree-inline.c (expand_call_inline): Restore input_location.
>>  Return result from recursive call.
> I suspect that we're always supposed to inline in this case.  As
> asserting that successfully_inlined is true before jumping to "egress"
> seems wise.
> 
> OK with that change after the usual testing.
> 

No this does not work:

+FAIL: g++.dg/ipa/devirt-5.C  -std=gnu++98 (internal compiler error)
+FAIL: g++.dg/ipa/devirt-5.C  -std=gnu++98 (test for excess errors)
+UNRESOLVED: g++.dg/ipa/devirt-5.C  -std=gnu++98 compilation failed to produce 
executable
+FAIL: g++.dg/ipa/devirt-5.C  -std=gnu++14 (internal compiler error)
+FAIL: g++.dg/ipa/devirt-5.C  -std=gnu++14 (test for excess errors)
+UNRESOLVED: g++.dg/ipa/devirt-5.C  -std=gnu++14 compilation failed to produce 
executable
+FAIL: g++.dg/ipa/devirt-5.C  -std=gnu++17 (internal compiler error)
+FAIL: g++.dg/ipa/devirt-5.C  -std=gnu++17 (test for excess errors)
+UNRESOLVED: g++.dg/ipa/devirt-5.C  -std=gnu++17 compilation failed to produce 
executable
+FAIL: g++.dg/ipa/devirt-5.C  -std=gnu++2a (internal compiler error)
+FAIL: g++.dg/ipa/devirt-5.C  -std=gnu++2a (test for excess errors)
+UNRESOLVED: g++.dg/ipa/devirt-5.C  -std=gnu++2a compilation failed to produce 
executable
+FAIL: g++.dg/ipa/devirt-c-4.C  -std=gnu++98 (internal compiler error)
+FAIL: g++.dg/ipa/devirt-c-4.C  -std=gnu++98 (test for excess errors)
+UNRESOLVED: g++.dg/ipa/devirt-c-4.C  -std=gnu++98 compilation failed to 
produce executable
+FAIL: g++.dg/ipa/devirt-c-4.C  -std=gnu++14 (internal compiler error)
+FAIL: g++.dg/ipa/devirt-c-4.C  -std=gnu++14 (test for excess errors)
+UNRESOLVED: g++.dg/ipa/devirt-c-4.C  -std=gnu++14 compilation failed to 
produce executable
+FAIL: g++.dg/ipa/devirt-c-4.C  -std=gnu++17 (internal compiler error)
+FAIL: g++.dg/ipa/devirt-c-4.C  -std=gnu++17 (test for excess errors)
+UNRESOLVED: g++.dg/ipa/devirt-c-4.C  -std=gnu++17 compilation failed to 
produce executable
+FAIL: g++.dg/ipa/devirt-c-4.C  -std=gnu++2a (internal compiler error)
+FAIL: g++.dg/ipa/devirt-c-4.C  -std=gnu++2a (test for excess errors)
+UNRESOLVED: g++.dg/ipa/devirt-c-4.C  -std=gnu++2a compilation failed to 
produce executable
+FAIL: g++.dg/ipa/imm-devirt-2.C  -std=gnu++98 (internal compiler error)
+FAIL: g++.dg/ipa/imm-devirt-2.C  -std=gnu++98 (test for excess errors)
+UNRESOLVED: g++.dg/ipa/imm-devirt-2.C  -std=gnu++98 compilation failed to 
produce executable
+FAIL: g++.dg/ipa/imm-devirt-2.C  -std=gnu++14 (internal compiler error)
+FAIL: g++.dg/ipa/imm-devirt-2.C  -std=gnu++14 (test for excess errors)
+UNRESOLVED: g++.dg/ipa/imm-devirt-2.C  -std=gnu++14 compilation failed to 
produce executable
+FAIL: g++.dg/ipa/imm-devirt-2.C  -std=gnu++17 (internal compiler error)
+FAIL: g++.dg/ipa/imm-devirt-2.C  -std=gnu++17 (test for excess errors)
+UNRESOLVED: g++.dg/ipa/imm-devirt-2.C  -std=gnu++17 compilation failed to 
produce executable
+FAIL: g++.dg/ipa/imm-devirt-2.C  -std=gnu++2a (internal compiler error)
+FAIL: g++.dg/ipa/imm-devirt-2.C  -std=gnu++2a (test for excess errors)
+UNRESOLVED: g++.dg/ipa/imm-devirt-2.C  -std=gnu++2a compilation failed to 
produce executable
+FAIL: g++.dg/ipa/pr71146.C  -std=gnu++98 (internal compiler error)
+FAIL: g++.dg/ipa/pr71146.C  -std=gnu++98 (test for excess errors)
+FAIL: g++.dg/ipa/pr71146.C  -std=gnu++14 (internal compiler error)
+FAIL: g++.dg/ipa/pr71146.C  -std=gnu++14 (test for excess errors)
+FAIL: g++.dg/ipa/pr71146.C  -std=gnu++17 (internal compiler error)
+FAIL: g++.dg/ipa/pr71146.C  -std=gnu++17 (test for excess errors)
+FAIL: g++.dg/ipa/pr71146.C  -std=gnu++2a (internal compiler error)
+FAIL: g++.dg/ipa/pr71146.C  -std=gnu++2a (test for excess errors)
+FAIL: g++.dg/ipa/pr79776.C  -std=gnu++98 (internal compiler error)
+FAIL: g++.dg/ipa/pr79776.C  -std=gnu++98 (test for excess errors)
+FAIL: g++.dg/ipa/pr79776.C  -std=gnu++14 (internal compiler error)
+FAIL: g++.dg/ipa/pr79776.C  -std=gnu++14 (test for excess errors)
+FAIL: g++.dg/ipa/pr79776.C  -std=gnu++17 (internal compiler err

Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]

2021-01-04 Thread Hongtao Liu via Gcc-patches
On Mon, Jan 4, 2021 at 4:59 PM Hongtao Liu  wrote:
>
> On Mon, Jan 4, 2021 at 4:49 PM Jakub Jelinek  wrote:
> >
> > On Mon, Jan 04, 2021 at 01:56:44PM +0800, Hongtao Liu via Gcc-patches wrote:
> > > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > > +  [(set (match_operand:SI 0 "register_operand")
> > > +   (zero_extend:SI (subreg:HI (unspec:SI
> > > + [(match_operand:V16QI 1 "register_operand")]
> > > +  UNSPEC_MOVMSK) 0)))]
> >
> > Also, please fix up formatting.  Should be:
> > (zero_extend:SI
> >   (subreg:HI
> > (unspec:SI
> >   [(match_operand:V16QI 1 "register_operand")]
> >   UNSPEC_MOVMSK) 0)))]
> > I think.
> >
> > Jakub
> >
>
> Yes, thanks for the review both, and happy new year!
>
> --
> BR,
> Hongtao

Sorry for the bother, this is an incremental patch to split
(zero_extend:SI (not:HI (subreg:HI (pmovmskb result:SI to

pmovmskb%xmm0, %eax
-   notl%eax
-   movzwl  %ax, %eax
+   xorl$65535, %eax


The patch is below, regtestes and bootstrapped on x86_64-linux-gnu{-m32,}.
  Ok for trunk?

The following patch adds define_insn_and_split to optimize

   vpmovmskb   %xmm0, %eax
-   movzwl  %ax, %eax
notl%eax

and combine splitter to optimize

pmovmskb%xmm0, %eax
-   notl%eax
-   movzwl  %ax, %eax
+   xorl$65535, %eax

gcc/ChangeLog
PR target/98461
* config/i386/sse.md (*sse2_pmovskb_zexthisi): New
define_insn_and_split for zero_extend of subreg HI of pmovskb
result.
(*sse2_pmovskb_zexthisi): Add new combine splitters for
zero_extend of not of subreg HI of pmovskb result.

gcc/testsuite/ChangeLog
* gcc.target/i386/sse-pr98461-2.c: New test.
---
 gcc/config/i386/sse.md| 32 +++
 .../gcc.target/i386/sse2-pr98461-2.c  | 25 +++
 2 files changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d84103807ff..4fcff0800c0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16099,6 +16099,38 @@ (define_insn "*sse2_pmovmskb_ext"
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "SI")])

+(define_insn_and_split "*sse2_pmovskb_zexthisi"
+  [(set (match_operand:SI 0 "register_operand")
+(zero_extend:SI
+  (subreg:HI
+(unspec:SI
+  [(match_operand:V16QI 1 "register_operand")]
+  UNSPEC_MOVMSK) 0)))]
+  "TARGET_SSE2 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
+
+(define_split
+  [(set (match_operand:SI 0 "register_operand")
+(zero_extend:SI
+  (not:HI
+(subreg:HI
+  (unspec:SI
+[(match_operand:V16QI 1 "register_operand")]
+UNSPEC_MOVMSK) 0]
+  "TARGET_SSE2"
+  [(set (match_dup 2)
+(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
+   (set (match_dup 0)
+(match_dup 3))]
+{
+  operands[2] = gen_reg_rtx (SImode);
+  operands[3] = gen_int_mode ((HOST_WIDE_INT_1 << 16) - 1, SImode);
+  operands[3] = gen_rtx_XOR (SImode, operands[2], operands[3]);
+})
+
 (define_split
   [(set (match_operand:SI 0 "register_operand")
 (unspec:SI
diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
new file mode 100644
index 000..330272c69bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
@@ -0,0 +1,25 @@
+/* PR target/98461 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
+/* { dg-final { scan-assembler-times "\tpmovmskb\t" 3 } } */
+/* { dg-final { scan-assembler-not "\tmovzwl" } } */
+/* { dg-final { scan-assembler-times "\tnotl" 1 } } *
+/* { dg-final { scan-assembler-times "\txorl" 1 } } */
+
+#include 
+
+unsigned int movemask_not1(__m128i logical) {
+  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
+  return ~res;
+}
+
+unsigned int movemask_not2(__m128i logical) {
+  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
+  res = ~res;
+  return res;
+}
+
+unsigned int movemask_zero_extend(__m128i logical) {
+  unsigned int res = _mm_movemask_epi8(logical);
+  return res & 0x;
+}
-- 
2.18.1


-- 
BR,
Hongtao


Re: [PATCH] x86: Cast to unsigned short first for _mm_extract_pi16

2021-01-04 Thread Jeff Law via Gcc-patches



On 1/1/21 6:34 AM, H.J. Lu via Gcc-patches wrote:
> _mm_extract_pi16 is intrinsic for pextrw, which should be zero-extended,
> not sign-extended.
>
> gcc/
>
>   PR target/98495
>   * config/i386/xmmintrin.h (_mm_extract_pi16): Cast to unsigned
>   short first.
I'd tend to prefer masking with 0x  rather than relying on the size
of a particular type being what we need.  But this header is limited to
just x86 and it doesn't look like there's any variance in the size of a
short, across the x86 platforms.

So, OK.
jeff



Re: [PATCH] ira: Skip some pseudos in move_unallocated_pseudos

2021-01-04 Thread Kewen.Lin via Gcc-patches
Hi Jeff,

on 2021/1/5 上午7:13, Jeff Law wrote:
> 
> 
> On 12/22/20 11:40 PM, Kewen.Lin via Gcc-patches wrote:
>> Hi Segher,
>>
>> on 2020/12/22 下午9:55, Segher Boessenkool wrote:
>>> Hi!
>>>
>>> Just a dumb formatting comment:
>>>
>>> On Tue, Dec 22, 2020 at 04:05:39PM +0800, Kewen.Lin wrote:
 This patch is to make move_unallocated_pseudos consistent
 to what we have in function find_moveable_pseudos, where we
 record the original pseudo into pseudo_replaced_reg only if
 validate_change succeeds with newreg.  To ensure every
 unallocated pseudo in move_unallocated_pseudos has expected
 information, it's better to add a check and skip it if it's
 unexpected.  This avoids possible ICEs in future.

 btw, I happened to found this in the bootstrapping for one
 experimental local patch, which is considered as impractical.
 --- a/gcc/ira.c
 +++ b/gcc/ira.c
 @@ -5111,6 +5111,11 @@ move_unallocated_pseudos (void)
{
int idx = i - first_moveable_pseudo;
rtx other_reg = pseudo_replaced_reg[idx];
 +  /* If there is no appropriate pseudo in pseudo_replaced_reg, it
 + means validate_change fails for this new pseudo in function
 + find_moveable_pseudos, then bypass it here.*/
>>> Dot space space.
>> Good catch, thanks!  I forgot to reformat after polishing the comments.
>> Will fix it with other potential comments.
>>
>>> The patch sounds fine to me.  Hard to tell without seeing the patch that
>>> exposed the problem (for onlookers like me who do not know this code
>>> well, anyway ;-) )
>> The patch which made this issue exposed looks like:
>>
>> +; Like *rotl3_insert_3 but work with nonzero_bits rather than
>> +; explicit AND.
>> +(define_insn "*rotl3_insert_8"
>> +  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
>> +(ior:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
>> + (match_operand:SI 2 "u6bit_cint_operand" "n"))
>> + (match_operand:GPR 3 "gpc_reg_operand" "0")))]
>> +  "HOST_WIDE_INT_1U << INTVAL (operands[2])
>> +   > nonzero_bits (operands[3], mode)"
>> +{
>> +  if (mode == SImode)
>> +return "rlwimi %0,%1,%h2,0,31-%h2";
>> +  else
>> +return "rldimi %0,%1,%H2,0";
>> +}
>> +  [(set_attr "type" "insert")])
>>
>> Some insn matches this pattern in combine, later ira tries to introduce
>> one new pseudo since it meets the checks in find_moveable_pseudos, but
>> it fails in the call to validate_change since the nonzero_bits is more
>> rough and can't satisfy the pattern condition, leaving the unexpected
>> entry in pseudo_replaced_reg.
> But what doesn't make any sense to me is pseudo_replaced_reg[] is only
> set when validation is successful in find_moveable_pseudos.   So I can't
> see how this patch actually helps the problem you're describing.
> 

Yeah, pseudo_replaced_reg[] is only set when validation is successful,
but we bump the max pseudo number in ira_create_new_reg as below
regardless of whether validation succeeds or not:

  rtx newreg = ira_create_new_reg (def_reg);
  if (validate_change (def_insn, DF_REF_REAL_LOC (def), newreg, 0))

Later in move_unallocated_pseudos, the iterating could cover those
pseudos which were created but not used due to failed validation.

  for (i = first_moveable_pseudo; i < last_moveable_pseudo; i++)
if (reg_renumber[i] < 0)
  {
int idx = i - first_moveable_pseudo;
rtx other_reg = pseudo_replaced_reg[idx];// (1)
rtx_insn *def_insn = DF_REF_INSN (DF_REG_DEF_CHAIN (i));
/* The use must follow all definitions of OTHER_REG, so we can
   insert the new definition immediately after any of them.  */
df_ref other_def = DF_REG_DEF_CHAIN (REGNO (other_reg))

Then we can get the NULL other_reg in (1), also have unexpected df info
which causes ICE.  The patch skips the handlings on those pseudos which
were intended to be used in validatation INSN but failed to.

BR,
Kewen


Go patch committed: Accept -fgo-embedcfg option

2021-01-04 Thread Ian Lance Taylor via Gcc-patches
This patch adds a new -fgo-embedcfg option.  This will be used by the
go command to implement the go:embed directive that is new in the
upcoming Go 1.16 release.  The option doesn't yet do anything, this is
just framework.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian

* lang.opt (fgo-embedcfg): New option.
* go-c.h (struct go_create_gogo_args): Add embedcfg field.
* go-lang.c (go_embedcfg): New static variable.
(go_langhook_init): Set go_create_gogo_args embedcfg field.
(go_langhook_handle_option): Handle OPT_fgo_embedcfg_.
* gccgo.texi (Invoking gccgo): Document -fgo-embedcfg.
19d240387d930c7c049927f47351ccd6cbde0853
diff --git a/gcc/go/gccgo.texi b/gcc/go/gccgo.texi
index 37adf20d55b..ce6b518bb7b 100644
--- a/gcc/go/gccgo.texi
+++ b/gcc/go/gccgo.texi
@@ -262,6 +262,15 @@ Apply special rules for compiling the runtime package.  
Implicit
 memory allocation is forbidden.  Some additional compiler directives
 are supported.
 
+@item -fgo-embedcfg=@var{file}
+@cindex @option{-fgo-embedcfg}
+Identify a JSON file used to map patterns used with special
+@code{//go:embed} comments to the files named by the patterns.  The
+JSON file should have two components: @code{Patterns} maps each
+pattern to a list of file names, and @code{Files} maps each file name
+to a full path to the file.  This option is intended for use by the
+@command{go} command to implement @code{//go:embed}.
+
 @item -g
 @cindex @option{-g for gccgo}
 This is the standard @command{gcc} option (@pxref{Debugging Options, ,
diff --git a/gcc/go/go-c.h b/gcc/go/go-c.h
index 651a787088a..5930eadf7fd 100644
--- a/gcc/go/go-c.h
+++ b/gcc/go/go-c.h
@@ -41,6 +41,7 @@ struct go_create_gogo_args
   const char* prefix;
   const char* relative_import_path;
   const char* c_header;
+  const char* embedcfg;
   Backend* backend;
   Linemap* linemap;
   bool check_divide_by_zero;
diff --git a/gcc/go/go-lang.c b/gcc/go/go-lang.c
index eba40e471a2..a01db8dbdcd 100644
--- a/gcc/go/go-lang.c
+++ b/gcc/go/go-lang.c
@@ -89,6 +89,7 @@ static const char *go_pkgpath = NULL;
 static const char *go_prefix = NULL;
 static const char *go_relative_import_path = NULL;
 static const char *go_c_header = NULL;
+static const char *go_embedcfg = NULL;
 
 /* Language hooks.  */
 
@@ -112,6 +113,7 @@ go_langhook_init (void)
   args.prefix = go_prefix;
   args.relative_import_path = go_relative_import_path;
   args.c_header = go_c_header;
+  args.embedcfg = go_embedcfg;
   args.check_divide_by_zero = go_check_divide_zero;
   args.check_divide_overflow = go_check_divide_overflow;
   args.compiling_runtime = go_compiling_runtime;
@@ -282,6 +284,10 @@ go_langhook_handle_option (
   go_c_header = arg;
   break;
 
+case OPT_fgo_embedcfg_:
+  go_embedcfg = arg;
+  break;
+
 default:
   /* Just return 1 to indicate that the option is valid.  */
   break;
diff --git a/gcc/go/lang.opt b/gcc/go/lang.opt
index 454a118695e..7d6780eb0cd 100644
--- a/gcc/go/lang.opt
+++ b/gcc/go/lang.opt
@@ -57,6 +57,10 @@ fgo-dump-
 Go Joined RejectNegative
 -fgo-dump-   Dump Go frontend internal information.
 
+fgo-embedcfg=
+Go Joined RejectNegative
+-fgo-embedcfg=   List embedded files via go:embed
+
 fgo-optimize-
 Go Joined
 -fgo-optimize-   Turn on optimization passes in the frontend.


[PATCH] c++: ICE with deferred noexcept when deducing targs [PR82099]

2021-01-04 Thread Marek Polacek via Gcc-patches
In this test we ICE in type_throw_all_p because it got a deferred
noexcept which it shouldn't.  Here's the story:

In noexcept61.C, we call bar, so we perform overload resolution.  When
adding the (only) candidate, we need to deduce template arguments, so
call fn_type_unification as usually.  That deduces U to

  void (*) (int &, int &)

which is correct, but its noexcept-spec is deferred_noexcept.  Then
we call add_function_candidate (bar), wherein we try to create an
implicit conversion sequence for every argument.  Since baz is
of unknown type, we instantiate_type it; it is a TEMPLATE_ID_EXPR
so that calls resolve_address_of_overloaded_function.  But we crash
there, because target_type contains the deferred_noexcept.

So we need to maybe_instantiate_noexcept before we can compare types.
resolve_overloaded_unification seemed like the appropriate spot, now
fn_type_unification produces the function type with its noexcept-spec
instantiated.  This shouldn't go against CWG 1330 because here we
really need to instantiate the noexcept-spec.

This also fixes class-deduction76.C, a dg-ice test I recently added,
therefore this fix also fixes c++/90799, yay.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/82099
* pt.c (resolve_overloaded_unification): Call
maybe_instantiate_noexcept after instantiating the function
decl.

gcc/testsuite/ChangeLog:

PR c++/82099
* g++.dg/cpp1z/class-deduction76.C: Remove dg-ice.
* g++.dg/cpp0x/noexcept61.C: New test.
---
 gcc/cp/pt.c|  3 +++
 gcc/testsuite/g++.dg/cpp0x/noexcept61.C| 17 +
 gcc/testsuite/g++.dg/cpp1z/class-deduction76.C |  1 -
 3 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept61.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 062ef858501..0d061adc2ed 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -22373,6 +22373,9 @@ resolve_overloaded_unification (tree tparms,
  --function_depth;
}
 
+ if (flag_noexcept_type)
+   maybe_instantiate_noexcept (fn, tf_none);
+
  elem = TREE_TYPE (fn);
  if (try_one_overload (tparms, targs, tempargs, parm,
elem, strict, sub_strict, addr_p, explain_p)
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept61.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept61.C
new file mode 100644
index 000..653cd7e6680
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept61.C
@@ -0,0 +1,17 @@
+// PR c++/82099
+// { dg-do compile { target c++11 } }
+
+template 
+void bar (T &x, T &y, U u)
+{
+  u (x, y);
+}
+
+template 
+void baz (T &x, T &y) noexcept (noexcept (x == y));
+
+void
+foo (int x, int y)
+{
+  bar (x, y, baz);
+}
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction76.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction76.C
index 23bb6e8fa9a..a131a386baa 100644
--- a/gcc/testsuite/g++.dg/cpp1z/class-deduction76.C
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction76.C
@@ -1,6 +1,5 @@
 // PR c++/90799
 // { dg-do compile { target c++17 } }
-// { dg-ice "unify" }
 
 template
 void foo() noexcept(T::value);

base-commit: f262a3518877ccce9ed41b2e152c3a3564727bd6
-- 
2.29.2



Re: [committed] analyzer: fix ICE with -fsanitize=undefined [PR98293]

2021-01-04 Thread Jakub Jelinek via Gcc-patches
On Mon, Jan 04, 2021 at 07:22:58PM -0500, David Malcolm via Gcc-patches wrote:
> --- a/gcc/analyzer/store.cc
> +++ b/gcc/analyzer/store.cc
> @@ -524,10 +524,27 @@ binding_map::apply_ctor_to_region (const region 
> *parent_reg, tree ctor,
>unsigned ix;
>tree index;
>tree val;
> +  tree parent_type = parent_reg->get_type ();
> +  tree field;
> +  if (TREE_CODE (parent_type) == RECORD_TYPE)
> +field = TYPE_FIELDS (parent_type);
> +  else
> +field = NULL_TREE;
>FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), ix, index, val)
>  {
>if (!index)
> - index = build_int_cst (integer_type_node, ix);
> + {
> +   /* If index is NULL, then iterate through the fields for
> +  a RECORD_TYPE, or use an INTEGER_CST otherwise.
> +  Compare with similar logic in output_constructor.  */
> +   if (field)
> + {
> +   index = field;
> +   field = DECL_CHAIN (field);
> + }

The TYPE_FIELDS chain doesn't contain only FIELD_DECLs, can contain other
decls (FUNCTION_DECLs, USING_DECLs, TYPE_DECLs, ...).
So this should be really skipping chain elts other than FIELD_DECLs.
E.g. C++ FE has next_initializable_field function for that, unfortunately
the middle-end doesn't.

Jakub



[committed] analyzer: fix ICE with -fsanitize=undefined [PR98293]

2021-01-04 Thread David Malcolm via Gcc-patches
-fsanitize=undefined with calls to nonnull functions
creates struct __ubsan_nonnull_arg_data instances
with CONSTRUCTORs for RECORD_TYPEs with NULL index values.
The analyzer was mistakenly using INTEGER_CST for these
fields, leading to ICEs.

Fix the issue by iterating through the fields in the type
for such cases, imitating similar logic in varasm.c's
output_constructor.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r11-6452-g15af33a88065f983181550fc53821f1c6e14c5c7.

gcc/analyzer/ChangeLog:
PR analyzer/98293
* store.cc (binding_map::apply_ctor_to_region): When "index" is
NULL, iterate through the fields for RECORD_TYPEs, rather than
creating an INTEGER_CST index.

gcc/testsuite/ChangeLog:
PR analyzer/98293
* gcc.dg/analyzer/pr98293.c: New test.
---
 gcc/analyzer/store.cc   | 19 ++-
 gcc/testsuite/gcc.dg/analyzer/pr98293.c |  2 ++
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98293.c

diff --git a/gcc/analyzer/store.cc b/gcc/analyzer/store.cc
index 35a7e2985cd..b4a4d5f3bb2 100644
--- a/gcc/analyzer/store.cc
+++ b/gcc/analyzer/store.cc
@@ -524,10 +524,27 @@ binding_map::apply_ctor_to_region (const region 
*parent_reg, tree ctor,
   unsigned ix;
   tree index;
   tree val;
+  tree parent_type = parent_reg->get_type ();
+  tree field;
+  if (TREE_CODE (parent_type) == RECORD_TYPE)
+field = TYPE_FIELDS (parent_type);
+  else
+field = NULL_TREE;
   FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), ix, index, val)
 {
   if (!index)
-   index = build_int_cst (integer_type_node, ix);
+   {
+ /* If index is NULL, then iterate through the fields for
+a RECORD_TYPE, or use an INTEGER_CST otherwise.
+Compare with similar logic in output_constructor.  */
+ if (field)
+   {
+ index = field;
+ field = DECL_CHAIN (field);
+   }
+ else
+   index = build_int_cst (integer_type_node, ix);
+   }
   else if (TREE_CODE (index) == RANGE_EXPR)
{
  tree min_index = TREE_OPERAND (index, 0);
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98293.c 
b/gcc/testsuite/gcc.dg/analyzer/pr98293.c
new file mode 100644
index 000..f750c902440
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr98293.c
@@ -0,0 +1,2 @@
+/* { dg-additional-options "-fsanitize=undefined" } */
+#include "../pr93399.c"
-- 
2.26.2



Re: [PATCH, rs6000] improve vec_ctf invalid parameter handling. (pr91903)

2021-01-04 Thread will schmidt via Gcc-patches
On Mon, 2020-10-26 at 16:22 -0500, will schmidt wrote:
> [PATCH, rs6000] improve vec_ctf invalid parameter handling.
> 
> Hi,
>   Per PR91903, GCC ICEs when we attempt to pass a variable
> (or out of range value) into the vec_ctf() builtin.  Per
> investigation, the parameter checking exists for this
> builtin with the int types, but was missing for
> the long long types.
> 
> This patch adds the missing CODE_FOR_* entries to the
> rs6000_expand_binup_builtin to cover that scenario.
> This patch also updates some existing tests to remove
> calls to vec_ctf() and vec_cts() that contain negative
> values.
> 
> Regtested clean on power7, power8, power9 Linux targets.
> 
> OK for trunk?


I've reviewed the list archives in case my local inbox lost a response..  I 
don't think this one was reviewed.  
so..

ping!  

:-) 

thanks
-Will


> 
> THanks,
> -Will
> 
> PR target/91903
> 
> 2020-10-26  Will Schmidt  
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-call.c (rs6000_expand_binup_builtin): Add
>   clauses for CODE_FOR_vsx_xvcvuxddp_scale and
>   CODE_FOR_vsx_xvcvsxddp_scale to the parameter checking code.
> 
> gcc/testsuite/ChangeLog:
>   * testsuite/gcc.target/powerpc/pr91903.c: New test.
>   * testsuite/gcc.target/powerpc/builtins-1.fold.h: Update.
>   * testsuite/gcc.target/powerpc/builtins-2.c: Update.
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index b044778a7ae4..eb7e007e68d3 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -9447,11 +9447,13 @@ rs6000_expand_binop_builtin (enum insn_code icode, 
> tree exp, rtx target)
>   }
>  }
>else if (icode == CODE_FOR_altivec_vcfux
>|| icode == CODE_FOR_altivec_vcfsx
>|| icode == CODE_FOR_altivec_vctsxs
> -  || icode == CODE_FOR_altivec_vctuxs)
> +  || icode == CODE_FOR_altivec_vctuxs
> +  || icode == CODE_FOR_vsx_xvcvuxddp_scale
> +  || icode == CODE_FOR_vsx_xvcvsxddp_scale)
>  {
>/* Only allow 5-bit unsigned literals.  */
>STRIP_NOPS (arg1);
>if (TREE_CODE (arg1) != INTEGER_CST
> || TREE_INT_CST_LOW (arg1) & ~0x1f)
> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1.fold.h 
> b/gcc/testsuite/gcc.target/powerpc/builtins-1.fold.h
> index 8bc5f5e43366..42d552295e3e 100644
> --- a/gcc/testsuite/gcc.target/powerpc/builtins-1.fold.h
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.fold.h
> @@ -212,14 +212,14 @@ int main ()
>extern vector unsigned long long u9; u9 = vec_mergeo (u3, u4);
>  
>extern vector long long l8; l8 = vec_mul (l3, l4);
>extern vector unsigned long long u6; u6 = vec_mul (u3, u4);
>  
> -  extern vector double dh; dh = vec_ctf (la, -2);
> +  extern vector double dh; dh = vec_ctf (la, 2);
>extern vector double di; di = vec_ctf (ua, 2);
>extern vector int sz; sz = vec_cts (fa, 0x1F);
> -  extern vector long long l9; l9 = vec_cts (dh, -2);
> +  extern vector long long l9; l9 = vec_cts (dh, 2);
>extern vector unsigned long long u7; u7 = vec_ctu (di, 2);
>extern vector unsigned int usz; usz = vec_ctu (fa, 0x1F);
>  
>extern vector float f1; f1 = vec_mergee (fa, fb);
>extern vector float f2; f2 = vec_mergeo (fa, fb);
> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-2.c 
> b/gcc/testsuite/gcc.target/powerpc/builtins-2.c
> index 2aa23a377992..30acae47faff 100644
> --- a/gcc/testsuite/gcc.target/powerpc/builtins-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-2.c
> @@ -40,16 +40,16 @@ int main ()
>  
>if (se[0] != 27L || se[1] != 27L || sf[0] != -14L || sf[1] != -14L
>|| ue[0] != 27L || ue[1] != 27L || uf[0] != 14L || uf[1] != 14L)
>  abort ();
>  
> -  vector double da = vec_ctf (sa, -2);
> +  vector double da = vec_ctf (sa, 2);
>vector double db = vec_ctf (ua, 2);
> -  vector long long sg = vec_cts (da, -2);
> +  vector long long sg = vec_cts (da, 2);
>vector unsigned long long ug = vec_ctu (db, 2);
>  
> -  if (da[0] != 108.0 || da[1] != -56.0 || db[0] != 6.75 || db[1] != 3.5
> +  if (da[0] != 6.75 || da[1] != -3.5 || db[0] != 6.75 || db[1] != 3.5
>|| sg[0] != 27L || sg[1] != -14L || ug[0] != 27L || ug[1] != 14L)
>  abort ();
>  
>vector float fa = vec_ctf (inta, 5);
>if (fa[0] != 0.843750 || fa[1] != -0.031250 || fa[2] != 0.125000 || fa[3] 
> != 0.281250)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr91903.c 
> b/gcc/testsuite/gcc.target/powerpc/pr91903.c
> new file mode 100644
> index ..f0792117a88f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr91903.c
> @@ -0,0 +1,74 @@
> +/* { dg-do compile */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-options "-mdejagnu-cpu=power8" } */
> +
> +#include 
> +#include 
> +#include 
> +
> +vector double retd;
> +vector float retf;
> +vector signed int retsi;
> +
> +void test_int(vector signed int a, const int b)
> +{
> + retf = v

Re: [PATCH] correct -Wmismatched-new-delete for template instantiations (PR 98305)

2021-01-04 Thread Jeff Law via Gcc-patches



On 12/16/20 7:10 PM, Martin Sebor via Gcc-patches wrote:
> The -Wmismatched-new-delete detection of operator members of class
> template instantiations is incomplete and overly simplistic, leading
> to incorrect results and false positives.  Rather than reinventing
> the wheel and parsing the mangled qualified names of the operators
> the attached patch uses the demangler to compare their names.  Since
> the code is only entered rarely (for user- defined overloads of
> the operators, the cost of the demangling should be negligible in
> most code bases).
>
> Tested on x86_64-linux.
>
> Martin
>
> gcc-98305.diff
>
> PR c++/98305 spurious -Wmismatched-new-delete on template instance
>
> gcc/ChangeLog:
>
>   PR c++/98305
>   * builtins.c (new_delete_mismatch_p): New overload.
>   (new_delete_mismatch_p (tree, tree)): Call it.
>
> gcc/testsuite/ChangeLog:
>
>   PR c++/98305
>   * g++.dg/warn/Wmismatched-new-delete-3.C: New test.
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 28e5ab2..b1d69855523 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-live.h"
>  #include "tree-outof-ssa.h"
>  #include "attr-fnspec.h"
> +#include "demangle.h"
>  
>  struct target_builtins default_target_builtins;
>  #if SWITCHABLE_TARGET
> @@ -13053,121 +13054,152 @@ call_dealloc_argno (tree exp)
>return UINT_MAX;
>  }
>  
> -/* Return true if DELETE_DECL is an operator delete that's not suitable
> -   to call with a pointer returned fron NEW_DECL.  */
> +/* Return true if DELC doesn't refer to an operator delete that's
> +   suitable to call with a pointer returned from the operator new
> +   described by NEWC.  */
s/DELC/DECL/

OK with the nit fixed.
jeff



Re: [PATCH] store VLA bounds in attribute access as strings (PR 97172)

2021-01-04 Thread Martin Sebor via Gcc-patches

On 1/4/21 2:10 PM, Jeff Law wrote:



On 1/4/21 1:53 PM, Martin Sebor wrote:

On 1/4/21 12:23 PM, Jeff Law wrote:



On 1/4/21 12:19 PM, Jakub Jelinek wrote:

On Mon, Jan 04, 2021 at 12:14:15PM -0700, Jeff Law via Gcc-patches
wrote:

Doing the STRING_CST is certainly less fragile since the SSA names
created at gimplification time could even be ggc_freed when no longer
used in the IL.

Obviously we can't use SSA_NAMEs as they're specific to each
function as
they get compiled.  But what's not as clear to me is why we can't
use a
SAVE_EXPR of the original expression that indicates the size of the
parameter.

The gimplifier is destructive, so if the expressions are partly
(e.g. in
those SAVE_EXPRs) shared with what is in the actual IL, we lose.
And if they aren't shared and there are side-effects, if we tried to
gimplify them again we'd get the side-effects duplicated.
So it all depends on what the code wants to handle, if e.g. just
values of
parameters with simple arithmetics on those and punt on everything
else,
then it is doable, but generally it is not.


I explained what the code handles and when in the pipeline in
the discussion of the previous patch:
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559770.html

Right, but that message talks about GC.  This is not a GC issue.

This feels like we need a SAVE_EXPR to me to ensure single evaluation
and an unshare_expr to avoid problems with destructive gimplification.






I would expect the expressions to be values of parameters (or objects in
static storage) and simple arithemetic on them.  If there's other cases,
punting seems appropriate.

Martin -- are there nontrivial expressions we need to be worried
about here?


At the moment the middle warnings only consider parameters, like
the N in

   void f (int N, int[N]);

   void g (void)
   {
     int a[3];
     f (sizeof a, a);   // warning
   }

The front end redeclaration warnings consider all expressions,
including

   int f (void);

   void g (int[f () + 1]);
   void g (int[f () + 2]);   // warning

The patch turns these complex bounds into strings that the front
end compares instead.


If you can have an arbitrary expression, such as a function call like
that, then ISTM that a SAVE_EPR is mandatory as you can't call the
function more than once.  BUt it also seems to me that for cases that
aren't simple arithmetic of leaf nodes we could just punt.  I doubt
we're going to miss significant real world diagnostics by doing that.


I don't know about that.  Bugs are rare and often in unusual and
hard to read/understand code, so focusing on the simple cases and
doing nothing for the rest would certainly not be an improvement.

My understanding from the discussion at the link above is that
using SAVE_EXPRs is only necessary when the expression is evaluated
(the warning doesn't evaluate them).  I didn't use the SAVE_EXPRs
in this patch even though they're readily available (I had initially
missed it) or unsharing because Jakub preferred to avoid the space
overhead (IIUC).  I didn't remove the expressions because (as
I explained) the only way I could find to do it was one that Richard
was quite emphatic should be avoided.

If you want me to use the SHARE_EXPRs I can easily make that change
(if that counts for anything that would be my preference).  If there's
preference for removing the saved bounds in free_lang_data pass I can
presumably do that, either in addition or instead.


After the front end is done the strings
don't serve any purpose (and I don't think ever will) and could
be removed.  I looked for a way to do it but couldn't find one
other than the free_lang_data pass in tree.c that Richard had
initially said wasn't the right place.  Sounds like he's
reconsidered but at this point, given that VLA parameters are
used only infraquently, and VLAs with these nontrivial bounds
are exceedingly rare, going to the trouble of removing them
doesn't seem worth the effort.

But I'm not sure that inventing a new method for smuggling the data
around is all that wise or necessary here.   I don't see a message from
anyone suggesting that, but I could have missed it.


No one suggested "smuggling" anything around.  It also wasn't
my intent, nor do I think the code code that.  It just stores
the bounds in a form that the middle end can cope with.  There
are other front-end-only attributes that store strings (e.g.,
attribute deprecated) so this is not new.  But as I said, I'm
open to removing either the strings or the expressions.  I'd
just like to know which before I do the work this time.

Martin


Re: [PATCH] simplify-rtx: Optimize (x - 1) * y + y [PR98334]

2021-01-04 Thread Jeff Law via Gcc-patches



On 12/17/20 7:55 AM, Jakub Jelinek wrote:
> Hi!
>
> We don't try to optimize for signed x, y (int) (x - 1U) * y + y
> into x * y, we can't do that with signed x * y, because the former
> is well defined for INT_MIN and -1, while the latter is not.
> We could perhaps optimize it during isel or some very late optimization
> where we'd turn magically flag_wrapv, but we don't do that yet.
>
> This patch optimizes it in simplify-rtx.c, such that we can optimize
> it during combine.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2020-12-17  Jakub Jelinek  
>
>   PR rtl-optimization/98334
>   * simplify-rtx.c (simplify_context::simplify_binary_operation_1):
>   Optimize (X - 1) * Y + Y to X * Y or (X + 1) * Y - Y to X * Y.
>
>   * gcc.target/i386/pr98334.c: New test.
OK.  Though I think we could do this in gimple as well which would
probably eliminate the need to do it in RTL.

jeff



Re: [PATCH] libtool.m4: update GNU/Hurd test from upstream

2021-01-04 Thread Samuel Thibault via Gcc-patches
Hello,

Jeff Law, le lun. 04 janv. 2021 13:29:53 -0700, a ecrit:
> On 12/23/20 6:12 PM, Samuel Thibault wrote:
> > In upstream libtool, 47a889a4ca20 ("Improve GNU/Hurd support.") fixed
> > detection of shlibpath_overrides_runpath, thus avoiding unnecessary relink.
> > This backports it.
> >
> > ChangeLog:
> >
> > * libtool.m4: Match gnu* along other GNU systems.
> > * libffi/configure: Re-generate.
> > * libgomp/configure: Re-generate.
> >
> > * libgo/config/libtool.m4: Match gnu* along other GNU systems.
> > * libgo/configure: Re-generate.
> OK for the trunk.

(I do not have commit access to gcc)

Samuel


Re: [PATCH] ira: Skip some pseudos in move_unallocated_pseudos

2021-01-04 Thread Jeff Law via Gcc-patches



On 12/22/20 11:40 PM, Kewen.Lin via Gcc-patches wrote:
> Hi Segher,
>
> on 2020/12/22 下午9:55, Segher Boessenkool wrote:
>> Hi!
>>
>> Just a dumb formatting comment:
>>
>> On Tue, Dec 22, 2020 at 04:05:39PM +0800, Kewen.Lin wrote:
>>> This patch is to make move_unallocated_pseudos consistent
>>> to what we have in function find_moveable_pseudos, where we
>>> record the original pseudo into pseudo_replaced_reg only if
>>> validate_change succeeds with newreg.  To ensure every
>>> unallocated pseudo in move_unallocated_pseudos has expected
>>> information, it's better to add a check and skip it if it's
>>> unexpected.  This avoids possible ICEs in future.
>>>
>>> btw, I happened to found this in the bootstrapping for one
>>> experimental local patch, which is considered as impractical.
>>> --- a/gcc/ira.c
>>> +++ b/gcc/ira.c
>>> @@ -5111,6 +5111,11 @@ move_unallocated_pseudos (void)
>>>{
>>> int idx = i - first_moveable_pseudo;
>>> rtx other_reg = pseudo_replaced_reg[idx];
>>> +   /* If there is no appropriate pseudo in pseudo_replaced_reg, it
>>> +  means validate_change fails for this new pseudo in function
>>> +  find_moveable_pseudos, then bypass it here.*/
>> Dot space space.
> Good catch, thanks!  I forgot to reformat after polishing the comments.
> Will fix it with other potential comments.
>
>> The patch sounds fine to me.  Hard to tell without seeing the patch that
>> exposed the problem (for onlookers like me who do not know this code
>> well, anyway ;-) )
> The patch which made this issue exposed looks like:
>
> +; Like *rotl3_insert_3 but work with nonzero_bits rather than
> +; explicit AND.
> +(define_insn "*rotl3_insert_8"
> +  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> +(ior:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
> + (match_operand:SI 2 "u6bit_cint_operand" "n"))
> + (match_operand:GPR 3 "gpc_reg_operand" "0")))]
> +  "HOST_WIDE_INT_1U << INTVAL (operands[2])
> +   > nonzero_bits (operands[3], mode)"
> +{
> +  if (mode == SImode)
> +return "rlwimi %0,%1,%h2,0,31-%h2";
> +  else
> +return "rldimi %0,%1,%H2,0";
> +}
> +  [(set_attr "type" "insert")])
>
> Some insn matches this pattern in combine, later ira tries to introduce
> one new pseudo since it meets the checks in find_moveable_pseudos, but
> it fails in the call to validate_change since the nonzero_bits is more
> rough and can't satisfy the pattern condition, leaving the unexpected
> entry in pseudo_replaced_reg.
But what doesn't make any sense to me is pseudo_replaced_reg[] is only
set when validation is successful in find_moveable_pseudos.   So I can't
see how this patch actually helps the problem you're describing.

jeff



[PATCH] c++: Fix deduction from the type of an NTTP

2021-01-04 Thread Patrick Palka via Gcc-patches
In the testcase nontype-auto17.C below, the calls to f and g are invalid
because neither deduction nor defaulting of the template parameter T
yields a valid specialization.  Deducing T doesn't work because T is
only used in a non-deduced context, and defaulting T doesn't work
because its default argument makes the type of M invalid.

But with -std=c++17 or later, we incorrectly accept both calls.  With
C++17 (specifically P0127R2), we're allowed to try to deduce T from
the argument 42 that's been tentatively deduced for M.  The problem is
that when unify walks into the type of M, it immediately gives up on
the TYPENAME_TYPE and doesn't register any new unifications (so the type
of M is still unknown) -- and then we go on to unify M with 42 anyway.
Later in type_unification_real, we blindly use the default argument for
T to complete the template argument vector, and we end up with the bogus
specializations f and g.

This patch fixes this issue by checking whether the type of a NTTP is
still dependent after walking into the type.  If it is, it means we
couldn't deduce all the template parameters used in its type, and so we
shouldn't yet unify the NTTP.

(The new testcase ttp33.C demonstrates the need for the TEMPLATE_PARM_LEVEL
check; without it, we would ICE on this testcase from the call to tsubst.)

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

* pt.c (unify) : After walking into
the type of the TEMPLATE_PARM_INDEX, substitute into the type a
second time.  If the type is still dependent, don't unify it.

gcc/testsuite/ChangeLog:

* g++.dg/template/partial5.C: Adjust directives to expect the
same errors across all dialects.
* g++.dg/cpp1z/nontype-auto17.C: New test.
* g++.dg/cpp1z/nontype-auto18.C: New test.
* g++.dg/template/ttp33.C: New test.
---
 gcc/cp/pt.c | 10 +-
 gcc/testsuite/g++.dg/cpp1z/nontype-auto17.C | 10 ++
 gcc/testsuite/g++.dg/cpp1z/nontype-auto18.C |  6 ++
 gcc/testsuite/g++.dg/template/partial5.C|  2 +-
 gcc/testsuite/g++.dg/template/ttp33.C   | 10 ++
 5 files changed, 36 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/nontype-auto17.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/nontype-auto18.C
 create mode 100644 gcc/testsuite/g++.dg/template/ttp33.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 19fd4c1d8a4..f1e8b01bc01 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -23581,13 +23581,21 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
  /* We haven't deduced the type of this parameter yet.  */
  if (cxx_dialect >= cxx17
  /* We deduce from array bounds in try_array_deduction.  */
- && !(strict & UNIFY_ALLOW_INTEGER))
+ && !(strict & UNIFY_ALLOW_INTEGER)
+ && TEMPLATE_PARM_LEVEL (parm) <= TMPL_ARGS_DEPTH (targs))
{
  /* Deduce it from the non-type argument.  */
  tree atype = TREE_TYPE (arg);
  RECUR_AND_CHECK_FAILURE (tparms, targs,
   tparm, atype,
   UNIFY_ALLOW_NONE, explain_p);
+ /* Now check whether the type of this parameter is still
+dependent, and give up if so.  */
+ ++processing_template_decl;
+ tparm = tsubst (tparm, targs, tf_none, NULL_TREE);
+ --processing_template_decl;
+ if (uses_template_parms (tparm))
+   return unify_success (explain_p);
}
  else
/* Try again later.  */
diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype-auto17.C 
b/gcc/testsuite/g++.dg/cpp1z/nontype-auto17.C
new file mode 100644
index 000..509eb0e98e3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nontype-auto17.C
@@ -0,0 +1,10 @@
+// { dg-do compile { target c++11 } }
+
+template  struct K { };
+
+template  int f(K); // { dg-error 
"void" }
+int a = f(K<42>{}); // { dg-error "no match" }
+
+struct S { using type = void; };
+template  int g(K); // { dg-message 
"deduction" }
+int b = g(K<42>{}); // { dg-error "no match" }
diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype-auto18.C 
b/gcc/testsuite/g++.dg/cpp1z/nontype-auto18.C
new file mode 100644
index 000..46873672714
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nontype-auto18.C
@@ -0,0 +1,6 @@
+// { dg-do compile { target c++11 } }
+
+template  struct K { };
+struct S { using type = int; };
+template  int f(K);
+int c = f(K<42>{});
diff --git a/gcc/testsuite/g++.dg/template/partial5.C 
b/gcc/testsuite/g++.dg/template/partial5.C
index a56229770f4..40d8c45b087 100644
--- a/gcc/testsuite/g++.dg/template/partial5.C
+++ b/gcc/testsuite/g++.dg/template/partial5.C
@@ -14,7 +14,7 @@ template
 struct Y { };
 
 template
-struct Y { }; // { dg-error "" "" { target { ! c++17 } } }
+struct Y { }; // { dg-error "" }
 
 

Re: [PATCH] dec_math.f90 needs to be xfailed

2021-01-04 Thread Steve Kargl via Gcc-patches
On Mon, Jan 04, 2021 at 02:30:43PM -0700, Jeff Law wrote:
> 
> On 1/2/21 1:34 AM, Steve Kargl via Gcc-patches wrote:
> > Can someone, anyone, please commit the following trivially patch?
> > gfortran.dg/dec_math.f90 will never pass on i?86-*-freebsd*.
> Why will the test never pass on that platform?  I don't mind installing
> the patch, but I'd like to have a bit more background first :-)
> 

The testcase assumes REAL(10) has 64-bits of precision.  On
i?86-*-freebsd, the i387 FPU control word is set to 53-bits.
The test program is not set up to deal with 11-bits of 
missing precision.

The current code uses the preprocessor to enable testing of
REAL(10) with something like

#ifdef __GFC_REAL_10__
   real(10) x
   print *, digits(x)
#endif

There are 44 instances of '#ifdef __GFC_REAL_10__'.  Each of those
would need to be modified to determine if __FreeBSD__ and __i386__
are defined, and gfortran would need to be modified to define
those macros!

IOW, xfailing the test is the easiest path forward for something
that will never pass.

-- 
Steve


Re: [PATCH] configure: Extend SHF_GNU_RETAIN conftest to check for unsupported gold

2021-01-04 Thread Jeff Law via Gcc-patches



On 12/16/20 11:39 AM, Jozef Lawrynowicz wrote:
> Since "6fbec038f7a Use SHF_GNU_RETAIN to preserve symbol definitions",
> GCC could create sections with the 'R' flag, which GAS would map to
> SHF_GNU_RETAIN.
>
> SHF_GNU_RETAIN support was not available in gold until Binutils commit
> ff4bc37d77, on 2020-12-14. Until the support was added, invalid binaries
> could be created if some special sections, such as .init_array, had
> SHF_GNU_RETAIN set.
>
> HAVE_GAS_SHF_GNU_RETAIN is now disabled if a version of gold without
> SHF_GNU_RETAIN support is detected.
>
> I tested that HAVE_GAS_SHF_GNU_RETAIN was enabled/disabled as
> appropriate in a number of configurations, using both in-tree and
> out-of-tree Binutils:
> - Before the Binutils gold patch:
>   * gold disabled == SHF_GNU_RETAIN enabled
>   * gold enabled == SHF_GNU_RETAIN disabled
> - After the Binutils gold patch:
>   * gold disabled == SHF_GNU_RETAIN enabled
>   * gold enabled == SHF_GNU_RETAIN enabled 
>
> Successfully bootstrapped for x86_64-pc-linux-gnu.
>
> Ok to apply?
>
> 0001-configure-Extend-SHF_GNU_RETAIN-conftest-to-check-fo.patch
>
> From af3000bbacc6d8ca57581c11790032b73ea944ac Mon Sep 17 00:00:00 2001
> From: Jozef Lawrynowicz 
> Date: Wed, 16 Dec 2020 18:33:54 +
> Subject: [PATCH] configure: Extend SHF_GNU_RETAIN conftest to check for
>  unsupported gold
>
> Since "6fbec038f7a Use SHF_GNU_RETAIN to preserve symbol definitions",
> GCC could create sections with the 'R' flag, which GAS would map to
> SHF_GNU_RETAIN.
>
> SHF_GNU_RETAIN support was not available in gold until Binutils commit
> ff4bc37d77, on 2020-12-14. Until the support was added, invalid binaries
> could be created if some special sections, such as .init_array, had
> SHF_GNU_RETAIN set.
>
> HAVE_GAS_SHF_GNU_RETAIN is now disabled if a version of gold without
> SHF_GNU_RETAIN support is detected.
>
> gcc/ChangeLog:
>
>   PR target/98210
>   * configure: Regenerate.
>   * configure.ac (gcc_cv_as_shf_gnu_retain): Disable
>   HAVE_GAS_SHF_GNU_RETAIN for gold versions that do not support it.
Sorry.  I would strongly recommend against testing like this using
version numbers.  Test the feature, not the version #, date, etc.

jeff



Re: [PATCH] Add line debug info for virtual thunks (PR ipa/97937)

2021-01-04 Thread Jeff Law via Gcc-patches



On 1/4/21 1:06 PM, Bernd Edlinger wrote:
> Hi,
>
>
> currently there is a problem when debugging a virtual thunk.  That is
> a decl with DECL_IGNORED_P.  Currently the line information displayed
> in gdb is completely bogus, thus the last line of whatever function
> is immediately before the PC of the thunk.
>
> This patch improves the debug experience at least a bit by emitting
> at the line number information where the thunk has been defined.
> I do not dare to touch anything but dwarf2 debug info, therefore
> the patch is a bit awkward.
>
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?
>
>
> Thanks
> Bernd.
>
> 0001-Add-line-debug-info-for-virtual-thunks.patch
>
> From 0a44bb870e90623689cae484f8a8899706480876 Mon Sep 17 00:00:00 2001
> From: Bernd Edlinger 
> Date: Sun, 3 Jan 2021 11:18:39 +0100
> Subject: [PATCH] Add line debug info for virtual thunks
>
> There is no full debug info since the DECL_IGNORED_P
> flag is set on virtual thunks.
> But instead of no line info at all, emit at least
> the location of the function decl.
>
> 2021-01-03  Bernd Edlinger  
>
>   PR ipa/97937
>   * final.c (final_start_function_1): Always emit function start line
>   information for dwarf2 debug.
>   (final_end_function): Always call end_function for dwarf2 debug.
>   * varasm.c (assemble_start_function): Always call begin_function
>   for dwarf2 debug.


> ---
>  gcc/final.c  | 9 +++--
>  gcc/varasm.c | 2 +-
>  2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/final.c b/gcc/final.c
> index fc9a05e..5a274c1 100644
> --- a/gcc/final.c
> +++ b/gcc/final.c
> @@ -1735,7 +1735,12 @@ final_start_function_1 (rtx_insn **firstp, FILE *file, 
> int *seen,
>last_filename);
>  
>if (!dwarf2_debug_info_emitted_p (current_function_decl))
> -dwarf2out_begin_prologue (0, 0, NULL);
> +{
> +  if (write_symbols == DWARF2_DEBUG)
> + dwarf2out_begin_prologue (last_linenum, last_columnnum, last_filename);
> +  else
> + dwarf2out_begin_prologue (0, 0, NULL);
> +}
The only way you're getting into this code is for DEBUG_DWARF2 and
VMS_AND_DWARF2_DEBUG and in the latter case we want to make the same
fix.  So drop the newly added conditional and just make the code
something like this:


if (!dwarf2_debug_info_emitted_p (current_function_decl))
  dwarf2out_begin_prologue (last_linenum, last_columnnum, last_filename)


>  
>  #ifdef LEAF_REG_REMAP
>if (crtl->uses_only_leaf_regs)
> @@ -1879,7 +1884,7 @@ final_end_function (void)
>  {
>app_disable ();
>  
> -  if (!DECL_IGNORED_P (current_function_decl))
> +  if (!DECL_IGNORED_P (current_function_decl) || write_symbols == 
> DWARF2_DEBUG)
>  debug_hooks->end_function (high_function_linenum);
>  
>/* Finally, output the function epilogue:
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index ce5d449..513922d 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -1930,7 +1930,7 @@ assemble_start_function (tree decl, const char *fnname)
>ASM_OUTPUT_FUNCTION_PREFIX (asm_out_file, fnname);
>  #endif
>  
> -  if (!DECL_IGNORED_P (decl))
> +  if (!DECL_IGNORED_P (decl) || write_symbols == DWARF2_DEBUG)
>  (*debug_hooks->begin_function) (decl);
I'd drop the DWARF2_DEBUG conditionals in these two hunks as well. 
There's no reason why we wouldn't want to emit suitable debug
information for other formats. 

Jeff



Re: [PATCH] dec_math.f90 needs to be xfailed

2021-01-04 Thread Jeff Law via Gcc-patches



On 1/2/21 1:34 AM, Steve Kargl via Gcc-patches wrote:
> Can someone, anyone, please commit the following trivially patch?
> gfortran.dg/dec_math.f90 will never pass on i?86-*-freebsd*.
Why will the test never pass on that platform?  I don't mind installing
the patch, but I'd like to have a bit more background first :-)


Jeff



Re: [PATCH] Restore input_location after recursive expand_call_inline

2021-01-04 Thread Jeff Law via Gcc-patches



On 1/4/21 1:12 PM, Bernd Edlinger wrote:
> Hi,
>
> I spotted a place where input_location is clobbered accidentally.
>
> That is in a recursive call to expand_call_inline.  The input_location
> is usually restored by goto egress in this function.
>
> Additionally the return value of the recursive expand call is thrown
> away, which does not look like a good idea.
>
> Although this causes no problems ATM, I wanted to fix it anyway.
>
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?
>
>
> Thanks
> Bernd.
>
> 0001-Restore-input_location-after-recursive-expand_call_i.patch
>
> From 88b963bba7b32972abf0ea44a01c03d643d7c6ca Mon Sep 17 00:00:00 2001
> From: Bernd Edlinger 
> Date: Mon, 4 Jan 2021 11:35:31 +0100
> Subject: [PATCH] Restore input_location after recursive expand_call_inline
>
> This is just a precautionary fix.
>
> 2021-01-04  Bernd Edlinger  
>
>   * tree-inline.c (expand_call_inline): Restore input_location.
>   Return result from recursive call.
I suspect that we're always supposed to inline in this case.  As
asserting that successfully_inlined is true before jumping to "egress"
seems wise.

OK with that change after the usual testing.

Jeff



Re: [PATCH] store VLA bounds in attribute access as strings (PR 97172)

2021-01-04 Thread Jakub Jelinek via Gcc-patches
On Mon, Jan 04, 2021 at 02:10:39PM -0700, Jeff Law wrote:
> > I explained what the code handles and when in the pipeline in
> > the discussion of the previous patch:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559770.html
> Right, but that message talks about GC.  This is not a GC issue.
> 
> This feels like we need a SAVE_EXPR to me to ensure single evaluation
> and an unshare_expr to avoid problems with destructive gimplification.

unshare_expr will not duplicate SAVE_EXPRs.
So, one would need to unshare with special handling of SAVE_EXPRs that would
throw them away (for the simple arguments case) rather than handling them
normally.

Jakub



Re: [PATCH] MIPS: Fix PR target/98491 (ChangeLog)

2021-01-04 Thread Jeff Law via Gcc-patches



On 1/4/21 2:00 PM, Jakub Jelinek wrote:
> On Mon, Jan 04, 2021 at 01:51:59PM -0700, Jeff Law via Gcc-patches wrote:
>>> Sorry, I forgot to include the ChangeLog:
>>>
>>> gcc/ChangeLog:
>>> 
>>> 2020-12-31  Xi Ruoyao 
>>> 
>>> PR target/98491
>>> * config/mips/mips.c (mips_symbol_insns): Do not use
>>>   MSA_SUPPORTED_MODE_P if mode is MAX_MACHINE_MODE.
>> So I absolutely agree the current code is wrong as it does an out of
>> bounds array access.
>>
>>
>> Would it be better to instead to change MSA_SUPPORTED_MODE_P to evaluate
>> to zero if MODE is MAX_MACHINE_MODE?  That would protect all the uses of
>> MSA_SUPPORTED_MODE_P.    Something like this perhaps?
> But MAX_MACHINE_MODE is the one past last valid mode, I'm not aware of
> any target that would protect all macros that deal with modes that way.
>
> So, perhaps best would be stop using the MAX_MACHINE_MODE as magic value
> for that function and instead use say VOIDmode that shouldn't normally
> appear either?
I think we have to allow VOIDmode because constants don't necessarily
have modes.   And I certainly agree that using MAX_MACHINE_MODE like
this is ugly and error prone (as we can see from the BZ).

I also couldn't convince myself that the code and comments were actually
consistent, particularly for MSA targets which the comment claims can
never handle constants for ld/st (and thus should be returning 0 for
MAX_MACHINE_MODE).  Though maybe mips_symbol_insns_1 ultimately handles
that correctly.


>
> But I don't really see anything wrong on the mips_symbol_insns above
> change either.
Me neither.  I'm just questioning if bullet-proofing in the
MSA_SUPPORTED_MODE_P would be a better option.  While I've worked in the
MIPS port in the past, I don't really have any significannt experience
with the MSA support.

jeff



Re: [PATCH] rtl-ssa: Fix updates to call clobbers [PR98403]

2021-01-04 Thread Jeff Law via Gcc-patches



On 12/21/20 9:32 AM, Richard Sandiford wrote:
> In the PR, fwprop was changing a call instruction and tripped
> an assert when trying to update a list of call clobbers.
> There are two ways we could handle this: remove the call clobber
> and then add it back, or assume that the clobber will stay in its
> current place.
>
> At the moment we don't have enough information to safely move
> calls around, so the second approach seems simpler and more
> efficient.
>
> Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
>   PR rtl-optimization/98403
>   * rtl-ssa/changes.cc (function_info::finalize_new_accesses): Explain
>   why we don't remove call clobbers.
>   (function_info::apply_changes_to_insn): Don't attempt to add
>   call clobbers here.
>
> gcc/testsuite/
>   PR rtl-optimization/98403
>   * g++.dg/opt/pr98403.C: New test.
OK
jeff



Re: [PATCH] store VLA bounds in attribute access as strings (PR 97172)

2021-01-04 Thread Jeff Law via Gcc-patches



On 1/4/21 1:53 PM, Martin Sebor wrote:
> On 1/4/21 12:23 PM, Jeff Law wrote:
>>
>>
>> On 1/4/21 12:19 PM, Jakub Jelinek wrote:
>>> On Mon, Jan 04, 2021 at 12:14:15PM -0700, Jeff Law via Gcc-patches
>>> wrote:
> Doing the STRING_CST is certainly less fragile since the SSA names
> created at gimplification time could even be ggc_freed when no longer
> used in the IL.
 Obviously we can't use SSA_NAMEs as they're specific to each
 function as
 they get compiled.  But what's not as clear to me is why we can't
 use a
 SAVE_EXPR of the original expression that indicates the size of the
 parameter.
>>> The gimplifier is destructive, so if the expressions are partly
>>> (e.g. in
>>> those SAVE_EXPRs) shared with what is in the actual IL, we lose.
>>> And if they aren't shared and there are side-effects, if we tried to
>>> gimplify them again we'd get the side-effects duplicated.
>>> So it all depends on what the code wants to handle, if e.g. just
>>> values of
>>> parameters with simple arithmetics on those and punt on everything
>>> else,
>>> then it is doable, but generally it is not.
>
> I explained what the code handles and when in the pipeline in
> the discussion of the previous patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559770.html
Right, but that message talks about GC.  This is not a GC issue.

This feels like we need a SAVE_EXPR to me to ensure single evaluation
and an unshare_expr to avoid problems with destructive gimplification.



>
>> I would expect the expressions to be values of parameters (or objects in
>> static storage) and simple arithemetic on them.  If there's other cases,
>> punting seems appropriate.
>>
>> Martin -- are there nontrivial expressions we need to be worried
>> about here?
>
> At the moment the middle warnings only consider parameters, like
> the N in
>
>   void f (int N, int[N]);
>
>   void g (void)
>   {
>     int a[3];
>     f (sizeof a, a);   // warning
>   }
>
> The front end redeclaration warnings consider all expressions,
> including
>
>   int f (void);
>
>   void g (int[f () + 1]);
>   void g (int[f () + 2]);   // warning
>
> The patch turns these complex bounds into strings that the front
> end compares instead.

If you can have an arbitrary expression, such as a function call like
that, then ISTM that a SAVE_EPR is mandatory as you can't call the
function more than once.  BUt it also seems to me that for cases that
aren't simple arithmetic of leaf nodes we could just punt.  I doubt
we're going to miss significant real world diagnostics by doing that.

> After the front end is done the strings
> don't serve any purpose (and I don't think ever will) and could
> be removed.  I looked for a way to do it but couldn't find one
> other than the free_lang_data pass in tree.c that Richard had
> initially said wasn't the right place.  Sounds like he's
> reconsidered but at this point, given that VLA parameters are
> used only infraquently, and VLAs with these nontrivial bounds
> are exceedingly rare, going to the trouble of removing them
> doesn't seem worth the effort.
But I'm not sure that inventing a new method for smuggling the data
around is all that wise or necessary here.   I don't see a message from
anyone suggesting that, but I could have missed it.

jeff



Re: cxx status update

2021-01-04 Thread Nathan Sidwell

On 1/4/21 3:33 PM, Gerald Pfeifer wrote:

Hi Nathan,

On Mon, 4 Jan 2021, Nathan Sidwell wrote:

Here's a patch describing the c++20 modules status.  While there I
noticed the coroutines project status was out of date (it's done).

ok?


you don't need extra approval for release note entries in your
areas of expertise/maintainership, but I'm always happy to help
and have a look.

I skimmed these changes and they look good.


The one part that caught my attention is

   -   Modules 
   +   Modules 

where I'm wondering why you are adding the  here?


That's an error :)  thanks for the advice

nathan


--
Nathan Sidwell


Re: [PATCH] c++: Fix ICE with __builtin_bit_cast [PR98469]

2021-01-04 Thread Jason Merrill via Gcc-patches

On 1/4/21 3:48 PM, Jakub Jelinek wrote:

On Mon, Jan 04, 2021 at 03:44:46PM -0500, Jason Merrill wrote:

This change is OK, but part of the problem is that we're trying to do
overload resolution for an S copy/move constructor, which we shouldn't be
because bit_cast is a prvalue, so in C++17 and up we should use it to
directly initialize the target without any implied constructor call.

It seems we're mishandling this because the code in
build_special_member_call specifically looks for TARGET_EXPR or CONSTRUCTOR,
and BIT_CAST_EXPR is neither of those.

Wrapping a BIT_CAST_EXPR of aggregate type in a TARGET_EXPR would address
this, and any other places that expect a class prvalue to come in the form
of a TARGET_EXPR.


I can try that tomorrow.  Won't that cause copying through extra temporary
in some cases though, or is that guaranteed to be optimized?


It won't cause any extra copying when it's used to initialize another 
object (like the return value of std::bit_cast).  Class prvalues are 
always expressed with a TARGET_EXPR in the front end; the TARGET_EXPR 
melts away when used as an initializer, it only creates a temporary when 
it's used in another way.


Jason



Re: [PATCH] MIPS: Fix PR target/98491 (ChangeLog)

2021-01-04 Thread Jakub Jelinek via Gcc-patches
On Mon, Jan 04, 2021 at 01:51:59PM -0700, Jeff Law via Gcc-patches wrote:
> > Sorry, I forgot to include the ChangeLog:
> >
> > gcc/ChangeLog:
> > 
> > 2020-12-31  Xi Ruoyao 
> > 
> > PR target/98491
> > * config/mips/mips.c (mips_symbol_insns): Do not use
> >   MSA_SUPPORTED_MODE_P if mode is MAX_MACHINE_MODE.
> So I absolutely agree the current code is wrong as it does an out of
> bounds array access.
> 
> 
> Would it be better to instead to change MSA_SUPPORTED_MODE_P to evaluate
> to zero if MODE is MAX_MACHINE_MODE?  That would protect all the uses of
> MSA_SUPPORTED_MODE_P.    Something like this perhaps?

But MAX_MACHINE_MODE is the one past last valid mode, I'm not aware of
any target that would protect all macros that deal with modes that way.

So, perhaps best would be stop using the MAX_MACHINE_MODE as magic value
for that function and instead use say VOIDmode that shouldn't normally
appear either?

But I don't really see anything wrong on the mips_symbol_insns above
change either.

Jakub



Re: [PATCH] store VLA bounds in attribute access as strings (PR 97172)

2021-01-04 Thread Martin Sebor via Gcc-patches

On 1/4/21 12:23 PM, Jeff Law wrote:



On 1/4/21 12:19 PM, Jakub Jelinek wrote:

On Mon, Jan 04, 2021 at 12:14:15PM -0700, Jeff Law via Gcc-patches wrote:

Doing the STRING_CST is certainly less fragile since the SSA names
created at gimplification time could even be ggc_freed when no longer
used in the IL.

Obviously we can't use SSA_NAMEs as they're specific to each function as
they get compiled.  But what's not as clear to me is why we can't use a
SAVE_EXPR of the original expression that indicates the size of the
parameter.

The gimplifier is destructive, so if the expressions are partly (e.g. in
those SAVE_EXPRs) shared with what is in the actual IL, we lose.
And if they aren't shared and there are side-effects, if we tried to
gimplify them again we'd get the side-effects duplicated.
So it all depends on what the code wants to handle, if e.g. just values of
parameters with simple arithmetics on those and punt on everything else,
then it is doable, but generally it is not.


I explained what the code handles and when in the pipeline in
the discussion of the previous patch:
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559770.html


I would expect the expressions to be values of parameters (or objects in
static storage) and simple arithemetic on them.  If there's other cases,
punting seems appropriate.

Martin -- are there nontrivial expressions we need to be worried about here?


At the moment the middle warnings only consider parameters, like
the N in

  void f (int N, int[N]);

  void g (void)
  {
int a[3];
f (sizeof a, a);   // warning
  }

The front end redeclaration warnings consider all expressions,
including

  int f (void);

  void g (int[f () + 1]);
  void g (int[f () + 2]);   // warning

The patch turns these complex bounds into strings that the front
end compares instead.  After the front end is done the strings
don't serve any purpose (and I don't think ever will) and could
be removed.  I looked for a way to do it but couldn't find one
other than the free_lang_data pass in tree.c that Richard had
initially said wasn't the right place.  Sounds like he's
reconsidered but at this point, given that VLA parameters are
used only infraquently, and VLAs with these nontrivial bounds
are exceedingly rare, going to the trouble of removing them
doesn't seem worth the effort.

Martin




Jeff





Re: [PATCH] MIPS: Fix PR target/98491 (ChangeLog)

2021-01-04 Thread Jeff Law via Gcc-patches


On 12/31/20 4:34 PM, Xi Ruoyao via Gcc-patches wrote:
> On 2021-01-01 07:29 +0800, Xi Ruoyao wrote:
>> An invalid use of MSA_SUPPORTED_MODE_P is causing ICE on mips64el with -mmsa.
>> The detailed analysis is posted on bugzilla:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98491
>>
>> The attached patch fixes this issue by handling the special case of
>> MSA_SUPPORTED_MODE_P explicitly.
>>
>> Please keep me in CC since I'm not a subscriber.
>>
>> And, I don't have GIT write access.
> Sorry, I forgot to include the ChangeLog:
>
> gcc/ChangeLog:
> 
> 2020-12-31  Xi Ruoyao 
> 
> PR target/98491
> * config/mips/mips.c (mips_symbol_insns): Do not use
>   MSA_SUPPORTED_MODE_P if mode is MAX_MACHINE_MODE.
So I absolutely agree the current code is wrong as it does an out of
bounds array access.


Would it be better to instead to change MSA_SUPPORTED_MODE_P to evaluate
to zero if MODE is MAX_MACHINE_MODE?  That would protect all the uses of
MSA_SUPPORTED_MODE_P.    Something like this perhaps?



Jeff
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index b4a60a55d80..a159bb22381 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -2418,6 +2418,7 @@ enum reg_class
 /* True if MODE is vector and supported in a MSA vector register.  */
 #define MSA_SUPPORTED_MODE_P(MODE) \
   (ISA_HAS_MSA \
+   && (MODE) != MAX_MACHINE_MODE
&& GET_MODE_SIZE (MODE) == UNITS_PER_MSA_REG\
&& (GET_MODE_CLASS (MODE) == MODE_VECTOR_INT\
|| GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT))


[PATCH][tree-optimization]Optimize combination of comparisons to dec+compare

2021-01-04 Thread Eugene Rozenfeld via Gcc-patches
Ping.

-Original Message-
From: Eugene Rozenfeld 
Sent: Tuesday, December 22, 2020 3:01 PM
To: Richard Biener ; gcc-patches@gcc.gnu.org
Subject: RE: Optimize combination of comparisons to dec+compare

Re-sending my question and re-attaching the patch.

Richard, can you please clarify your feedback?

Thanks,

Eugene

-Original Message-
From: Gcc-patches  On Behalf Of Eugene 
Rozenfeld via Gcc-patches
Sent: Tuesday, December 15, 2020 2:06 PM
To: Richard Biener 
Cc: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: Optimize combination of comparisons to dec+compare

Richard,

> Do we already handle x < y || x <= CST to x <= y - CST?

That is an invalid transformation: e.g., consider x=3, y=4, CST=2.
Can you please clarify?

Thanks,

Eugene

-Original Message-
From: Richard Biener  
Sent: Thursday, December 10, 2020 12:21 AM
To: Eugene Rozenfeld 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: Optimize combination of comparisons to dec+compare

On Thu, Dec 10, 2020 at 1:52 AM Eugene Rozenfeld via Gcc-patches 
 wrote:
>
> This patch adds a pattern for optimizing x < y || x == XXX_MIN to x <= 
> y-1 if y is an integer with TYPE_OVERFLOW_WRAPS.

Do we already handle x < y || x <= CST to x <= y - CST?
That is, the XXX_MIN case is just a special-case of generic anti-range testing? 
 For anti-range testing with signed types we pun to unsigned when possible.

> This fixes pr96674.
>
> Tested on x86_64-pc-linux-gnu.
>
> For this function
>
> bool f(unsigned a, unsigned b)
> {
> return (b == 0) | (a < b);
> }
>
> the code without the patch is
>
> test   esi,esi
> sete   al
> cmpesi,edi
> seta   dl
> or eax,edx
> ret
>
> the code with the patch is
>
> subesi,0x1
> cmpesi,edi
> setae  al
> ret
>
> Eugene
>
> gcc/
> PR tree-optimization/96674
> * match.pd: New pattern x < y || x == XXX_MIN --> x <= y - 1
>
> gcc/testsuite
> * gcc.dg/pr96674.c: New test.
>


0001-Optimize-combination-of-comparisons-to-dec-compare.patch
Description: 0001-Optimize-combination-of-comparisons-to-dec-compare.patch


Re: [PATCH] c++: Fix ICE with __builtin_bit_cast [PR98469]

2021-01-04 Thread Jakub Jelinek via Gcc-patches
On Mon, Jan 04, 2021 at 03:44:46PM -0500, Jason Merrill wrote:
> This change is OK, but part of the problem is that we're trying to do
> overload resolution for an S copy/move constructor, which we shouldn't be
> because bit_cast is a prvalue, so in C++17 and up we should use it to
> directly initialize the target without any implied constructor call.
> 
> It seems we're mishandling this because the code in
> build_special_member_call specifically looks for TARGET_EXPR or CONSTRUCTOR,
> and BIT_CAST_EXPR is neither of those.
> 
> Wrapping a BIT_CAST_EXPR of aggregate type in a TARGET_EXPR would address
> this, and any other places that expect a class prvalue to come in the form
> of a TARGET_EXPR.

I can try that tomorrow.  Won't that cause copying through extra temporary
in some cases though, or is that guaranteed to be optimized?

Jakub



Re: [C PATCH] [testsuite] bogus warning [P98029]

2021-01-04 Thread Joseph Myers
On Wed, 23 Dec 2020, Uecker, Martin wrote:

> With the fix to PR98047 "C: Drop qualifiers of assignment expressions." 
> also the new incorrect warning for assignment of certain volatile expressions
> introduced by dropping qualifiers in lvalue conversion (PR97702)
> disappeared [P98029]. This patch only adds a test case.
> 
> -- Martin
> 
> 
> C: Add test for incorrect warning for assignment of certain volatile
> expressions fixed by commit 58a45ce [PR98029]
> 
> 2020-12-12  Martin Uecker  
> 
> gcc/testsuite/
>  PR c/98029
>  * gcc.dg/pr98029.c: New test.   

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] c++: Fix ICE with __builtin_bit_cast [PR98469]

2021-01-04 Thread Jason Merrill via Gcc-patches

On 12/30/20 4:13 AM, Jakub Jelinek wrote:

Hi!

On the following testcase we ICE during constexpr evaluation (for warnings),
because the IL has ADDR_EXPR of BIT_CAST_EXPR and ADDR_EXPR case asserts
the result is not a CONSTRUCTOR.
I've tried to force a temporary for those in call.c next to:
 if (convs->need_temporary_p
 || TREE_CODE (expr) == CONSTRUCTOR
 || TREE_CODE (expr) == VA_ARG_EXPR)
but that resulted in a lot of ICEs, so this patch just punts on lval
evaluation of BIT_CAST_EXPR instead, normally __builtin_bit_cast is called
from std::bit_cast which is constexpr and therefore the BIT_CAST_EXPR
isn't evaluated there during parsing or tsubst and when evaluating the call
to std::bit_cast the NRV optimized return is assigned to some temporary or
variable and so BIT_CAST_EXPR is not evaluated as lval.


This change is OK, but part of the problem is that we're trying to do 
overload resolution for an S copy/move constructor, which we shouldn't 
be because bit_cast is a prvalue, so in C++17 and up we should use it to 
directly initialize the target without any implied constructor call.


It seems we're mishandling this because the code in 
build_special_member_call specifically looks for TARGET_EXPR or 
CONSTRUCTOR, and BIT_CAST_EXPR is neither of those.


Wrapping a BIT_CAST_EXPR of aggregate type in a TARGET_EXPR would 
address this, and any other places that expect a class prvalue to come 
in the form of a TARGET_EXPR.



Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-12-30  Jakub Jelinek  

PR c++/98469
* constexpr.c (cxx_eval_constant_expression) :
Punt if lval is true.

* g++.dg/cpp2a/bit-cast8.C: New test.
* g++.dg/cpp2a/bit-cast9.C: New test.

--- gcc/cp/constexpr.c.jj   2020-12-23 22:44:05.398093175 +0100
+++ gcc/cp/constexpr.c  2020-12-29 10:32:44.865030881 +0100
@@ -6900,6 +6900,15 @@ cxx_eval_constant_expression (const cons
return t;
  
  case BIT_CAST_EXPR:

+  if (lval)
+   {
+ if (!ctx->quiet)
+   error_at (EXPR_LOCATION (t),
+ "address of a call to %qs is not a constant expression",
+ "__builtin_bit_cast");
+ *non_constant_p = true;
+ return t;
+   }
r = cxx_eval_bit_cast (ctx, t, non_constant_p, overflow_p);
break;
  
--- gcc/testsuite/g++.dg/cpp2a/bit-cast8.C.jj	2020-12-29 10:35:31.547140723 +0100

+++ gcc/testsuite/g++.dg/cpp2a/bit-cast8.C  2020-12-29 10:34:26.431879120 
+0100
@@ -0,0 +1,11 @@
+// PR c++/98469
+// { dg-do compile { target c++20 } }
+// { dg-options "-Wall" }
+
+struct S { int s; };
+
+S
+foo ()
+{
+  return __builtin_bit_cast (S, 0);
+}
--- gcc/testsuite/g++.dg/cpp2a/bit-cast9.C.jj   2020-12-29 10:35:35.018101365 
+0100
+++ gcc/testsuite/g++.dg/cpp2a/bit-cast9.C  2020-12-29 10:35:05.905431494 
+0100
@@ -0,0 +1,15 @@
+// PR c++/98469
+// { dg-do compile { target c++20 } }
+// { dg-options "-Wall" }
+
+template
+constexpr T
+bit_cast (const F &f) noexcept
+{
+  return __builtin_bit_cast (T, f);
+}
+struct S { int s; };
+constexpr int foo (const S &x) { return x.s; }
+constexpr int bar () { return foo (bit_cast (0)); }
+constexpr int x = bar ();
+static_assert (!x);

Jakub





Re: [C PATCH] [testsuite] bogus warning [P98029]

2021-01-04 Thread Jeff Law via Gcc-patches



On 12/23/20 1:00 AM, Uecker, Martin wrote:
>
> With the fix to PR98047 "C: Drop qualifiers of assignment expressions." 
> also the new incorrect warning for assignment of certain volatile expressions
> introduced by dropping qualifiers in lvalue conversion (PR97702)
> disappeared [P98029]. This patch only adds a test case.
>
> -- Martin
>
>
> C: Add test for incorrect warning for assignment of certain volatile
> expressions fixed by commit 58a45ce [PR98029]
> 
> 2020-12-12  Martin Uecker  
>
> gcc/testsuite/
>  PR c/98029
>  * gcc.dg/pr98029.c: New test.   
OK
jeff



Re: cxx status update

2021-01-04 Thread Gerald Pfeifer
Hi Nathan,

On Mon, 4 Jan 2021, Nathan Sidwell wrote:
> Here's a patch describing the c++20 modules status.  While there I 
> noticed the coroutines project status was out of date (it's done).
> 
> ok?

you don't need extra approval for release note entries in your
areas of expertise/maintainership, but I'm always happy to help
and have a look.

I skimmed these changes and they look good.


The one part that caught my attention is

  -   Modules 
  +   Modules 

where I'm wondering why you are adding the  here? 

Is this to top-align that cell since other cells in that row have 
multiple lines? If so, we should be able to generally handle this
via CSS. Check out gcc.css and the existing entry 

  table.cxxdrstatus td:nth-child(4) { text-align:center; }

there. If you want this for all rows, something like the following 
should do what you want?

  table.cxxdrstatus td:nth-child(1) { vertical-align:top; }

Or do you only want to do that for that one row? Then we could add
a new class.

Either way, if you don't want to dive in, got ahead and commit as is,
let me know what you want to accomplish, and I'll have a look. :-)

Gerald


Re: [PATCH] libtool.m4: update GNU/Hurd test from upstream

2021-01-04 Thread Jeff Law via Gcc-patches



On 12/23/20 6:12 PM, Samuel Thibault wrote:
> In upstream libtool, 47a889a4ca20 ("Improve GNU/Hurd support.") fixed
> detection of shlibpath_overrides_runpath, thus avoiding unnecessary relink.
> This backports it.
>
> ChangeLog:
>
>   * libtool.m4: Match gnu* along other GNU systems.
>   * libffi/configure: Re-generate.
>   * libgomp/configure: Re-generate.
>
>   * libgo/config/libtool.m4: Match gnu* along other GNU systems.
>   * libgo/configure: Re-generate.
OK for the trunk.

jeff



Re: [PATCH v2] C-family : Add attribute 'unavailable'.

2021-01-04 Thread Joseph Myers
On Mon, 21 Dec 2020, Iain Sandoe wrote:

> Hi Joseph,
> 
> Nathan has approved this from the C++ perspective (and Martin said that his
> comments were  "not necessarily to object to the idea behind the attribute but
> to draw attention to the fact that it's not suitable for standard APIs.”)
> 
> So, I wonder, do you have more comments on the revised patch, or is this
> now OK for master?

If  
is the current version, one of the three copies of the attribute 
documentation (the one for it as a function attribute) starts with "The 
@code{deprecated} attribute results in an error", when it should refer to 
@code{unavailable} not @code{deprecated}.  I don't have further comments 
beyond that.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] build: libcody: Link with -lsocket -lnsl if necessary [PR98316]

2021-01-04 Thread Jeff Law via Gcc-patches



On 12/17/20 6:27 AM, Nathan Sidwell wrote:
> On 12/17/20 7:21 AM, Rainer Orth wrote:
>> With the introduction of C++20 modules and libcody, cc1plus and
>> cc1objplus gained a dependency on the socket functions.  Before those
>> were merged into libc in Solaris 11.4, one needed to link with
>> -lsocket -lnsl
>> on Solaris, so that merge broke the Solaris 11.3 build.
>>
>> While we already have 4 different checks for those libraries in the
>> tree, I decided to import autoconf-archive's AX_LIB_SOCKET_NSL macro
>> instead.  At the same time, the patch only links libcody and the
>> networking libs where needed (cc1plus, cc1objplus).
>>
>> Bootstrapped without regressions on i386-pc-solaris2.11 (Solaris 11.3
>> and 11.4), sparc-sun-solaris2.11, and x86_64-pc-linux-gnu.
>>
>> Ok for master?
>
> Thank you very much for working on this.  Approved to the extent of my
> approval rating!
Which is sufficient in my book.  But in case anyone wants to be
pedantic, I'll give it my ACK.
jeff



[PATCH] Restore input_location after recursive expand_call_inline

2021-01-04 Thread Bernd Edlinger
Hi,

I spotted a place where input_location is clobbered accidentally.

That is in a recursive call to expand_call_inline.  The input_location
is usually restored by goto egress in this function.

Additionally the return value of the recursive expand call is thrown
away, which does not look like a good idea.

Although this causes no problems ATM, I wanted to fix it anyway.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.
From 88b963bba7b32972abf0ea44a01c03d643d7c6ca Mon Sep 17 00:00:00 2001
From: Bernd Edlinger 
Date: Mon, 4 Jan 2021 11:35:31 +0100
Subject: [PATCH] Restore input_location after recursive expand_call_inline

This is just a precautionary fix.

2021-01-04  Bernd Edlinger  

	* tree-inline.c (expand_call_inline): Restore input_location.
	Return result from recursive call.
---
 gcc/tree-inline.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 360b85f..9f7d914 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4840,9 +4840,9 @@ expand_call_inline (basic_block bb, gimple *stmt, copy_body_data *id,
   gimple_call_set_fndecl (stmt, edge->callee->decl);
   update_stmt (stmt);
   id->src_node->remove ();
-  expand_call_inline (bb, stmt, id, to_purge);
+  successfully_inlined = expand_call_inline (bb, stmt, id, to_purge);
   maybe_remove_unused_call_args (cfun, stmt);
-  return true;
+  goto egress;
 }
   fn = cg_edge->callee->decl;
   cg_edge->callee->get_untransformed_body ();
-- 
1.9.1



[PATCH] Add line debug info for virtual thunks (PR ipa/97937)

2021-01-04 Thread Bernd Edlinger
Hi,


currently there is a problem when debugging a virtual thunk.  That is
a decl with DECL_IGNORED_P.  Currently the line information displayed
in gdb is completely bogus, thus the last line of whatever function
is immediately before the PC of the thunk.

This patch improves the debug experience at least a bit by emitting
at the line number information where the thunk has been defined.
I do not dare to touch anything but dwarf2 debug info, therefore
the patch is a bit awkward.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.
From 0a44bb870e90623689cae484f8a8899706480876 Mon Sep 17 00:00:00 2001
From: Bernd Edlinger 
Date: Sun, 3 Jan 2021 11:18:39 +0100
Subject: [PATCH] Add line debug info for virtual thunks

There is no full debug info since the DECL_IGNORED_P
flag is set on virtual thunks.
But instead of no line info at all, emit at least
the location of the function decl.

2021-01-03  Bernd Edlinger  

	PR ipa/97937
	* final.c (final_start_function_1): Always emit function start line
	information for dwarf2 debug.
	(final_end_function): Always call end_function for dwarf2 debug.
	* varasm.c (assemble_start_function): Always call begin_function
	for dwarf2 debug.
---
 gcc/final.c  | 9 +++--
 gcc/varasm.c | 2 +-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/final.c b/gcc/final.c
index fc9a05e..5a274c1 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -1735,7 +1735,12 @@ final_start_function_1 (rtx_insn **firstp, FILE *file, int *seen,
  last_filename);
 
   if (!dwarf2_debug_info_emitted_p (current_function_decl))
-dwarf2out_begin_prologue (0, 0, NULL);
+{
+  if (write_symbols == DWARF2_DEBUG)
+	dwarf2out_begin_prologue (last_linenum, last_columnnum, last_filename);
+  else
+	dwarf2out_begin_prologue (0, 0, NULL);
+}
 
 #ifdef LEAF_REG_REMAP
   if (crtl->uses_only_leaf_regs)
@@ -1879,7 +1884,7 @@ final_end_function (void)
 {
   app_disable ();
 
-  if (!DECL_IGNORED_P (current_function_decl))
+  if (!DECL_IGNORED_P (current_function_decl) || write_symbols == DWARF2_DEBUG)
 debug_hooks->end_function (high_function_linenum);
 
   /* Finally, output the function epilogue:
diff --git a/gcc/varasm.c b/gcc/varasm.c
index ce5d449..513922d 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -1930,7 +1930,7 @@ assemble_start_function (tree decl, const char *fnname)
   ASM_OUTPUT_FUNCTION_PREFIX (asm_out_file, fnname);
 #endif
 
-  if (!DECL_IGNORED_P (decl))
+  if (!DECL_IGNORED_P (decl) || write_symbols == DWARF2_DEBUG)
 (*debug_hooks->begin_function) (decl);
 
   /* Make function name accessible from other files, if appropriate.  */
-- 
1.9.1



[r11-6448 Regression] FAIL: g++.dg/modules/builtin-3_a.C -std=c++2a scan-lang-dump module "Wrote GMF:-[0-9]* type_decl:'::__builtin_va_list'@builtins" on Linux/x86_64

2021-01-04 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

a5469584f61abeec98ff89347294f9ed72eca280 is the first bad commit
commit a5469584f61abeec98ff89347294f9ed72eca280
Author: Nathan Sidwell 
Date:   Mon Jan 4 07:45:36 2021 -0800

c++: Add stdlib module test cases

caused

FAIL: g++.dg/modules/builtin-3_a.C -std=c++17  scan-lang-dump module 
"Writing:-[0-9]*'s named merge key \\(decl\\) type_decl:'::__builtin_va_list'"
FAIL: g++.dg/modules/builtin-3_a.C -std=c++17  scan-lang-dump module "Wrote 
GMF:-[0-9]* type_decl:'::__builtin_va_list'@builtins"
FAIL: g++.dg/modules/builtin-3_a.C -std=c++2a  scan-lang-dump module 
"Writing:-[0-9]*'s named merge key \\(decl\\) type_decl:'::__builtin_va_list'"
FAIL: g++.dg/modules/builtin-3_a.C -std=c++2a  scan-lang-dump module "Wrote 
GMF:-[0-9]* type_decl:'::__builtin_va_list'@builtins"

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-6448/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="modules.exp=g++.dg/modules/builtin-3_a.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="modules.exp=g++.dg/modules/builtin-3_a.C 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: Fix testsuite/g++.dg/cpp1y/constexpr-66093.C execution failure...

2021-01-04 Thread Mike Stump via Gcc-patches
On Jan 1, 2021, at 6:41 PM, Alexandre Oliva  wrote:
> 
> On Jan  1, 2021, Mike Stump  wrote:
> 
>> On Jan 1, 2021, at 3:37 PM, Alexandre Oliva  wrote:
>>> 
>>> On Dec 29, 2020, Mike Stump  wrote:
>>> 
 a[i-1] = i;
>>> 
>>> 'fraid that won't pass:
>>> 
>>> for (int i = 0; i < n; i++) {
>>> assert (a[i] == i);
>>> }
> 
>> Ok, how about your version with the comment updated?
> 
> Err, are we talking about the same testcase?
> There aren't any comments in this one.

Oh, It's possible the comments were stripped from the bug report or initial 
email.  Never mind then about the comment part.

Re: [RFC] [avr] Toolchain Integration for Testsuite Execution (avr cc0 to mode_cc0 conversion)

2021-01-04 Thread Jeff Law via Gcc-patches



On 12/31/20 7:13 AM, abebeos wrote:
>
>
> On Wed, 30 Dec 2020 at 05:41, Jeff Law  > wrote:
>
>
>
> On 12/23/20 9:01 AM, abebeos wrote:
> >
> >
> > On Sun, 13 Dec 2020 at 20:14, abebeos  
> > >+abeb...@gmail.com
>   >>
> > wrote:
> >
> >
> >
> >     On Fri, 11 Dec 2020 at 20:32, Jeff Law  
> >     >> wrote:
> >
> >
> >
> >         On 12/9/20 6:12 AM, abebeos via Gcc-patches wrote:
> >         > Essence:
> >         >
> >         > I need a confirmation that the testsuite setup as
> presented in:
> >         >
> >         > https://github.com/abebeos/avr-gnu
> 
> >          >
> >         >
> >         > works fine.
> >         >
> >         > The problem with the avr target is that the testsuite
> cannot
> >         be run easily,
> >         > mainly because of the need for a special simulated-target
> >         setup, which does
> >         > not work for avr as documented. This led developers to a
> >         dead-end with
> >         > their non-cc0-avr-backends (the non-cc0 backend is needed
> >         thus avr is not
> >         > dropped from gcc11).
> >         >
> >         > I integrated a toolchain/testsetup to be able to run
> the gcc
> >         testsuite
> >         > against a simulated avr target.
> >         >
> >         > I then used this toolchain to test 2 different existent
> >         > non-cc0-avr-backends (from pipcet and saaadhu, both
> github).
> >         >
> >         > The result is that saaadhu's backend seems to be working
> >         100%. It has
> >         > identical testsuite results with the existing (but
> >         deprecated) cc0-backend,
> >         > which means that it can be used "as-is" for inclusion
> in gcc11.
> >         >
> >         > Please note that I did this work in context of a bounty @
> >         bountysouce, more
> >         > information within the issue:
> >         >
> >         > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92729#c35
> 
> >          >
> >         I haven't looked at the github repo.  But I do have a couple
> >         comments here.
> >
> >         First, the author of the changes (pipcet and saaadhu)
> need to have
> >         copyright assignments on file with the FSF.  Otherwise
> we can
> >         not use
> >         their work at all.
> >
> >         Second, the work needs to be submitted for inclusion.  I
> don't
> >         recall
> >         seeing an official submission from either of them to
> gcc-patches.
> >
> >         I'm definitely curious about the testing setup and
> whether or
> >         not it can
> >         be replicated into our Jenkins setup. 
> >
> >
> >     Where can I find this Jenkins setup?
> >
> >
> > To close this: assuming " into our Jenkins setup" is some redhat
> > internal jenkins setup.
> No, it's public.
>
> http://gcc.gnu.org/jenkins 
>
>
> (sidenote: This resolves on my side to the (insecure)
> http://3.14.90.209:8080/ )
Yup.

>
> Is the source-code of  http://gcc.gnu.org/jenkins
>  available somewhere? I could not locate it.
Jenkins is a project independent of GCC for building continuous
testing/delivery systems.  See http://jenkins.io

jeff



Re: [PATCH] store VLA bounds in attribute access as strings (PR 97172)

2021-01-04 Thread Jeff Law via Gcc-patches



On 1/4/21 12:19 PM, Jakub Jelinek wrote:
> On Mon, Jan 04, 2021 at 12:14:15PM -0700, Jeff Law via Gcc-patches wrote:
>>> Doing the STRING_CST is certainly less fragile since the SSA names
>>> created at gimplification time could even be ggc_freed when no longer
>>> used in the IL.
>> Obviously we can't use SSA_NAMEs as they're specific to each function as
>> they get compiled.  But what's not as clear to me is why we can't use a
>> SAVE_EXPR of the original expression that indicates the size of the
>> parameter.
> The gimplifier is destructive, so if the expressions are partly (e.g. in
> those SAVE_EXPRs) shared with what is in the actual IL, we lose.
> And if they aren't shared and there are side-effects, if we tried to
> gimplify them again we'd get the side-effects duplicated.
> So it all depends on what the code wants to handle, if e.g. just values of
> parameters with simple arithmetics on those and punt on everything else,
> then it is doable, but generally it is not.
I would expect the expressions to be values of parameters (or objects in
static storage) and simple arithemetic on them.  If there's other cases,
punting seems appropriate.

Martin -- are there nontrivial expressions we need to be worried about here?


Jeff



Re: [PATCH] store VLA bounds in attribute access as strings (PR 97172)

2021-01-04 Thread Jakub Jelinek via Gcc-patches
On Mon, Jan 04, 2021 at 12:14:15PM -0700, Jeff Law via Gcc-patches wrote:
> > Doing the STRING_CST is certainly less fragile since the SSA names
> > created at gimplification time could even be ggc_freed when no longer
> > used in the IL.
> Obviously we can't use SSA_NAMEs as they're specific to each function as
> they get compiled.  But what's not as clear to me is why we can't use a
> SAVE_EXPR of the original expression that indicates the size of the
> parameter.

The gimplifier is destructive, so if the expressions are partly (e.g. in
those SAVE_EXPRs) shared with what is in the actual IL, we lose.
And if they aren't shared and there are side-effects, if we tried to
gimplify them again we'd get the side-effects duplicated.
So it all depends on what the code wants to handle, if e.g. just values of
parameters with simple arithmetics on those and punt on everything else,
then it is doable, but generally it is not.

Jakub



Re: [PATCH] explow, aarch64: Fix force-Pmode-to-mem for ILP32 [PR97269]

2021-01-04 Thread Jeff Law via Gcc-patches



On 1/4/21 4:46 AM, Richard Sandiford via Gcc-patches wrote:
> This patch fixes a mode/rtx mismatch for ILP32 targets in:
>
> mem = force_const_mem (ptr_mode, imm);
>
> where imm can be Pmode rather than ptr_mode.
>
> The patch uses convert_memory_address to convert the Pmode address
> to ptr_mode before the call.  However, immediate addresses can in
> general contain unspecs, and convert_memory_address wasn't set up
> to handle those.
>
> The patch therefore adds some generic unspec handling to
> convert_memory_address_addr_space_1.  As the comment says, we can add
> a target hook if this behaviour turns out to be wrong for some targets.
> But I think what the patch does is a strict improvement over the status
> quo: without it, we would try to force the unspec into a register,
> but nevertheless wrap the result in a (const ...).  That in turn
> would be invalid rtl and seems bound to generate an ICE later.
>
> I tested the explow.c part using -fstack-protector with local hacks
> to force SYMBOL_FORCE_TO_MEM for UNSPEC_SALT_ADDR.
>
> Fixes c-c++-common/torture/pr57945.c and various other tests.
>
> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
> OK for the explow bits?
>
> Richard
>
>
> gcc/
>   PR target/97269
>   * explow.c (convert_memory_address_addr_space_1): Handle UNSPECs
>   nested in CONSTs.
>   * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Use
>   convert_memory_address to convert symbolic immediates to ptr_mode
>   before forcing them to memory.
OK.    I can't think of a case where the operand-by-operand conversion
wouldn't work, but as we both know, targets sometimes to "odd" things.  
So a conditional ACK and we'll see how the various targets in the tester
respond.

jeff



Re: [PATCH] store VLA bounds in attribute access as strings (PR 97172)

2021-01-04 Thread Jeff Law via Gcc-patches



On 1/4/21 5:59 AM, Richard Biener via Gcc-patches wrote:
> On Sun, Dec 20, 2020 at 6:43 PM Martin Sebor  wrote:
>> On 12/19/20 12:48 AM, Richard Biener via Gcc-patches wrote:
>>> On December 19, 2020 1:55:02 AM GMT+01:00, Martin Sebor via Gcc-patches 
>>>  wrote:
 To keep tree expressions stored by the front end in attribute
 access for nontrivial VLA bounds from getting corrupted during
 Gimplification and to avoid breaking the preconditions verified
 by the LTO streamer that no such trees exist in the IL,
 the attached patch replaces those bounds with a string
 representation of those expressions (as STRING_CST).  It also
 tweaks the pretty-printer to improve the formatting of the VLA
 bounds and avoid inserting spurious spaces in some cases.

 The strings are only used by the front end to verify that
 redeclarations of the same function match in the form and bounds
 of their VLA arguments, and they're not needed in the middle end.
 I considered removing them just before the front end finishes but
 I couldn't find an efficient way to do that.  Is there some data
 structure that stores all function declarations in a translation
 unit?  If there is, then traversing it and removing the attribute
 arguments might also be an option, either in addition to this
 change or in lieu of it.
>>> There is the free lang data pass in tree.c which walks all reachable tree 
>>> nodes.
>> You said in response to Honza's suggestion in pr97172 to do that:
>>
>>The frontend needs to make sure no frontend specific tree codes
>>leak into GENERIC so GENERICization is the place where the FE
>>should clear those.  FLD is too late and doing it there would
>>be a hack.
>>
>> Are you now saying you've had a change of heart and that doing it
>> there isn't a hack after all?
> Well, removing a FE specific attribute [when having non-constant arg] might
> be OK there.  But note you asked for a "pass over all tree nodes" thing
> and I just pointed out free-lang-data.
>
> Doing the STRING_CST is certainly less fragile since the SSA names
> created at gimplification time could even be ggc_freed when no longer
> used in the IL.
Obviously we can't use SSA_NAMEs as they're specific to each function as
they get compiled.  But what's not as clear to me is why we can't use a
SAVE_EXPR of the original expression that indicates the size of the
parameter.


Jeff



[PING][PATCH] correct -Wmismatched-new-delete for template instantiations (PR 98305)

2021-01-04 Thread Martin Sebor via Gcc-patches

Ping:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562141.html

On 12/16/20 7:10 PM, Martin Sebor wrote:

The -Wmismatched-new-delete detection of operator members of class
template instantiations is incomplete and overly simplistic, leading
to incorrect results and false positives.  Rather than reinventing
the wheel and parsing the mangled qualified names of the operators
the attached patch uses the demangler to compare their names.  Since
the code is only entered rarely (for user- defined overloads of
the operators, the cost of the demangling should be negligible in
most code bases).

Tested on x86_64-linux.

Martin




Re: [committed] libgomp: Avoid bad "up" link in libgomp docs

2021-01-04 Thread Nathan Sidwell

On 12/28/20 6:00 PM, Gerald Pfeifer wrote:

A bit ago I asked Sandra for advise on how to get rid of the bad
"up" link in our libgomp docs, and she kindly provided the patch
below which I now pushed.

Is there something I could have done differently so that Author:
in Git lists her instead of me?  Or should Author: list the committer?


add --author='sandra  to the git commit line.  You can 
also add --signoff there to add your stamp of approval


nathan

--
Nathan Sidwell


cxx status update

2021-01-04 Thread Nathan Sidwell
Here's a patch describing the c++20 modules status.  While there I 
noticed the coroutines project status was out of date (it's done).


ok?

nathan
--
Nathan Sidwell
diff --git i/htdocs/gcc-11/changes.html w/htdocs/gcc-11/changes.html
index f457b7e7..e044d710 100644
--- i/htdocs/gcc-11/changes.html
+++ w/htdocs/gcc-11/changes.html
@@ -228,7 +228,8 @@ a work-in-progress.
   The default mode has been changed to -std=gnu++17.
   Several C++20 features have been implemented:
 
-  the compiler now supports consteval virtual functions
+  the compiler now supports consteval virtual
+	functions
   P2082R1, Fixing CTAD for aggregates
   P0593R6, Pseudo-destructors end object lifetimes
   P1907R1, Inconsistencies with non-type template parameters (complete
@@ -236,6 +237,9 @@ a work-in-progress.
   P1975R0, Fixing the wording of parenthesized aggregate-initialization
   P1009R2, Array size deduction in new-expressions
   P1099R5, using enum
+  Modules, Requires -fmodules-ts and some aspects
+  are incomplete.  Refer
+  to C++ 20 Status
 
   
   Several C++ Defect Reports have been resolved, e.g.:
diff --git i/htdocs/projects/cxx-status.html w/htdocs/projects/cxx-status.html
index 403d6740..d8e6d8f8 100644
--- i/htdocs/projects/cxx-status.html
+++ w/htdocs/projects/cxx-status.html
@@ -418,43 +418,65 @@

 
 
-   Modules 
+   Modules 
   https://wg21.link/p1103r3";>P1103R3
-  No (Modules Wiki) 
-   
+  11 (requires -fmodules-ts)
+	(No Private Module Fragment,
+	Parser-level Global Module Entity Merging,
+	Global Module Implications of extern "C/C++",
+	or Partition-specific Definition Visibility) 
+   __cpp_modules >= 201810L (Date of p1103r3)
 
 
-  https://wg21.link/p1766r1";>P1766R1
+  https://wg21.link/p1766r1";>P1766R1 
+   No 
+   
 
 
   https://wg21.link/p1811r0";>P1811R0
+  11
+   
 
 
-  https://wg21.link/p1703r1";>P1703R1
+  https://wg21.link/p1703r1";>P1703R1 (superceded by p1857)
+  11
+   
 
 
   
   https://wg21.link/p1874r1";>P1874R1
+  11
+   
 
 
   
   https://wg21.link/p1979r0";>P1979R0
+  11
+   
 
 
   
   https://wg21.link/p1779r3";>P1779R3
+  11
+   
 
 
   
   https://wg21.link/p1857r3";>P1857R3
+  11
+   
 
 
   
   https://wg21.link/p2115r0";>P2115R0
+  11
+   
 
 
   
   https://wg21.link/p1815r2";>P1815R2
+   No 
+   
 
 
Coroutines 
@@ -887,16 +909,16 @@
 
Coroutines 
   https://wg21.link/n4649";>N4649
-   https://gcc.gnu.org/wiki/cxx-coroutines";>In progress 
-  
-  
+   10 
+  -fcoroutines
+  __cpp_impl_coroutine >= 201902L
 
 
Modules 
   https://wg21.link/n4720";>N4720
-   https://gcc.gnu.org/wiki/cxx-modules";>In progress 
-  
-  
+   11 
+  -fmodules-ts
+  __cpp_modules >= 201810L
 
   
 


Re: docs: Fix wording describing the hwaddress sanitizer

2021-01-04 Thread Sandra Loosemore

On 1/4/21 4:15 AM, Matthew Malcomson wrote:

The original documentation added to mention the clash between
-fsanitize=address and -fsanitize=hwaddress used confusing wording trying to
say that -fsanitize=hwaddress is only available on AArch64.

It read as if -fsanitize=address were only supported on AArch64.

This patch fixes that wording by being more explicit.

gcc/ChangeLog:

PR other/98437
* doc/invoke.texi (-fsanitize=address): Fix wording describing
clash with -fsanitize=hwaddress.


Looks good to me.

-Sandra


Re: [PATCH v4] rs6000, vector integer multiply/divide/modulo instructions

2021-01-04 Thread Carl Love via Gcc-patches
Segher, Will:

Just wanted to ping you both on this patch.  It has been out there for
awhile.

  Carl

On Mon, 2020-12-07 at 16:31 -0800, Carl Love wrote:
> Will:
> 
> I have addressed you comments with regards to the Change Log
> entries.  
> 
> The extra define vec_div was removed.
> 
> Added the missing entries for DIVU_V2DI  DIVS_V2DI in rs6000-call.c.
> 
> The extra MULLD_V2DI case statement entry was removed.
> 
> Added comment in rs6000.md about size for vector types per discussion
> with Pat.
> 
>   Carl
> 
> 
> GCC maintainers:
> 
> The following patch adds new builtins for the vector integer
> multiply,
> divide and modulo operations.  The builtins are: vec_mulh(),
> vec_dive(), vec_mod() for signed and unsigned integers and long
> longintegers. The existing support for the vec_div()and vec_mul()
> builtins emulate the vector operations with multiple scalar
> instructions.  This patch adds support for these builtins using the
> new
> vector instructions for Power 10.
> 
> The patch was compiled and tested on:
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
>   powerpc64le-unknown-linux-gnu (Power 10 LE)
> 
> with no regressions. Additionally the new test case was compiled and
> executed by hand on Mambo to verify the test case passes.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
> Carl Love
> 
> -
> 
> From 15f9c090106c62af83cc405414466ad03d1a4c55 Mon Sep 17 00:00:00
> 2001
> From: Carl Love 
> Date: Fri, 4 Sep 2020 19:24:22 -0500
> Subject: [PATCH] rs6000, vector integer multiply/divide/modulo
> instructions
> 
> 2020-12-07  Carl Love  
> 
> gcc/
>   * config/rs6000/altivec.h (vec_mulh, vec_dive, vec_mod): New
> defines.
>   * config/rs6000/altivec.md (VIlong): Move define to file
> vsx.md.
>   * config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
>   DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
>   DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
>   MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
>   Add builtin define.
>   (MULH, DIVE, MOD):  Add new BU_P10_OVERLOAD_2 definitions.
>   * config/rs6000/rs6000-call.c (altivec_overloaded_builtins):
> Add
>   VSX_BUILTIN_VEC_DIV, P10_BUILTIN_VEC_VDIVE,
>   P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD,
> P10_BUILTIN_VEC_VMULH
>   overloaded definitions.
>   (builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
>   P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
>   P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
>   P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
>   P10V_BUILTIN_MULHU_V4SI, P10V_BUILTIN_MULLD_V2DI]: Add case
>   statements for builtins.
>   * config/rs6000/rs6000.md (bits): Add new attribute sizes.
>   * config/rs6000/vsx.md (VIlong): New define_mode_iterator.
>   (UNSPEC_VDIVES, UNSPEC_VDIVEU): New unspec definitions.
>   (vsx_mul_v2di): Add if TARGET_POWER10 statement.
>   (vsx_udiv_v2di): Add if TARGET_POWER10 statement.
>   (dives_, diveu_, div3, uvdiv3,
>   mods_, modu_, mulhs_, mulhu_,
> mulv2di3):
>   Add define_insn, mode is VIlong.
>   doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive,
> vec_mod): Add
>   builtin descriptions.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
> ---
>  gcc/config/rs6000/altivec.h   |   4 +
>  gcc/config/rs6000/altivec.md  |   2 -
>  gcc/config/rs6000/rs6000-builtin.def  |  22 +
>  gcc/config/rs6000/rs6000-call.c   |  53 +++
>  gcc/config/rs6000/rs6000.md   |   4 +-
>  gcc/config/rs6000/vsx.md  | 212 +++---
>  gcc/doc/extend.texi   | 120 ++
>  .../powerpc/builtins-1-p10-runnable.c | 398
> ++
>  8 files changed, 762 insertions(+), 53 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-
> runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h
> b/gcc/config/rs6000/altivec.h
> index e1884f51bd8..b678e5cf28d 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -750,6 +750,10 @@ __altivec_scalar_pred(vec_any_nle,
>  #define vec_strir_p(a)   __builtin_vec_strir_p (a)
>  #define vec_stril_p(a)   __builtin_vec_stril_p (a)
>  
> +#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
> +#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
> +#define vec_mod(a, b) __builtin_vec_mod ((a), (b))
> +
>  /* VSX Mask Manipulation builtin. */
>  #define vec_genbm __builtin_vec_mtvsrbm
>  #define vec_genhm __builtin_vec_mtvsrhm
> diff --git a/gcc/config/rs6000/altivec.md
> b/gcc/config/rs6000/altivec.md
> index 6a6ce0f84ed..f10f1cdd8a7 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md

c++: Add stdlib module test cases

2021-01-04 Thread Nathan Sidwell

The remaining modules tests use the std library.  These are those.

gcc/testsuite/
* g++.dg/modules/binding-1_a.H: New.
* g++.dg/modules/binding-1_b.H: New.
* g++.dg/modules/binding-1_c.C: New.
* g++.dg/modules/binding-2.H: New.
* g++.dg/modules/builtin-3_a.C: New.
* g++.dg/modules/global-2_a.C: New.
* g++.dg/modules/global-2_b.C: New.
* g++.dg/modules/global-3_a.C: New.
* g++.dg/modules/global-3_b.C: New.
* g++.dg/modules/hello-1_a.C: New.
* g++.dg/modules/hello-1_b.C: New.
* g++.dg/modules/iostream-1_a.H: New.
* g++.dg/modules/iostream-1_b.C: New.
* g++.dg/modules/part-5_a.C: New.
* g++.dg/modules/part-5_b.C: New.
* g++.dg/modules/part-5_c.C: New.
* g++.dg/modules/stdio-1_a.H: New.
* g++.dg/modules/stdio-1_b.C: New.
* g++.dg/modules/string-1_a.H: New.
* g++.dg/modules/string-1_b.C: New.
* g++.dg/modules/string-view1.C: New.
* g++.dg/modules/string-view2.C: New.
* g++.dg/modules/tinfo-1.C: New.
* g++.dg/modules/tinfo-2_a.H: New.
   * g++.dg/modules/tinfo-2_b.C: New.
* g++.dg/modules/tname-spec-1_a.H: New.
* g++.dg/modules/tname-spec-1_b.C: New.
* g++.dg/modules/xtreme-header-1.h: New.
* g++.dg/modules/xtreme-header-1_a.H: New.
* g++.dg/modules/xtreme-header-1_b.C: New.
* g++.dg/modules/xtreme-header-1_c.C: New.
* g++.dg/modules/xtreme-header-2.h: New.
* g++.dg/modules/xtreme-header-2_a.H: New.
* g++.dg/modules/xtreme-header-2_b.C: New.
* g++.dg/modules/xtreme-header-2_c.C: New.
* g++.dg/modules/xtreme-header-3.h: New.
* g++.dg/modules/xtreme-header-3_a.H: New.
* g++.dg/modules/xtreme-header-3_b.C: New.
* g++.dg/modules/xtreme-header-3_c.C: New.
* g++.dg/modules/xtreme-header-4.h: New.
* g++.dg/modules/xtreme-header-4_a.H: New.
* g++.dg/modules/xtreme-header-4_b.C: New.
* g++.dg/modules/xtreme-header-4_c.C: New.
* g++.dg/modules/xtreme-header-5.h: New.
* g++.dg/modules/xtreme-header-5_a.H: New.
* g++.dg/modules/xtreme-header-5_b.C: New.
* g++.dg/modules/xtreme-header-5_c.C: New.
* g++.dg/modules/xtreme-header-6.h: New.
* g++.dg/modules/xtreme-header-6_a.H: New.
* g++.dg/modules/xtreme-header-6_b.C: New.
* g++.dg/modules/xtreme-header-6_c.C: New.
* g++.dg/modules/xtreme-header.h: New.
* g++.dg/modules/xtreme-header_a.H: New.
* g++.dg/modules/xtreme-header_b.C: New.
* g++.dg/modules/xtreme-tr1.h: New.
* g++.dg/modules/xtreme-tr1_a.H: New.
* g++.dg/modules/xtreme-tr1_b.C: New.


--
Nathan Sidwell


module-stdlib-tests.diff.gz
Description: application/gzip


Re: [08/23] Add an alternative splay tree implementation

2021-01-04 Thread Richard Sandiford via Gcc-patches
Andreas Schwab  writes:
> On Jan 04 2021, Richard Sandiford wrote:
>
>> Andreas Schwab  writes:
>>> That doesn't build with gcc 4.8:
>>
>> Which subversion are you using?
>
> This is 4.8.1.

Hmm, OK.  I guess that raises the question whether “supporting GCC 4.8”
means supporting every patchlevel, or just the latest.

Richard


Re: [07/23] Add a class that multiplexes two pointer types

2021-01-04 Thread Jeff Law via Gcc-patches



On 12/17/20 8:44 AM, Nathan Sidwell wrote:
> On 12/17/20 10:38 AM, Richard Sandiford via Gcc-patches wrote:
>> Tom Tromey  writes:
 "Richard" == Richard Sandiford via Gcc-patches
  writes:
>>>
>>> Richard> +// A class that stores a choice "A or B", where A has type
>>> T1 * and B has
>>> Richard> +// type T2 *.  Both T1 and T2 must have an alignment
>>> greater than 1, since
>>> Richard> +// the low bit is used to identify B over A.  T1 and T2
>>> can be the same.
>>>
>>> It seems like the alignment requirement could be static_assert'd, which
>>> would make using this class a bit safer.
>>
>> Yeah, I wondered about doing that, but in principle there's nothing
>> to stop people using the class for something like a char*, provided that
>> the start is suitably aligned.  So having gcc_checking_assert is a
>> compromise: it only provides run-time checking rather than compile-time
>> checking, but it doesn't place any artificial limits on how the class
>> can be used.
>
> I don;t know whether our optimizers are smart enough (credulous
> enough?) to know that the low bits of any pointer to an aligned type
> must be zero, and therefore optimize the assert away in those cases?
>
> If they're not something like
>
>  assert (align (T) > 1 || !(reinterpret_cast (ptr) & 1))
I believe both gimple optimizers and the RTL optimizers are aware of the
restrictions on the low bits for STRICT_ALIGNMENT targets and should be
able to optimize the assert away.

jeff



Re: [20/23] rtlanal: Add simple_regno_set

2021-01-04 Thread Jeff Law via Gcc-patches



On 12/16/20 5:47 PM, Richard Sandiford wrote:
> Jeff Law  writes:
>> On 11/13/20 1:21 AM, Richard Sandiford via Gcc-patches wrote:
>>> This patch adds a routine for finding a “simple” SET for a register
>>> definition.  See the comment in the patch for details.
>>>
>>> gcc/
>>> * rtl.h (simple_regno_set): Declare.
>>> * rtlanal.c (simple_regno_set): New function.
>> So I was a bit confused that this is supposed to reject read-write, but
>> what it's really rejecting is a narrow subset of read-write.  In
>> particular it rejects things that are potentially RMW via subregs. It
>> doesn't prevent the destination from appearing as a source operand.  You
>> might consider clarifying the comment.
> Yeah, in hindsight it was a mistake to spell out the RMW point
> separately when it was really just an extra condition on the subreg.
>
> I'd tweaked this comment and the mux-utils.h one (in response
> to Martin's feedback) while doing the cross-target testing,
> but forgot to include the changes to the committed version.
> (The tested versions were otherwise identical, honest.)
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Thanks,
> Richard
>
>
> gcc/
>   * mux-utils.h (pointer_mux::m_ptr): Tweak description of contents.
>   * rtlanal.c (simple_regno_set): Tweak description to clarify the
>   RMW condition.
OK
jeff



Re: [08/23] Add an alternative splay tree implementation

2021-01-04 Thread Jeff Law via Gcc-patches



On 12/16/20 5:29 PM, Richard Sandiford wrote:
> Jeff Law  writes:
>> On 11/13/20 1:15 AM, Richard Sandiford via Gcc-patches wrote:
>>> We already have two splay tree implementations: the old C one in
>>> libiberty and a templated reimplementation of it in typed-splay-tree.h.
>>> However, they have some drawbacks:
>>>
>>> - They hard-code the assumption that nodes should have both a key and
>>>   a value, which isn't always true.
>>>
>>> - They use the two-phase method of lookup, and so nodes need to store
>>>   a temporary back pointer.  We can avoid that overhead by using the
>>>   top-down method (as e.g. the bitmap tree code already does).
>>>
>>> - The tree node has to own the key and the value.  For some use cases
>>>   it's more convenient to embed the tree links in the value instead.
>>>
>>> Also, a later patch wants to use splay trees to represent an
>>> adaptive total order: the splay tree itself records whether node N1
>>> is less than node N2, and (in the worst case) comparing nodes is
>>> a splay operation.
>>>
>>> This patch therefore adds an alternative implementation.  The main
>>> features are:
>>>
>>> - Nodes can optionally point back to their parents.
>>>
>>> - An Accessors class abstracts accessing child nodes and (where
>>>   applicable) parent nodes, so that the information can be embedded
>>>   in larger data structures.
>>>
>>> - There is no fixed comparison function at the class level.  Instead,
>>>   individual functions that do comparisons take a comparison function
>>>   argument.
>>>
>>> - There are two styles of comparison function, optimised for different
>>>   use cases.  (See the comments in the patch for details.)
>>>
>>> - It's possible to do some operations directly on a given node,
>>>   without knowing whether it's the root.  This includes the comparison
>>>   use case described above.
>>>
>>> This of course has its own set of drawbacks.  It's really providing
>>> splay utility functions rather than a true ADT, and so is more low-level
>>> than the existing routines.  It's mostly geared for cases in which the
>>> client code wants to participate in the splay operations to some extent.
>>>
>>> gcc/
>>> * Makefile.in (OBJS): Add splay-tree-utils.o.
>>> * system.h: Include  when INCLUDE_ARRAY is defined.
>>> * selftest.h (splay_tree_cc_tests): Declare.
>>> * selftest-run-tests.c (selftest::run_tests): Run splay_tree_cc_tests.
>>> * splay-tree-utils.h: New file.
>>> * splay-tree-utils.tcc: Likewise.
>>> * splay-tree-utils.cc: Likewise.
>> I must admit, I'm not a fan of adding another splay tree.  Though I
>> suspect the one in libiberty will be there forever since there could
>> well be clients outside our source base.
>>
>> The typed_splay_tree implementation however is internal to GCC and only
>> has a couple users (the JIT and fixit hints).  Is there any chance we
>> could convert those two users to the new one?  Your cover hints that's
>> not the case, but I'm going to explicitly ask :-)
> Yeah, I agree it's not great to have three versions.  I had a look at
> converting the uses of typed_splay_tree, and all of them seem to be a
> natural fit for the new scheme.  In particular, although typed_splay_tree
> maps keys to values, in practice the keys are already part of the values.
>
> However, I think a natural conversion would need a couple of new helpers
> for “get or insert” type operations.  Would it be OK to wait until GCC 12
> stage 1 for that?
Yea, at this point deferring the conversion to gcc-12 seems to make the
most sense.
jeff



Re: [RFC] middle-end: Extend CSE to understand vector extracts.

2021-01-04 Thread Jeff Law via Gcc-patches



On 1/4/21 7:13 AM, Richard Biener wrote:
> On Mon, 4 Jan 2021, Tamar Christina wrote:
>
>> Hi Richi, 
>>
>>> -Original Message-
>>> From: Richard Biener 
>>> Sent: Monday, January 4, 2021 1:33 PM
>>> To: Tamar Christina 
>>> Cc: gcc-patches@gcc.gnu.org; nd ; i...@airs.com;
>>> l...@redhat.com
>>> Subject: Re: [RFC] middle-end: Extend CSE to understand vector extracts.
>>>
>>> On Mon, 4 Jan 2021, Tamar Christina wrote:
>>>
 Hi All,

 I am trying to get CSE to re-use constants already inside a vector
 rather than re-materializing the constant again.

 Basically consider the following case:

 #include 
 #include 

 uint64_t
 test (uint64_t a, uint64x2_t b, uint64x2_t* rt) {
   uint64_t arr[2] = { 0x0942430810234076UL, 0x0942430810234076UL};
   uint64_t res = a | arr[0];
   uint64x2_t val = vld1q_u64 (arr);
   *rt = vaddq_u64 (val, b);
   return res;
 }

 The actual behavior is inconsequential however notice that the same
 constants are used in the vector (arr and later val) and in the 
 calculation of
>>> res.
 The code we generate for this however is quite sub-optimal:

 test:
 adrpx2, .LC0
 sub sp, sp, #16
 ldr q1, [x2, #:lo12:.LC0]
 mov x2, 16502
 movkx2, 0x1023, lsl 16
 movkx2, 0x4308, lsl 32
 add v1.2d, v1.2d, v0.2d
 movkx2, 0x942, lsl 48
 orr x0, x0, x2
 str q1, [x1]
 add sp, sp, 16
 ret
 .LC0:
 .xword  667169396713799798
 .xword  667169396713799798

 Essentially we materialize the same constant twice.  The reason for
 this is because the front-end lowers the constant extracted from arr[0]
>>> quite early on.
 If you look into the result of fre you'll find

:
   arr[0] = 667169396713799798;
   arr[1] = 667169396713799798;
   res_7 = a_6(D) | 667169396713799798;
   _16 = __builtin_aarch64_ld1v2di (&arr);
   _17 = VIEW_CONVERT_EXPR(_16);
   _11 = b_10(D) + _17;
   *rt_12(D) = _11;
   arr ={v} {CLOBBER};
   return res_7;

 Which makes sense for further optimization.  However come expand time
 if the constant isn't representable in the target arch it will be
 assigned to a register again.

 (insn 8 5 9 2 (set (reg:V2DI 99)
 (const_vector:V2DI [
 (const_int 667169396713799798 [0x942430810234076]) 
 repeated x2
 ])) "cse.c":7:12 -1
  (nil))
 ...
 (insn 14 13 15 2 (set (reg:DI 103)
 (const_int 667169396713799798 [0x942430810234076])) "cse.c":8:12 -1
  (nil))
 (insn 15 14 16 2 (set (reg:DI 102 [ res ])
 (ior:DI (reg/v:DI 96 [ a ])
 (reg:DI 103))) "cse.c":8:12 -1
  (nil))

 And since it's out of the immediate range of the scalar instruction
 used combine won't be able to do anything here.

 This will then trigger the re-materialization of the constant twice.

 So I figured the best place to handle this is in CSE since in some
 uArch it's far cheaper to extract a constant from a vector than to 
 materialize
>>> it.
 Particularly doing it pre-RA has the benefit of allowing RA to decide
 whether it needs to move the constant between register files or not as
 some uArch can perform scalar operation both on the SIMD and GENREG
>>> side.
 The issue is I don't know that much about CSE.  I have been reading
 through the source and think I have a basic understanding of how it
 works but this email is to see if I'm on the right track or not (to
 something that is acceptable upstream).

 My current patch for CSE is:

 diff --git a/gcc/cse.c b/gcc/cse.c
 index 36bcfc354d8..3cee53bed85 100644
 --- a/gcc/cse.c
 +++ b/gcc/cse.c
 @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl-iter.h"
  #include "regs.h"
  #include "function-abi.h"
 +#include "expr.h"

  /* The basic idea of common subexpression elimination is to go
 through the code, keeping a record of expressions that would @@
 -4306,6 +4307,20 @@ find_sets_in_insn (rtx_insn *insn, struct set **psets)
  someplace else, so it isn't worth cse'ing.  */
else if (GET_CODE (SET_SRC (x)) == CALL)
 ;
 +  else if (GET_CODE (SET_SRC (x)) == CONST_VECTOR)
 +   {
 + /* First register the vector itself.  */
 + sets[n_sets++].rtl = x;
 + rtx src = SET_SRC (x);
 + machine_mode elem_mode = GET_MODE_INNER (GET_MODE (src));
 +  /* Go over the constants of the CONST_VECTOR in forward order, 
 to
 +put them in the same order in the SETS array.  */
 + for (

Re: [committed] doc: Remove HSAIL from Language Standards

2021-01-04 Thread Martin Jambor
Hi Gerald,

On Tue, Dec 29 2020, Gerald Pfeifer wrote:
> On Tue, 29 Dec 2020, Martin Jambor wrote:
>>> commit 7e999bd84f47205dc44b0f2dc90b53b3c888ca48
>>> Author: Gerald Pfeifer 
>>> Date:   Mon Dec 28 21:41:55 2020 +0100
>>>
>>> doc: Remove HSAIL from Language Standards
>>> 
>>> Support for HSAIL has been deprecated with GCC 10 and their web server
>>> has been down for weeks.
>
> Please note the above.

HSAIL front-end has not been deprecated, at least as far as I know.
Look at the last paragraph in
https://gcc.gnu.org/pipermail/gcc/2020-April/000330.html

I trust you that HSA Foundation's web server was down for weeks but it
is not down now, http://www.hsafoundation.com/standards/ loads for me
fine and "HSA Programmer Reference Manual Specification 1.01" available
from that page describes the HSAIL that the FE implements.

>
>> The HSAIL/BRIG consuming front-end has not been removed, as opposed to
>> the HSAIL/BRIG emitting back-end which was, and is still part of the
>> compiler, so this should not have been applied, I'm afraid.
>
> This section in the documentation is about language standards and
> reference to those standards.  hsafoundation.com literally has been
> dead for a while, and there is not (evident) other location where
> the standard appears vailable.
>
> Also, wasn't the latest version 1.0.3 whereas our doc snippet still
> referred to 1.0.1.

The last one is 1.2 but our FE only supports 1.0.1.  I have forgotten
the differences between the two and am too lazy to go dig for them now
but the reference to the standard version was correct.

>
>> Given the state of HSAIL, the front-end is probably not very useful
>> either and so we may remove it in near term too, but it seems to still
>> mostly work and causes no trouble, so at least for now it stayed.
>> 
>> Can you please revert this?
>
> A simple revert does not appear appropriate, but if you have a strong
> preference and see a benefit for our users, we surely can put something 
> back in again.  
>
> In that case, should the version updated to 1.0.3?  Apart from that 
> and the defunct web site, what are other changes compared to the status 
> pre commit 7e999bd84f47205dc44b0f2dc90b53b3c888ca48 ?
>
> Can you propose some text / changes taking into account the deprecation
> and related developments?  Can be rough, and I'll take care of the patch.

Given that hsafoundation.com works and the version of the standard
referenced in the removed text was correct, I still think that simply
reverting the patch is the simplest and correct thing to do.

Given that nobody bothered to update the FE to HSAIL 1.2 (which is 2.5
years old) and it is unlikely to have many users, maybe it is time to
deprecate the FE in GCC 11 (I guess it is not a promise to remove it in
12), but that is a different question.

Thanks,

Martin



[PATCH] PING implement pre-c++20 contracts

2021-01-04 Thread Jeff Chapman via Gcc-patches
Ping. re:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561135.html

> OK, I'll start with -alt then, thanks.
>
> Andrew is exactly correct, contracts-jac-alt is still the current branch
> we're focusing our upstreaming efforts on.
>
> It's trailing upstream master by a fair bit at this point. I'll get a
> merge pushed shortly.
>

The latest is still on the same branch, which hasn't been updated since
that last merge:
https://github.com/lock3/gcc/tree/contracts-jac-alt

Would you prefer me to keep it from trailing upstream too much through
regular merges, or would it be more beneficial for it to be left alone so
you have a more stable review target?

Please let me know if there's initial feedback I can start addressing, or
anything I can do to help the review process along in general.

Thank you,
Jeff Chapman


Re: [08/23] Add an alternative splay tree implementation

2021-01-04 Thread Andreas Schwab
On Jan 04 2021, Richard Sandiford wrote:

> Andreas Schwab  writes:
>> That doesn't build with gcc 4.8:
>
> Which subversion are you using?

This is 4.8.1.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [08/23] Add an alternative splay tree implementation

2021-01-04 Thread Richard Sandiford via Gcc-patches
Andreas Schwab  writes:
> That doesn't build with gcc 4.8:

Which subversion are you using?  It works for me with stock gcc 4.8.5,
which is what I'd used to test the series for C++ compatiblity.

Richard

>
> In file included from ../../gcc/splay-tree-utils.h:491:0,
>  from ../../gcc/rtl-ssa.h:45,
>  from ../../gcc/fwprop.c:29:
> ../../gcc/splay-tree-utils.tcc:24:1: error: prototype for 'typename 
> base_splay_tree::node_type 
> base_splay_tree::get_child(typename Accessors::node_type, unsigned 
> int)' does not match any in class 'base_splay_tree'
>  base_splay_tree::get_child (node_type node, unsigned int index)
>  ^
> In file included from ../../gcc/rtl-ssa.h:45:0,
>  from ../../gcc/fwprop.c:29:
> ../../gcc/splay-tree-utils.h:125:20: error: candidate is: static typename 
> Accessors::node_type base_splay_tree::get_child(typename 
> Accessors::node_type, unsigned int)
>static node_type get_child (node_type, unsigned int);
> ^
>
> Andreas.


[libcody] Remove some std::move [PR 98368]

2021-01-04 Thread Nathan Sidwell

Compiling on clang showed a couple of pessimizations.  Fixed thusly.

libcody/
* client.cc (Client::ProcessResponse): Remove std::move
inside ?:
c++tools/
* resolver.cc (module_resolver::cmi_response): Remove
std::move of temporary.

--
Nathan Sidwell
diff --git i/c++tools/resolver.cc w/c++tools/resolver.cc
index c8d31153574..ef08de53072 100644
--- i/c++tools/resolver.cc
+++ w/c++tools/resolver.cc
@@ -226,9 +226,7 @@ module_resolver::cmi_response (Cody::Server *s, std::string &module)
   auto iter = map.find (module);
   if (iter == map.end ())
 {
-  std::string file;
-  if (default_map)
-	file = std::move (GetCMIName (module));
+  std::string file = default_map ? GetCMIName (module) : std::string ();
   auto res = map.emplace (module, file);
   iter = res.first;
 }
diff --git i/libcody/client.cc w/libcody/client.cc
index edfe44d34b2..ae69d190cb7 100644
--- i/libcody/client.cc
+++ w/libcody/client.cc
@@ -122,8 +122,7 @@ Packet Client::ProcessResponse (std::vector &words,
   Assert (!words.empty ());
   if (words[0] == u8"ERROR")
 return Packet (Client::PC_ERROR,
-		  std::move (words.size () == 2 ? words[1]
-			 : u8"malformed error response"));
+		   words.size () == 2 ? words[1]: u8"malformed error response");
 
   if (isLast && !read.IsAtEnd ())
 return Packet (Client::PC_ERROR,


RE: [RFC] middle-end: Extend CSE to understand vector extracts.

2021-01-04 Thread Richard Biener
On Mon, 4 Jan 2021, Tamar Christina wrote:

> Hi Richi, 
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, January 4, 2021 1:33 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; i...@airs.com;
> > l...@redhat.com
> > Subject: Re: [RFC] middle-end: Extend CSE to understand vector extracts.
> > 
> > On Mon, 4 Jan 2021, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > I am trying to get CSE to re-use constants already inside a vector
> > > rather than re-materializing the constant again.
> > >
> > > Basically consider the following case:
> > >
> > > #include 
> > > #include 
> > >
> > > uint64_t
> > > test (uint64_t a, uint64x2_t b, uint64x2_t* rt) {
> > >   uint64_t arr[2] = { 0x0942430810234076UL, 0x0942430810234076UL};
> > >   uint64_t res = a | arr[0];
> > >   uint64x2_t val = vld1q_u64 (arr);
> > >   *rt = vaddq_u64 (val, b);
> > >   return res;
> > > }
> > >
> > > The actual behavior is inconsequential however notice that the same
> > > constants are used in the vector (arr and later val) and in the 
> > > calculation of
> > res.
> > >
> > > The code we generate for this however is quite sub-optimal:
> > >
> > > test:
> > > adrpx2, .LC0
> > > sub sp, sp, #16
> > > ldr q1, [x2, #:lo12:.LC0]
> > > mov x2, 16502
> > > movkx2, 0x1023, lsl 16
> > > movkx2, 0x4308, lsl 32
> > > add v1.2d, v1.2d, v0.2d
> > > movkx2, 0x942, lsl 48
> > > orr x0, x0, x2
> > > str q1, [x1]
> > > add sp, sp, 16
> > > ret
> > > .LC0:
> > > .xword  667169396713799798
> > > .xword  667169396713799798
> > >
> > > Essentially we materialize the same constant twice.  The reason for
> > > this is because the front-end lowers the constant extracted from arr[0]
> > quite early on.
> > > If you look into the result of fre you'll find
> > >
> > >:
> > >   arr[0] = 667169396713799798;
> > >   arr[1] = 667169396713799798;
> > >   res_7 = a_6(D) | 667169396713799798;
> > >   _16 = __builtin_aarch64_ld1v2di (&arr);
> > >   _17 = VIEW_CONVERT_EXPR(_16);
> > >   _11 = b_10(D) + _17;
> > >   *rt_12(D) = _11;
> > >   arr ={v} {CLOBBER};
> > >   return res_7;
> > >
> > > Which makes sense for further optimization.  However come expand time
> > > if the constant isn't representable in the target arch it will be
> > > assigned to a register again.
> > >
> > > (insn 8 5 9 2 (set (reg:V2DI 99)
> > > (const_vector:V2DI [
> > > (const_int 667169396713799798 [0x942430810234076]) 
> > > repeated x2
> > > ])) "cse.c":7:12 -1
> > >  (nil))
> > > ...
> > > (insn 14 13 15 2 (set (reg:DI 103)
> > > (const_int 667169396713799798 [0x942430810234076])) "cse.c":8:12 
> > > -1
> > >  (nil))
> > > (insn 15 14 16 2 (set (reg:DI 102 [ res ])
> > > (ior:DI (reg/v:DI 96 [ a ])
> > > (reg:DI 103))) "cse.c":8:12 -1
> > >  (nil))
> > >
> > > And since it's out of the immediate range of the scalar instruction
> > > used combine won't be able to do anything here.
> > >
> > > This will then trigger the re-materialization of the constant twice.
> > >
> > > So I figured the best place to handle this is in CSE since in some
> > > uArch it's far cheaper to extract a constant from a vector than to 
> > > materialize
> > it.
> > >
> > > Particularly doing it pre-RA has the benefit of allowing RA to decide
> > > whether it needs to move the constant between register files or not as
> > > some uArch can perform scalar operation both on the SIMD and GENREG
> > side.
> > >
> > > The issue is I don't know that much about CSE.  I have been reading
> > > through the source and think I have a basic understanding of how it
> > > works but this email is to see if I'm on the right track or not (to
> > > something that is acceptable upstream).
> > >
> > > My current patch for CSE is:
> > >
> > > diff --git a/gcc/cse.c b/gcc/cse.c
> > > index 36bcfc354d8..3cee53bed85 100644
> > > --- a/gcc/cse.c
> > > +++ b/gcc/cse.c
> > > @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
> > > #include "rtl-iter.h"
> > >  #include "regs.h"
> > >  #include "function-abi.h"
> > > +#include "expr.h"
> > >
> > >  /* The basic idea of common subexpression elimination is to go
> > > through the code, keeping a record of expressions that would @@
> > > -4306,6 +4307,20 @@ find_sets_in_insn (rtx_insn *insn, struct set **psets)
> > >  someplace else, so it isn't worth cse'ing.  */
> > >else if (GET_CODE (SET_SRC (x)) == CALL)
> > > ;
> > > +  else if (GET_CODE (SET_SRC (x)) == CONST_VECTOR)
> > > +   {
> > > + /* First register the vector itself.  */
> > > + sets[n_sets++].rtl = x;
> > > + rtx src = SET_SRC (x);
> > > + machine_mode elem_mode = GET_MODE_INNER (GET_MODE (src));
> > > +  /* Go over the constants of the CONST_VECTOR in forward order, 
>

Re: [PATCH] vect, aarch64: Fix alignment units for IFN_MASK* [PR95401]

2021-01-04 Thread Richard Biener via Gcc-patches
On Mon, Jan 4, 2021 at 12:50 PM Richard Sandiford via Gcc-patches
 wrote:
>
> The IFN_MASK* functions take two leading arguments: a load or
> store pointer and a “cookie”.  The type of the cookie is the
> type of the access for TBAA purposes (like for MEM_REFs)
> while the value of the cookie is the alignment of the access.
> This PR was caused by a disagreement about whether the alignment
> is measured in bits or bytes.
>
> It looks like this goes back to PR68786, which made the
> vectoriser create its own cookie argument rather than reusing
> the one created by ifcvt.  The alignment value of the new cookie
> was measured in bytes (as needed by set_ptr_info_alignment)
> while the existing code expected it to be measured in bits.
> The folds I added for IFN_MASK_LOAD and STORE then made
> things worse.
>
> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
> OK for the explow bits?

OK.

Richard.

> Richard
>
>
> gcc/
> PR tree-vectorization/95401
> * config/aarch64/aarch64-sve-builtins.cc
> (gimple_folder::load_store_cookie): Use bits rather than bytes
> for the alignment argument to IFN_MASK_LOAD and IFN_MASK_STORE.
> * gimple-fold.c (gimple_fold_mask_load_store_mem_ref): Likewise.
> * tree-vect-stmts.c (vectorizable_store): Likewise.
> (vectorizable_load): Likewise.
>
> gcc/testsuite/
> PR tree-vectorization/95401
> * g++.dg/vect/pr95401.cc: New test.
> * g++.dg/vect/pr95401a.cc: Likewise.
> ---
>  gcc/config/aarch64/aarch64-sve-builtins.cc |  2 +-
>  gcc/gimple-fold.c  |  2 +-
>  gcc/testsuite/g++.dg/vect/pr95401.cc   | 13 +
>  gcc/testsuite/g++.dg/vect/pr95401a.cc  | 13 +
>  gcc/tree-vect-stmts.c  | 14 --
>  5 files changed, 36 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/vect/pr95401.cc
>  create mode 100644 gcc/testsuite/g++.dg/vect/pr95401a.cc
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index e73aa9ad8a9..6589438855a 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -2580,7 +2580,7 @@ gimple_folder::fold_contiguous_base (gimple_seq &stmts, 
> tree vectype)
>  tree
>  gimple_folder::load_store_cookie (tree type)
>  {
> -  return build_int_cst (build_pointer_type (type), TYPE_ALIGN_UNIT (type));
> +  return build_int_cst (build_pointer_type (type), TYPE_ALIGN (type));
>  }
>
>  /* Fold the call to a call to INSTANCE, with the same arguments.  */
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index 3148c6b84d9..1f3b7d0818d 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -5201,7 +5201,7 @@ gimple_fold_mask_load_store_mem_ref (gcall *call, tree 
> vectype)
>if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask))
>  return NULL_TREE;
>
> -  unsigned HOST_WIDE_INT align = tree_to_uhwi (alias_align) * BITS_PER_UNIT;
> +  unsigned HOST_WIDE_INT align = tree_to_uhwi (alias_align);
>if (TYPE_ALIGN (vectype) != align)
>  vectype = build_aligned_type (vectype, align);
>tree offset = build_zero_cst (TREE_TYPE (alias_align));
> diff --git a/gcc/testsuite/g++.dg/vect/pr95401.cc 
> b/gcc/testsuite/g++.dg/vect/pr95401.cc
> new file mode 100644
> index 000..6a56dab0957
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/pr95401.cc
> @@ -0,0 +1,13 @@
> +// { dg-additional-options "-mavx2 -O3" { target avx2_runtime } }
> +// { dg-additional-sources pr95401a.cc }
> +
> +extern int var_9;
> +extern unsigned var_14;
> +extern int arr_16[];
> +#include 
> +void test() {
> +  for (short a = 0; a < (short)var_9; a += 12140)
> +for (short b = 0; b < 8; b++)
> +  if (std::max(var_14, 1U))
> +arr_16[a + b] = 0;
> +}
> diff --git a/gcc/testsuite/g++.dg/vect/pr95401a.cc 
> b/gcc/testsuite/g++.dg/vect/pr95401a.cc
> new file mode 100644
> index 000..71b054c7621
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/pr95401a.cc
> @@ -0,0 +1,13 @@
> +// { dg-do compile }
> +
> +#include "../../gcc.dg/vect/tree-vect.h"
> +
> +int var_9 = 1693986256, var_14;
> +int arr_16[11];
> +void test();
> +int main()
> +{
> +  check_vect();
> +  test();
> +  return 0;
> +}
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 11737a38a56..90822aba0ee 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -7964,7 +7964,7 @@ vectorizable_store (vec_info *vinfo,
>   /* Emit:
>MASK_STORE_LANES (DATAREF_PTR, ALIAS_PTR, VEC_MASK,
>  VEC_ARRAY).  */
> - unsigned int align = TYPE_ALIGN_UNIT (TREE_TYPE (vectype));
> + unsigned int align = TYPE_ALIGN (TREE_TYPE (vectype));
>   tree alias_ptr = build_int_cst (ref_type, align);
>   call = gimple_build_call_internal (IFN_MASK_STORE_LANES, 4,
> 

[libcody] Windows absdir fix

2021-01-04 Thread Nathan Sidwell

An obvious thinko in dirve name check :(

libcody/
* resolver.cc (IsAbsDir): Fix string indexing.

Signed-off-by: Nathan Sidwell 

pushed to trunk

--
Nathan Sidwell
diff --git i/Makefile.def w/Makefile.def
index c45be5bff45..d87c2a4bc3c 100644
--- i/Makefile.def
+++ w/Makefile.def
@@ -89,7 +89,8 @@ host_modules= { module= libcody; bootstrap=true;
 		missing= info;
 		missing= install-pdf;
 		missing= install-html;
-		missing= install-info; };
+		missing= install-info;
+		missing= check; };
 host_modules= { module= libdecnumber; bootstrap=true; };
 host_modules= { module= libgui; };
 host_modules= { module= libiberty; bootstrap=true;


RE: [RFC] middle-end: Extend CSE to understand vector extracts.

2021-01-04 Thread Tamar Christina via Gcc-patches
Hi Richi, 

> -Original Message-
> From: Richard Biener 
> Sent: Monday, January 4, 2021 1:33 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; i...@airs.com;
> l...@redhat.com
> Subject: Re: [RFC] middle-end: Extend CSE to understand vector extracts.
> 
> On Mon, 4 Jan 2021, Tamar Christina wrote:
> 
> > Hi All,
> >
> > I am trying to get CSE to re-use constants already inside a vector
> > rather than re-materializing the constant again.
> >
> > Basically consider the following case:
> >
> > #include 
> > #include 
> >
> > uint64_t
> > test (uint64_t a, uint64x2_t b, uint64x2_t* rt) {
> >   uint64_t arr[2] = { 0x0942430810234076UL, 0x0942430810234076UL};
> >   uint64_t res = a | arr[0];
> >   uint64x2_t val = vld1q_u64 (arr);
> >   *rt = vaddq_u64 (val, b);
> >   return res;
> > }
> >
> > The actual behavior is inconsequential however notice that the same
> > constants are used in the vector (arr and later val) and in the calculation 
> > of
> res.
> >
> > The code we generate for this however is quite sub-optimal:
> >
> > test:
> > adrpx2, .LC0
> > sub sp, sp, #16
> > ldr q1, [x2, #:lo12:.LC0]
> > mov x2, 16502
> > movkx2, 0x1023, lsl 16
> > movkx2, 0x4308, lsl 32
> > add v1.2d, v1.2d, v0.2d
> > movkx2, 0x942, lsl 48
> > orr x0, x0, x2
> > str q1, [x1]
> > add sp, sp, 16
> > ret
> > .LC0:
> > .xword  667169396713799798
> > .xword  667169396713799798
> >
> > Essentially we materialize the same constant twice.  The reason for
> > this is because the front-end lowers the constant extracted from arr[0]
> quite early on.
> > If you look into the result of fre you'll find
> >
> >:
> >   arr[0] = 667169396713799798;
> >   arr[1] = 667169396713799798;
> >   res_7 = a_6(D) | 667169396713799798;
> >   _16 = __builtin_aarch64_ld1v2di (&arr);
> >   _17 = VIEW_CONVERT_EXPR(_16);
> >   _11 = b_10(D) + _17;
> >   *rt_12(D) = _11;
> >   arr ={v} {CLOBBER};
> >   return res_7;
> >
> > Which makes sense for further optimization.  However come expand time
> > if the constant isn't representable in the target arch it will be
> > assigned to a register again.
> >
> > (insn 8 5 9 2 (set (reg:V2DI 99)
> > (const_vector:V2DI [
> > (const_int 667169396713799798 [0x942430810234076]) repeated 
> > x2
> > ])) "cse.c":7:12 -1
> >  (nil))
> > ...
> > (insn 14 13 15 2 (set (reg:DI 103)
> > (const_int 667169396713799798 [0x942430810234076])) "cse.c":8:12 -1
> >  (nil))
> > (insn 15 14 16 2 (set (reg:DI 102 [ res ])
> > (ior:DI (reg/v:DI 96 [ a ])
> > (reg:DI 103))) "cse.c":8:12 -1
> >  (nil))
> >
> > And since it's out of the immediate range of the scalar instruction
> > used combine won't be able to do anything here.
> >
> > This will then trigger the re-materialization of the constant twice.
> >
> > So I figured the best place to handle this is in CSE since in some
> > uArch it's far cheaper to extract a constant from a vector than to 
> > materialize
> it.
> >
> > Particularly doing it pre-RA has the benefit of allowing RA to decide
> > whether it needs to move the constant between register files or not as
> > some uArch can perform scalar operation both on the SIMD and GENREG
> side.
> >
> > The issue is I don't know that much about CSE.  I have been reading
> > through the source and think I have a basic understanding of how it
> > works but this email is to see if I'm on the right track or not (to
> > something that is acceptable upstream).
> >
> > My current patch for CSE is:
> >
> > diff --git a/gcc/cse.c b/gcc/cse.c
> > index 36bcfc354d8..3cee53bed85 100644
> > --- a/gcc/cse.c
> > +++ b/gcc/cse.c
> > @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
> > #include "rtl-iter.h"
> >  #include "regs.h"
> >  #include "function-abi.h"
> > +#include "expr.h"
> >
> >  /* The basic idea of common subexpression elimination is to go
> > through the code, keeping a record of expressions that would @@
> > -4306,6 +4307,20 @@ find_sets_in_insn (rtx_insn *insn, struct set **psets)
> >  someplace else, so it isn't worth cse'ing.  */
> >else if (GET_CODE (SET_SRC (x)) == CALL)
> > ;
> > +  else if (GET_CODE (SET_SRC (x)) == CONST_VECTOR)
> > +   {
> > + /* First register the vector itself.  */
> > + sets[n_sets++].rtl = x;
> > + rtx src = SET_SRC (x);
> > + machine_mode elem_mode = GET_MODE_INNER (GET_MODE (src));
> > +  /* Go over the constants of the CONST_VECTOR in forward order, to
> > +put them in the same order in the SETS array.  */
> > + for (unsigned i = 0; i < const_vector_encoded_nelts (src) ; i++)
> > +   {
> > + rtx y = gen_rtx_SUBREG (elem_mode, SET_DEST (x), i);
> > + sets[n_sets++].rtl = PATTERN (gen_move_insn (y,
> CONST_VECTOR_ELT 

[PATCH] tree-optimization/98308 - set vector type for mask of masked load

2021-01-04 Thread Richard Biener
This makes sure to set the vector type on an invariant mask argument
for a masked load and SLP.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-01-04  Richard Biener  

PR tree-optimization/98308
* tree-vect-stmts.c (vectorizable_load): Set invariant mask
SLP vectype.

* gcc.dg/vect/pr98308.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr98308.c | 16 
 gcc/tree-vect-stmts.c   | 11 +++
 2 files changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr98308.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr98308.c 
b/gcc/testsuite/gcc.dg/vect/pr98308.c
new file mode 100644
index 000..7d717b1ee51
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr98308.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-additional-options "-march=skylake-avx512" { target avx512f } } */
+
+extern unsigned long long int arr_86[];
+extern unsigned long long int arr_87[][15];
+
+void test(_Bool a, unsigned short c[][15], unsigned char d[])
+{
+  for (short h = 0; h < 10; h++)
+for (char i = 0; i < 15; i += 2)
+  {
+   arr_86[0] = d[0];
+   arr_87[h][0] = a ? c[h][i] : 0;
+  }
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 9e9467531a3..54fb68b216f 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -8563,6 +8563,17 @@ vectorizable_load (vec_info *vinfo,
 
   if (!vec_stmt) /* transformation not required.  */
 {
+  if (slp_node
+ && mask
+ && !vect_maybe_update_slp_op_vectype (SLP_TREE_CHILDREN (slp_node)[0],
+   mask_vectype))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"incompatible vector types for invariants\n");
+ return false;
+   }
+
   if (!slp)
STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) = memory_access_type;
 
-- 
2.26.2


Re: [RFC] middle-end: Extend CSE to understand vector extracts.

2021-01-04 Thread Richard Biener
On Mon, 4 Jan 2021, Tamar Christina wrote:

> Hi All,
> 
> I am trying to get CSE to re-use constants already inside a vector rather than
> re-materializing the constant again.
> 
> Basically consider the following case:
> 
> #include 
> #include 
> 
> uint64_t
> test (uint64_t a, uint64x2_t b, uint64x2_t* rt)
> {
>   uint64_t arr[2] = { 0x0942430810234076UL, 0x0942430810234076UL};
>   uint64_t res = a | arr[0];
>   uint64x2_t val = vld1q_u64 (arr);
>   *rt = vaddq_u64 (val, b);
>   return res;
> }
> 
> The actual behavior is inconsequential however notice that the same constants
> are used in the vector (arr and later val) and in the calculation of res.
> 
> The code we generate for this however is quite sub-optimal:
> 
> test:
> adrpx2, .LC0
> sub sp, sp, #16
> ldr q1, [x2, #:lo12:.LC0]
> mov x2, 16502
> movkx2, 0x1023, lsl 16
> movkx2, 0x4308, lsl 32
> add v1.2d, v1.2d, v0.2d
> movkx2, 0x942, lsl 48
> orr x0, x0, x2
> str q1, [x1]
> add sp, sp, 16
> ret
> .LC0:
> .xword  667169396713799798
> .xword  667169396713799798
> 
> Essentially we materialize the same constant twice.  The reason for this is
> because the front-end lowers the constant extracted from arr[0] quite early 
> on.
> If you look into the result of fre you'll find
> 
>:
>   arr[0] = 667169396713799798;
>   arr[1] = 667169396713799798;
>   res_7 = a_6(D) | 667169396713799798;
>   _16 = __builtin_aarch64_ld1v2di (&arr);
>   _17 = VIEW_CONVERT_EXPR(_16);
>   _11 = b_10(D) + _17;
>   *rt_12(D) = _11;
>   arr ={v} {CLOBBER};
>   return res_7;
> 
> Which makes sense for further optimization.  However come expand time if the
> constant isn't representable in the target arch it will be assigned to a
> register again.
> 
> (insn 8 5 9 2 (set (reg:V2DI 99)
> (const_vector:V2DI [
> (const_int 667169396713799798 [0x942430810234076]) repeated x2
> ])) "cse.c":7:12 -1
>  (nil))
> ...
> (insn 14 13 15 2 (set (reg:DI 103)
> (const_int 667169396713799798 [0x942430810234076])) "cse.c":8:12 -1
>  (nil))
> (insn 15 14 16 2 (set (reg:DI 102 [ res ])
> (ior:DI (reg/v:DI 96 [ a ])
> (reg:DI 103))) "cse.c":8:12 -1
>  (nil))
> 
> And since it's out of the immediate range of the scalar instruction used
> combine won't be able to do anything here.
> 
> This will then trigger the re-materialization of the constant twice.
> 
> So I figured the best place to handle this is in CSE since in some uArch it's
> far cheaper to extract a constant from a vector than to materialize it.
> 
> Particularly doing it pre-RA has the benefit of allowing RA to decide whether 
> it
> needs to move the constant between register files or not as some uArch can
> perform scalar operation both on the SIMD and GENREG side.
> 
> The issue is I don't know that much about CSE.  I have been reading through 
> the
> source and think I have a basic understanding of how it works but this email 
> is
> to see if I'm on the right track or not (to something that is acceptable
> upstream).
> 
> My current patch for CSE is:
> 
> diff --git a/gcc/cse.c b/gcc/cse.c
> index 36bcfc354d8..3cee53bed85 100644
> --- a/gcc/cse.c
> +++ b/gcc/cse.c
> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "rtl-iter.h"
>  #include "regs.h"
>  #include "function-abi.h"
> +#include "expr.h"
> 
>  /* The basic idea of common subexpression elimination is to go
> through the code, keeping a record of expressions that would
> @@ -4306,6 +4307,20 @@ find_sets_in_insn (rtx_insn *insn, struct set **psets)
>  someplace else, so it isn't worth cse'ing.  */
>else if (GET_CODE (SET_SRC (x)) == CALL)
> ;
> +  else if (GET_CODE (SET_SRC (x)) == CONST_VECTOR)
> +   {
> + /* First register the vector itself.  */
> + sets[n_sets++].rtl = x;
> + rtx src = SET_SRC (x);
> + machine_mode elem_mode = GET_MODE_INNER (GET_MODE (src));
> +  /* Go over the constants of the CONST_VECTOR in forward order, to
> +put them in the same order in the SETS array.  */
> + for (unsigned i = 0; i < const_vector_encoded_nelts (src) ; i++)
> +   {
> + rtx y = gen_rtx_SUBREG (elem_mode, SET_DEST (x), i);
> + sets[n_sets++].rtl = PATTERN (gen_move_insn (y, 
> CONST_VECTOR_ELT (src, i)));
> +   }
> +   }
>else
> sets[n_sets++].rtl = x;
>  }
> @@ -4545,7 +4560,14 @@ cse_insn (rtx_insn *insn)
>struct set *sets = (struct set *) 0;
> 
>if (GET_CODE (x) == SET)
> -sets = XALLOCA (struct set);
> +{
> +  /* For CONST_VECTOR we wants to be able to CSE the vector itself along 
> with
> +elements inside the vector if the target says it's cheap.  */
> +  if (GET_CODE (SET_SRC (x)) == CONST_VECTOR)
> +   sets = XALLOCAVEC

Re: [PR97714] final: accept markers at line 0

2021-01-04 Thread Richard Biener via Gcc-patches
On Wed, Dec 23, 2020 at 12:56 AM Alexandre Oliva via Gcc-patches
 wrote:
>
>
> Back when I introduced debug markers, I seem to have been under the
> impression that location line 0 would only ever occur for unknown and
> builtin locations.
>
> Though line 0 never comes up in normal processing of source files, and
> debug info formats often cannot represent them, I suppose there's no
> need to preemptively discard them during final.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?

OK.

>
> for  gcc/ChangeLog
>
> PR debug/97714
> * final.c (notice_source_line): Narrow down the condition to
> skip a line-0 marker.
>
> for  gcc/testsuite/ChangeLog
>
> PR debug/97714
> * gcc.dg/pr97714.c: New.
> ---
>  gcc/final.c  |   11 +--
>  gcc/testsuite/gcc.dg/debug/pr97714.c |   11 +++
>  2 files changed, 16 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/debug/pr97714.c
>
> diff --git a/gcc/final.c b/gcc/final.c
> index fc9a05e335f47..45f7f39f0885b 100644
> --- a/gcc/final.c
> +++ b/gcc/final.c
> @@ -3250,12 +3250,11 @@ notice_source_line (rtx_insn *insn, bool *is_stmt)
>  {
>location_t loc = NOTE_MARKER_LOCATION (insn);
>expanded_location xloc = expand_location (loc);
> -  if (xloc.line == 0)
> -   {
> - gcc_checking_assert (LOCATION_LOCUS (loc) == UNKNOWN_LOCATION
> -  || LOCATION_LOCUS (loc) == BUILTINS_LOCATION);
> - return false;
> -   }
> +  if (xloc.line == 0
> + && (LOCATION_LOCUS (loc) == UNKNOWN_LOCATION
> + || LOCATION_LOCUS (loc) == BUILTINS_LOCATION))
> +   return false;
> +
>filename = xloc.file;
>linenum = xloc.line;
>columnnum = xloc.column;
> diff --git a/gcc/testsuite/gcc.dg/debug/pr97714.c 
> b/gcc/testsuite/gcc.dg/debug/pr97714.c
> new file mode 100644
> index 0..dba17831d6943
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/debug/pr97714.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -g" } */
> +
> +void
> +function ()
> +{
> +  if (0)
> +{
> +#line 0 "whatever"
> +}
> +}
>
> --
> Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
>Free Software Activist GNU Toolchain Engineer
> Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


Re: make FOR_EACH_IMM_USE_STMT safe for early exits (was: Re: move sincos after pre)

2021-01-04 Thread Richard Biener via Gcc-patches
On Tue, Dec 22, 2020 at 11:03 PM Alexandre Oliva  wrote:
>
> On Oct 28, 2020, Richard Biener  wrote:
>
> >> BTW, any reason why we are not (yet?) using something like:
> >>
> >> #define FOR_EACH_IMM_USE_STMT(STMT, ITER, SSAVAR)   \
> >> for (auto_end_imm_use_stmt_traverse auto_end  \
> >> STMT) = first_imm_use_stmt (&(ITER), (SSAVAR))), \
> >> &(ITER))); \
> >> !end_imm_use_stmt_p (&(ITER));   \
> >> (void) ((STMT) = next_imm_use_stmt (&(ITER
>
> > Just laziness.  Or rather last time I remembered this I tried to do it
> > more fancy via range-for but didn't get very far due to the nested
> > iteration ...
>
> Use a dtor to automatically remove ITER from IMM_USE list in
> FOR_EACH_IMM_USE_STMT.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?

Hmm - while the change looks good, doesn't it end up
calling end_imm_use_stmt_tranverse twice for those
uses still calling BREAK_FROM_IMM_USE_STMT?

Thus, please remove uses of BREAK_FROM_IMM_USE_STMT
together with this patch.

Thanks,
Richard.

>
> for  gcc/ChangeLog
>
> * ssa-iterators.h (end_imm_use_stmt_traverse): Forward
> declare.
> (auto_end_imm_use_stmt_traverse): New struct.
> (FOR_EACH_IMM_USE_STMT): Use it.
> ---
>  gcc/ssa-iterators.h |   29 -
>  1 file changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/ssa-iterators.h b/gcc/ssa-iterators.h
> index 98724da14522d..817997aa952d4 100644
> --- a/gcc/ssa-iterators.h
> +++ b/gcc/ssa-iterators.h
> @@ -77,16 +77,35 @@ struct imm_use_iterator
> !end_readonly_imm_use_p (&(ITER));  \
> (void) ((DEST) = next_readonly_imm_use (&(ITER
>
> -/* Use this iterator to visit each stmt which has a use of SSAVAR.  */
> +/* Forward declare for use in the class below.  */
> +static inline void end_imm_use_stmt_traverse (imm_use_iterator *);
> +
> +/* arrange to automatically call, upon descruction, end_imm_use_stmt_traverse
> +   with a given pointer to imm_use_iterator.  */
> +struct auto_end_imm_use_stmt_traverse
> +{
> +  imm_use_iterator *imm;
> +  auto_end_imm_use_stmt_traverse (imm_use_iterator *imm)
> +  : imm (imm) {}
> +  ~auto_end_imm_use_stmt_traverse ()
> +  { end_imm_use_stmt_traverse (imm); }
> +};
> +
> +/* Use this iterator to visit each stmt which has a use of SSAVAR.  The
> +   destructor of the auto_end_imm_use_stmt_traverse object deals with 
> removing
> +   ITER from SSAVAR's IMM_USE list even when leaving the scope early.  */
>
>  #define FOR_EACH_IMM_USE_STMT(STMT, ITER, SSAVAR)  \
> -  for ((STMT) = first_imm_use_stmt (&(ITER), (SSAVAR));\
> +  for (struct auto_end_imm_use_stmt_traverse   \
> + auto_end_imm_use_stmt_traverse\
> +  STMT) = first_imm_use_stmt (&(ITER), (SSAVAR))), \
> +&(ITER))); \
> !end_imm_use_stmt_p (&(ITER));  \
> (void) ((STMT) = next_imm_use_stmt (&(ITER
>
> -/* Use this to terminate the FOR_EACH_IMM_USE_STMT loop early.  Failure to
> -   do so will result in leaving a iterator marker node in the immediate
> -   use list, and nothing good will come from that.   */
> +/* These used to be needed to exit FOR_EACH_IMM_USE_STMT early, without 
> leaving
> +   ITER behind in the stmt lists.  This is now taken care of automatically, 
> but
> +   it doesn't hurt to use the macros for documentation purposes.  */
>  #define BREAK_FROM_IMM_USE_STMT(ITER)  \
> {   \
>   end_imm_use_stmt_traverse (&(ITER));  \
>
> --
> Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
>Free Software Activist GNU Toolchain Engineer
> Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


Re: [PATCH] store VLA bounds in attribute access as strings (PR 97172)

2021-01-04 Thread Richard Biener via Gcc-patches
On Sun, Dec 20, 2020 at 6:43 PM Martin Sebor  wrote:
>
> On 12/19/20 12:48 AM, Richard Biener via Gcc-patches wrote:
> > On December 19, 2020 1:55:02 AM GMT+01:00, Martin Sebor via Gcc-patches 
> >  wrote:
> >> To keep tree expressions stored by the front end in attribute
> >> access for nontrivial VLA bounds from getting corrupted during
> >> Gimplification and to avoid breaking the preconditions verified
> >> by the LTO streamer that no such trees exist in the IL,
> >> the attached patch replaces those bounds with a string
> >> representation of those expressions (as STRING_CST).  It also
> >> tweaks the pretty-printer to improve the formatting of the VLA
> >> bounds and avoid inserting spurious spaces in some cases.
> >>
> >> The strings are only used by the front end to verify that
> >> redeclarations of the same function match in the form and bounds
> >> of their VLA arguments, and they're not needed in the middle end.
> >> I considered removing them just before the front end finishes but
> >> I couldn't find an efficient way to do that.  Is there some data
> >> structure that stores all function declarations in a translation
> >> unit?  If there is, then traversing it and removing the attribute
> >> arguments might also be an option, either in addition to this
> >> change or in lieu of it.
> >
> > There is the free lang data pass in tree.c which walks all reachable tree 
> > nodes.
>
> You said in response to Honza's suggestion in pr97172 to do that:
>
>The frontend needs to make sure no frontend specific tree codes
>leak into GENERIC so GENERICization is the place where the FE
>should clear those.  FLD is too late and doing it there would
>be a hack.
>
> Are you now saying you've had a change of heart and that doing it
> there isn't a hack after all?

Well, removing a FE specific attribute [when having non-constant arg] might
be OK there.  But note you asked for a "pass over all tree nodes" thing
and I just pointed out free-lang-data.

Doing the STRING_CST is certainly less fragile since the SSA names
created at gimplification time could even be ggc_freed when no longer
used in the IL.

Richard.

> Martin
>
> >> The patch was tested on x86_64-linux.
> >>
> >> Martin
> >
>


RE: [RFC] AArch64: Have RTL patterns recognize DI extracts from vectors at offset 0 as no-op.

2021-01-04 Thread Tamar Christina via Gcc-patches
Hi Richard,

> -Original Message-
> From: Richard Sandiford 
> Sent: Monday, January 4, 2021 12:29 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [RFC] AArch64: Have RTL patterns recognize DI extracts from
> vectors at offset 0 as no-op.
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > I have been looking into a class of problems where GCC is not
> > recognizing that a subreg of lane 0 (using little-endian as example)
> > of a vector register and passing that to an instruction.
> >
> > As an example consider
> >
> > poly64_t
> > testcase (uint8x16_t input, poly64x2_t mask) {
> > poly64_t prodL =
> vmull_p64((poly64_t)vgetq_lane_p64((poly64x2_t)input, 0),
> >vgetq_lane_p64(mask, 0));
> > poly64_t prodH = vmull_high_p64((poly64x2_t)input, mask);
> > return prodL + prodH;
> > }
> >
> > Where we generate
> >
> > testcase:
> > dup d2, v0.d[0]
> > dup d3, v1.d[0]
> > pmull2  v0.1q, v0.2d, v1.2d
> > pmull   v2.1q, v2.1d, v3.1d
> > add d0, d2, d0
> > fmovx0, d0
> > ret
> >
> > whereas it should have been, which clang generates:
> >
> > testcase:
> > pmull   v2.1q, v0.1d, v1.1d
> > pmull2  v0.1q, v0.2d, v1.2d
> > add v0.2d, v0.2d, v2.2d
> > fmovx0, d0
> > ret
> >
> > Now this can be naively solved by just adding the RTL patterns for the
> > vec_selects as the example in the patch, but this doesn't solve the
> > overall problem and I am wondering how to best do this.
> >
> > One approach would be to extend combine's noop detection in
> > noop_move_p to recognize these cases.
> >
> > The downside here is that the conversion becomes implicit in the rtl. i.e.
> > you'll see a SET of a V2DI but a use of DI for that same register.
> > I'm not sure the semantics of RTL allow such implicit uses?
> 
> It's OK to set a hard register in one mode and use it in a different mode
> (without subregs), but it's not possible to do the same using pseudos.
> 
> > The second approach I can think of is to extend reload to recognize
> > these no-ops and give the same register and mark the extract as unused
> > such that DSE cleans it up.
> >
> > But there's probably a better approach I didn't think of :)
> 
> FWIW, for MIPS we tended to handle this kind of thing using matching
> constraints.  E.g. for:
> 
> (define_insn_and_split "aarch64_simd_mov_from_low"
>   [(set (match_operand: 0 "register_operand" "=w,?r")
> (vec_select:
>   (match_operand:VQMOV_NO2E 1 "register_operand" "w,w")
>   (match_operand:VQMOV_NO2E 2 "vect_par_cnst_lo_half" "")))]
>   "TARGET_SIMD"
>   "@
>#
>umov\t%0, %1.d[0]"
>   "&& reload_completed && aarch64_simd_register (operands[0],
> mode)"
>   [(set (match_dup 0) (match_dup 1))]
>   {
> operands[1] = aarch64_replace_reg_mode (operands[1], mode);
>   }
>   [(set_attr "type" "mov_reg,neon_to_gp")
>(set_attr "length" "4")]
> )
> 
> use something like "0,w" for operand 1, so that the first alternative can be
> split to nothing:

Ah, interesting, I indeed didn't think of this approach.  I'll go experiment.

Thanks!

> 
> ;; When TARGET_64BIT, all SImode integer and accumulator registers ;;
> should already be in sign-extended form (see
> TARGET_TRULY_NOOP_TRUNCATION ;; and truncdisi2).  We can therefore
> get rid of register->register ;; instructions if we constrain the source to 
> be in
> the same register as ;; the destination.
> ;;
> ;; Only the pre-reload scheduler sees the type of the register alternatives; 
> ;;
> we split them into nothing before the post-reload scheduler runs.
> ;; These alternatives therefore have type "move" in order to reflect ;; what
> happens if the two pre-reload operands cannot be tied, and are ;; instead
> allocated two separate GPRs.  We don't distinguish between ;; the GPR and
> LO cases because we don't usually know during pre-reload ;; scheduling
> whether an operand will be LO or not.
> (define_insn_and_split "extendsidi2"
>   [(set (match_operand:DI 0 "register_operand" "=d,l,d")
> (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand"
> "0,0,m")))]
>   "TARGET_64BIT"
>   "@
>#
>#
>lw\t%0,%1"
>   "&& reload_completed && register_operand (operands[1], VOIDmode)"
>   [(const_int 0)]
> {
>   emit_note (NOTE_INSN_DELETED);
>   DONE;
> }
>   [(set_attr "move_type" "move,move,load")
>(set_attr "mode" "DI")])
> 
> It'll need some experimentation though.  E.g. is it worth providing a w<-w
> alternative as well, with ? or ^ to disparage it?
> 
> Independently of that, it might be worth trying to add a memory alternative,
> so that we can load spilled values directly from memory instead of first
> reloading the vector.
> 
> Thanks,
> Richard


[PATCH] tree-optimization/98464 - replace loop info with avail at uses

2021-01-04 Thread Richard Biener
This does VN replacement in loop nb_iterations consistent with
the rest of the IL by using availability at the definition site
of uses.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-01-04  Richard Biener  

PR tree-optimization/98464
* tree-ssa-sccvn.c (vn_valueize_for_srt): Rename from ...
(vn_valueize_wrapper): ... this.  Temporarily adjust vn_context_bb.
(process_bb): Adjust.

* g++.dg/opt/pr98464.C: New testcase.
---
 gcc/testsuite/g++.dg/opt/pr98464.C | 155 +
 gcc/tree-ssa-sccvn.c   |  23 +++--
 2 files changed, 172 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/pr98464.C

diff --git a/gcc/testsuite/g++.dg/opt/pr98464.C 
b/gcc/testsuite/g++.dg/opt/pr98464.C
new file mode 100644
index 000..38d0c1b1f7c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/pr98464.C
@@ -0,0 +1,155 @@
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+// { dg-options "-O3 -fno-tree-dce" }
+
+template < typename, typename, template < typename > class _Op,
typename... _Args > struct __detector {
+  using type = _Op< _Args... >;
+};
+ template < typename _Default, template < typename > class _Op,
typename... _Args > using __detected_or = __detector< _Default, void, _Op, 
_Args... >;
+ template < typename _Default, template < typename > class _Op,
typename... _Args > using __detected_or_t = typename __detected_or< _Default, 
_Op, _Args... >::type;
+ template < typename, typename > struct __replace_first_arg;
+ template < template < typename > class _Template, typename _Up, typename 
_Tp,typename... _Types > struct __replace_first_arg< _Template< 
_Tp, _Types... >, _Up > {
+  using type = _Template< _Up >;
+};
+ template < class > class min_pointer;
+ class MoveOnly;
+ struct pointer_traits {
+  template < typename _Up >   using rebind = typename __replace_first_arg< 
min_pointer< int >, _Up >::type;
+};
+ template < typename _Iterator > class __normal_iterator {
+public:   __normal_iterator(_Iterator);
+};
+ struct __allocator_traits_base {
+  template < typename _Tp > using __pointer = typename _Tp::pointer;
+};
+ template < typename _Alloc > struct allocator_traits : 
__allocator_traits_base {
+  typedef typename _Alloc::value_type value_type;
+  using pointer = __detected_or_t< value_type, __pointer, _Alloc >;
+  template < typename _Tp > struct _Ptr {
+   using type = pointer_traits::rebind< _Tp >;
+ };
+  using const_pointer = typename _Ptr< value_type >::type;
+  using size_type = int;
+  static pointer allocate(_Alloc __a, size_type __n) {
+   return __a.allocate(__n);
+ }
+};
+ template < typename _ForwardIterator, typename _Allocator > void 
_Destroy(_ForwardIterator __first, _ForwardIterator __last, _Allocator) {
+  for (;
+__first != __last;
+++__first) ;
+}
+ template < typename _InputIterator, typename _ForwardIterator,
typename _Allocator > _ForwardIterator __uninitialized_copy_a(_InputIterator, 
_ForwardIterator, _Allocator);
+ template < typename _InputIterator, typename _ForwardIterator,
typename _Allocator > _ForwardIterator 
__uninitialized_move_if_noexcept_a(_InputIterator __last,   
  _ForwardIterator __result,
 _Allocator __alloc) {
+  return __uninitialized_copy_a(__last, __result, __alloc);
+}
+ template < typename _ForwardIterator, typename _Size, typename _Allocator 
> _ForwardIterator __uninitialized_default_n_a(_ForwardIterator __first,
  _Size __n, _Allocator) {
+  for (;
+__n;
+--__n, ++__first) ;
+  return __first;
+}
+ template < typename _Alloc > struct _Vector_base {
+  typedef _Alloc _Tp_alloc_type;
+  typedef typename _Tp_alloc_type ::pointer pointer;
+  struct _Vector_impl_data {
+   pointer _M_start;
+   pointer _M_finish;
+   pointer _M_end_of_storage;
+ };
+  struct _Vector_impl : _Tp_alloc_type, _Vector_impl_data {
+   _Vector_impl(_Tp_alloc_type) {
+ }
+ };
+  _Vector_base(long __n, _Alloc __a) : _M_impl(__a) {
+   _M_impl._M_end_of_storage = _M_impl._M_start + __n;
+ }
+  _Vector_impl _M_impl;
+  pointer _M_allocate(long __n) {
+   return __n ? allocator_traits< _Tp_alloc_type >::allocate(_M_impl, __n) 
   : pointer();
+ }
+};
+ template < typename, typename _Alloc > class vector : _Vector_base< 
_Alloc > {
+public:   typedef typename _Alloc::pointer pointer;
+  typedef __normal_iterator<   typename allocator_traits< _Alloc 
>::const_pointer >   const_iterator;
+  typedef _Alloc allocator_type;

Re: [PATCH] Fix --enable-gather-detailed-mem-stats build.

2021-01-04 Thread Richard Biener via Gcc-patches
On Wed, Dec 16, 2020 at 10:52 AM Martin Liška  wrote:
>
> On 12/16/20 10:38 AM, Rainer Orth wrote:
> > Hi Jakub,
> >
> >> On Wed, Dec 16, 2020 at 10:20:09AM +0100, Martin Liška wrote:
> >>> So vec_mem_desc is not initialized before a static member in module.cc.
> >>> We can fix it by using constructor attribute.
> > [...]
> >>> +   of all static variables.  */
> >>> +
> >>> +static void
> >>> +__attribute__((constructor (101)))
> >>
> >> I think this needs to be guarded based on which compiler is used to compile
> >> GCC.  Perhaps we could say that we don't support
> >> --enable-gather-detailed-mem-stats when the compiler isn't built by GCC (or
> >> other compiler that supports the constructor attribute) and #error on that.
> >
> > not only that: if a target doesn't support constructor priorities (like
> > Solaris 11.3, unlike 11.4), this makes gcc error out.
> >
> >   Rainer
> >
>
> I see, I'm then suggesting version 3 of the patch that does not depend
> on a constructor.
>
> Thoughts?

I guess that works.  Note we're trying to limit the number of dynamic
initializers so the new loc_spans one would be another candidate to
dynamically allocate instead.

But OK.

Thanks,
Richard.

> Thanks,
> Martin
>
> Martin


Re: [RFC] AArch64: Have RTL patterns recognize DI extracts from vectors at offset 0 as no-op.

2021-01-04 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> I have been looking into a class of problems where GCC is not recognizing that
> a subreg of lane 0 (using little-endian as example) of a vector register and
> passing that to an instruction.
>
> As an example consider
>
> poly64_t
> testcase (uint8x16_t input, poly64x2_t mask)
> {
> poly64_t prodL = vmull_p64((poly64_t)vgetq_lane_p64((poly64x2_t)input, 0),
>  vgetq_lane_p64(mask, 0));
> poly64_t prodH = vmull_high_p64((poly64x2_t)input, mask);
> return prodL + prodH;
> }
>
> Where we generate
>
> testcase:
>   dup d2, v0.d[0]
>   dup d3, v1.d[0]
>   pmull2  v0.1q, v0.2d, v1.2d
>   pmull   v2.1q, v2.1d, v3.1d
>   add d0, d2, d0
>   fmovx0, d0
>   ret
>
> whereas it should have been, which clang generates:
>
> testcase:
>   pmull   v2.1q, v0.1d, v1.1d
>   pmull2  v0.1q, v0.2d, v1.2d
>   add v0.2d, v0.2d, v2.2d
>   fmovx0, d0
>   ret
>
> Now this can be naively solved by just adding the RTL patterns for the
> vec_selects as the example in the patch, but this doesn't solve the overall
> problem and I am wondering how to best do this.
>
> One approach would be to extend combine's noop detection in noop_move_p to
> recognize these cases.
>
> The downside here is that the conversion becomes implicit in the rtl. i.e.
> you'll see a SET of a V2DI but a use of DI for that same register.  I'm not 
> sure
> the semantics of RTL allow such implicit uses?

It's OK to set a hard register in one mode and use it in a different mode
(without subregs), but it's not possible to do the same using pseudos.

> The second approach I can think of is to extend reload to recognize these 
> no-ops
> and give the same register and mark the extract as unused such that DSE cleans
> it up.
>
> But there's probably a better approach I didn't think of :)

FWIW, for MIPS we tended to handle this kind of thing using matching
constraints.  E.g. for:

(define_insn_and_split "aarch64_simd_mov_from_low"
  [(set (match_operand: 0 "register_operand" "=w,?r")
(vec_select:
  (match_operand:VQMOV_NO2E 1 "register_operand" "w,w")
  (match_operand:VQMOV_NO2E 2 "vect_par_cnst_lo_half" "")))]
  "TARGET_SIMD"
  "@
   #
   umov\t%0, %1.d[0]"
  "&& reload_completed && aarch64_simd_register (operands[0], mode)"
  [(set (match_dup 0) (match_dup 1))]
  {
operands[1] = aarch64_replace_reg_mode (operands[1], mode);
  }
  [(set_attr "type" "mov_reg,neon_to_gp")
   (set_attr "length" "4")]
)

use something like "0,w" for operand 1, so that the first alternative
can be split to nothing:

;; When TARGET_64BIT, all SImode integer and accumulator registers
;; should already be in sign-extended form (see TARGET_TRULY_NOOP_TRUNCATION
;; and truncdisi2).  We can therefore get rid of register->register
;; instructions if we constrain the source to be in the same register as
;; the destination.
;;
;; Only the pre-reload scheduler sees the type of the register alternatives;
;; we split them into nothing before the post-reload scheduler runs.
;; These alternatives therefore have type "move" in order to reflect
;; what happens if the two pre-reload operands cannot be tied, and are
;; instead allocated two separate GPRs.  We don't distinguish between
;; the GPR and LO cases because we don't usually know during pre-reload
;; scheduling whether an operand will be LO or not.
(define_insn_and_split "extendsidi2"
  [(set (match_operand:DI 0 "register_operand" "=d,l,d")
(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "0,0,m")))]
  "TARGET_64BIT"
  "@
   #
   #
   lw\t%0,%1"
  "&& reload_completed && register_operand (operands[1], VOIDmode)"
  [(const_int 0)]
{
  emit_note (NOTE_INSN_DELETED);
  DONE;
}
  [(set_attr "move_type" "move,move,load")
   (set_attr "mode" "DI")])

It'll need some experimentation though.  E.g. is it worth providing
a w<-w alternative as well, with ? or ^ to disparage it?

Independently of that, it might be worth trying to add a memory
alternative, so that we can load spilled values directly from
memory instead of first reloading the vector.

Thanks,
Richard


[RFC] middle-end: Extend CSE to understand vector extracts.

2021-01-04 Thread Tamar Christina via Gcc-patches
Hi All,

I am trying to get CSE to re-use constants already inside a vector rather than
re-materializing the constant again.

Basically consider the following case:

#include 
#include 

uint64_t
test (uint64_t a, uint64x2_t b, uint64x2_t* rt)
{
  uint64_t arr[2] = { 0x0942430810234076UL, 0x0942430810234076UL};
  uint64_t res = a | arr[0];
  uint64x2_t val = vld1q_u64 (arr);
  *rt = vaddq_u64 (val, b);
  return res;
}

The actual behavior is inconsequential however notice that the same constants
are used in the vector (arr and later val) and in the calculation of res.

The code we generate for this however is quite sub-optimal:

test:
adrpx2, .LC0
sub sp, sp, #16
ldr q1, [x2, #:lo12:.LC0]
mov x2, 16502
movkx2, 0x1023, lsl 16
movkx2, 0x4308, lsl 32
add v1.2d, v1.2d, v0.2d
movkx2, 0x942, lsl 48
orr x0, x0, x2
str q1, [x1]
add sp, sp, 16
ret
.LC0:
.xword  667169396713799798
.xword  667169396713799798

Essentially we materialize the same constant twice.  The reason for this is
because the front-end lowers the constant extracted from arr[0] quite early on.
If you look into the result of fre you'll find

   :
  arr[0] = 667169396713799798;
  arr[1] = 667169396713799798;
  res_7 = a_6(D) | 667169396713799798;
  _16 = __builtin_aarch64_ld1v2di (&arr);
  _17 = VIEW_CONVERT_EXPR(_16);
  _11 = b_10(D) + _17;
  *rt_12(D) = _11;
  arr ={v} {CLOBBER};
  return res_7;

Which makes sense for further optimization.  However come expand time if the
constant isn't representable in the target arch it will be assigned to a
register again.

(insn 8 5 9 2 (set (reg:V2DI 99)
(const_vector:V2DI [
(const_int 667169396713799798 [0x942430810234076]) repeated x2
])) "cse.c":7:12 -1
 (nil))
...
(insn 14 13 15 2 (set (reg:DI 103)
(const_int 667169396713799798 [0x942430810234076])) "cse.c":8:12 -1
 (nil))
(insn 15 14 16 2 (set (reg:DI 102 [ res ])
(ior:DI (reg/v:DI 96 [ a ])
(reg:DI 103))) "cse.c":8:12 -1
 (nil))

And since it's out of the immediate range of the scalar instruction used
combine won't be able to do anything here.

This will then trigger the re-materialization of the constant twice.

So I figured the best place to handle this is in CSE since in some uArch it's
far cheaper to extract a constant from a vector than to materialize it.

Particularly doing it pre-RA has the benefit of allowing RA to decide whether it
needs to move the constant between register files or not as some uArch can
perform scalar operation both on the SIMD and GENREG side.

The issue is I don't know that much about CSE.  I have been reading through the
source and think I have a basic understanding of how it works but this email is
to see if I'm on the right track or not (to something that is acceptable
upstream).

My current patch for CSE is:

diff --git a/gcc/cse.c b/gcc/cse.c
index 36bcfc354d8..3cee53bed85 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl-iter.h"
 #include "regs.h"
 #include "function-abi.h"
+#include "expr.h"

 /* The basic idea of common subexpression elimination is to go
through the code, keeping a record of expressions that would
@@ -4306,6 +4307,20 @@ find_sets_in_insn (rtx_insn *insn, struct set **psets)
 someplace else, so it isn't worth cse'ing.  */
   else if (GET_CODE (SET_SRC (x)) == CALL)
;
+  else if (GET_CODE (SET_SRC (x)) == CONST_VECTOR)
+   {
+ /* First register the vector itself.  */
+ sets[n_sets++].rtl = x;
+ rtx src = SET_SRC (x);
+ machine_mode elem_mode = GET_MODE_INNER (GET_MODE (src));
+  /* Go over the constants of the CONST_VECTOR in forward order, to
+put them in the same order in the SETS array.  */
+ for (unsigned i = 0; i < const_vector_encoded_nelts (src) ; i++)
+   {
+ rtx y = gen_rtx_SUBREG (elem_mode, SET_DEST (x), i);
+ sets[n_sets++].rtl = PATTERN (gen_move_insn (y, CONST_VECTOR_ELT 
(src, i)));
+   }
+   }
   else
sets[n_sets++].rtl = x;
 }
@@ -4545,7 +4560,14 @@ cse_insn (rtx_insn *insn)
   struct set *sets = (struct set *) 0;

   if (GET_CODE (x) == SET)
-sets = XALLOCA (struct set);
+{
+  /* For CONST_VECTOR we wants to be able to CSE the vector itself along 
with
+elements inside the vector if the target says it's cheap.  */
+  if (GET_CODE (SET_SRC (x)) == CONST_VECTOR)
+   sets = XALLOCAVEC (struct set, const_vector_encoded_nelts (SET_SRC (x)) 
+ 1);
+  else
+   sets = XALLOCA (struct set);
+}
   else if (GET_CODE (x) == PARALLEL)
 sets = XALLOCAVEC (struct set, XVECLEN (x, 0));

--

This extends the sets that CSE uses to perform CSE to not only contain the
CONST_VECTOR but also the individual elements

[RFC] AArch64: Have RTL patterns recognize DI extracts from vectors at offset 0 as no-op.

2021-01-04 Thread Tamar Christina via Gcc-patches
Hi All,

I have been looking into a class of problems where GCC is not recognizing that
a subreg of lane 0 (using little-endian as example) of a vector register and
passing that to an instruction.

As an example consider

poly64_t
testcase (uint8x16_t input, poly64x2_t mask)
{
poly64_t prodL = vmull_p64((poly64_t)vgetq_lane_p64((poly64x2_t)input, 0),
   vgetq_lane_p64(mask, 0));
poly64_t prodH = vmull_high_p64((poly64x2_t)input, mask);
return prodL + prodH;
}

Where we generate

testcase:
dup d2, v0.d[0]
dup d3, v1.d[0]
pmull2  v0.1q, v0.2d, v1.2d
pmull   v2.1q, v2.1d, v3.1d
add d0, d2, d0
fmovx0, d0
ret

whereas it should have been, which clang generates:

testcase:
pmull   v2.1q, v0.1d, v1.1d
pmull2  v0.1q, v0.2d, v1.2d
add v0.2d, v0.2d, v2.2d
fmovx0, d0
ret

Now this can be naively solved by just adding the RTL patterns for the
vec_selects as the example in the patch, but this doesn't solve the overall
problem and I am wondering how to best do this.

One approach would be to extend combine's noop detection in noop_move_p to
recognize these cases.

The downside here is that the conversion becomes implicit in the rtl. i.e.
you'll see a SET of a V2DI but a use of DI for that same register.  I'm not sure
the semantics of RTL allow such implicit uses?

The second approach I can think of is to extend reload to recognize these no-ops
and give the same register and mark the extract as unused such that DSE cleans
it up.

But there's probably a better approach I didn't think of :)

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(*aarch64_crypto_pmullv2di): Example RTL.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pmull_2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
05d18f8bd3ac09c56c82dc73cff855315eb302b7..7bdb93869dbbedc786575b5f89f39c4c6d0d76d0
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7231,6 +7231,20 @@ (define_insn "aarch64_crypto_pmulldi"
   [(set_attr "type" "crypto_pmull")]
 )
 
+(define_insn "*aarch64_crypto_pmullv2di"
+  [(set (match_operand:TI 0 "register_operand" "=w")
+(unspec:TI  [(vec_select:DI
+   (match_operand:V2DI 1 "register_operand" "w")
+   (parallel [
+ (match_operand:SI 2 "const_int_operand" "Z")]))
+(match_operand:DI 3 "register_operand" "w")]
+   UNSPEC_PMULL))]
+ "TARGET_SIMD && TARGET_AES"
+ "pmull\\t%0.1q, %1.1d, %3.1d"
+  [(set_attr "type" "crypto_pmull")]
+)
+
+
 (define_insn "aarch64_crypto_pmullv2di"
  [(set (match_operand:TI 0 "register_operand" "=w")
(unspec:TI [(match_operand:V2DI 1 "register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/aarch64/pmull_2.c 
b/gcc/testsuite/gcc.target/aarch64/pmull_2.c
new file mode 100644
index 
..d9d47518fab2b582329b6332e3a9c7d97c148192
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pmull_2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8-a+crypto -O3" } */
+
+#include "arm_neon.h"
+
+poly64_t
+testcase (uint8x16_t input, poly64x2_t mask)
+{
+poly64_t prodL = vmull_p64((poly64_t)vgetq_lane_p64((poly64x2_t)input, 0),
+  vgetq_lane_p64(mask, 0));
+poly64_t prodH = vmull_high_p64((poly64x2_t)input, mask);
+return prodL + prodH;
+}
+
+/* { dg-final { scan-assembler-times "pmull\\tv" 1 } } */


-- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 05d18f8bd3ac09c56c82dc73cff855315eb302b7..7bdb93869dbbedc786575b5f89f39c4c6d0d76d0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7231,6 +7231,20 @@ (define_insn "aarch64_crypto_pmulldi"
   [(set_attr "type" "crypto_pmull")]
 )
 
+(define_insn "*aarch64_crypto_pmullv2di"
+  [(set (match_operand:TI 0 "register_operand" "=w")
+(unspec:TI  [(vec_select:DI
+			(match_operand:V2DI 1 "register_operand" "w")
+			(parallel [
+			  (match_operand:SI 2 "const_int_operand" "Z")]))
+		 (match_operand:DI 3 "register_operand" "w")]
+		UNSPEC_PMULL))]
+ "TARGET_SIMD && TARGET_AES"
+ "pmull\\t%0.1q, %1.1d, %3.1d"
+  [(set_attr "type" "crypto_pmull")]
+)
+
+
 (define_insn "aarch64_crypto_pmullv2di"
  [(set (match_operand:TI 0 "register_operand" "=w")
(unspec:TI [(match_operand:V2DI 1 "register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/aarch64/pmull_2.c b/gcc/testsuite/gcc.target/aarch64/pmull_2.c
new file mode 100644
index ..d9d47518fab2b582329b6332e3a9c7d97c148192
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pmull_2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-addition

[committed] aarch64: Improve vcombine codegen [PR89057]

2021-01-04 Thread Richard Sandiford via Gcc-patches
This patch fixes a codegen regression in the handling of things like:

  __temp.val[0]  \
= vcombine_##funcsuffix (__b.val[0], \
 vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \

in the 64-bit vst[234] functions.  The zero was forced into a
register at expand time, and we relied on combine to fuse the
zero and combine back together into a single combinez pattern.
The problem is that the zero could be hoisted before combine
gets a chance to do its thing.

Tested on aarch64-linux-gnu and aarch64_be-elf.  Pushed to trunk so far.
Since it's a regression from GCC 7, I'll backport as far it can go
without modification.

Richard


gcc/
PR target/89057
* config/aarch64/aarch64-simd.md (aarch64_combine): Accept
aarch64_simd_reg_or_zero for operand 2.  Use the combinez patterns
to handle zero operands.

gcc/testsuite/
PR target/89057
* gcc.target/aarch64/pr89057.c: New test.
---
 gcc/config/aarch64/aarch64-simd.md | 15 ---
 gcc/testsuite/gcc.target/aarch64/pr89057.c | 16 
 2 files changed, 28 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr89057.c

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index c4e3b896295..85770c84f0a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3340,11 +3340,20 @@ (define_insn "@aarch64_combinez_be"
 (define_expand "aarch64_combine"
   [(match_operand: 0 "register_operand")
(match_operand:VDC 1 "register_operand")
-   (match_operand:VDC 2 "register_operand")]
+   (match_operand:VDC 2 "aarch64_simd_reg_or_zero")]
   "TARGET_SIMD"
 {
-  aarch64_split_simd_combine (operands[0], operands[1], operands[2]);
-
+  if (operands[2] == CONST0_RTX (mode))
+{
+  if (BYTES_BIG_ENDIAN)
+   emit_insn (gen_aarch64_combinez_be (operands[0], operands[1],
+ operands[2]));
+  else
+   emit_insn (gen_aarch64_combinez (operands[0], operands[1],
+  operands[2]));
+}
+  else
+aarch64_split_simd_combine (operands[0], operands[1], operands[2]);
   DONE;
 }
 )
diff --git a/gcc/testsuite/gcc.target/aarch64/pr89057.c 
b/gcc/testsuite/gcc.target/aarch64/pr89057.c
new file mode 100644
index 000..1e200245ddd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr89057.c
@@ -0,0 +1,16 @@
+/* { dg-options "-O3" } */
+
+#include 
+
+void
+f (int32_t *dst, int32_t *src, int n)
+{
+  for (int i = 0; i < n; ++i)
+{
+  int32x2x3_t a = vld3_s32 (src + i * 6);
+  int32x2x3_t b = { a.val[2], a.val[1], a.val[0] };
+  vst3_s32 (dst + i * 6, b);
+}
+}
+
+/* { dg-final { scan-assembler-not {\tins\t} } } */


[PATCH] tree-optimization/98282 - classify V_C_E as nary

2021-01-04 Thread Richard Biener
This avoids running into memory reference code in compute_avail by
properly classifying unfolded reference trees on constants.

Bootstrapped & tested on x86_64-unknown-linux-gnu, pushed.

2021-01-04  Richard Biener  

PR tree-optimization/98282
* tree-ssa-sccvn.c (vn_get_stmt_kind): Classify tcc_reference on
invariants as VN_NARY.

* g++.dg/opt/pr98282.C: New testcase.
---
 gcc/testsuite/g++.dg/opt/pr98282.C | 80 ++
 gcc/tree-ssa-sccvn.c   |  3 +-
 2 files changed, 82 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/pr98282.C

diff --git a/gcc/testsuite/g++.dg/opt/pr98282.C 
b/gcc/testsuite/g++.dg/opt/pr98282.C
new file mode 100644
index 000..545084104d5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/pr98282.C
@@ -0,0 +1,80 @@
+// PR tree-optimization/98282
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2" }
+
+template  struct g;
+template  struct g { typedef b c; };
+template  typename g::c &&d(b &&e) {
+  return static_cast::c &&>(e);
+}
+void *operator new(__SIZE_TYPE__, void *f) { return f; }
+struct h;
+struct k {
+  using i = h *;
+};
+struct D {
+  k::i j;
+};
+struct p : D {
+  p(p &&) : D() {}
+};
+struct r {
+  using l = int;
+  r(r &&) : ad() {}
+  l *ad;
+};
+struct s {
+  static s m();
+};
+struct t {
+  template  void operator=(ah);
+};
+struct I {
+  template  void q(o ai) {
+*ai = aj();
+s::m();
+  }
+  h aj();
+};
+template  class as;
+struct J {
+  int a;
+  char av;
+};
+template  struct aw : J {
+  void ax(...) {}
+};
+template 
+struct aw, an, n...> : aw, n...> {
+  using az = as;
+  using ba = aw;
+  char bb;
+  an &bc() { return *reinterpret_cast(this); }
+  void ax(az *bd) {
+if (bb)
+  new (bd) an(d(bc()));
+ba::ax(bd);
+  }
+};
+template  struct as : aw, n...> {
+  as();
+  as(as &&be) { be.ax(this); }
+  void operator=(as be) { be.ax(this); }
+};
+struct h {
+  as bg;
+};
+using bh = t;
+struct u {
+  bh bj;
+};
+I bk();
+template  void bl() {
+  h a;
+  bk().q(&a);
+}
+template  void bn(int) {
+  u b;
+  b.bj = bl;
+}
+void bp() { bn(0); }
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index d944b9565ac..19defc0 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -543,7 +543,8 @@ vn_get_stmt_kind (gimple *stmt)
 || code == IMAGPART_EXPR
 || code == VIEW_CONVERT_EXPR
 || code == BIT_FIELD_REF)
-   && TREE_CODE (TREE_OPERAND (rhs1, 0)) == SSA_NAME)
+   && (TREE_CODE (TREE_OPERAND (rhs1, 0)) == SSA_NAME
+   || is_gimple_min_invariant (TREE_OPERAND (rhs1, 0
  return VN_NARY;
 
/* Fallthrough.  */
-- 
2.26.2


[committed] aarch64: Use the MUL VL form of SVE PRF[BHWD]

2021-01-04 Thread Richard Sandiford via Gcc-patches
The expansions of the svprf[bhwd] instructions weren't taking
advantage of the immediate addressing mode.

Tested on aarch64-linux-gnu and aarch64_be-elf.  Pushed to trunk so far.
Will backport to GCC 10 “soon”.

Richard


gcc/
* config/aarch64/aarch64.c (offset_6bit_signed_scaled_p): New function.
(offset_6bit_unsigned_scaled_p): Fix typo in comment.
(aarch64_sve_prefetch_operand_p): Accept MUL VLs in the range
[-32, 31].

gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/prfb.c: Test for a MUL VL range of
[-32, 31].
* gcc.target/aarch64/sve/acle/asm/prfh.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfw.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfd.c: Likewise.
---
 gcc/config/aarch64/aarch64.c  | 17 +--
 .../gcc.target/aarch64/sve/acle/asm/prfb.c| 47 +--
 .../gcc.target/aarch64/sve/acle/asm/prfd.c| 47 +--
 .../gcc.target/aarch64/sve/acle/asm/prfh.c| 47 +--
 .../gcc.target/aarch64/sve/acle/asm/prfw.c| 47 +--
 5 files changed, 146 insertions(+), 59 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5dbb9aa8924..a96b84cd927 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7436,7 +7436,18 @@ offset_4bit_signed_scaled_p (machine_mode mode, 
poly_int64 offset)
  && IN_RANGE (multiple, -8, 7));
 }
 
-/* Return true if OFFSET is a unsigned 6-bit value multiplied by the size
+/* Return true if OFFSET is a signed 6-bit value multiplied by the size
+   of MODE.  */
+
+static inline bool
+offset_6bit_signed_scaled_p (machine_mode mode, poly_int64 offset)
+{
+  HOST_WIDE_INT multiple;
+  return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
+ && IN_RANGE (multiple, -32, 31));
+}
+
+/* Return true if OFFSET is an unsigned 6-bit value multiplied by the size
of MODE.  */
 
 static inline bool
@@ -18494,11 +18505,11 @@ bool
 aarch64_sve_prefetch_operand_p (rtx op, machine_mode mode)
 {
   struct aarch64_address_info addr;
-  if (!aarch64_classify_address (&addr, op, mode, false))
+  if (!aarch64_classify_address (&addr, op, mode, false, ADDR_QUERY_ANY))
 return false;
 
   if (addr.type == ADDRESS_REG_IMM)
-return known_eq (addr.const_offset, 0);
+return offset_6bit_signed_scaled_p (mode, addr.const_offset);
 
   return addr.type == ADDRESS_REG_REG;
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfb.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfb.c
index d2b2777e66f..c90730a037c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfb.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfb.c
@@ -200,8 +200,7 @@ TEST_PREFETCH (prfb_vnum_0, uint8_t,
 
 /*
 ** prfb_vnum_1:
-** incbx0
-** prfbpldl1keep, p0, \[x0\]
+** prfbpldl1keep, p0, \[x0, #1, mul vl\]
 ** ret
 */
 TEST_PREFETCH (prfb_vnum_1, uint16_t,
@@ -209,24 +208,44 @@ TEST_PREFETCH (prfb_vnum_1, uint16_t,
   svprfb_vnum (p0, x0, 1, SV_PLDL1KEEP))
 
 /*
-** prfb_vnum_2:
-** incbx0, all, mul #2
-** prfbpldl1keep, p0, \[x0\]
+** prfb_vnum_31:
+** prfbpldl1keep, p0, \[x0, #31, mul vl\]
 ** ret
 */
-TEST_PREFETCH (prfb_vnum_2, uint32_t,
-  svprfb_vnum (p0, x0, 2, SV_PLDL1KEEP),
-  svprfb_vnum (p0, x0, 2, SV_PLDL1KEEP))
+TEST_PREFETCH (prfb_vnum_31, uint16_t,
+  svprfb_vnum (p0, x0, 31, SV_PLDL1KEEP),
+  svprfb_vnum (p0, x0, 31, SV_PLDL1KEEP))
 
 /*
-** prfb_vnum_3:
-** incbx0, all, mul #3
-** prfbpldl1keep, p0, \[x0\]
+** prfb_vnum_32:
+** cntd(x[0-9]+)
+** lsl (x[0-9]+), \1, #?8
+** add (x[0-9]+), (\2, x0|x0, \2)
+** prfbpldl1keep, p0, \[\3\]
+** ret
+*/
+TEST_PREFETCH (prfb_vnum_32, uint16_t,
+  svprfb_vnum (p0, x0, 32, SV_PLDL1KEEP),
+  svprfb_vnum (p0, x0, 32, SV_PLDL1KEEP))
+
+/*
+** prfb_vnum_m32:
+** prfbpldl1keep, p0, \[x0, #-32, mul vl\]
+** ret
+*/
+TEST_PREFETCH (prfb_vnum_m32, uint16_t,
+  svprfb_vnum (p0, x0, -32, SV_PLDL1KEEP),
+  svprfb_vnum (p0, x0, -32, SV_PLDL1KEEP))
+
+/*
+** prfb_vnum_m33:
+** ...
+** prfbpldl1keep, p0, \[x[0-9]+\]
 ** ret
 */
-TEST_PREFETCH (prfb_vnum_3, uint64_t,
-  svprfb_vnum (p0, x0, 3, SV_PLDL1KEEP),
-  svprfb_vnum (p0, x0, 3, SV_PLDL1KEEP))
+TEST_PREFETCH (prfb_vnum_m33, uint16_t,
+  svprfb_vnum (p0, x0, -33, SV_PLDL1KEEP),
+  svprfb_vnum (p0, x0, -33, SV_PLDL1KEEP))
 
 /*
 ** prfb_vnum_x1:
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfd.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfd.c
index 72b2e64154e..869ef3d3eeb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfd.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfd.c
@@ -200,8 +200,7 @@ TEST_PREFETCH (prfd_vnum_0, u

Re: docs: Fix wording describing the hwaddress sanitizer

2021-01-04 Thread Gerald Pfeifer
On Mon, 4 Jan 2021, Martin Liška wrote:
> I support the suggested patch.

Thank you, Martin.

Matthew, I would argue that for patches like this you do not need to 
seek approval (though you are aways welcome to seek review).

And thanks for clarifying this wording. 

For the benefit of the doubt: This is ok. :)

Gerald


[PATCH] vect, aarch64: Fix alignment units for IFN_MASK* [PR95401]

2021-01-04 Thread Richard Sandiford via Gcc-patches
The IFN_MASK* functions take two leading arguments: a load or
store pointer and a “cookie”.  The type of the cookie is the
type of the access for TBAA purposes (like for MEM_REFs)
while the value of the cookie is the alignment of the access.
This PR was caused by a disagreement about whether the alignment
is measured in bits or bytes.

It looks like this goes back to PR68786, which made the
vectoriser create its own cookie argument rather than reusing
the one created by ifcvt.  The alignment value of the new cookie
was measured in bytes (as needed by set_ptr_info_alignment)
while the existing code expected it to be measured in bits.
The folds I added for IFN_MASK_LOAD and STORE then made
things worse.

Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
OK for the explow bits?

Richard


gcc/
PR tree-vectorization/95401
* config/aarch64/aarch64-sve-builtins.cc
(gimple_folder::load_store_cookie): Use bits rather than bytes
for the alignment argument to IFN_MASK_LOAD and IFN_MASK_STORE.
* gimple-fold.c (gimple_fold_mask_load_store_mem_ref): Likewise.
* tree-vect-stmts.c (vectorizable_store): Likewise.
(vectorizable_load): Likewise.

gcc/testsuite/
PR tree-vectorization/95401
* g++.dg/vect/pr95401.cc: New test.
* g++.dg/vect/pr95401a.cc: Likewise.
---
 gcc/config/aarch64/aarch64-sve-builtins.cc |  2 +-
 gcc/gimple-fold.c  |  2 +-
 gcc/testsuite/g++.dg/vect/pr95401.cc   | 13 +
 gcc/testsuite/g++.dg/vect/pr95401a.cc  | 13 +
 gcc/tree-vect-stmts.c  | 14 --
 5 files changed, 36 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr95401.cc
 create mode 100644 gcc/testsuite/g++.dg/vect/pr95401a.cc

diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index e73aa9ad8a9..6589438855a 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -2580,7 +2580,7 @@ gimple_folder::fold_contiguous_base (gimple_seq &stmts, 
tree vectype)
 tree
 gimple_folder::load_store_cookie (tree type)
 {
-  return build_int_cst (build_pointer_type (type), TYPE_ALIGN_UNIT (type));
+  return build_int_cst (build_pointer_type (type), TYPE_ALIGN (type));
 }
 
 /* Fold the call to a call to INSTANCE, with the same arguments.  */
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 3148c6b84d9..1f3b7d0818d 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -5201,7 +5201,7 @@ gimple_fold_mask_load_store_mem_ref (gcall *call, tree 
vectype)
   if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask))
 return NULL_TREE;
 
-  unsigned HOST_WIDE_INT align = tree_to_uhwi (alias_align) * BITS_PER_UNIT;
+  unsigned HOST_WIDE_INT align = tree_to_uhwi (alias_align);
   if (TYPE_ALIGN (vectype) != align)
 vectype = build_aligned_type (vectype, align);
   tree offset = build_zero_cst (TREE_TYPE (alias_align));
diff --git a/gcc/testsuite/g++.dg/vect/pr95401.cc 
b/gcc/testsuite/g++.dg/vect/pr95401.cc
new file mode 100644
index 000..6a56dab0957
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr95401.cc
@@ -0,0 +1,13 @@
+// { dg-additional-options "-mavx2 -O3" { target avx2_runtime } }
+// { dg-additional-sources pr95401a.cc }
+
+extern int var_9;
+extern unsigned var_14;
+extern int arr_16[];
+#include 
+void test() {
+  for (short a = 0; a < (short)var_9; a += 12140)
+for (short b = 0; b < 8; b++)
+  if (std::max(var_14, 1U))
+arr_16[a + b] = 0;
+}
diff --git a/gcc/testsuite/g++.dg/vect/pr95401a.cc 
b/gcc/testsuite/g++.dg/vect/pr95401a.cc
new file mode 100644
index 000..71b054c7621
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr95401a.cc
@@ -0,0 +1,13 @@
+// { dg-do compile }
+
+#include "../../gcc.dg/vect/tree-vect.h"
+
+int var_9 = 1693986256, var_14;
+int arr_16[11];
+void test();
+int main()
+{
+  check_vect();
+  test();
+  return 0;
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 11737a38a56..90822aba0ee 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7964,7 +7964,7 @@ vectorizable_store (vec_info *vinfo,
  /* Emit:
   MASK_STORE_LANES (DATAREF_PTR, ALIAS_PTR, VEC_MASK,
 VEC_ARRAY).  */
- unsigned int align = TYPE_ALIGN_UNIT (TREE_TYPE (vectype));
+ unsigned int align = TYPE_ALIGN (TREE_TYPE (vectype));
  tree alias_ptr = build_int_cst (ref_type, align);
  call = gimple_build_call_internal (IFN_MASK_STORE_LANES, 4,
 dataref_ptr, alias_ptr,
@@ -8079,7 +8079,7 @@ vectorizable_store (vec_info *vinfo,
  if (final_mask)
{
  align = least_bit_hwi (misalign | align);
- tree ptr = build_int_cst (ref_type, align);
+ tree ptr = build_in

Re: docs: Fix wording describing the hwaddress sanitizer

2021-01-04 Thread Martin Liška

On 1/4/21 12:15 PM, Matthew Malcomson wrote:

|This patch fixes that wording by being more explicit.|


I support the suggested patch.

Martin


Re: Patch RFA: Support non-ASCII file names in git-changelog

2021-01-04 Thread Martin Liška

On 1/4/21 12:01 PM, Martin Liška wrote:

Anyway, I'm going to update server hook first and I'll create an issue for 
GitPython.


So I was not correct about this. Also the server hooks uses now GitPython
to identify modified files.

I've just created an issue for that:
https://github.com/gitpython-developers/GitPython/issues/1099

Martin


[PATCH] explow, aarch64: Fix force-Pmode-to-mem for ILP32 [PR97269]

2021-01-04 Thread Richard Sandiford via Gcc-patches
This patch fixes a mode/rtx mismatch for ILP32 targets in:

  mem = force_const_mem (ptr_mode, imm);

where imm can be Pmode rather than ptr_mode.

The patch uses convert_memory_address to convert the Pmode address
to ptr_mode before the call.  However, immediate addresses can in
general contain unspecs, and convert_memory_address wasn't set up
to handle those.

The patch therefore adds some generic unspec handling to
convert_memory_address_addr_space_1.  As the comment says, we can add
a target hook if this behaviour turns out to be wrong for some targets.
But I think what the patch does is a strict improvement over the status
quo: without it, we would try to force the unspec into a register,
but nevertheless wrap the result in a (const ...).  That in turn
would be invalid rtl and seems bound to generate an ICE later.

I tested the explow.c part using -fstack-protector with local hacks
to force SYMBOL_FORCE_TO_MEM for UNSPEC_SALT_ADDR.

Fixes c-c++-common/torture/pr57945.c and various other tests.

Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
OK for the explow bits?

Richard


gcc/
PR target/97269
* explow.c (convert_memory_address_addr_space_1): Handle UNSPECs
nested in CONSTs.
* config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Use
convert_memory_address to convert symbolic immediates to ptr_mode
before forcing them to memory.
---
 gcc/config/aarch64/aarch64.c |  5 -
 gcc/explow.c | 20 
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ed09fe0c6f8..d3ca6dc037e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5224,8 +5224,11 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
   switch (sty)
{
case SYMBOL_FORCE_TO_MEM:
+ if (int_mode != ptr_mode)
+   imm = convert_memory_address (ptr_mode, imm);
+
  if (const_offset != 0
- && targetm.cannot_force_const_mem (int_mode, imm))
+ && targetm.cannot_force_const_mem (ptr_mode, imm))
{
  gcc_assert (can_create_pseudo_p ());
  base = aarch64_force_temporary (int_mode, dest, base);
diff --git a/gcc/explow.c b/gcc/explow.c
index 65f904a3d14..b8c9fbed03e 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -378,6 +378,26 @@ convert_memory_address_addr_space_1 (scalar_int_mode 
to_mode ATTRIBUTE_UNUSED,
}
   break;
 
+case UNSPEC:
+  /* Assume that all UNSPECs in a constant address can be converted
+operand-by-operand.  We could add a target hook if some targets
+require different behavior.  */
+  if (in_const && GET_MODE (x) == from_mode)
+   {
+ unsigned int n = XVECLEN (x, 0);
+ rtvec v = gen_rtvec (n);
+ for (unsigned int i = 0; i < n; ++i)
+   {
+ rtx op = XVECEXP (x, 0, i);
+ if (GET_MODE (op) == from_mode)
+   op = convert_memory_address_addr_space_1 (to_mode, op, as,
+ in_const, no_emit);
+ RTVEC_ELT (v, i) = op;
+   }
+ return gen_rtx_UNSPEC (to_mode, v, XINT (x, 1));
+   }
+  break;
+
 default:
   break;
 }


Re: [PATCH] loop-niter: Recognize popcount idioms even with char, short and __int128 [PR95771]

2021-01-04 Thread Richard Biener
On Sun, 3 Jan 2021, Jakub Jelinek wrote:

> Hi!
> 
> As the testcase shows, we punt unnecessarily on popcount loop idioms if
> the type is smaller than int or larger than long long.
> Smaller type than int can be handled by zero-extending the argument to
> unsigned int, and types twice as long as long long by doing
> __builtin_popcountll on both halves of the __int128.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2020-01-03  Jakub Jelinek  
> 
>   PR tree-optimization/95771
>   * tree-ssa-loop-niter.c (number_of_iterations_popcount): Handle types
>   with precision smaller than int's precision and types with precision
>   twice as large as long long.  Formatting fixes.
> 
>   * gcc.target/i386/pr95771.c: New test.
> 
> --- gcc/tree-ssa-loop-niter.c.jj  2020-10-12 12:30:34.348813027 +0200
> +++ gcc/tree-ssa-loop-niter.c 2021-01-02 19:38:17.858170445 +0100
> @@ -2666,27 +2666,45 @@ number_of_iterations_popcount (loop_p lo
>  
>/* We found a match. Get the corresponding popcount builtin.  */
>tree src = gimple_phi_arg_def (phi, loop_preheader_edge (loop)->dest_idx);
> -  if (TYPE_PRECISION (TREE_TYPE (src)) == TYPE_PRECISION (integer_type_node))
> +  if (TYPE_PRECISION (TREE_TYPE (src)) <= TYPE_PRECISION (integer_type_node))
>  fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
> -  else if (TYPE_PRECISION (TREE_TYPE (src)) == TYPE_PRECISION
> -(long_integer_type_node))
> +  else if (TYPE_PRECISION (TREE_TYPE (src))
> +== TYPE_PRECISION (long_integer_type_node))
>  fn = builtin_decl_implicit (BUILT_IN_POPCOUNTL);
> -  else if (TYPE_PRECISION (TREE_TYPE (src)) == TYPE_PRECISION
> -(long_long_integer_type_node))
> +  else if (TYPE_PRECISION (TREE_TYPE (src))
> +== TYPE_PRECISION (long_long_integer_type_node)
> +|| (TYPE_PRECISION (TREE_TYPE (src))
> +== 2 * TYPE_PRECISION (long_long_integer_type_node)))
>  fn = builtin_decl_implicit (BUILT_IN_POPCOUNTLL);
>  
> -  /* ??? Support promoting char/short to int.  */
>if (!fn)
>  return false;
>  
>/* Update NITER params accordingly  */
>tree utype = unsigned_type_for (TREE_TYPE (src));
>src = fold_convert (utype, src);
> -  tree call = fold_convert (utype, build_call_expr (fn, 1, src));
> +  if (TYPE_PRECISION (TREE_TYPE (src)) < TYPE_PRECISION (integer_type_node))
> +src = fold_convert (unsigned_type_node, src);
> +  tree call;
> +  if (TYPE_PRECISION (TREE_TYPE (src))
> +  == 2 * TYPE_PRECISION (long_long_integer_type_node))
> +{
> +  int prec = TYPE_PRECISION (long_long_integer_type_node);
> +  tree src1 = fold_convert (long_long_unsigned_type_node,
> + fold_build2 (RSHIFT_EXPR, TREE_TYPE (src),
> +  unshare_expr (src),
> +  build_int_cst (integer_type_node,
> + prec)));
> +  tree src2 = fold_convert (long_long_unsigned_type_node, src);
> +  call = build_call_expr (fn, 1, src1);
> +  call = fold_build2 (PLUS_EXPR, TREE_TYPE (call), call,
> +   build_call_expr (fn, 1, src2));
> +  call = fold_convert (utype, call);
> +}
> +  else
> +call = fold_convert (utype, build_call_expr (fn, 1, src));
>if (adjust)
> -iter = fold_build2 (MINUS_EXPR, utype,
> - call,
> - build_int_cst (utype, 1));
> +iter = fold_build2 (MINUS_EXPR, utype, call, build_int_cst (utype, 1));
>else
>  iter = call;
>  
> @@ -2703,10 +2721,9 @@ number_of_iterations_popcount (loop_p lo
>if (adjust)
>  {
>tree may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
> -   build_zero_cst
> -   (TREE_TYPE (src)));
> -  niter->may_be_zero =
> - simplify_using_initial_conditions (loop, may_be_zero);
> +   build_zero_cst (TREE_TYPE (src)));
> +  niter->may_be_zero
> + = simplify_using_initial_conditions (loop, may_be_zero);
>  }
>else
>  niter->may_be_zero = boolean_false_node;
> --- gcc/testsuite/gcc.target/i386/pr95771.c.jj2021-01-02 
> 19:53:27.499768266 +0100
> +++ gcc/testsuite/gcc.target/i386/pr95771.c   2021-01-02 19:53:12.782936545 
> +0100
> @@ -0,0 +1,67 @@
> +/* PR tree-optimization/95771 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mpopcnt -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times " = __builtin_popcount" 6 "optimized" { 
> target int128 } } } */
> +/* { dg-final { scan-tree-dump-times " = __builtin_popcount" 4 "optimized" { 
> target { ! int128 } } } } */
> +
> +int
> +foo (unsigned char x)
> +{
> +  int i = 0;
> +  while (x)
> +{
> +  x &= x - 1;
> +  ++i;
> +}
> +  return i;
> +}
> +
> +int
> +bar (unsigned short x)
> +{
> +  int i = 0;
> +  while

docs: Fix wording describing the hwaddress sanitizer

2021-01-04 Thread Matthew Malcomson via Gcc-patches
The original documentation added to mention the clash between
-fsanitize=address and -fsanitize=hwaddress used confusing wording trying to
say that -fsanitize=hwaddress is only available on AArch64.

It read as if -fsanitize=address were only supported on AArch64.

This patch fixes that wording by being more explicit.

gcc/ChangeLog:

PR other/98437
* doc/invoke.texi (-fsanitize=address): Fix wording describing
clash with -fsanitize=hwaddress.



### Attachment also inlined for ease of reply###


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
97aea535d900c0145340cc7af9141097ca1cc492..b378e63d6bce12d4dd9293e4c88039298daac9e8
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14672,8 +14672,8 @@ the available options are shown at startup of the 
instrumented program.  See
 
@url{https://github.com/google/sanitizers/wiki/AddressSanitizerFlags#run-time-flags}
 for a list of supported options.
 The option cannot be combined with @option{-fsanitize=thread} or
-@option{-fsanitize=hwaddress}.  Note that the only target this option is
-currently supported on is AArch64.
+@option{-fsanitize=hwaddress}.  Note that the only target
+@option{-fsanitize=hwaddress} is currently supported on is AArch64.
 
 @item -fsanitize=kernel-address
 @opindex fsanitize=kernel-address

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
97aea535d900c0145340cc7af9141097ca1cc492..b378e63d6bce12d4dd9293e4c88039298daac9e8
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14672,8 +14672,8 @@ the available options are shown at startup of the 
instrumented program.  See
 
@url{https://github.com/google/sanitizers/wiki/AddressSanitizerFlags#run-time-flags}
 for a list of supported options.
 The option cannot be combined with @option{-fsanitize=thread} or
-@option{-fsanitize=hwaddress}.  Note that the only target this option is
-currently supported on is AArch64.
+@option{-fsanitize=hwaddress}.  Note that the only target
+@option{-fsanitize=hwaddress} is currently supported on is AArch64.
 
 @item -fsanitize=kernel-address
 @opindex fsanitize=kernel-address



[PATCH] tree-optimization/98393 - properly init matches when failing SLP

2021-01-04 Thread Richard Biener
This zeroes matches when failing SLP discovery because of the
work limit.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-01-04  Richard Biener  

PR tree-optimization/98393
* tree-vect-slp.c (vect_build_slp_tree): Properly zero matches
when hitting the limit.
---
 gcc/tree-vect-slp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index f886d4d876c..2c2cf637e73 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1429,6 +1429,7 @@ vect_build_slp_tree (vec_info *vinfo,
   SLP_TREE_SCALAR_STMTS (res) = vNULL;
   SLP_TREE_DEF_TYPE (res) = vect_uninitialized_def;
   vect_free_slp_tree (res);
+  memset (matches, 0, sizeof (bool) * group_size);
   return NULL;
 }
   --*limit;
-- 
2.26.2


Re: Patch RFA: Support non-ASCII file names in git-changelog

2021-01-04 Thread Martin Liška

On 12/24/20 1:16 PM, Joel Brobecker wrote:

I have no idea who that is (if it is a single user at all,
if it isn't any user with git write permissions).


CCing Joel, he should help us how to set a git config
that will be used by the server hooks.


I am not sure that requiring both the server and the user to agree
on a non-default configuration value would be a practical idea.


I agree with that but I was unable to find a way how to "decode"
the filenames:



 From what I understand of the problem, I think the proper fix
is really to adapt the git-changelog script to avoid the need
for any assumption about the user's configuration. In particular,
how does the script get the list of files?


On server we use:
git diff HEAD~ --name-status

which works really fine with -z option:
Mcontrib/gcc-changelog/git_repository.pyAšpatně.txt

without it, the patch is quoted as well:

git diff HEAD~ --name-status | cat
M   contrib/gcc-changelog/git_repository.py
A   "\305\241patn\304\233.txt"


Poking around, it looks like
you guys are using the GitPython module, which I'm not familiar with,
unfortunately.  But as a reference point, the git-hooks simply use
the -z option to get the information in raw format, and thus avoids
the problem of filename quoting entirely. Does GitPython support
something similar? For instance, browing the GitPython documentation,
I found attributes a_raw_path and b_raw_path. Could that be the
solution (instead of using a_path and b_path?


Thanks for looking into it. Unfortunately, for a file called "špatně.txt"
I get for a_rawpath and b_rawpath:
b'"\\305\\241patn\\304\\233.txt"' b'"\\305\\241patn\\304\\233.txt"'



Either way, the solution will be independent of the git-hooks,
as I don't think they are actually involved, here.



Anyway, I'm going to update server hook first and I'll create an issue for 
GitPython.

Thanks for help,
Martin


RE: [PR66791][ARM] Replace __builtin_vext* with __buitlin_shuffle in vext intrinsics

2021-01-04 Thread Kyrylo Tkachov via Gcc-patches
Hi Prathamesh

> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: 04 January 2021 10:27
> To: gcc Patches ; Kyrylo Tkachov
> 
> Subject: [PR66791][ARM] Replace __builtin_vext* with __buitlin_shuffle in
> vext intrinsics
> 
> Hi Kyrill,
> The attached patch replaces __builtin_vextv8qi with __builtin_shuffle
> for vext_s8.
> Just wanted to confirm if this is in the correct direction ?
> If yes, I will send a follow up patch that converts for all vext intrinsics.

Yeah, that does look correct (aarch64 does it that way).
As before, please make sure to delete any now-unused builtins as well.

Thanks,
Kyrill

> 
> Thanks,
> Prathamesh


[PR66791][ARM] Replace __builtin_vext* with __buitlin_shuffle in vext intrinsics

2021-01-04 Thread Prathamesh Kulkarni via Gcc-patches
Hi Kyrill,
The attached patch replaces __builtin_vextv8qi with __builtin_shuffle
for vext_s8.
Just wanted to confirm if this is in the correct direction ?
If yes, I will send a follow up patch that converts for all vext intrinsics.

Thanks,
Prathamesh


vext-1.diff
Description: Binary data


[PATCH] avr.exp: convert Dos newlines to Unix ones

2021-01-04 Thread Martin Liška

Pushed.

gcc/testsuite/ChangeLog:

* gcc.target/avr/avr.exp: Run dos2unix on the file.
---
 gcc/testsuite/gcc.target/avr/avr.exp | 80 ++--
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/gcc/testsuite/gcc.target/avr/avr.exp 
b/gcc/testsuite/gcc.target/avr/avr.exp
index ed8b297843f..c8af5053400 100644
--- a/gcc/testsuite/gcc.target/avr/avr.exp
+++ b/gcc/testsuite/gcc.target/avr/avr.exp
@@ -1,41 +1,41 @@
 # Copyright (C) 2008-2021 Free Software Foundation, Inc.
-
-# This program is free software; you can redistribute it and/or modify
-# it under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 3 of the License, or
-# (at your option) any later version.
-#
-# This program is distributed in the hope that it will be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-# GNU General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with GCC; see the file COPYING3.  If not see
-# .
-
-# GCC testsuite that uses the `dg.exp' driver.
-
-# Exit immediately if this isn't an AVR target.
-if ![istarget avr-*-*] then {
-  return
-}
-
-# Load support procs.
-load_lib gcc-dg.exp
-
-# If a testcase doesn't have special options, use these.
-global DEFAULT_CFLAGS
-if ![info exists DEFAULT_CFLAGS] then {
-set DEFAULT_CFLAGS " -ansi -pedantic-errors"
-}
-
-# Initialize `dg'.
-dg-init
-
-# Main loop.
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.{\[cCS\],cpp}]] \
-   "" $DEFAULT_CFLAGS
-
-# All done.
-dg-finish
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an AVR target.
+if ![istarget avr-*-*] then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.{\[cCS\],cpp}]] \
+   "" $DEFAULT_CFLAGS
+
+# All done.
+dg-finish
--
2.29.2



  1   2   >