Re: [PATCH 00/31] VAX: Bring the port up to date (yes, MODE_CC conversion is included)

2020-11-19 Thread Anders Magnusson

Morning Maciej,

> Then there is a fix for the PDP11 backend addressing an issue I found in
> the handling of floating-point comparisons.  Unlike all the other changes
> this one has not been regression-tested, not even built as I have no idea
> how to prepare a development environment for a PDP11 target (also none of
> my VAX pieces is old enough to support PDP11 machine code execution).

You could use simh w/ 2.11BSD, or if you want to test it on real
hardware, I have an 11/83 where you could test.


-- R


Re: [PATCH] Check calls before loop unrolling

2020-11-19 Thread Richard Biener via Gcc-patches
On Fri, Nov 20, 2020 at 12:58 AM Segher Boessenkool
 wrote:
>
> On Thu, Nov 19, 2020 at 03:30:37PM -0700, Jeff Law wrote:
> > > No, the vast majority of people will *not* (consciously) use them,
> > > because the target defaults will set things to useful values.
> > >
> > > The compiler could use saner "generic" defaults perhaps, but those will
> > > still not be satisfactory for anyone (except when they aren't generic in
> > > fact but instead tuned for one arch ;-) ) -- unrolling is just too
> > > important for performance.
> > Then fix the heuristics, don't add new PARAMS :-)
>
> I just said that cannot work?
>
> > It didn't even occur to me until now that you may be pushing to have the
> > ppc backend have different values for the PARAMS.  I would strongly
> > discourage that.  It's been a huge headache in the s390 backend already.
>
> It also makes a huge performance difference.  That the generic parts
> of GCC are only tuned for x86 (or not well tuned for anything?) is a
> huge roadblock for us.
>
> I am not saying we should have six hundred different tunings.  But we
> need a few (and we already *have* a few, not params but generic flags,
> just like many other targets fwiw).
>
> We *do* have a few custom param settings already, just like aarch64,
> ia64, and sh, actually.
>
> > >> In  my mind fixing things so they work with no magic arguments is best.
> > >> PARAMS are the worst solution.  A -f flag with no arguments is somewhere
> > >> in between.  Others may clearly have different opinions here.
> > > There is no big difference between params and flags here, IMO -- it has
> > > to be a -f with a value as well, for good results.
> > Which is a signal that we have a deeper problem.  -f with a value is no
> > different than a param.
>
> Yes exactly.
>
> > > Since we have (almost) all such tunings in --param already, I'd say this
> > > one belongs there as well?
> > I'm not convinced at this point.
>
> Why not?
>
> We have way many params, yes.

--params were introduced to avoid "magic numbers" in code and at the
same time not overwhelm users with many -f options.  That they are
runtime-controllable was probably done because we could and because
it's nice for GCC developers.

>  But the first step to counteract that
> would be to deprecate and get rid of many existing ones, not to block
> having new ones which can be useful (while many of the existing ones are
> not).

Not sure about this - sure, if heuristic can be simplified to use N < M
(previous) "magic" numbers that's better.  But if "deprecating" just
involves pasting the current --param default literally into the heuristics
then no, please not.

For this particular patch the question is if the heuristic is sound,
not the particular magic number.  And I have no opinion about this
(being this is the RTL unroller).

Richard.

>
> Or, we could accept that it is not really a problem at all.  You seem to
> have a strong opinion that it *is*, but I don't understand that; maybe
> you can explain a bit more?
>
> Thanks,
>
>
> Segher


Re: [PATCH] libsanitizer: fix SIGSEGV in fopen64 interceptor

2020-11-19 Thread Martin Liška

On 11/20/20 8:44 AM, Vyacheslav Barinov wrote:

Hello,

Okay, I proposed this check to upstream [1] and it has already been
accepted.


Hello.

Great. Please commit it to the llvm-project upstream and I'll then
cherry-pick the patch.


We can either apply the fix or postpone it until next sync with
upstream.

Anyway the bug doesn't seem so bad if we were the only team who faced it during
all this time.


I see! But we still want to cherry-pick it.

Thanks,
Martin



Best Regards,
Vyacheslav Barinov

[1]: https://reviews.llvm.org/D91782

Martin Liška  writes:


On 11/19/20 12:28 PM, Slava Barinov via Gcc-patches wrote:

Null pointer in path argument leads to SIGSEGV in interceptor.


Hello.

I can't see we ever had the null check in master. I don't think it was lost
during a merge from master.

Why do we need the hunk?
Thanks,
Martin


libsanitizer/ChangeLog:
  * sanitizer_common/sanitizer_common_interceptors.inc: Check
path for null before dereference in fopen64 interceptor.
---
Notes:
  Apparently check has been lost during merge from upstream
   libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
index 729eead43c0..2ef23d9a50b 100644
--- a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
+++ b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
@@ -6081,7 +6081,7 @@ INTERCEPTOR(__sanitizer_FILE *, freopen, const char 
*path, const char *mode,
   INTERCEPTOR(__sanitizer_FILE *, fopen64, const char *path, const char *mode) 
{
 void *ctx;
 COMMON_INTERCEPTOR_ENTER(ctx, fopen64, path, mode);
-  COMMON_INTERCEPTOR_READ_RANGE(ctx, path, REAL(strlen)(path) + 1);
+  if (path) COMMON_INTERCEPTOR_READ_RANGE(ctx, path, REAL(strlen)(path) + 1);
 COMMON_INTERCEPTOR_READ_RANGE(ctx, mode, REAL(strlen)(mode) + 1);
 __sanitizer_FILE *res = REAL(fopen64)(path, mode);
 COMMON_INTERCEPTOR_FILE_OPEN(ctx, res, path);
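
For readers skimming the hunk above, the guard can be sketched in isolation. This is only an illustration of the pattern, assuming a hypothetical helper `checked_range_size` that is not part of libsanitizer: the length of the string is computed only when the pointer is non-null, so `strlen` is never called on a null `path`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of libsanitizer: the number of bytes an
// interceptor would have to validate for a C-string argument.  A null
// pointer yields 0, so strlen is never called on it.
static std::size_t checked_range_size(const char *s)
{
  return s ? std::strlen(s) + 1 : 0;
}

// Usage: a null path is skipped, a real path is measured including the NUL.
inline void checked_range_size_demo()
{
  assert(checked_range_size(nullptr) == 0);
  assert(checked_range_size("abc") == 4);
}
```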







Re: [PATCH] libsanitizer: fix SIGSEGV in fopen64 interceptor

2020-11-19 Thread Vyacheslav Barinov via Gcc-patches
Hello,

Okay, I proposed this check to upstream [1] and it has already been
accepted. We can either apply the fix or postpone it until next sync with
upstream.

Anyway the bug doesn't seem so bad if we were the only team who faced it during
all this time.

Best Regards,
Vyacheslav Barinov

[1]: https://reviews.llvm.org/D91782

Martin Liška  writes:

> On 11/19/20 12:28 PM, Slava Barinov via Gcc-patches wrote:
>> Null pointer in path argument leads to SIGSEGV in interceptor.
>
> Hello.
>
> I can't see we ever had the null check in master. I don't think it was lost
> during a merge from master.
>
> Why do we need the hunk?
> Thanks,
> Martin
>
>> libsanitizer/ChangeLog:
>>  * sanitizer_common/sanitizer_common_interceptors.inc: Check
>>  path for null before dereference in fopen64 interceptor.
>> ---
>> Notes:
>>  Apparently check has been lost during merge from upstream
>>   libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>> diff --git a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
>> b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
>> index 729eead43c0..2ef23d9a50b 100644
>> --- a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
>> +++ b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
>> @@ -6081,7 +6081,7 @@ INTERCEPTOR(__sanitizer_FILE *, freopen, const char 
>> *path, const char *mode,
>>   INTERCEPTOR(__sanitizer_FILE *, fopen64, const char *path, const char 
>> *mode) {
>> void *ctx;
>> COMMON_INTERCEPTOR_ENTER(ctx, fopen64, path, mode);
>> -  COMMON_INTERCEPTOR_READ_RANGE(ctx, path, REAL(strlen)(path) + 1);
>> +  if (path) COMMON_INTERCEPTOR_READ_RANGE(ctx, path, REAL(strlen)(path) + 
>> 1);
>> COMMON_INTERCEPTOR_READ_RANGE(ctx, mode, REAL(strlen)(mode) + 1);
>> __sanitizer_FILE *res = REAL(fopen64)(path, mode);
>> COMMON_INTERCEPTOR_FILE_OPEN(ctx, res, path);
>> 



Re: [PATCH] configury: --enable-link-serialization support

2020-11-19 Thread Richard Biener via Gcc-patches
On Fri, Nov 20, 2020 at 12:30 AM Eric Botcazou  wrote:
>
> > Successfully bootstrapped/regtested on x86_64-linux and i686-linux,
> > including make install which looked problematic in PR97911.
> >
> > Ok for trunk?
>
> I cannot really approve, but this looks like a step in the right direction.

OK.

Thanks,
Richard.

> --
> Eric Botcazou
>
>


Re: [PATCH] Remove lambdas from _Rb_tree

2020-11-19 Thread François Dumont via Gcc-patches

Here is what I am testing.

I use your enum proposal as an alias for the bool type. I cannot use it 
as template parameter on _M_copy unless I put it at std namespace level 
to use it in definition of the outline _M_copy overload.


I also added some tests checking correct usage of 
__move_if_noexcept_cond. I prefer not to change this condition as 
proposed in this patch.


I wonder if I am right to check moved values in those tests ?

I also wonder after writing those tests if we shouldn't clear the moved 
instance, especially when values are moved ? I remember seeing some 
discussion about this but I don't know the conclusion.


    libstdc++: _Rb_tree code cleanup, remove lambdas

    Use new template parameters to replace usage of lambdas to move or not
    tree values on copy.

    libstdc++-v3/ChangeLog:

    * include/bits/move.h (_GLIBCXX_FWDREF): New.
    * include/bits/stl_tree.h: Adapt to use latter.
    (_Rb_tree<>::_M_clone_node): Add _MoveValue template parameter.
    (_Rb_tree<>::_M_mbegin): New.
    (_Rb_tree<>::_M_begin): Use latter.
    (_Rb_tree<>::_M_copy): Add _MoveValues template parameter.
    * testsuite/23_containers/map/allocator/move_cons.cc: New test.
    * testsuite/23_containers/multimap/allocator/move_cons.cc: New test.
    * testsuite/23_containers/multiset/allocator/move_cons.cc: New test.
    * testsuite/23_containers/set/allocator/move_cons.cc: New test.
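
As a rough illustration of the idea in the commit message (a compile-time parameter deciding between move and copy, instead of a lambda passed in by the caller), here is a minimal sketch. The name `clone_value` and the use of `std::string` are assumptions made for the example; this is not the libstdc++ code.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a node-cloning helper; not the libstdc++ code.
// The bool template parameter selects at compile time whether the source
// value is moved from or copied.
template<bool MoveValue>
std::string clone_value(std::string& src)
{
  if (MoveValue)
    return std::move(src);   // source is left valid but unspecified
  return src;                // source is left untouched
}

// Usage: the copying instantiation must leave the source intact.
inline void clone_value_demo()
{
  std::string s = "abc";
  assert(clone_value<false>(s) == "abc");
  assert(s == "abc");
  assert(clone_value<true>(s) == "abc");
}
```

The sketch deliberately does not assert anything about the source after the moving call, since a moved-from `std::string` is only guaranteed to be valid, not empty.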

Ok to commit once all tests have completed?

François

On 19/11/20 12:31 pm, Jonathan Wakely wrote:

On 19/11/20 07:46 +0100, François Dumont via Libstdc++ wrote:

On 18/11/20 12:50 am, Jonathan Wakely wrote:

On 17/11/20 21:51 +0100, François Dumont via Libstdc++ wrote:
This is a change that has been done to _Hashtable and that I forgot 
to propose for _Rb_tree.


The _GLIBCXX_XREF macro can be easily removed of course.

    libstdc++: _Rb_tree code cleanup, remove lambdas.

    Use an additional template parameter on the clone method to propagate
    whether the values must be copied or moved, rather than using lambdas.

    libstdc++-v3/ChangeLog:

            * include/bits/move.h (_GLIBCXX_XREF): New.
            * include/bits/stl_tree.h: Adapt to use latter.
            (_Rb_tree<>::_S_fwd_value_for): New.
            (_Rb_tree<>::_M_clone_node): Add _Tree template parameter.
            Use _S_fwd_value_for.
            (_Rb_tree<>::_M_cbegin): New.
            (_Rb_tree<>::_M_begin): Use latter.
            (_Rb_tree<>::_M_copy): Add _Tree template parameter.
            (_Rb_tree<>::_M_move_data): Use rvalue reference for _Rb_tree
            parameter.
            (_Rb_tree<>::_M_move_assign): Likewise.


Tested under Linux x86_64.

Ok to commit ?


GCC is in stage 3 now, so this should have been posted last week
really.


Ok, no problem, it can wait.

Still, following your advice, here is what I came up with, much
simpler indeed.


Yes, this simpler patch looks promising even though it's stage 3.


I have just run a few tests for the moment, but so far so good.

Thanks




diff --git a/libstdc++-v3/include/bits/move.h 
b/libstdc++-v3/include/bits/move.h

index 5a4dbdc823c..e0d68ca9108 100644
--- a/libstdc++-v3/include/bits/move.h
+++ b/libstdc++-v3/include/bits/move.h
@@ -158,9 +158,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

  /// @} group utilities

+#define _GLIBCXX_XREF(_Tp) _Tp&&


I think this does improve the code that uses this. But the correct
name for this is forwarding reference, so I think FWDREF would be
better than XREF. XREF doesn't tell me anything about what it's for.


#define _GLIBCXX_MOVE(__val) std::move(__val)
#define _GLIBCXX_FORWARD(_Tp, __val) std::forward<_Tp>(__val)
#else
+#define _GLIBCXX_XREF(_Tp) const _Tp&
#define _GLIBCXX_MOVE(__val) (__val)
#define _GLIBCXX_FORWARD(_Tp, __val) (__val)
#endif
diff --git a/libstdc++-v3/include/bits/stl_tree.h 
b/libstdc++-v3/include/bits/stl_tree.h

index ec141ea01c7..128c7e2c892 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -478,11 +478,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

    template
      _Link_type
-#if __cplusplus < 201103L
-      operator()(const _Arg& __arg)
-#else
-      operator()(_Arg&& __arg)
-#endif
+      operator()(_GLIBCXX_XREF(_Arg) __arg)
      {
        _Link_type __node = 
static_cast<_Link_type>(_M_extract());

        if (__node)
@@ -544,11 +540,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

    template
      _Link_type
-#if __cplusplus < 201103L
-      operator()(const _Arg& __arg) const
-#else
-     

[PATCH,rs6000] Make MMA builtins use opaque modes [v2]

2020-11-19 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey 

Segher & Bergner -
  Thanks for the reviews, here's the updated patch after fixing those things.
We now have an UNSPEC for xxsetaccz, and an accompanying change to
rs6000_rtx_costs to make it be cost 0 so that CSE doesn't try to replace it
with a bunch of register moves.

If bootstrap/regtest looks good, ok for trunk?

Thanks,
Aaron

gcc/
* gcc/config/rs6000/mma.md (unspec): Add assemble/extract UNSPECs.
(movoi): Change to movoo.
(*movpoi): Change to *movoo.
(movxi): Change to movxo.
(*movpxi): Change to *movxo.
(mma_assemble_pair): Change to OO mode.
(*mma_assemble_pair): New define_insn_and_split.
(mma_disassemble_pair): New define_expand.
(*mma_disassemble_pair): New define_insn_and_split.
(mma_assemble_acc): Change to XO mode.
(*mma_assemble_acc): Change to XO mode.
(mma_disassemble_acc): New define_expand.
(*mma_disassemble_acc): New define_insn_and_split.
(mma_): Change to XO mode.
(mma_): Change to XO mode.
(mma_): Change to XO mode.
(mma_): Change to OO mode.
(mma_): Change to XO/OO mode.
(mma_): Change to XO mode.
(mma_): Change to XO mode.
(mma_): Change to XO mode.
(mma_): Change to XO mode.
(mma_): Change to XO mode.
(mma_): Change to XO mode.
(mma_): Change to XO/OO mode.
(mma_): Change to XO/OO mode.
(mma_): Change to XO mode.
(mma_): Change to XO mode.
* gcc/config/rs6000/predicates.md (input_operand): Allow opaque.
(mma_disassemble_output_operand): New predicate.
* gcc/config/rs6000/rs6000-builtin.def:
Changes to disassemble builtins.
* gcc/config/rs6000/rs6000-call.c (rs6000_return_in_memory):
Disallow __vector_pair/__vector_quad as return types.
(rs6000_promote_function_mode): Remove function return type
check because we can't test it here any more.
(rs6000_function_arg): Do not allow __vector_pair/__vector_quad
as function arguments.
(rs6000_gimple_fold_mma_builtin):
Handle mma_disassemble_* builtins.
(rs6000_init_builtins): Create types for XO/OO modes.
* gcc/config/rs6000/rs6000-modes.def: Delete OI, XI,
POI, and PXI modes, and create XO and OO modes.
* gcc/config/rs6000/rs6000-string.c (expand_block_move):
Update to OO mode.
* gcc/config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok_uncached):
Update for XO/OO modes.
(rs6000_rtx_costs): Make UNSPEC_MMA_XXSETACCZ cost 0.
(rs6000_modes_tieable_p): Update for XO/OO modes.
(rs6000_debug_reg_global): Update for XO/OO modes.
(rs6000_setup_reg_addr_masks): Update for XO/OO modes.
(rs6000_init_hard_regno_mode_ok): Update for XO/OO modes.
(reg_offset_addressing_ok_p): Update for XO/OO modes.
(rs6000_emit_move): Update for XO/OO modes.
(rs6000_preferred_reload_class): Update for XO/OO modes.
(rs6000_split_multireg_move): Update for XO/OO modes.
(rs6000_mangle_type): Update for opaque types.
(rs6000_invalid_conversion): Update for XO/OO modes.
* gcc/config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P):
Update for XO/OO modes.
* gcc/config/rs6000/rs6000.md (RELOAD): Update for XO/OO modes.
gcc/testsuite/
* gcc.target/powerpc/mma-double-test.c (main): Call abort for failure.
* gcc.target/powerpc/mma-single-test.c (main): Call abort for failure.
* gcc.target/powerpc/pr96506.c: Rename to pr96506-1.c.
* gcc.target/powerpc/pr96506-2.c: New test.
---
 gcc/config/rs6000/mma.md  | 421 ++
 gcc/config/rs6000/predicates.md   |  12 +
 gcc/config/rs6000/rs6000-builtin.def  |  14 +-
 gcc/config/rs6000/rs6000-call.c   | 142 +++---
 gcc/config/rs6000/rs6000-modes.def|  10 +-
 gcc/config/rs6000/rs6000-string.c |   6 +-
 gcc/config/rs6000/rs6000.c| 193 
 gcc/config/rs6000/rs6000.h|   3 +-
 gcc/config/rs6000/rs6000.md   |   2 +-
 .../gcc.target/powerpc/mma-double-test.c  |   3 +
 .../gcc.target/powerpc/mma-single-test.c  |   3 +
 .../powerpc/{pr96506.c => pr96506-1.c}|  24 -
 gcc/testsuite/gcc.target/powerpc/pr96506-2.c  |  38 ++
 13 files changed, 508 insertions(+), 363 deletions(-)
 rename gcc/testsuite/gcc.target/powerpc/{pr96506.c => pr96506-1.c} (61%)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96506-2.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index a3fd28bdd0a..63bb73a01e7 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -19,24 +19,18 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; .
 
-;; The MMA patterns use the multi-register PXImode and POImode partial
-;; 

[r11-5185 Regression] FAIL: gcc.dg/pr97515.c scan-tree-dump-times evrp "goto" 1 on Linux/x86_64

2020-11-19 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

d0d8b5d83614d8f0d0e40c0520d4f40ffa01f8d9 is the first bad commit
commit d0d8b5d83614d8f0d0e40c0520d4f40ffa01f8d9
Author: Andrew MacLeod 
Date:   Thu Nov 19 17:41:30 2020 -0500

Process only valid shift ranges.

caused

FAIL: gcc.dg/pr97515.c scan-tree-dump-times evrp "goto" 1

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-5185/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr97515.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr97515.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr97515.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr97515.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at skpgkp2 at gmail dot com)


[PATCH 00/31] VAX: Bring the port up to date (yes, MODE_CC conversion is included)

2020-11-19 Thread Maciej W. Rozycki
Hi,

 [Paul, there's a PDP11 piece for you further down here and then 29/31.]

 This is the much-desired refurbishment of the VAX backend.  A little bit 
past the end of Stage 1, which I apologise for and which I do hope is not 
going to make it a no-no for GCC 11.  I feel quite satisfied anyway I was 
able to overcome all the difficulties outside the development itself I was 
faced with throughout this effort and fit it into quite a tight schedule 
between my departure from Western Digital effective Sep 1st and now.

 Special thanks to Anders "Ragge" Magnusson for persuading me, on my trip 
to Luleå, Sweden back in 2015, to adopt Lizzie, his VAXstation 4000/60 he 
used to use for VAX/NetBSD development and decided to part with, as she 
turned out to be the only VAX machine in my possession ready to undertake 
the task of GCC verification, and also quite a mighty one for such a 
mature piece of hardware.

 The port has turned out to have some issues, which I decided to address 
so as not to have to propagate or correct breakage with the MODE_CC update 
itself, hence the 28 preparatory patches.  I might have skipped maybe two 
changes as not really necessary, such as the addition of `movmem' pattern, 
but they were really low-hanging fruit, and then easy to lose if not done 
right away.  I have split MODE_CC conversion test cases off due to the 
size of the change.

 Then there is a fix for the PDP11 backend addressing an issue I found in 
the handling of floating-point comparisons.  Unlike all the other changes 
this one has not been regression-tested, not even built as I have no idea 
how to prepare a development environment for a PDP11 target (also none of 
my VAX pieces is old enough to support PDP11 machine code execution).

 Still I am fairly sure it is a correct change to make, and you should be 
able to confirm it quite easily perhaps by picking the same test case from 
31/31 that I used for the example RTL dump in 28/31 and using it along 
with said dump to match what the PDP11 backend produces.  Maybe you can 
use these test cases for PDP11 verification as well, as they are pretty 
generic except for the assembly match patterns of course.

 These changes have been regression-tested throughout development with the 
`vax-netbsdelf' target running NetBSD 9.0, using said VAXstation 4000/60, 
which uses the Mariah implementation of the VAX architecture.  The host 
used was `powerpc64le-linux-gnu' and occasionally `x86_64-linux-gnu' as 
well; changes outside the VAX backend were all natively bootstrapped and 
regression-tested with both these hosts.

 Target regression-testing has been done across all the components that 
build (01/31 is required to build libgomp at `-O2'), meaning the following 
parts have been excluded for the reasons stated:

1. libada -- not ported to VAX/NetBSD, machine/OS bindings not present.

2. libgfortran -- oddly enough for Fortran a piece requires IEEE 754
   floating-point arithmetic (possibly a porting problem too).

3. libgo -- not ported to VAX/NetBSD, machine/OS bindings are not present.

and the absence of the respective libraries caused failures with the 
respective frontends as well.

 One regression has been nominally caused, in C frontend testing:

FAIL: gcc.dg/lto/pr55660 c_lto_pr55660_0.o-c_lto_pr55660_1.o link, -O2 -flto 
-flto-partition=none -fuse-linker-plugin -fno-fat-lto-objects

however it is a symptom of an unrelated bug in the LTO wrapper, which 
clears the PIC flag unconditionally:

case LTO_LINKER_OUTPUT_EXEC: /* Normal executable */
  flag_pic = 0;
  flag_pie = 0;
  flag_shlib = 0;
  break;

and causes a legitimate assembly warning:

/tmp/ccG0X3DQ.s: Assembler messages:
/tmp/ccG0X3DQ.s:17: Warning: Symbol n used as immediate operand in PIC mode.
/tmp/ccG0X3DQ.s:26: Warning: Symbol n used as immediate operand in PIC mode.

similarly to a preexisting failure for the same test case at `-O0':

FAIL: gcc.dg/lto/pr55660 c_lto_pr55660_0.o-c_lto_pr55660_1.o link, -O0 -flto 
-flto-partition=none -fuse-linker-plugin

and numerous other ones.  I'll file a PR to track this problem and see if 
I can address it quickly now that I'm done with the MODE_CC conversion, 
with the understanding that it may not be suitable for GCC 11 at this 
point of the development cycle.

 As I have refreshed the tree again for this submission and verification 
takes short of 48 hours per run, I'll be scheduling another full cycle and 
expect to have updated results in about a week's time as all being well I 
imagine I'll have to go through three runs for the base results, results 
for the preparatory changes, and then the final results.  I'll see if I 
can arrange and run some benchmarking too.

 See individual change descriptions for details and code quality stats.

 Last but not least, for easier access I have made these changes available at 
, `users/macro/vax-mode-cc' branch.

 Comments, questions, concerns?

  Maciej


[PATCH 30/31] PR target/95294: VAX: Convert backend to MODE_CC representation

2020-11-19 Thread Maciej W. Rozycki
In the VAX ISA INSV bitfield insert instruction is the only computational
operation that keeps the condition codes, held in the PSL or Processor
Status Longword register, intact.  The instruction is flexible enough it
could potentially be used for data moves post-reload, but then reportedly
it is not the best choice performance-wise, and then we have no addition
operation available that would keep the condition codes unchanged.

Furthermore, as usual with a complex CISC ISA, for many operations we
have several machine instructions or instruction sequences to choose
from that set condition codes in a different manner.

Use the approach then where the condition codes only get introduced by
reload, by defining instruction splitters for RTL insns that change
condition codes in some way, by default considering them clobbered.

Then to prevent code generated from regressing too much provide insns
that include a `compare' operation setting the condition codes in
parallel to the main operation.  The manner in which condition codes are
set by each insn is supposed to be provided by whatever the SELECT_CC_MODE
macro expands to.

Given that individual patterns provided for the same RTL basic operation
may set the condition codes differently, keeping the information away from
the insn patterns themselves would cause a maintenance nightmare and
would be bound to fail in a horrible way sooner or later.  Therefore
instead let the patterns themselves choose which condition modes they
support, by having one or more subst iterators applied and then have
individual comparison operators require the specific condition mode each
according to the codes used by the operation.

While subst iterators only support one alternative each, there is
actually no problem with applying multiple ones to a single insn with
the result as intended, and if the corresponding subst attribute
supplies an empty NO-SUBST-VALUE, then no mess results even.  Make use
of this observation.

Add appropriate subst iterators to all the computational patterns then,
according to the condition codes they usably set, including DImode ones
and a substitute DImode comparison instruction in the absence of a CMPQ
machine instruction, however do not provide a `cbranchdi4' named pattern
as without a further development it regresses code quality by resorting
to the `__cmpdi2' libcall where a simpler operation would do, e.g. to
check for negativity the TSTL machine instruction may be executed over
the upper longword only.  This is good material for further work.
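
To make the remark about negativity concrete: the sign of a 64-bit value is fully determined by its upper 32 bits, which is why a test of the upper longword alone suffices. A minimal sketch follows; the function name `is_negative_di` is made up for this example and the code only models the arithmetic, not what the backend emits.

```cpp
#include <cassert>
#include <cstdint>

// Sign test of a 64-bit value that inspects only its upper 32 bits (the
// "upper longword" in VAX terms).  The sign bit lives there, so the lower
// longword never needs to be examined.
static bool is_negative_di(std::int64_t x)
{
  std::uint32_t hi =
    static_cast<std::uint32_t>(static_cast<std::uint64_t>(x) >> 32);
  return (hi & 0x80000000u) != 0;
}

// Usage: the lower longword has no influence on the outcome.
inline void is_negative_di_demo()
{
  assert(is_negative_di(-1));
  assert(!is_negative_di(0xffffffffLL));
}
```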

Do not apply subst iterators to the increment- or decrement-and-branch
patterns at this time; these may yet have to be reviewed, in particular
whether `*jsobneq_minus_one' is still relevant in the context of the
recent integer constant cost review.

Also add a couple of peepholes to help eliminate comparisons in some
problematic cases, such as with the BIT instruction which is bitwise-AND
for condition codes only that has no direct counterpart for the actual
calculation, because the BIC instruction which does do bitwise-AND and
produces a result implements the operation with a bitwise negation of
its input `mask' operand.  Or the FFS instruction which sets the Z
condition code according to its `field' input operand rather than the
result produced.  Or the bitfield comparisons we don't have generic
middle-end support for.
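
The BIT/BIC mismatch described above can be modelled in a few lines. This is a sketch of the instruction semantics as described in this message, with hypothetical function names; only the Z condition code of BIT is modelled.

```cpp
#include <cassert>
#include <cstdint>

// Model of BIC: clear in `src` the bits that are set in `mask`,
// i.e. a bitwise AND with the complement of the mask.
static std::uint32_t bic(std::uint32_t mask, std::uint32_t src)
{
  return src & ~mask;
}

// A plain bitwise AND therefore needs the mask complemented first.
static std::uint32_t and_via_bic(std::uint32_t a, std::uint32_t b)
{
  return bic(~a, b);
}

// Model of BIT: AND the operands for the condition codes only (just Z is
// modelled here); no result is stored anywhere.
static bool bit_sets_z(std::uint32_t mask, std::uint32_t src)
{
  return (mask & src) == 0;
}

// Usage: BIT agrees with "AND, then compare with zero" without the AND
// result ever being produced by BIC with the same mask operand.
inline void bic_bit_demo()
{
  assert(and_via_bic(0x0fu, 0xabu) == 0x0bu);
  assert(bit_sets_z(0x0fu, 0xf0u) == (and_via_bic(0x0fu, 0xf0u) == 0));
}
```

Because BIC wants the complemented mask while BIT takes the mask as is, the two cannot be expressed as one pattern with a shared operand, which is where a peephole helps.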

Code size stats are as follows, obtained from 17640 and 9086 executables
built in `check-c' and `check-c++' GCC testing respectively:

                     check-c                  check-c++
             samples average  median  samples average  median
-------------------------------------------------------------
regressions     1813  0.578%  0.198%      289  0.349%  0.175%
unchanged      15160  0.000%  0.000%     8662  0.000%  0.000%
progressions     667 -0.589% -0.194%      135 -0.944% -0.191%

total          17640  0.037%  0.000%     9086 -0.003%  0.000%

Outliers:

old     new     change  %change filename

2406    2950    +544    +22.610 20111208-1.exe
4314    5329    +1015   +23.528 pr39417.exe
2235    3055    +820    +36.689 990404-1.exe
2631    4213    +1582   +60.129 pr57521.exe
3063    5579    +2516   +82.142 2422-1.exe

and:

old     new     change  %change filename

6317    4845    -1472   -23.302 vector-compare-1.exe
6313    4845    -1468   -23.254 vector-compare-1.exe
6474    5002    -1472   -22.737 vector-compare-1.exe
6470    5002    -1468   -22.689 vector-compare-1.exe
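
For reference, the `%change` column in the outlier tables is consistent with the usual relative size change, (new - old) / old * 100. A small sketch, with a hypothetical function name, that reproduces two of the rows:

```cpp
#include <cassert>

// Relative change in percent between an old and a new executable size.
// Positive values are regressions (growth), negative ones progressions.
static double percent_change(unsigned old_size, unsigned new_size)
{
  assert(old_size != 0);
  return (static_cast<double>(new_size) - static_cast<double>(old_size))
         * 100.0 / old_size;
}

// Usage: 20111208-1.exe grew from 2406 to 2950, i.e. by about 22.610%.
inline void percent_change_demo()
{
  double pc = percent_change(2406, 2950);
  assert(pc > 22.6095 && pc < 22.6105);
}
```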

We have some code quality regressions like:

10861:  9e ef d9 12 movab 11b40 ,r0
10865:  00 00 50
10868:  90 a0 03 a0 movb 0x3(r0),0x2(r0)
1086c:  02
1086d:  d1 60 8f 61 cmpl (r0),$0x64646261
10871:  62 64 64
10874:  13 07   beql 1087d 

to:

10861:  9e ef 

[PATCH 29/31] PDP11: Use `const_double_zero' to express double zero constant

2020-11-19 Thread Maciej W. Rozycki
We do not define a comparison operation between floating-point and
integer data, including integer zero constant.  Consequently the RTL
instruction stream presented to the post-reload comparison elimination
pass will include, where applicable, floating-point comparison insns
against `const_double:DF 0.0 [0x0.0p+0]' rather than `const_int 0 [0]',
meaning that the latter expression will not match when used in machine
description.

Use `const_double_zero' then for the relevant patterns to match the
intended RTL instructions.
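
The mismatch can be pictured with a toy model. The types and the `toy_match` function below are entirely hypothetical and far simpler than GCC's real rtx matching; they only illustrate that a pattern of one expression code can never match an operand of another code, even when both denote zero.

```cpp
#include <cassert>

// Toy expression codes and node type; not GCC's rtx representation.
enum toy_code { TOY_CONST_INT, TOY_CONST_DOUBLE };

struct toy_rtx
{
  toy_code code;
  long ival;     // used when code == TOY_CONST_INT
  double dval;   // used when code == TOY_CONST_DOUBLE
};

// A pattern matches only if the codes agree and the values are equal.
static bool toy_match(const toy_rtx& pattern, const toy_rtx& x)
{
  if (pattern.code != x.code)
    return false;
  if (pattern.code == TOY_CONST_INT)
    return pattern.ival == x.ival;
  return pattern.dval == x.dval;
}

// Usage: an integer zero pattern does not match a floating-point zero.
inline void toy_match_demo()
{
  toy_rtx int_zero = { TOY_CONST_INT, 0, 0.0 };
  toy_rtx dbl_zero = { TOY_CONST_DOUBLE, 0, 0.0 };
  assert(!toy_match(int_zero, dbl_zero));
  assert(toy_match(dbl_zero, dbl_zero));
}
```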

gcc/
* config/pdp11/pdp11.md (fcc_cc, fcc_ccnz): Use
`const_double_zero' to express double zero constant.
---
 gcc/config/pdp11/pdp11.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/pdp11/pdp11.md b/gcc/config/pdp11/pdp11.md
index 7a4d50fdba9..cdef49f3979 100644
--- a/gcc/config/pdp11/pdp11.md
+++ b/gcc/config/pdp11/pdp11.md
@@ -105,7 +105,7 @@ (define_subst "fcc_cc"
(clobber (reg FCC_REGNUM))]
   ""
   [(set (reg:CC FCC_REGNUM)
-   (compare:CC (match_dup 1) (const_int 0)))
+   (compare:CC (match_dup 1) (const_double_zero)))
(set (match_dup 0) (match_dup 1))])
 
 (define_subst "fcc_ccnz"
@@ -113,7 +113,7 @@ (define_subst "fcc_ccnz"
(clobber (reg FCC_REGNUM))]
   ""
   [(set (reg:CCNZ FCC_REGNUM)
-   (compare:CCNZ (match_dup 1) (const_int 0)))
+   (compare:CCNZ (match_dup 1) (const_double_zero)))
(set (match_dup 0) (match_dup 1))])
 
 (define_subst_attr "cc_cc" "cc_cc" "_nocc" "_cc")
-- 
2.11.0



[PATCH 28/31] RTL: Add `const_double_zero' syntactic rtx

2020-11-19 Thread Maciej W. Rozycki
The use of a constant double zero is required for post-reload compare
elimination to be able to discard redundant floating-point comparisons,
for example with a VAX RTL instruction stream like:

(insn 34 4 3 2 (parallel [
(set (reg/v:DF 0 %r0 [orig:24 x ] [24])
(mem/c:DF (plus:SI (reg/f:SI 12 %ap)
(const_int 4 [0x4])) [1 x+0 S8 A32]))
(clobber (reg:CC 16 %psl))
]) ".../gcc/testsuite/gcc.target/vax/cmpelim-eq-movdf.c":9:1 37 {*movdf}
 (nil))
(note 3 34 35 2 NOTE_INSN_FUNCTION_BEG)
(insn 35 3 36 2 (set (reg:CCZ 16 %psl)
(compare:CCZ (reg/v:DF 0 %r0 [orig:24 x ] [24])
(const_double:DF 0.0 [0x0.0p+0]))) 
".../gcc/testsuite/gcc.target/vax/cmpelim-eq-movdf.c":10:6 21 {*cmpdf_ccz}
 (nil))
(jump_insn 36 35 9 2 (set (pc)
(if_then_else (eq (reg:CCZ 16 %psl)
(const_int 0 [0]))
(label_ref 11)
(pc))) ".../gcc/testsuite/gcc.target/vax/cmpelim-eq-movdf.c":10:6 
537 {*branch_ccz}
 (int_list:REG_BR_PROB 536870916 (nil))
 -> 11)

that we want to transform into:

(insn 34 4 3 2 (parallel [
(set (reg:CCZ 16 %psl)
(compare:CCZ (mem/c:DF (plus:SI (reg/f:SI 12 %ap)
(const_int 4 [0x4])) [1 x+0 S8 A32])
(const_double:DF 0.0 [0x0.0p+0])))
(set (reg/v:DF 0 %r0 [orig:24 x ] [24])
(mem/c:DF (plus:SI (reg/f:SI 12 %ap)
(const_int 4 [0x4])) [1 x+0 S8 A32]))
]) ".../gcc/testsuite/gcc.target/vax/cmpelim-eq-movdf.c":9:1 40 
{*movdf_ccz}
 (nil))
(note 3 34 36 2 NOTE_INSN_FUNCTION_BEG)
(jump_insn 36 3 9 2 (set (pc)
(if_then_else (eq (reg:CCZ 16 %psl)
(const_int 0 [0]))
(label_ref 11)
(pc))) ".../gcc/testsuite/gcc.target/vax/cmpelim-eq-movdf.c":10:6 
537 {*branch_ccz}
 (int_list:REG_BR_PROB 536870916 (nil))
 -> 11)

with the upcoming MODE_CC representation.

For this we need to express the `const_double:DF 0.0 [0x0.0p+0]' rtx as
recorded above in the relevant pattern(s) in the machine description.
The way we represent double constants, as a host-dependent number of
wide integers, means however that we currently have no portable way to
encode a double zero constant in the machine description.

Define a syntactic rtx alias then to represent `(const_double 0 0 ...)'
as if the suitable number of zeros had been supplied according to the
host-specific definition of CONST_DOUBLE_FORMAT.
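
The idea can be sketched in plain C outside GCC.  This is only a toy
model: `toy_rtx', `TOY_WIDE_INTS', `toy_const_double_zero' and
`toy_is_zero' are hypothetical names made up for illustration and are
not GCC interfaces.  It shows that zero-filling the whole object gives a
zero constant however many wide integers the host happens to use:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the host-dependent operand count; in GCC
   the real count follows from CONST_DOUBLE_FORMAT.  */
#ifndef TOY_WIDE_INTS
#define TOY_WIDE_INTS 2
#endif

enum { TOY_CONST_DOUBLE = 1 };

/* Toy model of an rtx holding a double constant, not GCC's rtx_def.  */
struct toy_rtx
{
  int code;
  long long elt[TOY_WIDE_INTS];
};

/* Build a zero constant without spelling out the operands: clear the
   whole object, then set the code, similarly to what the reader does
   for the alias.  Returns NULL if allocation fails.  */
struct toy_rtx *
toy_const_double_zero (void)
{
  struct toy_rtx *x = malloc (sizeof *x);
  if (x == NULL)
    return NULL;
  memset (x, 0, sizeof *x);
  x->code = TOY_CONST_DOUBLE;
  return x;
}

/* Return 1 if every wide integer of X is zero, 0 otherwise.  */
int
toy_is_zero (const struct toy_rtx *x)
{
  for (int i = 0; i < TOY_WIDE_INTS; i++)
    if (x->elt[i] != 0)
      return 0;
  return 1;
}
```

Rebuilding the sketch with a different `TOY_WIDE_INTS' value needs no
change to the caller, which is the property the alias is after.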

gcc/
* read-rtl.c (rtx_reader::read_rtx_code): Handle syntactic
`const_double_zero' rtx.
* doc/rtl.texi (Constant Expression Types): Document it.
---
 gcc/doc/rtl.texi | 18 ++
 gcc/read-rtl.c   | 10 ++
 2 files changed, 28 insertions(+)

diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index 22af5731bb6..f7a715d93cb 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -1705,6 +1705,24 @@ machine's or host machine's floating point format.  To convert them to
 the precise bit pattern used by the target machine, use the macro
 @code{REAL_VALUE_TO_TARGET_DOUBLE} and friends (@pxref{Data Output}).
 
+@findex const_double_zero
+The host dependency for the number of integers used to store a double
+value makes it problematic for machine descriptions to use expressions
+of code @code{const_double} and therefore a syntactic alias has been
+provided:
+
+@smallexample
+(const_double_zero)
+@end smallexample
+
+standing for:
+
+@smallexample
+(const_double 0 0 @dots{})
+@end smallexample
+
+for matching the floating-point value zero, possibly the only useful one.
+
 @findex CONST_WIDE_INT
 @item (const_wide_int:@var{m} @var{nunits} @var{elt0} @dots{})
 This contains an array of @code{HOST_WIDE_INT}s that is large enough
diff --git a/gcc/read-rtl.c b/gcc/read-rtl.c
index 403f254f3cb..2922af5d111 100644
--- a/gcc/read-rtl.c
+++ b/gcc/read-rtl.c
@@ -1651,6 +1651,16 @@ rtx_reader::read_rtx_code (const char *code_name)
   return return_rtx;
 }
 
+  /* Handle "const_double_zero".  */
+  if (strcmp (code_name, "const_double_zero") == 0)
+{
+  code = CONST_DOUBLE;
+  return_rtx = rtx_alloc (code);
+  memset (return_rtx, 0, RTX_CODE_SIZE (code));
+  PUT_CODE (return_rtx, code);
+  return return_rtx;
+}
+
   /* If we end up with an insn expression then we free this space below.  */
   return_rtx = rtx_alloc_for_name (code_name);
   code = GET_CODE (return_rtx);
-- 
2.11.0



[PATCH 27/31] VAX: Make the `divmoddisi4' and `*amulsi4' comment notation consistent

2020-11-19 Thread Maciej W. Rozycki
Use a double semicolon to introduce the comments like elsewhere
throughout the VAX machine description.

gcc/
* config/vax/vax.md (divmoddisi4, *amulsi4): Make the comment
notation consistent with the rest of the file.
---
 gcc/config/vax/vax.md | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/gcc/config/vax/vax.md b/gcc/config/vax/vax.md
index 4b0c26d1d58..1bb4e300cae 100644
--- a/gcc/config/vax/vax.md
+++ b/gcc/config/vax/vax.md
@@ -498,17 +498,17 @@ (define_insn "div3"
div2 %2,%0
div3 %2,%1,%0")
 
-;This is left out because it is very slow;
-;we are better off programming around the "lack" of this insn.
-;(define_insn "divmoddisi4"
-;  [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
-;  (div:SI (match_operand:DI 1 "general_operand" "g")
-;  (match_operand:SI 2 "general_operand" "g")))
-;   (set (match_operand:SI 3 "nonimmediate_operand" "=g")
-;  (mod:SI (match_dup 1)
-;  (match_dup 2)))]
-;  ""
-;  "ediv %2,%1,%0,%3")
+;; This is left out because it is very slow;
+;; we are better off programming around the "lack" of this insn.
+;;(define_insn "divmoddisi4"
+;;  [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
+;; (div:SI (match_operand:DI 1 "general_operand" "g")
+;; (match_operand:SI 2 "general_operand" "g")))
+;;   (set (match_operand:SI 3 "nonimmediate_operand" "=g")
+;; (mod:SI (match_dup 1)
+;; (match_dup 2)))]
+;;  ""
+;;  "ediv %2,%1,%0,%3")
 
 ;; Bit-and on the VAX is done with a clear-bits insn.
 (define_expand "and3"
@@ -740,14 +740,14 @@ (define_insn ""
   ""
   "rotl %2,%1,%0")
 
-;This insn is probably slower than a multiply and an add.
-;(define_insn "*amulsi4"
-;  [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
-;  (mult:SI (plus:SI (match_operand:SI 1 "general_operand" "g")
-;(match_operand:SI 2 "general_operand" "g"))
-;   (match_operand:SI 3 "general_operand" "g")))]
-;  ""
-;  "index %1,$0x8000,$0x7fff,%3,%2,%0")
+;; This insn is probably slower than a multiply and an add.
+;;(define_insn "*amulsi4"
+;;  [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
+;; (mult:SI (plus:SI (match_operand:SI 1 "general_operand" "g")
+;;   (match_operand:SI 2 "general_operand" "g"))
+;;  (match_operand:SI 3 "general_operand" "g")))]
+;;  ""
+;;  "index %1,$0x8000,$0x7fff,%3,%2,%0")
 
 ;; Special cases of bit-field insns which we should
 ;; recognize in preference to the general case.
-- 
2.11.0



[PATCH 26/31] VAX: Correct issues with commented-out insns

2020-11-19 Thread Maciej W. Rozycki
Correct issues with commented-out insns, which fail to build if enabled:

.../gcc/config/vax/vax.md:503:1: repeated operand number 1
.../gcc/config/vax/vax.md:503:1: repeated operand number 2

and then when the issue with the repeated operands has been corrected:

.../gcc/config/vax/vax.md:107:1: destination operand 0 allows non-lvalue
.../gcc/config/vax/vax.md:503:1: destination operand 0 allows non-lvalue
.../gcc/config/vax/vax.md:503:1: destination operand 3 allows non-lvalue
.../gcc/config/vax/vax.md:744:1: destination operand 0 allows non-lvalue

Fix the RTL with the repeated operands and change the relevant output
operand predicates not to allow immediates.

Also emit MOVO rather than MOVH assembly instruction with the `movti'
insn so that the condition codes are set according to the integer rather
than floating-point interpretation of the datum moved, as expected with
the operation associated with the pattern.

Finally give `*amulsi4' a name, for easier reference here and elsewhere.

We may eventually want to have some of these insns enabled at `-Os'.

ChangeLog:

* gcc/config/vax/vax.md (movti): Fix output predicate.  Emit
`movo' rather than `movh'.
(divmoddisi4): Fix output predicates, correct RTL.
(*amulsi4): Name insn.  Fix output predicate.
---
 gcc/config/vax/vax.md | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/vax/vax.md b/gcc/config/vax/vax.md
index 2f6643abe5c..4b0c26d1d58 100644
--- a/gcc/config/vax/vax.md
+++ b/gcc/config/vax/vax.md
@@ -105,10 +105,10 @@ (define_insn "mov"
 
 ;; Some VAXen don't support this instruction.
 ;;(define_insn "movti"
-;;  [(set (match_operand:TI 0 "general_operand" "=g")
+;;  [(set (match_operand:TI 0 "nonimmediate_operand" "=g")
 ;; (match_operand:TI 1 "general_operand" "g"))]
 ;;  ""
-;;  "movh %1,%0")
+;;  "movo %1,%0")
 
 (define_insn "movdi"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=g")
@@ -501,12 +501,12 @@ (define_insn "div3"
 ;This is left out because it is very slow;
 ;we are better off programming around the "lack" of this insn.
 ;(define_insn "divmoddisi4"
-;  [(set (match_operand:SI 0 "general_operand" "=g")
+;  [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
 ;  (div:SI (match_operand:DI 1 "general_operand" "g")
 ;  (match_operand:SI 2 "general_operand" "g")))
-;   (set (match_operand:SI 3 "general_operand" "=g")
-;  (mod:SI (match_operand:DI 1 "general_operand" "g")
-;  (match_operand:SI 2 "general_operand" "g")))]
+;   (set (match_operand:SI 3 "nonimmediate_operand" "=g")
+;  (mod:SI (match_dup 1)
+;  (match_dup 2)))]
 ;  ""
 ;  "ediv %2,%1,%0,%3")
 
@@ -741,8 +741,8 @@ (define_insn ""
   "rotl %2,%1,%0")
 
 ;This insn is probably slower than a multiply and an add.
-;(define_insn ""
-;  [(set (match_operand:SI 0 "general_operand" "=g")
+;(define_insn "*amulsi4"
+;  [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
 ;  (mult:SI (plus:SI (match_operand:SI 1 "general_operand" "g")
 ;(match_operand:SI 2 "general_operand" "g"))
 ;   (match_operand:SI 3 "general_operand" "g")))]
-- 
2.11.0



[PATCH 25/31] VAX: Fix predicates for widening multiply and multiply-add insns

2020-11-19 Thread Maciej W. Rozycki
It makes no sense for insn operand predicates, as long as they accept a
register operand, to be more restrictive than the set of the associated
constraints, because expand will choose the insn based on the relevant
operand being a pseudo register then and reload will keep it happily as
an immediate if a constraint permits it.  So the restriction posed by
such a predicate will be happily ignored, and moreover if a splitter is
added, such as required for MODE_CC support, the new instructions will
reject the original operands supplied, causing an ICE like below:

.../gcc/testsuite/gfortran.dg/graphite/PR67518.f90:44:0: Error: could not split insn
(insn 90 662 663 (set (reg:DI 10 %r10 [orig:97 _235 ] [97])
(mult:DI (sign_extend:DI (mem/c:SI (plus:SI (reg/f:SI 13 %fp)
(const_int -800 [0xfce0])) [14 %sfp+-800 S4 
A32]))
(sign_extend:DI (const_int -51 [0xffcd] 299 
{mulsidi3}
 (expr_list:REG_EQUAL (mult:DI (sign_extend:DI (subreg:SI (mem/c:DI 
(plus:SI (reg/f:SI 13 %fp)
(const_int -800 [0xfce0])) [14 
%sfp+-800 S8 A32]) 0))
(const_int -51 [0xffcd]))
(nil)))
during RTL pass: final
.../gcc/testsuite/gfortran.dg/graphite/PR67518.f90:44:0: internal compiler error: in final_scan_insn_1, at final.c:3073
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.

Change the predicates used with the widening multiply and multiply-add
insns to allow immediates then, just as the constraints and the machine
instructions produced permit.
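
The consistency rule argued for above can be written down as a subset
check.  The following is a toy sketch only: the operand kinds, the
`TOY_*' macros and `toy_predicate_covers' are hypothetical and do not
correspond to any GCC data structure, and the `T' constraint letter is
left out of the model:

```c
#include <assert.h>

/* Operand kinds, as a toy bit set.  */
enum operand_kind
{
  KIND_REG = 1 << 0,
  KIND_MEM = 1 << 1,
  KIND_IMM = 1 << 2
};

/* Toy models of the two predicates named in this change.  */
#define TOY_NONIMMEDIATE_OPERAND (KIND_REG | KIND_MEM)
#define TOY_GENERAL_OPERAND (KIND_REG | KIND_MEM | KIND_IMM)

/* Toy model of the "nrm" part of the "nrmT" constraint string.  */
#define TOY_CONSTRAINT_NRM (KIND_IMM | KIND_REG | KIND_MEM)

/* Return 1 if every operand kind the constraint admits is also accepted
   by the predicate, i.e. no operand permitted by the constraint would
   be rejected by the predicate when the insn is matched again.  */
int
toy_predicate_covers (unsigned predicate, unsigned constraint)
{
  return (constraint & ~predicate) == 0;
}
```

In this model the old `nonimmediate_operand' predicate does not cover
the constraint, while `general_operand' does.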

Also give the insns names, for easier reference here and elsewhere.

gcc/
* config/vax/vax.md (mulsidi3): Fix the multiplicand predicates.
(*maddsidi4, *maddsidi4_const): Likewise.  Name insns.
---
 gcc/config/vax/vax.md | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/gcc/config/vax/vax.md b/gcc/config/vax/vax.md
index 34fdf67bb6d..2f6643abe5c 100644
--- a/gcc/config/vax/vax.md
+++ b/gcc/config/vax/vax.md
@@ -445,35 +445,32 @@ (define_insn "mul3"
 
 (define_insn "mulsidi3"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=g")
-   (mult:DI (sign_extend:DI
- (match_operand:SI 1 "nonimmediate_operand" "nrmT"))
-(sign_extend:DI
- (match_operand:SI 2 "nonimmediate_operand" "nrmT"))))]
+   (mult:DI
+ (sign_extend:DI (match_operand:SI 1 "general_operand" "nrmT"))
+ (sign_extend:DI (match_operand:SI 2 "general_operand" "nrmT"))))]
   ""
   "emul %1,%2,$0,%0")
 
-(define_insn ""
+(define_insn "*maddsidi4"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=g")
(plus:DI
-(mult:DI (sign_extend:DI
-  (match_operand:SI 1 "nonimmediate_operand" "nrmT"))
- (sign_extend:DI
-  (match_operand:SI 2 "nonimmediate_operand" "nrmT")))
-(sign_extend:DI (match_operand:SI 3 "nonimmediate_operand" "g"))))]
+ (mult:DI
+   (sign_extend:DI (match_operand:SI 1 "general_operand" "nrmT"))
+   (sign_extend:DI (match_operand:SI 2 "general_operand" "nrmT")))
+ (sign_extend:DI (match_operand:SI 3 "general_operand" "g"))))]
   ""
   "emul %1,%2,%3,%0")
 
 ;; 'F' constraint means type CONST_DOUBLE
-(define_insn ""
+(define_insn "*maddsidi4_const"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=g")
(plus:DI
-(mult:DI (sign_extend:DI
-  (match_operand:SI 1 "nonimmediate_operand" "nrmT"))
- (sign_extend:DI
-  (match_operand:SI 2 "nonimmediate_operand" "nrmT")))
-(match_operand:DI 3 "immediate_operand" "F")))]
+ (mult:DI
+   (sign_extend:DI (match_operand:SI 1 "general_operand" "nrmT"))
+   (sign_extend:DI (match_operand:SI 2 "general_operand" "nrmT")))
+ (match_operand:DI 3 "immediate_operand" "F")))]
   "GET_CODE (operands[3]) == CONST_DOUBLE
-&& CONST_DOUBLE_HIGH (operands[3]) == (CONST_DOUBLE_LOW (operands[3]) >> 31)"
+   && CONST_DOUBLE_HIGH (operands[3]) == (CONST_DOUBLE_LOW (operands[3]) >> 31)"
   "*
 {
   if (CONST_DOUBLE_HIGH (operands[3]))
-- 
2.11.0



[PATCH 24/31] VAX: Fix predicates and constraints for bitfield comparison insns

2020-11-19 Thread Maciej W. Rozycki
It makes no sense for insn operand predicates, as long as they accept a
register operand, to be more restrictive than the set of the associated
constraints, because expand will choose the insn based on the relevant
operand being a pseudo register then and reload will keep it happily as a
memory reference if a constraint permits it.  So the restriction posed
by such a predicate will be happily ignored, and moreover if a splitter
is added, such as required for MODE_CC support, the new instructions
will reject the original operands supplied, causing an ICE.  An actual
example will be given with a subsequent change.

Therefore, similarly to EXTV/EXTZV/INSV insns, remove inconsistencies
with predicates and constraints of bitfield comparison insns, observing
that a bitfield located in memory is byte-addressed by the respective
machine instructions and therefore SImode may only be used with a
register or an offsettable memory operand (i.e. not an indexed,
pre-decremented, or post-incremented one).

Also give the insns names, for easier reference here and elsewhere.

gcc/
* config/vax/vax.md (*cmpv_2): Name insn.
(*cmpv, *cmpzv, *cmpzv_2): Likewise.  Fix location predicate and
constraint.
---
 gcc/config/vax/vax.md | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/vax/vax.md b/gcc/config/vax/vax.md
index d8774cdd36c..34fdf67bb6d 100644
--- a/gcc/config/vax/vax.md
+++ b/gcc/config/vax/vax.md
@@ -853,20 +853,20 @@ (define_insn "*extv_aligned"
 
 ;; Register and non-offsettable-memory SImode cases of bit-field insns.
 
-(define_insn ""
+(define_insn "*cmpv"
   [(set (cc0)
(compare
-(sign_extract:SI (match_operand:SI 0 "register_operand" "r")
+(sign_extract:SI (match_operand:SI 0 "nonimmediate_operand" "ro")
  (match_operand:QI 1 "general_operand" "g")
  (match_operand:SI 2 "general_operand" "nrmT"))
 (match_operand:SI 3 "general_operand" "nrmT")))]
   ""
   "cmpv %2,%1,%0,%3")
 
-(define_insn ""
+(define_insn "*cmpzv"
   [(set (cc0)
(compare
-(zero_extract:SI (match_operand:SI 0 "register_operand" "r")
+(zero_extract:SI (match_operand:SI 0 "nonimmediate_operand" "ro")
  (match_operand:QI 1 "general_operand" "g")
  (match_operand:SI 2 "general_operand" "nrmT"))
 (match_operand:SI 3 "general_operand" "nrmT")))]
@@ -921,7 +921,7 @@ (define_insn "*extzv_non_const"
 ;; nonimmediate_operand is used to make sure that mode-ambiguous cases
 ;; don't match these (and therefore match the cases above instead).
 
-(define_insn ""
+(define_insn "*cmpv_2"
   [(set (cc0)
(compare
 (sign_extract:SI (match_operand:QI 0 "memory_operand" "m")
@@ -931,10 +931,10 @@ (define_insn ""
   ""
   "cmpv %2,%1,%0,%3")
 
-(define_insn ""
+(define_insn "*cmpzv_2"
   [(set (cc0)
(compare
-(zero_extract:SI (match_operand:QI 0 "nonimmediate_operand" "rm")
+(zero_extract:SI (match_operand:QI 0 "memory_operand" "m")
  (match_operand:QI 1 "general_operand" "g")
  (match_operand:SI 2 "general_operand" "nrmT"))
 (match_operand:SI 3 "general_operand" "nrmT")))]
-- 
2.11.0



[PATCH 23/31] VAX: Make `extv' an expander matching the remaining bitfield operations

2020-11-19 Thread Maciej W. Rozycki
We have matching insns defined for `sign_extract' and `zero_extract'
expressions, so make the three named patterns for bitfield operations
consistent and make `extv' an expander rather than an insn.  The
expander takes an SImode, a QImode, and an SImode general operand for
the LOC, SIZE, and POS operands respectively, like with the `extzv' and
`insv' patterns, matching the machine instructions and giving the middle
end more choice as to which actual insn to choose in a given situation.

Given this program:

typedef struct
{
  int f0:1;
  int f1:7;
  int f8:8;
  int f16:16;
} bit_t;

typedef struct
{
  unsigned int f0:1;
  unsigned int f1:7;
  unsigned int f8:8;
  unsigned int f16:16;
} ubit_t;

typedef union
{
  bit_t b;
  int i;
} bit_u;

typedef union
{
  ubit_t b;
  unsigned int i;
} ubit_u;

int
ins1 (bit_u x, int y)
{
  asm volatile ("" : "+r" (x), "+r" (y));
  x.b.f1 = y;
  return x.i;
}

int
ext1 (bit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f1;
}

unsigned int
extz1 (ubit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f1;
}

int
ins8 (bit_u x, int y)
{
  asm volatile ("" : "+r" (x), "+r" (y));
  x.b.f8 = y;
  return x.i;
}

int
ext8 (bit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f8;
}

unsigned int
extz8 (ubit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f8;
}

int
ins16 (bit_u x, int y)
{
  asm volatile ("" : "+r" (x), "+r" (y));
  x.b.f16 = y;
  return x.i;
}

int
ext16 (bit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f16;
}

unsigned int
extz16 (ubit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f16;
}

this results in the following code change:

@@ -16,12 +16,12 @@ ins1:
 .globl ext1
.type   ext1, @function
 ext1:
-   .word 0 # 19[c=0]  procedure_entry_mask
-   subl2 $4,%sp# 20[c=32]  addsi3
+   .word 0 # 18[c=0]  procedure_entry_mask
+   subl2 $4,%sp# 19[c=32]  addsi3
movl 4(%ap),%r0 # 2 [c=16]  movsi_2
-   cvtbl %r0,%r0   # 7 [c=4]  extendqisi2
-   ashl $-1,%r0,%r0# 14[c=40]  *vax.md:624
-   ret # 24[c=0]  return
+   extv $1,$7,%r0,%r0  # 7 [c=60]  *extv_non_const
+   cvtbl %r0,%r0   # 13[c=4]  extendqisi2
+   ret # 23[c=0]  return
.size   ext1, .-ext1
.align 1
 .globl extz1
@@ -49,12 +49,12 @@ ins8:
 .globl ext8
.type   ext8, @function
 ext8:
-   .word 0 # 20[c=0]  procedure_entry_mask
-   subl2 $4,%sp# 21[c=32]  addsi3
+   .word 0 # 18[c=0]  procedure_entry_mask
+   subl2 $4,%sp# 19[c=32]  addsi3
movl 4(%ap),%r0 # 2 [c=16]  movsi_2
-   cvtwl %r0,%r0   # 7 [c=4]  extendhisi2
-   ashl $-8,%r0,%r0# 15[c=40]  *vax.md:624
-   ret # 25[c=0]  return
+   rotl $24,%r0,%r0# 13[c=60]  *extv_non_const
+   cvtbl %r0,%r0
+   ret # 23[c=0]  return
.size   ext8, .-ext8
.align 1
 .globl extz8

If there is a performance degradation with the replacement sequences,
then it can and should be sorted within `extv_non_const'.
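
For reference, the new `ext8' sequence, a ROTL by 24 followed by CVTBL,
can be modelled in portable C as below.  This is a sketch under the
assumption that ROTL rotates a longword left and CVTBL sign-extends the
low byte; `rotl32', `ext8_rot' and `ext8_ref' are names made up for
illustration and appear nowhere in the port:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by N bits, taken modulo 32.  */
uint32_t
rotl32 (uint32_t x, unsigned n)
{
  n &= 31;
  return n == 0 ? x : (x << n) | (x >> (32 - n));
}

/* Sign-extract the 8-bit field at bit position 8 in the manner of the
   new sequence: rotate the field into the low byte, then sign-extend
   that byte.  */
int32_t
ext8_rot (uint32_t x)
{
  int32_t b = (int32_t) (rotl32 (x, 24) & 0xff);
  return b >= 0x80 ? b - 0x100 : b;
}

/* Reference: shift the field down, mask it, then sign-extend it.  */
int32_t
ext8_ref (uint32_t x)
{
  int32_t b = (int32_t) ((x >> 8) & 0xff);
  return b >= 0x80 ? b - 0x100 : b;
}
```

Both functions agree for any input, since rotating left by 24 places
bits 8 through 15 in the low byte just as a right shift by 8 does.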

gcc/
* config/vax/vax.md (extv): Rename insn to...
(*extv): ... this.
(extv): New expander.
---
 gcc/config/vax/vax.md | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/config/vax/vax.md b/gcc/config/vax/vax.md
index f90ae89391f..d8774cdd36c 100644
--- a/gcc/config/vax/vax.md
+++ b/gcc/config/vax/vax.md
@@ -941,7 +941,15 @@ (define_insn ""
   ""
   "cmpzv %2,%1,%0,%3")
 
-(define_insn "extv"
+(define_expand "extv"
+  [(set (match_operand:SI 0 "general_operand" "")
+   (sign_extract:SI (match_operand:SI 1 "general_operand" "")
+(match_operand:QI 2 "general_operand" "")
+(match_operand:SI 3 "general_operand" "")))]
+  ""
+  "")
+
+(define_insn "*extv"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
(sign_extract:SI (match_operand:QI 1 "memory_operand" "m")
 (match_operand:QI 2 "general_operand" "g")
-- 
2.11.0



[PATCH 22/31] VAX: Ensure PIC mode address is adjustable with aligned bitfield insns

2020-11-19 Thread Maciej W. Rozycki
With the `*insv_aligned', `*extzv_aligned' and `*extv_aligned' insns we
are going to adjust the bitfield location if it is in memory, so only
allow such location addresses that can be offset, excluding external
symbol references in the PIC mode in particular.

This fixes an ICE like:

during RTL pass: final
In file included from .../gcc/testsuite/gcc.dg/torture/vshuf-v16qi.c:11:
.../gcc/testsuite/gcc.dg/torture/vshuf-main.inc: In function 'test_13':
.../gcc/testsuite/gcc.dg/torture/vshuf-main.inc:27:1: internal compiler error: 
in change_address_1, at emit-rtl.c:2275
.../gcc/testsuite/gcc.dg/torture/vshuf-16.inc:16:1: note: in expansion of macro 
'T'
.../gcc/testsuite/gcc.dg/torture/vshuf-main.inc:28:1: note: in expansion of 
macro 'TESTS'
0x10a34b33 change_address_1
.../gcc/emit-rtl.c:2275
0x10a358af adjust_address_1(rtx_def*, machine_mode, poly_int<1u, long>, int, 
int, int, poly_int<1u, long>)
.../gcc/emit-rtl.c:2409
0x11d2505f output_97
.../gcc/config/vax/vax.md:806
0x10adec4b get_insn_template(int, rtx_insn*)
.../gcc/final.c:2070
0x10ae1c5b final_scan_insn_1
.../gcc/final.c:3039
0x10ae2257 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
.../gcc/final.c:3152
0x10ade9a3 final_1
.../gcc/final.c:2020
0x10ae6157 rest_of_handle_final
.../gcc/final.c:4658
0x10ae6697 execute
.../gcc/final.c:4736
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
compiler exited with status 1
FAIL: gcc.dg/torture/vshuf-v16qi.c   -O2  (internal compiler error)

triggered by an RTL instruction like:

(insn 97 96 98 (set (reg:SI 5 %r5 [88])
(zero_extract:SI (mem/c:SI (symbol_ref:SI ("b") ) [0 b+0 S4 A128])
(const_int 8 [0x8])
(const_int 24 [0x18]))) 
".../gcc/testsuite/gcc.dg/torture/vshuf-main.inc":28:1 97 {*extzv_aligned}
 (nil))

and removes these regressions:

FAIL: gcc.dg/torture/vshuf-v16qi.c   -O2  (internal compiler error)
FAIL: gcc.dg/torture/vshuf-v16qi.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/vshuf-v4hi.c   -O2  (internal compiler error)
FAIL: gcc.dg/torture/vshuf-v4hi.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/vshuf-v8hi.c   -O2  (internal compiler error)
FAIL: gcc.dg/torture/vshuf-v8hi.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/vshuf-v8qi.c   -O2  (internal compiler error)
FAIL: gcc.dg/torture/vshuf-v8qi.c   -O2  (test for excess errors)

However expand typically presents pseudo-registers rather than memory
references to these insns, so further rework is required to make
better use of the code variant they are supposed to produce.  This at
least fixes the problem at hand.

gcc/
* config/vax/vax.md (*insv_aligned, *extzv_aligned)
(*extv_aligned): Also make sure the memory address of a bitfield
location can be adjusted in the PIC mode.
---
 gcc/config/vax/vax.md | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/gcc/config/vax/vax.md b/gcc/config/vax/vax.md
index 80f09d97727..f90ae89391f 100644
--- a/gcc/config/vax/vax.md
+++ b/gcc/config/vax/vax.md
@@ -762,11 +762,14 @@ (define_insn "*insv_aligned"
 (match_operand:QI 1 "const_int_operand" "n")
 (match_operand:SI 2 "const_int_operand" "n"))
(match_operand:SI 3 "general_operand" "g"))]
-   "(INTVAL (operands[1]) == 8 || INTVAL (operands[1]) == 16)
+  "(INTVAL (operands[1]) == 8 || INTVAL (operands[1]) == 16)
&& INTVAL (operands[2]) % INTVAL (operands[1]) == 0
&& (!MEM_P (operands[0])
-   || ! mode_dependent_address_p (XEXP (operands[0], 0),
- MEM_ADDR_SPACE (operands[0])))
+   || ((!flag_pic
+   || vax_acceptable_pic_operand_p (XEXP (operands[0], 0),
+true, true))
+  && !mode_dependent_address_p (XEXP (operands[0], 0),
+MEM_ADDR_SPACE (operands[0]))))
&& (!(REG_P (operands[0])
 || (SUBREG_P (operands[0]) && REG_P (SUBREG_REG (operands[0]))))
|| INTVAL (operands[2]) == 0)"
@@ -794,8 +797,11 @@ (define_insn "*extzv_aligned"
   "(INTVAL (operands[2]) == 8 || INTVAL (operands[2]) == 16)
&& INTVAL (operands[3]) % INTVAL (operands[2]) == 0
&& (!MEM_P (operands[1])
-   || ! mode_dependent_address_p (XEXP (operands[1], 0),
- MEM_ADDR_SPACE (operands[1])))
+   || ((!flag_pic
+   || vax_acceptable_pic_operand_p (XEXP (operands[1], 0),
+true, true))
+  && !mode_dependent_address_p (XEXP (operands[1], 0),
+MEM_ADDR_SPACE (operands[1]))))
&& (!(REG_P (operands[1])
 || (SUBREG_P (operands[1]) && REG_P (SUBREG_REG (operands[1]))))
|| 

[PATCH 21/31] VAX: Remove EXTV/EXTZV/INSV instruction use from aligned case insns

2020-11-19 Thread Maciej W. Rozycki
The INSV machine instruction is the only computational operation in the
VAX ISA that keeps condition codes intact.  In preparation for the
MODE_CC transition keep the patterns that make use of said instruction
apart from those that do not.  For consistency update EXTV and EXTZV
instruction uses accordingly.  In expand SUBREGs will be presented as
operands, so handle that possibility in the insn condition.

This actually yields better code by avoiding EXTV/EXTZV instructions in
pseudo-aligned register cases previously resorting to those instructions:

@@ -42,7 +42,7 @@ ins8:
subl2 $4,%sp# 21[c=32]  addsi3
movl 4(%ap),%r0 # 2 [c=16]  movsi_2
movl 8(%ap),%r1 # 17[c=16]  movsi_2
-   insv %r1,$8,$8,%r0  # 9 [c=4]  *insv_aligned
+   insv %r1,$8,$8,%r0  # 9 [c=4]  *insv_2
ret # 25[c=0]  return
.size   ins8, .-ins8
.align 1
@@ -60,12 +60,12 @@ ext8:
 .globl extz8
.type   extz8, @function
 extz8:
-   .word 0 # 19[c=0]  procedure_entry_mask
-   subl2 $4,%sp# 20[c=32]  addsi3
+   .word 0 # 18[c=0]  procedure_entry_mask
+   subl2 $4,%sp# 19[c=32]  addsi3
movl 4(%ap),%r0 # 2 [c=16]  movsi_2
-   extzv $8,$8,%r0,%r1 # 13[c=60]  *extzv_aligned
-   movl %r1,%r0# 18[c=4]  movsi_2
-   ret # 24[c=0]  return
+   rotl $24,%r0,%r0# 13[c=60]  *extzv_non_const
+   movzbl %r0,%r0
+   ret # 23[c=0]  return
.size   extz8, .-extz8
.align 1
 .globl ins16
@@ -75,7 +75,7 @@ ins16:
subl2 $4,%sp# 21[c=32]  addsi3
movl 4(%ap),%r0 # 2 [c=16]  movsi_2
movl 8(%ap),%r1 # 17[c=16]  movsi_2
-   insv %r1,$16,$16,%r0# 9 [c=4]  *insv_aligned
+   insv %r1,$16,$16,%r0# 9 [c=4]  *insv_2
ret # 25[c=0]  return
.size   ins16, .-ins16
.align 1
@@ -94,8 +94,9 @@ ext16:
 extz16:
.word 0 # 18[c=0]  procedure_entry_mask
subl2 $4,%sp# 19[c=32]  addsi3
-   movl 4(%ap),%r1 # 2 [c=16]  movsi_2
-   extzv $16,$16,%r1,%r0   # 7 [c=60]  *extzv_aligned
+   movl 4(%ap),%r0 # 2 [c=16]  movsi_2
+   rotl $16,%r0,%r0# 7 [c=60]  *extzv_non_const
+   movzwl %r0,%r0
movzwl %r0,%r0  # 13[c=4]  zero_extendhisi2
ret # 23[c=0]  return
.size   extz16, .-extz16

demonstrated with this program:

typedef struct
{
  int f0:1;
  int f1:7;
  int f8:8;
  int f16:16;
} bit_t;

typedef struct
{
  unsigned int f0:1;
  unsigned int f1:7;
  unsigned int f8:8;
  unsigned int f16:16;
} ubit_t;

typedef union
{
  bit_t b;
  int i;
} bit_u;

typedef union
{
  ubit_t b;
  unsigned int i;
} ubit_u;

int
ins1 (bit_u x, int y)
{
  asm volatile ("" : "+r" (x), "+r" (y));
  x.b.f1 = y;
  return x.i;
}

int
ext1 (bit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f1;
}

unsigned int
extz1 (ubit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f1;
}

int
ins8 (bit_u x, int y)
{
  asm volatile ("" : "+r" (x), "+r" (y));
  x.b.f8 = y;
  return x.i;
}

int
ext8 (bit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f8;
}

unsigned int
extz8 (ubit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f8;
}

int
ins16 (bit_u x, int y)
{
  asm volatile ("" : "+r" (x), "+r" (y));
  x.b.f16 = y;
  return x.i;
}

int
ext16 (bit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f16;
}

unsigned int
extz16 (ubit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f16;
}
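
The rotate-and-zero-extend sequences now produced for `extz8' and
`extz16' can be modelled in portable C as follows.  This is a sketch
only, assuming ROTL rotates a longword left and MOVZBL/MOVZWL
zero-extend the low byte or word; `rotl32' and `extz_rot' are names made
up for illustration, and only field sizes of 8 and 16 bits at positions
that are a multiple of the size are handled:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by N bits, taken modulo 32.  */
uint32_t
rotl32 (uint32_t x, unsigned n)
{
  n &= 31;
  return n == 0 ? x : (x << n) | (x >> (32 - n));
}

/* Zero-extract a SIZE-bit field (8 or 16) starting at bit POS: rotate
   the field into the low end of the longword, then keep only the low
   byte or word.  */
uint32_t
extz_rot (uint32_t x, unsigned pos, unsigned size)
{
  uint32_t mask = size == 8 ? 0xffu : 0xffffu;
  return rotl32 (x, 32 - pos) & mask;
}
```

With POS of 8 the rotate count is 24 and with POS of 16 it is 16, which
matches the ROTL operands in the assembly change shown earlier.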

It also papers over a regression:

FAIL: gcc.dg/pr83623.c (internal compiler error)
FAIL: gcc.dg/pr83623.c (test for excess errors)

from an ICE like:

during RTL pass: final
.../gcc/testsuite/gcc.dg/pr83623.c: In function 'foo':
.../gcc/testsuite/gcc.dg/pr83623.c:13:1: internal compiler error: in 
change_address_1, at emit-rtl.c:2275
0x10a056e3 change_address_1
.../gcc/emit-rtl.c:2275
0x10a0645f adjust_address_1(rtx_def*, machine_mode, poly_int<1u, long>, int, 
int, int, poly_int<1u, long>)
.../gcc/emit-rtl.c:2409
0x11cb588f output_97
.../gcc/config/vax/vax.md:808
0x10aafb2f get_insn_template(int, rtx_insn*)
.../gcc/final.c:2070
0x10ab2b3f final_scan_insn_1
.../gcc/final.c:3039
0x10ab313b final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
.../gcc/final.c:3152
0x10aaf887 final_1
.../gcc/final.c:2020
0x10ab703b rest_of_handle_final
.../gcc/final.c:4658
0x10ab757b execute
.../gcc/final.c:4736
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
compiler exited with status 1
FAIL: gcc.dg/pr83623.c (internal compiler error)

triggered by an RTL instruction like:

(insn 17 14 145 (set (reg:SI 1 %r1)
(zero_extract:SI (mem/c:SI (symbol_ref:SI ("x") ) [1 x+0 S4 A128])
   

[PATCH 20/31] VAX: Fix predicates and constraints for EXTV/EXTZV/INSV insns

2020-11-19 Thread Maciej W. Rozycki
It makes no sense for insn operand predicates, as long as they accept a
register operand, to be more restrictive than the set of the associated
constraints, because expand will choose the insn based on the relevant
operand being a pseudo register then and reload will keep it happily as a
memory reference if a constraint permits it.  So the restriction posed
by such a predicate will be happily ignored, and moreover if a splitter
is added, such as required for MODE_CC support, the new instructions
will reject the original operands supplied, causing an ICE.  An actual
example will be given with a subsequent change.

Remove such inconsistencies we have with the EXTV/EXTZV/INSV insns then,
observing that a bitfield located in memory is byte-addressed by the
respective machine instructions and therefore SImode may only be used
with a register or an offsettable memory operand (i.e. not an indexed,
pre-decremented, or post-incremented one), which has already been taken
into account with the constraints currently used, except for `*insv_2'.
The QI machine mode may be used for the bitfield location with any kind
of memory operand, but we got the constraint wrong, although harmlessly
in reality, with `*insv'.  Fix that for consistency though.

Also give the insns names, for easier reference here and elsewhere.

gcc/
* config/vax/vax.md (*insv_aligned, *extzv_aligned)
(*extv_aligned, *extv_non_const, *extzv_non_const): Name insns.
Fix location predicate.
(*extzv): Name insn.
(*insv): Likewise.  Fix location constraint.
(*insv_2): Likewise, and the predicate.
---
 gcc/config/vax/vax.md | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/config/vax/vax.md b/gcc/config/vax/vax.md
index f8e1c2eb02b..de90848a600 100644
--- a/gcc/config/vax/vax.md
+++ b/gcc/config/vax/vax.md
@@ -757,8 +757,8 @@ (define_insn ""
 ;; These handle aligned 8-bit and 16-bit fields,
 ;; which can usually be done with move instructions.
 
-(define_insn ""
-  [(set (zero_extract:SI (match_operand:SI 0 "register_operand" "+ro")
+(define_insn "*insv_aligned"
+  [(set (zero_extract:SI (match_operand:SI 0 "nonimmediate_operand" "+ro")
 (match_operand:QI 1 "const_int_operand" "n")
 (match_operand:SI 2 "const_int_operand" "n"))
(match_operand:SI 3 "general_operand" "g"))]
@@ -786,9 +786,9 @@ (define_insn ""
   return \"movw %3,%0\";
 }")
 
-(define_insn ""
+(define_insn "*extzv_aligned"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=")
-   (zero_extract:SI (match_operand:SI 1 "register_operand" "ro")
+   (zero_extract:SI (match_operand:SI 1 "nonimmediate_operand" "ro")
 (match_operand:QI 2 "const_int_operand" "n")
 (match_operand:SI 3 "const_int_operand" "n")))]
   "(INTVAL (operands[2]) == 8 || INTVAL (operands[2]) == 16)
@@ -814,9 +814,9 @@ (define_insn ""
   return \"movzwl %1,%0\";
 }")
 
-(define_insn ""
+(define_insn "*extv_aligned"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
-   (sign_extract:SI (match_operand:SI 1 "register_operand" "ro")
+   (sign_extract:SI (match_operand:SI 1 "nonimmediate_operand" "ro")
 (match_operand:QI 2 "const_int_operand" "n")
 (match_operand:SI 3 "const_int_operand" "n")))]
   "(INTVAL (operands[2]) == 8 || INTVAL (operands[2]) == 16)
@@ -842,7 +842,7 @@ (define_insn ""
   return \"cvtwl %1,%0\";
 }")
 
-;; Register-only SImode cases of bit-field insns.
+;; Register and non-offsettable-memory SImode cases of bit-field insns.
 
 (define_insn ""
   [(set (cc0)
@@ -869,9 +869,9 @@ (define_insn ""
 ;; by a bicl or sign extension.  Because we might end up choosing ext[z]v
 ;; anyway, we can't allow immediate values for the primary source operand.
 
-(define_insn ""
+(define_insn "*extv_non_const"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
-   (sign_extract:SI (match_operand:SI 1 "register_operand" "ro")
+   (sign_extract:SI (match_operand:SI 1 "nonimmediate_operand" "ro")
 (match_operand:QI 2 "general_operand" "g")
 (match_operand:SI 3 "general_operand" "nrmT")))]
   ""
@@ -886,9 +886,9 @@ (define_insn ""
   return \"rotl %R3,%1,%0\;cvtwl %0,%0\";
 }")
 
-(define_insn ""
+(define_insn "*extzv_non_const"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
-   (zero_extract:SI (match_operand:SI 1 "register_operand" "ro")
+   (zero_extract:SI (match_operand:SI 1 "nonimmediate_operand" "ro")
 (match_operand:QI 2 "general_operand" "g")
 (match_operand:SI 3 "general_operand" "nrmT")))]
   ""
@@ -962,7 +962,7 @@ (define_expand "extzv"
   ""
   "")
 
-(define_insn ""
+(define_insn "*extzv"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
(zero_extract:SI (match_operand:QI 1 "memory_operand" "m")

[PATCH 19/31] VAX: Add the `movmemhi' instruction

2020-11-19 Thread Maciej W. Rozycki
The MOVC3 machine instruction has `memmove' semantics[1]:

"The operation of the instruction is such that overlap of the source and
destination strings does not affect the result."

so use it to provide the `movmemhi' instruction as well.
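
As an illustration only, the `memmove' property quoted above is what makes
an overlapping copy like the one below well defined.  This is a sketch in
portable C rather than anything taken from the patch; `shift_right' is a
hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: move the first LEN bytes of BUF up by SHIFT
   bytes.  Source and destination overlap whenever SHIFT < LEN, which
   memmove (and, per the quoted text, MOVC3) handles correctly.  */
void
shift_right (char *buf, size_t len, size_t shift)
{
  assert (buf != NULL);
  memmove (buf + shift, buf, len);
}
```

With `memcpy' the same call would have undefined behaviour, which is why
the two builtins need separately named patterns even if one machine
instruction can serve both.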

References:

[1] DEC STD 032-0 "VAX Architecture Standard", Digital Equipment
Corporation, A-DS-EL-00032-00-0 Rev J, December 15, 1989, Section
3.10 "Character-String Instructions", p. 3-162

gcc/
* config/vax/vax.md (cpymemhi1): Rename insn to...
(movmemhi1): ... this.
(cpymemhi): Update accordingly.  Remove constraints.
(movmemhi): New expander.

gcc/testsuite/
* gcc.target/vax/movmem.c: New test.
---
 gcc/config/vax/vax.md | 24 ++--
 gcc/testsuite/gcc.target/vax/movmem.c | 23 +++
 2 files changed, 41 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/vax/movmem.c

diff --git a/gcc/config/vax/vax.md b/gcc/config/vax/vax.md
index 66f03df1932..f8e1c2eb02b 100644
--- a/gcc/config/vax/vax.md
+++ b/gcc/config/vax/vax.md
@@ -206,16 +206,28 @@ (define_insn "movstrictqi"
 }")
 
 ;; This is here to accept 4 arguments and pass the first 3 along
-;; to the cpymemhi1 pattern that really does the work.
+;; to the movmemhi1 pattern that really does the work.
 (define_expand "cpymemhi"
-  [(set (match_operand:BLK 0 "general_operand" "=g")
-   (match_operand:BLK 1 "general_operand" "g"))
-   (use (match_operand:HI 2 "general_operand" "g"))
+  [(set (match_operand:BLK 0 "memory_operand" "")
+   (match_operand:BLK 1 "memory_operand" ""))
+   (use (match_operand:HI 2 "general_operand" ""))
+   (match_operand 3 "" "")]
+  ""
+  "
+{
+  emit_insn (gen_movmemhi1 (operands[0], operands[1], operands[2]));
+  DONE;
+}")
+
+(define_expand "movmemhi"
+  [(set (match_operand:BLK 0 "memory_operand" "")
+   (match_operand:BLK 1 "memory_operand" ""))
+   (use (match_operand:HI 2 "general_operand" ""))
(match_operand 3 "" "")]
   ""
   "
 {
-  emit_insn (gen_cpymemhi1 (operands[0], operands[1], operands[2]));
+  emit_insn (gen_movmemhi1 (operands[0], operands[1], operands[2]));
   DONE;
 }")
 
@@ -224,7 +236,7 @@ (define_expand "cpymemhi"
 ;; that anything generated as this insn will be recognized as one
 ;; and that it won't successfully combine with anything.
 
-(define_insn "cpymemhi1"
+(define_insn "movmemhi1"
   [(set (match_operand:BLK 0 "memory_operand" "=o")
(match_operand:BLK 1 "memory_operand" "o"))
(use (match_operand:HI 2 "general_operand" "g"))
diff --git a/gcc/testsuite/gcc.target/vax/movmem.c b/gcc/testsuite/gcc.target/vax/movmem.c
new file mode 100644
index 000..b907d8a376d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/vax/movmem.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+
+#include <stddef.h>
+
+void *
+memmove8 (void *to, const void *from, size_t size)
+{
+  unsigned char s8 = size;
+  return __builtin_memmove (to, from, s8);
+}
+
+/* Expect assembly like:
+
+   movl 4(%ap),%r6
+   movzbl 12(%ap),%r7
+   movl 8(%ap),%r8
+   movc3 %r7,(%r8),(%r6)
+   movl %r6,%r0
+
+ */
+
+/* { dg-final { scan-assembler "\tmovc3 " } } */
-- 
2.11.0



[PATCH 18/31] VAX: Add a test for the `cpymemhi' instruction

2020-11-19 Thread Maciej W. Rozycki
gcc/testsuite/
* gcc.target/vax/cpymem.c: New test.
---
 gcc/testsuite/gcc.target/vax/cpymem.c | 23 +++
 1 file changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/vax/cpymem.c

diff --git a/gcc/testsuite/gcc.target/vax/cpymem.c b/gcc/testsuite/gcc.target/vax/cpymem.c
new file mode 100644
index 000..91805a1a5eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/vax/cpymem.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+
+#include <stddef.h>
+
+void *
+memcpy8 (void *to, const void *from, size_t size)
+{
+  unsigned char s8 = size;
+  return __builtin_memcpy (to, from, s8);
+}
+
+/* Expect assembly like:
+
+   movl 4(%ap),%r6
+   movzbl 12(%ap),%r7
+   movl 8(%ap),%r8
+   movc3 %r7,(%r8),(%r6)
+   movl %r6,%r0
+
+ */
+
+/* { dg-final { scan-assembler "\tmovc3 " } } */
-- 
2.11.0



[PATCH 17/31] VAX: Actually produce QImode and HImode `ctz' operations

2020-11-19 Thread Maciej W. Rozycki
The middle end does not refer to `ctzqi2'/`ctzhi2' or `ffsqi2'/`ffshi2'
patterns by name where `__builtin_ctz' or `__builtin_ffs' respectively
is invoked for an argument of the QImode or HImode type, and instead it
extends the data type before passing it to `ctzsi2' or `ffssi2'.

Avoid the redundant operation and use a peephole2 to convert it to the
right RTL expression that will collapse the two operations into a single
machine instruction instead unless we need the extended intermediate
result for another purpose.
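
To make the "redundant operation" claim concrete, here is a small sketch,
not part of the patch, which checks that for a nonzero 16-bit value the
position of the lowest set bit is the same whether the 16-bit field is
scanned directly or the value is first zero- or sign-extended to 32 bits.
The names `ctz_field' and `extension_is_redundant' are made up for the
example, and `ctz_field' is only a plain C model of a variable-width scan.

```c
#include <assert.h>
#include <stdint.h>

/* Scan the low WIDTH bits of VALUE for the lowest set bit and return
   its position, or WIDTH if no bit is set.  */
unsigned
ctz_field (uint32_t value, unsigned width)
{
  assert (width >= 1 && width <= 32);
  for (unsigned pos = 0; pos < width; pos++)
    if ((value >> pos) & 1u)
      return pos;
  return width;
}

/* Return 1 if, for every nonzero 16-bit input, scanning the narrow
   field gives the same answer as scanning the zero-extended and the
   sign-extended 32-bit value; return 0 otherwise.  */
int
extension_is_redundant (void)
{
  for (uint32_t x = 1; x <= 0xffffu; x++)
    {
      uint32_t zext = x;
      uint32_t sext = (x & 0x8000u) ? (x | 0xffff0000u) : x;
      unsigned narrow = ctz_field (x, 16);

      if (ctz_field (zext, 32) != narrow || ctz_field (sext, 32) != narrow)
	return 0;
    }
  return 1;
}
```

The zero input is deliberately left out of the loop, as `__builtin_ctz'
is undefined there at the C level.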

gcc/
* config/vax/builtins.md: Add a peephole2 for QImode and HImode
`ctz' operations.
(any_extend): New code iterator.

gcc/testsuite/
* gcc.target/vax/ctzhi.c: New test.
* gcc.target/vax/ctzqi.c: New test.
* gcc.target/vax/ffshi.c: New test.
* gcc.target/vax/ffsqi.c: New test.
---
 gcc/config/vax/builtins.md   | 22 ++
 gcc/testsuite/gcc.target/vax/ctzhi.c | 20 
 gcc/testsuite/gcc.target/vax/ctzqi.c | 20 
 gcc/testsuite/gcc.target/vax/ffshi.c | 24 
 gcc/testsuite/gcc.target/vax/ffsqi.c | 24 
 5 files changed, 110 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/vax/ctzhi.c
 create mode 100644 gcc/testsuite/gcc.target/vax/ctzqi.c
 create mode 100644 gcc/testsuite/gcc.target/vax/ffshi.c
 create mode 100644 gcc/testsuite/gcc.target/vax/ffsqi.c

diff --git a/gcc/config/vax/builtins.md b/gcc/config/vax/builtins.md
index b7ed9762c23..e96ac3f52ab 100644
--- a/gcc/config/vax/builtins.md
+++ b/gcc/config/vax/builtins.md
@@ -29,6 +29,8 @@ (define_mode_attr bb_mem [(QI "m") (HI "Q") (SI "Q")])
 (define_int_iterator bit [0 1])
 (define_int_attr ccss [(0 "cc") (1 "ss")])
 
+(define_code_iterator any_extend [sign_extend zero_extend])
+
 (define_expand "ffs<mode>2"
   [(set (match_operand:SI 0 "nonimmediate_operand" "")
(ffs:SI (match_operand:VAXint 1 "general_operand" "")))]
@@ -57,6 +59,26 @@ (define_insn "ctz<mode>2"
   ""
   "ffs $0,$<width>,%1,%0")
 
+;; Our FFS hardware instruction supports any field width,
+;; so handle narrower inputs directly as well.
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+(any_extend:SI (match_operand:VAXintQH 1 "general_operand")))
+   (parallel
+ [(set (match_operand:SI 2 "nonimmediate_operand")
+  (ctz:SI (match_dup 0)))
+  (set (cc0)
+  (compare (match_dup 2)
+   (const_int 0)))])]
+  "rtx_equal_p (operands[0], operands[2]) || peep2_reg_dead_p (2, operands[0])"
+  [(parallel
+ [(set (match_dup 2)
+  (ctz:SI (match_dup 1)))
+  (set (cc0)
+  (compare (match_dup 1)
+   (const_int 0)))])]
+  "")
+
 (define_expand "sync_lock_test_and_set<mode>"
   [(match_operand:VAXint 0 "nonimmediate_operand" "=&g")
(match_operand:VAXint 1 "memory_operand" "+m")
diff --git a/gcc/testsuite/gcc.target/vax/ctzhi.c b/gcc/testsuite/gcc.target/vax/ctzhi.c
new file mode 100644
index 000..fcc9f06f7d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/vax/ctzhi.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-rtl-peephole2" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" "-O1" } { "" } } */
+
+typedef unsigned int __attribute__ ((mode (HI))) int_t;
+
+int
+ctzhi (int_t *x)
+{
+  return __builtin_ctz (*x);
+}
+
+/* Expect assembly like:
+
+   ffs $0,$16,*4(%ap),%r0
+
+ */
+
+/* { dg-final { scan-rtl-dump-times "Splitting with gen_peephole2" 1 "peephole2" } } */
+/* { dg-final { scan-assembler "\tffs \\\$0,\\\$16," } } */
diff --git a/gcc/testsuite/gcc.target/vax/ctzqi.c b/gcc/testsuite/gcc.target/vax/ctzqi.c
new file mode 100644
index 000..067334b09e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/vax/ctzqi.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-rtl-peephole2" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" "-O1" } { "" } } */
+
+typedef unsigned int __attribute__ ((mode (QI))) int_t;
+
+int
+ctzqi (int_t *x)
+{
+  return __builtin_ctz (*x);
+}
+
+/* Expect assembly like:
+
+   ffs $0,$8,*4(%ap),%r0
+
+ */
+
+/* { dg-final { scan-rtl-dump-times "Splitting with gen_peephole2" 1 "peephole2" } } */
+/* { dg-final { scan-assembler "\tffs \\\$0,\\\$8," } } */
diff --git a/gcc/testsuite/gcc.target/vax/ffshi.c b/gcc/testsuite/gcc.target/vax/ffshi.c
new file mode 100644
index 000..db592fb5724
--- /dev/null
+++ b/gcc/testsuite/gcc.target/vax/ffshi.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-rtl-peephole2" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" "-O1" } { "" } } */
+
+typedef int __attribute__ ((mode (HI))) int_t;
+
+int
+ffshi (int_t *x)
+{
+  return __builtin_ffs (*x);
+}
+
+/* Expect assembly like:
+
+   ffs $0,$16,*4(%ap),%r0
+   jneq .L2
+   mnegl $1,%r0
+.L2:
+   incl %r0
+
+ */
+
+/* { dg-final { scan-rtl-dump-times "Splitting with gen_peephole2" 1 

[PATCH 16/31] VAX: Also provide QImode and HImode `ctz' and `ffs' operations

2020-11-19 Thread Maciej W. Rozycki
The FFS machine instruction provides for arbitrary input bitfield widths
so take advantage of this and convert `ffssi2' and `ctzsi2' to templates
for all the three of QI, HI, SI machine modes.

Test cases will be added separately.

gcc/
* config/vax/builtins.md (width): New mode attribute.
(ffssi2): Rework expander into...
(ffs<mode>2): ... this.
(ctzsi2): Rework insn into...
(ctz<mode>2): ... this.
---
 gcc/config/vax/builtins.md | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/gcc/config/vax/builtins.md b/gcc/config/vax/builtins.md
index e8cefe70d25..b7ed9762c23 100644
--- a/gcc/config/vax/builtins.md
+++ b/gcc/config/vax/builtins.md
@@ -23,14 +23,15 @@ (define_constants
   ]
 )
 
+(define_mode_attr width [(QI "8") (HI "16") (SI "32")])
 (define_mode_attr bb_mem [(QI "m") (HI "Q") (SI "Q")])
 
 (define_int_iterator bit [0 1])
 (define_int_attr ccss [(0 "cc") (1 "ss")])
 
-(define_expand "ffssi2"
+(define_expand "ffs<mode>2"
   [(set (match_operand:SI 0 "nonimmediate_operand" "")
-   (ffs:SI (match_operand:SI 1 "general_operand" "")))]
+   (ffs:SI (match_operand:VAXint 1 "general_operand" "")))]
   ""
   "
 {
@@ -39,22 +40,22 @@ (define_expand "ffssi2"
   rtx cond = gen_rtx_NE (VOIDmode, cc0_rtx, const0_rtx);
   rtx target = gen_rtx_IF_THEN_ELSE (VOIDmode, cond, label_ref, pc_rtx);
 
-  emit_insn (gen_ctzsi2 (operands[0], operands[1]));
+  emit_insn (gen_ctz<mode>2 (operands[0], operands[1]));
   emit_jump_insn (gen_rtx_SET (pc_rtx, target));
-  emit_insn (gen_negsi2 (operands[0], const1_rtx));
+  emit_insn (gen_neg<mode>2 (operands[0], const1_rtx));
   emit_label (label);
-  emit_insn (gen_addsi3 (operands[0], operands[0], const1_rtx));
+  emit_insn (gen_add<mode>3 (operands[0], operands[0], const1_rtx));
   DONE;
 }")
 
-(define_insn "ctzsi2"
+(define_insn "ctz<mode>2"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=rQ")
-   (ctz:SI (match_operand:SI 1 "general_operand" "nrQT")))
+   (ctz:SI (match_operand:VAXint 1 "general_operand" "nrQT")))
(set (cc0)
(compare (match_dup 1)
 (const_int 0)))]
   ""
-  "ffs $0,$32,%1,%0")
+  "ffs $0,$<width>,%1,%0")
 
 (define_expand "sync_lock_test_and_set<mode>"
   [(match_operand:VAXint 0 "nonimmediate_operand" "=&g")
-- 
2.11.0



[PATCH 15/31] VAX: Provide the `ctz' operation

2020-11-19 Thread Maciej W. Rozycki
Our `ffssi2_internal' pattern and the machine FFS instruction, which
technically is a bitfield operation, match the `ctz' operation exactly,
with the result produced for the bitfield source operand of zero equal
to its width as specified with another machine instruction operand, not
directly expressed in RTL and currently hardcoded in the assembly code
produced.  In our terms this is the bit size of the machine mode used,
and although it's SImode now let's be flexible for an upcoming change.

The operation also sets the Z condition code according to the value of
the source operand.
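
For readers less familiar with the instruction, the relationship between
FFS, `ctz' and `ffs' described above can be modelled in C roughly as
follows.  This is a sketch under the assumptions stated in this message
(start position 0, result equal to the field width and Z set when no bit
is found); `model_ffs_insn' and `model_ffssi2' are invented names, not
GCC or VAX interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the FFS machine instruction with a start position of 0:
   return the position of the lowest set bit in the low SIZE bits of
   FIELD and clear *Z_FLAG, or return SIZE and set *Z_FLAG if no bit
   is set.  */
unsigned
model_ffs_insn (uint32_t field, unsigned size, int *z_flag)
{
  assert (size >= 1 && size <= 32 && z_flag != 0);
  for (unsigned pos = 0; pos < size; pos++)
    if ((field >> pos) & 1u)
      {
	*z_flag = 0;
	return pos;
      }
  *z_flag = 1;
  return size;
}

/* Model of the sequence the `ffssi2' expander emits around the `ctz'
   insn: count, skip the fix-up if the source was nonzero, otherwise
   load -1, then add 1.  */
int
model_ffssi2 (uint32_t x)
{
  int z;
  int result = (int) model_ffs_insn (x, 32, &z);	/* ffs $0,$32,x,result */

  if (z)						/* jneq not taken */
    result = -1;					/* mnegl $1,result */
  return result + 1;					/* incl result */
}
```

So a zero input yields 32 from the `ctz' step and 0 from the complete
`ffs' sequence, matching the usual `ffs' convention.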

gcc/
* config/vax/builtins.md (ffssi2_internal): Rename insn to...
(ctzsi2): ... this.  Update the RTL operation.
(ffssi2): Update accordingly.
* gcc/config/vax/vax.c (vax_notice_update_cc): Handle CTZ.
* gcc/config/vax/vax.h (CTZ_DEFINED_VALUE_AT_ZERO): New macro.

gcc/testsuite/
* gcc.target/vax/ctzsi.c: New test.
---
 gcc/config/vax/builtins.md   |  6 +++---
 gcc/config/vax/vax.c |  3 +++
 gcc/config/vax/vax.h |  4 
 gcc/testsuite/gcc.target/vax/ctzsi.c | 15 +++
 4 files changed, 25 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/vax/ctzsi.c

diff --git a/gcc/config/vax/builtins.md b/gcc/config/vax/builtins.md
index 7e27854a8b0..e8cefe70d25 100644
--- a/gcc/config/vax/builtins.md
+++ b/gcc/config/vax/builtins.md
@@ -39,7 +39,7 @@ (define_expand "ffssi2"
   rtx cond = gen_rtx_NE (VOIDmode, cc0_rtx, const0_rtx);
   rtx target = gen_rtx_IF_THEN_ELSE (VOIDmode, cond, label_ref, pc_rtx);
 
-  emit_insn (gen_ffssi2_internal (operands[0], operands[1]));
+  emit_insn (gen_ctzsi2 (operands[0], operands[1]));
   emit_jump_insn (gen_rtx_SET (pc_rtx, target));
   emit_insn (gen_negsi2 (operands[0], const1_rtx));
   emit_label (label);
@@ -47,9 +47,9 @@ (define_expand "ffssi2"
   DONE;
 }")
 
-(define_insn "ffssi2_internal"
+(define_insn "ctzsi2"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=rQ")
-   (ffs:SI (match_operand:SI 1 "general_operand" "nrQT")))
+   (ctz:SI (match_operand:SI 1 "general_operand" "nrQT")))
(set (cc0)
(compare (match_dup 1)
 (const_int 0)))]
diff --git a/gcc/config/vax/vax.c b/gcc/config/vax/vax.c
index b6c2210ca6b..69a05b33e95 100644
--- a/gcc/config/vax/vax.c
+++ b/gcc/config/vax/vax.c
@@ -1135,6 +1135,9 @@ vax_notice_update_cc (rtx exp, rtx insn ATTRIBUTE_UNUSED)
case REG:
  cc_status.flags = CC_NO_OVERFLOW;
  break;
+   case CTZ:
+ cc_status.flags = CC_NOT_NEGATIVE;
+ break;
default:
  break;
}
diff --git a/gcc/config/vax/vax.h b/gcc/config/vax/vax.h
index 146b0a6e2b2..43182ff1d88 100644
--- a/gcc/config/vax/vax.h
+++ b/gcc/config/vax/vax.h
@@ -683,3 +683,7 @@ VAX operand formatting codes:
by the proper FDE definition.  */
 #define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, PC_REGNUM)
 
+/* Upon failure to find the bit the FFS hardware instruction returns
+   the position of the bit immediately following the field specified.  */
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_BITSIZE (MODE), 2)
diff --git a/gcc/testsuite/gcc.target/vax/ctzsi.c b/gcc/testsuite/gcc.target/vax/ctzsi.c
new file mode 100644
index 000..8be42712c77
--- /dev/null
+++ b/gcc/testsuite/gcc.target/vax/ctzsi.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+int
+ctzsi (unsigned int x)
+{
+  return __builtin_ctz (x);
+}
+
+/* Expect assembly like:
+
+   ffs $0,$32,4(%ap),%r0
+
+ */
+
+/* { dg-final { scan-assembler "\tffs \\\$0,\\\$32," } } */
-- 
2.11.0



[PATCH 14/31] VAX: Add tests for `sync_lock_test_and_set' and `sync_lock_release'

2020-11-19 Thread Maciej W. Rozycki
Based on gcc.dg/pr61756.c.

gcc/testsuite/
* gcc.target/vax/bbcci.c: New test.
* gcc.target/vax/bbssi.c: New test.
---
 gcc/testsuite/gcc.target/vax/bbcci.c | 20 
 gcc/testsuite/gcc.target/vax/bbssi.c | 20 
 2 files changed, 40 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/vax/bbcci.c
 create mode 100644 gcc/testsuite/gcc.target/vax/bbssi.c

diff --git a/gcc/testsuite/gcc.target/vax/bbcci.c b/gcc/testsuite/gcc.target/vax/bbcci.c
new file mode 100644
index 000..f58d3a75e7d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/vax/bbcci.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+#include <stdatomic.h>
+
+extern volatile atomic_flag guard;
+
+void
+try_atomic_flag_clear (void)
+{
+  atomic_flag_clear (&guard);
+}
+
+/* Expect assembly like:
+
+   jbcci $0,guard,.L2
+.L2:
+
+ */
+
+/* { dg-final { scan-assembler "\tjbcci \\\$0,guard," } } */
diff --git a/gcc/testsuite/gcc.target/vax/bbssi.c b/gcc/testsuite/gcc.target/vax/bbssi.c
new file mode 100644
index 000..65111e9bdf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/vax/bbssi.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+#include <stdatomic.h>
+
+extern volatile atomic_flag guard;
+
+void
+try_atomic_flag_test_and_set (void)
+{
+  atomic_flag_test_and_set (&guard);
+}
+
+/* Expect assembly like:
+
+   jbssi $0,guard,.L1
+.L1:
+
+ */
+
+/* { dg-final { scan-assembler "\tjbssi \\\$0,guard," } } */
-- 
2.11.0



[PATCH 13/31] VAX: Add a test for the SImode `ffs' operation

2020-11-19 Thread Maciej W. Rozycki
gcc/testsuite/
* gcc.target/vax/ffssi.c: New test.
---
 gcc/testsuite/gcc.target/vax/ffssi.c | 19 +++
 1 file changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/vax/ffssi.c

diff --git a/gcc/testsuite/gcc.target/vax/ffssi.c b/gcc/testsuite/gcc.target/vax/ffssi.c
new file mode 100644
index 000..3e7a3c2b301
--- /dev/null
+++ b/gcc/testsuite/gcc.target/vax/ffssi.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+
+int
+ffssi (int x)
+{
+  return __builtin_ffs (x);
+}
+
+/* Expect assembly like:
+
+   ffs $0,$32,%r1,%r0
+   jneq .L2
+   mnegl $1,%r0
+.L2:
+   incl %r0
+
+ */
+
+/* { dg-final { scan-assembler "\tffs \\\$0,\\\$32," } } */
-- 
2.11.0



[PATCH 12/31] VAX: Actually enable `builtins.md' now that it is fully functional

2020-11-19 Thread Maciej W. Rozycki
Test cases will follow.

gcc/
* config/vax/vax.md: Include `builtins.md'.
---
 gcc/config/vax/vax.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/vax/vax.md b/gcc/config/vax/vax.md
index e6b217fd0d7..66f03df1932 100644
--- a/gcc/config/vax/vax.md
+++ b/gcc/config/vax/vax.md
@@ -1634,3 +1634,5 @@ (define_expand "nonlocal_goto"
   emit_barrier ();
   DONE;
 })
+
+(include "builtins.md")
-- 
2.11.0



[PATCH 11/31] VAX: Correct `sync_lock_test_and_set' and `sync_lock_release' builtins

2020-11-19 Thread Maciej W. Rozycki
Remove an ICE like:

during RTL pass: expand
.../libatomic/tas_n.c: In function 'libat_test_and_set_1':
.../libatomic/tas_n.c:39:1: internal compiler error: in patch_jump_insn, at cfgrtl.c:1298
   39 | }
  | ^
0x108a09ff patch_jump_insn
.../gcc/cfgrtl.c:1298
0x108a0b07 redirect_branch_edge
.../gcc/cfgrtl.c:1325
0x108a124b rtl_redirect_edge_and_branch
.../gcc/cfgrtl.c:1458
0x1087f6d3 redirect_edge_and_branch(edge_def*, basic_block_def*)
.../gcc/cfghooks.c:373
0x11d6264b try_forward_edges
.../gcc/cfgcleanup.c:562
0x11d6b0eb try_optimize_cfg
.../gcc/cfgcleanup.c:2960
0x11d6ba4f cleanup_cfg(int)
.../gcc/cfgcleanup.c:3174
0x10870b3f execute
.../gcc/cfgexpand.c:6763

triggered with an RTL pattern like:

(jump_insn 8 7 20 2 (parallel [
(set (pc)
(if_then_else (ne (zero_extract:SI (mem/v:QI (mem/f/c:SI (reg/f:SI 16 virtual-incoming-args) [1 mptr+0 S4 A32]) [-1  S1 A8])
(const_int 1 [0x1])
(const_int 0 [0]))
(const_int 0 [0]))
(label_ref 10)
(pc)))
(set (zero_extract:SI (mem/v:QI (mem/f/c:SI (reg/f:SI 16 virtual-incoming-args) [1 mptr+0 S4 A32]) [-1  S1 A8])
(const_int 1 [0x1])
(const_int 0 [0]))
(const_int 1 [0x1]))
]) ".../libatomic/tas_n.c":38:12 -1
 (nil)
 -> 10)

caused by a volatile memory reference used that is not accepted by the
`memory_operand' predicate of the `jbbssiqi' insn explicitly referred
from the `sync_lock_test_and_setqi' expander.  Also seen with:

FAIL: gcc.dg/pr61756.c (internal compiler error)

Define a new `any_memory_operand' predicate accepting both ordinary and
volatile memory references and use it with the `jbb<ccss>i<mode>' insn,
so as to address the ICE.

Also remove useless operations from the `sync_lock_test_and_set<mode>'
and `sync_lock_release<mode>' expanders as those always either complete
or fail and therefore never fall through to using their template other
than to match operands.  Wrap `jbb<ccss>i<mode>' into `unspec_volatile'
instead so that the jump does not get removed or reordered.  Share one
index to avoid a complication around the iterators, since the index is
nowhere referred to anyway and the required pattern is pulled in by its
name.
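
For orientation, the control flow that the test-and-set and release
expanders in the diff below emit around the interlocked branches can be
modelled in C11 roughly like this.  It is a sketch only: the helper names
are invented, and `atomic_fetch_or'/`atomic_fetch_and' merely stand in
for the interlocked bit-set and bit-clear done by BBSSI and BBCCI.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of BBSSI on bit 0: set the bit atomically and report whether
   the branch would be taken, i.e. whether the bit was already set.  */
static bool
model_bbssi_bit0 (atomic_uchar *byte)
{
  assert (byte != NULL);
  unsigned char old = atomic_fetch_or (byte, (unsigned char) 1u);
  return (old & 1u) != 0;
}

/* Model of BBCCI on bit 0: clear the bit atomically and report whether
   the branch would be taken, i.e. whether the bit was already clear.  */
static bool
model_bbcci_bit0 (atomic_uchar *byte)
{
  assert (byte != NULL);
  unsigned char old = atomic_fetch_and (byte, (unsigned char) ~1u);
  return (old & 1u) == 0;
}

/* Shape of the test-and-set expansion: preload 1, branch over the
   "was clear" assignment if the bit was already set.  */
int
model_lock_test_and_set (atomic_uchar *byte)
{
  int result = 1;

  if (model_bbssi_bit0 (byte))
    goto done;
  result = 0;
done:
  return result;
}

/* Shape of the release expansion: the branch target is the very next
   instruction, so only the side effect of clearing the bit matters.  */
void
model_lock_release (atomic_uchar *byte)
{
  (void) model_bbcci_bit0 (byte);
}
```

The release case shows why the jump must not be optimised away even
though both of its successors are the same place.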

Test cases will be added separately.

gcc/
* config/vax/predicates.md (volatile_mem_operand)
(any_memory_operand): New predicates.
* config/vax/builtins.md (VUNSPEC_UNLOCK): Remove constant.
(sync_lock_test_and_set<mode>): Remove `set' and `unspec'
operations, match operands only.  Reformat.
(sync_lock_release<mode>): Likewise.  Remove cruft.
(jbb<ccss>i<mode>): Wrap into `unspec_volatile', use
`any_memory_operand' predicate.
---
 gcc/config/vax/builtins.md   | 36 +---
 gcc/config/vax/predicates.md | 16 
 2 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/gcc/config/vax/builtins.md b/gcc/config/vax/builtins.md
index 8bbcd603d13..7e27854a8b0 100644
--- a/gcc/config/vax/builtins.md
+++ b/gcc/config/vax/builtins.md
@@ -19,8 +19,7 @@
 
 (define_constants
   [
-(VUNSPEC_LOCK 100) ; sync lock and test
-(VUNSPEC_UNLOCK 101)   ; sync lock release
+(VUNSPEC_LOCK 100) ; sync lock operations
   ]
 )
 
@@ -58,10 +57,9 @@ (define_insn "ffssi2_internal"
   "ffs $0,$32,%1,%0")
 
 (define_expand "sync_lock_test_and_set<mode>"
-  [(set (match_operand:VAXint 0 "nonimmediate_operand" "=&g")
-   (unspec:VAXint [(match_operand:VAXint 1 "memory_operand" "+m")
-   (match_operand:VAXint 2 "const_int_operand" "n")
-  ] VUNSPEC_LOCK))]
+  [(match_operand:VAXint 0 "nonimmediate_operand" "=&g")
+   (match_operand:VAXint 1 "memory_operand" "+m")
+   (match_operand:VAXint 2 "const_int_operand" "n")]
   ""
   "
 {
@@ -72,46 +70,46 @@ (define_expand "sync_lock_test_and_set<mode>"
 
   label = gen_label_rtx ();
   emit_move_insn (operands[0], const1_rtx);
-  emit_jump_insn (gen_jbbssi<mode> (operands[1], const0_rtx, label, operands[1]));
+  emit_jump_insn (gen_jbbssi<mode> (operands[1], const0_rtx, label,
+   operands[1]));
   emit_move_insn (operands[0], const0_rtx);
   emit_label (label);
   DONE;
 }")
 
 (define_expand "sync_lock_release<mode>"
-  [(set (match_operand:VAXint 0 "memory_operand" "+m")
-   (unspec:VAXint [(match_operand:VAXint 1 "const_int_operand" "n")
-  ] VUNSPEC_UNLOCK))]
+  [(match_operand:VAXint 0 "memory_operand" "+m")
+   (match_operand:VAXint 1 "const_int_operand" "n")]
   ""
   "
 {
   rtx label;
+
   if (operands[1] != const0_rtx)
 FAIL;
-#if 1
+
   label = gen_label_rtx ();
-  emit_jump_insn (gen_jbbcci<mode> (operands[0], const0_rtx, label, operands[0]));
+  emit_jump_insn (gen_jbbcci<mode> (operands[0], const0_rtx, label,
+   operands[0]));
   emit_label (label);

[PATCH 10/31] VAX: Use an int iterator to produce individual interlocked branches

2020-11-19 Thread Maciej W. Rozycki
With mode-specific interlocked branch insns already folded into iterated
templates now fold the two templates into one too, observing that the
only difference between them is the value of the bit branched on, which
is of course reflected both in the RTL expression and the instruction
produced.  Use an int iterator to iterate over the bit value, making use
of the newly-added wide integer support, and substituting patterns as
necessary to produce equivalent individual insns.  No functional change.

gcc/
* config/vax/builtins.md (bit): New int iterator.
(ccss): New int attribute.
(jbbssi<mode>, jbbcci<mode>): Fold insns into...
(jbb<ccss>i<mode>): ... this.
---
 gcc/config/vax/builtins.md | 29 +++--
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/gcc/config/vax/builtins.md b/gcc/config/vax/builtins.md
index 473b44f489f..8bbcd603d13 100644
--- a/gcc/config/vax/builtins.md
+++ b/gcc/config/vax/builtins.md
@@ -26,6 +26,9 @@ (define_constants
 
 (define_mode_attr bb_mem [(QI "m") (HI "Q") (SI "Q")])
 
+(define_int_iterator bit [0 1])
+(define_int_attr ccss [(0 "cc") (1 "ss")])
+
 (define_expand "ffssi2"
   [(set (match_operand:SI 0 "nonimmediate_operand" "")
(ffs:SI (match_operand:SI 1 "general_operand" "")))]
@@ -75,24 +78,6 @@ (define_expand "sync_lock_test_and_set<mode>"
   DONE;
 }")
 
-(define_insn "jbbssi<mode>"
-  [(parallel
-[(set (pc)
- (if_then_else
-   (eq (zero_extract:SI
- (match_operand:VAXint 0 "memory_operand" "<bb_mem>")
- (const_int 1)
- (match_operand:SI 1 "general_operand" "nrmT"))
-   (const_int 1))
-   (label_ref (match_operand 2 "" ""))
-   (pc)))
- (set (zero_extract:SI (match_operand:VAXint 3 "memory_operand" "+0")
-  (const_int 1)
-  (match_dup 1))
- (const_int 1))])]
-  ""
-  "jbssi %1,%0,%l2")
-
 (define_expand "sync_lock_release<mode>"
   [(set (match_operand:VAXint 0 "memory_operand" "+m")
(unspec:VAXint [(match_operand:VAXint 1 "const_int_operand" "n")
@@ -113,7 +98,7 @@ (define_expand "sync_lock_release<mode>"
   DONE;
 }")
 
-(define_insn "jbbcci<mode>"
+(define_insn "jbb<ccss>i<mode>"
   [(parallel
 [(set (pc)
  (if_then_else
@@ -121,12 +106,12 @@ (define_insn "jbbcci<mode>"
  (match_operand:VAXint 0 "memory_operand" "<bb_mem>")
  (const_int 1)
  (match_operand:SI 1 "general_operand" "nrmT"))
-   (const_int 0))
+   (const_int bit))
(label_ref (match_operand 2 "" ""))
(pc)))
  (set (zero_extract:SI (match_operand:VAXint 3 "memory_operand" "+0")
   (const_int 1)
   (match_dup 1))
- (const_int 0))])]
+ (const_int bit))])]
   ""
-  "jbcci %1,%0,%l2")
+  "jb<ccss>i %1,%0,%l2")
-- 
2.11.0



[PATCH 09/31] VAX: Use a mode iterator to produce individual interlocked branches

2020-11-19 Thread Maciej W. Rozycki
Regardless of the machine mode all the interlocked branches of the same
kind, one of the two provided by the ISA, use the same RTL patterns and
machine instructions, except for the memory operand's constraint.

Remove code duplication then and make use of a mode iterator combined
with an attribute to expand the same insn patterns with the constraint
suitably substituted from a single template.  No functional change.

gcc/
* config/vax/builtins.md (bb_mem): New mode attribute.
(jbbssiqi, jbbssihi, jbbssisi): Fold insns into...
(jbbssi<mode>): ... this.
(jbbcciqi, jbbccihi, jbbccisi): Likewise...
(jbbcci<mode>): ... this.
---
 gcc/config/vax/builtins.md | 96 --
 1 file changed, 15 insertions(+), 81 deletions(-)

diff --git a/gcc/config/vax/builtins.md b/gcc/config/vax/builtins.md
index 6bce7a85add..473b44f489f 100644
--- a/gcc/config/vax/builtins.md
+++ b/gcc/config/vax/builtins.md
@@ -24,6 +24,8 @@ (define_constants
   ]
 )
 
+(define_mode_attr bb_mem [(QI "m") (HI "Q") (SI "Q")])
+
 (define_expand "ffssi2"
   [(set (match_operand:SI 0 "nonimmediate_operand" "")
(ffs:SI (match_operand:SI 1 "general_operand" "")))]
@@ -73,58 +75,24 @@ (define_expand "sync_lock_test_and_set<mode>"
   DONE;
 }")
 
-(define_insn "jbbssiqi"
-  [(parallel
-[(set (pc)
- (if_then_else
-   (ne (zero_extract:SI (match_operand:QI 0 "memory_operand" "g")
-(const_int 1)
-(match_operand:SI 1 "general_operand" "nrm"))
-   (const_int 0))
-   (label_ref (match_operand 2 "" ""))
-   (pc)))
- (set (zero_extract:SI (match_operand:QI 3 "memory_operand" "+0")
-  (const_int 1)
-  (match_dup 1))
- (const_int 1))])]
-  ""
-  "jbssi %1,%0,%l2")
-
-(define_insn "jbbssihi"
+(define_insn "jbbssi<mode>"
   [(parallel
 [(set (pc)
  (if_then_else
-   (ne (zero_extract:SI (match_operand:HI 0 "memory_operand" "Q")
-(const_int 1)
-(match_operand:SI 1 "general_operand" "nrm"))
-   (const_int 0))
-   (label_ref (match_operand 2 "" ""))
-   (pc)))
- (set (zero_extract:SI (match_operand:HI 3 "memory_operand" "+0")
-  (const_int 1)
-  (match_dup 1))
- (const_int 1))])]
-  ""
-  "jbssi %1,%0,%l2")
-
-(define_insn "jbbssisi"
-  [(parallel
-[(set (pc)
- (if_then_else
-   (ne (zero_extract:SI (match_operand:SI 0 "memory_operand" "Q")
-(const_int 1)
-(match_operand:SI 1 "general_operand" "nrm"))
-   (const_int 0))
+   (eq (zero_extract:SI
+ (match_operand:VAXint 0 "memory_operand" "<bb_mem>")
+ (const_int 1)
+ (match_operand:SI 1 "general_operand" "nrmT"))
+   (const_int 1))
(label_ref (match_operand 2 "" ""))
(pc)))
- (set (zero_extract:SI (match_operand:SI 3 "memory_operand" "+0")
+ (set (zero_extract:SI (match_operand:VAXint 3 "memory_operand" "+0")
   (const_int 1)
   (match_dup 1))
  (const_int 1))])]
   ""
   "jbssi %1,%0,%l2")
 
-
 (define_expand "sync_lock_release<mode>"
   [(set (match_operand:VAXint 0 "memory_operand" "+m")
(unspec:VAXint [(match_operand:VAXint 1 "const_int_operand" "n")
@@ -145,54 +113,20 @@ (define_expand "sync_lock_release<mode>"
   DONE;
 }")
 
-(define_insn "jbbcciqi"
-  [(parallel
-[(set (pc)
- (if_then_else
-   (eq (zero_extract:SI (match_operand:QI 0 "memory_operand" "g")
-(const_int 1)
-(match_operand:SI 1 "general_operand" "nrm"))
-   (const_int 0))
-   (label_ref (match_operand 2 "" ""))
-   (pc)))
- (set (zero_extract:SI (match_operand:QI 3 "memory_operand" "+0")
-  (const_int 1)
-  (match_dup 1))
- (const_int 0))])]
-  ""
-  "jbcci %1,%0,%l2")
-
-(define_insn "jbbccihi"
+(define_insn "jbbcci<mode>"
   [(parallel
 [(set (pc)
  (if_then_else
-   (eq (zero_extract:SI (match_operand:HI 0 "memory_operand" "Q")
-(const_int 1)
-(match_operand:SI 1 "general_operand" "nrm"))
+   (eq (zero_extract:SI
+ (match_operand:VAXint 0 "memory_operand" "<bb_mem>")
+ (const_int 1)
+ (match_operand:SI 1 "general_operand" "nrmT"))
(const_int 0))
(label_ref (match_operand 2 "" ""))
(pc)))
- (set (zero_extract:SI (match_operand:HI 3 "memory_operand" "+0")
+ (set (zero_extract:SI (match_operand:VAXint 3 "memory_operand" "+0")
   (const_int 1)
 

[PATCH 08/31] jump: Also handle jumps wrapped in UNSPEC or UNSPEC_VOLATILE

2020-11-19 Thread Maciej W. Rozycki
VAX has interlocked branch instructions used for atomic operations and
we want to have them wrapped in UNSPEC_VOLATILE so as not to have code
carried across.  This however breaks with jump optimization and leads
to an ICE in the build of libbacktrace like:

.../libbacktrace/mmap.c:190:1: internal compiler error: in fixup_reorder_chain, at cfgrtl.c:3934
  190 | }
  | ^
0x1087d46b fixup_reorder_chain
.../gcc/cfgrtl.c:3934
0x1087f29f cfg_layout_finalize()
.../gcc/cfgrtl.c:4447
0x1087c74f execute
.../gcc/cfgrtl.c:3662

on RTL like:

(jump_insn 18 17 150 4 (unspec_volatile [
(set (pc)
(if_then_else (eq (zero_extract:SI (mem/v:SI (reg/f:SI 23 [ _2 ]) [-1  S4 A32])
(const_int 1 [0x1])
(const_int 0 [0]))
(const_int 1 [0x1]))
(label_ref 20)
(pc)))
(set (zero_extract:SI (mem/v:SI (reg/f:SI 23 [ _2 ]) [-1  S4 A32])
(const_int 1 [0x1])
(const_int 0 [0]))
(const_int 1 [0x1]))
] 101) ".../libbacktrace/mmap.c":135:14 158 {jbbssisi}
 (nil)
 -> 20)

when those branches are enabled with a follow-up change.  Also showing
with:

FAIL: gcc.dg/pr61756.c (internal compiler error)

Handle branches wrapped in UNSPEC_VOLATILE then and, for consistency,
also in UNSPEC.  The presence of UNSPEC_VOLATILE will prevent such
branches from being removed as they won't be accepted by `onlyjump_p',
we just need to let them through.
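
The unwrapping step can be pictured with the following self-contained toy
model.  It does not use GCC's `rtx' types or accessors; `toy_rtx',
`toy_code' and `toy_pc_set' are hypothetical stand-ins that only mirror
the shape of the change to `pc_set' in the diff below.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for a handful of RTL codes.  */
enum toy_code
{
  TOY_SET, TOY_PARALLEL, TOY_UNSPEC, TOY_UNSPEC_VOLATILE, TOY_PC, TOY_OTHER
};

/* A toy expression: for the wrapper codes FIRST is element 0 of the
   vector, for TOY_SET it is the destination.  */
struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *first;
};

/* Return the SET of the program counter found either directly in PAT
   or as the first element of a PARALLEL, UNSPEC or UNSPEC_VOLATILE
   wrapper; return NULL otherwise.  */
const struct toy_rtx *
toy_pc_set (const struct toy_rtx *pat)
{
  assert (pat != NULL);
  switch (pat->code)
    {
    case TOY_PARALLEL:
    case TOY_UNSPEC:
    case TOY_UNSPEC_VOLATILE:
      pat = pat->first;
      break;
    default:
      break;
    }
  if (pat != NULL
      && pat->code == TOY_SET
      && pat->first != NULL
      && pat->first->code == TOY_PC)
    return pat;
  return NULL;
}
```

Any other wrapper code is left alone, so a jump hidden inside it is still
not recognised, which is the behaviour before the patch for the two
`unspec' forms.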

gcc/
* jump.c (pc_set): Also accept a jump wrapped in UNSPEC or
UNSPEC_VOLATILE.
(any_uncondjump_p, any_condjump_p): Update comment accordingly.
---
 gcc/jump.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/gcc/jump.c b/gcc/jump.c
index 34a8f209e20..f4c735540f0 100644
--- a/gcc/jump.c
+++ b/gcc/jump.c
@@ -850,9 +850,17 @@ pc_set (const rtx_insn *insn)
   pat = PATTERN (insn);
 
   /* The set is allowed to appear either as the insn pattern or
- the first set in a PARALLEL.  */
-  if (GET_CODE (pat) == PARALLEL)
-pat = XVECEXP (pat, 0, 0);
+ the first set in a PARALLEL, UNSPEC or UNSPEC_VOLATILE.  */
+  switch (GET_CODE (pat))
+{
+case PARALLEL:
+case UNSPEC:
+case UNSPEC_VOLATILE:
+  pat = XVECEXP (pat, 0, 0);
+  break;
+default:
+  break;
+}
   if (GET_CODE (pat) == SET && GET_CODE (SET_DEST (pat)) == PC)
 return pat;
 
@@ -860,7 +868,7 @@ pc_set (const rtx_insn *insn)
 }
 
 /* Return true when insn is an unconditional direct jump,
-   possibly bundled inside a PARALLEL.  */
+   possibly bundled inside a PARALLEL, UNSPEC or UNSPEC_VOLATILE.  */
 
 int
 any_uncondjump_p (const rtx_insn *insn)
@@ -876,9 +884,9 @@ any_uncondjump_p (const rtx_insn *insn)
 }
 
 /* Return true when insn is a conditional jump.  This function works for
-   instructions containing PC sets in PARALLELs.  The instruction may have
-   various other effects so before removing the jump you must verify
-   onlyjump_p.
+   instructions containing PC sets in PARALLELs, UNSPECs or UNSPEC_VOLATILEs.
+   The instruction may have various other effects so before removing the jump
+   you must verify onlyjump_p.
 
Note that unlike condjump_p it returns false for unconditional jumps.  */
 
-- 
2.11.0



[PATCH 07/31] RTL: Also support HOST_WIDE_INT with int iterators

2020-11-19 Thread Maciej W. Rozycki
Add wide integer aka 'w' rtx format support to int iterators so that
machine description can iterate over `const_int' expressions.

This is done by extending the standard integer aka 'i' format support,
observing that any standard integer already present in any of our
existing RTL code will also fit into HOST_WIDE_INT, so there is no need
for a separate handler.  Any truncation of the number parsed is done by
the caller.  It is assumed however that no place relies on out-of-range
values being capped to INT_MAX.

Now the 'p' format is handled explicitly rather than being implied by
rtx being a SUBREG, so actually assert that it is, just to play safe.
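
The division of labour between the parser and its caller can be sketched
as follows.  This is a simplified illustration, not the code in
`read-rtl.c': `parse_wide' and `fits_int' are hypothetical helpers, and
`long long' merely stands in for HOST_WIDE_INT.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse TEXT as a decimal integer into the wide type.  Return false if
   TEXT is not a number or does not fit the wide type.  No capping to
   INT_MAX is done here.  */
static bool
parse_wide (const char *text, long long *value)
{
  char *end;

  assert (text != NULL && value != NULL);
  errno = 0;
  *value = strtoll (text, &end, 10);
  return errno == 0 && end != text && *end == '\0';
}

/* Caller-side check for a field that can only hold a plain `int', such
   as one using the 'i' format: narrowing is the caller's business.  */
static bool
fits_int (long long value)
{
  return value >= INT_MIN && value <= INT_MAX;
}
```

With this split a single parser can serve both 'i' and 'w' fields, and
each consumer decides what to do with a value too wide for its field.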

gcc/
* read-rtl.c: Add a page-feed separator at the start of iterator
code.
(struct iterator_group): Change the return type to HOST_WIDE_INT
for the `find_builtin' member.  Likewise the second parameter
type for the `apply_iterator' member.
(atoll) [!HAVE_ATOQ]: Reorder.
(find_mode, find_code): Change the return type to HOST_WIDE_INT.
(apply_mode_iterator, apply_code_iterator)
(apply_subst_iterator): Change the second parameter type to
HOST_WIDE_INT.
(find_int): Handle input suitable for HOST_WIDE_INT output.
(apply_int_iterator): Rewrite in terms of explicit format
interpretation.
(rtx_reader::read_rtx_operand) <'w'>: Fold into...
<'i', 'n', 'p'>: ... this.
* doc/md.texi (Int Iterators): Document 'w' rtx format support.
---
 gcc/doc/md.texi |  10 ++--
 gcc/read-rtl.c  | 165 ++--
 2 files changed, 93 insertions(+), 82 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 813875b973b..762a6cf050e 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -11201,11 +11201,11 @@ The construct:
 @end smallexample
 
 defines a pseudo integer constant @var{name} that can be instantiated as
-@var{inti} if condition @var{condi} is true.  Each @var{int}
-must have the same rtx format.  @xref{RTL Classes}. Int iterators can appear
-in only those rtx fields that have 'i' as the specifier. This means that
-each @var{int} has to be a constant defined using define_constant or
-define_c_enum.
+@var{inti} if condition @var{condi} is true.  Each @var{int} must have the
+same rtx format.  @xref{RTL Classes}.  Int iterators can appear in only
+those rtx fields that have 'i', 'n', 'w', or 'p' as the specifier.  This
+means that each @var{int} has to be a constant defined using define_constant
+or define_c_enum.
 
 As with mode and code iterators, each pattern that uses @var{name} will be
 expanded @var{n} times, once with all uses of @var{name} replaced by
diff --git a/gcc/read-rtl.c b/gcc/read-rtl.c
index 3ec83a60baf..403f254f3cb 100644
--- a/gcc/read-rtl.c
+++ b/gcc/read-rtl.c
@@ -77,12 +77,12 @@ struct iterator_group {
 
   /* Treat the given string as the name of a standard mode, etc., and
  return its integer value.  */
-  int (*find_builtin) (const char *);
+  HOST_WIDE_INT (*find_builtin) (const char *);
 
   /* Make the given rtx use the iterator value given by the third argument.
  If the iterator applies to operands, the second argument gives the
  operand index, otherwise it is ignored.  */
-  void (*apply_iterator) (rtx, unsigned int, int);
+  void (*apply_iterator) (rtx, unsigned int, HOST_WIDE_INT);
 
   /* Return the C token for the given standard mode, code, etc.  */
   const char *(*get_c_token) (int);
@@ -139,7 +139,7 @@ static void one_time_initialization (void);
 
 /* Global singleton.  */
 rtx_reader *rtx_reader_ptr = NULL;
-
+
 /* The mode and code iterator structures.  */
 static struct iterator_group modes, codes, ints, substs;
 
@@ -152,9 +152,49 @@ static vec iterator_uses;
 /* The list of all attribute uses in the current rtx.  */
 static vec attribute_uses;
 
+/* Provide a version of a function to read a long long if the system does
+   not provide one.  */
+#if (HOST_BITS_PER_WIDE_INT > HOST_BITS_PER_LONG   \
+ && !HAVE_DECL_ATOLL   \
+ && !defined (HAVE_ATOQ))
+HOST_WIDE_INT atoll (const char *);
+
+HOST_WIDE_INT
+atoll (const char *p)
+{
+  int neg = 0;
+  HOST_WIDE_INT tmp_wide;
+
+  while (ISSPACE (*p))
+p++;
+  if (*p == '-')
+neg = 1, p++;
+  else if (*p == '+')
+p++;
+
+  tmp_wide = 0;
+  while (ISDIGIT (*p))
+{
+  HOST_WIDE_INT new_wide = tmp_wide*10 + (*p - '0');
+  if (new_wide < tmp_wide)
+   {
+ /* Return INT_MAX equiv on overflow.  */
+ tmp_wide = HOST_WIDE_INT_M1U >> 1;
+ break;
+   }
+  tmp_wide = new_wide;
+  p++;
+}
+
+  if (neg)
+tmp_wide = -tmp_wide;
+  return tmp_wide;
+}
+#endif
+
 /* Implementations of the iterator_group callbacks for modes.  */
 
-static int
+static HOST_WIDE_INT
 find_mode (const char *name)
 {
   int i;
@@ -167,7 +207,7 @@ find_mode (const char *name)
 }
 
 static 

[PATCH 06/31] VAX: Correct fatal issues with the `ffs' builtin

2020-11-19 Thread Maciej W. Rozycki
The `builtins.md' machine description fragment is not included anywhere
and is therefore dead code, which has become bitrotten due to non-use.

If actually enabled, it does not build due to the use of an unknown `t'
constraint:

.../gcc/config/vax/builtins.md:42:1: error: undefined machine-specific constraint at this point: "t"
.../gcc/config/vax/builtins.md:42:1: note:  in operand 1

which came from commit becb93d02cc1 ("builtins.md (ffssi2_internal):
Correct constraint."), which was not applied as posted and reviewed; `T'
was meant to be used instead.

Once this has been fixed this code still fails building:

.../gcc/config/vax/builtins.md: In function 'rtx_def* gen_ffssi2(rtx, rtx)':
.../gcc/config/vax/builtins.md:35:19: error: 'gen_bne' was not declared in this scope; did you mean 'gen_use'?
   35 |   emit_jump_insn (gen_bne (label));
  |   ^~~
  |   gen_use
make[2]: *** [Makefile:1122: insn-emit.o] Error 1

Finally the FFS machine instruction sets the Z condition code according
to the comparison of the value held in the source operand against zero
rather than the value held in the target operand.  If the source operand
is found to hold zero, then the target operand is set to the width of
the source operand, 32 for SImode (FFS supports arbitrary widths).

Correct the build issues then and update RTL to match the operation of
the machine instruction.  A test case will be added separately.
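
For illustration, the semantics described above and the sequence emitted
by the `ffssi2' expander can be modelled in plain C.  This is a sketch
under the assumptions stated in this message (start position 0, field
width 32); `model_ffs_insn' and `model_ffssi2' are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of "ffs $0,$32,src,dst": return the position of the lowest set
   bit in SRC, or the field width (32) if SRC is zero.  *Z models the Z
   condition code, which reflects SRC compared against zero, not the
   result.  */
static int
model_ffs_insn (uint32_t src, bool *z)
{
  assert (z != NULL);
  *z = (src == 0);
  for (int pos = 0; pos < 32; pos++)
    if (src & (UINT32_C (1) << pos))
      return pos;
  return 32;
}

/* Model of the `ffssi2' expansion: FFS, a branch taken when Z is clear,
   a fall-through that loads -1, then a shared increment.  */
static int
model_ffssi2 (uint32_t x)
{
  bool z;
  int result = model_ffs_insn (x, &z);

  if (z)		/* Branch not taken: the source was zero.  */
    result = -1;
  return result + 1;	/* Code at the label: convert to a 1-based index.  */
}
```

The zero case shows why Z has to track the source operand: the target
then holds 32, which is non-zero, so a comparison of the target against
zero would take the wrong path.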

gcc/
* config/vax/builtins.md (ffssi2): Make preparation statements
actually buildable.
(ffssi2_internal): Fix input constraints; make the RTL pattern
match reality for `cc0'.
---
 gcc/config/vax/builtins.md | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/config/vax/builtins.md b/gcc/config/vax/builtins.md
index ac0e0271ddd..6bce7a85add 100644
--- a/gcc/config/vax/builtins.md
+++ b/gcc/config/vax/builtins.md
@@ -31,8 +31,12 @@ (define_expand "ffssi2"
   "
 {
   rtx label = gen_label_rtx ();
+  rtx label_ref = gen_rtx_LABEL_REF (VOIDmode, label);
+  rtx cond = gen_rtx_NE (VOIDmode, cc0_rtx, const0_rtx);
+  rtx target = gen_rtx_IF_THEN_ELSE (VOIDmode, cond, label_ref, pc_rtx);
+
   emit_insn (gen_ffssi2_internal (operands[0], operands[1]));
-  emit_jump_insn (gen_bne (label));
+  emit_jump_insn (gen_rtx_SET (pc_rtx, target));
   emit_insn (gen_negsi2 (operands[0], const1_rtx));
   emit_label (label);
   emit_insn (gen_addsi3 (operands[0], operands[0], const1_rtx));
@@ -41,8 +45,10 @@ (define_expand "ffssi2"
 
 (define_insn "ffssi2_internal"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=rQ")
-   (ffs:SI (match_operand:SI 1 "general_operand" "nrQt")))
-   (set (cc0) (match_dup 0))]
+   (ffs:SI (match_operand:SI 1 "general_operand" "nrQT")))
+   (set (cc0)
+   (compare (match_dup 1)
+(const_int 0)))]
   ""
   "ffs $0,$32,%1,%0")
 
-- 
2.11.0



[PATCH 05/31] VAX: Rationalize expression and address costs

2020-11-19 Thread Maciej W. Rozycki
Expression costs are required to be given in terms of COSTS_N_INSNS (n),
which is defined to stand for the count of single fast instructions, and
actually returns `n * 4'.  The VAX backend however instead operates on
naked numbers, causing an anomaly for the integer const zero rtx, where
the cost given is 4 as opposed to 1 for integers in the [1:63] range, as
well as -1 for comparisons.  This is because the value of 0 returned by
`vax_rtx_costs' is converted to COSTS_N_INSNS (1) in `pattern_cost':

  return cost > 0 ? cost : COSTS_N_INSNS (1);

Consequently, where feasible, 1 or -1 are preferred over 0 by the middle
end causing code pessimization, e.g. rather than producing this:

subl2 $4,%sp
movl 4(%ap),%r0
jgtr .L2
addl2 $2,%r0
.L2:
ret

or this:

subl2 $4,%sp
addl3 4(%ap),8(%ap),%r0
jlss .L6
addl2 $2,%r0
.L6:
ret

code is produced like this:

subl2 $4,%sp
movl 4(%ap),%r0
cmpl %r0,$1
jgeq .L2
addl2 $2,%r0
.L2:
ret

or this:

subl2 $4,%sp
addl3 4(%ap),8(%ap),%r0
cmpl %r0,$-1
jleq .L6
addl2 $2,%r0
.L6:
ret

from this:

int
compare_mov (int x)
{
  if (x > 0)
return x;
  else
return x + 2;
}

and this:

int
compare_add (int x, int y)
{
  int z;

  z = x + y;
  if (z < 0)
return z;
  else
return z + 2;
}

respectively, which is both slower and larger.

Furthermore once the backend is converted to MODE_CC this anomaly makes
it usually impossible to remove redundant comparisons in the comparison
elimination pass, because most VAX instructions set the condition codes
as per the relation of the instruction's result to 0 and not -1.

The middle end has some other assumptions as to rtx costs being given in
terms of COSTS_N_INSNS, so wrap all the VAX rtx costs then as they stand
into COSTS_N_INSNS invocations, effectively scaling the costs by 4 while
preserving their relative values, except for the integer const zero rtx
given the value of `COSTS_N_INSNS (1) / 2', half of a fast instruction
(this can be further halved if needed in the future).

Adjust address costs likewise so that they remain proportional to the
new absolute values of rtx costs.
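
The anomaly and its fix can be illustrated with a small model.  This is
a sketch only: `final_cost' mirrors the fallback quoted from
`pattern_cost' above, while `old_const_cost' and `new_const_cost' are
hypothetical simplifications covering just integer constants in the
[0:63] range, not the actual `vax_rtx_costs' code.

```c
#include <assert.h>

/* As stated above, COSTS_N_INSNS (n) evaluates to n * 4.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* The fallback quoted from `pattern_cost': a cost of 0 is replaced
   with that of one fast instruction.  */
static int
final_cost (int cost)
{
  return cost > 0 ? cost : COSTS_N_INSNS (1);
}

/* Old scheme: naked numbers, 0 for const zero and 1 for [1:63].  */
static int
old_const_cost (int value)
{
  assert (value >= 0 && value <= 63);
  return value == 0 ? 0 : 1;
}

/* New scheme: costs wrapped in COSTS_N_INSNS, with const zero given
   half of a fast instruction.  */
static int
new_const_cost (int value)
{
  assert (value >= 0 && value <= 63);
  return value == 0 ? COSTS_N_INSNS (1) / 2 : COSTS_N_INSNS (1);
}
```

Under the old scheme zero ends up costing 4 against 1 for the value 1,
so the middle end prefers comparing against 1; under the new scheme zero
costs 2 against 4 and the ordering is the expected one.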

Code size stats are as follows, collected from 17639 executables built
in `check-c' GCC testing:

              samples average  median
--------------------------------------
regressions      1420  0.400%  0.195%
unchanged       13811  0.000%  0.000%
progressions     2408 -0.504% -0.201%
--------------------------------------
total           17639 -0.037%  0.000%

with a small number of outliers only (over 5% size change):

old     new     change  %change filename
----------------------------------------------------
4991    5249     258     5.1693 981001-1.exe
2637    2777     140     5.3090 interchange-6.exe
2187    2307     120     5.4869 sprintf.x7
3969    4197     228     5.7445 pr28982a.exe
8264    8816     552     6.6795 vector-compare-1.exe
5199    5575     376     7.2321 pr28982b.exe
2113    2411     298    14.1031 20030323-1.exe
2113    2411     298    14.1031 20030323-1.exe
2113    2411     298    14.1031 20030323-1.exe

so it seems we are looking good, and we have complementing reductions
to compensate:

old     new     change  %change filename
----------------------------------------------------
2919    2631    -288    -9.8663 pr57521.exe
3427    3167    -260    -7.5868 sabd_1.exe
2985    2765    -220    -7.3701 ssad-run.exe
2985    2765    -220    -7.3701 ssad-run.exe
2985    2765    -220    -7.3701 usad-run.exe
2985    2765    -220    -7.3701 usad-run.exe
4509    4253    -256    -5.6775 vshuf-v2sf.exe
4541    4285    -256    -5.6375 vshuf-v2si.exe
4673    4417    -256    -5.4782 vshuf-v2df.exe
2993    2841    -152    -5.0785 abs-2.x4
2993    2841    -152    -5.0785 abs-3.x4

This actually causes `loop-8.c' to regress:

FAIL: gcc.dg/loop-8.c scan-rtl-dump-times loop2_invariant "Decided" 1
FAIL: gcc.dg/loop-8.c scan-rtl-dump-not loop2_invariant "without introducing a new temporary register"

but upon a closer inspection this is a red herring.  Old code looks as
follows:

.file   "loop-8.c"
.text
.align 1
.globl f
.type   f, @function
f:
.word 0
subl2 $4,%sp
movl 4(%ap),%r2
movl 8(%ap),%r3
movl $42,(%r2)
clrl %r0
movl $42,%r1
movl %r1,%r4
jbr .L2
.L5:
movl %r4,%r1
.L2:
movl %r1,(%r3)[%r0]
incl %r0
cmpl %r0,$100
jeql .L6
movl $42,(%r2)[%r0]
bicl3 $-2,%r0,%r1
jeql .L5
movl %r0,%r1
jbr .L2
.L6:
ret
.size   f, .-f

while new one is like below:

.file   "loop-8.c"
.text
.align 1
.globl f
.type   f, @function
f:
.word 0
subl2 $4,%sp
movl 4(%ap),%r2
movl $42,(%r2)+
movl 8(%ap),%r1
clrl %r0

[PATCH 04/31] VAX/testsuite: Run target testing over all the usual optimization levels

2020-11-19 Thread Maciej W. Rozycki
It makes sense to use what other targets do and run all the VAX test
cases over all the usual optimization levels, so make `vax.exp' use our
`gcc-dg-runtest' rather than the generic `dg-runtest' test driver.

This breaks `pr56875.c' however, which is optimized away at levels above
`-O0' because it has been written such that its calculations have no
effect:

FAIL: gcc.target/vax/pr56875.c   -O1   scan-assembler ashq .*,\\$0x,
FAIL: gcc.target/vax/pr56875.c   -O2   scan-assembler ashq .*,\\$0x,
FAIL: gcc.target/vax/pr56875.c   -O3 -g   scan-assembler ashq .*,\\$0x,
FAIL: gcc.target/vax/pr56875.c   -Os   scan-assembler ashq .*,\\$0x,
FAIL: gcc.target/vax/pr56875.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler ashq .*,\\$0x,
FAIL: gcc.target/vax/pr56875.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler ashq .*,\\$0x,

Rather than keeping it at `-O0' update the test case so that its code
does have an effect while retaining its sense.  Also reformat it
according to our requirements.

gcc/testsuite/
* gcc.target/vax/vax.exp: Use `gcc-dg-runtest' rather than
`dg-runtest'.
* gcc.target/vax/pr56875.c (dg-options): Make empty.
(a): Rewrite for calculations to make effect.  Reformat.
---
 gcc/testsuite/gcc.target/vax/pr56875.c | 11 ---
 gcc/testsuite/gcc.target/vax/vax.exp   |  2 +-
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/vax/pr56875.c b/gcc/testsuite/gcc.target/vax/pr56875.c
index f409afe88e7..191e05e166e 100644
--- a/gcc/testsuite/gcc.target/vax/pr56875.c
+++ b/gcc/testsuite/gcc.target/vax/pr56875.c
@@ -1,13 +1,10 @@
 /* { dg-do compile } */
-/* { dg-options "-O0" } */
+/* { dg-options "" } */
 /* { dg-final { scan-assembler "ashq .*,\\\$0x," } } */
 /* { dg-final { scan-assembler-not "ashq .*,\\\$-1," } } */
 
-void
-a (void)
+unsigned long long
+a (unsigned long i)
 {
-   unsigned long i = 1;
-   unsigned long long v;
-
-   v = ~ (unsigned long long) 0 << i;
+  return ~(unsigned long long) 0 << i;
 }
diff --git a/gcc/testsuite/gcc.target/vax/vax.exp b/gcc/testsuite/gcc.target/vax/vax.exp
index 4f480559e12..678e9007686 100644
--- a/gcc/testsuite/gcc.target/vax/vax.exp
+++ b/gcc/testsuite/gcc.target/vax/vax.exp
@@ -34,7 +34,7 @@ if ![info exists DEFAULT_CFLAGS] then {
 dg-init
 
 # Main loop.
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] \
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] \
"" $DEFAULT_CFLAGS
 
 # All done.
-- 
2.11.0



[PATCH 03/31] VAX: Define LEGITIMATE_PIC_OPERAND_P

2020-11-19 Thread Maciej W. Rozycki
The VAX ELF psABI does not permit the use of all hardware operand modes
for PIC symbol references due to the need to use PC-relative addressing
for symbols that end up local and the need to make references indirect
for symbols that end up global.

Therefore symbols referred to as immediates may only be used with the
move and push address (MOVA and PUSHA) instructions and their PC-relative
displacement address mode, as there is no genuine PC-relative immediate
available that all the other instructions would have to use.

Furthermore global symbol references must not have an offset applied,
which has to be added with a separate instruction, because there is no
support now for GOT entries for external `symbol+offset' references, so
any indirect GOT references made by the static linker from the original
direct symbol references must not have an addend applied.  Consequently
no addend is allowed even if a given external symbol turns out local,
for whatever reason, at the static link time.
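
The addend restriction alone can be sketched as a predicate.  This is a
deliberately reduced illustration, not the function added to the backend:
`struct toy_symref' and `toy_pic_addend_ok' are hypothetical, and the
separate restriction on immediates outside MOVA and PUSHA is not
modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A symbol reference reduced to the two properties that matter here.  */
struct toy_symref
{
  bool external;	/* May end up global, i.e. referenced via the GOT.  */
  long addend;		/* Offset applied to the symbol, e.g. 12 in `sE+12'.  */
};

/* Return true if REF may be used directly in PIC code as far as the
   addend rule goes: an external symbol must not carry an offset, which
   has to be added with a separate instruction instead.  */
static bool
toy_pic_addend_ok (const struct toy_symref *ref)
{
  assert (ref != NULL);
  return !(ref->external && ref->addend != 0);
}
```

A reference rejected by such a predicate is what the new constraint is
meant to have reloaded, rather than letting it reach later passes in a
form the psABI cannot express.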

Define the LEGITIMATE_PIC_OPERAND_P macro then, a corresponding function
and predicate to exclude the relevant expressions as required, and then
a constraint so that reloads are produced where needed, and use the new
facilities in the machine description, folding corresponding duplicated
patterns for local and external symbols together.  Rewrite predicates to
make use of the new function, rename them to match their sense and also
remove ones no longer used.

All this fixing an ICE like this:

during RTL pass: postreload
.../gcc/testsuite/gcc.c-torture/execute/20040709-2.c: In function 'testE':
.../gcc/testsuite/gcc.c-torture/execute/20040709-2.c:89:1: internal compiler error: in reload_combine_note_use, at postreload.c:1559
.../gcc/testsuite/gcc.c-torture/execute/20040709-2.c:96:65: note: in expansion of macro 'T'
0x10fe84cb reload_combine_note_use
.../gcc/postreload.c:1559
0x10fe8857 reload_combine_note_use
.../gcc/postreload.c:1621
0x10fe8303 reload_combine_note_use
.../gcc/postreload.c:1517
0x10fe7c7b reload_combine
.../gcc/postreload.c:1408
0x10fe3417 reload_cse_regs
.../gcc/postreload.c:67
0x10feaf9f execute
.../gcc/postreload.c:2358

due to the presence of a pseudo register post-reload:

(insn 435 228 229 13 (set (reg:SI 1 %r1)
(mem/c:SI (reg/f:SI 341) [25 sE+12 S4 A8])) 
".../gcc/testsuite/gcc.c-torture/execute/20040709-2.c":96:65 12 {movsi_2}
 (nil))

(due to the use of an offset `sE+12' symbol reference) and removing
these regressions:

FAIL: gcc.c-torture/execute/20040709-2.c   -O2  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-2.c   -O2  (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-2.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-2.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-2.c   -O3 -g  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-2.c   -O3 -g  (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-2.c   -Os  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-2.c   -Os  (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-3.c   -O2  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-3.c   -O2  (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-3.c   -O3 -g  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-3.c   -O3 -g  (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-3.c   -Os  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-3.c   -Os  (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/pr52028.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error)
FAIL: gcc.dg/torture/pr52028.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)

gcc/
* config/vax/constraints.md (A): New constraint.
* config/vax/predicates.md (external_symbolic_operand)
(external_const_operand): Remove predicates.
(local_symbolic_operand): Rename to...
(pic_symbolic_operand): ... this, and rework.
(external_memory_operand): Rename to...
(non_pic_external_memory_operand): ... this, and rework.
(illegal_blk_memory_operand, illegal_addsub_di_memory_operand):
Update accordingly.
* config/vax/vax-protos.h 

[PATCH 02/31] VAX: Remove `c' operand format specifier overload

2020-11-19 Thread Maciej W. Rozycki
The `c' operand format specifier is handled directly by the middle end
in `output_asm_insn':

   %cN means require operand N to be a constant
  and print the constant expression with no punctuation.

however it resorts to the target for constants that are not valid
addresses:

else if (letter == 'c')
  {
if (CONSTANT_ADDRESS_P (operands[opnum]))
  output_addr_const (asm_out_file, operands[opnum]);
else
  output_operand (operands[opnum], 'c');
  }

The VAX backend expects the fallback never to happen and overloads `c'
with the branch condition code.  This is confusing however and it is not
like we are short of letters, so instead make the branch condition code
use `k', and then for consistency make `K' the reverse branch condition
code format specifier.  This is safe to do as we provide no means to use
a computed branch condition code in user `asm'.
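
The resulting letter assignment can be sketched as follows.  This is a
toy model, not the code in `vax.c': the `toy_' names and the enum are
hypothetical, only the signed conditions are covered, and the suffixes
are the ones that combine with `j' into mnemonics such as `jgtr' and
`jleq'.

```c
#include <assert.h>

/* Toy comparison codes standing in for the RTL comparison operators.  */
enum toy_cond { TOY_EQ, TOY_NE, TOY_GT, TOY_GE, TOY_LT, TOY_LE };

/* Return the branch mnemonic suffix for COND.  */
static const char *
toy_cond_name (enum toy_cond cond)
{
  switch (cond)
    {
    case TOY_EQ: return "eql";
    case TOY_NE: return "neq";
    case TOY_GT: return "gtr";
    case TOY_GE: return "geq";
    case TOY_LT: return "lss";
    case TOY_LE: return "leq";
    }
  return "";
}

/* Return the condition that holds exactly when COND does not.  */
static enum toy_cond
toy_reverse (enum toy_cond cond)
{
  switch (cond)
    {
    case TOY_EQ: return TOY_NE;
    case TOY_NE: return TOY_EQ;
    case TOY_GT: return TOY_LE;
    case TOY_GE: return TOY_LT;
    case TOY_LT: return TOY_GE;
    case TOY_LE: return TOY_GT;
    }
  return cond;
}

/* `k' prints the branch condition, `K' the reversed one.  The letter
   `c' is deliberately absent and left to the middle end.  */
static const char *
toy_print_operand (enum toy_cond cond, int letter)
{
  assert (letter == 'k' || letter == 'K');
  return toy_cond_name (letter == 'k' ? cond : toy_reverse (cond));
}
```

With this, a template of the form "j%k0 %l1" would print `jgtr' for a
greater-than comparison and "j%K0 %l1" would print `jleq'.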

gcc/
* config/vax/vax.c (print_operand): Replace `c' and `C' with
`k' and `K' respectively.
* config/vax/vax.md (*branch, *branch_reversed): Update
accordingly.
---
 gcc/config/vax/vax.c  | 4 ++--
 gcc/config/vax/vax.md | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/vax/vax.c b/gcc/config/vax/vax.c
index da4e6cb1745..0b3b76ed6da 100644
--- a/gcc/config/vax/vax.c
+++ b/gcc/config/vax/vax.c
@@ -509,9 +509,9 @@ print_operand (FILE *file, rtx x, int code)
 fputc (ASM_DOUBLE_CHAR, file);
   else if (code == '|')
 fputs (REGISTER_PREFIX, file);
-  else if (code == 'c')
+  else if (code == 'k')
 fputs (cond_name (x), file);
-  else if (code == 'C')
+  else if (code == 'K')
 fputs (rev_cond_name (x), file);
   else if (code == 'D' && CONST_INT_P (x) && INTVAL (x) < 0)
 fprintf (file, "$" NEG_HWI_PRINT_HEX16, INTVAL (x));
diff --git a/gcc/config/vax/vax.md b/gcc/config/vax/vax.md
index 4897ce44505..e3018a0ee06 100644
--- a/gcc/config/vax/vax.md
+++ b/gcc/config/vax/vax.md
@@ -,7 +,7 @@ (define_insn "*branch"
  (label_ref (match_operand 1 "" ""))
  (pc)))]
   ""
-  "j%c0 %l1")
+  "j%k0 %l1")
 
 ;; Recognize reversed jumps.
 (define_insn "*branch_reversed"
@@ -1122,7 +1122,7 @@ (define_insn "*branch_reversed"
  (pc)
  (label_ref (match_operand 1 "" ""))))]
   ""
-  "j%C0 %l1") ; %C0 negates condition
+  "j%K0 %l1") ; %K0 negates condition
 
 ;; Recognize jbs, jlbs, jbc and jlbc instructions.  Note that the operand
 ;; of jlbs and jlbc insns are SImode in the hardware.  However, if it is
-- 
2.11.0



[PATCH 01/31] PR target/58901: reload: Handle SUBREG of MEM with a mode-dependent address

2020-11-19 Thread Maciej W. Rozycki
From: Matt Thomas 

Fix an ICE with the handling of RTL expressions like:

(subreg:QI (mem/c:SI (plus:SI (plus:SI (mult:SI (reg/v:SI 0 %r0 [orig:67 i ] 
[67])
(const_int 4 [0x4]))
(reg/v/f:SI 7 %r7 [orig:59 doacross ] [59]))
(const_int 40 [0x28])) [1 MEM[(unsigned int *)doacross_63 + 40B + 
i_106 * 4]+0 S4 A32]) 0)

that causes the compilation of libgomp to fail:

during RTL pass: reload
.../libgomp/ordered.c: In function 'GOMP_doacross_wait':
.../libgomp/ordered.c:507:1: internal compiler error: in change_address_1, at emit-rtl.c:2275
  507 | }
  | ^
0x10a3462b change_address_1
.../gcc/emit-rtl.c:2275
0x10a353a7 adjust_address_1(rtx_def*, machine_mode, poly_int<1u, long>, int, 
int, int, poly_int<1u, long>)
.../gcc/emit-rtl.c:2409
0x10ae2993 alter_subreg(rtx_def**, bool)
.../gcc/final.c:3368
0x10ae25cf cleanup_subreg_operands(rtx_insn*)
.../gcc/final.c:3322
0x110922a3 reload(rtx_insn*, int)
.../gcc/reload1.c:1232
0x10de2bf7 do_reload
.../gcc/ira.c:5812
0x10de3377 execute
.../gcc/ira.c:5986

in a `vax-netbsdelf' build, where an attempt is made to change the mode
of the contained memory reference to the mode of the containing SUBREG.
Such RTL expressions are produced by the VAX shift and rotate patterns
(`ashift', `ashiftrt', `rotate', `rotatert') where the count operand
always has the QI mode regardless of the mode, either SI or DI, of the
datum shifted or rotated.

Such a mode change cannot work where the memory reference uses the
indexed addressing mode, where a multiplier is implied that in the VAX
ISA depends on the width of the memory access requested and therefore
changing the machine mode would change the address calculation as well.
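
The effect can be shown with a little arithmetic.  This is a sketch
only: `indexed_address' is a hypothetical helper modelling the
displacement-plus-index form seen in the RTL above, with the index
scaled by the access size in bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of an indexed operand: BASE plus DISP plus INDEX
   scaled by ACCESS_SIZE, the width in bytes of the memory access (4 for
   SImode, 1 for QImode).  */
static uint32_t
indexed_address (uint32_t base, int32_t disp, uint32_t index,
		 unsigned int access_size)
{
  assert (access_size == 1 || access_size == 2
	  || access_size == 4 || access_size == 8);
  return base + (uint32_t) disp + index * access_size;
}
```

For a base of 0x1000, a displacement of 40 and an index of 3, the SImode
access refers to 0x1000 + 40 + 12, whereas the same operand reinterpreted
in QImode would refer to 0x1000 + 40 + 3, a different location
altogether.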

Avoid the attempt then by forcing the reload of any SUBREGs containing
a mode-dependent memory reference, also fixing these regressions:

FAIL: gcc.c-torture/compile/pr46883.c   -Os  (internal compiler error)
FAIL: gcc.c-torture/compile/pr46883.c   -Os  (test for excess errors)
FAIL: gcc.c-torture/execute/20120808-1.c   -O2  (internal compiler error)
FAIL: gcc.c-torture/execute/20120808-1.c   -O2  (test for excess errors)
FAIL: gcc.c-torture/execute/20120808-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
FAIL: gcc.c-torture/execute/20120808-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gcc.c-torture/execute/20120808-1.c   -O3 -g  (internal compiler error)
FAIL: gcc.c-torture/execute/20120808-1.c   -O3 -g  (test for excess errors)
FAIL: gcc.c-torture/execute/20120808-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error)
FAIL: gcc.c-torture/execute/20120808-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.c-torture/execute/20120808-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error)
FAIL: gcc.c-torture/execute/20120808-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/20050629-1.c (internal compiler error)
FAIL: gcc.dg/20050629-1.c (test for excess errors)
FAIL: c-c++-common/torture/pr53505.c   -Os  (internal compiler error)
FAIL: c-c++-common/torture/pr53505.c   -Os  (test for excess errors)
FAIL: gfortran.dg/coarray_failed_images_1.f08   -Os  (internal compiler error)
FAIL: gfortran.dg/coarray_stopped_images_1.f08   -Os  (internal compiler error)

First posted at: .

gcc/
PR target/58901
* reload.c (reload_inner_reg_of_subreg): Also request reloading
for pseudo registers associated with mode dependent memory
references.
(push_reload): Handle pseudo registers.
---
 gcc/reload.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/gcc/reload.c b/gcc/reload.c
index 445f9bdca43..dbf83733815 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -838,6 +838,7 @@ static bool
 reload_inner_reg_of_subreg (rtx x, machine_mode mode, bool output)
 {
   rtx inner;
+  int regno;
 
   /* Only SUBREGs are problematical.  */
   if (GET_CODE (x) != SUBREG)
@@ -849,10 +850,21 @@ reload_inner_reg_of_subreg (rtx x, machine_mode mode, bool output)
   if (CONSTANT_P (inner) || GET_CODE (inner) == PLUS)
 return true;
 
-  /* If INNER is not a hard register, then INNER will not need reloading.  */
-  if (!(REG_P (inner) && HARD_REGISTER_P (inner)))
+  /* If INNER is not a register, then INNER will not need reloading.  */
+  if (!REG_P (inner))
 return false;
 
+  regno = REGNO (inner);
+
+  /* If INNER is not a hard register, then INNER will not need reloading
+ unless it's a mode dependent memory reference.  */
+  if (regno >= FIRST_PSEUDO_REGISTER)
+return (!output
+   && reg_equiv_mem (regno) != 0
+   && 

[PATCH] Additional small changes to support opaque modes

2020-11-19 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey 

After building some larger codes using opaque types and some c++ codes
using opaque types it became clear I needed to go through and look for
places where opaque types and modes needed to be handled. A whole pile
of one-liners.

If bootstrap/regtest passes for ppc64le and x86_64, ok for trunk?

gcc/
* typeclass.h: Add opaque_type_class.
* builtins.c (type_to_class): Identify opaque type class.
* c-family/c-pretty-print.c (c_pretty_printer::simple_type_specifier):
Treat opaque types like other types.
(c_pretty_printer::direct_abstract_declarator): Opaque types are
supported types.
* c/c-aux-info.c (gen_type): Support opaque types.
* cp/error.c (dump_type): Handle opaque types.
(dump_type_prefix): Handle opaque types.
(dump_type_suffix): Handle opaque types.
(dump_expr): Handle opaque types.
* cp/pt.c (tsubst): Allow opaque types in templates.
(unify): Allow opaque types in templates.
* cp/typeck.c (structural_comptypes): Handle comparison
of opaque types.
* dwarf2out.c (is_base_type): Handle opaque types.
(loc_descriptor): Handle opaque modes like VOIDmode/BLKmode.
(gen_type_die_with_usage): Handle opaque types.
* expr.c (count_type_elements): Opaque types should
never have initializers.
* ipa-devirt.c (odr_types_equivalent_p): No type-specific handling
for opaque types is needed as it eventually checks the underlying
mode which is what is important.
* tree-streamer.c (record_common_node): Handle opaque types.
* tree.c (type_contains_placeholder_1): Handle opaque types.
(type_cache_hasher::equal): No additional comparison needed for
opaque types.
---
 gcc/builtins.c| 1 +
 gcc/c-family/c-pretty-print.c | 2 ++
 gcc/c/c-aux-info.c| 4 
 gcc/cp/error.c| 4 
 gcc/cp/pt.c   | 2 ++
 gcc/cp/typeck.c   | 1 +
 gcc/dwarf2out.c   | 4 +++-
 gcc/expr.c| 1 +
 gcc/ipa-devirt.c  | 1 +
 gcc/tree-streamer.c   | 1 +
 gcc/tree.c| 2 ++
 gcc/typeclass.h   | 2 +-
 12 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 42c52a1925e..0958abcae49 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2228,6 +2228,7 @@ type_to_class (tree type)
 case ARRAY_TYPE:  return (TYPE_STRING_FLAG (type)
   ? string_type_class : array_type_class);
 case LANG_TYPE:   return lang_type_class;
+case OPAQUE_TYPE:  return opaque_type_class;
 default:  return no_type_class;
 }
 }
diff --git a/gcc/c-family/c-pretty-print.c b/gcc/c-family/c-pretty-print.c
index 8953e3b678b..3027703056b 100644
--- a/gcc/c-family/c-pretty-print.c
+++ b/gcc/c-family/c-pretty-print.c
@@ -342,6 +342,7 @@ c_pretty_printer::simple_type_specifier (tree t)
   break;
 
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case BOOLEAN_TYPE:
 case INTEGER_TYPE:
 case REAL_TYPE:
@@ -662,6 +663,7 @@ c_pretty_printer::direct_abstract_declarator (tree t)
 
 case IDENTIFIER_NODE:
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case BOOLEAN_TYPE:
 case INTEGER_TYPE:
 case REAL_TYPE:
diff --git a/gcc/c/c-aux-info.c b/gcc/c/c-aux-info.c
index ffc8099856d..41f5598de38 100644
--- a/gcc/c/c-aux-info.c
+++ b/gcc/c/c-aux-info.c
@@ -413,6 +413,10 @@ gen_type (const char *ret_val, tree t, formals_style style)
  data_type = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (t)));
  break;
 
+   case OPAQUE_TYPE:
+ data_type = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (t)));
+ break;
+
case VOID_TYPE:
  data_type = "void";
  break;
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index 396558be17f..d27545d1223 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -529,6 +529,7 @@ dump_type (cxx_pretty_printer *pp, tree t, int flags)
 case INTEGER_TYPE:
 case REAL_TYPE:
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case BOOLEAN_TYPE:
 case COMPLEX_TYPE:
 case VECTOR_TYPE:
@@ -874,6 +875,7 @@ dump_type_prefix (cxx_pretty_printer *pp, tree t, int flags)
 case UNION_TYPE:
 case LANG_TYPE:
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case TYPENAME_TYPE:
 case COMPLEX_TYPE:
 case VECTOR_TYPE:
@@ -997,6 +999,7 @@ dump_type_suffix (cxx_pretty_printer *pp, tree t, int flags)
 case UNION_TYPE:
 case LANG_TYPE:
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case TYPENAME_TYPE:
 case COMPLEX_TYPE:
 case VECTOR_TYPE:
@@ -2810,6 +2813,7 @@ dump_expr (cxx_pretty_printer *pp, tree t, int flags)
 case ENUMERAL_TYPE:
 case REAL_TYPE:
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case BOOLEAN_TYPE:
 case INTEGER_TYPE:
 case COMPLEX_TYPE:
diff --git a/gcc/cp/pt.c 

[PATCH] c++: Fix wrong error with constexpr destructor [PR97427]

2020-11-19 Thread Marek Polacek via Gcc-patches
When I implemented the code to detect modifying const objects in
constexpr contexts, we couldn't have constexpr destructors, so I didn't
consider them.  But now we can and that caused a bogus error in this
testcase: [class.dtor]p5 says that "const and volatile semantics are not
applied on an object under destruction.  They stop being in effect when
the destructor for the most derived object starts." so we have to clear
the TREE_READONLY flag we set on the object after the constructors have
been called to mark it as no-longer-under-construction.  In the ~Foo
call it's now an object under destruction, so don't report those errors.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?

gcc/cp/ChangeLog:

PR c++/97427
* constexpr.c (cxx_set_object_constness): New function.
(cxx_eval_call_expression): Set new_obj for destructors too.
Call cxx_set_object_constness to set/unset TREE_READONLY of
the object under construction/destruction.

gcc/testsuite/ChangeLog:

PR c++/97427
* g++.dg/cpp2a/constexpr-dtor10.C: New test.
---
 gcc/cp/constexpr.c| 49 +--
 gcc/testsuite/g++.dg/cpp2a/constexpr-dtor10.C | 16 ++
 2 files changed, 49 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-dtor10.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 625410327b8..ef37b3043a5 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -2187,6 +2187,27 @@ cxx_eval_thunk_call (const constexpr_ctx *ctx, tree t, 
tree thunk_fndecl,
   non_constant_p, overflow_p);
 }
 
+/* If OBJECT is of const class type, evaluate it to a CONSTRUCTOR and set
+   its TREE_READONLY flag according to READONLY_P.  Used for constexpr
+   'tors to detect modifying const objects in a constexpr context.  */
+
+static void
+cxx_set_object_constness (const constexpr_ctx *ctx, tree object,
+ bool readonly_p, bool *non_constant_p,
+ bool *overflow_p)
+{
+  if (CLASS_TYPE_P (TREE_TYPE (object))
+  && CP_TYPE_CONST_P (TREE_TYPE (object)))
+{
+  /* Subobjects might not be stored in ctx->global->values but we
+can get its CONSTRUCTOR by evaluating *this.  */
+  tree e = cxx_eval_constant_expression (ctx, object, /*lval*/false,
+non_constant_p, overflow_p);
+  if (TREE_CODE (e) == CONSTRUCTOR && !*non_constant_p)
+   TREE_READONLY (e) = readonly_p;
+}
+}
+
 /* Subroutine of cxx_eval_constant_expression.
Evaluate the call expression tree T in the context of OLD_CALL expression
evaluation.  */
@@ -2515,11 +2536,11 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
tree t,
 
   depth_ok = push_cx_call_context (t);
 
-  /* Remember the object we are constructing.  */
+  /* Remember the object we are constructing or destructing.  */
   tree new_obj = NULL_TREE;
-  if (DECL_CONSTRUCTOR_P (fun))
+  if (DECL_CONSTRUCTOR_P (fun) || DECL_DESTRUCTOR_P (fun))
 {
-  /* In a constructor, it should be the first `this' argument.
+  /* In a cdtor, it should be the first `this' argument.
 At this point it has already been evaluated in the call
 to cxx_bind_parameters_in_call.  */
   new_obj = TREE_VEC_ELT (new_call.bindings, 0);
@@ -2656,6 +2677,12 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
  unsigned save_heap_alloc_count = ctx->global->heap_vars.length ();
  unsigned save_heap_dealloc_count = ctx->global->heap_dealloc_count;
 
+ /* If this is a constexpr destructor, the object's const and volatile
+semantics are no longer in effect; see [class.dtor]p5.  */
+ if (new_obj && DECL_DESTRUCTOR_P (fun))
+   cxx_set_object_constness (ctx, new_obj, /*readonly_p=*/false,
+ non_constant_p, overflow_p);
+
  tree jump_target = NULL_TREE;
  cxx_eval_constant_expression (&ctx_with_save_exprs, body,
lval, non_constant_p, overflow_p,
@@ -2686,19 +2713,9 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
 the object is no longer under construction, and its possible
 'const' semantics now apply.  Make a note of this fact by
 marking the CONSTRUCTOR TREE_READONLY.  */
- if (new_obj
- && CLASS_TYPE_P (TREE_TYPE (new_obj))
- && CP_TYPE_CONST_P (TREE_TYPE (new_obj)))
-   {
- /* Subobjects might not be stored in ctx->global->values but we
-can get its CONSTRUCTOR by evaluating *this.  */
- tree e = cxx_eval_constant_expression (ctx, new_obj,
-/*lval*/false,
-non_constant_p,
-overflow_p);
- if 

[PATCH] Additional small changes to support opaque modes

2020-11-19 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey 

After building some larger codes and some C++ codes that use opaque
types, it became clear I needed to go through and look for places where
opaque types and modes needed to be handled. The result is a whole pile
of one-liners.

If bootstrap/regtest passes for ppc64le and x86_64, ok for trunk?

gcc/
* typeclass.h: Add opaque_type_class.
* builtins.c (type_to_class): Identify opaque type class.
* c-family/c-pretty-print.c (c_pretty_printer::simple_type_specifier):
Treat opaque types like other types.
(c_pretty_printer::direct_abstract_declarator): Opaque types are
supported types.
* c/c-aux-info.c (gen_type): Support opaque types.
* cp/error.c (dump_type): Handle opaque types.
(dump_type_prefix): Handle opaque types.
(dump_type_suffix): Handle opaque types.
(dump_expr): Handle opaque types.
* cp/pt.c (tsubst): Allow opaque types in templates.
(unify): Allow opaque types in templates.
* cp/typeck.c (structural_comptypes): Handle comparison
of opaque types.
* dwarf2out.c (is_base_type): Handle opaque types.
(loc_descriptor): Handle opaque modes like VOIDmode/BLKmode.
(gen_type_die_with_usage): Handle opaque types.
* expr.c (count_type_elements): Opaque types should
never have initializers.
* ipa-devirt.c (odr_types_equivalent_p): No type-specific handling
for opaque types is needed as it eventually checks the underlying
mode which is what is important.
* tree-streamer.c (record_common_node): Handle opaque types.
* tree.c (type_contains_placeholder_1): Handle opaque types.
(type_cache_hasher::equal): No additional comparison needed for
opaque types.
---
 gcc/builtins.c| 1 +
 gcc/c-family/c-pretty-print.c | 2 ++
 gcc/c/c-aux-info.c| 4 
 gcc/cp/error.c| 4 
 gcc/cp/pt.c   | 2 ++
 gcc/cp/typeck.c   | 1 +
 gcc/dwarf2out.c   | 4 +++-
 gcc/expr.c| 1 +
 gcc/ipa-devirt.c  | 1 +
 gcc/tree-streamer.c   | 1 +
 gcc/tree.c| 2 ++
 gcc/typeclass.h   | 2 +-
 12 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 42c52a1925e..0958abcae49 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2228,6 +2228,7 @@ type_to_class (tree type)
 case ARRAY_TYPE:  return (TYPE_STRING_FLAG (type)
   ? string_type_class : array_type_class);
 case LANG_TYPE:   return lang_type_class;
+case OPAQUE_TYPE:  return opaque_type_class;
 default:  return no_type_class;
 }
 }
diff --git a/gcc/c-family/c-pretty-print.c b/gcc/c-family/c-pretty-print.c
index 8953e3b678b..3027703056b 100644
--- a/gcc/c-family/c-pretty-print.c
+++ b/gcc/c-family/c-pretty-print.c
@@ -342,6 +342,7 @@ c_pretty_printer::simple_type_specifier (tree t)
   break;
 
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case BOOLEAN_TYPE:
 case INTEGER_TYPE:
 case REAL_TYPE:
@@ -662,6 +663,7 @@ c_pretty_printer::direct_abstract_declarator (tree t)
 
 case IDENTIFIER_NODE:
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case BOOLEAN_TYPE:
 case INTEGER_TYPE:
 case REAL_TYPE:
diff --git a/gcc/c/c-aux-info.c b/gcc/c/c-aux-info.c
index ffc8099856d..41f5598de38 100644
--- a/gcc/c/c-aux-info.c
+++ b/gcc/c/c-aux-info.c
@@ -413,6 +413,10 @@ gen_type (const char *ret_val, tree t, formals_style style)
  data_type = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (t)));
  break;
 
+   case OPAQUE_TYPE:
+ data_type = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (t)));
+ break;
+
case VOID_TYPE:
  data_type = "void";
  break;
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index 396558be17f..d27545d1223 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -529,6 +529,7 @@ dump_type (cxx_pretty_printer *pp, tree t, int flags)
 case INTEGER_TYPE:
 case REAL_TYPE:
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case BOOLEAN_TYPE:
 case COMPLEX_TYPE:
 case VECTOR_TYPE:
@@ -874,6 +875,7 @@ dump_type_prefix (cxx_pretty_printer *pp, tree t, int flags)
 case UNION_TYPE:
 case LANG_TYPE:
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case TYPENAME_TYPE:
 case COMPLEX_TYPE:
 case VECTOR_TYPE:
@@ -997,6 +999,7 @@ dump_type_suffix (cxx_pretty_printer *pp, tree t, int flags)
 case UNION_TYPE:
 case LANG_TYPE:
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case TYPENAME_TYPE:
 case COMPLEX_TYPE:
 case VECTOR_TYPE:
@@ -2810,6 +2813,7 @@ dump_expr (cxx_pretty_printer *pp, tree t, int flags)
 case ENUMERAL_TYPE:
 case REAL_TYPE:
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case BOOLEAN_TYPE:
 case INTEGER_TYPE:
 case COMPLEX_TYPE:
diff --git a/gcc/cp/pt.c 

[PATCH,rs6000] Make MMA builtins use opaque modes [v2]

2020-11-19 Thread Aaron Sawdey via Gcc-patches
For some reason this patch never showed up on gcc-patches.

Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
 

> Begin forwarded message:
> 
> From: acsaw...@linux.ibm.com
> Subject: [PATCH,rs6000] Make MMA builtins use opaque modes [v2]
> Date: November 19, 2020 at 12:58:47 PM CST
> To: gcc-patches@gcc.gnu.org
> Cc: seg...@kernel.crashing.org, wschm...@linux.ibm.com, 
> berg...@linux.ibm.com, Aaron Sawdey 
> 
> From: Aaron Sawdey 
> 
> Segher & Bergner -
>  Thanks for the reviews, here's the updated patch after fixing those things.
> We now have an UNSPEC for xxsetaccz, and an accompanying change to
> rs6000_rtx_costs to make it be cost 0 so that CSE doesn't try to replace it
> with a bunch of register moves.
> 
> If bootstrap/regtest looks good, ok for trunk?
> 
> Thanks,
>Aaron
> 
> gcc/
>   * gcc/config/rs6000/mma.md (unspec): Add assemble/extract UNSPECs.
>   (movoi): Change to movoo.
>   (*movpoi): Change to *movoo.
>   (movxi): Change to movxo.
>   (*movpxi): Change to *movxo.
>   (mma_assemble_pair): Change to OO mode.
>   (*mma_assemble_pair): New define_insn_and_split.
>   (mma_disassemble_pair): New define_expand.
>   (*mma_disassemble_pair): New define_insn_and_split.
>   (mma_assemble_acc): Change to XO mode.
>   (*mma_assemble_acc): Change to XO mode.
>   (mma_disassemble_acc): New define_expand.
>   (*mma_disassemble_acc): New define_insn_and_split.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to OO mode.
>   (mma_): Change to XO/OO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO/OO mode.
>   (mma_): Change to XO/OO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO mode.
>   * gcc/config/rs6000/predicates.md (input_operand): Allow opaque.
>   (mma_disassemble_output_operand): New predicate.
>   * gcc/config/rs6000/rs6000-builtin.def:
>   Changes to disassemble builtins.
>   * gcc/config/rs6000/rs6000-call.c (rs6000_return_in_memory):
>   Disallow __vector_pair/__vector_quad as return types.
>   (rs6000_promote_function_mode): Remove function return type
>   check because we can't test it here any more.
>   (rs6000_function_arg): Do not allow __vector_pair/__vector_quad
>   as function arguments.
>   (rs6000_gimple_fold_mma_builtin):
>   Handle mma_disassemble_* builtins.
>   (rs6000_init_builtins): Create types for XO/OO modes.
>   * gcc/config/rs6000/rs6000-modes.def: Delete OI, XI,
>   POI, and PXI modes, and create XO and OO modes.
>   * gcc/config/rs6000/rs6000-string.c (expand_block_move):
>   Update to OO mode.
>   * gcc/config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok_uncached):
>   Update for XO/OO modes.
>   (rs6000_rtx_costs): Make UNSPEC_MMA_XXSETACCZ cost 0.
>   (rs6000_modes_tieable_p): Update for XO/OO modes.
>   (rs6000_debug_reg_global): Update for XO/OO modes.
>   (rs6000_setup_reg_addr_masks): Update for XO/OO modes.
>   (rs6000_init_hard_regno_mode_ok): Update for XO/OO modes.
>   (reg_offset_addressing_ok_p): Update for XO/OO modes.
>   (rs6000_emit_move): Update for XO/OO modes.
>   (rs6000_preferred_reload_class): Update for XO/OO modes.
>   (rs6000_split_multireg_move): Update for XO/OO modes.
>   (rs6000_mangle_type): Update for opaque types.
>   (rs6000_invalid_conversion): Update for XO/OO modes.
>   * gcc/config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P):
>   Update for XO/OO modes.
>   * gcc/config/rs6000/rs6000.md (RELOAD): Update for XO/OO modes.
> gcc/testsuite/
>   * gcc.target/powerpc/mma-double-test.c (main): Call abort for failure.
>   * gcc.target/powerpc/mma-single-test.c (main): Call abort for failure.
>   * gcc.target/powerpc/pr96506.c: Rename to pr96506-1.c.
>   * gcc.target/powerpc/pr96506-2.c: New test.
> ---
> gcc/config/rs6000/mma.md  | 421 ++
> gcc/config/rs6000/predicates.md   |  12 +
> gcc/config/rs6000/rs6000-builtin.def  |  14 +-
> gcc/config/rs6000/rs6000-call.c   | 142 +++---
> gcc/config/rs6000/rs6000-modes.def|  10 +-
> gcc/config/rs6000/rs6000-string.c |   6 +-
> gcc/config/rs6000/rs6000.c| 193 
> gcc/config/rs6000/rs6000.h|   3 +-
> gcc/config/rs6000/rs6000.md   |   2 +-
> .../gcc.target/powerpc/mma-double-test.c  |   3 +
> .../gcc.target/powerpc/mma-single-test.c  |   3 +
> .../powerpc/{pr96506.c => pr96506-1.c}|  24 -
> gcc/testsuite/gcc.target/powerpc/pr96506-2.c  |  38 ++
> 13 files changed, 508 insertions(+), 363 deletions(-)
> rename 

[r11-5181 Regression] FAIL: gcc.dg/vect/vect-35.c scan-tree-dump vect "can't determine dependence between" on Linux/x86_64

2020-11-19 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

0862d007b564eca8c9a48fca0e689dd3f90db828 is the first bad commit
commit 0862d007b564eca8c9a48fca0e689dd3f90db828
Author: Jan Hubicka 
Date:   Thu Nov 19 20:16:26 2020 +0100

Fix two bugs in operand_equal_p

caused

FAIL: gcc.dg/vect/vect-35-big-array.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-35-big-array.c -flto -ffat-lto-objects  scan-tree-dump 
vect "can't determine dependence between"
FAIL: gcc.dg/vect/vect-35-big-array.c scan-tree-dump-times vect "vectorized 1 
loops" 1
FAIL: gcc.dg/vect/vect-35-big-array.c scan-tree-dump vect "can't determine 
dependence between"
FAIL: gcc.dg/vect/vect-35.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-35.c -flto -ffat-lto-objects  scan-tree-dump vect "can't 
determine dependence between"
FAIL: gcc.dg/vect/vect-35.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-35.c scan-tree-dump vect "can't determine dependence 
between"

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-5181/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-35-big-array.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-35-big-array.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-35-big-array.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-35-big-array.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-35.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-35.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-35.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-35.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me 
at skpgkp2 at gmail dot com)


[PATCH] PowerPC: Add float128/Decimal conversions

2020-11-19 Thread Michael Meissner via Gcc-patches
[PATCH] PowerPC: Add float128/Decimal conversions.

I accidentally posted this patch on an internal IBM mailing list instead of
gcc-patches.

This patch replaces the following two patches:

September 24th, 2020:
Message-ID: <20200924203545.gd31...@ibm-toto.the-meissners.org>

October 22nd, 2020:
Message-ID: <2020100603.ga11...@ibm-toto.the-meissners.org>

This is a simplification of those patches.  Those patches were initially
written before I was using the final glibc 2.32 (Advance Toolchain AT14.0).
Using that glibc together with the previously submitted IEEE patches, I can
simplify the conversions to just use the long double defaults, compiling the
modules for IEEE 128-bit long double.  It works because stdio.h/gcc switches
the sprintf call to __sprintfieee128, and the strtold call to __strtof128.

While most of the Decimal <-> Long double tests now pass when long doubles are
IEEE 128-bit, there is one test that fails:

c-c++-common/dfp/convert-bfp-11.c

This test explicitly expects long double to be IBM 128-bit extended double.  A
later patch will fix this.

If the glibc is not 2.32 or later, this code just compiles to a call to abort.
That way the user won't get unknown reference errors due to the calls to the
glibc 2.32 functions that aren't in previous glibcs.

This patch is one of three critical patches needed to be able to build
compilers where the default is IEEE 128-bit.  The other patches were the
patches to rename the built-in functions, and the patches for prs 97543 and
97643 that were both posted earlier.

I have tested this patch on a little endian power9 system running Linux,
building bootstrap compilers with the 3 long double flavors (long double is
128-bit IEEE, long double is 128-bit IBM, and long double is 64-bit).  There
are no regressions with long double set to 128-bit IBM.

With the exception of convert-bfp-11.c mentioned above, none of the regressions
seen when long double is set to 128-bit IEEE affect the decimal support.

Can I check this into the master branch?

libgcc/
2020-11-17  Michael Meissner  

* config/rs6000/t-float128 (fp128_dec_funcs): New macro.
(ibm128_dec_funcs): New macro.
(fp128_ppc_funcs): Add the Decimal <-> __float128 conversions.
(fp128_dec_objs): Force Decimal <-> __float128 conversions to be
compiled with -mabi=ieeelongdouble.
(ibm128_dec_objs): Force Decimal <-> __float128 conversions to be
compiled with -mabi=ieeelongdouble.
(FP128_CFLAGS_DECIMAL): New macro.
(IBM128_CFLAGS_DECIMAL): New macro.
* config/rs6000/_dd_to_kf.c: New file.
* config/rs6000/_kf_to_dd.c: New file.
* config/rs6000/_kf_to_sd.c: New file.
* config/rs6000/_kf_to_td.c: New file.
* config/rs6000/_sd_to_kf.c: New file.
* config/rs6000/_td_to_kf.c: New file.
---
 libgcc/config/rs6000/_dd_to_kf.c | 58 +++
 libgcc/config/rs6000/_kf_to_dd.c | 57 ++
 libgcc/config/rs6000/_kf_to_sd.c | 58 +++
 libgcc/config/rs6000/_kf_to_td.c | 56 ++
 libgcc/config/rs6000/_sd_to_kf.c | 59 
 libgcc/config/rs6000/_td_to_kf.c | 58 +++
 libgcc/config/rs6000/t-float128  | 26 +-
 7 files changed, 371 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/rs6000/_dd_to_kf.c
 create mode 100644 libgcc/config/rs6000/_kf_to_dd.c
 create mode 100644 libgcc/config/rs6000/_kf_to_sd.c
 create mode 100644 libgcc/config/rs6000/_kf_to_td.c
 create mode 100644 libgcc/config/rs6000/_sd_to_kf.c
 create mode 100644 libgcc/config/rs6000/_td_to_kf.c

diff --git a/libgcc/config/rs6000/_dd_to_kf.c b/libgcc/config/rs6000/_dd_to_kf.c
new file mode 100644
index 000..93601fa280e
--- /dev/null
+++ b/libgcc/config/rs6000/_dd_to_kf.c
@@ -0,0 +1,58 @@
+/* Copyright (C) 1989-2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+/* Decimal64 -> _Float128 conversion.  */
+
+/* FINE_GRAINED_LIBRARIES is used so we can isolate just to dd_to_tf 

[PATCH] PowerPC: Set long double size for IBM/IEEE.

2020-11-19 Thread Michael Meissner via Gcc-patches
[PATCH] PowerPC: Set long double size for IBM/IEEE.

I originally posted this patch to an internal IBM mailing list instead of
gcc-patches.

As I was working with compilers where the long double default was 64-bit, it
became annoying to have to use two options to switch to one of the 128-bit long
double types (i.e. you need both -mlong-double-128 and the
-mabi={ieee,ibm}longdouble to switch the long double type).

I did this patch so that if you explicitly set the long double ABI via the
-mabi= option, it would automatically set the long double size if that was not
set explicitly.
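
As a rough model of the rule described above, the selection can be sketched as
follows.  This is only an illustration under stated assumptions, not the code
in rs6000_option_override_internal: LongDoubleOptions and
effective_long_double_size are hypothetical names, and the literal 128 stands
in for the TFmode precision that the real code uses.

```cpp
#include <cassert>

// Simplified stand-in for the real option state; all names are hypothetical.
struct LongDoubleOptions {
  int configured_default_size;  // 64 or 128, chosen when GCC was configured
  bool abi_given;               // -mabi=ieeelongdouble or -mabi=ibmlongdouble seen
  int explicit_size;            // value from -mlong-double-N, or 0 if not given
};

int
effective_long_double_size (const LongDoubleOptions &o)
{
  assert (o.configured_default_size == 64 || o.configured_default_size == 128);

  // An explicit -mlong-double-N always wins.
  if (o.explicit_size != 0)
    return o.explicit_size;

  // Stay at 64 bits only if that is the configured default and the user
  // did not ask for one of the 128-bit ABIs.
  if (o.configured_default_size == 64 && !o.abi_given)
    return 64;

  return 128;
}
```

With a compiler configured for 64-bit long double, this gives 64 by default
and 128 as soon as one of the -mabi= long double options is given on its own.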

gcc/
2020-11-17  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_option_override_internal): If the
user explicitly used -mabi=ieeelongdouble or -mabi=ibmlongdouble,
set the long double size to 128.
* doc/invoke.texi (PowerPC options): Document that an explicit
-mabi=ieeelongdouble or -mabi=ibmlongdouble implicitly sets
-mlong-double-128.
---
 gcc/config/rs6000/rs6000.c | 9 +++--
 gcc/doc/invoke.texi| 7 ---
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 35e9c844e17..6edd17a0b69 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4131,8 +4131,13 @@ rs6000_option_override_internal (bool global_init_p)
 
   /* Use long double size to select the appropriate long double.  We use
  TYPE_PRECISION to differentiate the 3 different long double types.  We map
- 128 into the precision used for TFmode.  */
-  int default_long_double_size = (RS6000_DEFAULT_LONG_DOUBLE_SIZE == 64
+ 128 into the precision used for TFmode.
+
+ If the user explicitly used -mabi=ieeelongdouble or -mabi=ibmlongdouble,
+ but the compiler was configured for default 64-bit long doubles, set the
+ long double to be 128.  */
+  int default_long_double_size = ((RS6000_DEFAULT_LONG_DOUBLE_SIZE == 64
+  && !global_options_set.x_rs6000_ieeequad)
  ? 64
  : FLOAT_PRECISION_TFmode);
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3510a54c6c4..89d530f1d1e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -27113,7 +27113,8 @@ Change the current ABI to use IBM extended-precision 
long double.
 This is not likely to work if your system defaults to using IEEE
 extended-precision long double.  If you change the long double type
 from IEEE extended-precision, the compiler will issue a warning unless
-you use the @option{-Wno-psabi} option.  Requires @option{-mlong-double-128}
+you use the @option{-Wno-psabi} option.  If this option is used, it
+will implicitly enable @option{-mlong-double-128}.
 to be enabled.
 
 @item -mabi=ieeelongdouble
@@ -27122,8 +27123,8 @@ Change the current ABI to use IEEE extended-precision 
long double.
 This is not likely to work if your system defaults to using IBM
 extended-precision long double.  If you change the long double type
 from IBM extended-precision, the compiler will issue a warning unless
-you use the @option{-Wno-psabi} option.  Requires @option{-mlong-double-128}
-to be enabled.
+you use the @option{-Wno-psabi} option.  If this option is used, it
+will implicitly enable @option{-mlong-double-128}.
 
 @item -mabi=elfv1
 @opindex mabi=elfv1
-- 
2.22.0


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH] PowerPC: Map IEEE 128-bit long double built-in functions

2020-11-19 Thread Michael Meissner via Gcc-patches
[PATCH] PowerPC: Map IEEE 128-bit long double built-in functions.

I posted this patch by accident to an internal IBM mailing list instead of
gcc-patches.

This patch replaces patches previously submitted:

September 24th, 2020:
Message-ID: <20200924203159.ga31...@ibm-toto.the-meissners.org>

October 9th, 2020:
Message-ID: <20201009043543.ga11...@ibm-toto.the-meissners.org>

October 24th, 2020:
Message-ID: <2020100346.ga8...@ibm-toto.the-meissners.org>

This patch maps the built-in functions that take or return long double
arguments on systems where long double is IEEE 128-bit.

This patch goes through the built-in functions and changes the name of the
math, scanf, and printf built-in functions to use the functions that GLIBC
provides when long double uses the IEEE 128-bit representation.

In addition, changing the name in GCC allows the Fortran compiler to
automatically use the correct name.

To map the math functions, typically this patch changes <name>l to
__<name>ieee128.  However there are some exceptions that are handled with this
patch.

To map the printf functions, <name> is mapped to __<name>ieee128.

To map the scanf functions, <name> is mapped to __isoc99_<name>ieee128.
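
The general naming scheme can be sketched as below.  This is a stand-alone
illustration, not the code in rs6000_mangle_decl_assembler_name: BuiltinKind
and ieee128_name are hypothetical names, the scheme is inferred from examples
such as sprintf becoming __sprintfieee128, and the exceptions with unusual
names (for instance the __remainderieee128 case) are deliberately not modeled.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of the built-in being renamed.
enum class BuiltinKind { Math, Printf, Scanf };

// Return the assumed IEEE 128-bit long double name for NAME.
std::string
ieee128_name (const std::string &name, BuiltinKind kind)
{
  assert (!name.empty ());

  switch (kind)
    {
    case BuiltinKind::Math:
      // Drop the trailing 'l' of the long double variant, e.g. "sinl".
      if (name.size () > 1 && name.back () == 'l')
        return "__" + name.substr (0, name.size () - 1) + "ieee128";
      return name;  // not a long double math name; leave it alone

    case BuiltinKind::Printf:
      return "__" + name + "ieee128";

    case BuiltinKind::Scanf:
      return "__isoc99_" + name + "ieee128";
    }

  return name;
}
```

For example, ieee128_name ("sinl", BuiltinKind::Math) yields "__sinieee128"
under this scheme.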

With the other IEEE long double patches, I have tested this patch by building 3
bootstrap compilers on a little endian power9 system, using the Advance
Toolchain AT14.0 library, which uses GLIBC 2.32:

1)  One compiler defaulted long double to IBM extended double;
2)  One compiler defaulted long double to IEEE 128-bit; (and)
3)  One compiler defaulted long double to 64 bit.

I was able to bootstrap each compiler and run make check.  In addition for the
compilers using the two 128-bit long double types (IBM, IEEE), I have built the
spec 2017 benchmark for both power9 and power10.

At the moment, there are some differences between the three runs for
make check.  I have some patches to fix these issues that I've done in the past,
and I will be working on resubmitting them in the future.

In addition, there are 3 fortran benchmarks (ieee/large_2.f90,
default_format_2.f90, and default_format_denormal_2.f90) that now pass when the
long double default is IEEE 128-bit.

Can I check this into the master branch?

gcc/
2020-11-17  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_mangle_decl_assembler_name): Add
support for mapping built-in function names for long double
built-in functions if long double is IEEE 128-bit.

gcc/testsuite/
2020-11-17  Michael Meissner  

* gcc.target/powerpc/float128-longdouble-math.c: New test.
* gcc.target/powerpc/float128-longdouble-stdio.c: New test.
* gcc.target/powerpc/float128-math.c: Adjust test for new name
being generated.  Add support for running test on power10.  Add
support for running if long double defaults to 64-bits.
---
 gcc/config/rs6000/rs6000.c| 135 --
 .../powerpc/float128-longdouble-math.c| 442 ++
 .../powerpc/float128-longdouble-stdio.c   |  36 ++
 .../gcc.target/powerpc/float128-math.c|  16 +-
 4 files changed, 589 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-longdouble-math.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-longdouble-stdio.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a5188553593..35e9c844e17 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -27065,57 +27065,128 @@ rs6000_globalize_decl_name (FILE * stream, tree decl)
library before you can switch the real*16 type at compile time.
 
We use the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change this name.  We
-   only do this if the default is that long double is IBM extended double, and
-   the user asked for IEEE 128-bit.  */
+   only do this transformation if the __float128 type is enabled.  This
+   prevents us from doing the transformation on older 32-bit ports that might
+   have enabled using IEEE 128-bit floating point as the default long double
+   type.  */
 
 static tree
 rs6000_mangle_decl_assembler_name (tree decl, tree id)
 {
-  if (!TARGET_IEEEQUAD_DEFAULT && TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
+  if (TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
   && TREE_CODE (decl) == FUNCTION_DECL
-  && DECL_IS_UNDECLARED_BUILTIN (decl))
+  && DECL_IS_UNDECLARED_BUILTIN (decl)
+  && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
 {
   size_t len = IDENTIFIER_LENGTH (id);
   const char *name = IDENTIFIER_POINTER (id);
+  char *newname = NULL;
 
-  if (name[len - 1] == 'l')
+  /* See if it is one of the built-in functions with an unusual name.  */
+  switch (DECL_FUNCTION_CODE (decl))
{
- bool uses_ieee128_p = false;
- tree type = TREE_TYPE (decl);
- machine_mode ret_mode = TYPE_MODE (type);
+   case BUILT_IN_DREML:
+ newname = xstrdup ("__remainderieee128");
+ 

Re: [PATCH] Check calls before loop unrolling

2020-11-19 Thread Segher Boessenkool
On Thu, Nov 19, 2020 at 03:30:37PM -0700, Jeff Law wrote:
> > No, the vast majority of people will *not* (consciously) use them,
> > because the target defaults will set things to useful values.
> >
> > The compiler could use saner "generic" defaults perhaps, but those will
> > still not be satisfactory for anyone (except when they aren't generic in
> > fact but instead tuned for one arch ;-) ) -- unrolling is just too
> > important for performance.
> Then fix the heuristics, don't add new PARAMS :-)

I just said that cannot work?

> It didn't even occur to me until now that you may be pushing to have the
> ppc backend have different values for the PARAMS.  I would strongly
> discourage that.  It's been a huge headache in the s390 backend already.

It also makes a huge performance difference.  That the generic parts
of GCC are only tuned for x86 (or not well tuned for anything?) is a
huge roadblock for us.

I am not saying we should have six hundred different tunings.  But we
need a few (and we already *have* a few, not params but generic flags,
just like many other targets fwiw).

We *do* have a few custom param settings already, just like aarch64,
ia64, and sh, actually.

> >> In  my mind fixing things so they work with no magic arguments is best. 
> >> PARAMS are the worst solution.  A -f flag with no arguments is somewhere
> >> in between.  Others may clearly have different opinions here.
> > There is no big difference between params and flags here, IMO -- it has
> > to be a -f with a value as well, for good results.
> Which is a signal that we have a deeper problem.  -f with a value is no
> different than a param.

Yes exactly.

> > Since we have (almost) all such tunings in --param already, I'd say this
> > one belongs there as well?
> I'm not convinced at this point. 

Why not?

We have way too many params, yes.  But the first step to counteract that
would be to deprecate and get rid of many existing ones, not to block
having new ones which can be useful (while many of the existing ones are
not).

Or, we could accept that it is not really a problem at all.  You seem to
have a strong opinion that it *is*, but I don't understand that; maybe
you can explain a bit more?

Thanks,


Segher


[PATCH] PowerPC: PR 97791: Fix gnu attributes.

2020-11-19 Thread Michael Meissner via Gcc-patches
[PATCH] PowerPC: PR 97791: Fix gnu attributes.

Note, I originally posted this to an internal IBM mailing list, not to
gcc-patches.  Sorry about that.

This patch does two things to fix setting gnu attribute #4 (long double status):

1) Only set gnu attribute #4 if long double was passed.  Passing __float128
when long double is IBM or __ibm128 when long double is IEEE no longer sets the
attribute.  Setting it in those cases resulted in a lot of false positives,
such as code using __float128 with no long double support.

2) Do not set the gnu attribute if a mode used by long double (TF or DF) is
used in a move.  The moves do not differentiate between the long double type
and similar types.  Delete the three tests that tested this.

I wrote the code for the move several years ago.  I wanted an object that
used the appropriate long double type to get flagged.  Unfortunately, at the
RTL level, we have lost the type nodes, so we can't tell the difference between
two types that use the same mode (for instance if long double is 64-bit, the
attribute would be set if you used normal doubles, and not long doubles).  Alan
Modra and I discussed this, and we think this is just the right thing to do.
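
The problem with the mode-based check can be shown with a toy model.  This is
only a sketch: Mode, Type, flagged_by_mode and flagged_by_type are hypothetical
names and have nothing to do with GCC's real tree or RTL data structures.  It
assumes a configuration where long double is 64-bit.

```cpp
#include <cassert>

enum class Mode { DF, TF };   // stand-ins for machine modes

struct Type {
  const char *name;
  Mode mode;
};

// Hypothetical configuration: long double is 64-bit, so it shares a mode
// with plain double.
static const Type double_type = { "double", Mode::DF };
static const Type long_double_type = { "long double", Mode::DF };

// Mode-based check: all that can be seen once the type nodes are gone.
bool
flagged_by_mode (const Type &t)
{
  return t.mode == long_double_type.mode;
}

// Type-based check: compare against the long double type itself.
bool
flagged_by_type (const Type &t)
{
  return &t == &long_double_type;
}

void
demo ()
{
  assert (flagged_by_mode (double_type));    // false positive for double
  assert (!flagged_by_type (double_type));   // type check is not fooled
  assert (flagged_by_type (long_double_type));
}
```

In this model a plain double trips the mode-based check even though no long
double is involved, which is the false positive described above.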

It has been tested on power8 big endian Linux server systems and power9 little
endian Linux server systems, and there were no regressions.

gcc/
2020-11-17  Michael Meissner  

PR gcc/97791
* config/rs6000/rs6000-call.c (init_cumulative_args): Only set
that long double was returned if the type is actually long
double.
(rs6000_function_arg_advance_1): Only set that long double was
passed if the type is actually long double.
* config/rs6000/rs6000.c (rs6000_emit_move): Delete code that sets
whether long double was passed based on the modes used in moves.

gcc/testsuite/
2020-11-17  Michael Meissner  

PR target/97791
* gcc.target/powerpc/gnuattr1.c: Delete.
* gcc.target/powerpc/gnuattr2.c: Delete.
* gcc.target/powerpc/gnuattr3.c: Delete.
---
 gcc/config/rs6000/rs6000-call.c | 13 -
 gcc/config/rs6000/rs6000.c  | 17 -
 gcc/testsuite/gcc.target/powerpc/gnuattr1.c | 15 ---
 gcc/testsuite/gcc.target/powerpc/gnuattr2.c | 17 -
 gcc/testsuite/gcc.target/powerpc/gnuattr3.c | 15 ---
 5 files changed, 4 insertions(+), 73 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/gnuattr1.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/gnuattr2.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/gnuattr3.c

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 3bd89a79bad..8294e22fb85 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -6539,11 +6539,8 @@ init_cumulative_args (CUMULATIVE_ARGS *cum, tree fntype,
{
  rs6000_passes_float = true;
  if ((HAVE_LD_PPC_GNU_ATTR_LONG_DOUBLE || TARGET_64BIT)
- && (FLOAT128_IBM_P (return_mode)
- || FLOAT128_IEEE_P (return_mode)
- || (return_type != NULL
- && (TYPE_MAIN_VARIANT (return_type)
- == long_double_type_node))))
+ && return_type != NULL
+ && TYPE_MAIN_VARIANT (return_type) == long_double_type_node)
rs6000_passes_long_double = true;
 
  /* Note if we passed or return a IEEE 128-bit type.  We changed
@@ -7001,10 +6998,8 @@ rs6000_function_arg_advance_1 (CUMULATIVE_ARGS *cum, machine_mode mode,
{
  rs6000_passes_float = true;
  if ((HAVE_LD_PPC_GNU_ATTR_LONG_DOUBLE || TARGET_64BIT)
- && (FLOAT128_IBM_P (mode)
- || FLOAT128_IEEE_P (mode)
- || (type != NULL
- && TYPE_MAIN_VARIANT (type) == long_double_type_node)))
+ && type != NULL
+ && TYPE_MAIN_VARIANT (type) == long_double_type_node)
rs6000_passes_long_double = true;
 
  /* Note if we passed or return a IEEE 128-bit type.  We changed the
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b6fd21a5d6f..a5188553593 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -10076,23 +10076,6 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
   && GET_MODE_BITSIZE (mode) <= HOST_BITS_PER_WIDE_INT)
 gcc_unreachable ();
 
-#ifdef HAVE_AS_GNU_ATTRIBUTE
-  /* If we use a long double type, set the flags in .gnu_attribute that say
- what the long double type is.  This is to allow the linker's warning
- message for the wrong long double to be useful, even if the function does
- not do a call (for example, doing a 128-bit add on power9 if the long
- double type is IEEE 128-bit.  Do not set this if __ibm128 or __floa128 are
- used if they aren't the default long dobule type.  */
-  if 

Re: [PATCH] ranger: Improve a % b operand ranges [PR91029]

2020-11-19 Thread David Malcolm via Gcc-patches
On Thu, 2020-11-19 at 23:41 +0100, Jakub Jelinek via Gcc-patches wrote:
> Hi!
> 
> As mentioned in the PR, the previous PR91029 patch was testing
> op2 >= 0, which is unnecessary; even negative op2 values will work the same.
> Furthermore, from a % b > 0 we can deduce a > 0 rather than just a >= 0
> (0 % b would be 0), and this is actually valid even for constants other
> than 0: a % b > 5 means a > 5 (a % b has the same sign as a, and a in [0, 5]
> would result in a % b in [0, 5]).  Also, we can deduce a range for the other
> operand: if we know a % b >= 20, then b must be (in absolute value for signed
> modulo) > 20, since for a % [0, 20] the result would be in [0, 19].
> 
> The following patch implements all of that, bootstrapped/regtested on
> x86_64-linux and i686-linux, ok for trunk?
> 
> 2020-11-19  Jakub Jelinek  
> 
>   PR tree-optimization/91029
>   * range-op.cc (operator_trunc_mod::op1_range): Don't require signed
>   types, nor require that op2 >= 0.  Implement (a % b) >= x && x > 0
>   implies a >= x and (a % b) <= x && x < 0 implies a <= x.
>   (operator_trunc_mod::op2_range): New method.
> 
>   * gcc.dg/tree-ssa/pr91029-1.c: New test.
>   * gcc.dg/tree-ssa/pr91029-2.c: New test.
> 
> --- gcc/range-op.cc.jj  2020-11-19 20:09:39.531862131 +0100
> +++ gcc/range-op.cc   2020-11-19 20:44:24.507774154 +0100
> @@ -2637,6 +2637,9 @@ public:
>    virtual bool op1_range (irange &r, tree type,
>                            const irange &lhs,
>                            const irange &op2) const;
> +  virtual bool op2_range (irange &r, tree type,
> +                          const irange &lhs,
> +                          const irange &op1) const;
>  } op_trunc_mod;

Should these various overrides of vfuncs be labeled "OVERRIDE" rather
than "virtual", to use the override specifier?  In fact, given that we
now require C++11, presumably we can spell that as "override" and lose
the macro.
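
As a small illustration of what the override specifier buys: the compiler rejects a declaration marked override if it does not match a virtual function in a base class (for example, a dropped const).  This is a minimal sketch with hypothetical class names and simplified int parameters, not the real range_operator interface.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for a range operator base class.
struct range_operator_sketch
{
  virtual ~range_operator_sketch () {}

  // Default implementation: nothing is known about operand 2.
  virtual bool op2_range (int &r, int lhs) const
  {
    (void) r;
    (void) lhs;
    return false;
  }
};

struct trunc_mod_sketch : range_operator_sketch
{
  // With "override", a signature mismatch (say, omitting "const") becomes a
  // compile-time error instead of silently declaring a new function.
  bool op2_range (int &r, int lhs) const override
  {
    r = lhs + 1;
    return true;
  }
};

// Call through the base class so the virtual dispatch is exercised.
inline bool call_op2_range (const range_operator_sketch &op, int &r, int lhs)
{
  return op.op2_range (r, lhs);
}
```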

Dave



[PATCH] PowerPC: PR libgcc/97543, fix 64-bit long double issues

2020-11-19 Thread Michael Meissner via Gcc-patches
PowerPC: PR libgcc/97543, fix 64-bit long double issues

I meant to post this to the gcc-patches mailing list last Thursday, but I see I
posted this to an internal IBM mailing list.

This patch replaces the previous iterations of this patch:

October 22nd, 2020:
Message-ID: <2020100510.ga11...@ibm-toto.the-meissners.org>

October 28th, 2020:
Message-ID: <20201029004204.ga15...@ibm-toto.the-meissners.org>

If you use a compiler with long double defaulting to 64-bit instead of 128-bit
with IBM extended double, you get linker warnings about mis-matches in the gnu
attributes for long double (PR libgcc/97543).  Even if the compiler is
configured to have long double be 64 bit as the default with the configuration
option '--without-long-double-128' you get the warnings.

You also get the same issues if you use a compiler with long double defaulting
to IEEE 128-bit instead of IBM extended double (PR libgcc/97643).

The issue is the way libgcc.a/libgcc.so is built.  Right now, when building
libgcc under Linux, the long double size is set to 128 bits.  However, the gnu
attributes are still set, leading to the warnings.

One feature of the current GNU attribute implementation is that if you have a
shared library (such as libgcc_s.so), the GNU attributes for the shared library
are an inclusive OR of the attributes of all of the modules within the library.
This means if any module uses the -mlong-double-128 option and uses long
double, the GNU attributes for the library will indicate that it uses 128-bit
IBM long doubles.  If you have a static library, you will get the warning only
if you actually reference a module with the attribute set.
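
The difference between the shared and static cases can be modelled roughly as follows.  This is only a sketch of the behavior described above: the flag values, the module_sketch struct and both function names are hypothetical, and the real .gnu_attribute encoding and linker logic are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flag bits; the real .gnu_attribute values differ.
enum : unsigned
{
  LD_NONE = 0,
  LD_IBM128 = 1u << 0,
  LD_IEEE128 = 1u << 1,
  LD_64 = 1u << 2
};

struct module_sketch
{
  unsigned long_double_attr;  // attribute recorded in this object file
  bool referenced;	      // whether the link pulls this module in
};

// Shared library: every module contributes, referenced or not.
inline unsigned shared_library_attr (const std::vector<module_sketch> &mods)
{
  unsigned attr = LD_NONE;
  for (std::size_t i = 0; i < mods.size (); ++i)
    attr |= mods[i].long_double_attr;
  return attr;
}

// Static library: only the modules actually referenced contribute.
inline unsigned static_library_attr (const std::vector<module_sketch> &mods)
{
  unsigned attr = LD_NONE;
  for (std::size_t i = 0; i < mods.size (); ++i)
    if (mods[i].referenced)
      attr |= mods[i].long_double_attr;
  return attr;
}
```

With one unreferenced module marked LD_IBM128, the shared model reports LD_IBM128 for the whole library while the static model reports nothing, which matches why the warnings show up more readily with libgcc_s.so.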

This patch does two things:

1)  All of the modules that support IBM 128-bit long doubles explicitly set
the ABI to IBM extended double.

2) I turned off GNU attributes for building the shared library or for
building the IBM 128-bit long double support.

I have discussed this patch with Alan Modra, and made several changes based on
his suggestions.

I have tested this on a little endian power9 system running Linux by building
three separate compilers, using the Advance Toolchain AT14.0 which uses GLIBC
2.32:

1)  A compiler where the long double default is IBM 128-bit double;
2)  A compiler where the long double default is IEEE 128-bit double; (and)
3)  A compiler where the long double default is 64-bit.

Note, for the IEEE build (#2), the other patches that I will be submitting are
needed to enable the full build.

For each of the 3 compilers, I then tested some code with long doubles and
verified that each of the long double options worked without generating
warnings.

In addition, I have tested this patch on a big endian power8 system running
Linux, and there were no regressions.

Can I install this patch into the master branch?  Since this is a bug for
64-bit long doubles, I would like to back port it to GCC 10, and GCC 9 after a
shake-in period.

libgcc/
2020-11-17  Michael Meissner  

PR libgcc/97543
PR libgcc/97643
* config/rs6000/t-linux (IBM128_STATIC_OBJS): New make variable.
(IBM128_SHARED_OBJS): New make variable.
(IBM128_OBJS): New make variable.  Set all objects to use the
explicit IBM format, and disable gnu attributes.
(IBM128_CFLAGS): New make variable.
(gcc_s_compile): Add -mno-gnu-attribute to all shared library
modules.
---
 libgcc/config/rs6000/t-linux | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/libgcc/config/rs6000/t-linux b/libgcc/config/rs6000/t-linux
index ed821947b66..72e9c2770a6 100644
--- a/libgcc/config/rs6000/t-linux
+++ b/libgcc/config/rs6000/t-linux
@@ -6,3 +6,25 @@ HOST_LIBGCC2_CFLAGS += -mlong-double-128
 # smaller and faster libgcc code.  Directly specifying -mcmodel=small
 # would need to take into account targets for which -mcmodel is invalid.
 HOST_LIBGCC2_CFLAGS += -mno-minimal-toc
+
+# On the modules that deal with IBM 128-bit values, make sure that TFmode uses
+# the IBM extended double format.  Also turn off gnu attributes on the static
+# modules.
+IBM128_STATIC_OBJS = ibm-ldouble$(objext) _powitf2$(objext) \
+ ppc64-fp$(objext) _divtc3$(objext) _multc3$(objext) \
+ _fixtfdi$(objext) _fixunstfdi$(objext) \
+ _floatditf$(objext) _floatunsditf$(objext)
+IBM128_SHARED_OBJS = $(IBM128_STATIC_OBJS:$(objext)=_s$(objext))
+IBM128_OBJS= $(IBM128_STATIC_OBJS) $(IBM128_SHARED_OBJS)
+
+IBM128_CFLAGS  = -Wno-psabi -mabi=ibmlongdouble -mno-gnu-attribute
+
+$(IBM128_OBJS) : INTERNAL_CFLAGS += $(IBM128_CFLAGS)
+
+# Turn off gnu attributes for long double size on all of the shared library
+# modules, but leave it on for the static modules, except for the functions
+# that explicitly process IBM 128-bit floating point.  Shared libraries only
+# have one gnu attribute for the whole library, and it can lead to warnings if
+# 

Re: [PATCH] configury: --enable-link-serialization support

2020-11-19 Thread Eric Botcazou
> Successfully bootstrapped/regtested on x86_64-linux and i686-linux,
> including make install which looked problematic in PR97911.
> 
> Ok for trunk?

I cannot really approve, but this looks like a step in the right direction.

-- 
Eric Botcazou




Re: [PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-11-19 Thread Segher Boessenkool
On Wed, Nov 04, 2020 at 08:44:03AM -0800, Carl Love wrote:
> +#define vec_mulh(a, b) __builtin_vec_mulh (a, b)
> +#define vec_div(a, b) __builtin_vec_div (a, b)
> +#define vec_dive(a, b) __builtin_vec_dive (a, b)
> +#define vec_mod(a, b) __builtin_vec_mod (a, b)

This should be

#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))

etc...  I see we have quite a few cases in altivec.h already that do not
get that right.  Something to fix, and apparently not too important in
practice ;-)
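
For reference, the general hazard that argument parentheses guard against looks like this.  It is a minimal sketch with hypothetical macro names.  For a macro that only forwards its arguments into a function call, as above, the unparenthesized form is much harder to break, which fits the "not too important in practice" remark.

```cpp
#include <cassert>

// Without parentheses the argument text is pasted in as-is, so the
// precedence of the surrounding expression applies to it.
#define DOUBLE_UNSAFE(x) (x * 2)
#define DOUBLE_SAFE(x) ((x) * 2)

// Expands to (1 + 2 * 2).
inline int unsafe_result () { return DOUBLE_UNSAFE (1 + 2); }

// Expands to ((1 + 2) * 2).
inline int safe_result () { return DOUBLE_SAFE (1 + 2); }
```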

>  ;; Short vec int modes
>  (define_mode_iterator VIshort [V8HI V16QI])
> -;; Longer vec int modes for rotate/mask ops
> -(define_mode_iterator VIlong [V2DI V4SI])

Hrm, you move this one to vsx.md, but leave VIshort here (instead of
moving that to altivec.md).  Oh well, something needs to be done about
this split anyway.

> +BU_P10V_AV_2 (VDIVES_V4SI, "vdivesw", CONST, vdives_v4si)
> +BU_P10V_AV_2 (VDIVES_V2DI, "vdivesd", CONST, vdives_v2di)
> +BU_P10V_AV_2 (VDIVEU_V4SI, "vdiveuw", CONST, vdiveu_v4si)
> +BU_P10V_AV_2 (VDIVEU_V2DI, "vdiveud", CONST, vdiveu_v2di)
> +BU_P10V_AV_2 (VDIVS_V4SI, "vdivsw", CONST, divv4si3)
> +BU_P10V_AV_2 (VDIVS_V2DI, "vdivsd", CONST, divv2di3)
> +BU_P10V_AV_2 (VDIVU_V4SI, "vdivuw", CONST, udivv4si3)
> +BU_P10V_AV_2 (VDIVU_V2DI, "vdivud", CONST, udivv2di3)
> +BU_P10V_AV_2 (VMODS_V2DI, "vmodsd", CONST, vmods_v2di)
> +BU_P10V_AV_2 (VMODS_V4SI, "vmodsw", CONST, vmods_v4si)
> +BU_P10V_AV_2 (VMODU_V2DI, "vmodud", CONST, vmodu_v2di)
> +BU_P10V_AV_2 (VMODU_V4SI, "vmoduw", CONST, vmodu_v4si)
> +BU_P10V_AV_2 (VMULHS_V2DI, "vmulhsd", CONST, vmulhs_v2di)
> +BU_P10V_AV_2 (VMULHS_V4SI, "vmulhsw", CONST, vmulhs_v4si)
> +BU_P10V_AV_2 (VMULHU_V2DI, "vmulhud", CONST, vmulhu_v2di)
> +BU_P10V_AV_2 (VMULHU_V4SI, "vmulhuw", CONST, vmulhu_v4si)
> +BU_P10V_AV_2 (VMULLD_V2DI, "vmulld", CONST, mulv2di3)

So I would remove the leading "v" from all these pattern names, since
all of them have a mode in the name already.

> +(define_mode_attr VIlong_char [(V2DI "d")
> +(V4SI "w")])

This is just a subset of <wd> -- use that, instead?

; A generic w/d attribute, for things like cmpw/cmpd.
(define_mode_attr wd [(QI    "b")
                      (HI    "h")
                      (SI    "w")
                      (DI    "d")
                      (V16QI "b")
                      (V8HI  "h")
                      (V4SI  "w")
                      (V2DI  "d")
                      (V1TI  "q")
                      (TI    "q")])

(never mind the name, heh -- it still is nice and short ;-) )

> +(define_insn "vmulhs_<mode>"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +UNSPEC_VMULHS))]
> +  "TARGET_POWER10"
> +  "vmulhs<VIlong_char> %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])

The scalar mulh we can describe without unspecs, cannot that be done
here as well?

The type attr is problematic...  At least make it the same as the other
vector int multiplies?  That is veccomplex?

> +Vector Integer Multiply-Divide-Modulo

Use "/" instead of "-" here?  "-" normally is used for things like
"multiply-sum", not to mean "or".

> +For each integer value i from 0 to 3, do the following. The integer value in
> +word element i of a is multiplied by the integer value in word
> +element i of b. The high-order 32 bits of the 64-bit product are placed into
> +word element i of the vector returned.

I think you should quote the "i"?  @code{i} or similar.  I don't think
you need to mark up the digits, phew :-)
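
The quoted description of the word variant can be modelled in scalar code as below.  This is a sketch of the documented semantics only, not of the builtin's implementation.  The function names are hypothetical, and the right shift of a negative 64-bit value is assumed to be arithmetic, which holds on mainstream compilers but is only guaranteed from C++20 on.

```cpp
#include <cassert>
#include <cstdint>

// One signed word element: the high-order 32 bits of the 64-bit product.
// Assumes >> on a negative int64_t is an arithmetic shift.
inline int32_t mulh_word (int32_t a, int32_t b)
{
  int64_t product = static_cast<int64_t> (a) * static_cast<int64_t> (b);
  return static_cast<int32_t> (product >> 32);
}

// Apply the element operation to word elements 0 to 3.
inline void mulh_v4si_model (const int32_t a[4], const int32_t b[4],
			     int32_t out[4])
{
  for (int i = 0; i < 4; ++i)
    out[i] = mulh_word (a[i], b[i]);
}
```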

Please repost with those things fixed?  Thanks!


Segher


Re: [PATCH] ranger: Improve a % b operand ranges [PR91029]

2020-11-19 Thread Andrew MacLeod via Gcc-patches

On 11/19/20 5:41 PM, Jakub Jelinek wrote:

Hi!

As mentioned in the PR, the previous PR91029 patch was testing
op2 >= 0, which is unnecessary; even negative op2 values will work the same.
Furthermore, from a % b > 0 we can deduce a > 0 rather than just a >= 0
(0 % b would be 0), and this is actually valid even for constants other than 0:
a % b > 5 means a > 5 (a % b has the same sign as a, and a in [0, 5] would
result in a % b in [0, 5]).  Also, we can deduce a range for the other
operand: if we know a % b >= 20, then b must be (in absolute value for signed
modulo) > 20, since for a % [0, 20] the result would be in [0, 19].

The following patch implements all of that, bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?



OK.

I was having a hard time keeping it all straight!  The op1_range and 
op2_range calculations can be real head spinners sometimes.


Andrew




[PATCH] Process only valid shift ranges.

2020-11-19 Thread Andrew MacLeod via Gcc-patches
When shifting outside the valid range of [0, precision-1], we can choose 
to process just the valid ones since the rest is undefined.


This allows us to produce results for x << [0,2][+INF, +INF] by 
discarding the invalid ranges and processing just [0,2].


This is particularly important when using a value that is limited by a 
branch, as demonstrated in the testcase.


As Jakub suggested in the PR, we can mask the shift value with the full 
range of valid shift values, and use the result of that.

If that is undefined, then we fall back to our old undefined behaviour.
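
The masking step amounts to intersecting the shift-count range with [0, precision-1].  A minimal single-interval sketch follows.  The interval_sketch type and valid_shift_interval name are hypothetical, and a real irange can hold several sub-ranges, which this does not model.

```cpp
#include <algorithm>
#include <cassert>

// A single closed interval [lo, hi].
struct interval_sketch
{
  long lo;
  long hi;
};

// Intersect the shift-count interval OP with [0, precision - 1] and store
// the result in R.  Returns false when no valid shift count remains.
inline bool valid_shift_interval (interval_sketch &r, unsigned precision,
				  const interval_sketch &op)
{
  r.lo = std::max (op.lo, 0L);
  r.hi = std::min (op.hi, static_cast<long> (precision) - 1);
  return r.lo <= r.hi;
}
```

A caller would fold using r when the function returns true, and fall back to the previous behavior when it returns false.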

Bootstrapped on x86_64-pc-linux-gnu, no regressions.  Pushed.

Andrew






commit d0d8b5d83614d8f0d0e40c0520d4f40ffa01f8d9
Author: Andrew MacLeod 
Date:   Thu Nov 19 17:41:30 2020 -0500

Process only valid shift ranges.

When shifting outside the valid range of [0, precision-1], we can
choose to process just the valid ones since the rest is undefined.
this allows us to produce results for x << [0,2][+INF, +INF] by discarding
the invalid ranges and processing just [0,2].

gcc/
PR tree-optimization/93781
* range-op.cc (get_shift_range): Rename from
undefined_shift_range_check and now return valid shift ranges.
(operator_lshift::fold_range): Use result from get_shift_range.
(operator_rshift::fold_range): Ditto.
gcc/testsuite/
* gcc.dg/tree-ssa/pr93781-1.c: New.
* gcc.dg/tree-ssa/pr93781-2.c: New.
* gcc.dg/tree-ssa/pr93781-3.c: New.

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 6be60073d19..5bf37e1ad82 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -80,30 +80,25 @@ empty_range_varying (irange &r, tree type,
 return false;
 }
 
-// Return TRUE if shifting by OP is undefined behavior, and set R to
-// the appropriate range.
+// Return false if shifting by OP is undefined behavior.  Otherwise, return
+// true and the range it is to be shifted by.  This allows trimming out of
+// undefined ranges, leaving only valid ranges if there are any.
 
 static inline bool
-undefined_shift_range_check (irange &r, tree type, const irange &op)
+get_shift_range (irange &r, tree type, const irange &op)
 {
   if (op.undefined_p ())
-{
-  r.set_undefined ();
-  return true;
-}
+return false;
 
-  // Shifting by any values outside [0..prec-1], gets undefined
-  // behavior from the shift operation.  We cannot even trust
-  // SHIFT_COUNT_TRUNCATED at this stage, because that applies to rtl
-  // shifts, and the operation at the tree level may be widened.
-  if (wi::lt_p (op.lower_bound (), 0, TYPE_SIGN (op.type ()))
-  || wi::ge_p (op.upper_bound (),
-                  TYPE_PRECISION (type), TYPE_SIGN (op.type ())))
-{
-  r.set_varying (type);
-  return true;
-}
-  return false;
+  // Build valid range and intersect it with the shift range.
+  r = value_range (build_int_cst_type (op.type (), 0),
+  build_int_cst_type (op.type (), TYPE_PRECISION (type) - 1));
+  r.intersect (op);
+
+  // If there are no valid ranges in the shift range, returned false.
+  if (r.undefined_p ())
+return false;
+  return true;
 }
 
 // Return TRUE if 0 is within [WMIN, WMAX].
@@ -1465,13 +1460,20 @@ operator_lshift::fold_range (irange &r, tree type,
                              const irange &op1,
                              const irange &op2) const
 {
-  if (undefined_shift_range_check (r, type, op2))
-return true;
+  int_range_max shift_range;
+  if (!get_shift_range (shift_range, type, op2))
+{
+  if (op2.undefined_p ())
+   r.set_undefined ();
+  else
+   r.set_varying (type);
+  return true;
+}
 
   // Transform left shifts by constants into multiplies.
-  if (op2.singleton_p ())
+  if (shift_range.singleton_p ())
 {
-  unsigned shift = op2.lower_bound ().to_uhwi ();
+  unsigned shift = shift_range.lower_bound ().to_uhwi ();
   wide_int tmp = wi::set_bit_in_zero (shift, TYPE_PRECISION (type));
   int_range<1> mult (type, tmp, tmp);
 
@@ -1487,7 +1489,7 @@ operator_lshift::fold_range (irange &r, tree type,
 }
   else
 // Otherwise, invoke the generic fold routine.
-return range_operator::fold_range (r, type, op1, op2);
+return range_operator::fold_range (r, type, op1, shift_range);
 }
 
 void
@@ -1709,11 +1711,17 @@ operator_rshift::fold_range (irange &r, tree type,
                              const irange &op1,
                              const irange &op2) const
 {
-  // Invoke the generic fold routine if not undefined..
-  if (undefined_shift_range_check (r, type, op2))
-return true;
+  int_range_max shift;
+  if (!get_shift_range (shift, type, op2))
+{
+  if (op2.undefined_p ())
+   r.set_undefined ();
+  else
+   r.set_varying (type);
+  return true;
+}
 
-  return range_operator::fold_range (r, type, op1, op2);
+  return range_operator::fold_range (r, type, op1, shift);
 }
 
 void

[PATCH] ranger: Improve a % b operand ranges [PR91029]

2020-11-19 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in the PR, the previous PR91029 patch was testing
op2 >= 0, which is unnecessary; even negative op2 values will work the same.
Furthermore, from a % b > 0 we can deduce a > 0 rather than just a >= 0
(0 % b would be 0), and this is actually valid even for constants other than 0:
a % b > 5 means a > 5 (a % b has the same sign as a, and a in [0, 5] would
result in a % b in [0, 5]).  Also, we can deduce a range for the other
operand: if we know a % b >= 20, then b must be (in absolute value for signed
modulo) > 20, since for a % [0, 20] the result would be in [0, 19].
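
These implications can be sanity-checked by brute force over a small range.  The sketch below is not part of the patch.  The function name is hypothetical, and it relies only on the truncating semantics of % for int (the sign of the result follows the dividend).

```cpp
#include <cassert>
#include <cstdlib>

// For all a, b in [lo, hi] with b != 0 and all thresholds x in [1, hi],
// check that a % b >= x implies a >= x and |b| > x, and symmetrically
// that a % b <= -x implies a <= -x and |b| > x.
inline bool trunc_mod_implications_hold (int lo, int hi)
{
  for (int a = lo; a <= hi; ++a)
    for (int b = lo; b <= hi; ++b)
      {
	if (b == 0)
	  continue;
	int rem = a % b;  // truncating modulo: the sign follows a
	for (int x = 1; x <= hi; ++x)
	  {
	    if (rem >= x && !(a >= x && std::abs (b) > x))
	      return false;
	    if (rem <= -x && !(a <= -x && std::abs (b) > x))
	      return false;
	  }
      }
  return true;
}
```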

The following patch implements all of that, bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?

2020-11-19  Jakub Jelinek  

PR tree-optimization/91029
* range-op.cc (operator_trunc_mod::op1_range): Don't require signed
types, nor require that op2 >= 0.  Implement (a % b) >= x && x > 0
implies a >= x and (a % b) <= x && x < 0 implies a <= x.
(operator_trunc_mod::op2_range): New method.

* gcc.dg/tree-ssa/pr91029-1.c: New test.
* gcc.dg/tree-ssa/pr91029-2.c: New test.

--- gcc/range-op.cc.jj  2020-11-19 20:09:39.531862131 +0100
+++ gcc/range-op.cc 2020-11-19 20:44:24.507774154 +0100
@@ -2637,6 +2637,9 @@ public:
   virtual bool op1_range (irange &r, tree type,
                           const irange &lhs,
                           const irange &op2) const;
+  virtual bool op2_range (irange &r, tree type,
+                          const irange &lhs,
+                          const irange &op1) const;
 } op_trunc_mod;
 
 void
@@ -2686,24 +2689,58 @@ operator_trunc_mod::wi_fold (irange &r,
 bool
 operator_trunc_mod::op1_range (irange &r, tree type,
                                const irange &lhs,
-                               const irange &op2) const
+                               const irange &) const
 {
-  // PR 91029.  Check for signed truncation with op2 >= 0.
-  if (TYPE_SIGN (type) == SIGNED && wi::ge_p (op2.lower_bound (), 0, SIGNED))
+  // PR 91029.
+  signop sign = TYPE_SIGN (type);
+  unsigned prec = TYPE_PRECISION (type);
+  // (a % b) >= x && x > 0 , then a >= x.
+  if (wi::gt_p (lhs.lower_bound (), 0, sign))
+{
+  r = value_range (type, lhs.lower_bound (), wi::max_value (prec, sign));
+  return true;
+}
+  // (a % b) <= x && x < 0 , then a <= x.
+  if (wi::lt_p (lhs.upper_bound (), 0, sign))
+{
+  r = value_range (type, wi::min_value (prec, sign), lhs.upper_bound ());
+  return true;
+}
+  return false;
+}
+
+bool
+operator_trunc_mod::op2_range (irange &r, tree type,
+                               const irange &lhs,
+                               const irange &) const
+{
+  // PR 91029.
+  signop sign = TYPE_SIGN (type);
+  unsigned prec = TYPE_PRECISION (type);
+  // (a % b) >= x && x > 0 , then b is in ~[-x, x] for signed
+  //  or b > x for unsigned.
+  if (wi::gt_p (lhs.lower_bound (), 0, sign))
+{
+  if (sign == SIGNED)
+   r = value_range (type, wi::neg (lhs.lower_bound ()),
+lhs.lower_bound (), VR_ANTI_RANGE);
+  else if (wi::lt_p (lhs.lower_bound (), wi::max_value (prec, sign),
+sign))
+   r = value_range (type, lhs.lower_bound () + 1,
+wi::max_value (prec, sign));
+  else
+   return false;
+  return true;
+}
+  // (a % b) <= x && x < 0 , then b is in ~[x, -x].
+  if (wi::lt_p (lhs.upper_bound (), 0, sign))
 {
-  unsigned prec = TYPE_PRECISION (type);
-  // if a % b > 0 , then a >= 0.
-  if (wi::gt_p (lhs.lower_bound (), 0, SIGNED))
-   {
- r = value_range (type, wi::zero (prec), wi::max_value (prec, SIGNED));
- return true;
-   }
-  // if a % b < 0 , then a <= 0.
-  if (wi::lt_p (lhs.upper_bound (), 0, SIGNED))
-   {
- r = value_range (type, wi::min_value (prec, SIGNED), wi::zero (prec));
- return true;
-   }
+  if (wi::gt_p (lhs.upper_bound (), wi::min_value (prec, sign), sign))
+   r = value_range (type, lhs.upper_bound (),
+wi::neg (lhs.upper_bound ()), VR_ANTI_RANGE);
+  else
+   return false;
+  return true;
 }
   return false;
 }
--- gcc/testsuite/gcc.dg/tree-ssa/pr91029-1.c.jj  2020-11-19 20:19:50.400414120 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr91029-1.c   2020-11-19 20:19:50.400414120 +0100
@@ -0,0 +1,68 @@
+/* PR tree-optimization/91029 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+void kill (void);
+int xx;
+
+void f1 (int i, int j)
+{
+  if ((i % j) == 3)
+{
+  xx = (i < 3);
+  if (xx)
+kill ();
+}
+}
+
+void f2 (int i, int j)
+{
+  if ((i % j) > 0)
+{
+  xx = (i <= 0);
+  if (xx)
+kill ();
+}
+}
+
+void f3 (int i, int j)
+{
+  if ((i % j) == -3)
+{
+  xx = (i > -3);
+  if (xx)
+kill ();
+}
+}
+
+void f4 (int i, int j)
+{
+  if ((i % j) < 0)
+{
+  xx = (i >= 0);
+  if 

Re: [PATCH] configury: --enable-link-serialization support

2020-11-19 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 19, 2020 at 03:50:27PM +0100, Jakub Jelinek via Gcc-patches wrote:
> So, I think the problem is that for make .PHONY targets are just
> "rebuilt" always, so it is very much undesirable for the cc1plus$(exeext)
> etc. dependencies to include .PHONY targets, but I was using
> them - cc1plus.prev which would depend on some *.serial and
> e.g. cc1.serial depending on c and c depending on cc1$(exeext).
> 
> The following so far only very lightly tested patch rewrites this
> so that *.serial and *.prev aren't .PHONY targets, but instead just
> make variables.
> 
> I was worried that the order in which the language makefile fragments are
> included (which is quite random, what order we get from the filesystem
> matching */config-lang.in) would be a problem but it seems to work fine.

Successfully bootstrapped/regtested on x86_64-linux and i686-linux,
including make install which looked problematic in PR97911.

Ok for trunk?

> 2020-11-19  Jakub Jelinek  
> 
> gcc/
>   * configure.ac: In SERIAL_LIST use lang words without .serial
>   suffix.  Change $lang.prev from a target to variable and instead
>   of depending on *.serial expand to the *.serial variable if
>   the word is in the SERIAL_LIST at all, otherwise to nothing.
>   * configure: Regenerated.
> gcc/c/
>   * Make-lang.in (c.serial): Change from goal to a variable.
>   (.PHONY): Drop c.serial.
> gcc/ada/
>   * gcc-interface/Make-lang.in (ada.serial): Change from goal to a
>   variable.
>   (.PHONY): Drop ada.serial and ada.prev.
>   (gnat1$(exeext)): Depend on $(ada.serial) rather than ada.serial.
> gcc/brig/
>   * Make-lang.in (brig.serial): Change from goal to a variable.
>   (.PHONY): Drop brig.serial and brig.prev.
>   (brig1$(exeext)): Depend on $(brig.serial) rather than brig.serial.
> gcc/cp/
>   * Make-lang.in (c++.serial): Change from goal to a variable.
>   (.PHONY): Drop c++.serial and c++.prev.
>   (cc1plus$(exeext)): Depend on $(c++.serial) rather than c++.serial.
> gcc/d/
>   * Make-lang.in (d.serial): Change from goal to a variable.
>   (.PHONY): Drop d.serial and d.prev.
>   (d21$(exeext)): Depend on $(d.serial) rather than d.serial.
> gcc/fortran/
>   * Make-lang.in (fortran.serial): Change from goal to a variable.
>   (.PHONY): Drop fortran.serial and fortran.prev.
>   (f951$(exeext)): Depend on $(fortran.serial) rather than
>   fortran.serial.
> gcc/go/
>   * Make-lang.in (go.serial): Change from goal to a variable.
>   (.PHONY): Drop go.serial and go.prev.
>   (go1$(exeext)): Depend on $(go.serial) rather than go.serial.
> gcc/jit/
>   * Make-lang.in (jit.serial): Change from goal to a
>   variable.
>   (.PHONY): Drop jit.serial and jit.prev.
>   ($(LIBGCCJIT_FILENAME)): Depend on $(jit.serial) rather than
>   jit.serial.
> gcc/lto/
>   * Make-lang.in (lto1.serial, lto2.serial): Change from goals to
>   variables.
>   (.PHONY): Drop lto1.serial, lto2.serial, lto1.prev and lto2.prev.
>   ($(LTO_EXE)): Depend on $(lto1.serial) rather than lto1.serial.
>   ($(LTO_DUMP_EXE)): Depend on $(lto2.serial) rather than lto2.serial.
> gcc/objc/
>   * Make-lang.in (objc.serial): Change from goal to a variable.
>   (.PHONY): Drop objc.serial and objc.prev.
>   (cc1obj$(exeext)): Depend on $(objc.serial) rather than objc.serial.
> gcc/objcp/
>   * Make-lang.in (obj-c++.serial): Change from goal to a variable.
>   (.PHONY): Drop obj-c++.serial and obj-c++.prev.
>   (cc1objplus$(exeext)): Depend on $(obj-c++.serial) rather than
>   obj-c++.serial.

Jakub



Re: [PATCH] c++, v2: Add __builtin_clear_padding builtin - C++20 P0528R3 compiler side [PR88101]

2020-11-19 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 19, 2020 at 05:30:06PM +0100, Jakub Jelinek via Gcc-patches wrote:
> Tested on x86_64-linux, ok for trunk if it passes full bootstrap/regtest?

Successfully bootstrapped/regtested on both x86_64-linux and i686-linux now.

Jakub



Re: [PATCH] Check calls before loop unrolling

2020-11-19 Thread Jeff Law via Gcc-patches



On 11/19/20 1:01 PM, Segher Boessenkool wrote:
> On Thu, Nov 19, 2020 at 12:53:27PM -0700, Jeff Law wrote:
>> On 11/19/20 12:42 PM, Segher Boessenkool wrote:
>>> On Thu, Nov 19, 2020 at 12:13:34PM -0700, Jeff Law wrote:
>>>> On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote:
>>>>> guojiufu  writes:
>>>>>> When unroll loops, if there are calls inside the loop, those calls
>>>>>> may raise negative impacts for unrolling.  This patch adds a param
>>>>>> param_max_unrolled_calls, and checks if the number of calls inside
>>>>>> the loop bigger than this param, loop is prevent from unrolling.
>>>>>>
>>>>>> This patch is checking the _average_ number of calls which is the
>>>>>> summary of call numbers multiply the possibility of the call maybe
>>>>>> executed.  The _average_ number could be a fraction, to keep the
>>>>>> precision, the param is the threshold number multiply 10000.
>>>>>>
>>>>>> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
>>>>>>
>>>>>> gcc/ChangeLog
>>>>>> 2020-08-19  Jiufu Guo
>>>>>>
>>>>>>  * params.opt (param_max_unrolled_average_calls_x10000): New param.
>>>>>>  * cfgloop.h (average_num_loop_calls): New declare.
>>>>>>  * cfgloopanal.c (average_num_loop_calls): New function.
>>>>>>  * loop-unroll.c (decide_unroll_constant_iteration,
>>>>>>  decide_unroll_runtime_iterations,
>>>>>>  decide_unroll_stupid): Check average_num_loop_calls and
>>>>>>  param_max_unrolled_average_calls_x10000.
>>>> So what's the motivation behind adding a PARAM to control this
>>>> behavior?  I'm not a big fan of exposing a lot of PARAMs for users to
>>>> tune behavior (though I've made the same lapse in judgment myself).  In
>>>> my mind a PARAM is really more about controlling pathological behavior.
>>> But we (Power) need very different tuning than what others apparently
>>> need.  It is similar to inlining, in that that also differs a lot
>>> between archs how aggressively to do that optimally.
>> But what I think that argues is that we've got a gap in the costing
>> model and/or how its being used.  Throwing PARAMS at the problem isn't
>> really useful for the end user.  The vast majority aren't going to use
>> them and of the ones that do, most are probably going to get it wrong.
> No, the vast majority of people will *not* (consciously) use them,
> because the target defaults will set things to useful values.
>
> The compiler could use saner "generic" defaults perhaps, but those will
> still not be satisfactory for anyone (except when they aren't generic in
> fact but instead tuned for one arch ;-) ) -- unrolling is just too
> important for performance.
Then fix the heuristics, don't add new PARAMS :-)

It didn't even occur to me until now that you may be pushing to have the
ppc backend have different values for the PARAMS.  I would strongly
discourage that.  It's been a huge headache in the s390 backend already.

>
>> In  my mind fixing things so they work with no magic arguments is best. 
>> PARAMS are the worst solution.  A -f flag with no arguments is somewhere
>> in between.  Others may clearly have different opinions here.
> There is no big difference between params and flags here, IMO -- it has
> to be a -f with a value as well, for good results.
Which is a signal that we have a deeper problem.  -f with a value is no
different than a param.

>
> Since we have (almost) all such tunings in --param already, I'd say this
> one belongs there as well?
I'm not convinced at this point. 

jeff



RE: [EXTERNAL] Re: [PATCH] [tree-optimization] Optimize two patterns with three xors.

2020-11-19 Thread Eugene Rozenfeld via Gcc-patches
Thank you for installing my patch Jeff!

Yes, I intend to contribute regularly. I'm working on getting copyright 
assignment/disclaimer paperwork approved by my employer. I'll apply for commit 
privs after that.

Eugene

-----Original Message-----
From: Jeff Law  
Sent: Wednesday, November 18, 2020 11:33 AM
To: Richard Biener ; Eugene Rozenfeld 

Cc: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH] [tree-optimization] Optimize two patterns with 
three xors.



On 11/17/20 12:57 AM, Richard Biener via Gcc-patches wrote:
> On Tue, Nov 17, 2020 at 3:19 AM Eugene Rozenfeld 
>  wrote:
>> Thank you for the review Richard!
>>
>> I re-worked the patch based on your suggestions (attached).
>> I made the change to reuse the first bit_xor in both patterns and I added :s 
>> to the last xor in the first pattern.
>> For the second pattern I didn't add :s because I think the simplification is 
>> beneficial even if the second or third bit_xor has more than one use since 
>> we are simplifying them to just a single operand (@2). If that is incorrect, 
>> please explain why.
> Ah, true, that's correct.
>
> The patch is OK.
I've installed this on the trunk.

Eugene, if you're going to contribute regularly you should probably go ahead 
and get commit privs so that you can commit ACK'd patches yourself.   There 
should be a link to a form from this page:

https://gcc.gnu.org/gitwrite.html


Jeff



Re: [PATCH] libstdc++: Ensure __gthread_self doesn't call undefined weak symbol [PR 95989]

2020-11-19 Thread Jonathan Wakely via Gcc-patches

On 12/11/20 17:34 +0000, Jonathan Wakely wrote:

On 11/11/20 19:08 +0100, Jakub Jelinek via Libstdc++ wrote:

On Wed, Nov 11, 2020 at 05:24:42PM +0000, Jonathan Wakely wrote:

--- a/libgcc/gthr-posix.h
+++ b/libgcc/gthr-posix.h
@@ -684,7 +684,14 @@ __gthread_equal (__gthread_t __t1, __gthread_t __t2)
static inline __gthread_t
__gthread_self (void)
{
+#if __GLIBC_PREREQ(2, 27)


What if it is a non-glibc system where __GLIBC_PREREQ macro isn't defined?
I think you'd get then
error: missing binary operator before token "("
So I think you want
#if defined __GLIBC__ && defined __GLIBC_PREREQ
#if __GLIBC_PREREQ(2, 27)
return pthread_self ();
#else
return __gthrw_(pthread_self) ();
#endif
#else
return __gthrw_(pthread_self) ();
#endif
or similar.
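
To illustrate the point about the preprocessor, here is a minimal sketch of the nested guard in standalone C. The function name have_glibc_2_27 is made up for this example. The inner #if is only evaluated once the outer one has established that __GLIBC_PREREQ exists, because directives in a skipped group are not evaluated.

```c
#include <stdio.h>  /* on glibc this pulls in <features.h>, which defines __GLIBC_PREREQ */

/* Hypothetical helper: returns 1 when built against glibc 2.27 or later,
   0 otherwise (including on non-glibc systems).  */
int
have_glibc_2_27 (void)
{
#if defined __GLIBC__ && defined __GLIBC_PREREQ
# if __GLIBC_PREREQ (2, 27)
  return 1;
# else
  return 0;
# endif
#else
  /* __GLIBC_PREREQ is not defined here, so it must never reach an
     evaluated #if expression.  */
  return 0;
#endif
}
```

Writing the check as a single "#if defined __GLIBC_PREREQ && __GLIBC_PREREQ(2, 27)" would not help, since the whole expression still has to parse when the macro is undefined.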



Here's a fixed version of the patch.

I've moved the glibc-specific code in this_thread::get_id() into a new
macro defined in config/os/gnu-linux/os_defines.h (where we already
know we are dealing with glibc). That means we don't do the
__GLIBC_PREREQ check directly in , it's hidden away in a
target-specific header.

Tested powerpc64le-linux (glibc 2.17 and 2.32), sparc-solaris2.11 and
powerpc-aix.


I've committed this version which only fixes this_thread::get_id() in
libstdc++, and doesn't change __gthread_self in gthr-posix.h

Due to a recent change to replace other uses of __gthread_self with
calls to this_thread::get_id(), fixing it there fixes all uses in
libstdc++.

Tested x86_64-linux, powerpc-aix, sparc-solaris2.11, committed to
trunk.


commit 08b4d325711d5c6f68ac29443aba3fd7aa173ac8
Author: Jonathan Wakely 
Date:   Thu Nov 19 21:07:06 2020

libstdc++: Avoid calling undefined __gthread_self weak symbol [PR 95989]

Since glibc 2.27 the pthread_self symbol has been defined in libc rather
than libpthread. Because we only call pthread_self through a weak alias
it's possible for statically linked executables to end up without a
definition of pthread_self. This crashes when trying to call an
undefined weak symbol.

We can use the __GLIBC_PREREQ version check to detect the version of
glibc where pthread_self is no longer in libpthread, and call it
directly rather than through the weak reference.

It would be better to check for pthread_self in libc during configure
instead of hardcoding the __GLIBC_PREREQ check. That would be
complicated by the fact that prior to glibc 2.27 libc.a didn't have the
pthread_self symbol, but libc.so.6 did.  The configure checks would need
to try to link both statically and dynamically, and the result would
depend on whether the static libc.a happens to be installed during
configure (which could vary between different systems using the same
version of glibc). Doing it properly is left for a future date, as that
will be needed anyway after glibc moves all pthread symbols from
libpthread to libc. When that happens we should revisit the whole
approach of using weak symbols for pthread symbols.

For the purposes of std::this_thread::get_id() we call
pthread_self() directly when using glibc 2.27 or later. Otherwise, if
__gthread_active_p() is true then we know the libpthread symbol is
available so we call that. Otherwise, we are single-threaded and just
use ((__gthread_t)1) as the thread ID.
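
The fallback logic described above can be sketched in plain C. This is only an illustration, assuming GCC or Clang on an ELF target. optional_thread_id is a hypothetical weak function that is deliberately never defined, standing in for the weak pthread_self reference.

```c
#include <stddef.h>

/* Hypothetical optional function: declared weak and never defined in this
   example, so its address is null after linking.  */
extern unsigned long optional_thread_id (void) __attribute__ ((weak));

/* Call the weak function only when it has a definition; otherwise act as
   a single-threaded program and use 1 as the thread ID.  */
unsigned long
thread_id_or_default (void)
{
  if (optional_thread_id != NULL)
    return optional_thread_id ();
  return 1UL;
}
```

Calling the weak function without the null check is what crashes in the statically linked case the commit message describes.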

An undesirable consequence of this change is that code compiled prior to
the change might inline the old definition of this_thread::get_id()
which always returns (__gthread_t)1 in a program that isn't linked to
libpthread. Code compiled after the change will use pthread_self() and
so get a real TID. That could result in the main thread having different
thread::id values in different translation units. This seems acceptable,
as there are not expected to be many uses of thread::id in programs
that aren't linked to libpthread.

An earlier version of this patch also changed __gthread_self() to use
__GLIBC_PREREQ(2, 27) and only use the weak symbol for older glibc. That
might still make sense to do, but isn't needed by libstdc++ now.

libstdc++-v3/ChangeLog:

PR libstdc++/95989
* config/os/gnu-linux/os_defines.h (_GLIBCXX_NATIVE_THREAD_ID):
Define new macro to get reliable thread ID.
* include/bits/std_thread.h: (this_thread::get_id): Use new
macro if it's defined.
* testsuite/30_threads/jthread/95989.cc: New test.
* testsuite/30_threads/this_thread/95989.cc: New test.

diff --git a/libstdc++-v3/config/os/gnu-linux/os_defines.h b/libstdc++-v3/config/os/gnu-linux/os_defines.h
index f821486ec8f5..01bfa9ddd4f2 100644
--- a/libstdc++-v3/config/os/gnu-linux/os_defines.h
+++ b/libstdc++-v3/config/os/gnu-linux/os_defines.h
@@ -49,4 +49,16 @@
 // version dynamically in case it has changed since libstdc++ was configured.
 #define _GLIBCXX_NO_OBSOLETE_ISINF_ISNAN_DYNAMIC __GLIBC_PREREQ(2,23)
 
+#if 

c++: Template hash access

2020-11-19 Thread Nathan Sidwell


This exposes the template specialization table, so the modules
machinery may access it.  The hashed entity (tmpl, args & spec) is
available, along with a hash table walker.  We also need a way of
finding or inserting entries, along with some bookkeeping fns to deal
with the instantiation and (partial) specialization lists.

This is slightly modified from the earlier posting -- one of the 
functions, used for checking, isn't needed as 
match_mergeable_specialization is modified to allow that use.
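
As a rough illustration of the interface shape described above (a find-or-insert lookup plus a walker taking a callback and an opaque data pointer), here is a small C sketch. It is not GCC code. A linear array stands in for the hash table, and every name in it (struct entry, match_entry, walk_entries, count_entry) is invented for the example.

```c
#include <stddef.h>

/* Simplified stand-in for a (tmpl, args, spec) table entry.  */
struct entry { int tmpl; int args; const char *spec; };

#define TABLE_MAX 16
static struct entry table[TABLE_MAX];
static size_t table_len;

/* Look up (TMPL, ARGS).  Return the stored spec if one is present.
   Otherwise, if SPEC is non-null, add it.  Return NULL whenever nothing
   was found, whether or not an entry was added.  */
const char *
match_entry (int tmpl, int args, const char *spec)
{
  for (size_t i = 0; i < table_len; i++)
    if (table[i].tmpl == tmpl && table[i].args == args)
      return table[i].spec;

  if (spec != NULL && table_len < TABLE_MAX)
    {
      struct entry e = { tmpl, args, spec };
      table[table_len++] = e;
    }
  return NULL;
}

/* Call FN on every entry, passing DATA through unchanged.  */
void
walk_entries (void (*fn) (const struct entry *, void *), void *data)
{
  for (size_t i = 0; i < table_len; i++)
    fn (&table[i], data);
}

/* Example callback: count entries through the opaque DATA pointer.  */
void
count_entry (const struct entry *e, void *data)
{
  (void) e;
  ++*(size_t *) data;
}
```

Passing a null spec turns the lookup into a pure query, which is presumably how one function can serve both the checking and the inserting use.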


gcc/cp/
* cp-tree.h (struct spec_entry): Moved from pt.c.
(walk_specializations, match_mergeable_specialization)
(get_mergeable_specialization_flags)
(add_mergeable_specialization): Declare.
* pt.c (struct spec_entry): Moved to cp-tree.h.
(walk_specializations, match_mergeable_specialization)
(get_mergeable_specialization_flags)
(add_mergeable_specialization): New.

pushing to trunk
--
Nathan Sidwell
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index 0c4b74a8895..021de76e142 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -5403,6 +5403,14 @@ public:
   hash_map *saved;
 };
 
+/* Entry in the specialization hash table.  */
+struct GTY((for_user)) spec_entry
+{
+  tree tmpl;  /* The general template this is a specialization of.  */
+  tree args;  /* The args for this (maybe-partial) specialization.  */
+  tree spec;  /* The specialization itself.  */
+};
+
 /* in class.c */
 
 extern int current_class_depth;
@@ -6994,6 +7002,15 @@ extern bool copy_guide_p			(const_tree);
 extern bool template_guide_p			(const_tree);
 extern bool builtin_guide_p			(const_tree);
 extern void store_explicit_specifier		(tree, tree);
+extern void walk_specializations		(bool,
+		 void (*)(bool, spec_entry *,
+			  void *),
+		 void *);
+extern tree match_mergeable_specialization	(bool is_decl, tree tmpl,
+		 tree args, tree spec);
+extern unsigned get_mergeable_specialization_flags (tree tmpl, tree spec);
+extern void add_mergeable_specialization(tree tmpl, tree args,
+		 tree spec, unsigned);
 extern tree add_outermost_template_args		(tree, tree);
 extern tree add_extra_args			(tree, tree);
 extern tree build_extra_args			(tree, tree, tsubst_flags_t);
diff --git i/gcc/cp/pt.c w/gcc/cp/pt.c
index a1b6631d691..463b1c3a57d 100644
--- i/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -103,13 +103,6 @@ local_specialization_stack::~local_specialization_stack ()
 /* True if we've recursed into fn_type_unification too many times.  */
 static bool excessive_deduction_depth;
 
-struct GTY((for_user)) spec_entry
-{
-  tree tmpl;
-  tree args;
-  tree spec;
-};
-
 struct spec_hasher : ggc_ptr_hash<spec_entry>
 {
   static hashval_t hash (spec_entry *);
@@ -29625,6 +29618,101 @@ declare_integer_pack (void)
 			  CP_BUILT_IN_INTEGER_PACK);
 }
 
+/* Walk the decl or type specialization table calling FN on each
+   entry.  */
+
+void
+walk_specializations (bool decls_p,
+		  void (*fn) (bool decls_p, spec_entry *entry, void *data),
+		  void *data)
+{
+  spec_hash_table *table = decls_p ? decl_specializations
+: type_specializations;
+  spec_hash_table::iterator end (table->end ());
+  for (spec_hash_table::iterator iter (table->begin ()); iter != end; ++iter)
+fn (decls_p, *iter, data);
+}
+
+/* Lookup the specialization of TMPL, ARGS in the decl or type
+   specialization table.  Return what's there, or if SPEC is non-null,
+   add it and return NULL.  */
+
+tree
+match_mergeable_specialization (bool decl_p, tree tmpl, tree args, tree spec)
+{
+  spec_entry elt = {tmpl, args, spec};
+  hash_table<spec_hasher> *specializations
+= decl_p ? decl_specializations : type_specializations;
+  hashval_t hash = spec_hasher::hash (&elt);
+  spec_entry **slot
+= specializations->find_slot_with_hash (&elt, hash,
+	spec ? INSERT : NO_INSERT);
+  spec_entry *entry = slot ? *slot: NULL;
+  
+  if (entry)
+return entry->spec;
+
+  if (spec)
+{
+  entry = ggc_alloc<spec_entry> ();
+  *entry = elt;
+  *slot = entry;
+}
+
+  return NULL_TREE;
+}
+
+/* Return flags encoding whether SPEC is on the instantiation and/or
+   specialization lists of TMPL.  */
+
+unsigned
+get_mergeable_specialization_flags (tree tmpl, tree decl)
+{
+  unsigned flags = 0;
+
+  for (tree inst = DECL_TEMPLATE_INSTANTIATIONS (tmpl);
+   inst; inst = TREE_CHAIN (inst))
+if (TREE_VALUE (inst) == decl)
+  {
+	flags |= 1;
+	break;
+  }
+
+  if (CLASS_TYPE_P (TREE_TYPE (decl))
+  && CLASSTYPE_TEMPLATE_INFO (TREE_TYPE (decl))
+  && CLASSTYPE_USE_TEMPLATE (TREE_TYPE (decl)) == 2)
+/* Only need to search if DECL is a partial specialization.  */
+for (tree part = DECL_TEMPLATE_SPECIALIZATIONS (tmpl);
+	 part; part = TREE_CHAIN (part))
+  if (TREE_VALUE (part) == decl)
+	{
+	  flags |= 2;
+	  break;
+	}
+
+  return flags;
+}
+
+/* Add a new specialization of TMPL.  FLAGS is as returned from
+   get_mergeable_specialization_flags.  */
+
+void

Re: [C PATCH] Drop qualifiers during lvalue conversion

2020-11-19 Thread Joseph Myers
On Thu, 19 Nov 2020, Uecker, Martin wrote:

> Apparently I did not have enough coffee when
> generalizing this to the other qualifiers. 
> 
> Ok, with the following test?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-11-19 Thread Segher Boessenkool
On Thu, Nov 19, 2020 at 11:25:08AM -0600, Pat Haugen wrote:
> > +(define_insn "vmodu_"
> > +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> > +   (umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> > +(match_operand:VIlong 2 "vsx_register_operand" "v")))]
> > +  "TARGET_POWER10"
> > +  "vmodu %0,%1,%2"
> > +  [(set_attr "type" "vecsimple")])
> 
> Since the vdiv.../vmod... instructions execute in the fixed point divide unit,

... on some implementations.  The only one currently, sure, but...

> all the above instructions should have a type of "div" instead of "vecsimple".

... it should use "vecdiv" instead (which already exists).  And set
"size" to a proper value as well, so that the scheduling models can see
the difference with e.g. xsdivqp (which should perhaps not use vecdiv at
all itself, it is a scalar div, but we do not currently have good types
for that).

> > +;; Vector multiply low double word
> > +(define_insn "mulv2di3"
> > +  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
> > +   (mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
> > +  (match_operand:V2DI 2 "vsx_register_operand" "v")))]
> > +  "TARGET_POWER10"
> > +  "vmulld %0,%1,%2"
> > +  [(set_attr "type" "vecsimple")])
> 
> Similarly, the above 3 insns should have a "mul" instruction type.

The existing AltiVec vmul* are type "veccomplex", because that was the
execution pipe used on original AltiVec...  This needs to be adapted as
well.  Not sure what is best.


Segher


c++: Expose constexpr hash table

2020-11-19 Thread Nathan Sidwell
Again, I noticed some cleanups on the way to preparing this exposure of 
the constexpr hash table.  Committing this to trunk


This patch exposes the constexpr hash table so that the modules
machinery can save and load constexpr bodies.  While there I noticed
that we could do a little constification of the hasher and comparator
functions.  Also combine the saving machinery to a single function
returning void -- nothing ever looked at its return value.

gcc/cp/
* cp-tree.h (struct constexpr_fundef): Moved from constexpr.c.
(maybe_save_constexpr_fundef): Declare.
(register_constexpr_fundef): Take constexpr_fundef object, return
void.
* decl.c (mabe_save_function_definition): Delete, functionality
moved to maybe_save_constexpr_fundef.
(emit_coro_helper, finish_function): Adjust.
* constexpr.c (struct constexpr_fundef): Moved to cp-tree.h.
(constexpr_fundef_hasher::equal): Constify.
(constexpr_fundef_hasher::hash): Constify.
(retrieve_constexpr_fundef): Make non-static.
(maybe_save_constexpr_fundef): Break out checking and duplication
from ...
(register_constexpr_fundef): ... here.  Just register the 
constexpr.



--
Nathan Sidwell
diff --git i/gcc/cp/constexpr.c w/gcc/cp/constexpr.c
index e6ab5eecd68..625410327b8 100644
--- i/gcc/cp/constexpr.c
+++ w/gcc/cp/constexpr.c
@@ -133,19 +133,10 @@ ensure_literal_type_for_constexpr_object (tree decl)
   return decl;
 }
 
-/* Representation of entries in the constexpr function definition table.  */
-
-struct GTY((for_user)) constexpr_fundef {
-  tree decl;
-  tree body;
-  tree parms;
-  tree result;
-};
-
 struct constexpr_fundef_hasher : ggc_ptr_hash<constexpr_fundef>
 {
-  static hashval_t hash (constexpr_fundef *);
-  static bool equal (constexpr_fundef *, constexpr_fundef *);
+  static hashval_t hash (const constexpr_fundef *);
+  static bool equal (const constexpr_fundef *, const constexpr_fundef *);
 };
 
 /* This table holds all constexpr function definitions seen in
@@ -158,7 +149,8 @@ static GTY (()) hash_table *constexpr_fundef_table;
same constexpr function.  */
 
 inline bool
-constexpr_fundef_hasher::equal (constexpr_fundef *lhs, constexpr_fundef *rhs)
+constexpr_fundef_hasher::equal (const constexpr_fundef *lhs,
+const constexpr_fundef *rhs)
 {
   return lhs->decl == rhs->decl;
 }
@@ -167,20 +159,20 @@ constexpr_fundef_hasher::equal (constexpr_fundef *lhs, constexpr_fundef *rhs)
Return a hash value for the entry pointed to by Q.  */
 
 inline hashval_t
-constexpr_fundef_hasher::hash (constexpr_fundef *fundef)
+constexpr_fundef_hasher::hash (const constexpr_fundef *fundef)
 {
   return DECL_UID (fundef->decl);
 }
 
 /* Return a previously saved definition of function FUN.   */
 
-static constexpr_fundef *
+constexpr_fundef *
 retrieve_constexpr_fundef (tree fun)
 {
   if (constexpr_fundef_table == NULL)
 return NULL;
 
-  constexpr_fundef fundef = { fun, NULL, NULL, NULL };
+  constexpr_fundef fundef = { fun, NULL_TREE, NULL_TREE, NULL_TREE };
   return constexpr_fundef_table->find (&fundef);
 }
 
@@ -669,7 +661,7 @@ get_function_named_in_call (tree t)
   return fun;
 }
 
-/* Subroutine of register_constexpr_fundef.  BODY is the body of a function
+/* Subroutine of check_constexpr_fundef.  BODY is the body of a function
declared to be constexpr, or a sub-statement thereof.  Returns the
return value if suitable, error_mark_node for a statement not allowed in
a constexpr function, or NULL_TREE if no return value was found.  */
@@ -738,7 +730,7 @@ constexpr_fn_retval (tree body)
 }
 }
 
-/* Subroutine of register_constexpr_fundef.  BODY is the DECL_SAVED_TREE of
+/* Subroutine of check_constexpr_fundef.  BODY is the DECL_SAVED_TREE of
FUN; do the necessary transformations to turn it into a single expression
that we can store in the hash table.  */
 
@@ -868,27 +860,28 @@ cx_check_missing_mem_inits (tree ctype, tree body, bool complain)
 }
 
 /* We are processing the definition of the constexpr function FUN.
-   Check that its BODY fulfills the propriate requirements and
-   enter it in the constexpr function definition table.
-   For constructor BODY is actually the TREE_LIST of the
-   member-initializer list.  */
+   Check that its body fulfills the apropriate requirements and
+   enter it in the constexpr function definition table.  */
 
-tree
-register_constexpr_fundef (tree fun, tree body)
+void
+maybe_save_constexpr_fundef (tree fun)
 {
-  constexpr_fundef entry;
-  constexpr_fundef **slot;
+  if (processing_template_decl
+  || !DECL_DECLARED_CONSTEXPR_P (fun)
+  || cp_function_chain->invalid_constexpr
+  || DECL_CLONED_FUNCTION_P (fun))
+return;
 
   if (!is_valid_constexpr_fn (fun, !DECL_GENERATED_P (fun)))
-return NULL;
+return;
 
-  tree massaged = massage_constexpr_body (fun, body);
+  tree massaged = massage_constexpr_body (fun, DECL_SAVED_TREE (fun));
   if (massaged == NULL_TREE || 

Re: [PATCH,rs6000] Make MMA builtins use opaque modes [v2]

2020-11-19 Thread Peter Bergner via Gcc-patches
On 11/19/20 12:58 PM, acsaw...@linux.ibm.com wrote:
> +(define_expand "mma_disassemble_pair"
> +  [(match_operand:V16QI 0 "mma_disassemble_output_operand")
> +   (match_operand:OO 1 "input_operand")
> +   (match_operand 2 "const_0_to_1_operand")]

Maybe we should use vsx_register_operand instead of input_operand here?



> +(define_insn_and_split "*mma_disassemble_pair"
> +  [(set (match_operand:V16QI 0 "mma_disassemble_output_operand" "=mwa")
> +   (unspec:V16QI [(match_operand:OO 1 "input_operand" "wa")
> +  (match_operand 2 "const_0_to_1_operand")]
> +   UNSPEC_MMA_EXTRACT))]

Likewise?



> +  "TARGET_MMA
> +   && fpr_reg_operand (operands[1], OOmode)"

pairs can be assigned to any vsx register, so I think we want
vsx_register_operand here too.




> +(define_expand "mma_disassemble_acc"
> +  [(match_operand:V16QI 0 "mma_disassemble_output_operand")
> +   (match_operand:XO 1 "input_operand")
> +   (match_operand 2 "const_0_to_3_operand")]

Likewise as above, do we want to use the fpr_reg_operand predicate here
instead of input_operand?



> +(define_insn_and_split "*mma_disassemble_acc"
> +  [(set (match_operand:V16QI 0 "mma_disassemble_output_operand" "=mwa")
> +   (unspec:V16QI [(match_operand:XO 1 "input_operand" "d")
> +  (match_operand 2 "const_0_to_3_operand")]

Likewise?


Peter




Re: [PATCH] Check calls before loop unrolling

2020-11-19 Thread Segher Boessenkool
On Thu, Nov 19, 2020 at 12:53:27PM -0700, Jeff Law wrote:
> On 11/19/20 12:42 PM, Segher Boessenkool wrote:
> > On Thu, Nov 19, 2020 at 12:13:34PM -0700, Jeff Law wrote:
> >> On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote:
> >>> guojiufu  writes:
> >>>> When unroll loops, if there are calls inside the loop, those calls
> >>>> may raise negative impacts for unrolling.  This patch adds a param
> >>>> param_max_unrolled_calls, and checks if the number of calls inside
> >>>> the loop bigger than this param, loop is prevent from unrolling.
> >>>>
> >>>> This patch is checking the _average_ number of calls which is the
> >>>> summary of call numbers multiply the possibility of the call maybe
> >>>> executed.  The _average_ number could be a fraction, to keep the
> >>>> precision, the param is the threshold number multiply 1.
> >>>>
> >>>> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
> >>>>
> >>>> gcc/ChangeLog
> >>>> 2020-08-19  Jiufu Guo
> >>>>
> >>>>   * params.opt (param_max_unrolled_average_calls_x1): New param.
> >>>>   * cfgloop.h (average_num_loop_calls): New declare.
> >>>>   * cfgloopanal.c (average_num_loop_calls): New function.
> >>>>   * loop-unroll.c (decide_unroll_constant_iteration,
> >>>>   decide_unroll_runtime_iterations,
> >>>>   decide_unroll_stupid): Check average_num_loop_calls and
> >>>>   param_max_unrolled_average_calls_x1.
> >> So what's the motivation behind adding a PARAM to control this
> >> behavior?  I'm not a big fan of exposing a lot of PARAMs for users to
> >> tune behavior (though I've made the same lapse in judgment myself).  In
> >> my mind a PARAM is really more about controlling pathological behavior.
> > But we (Power) need very different tuning than what others apparently
> > need.  It is similar to inlining, in that that also differs a lot
> > between archs how aggressively to do that optimally.
> But what I think that argues is that we've got a gap in the costing
> model and/or how its being used.  Throwing PARAMS at the problem isn't
> really useful for the end user.  The vast majority aren't going to use
> them and of the ones that do, most are probably going to get it wrong.

No, the vast majority of people will *not* (consciously) use them,
because the target defaults will set things to useful values.

The compiler could use saner "generic" defaults perhaps, but those will
still not be satisfactory for anyone (except when they aren't generic in
fact but instead tuned for one arch ;-) ) -- unrolling is just too
important for performance.

> In  my mind fixing things so they work with no magic arguments is best. 
> PARAMS are the worst solution.  A -f flag with no arguments is somewhere
> in between.  Others may clearly have different opinions here.

There is no big difference between params and flags here, IMO -- it has
to be a -f with a value as well, for good results.

Since we have (almost) all such tunings in --param already, I'd say this
one belongs there as well?


Segher


Re: [PATCH] Check calls before loop unrolling

2020-11-19 Thread Jeff Law via Gcc-patches



On 11/19/20 12:42 PM, Segher Boessenkool wrote:
> On Thu, Nov 19, 2020 at 12:13:34PM -0700, Jeff Law wrote:
>> On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote:
>>> guojiufu  writes:
>>>> When unroll loops, if there are calls inside the loop, those calls
>>>> may raise negative impacts for unrolling.  This patch adds a param
>>>> param_max_unrolled_calls, and checks if the number of calls inside
>>>> the loop bigger than this param, loop is prevent from unrolling.
>>>>
>>>> This patch is checking the _average_ number of calls which is the
>>>> summary of call numbers multiply the possibility of the call maybe
>>>> executed.  The _average_ number could be a fraction, to keep the
>>>> precision, the param is the threshold number multiply 1.
>>>>
>>>> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
>>>>
>>>> gcc/ChangeLog
>>>> 2020-08-19  Jiufu Guo
>>>>
>>>> * params.opt (param_max_unrolled_average_calls_x1): New param.
>>>> * cfgloop.h (average_num_loop_calls): New declare.
>>>> * cfgloopanal.c (average_num_loop_calls): New function.
>>>> * loop-unroll.c (decide_unroll_constant_iteration,
>>>> decide_unroll_runtime_iterations,
>>>> decide_unroll_stupid): Check average_num_loop_calls and
>>>> param_max_unrolled_average_calls_x1.
>> So what's the motivation behind adding a PARAM to control this
>> behavior?  I'm not a big fan of exposing a lot of PARAMs for users to
>> tune behavior (though I've made the same lapse in judgment myself).  In
>> my mind a PARAM is really more about controlling pathological behavior.
> But we (Power) need very different tuning than what others apparently
> need.  It is similar to inlining, in that that also differs a lot
> between archs how aggressively to do that optimally.
But what I think that argues is that we've got a gap in the costing
model and/or how its being used.  Throwing PARAMS at the problem isn't
really useful for the end user.  The vast majority aren't going to use
them and of the ones that do, most are probably going to get it wrong.

In  my mind fixing things so they work with no magic arguments is best. 
PARAMS are the worst solution.  A -f flag with no arguments is somewhere
in between.  Others may clearly have different opinions here.


jeff



Re: [PATCH] Check calls before loop unrolling

2020-11-19 Thread Segher Boessenkool
On Thu, Nov 19, 2020 at 12:13:34PM -0700, Jeff Law wrote:
> On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote:
> > guojiufu  writes:
> >> When unroll loops, if there are calls inside the loop, those calls
> >> may raise negative impacts for unrolling.  This patch adds a param
> >> param_max_unrolled_calls, and checks if the number of calls inside
> >> the loop bigger than this param, loop is prevent from unrolling.
> >>
> >> This patch is checking the _average_ number of calls which is the
> >> summary of call numbers multiply the possibility of the call maybe
> >> executed.  The _average_ number could be a fraction, to keep the
> >> precision, the param is the threshold number multiply 1.
> >>
> >> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
> >>
> >> gcc/ChangeLog
> >> 2020-08-19  Jiufu Guo   
> >>
> >>* params.opt (param_max_unrolled_average_calls_x1): New param.
> >>* cfgloop.h (average_num_loop_calls): New declare.
> >>* cfgloopanal.c (average_num_loop_calls): New function.
> >>* loop-unroll.c (decide_unroll_constant_iteration,
> >>decide_unroll_runtime_iterations,
> >>decide_unroll_stupid): Check average_num_loop_calls and
> >>param_max_unrolled_average_calls_x1.
> So what's the motivation behind adding a PARAM to control this
> behavior?  I'm not a big fan of exposing a lot of PARAMs for users to
> tune behavior (though I've made the same lapse in judgment myself).  In
> my mind a PARAM is really more about controlling pathological behavior.

But we (Power) need very different tuning than what others apparently
need.  It is similar to inlining, in that that also differs a lot
between archs how aggressively to do that optimally.


Segher


Re: [C PATCH] Drop qualifiers during lvalue conversion

2020-11-19 Thread Uecker, Martin
On Thursday, 19.11.2020, at 18:58 +0000, Joseph Myers wrote:
> On Thu, 19 Nov 2020, Uecker, Martin wrote:

...
> 
> > +void g(void)
> > +{
> > + volatile int j;
> > + typeof((0,j)) i21; i21 = j;;
> > + typeof(+j) i22; i22 = j;;
> > + typeof(-j) i23; i23 = j;;
> > + typeof(1?j:0) i24; i24 = j;;
> > + typeof((int)j) i25; i25 = j;;
> > + typeof((volatile int)j) i26; i26 = j;;
> > +}
> > +
> > +void h(void)
> > +{
> > + _Atomic int j;
> > + typeof((0,j)) i32; i32 = j;;
> > + typeof(+j) i33; i33 = j;;
> > + typeof(-j) i34; i34 = j;;
> > + typeof(1?j:0) i35; i35 = j;;
> > + typeof((int)j) i36; i36 = j;;
> > + typeof((_Atomic int)j) i37; i37 = j;;
> > +}
> > +
> > +void e(void)
> > +{
> > + int* restrict j;
> > + typeof((0,j)) i43; i43 = j;;
> > + typeof(1?j:0) i44; i44 = j;;
> > + typeof((int*)j) i45; i45 = j;;
> > + typeof((int* restrict)j) i46; i46 = j;;
> > +}
> 
> But these tests don't look like they do anything useful (i.e. verify that 
> typeof loses the qualifier), because testing by assignment like that only 
> works with const.  You could do e.g.
> 
> volatile int j;
> extern int i;
> extern typeof((0,j)) i;
> 
> instead to verify the qualifier is removed.
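
The redeclaration trick relies on the rule that two declarations of the same object must have compatible types, qualifiers included. A reduced sketch of just that mechanism follows. It assumes the GNU __typeof__ extension, all variable names are made up, and it deliberately avoids the expressions whose typeof result the patch changes.

```c
int plain_value = 7;
volatile int volatile_value = 9;

/* Two compatible declarations of the same object: accepted.  */
extern int checked;
extern __typeof__ (plain_value) checked;

/* The following line would be rejected as a conflicting redeclaration,
   because __typeof__ of a volatile-qualified lvalue keeps the qualifier:

   extern __typeof__ (volatile_value) checked;  */

int checked = 7;

int
read_checked (void)
{
  return checked;
}
```

An assignment test cannot catch this, since assigning to a volatile, _Atomic or restrict-qualified object is perfectly valid.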

Apparently I did not have enough coffee when
generalizing this to the other qualifiers. 

Ok, with the following test?



/* test that lvalue conversions drops qualifiers, Bug 97702 */
/* { dg-do compile } */
/* { dg-options "" } */


const int jc;
extern int j;
extern typeof(0,jc) j;
extern typeof(+jc) j;
extern typeof(-jc) j;
extern typeof(1?jc:0) j;
extern typeof((int)jc) j;
extern typeof((const int)jc) j;

volatile int kv;
extern int k;
extern typeof(0,kv) k;
extern typeof(+kv) k;
extern typeof(-kv) k;
extern typeof(1?kv:0) k;
extern typeof((int)kv) k;
extern typeof((volatile int)kv) k;

_Atomic int la;
extern int l;
extern typeof(0,la) l;
extern typeof(+la) l;
extern typeof(-la) l;
extern typeof(1?la:0) l;
extern typeof((int)la) l;
extern typeof((_Atomic int)la) l;

int * restrict mr;
extern int *m;
extern typeof(0,mr) m;
extern typeof(1?mr:0) m;
extern typeof((int *)mr) m;
extern typeof((int * restrict)mr) m;



Fix two issues I introduced in operand_equal_p

2020-11-19 Thread Jan Hubicka
Hi,
doing some further testing and analysis of icf miscompares I noticed that
my change for handling OEP_ADDRESS_OF of COMPONENT_REF had a last-minute
change that made it ineffective, since the flag is cleared before the
conditional.  After some experimenting it seems cleanest to just use a
temporary bool.
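
The thinko is easy to reproduce outside GCC. In this sketch FLAG_ADDRESS_OF is a hypothetical stand-in for OEP_ADDRESS_OF and both function names are invented. The first function tests the bit after clearing it, the second saves it in a temporary first.

```c
#define FLAG_ADDRESS_OF 0x1u  /* hypothetical stand-in for OEP_ADDRESS_OF */

/* Buggy shape: the bit is cleared first, so the later test never fires.  */
int
wants_address_buggy (unsigned flags)
{
  flags &= ~FLAG_ADDRESS_OF;
  return (flags & FLAG_ADDRESS_OF) != 0;
}

/* Fixed shape: remember the bit in a temporary, then clear it.  */
int
wants_address_fixed (unsigned flags)
{
  int compare_address = (flags & FLAG_ADDRESS_OF) != 0;
  flags &= ~FLAG_ADDRESS_OF;
  (void) flags;  /* the cleared flags would be passed on to recursive calls */
  return compare_address;
}
```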

The other problem is that obj-C++ produces OBJ_TYPE_REFs that refer
to something other than class types. obj_type_ref_class asserts on that,
since one is expected to use the virtual_method_call_p predicate first.
It would be nice to make obj-C++ either produce standard OBJ_TYPE_REFs
or use a different code for that, but that is for another day.

I apologize for both - I clearly need a break, which I will take. Fortunately
it seems that ICF issues tracked in PR92535 are almost resolved.

lto-bootstrapped and regtested on x86_64-linux, committed.

Honza

* fold-const.c (operand_compare::operand_equal_p): Fix thinko in
COMPONENT_REF handling and guard types_same_for_odr by
virtual_method_call_p.
(operand_compare::hash_operand): Likewise.
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 820b08d26fd..1bce9e72c1d 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -3314,30 +3314,34 @@ operand_compare::operand_equal_p (const_tree arg0, 
const_tree arg1,
 may be NULL when we're called to compare MEM_EXPRs.  */
  if (!OP_SAME_WITH_NULL (0))
return false;
- /* Most of time we only need to compare FIELD_DECLs for equality.
-However when determining address look into actual offsets.
-These may match for unions and unshared record types.  */
- flags &= ~OEP_ADDRESS_OF;
- if (!OP_SAME (1))
-   {
- if (flags & OEP_ADDRESS_OF)
-   {
- if (TREE_OPERAND (arg0, 2)
- || TREE_OPERAND (arg1, 2))
-   return OP_SAME_WITH_NULL (2);
- tree field0 = TREE_OPERAND (arg0, 1);
- tree field1 = TREE_OPERAND (arg1, 1);
-
- if (!operand_equal_p (DECL_FIELD_OFFSET (field0),
-   DECL_FIELD_OFFSET (field1), flags)
- || !operand_equal_p (DECL_FIELD_BIT_OFFSET (field0),
-  DECL_FIELD_BIT_OFFSET (field1),
-  flags))
-   return false;
-   }
- else
-   return false;
-   }
+ {
+   bool compare_address = flags & OEP_ADDRESS_OF;
+
+   /* Most of time we only need to compare FIELD_DECLs for equality.
+  However when determining address look into actual offsets.
+  These may match for unions and unshared record types.  */
+   flags &= ~OEP_ADDRESS_OF;
+   if (!OP_SAME (1))
+ {
+   if (compare_address)
+ {
+   if (TREE_OPERAND (arg0, 2)
+   || TREE_OPERAND (arg1, 2))
+ return OP_SAME_WITH_NULL (2);
+   tree field0 = TREE_OPERAND (arg0, 1);
+   tree field1 = TREE_OPERAND (arg1, 1);
+
+   if (!operand_equal_p (DECL_FIELD_OFFSET (field0),
+ DECL_FIELD_OFFSET (field1), flags)
+   || !operand_equal_p (DECL_FIELD_BIT_OFFSET (field0),
+DECL_FIELD_BIT_OFFSET (field1),
+flags))
+ return false;
+ }
+   else
+ return false;
+ }
+ }
  return OP_SAME_WITH_NULL (2);
 
case BIT_FIELD_REF:
@@ -3436,8 +3440,11 @@ operand_compare::operand_equal_p (const_tree arg0, 
const_tree arg1,
if (!operand_equal_p (OBJ_TYPE_REF_OBJECT (arg0),
  OBJ_TYPE_REF_OBJECT (arg1), flags))
  return false;
-   if (!types_same_for_odr (obj_type_ref_class (arg0),
-obj_type_ref_class (arg1)))
+   if (virtual_method_call_p (arg0) != virtual_method_call_p (arg1))
+ return false;
+   if (virtual_method_call_p (arg0)
+   && !types_same_for_odr (obj_type_ref_class (arg0),
+   obj_type_ref_class (arg1)))
  return false;
return true;
 
@@ -3866,6 +3873,8 @@ operand_compare::hash_operand (const_tree t, 
inchash::hash &hstate,
  flags &= ~OEP_ADDRESS_OF;
  inchash::add_expr (OBJ_TYPE_REF_TOKEN (t), hstate, flags);
  inchash::add_expr (OBJ_TYPE_REF_OBJECT (t), hstate, flags);
+ if (!virtual_method_call_p (t))
+   return;
  if (tree c = obj_type_ref_class (t))
{
  c = TYPE_NAME (TYPE_MAIN_VARIANT (c));


Re: libstdc++: Avoid zero-probability events in discrete_distribution [PR61369]

2020-11-19 Thread Jonathan Wakely via Gcc-patches

On 19/11/20 12:57 -0500, Lewis Hyatt via Libstdc++ wrote:

Hello-

PR61369 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61369) points out
that std::discrete_distribution can return an event even if it has 0
probability, and proposes a simple fix. It seems that this fix was never
applied, because there was an expectation of redoing this code anyway to
use a more efficient algorithm (PR57925). Given that this new algorithm
has not been implemented so far, would it make sense to apply the simple
fix to address this issue? The attached patch does this.

One question about the patch: a slight annoyance is that only
std::lower_bound() is currently available in random.tcc, as this file
includes only bits/stl_algobase.h and not bits/stl_algo.h (via including
). Is there a preference between simply including stl_algo.h, or
moving upper_bound to stl_algobase.h, where lower_bound is? I noticed
that in C++20 mode,  includes stl_algo.h already, so I figured
it would be fine to just include it in random.tcc unconditionally.


But the increase in header sizes in C++20 is a regression:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92546

Anyway, I'll review this patch tomorrow, thanks for sending it.


bootstrap + testing were done on x86-64 GNU/Linux, all tests the same
before + after plus 2 new passes from the new test. Thanks for taking a
look!

-Lewis



From: Lewis Hyatt 
Date: Wed, 18 Nov 2020 17:12:51 -0500
Subject: [PATCH] libstdc++: Avoid zero-probability events in 
discrete_distribution [PR61369]

Fixes PR61369, as recommended by the PR's submitter, by replacing
lower_bound() with upper_bound(). Currently, if there is an initial subset of
events with probability 0, the first of them will be returned with non-zero
probability (if the underlying RNG returns exactly 0). Switching to
upper_bound() ensures that this will not happen.

libstdc++-v3/ChangeLog:

PR libstdc++/61369
* include/bits/random.tcc: Include bits/stl_algo.h.
(discrete_distribution::operator()): Use upper_bound rather than
lower_bound.
* testsuite/26_numerics/random/pr60037-neg.cc: Adapt to new line
numbering in random.tcc.
* testsuite/26_numerics/random/discrete_distribution/pr61369.cc: New
test.

diff --git a/libstdc++-v3/include/bits/random.tcc 
b/libstdc++-v3/include/bits/random.tcc
index 3205442f2f6..14fe4f39c7b 100644
--- a/libstdc++-v3/include/bits/random.tcc
+++ b/libstdc++-v3/include/bits/random.tcc
@@ -31,6 +31,7 @@
#define _RANDOM_TCC 1

#include <numeric> // std::accumulate and std::partial_sum
+#include <bits/stl_algo.h> // std::upper_bound

namespace std _GLIBCXX_VISIBILITY(default)
{
@@ -2706,7 +2707,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  __aurng(__urng);

const double __p = __aurng();
-   auto __pos = std::lower_bound(__param._M_cp.begin(),
+   auto __pos = std::upper_bound(__param._M_cp.begin(),
  __param._M_cp.end(), __p);

return __pos - __param._M_cp.begin();
diff --git 
a/libstdc++-v3/testsuite/26_numerics/random/discrete_distribution/pr61369.cc 
b/libstdc++-v3/testsuite/26_numerics/random/discrete_distribution/pr61369.cc
new file mode 100644
index 000..f8fa97e293e
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/random/discrete_distribution/pr61369.cc
@@ -0,0 +1,55 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target c++11 } }
+// { dg-require-cstdint "" }
+
+#include 
+#include 
+#include 
+#include 
+
+class not_so_random
+{
+public:
+  using result_type = std::uint64_t;
+
+  static constexpr result_type
+  min()
+  { return 0u; }
+
+  static constexpr result_type
+  max()
+  { return std::numeric_limits::max(); }
+
+  result_type
+  operator()() const
+  { return 0u; }
+};
+
+void
+test01()
+{
+  std::discrete_distribution<> u{0.0, 0.5, 0.5};
+  not_so_random rng;
+  VERIFY( u(rng) > 0 );
+}
+
+int main()
+{
+  test01();
+}
diff --git a/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc 
b/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
index ba252ef34fe..4d00d1846c4 100644
--- a/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
+++ b/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
@@ -12,4 +12,4 @@ auto x = std::generate_canonical



Re: [PATCH] Check calls before loop unrolling

2020-11-19 Thread Jeff Law via Gcc-patches



On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote:
> guojiufu  writes:
>
> Hi,
>
> In this patch, the default value of
> param=max-unrolled-average-calls-x1 is '0', which means that for a loop
> to be unrolled, there should be no call inside the body.  Do I need to
> set the default to a bigger value (16?) for later tuning?  A bigger value
> will keep the behavior unchanged.
>
> And is this patch ok for trunk?  Thanks a lot for your comments!
>
> BR.
> Jiufu.
>
>
>> Hi,
>>
>> When unrolling loops, calls inside the loop body may have a negative
>> impact on unrolling.  This patch adds a param,
>> param_max_unrolled_calls, and if the number of calls inside the loop
>> is bigger than this param, the loop is prevented from unrolling.
>>
>> This patch is checking the _average_ number of calls which is the
>> summary of call numbers multiply the possibility of the call maybe
>> executed.  The _average_ number could be a fraction, to keep the
>> precision, the param is the threshold number multiply 1.
>>
>> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
>>
>> gcc/ChangeLog
>> 2020-08-19  Jiufu Guo   
>>
>>  * params.opt (param_max_unrolled_average_calls_x1): New param.
>>  * cfgloop.h (average_num_loop_calls): New declare.
>>  * cfgloopanal.c (average_num_loop_calls): New function.
>>  * loop-unroll.c (decide_unroll_constant_iteration,
>>  decide_unroll_runtime_iterations,
>>  decide_unroll_stupid): Check average_num_loop_calls and
>>  param_max_unrolled_average_calls_x1.
So what's the motivation behind adding a PARAM to control this
behavior?  I'm not a big fan of exposing a lot of PARAMs for users to
tune behavior (though I've made the same lapse in judgment myself).  In
my mind a PARAM is really more about controlling pathological behavior.

jeff
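
The averaged call count described in the quoted patch can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from cfgloopanal.c: `Block`, `scaled_average_calls`, `may_unroll`, and the `scale` argument are hypothetical names, and the scale factor is left as a parameter because the quoted text does not make its value clear.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical model of one basic block in a loop body: how many calls it
// contains and how likely it is to execute on a given iteration.
struct Block {
  int calls;
  double probability;
};

// Average number of calls per iteration, i.e. the sum over all blocks of
// (calls * execution probability), multiplied by a scale factor so that a
// fractional average can be compared against an integer param.
long scaled_average_calls(const std::vector<Block>& body, long scale)
{
  double average = 0.0;
  for (const Block& b : body)
    average += b.calls * b.probability;
  return std::lround(average * scale);
}

// The loop is kept from unrolling when the scaled average exceeds the
// threshold; a threshold of 0 therefore only admits call-free bodies.
bool may_unroll(const std::vector<Block>& body, long scale,
                long max_scaled_calls)
{
  return scaled_average_calls(body, scale) <= max_scaled_calls;
}
```

With a body of one always-executed call and two calls in a block taken a quarter of the time, the average is 1.5 calls per iteration, so a threshold of 0 rejects the loop.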



Re: [PATCH] libstdc++: Enable <thread> without gthreads

2020-11-19 Thread Jonathan Wakely via Gcc-patches

On 19/11/20 13:36 +, Jonathan Wakely wrote:

On 16/11/20 14:43 -0800, Thomas Rodgers wrote:

This patch looks good to me.


Committed now.


This patch was also needed, but I don't understand why I didn't see
the FAILs on gcc135 in the cfarm.

Anyway, tested x86_64-linux, committed to trunk.



commit 5e6a43158d2e5b26616716c50badedd3400c6bea
Author: Jonathan Wakely 
Date:   Thu Nov 19 16:17:33 2020

libstdc++: Add missing header to some tests

These tests use std::this_thread::sleep_for without including <thread>.

libstdc++-v3/ChangeLog:

* testsuite/30_threads/async/async.cc: Include <thread>.
* testsuite/30_threads/future/members/93456.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc b/libstdc++-v3/testsuite/30_threads/async/async.cc
index 1c779bfbcad4..b06c2553c952 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -22,6 +22,7 @@
 
 
 #include 
+#include <thread>
 #include 
 
 using namespace std;
diff --git a/libstdc++-v3/testsuite/30_threads/future/members/93456.cc b/libstdc++-v3/testsuite/30_threads/future/members/93456.cc
index 8d6a5148ce3c..9d1cbcef0013 100644
--- a/libstdc++-v3/testsuite/30_threads/future/members/93456.cc
+++ b/libstdc++-v3/testsuite/30_threads/future/members/93456.cc
@@ -22,6 +22,7 @@
 
 
 #include 
+#include <thread>
 #include 
 #include 
 #include 


Re: [C PATCH] Drop qualifiers during lvalue conversion

2020-11-19 Thread Joseph Myers
On Thu, 19 Nov 2020, Uecker, Martin wrote:

> Here is another version of the patch. The
> only difference is the additional check
> using 'tree_ssa_useless_type_conversion'.

The code changes in this one are OK.  However, in the test:

> +void f(void)
> +{
> + const int j;
> + typeof((0,j)) i10; i10 = j;;
> + typeof(+j) i11; i11 = j;;
> + typeof(-j) i12; i12 = j;;
> + typeof(1?j:0) i13; i13 = j;;
> + typeof((int)j) i14; i14 = j;;
> + typeof((const int)j) i15; i15 = j;;
> +}

This test function seems fine.

> +void g(void)
> +{
> + volatile int j;
> + typeof((0,j)) i21; i21 = j;;
> + typeof(+j) i22; i22 = j;;
> + typeof(-j) i23; i23 = j;;
> + typeof(1?j:0) i24; i24 = j;;
> + typeof((int)j) i25; i25 = j;;
> + typeof((volatile int)j) i26; i26 = j;;
> +}
> +
> +void h(void)
> +{
> + _Atomic int j;
> + typeof((0,j)) i32; i32 = j;;
> + typeof(+j) i33; i33 = j;;
> + typeof(-j) i34; i34 = j;;
> + typeof(1?j:0) i35; i35 = j;;
> + typeof((int)j) i36; i36 = j;;
> + typeof((_Atomic int)j) i37; i37 = j;;
> +}
> +
> +void e(void)
> +{
> + int* restrict j;
> + typeof((0,j)) i43; i43 = j;;
> + typeof(1?j:0) i44; i44 = j;;
> + typeof((int*)j) i45; i45 = j;;
> + typeof((int* restrict)j) i46; i46 = j;;
> +}

But these tests don't look like they do anything useful (i.e. verify that 
typeof loses the qualifier), because testing by assignment like that only 
works with const.  You could do e.g.

volatile int j;
extern int i;
extern typeof((0,j)) i;

instead to verify the qualifier is removed.
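
The point that an assignment cannot reveal whether a qualifier was dropped, while a type identity check can, may be easier to see in a small sketch. This is a C++ analogue rather than the C testcase under review: it uses `decltype` and `std::is_same` instead of `typeof` and compatible redeclarations, and it only covers unary plus because C++ expression rules differ from C for the other forms. The function names are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

volatile int j = 0;

// Assigning from j compiles for both a qualified and an unqualified
// target, so an assignment alone cannot tell the two types apart.
int assignment_proves_nothing()
{
  int plain = j;
  volatile int qualified = j;
  return plain + qualified;
}

// A type identity check does distinguish them: the result of unary plus
// is an unqualified int, not a volatile int.
static_assert(std::is_same<decltype(+j), int>::value,
              "unary plus yields plain int");
static_assert(!std::is_same<decltype(+j), volatile int>::value,
              "the volatile qualifier is not kept");

constexpr bool unary_plus_drops_volatile()
{
  return std::is_same<decltype(+j), int>::value;
}
```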

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] arm: Fix up neon_vector_mem_operand [PR97528]

2020-11-19 Thread Jakub Jelinek via Gcc-patches
Hi!

The documentation for POST_MODIFY says:
   Currently, the compiler can only handle second operands of the
   form (plus (reg) (reg)) and (plus (reg) (const_int)), where
   the first operand of the PLUS has to be the same register as
   the first operand of the *_MODIFY.
The following testcase ICEs, because combine just attempts to simplify
things and ends up with
(post_modify (reg1) (plus (mult (reg2) (const_int 4)) (reg1))
but the target predicates accept it, because they only verify
that POST_MODIFY's second operand is PLUS and the second operand
of the PLUS is a REG.

The following patch fixes this by performing further verification that
the POST_MODIFY is in the form it should be.
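
The difference between the old and the new predicate can be sketched on a toy expression tree. This is a rough model only, not GCC's RTL API: `Node`, `make`, `old_check`, and `new_check` are hypothetical, and register equality is reduced to comparing register numbers where the real patch uses rtx_equal_p.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

enum class Code { Reg, ConstInt, Plus, Mult, PostModify };

struct Node;
using NodeP = std::shared_ptr<Node>;

struct Node {
  Code code;
  int value;               // register number for Reg, constant for ConstInt
  std::vector<NodeP> ops;  // operands, empty for leaves
};

NodeP make(Code code, int value, std::vector<NodeP> ops = {})
{
  return std::make_shared<Node>(Node{code, value, std::move(ops)});
}

bool is_reg(const NodeP& n) { return n->code == Code::Reg; }

bool same_reg(const NodeP& a, const NodeP& b)
{
  return is_reg(a) && is_reg(b) && a->value == b->value;
}

// Old predicate: only looks at the shape of the second operand.
bool old_check(const NodeP& ind)
{
  return ind->code == Code::PostModify
         && ind->ops[1]->code == Code::Plus
         && is_reg(ind->ops[1]->ops[1]);
}

// New predicate: additionally requires that the modified register is a
// register and is the first operand of the PLUS.
bool new_check(const NodeP& ind)
{
  return old_check(ind)
         && is_reg(ind->ops[0])
         && same_reg(ind->ops[0], ind->ops[1]->ops[0]);
}

// (post_modify (reg1) (plus (reg1) (reg2))): the documented form.
NodeP good_address()
{
  NodeP r1 = make(Code::Reg, 1), r2 = make(Code::Reg, 2);
  return make(Code::PostModify, 0, {r1, make(Code::Plus, 0, {r1, r2})});
}

// (post_modify (reg1) (plus (mult (reg2) (const_int 4)) (reg1))): the
// form combine produced in the failing testcase.
NodeP bad_address()
{
  NodeP r1 = make(Code::Reg, 1), r2 = make(Code::Reg, 2);
  NodeP scaled = make(Code::Mult, 0, {r2, make(Code::ConstInt, 4)});
  return make(Code::PostModify, 0, {r1, make(Code::Plus, 0, {scaled, r1})});
}
```

In this model the old check accepts both addresses, while the new one accepts only the documented form.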

Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk
and release branches after a while?

2020-11-19  Jakub Jelinek  

PR target/97528
* config/arm/arm.c (neon_vector_mem_operand): For POST_MODIFY, require
first POST_MODIFY operand is a REG and is equal to the first operand
of PLUS.

* gcc.target/arm/pr97528.c: New test.

--- gcc/config/arm/arm.c.jj 2020-11-13 19:00:46.729620560 +0100
+++ gcc/config/arm/arm.c2020-11-18 17:05:44.656867343 +0100
@@ -13429,7 +13429,9 @@ neon_vector_mem_operand (rtx op, int typ
   /* Allow post-increment by register for VLDn */
   if (type == 2 && GET_CODE (ind) == POST_MODIFY
   && GET_CODE (XEXP (ind, 1)) == PLUS
-  && REG_P (XEXP (XEXP (ind, 1), 1)))
+  && REG_P (XEXP (XEXP (ind, 1), 1))
+  && REG_P (XEXP (ind, 0))
+  && rtx_equal_p (XEXP (ind, 0), XEXP (XEXP (ind, 1), 0)))
  return true;
 
   /* Match:
--- gcc/testsuite/gcc.target/arm/pr97528.c.jj   2020-11-18 17:09:58.195053288 
+0100
+++ gcc/testsuite/gcc.target/arm/pr97528.c  2020-11-18 17:09:47.839168237 
+0100
@@ -0,0 +1,28 @@
+/* PR target/97528 */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O1" }  */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+
+typedef __simd64_int16_t T;
+typedef __simd64_uint16_t U;
+unsigned short c;
+int d;
+U e;
+
+void
+foo (void)
+{
+  unsigned short *dst = &c;
+  int g = d, b = 4;
+  U dc = e;
+  for (int h = 0; h < b; h++)
+{
+  unsigned short *i = dst;
+  U j = dc;
+  vst1_s16 ((int16_t *) i, (T) j);
+  dst += g;
+}
+}


Jakub



Re: [PATCH] rs6000: Fix p8_mtvsrd_df's insn type

2020-11-19 Thread David Edelsohn via Gcc-patches
On Thu, Nov 19, 2020 at 1:54 AM Kewen.Lin  wrote:
>
> Hi,
>
> The insn type of p8_mtvsrd_df appears not to have been updated
> to mtvsr.  I assume all uses of mtvsrd should have
> the same insn type.
>
> This patch fixes its insn type, replacing the current mfvsr with mtvsr.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.md (p8_mtvsrd_df): Fix insn type.

Good that you noticed it. Okay for trunk.

Thanks, David


Re: [PATCH] c++: Fix array new with value-initialization [PR97523]

2020-11-19 Thread Jason Merrill via Gcc-patches

On 11/19/20 11:11 AM, Marek Polacek wrote:

Since my r11-3092 the following is rejected with -std=c++20:

   struct T { explicit T(); };
   void fn(int n) {
 new T[1]();
   }

with "would use explicit constructor 'T::T()'".  It is because since
that change we go into the P1009 block in build_new (array_p is false,
but nelts is non-null and we're in C++20).  Since we only have (), we
build a {} and continue to build_new_1, which then calls build_vec_init
and then we error because the {} isn't CONSTRUCTOR_IS_DIRECT_INIT.

For (), which is value-initializing, we want to do what we were doing
before: pass empty init and let build_value_init take care of it.
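
A small runnable sketch of the behavior the patch restores, assuming a conforming compiler: `new T[n]()` value-initializes every element, which calls the default constructor directly, so an explicit default constructor is acceptable. The member `v`, the counter `constructed`, and the helper `sum_value_initialized` are hypothetical additions for illustration.

```cpp
#include <cassert>
#include <cstddef>

struct T {
  explicit T() : v(7) { ++constructed; }
  int v;
  static int constructed;  // counts default-constructor calls
};
int T::constructed = 0;

// Allocates n elements with '()', i.e. value-initialization, and returns
// the sum of the values the default constructor stored.
int sum_value_initialized(std::size_t n)
{
  T* p = new T[n]();
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += p[i].v;
  delete[] p;
  return sum;
}
```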

For various reasons I wanted to dig a little bit deeper into this,
and as a result, I'm adding a test for [expr.new]/24 (and checked that
our current behavior matches clang++).

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


gcc/cp/ChangeLog:

PR c++/97523
* init.c (build_new): When value-initializing an array new,
leave the INIT as an empty vector.

gcc/testsuite/ChangeLog:

PR c++/97523
* g++.dg/expr/anew5.C: New test.
* g++.dg/expr/anew6.C: New test.
---
  gcc/cp/init.c |  6 +-
  gcc/testsuite/g++.dg/expr/anew5.C | 26 
  gcc/testsuite/g++.dg/expr/anew6.C | 33 +++
  3 files changed, 64 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/expr/anew5.C
  create mode 100644 gcc/testsuite/g++.dg/expr/anew6.C

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index ffb84ea5b09..0b98f338feb 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3766,7 +3766,11 @@ build_new (location_t loc, vec<tree, va_gc> **placement, 
tree type,
  
/* P1009: Array size deduction in new-expressions.  */

const bool array_p = TREE_CODE (type) == ARRAY_TYPE;
-  if (*init && (array_p || (nelts && cxx_dialect >= cxx20)))
+  if (*init
+  /* If ARRAY_P, we have to deduce the array bound.  For C++20 paren-init,
+we have to process the parenthesized-list.  But don't do it for (),
+which is value-initialization, and INIT should stay empty.  */
+  && (array_p || (cxx_dialect >= cxx20 && nelts && !(*init)->is_empty ())))
  {
/* This means we have 'new T[]()'.  */
if ((*init)->is_empty ())
diff --git a/gcc/testsuite/g++.dg/expr/anew5.C 
b/gcc/testsuite/g++.dg/expr/anew5.C
new file mode 100644
index 000..d597caf5483
--- /dev/null
+++ b/gcc/testsuite/g++.dg/expr/anew5.C
@@ -0,0 +1,26 @@
+// PR c++/97523
+// { dg-do compile }
+// We were turning the () into {} which made it seem like
+// aggregate-initialization (we are dealing with arrays here), which
+// performs copy-initialization, which only accepts converting constructors.
+
+struct T {
+  explicit T();
+  T(int);
+};
+
+void
+fn (int n)
+{
+  new T[1]();
+  new T[2]();
+  new T[3]();
+  new T[n]();
+#if __cpp_aggregate_paren_init
+  new T[]();
+  new T[2](1, 2);
+  // T[2] is initialized via copy-initialization, so we can't call
+  // explicit T().
+  new T[3](1, 2); // { dg-error "explicit constructor" "" { target c++20 } }
+#endif
+}
diff --git a/gcc/testsuite/g++.dg/expr/anew6.C 
b/gcc/testsuite/g++.dg/expr/anew6.C
new file mode 100644
index 000..0542daac275
--- /dev/null
+++ b/gcc/testsuite/g++.dg/expr/anew6.C
@@ -0,0 +1,33 @@
+// PR c++/97523
+// { dg-do compile { target c++11 } }
+
+// [expr.new]/24: If the new-expression creates an object or an array of
+// objects of class type, access and ambiguity control are done for the
+// [...] constructor selected for the initialization (if any).
+// NB: We only check for a default constructor if the array has a non-constant
+// bound, or there are insufficient initializers.  Since an array is an
+// aggregate, we perform aggregate-initialization, which performs
+// copy-initialization, so we only accept converting constructors.
+
+struct T {
+  explicit T();
+  T(int);
+};
+
+struct S {
+  S(int);
+};
+
+void
+fn (int n)
+{
+  new T[1]{}; // { dg-error "explicit constructor" }
+  new T[2]{1, 2};
+  new T[3]{1, 2}; // { dg-error "explicit constructor" }
+  new T[n]{}; // { dg-error "explicit constructor" }
+
+  new S[1]{}; // { dg-error "could not convert" }
+  new S[2]{1, 2};
+  new S[3]{1, 2}; // { dg-error "could not convert" }
+  new S[n]{}; // { dg-error "could not convert" }
+}

base-commit: 2729378d0905a04e476a8bdcaaf0288f417810ec





Re: [PATCH] c++: Fix crash with broken deduction from {} [PR97895]

2020-11-19 Thread Jason Merrill via Gcc-patches

On 11/19/20 11:11 AM, Marek Polacek wrote:

Unfortunately, the otherwise beautiful

   for (constructor_elt &elt : *CONSTRUCTOR_ELTS (init))

is not immune to an empty constructor, so we have to check
CONSTRUCTOR_ELTS first.
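
The hazard can be sketched outside GCC with a nullable pointer to a container, assuming (as the patch implies) that the element vector pointer may be null for an empty constructor. `sum_elts` is a hypothetical stand-in; the point is only that a range-based for over `*ptr` dereferences the pointer before iterating, so the null check has to come first.

```cpp
#include <cassert>
#include <vector>

// Sums the elements if there are any. The pointer may be null when no
// storage was ever allocated, so it is checked before the range-based
// for loop dereferences it.
int sum_elts(const std::vector<int>* elts)
{
  int total = 0;
  if (elts)
    for (int v : *elts)
      total += v;
  return total;
}
```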

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


gcc/cp/ChangeLog:

PR c++/97895
* pt.c (do_auto_deduction): Don't crash when the constructor has
zero elements.

gcc/testsuite/ChangeLog:

PR c++/97895
* g++.dg/cpp0x/auto54.C: New test.
---
  gcc/cp/pt.c | 11 +++
  gcc/testsuite/g++.dg/cpp0x/auto54.C | 10 ++
  2 files changed, 17 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/auto54.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 1babf833d32..a1b6631d691 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -29250,10 +29250,13 @@ do_auto_deduction (tree type, tree init, tree 
auto_node,
  return error_mark_node;
  
if (BRACE_ENCLOSED_INITIALIZER_P (init))

-/* We don't recurse here because we can't deduce from a nested
-   initializer_list.  */
-for (constructor_elt &elt : *CONSTRUCTOR_ELTS (init))
-  elt.value = resolve_nondeduced_context (elt.value, complain);
+{
+  /* We don't recurse here because we can't deduce from a nested
+initializer_list.  */
+  if (CONSTRUCTOR_ELTS (init))
+   for (constructor_elt &elt : *CONSTRUCTOR_ELTS (init))
+ elt.value = resolve_nondeduced_context (elt.value, complain);
+}
else
  init = resolve_nondeduced_context (init, complain);
  
diff --git a/gcc/testsuite/g++.dg/cpp0x/auto54.C b/gcc/testsuite/g++.dg/cpp0x/auto54.C

new file mode 100644
index 000..0c1815a99bc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/auto54.C
@@ -0,0 +1,10 @@
+// PR c++/97895
+// { dg-do compile { target c++11 } }
+
+namespace std {
+  template<typename T> struct initializer_list {
+const T *ptr;
+decltype(sizeof 0) n;
+  };
+  auto a = {}; // { dg-error "unable to deduce" }
+}

base-commit: 25bb75f841c552cfd27a4344b7487efbe35b4481





Re: [PATCH] c++: Implement -Wuninitialized for mem-initializers [PR19808]

2020-11-19 Thread Jason Merrill via Gcc-patches

On 11/17/20 3:44 AM, Jan Hubicka wrote:

On Tue, Nov 17, 2020 at 01:33:48AM -0500, Jason Merrill via Gcc-patches wrote:

Why doesn't the middle-end warning work for inline functions?


It does but only when they're called (and, as usual, also unless
the uninitialized use is eliminated).


Yes, but why?  I assume because we don't bother going through all the phases
of compilation for unused inlines, but couldn't we change that when we're
asking for (certain) warnings?


CCing Richard and Honza on this.

I think we don't even gimplify unused functions; the
cgraph code just throws them away.  Even trying just to run the first few
passes (gimplification up to uninit1) would have several high costs,

Note that uninit1 is a late pass, so it is not just a few passes we are
talking about.  Late passes are run only on code that really lands in the
.s file, so enabling them would mean splitting the pass queue and running
another unreachable code somewhere.  That would confuse the inliner and
other IPA passes, since they would have to somehow deal with dead code in
their program size estimates, and it would also affect LTO.

Even early passes are run only on the reachable portion of the program,
since functions are analyzed by cgraphunit on demand (only if they are
analyzed by someone else). Similar logic is also used by the C++ FE to
decide what templates.  Changing this would also have quite some compile
time/memory use impact.

There is -fkeep-inline-functions.


OK, thanks for the explanation.  -fkeep-inline-functions seems like an 
acceptable answer for people who want a warning audit of their library 
header inlines.


Martin, I notice that the middle-end warning doesn't currently catch this:

struct B { int i,j; };

struct A
{
  B b;
  A(): b({b.i}) { }
};

A a;

It does warn if B only has one member; adding the second wrongly 
silences the warning.


Jason



Re: [PATCH] libstdc++: Enable <thread> without gthreads

2020-11-19 Thread Tom Tromey
> "Jonathan" == Jonathan Wakely  writes:

Jonathan> Here's a slightly more conservative version of the patch. This moves
Jonathan> std::thread and this_thread::get_id() and this_thread::yield() to a
Jonathan> new header, and makes *most* of std::thread defined without gthreads
Jonathan> (because we need the nested thread::id type to be returned from
Jonathan> this_thread::get_id()). But it doesn't declare the std::thread
Jonathan> constructor that creates new threads.
...
Jonathan> Both this and the previous patch require some GDB changes, because GDB
Jonathan> currently assumes that if std::thread is declared in <thread> that it
Jonathan> is usable and multiple threads are supported. That's no longer true,
Jonathan> because we would declare a useless std::thread after this patch. Tom
Jonathan> Tromey has patches to make GDB handle this though.

It turns out that with this approach, there's nothing to do in gdb,
because luckily the configure check looks to see if the constructor is
usable:

AC_CACHE_CHECK([for std::thread],
   gdb_cv_cxx_std_thread,
   [AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
[[#include <thread>
  void callback() { }]],
[[std::thread t(callback);]])],

I will probably still check in the patch to catch system_error when
starting a thread, though.
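
Catching system_error when starting a thread might look roughly like the sketch below. This is not the gdb patch itself: `run_in_thread`, `flag`, and `set_flag` are hypothetical names, and the sketch assumes a target where the thread-creating constructor is declared at all (on a target without gthreads it would not compile, which is exactly what the configure check above detects).

```cpp
#include <cassert>
#include <system_error>
#include <thread>

// Tries to run fn on a new thread and waits for it. Returns false if the
// runtime reports that the thread could not be started.
bool run_in_thread(void (*fn)())
{
  try
    {
      std::thread t(fn);
      t.join();
      return true;
    }
  catch (const std::system_error&)
    {
      return false;
    }
}

bool flag = false;
void set_flag() { flag = true; }
```

A caller can then fall back to running the work on the current thread when `run_in_thread` returns false.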

thanks,
Tom


Re: [PATCH] openmp: Implicit 'declare target' for C++ static initializers

2020-11-19 Thread Kwok Cheung Yeung

On 29/10/2020 10:03 am, Jakub Jelinek wrote:

I'm actually not sure how this can work correctly.
Let's say we have
int foo () { return 1; }
int bar () { return 2; }
int baz () { return 3; }
int qux () { return 4; }
int a = foo ();
int b = bar ();
int c = baz ();
int *d = &c;
int e = qux ();
int f = e + 1;
int *g = &f;
#pragma omp declare target to (b, d, g)
So, for the implicit declare target discovery, a is not declare target to,
nor is foo, and everything else is; b, d, g explicitly, c because it is
referenced in initializer of b, f because it is mentioned in initializer of
g and e because it is mentioned in initializer of f.
Haven't checked if the new function you've added is called before or after
analyze_function calls omp_discover_implicit_declare_target, but I don't
really see how it can work when it is not inside of that function, so that
discovery of new static vars that are implicitly declare target to doesn't
result in marking of its dynamic initializers too.  Perhaps we need a
langhook for that.  But if it is a separate function, either it is called
before the other discovery and will ignore static initializers for vars
that will only be marked as implicit declare target to later, or it is done
afterwards, but then it would really need to duplicate everything that the
other function does, otherwise it wouldn't discover everything.
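
The discovery described above is essentially a transitive closure over "is referenced by the initializer of" edges. A rough worklist sketch follows; it is not the omp-offload.c implementation, the names `Graph`, `discover`, and `example_graph` are hypothetical, and the edges assume that d's initializer refers to c and g's initializer refers to f.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each variable to the names its initializer mentions.
using Graph = std::map<std::string, std::vector<std::string>>;

// Starting from the explicitly marked names, repeatedly mark everything
// reachable through initializers until nothing new is found.
std::set<std::string> discover(const Graph& refs,
                               std::vector<std::string> worklist)
{
  std::set<std::string> marked;
  while (!worklist.empty())
    {
      std::string name = worklist.back();
      worklist.pop_back();
      if (!marked.insert(name).second)
        continue;  // already processed
      auto it = refs.find(name);
      if (it != refs.end())
        for (const std::string& r : it->second)
          worklist.push_back(r);
    }
  return marked;
}

// The example from the mail, with static and dynamic initializers treated
// alike.
Graph example_graph()
{
  return {{"a", {"foo"}}, {"b", {"bar"}}, {"c", {"baz"}}, {"d", {"c"}},
          {"e", {"qux"}}, {"f", {"e"}},   {"g", {"f"}}};
}
```

Seeding the worklist with b, d, and g marks everything except a and foo, which matches the expectation stated above only if dynamic initializers are followed during the same walk.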



I have added a new langhook GET_DECL_INIT that by default returns the 
DECL_INITIAL of a variable declaration, but for C++ can also return the dynamic 
initializer if present. omp_discover_implicit_declare_target and 
omp_discover_declare_target_var_r have been changed to use the new langhook 
instead of using DECL_INITIAL.


The dynamic initializer information is stored in a new variable 
dynamic_initializers. The information is originally stored in static_aggregates, 
but this is nulled by calling prune_vars_needing_no_initialization in 
c_parse_final_cleanups. I copy the information into a separate variable before 
it is discarded - this avoids any potential problems that may be caused by 
trying to change the way that static_aggregates currently works.
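
The shape of the hook can be sketched in a toy model. Everything here is an assumption made for illustration: initializers are modeled as strings, `Decl`, `DynamicInits`, `sketch_default_hook`, and `sketch_cxx_hook` are hypothetical names, and the lookup order (dynamic initializer first, then the static one) is a guess at the behavior described rather than the code in cp-lang.c.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy declaration: a name plus its static initializer, empty if none.
struct Decl {
  std::string name;
  std::string static_init;
};

// Side table of dynamic initializers, saved before the front end discards
// its own list.
using DynamicInits = std::map<std::string, std::string>;

// Default hook: just the static initializer.
std::string sketch_default_hook(const Decl& d)
{
  return d.static_init;
}

// C++ hook: prefer a recorded dynamic initializer, otherwise fall back to
// the default behavior.
std::string sketch_cxx_hook(const Decl& d, const DynamicInits& dynamic)
{
  auto it = dynamic.find(d.name);
  return it != dynamic.end() ? it->second : sketch_default_hook(d);
}
```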


With this, all the functions and variables in your example are marked correctly:

foo ()
...

__attribute__((omp declare target))
bar ()
...

__attribute__((omp declare target))
baz ()
...

__attribute__((omp declare target))
qux ()
...

.offload_var_table:
.quad   g
.quad   8
.quad   d
.quad   8
.quad   b
.quad   4
.quad   c
.quad   4
.quad   f
.quad   4
.quad   e
.quad   4

Your example is now a compile test in g++.dg/gomp/.


Anyway, that is one thing, the other is even if the implicit declare target
discovery handles those correctly, the question is what should we do
afterwards.  Because the C++ FE normally creates a single function that
performs the dynamic initialization of the TUs variables.  But that function
shouldn't be really declare target to, it initializes not only (explicit or
implicit) declare target to variables, but also host only variables.
So we'll probably need to create next to that host only TU constructor
also a device only constructor function that will only initialize the
declare target to variables.


Even without this patch, G++ currently accepts something like

int foo() { return 1; }
int x = foo();
#pragma omp declare target to(x)

but will not generate the device-side initializer for x, even though x is now 
present on the device. So this part of the implementation is broken with or 
without the patch.


Given that my patch doesn't make the current situation any worse, can I commit 
this portion of it to trunk for now, and leave device-side dynamic 
initialization for later?


Bootstrapped on x86_64 with no offloading, G++ testsuite ran with no 
regressions, and no regressions in the libgomp testsuite with Nvidia offloading.


Thanks,

Kwok
From 0348b149474d0922d79209705e6777e7af271e0d Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Wed, 18 Nov 2020 13:54:01 -0800
Subject: [PATCH] openmp: Implicitly add 'declare target' directives for
 dynamic initializers in C++

2020-11-18  Kwok Cheung Yeung  

gcc/
* langhooks-def.h (lhd_get_decl_init): New.
(LANG_HOOKS_GET_DECL_INIT): New.
(LANG_HOOKS_DECLS): Add LANG_HOOKS_GET_DECL_INIT.
* langhooks.h (struct lang_hooks_for_decls): Add get_decl_init.
* omp-offload.c (omp_discover_declare_target_var_r): Use
get_decl_init langhook in place of DECL_INITIAL.

gcc/cp/
* cp-lang.c (cxx_get_decl_init): New.
(LANG_HOOKS_GET_DECL_INIT): New.
* cp-tree.h (dynamic_initializers): New.
* decl.c (dynamic_initializers): New.
* decl2.c (c_parse_final_cleanups): Copy vars into
dynamic_initializers.

gcc/testsuite/
* g++.dg/gomp/declare-target-3.C: New.
---
 gcc/cp/cp-lang.c | 24 +
 

libstdc++: Avoid zero-probability events in discrete_distribution [PR61369]

2020-11-19 Thread Lewis Hyatt via Gcc-patches
Hello-

PR61369 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61369) points out
that std::discrete_distribution can return an event even if it has 0
probability, and proposes a simple fix. It seems that this fix was never
applied, because there was an expectation of redoing this code anyway to
use a more efficient algorithm (PR57925). Given that this new algorithm
has not been implemented so far, would it make sense to apply the simple
fix to address this issue? The attached patch does this.
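
The effect of the one-line change can be sketched with a standalone cumulative-probability table. This is an illustration, not the libstdc++ code: `cumulative`, `pick_lower`, and `pick_upper` are hypothetical names, and the table corresponds to the weights {0.0, 0.5, 0.5} used in the new test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Cumulative probabilities for the weights {0.0, 0.5, 0.5}.
static const std::vector<double> cumulative{0.0, 0.5, 1.0};

// Selection with lower_bound: the first entry that is not less than p.
std::size_t pick_lower(double p)
{
  return std::lower_bound(cumulative.begin(), cumulative.end(), p)
         - cumulative.begin();
}

// Selection with upper_bound: the first entry strictly greater than p.
std::size_t pick_upper(double p)
{
  return std::upper_bound(cumulative.begin(), cumulative.end(), p)
         - cumulative.begin();
}
```

When the generator yields exactly 0, `pick_lower` selects event 0 even though its probability is zero, while `pick_upper` skips it.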

One question about the patch: a slight annoyance is that only
std::lower_bound() is currently available in random.tcc, as this file
includes only bits/stl_algobase.h and not bits/stl_algo.h (via including
). Is there a preference between simply including stl_algo.h, or
moving upper_bound to stl_algobase.h, where lower_bound is? I noticed
that in C++20 mode,  includes stl_algo.h already, so I figured
it would be fine to just include it in random.tcc unconditionally.

bootstrap + testing were done on x86-64 GNU/Linux, all tests the same
before + after plus 2 new passes from the new test. Thanks for taking a
look!

-Lewis
From: Lewis Hyatt 
Date: Wed, 18 Nov 2020 17:12:51 -0500
Subject: [PATCH] libstdc++: Avoid zero-probability events in 
discrete_distribution [PR61369]

Fixes PR61369, as recommended by the PR's submitter, by replacing
lower_bound() with upper_bound(). Currently, if there is an initial subset of
events with probability 0, the first of them will be returned with non-zero
probability (if the underlying RNG returns exactly 0). Switching to
upper_bound() ensures that this will not happen.

libstdc++-v3/ChangeLog:

PR libstdc++/61369
* include/bits/random.tcc: Include bits/stl_algo.h.
(discrete_distribution::operator()): Use upper_bound rather than
lower_bound.
* testsuite/26_numerics/random/pr60037-neg.cc: Adapt to new line
numbering in random.tcc.
* testsuite/26_numerics/random/discrete_distribution/pr61369.cc: New
test.

diff --git a/libstdc++-v3/include/bits/random.tcc 
b/libstdc++-v3/include/bits/random.tcc
index 3205442f2f6..14fe4f39c7b 100644
--- a/libstdc++-v3/include/bits/random.tcc
+++ b/libstdc++-v3/include/bits/random.tcc
@@ -31,6 +31,7 @@
 #define _RANDOM_TCC 1
 
 #include <numeric> // std::accumulate and std::partial_sum
+#include <bits/stl_algo.h> // std::upper_bound
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -2706,7 +2707,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  __aurng(__urng);
 
const double __p = __aurng();
-   auto __pos = std::lower_bound(__param._M_cp.begin(),
+   auto __pos = std::upper_bound(__param._M_cp.begin(),
  __param._M_cp.end(), __p);
 
return __pos - __param._M_cp.begin();
diff --git 
a/libstdc++-v3/testsuite/26_numerics/random/discrete_distribution/pr61369.cc 
b/libstdc++-v3/testsuite/26_numerics/random/discrete_distribution/pr61369.cc
new file mode 100644
index 000..f8fa97e293e
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/random/discrete_distribution/pr61369.cc
@@ -0,0 +1,55 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target c++11 } }
+// { dg-require-cstdint "" }
+
+#include 
+#include 
+#include 
+#include 
+
+class not_so_random
+{
+public:
+  using result_type = std::uint64_t;
+
+  static constexpr result_type
+  min()
+  { return 0u; }
+
+  static constexpr result_type
+  max()
+  { return std::numeric_limits::max(); }
+
+  result_type
+  operator()() const
+  { return 0u; }
+};
+
+void
+test01()
+{
+  std::discrete_distribution<> u{0.0, 0.5, 0.5};
+  not_so_random rng;
+  VERIFY( u(rng) > 0 );
+}
+
+int main()
+{
+  test01();
+}
diff --git a/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc 
b/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
index ba252ef34fe..4d00d1846c4 100644
--- a/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
+++ b/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
@@ -12,4 +12,4 @@ auto x = std::generate_canonical

config: Add tests for modules-desired features

2020-11-19 Thread Nathan Sidwell

This adds configure tests for features that modules can take advantage
of; if they are not present, modules have reduced or fallback functionality.

It is slightly different from the earlier posting, as the server 
functionality has been moved from gcc/cp to its own toplevel directory.
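
To illustrate the kind of fallback such a configure result enables, here is a
minimal sketch, not code from the patch.  The helper name open_cloexec is
hypothetical, and the sketch tests the O_CLOEXEC macro directly so that it
builds without config.h; in-tree code would presumably key off the
HOST_HAS_O_CLOEXEC define that the new check produces.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not part of the patch): open PATH read-only with the
// close-on-exec flag set, falling back to fcntl when O_CLOEXEC is missing.
int open_cloexec(const char* path)
{
  assert(path != nullptr);
#ifdef O_CLOEXEC
  // Preferred: the flag is applied atomically by open itself.
  return open(path, O_RDONLY | O_CLOEXEC);
#else
  // Fallback: set FD_CLOEXEC afterwards.  This leaves a small window in
  // multi-threaded programs that fork between the two calls.
  int fd = open(path, O_RDONLY);
  if (fd >= 0)
    {
      int flags = fcntl(fd, F_GETFD);
      if (flags >= 0)
        fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
    }
  return fd;
#endif
}
```

Either branch yields a descriptor that is closed across exec; only the
atomicity differs.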


 gcc/
 * configure.ac: Add tests for fstatat, sighandler_t, O_CLOEXEC,
 unix-domain and ipv6 sockets.
 * config.in: Rebuilt.
 * configure: Rebuilt.

pushing to trunk

--
Nathan Sidwell

diff --git c/gcc/configure.ac w/gcc/configure.ac
index b2732d17bf4..1cce371a9e1 100644
--- c/gcc/configure.ac
+++ w/gcc/configure.ac
@@ -1417,8 +1417,8 @@ define(gcc_UNLOCKED_FUNCS, clearerr_unlocked feof_unlocked dnl
   putchar_unlocked putc_unlocked)
 AC_CHECK_FUNCS(times clock kill getrlimit setrlimit atoq \
 	popen sysconf strsignal getrusage nl_langinfo \
-	gettimeofday mbstowcs wcswidth mmap setlocale \
-	gcc_UNLOCKED_FUNCS madvise mallinfo mallinfo2)
+	gettimeofday mbstowcs wcswidth mmap posix_fallocate setlocale \
+	gcc_UNLOCKED_FUNCS madvise mallinfo mallinfo2 fstatat)
 
 if test x$ac_cv_func_mbstowcs = xyes; then
   AC_CACHE_CHECK(whether mbstowcs works, gcc_cv_func_mbstowcs_works,
@@ -1440,6 +1440,10 @@ fi
 
 AC_CHECK_TYPE(ssize_t, int)
 AC_CHECK_TYPE(caddr_t, char *)
+AC_CHECK_TYPE(sighandler_t,
+  AC_DEFINE(HAVE_SIGHANDLER_T, 1,
+[Define if  defines sighandler_t]),
+,signal.h)
 
 GCC_AC_FUNC_MMAP_BLACKLIST
 
@@ -1585,6 +1589,72 @@ if test $ac_cv_f_setlkw = yes; then
   [Define if F_SETLKW supported by fcntl.])
 fi
 
+# Check if O_CLOEXEC is defined by fcntl
+AC_CACHE_CHECK(for O_CLOEXEC, ac_cv_o_cloexec, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <fcntl.h>]], [[
+return open ("/dev/null", O_RDONLY | O_CLOEXEC);]])],
+[ac_cv_o_cloexec=yes],[ac_cv_o_cloexec=no])])
+if test $ac_cv_o_cloexec = yes; then
+  AC_DEFINE(HOST_HAS_O_CLOEXEC, 1,
+  [Define if O_CLOEXEC supported by fcntl.])
+fi
+
+# C++ Modules would like some networking features to provide the mapping
+# server.  You can still use modules without them though.
+# The following network-related checks could probably do with some
+# Windows and other non-linux defenses and checking.
+
+# Local socket connectivity wants AF_UNIX networking
+# Check for AF_UNIX networking
+AC_CACHE_CHECK(for AF_UNIX, ac_cv_af_unix, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include 
+#include 
+#include 
+#include ]],[[
+sockaddr_un un;
+un.sun_family = AF_UNSPEC;
+int fd = socket (AF_UNIX, SOCK_STREAM, 0);
+connect (fd, (sockaddr *)&un, sizeof (un));]])],
+[ac_cv_af_unix=yes],
+[ac_cv_af_unix=no])])
+if test $ac_cv_af_unix = yes; then
+  AC_DEFINE(HAVE_AF_UNIX, 1,
+  [Define if AF_UNIX supported.])
+fi
+
+# Remote socket connectivity wants AF_INET6 networking
+# Check for AF_INET6 networking
+AC_CACHE_CHECK(for AF_INET6, ac_cv_af_inet6, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include 
+#include 
+#include 
+#include ]],[[
+sockaddr_in6 in6;
+in6.sin6_family = AF_UNSPEC;
+struct addrinfo *addrs = 0;
+struct addrinfo hints;
+hints.ai_flags = 0;
+hints.ai_family = AF_INET6;
+hints.ai_socktype = SOCK_STREAM;
+hints.ai_protocol = 0;
+hints.ai_canonname = 0;
+hints.ai_addr = 0;
+hints.ai_next = 0;
+int e = getaddrinfo ("localhost", 0, &hints, &addrs);
+const char *str = gai_strerror (e);
+freeaddrinfo (addrs);
+int fd = socket (AF_INET6, SOCK_STREAM, 0);
+connect (fd, (sockaddr *)&in6, sizeof (in6));]])],
+[ac_cv_af_inet6=yes],
+[ac_cv_af_inet6=no])])
+if test $ac_cv_af_inet6 = yes; then
+  AC_DEFINE(HAVE_AF_INET6, 1,
+  [Define if AF_INET6 supported.])
+fi
+
 # Restore CFLAGS, CXXFLAGS from before the gcc_AC_NEED_DECLARATIONS tests.
 CFLAGS="$saved_CFLAGS"
 CXXFLAGS="$saved_CXXFLAGS"



Re: [PATCH] pru: Add builtins for HALT and LMBD

2020-11-19 Thread Dimitar Dimitrov
On Thursday, 19 November 2020 2:07:59 EET Jeff Law wrote:
> On 11/13/20 1:07 PM, Dimitar Dimitrov wrote:
> > Add builtins for HALT and LMBD, per Texas Instruments document
> > SPRUHV7C.  Use the new LMBD pattern to define an expand for clz.
> > 
> > Binutils [1] and sim [2] support for LMBD instruction are merged now.
> > 
> > [1] https://sourceware.org/pipermail/binutils/2020-October/113901.html
> > [2] https://sourceware.org/pipermail/gdb-patches/2020-November/173141.html
> > 
> > gcc/ChangeLog:
> > * config/pru/alu-zext.md: Add lmbd patterns for zero_extend
> > variants.
> > * config/pru/pru.c (enum pru_builtin): Add HALT and LMBD.
> > (pru_init_builtins): Ditto.
> > (pru_builtin_decl): Ditto.
> > (pru_expand_builtin): Ditto.
> > * config/pru/pru.h (CLZ_DEFINED_VALUE_AT_ZERO): Define PRU
> > value for CLZ with zero value parameter.
> > * config/pru/pru.md: Add halt, lmbd and clz patterns.
> > * doc/extend.texi: Document PRU builtins.
> > 
> > gcc/testsuite/ChangeLog:
> > * gcc.target/pru/halt.c: New test.
> > * gcc.target/pru/lmbd.c: New test.
> 
> OK.  Please commit if you haven't already.

Thank you. Pushed as 5ace1776b88d4b0fc371414d0b3983015e22fead .

Regards,
Dimitar






Re: [patch] Plug loophole in string store merging

2020-11-19 Thread Jeff Law via Gcc-patches



On 11/19/20 8:52 AM, Eric Botcazou wrote:
> Hi,
>
> there is a loophole in new string store merging support I added recently: it 
> does not check that the stores are consecutive, which is obviously required 
> if 
> you want to concatenate them...  Simple fix attached, the nice thing being 
> that it can fall back to the regular processing if any hole is detected in 
> the 
> series of stores, thanks to the handling of STRING_CST by native_encode_expr.
>
> Tested on x86-64/Linux, OK for the mainline?
>
>
> 2020-11-19  Eric Botcazou  
>
>   * gimple-ssa-store-merging.c (struct merged_store_group): Add
>   new 'consecutive' field.
>   (merged_store_group): Set it to true.
>   (do_merge): Set it to false if the store is not consecutive and
>   set string_concatenation to false in this case.
>   (merge_into): Call do_merge on entry.
>   (merge_overlapping): Likewise.
>
>
> 2020-11-19  Eric Botcazou  
>
>   * gnat.dg/opt90a.adb: New test.
>   * gnat.dg/opt90b.adb: Likewise.
>   * gnat.dg/opt90c.adb: Likewise.
>   * gnat.dg/opt90d.adb: Likewise.
>   * gnat.dg/opt90e.adb: Likewise.
>   * gnat.dg/opt90a_pkg.ads: New helper.
>   * gnat.dg/opt90b_pkg.ads: Likewise.
>   * gnat.dg/opt90c_pkg.ads: Likewise.
>   * gnat.dg/opt90d_pkg.ads: Likewise.
>   * gnat.dg/opt90e_pkg.ads: Likewise.
OK
jeff



Re: [PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-11-19 Thread Pat Haugen via Gcc-patches
On 11/4/20 10:44 AM, Carl Love via Gcc-patches wrote:
> +
> +(define_insn "vdives_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +UNSPEC_VDIVES))]
> +  "TARGET_POWER10"
> +  "vdives %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vdiveu_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +(unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +  (match_operand:VIlong 2 "vsx_register_operand" "v")]
> + UNSPEC_VDIVEU))]
> +  "TARGET_POWER10"
> +  "vdiveu %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "div3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivs %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "udiv3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivu %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vmods_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmods %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vmodu_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +  (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmodu %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])

Since the vdiv.../vmod... instructions execute in the fixed point divide unit, 
all the above instructions should have a type of "div" instead of "vecsimple".


> +
> +(define_insn "vmulhs_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +UNSPEC_VMULHS))]
> +  "TARGET_POWER10"
> +  "vmulhs %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vmulhu_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +UNSPEC_VMULHU))]
> +  "TARGET_POWER10"
> +  "vmulhu %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +;; Vector multiply low double word
> +(define_insn "mulv2di3"
> +  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
> + (mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
> +(match_operand:V2DI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmulld %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])

Similarly, the above 3 insns should have a "mul" instruction type.

-Pat


Re: [AArch64] Add --with-tune configure flag

2020-11-19 Thread Richard Earnshaw (lists) via Gcc-patches
On 19/11/2020 14:40, Wilco Dijkstra via Gcc-patches wrote:
> Hi,
> 
>>>> As for your second patch, --with-cpu-64 could be a simple alias indeed,
>>>> but what is the exact definition/expected behaviour of a --with-cpu-32
>>>> on a target that only supports 64-bit code? The AArch64 target cannot
>>>> generate AArch32 code, so we shouldn't silently accept it.
>>>
>>> IMO allowing users to specify all the flags available on x86 is important.
>>>
>>
>> This isn't about general users though; it's about how you configure the
>> compiler and that's not all the same.  I don't mind the --with-cpu-64 as
>> a strict alias for --with-cpu, but --with-cpu-32 is both redundant and
>> misleading as it might give the impression that it does something useful.
> 
> We could make it do something useful, for example emit a warning, an error
> or default to -mabi=ilp32 (since that is similar to what other targets do).
> Anything is better than being the only target that doesn't support it...
> 
> Cheers,
> Wilco
> 

Having the same option have a completely different meaning would be even
worse than not having the option at all.  So no, that's a non-starter.

It's not like these configure options have wide-spread usage at present.

R.


Re: [PATCH] libsanitizer: fix SIGSEGV in fopen64 interceptor

2020-11-19 Thread Martin Liška

On 11/19/20 12:28 PM, Slava Barinov via Gcc-patches wrote:

Null pointer in path argument leads to SIGSEGV in interceptor.


Hello.

I can't see that we ever had the null check in master, so I don't think it
was lost during a merge from master.

Why do we need the hunk?
Thanks,
Martin



libsanitizer/ChangeLog:
 * sanitizer_common/sanitizer_common_interceptors.inc: Check
path for null before dereference in fopen64 interceptor.
---

Notes:
 Apparently check has been lost during merge from upstream

  libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc 
b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
index 729eead43c0..2ef23d9a50b 100644
--- a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
+++ b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
@@ -6081,7 +6081,7 @@ INTERCEPTOR(__sanitizer_FILE *, freopen, const char 
*path, const char *mode,
  INTERCEPTOR(__sanitizer_FILE *, fopen64, const char *path, const char *mode) {
void *ctx;
COMMON_INTERCEPTOR_ENTER(ctx, fopen64, path, mode);
-  COMMON_INTERCEPTOR_READ_RANGE(ctx, path, REAL(strlen)(path) + 1);
+  if (path) COMMON_INTERCEPTOR_READ_RANGE(ctx, path, REAL(strlen)(path) + 1);
COMMON_INTERCEPTOR_READ_RANGE(ctx, mode, REAL(strlen)(mode) + 1);
__sanitizer_FILE *res = REAL(fopen64)(path, mode);
COMMON_INTERCEPTOR_FILE_OPEN(ctx, res, path);





[PATCH] c++, v2: Add __builtin_clear_padding builtin - C++20 P0528R3 compiler side [PR88101]

2020-11-19 Thread Jakub Jelinek via Gcc-patches
Hi!

This is the whole __builtin_clear_padding patchset merged into a single
patch, plus 2 new changes.  One is that fold_builtin_1 now folds the
1 argument (meant for users) __builtin_clear_padding into an internal
2 argument form, where the second argument is NULL of the first argument's
type, such that gimplifier's stripping of useless type conversions doesn't
change behavior.  The other is handling NULLPTR_TYPE as all padding bits,
because lvalue-to-rvalue conversions with decltype(nullptr) type don't really
read anything from the memory and so we need to clear all the bits as padding.
Here is the full description:

The following patch implements __builtin_clear_padding builtin that clears
the padding bits in object representation (but preserves value
representation).  Inside of unions it clears only those padding bits that
are padding for all the union members (so that it never alters value
representation).

It handles trailing padding, padding in the middle of structs including
bitfields (PDP11 unhandled, I've never figured out how those bitfields
work), VLAs (doesn't handle variable length structures, but I think almost
nobody uses them and it isn't worth the extra complexity).  For VLAs and
sufficiently large arrays it uses runtime clearing loop instead of emitting
straight-line code (unless arrays are inside of a union).

The way I think this can be used for atomics is e.g. the following.  If the
structures are power of two sized and small enough that we use the hw atomics
for say compare_exchange, __builtin_clear_padding could be called first on
the address of the expected and desired arguments (for desired only if we want
to ensure that most of the time the atomic memory will have padding bits
cleared).  Then we perform the weak cmpxchg, and if that fails, we got the
value from the atomic memory; we can call __builtin_clear_padding on a copy
of that and then compare it with expected, and if it is the same with the
padding bits masked off, we can use the original with whatever random
padding bits in it as the new expected for the next cmpxchg.
__builtin_clear_padding itself is not atomic and therefore it shouldn't
be called on the atomic memory itself, but compare_exchange*'s expected
argument is a reference and normally the implementation may store there
the current value from memory, so padding bits can be cleared in that,
and desired is passed by value rather than reference, so clearing is fine
too.
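
As a rough illustration of that loop (a sketch under stated assumptions, not
the libstdc++ implementation): the type S and the helpers to_bits, from_bits,
clear_padding_bits and compare_exchange_ignoring_padding are all hypothetical.
clear_padding_bits is a hand-written stand-in for __builtin_clear_padding so
that the sketch builds without the builtin, and it assumes S is 8 bytes with
its only padding between the two members.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed layout: 1 byte, padding, then a 4-byte member; 8 bytes in total.
struct S { char c; std::int32_t i; };
static_assert(sizeof(S) == sizeof(std::uint64_t), "sketch assumes an 8-byte S");

std::uint64_t to_bits(const S& s)
{
  std::uint64_t bits;
  std::memcpy(&bits, &s, sizeof bits);
  return bits;
}

S from_bits(std::uint64_t bits)
{
  S s;
  std::memcpy(&s, &bits, sizeof s);
  return s;
}

// Stand-in for __builtin_clear_padding: zero the bytes between c and i.
std::uint64_t clear_padding_bits(std::uint64_t bits)
{
  unsigned char b[sizeof bits];
  std::memcpy(b, &bits, sizeof b);
  std::memset(b + 1, 0, offsetof(S, i) - 1);
  std::memcpy(&bits, b, sizeof b);
  return bits;
}

bool compare_exchange_ignoring_padding(std::atomic<std::uint64_t>& mem,
                                       S& expected, const S& desired)
{
  assert(mem.is_lock_free());  // the scheme targets hardware atomics
  const std::uint64_t want = clear_padding_bits(to_bits(expected));
  const std::uint64_t des = clear_padding_bits(to_bits(desired));
  std::uint64_t exp_bits = want;
  for (;;)
    {
      if (mem.compare_exchange_weak(exp_bits, des))
        return true;
      // exp_bits now holds the raw value from memory, padding included.
      if (clear_padding_bits(exp_bits) != want)
        {
          expected = from_bits(exp_bits);  // genuinely different value
          return false;
        }
      // Same value, different padding (or a spurious failure): retry with
      // the raw bits from memory as the new expected value.
    }
}
```

The padding is only ever cleared on local copies, never on the atomic memory
itself.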

When using libatomic, we can use it either that way, or add new libatomic
APIs that accept another argument, pointer to the padding bit bitmask,
and construct that in the template as
  alignas (_T) unsigned char _mask[sizeof (_T)];
  std::memset (_mask, ~0, sizeof (_mask));
  __builtin_clear_padding ((_T *) _mask);
which will have bits cleared for padding bits and set for bits taking part
in the value representation.  Then libatomic could internally instead
of using memcmp compare
for (i = 0; i < N; i++) if ((val1[i] & mask[i]) != (val2[i] & mask[i]))
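
A self-contained sketch of that masked comparison follows; it is an
illustration only, not proposed libatomic code.  The type Padded and the
helpers build_mask and masked_equal are hypothetical, and build_mask
constructs the mask by hand from the member offsets as a stand-in for the
memset plus __builtin_clear_padding sequence above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical type with padding between its members on common ABIs.
struct Padded { char tag; int value; };

// Stand-in for memset (~0) followed by __builtin_clear_padding: bytes that
// belong to a member get 0xFF, padding bytes stay 0.
void build_mask(unsigned char (&mask)[sizeof(Padded)])
{
  std::memset(mask, 0, sizeof mask);
  std::memset(mask + offsetof(Padded, tag), 0xFF, sizeof(char));
  std::memset(mask + offsetof(Padded, value), 0xFF, sizeof(int));
}

// Compare two object representations, ignoring bits that the mask clears.
bool masked_equal(const unsigned char* a, const unsigned char* b,
                  const unsigned char* mask, std::size_t n)
{
  assert(a && b && mask);
  for (std::size_t i = 0; i < n; i++)
    if ((a[i] & mask[i]) != (b[i] & mask[i]))
      return false;
  return true;
}
```

Unlike memcmp, this treats two representations that differ only in padding
bytes as equal.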

Tested on x86_64-linux, ok for trunk if it passes full bootstrap/regtest?

2020-11-19  Jakub Jelinek  

PR libstdc++/88101
gcc/
* builtins.def (BUILT_IN_CLEAR_PADDING): New built-in function.
* builtins.c (fold_builtin_1): Handle BUILT_IN_CLEAR_PADDING.
* gimple-fold.c (clear_padding_unit, clear_padding_buf_size): New
const variables.
(struct clear_padding_struct): New type.
(clear_padding_flush, clear_padding_add_padding,
clear_padding_emit_loop, clear_padding_type,
clear_padding_union, clear_padding_real_needs_padding_p,
clear_padding_type_may_have_padding_p,
gimple_fold_builtin_clear_padding): New functions.
(gimple_fold_builtin): Handle BUILT_IN_CLEAR_PADDING.
* doc/extend.texi (__builtin_clear_padding): Document.
gcc/c-family/
* c-common.c (check_builtin_function_arguments): Handle
BUILT_IN_CLEAR_PADDING.
gcc/testsuite/
* c-c++-common/builtin-clear-padding-1.c: New test.
* c-c++-common/torture/builtin-clear-padding-1.c: New test.
* c-c++-common/torture/builtin-clear-padding-2.c: New test.
* c-c++-common/torture/builtin-clear-padding-3.c: New test.
* c-c++-common/torture/builtin-clear-padding-4.c: New test.
* c-c++-common/torture/builtin-clear-padding-5.c: New test.
* g++.dg/torture/builtin-clear-padding-1.C: New test.
* g++.dg/torture/builtin-clear-padding-2.C: New test.
* gcc.dg/builtin-clear-padding-1.c: New test.

--- gcc/builtins.def.jj 2020-11-18 09:38:28.481816977 +0100
+++ gcc/builtins.def2020-11-19 16:15:50.573639579 +0100
@@ -839,6 +839,7 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_CLEAR_C
 /* [trans-mem]: Adjust BUILT_IN_TM_CALLOC if BUILT_IN_CALLOC is changed.  */
 DEF_LIB_BUILTIN(BUILT_IN_CALLOC, "calloc", BT_FN_PTR_SIZE_SIZE, 
ATTR_MALLOC_WARN_UNUSED_RESULT_SIZE_1_2_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLASSIFY_TYPE, "classify_type", 
BT_FN_INT_VAR, ATTR_LEAF_LIST)

Re: Update [PATCH 6/X] libsanitizer: Add hwasan pass and associated gimple changes

2020-11-19 Thread Richard Sandiford via Gcc-patches
Matthew Malcomson  writes:
> +/* Emit gimple statements into `seq` that take the size given in `len` and
> +   generate a size that is guaranteed to be rounded upwards to `align`.
> +
> +   This is a helper function for both handle_builtin_alloca and
> +   asan_expand_mark_ifn when using HWASAN.
> +
> +   Return the tree node representing this size, it is of TREE_TYPE
> +   size_type_node.  */
> +
> +static tree
> +hwasan_emit_round_up (gimple_seq *seq, location_t loc, tree old_size,
> +   uint8_t align)
> +{
> +  uint8_t tg_mask = align - 1;
> +  /* tree new_size = (old_size + tg_mask) & ~tg_mask;  */
> +  tree tree_mask = build_int_cst (size_type_node, tg_mask);
> +  tree oversize = gimple_build (seq, loc, PLUS_EXPR, size_type_node, 
> old_size,
> + tree_mask);
> +
> +  tree mask = build_int_cst (size_type_node, -align);
> +  return gimple_build (seq, loc, BIT_AND_EXPR, size_type_node, oversize, 
> mask);
> +}
> +

There's nothing really hwasan-specific about this, apart from the choice
“uint8_t” for the alignment and mask.  So I think we should:

- change “align” and “tg_mask” to “unsigned HOST_WIDE_INT”
- change the name to “gimple_build_round_up”
- take the type as a parameter, in the same position as other
  gimple_build_* type parameters
- move the function to gimple-fold.c, exported via gimple-fold.h
- drop:

   This is a helper function for both handle_builtin_alloca and
   asan_expand_mark_ifn when using HWASAN.
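
For reference, the arithmetic the helper emits is the usual power-of-two
round-up.  A standalone sketch on plain integers, with a hypothetical name
(round_up_pow2) and no GCC internals involved:

```cpp
#include <cassert>
#include <cstdint>

// Round SIZE up to the next multiple of ALIGN, which must be a power of two.
// Mirrors the commented formula: new_size = (old_size + mask) & ~mask.
std::uint64_t round_up_pow2(std::uint64_t size, std::uint64_t align)
{
  assert(align != 0 && (align & (align - 1)) == 0);
  const std::uint64_t mask = align - 1;
  // For unsigned arithmetic ~mask equals -align, the constant the patch
  // builds for the BIT_AND_EXPR.
  return (size + mask) & ~mask;
}
```

Nothing here depends on the alignment fitting in 8 bits, which is the point
of widening the parameter type.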

> […]
> @@ -690,6 +757,71 @@ handle_builtin_alloca (gcall *call, gimple_stmt_iterator 
> *iter)
>  = DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA
>? 0 : tree_to_uhwi (gimple_call_arg (call, 1));
>  
> +  if (hwasan_sanitize_allocas_p ())
> +{
> +  gimple_seq stmts = NULL;
> +  location_t loc = gimple_location (gsi_stmt (*iter));
> +  /*
> +  HWASAN needs a different expansion.
> +
> +  addr = __builtin_alloca (size, align);
> +
> +  should be replaced by
> +
> +  new_size = size rounded up to HWASAN_TAG_GRANULE_SIZE byte alignment;
> +  untagged_addr = __builtin_alloca (new_size, align);
> +  tag = __hwasan_choose_alloca_tag ();
> +  addr = ifn_HWASAN_SET_TAG (untagged_addr, tag);
> +  __hwasan_tag_memory (untagged_addr, tag, new_size);
> + */
> +  /* Ensure alignment at least HWASAN_TAG_GRANULE_SIZE bytes so we start 
> on
> +  a tag granule.  */
> +  align = align > HWASAN_TAG_GRANULE_SIZE ? align : 
> HWASAN_TAG_GRANULE_SIZE;
> +
> +  tree old_size = gimple_call_arg (call, 0);
> +  tree new_size = hwasan_emit_round_up (&stmts, loc, old_size,
> + HWASAN_TAG_GRANULE_SIZE);
> +
> +  /* Make the alloca call */
> +  tree untagged_addr
> + = gimple_build (&stmts, loc,
> + as_combined_fn (BUILT_IN_ALLOCA_WITH_ALIGN), ptr_type,
> + new_size, build_int_cst (size_type_node, align));
> +
> +  /* Choose the tag.
> +  Here we use an internal function so we can choose the tag at expand
> +  time.  We need the decision to be made after stack variables have been
> +  assigned their tag (i.e. once the hwasan_frame_tag_offset variable has
> +  been set to one after the last stack variables tag).  */
> +  gcall *stmt = gimple_build_call_internal (IFN_HWASAN_CHOOSE_TAG, 0);
> +  tree tag = make_ssa_name (unsigned_char_type_node);
> +  gimple_call_set_lhs (stmt, tag);
> +  gimple_set_location (stmt, loc);
> +  gimple_seq_add_stmt_without_update (&stmts, stmt);

Even though there are currently no folds defined for argumentless
functions, I think it would be worth adding a gimple_build overload
for this instead of writing it out by hand.  I.e. have a zero-argument
version of:

tree
gimple_build (gimple_seq *seq, location_t loc, combined_fn fn,
  tree type, tree arg0)
{
  tree res = gimple_simplify (fn, type, arg0, seq, gimple_build_valueize);
  if (!res)
{
  gcall *stmt;
  if (internal_fn_p (fn))
stmt = gimple_build_call_internal (as_internal_fn (fn), 1, arg0);
  else
{
  tree decl = builtin_decl_implicit (as_builtin_fn (fn));
  stmt = gimple_build_call (decl, 1, arg0);
}
  if (!VOID_TYPE_P (type))
{
  res = create_tmp_reg_or_ssa_name (type);
  gimple_call_set_lhs (stmt, res);
}
  gimple_set_location (stmt, loc);
  gimple_seq_add_stmt_without_update (seq, stmt);
}
  return res;
}

without the gimple_simplify call.

> +
> +  /* Add tag to pointer.  */
> +  tree addr
> + = gimple_build (&stmts, loc, as_combined_fn (IFN_HWASAN_SET_TAG),

This is CFN_HWASAN_SET_TAG.

> + ptr_type, untagged_addr, tag);
> +
> +  /* Tag shadow memory.
> +  NOTE: require using `untagged_addr` here for libhwasan API.  */
> +  gimple_build (&stmts, loc, as_combined_fn (BUILT_IN_HWASAN_TAG_MEM),
> + void_type_node, untagged_addr, tag, 

[PATCH] c++: Fix crash with broken deduction from {} [PR97895]

2020-11-19 Thread Marek Polacek via Gcc-patches
Unfortunately, the otherwise beautiful

  for (constructor_elt &elt : *CONSTRUCTOR_ELTS (init))

is not immune to an empty constructor, so we have to check
CONSTRUCTOR_ELTS first.
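
The same pitfall can be reproduced outside the compiler.  The sketch below is
an analogy only: it assumes, as the fix implies, that the element container
is reached through a pointer that is null when there are no elements.  The
names elt and resolve_all are hypothetical, and std::vector stands in for
GCC's vec.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct elt { int value; };

// Process every element and return how many were visited.  A null ELTS
// stands for "no elements", so it must be checked before the range-for
// dereferences it.
std::size_t resolve_all(std::vector<elt>* elts)
{
  std::size_t n = 0;
  if (elts)
    {
      for (elt& e : *elts)
        {
          e.value *= 2;  // placeholder for the per-element work
          ++n;
        }
      assert(n == elts->size());
    }
  return n;
}
```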

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/97895
* pt.c (do_auto_deduction): Don't crash when the constructor has
zero elements.

gcc/testsuite/ChangeLog:

PR c++/97895
* g++.dg/cpp0x/auto54.C: New test.
---
 gcc/cp/pt.c | 11 +++
 gcc/testsuite/g++.dg/cpp0x/auto54.C | 10 ++
 2 files changed, 17 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/auto54.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 1babf833d32..a1b6631d691 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -29250,10 +29250,13 @@ do_auto_deduction (tree type, tree init, tree 
auto_node,
 return error_mark_node;
 
   if (BRACE_ENCLOSED_INITIALIZER_P (init))
-/* We don't recurse here because we can't deduce from a nested
-   initializer_list.  */
-for (constructor_elt &elt : *CONSTRUCTOR_ELTS (init))
-  elt.value = resolve_nondeduced_context (elt.value, complain);
+{
+  /* We don't recurse here because we can't deduce from a nested
+initializer_list.  */
+  if (CONSTRUCTOR_ELTS (init))
+   for (constructor_elt &elt : *CONSTRUCTOR_ELTS (init))
+ elt.value = resolve_nondeduced_context (elt.value, complain);
+}
   else
 init = resolve_nondeduced_context (init, complain);
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/auto54.C 
b/gcc/testsuite/g++.dg/cpp0x/auto54.C
new file mode 100644
index 000..0c1815a99bc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/auto54.C
@@ -0,0 +1,10 @@
+// PR c++/97895
+// { dg-do compile { target c++11 } }
+
+namespace std {
+  template<typename T> struct initializer_list {
+const T *ptr;
+decltype(sizeof 0) n;
+  };
+  auto a = {}; // { dg-error "unable to deduce" }
+}

base-commit: 25bb75f841c552cfd27a4344b7487efbe35b4481
-- 
2.28.0



[PATCH] c++: Fix array new with value-initialization [PR97523]

2020-11-19 Thread Marek Polacek via Gcc-patches
Since my r11-3092 the following is rejected with -std=c++20:

  struct T { explicit T(); };
  void fn(int n) {
new T[1]();
  }

with "would use explicit constructor 'T::T()'".  It is because since
that change we go into the P1009 block in build_new (array_p is false,
but nelts is non-null and we're in C++20).  Since we only have (), we
build a {} and continue to build_new_1, which then calls build_vec_init
and then we error because the {} isn't CONSTRUCTOR_IS_DIRECT_INIT.

For (), which is value-initializing, we want to do what we were doing
before: pass empty init and let build_value_init take care of it.
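
A small runnable sketch of the intended behavior, for illustration only: the
member v and the helper first_after_value_init are hypothetical additions to
the T from the example above.

```cpp
#include <cassert>

struct T {
  explicit T() : v(7) {}
  int v;
};

// Value-initialize an array of N elements with () and report what the
// first element holds.
int first_after_value_init(int n)
{
  assert(n > 0);
  // () is value-initialization, so each element runs the explicit default
  // constructor directly; no copy-initialization is involved.
  T* p = new T[n]();
  // By contrast, new T[n]{} is aggregate-initialization of the array and
  // is rejected because T::T() is explicit.
  int r = p[0].v;
  delete[] p;
  return r;
}
```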

For various reasons I wanted to dig a little bit deeper into this,
and as a result, I'm adding a test for [expr.new]/24 (and checked that
our current behavior matches clang++).

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/97523
* init.c (build_new): When value-initializing an array new,
leave the INIT as an empty vector.

gcc/testsuite/ChangeLog:

PR c++/97523
* g++.dg/expr/anew5.C: New test.
* g++.dg/expr/anew6.C: New test.
---
 gcc/cp/init.c |  6 +-
 gcc/testsuite/g++.dg/expr/anew5.C | 26 
 gcc/testsuite/g++.dg/expr/anew6.C | 33 +++
 3 files changed, 64 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/expr/anew5.C
 create mode 100644 gcc/testsuite/g++.dg/expr/anew6.C

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index ffb84ea5b09..0b98f338feb 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3766,7 +3766,11 @@ build_new (location_t loc, vec **placement, 
tree type,
 
   /* P1009: Array size deduction in new-expressions.  */
   const bool array_p = TREE_CODE (type) == ARRAY_TYPE;
-  if (*init && (array_p || (nelts && cxx_dialect >= cxx20)))
+  if (*init
+  /* If ARRAY_P, we have to deduce the array bound.  For C++20 paren-init,
+we have to process the parenthesized-list.  But don't do it for (),
+which is value-initialization, and INIT should stay empty.  */
+  && (array_p || (cxx_dialect >= cxx20 && nelts && !(*init)->is_empty ())))
 {
   /* This means we have 'new T[]()'.  */
   if ((*init)->is_empty ())
diff --git a/gcc/testsuite/g++.dg/expr/anew5.C 
b/gcc/testsuite/g++.dg/expr/anew5.C
new file mode 100644
index 000..d597caf5483
--- /dev/null
+++ b/gcc/testsuite/g++.dg/expr/anew5.C
@@ -0,0 +1,26 @@
+// PR c++/97523
+// { dg-do compile }
+// We were turning the () into {} which made it seem like
+// aggregate-initialization (we are dealing with arrays here), which
+// performs copy-initialization, which only accepts converting constructors.
+
+struct T {
+  explicit T();
+  T(int);
+};
+
+void
+fn (int n)
+{
+  new T[1]();
+  new T[2]();
+  new T[3]();
+  new T[n]();
+#if __cpp_aggregate_paren_init
+  new T[]();
+  new T[2](1, 2);
+  // T[2] is initialized via copy-initialization, so we can't call
+  // explicit T().
+  new T[3](1, 2); // { dg-error "explicit constructor" "" { target c++20 } }
+#endif
+}
diff --git a/gcc/testsuite/g++.dg/expr/anew6.C 
b/gcc/testsuite/g++.dg/expr/anew6.C
new file mode 100644
index 000..0542daac275
--- /dev/null
+++ b/gcc/testsuite/g++.dg/expr/anew6.C
@@ -0,0 +1,33 @@
+// PR c++/97523
+// { dg-do compile { target c++11 } }
+
+// [expr.new]/24: If the new-expression creates an object or an array of
+// objects of class type, access and ambiguity control are done for the
+// [...] constructor selected for the initialization (if any).
+// NB: We only check for a default constructor if the array has a non-constant
+// bound, or there are insufficient initializers.  Since an array is an
+// aggregate, we perform aggregate-initialization, which performs
+// copy-initialization, so we only accept converting constructors.
+
+struct T {
+  explicit T();
+  T(int);
+};
+
+struct S {
+  S(int);
+};
+
+void
+fn (int n)
+{
+  new T[1]{}; // { dg-error "explicit constructor" }
+  new T[2]{1, 2};
+  new T[3]{1, 2}; // { dg-error "explicit constructor" }
+  new T[n]{}; // { dg-error "explicit constructor" }
+
+  new S[1]{}; // { dg-error "could not convert" }
+  new S[2]{1, 2};
+  new S[3]{1, 2}; // { dg-error "could not convert" }
+  new S[n]{}; // { dg-error "could not convert" }
+}

base-commit: 2729378d0905a04e476a8bdcaaf0288f417810ec
-- 
2.28.0



Re: [PATCH 0/2] Improve MSP430 hardware multiply support

2020-11-19 Thread Jeff Law via Gcc-patches



On 11/17/20 7:47 AM, Jozef Lawrynowicz wrote:
> In addition to the default config, I would suggest:
>   msp430-sim/-mcpu=msp430
> Test the 430 ISA
>   msp430-sim/-mlarge/-mcode-region=either
> Test the large memory model with data assumed to be in the lower
> memory region (default, reduces code size penalty of using -mlarge),
> whilst shuffling code between the upper and lower memory regions to
> make the program fit.
>   msp430-sim/-mlarge/-mdata-region=either/-mcode-region=either
>Test the large memory model, shuffling code and data between upper
>and lower memory regions.
>
> I should really use -mlarge/-mcode-region=either, instead of just
> -mlarge, as well. -mcode-region=either doesn't change code gen, just
> allows the linker shuffling of text sections so more tests build and so
> we get better test coverage.
>
> With limited testing capacity, testing hwmult configs is not very useful
> unless hwmult behavior is specifically changed. There are msp430
> specific tests to verify the options basically work.
ACK.  I've added those multilibs to msp430-elf configuration.

Thanks!

jeff



c++: Relax new assert [PR 97905]

2020-11-19 Thread Nathan Sidwell


It turns out there are legitimate cases for the new decl to not have
lang-specific.

PR c++/97905
gcc/cp/
* decl.c (duplicate_decls): Relax new assert.
gcc/testsuite/
* g++.dg/lookup/pr97905.C: New.

pushing to trunk

--
Nathan Sidwell
diff --git c/gcc/cp/decl.c w/gcc/cp/decl.c
index d90e9840f40..f5c6f5c0d10 100644
--- c/gcc/cp/decl.c
+++ w/gcc/cp/decl.c
@@ -2749,9 +2749,8 @@ duplicate_decls (tree newdecl, tree olddecl, bool hiding, bool was_hidden)
  with that from NEWDECL below.  */
   if (DECL_LANG_SPECIFIC (olddecl))
 {
-  gcc_checking_assert (DECL_LANG_SPECIFIC (newdecl)
-			   && (DECL_LANG_SPECIFIC (olddecl)
-			   != DECL_LANG_SPECIFIC (newdecl)));
+  gcc_checking_assert (DECL_LANG_SPECIFIC (olddecl)
+			   != DECL_LANG_SPECIFIC (newdecl));
   ggc_free (DECL_LANG_SPECIFIC (olddecl));
 }
 
diff --git c/gcc/testsuite/g++.dg/lookup/pr97905.C w/gcc/testsuite/g++.dg/lookup/pr97905.C
new file mode 100644
index 000..22a7e5cf6d4
--- /dev/null
+++ w/gcc/testsuite/g++.dg/lookup/pr97905.C
@@ -0,0 +1,7 @@
+// PR 97905
+
+
+template  void a() {
+  extern int *b; // This decl gets an (unneeded) decl-lang-specific
+}
+int *b; // this does not


[patch][rtl-optimization][i386][pr97777] Fix a reg-stack df maintenance bug triggered by zero-call-used-regs pass.

2020-11-19 Thread Qing Zhao via Gcc-patches
Hi, 

PR97777 - ICE: in df_refs_verify, at df-scan.c:3991 with -O -ffinite-math-only 
-fzero-call-used-regs=all

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97777

is a bug triggered by the new zero-call-used-regs pass; however, it is an old 
bug in the “reg-stack” pass, which does not correctly maintain the df
information after its transformation. 

Since the transformation in the reg-stack pass is quite complicated, involving
both instruction changes and control flow changes, I call
“df_insn_rescan_all” after the transformation is done.

The patch has been bootstrapped and tested with
--enable-checking=yes,rtl,df,extra, with no regressions.

Okay for commit?

Qing

From c2573c6c8552b7b4c2eedb0684ce48b5c11436ec Mon Sep 17 00:00:00 2001
From: qing zhao 
Date: Thu, 19 Nov 2020 16:46:50 +0100
Subject: [PATCH] rtl-optimization: Fix data flow maintenance bug in
 reg-stack.c [pr9]

reg-stack pass does not maintain the data flow information correctly.
call df_insn_rescan_all after the transformation is done.

gcc/
PR rtl-optimization/9
* reg-stack.c (rest_of_handle_stack_regs): call
df_insn_rescan_all if reg_to_stack return true.

gcc/testsuite/
PR rtl-optimization/9
* gcc.target/i386/pr9.c: New test.
---
 gcc/reg-stack.c | 3 ++-
 gcc/testsuite/gcc.target/i386/pr9.c | 9 +
 2 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr9.c

diff --git a/gcc/reg-stack.c b/gcc/reg-stack.c
index 8f98bd85750..3dab843f803 100644
--- a/gcc/reg-stack.c
+++ b/gcc/reg-stack.c
@@ -3426,7 +3426,8 @@ static unsigned int
 rest_of_handle_stack_regs (void)
 {
 #ifdef STACK_REGS
-  reg_to_stack ();
+  if (reg_to_stack ())
+df_insn_rescan_all ();
   regstack_completed = 1;
 #endif
   return 0;
diff --git a/gcc/testsuite/gcc.target/i386/pr9.c b/gcc/testsuite/gcc.target/i386/pr9.c
new file mode 100644
index 000..fcefc098637
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr9.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O -fzero-call-used-regs=used -ffinite-math-only" } */
+
+float
+foo (void)
+{
+  return __builtin_fmod (0, 0);
+}
+
-- 
2.11.0




[patch] Plug loophole in string store merging

2020-11-19 Thread Eric Botcazou
Hi,

there is a loophole in the new string store merging support I added recently: it 
does not check that the stores are consecutive, which is obviously required if 
you want to concatenate them...  Simple fix attached, the nice thing being 
that it can fall back to the regular processing if any hole is detected in the 
series of stores, thanks to the handling of STRING_CST by native_encode_expr.
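
To make the missing check concrete, here is a minimal standalone sketch (C++14 or later) of the idea. It is not the GCC implementation: `store`, `group`, `make_group` and `add_store` are hypothetical, simplified stand-ins, and stores are assumed to arrive sorted by bit position. A store that does not begin exactly where the group currently ends leaves a hole (or overlaps), so the group can no longer be treated as a concatenation.

```cpp
// Hypothetical, simplified model of one store: where it starts and how
// many bits it covers.
struct store
{
  unsigned long bitpos;
  unsigned long bitsize;
};

// Hypothetical, simplified model of a group of merged stores.
struct group
{
  unsigned long start;   // bit position of the first store
  unsigned long width;   // bits covered so far, counted from START
  bool consecutive;      // true while no hole or overlap has been seen
};

// Start a group from its first store.
constexpr group
make_group (store first)
{
  return group { first.bitpos, first.bitsize, true };
}

// Add one more store.  The consecutiveness test must run before the width
// is extended, otherwise it would compare against the already-updated end.
constexpr group
add_store (group g, store s)
{
  if (s.bitpos != g.start + g.width)
    g.consecutive = false;
  if (s.bitpos + s.bitsize > g.start + g.width)
    g.width = s.bitpos + s.bitsize - g.start;
  return g;
}

// An 8-bit store followed immediately by a 32-bit store: no hole.
static_assert (add_store (make_group (store {0, 8}), store {8, 32}).consecutive,
	       "adjacent stores are consecutive");
// The same pair with an 8-bit gap in between: a hole is detected.
static_assert (!add_store (make_group (store {0, 8}), store {16, 32}).consecutive,
	       "a gap clears the flag");
```

The ordering in `add_store` mirrors the reason the patch moves the `do_merge` call to the entry of `merge_into` and `merge_overlapping`: the comparison needs the width as it was before the new store was accounted for.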

Tested on x86-64/Linux, OK for the mainline?


2020-11-19  Eric Botcazou  

* gimple-ssa-store-merging.c (struct merged_store_group): Add
new 'consecutive' field.
(merged_store_group): Set it to true.
(do_merge): Set it to false if the store is not consecutive and
set string_concatenation to false in this case.
(merge_into): Call do_merge on entry.
(merge_overlapping): Likewise.


2020-11-19  Eric Botcazou  

* gnat.dg/opt90a.adb: New test.
* gnat.dg/opt90b.adb: Likewise.
* gnat.dg/opt90c.adb: Likewise.
* gnat.dg/opt90d.adb: Likewise.
* gnat.dg/opt90e.adb: Likewise.
* gnat.dg/opt90a_pkg.ads: New helper.
* gnat.dg/opt90b_pkg.ads: Likewise.
* gnat.dg/opt90c_pkg.ads: Likewise.
* gnat.dg/opt90d_pkg.ads: Likewise.
* gnat.dg/opt90e_pkg.ads: Likewise.

-- 
Eric Botcazou
diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index 6089faf7ac8..17a4250d77f 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -1450,6 +1450,7 @@ public:
   bool bit_insertion;
   bool string_concatenation;
   bool only_constants;
+  bool consecutive;
   unsigned int first_nonmergeable_order;
   int lp_nr;
 
@@ -1822,6 +1823,7 @@ merged_store_group::merged_store_group (store_immediate_info *info)
   bit_insertion = info->rhs_code == BIT_INSERT_EXPR;
   string_concatenation = info->rhs_code == STRING_CST;
   only_constants = info->rhs_code == INTEGER_CST;
+  consecutive = true;
   first_nonmergeable_order = ~0U;
   lp_nr = info->lp_nr;
   unsigned HOST_WIDE_INT align_bitpos = 0;
@@ -1957,6 +1959,9 @@ merged_store_group::do_merge (store_immediate_info *info)
   first_stmt = stmt;
 }
 
+  if (info->bitpos != start + width)
+consecutive = false;
+
   /* We need to use extraction if there is any bit-field.  */
   if (info->rhs_code == BIT_INSERT_EXPR)
 {
@@ -1964,13 +1969,17 @@ merged_store_group::do_merge (store_immediate_info *info)
   gcc_assert (!string_concatenation);
 }
 
-  /* We need to use concatenation if there is any string.  */
+  /* We want to use concatenation if there is any string.  */
   if (info->rhs_code == STRING_CST)
 {
   string_concatenation = true;
   gcc_assert (!bit_insertion);
 }
 
+  /* But we cannot use it if we don't have consecutive stores.  */
+  if (!consecutive)
+string_concatenation = false;
+
   if (info->rhs_code != INTEGER_CST)
 only_constants = false;
 }
@@ -1982,12 +1991,13 @@ merged_store_group::do_merge (store_immediate_info *info)
 void
 merged_store_group::merge_into (store_immediate_info *info)
 {
+  do_merge (info);
+
   /* Make sure we're inserting in the position we think we're inserting.  */
   gcc_assert (info->bitpos >= start + width
 	  && info->bitregion_start <= bitregion_end);
 
   width = info->bitpos + info->bitsize - start;
-  do_merge (info);
 }
 
 /* Merge a store described by INFO into this merged store.
@@ -1997,11 +2007,11 @@ merged_store_group::merge_into (store_immediate_info *info)
 void
 merged_store_group::merge_overlapping (store_immediate_info *info)
 {
+  do_merge (info);
+
   /* If the store extends the size of the group, extend the width.  */
   if (info->bitpos + info->bitsize > start + width)
 width = info->bitpos + info->bitsize - start;
-
-  do_merge (info);
 }
 
 /* Go through all the recorded stores in this group in program order and
package Opt90a_Pkg is

  type Rec is record
A : Short_Short_Integer;
B : Integer;
C : String (1 .. 12);
  end record;
  pragma Pack (Rec);
  for Rec'Alignment use 1;

  type Data is tagged record
R : Rec;
  end record;

end Opt90a_Pkg;
-- { dg-do run }
-- { dg-options "-O2" }

with Ada.Calendar; use Ada.Calendar;
with Opt90a_Pkg; use Opt90a_Pkg;

procedure Opt90a is
  B : constant Integer := Year (Clock);
  V : Data;

begin
  V := (R => (A => 0, B => B, C => ""));
  if V.R.B /= B then
raise Program_Error;
  end if;
end;
package Opt90b_Pkg is

  type Rec is record
A : Short_Short_Integer;
B : Integer;
C : Short_Integer;
D : String (1 .. 12);
  end record;
  pragma Pack (Rec);
  for Rec'Alignment use 1;

  type Data is tagged record
R : Rec;
  end record;

end Opt90b_Pkg;
-- { dg-do run }
-- { dg-options "-O2" }

with Ada.Calendar; use Ada.Calendar;
with Opt90c_Pkg; use Opt90c_Pkg;

procedure Opt90c is
  B : constant Integer := Year (Clock);
  V : Data;

begin
  V := (R => (A => 0, B => B, C => 0, D => ""));
  if V.R.B /= B then
raise 

Fix PR ada/97805

2020-11-19 Thread Eric Botcazou
We need to include limits.h (or <climits>) in adaint.c because of LLONG_MIN.

Tested on x86-64/Linux, applied on the mainline.


2020-11-19  Eric Botcazou  

PR ada/97805
* adaint.c: Include climits in C++ and limits.h otherwise.

-- 
Eric Botcazou
diff --git a/gcc/ada/adaint.c b/gcc/ada/adaint.c
index 560f3529442..f5432626ee6 100644
--- a/gcc/ada/adaint.c
+++ b/gcc/ada/adaint.c
@@ -145,6 +145,13 @@
 #include "version.h"
 #endif
 
+/* limits.h is needed for LLONG_MIN.  */
+#ifdef __cplusplus
+#include <climits>
+#else
+#include <limits.h>
+#endif
+
 #ifdef __cplusplus
 extern "C" {
 #endif


Re: Update: [PATCH 5/X] libsanitizer: mid-end: Introduce stack variable handling for HWASAN

2020-11-19 Thread Richard Sandiford via Gcc-patches
Matthew Malcomson  writes:
> […]
> +/* hwasan_frame_base_init_seq is the sequence of RTL insns that will initialize
> +   the hwasan_frame_base_ptr.  When the hwasan_frame_base_ptr is requested, we
> +   generate this sequence but do not emit it.  If the sequence was created it
> +   is emitted once the function body has been expanded.
> +
> +   This delay is because the frame base pointer may be needed anywhere in the
> +   function body, or needed by the expand_used_vars function.  Emitting once in
> +   a known place is simpler than requiring the emition of the instructions to

s/emition/emission/

> +   be know where it should go depending on the first place the hwasan frame
> +   base is needed.  */
> +static GTY(()) rtx_insn *hwasan_frame_base_init_seq = NULL;
> […]
> +/* For stack tagging:
> +
> +   Return the 'base pointer' for this function.  If that base pointer has not
> +   yet been created then we create a register to hold it and record the insns
> +   to initialize the register in `hwasan_frame_base_init_seq` for later
> +   emission.  */
> +rtx
> +hwasan_frame_base ()
> +{
> +  if (! hwasan_frame_base_ptr)
> +{
> +  start_sequence ();
> +  hwasan_frame_base_ptr =
> + force_reg (Pmode,
> +targetm.memtag.insert_random_tag (virtual_stack_vars_rtx,
> +  NULL_RTX));

Nit: should be formatted as:

  hwasan_frame_base_ptr
= force_reg (Pmode,
 targetm.memtag.insert_random_tag (virtual_stack_vars_rtx,
   NULL_RTX));

> […]
> +  size_t length = hwasan_tagged_stack_vars.length ();
> +  hwasan_stack_var *vars = hwasan_tagged_stack_vars.address ();
> +
> +  poly_int64 bot = 0, top = 0;
> +  size_t i = 0;
> +  for (i = 0; i < length; i++)
> +{
> +  hwasan_stack_var& cur = vars[i];

Simpler as:

  poly_int64 bot = 0, top = 0;
  for (hwasan_stack_var &cur : hwasan_tagged_stack_vars)

(GCC style is to add a space before “&”, as for “*”)

> +  poly_int64 nearest = cur.nearest_offset;
> +  poly_int64 farthest = cur.farthest_offset;
> +
> +  if (known_ge (nearest, farthest))
> + {
> +   top = nearest;
> +   bot = farthest;
> + }
> +  else
> + {
> +   /* Given how these values are calculated, one must be known greater
> +  than the other.  */
> +   gcc_assert (known_le (nearest, farthest));
> +   top = farthest;
> +   bot = nearest;
> + }
> +  poly_int64 size = (top - bot);
> +
> +  /* Assert the edge of each variable is aligned to the HWASAN tag granule
> +  size.  */
> +  gcc_assert (multiple_p (top, HWASAN_TAG_GRANULE_SIZE));
> +  gcc_assert (multiple_p (bot, HWASAN_TAG_GRANULE_SIZE));
> +  gcc_assert (multiple_p (size, HWASAN_TAG_GRANULE_SIZE));
> +
> +  rtx ret = init_one_libfunc ("__hwasan_tag_memory");
> +  rtx base_tag = targetm.memtag.extract_tag (cur.tagged_base, NULL_RTX);
> +  rtx tag = plus_constant (QImode, base_tag, cur.tag_offset);
> +  tag = hwasan_truncate_to_tag_size (tag, NULL_RTX);
> +
> +  rtx bottom = convert_memory_address (ptr_mode,
> +plus_constant (Pmode,
> +   cur.untagged_base,
> +   bot));
> +  emit_library_call (ret, LCT_NORMAL, VOIDmode,
> +  bottom, ptr_mode,
> +  tag, QImode,
> +  gen_int_mode (size, ptr_mode), ptr_mode);
> +}
> +  /* Clear the stack vars, we've emitted the prologue for them all now.  */
> +  hwasan_tagged_stack_vars.truncate (0);
> +}
> +
> +/* For stack tagging:
> +
> +   Return RTL insns to clear the tags between DYNAMIC and VARS pointers
> +   into the stack.  These instructions should be emitted at the end of
> +   every function.
> +
> +   If `dynamic` is NULL_RTX then no insns are returned.  */
> +rtx_insn *
> +hwasan_emit_untag_frame (rtx dynamic, rtx vars)
> +{
> +  if (! dynamic)
> +return NULL;
> +
> +  start_sequence ();
> +
> +  dynamic = convert_memory_address (ptr_mode, dynamic);
> +  vars = convert_memory_address (ptr_mode, vars);
> +
> +  rtx top_rtx;
> +  rtx bot_rtx;
> +  if (FRAME_GROWS_DOWNWARD)
> +{
> +  top_rtx = vars;
> +  bot_rtx = dynamic;
> +}
> +  else
> +{
> +  top_rtx = dynamic;
> +  bot_rtx = vars;
> +}
> +
> +  rtx size_rtx = expand_simple_binop (ptr_mode, MINUS, top_rtx, bot_rtx,
> +   NULL_RTX, /* unsignedp = */0,
> +   OPTAB_DIRECT);
> +
> +  rtx ret = init_one_libfunc ("__hwasan_tag_memory");
> +  emit_library_call (ret, LCT_NORMAL, VOIDmode,
> +  bot_rtx, ptr_mode,
> +  HWASAN_STACK_BACKGROUND, QImode,
> +  size_rtx, ptr_mode);

Nit: “ret” seems like a strange name for this variable, since 

preprocessor: main file searching

2020-11-19 Thread Nathan Sidwell
this patch is slightly modified from the original 07 patch, due to the 
cleanup I posted earlier today.


This adds the capability to locate the main file on the user or system
include paths.  That's extremely useful to users building header
units.  Searching has to be requested (plain header-unit compilation
will not search).  Also, to make include_next work as expected when
building a header unit, we add a mechanism to retrofit a non-searched
source file as one on the include path.
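
The retrofit step relies on prefix-matching the main file's name against the include directories. The following standalone sketch (C++14 or later) shows that idea only. It is not libcpp code: `include_dir`, `make_dir`, `find_containing_dir` and `sample_dirs` are hypothetical, and it assumes '/' is the only directory separator and that comparison is case-sensitive, whereas the patch uses the host-aware `IS_DIR_SEPARATOR` and `filename_ncmp`.

```cpp
#include <cstddef>

// Hypothetical stand-in for one entry on an include path.
struct include_dir
{
  const char *name;   // directory name, without a trailing separator
  std::size_t len;    // length of NAME
  bool sysp;          // true for a system directory
};

// Length of a NUL-terminated string, usable at compile time.
constexpr std::size_t
length (const char *s)
{
  std::size_t n = 0;
  while (s[n] != '\0')
    n++;
  return n;
}

// Build an entry, computing the length from the name.
constexpr include_dir
make_dir (const char *name, bool sysp)
{
  return include_dir { name, length (name), sysp };
}

// True if the first N characters of A and B are equal.  Callers guarantee
// that both strings are at least N characters long.
constexpr bool
same_prefix (const char *a, const char *b, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return false;
  return true;
}

// Return the index of the first directory in DIRS that NAME lives under,
// or -1 if there is none.  The directory must be a strict prefix of NAME
// and must be followed by a separator, so "/usr/include" does not match
// "/usr/includes/x.h".
constexpr int
find_containing_dir (const char *name, const include_dir *dirs, int count)
{
  const std::size_t name_len = length (name);
  for (int i = 0; i < count; i++)
    if (dirs[i].len < name_len
	&& name[dirs[i].len] == '/'
	&& same_prefix (name, dirs[i].name, dirs[i].len))
      return i;
  return -1;
}

// Hypothetical sample include path: one user directory, one system one.
constexpr include_dir sample_dirs[] = {
  make_dir ("/home/me/include", false),
  make_dir ("/usr/include", true),
};

static_assert (find_containing_dir ("/usr/include/stdio.h", sample_dirs, 2) == 1,
	       "found under the system directory");
static_assert (find_containing_dir ("/usr/includes/x.h", sample_dirs, 2) == -1,
	       "a longer sibling directory does not match");
```

Once a directory is found, a caller could consult its `sysp` flag to decide whether the file should be treated as a system header, which is what the patch does for the main file.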

libcpp/
* include/cpplib.h (enum cpp_main_search): New.
(struct cpp_options): Add main_search field.
(cpp_main_loc): Declare.
(cpp_retrofit_as_include): Declare.
* internal.h (struct cpp_reader): Add main_loc field.
(_cpp_in_main_source_file): Not main if main is a header.
* init.c (cpp_read_main_file): Use main_search option to locate
main file.  Set main_loc.
* files.c (cpp_retrofit_as_include): New.

pushing to trunk.

nathan
--
Nathan Sidwell
diff --git i/libcpp/files.c w/libcpp/files.c
index ba52d2bf3cf..301b2379a23 100644
--- i/libcpp/files.c
+++ w/libcpp/files.c
@@ -1131,6 +1131,37 @@ cpp_find_header_unit (cpp_reader *pfile, const char *name, bool angle,
   return file->path;
 }
 
+/* Retrofit the just-entered main file asif it was an include.  This
+   will permit correct include_next use, and mark it as a system
+   header if that's where it resides.  We use filesystem-appropriate
+   prefix matching of the include path to locate the main file.  */
+void
+cpp_retrofit_as_include (cpp_reader *pfile)
+{
+  /* We should be the outermost.  */
+  gcc_assert (!pfile->buffer->prev);
+
+  if (const char *name = pfile->main_file->name)
+{
+  /* Locate name on the include dir path, using a prefix match.  */
+  size_t name_len = strlen (name);
+  for (cpp_dir *dir = pfile->quote_include; dir; dir = dir->next)
+	if (dir->len < name_len
+	&& IS_DIR_SEPARATOR (name[dir->len])
+	&& !filename_ncmp (name, dir->name, dir->len))
+	  {
+	pfile->main_file->dir = dir;
+	if (dir->sysp)
+	  cpp_make_system_header (pfile, 1, 0);
+	break;
+	  }
+}
+
+  /* Initialize controlling macro state.  */
+  pfile->mi_valid = true;
+  pfile->mi_cmacro = 0;
+}
+
 /* Could not open FILE.  The complication is dependency output.  */
 static void
 open_file_failed (cpp_reader *pfile, _cpp_file *file, int angle_brackets,
diff --git i/libcpp/include/cpplib.h w/libcpp/include/cpplib.h
index 630f2e055d1..91226cfc248 100644
--- i/libcpp/include/cpplib.h
+++ w/libcpp/include/cpplib.h
@@ -308,6 +308,15 @@ enum cpp_normalize_level {
   normalized_none
 };
 
+enum cpp_main_search 
+{
+  CMS_none,/* A regular source file.  */
+  CMS_header,  /* Is a directly-specified header file (eg PCH or
+		  header-unit).  */
+  CMS_user,/* Search the user INCLUDE path.  */
+  CMS_system,  /* Search the system INCLUDE path.  */
+};
+
 /* This structure is nested inside struct cpp_reader, and
carries all the options visible to the command line.  */
 struct cpp_options
@@ -566,6 +575,8 @@ struct cpp_options
 
   /* The maximum depth of the nested #include.  */
   unsigned int max_include_depth;
+
+  cpp_main_search main_search : 8;
 };
 
 /* Diagnostic levels.  To get a diagnostic without associating a
@@ -997,6 +1008,10 @@ extern const char *cpp_find_header_unit (cpp_reader *, const char *file,
too.  If there was an error opening the file, it returns NULL.  */
 extern const char *cpp_read_main_file (cpp_reader *, const char *,
    bool injecting = false);
+extern location_t cpp_main_loc (const cpp_reader *);
+
+/* Adjust for the main file to be an include.  */
+extern void cpp_retrofit_as_include (cpp_reader *);
 
 /* Set up built-ins with special behavior.  Use cpp_init_builtins()
instead unless your know what you are doing.  */
diff --git i/libcpp/init.c w/libcpp/init.c
index fc826583d3a..f77dc26a003 100644
--- i/libcpp/init.c
+++ w/libcpp/init.c
@@ -675,8 +675,14 @@ cpp_read_main_file (cpp_reader *pfile, const char *fname, bool injecting)
 deps_add_default_target (deps, fname);
 
   pfile->main_file
-= _cpp_find_file (pfile, fname, &pfile->no_search_path, /*angle=*/0,
-		  _cpp_FFK_NORMAL, 0);
+= _cpp_find_file (pfile, fname,
+		  CPP_OPTION (pfile, preprocessed) ? &pfile->no_search_path
+		  : CPP_OPTION (pfile, main_search) == CMS_user
+		  ? pfile->quote_include
+		  : CPP_OPTION (pfile, main_search) == CMS_system
+		  ? pfile->bracket_include : &pfile->no_search_path,
+		  /*angle=*/0, _cpp_FFK_NORMAL, 0);
+
   if (_cpp_find_failed (pfile->main_file))
 return NULL;
 
@@ -698,7 +704,16 @@ cpp_read_main_file (cpp_reader *pfile, const char *fname, bool injecting)
 			 LINEMAP_LINE (last), LINEMAP_SYSP (last));
   }
 
-  return ORDINARY_MAP_FILE_NAME (LINEMAPS_LAST_ORDINARY_MAP (pfile->line_table));
+  auto *map = LINEMAPS_LAST_ORDINARY_MAP (pfile->line_table);
+  pfile->main_loc = 
