RE: [PATCH] AArch64: Add if condition in aarch64_function_value [PR96479]

2020-08-17 Thread qiaopeixin
Hi Richard,

Thanks for the review and explanation.

The previous fix, which added an if condition on TARGET_FLOAT, does break the 
glibc-2.29 build.

I checked the history of the function aarch64_init_cumulative_args and did not 
find the reason why Alan Lawrence added TREE_PUBLIC (fndecl) as one of the 
conditions for entering the function type check. Maybe Alan could clarify? I 
tried deleting TREE_PUBLIC (fndecl), which turns out to solve both the glibc 
problem and the previous ICE. The new fix is below; it passed bootstrap and 
the DejaGnu tests. I believe this fix is reasonable, since the function type 
should be checked whether or not the function has external linkage.

The function aarch64_init_cumulative_args checks the function types and should 
catch the error that "-mgeneral-regs-only" is incompatible with the use of 
SIMD/FP registers. In the test case on PR96479, the function myfunc2 returns a 
vector of 4 integers but is declared static, so TREE_PUBLIC (fndecl) is false, 
which prevents entering the if statement and checking the function types. I 
deleted "TREE_PUBLIC (fndecl)" so that gcc can now catch the error in 
aarch64_init_cumulative_args. With this fix, the test case on PR96479 reports 
a diagnostic error instead of an ICE. The patch is as follows:

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b7f5bc76f1b..9ce83dce131 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6017,7 +6017,7 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum,
 
   if (!silent_p
   && !TARGET_FLOAT
-  && fndecl && TREE_PUBLIC (fndecl)
+  && fndecl
   && fntype && fntype != error_mark_node)
 {
   const_tree type = TREE_TYPE (fntype);
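
For readers without the PR at hand, the situation described above can be 
sketched roughly as follows. This is a hypothetical reduction, not the actual 
test case from PR96479: the type name v4si and the caller sum_lanes are 
assumptions made for illustration, and only the name myfunc2 comes from the 
discussion. The static function returns a vector of 4 integers, so it has no 
external linkage, which is the case the TREE_PUBLIC (fndecl) condition skips.

```c
/* GNU C vector of 4 ints (16 bytes); the name v4si is illustrative.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Declared static, so the function has no external linkage.  */
static v4si
myfunc2 (int x)
{
  v4si r = { x, x + 1, x + 2, x + 3 };
  return r;
}

/* Hypothetical caller, present only so that myfunc2 is used.  */
int
sum_lanes (int x)
{
  v4si v = myfunc2 (x);
  return v[0] + v[1] + v[2] + v[3];
}
```

On targets where vector registers are available this is ordinary GNU C. The 
thread is about what should happen when code like this is built for AArch64 
with -mgeneral-regs-only.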

Christophe, thanks for your tests on glibc-2.29. With the above fix, I built 
glibc-2.29, and the previous error does not show up now. Could you please check 
if this fix works?

Do you have any suggestions on this fix?

All the best,
Peixin


-Original Message-
From: Richard Sandiford [mailto:richard.sandif...@arm.com] 
Sent: Thursday, August 13, 2020 8:19 PM
To: Christophe Lyon 
Cc: qiaopeixin ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Add if condition in aarch64_function_value 
[PR96479]

Christophe Lyon  writes:
> On Thu, 13 Aug 2020 at 03:54, qiaopeixin  wrote:
>>
>> Thanks for the review and commit.
>>
>> All the best,
>> Peixin
>>
>> -Original Message-
>> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> Sent: August 13, 2020 0:25
>> To: qiaopeixin 
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH] AArch64: Add if condition in 
>> aarch64_function_value [PR96479]
>>
>> qiaopeixin  writes:
>> > Hi,
>> >
>> > The test case vector-subscript-2.c in the gcc testsuite will report an ICE 
>> > in the expand pass since '-mgeneral-regs-only' is incompatible with the 
>> > use of V4SI mode. I propose to report the diagnostic information instead 
>> > of ICE, and the problem has been discussed on PR 96479.
>> >
>> > I attached the patch to solve the problem. Bootstrapped and tested on 
>> > aarch64-linux-gnu. Any suggestions?
>>
>> Thanks, pushed.  I was initially sceptical because raising an error here and 
>> in aarch64_layout_arg is a hack.  Both functions are just query functions 
>> and shouldn't have any side effects.
>>
>> The approach we took for FP modes seemed better: we define the FP move 
>> patterns unconditionally, and raise an error if we try to emit an FP move 
>> with !TARGET_FLOAT.  This defers any error reporting until we actually try 
>> to generate code that depends on TARGET_FLOAT.
>>
>> But I guess SIMD stuff is different.  There's no reason in principle why you 
>> can't use:
>>
>>   unsigned short __attribute__((vector_size(8)))
>>
>> *within* a function with -mgeneral-regs-only.  It would just need to be 
>> emulated, in the same way as for:
>>
>>   unsigned short __attribute__((vector_size(4)))
>>
>> So it would be wrong to define the SIMD move patterns unconditionally and 
>> raise an error there.
>>
>> So all in all, I agree this is the best we can do given the current 
>> infrastructure.
>>
>
> Since this patch was committed my buildbot is broken for 
> aarch64-linux-gnu because it now fails to build glibc-2.29:
> ../stdlib/bits/stdlib-float.h: In function 'atof':
> ../stdlib/bits/stdlib-float.h:26:1: error: '-mgeneral-regs-only' is 
> incompatible with the use of floating-point types

Thanks for the heads-up.  I've reverted the patch for now.

Looking more closely, it seems like aarch64_init_cumulative_args already tries 
to catch the problem that the patch was fixing:

  if (!silent_p
  && !TARGET_FLOAT
  && fndecl && TREE_PUBLIC (fndecl)
  && fntype && fntype != error_mark_node)
{
  const_tree type = TREE_TYPE (fntype);
  machine_mode mode ATTRIBUTE_UNUSED; /* To pass pointer as argument.  */
  int nregs ATTRIBUTE_UNUSED; /* Likewise.  */
  if 

Re: [PATCH v2] C-SKY: Support -mfloat-abi=hard.

2020-08-17 Thread Cooper Qu via Gcc-patches

Hi Jojo,

This rule is not stated directly anywhere. But the indent options shown in 
https://www.gnu.org/prep/standards/html_node/Formatting.html#Formatting 
correspond to the recommended C formatting style, which uses the 
default tab width of 8 columns.
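
The convention discussed in this thread (a tab in place of every 8 leading 
spaces) can be sketched as a small helper. This is only an illustration: the 
function name tabify_leading_spaces is made up here, it is not part of GCC or 
of any formatting tool, and it looks only at spaces at the very start of a 
line.

```c
#include <stddef.h>

/* Copy LINE into OUT, replacing each full group of 8 leading spaces
   with one tab.  OUT must have room for strlen (LINE) + 1 bytes.
   Returns the number of characters written, not counting the NUL.  */
size_t
tabify_leading_spaces (const char *line, char *out)
{
  size_t spaces = 0;
  while (line[spaces] == ' ')
    spaces++;

  size_t n = 0;
  for (size_t i = 0; i < spaces / 8; i++)
    out[n++] = '\t';
  for (size_t i = 0; i < spaces % 8; i++)
    out[n++] = ' ';

  /* The rest of the line is copied unchanged.  */
  for (const char *p = line + spaces; *p != '\0'; p++)
    out[n++] = *p;
  out[n] = '\0';
  return n;
}
```

A line indented by 10 spaces therefore comes out as one tab followed by 2 
spaces, and a line indented by fewer than 8 spaces is left as it is.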



On 8/18/20 9:42 AM, Jojo R wrote:

Hi,

Is there a coding rule for this?

I cannot find it in 
https://www.gnu.org/prep/standards/html_node/index.html
or https://gcc.gnu.org/codingconventions.html

Could you give me any hints?

Thanks.

Jojo
On Aug 17, 2020, at 11:05 PM +0800, Xianmiao Qu wrote:

Hi Jojo,


On 8/17/20 7:09 PM, Jojo R wrote:

diff --git a/gcc/config/csky/csky.c b/gcc/config/csky/csky.c
index 7ba3ed3..b71291a 100644
--- a/gcc/config/csky/csky.c
+++ b/gcc/config/csky/csky.c
@@ -328,6 +328,16 @@ csky_cpu_cpp_builtins (cpp_reader *pfile)
{
builtin_define ("__csky_hard_float__");
builtin_define ("__CSKY_HARD_FLOAT__");
+ if (TARGET_HARD_FLOAT_ABI)
+ {
+ builtin_define ("__csky_hard_float_abi__");
+ builtin_define ("__CSKY_HARD_FLOAT_ABI__");
+ }
+ if (TARGET_SINGLE_FPU)
+ {
+ builtin_define ("__csky_hard_float_fpu_sf__");
+ builtin_define ("__CSKY_HARD_FLOAT_FPU_SF__");
+ }
}

There is one more thing you should pay attention to: if the number of spaces
at the beginning of a line reaches 8, you should use a tab instead of 8 spaces.


Thanks,

Xianmiao


Re: [PATCH]Don't use pinsr for struct initialization.

2020-08-17 Thread Hongtao Liu via Gcc-patches
On Fri, Aug 14, 2020 at 5:57 PM Uros Bizjak  wrote:
>
> On Fri, Aug 14, 2020 at 8:03 AM Hongtao Liu  wrote:
> >
> > Hi:
> >   For struct initialization, when it fits in a TImode, gcc will use
> > pinsr insn which causes poor codegen described in PR93897 and PR96562.
>
> You should probably remove TImode handling also from ix86_expand_pextr.
>

Yes, but I failed to construct a testcase to cover this part.
Anyway, the regression test for the i386/x86-64 backend is ok, and bootstrap is ok.
I also ran the patch on SPEC2017, with no big impact.

> Uros.
>
> >   Bootstrap is ok, regression test is ok for i386/x86-64 backend.
> >   Ok for trunk?
> >
> > ChangeLog
> > gcc/
> > PR target/96562
> > PR target/93897
> > * config/i386/i386-expand.c (ix86_expand_pinsr): Don't use
> > pinsr for TImode.
> >
> > gcc/testsuite/
> > * gcc.target/i386/pr96562-1.c: New test.
> >
> > --
> > BR,
> > Hongtao

Update patch.

-- 
BR,
Hongtao
From 12e879c481ca7ff9c3477beb3dfd3b615dbe8f60 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Wed, 12 Aug 2020 10:48:17 +0800
Subject: [PATCH] Don't use pinsr/pextr for struct initialization/extraction.

gcc/
	PR target/96562
	PR target/93897
	* config/i386/i386-expand.c (ix86_expand_pinsr): Don't use
	pinsr for TImode.
	(ix86_expand_pextr): Don't use pextr for TImode.

gcc/testsuite/
	* gcc.target/i386/pr96562-1.c: New test.
---
 gcc/config/i386/i386-expand.c |  2 -
 gcc/testsuite/gcc.target/i386/pr96562-1.c | 81 +++
 2 files changed, 81 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr96562-1.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index e194214804b..9b585c8cc8c 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -20237,7 +20237,6 @@ ix86_expand_pextr (rtx *operands)
 case E_V4SImode:
 case E_V2DImode:
 case E_V1TImode:
-case E_TImode:
   {
 	machine_mode srcmode, dstmode;
 	rtx d, pat;
@@ -20333,7 +20332,6 @@ ix86_expand_pinsr (rtx *operands)
 case E_V4SImode:
 case E_V2DImode:
 case E_V1TImode:
-case E_TImode:
   {
 	machine_mode srcmode, dstmode;
 	rtx (*pinsr)(rtx, rtx, rtx, rtx);
diff --git a/gcc/testsuite/gcc.target/i386/pr96562-1.c b/gcc/testsuite/gcc.target/i386/pr96562-1.c
new file mode 100644
index 000..6ebeeb1fb17
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr96562-1.c
@@ -0,0 +1,81 @@
+/* { dg-do compile} */
+/* { dg-options "-msse4.1 -O2" } */
+/* { dg-final { scan-assembler-not "pinsr" } } */
+
+typedef struct
+{
+  long long a;
+  int b;
+} st1;
+
+typedef struct
+{
+  long long a;
+  int b;
+  short c;
+} st2;
+
+typedef struct
+{
+  long long a;
+  int b;
+  short c;
+  char d;
+} st3;
+
+typedef struct
+{
+  int b;
+  long long a;
+} st4;
+
+typedef struct
+{
+  short c;
+  int b;
+  long long a;
+} st5;
+
+typedef struct
+{
+  char d;
+  short c;
+  int b;
+  long long a;
+} st6;
+
+st1
+foo1 (long long a, int b)
+{
+  return (st1){a, b};
+}
+
+st2
+foo2 (long long a, int b, short c)
+{
+  return (st2){a, b, c};
+}
+
+st3
+foo3 (long long a, int b, short c, char d)
+{
+  return (st3){a, b, c, d};
+}
+
+st4
+foo4 (long long a, int b)
+{
+  return (st4){b, a};
+}
+
+st5
+foo5 (long long a, int b, short c)
+{
+  return (st5){c, b, a};
+}
+
+st6
+foo6 (long long a, int b, short c, char d)
+{
+  return (st6){d, c, b, a};
+}
-- 
2.18.1



Re: [PATCH v2] C-SKY: Support -mfloat-abi=hard.

2020-08-17 Thread Jojo R
Hi,

Is there a coding rule for this?

I cannot find it in 
https://www.gnu.org/prep/standards/html_node/index.html
or https://gcc.gnu.org/codingconventions.html

Could you give me any hints?

Thanks.

Jojo
On Aug 17, 2020, at 11:05 PM +0800, Xianmiao Qu wrote:
> Hi Jojo,
>
>
> On 8/17/20 7:09 PM, Jojo R wrote:
> > diff --git a/gcc/config/csky/csky.c b/gcc/config/csky/csky.c
> > index 7ba3ed3..b71291a 100644
> > --- a/gcc/config/csky/csky.c
> > +++ b/gcc/config/csky/csky.c
> > @@ -328,6 +328,16 @@ csky_cpu_cpp_builtins (cpp_reader *pfile)
> > {
> > builtin_define ("__csky_hard_float__");
> > builtin_define ("__CSKY_HARD_FLOAT__");
> > + if (TARGET_HARD_FLOAT_ABI)
> > + {
> > + builtin_define ("__csky_hard_float_abi__");
> > + builtin_define ("__CSKY_HARD_FLOAT_ABI__");
> > + }
> > + if (TARGET_SINGLE_FPU)
> > + {
> > + builtin_define ("__csky_hard_float_fpu_sf__");
> > + builtin_define ("__CSKY_HARD_FLOAT_FPU_SF__");
> > + }
> > }
>
> There is one more thing you should pay attention to: if the number of spaces
> at the beginning of a line reaches 8, you should use a tab instead of 8 spaces.
>
>
> Thanks,
>
> Xianmiao


[committed] analyzer: fix name of local in region_model::get_rvalue_1

2020-08-17 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as da7c2773e56c889f4f131b80d4b91f1adbae80a2.

gcc/analyzer/ChangeLog:
* region-model.cc (region_model::get_rvalue_1): Fix name of local.
---
 gcc/analyzer/region-model.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 305e9648c79..c3d9ca7f650 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1118,8 +1118,8 @@ region_model::get_rvalue_1 (path_var pv, 
region_model_context *ctxt)
 case RESULT_DECL:
 case ARRAY_REF:
   {
-   const region *element_reg = get_lvalue (pv, ctxt);
-   return get_store_value (element_reg);
+   const region *reg = get_lvalue (pv, ctxt);
+   return get_store_value (reg);
   }
 
 case REALPART_EXPR:
-- 
2.26.2



[committed] analyzer: fix ICE on unhandled tree codes in get_rvalue_1 [PR96641]

2020-08-17 Thread David Malcolm via Gcc-patches
The old implementation of region_model::get_rvalue_1 gracefully handled
tree codes it didn't understand, returning "UNKNOWN", whereas the new
implementation (r11-2694-g808f4dfeb3a95f50f15e71148e5c1067f90a126d) had
an assertion left over from development, leading to ICEs.

This patch restores the old behavior for these cases.
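
The change in behavior can be sketched outside the analyzer as follows. The 
enum and function names below are hypothetical and are not the analyzer's real 
types; the point is only that the default case yields an "unknown" result 
instead of aborting on a code the switch does not list.

```c
/* Hypothetical expression codes; the last one stands in for any code
   the switch below does not handle explicitly.  */
enum expr_code { EXPR_CONST, EXPR_ADDR, EXPR_SOMETHING_NEW };

/* Hypothetical classification of the value computed for an expression.  */
enum value_kind { VALUE_KNOWN, VALUE_UNKNOWN };

enum value_kind
classify_rvalue (enum expr_code code)
{
  switch (code)
    {
    default:
      /* Degrade gracefully rather than asserting.  */
      return VALUE_UNKNOWN;

    case EXPR_CONST:
    case EXPR_ADDR:
      return VALUE_KNOWN;
    }
}
```

With this shape, adding a new expression code costs precision until it is 
handled, but it does not crash the tool.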

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r11-2729-g2242b975c08e150dd712d8e64341cae8457788ef.

gcc/analyzer/ChangeLog:
PR analyzer/96641
* region-model.cc (region_model::get_rvalue_1): Handle
unrecognized tree codes by returning "UNKNOWN".

gcc/testsuite/ChangeLog:
PR analyzer/96641
* g++.dg/analyzer/pr96641.C: New test.
---
 gcc/analyzer/region-model.cc|  2 +-
 gcc/testsuite/g++.dg/analyzer/pr96641.C | 18 ++
 2 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/analyzer/pr96641.C

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index cd74c0f6195..305e9648c79 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1097,7 +1097,7 @@ region_model::get_rvalue_1 (path_var pv, 
region_model_context *ctxt)
   switch (TREE_CODE (pv.m_tree))
 {
 default:
-  gcc_unreachable ();
+  return m_mgr->get_or_create_unknown_svalue (TREE_TYPE (pv.m_tree));
 
 case ADDR_EXPR:
   {
diff --git a/gcc/testsuite/g++.dg/analyzer/pr96641.C 
b/gcc/testsuite/g++.dg/analyzer/pr96641.C
new file mode 100644
index 000..eb11c8584b6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/analyzer/pr96641.C
@@ -0,0 +1,18 @@
+struct uh {
+  virtual void
+  sx ();
+};
+
+struct iz : uh {
+  virtual void
+  sx ()
+  {
+sx ();
+  }
+};
+
+void
+a2 ()
+{
+  iz ().sx ();
+}
-- 
2.26.2



[committed] analyzer: fix ICE on unhandled tree codes in gassign [PR96640]

2020-08-17 Thread David Malcolm via Gcc-patches
PR analyzer/96640 reports an ICE within region_model::on_assignment when
failing to handle a WIDEN_MULT_EVEN_EXPR, and various other tree codes.

The old implementation of region_model::on_assignment gracefully handled
tree codes it didn't understand, returning "UNKNOWN", whereas the new
implementation (r11-2694-g808f4dfeb3a95f50f15e71148e5c1067f90a126d) had
a "sorry_at" and an assertion left over from development, leading to ICEs.

This patch restores the old behavior for these cases, and marks various
vector operations as leading to unknown results.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r11-2728-g1b0be822208349b2795381deace2352e998c1ad0.

gcc/analyzer/ChangeLog:
PR analyzer/96640
* region-model.cc (region_model::get_gassign_result): Handle various
VEC_* tree codes by returning UNKNOWN.
(region_model::on_assignment): Handle unrecognized tree codes by
setting lhs to an unknown value, rather than issuing a "sorry" and
asserting.
---
 gcc/analyzer/region-model.cc | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 3c7ea40e8d8..cd74c0f6195 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -526,6 +526,22 @@ region_model::get_gassign_result (const gassign *assign,
 case VEC_SERIES_EXPR:
 case VEC_COND_EXPR:
 case VEC_PERM_EXPR:
+case VEC_WIDEN_MULT_HI_EXPR:
+case VEC_WIDEN_MULT_LO_EXPR:
+case VEC_WIDEN_MULT_EVEN_EXPR:
+case VEC_WIDEN_MULT_ODD_EXPR:
+case VEC_UNPACK_HI_EXPR:
+case VEC_UNPACK_LO_EXPR:
+case VEC_UNPACK_FLOAT_HI_EXPR:
+case VEC_UNPACK_FLOAT_LO_EXPR:
+case VEC_UNPACK_FIX_TRUNC_HI_EXPR:
+case VEC_UNPACK_FIX_TRUNC_LO_EXPR:
+case VEC_PACK_TRUNC_EXPR:
+case VEC_PACK_SAT_EXPR:
+case VEC_PACK_FIX_TRUNC_EXPR:
+case VEC_PACK_FLOAT_EXPR:
+case VEC_WIDEN_LSHIFT_HI_EXPR:
+case VEC_WIDEN_LSHIFT_LO_EXPR:
   return m_mgr->get_or_create_unknown_svalue (TREE_TYPE (lhs));
 }
 }
@@ -555,10 +571,12 @@ region_model::on_assignment (const gassign *assign, 
region_model_context *ctxt)
 {
 default:
   {
-   if (1)
+   if (0)
  sorry_at (assign->location, "unhandled assignment op: %qs",
get_tree_code_name (op));
-   gcc_unreachable ();
+   const svalue *unknown_sval
+ = m_mgr->get_or_create_unknown_svalue (TREE_TYPE (lhs));
+   set_value (lhs_reg, unknown_sval, ctxt);
   }
   break;
 
-- 
2.26.2



Go patch committed: Export thunks referenced by inline functions

2020-08-17 Thread Ian Lance Taylor via Gcc-patches
This patch to the Go frontend exports thunks referenced by inline
functions.  Otherwise we get a link time error.  The test case is
https://golang.org/cl/248637.  This fixes
https://golang.org/issue/40252.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline and GCC 10 branch.

Ian
5903b4561331e2a8907937baad8040e58b92aea3
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index e443282d0e8..e425f15285e 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-fe5d94c5792f7f990004c3dee0ea501835512200
+823c91088bc6ac606362fc34b2880ce0de1624ad
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index d295fd10136..8bbc557c65f 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -1635,16 +1635,15 @@ 
Func_descriptor_expression::do_get_backend(Translate_context* context)
  || no->name().find("equal") != std::string::npos))
is_exported_runtime = true;
 
-  bool is_referenced_by_inline =
-   no->is_function() && no->func_value()->is_referenced_by_inline();
-
   bool is_hidden = ((no->is_function()
 && no->func_value()->enclosing() != NULL)
|| (Gogo::is_hidden_name(no->name())
-   && !is_exported_runtime
-   && !is_referenced_by_inline)
+   && !is_exported_runtime)
|| Gogo::is_thunk(no));
 
+  if (no->is_function() && no->func_value()->is_referenced_by_inline())
+   is_hidden = false;
+
   bvar = context->backend()->immutable_struct(var_name, asm_name,
   is_hidden, false,
  btype, bloc);
diff --git a/gcc/go/gofrontend/gogo.cc b/gcc/go/gofrontend/gogo.cc
index 13de74bc870..82d4c1fd54d 100644
--- a/gcc/go/gofrontend/gogo.cc
+++ b/gcc/go/gofrontend/gogo.cc
@@ -3370,7 +3370,8 @@ class Create_function_descriptors : public Traverse
   Gogo* gogo_;
 };
 
-// Create a descriptor for every top-level exported function.
+// Create a descriptor for every top-level exported function and every
+// function referenced by an inline function.
 
 int
 Create_function_descriptors::function(Named_object* no)
@@ -3378,8 +3379,9 @@ Create_function_descriptors::function(Named_object* no)
   if (no->is_function()
   && no->func_value()->enclosing() == NULL
   && !no->func_value()->is_method()
-  && !Gogo::is_hidden_name(no->name())
-  && !Gogo::is_thunk(no))
+  && ((!Gogo::is_hidden_name(no->name())
+  && !Gogo::is_thunk(no))
+ || no->func_value()->is_referenced_by_inline()))
 no->func_value()->descriptor(this->gogo_, no);
 
   return TRAVERSE_CONTINUE;


Re: [PATCH] libiberty/hashtab: add new functions

2020-08-17 Thread Ian Lance Taylor via Gcc-patches
On Mon, Aug 17, 2020 at 7:06 AM Martin Liška  wrote:
>
> Adding libiberty maintainer to CC.

I guess I'm not sure why either of these belong in libiberty.
htab_insert can be written elsewhere as needed.  And while perhaps
some sort of stats API would be reasonable, I don't think it should be
something that prints values to a FILE.
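
As an illustration of the point that an insert helper can live in the caller, 
here is a self-contained sketch. It deliberately does not use libiberty: 
struct tiny_table, tiny_find_slot and tiny_insert are hypothetical names, and 
the table is a fixed-size, linear-probing toy with no deletion. Only the shape 
(look up a slot, then store into it) mirrors the proposed htab_insert.

```c
#include <stddef.h>

#define TINY_SLOTS 8	/* Hypothetical fixed capacity.  */

struct tiny_table
{
  const int *slots[TINY_SLOTS];	/* NULL marks an empty slot.  */
};

/* Return the slot holding an element equal to *KEY, or the first empty
   slot met while probing.  Return NULL if the table is full.  */
const int **
tiny_find_slot (struct tiny_table *t, const int *key)
{
  size_t start = (size_t) (unsigned int) *key % TINY_SLOTS;
  for (size_t i = 0; i < TINY_SLOTS; i++)
    {
      const int **slot = &t->slots[(start + i) % TINY_SLOTS];
      if (*slot == NULL || **slot == *key)
	return slot;
    }
  return NULL;
}

/* Caller-side helper: insert ELEMENT, overwriting an equal element if
   one exists.  Return 0 on success and -1 if the table is full.  */
int
tiny_insert (struct tiny_table *t, const int *element)
{
  const int **slot = tiny_find_slot (t, element);
  if (slot == NULL)
    return -1;
  *slot = element;
  return 0;
}
```

The helper is two statements on top of the lookup, which is the reason given 
above for leaving it to the code that needs it.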

Ian


> On 8/17/20 4:03 PM, Martin Liška wrote:
> > Hey.
> >
> > I'm working on binutils where I would like to port a hash table
> > implementation in gas/hash.[ch] to the libiberty one.
> >
> > But it would be handy for me to add 2 new functions.
> >
> > Thoughts?
> > Thanks,
> > Martin
> >
> > include/ChangeLog:
> >
> >  * hashtab.h (htab_insert): New function.
> >  (htab_print_statistics): Likewise.
> >
> > libiberty/ChangeLog:
> >
> >  * hashtab.c (htab_insert): New function.
> >  (htab_print_statistics): Likewise.
> > ---
> >   include/hashtab.h   |  6 ++
> >   libiberty/hashtab.c | 23 +++
> >   2 files changed, 29 insertions(+)
> >
> > diff --git a/include/hashtab.h b/include/hashtab.h
> > index 6cca342b989..bcaee909bcf 100644
> > --- a/include/hashtab.h
> > +++ b/include/hashtab.h
> > @@ -37,6 +37,7 @@ extern "C" {
> >   #endif /* __cplusplus */
> >
> >   #include "ansidecl.h"
> > +#include 
> >
> >   /* The type for a hash code.  */
> >   typedef unsigned int hashval_t;
> > @@ -172,6 +173,7 @@ extern void **htab_find_slot (htab_t, const void *, 
> > enum insert_option);
> >   extern void *htab_find_with_hash (htab_t, const void *, hashval_t);
> >   extern void **htab_find_slot_with_hash (htab_t, const void *,
> > hashval_t, enum insert_option);
> > +extern void htab_insert (htab_t, void *);
> >   extern void htab_clear_slot (htab_t, void **);
> >   extern void htab_remove_elt (htab_t, const void *);
> >   extern void htab_remove_elt_with_hash (htab_t, const void *, 
> > hashval_t);
> > @@ -183,6 +185,10 @@ extern size_t htab_size (htab_t);
> >   extern size_t htab_elements (htab_t);
> >   extern double htab_collisions (htab_t);
> >
> > +extern void htab_print_statistics (FILE *f, htab_t table,
> > +   const char *name,
> > +   const char *prefix);
> > +
> >   /* A hash function for pointers.  */
> >   extern htab_hash htab_hash_pointer;
> >
> > diff --git a/libiberty/hashtab.c b/libiberty/hashtab.c
> > index 225e9e540a7..fb3152ec9c6 100644
> > --- a/libiberty/hashtab.c
> > +++ b/libiberty/hashtab.c
> > @@ -704,6 +704,15 @@ htab_find_slot (htab_t htab, const PTR element, enum 
> > insert_option insert)
> >  insert);
> >   }
> >
> > +/* Insert ELEMENT into HTAB.  If the element exists, it is overwritten.  */
> > +
> > +void
> > +htab_insert (htab_t htab, PTR element)
> > +{
> > +  void **slot = htab_find_slot (htab, element, INSERT);
> > +  *slot = element;
> > +}
> > +
> >   /* This function deletes an element with the given value from hash
> >  table (the hash is computed from the element).  If there is no matching
> >  element in the hash table, this function does nothing.  */
> > @@ -803,6 +812,20 @@ htab_collisions (htab_t htab)
> > return (double) htab->collisions / (double) htab->searches;
> >   }
> >
> > +/* Print statistics about a hash table.  */
> > +
> > +void
> > +htab_print_statistics (FILE *f, htab_t table, const char *name,
> > +   const char *prefix)
> > +{
> > +  fprintf (f, "%s hash statistics:\n", name);
> > +  fprintf (f, "%s%u searches\n", prefix, table->searches);
> > +  fprintf (f, "%s%lu elements\n", prefix, htab_elements (table));
> > +  fprintf (f, "%s%lu table size\n", prefix, htab_size (table));
> > +  fprintf (f, "%s%.2f collisions per search\n",
> > +   prefix, htab_collisions (table));
> > +}
> > +
> >   /* Hash P as a null-terminated string.
> >
> >  Copied from gcc/hashtable.c.  Zack had the following to say with 
> > respect
>


Re: [PATCH] bb-reorder: Remove a misfiring micro-optimization (PR96475)

2020-08-17 Thread Segher Boessenkool
Ping (added some Cc:s).

Thanks in advance,


Segher


On Fri, Aug 07, 2020 at 09:51:04PM +, Segher Boessenkool wrote:
> When the compgotos pass copies the tail of blocks ending in an indirect
> jump, there is a micro-optimization to not copy the last one, since the
> original block will then just be deleted.  This does not work properly
> if cleanup_cfg does not merge all pairs of blocks we expect it to.
> 
> 
> v2: This also deletes the other use of single_pred_p, which has the same
> problem in principle, I just never have triggered it so far.
> 
> Tested on powerpc64-linux {-m32,-m64} like before.  Is this okay for
> trunk?
> 
> 
> Segher
> 
> 
> 2020-08-07  Segher Boessenkool  
> 
>   PR rtl-optimization/96475
>   * bb-reorder.c (maybe_duplicate_computed_goto): Remove single_pred_p
>   micro-optimization.
> ---
>  gcc/bb-reorder.c | 10 +++---
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
> index c635010..76e56b5 100644
> --- a/gcc/bb-reorder.c
> +++ b/gcc/bb-reorder.c
> @@ -2680,9 +2680,6 @@ make_pass_reorder_blocks (gcc::context *ctxt)
>  static bool
>  maybe_duplicate_computed_goto (basic_block bb, int max_size)
>  {
> -  if (single_pred_p (bb))
> -return false;
> -
>/* Make sure that the block is small enough.  */
>rtx_insn *insn;
>FOR_BB_INSNS (bb, insn)
> @@ -2700,10 +2697,9 @@ maybe_duplicate_computed_goto (basic_block bb, int 
> max_size)
>  {
>basic_block pred = e->src;
>  
> -  /* Do not duplicate BB into PRED if that is the last predecessor, or if
> -  we cannot merge a copy of BB with PRED.  */
> -  if (single_pred_p (bb)
> -   || !single_succ_p (pred)
> +  /* Do not duplicate BB into PRED if we cannot merge a copy of BB
> +  with PRED.  */
> +  if (!single_succ_p (pred)
> || e->flags & EDGE_COMPLEX
> || pred->index < NUM_FIXED_BLOCKS
> || (JUMP_P (BB_END (pred)) && !simplejump_p (BB_END (pred)))
> -- 
> 1.8.3.1


Re: [PATCH] middle-end: Fix PR middle-end/85811: Introduce tree_expr_maybe_nan_p et al.

2020-08-17 Thread Segher Boessenkool
On Mon, Aug 17, 2020 at 10:31:08PM +, Joseph Myers wrote:
> On Sat, 15 Aug 2020, Segher Boessenkool wrote:
> > On Sat, Aug 15, 2020 at 12:10:42PM +0100, Roger Sayle wrote:
> > > I'll quote Joseph Myers (many thanks) who describes things clearly as:
> > > > (a) When both arguments are NaNs, the return value should be a qNaN,
> > > > but sometimes it is an sNaN if at least one argument is an sNaN.
> > 
> > Where is this defined?  I can't find it in C11, in 18661, and of course
> > it isn't what GCC does (it requires -fsignaling to even acknowledge the
> > existence of signaling NaNs :-) )
> 
> The semantics of fmax and fmin are those of the maxNum and minNum 
> operations in IEEE 754-2008 (that were removed in IEEE 754-2019); see the 
> table of IEEE operation bindings that 18661-1 adds to Annex F.
> 
>   minNum(x, y) is the canonicalized number x if x < y, y if y < x, the 
>   canonicalized number if one operand is a number and the other a quiet 
>   NaN. Otherwise it is either x or y, canonicalized (this means results 
>   might differ among implementations). When either x or y is a 
>   signalingNaN, then the result is according to 6.2.
> 
>   maxNum(x, y) is the canonicalized number y if x < y, x if y < x, the 
>   canonicalized number if one operand is a number and the other a quiet 
>   NaN. Otherwise it is either x or y, canonicalized (this means results 
>   might differ among implementations). When either x or y is a 
>   signalingNaN, then the result is according to 6.2.
> 
> where the relevant wording from 6.2 is
> 
>   Under default exception handling, any operation signaling an invalid 
>   operation exception and for which a floating-point result is to be 
>   delivered shall deliver a quiet NaN.
> 
>   Signaling NaNs shall be reserved operands that, under default exception 
>   handling, signal the invalid operation exception (see 7.2) for every 
>   general-computational and signaling-computational operation except for 
>   the conversions described in 5.12. For non-default treatment, see 8.
> 
> (and maxNum and minNum are in 5.3 "Homogeneous general-computational 
> operations").

Ah, so "When both arguments are NaNs, the return value should be a qNaN"
means the QNaN corresponding to either x or y.  I see, thanks!
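
The quiet-NaN part of the quoted maxNum wording can be sketched in plain C. 
This is a simplified model under stated assumptions, not an implementation of 
fmax: the name max_num_sketch is made up, canonicalization is ignored, and 
signaling NaNs and the invalid-operation exception are not modelled at all.

```c
/* A value is a NaN exactly when it compares unequal to itself.  */
static int
is_nan (double v)
{
  return v != v;
}

/* Quiet-NaN cases of maxNum only: a number wins over a quiet NaN, and
   if both operands are NaNs the result is one of them.  */
double
max_num_sketch (double x, double y)
{
  if (is_nan (x))
    return y;		/* Y may itself be a NaN.  */
  if (is_nan (y))
    return x;
  return x < y ? y : x;
}
```

When exactly one operand is a quiet NaN the other operand is returned, and 
when both are NaNs the result is a NaN taken from one of the operands, which 
matches the reading in the reply above.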


Segher


Re: [RFC PATCH v1 1/1] PPC64: Implement POWER Architecture Vector Function ABI.

2020-08-17 Thread Segher Boessenkool
On Mon, Aug 17, 2020 at 06:05:09PM -0400, David Edelsohn wrote:
> The Power Vector ABI is available at
> 
> https://github.com/power8-abi-doc/vector-function-abi
> 
> It apparently did not attach correctly to the sourceware wiki or the
> filename is different.

Thanks!


Segher


Re: [PATCH] rs6000: unaligned VSX in memcpy/memmove expansion

2020-08-17 Thread Segher Boessenkool
Hi!

On Fri, Aug 14, 2020 at 05:59:05PM -0500, Aaron Sawdey via Gcc-patches wrote:
> +static rtx
> +gen_lxvl_stxvl_move (rtx dest, rtx src, int length)
> +{
> +  gcc_assert (MEM_P (dest) ^ MEM_P (src));

Maybe just "!="?
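
A small sketch of why the two spellings are interchangeable here, assuming 
both operands are truth values that are exactly 0 or 1. The function names 
are illustrative only.

```c
/* Both functions assume A and B are each exactly 0 or 1.  */

/* "Exactly one of the two is set", written with exclusive or.  */
int
exactly_one_xor (int a, int b)
{
  return a ^ b;
}

/* The same condition written as an inequality.  */
int
exactly_one_ne (int a, int b)
{
  return a != b;
}
```

For 0/1 operands the two agree on all four input combinations, so the choice 
is about readability.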

> +  gcc_assert (GET_MODE (dest) == V16QImode && GET_MODE (src) == V16QImode);
> +  gcc_assert (length <= 16);
> +
> +  bool is_store = MEM_P (dest);
> +
> +  /* If the address form is not a simple register, make it so.  */
> +  if (is_store)
> +{
> +  dest = XEXP (dest, 0);
> +  if (!REG_P (dest))
> + dest = force_reg (Pmode, dest);

So this changes what "dest" means.

Maybe it is clearer if you have a separate variable "addr"?  That you
can use for dest and src as well, whichever is memory.

> +  if (is_store)
> +return gen_stxvl (src, dest, len);
> +  else
> +return  gen_lxvl (dest, src, len);

(doubled space -- well I guess you wanted to align the code)

> +  /* If we can't succeed in doing it in one pass, we can't do it in the
> +  might_overlap case.  Bail out and return failure.  */
> +  if (might_overlap && (num_reg+1) >= MAX_MOVE_REG
> +   && bytes > move_bytes)
> + return 0;

The "num_reg+1" isn't obvious, and the comment doesn't say (we usually
write it as "num_reg + 1" fwiw, and the parens are superfluous).


Looks good, thanks!  Okay for trunk with or without such changes.


Segher


Re: [PATCH] middle-end: Fix PR middle-end/85811: Introduce tree_expr_maybe_nan_p et al.

2020-08-17 Thread Joseph Myers
On Sat, 15 Aug 2020, Segher Boessenkool wrote:

> Hi!
> 
> On Sat, Aug 15, 2020 at 12:10:42PM +0100, Roger Sayle wrote:
> > I'll quote Joseph Myers (many thanks) who describes things clearly as:
> > > (a) When both arguments are NaNs, the return value should be a qNaN,
> > > but sometimes it is an sNaN if at least one argument is an sNaN.
> 
> Where is this defined?  I can't find it in C11, in 18661, and of course
> it isn't what GCC does (it requires -fsignaling to even acknowledge the
> existence of signaling NaNs :-) )

The semantics of fmax and fmin are those of the maxNum and minNum 
operations in IEEE 754-2008 (that were removed in IEEE 754-2019); see the 
table of IEEE operation bindings that 18661-1 adds to Annex F.

  minNum(x, y) is the canonicalized number x if x < y, y if y < x, the 
  canonicalized number if one operand is a number and the other a quiet 
  NaN. Otherwise it is either x or y, canonicalized (this means results 
  might differ among implementations). When either x or y is a 
  signalingNaN, then the result is according to 6.2.

  maxNum(x, y) is the canonicalized number y if x < y, x if y < x, the 
  canonicalized number if one operand is a number and the other a quiet 
  NaN. Otherwise it is either x or y, canonicalized (this means results 
  might differ among implementations). When either x or y is a 
  signalingNaN, then the result is according to 6.2.

where the relevant wording from 6.2 is

  Under default exception handling, any operation signaling an invalid 
  operation exception and for which a floating-point result is to be 
  delivered shall deliver a quiet NaN.

  Signaling NaNs shall be reserved operands that, under default exception 
  handling, signal the invalid operation exception (see 7.2) for every 
  general-computational and signaling-computational operation except for 
  the conversions described in 5.12. For non-default treatment, see 8.

(and maxNum and minNum are in 5.3 "Homogeneous general-computational 
operations").

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] c: Fix -Wunused-but-set-* warning with _Generic [PR96571]

2020-08-17 Thread Joseph Myers
On Fri, 14 Aug 2020, Jakub Jelinek via Gcc-patches wrote:

> Hi!
> 
> The following testcase shows various problems with -Wunused-but-set*
> warnings and _Generic construct.  I think it is best to treat the selector
> and the ignored expressions as (potentially) read, because when they are
> parsed, the vars in there are already marked as TREE_USED.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 2/5] C front end support to detect out-of-bounds accesses to array parameters

2020-08-17 Thread Joseph Myers
On Thu, 13 Aug 2020, Martin Sebor via Gcc-patches wrote:

> > * Maybe cdk_pointer is followed by cdk_attrs before cdk_id.  In this case
> > the code won't return.
> 
> I think I see the problem you're pointing out (I just don't see how
> to trigger it or test that it doesn't happen).  If the tweak in
> the attached update doesn't fix it a test case would be helpful.

I think you need a while loop there, not just an if, to account for the 
case of multiple consecutive cdk_attrs.  At least the GNU attribute syntax

   direct-declarator:
[...]
 ( gnu-attributes[opt] declarator )

should produce multiple consecutive cdk_attrs for each level of 
parentheses with attributes inside.
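
To make the if-versus-while point concrete, here is a small sketch on a 
simplified declarator chain.  The struct, the DK_* enumerators and 
skip_attrs are hypothetical stand-ins for this example only, not the 
front end's real c_declarator code.

```c
#include <stddef.h>

/* Hypothetical, simplified model of a declarator chain.  */
enum decl_kind { DK_ID, DK_ARRAY, DK_POINTER, DK_ATTRS };

struct decl_node
{
  enum decl_kind kind;
  const struct decl_node *inner;	/* Next node towards the identifier.  */
};

/* Skip every consecutive attribute node.  A plain "if" would step over
   only the first one and stop on the second.  */
const struct decl_node *
skip_attrs (const struct decl_node *d)
{
  while (d != NULL && d->kind == DK_ATTRS)
    d = d->inner;
  return d;
}
```

For a chain pointer -> attrs -> attrs -> id, calling skip_attrs on the 
node after the pointer lands on the id node.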

> > * Maybe the code is correct to continue because we're in the case of an
> > array of pointers (cdk_array follows).  But as I understand it, the intent
> > is to set up an "arg spec" that describes only the (multidimensional)
> > array that is the parameter itself - not any array pointed to.  And it
> > looks to me like, in the case of an array of pointers to arrays, both sets
> > of array bounds would end up in the spec constructed.
> 
> Ideally, I'd like to check even pointers to arrays and so they should
> be recorded somewhere.  The middle end code doesn't do any checking
> of those yet for out-of-bounds accesses.  It wasn't a goal for
> the first iteration so I've tweaked the code to avoid recording them.

Could you expand the comment on get_parm_array_spec to specify exactly 
what you think the function should be putting in the returned attribute, 
in what order, in cases where there are array declarators (constant, 
empty, [*] and VLA) intermixed with other kinds of declarators and the 
type from the type specifiers may or may not be an array type itself?  
That will provide a basis for subsequent rounds of review of whether the 
function is actually behaving as expected.

As far as I can see, the logic

+  if (TREE_CODE (nelts) == INTEGER_CST)
+   {
+ /* Skip all constant bounds except the most significant one.
+The interior ones are included in the array type.  */
+ if (next && (next->kind == cdk_array || next->kind == cdk_pointer))
+   continue;

will skip constant bounds in an array that's the target of a pointer 
declarator, but not any other kind of bounds.  Is that what you intend - 
that all the other kind of bounds in pointed-to arrays will be recorded in 
this string?

> > Then, the code
> > 
> > +  if (pd->kind == cdk_id)
> > +   {
> > + /* Extract the upper bound from a parameter of an array type.  */
> > 
> > also seems misplaced.  If the type specifiers for the parameter are a
> > typedef for an array type, that array type should be processed *before*
> > the declarator to get the correct semantics (as if the bounds from those
> > type specifiers were given in the declarator), not at the end which gets
> > that type out of order with respect to array declarators.  (Processing
> > before the declarator also means clearing the results of that processing
> > if a pointer declarator is encountered at any point, because in that case
> > the array type in the type specifiers is irrelevant.)
> 
> I'm not sure I follow you here.  Can you show me what you mean on
> a piece of code?  This test case (which IIUC does what you described)
> works as expected:
> 
> $ cat q.c && gcc -O2 -S -Wall q.c
> typedef int A[7][9];
> 
> void f (A[3][5]);

So this is equivalent to A[3][5][7][9].  The c_declarator structures have 
the one for the [3] (the top-level bound) inside the one for the [5].  
The [5] bound is skipped by the "Skip all constant bounds except the most 
significant one." logic.  When the [3] bound is reached, the "break;" at 
the end of that processing means the "Extract the upper bound from a 
parameter of an array type." never gets executed.  Try replacing the [3] 
bound by a VLA bound.  As I read the code, it will end up generating a 
spec string that records first the VLA, then the [7], when it should be 
first the 9 (skipped), then the 7 (skipped), then the 5 (skipped), then 
the VLA.  Or if it's "void f (A *[variable][5]);", it will do the same 
thing (VLA, then 7, although both the 7 and the 9 are part of the 
pointed-to type).
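
The type equivalence I'm relying on can be checked with a few lines of 
C11.  This is only a sketch of the language rule; row_bytes is a made-up 
name and says nothing about what the spec string should contain.

```c
#include <stddef.h>

typedef int A[7][9];

/* An array of A with bounds [3][5] has the same layout as
   int[3][5][7][9]: the typedef's bounds are the innermost ones.  */
_Static_assert (sizeof (A[3][5]) == sizeof (int[3][5][7][9]),
		"typedef bounds come after the declarator bounds");

/* As a parameter, A p[3][5] is adjusted to "pointer to A[5]", that is
   int (*)[5][7][9].  Only the top-level [3] bound is lost.  */
size_t
row_bytes (A p[3][5])
{
  return sizeof (*p);		/* Size of int[5][7][9].  */
}
```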

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC PATCH v1 1/1] PPC64: Implement POWER Architecture Vector Function ABI.

2020-08-17 Thread David Edelsohn via Gcc-patches
The Power Vector ABI is available at

https://github.com/power8-abi-doc/vector-function-abi

It apparently did not attach correctly to the sourceware wiki or the
filename is different.

Thanks, David

On Mon, Aug 17, 2020 at 1:44 PM GT  wrote:
>
> ‐‐‐ Original Message ‐‐‐
> On Thursday, August 13, 2020 6:49 PM, Segher Boessenkool 
>  wrote:
>
> > Hi!
> >
> > This is about the Power binding to some OpenMP API, right? It has
> > nothing to do with "vector" or "ABI" -- we have vectors already, and
> > we have ABIs already, more than enough of each.
> >
> > It is very very VERY hard to review this without being told the proper
> > setting here.
> >
>
> What this is about:
>
> David Edelsohn wanted to have new library functions, one for each of these 6 
> single-precision functions:
> sinf, cosf, sincosf, expf, logf, powf; and these 6 double-precision functions:
> sin, cos, sincos, exp, log, and pow.
>
> For the single-precision functions, the corresponding new functions would 
> compute 4 results
> simultaneously. For the double-precision functions, the new ones would 
> compute 2 results
> simultaneously.
>
> x86_64 has already done something very similar so I thought I would adapt as 
> much of their
> documentation and implementation as I could for PPC64.
>
> Let's start with that. Comments so far?
>
> Bert.


Re: [RFC PATCH v1 1/1] PPC64: Implement POWER Architecture Vector Function ABI.

2020-08-17 Thread Segher Boessenkool
On Mon, Aug 17, 2020 at 05:44:46PM +, GT wrote:
> > This is about the Power binding to some OpenMP API, right? It has
> > nothing to do with "vector" or "ABI" -- we have vectors already, and
> > we have ABIs already, more than enough of each.
> >
> > It is very very VERY hard to review this without being told the proper
> > setting here.
> 
> What this is about:
> 
> David Edelsohn wanted to have new library functions, one for each of these 6 
> single-precision functions:
> sinf, cosf, sincosf, expf, logf, powf; and these 6 double-precision functions:
> sin, cos, sincos, exp, log, and pow.
> 
> For the single-precision functions, the corresponding new functions would 
> compute 4 results
> simultaneously. For the double-precision functions, the new ones would 
> compute 2 results
> simultaneously.
> 
> x86_64 has already done something very similar so I thought I would adapt as 
> much of their
> documentation and implementation as I could for PPC64.
> 
> Let's start with that. Comments so far?

That sounds like libmvec?

I still don't know what this is.


Segher


[PATCH] PR fortran/96613 - SIGFPE on min1() with -ffpe-trap=invalid switch

2020-08-17 Thread Harald Anlauf
While looking at the reported issue, it appeared that the Fortran frontend
mishandled the conversion of functions of the MIN/MAX variety to inline code.
At the same time, the simplification of expressions using a common (but
non-standard) GNU extension could produce inconsistent results.  The patch
below addresses both.

Regtested on x86_64-pc-linux-gnu.

OK for master?

Thanks,
Harald


PR fortran/96613 - Fix type/kind of temporaries evaluating MIN/MAX

When evaluating functions of the MIN/MAX variety inline, use a temporary
of appropriate type and kind, and convert to the result type at the end.
In the case of allowing for the GNU extensions to MIN/MAX, derive the
result kind consistently during simplification.
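
The idea of using a temporary of the widest argument type can be sketched 
in C as follows.  This is only an analogy with fixed integer widths 
standing in for INTEGER(1) and INTEGER(2); max_i1_i2 and min_i1_i2 are 
hypothetical names, not code from the frontend.

```c
#include <stdint.h>

/* Evaluate MAX of an 8-bit and a 16-bit argument in a temporary of the
   wider type, so the 16-bit value is never narrowed first.  */
int16_t
max_i1_i2 (int8_t a, int16_t b)
{
  int16_t m = a;		/* Temporary has the widest argument type.  */
  if (b > m)
    m = b;
  return m;
}

/* Same scheme for MIN.  */
int16_t
min_i1_i2 (int8_t a, int16_t b)
{
  int16_t m = a;
  if (b < m)
    m = b;
  return m;
}
```

Had the temporary been 8 bits wide, an argument such as 300 would not be 
representable in it before the comparison even takes place.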

gcc/fortran/ChangeLog:

* simplify.c (min_max_choose): The simplification result shall
have the highest kind value of the arguments.
* trans-intrinsic.c (gfc_conv_intrinsic_minmax): Choose type and
kind of intermediate by looking at all arguments, not the result.

gcc/testsuite/ChangeLog:

* gfortran.dg/min_max_kind.f90: New test.
* gfortran.dg/pr96613.f90: New test.

diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index eb8b2afeb29..074b50c2e68 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -4924,6 +4924,8 @@ min_max_choose (gfc_expr *arg, gfc_expr *extremum, int sign, bool back_val)
   switch (arg->ts.type)
 {
   case BT_INTEGER:
+	if (extremum->ts.kind < arg->ts.kind)
+	  extremum->ts.kind = arg->ts.kind;
 	ret = mpz_cmp (arg->value.integer,
 		   extremum->value.integer) * sign;
 	if (ret > 0)
@@ -4931,6 +4933,8 @@ min_max_choose (gfc_expr *arg, gfc_expr *extremum, int sign, bool back_val)
 	break;

   case BT_REAL:
+	if (extremum->ts.kind < arg->ts.kind)
+	  extremum->ts.kind = arg->ts.kind;
 	if (mpfr_nan_p (extremum->value.real))
 	  {
 	ret = 1;
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index fd8809902b7..2483f016d8e 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -4073,6 +4073,7 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree val;
   tree *args;
   tree type;
+  tree argtype;
   gfc_actual_arglist *argexpr;
   unsigned int i, nargs;

@@ -4082,16 +4083,24 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   gfc_conv_intrinsic_function_args (se, expr, args, nargs);
   type = gfc_typenode_for_spec (&expr->ts);

-  argexpr = expr->value.function.actual;
-  if (TREE_TYPE (args[0]) != type)
-args[0] = convert (type, args[0]);
   /* Only evaluate the argument once.  */
   if (!VAR_P (args[0]) && !TREE_CONSTANT (args[0]))
 args[0] = gfc_evaluate_now (args[0], &se->pre);

-  mvar = gfc_create_var (type, "M");
-  gfc_add_modify (&se->pre, mvar, args[0]);
+  /* Determine suitable type of temporary, as a GNU extension allows
+ different argument kinds.  */
+  argtype = TREE_TYPE (args[0]);
+  argexpr = expr->value.function.actual;
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+{
+  tree tmptype = TREE_TYPE (args[i]);
+  if (TYPE_PRECISION (tmptype) > TYPE_PRECISION (argtype))
+	argtype = tmptype;
+}
+  mvar = gfc_create_var (argtype, "M");
+  gfc_add_modify (&se->pre, mvar, convert (argtype, args[0]));

+  argexpr = expr->value.function.actual;
   for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
 {
   tree cond = NULL_TREE;
@@ -4119,8 +4128,8 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 	 Also, there is no consensus among other tested compilers.  In
 	 short, it's a mess.  So lets just do whatever is fastest.  */
   tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR;
-  calc = fold_build2_loc (input_location, code, type,
-			  convert (type, val), mvar);
+  calc = fold_build2_loc (input_location, code, argtype,
+			  convert (argtype, val), mvar);
   tmp = build2_v (MODIFY_EXPR, mvar, calc);

   if (cond != NULL_TREE)
@@ -4128,7 +4137,10 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 			build_empty_stmt (input_location));
   gfc_add_expr_to_block (&se->pre, tmp);
 }
-  se->expr = mvar;
+  if (TREE_CODE (type) == INTEGER_TYPE)
+se->expr = fold_build1_loc (input_location, FIX_TRUNC_EXPR, type, mvar);
+  else
+se->expr = convert (type, mvar);
 }


diff --git a/gcc/testsuite/gfortran.dg/min_max_kind.f90 b/gcc/testsuite/gfortran.dg/min_max_kind.f90
new file mode 100644
index 000..b22691e1ffe
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min_max_kind.f90
@@ -0,0 +1,15 @@
+! { dg-do run }
+! { dg-options "-O2 -std=gnu" }
+! Verify that the GNU extensions to MIN/MAX handle mixed kinds properly.
+
+program p
+  implicit none
+  integer(1), parameter :: i1 = 1
+  integer(2), parameter :: i2 = 2
+  real(4),parameter :: r4 = 4
+  real(8),parameter :: r8 = 8
+  if (kind 

[committed] analyzer: fix ICE on NULL dereference [PR96644]

2020-08-17 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

Pushed to master as r11-2725-gb00a83047574eb6f8d1e670ad439609125873506.

gcc/analyzer/ChangeLog:
PR analyzer/96644
* region-model-manager.cc (get_region_for_unexpected_tree_code):
Handle ctxt being NULL.

gcc/testsuite/ChangeLog:
PR analyzer/96644
* gcc.dg/analyzer/pr96644.c: New test.
---
 gcc/analyzer/region-model-manager.cc|  4 ++--
 gcc/testsuite/gcc.dg/analyzer/pr96644.c | 24 
 2 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr96644.c

diff --git a/gcc/analyzer/region-model-manager.cc 
b/gcc/analyzer/region-model-manager.cc
index 9c7b0602e88..4faeaa52a63 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -927,11 +927,11 @@ get_region_for_unexpected_tree_code (region_model_context 
*ctxt,
 tree t,
 const dump_location_t &loc)
 {
-  gcc_assert (ctxt);
   tree type = TYPE_P (t) ? t : TREE_TYPE (t);
   region *new_reg
 = new unknown_region (alloc_region_id (), &m_root_region, type);
-  ctxt->on_unexpected_tree_code (t, loc);
+  if (ctxt)
+ctxt->on_unexpected_tree_code (t, loc);
   return new_reg;
 }
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96644.c 
b/gcc/testsuite/gcc.dg/analyzer/pr96644.c
new file mode 100644
index 000..3953c8d58c4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96644.c
@@ -0,0 +1,24 @@
+/* { dg-additional-options "-O1" } */
+
+int oh[1];
+int *x3;
+
+int *
+cm (char *m0)
+{
+  return oh;
+}
+
+void
+ek (void)
+{
+  for (;;)
+{
+  char *b2 = 0;
+
+  if (*b2 != 0) /* { dg-warning "dereference of NULL" } */
+   ++b2;
+
+  x3 = cm (b2);
+}
+}
-- 
2.26.2



[committed] analyzer: fix ICE due to NULL type [PR96639]

2020-08-17 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

Pushed to master as r11-2724-g42c5ae5d7f0ad89b75d93c497fe44b6c66da7e76.

gcc/analyzer/ChangeLog:
PR analyzer/96639
* region.cc (region::get_subregions_for_binding): Check for "type"
being NULL.

gcc/testsuite/ChangeLog:
PR analyzer/96639
* gcc.dg/analyzer/pr96639.c: New test.
---
 gcc/analyzer/region.cc  |  2 +-
 gcc/testsuite/gcc.dg/analyzer/pr96639.c | 10 ++
 2 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr96639.c

diff --git a/gcc/analyzer/region.cc b/gcc/analyzer/region.cc
index afe416b001b..eab1f2771cf 100644
--- a/gcc/analyzer/region.cc
+++ b/gcc/analyzer/region.cc
@@ -256,7 +256,7 @@ region::get_subregions_for_binding (region_model_manager 
*mgr,
tree type,
auto_vec <const region *> *out) const
 {
-  if (get_type () == NULL_TREE)
+  if (get_type () == NULL_TREE || type == NULL_TREE)
 return;
   if (relative_bit_offset == 0
   && types_compatible_p (get_type (), type))
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96639.c 
b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
new file mode 100644
index 000..02ca3f084a2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
@@ -0,0 +1,10 @@
+void *calloc (__SIZE_TYPE__, __SIZE_TYPE__);
+
+int
+x7 (void)
+{
+  int **md = calloc (1, 1);
+
+  return md[0][0]; /* { dg-warning "possibly-NULL" "unchecked deref" } */
+  /* { dg-warning "leak of 'md'" "leak" { target *-*-* } .-1 } */
+}
-- 
2.26.2



[committed] analyzer: handle _CST in constant pool initializers [PR96642]

2020-08-17 Thread David Malcolm via Gcc-patches
In r11-2708-g2867118ddda9b56d991c16022f7d3d634ed08313 I added support to
the analyzer for initialization from var_decls in the global constant
pool.  However, that commit didn't support initialization from
ADDR_EXPR of a STRING_CST leading to an ICE seen in data-model-1.c and
pr94639.c on arm and powerpc64 at least, and as PR analyzer/96642 on
x86_64 at least.

This patch adds support for such initializers, fixing the ICE.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Verified the fix to data-model-1.c and pr94639.c on arm, aarch64 and
powerpc64.

Pushed to master as r11-2723-g35c5f8fb432c8e68af68ab48c8d3107e7839775e.

gcc/analyzer/ChangeLog:
PR analyzer/96642
* store.cc (get_svalue_for_ctor_val): New.
(binding_map::apply_ctor_to_region): Call it.

gcc/testsuite/ChangeLog:
PR analyzer/96642
* gcc.dg/analyzer/pr96642.c: New test.
---
 gcc/analyzer/store.cc   | 21 ++---
 gcc/testsuite/gcc.dg/analyzer/pr96642.c | 10 ++
 2 files changed, 28 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr96642.c

diff --git a/gcc/analyzer/store.cc b/gcc/analyzer/store.cc
index 232920019e0..5af86d09c2b 100644
--- a/gcc/analyzer/store.cc
+++ b/gcc/analyzer/store.cc
@@ -391,6 +391,22 @@ get_subregion_within_ctor (const region *parent_reg, tree 
index,
 }
 }
 
+/* Get the svalue for VAL, a non-CONSTRUCTOR value within a CONSTRUCTOR.  */
+
+static const svalue *
+get_svalue_for_ctor_val (tree val, region_model_manager *mgr)
+{
+  if (TREE_CODE (val) == ADDR_EXPR)
+{
+  gcc_assert (TREE_CODE (TREE_OPERAND (val, 0)) == STRING_CST);
+  const string_region *str_reg
+   = mgr->get_region_for_string (TREE_OPERAND (val, 0));
+  return mgr->get_ptr_svalue (TREE_TYPE (val), str_reg);
+}
+  gcc_assert (CONSTANT_CLASS_P (val));
+  return mgr->get_or_create_constant_svalue (val);
+}
+
 /* Bind values from CONSTRUCTOR to this map, relative to
PARENT_REG's relationship to its base region.  */
 
@@ -415,12 +431,11 @@ binding_map::apply_ctor_to_region (const region 
*parent_reg, tree ctor,
apply_ctor_to_region (child_reg, val, mgr);
   else
{
- gcc_assert (CONSTANT_CLASS_P (val));
- const svalue *cst_sval = mgr->get_or_create_constant_svalue (val);
+ const svalue *sval = get_svalue_for_ctor_val (val, mgr);
  const binding_key *k
= binding_key::make (mgr->get_store_manager (), child_reg,
 BK_direct);
- put (k, cst_sval);
+ put (k, sval);
}
 }
 }
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96642.c 
b/gcc/testsuite/gcc.dg/analyzer/pr96642.c
new file mode 100644
index 000..117aa0437ac
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96642.c
@@ -0,0 +1,10 @@
+void
+ut (void)
+{
+  struct {
+char *cc;
+  } sr[] = {
+0, 0, 0, 0, 0, 0, 0, 0, 0, "", "", 0, "", 0, 0, "",
+0, 0, "", 0, 0, "", 0, 0, "", 0, 0, "", 0, 0, 0, 0, 0,
+  };
+}
-- 
2.26.2



[committed] i386: Use parametrized pattern names some more.

2020-08-17 Thread Uros Bizjak via Gcc-patches
Use parameterized pattern names to simplify calling of named patterns.

2020-08-15  Uroš Bizjak  

gcc/ChangeLog:

* config/i386/i386-builtin.def (__builtin_ia32_bextri_u32)
(__builtin_ia32_bextri_u64): Use CODE_FOR_nothing.
* config/i386/i386.md (@tbm_bextri_):
Implement as parametrized name pattern.
(@rdrand): Ditto.
(@rdseed): Ditto.
* config/i386/i386-expand.c (ix86_expand_builtin)
[case IX86_BUILTIN_BEXTRI32, case IX86_BUILTIN_BEXTRI64]:
Update for parameterized name patterns.
[case IX86_BUILTIN_RDRAND16_STEP, case IX86_BUILTIN_RDRAND32_STEP]
[case IX86_BUILTIN_RDRAND64_STEP]: Ditto.
[case IX86_BUILTIN_RDSEED16_STEP, case IX86_BUILTIN_RDSEED32_STEP]
[case IX86_BUILTIN_RDSEED64_STEP]: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/rdrand-1.c (dg-final): Update scan string.
* gcc.target/i386/rdrand-2.c (dg-final): Ditto.
* gcc.target/i386/rdrand-3.c (dg-final): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
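
For readers following the BEXTRI hunk in the diff below, here is a plain C 
model of how the immediate is decoded and clamped there.  It is a sketch 
only: bextri_u32 is a made-up helper for the 32-bit case and is not part 
of the patch.

```c
#include <stdint.h>

/* Model of the 32-bit case: the low byte of CONTROL is the index of the
   least significant bit to extract, the next byte is the field length.  */
uint32_t
bextri_u32 (uint32_t src, uint32_t control)
{
  const unsigned bitsize = 32;
  unsigned lsb_index = control & 0xff;
  unsigned length = (control >> 8) & 0xff;

  /* Nothing to extract: the result is zero.  */
  if (length == 0 || lsb_index >= bitsize)
    return 0;

  /* Clamp a field that would run past the top bit.  */
  if (length + lsb_index > bitsize)
    length = bitsize - lsb_index;

  /* LENGTH is now in 1..32; build the mask in 64 bits so that a length
     of 32 does not shift a 32-bit value by its full width.  */
  uint32_t mask = (uint32_t) ((UINT64_C (1) << length) - 1);
  return (src >> lsb_index) & mask;
}
```

For example, a control value of (8 << 8) | 4 extracts bits 4..11 of the 
source.
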
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 3b6c4a85579..fec5cef0b55 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1270,8 +1270,8 @@ BDESC (OPTION_MASK_ISA_BMI, 0, CODE_FOR_tzcnt_si, 
"__builtin_ia32_tzcnt_u32", IX
 BDESC (OPTION_MASK_ISA_BMI | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_tzcnt_di, 
"__builtin_ia32_tzcnt_u64", IX86_BUILTIN_TZCNT64, UNKNOWN, (int) 
UINT64_FTYPE_UINT64)
 
 /* TBM */
-BDESC (OPTION_MASK_ISA_TBM, 0, CODE_FOR_tbm_bextri_si, 
"__builtin_ia32_bextri_u32", IX86_BUILTIN_BEXTRI32, UNKNOWN, (int) 
UINT_FTYPE_UINT_UINT)
-BDESC (OPTION_MASK_ISA_TBM | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_tbm_bextri_di, 
"__builtin_ia32_bextri_u64", IX86_BUILTIN_BEXTRI64, UNKNOWN, (int) 
UINT64_FTYPE_UINT64_UINT64)
+BDESC (OPTION_MASK_ISA_TBM, 0, CODE_FOR_nothing, "__builtin_ia32_bextri_u32", 
IX86_BUILTIN_BEXTRI32, UNKNOWN, (int) UINT_FTYPE_UINT_UINT)
+BDESC (OPTION_MASK_ISA_TBM | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, 
"__builtin_ia32_bextri_u64", IX86_BUILTIN_BEXTRI64, UNKNOWN, (int) 
UINT64_FTYPE_UINT64_UINT64)
 
 /* F16C */
 BDESC (OPTION_MASK_ISA_F16C, 0, CODE_FOR_vcvtph2ps, 
"__builtin_ia32_vcvtph2ps", IX86_BUILTIN_CVTPH2PS, UNKNOWN, (int) 
V4SF_FTYPE_V8HI)
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 9de6f5029b9..d8368bfd4a9 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -11709,24 +11709,26 @@ ix86_expand_builtin (tree exp, rtx target, rtx 
subtarget,
 
 case IX86_BUILTIN_BEXTRI32:
 case IX86_BUILTIN_BEXTRI64:
+  mode = (fcode == IX86_BUILTIN_BEXTRI32 ? SImode : DImode);
+
   arg0 = CALL_EXPR_ARG (exp, 0);
   arg1 = CALL_EXPR_ARG (exp, 1);
   op0 = expand_normal (arg0);
   op1 = expand_normal (arg1);
-  icode = (fcode == IX86_BUILTIN_BEXTRI32
- ? CODE_FOR_tbm_bextri_si
- : CODE_FOR_tbm_bextri_di);
+
   if (!CONST_INT_P (op1))
-{
-  error ("last argument must be an immediate");
-  return const0_rtx;
-}
+   {
+ error ("last argument must be an immediate");
+ return const0_rtx;
+   }
   else
-{
-  unsigned char length = (INTVAL (op1) >> 8) & 0xFF;
-  unsigned char lsb_index = INTVAL (op1) & 0xFF;
-  op1 = GEN_INT (length);
-  op2 = GEN_INT (lsb_index);
+   {
+ unsigned char lsb_index = UINTVAL (op1);
+ unsigned char length = UINTVAL (op1) >> 8;
+
+ unsigned char bitsize = GET_MODE_BITSIZE (mode);
+
+ icode = code_for_tbm_bextri (mode);
 
  mode1 = insn_data[icode].operand[1].mode;
  if (!insn_data[icode].operand[1].predicate (op0, mode1))
@@ -11737,25 +11739,32 @@ ix86_expand_builtin (tree exp, rtx target, rtx 
subtarget,
  || !register_operand (target, mode0))
target = gen_reg_rtx (mode0);
 
-  pat = GEN_FCN (icode) (target, op0, op1, op2);
-  if (pat)
-emit_insn (pat);
-  return target;
-}
+ if (length == 0 || lsb_index >= bitsize)
+   {
+ emit_move_insn (target, const0_rtx);
+ return target;
+   }
+
+ if (length + lsb_index > bitsize)
+   length = bitsize - lsb_index;
+
+ op1 = GEN_INT (length);
+ op2 = GEN_INT (lsb_index);
+
+ emit_insn (GEN_FCN (icode) (target, op0, op1, op2));
+ return target;
+   }
 
 case IX86_BUILTIN_RDRAND16_STEP:
-  icode = CODE_FOR_rdrandhi_1;
-  mode0 = HImode;
+  mode = HImode;
   goto rdrand_step;
 
 case IX86_BUILTIN_RDRAND32_STEP:
-  icode = CODE_FOR_rdrandsi_1;
-  mode0 = SImode;
+  mode = SImode;
   goto rdrand_step;
 
 case IX86_BUILTIN_RDRAND64_STEP:
-  icode = CODE_FOR_rdranddi_1;
-  mode0 = DImode;
+  mode = DImode;
 
 rdrand_step:
   arg0 = CALL_EXPR_ARG (exp, 0);
@@ -11766,16 

[PATCH][Arm] Auto-vectorization for MVE: vsub

2020-08-17 Thread Dennis Zhang

Hi all,

This patch enables MVE vsub instructions for auto-vectorization.
It adds RTL templates for the MVE vsub instructions that use 'minus' instead 
of an unspec expression, so that the vectorizer can recognize them.
The MVE target is added to the sub<mode>3 optab, which is modified to use 
a mode iterator that selects the modes available on each target.
MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to 
support vectorization.
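
As an illustration, this is the kind of loop the change is aimed at.  It 
is a sketch rather than one of the new tests; vec_sub_s32 is a made-up 
name, and whether the loop is actually vectorized to vsub depends on the 
options and target used (an MVE-enabled -march and -O3 are assumed).

```c
#include <stdint.h>

/* Element-wise subtraction over two arrays.  With MVE vector modes
   preferred, the vectorizer can process four 32-bit lanes per step.  */
void
vec_sub_s32 (int32_t *restrict dest, const int32_t *restrict a,
	     const int32_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dest[i] = a[i] - b[i];
}
```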

This patch also fixes the 'vreinterpretq_*.c' MVE intrinsic tests. The tests 
produce unexpected instruction counts because of ICF optimization.
The problem is exposed by the MVE vector modes enabled in this patch, 
so it is corrected here to avoid test failures.

MVE instructions are documented here: 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics

The patch is regtested for arm-none-eabi and bootstrapped for 
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-08-10  Dennis Zhang  

* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
* config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
(TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
(TARGET_NEON_MVE_HFP): Likewise.
* config/arm/iterators.md (VSEL): New mode iterator to select modes
for corresponding targets.
* config/arm/mve.md (mve_vsubq): New entry for vsub instruction
using expression 'minus'.
(mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
* config/arm/neon.md (sub<mode>3): Removed here. Integrated in the
sub<mode>3 in vec-common.md
* config/arm/vec-common.md (sub<mode>3): Enable MVE target. Use VSEL
to select available modes. Exclude TARGET_NEON_FP16INST from
TARGET_NEON statement. Integrate TARGET_NEON_FP16INST which is
originally in neon.md.

gcc/testsuite/ChangeLog:

2020-08-10  Dennis Zhang  

* gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
option -fno-ipa-icf and change the instruction count from 8 to 16.
* gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
* gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
* gcc.target/arm/mve/vect/vect_sub_0.c: New test.
* gcc.target/arm/mve/vect/vect_sub_1.c: New test.
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 30e1d6dc994..eb8c9599357 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -334,6 +334,14 @@ emission of floating point pcs attributes.  */
 		isa_bit_mve_float) \
 			   && !TARGET_GENERAL_REGS_ONLY)
 
+#define TARGET_NEON_IWMMXT	(TARGET_NEON || TARGET_REALLY_IWMMXT)
+#define TARGET_NEON_IWMMXT_MVE	(TARGET_NEON || TARGET_REALLY_IWMMXT \
+ || TARGET_HAVE_MVE)
+#define TARGET_NEON_IWMMXT_MVE_FP ((TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT) \
+   || TARGET_NEON || TARGET_REALLY_IWMMXT)
+#define TARGET_NEON_MVE_HFP	((TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT) \
+ || TARGET_NEON_FP16INST)
+
 /* MVE have few common instructions as VFP, like VLDM alias VPOP, VLDR, VSTM
alia VPUSH, VSTR and VMOV, VMSR and VMRS.  In the same manner it updates few
registers such as FPCAR, FPCCR, FPDSCR, FPSCR, MVFR0, MVFR1 and MVFR2.  All
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6b7ca829f1c..dcbcbbeced0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28913,6 +28913,30 @@ arm_preferred_simd_mode (scalar_mode mode)
   default:;
   }
 
+  if (TARGET_HAVE_MVE)
+switch (mode)
+  {
+  case QImode:
+	return V16QImode;
+  case HImode:
+	return V8HImode;
+  case SImode:
+	return V4SImode;
+
+  default:;
+  }
+
+  if (TARGET_HAVE_MVE_FLOAT)
+switch (mode)
+  {
+  case HFmode:
+	return V8HFmode;
+  case SFmode:
+	return V4SFmode;
+
+  default:;
+  }
+
   return word_mode;
 }
 
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 0bc9eba0722..52c3a8a4355 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -80,6 +80,19 @@
 ;; Integer and float modes supported by Neon and IWMMXT but not MVE.
 (define_mode_iterator VNINOTM1 [V2SI V4HI V8QI V2SF])
 
+;; Select modes for NEON, IWMMXT and MVE.
+(define_mode_iterator VSEL [(V16QI "TARGET_NEON_IWMMXT_MVE")
+			(V8HI  "TARGET_NEON_IWMMXT_MVE")
+			  

RE: [PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V macro definitions.

2020-08-17 Thread Carl Love via Gcc-patches
Bill:

On Mon, 2020-08-17 at 13:09 -0500, Bill Schmidt wrote:
> > 
> > There are three prototypes __builtin_cfuged, __builtin_pdepd,
> > __builtin_pextd defined in the document.
> > 
> > The corresponding builtin definitions in  GCC are:
> > 
> > __builtin_altivec_cfuged, __builtin_altivec_pdepd,
> > __builtin_altivec_pextd
> > 
> > which does not match the defined prototype in the document.
> 
> 
> These are scalar instructions, not vector, so they should not be
> using 
> any flavor of "V".  They should be using BU_P10_MISC_n, where n is
> the 
> number of arguments.

Yes, it looks like those are the scalar versions.  I got them
mixed up with the vector definitions

   vector unsigned long long int vec_pdep()
   vector unsigned long long int vec_pext ()
   vector unsigned long long int vec_cfuge ()

I was thinking the __builtin_name() was also referring to the vector
versions.

So, given that there are separate definitions, it does appear that the
names are all consistent with the documentation.  Thanks Bill.

Carl



Re: [EXTERNAL] Re: [PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V macro definitions.

2020-08-17 Thread Bill Schmidt via Gcc-patches

On 8/17/20 12:13 PM, Carl Love wrote:

Segher, Bill, Peter:

On Fri, 2020-08-14 at 19:42 -0500, Segher Boessenkool wrote:

Do the names agree with the (future) documentation now?

Did not double check on the documentation.

Someone should...

Looking at the box document "Proposed function Prototypes for P10".

There are a number of builtins of the form "name()" which get expanded
to

  __builtin_altivec_name or __builtin_vsx_name.

But there does not appear to be any additional defined prototype for
the __builtin_altivec_name or __builtin_vsx_name in the document so we
don't need to worry about these prototypes as far as I can see.


There are three prototypes __builtin_cfuged, __builtin_pdepd,
__builtin_pextd defined in the document.

The corresponding builtin definitions in  GCC are:

   __builtin_altivec_cfuged, __builtin_altivec_pdepd,
__builtin_altivec_pextd

which does not match the defined prototype in the document.



These are scalar instructions, not vector, so they should not be using 
any flavor of "V".  They should be using BU_P10_MISC_n, where n is the 
number of arguments.


Bill



I don't see any defines in gcc/config/rs6000 that would map
__builtin_name to __builtin_altivec_name so these three appear to be
unsupported as far as I can see.  I assume adding

   #define __builtin_name  __builtin_altivec_name

to gcc/config/rs6000/altivec.h would be the easiest way to define the
prototypes from the document.  I can add the defines if you think that
is the correct fix.  Please let me know.


The MMA related builtins at the end of the document appear to have the
proper define BU_MMA_# macro expansions to generate the defined
prototype names.


Looking at the builtin definitions in box for RFC 2608, RFC 2609, RFC
2629 the builtins are all of the form name() so I don't see any issues
with the internal GCC name changes for the builtins in these documents.

   Carl



Re: [PATCH] x86_64: PR rtl-optimization/92180: class_likely_spilled vs. cant_combine_insn.

2020-08-17 Thread Segher Boessenkool
Hi!

On Mon, Aug 17, 2020 at 01:06:10PM +0200, Uros Bizjak wrote:
> On Mon, Aug 17, 2020 at 12:42 PM Roger Sayle  
> wrote:
> > (insn 14 7 15 2 (set (reg/i:SI 0 ax)
> > (subreg:SI (reg:DI 84) 0)) "pr92180.c":5:1 67 {*movsi_internal}
> >  (expr_list:REG_DEAD (reg:DI 84)
> > (nil)))
> >
> > Normally, combine/simplify-rtx would notice that insns 6 and 7
> > (which update highpart bits) are unnecessary as the final insn 14
> > only requires the lowpart bits.  The complication is that insn 14
> > sets a hard register in targetm.class_likely_spilled_p which
> > prevents combine from performing its simplifications, and removing
> > the redundant instructions.

> I think that fwprop interferes with recent change to combine, where
> combine won't propagate hard registers anymore.

It won't propagate move insns from a hard non-fixed register to a
pseudo into other insns, yeah.  But that does not apply here?

> So, following that
> change, there is no point for fwprop to create instructions that
> combine won't be able to process. Alternatively, perhaps fwprop should
> be prevented from propagating likely_spilled hard registers?
> 
> Let's ask Segher for his opinion.

I have no opinion about class_likely_spilled_p; it is just a gross
target hack as far as I can see.  (I wonder how much of that is still
useful with LRA?)

Maybe combine could move return values in a hard reg through a pseudo?
So pretty much the same as make_more_copies, but the other way around.
You'll get the copy to a pseudo (which is in SImode here) as a separate
insn that combines with the previous insns fine, and RA will give the
pseudo the same hard register in all cases where that is beneficial.


Segher


Re: [RFC PATCH v1 1/1] PPC64: Implement POWER Architecture Vector Function ABI.

2020-08-17 Thread GT via Gcc-patches
‐‐‐ Original Message ‐‐‐
On Thursday, August 13, 2020 6:49 PM, Segher Boessenkool wrote:

> Hi!
>
> This is about the Power binding to some OpenMP API, right? It has
> nothing to do with "vector" or "ABI" -- we have vectors already, and
> we have ABIs already, more than enough of each.
>
> It is very very VERY hard to review this without being told the proper
> setting here.
>

What this is about:

David Edelsohn wanted to have new library functions, one for each of these 6
single-precision functions: sinf, cosf, sincosf, expf, logf, powf; and these
6 double-precision functions: sin, cos, sincos, exp, log, and pow.

For the single-precision functions, the corresponding new functions would
compute 4 results simultaneously. For the double-precision functions, the new
ones would compute 2 results simultaneously.
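
To make the intended shape concrete, here is a rough sketch of "4 results at once" and "2 results at once". The names sinf_x4_sketch and sin_x2_sketch are hypothetical (the real entry points are not given in this mail), and a scalar loop stands in for whatever vector code the library would actually use.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical name: one call produces four single-precision results.
// A plain loop stands in for the real vector implementation.
std::array<float, 4>
sinf_x4_sketch (const std::array<float, 4> &x)
{
  std::array<float, 4> r{};
  for (std::size_t i = 0; i < x.size (); ++i)
    r[i] = std::sin (x[i]);
  return r;
}

// Hypothetical name: one call produces two double-precision results.
std::array<double, 2>
sin_x2_sketch (const std::array<double, 2> &x)
{
  std::array<double, 2> r{};
  for (std::size_t i = 0; i < x.size (); ++i)
    r[i] = std::sin (x[i]);
  return r;
}
```

The other functions in the list (cos, exp, log, pow and so on) would follow the same pattern, with pow taking two input arrays.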

x86_64 has already done something very similar so I thought I would adapt as
much of their documentation and implementation as I could for PPC64.

Let's start with that. Comments so far?

Bert.


Re: [PATCH][Hashtable 5/6] Remove H1/H2 template parameters

2020-08-17 Thread François Dumont via Gcc-patches

Hi

    Here is the new proposal.

    As we can't remove template parameters, I simply restored those that
I tried to pass differently, _H2 and _ExtractKey, so eventually I only
remove usage of _Hash, which I renamed to _Unused. Maybe I can keep the
doc about it in hashtable.h and just add a remark saying that it is now
unused.


    For _RangeHash, formerly _H2, and _ExtractKey I just stopped
maintaining any storage. When we need those I always use a
value-initialized instance. I kind of prefer the value-initialization syntax
because you can't confuse it with a function call, but let me know if it
is wrong and I should use _ExtractKey() or _RangeHash(). I also added some
static assertions about those types regarding their noexcept qualifications.
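
A small sketch of the idea, not the code in the patch: the functors are never stored, a value-initialized temporary is created at each use, and static assertions check the noexcept properties. All names here (first_of_pair_sketch, mod_range_hash_sketch, stateless_policies_sketch) are hypothetical stand-ins for _ExtractKey, _RangeHash and the _Hashtable internals.

```cpp
#include <cstddef>
#include <functional>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for _ExtractKey: returns the key of a pair.
struct first_of_pair_sketch
{
  template<typename Pair>
    const typename Pair::first_type&
    operator() (const Pair& p) const noexcept { return p.first; }
};

// Hypothetical stand-in for _RangeHash: maps a hash code into [0, n).
struct mod_range_hash_sketch
{
  std::size_t
  operator() (std::size_t code, std::size_t n) const noexcept
  { return code % n; }
};

template<typename ExtractKey, typename RangeHash>
  struct stateless_policies_sketch
  {
    // With no stored instance, each use creates a temporary, so that
    // must be cheap and must not throw.
    static_assert (std::is_nothrow_default_constructible<ExtractKey>::value,
                   "ExtractKey must be nothrow default constructible");
    static_assert (std::is_nothrow_default_constructible<RangeHash>::value,
                   "RangeHash must be nothrow default constructible");
    static_assert (noexcept (std::declval<const RangeHash&> ()
                             (std::size_t (), std::size_t ())),
                   "RangeHash call must be noexcept");

    template<typename Value>
      static std::size_t
      bucket_index (const Value& v, std::size_t bucket_count)
      {
        // ExtractKey{} is value initialization; ExtractKey() would do
        // the same here but reads like a function call.
        const auto& key = ExtractKey{} (v);
        using key_type = typename std::decay<decltype (key)>::type;
        return RangeHash{} (std::hash<key_type>{} (key), bucket_count);
      }
  };
```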


    I also included in this patch the few changes left from [Hashtable
0/6], which are mostly signature cleanups of _M_insert_unique_node and
_M_insert_multi_node, since the key part can be extracted from the inserted
node.


    Tested under Linux x86_64, OK to commit?

François

On 06/08/20 11:27 am, Jonathan Wakely wrote:
> On 06/08/20 08:35 +0200, François Dumont wrote:
> > On 17/07/20 1:35 pm, Jonathan Wakely wrote:
> > > I really like the general idea of getting rid of some of the
> > > complexity and not supporting infinite customization. But we can do
> > > that without changing mangled names of the _Hashtable specializations.
> >
> > I didn't think we needed to keep ABI compatibility for extensions.
>
> These aren't extensions though, they're part of std::unordered_map
> etc.
>
> Just because something like _Vector_base is an internal type rather
> than something defined in the standard doesn't mean we can just change
> its ABI, because that would change the ABI of std::vector. It's the same
> here.
>
> Changing _Hashtable affects all users of std::unordered_map etc.




diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 7b772a475e3..1ba32a3c7e2 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -69,21 +69,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  and returns a bool-like value that is true if the two objects
*  are considered equal.
*
-   *  @tparam _H1  The hash function. A unary function object with
+   *  @tparam _Hash  The hash function. A unary function object with
*  argument type _Key and result type size_t. Return values should
*  be distributed over the entire range [0, numeric_limits<size_t>::max()].
*
-   *  @tparam _H2  The range-hashing function (in the terminology of
+   *  @tparam _RangeHash  The range-hashing function (in the terminology of
*  Tavori and Dreizin).  A binary function object whose argument
*  types and result type are all size_t.  Given arguments r and N,
*  the return value is in the range [0, N).
*
-   *  @tparam _Hash  The ranged hash function (Tavori and Dreizin). A
-   *  binary function whose argument types are _Key and size_t and
-   *  whose result type is size_t.  Given arguments k and N, the
-   *  return value is in the range [0, N).  Default: hash(k, N) =
-   *  h2(h1(k), N).  If _Hash is anything other than the default, _H1
-   *  and _H2 are ignored.
+   *  @tparam _Unused  Not used.
*
*  @tparam _RehashPolicy  Policy class with three members, all of
*  which govern the bucket count. _M_next_bkt(n) returns a bucket
@@ -91,9 +86,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  bucket count appropriate for an element count of n.
*  _M_need_rehash(n_bkt, n_elt, n_ins) determines whether, if the
*  current bucket count is n_bkt and the current element count is
-   *  n_elt, we need to increase the bucket count.  If so, returns
-   *  make_pair(true, n), where n is the new bucket count.  If not,
-   *  returns make_pair(false, )
+   *  n_elt, we need to increase the bucket count for n_ins insertions.
+   *  If so, returns make_pair(true, n), where n is the new bucket count. If
+   *  not, returns make_pair(false, )
*
*  @tparam _Traits  Compile-time class with three boolean
*  std::integral_constant members:  __cache_hash_code, __constant_iterators,
@@ -168,19 +163,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*/
   template
 class _Hashtable
 : public __detail::_Hashtable_base<_Key, _Value, _ExtractKey, _Equal,
-   _H1, _H2, _Hash, _Traits>,
+   _Hash, _RangeHash, _Unused, _Traits>,
   public __detail::_Map_base<_Key, _Value, _Alloc, _ExtractKey, _Equal,
- _H1, _H2, _Hash, _RehashPolicy, _Traits>,
+ _Hash, _RangeHash, _Unused,
+ _RehashPolicy, _Traits>,
   public __detail::_Insert<_Key, _Value, _Alloc, _ExtractKey, _Equal,
-			   _H1, _H2, _Hash, _RehashPolicy, _Traits>,
+			   _Hash, _RangeHash, _Unused,
+			   _RehashPolicy, _Traits>,
   public __detail::_Rehash_base<_Key, _Value, _Alloc, _ExtractKey, _Equal,
-_H1, _H2, _Hash, _RehashPolicy, _Traits>,
+_Hash, _RangeHash, _Unused,
+_RehashPolicy, _Traits>,
   public 

Re: [Patch, fortran] PRs 96100 and 96101 - Problems with string lengths of array constructors

2020-08-17 Thread Andre Vehreschild
Hi Paul,

> The fix for PR96101 is rather trivial and is the last chunk of the patch.
> Finding the fix for PR96100 took a silly amount of time but it now looks
> rather obvious. Trying to evaluate the string length by calling
> gfc_conv_expr_descriptor, when this function is already failing to find it
> is kind of doomed to failure :-) Therefore, gfc_conv_expr is used with
> tse.descriptor_only set. This has the effect of ignoring trailing array
> references and making use of gfc_conv_component_ref's being able to extract
> the hidden string length for deferred length components. Finally, the
> string length of the first element in the array constructor is set if this
> is a deferred length component.

The patch seems to be effective, although I don't understand why, when it is
a parenthesis op, you deduce that this has to be the string length.

The explanation for the second fix left me completely lost.

> Regtests OK on FC31/x86_64 - OK for master?

Tests ok with no regression. Therefore ok by me.

Regards,
Andre
>
> Paul
>
> This patch fixes PR96100 and PR96101 by making some minor changes to
> the evaluation of string lengths for gfc_conv_expr_descriptor.
>
> 2020-08-13  Paul Thomas  
>
> gcc/fortran
> PR fortran/96100
> PR fortran/96101
> * trans-array.c (get_array_charlen): Tidy up the evaluation of
> the string length for array constructors. Avoid trailing array
> references. Ensure string lengths of deferred length components
> are set. For parentheses operator apply string  length to both
> the primary expression and the enclosed expression.
>
> gcc/testsuite/
> PR fortran/96100
> PR fortran/96101
> * gfortran.dg/char_length_23.f90: New test.


--
Andre Vehreschild * Email: vehre ad gmx dot de


Re: PING: Fwd: [PATCH 2/2] Decouple adjust_range_from_scev from vr_values and value_range_equiv.

2020-08-17 Thread Andrew MacLeod via Gcc-patches

On 8/17/20 6:04 AM, Aldy Hernandez wrote:



On 8/14/20 7:16 PM, Andrew MacLeod wrote:

On 8/14/20 12:05 PM, Aldy Hernandez wrote:

I made some minor changes to the function comments.

gcc/ChangeLog:

* vr-values.c (check_for_binary_op_overflow): Change type of store
to range_query.
(vr_values::adjust_range_with_scev): Abstract most of the code...
(range_of_var_in_loop): ...here.  Remove value_range_equiv uses.
(simplify_using_ranges::simplify_using_ranges): Change type of store
to range_query.
* vr-values.h (class range_query): New.
(class simplify_using_ranges): Use range_query.
(class vr_values): Add OVERRIDE to get_value_range.
(range_of_var_in_loop): New.
---
 gcc/vr-values.c | 150 ++--
 gcc/vr-values.h |  23 ++--
 2 files changed, 88 insertions(+), 85 deletions(-)

diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index 9002d87c14b..5b7bae3bfb7 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -1004,7 +1004,7 @@ vr_values::extract_range_from_comparison (value_range_equiv *vr,
    overflow.  */

 static bool
-check_for_binary_op_overflow (vr_values *store,
+check_for_binary_op_overflow (range_query *store,
   enum tree_code subcode, tree type,
   tree op0, tree op1, bool *ovf)
 {
@@ -1737,22 +1737,18 @@ compare_range_with_value (enum tree_code comp, const value_range *vr,

   gcc_unreachable ();
 }
-/* Given a range VR, a LOOP and a variable VAR, determine whether it
-   would be profitable to adjust VR using scalar evolution information
-   for VAR.  If so, update VR with the new limits.  */
+
+/* Given a VAR in STMT within LOOP, determine the range of the
+   variable and store it in VR.  If no range can be determined, the
+   resulting range will be set to VARYING.  */

 void
-vr_values::adjust_range_with_scev (value_range_equiv *vr, class loop *loop,
-   gimple *stmt, tree var)
+range_of_var_in_loop (irange *vr, range_query *query,
+  class loop *loop, gimple *stmt, tree var)
 {
-  tree init, step, chrec, tmin, tmax, min, max, type, tem;
+  tree init, step, chrec, tmin, tmax, min, max, type;
   enum ev_direction dir;

-  /* TODO.  Don't adjust anti-ranges.  An anti-range may provide
- better opportunities than a regular range, but I'm not sure.  */
-  if (vr->kind () == VR_ANTI_RANGE)
-    return;
-


IIUC, you've switched to using the new API, so the bounds calls will
basically turn an ANTI range into a varying one, making [lbound, ubound]
be [MIN, MAX]?
So it's effectively a no-op, except we will not punt on getting a
range when VR is an anti range anymore, so that's goodness...


Yes.




   chrec = instantiate_parameters (loop, analyze_scalar_evolution (loop, var));


   /* Like in PR19590, scev can return a constant function. */
@@ -1763,16 +1759,17 @@ vr_values::adjust_range_with_scev (value_range_equiv *vr, class loop *loop,
 }

   if (TREE_CODE (chrec) != POLYNOMIAL_CHREC)
-    return;
+    {
+  vr->set_varying (TREE_TYPE (var));
+  return;
+    }


I'm seeing a lot of this pattern...
Maybe we should set vr to varying upon entry to the function as the
default return value, then we can just return like it did before in
all those places.


Better yet, since this routine doesn't "update" anymore and simply
returns a range, maybe it could instead return a boolean if it finds
a range rather than the current behaviour...

then those simply become

+    return false;

We won't have to intersect at the caller if we don't need to, and it's
useful information at other points to know a range was calculated
without having to see if varying_p () came back from the call.

i.e., the usage pattern would then be

value_range_equiv r;
if (range_of_var_in_loop (&r, this, loop, stmt, var))
    vr->intersect (&r);

This is the pattern we use throughout the ranger.
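
A self-contained sketch of that calling convention, using a toy range type rather than the real value_range or irange classes. Every name below (toy_range_sketch, range_of_var_sketch, adjust_range_sketch) is hypothetical; the "range" computed is deliberately trivial and only illustrates the bool-return contract.

```cpp
#include <algorithm>

// Toy closed integer range standing in for value_range / irange.
struct toy_range_sketch
{
  long lo;
  long hi;

  // Narrow *this to the overlap with OTHER (assumes they overlap).
  void intersect (const toy_range_sketch &other)
  {
    lo = std::max (lo, other.lo);
    hi = std::min (hi, other.hi);
  }
};

// Return true and fill *R only when a range was actually computed.
// On failure *R is left untouched, so no "varying" result is needed.
bool
range_of_var_sketch (toy_range_sketch *r, bool have_evolution,
                     long init, long final_value)
{
  if (!have_evolution)
    return false;
  r->lo = std::min (init, final_value);
  r->hi = std::max (init, final_value);
  return true;
}

// Caller side: intersect only when the query succeeded.
void
adjust_range_sketch (toy_range_sketch *vr, bool have_evolution,
                     long init, long final_value)
{
  toy_range_sketch r;
  if (range_of_var_sketch (&r, have_evolution, init, final_value))
    vr->intersect (r);
}
```

The early-exit paths in the callee then reduce to a bare "return false;", and the caller never has to test for a varying result.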


Done.






   init = initial_condition_in_loop_num (chrec, loop->num);
-  tem = op_with_constant_singleton_value_range (init);
-  if (tem)
-    init = tem;
+  if (TREE_CODE (init) == SSA_NAME)
+    query->get_value_range (init, stmt)->singleton_p ();
   step = evolution_part_in_loop_num (chrec, loop->num);
-  tem = op_with_constant_singleton_value_range (step);
-  if (tem)
-    step = tem;
+  if (TREE_CODE (step) == SSA_NAME)
+    query->get_value_range (step, stmt)->singleton_p ();


If I read this correctly, we get values for init and step... and if 
they are SSA_NAMES, then we query ranges, otherwise use what we got 
back.. So that would seem to be the same behaviour as before then..

Perhaps a comment is warranted? I had to read it a few times :-)


Indeed.  I am trying to do too much in one line.  I've added a comment.






   /* If STEP is symbolic, we can't know whether INIT will be the
  minimum or maximum value in the range.  Also, unless INIT is
@@ -1781,7 +1778,10 @@ vr_values::adjust_range_with_scev (value_range_equiv *vr, class loop *loop,

   if 

Re: [PATCH] rs6000: unaligned VSX in memcpy/memmove expansion

2020-08-17 Thread will schmidt via Gcc-patches
On Fri, 2020-08-14 at 17:59 -0500, Aaron Sawdey via Gcc-patches wrote:

Hi,

> This patch adds a few new instructions to inline expansion of
> memcpy/memmove. Generation of all these is controlled by

s/is/are/ ?

> the option -mblock-ops-unaligned-vsx which is set on by default if the
> target has TARGET_EFFICIENT_UNALIGNED_VSX.
>  * unaligned vsx load/store (V2DImode)
>  * unaligned vsx pair load/store (POImode) which is also controlled
>by -mblock-ops-vector-pair in case it is not wanted at some point.
>The default for this option is also for it to be on if the target has
>TARGET_EFFICIENT_UNALIGNED_VSX.

'this option' meaning the -mblock-ops-vector-pair option?


>  * unaligned vsx lxvl/stxvl but generally only to do the remainder
>of a copy/move we started with some vsx loads/stores, and also prefer
>to use lb/lh/lw/ld if the remainder is 1/2/4/8 bytes.
> 
> Testing of this is actually accomplished by gcc.dg/memcmp-1.c which does
> two memcpy() for each memcmp(). If the memcpy() calls don't do the right
> thing then the memcmp() will fail unexpectedly.



> 
> Regstrap passed on ppc64le power9 and the memcmp-1.c test passes on
> power10 simulator, ok for trunk?
> 
> Thanks!
> Aaron
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000-string.c (gen_lxvl_stxvl_move):
>   Helper function.
>   (expand_block_move): Add lxvl/stxvl, vector pair, and
>   unaligned VSX.
>   * config/rs6000/rs6000.c (rs6000_option_override_internal):
>   Default value for -mblock-ops-vector-pair.
>   * config/rs6000/rs6000.opt: Add -mblock-ops-vector-pair.
> ---
>  gcc/config/rs6000/rs6000-string.c | 105 ++
>  gcc/config/rs6000/rs6000.c|  14 +++-
>  gcc/config/rs6000/rs6000.opt  |   4 ++
>  3 files changed, 107 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-string.c 
> b/gcc/config/rs6000/rs6000-string.c
> index c35d93180ca..ce6db2ba14d 100644
> --- a/gcc/config/rs6000/rs6000-string.c
> +++ b/gcc/config/rs6000/rs6000-string.c
> @@ -2708,6 +2708,36 @@ gen_lvx_v4si_move (rtx dest, rtx src)
>  return gen_altivec_lvx_v4si_internal (dest, src);
>  }
> 
> +static rtx
> +gen_lxvl_stxvl_move (rtx dest, rtx src, int length)
> +{
> +  gcc_assert (MEM_P (dest) ^ MEM_P (src));
> +  gcc_assert (GET_MODE (dest) == V16QImode && GET_MODE (src) == V16QImode);
> +  gcc_assert (length <= 16);
> +
> +  bool is_store = MEM_P (dest);
> +
> +  /* If the address form is not a simple register, make it so.  */

Possibly just cosmetic - Would ' /*  Force dest and src to be simple
registers if necessary.  */' make more sense?

> +  if (is_store)
> +{
> +  dest = XEXP (dest, 0);
> +  if (!REG_P (dest))
> + dest = force_reg (Pmode, dest);
> +}
> +  else
> +{
> +  src = XEXP (src, 0);
> +  if (!REG_P (src))
> + src = force_reg (Pmode, src);
> +}



> +
> +  rtx len = force_reg (DImode, gen_int_mode (length, DImode));
> +  if (is_store)
> +return gen_stxvl (src, dest, len);
> +  else
> +return  gen_lxvl (dest, src, len);
> +}
> +
>  /* Expand a block move operation, and return 1 if successful.  Return 0
> if we should let the compiler generate normal code.
> 

ok

> @@ -2750,18 +2780,57 @@ expand_block_move (rtx operands[], bool might_overlap)
>if (bytes > rs6000_block_move_inline_limit)
>  return 0;
> 
> +  int orig_bytes = bytes;
>for (offset = 0; bytes > 0; offset += move_bytes, bytes -= move_bytes)
>  {
>union {
> - rtx (*movmemsi) (rtx, rtx, rtx, rtx);
>   rtx (*mov) (rtx, rtx);
> + rtx (*movlen) (rtx, rtx, int);
>} gen_func;
>machine_mode mode = BLKmode;
>rtx src, dest;
> -
> -  /* Altivec first, since it will be faster than a string move
> -  when it applies, and usually not significantly larger.  */
> -  if (TARGET_ALTIVEC && bytes >= 16 && align >= 128)
> +  bool move_with_length = false;
> +
> +  /* Use POImode for paired vsx load/store.  Use V2DI for single
> +  unaligned vsx load/store, for consistency with what other
> +  expansions (compare) already do, and so we can use lxvd2x on
> +  p8.  Order is VSX pair unaligned, VSX unaligned, Altivec, vsx
> +  with length < 16 (if allowed), then smaller gpr
> +  load/store.  */

s/vsx/VSX/
s/smaller// ?
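
For readers following the ordering in that comment, here is a simplified model of how a copy might be split into chunks. It is a sketch under loud assumptions: the names are hypothetical, alignment checks are folded into the capability flags, and the rule for when lxvl/stxvl is used is my reading of the patch description, not the code in expand_block_move.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical capability flags; the real code tests target macros
// and alignment, which this sketch folds into plain booleans.
struct copy_caps_sketch
{
  bool vector_pair;     // 32-byte paired VSX load/store allowed
  bool unaligned_vsx;   // 16-byte unaligned VSX load/store allowed
  bool altivec_aligned; // 16-byte Altivec move usable
  bool lxvl_stxvl;      // load/store with length allowed
};

// Split BYTES into chunk sizes, trying the widest option first.
std::vector<std::size_t>
plan_chunks_sketch (std::size_t bytes, const copy_caps_sketch &caps)
{
  std::vector<std::size_t> chunks;
  bool used_vsx = false;
  while (bytes > 0)
    {
      std::size_t n;
      // Remainders of 1/2/4/8 bytes prefer a single GPR load/store.
      bool is_gpr_size = bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8;
      if (caps.vector_pair && bytes >= 32)
        { n = 32; used_vsx = true; }
      else if (caps.unaligned_vsx && bytes >= 16)
        { n = 16; used_vsx = true; }
      else if (caps.altivec_aligned && bytes >= 16)
        n = 16;
      else if (caps.lxvl_stxvl && used_vsx && bytes < 16 && !is_gpr_size)
        n = bytes;   // one lxvl/stxvl covers the whole remainder
      else if (bytes >= 8)
        n = 8;
      else if (bytes >= 4)
        n = 4;
      else if (bytes >= 2)
        n = 2;
      else
        n = 1;
      chunks.push_back (n);
      bytes -= n;
    }
  return chunks;
}
```

Under these assumptions a 45-byte copy becomes one 32-byte pair move plus one 13-byte lxvl/stxvl, while a 40-byte copy finishes with a plain 8-byte GPR move.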


> +
> +  if (TARGET_MMA && TARGET_BLOCK_OPS_UNALIGNED_VSX
> +   && TARGET_BLOCK_OPS_VECTOR_PAIR
> +   && bytes >= 32
> +   && (align >= 256 || !STRICT_ALIGNMENT))
> + {
> +   move_bytes = 32;
> +   mode = POImode;
> +   gen_func.mov = gen_movpoi;
> + }
> +  else if (TARGET_POWERPC64 && TARGET_BLOCK_OPS_UNALIGNED_VSX
> +&& VECTOR_MEM_VSX_P (V2DImode)
> +&& bytes >= 16 && (align >= 128 || !STRICT_ALIGNMENT))
> + {
> +   move_bytes = 16;
> +   mode = V2DImode;
> +   gen_func.mov = gen_vsx_movv2di_64bit;
> + }
> +  else if 

Re: [PATCH v2] C-SKY: Support -mfloat-abi=hard.

2020-08-17 Thread Xianmiao Qu

Hi Jojo,


On 8/17/20 7:09 PM, Jojo R wrote:

diff --git a/gcc/config/csky/csky.c b/gcc/config/csky/csky.c
index 7ba3ed3..b71291a 100644
--- a/gcc/config/csky/csky.c
+++ b/gcc/config/csky/csky.c
@@ -328,6 +328,16 @@ csky_cpu_cpp_builtins (cpp_reader *pfile)
  {
builtin_define ("__csky_hard_float__");
builtin_define ("__CSKY_HARD_FLOAT__");
+  if (TARGET_HARD_FLOAT_ABI)
+{
+  builtin_define ("__csky_hard_float_abi__");
+  builtin_define ("__CSKY_HARD_FLOAT_ABI__");
+}
+  if (TARGET_SINGLE_FPU)
+{
+  builtin_define ("__csky_hard_float_fpu_sf__");
+  builtin_define ("__CSKY_HARD_FLOAT_FPU_SF__");
+}
  }


There is one more thing you should pay attention to: if the number of spaces
at the beginning of a line reaches 8, you should use a tab instead of 8 spaces.



Thanks,

Xianmiao



Re: [PATCH] improve memcmp and memchr constant folding (PR 78257)

2020-08-17 Thread Jeff Law via Gcc-patches
On Sat, 2020-08-15 at 16:19 +0200, Christophe Lyon wrote:
> Hi Martin,
> 
> 
> On Sat, 15 Aug 2020 at 01:14, Martin Sebor via Gcc-patches wrote:
> > On 8/13/20 11:44 AM, Martin Sebor wrote:
> > > On 8/13/20 10:21 AM, Jeff Law wrote:
> > > > On Fri, 2020-07-31 at 17:55 -0600, Martin Sebor via Gcc-patches wrote:
> > > > > The folders for these functions (and some others) call c_getstr
> > > > > which relies on string_constant to return the representation of
> > > > > constant strings.  Because the function doesn't handle constants
> > > > > of other types, including aggregates, memcmp or memchr calls
> > > > > involving those are not folded when they could be.
> > > > > 
> > > > > The attached patch extends the algorithm used by string_constant
> > > > > to also handle constant aggregates involving elements or members
> > > > > of the same types as native_encode_expr.  (The change restores
> > > > > the empty initializer optimization inadvertently disabled in
> > > > > the fix for pr96058.)
> > > > > 
> > > > > To avoid accidentally misusing either string_constant or c_getstr
> > > > > with non-strings I have introduced a pair of new functions to get
> > > > > the representation of those: byte_representation and getbyterep.
> > > > > 
> > > > > Tested on x86_64-linux.
> > > > > 
> > > > > Martin
> > > > > PR tree-optimization/78257 - missing memcmp optimization with
> > > > > constant arrays
> > > > > 
> > > > > gcc/ChangeLog:
> > > > > 
> > > > > PR middle-end/78257
> > > > > * builtins.c (expand_builtin_memory_copy_args): Rename called
> > > > > function.
> > > > > (expand_builtin_stpcpy_1): Remove argument from call.
> > > > > (expand_builtin_memcmp): Rename called function.
> > > > > (inline_expand_builtin_bytecmp): Same.
> > > > > * expr.c (convert_to_bytes): New function.
> > > > > (constant_byte_string): New function (formerly string_constant).
> > > > > (string_constant): Call constant_byte_string.
> > > > > (byte_representation): New function.
> > > > > * expr.h (byte_representation): Declare.
> > > > > * fold-const-call.c (fold_const_call): Rename called function.
> > > > > * fold-const.c (c_getstr): Remove an argument.
> > > > > (getbyterep): Define a new function.
> > > > > * fold-const.h (c_getstr): Remove an argument.
> > > > > (getbyterep): Declare a new function.
> > > > > * gimple-fold.c (gimple_fold_builtin_memory_op): Rename callee.
> > > > > (gimple_fold_builtin_string_compare): Same.
> > > > > (gimple_fold_builtin_memchr): Same.
> > > > > 
> > > > > gcc/testsuite/ChangeLog:
> > > > > 
> > > > > PR middle-end/78257
> > > > > * gcc.dg/memchr.c: New test.
> > > > > * gcc.dg/memcmp-2.c: New test.
> > > > > * gcc.dg/memcmp-3.c: New test.
> > > > > * gcc.dg/memcmp-4.c: New test.
> > > > > 
> > > > > diff --git a/gcc/expr.c b/gcc/expr.c
> > > > > index a150fa0d3b5..a124df54655 100644
> > > > > --- a/gcc/expr.c
> > > > > +++ b/gcc/expr.c
> > > > > @@ -11594,15 +11594,103 @@ is_aligning_offset (const_tree offset,
> > > > > const_tree exp)
> > > > > /* This must now be the address of EXP.  */
> > > > > return TREE_CODE (offset) == ADDR_EXPR && TREE_OPERAND (offset,
> > > > > 0) == exp;
> > > > >   }
> > > > > -
> > > > > -/* Return the tree node if an ARG corresponds to a string constant
> > > > > or zero
> > > > > -   if it doesn't.  If we return nonzero, set *PTR_OFFSET to the
> > > > > (possibly
> > > > > -   non-constant) offset in bytes within the string that ARG is
> > > > > accessing.
> > > > > -   If MEM_SIZE is non-zero the storage size of the memory is 
> > > > > returned.
> > > > > -   If DECL is non-zero the constant declaration is returned if
> > > > > available.  */
> > > > > -tree
> > > > > -string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree
> > > > > *decl)
> > > > > +/* If EXPR is a constant initializer (either an expression or
> > > > > CONSTRUCTOR),
> > > > > +   attempt to obtain its native representation as an array of
> > > > > nonzero BYTES.
> > > > > +   Return true on success and false on failure (the latter without
> > > > > modifying
> > > > > +   BYTES).  */
> > > > > +
> > > > > +static bool
> > > > > +convert_to_bytes (tree type, tree expr, vec *bytes)
> > > > > +{
> > > > > +  if (TREE_CODE (expr) == CONSTRUCTOR)
> > > > > +{
> > > > > +  /* Set to the size of the CONSTRUCTOR elements.  */
> > > > > +  unsigned HOST_WIDE_INT ctor_size = bytes->length ();
> > > > > +
> > > > > +  if (TREE_CODE (type) == ARRAY_TYPE)
> > > > > +{
> > > > > +  tree val, idx;
> > > > > +  tree eltype = TREE_TYPE (type);
> > > > > +  unsigned HOST_WIDE_INT elsize =
> > > > > +tree_to_uhwi (TYPE_SIZE_UNIT (eltype));
> > > > > +  unsigned HOST_WIDE_INT i, last_idx = HOST_WIDE_INT_M1U;
> > > > > +  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (expr), i, idx, val)
> > > > > +{
> > > > > +  /* Append zeros for 

Re: [PATCH] Add support for putting jump table into relocation read-only section

2020-08-17 Thread Segher Boessenkool
Hi!

On Mon, Aug 17, 2020 at 10:28:31AM +0800, HAO CHEN GUI wrote:
> >For the reloc,  my understanding is the jump table needs to be 
> >relocated if it's a non-relative jump table and PIC flag is set at the 
> >same time.

Yes, I did say the *existing* code seems sub-optimal, too :-)

> >According to the slice of code in stmt.c,  the non-relative jump table 
> >is created with PIC flag set when CASE_VECTOR_PC_RELATIVE is false, 
> >flag_pic is true and targetm.asm_out.generate_pic_addr_diff_vec is 
> >false. So I set the reloc to
> >
> >reloc = (! CASE_VECTOR_PC_RELATIVE && flag_pic &&
> >   ! targetm.asm_out.generate_pic_addr_diff_vec ()) ? 1 : 0;
> >
> >function_rodata_section is not only for jump tables; in other cases
> >it is not relro. I am not sure it's suitable to put the relro section
> >selection in it. Of course, I can create a separate function for the
> >jump table section selection and send its output to
> >function_rodata_section.
.data.rel.ro is just another kind of .rodata, one that *can* be
relocated.  So when we use it, fPIC or not doesn't matter.  Also, we can
just use the existing rodata functions for generating .data.rel.ro, and
it should simplify all code even.
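
As a rough model of the quoted proposal (not of the final design), the reloc computation and the resulting section choice could be sketched like this. The function names are hypothetical, the macros and the target hook are replaced by plain booleans, and the two section names are simply the ones mentioned in this thread.

```cpp
#include <string>

// Mirrors the quoted expression, with plain booleans standing in for
// CASE_VECTOR_PC_RELATIVE, flag_pic and the generate_pic_addr_diff_vec
// hook.
int
jump_table_reloc_sketch (bool case_vector_pc_relative, bool flag_pic,
                         bool generate_pic_addr_diff_vec)
{
  return (!case_vector_pc_relative && flag_pic
          && !generate_pic_addr_diff_vec) ? 1 : 0;
}

// Illustrative only: a relocatable table goes to the relro flavour of
// read-only data, anything else to ordinary .rodata.
std::string
jump_table_section_sketch (int reloc)
{
  return reloc ? ".data.rel.ro" : ".rodata";
}
```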

> -@deftypefn {Target Hook} {section *} TARGET_ASM_FUNCTION_RODATA_SECTION 
> (tree @var{decl})
> -Return the readonly data section associated with
> +@deftypefn {Target Hook} {section *} TARGET_ASM_FUNCTION_RODATA_SECTION 
> (tree @var{decl}, bool @var{section_reloc})
> +Return the readonly or reloc readonly data section associated with

Should this take the 2-bit int "reloc" field like other functions,
instead of this bool?


Segher


[committed] libstdc++: Remove inheritance from elements in std::tuple

2020-08-17 Thread Jonathan Wakely via Gcc-patches
This fixes a number of std::tuple bugs by no longer making use of the
empty base-class optimization. By using the C++20 [[no_unique_address]]
attribute we can always store the element as a data member, while still
compressing the layout of tuples containing empty types.

Since we no longer use inheritance we could also apply the compression
optimization for final types and for tuples of tuples, but doing so
would be an ABI break.

Using [[no_unique_address]] more liberally for the unstable std::__8
configuration is left for a later date. There may be reasons not to
apply the attribute unconditionally, e.g. see the discussion about
guaranteed elision in PR 94062.
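
A tiny illustration of the layout effect, separate from the actual std::tuple code: the types below are hypothetical, and the size checks assume GCC or Clang with the Itanium C++ ABI (other ABIs may ignore the attribute).

```cpp
struct empty_sketch { };
struct final_empty_sketch final { };

// Old approach: inherit from the empty element so that the empty
// base-class optimization removes its storage.
struct ebo_holder_sketch : empty_sketch { int value; };

// A plain data member of empty type still occupies storage.
struct plain_holder_sketch { empty_sketch head; int value; };

static_assert (sizeof (ebo_holder_sketch) == sizeof (int),
               "empty base adds no storage");
static_assert (sizeof (plain_holder_sketch) > sizeof (int),
               "plain empty member adds storage");

#if __has_cpp_attribute(no_unique_address)
// New approach: keep a data member but let it overlap other members.
struct attr_holder_sketch
{
  [[no_unique_address]] empty_sketch head;
  int value;
};

// Inheritance is impossible for a final type, but the attribute
// still applies to a member of that type.
struct attr_final_holder_sketch
{
  [[no_unique_address]] final_empty_sketch head;
  int value;
};

static_assert (sizeof (attr_holder_sketch) == sizeof (int),
               "attribute compresses the empty member");
static_assert (sizeof (attr_final_holder_sketch) == sizeof (int),
               "attribute also works for final types");
#endif
```

The last check shows why the compression could in principle be extended to final types, which is exactly the case where doing so would change the existing layout.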

libstdc++-v3/ChangeLog:

PR libstdc++/55713
PR libstdc++/71096
PR libstdc++/93147
* include/std/tuple [__has_cpp_attribute(no_unique_address)]
(_Head_base): New definition of the partial
specialization, using [[no_unique_address]] instead of
inheritance.
* testsuite/libstdc++-prettyprinters/48362.cc: Adjust expected
output.
* testsuite/20_util/tuple/comparison_operators/93147.cc: New test.
* testsuite/20_util/tuple/creation_functions/55713.cc: New test.
* testsuite/20_util/tuple/element_access/71096.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit 91e6226f880b048275a7ceedef716e159c7cefd9
Author: Jonathan Wakely 
Date:   Fri Aug 7 17:13:56 2020

libstdc++: Remove inheritance from elements in std::tuple

This fixes a number of std::tuple bugs by no longer making use of the
empty base-class optimization. By using the C++20 [[no_unique_address]]
attribute we can always store the element as a data member, while still
compressing the layout of tuples containing empty types.

Since we no longer use inheritance we could also apply the compression
optimization for final types and for tuples of tuples, but doing so
would be an ABI break.

Using [[no_unique_address]] more liberally for the unstable std::__8
configuration is left for a later date. There may be reasons not to
apply the attribute unconditionally, e.g. see the discussion about
guaranteed elision in PR 94062.

libstdc++-v3/ChangeLog:

PR libstdc++/55713
PR libstdc++/71096
PR libstdc++/93147
* include/std/tuple [__has_cpp_attribute(no_unique_address)]
(_Head_base): New definition of the partial
specialization, using [[no_unique_address]] instead of
inheritance.
* testsuite/libstdc++-prettyprinters/48362.cc: Adjust expected
output.
* testsuite/20_util/tuple/comparison_operators/93147.cc: New test.
* testsuite/20_util/tuple/creation_functions/55713.cc: New test.
* testsuite/20_util/tuple/element_access/71096.cc: New test.

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 0dc11768a90..d4a35f0fe7f 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -73,6 +73,58 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   bool = __empty_not_final<_Head>::value>
 struct _Head_base;
 
+#if __has_cpp_attribute(no_unique_address)
+  template
+struct _Head_base<_Idx, _Head, true>
+{
+  constexpr _Head_base()
+  : _M_head_impl() { }
+
+  constexpr _Head_base(const _Head& __h)
+  : _M_head_impl(__h) { }
+
+  constexpr _Head_base(const _Head_base&) = default;
+  constexpr _Head_base(_Head_base&&) = default;
+
+  template
+   constexpr _Head_base(_UHead&& __h)
+   : _M_head_impl(std::forward<_UHead>(__h)) { }
+
+  _GLIBCXX20_CONSTEXPR
+  _Head_base(allocator_arg_t, __uses_alloc0)
+  : _M_head_impl() { }
+
+  template
+   _Head_base(allocator_arg_t, __uses_alloc1<_Alloc> __a)
+   : _M_head_impl(allocator_arg, *__a._M_a) { }
+
+  template
+   _Head_base(allocator_arg_t, __uses_alloc2<_Alloc> __a)
+   : _M_head_impl(*__a._M_a) { }
+
+  template
+   _GLIBCXX20_CONSTEXPR
+   _Head_base(__uses_alloc0, _UHead&& __uhead)
+   : _M_head_impl(std::forward<_UHead>(__uhead)) { }
+
+  template
+   _Head_base(__uses_alloc1<_Alloc> __a, _UHead&& __uhead)
+   : _M_head_impl(allocator_arg, *__a._M_a, std::forward<_UHead>(__uhead))
+   { }
+
+  template
+   _Head_base(__uses_alloc2<_Alloc> __a, _UHead&& __uhead)
+   : _M_head_impl(std::forward<_UHead>(__uhead), *__a._M_a) { }
+
+  static constexpr _Head&
+  _M_head(_Head_base& __b) noexcept { return __b._M_head_impl; }
+
+  static constexpr const _Head&
+  _M_head(const _Head_base& __b) noexcept { return __b._M_head_impl; }
+
+  [[no_unique_address]] _Head _M_head_impl;
+};
+#else
   template<size_t _Idx, typename _Head>
 struct _Head_base<_Idx, _Head, true>
 : public _Head
@@ -119,6 +171,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   static constexpr const _Head&
   

[PATCH 4/X] libsanitizer: options: Add hwasan flags and argument parsing

2020-08-17 Thread Matthew Malcomson
These flags can't be used at the same time as any of the other
sanitizers.
We add an equivalent flag to -static-libasan in -static-libhwasan to
ensure static linking.

The -fsanitize=kernel-hwaddress option is for compiling code that targets
the kernel.  Its defaults allow compiling KASAN with tags as it is
currently implemented: we do not sanitize variables on the stack, and we
always recover from a detected bug.
Stack tagging in the kernel is a future aim.  I don't know of any reason
it would not work, but this has not yet been tested.

We introduce a backend hook `targetm.memtag.can_tag_addresses` that
indicates to the mid-end whether a target has a feature like AArch64 TBI
where the top byte of an address is ignored.
Without this feature hwasan sanitization is not done.
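
As a rough illustration of what a TBI-like feature permits (a sketch only,
not code from this patch): when the hardware ignores the top byte of an
address, a tag can live in bits 56-63 of a pointer without changing which
memory the pointer refers to.  The helper names and the 8-bit tag width
below are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: an 8-bit tag stored in bits 56..63 of a 64-bit address.  */
#define TBI_TAG_SHIFT 56
#define TBI_TAG_MASK ((uint64_t) 0xff << TBI_TAG_SHIFT)

/* Return ADDR with its top byte replaced by TAG.  */
static uint64_t
tbi_with_tag (uint64_t addr, uint8_t tag)
{
  return (addr & ~TBI_TAG_MASK) | ((uint64_t) tag << TBI_TAG_SHIFT);
}

/* Return the tag stored in the top byte of ADDR.  */
static uint8_t
tbi_extract_tag (uint64_t addr)
{
  return (uint8_t) (addr >> TBI_TAG_SHIFT);
}

/* Return ADDR with the top byte cleared, which is the part of the
   address that still selects the memory location.  */
static uint64_t
tbi_untagged (uint64_t addr)
{
  return addr & ~TBI_TAG_MASK;
}
```

Tagging and then untagging an address gives back the original value, and
the tag can be read back independently of the low 56 bits.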

gcc/ChangeLog:

* common.opt (flag_sanitize_recover): Default for kernel
hwaddress.
(static-libhwasan): New cli option.
* config/aarch64/aarch64.c (aarch64_can_tag_addresses): New.
(TARGET_MEMTAG_CAN_TAG_ADDRESSES): New.
* config/gnu-user.h (LIBHWASAN_EARLY_SPEC): hwasan equivalent of
asan command line flags.
* cppbuiltin.c (define_builtin_macros_for_compilation_flags):
Add hwasan equivalent of __SANITIZE_ADDRESS__.
* doc/invoke.texi: Document hwasan command line flags.
* doc/tm.texi: Document new hook.
* doc/tm.texi.in: Document new hook.
* flag-types.h (enum sanitize_code): New sanitizer values.
* gcc.c (STATIC_LIBHWASAN_LIBS): New macro.
(LIBHWASAN_SPEC): New macro.
(LIBHWASAN_EARLY_SPEC): New macro.
(SANITIZER_EARLY_SPEC): Update to include hwasan.
(SANITIZER_SPEC): Update to include hwasan.
(sanitize_spec_function): Use hwasan options.
* opts.c (finish_options): Describe conflicts between address
sanitizers.
(sanitizer_opts): Introduce new sanitizer flags.
(common_handle_option): Add defaults for kernel sanitizer.
* params.opt (hwasan-instrument-stack): New.
(hwasan-random-frame-tag): New.
(hwasan-instrument-allocas): New.
(hwasan-instrument-reads): New.
(hwasan-instrument-writes): New.
(hwasan-instrument-mem-intrinsics): New.
* target.def (HOOK_PREFIX): Add new hook.
(can_tag_addresses): Add new hook under memtag prefix.
* targhooks.c (default_memtag_can_tag_addresses): New.
* targhooks.h (default_memtag_can_tag_addresses): New decl.
* toplev.c (process_options): Ensure hwasan only on TBI
architectures.

gcc/c-family/ChangeLog:

* c-attribs.c (handle_no_sanitize_hwaddress_attribute): New
attribute.



### Attachment also inlined for ease of reply###


diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 372148315389db6671dfd943fd1a68670fcb1cbc..f8bf165aa48b5709c26f4e8245e5ab929b44fca6 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -54,6 +54,8 @@ static tree handle_cold_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
  int, bool *);
+static tree handle_no_sanitize_hwaddress_attribute (tree *, tree, tree,
+   int, bool *);
 static tree handle_no_sanitize_thread_attribute (tree *, tree, tree,
 int, bool *);
 static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree,
@@ -412,6 +414,8 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_no_sanitize_attribute, NULL },
   { "no_sanitize_address",0, 0, true, false, false, false,
  handle_no_sanitize_address_attribute, NULL },
+  { "no_sanitize_hwaddress",0, 0, true, false, false, false,
+ handle_no_sanitize_hwaddress_attribute, NULL },
   { "no_sanitize_thread", 0, 0, true, false, false, false,
  handle_no_sanitize_thread_attribute, NULL },
   { "no_sanitize_undefined",  0, 0, true, false, false, false,
@@ -946,6 +950,22 @@ handle_no_sanitize_address_attribute (tree *node, tree name, tree, int,
   return NULL_TREE;
 }
 
+/* Handle a "no_sanitize_hwaddress" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_no_sanitize_hwaddress_attribute (tree *node, tree name, tree, int,
+                                        bool *no_add_attrs)
+{
+  *no_add_attrs = true;
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+    warning (OPT_Wattributes, "%qE attribute ignored", name);
+  else
+    add_no_sanitize_value (*node, SANITIZE_HWADDRESS);
+
+  return NULL_TREE;
+}
+
 /* Handle a "no_sanitize_thread" attribute; arguments as in
struct 

[PATCH 6/X] libsanitizer: Add hwasan pass and associated gimple changes

2020-08-17 Thread Matthew Malcomson
There are three main features to this change:

1) Check pointer tags match address tags.

In the new `hwasan` pass we put HWASAN_CHECK internal functions before
all memory accesses to check that tags in the pointer being used match
the tag stored in shadow memory for the memory region being used.

These internal functions are expanded into actual checks in the sanopt
pass that happens just before expansion into RTL.

We use the same mechanism that currently inserts ASAN_CHECK internal
functions to insert the new HWASAN_CHECK functions.
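
The check described above can be sketched in plain C as follows.  This is
a toy model, not the code that the sanopt pass emits: the 16-byte granule,
the top-byte tag position, the fixed-size shadow array indexed by offsets
rather than real addresses, and the names `toy_tag_region` and
`toy_check_access` are all assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TAG_SHIFT 56	/* Assumed: tag lives in the top byte.  */
#define TOY_GRANULE_SHIFT 4	/* Assumed: one shadow byte per 16 bytes.  */
#define TOY_MEM_SIZE 256	/* Size of the toy address space.  */

/* One tag per granule of the toy address space; all zero initially.  */
static uint8_t toy_shadow[TOY_MEM_SIZE >> TOY_GRANULE_SHIFT];

/* Record TAG for every granule overlapping [OFF, OFF + SIZE).
   Returns the number of granules written.  */
static unsigned
toy_tag_region (uint64_t off, uint64_t size, uint8_t tag)
{
  unsigned n = 0;

  if (size == 0 || off >= TOY_MEM_SIZE || size > TOY_MEM_SIZE - off)
    return 0;
  for (uint64_t g = off >> TOY_GRANULE_SHIFT;
       g <= (off + size - 1) >> TOY_GRANULE_SHIFT; g++, n++)
    toy_shadow[g] = tag;
  return n;
}

/* Return 1 if the tag in PTR matches the shadow tag of every granule
   touched by a SIZE-byte access, and 0 otherwise.  */
static int
toy_check_access (uint64_t ptr, uint64_t size)
{
  uint8_t ptr_tag = (uint8_t) (ptr >> TOY_TAG_SHIFT);
  uint64_t off = ptr & ~((uint64_t) 0xff << TOY_TAG_SHIFT);

  if (size == 0 || off >= TOY_MEM_SIZE || size > TOY_MEM_SIZE - off)
    return 0;
  for (uint64_t g = off >> TOY_GRANULE_SHIFT;
       g <= (off + size - 1) >> TOY_GRANULE_SHIFT; g++)
    if (toy_shadow[g] != ptr_tag)
      return 0;
  return 1;
}
```

An access passes only when the pointer carries the same tag that was
recorded for the memory it touches; a pointer with a different tag, or an
access that runs into a neighbouring granule, fails the check.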

2) Instrument known builtin function calls.

Handle all builtin functions that we know use memory accesses.
This commit uses the machinery added for ASAN to identify builtin
functions that access memory.

The main differences between the approaches for HWASAN and ASAN are:
 - libhwasan intercepts far fewer builtin functions.
 - Alloca needs to be transformed differently (instead of adding
   redzones it needs to tag shadow memory and return a tagged pointer).
 - stack_restore needs to untag the shadow stack between the current
   position and where it's going.
 - `noreturn` functions cannot be handled by simply unpoisoning the
   entire shadow stack -- there is no "always valid" tag.
   (exceptions and things such as longjmp need to be handled in a
   different way).

For hardware implemented checking (such as AArch64's memory tagging
extension) alloca and stack_restore will need to be handled by hooks in
the backend rather than transformation at the gimple level.  This will
allow architecture specific handling of such stack modifications.

3) Introduce HWASAN block-scope poisoning

Here we use exactly the same mechanism as ASAN_MARK to poison/unpoison
variables on entry/exit of a block.

To reuse exactly the same machinery, we use the same internal functions
until the SANOPT pass.  This means that all handling of ASAN_MARK is the
same.
The drawback is that the naming may be a little confusing; the benefit is
that handling of the internal function does not have to be duplicated for
a function that behaves exactly the same but has a different name.

gcc/ChangeLog:

* asan.c (asan_instrument_reads): New.
(asan_instrument_writes): New.
(asan_memintrin): New.
(handle_builtin_stack_restore): Account for HWASAN.
(handle_builtin_alloca): Account for HWASAN.
(get_mem_refs_of_builtin_call): Special case strlen for HWASAN.
(report_error_func): Assert not HWASAN.
(build_check_stmt): Make HWASAN_CHECK instead of ASAN_CHECK.
(instrument_derefs): HWASAN does not tag globals.
(instrument_builtin_call): Use new helper functions.
(maybe_instrument_call): Don't instrument `noreturn` functions.
(initialize_sanitizer_builtins): Add new type.
(asan_expand_mark_ifn): Account for HWASAN.
(asan_expand_check_ifn): Assert never called by HWASAN.
(asan_expand_poison_ifn): Account for HWASAN.
(hwasan_instrument_reads): New.
(hwasan_instrument_writes): New.
(hwasan_memintrin): New.
(hwasan_instrument): New.
(hwasan_base): New.
(hwasan_check_func): New.
(hwasan_expand_check_ifn): New.
(hwasan_expand_mark_ifn): New.
(gate_hwasan): New.
(class pass_hwasan): New.
(make_pass_hwasan): New.
(class pass_hwasan_O0): New.
(make_pass_hwasan_O0): New.
* asan.h (hwasan_base): New decl.
(hwasan_expand_check_ifn): New decl.
(hwasan_expand_mark_ifn): New decl.
(gate_hwasan): New decl.
(enum hwasan_mark_flags): New.
(asan_intercepted_p): Always false for hwasan.
(asan_sanitize_use_after_scope): Account for HWASAN.
* builtin-types.def (BT_FN_PTR_CONST_PTR_UINT8): New.
* gimple-pretty-print.c (dump_gimple_call_args): Account for
HWASAN.
* gimplify.c (asan_poison_variable): Account for HWASAN.
(gimplify_function_tree): Remove requirement of
SANITIZE_ADDRESS, requiring asan or hwasan is accounted for in
`asan_sanitize_use_after_scope`.
* internal-fn.c (expand_HWASAN_CHECK): New.
(expand_HWASAN_CHOOSE_TAG): New.
(expand_HWASAN_MARK): New.
(expand_HWASAN_ALLOCA_UNPOISON): New.
* internal-fn.def (HWASAN_CHOOSE_TAG): New.
(HWASAN_CHECK): New.
(HWASAN_MARK): New.
(HWASAN_ALLOCA_UNPOISON): New.
* passes.def: Add hwasan and hwasan_O0 passes.
* sanitizer.def (BUILT_IN_HWASAN_LOAD1): New.
(BUILT_IN_HWASAN_LOAD2): New.
(BUILT_IN_HWASAN_LOAD4): New.
(BUILT_IN_HWASAN_LOAD8): New.
(BUILT_IN_HWASAN_LOAD16): New.
(BUILT_IN_HWASAN_LOADN): New.
(BUILT_IN_HWASAN_STORE1): New.
(BUILT_IN_HWASAN_STORE2): New.
(BUILT_IN_HWASAN_STORE4): New.
(BUILT_IN_HWASAN_STORE8): New.
(BUILT_IN_HWASAN_STORE16): New.
(BUILT_IN_HWASAN_STOREN): 

[PATCH 5/X] libsanitizer: mid-end: Introduce stack variable handling for HWASAN

2020-08-17 Thread Matthew Malcomson
Handling stack variables has three features.

1) Ensure HWASAN required alignment for stack variables

When tagging shadow memory, we need to ensure that each tag granule is
only used by one variable at a time.

This is done by ensuring that each tagged variable is aligned to the tag
granule representation size, and that the end of each object is also
aligned, so that the start of any other data stored on the stack falls in
a different granule.

This patch ensures the above by adding alignment requirements in
`align_local_variable` and forcing all stack variable allocation to be
deferred so that `expand_stack_vars` can ensure the stack pointer is
aligned before allocating any variable for the current frame.
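
The alignment rule above can be sketched as a small piece of arithmetic.
This is an illustration only, not the code added to
`align_local_variable`: the 16-byte granule size and the helper names are
assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_GRANULE 16	/* Assumed tag granule size in bytes.  */

/* Round N up to the next multiple of the granule size.  */
static uint64_t
sketch_round_up (uint64_t n)
{
  return (n + SKETCH_GRANULE - 1) & ~(uint64_t) (SKETCH_GRANULE - 1);
}

/* Return 1 if an object of SIZE bytes placed at START begins and ends on
   a granule boundary, so that no other data can share its granules.  */
static int
sketch_owns_its_granules (uint64_t start, uint64_t size)
{
  return start % SKETCH_GRANULE == 0
	 && (start + size) % SKETCH_GRANULE == 0;
}
```

A 20-byte object placed at offset 32 would share its second granule with
whatever follows it; padding its size to 32 bytes gives it both granules
to itself.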

2) Put tags into each stack variable pointer

Make sure that every pointer to a stack variable includes a tag of some
sort on it.

The way tagging works is:
  1) For every new stack frame, a random tag is generated.
  2) A base register is formed from the stack pointer value and this
 random tag.
  3) References to stack variables are now formed with RTL describing an
 offset from this base in both tag and value.

The random tag generation is handled by a backend hook.  This hook
decides whether to introduce a random tag or use the stack background
based on the parameter hwasan-random-frame-tag.  Using the stack
background is necessary for testing and bootstrap.  It is necessary
during bootstrap to avoid breaking the `configure` test program for
determining stack direction.

Using the stack background means that every stack frame has the initial
tag of zero and variables are tagged with incrementing tags from 1,
which also makes debugging a bit easier.
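
A minimal sketch of the tag assignment described above, under stated
assumptions: the frame tag is either a supplied random value or the stack
background (zero), and each variable gets the frame tag plus an
incrementing offset starting at 1.  The function names, the way the
random value is passed in, and the wrap-around at 8 bits are assumptions
for illustration, not the behaviour of the backend hooks.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_STACK_BACKGROUND 0	/* Tag used when random tags are off.  */

/* Choose the tag for a new frame.  RANDOM_FRAME_TAG plays the role of
   the hwasan-random-frame-tag parameter; RANDOM_VALUE stands in for
   whatever random source would be used.  */
static uint8_t
demo_frame_tag (int random_frame_tag, uint8_t random_value)
{
  return random_frame_tag ? random_value : DEMO_STACK_BACKGROUND;
}

/* Tag for the variable at position INDEX (0-based) in a frame whose
   base tag is FRAME_TAG.  Offsets start at 1 and wrap modulo 256.  */
static uint8_t
demo_variable_tag (uint8_t frame_tag, unsigned index)
{
  return (uint8_t) (frame_tag + 1 + index);
}
```

With random frame tags disabled, the variables of every frame are tagged
1, 2, 3 and so on, which gives the deterministic behaviour needed for
testing and bootstrap.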

The tag offsets are also handled by a backend hook.

This patch also adds some macros defining how the HWASAN shadow memory
is stored and how a tag is stored in a pointer.

3) For each stack variable, tag and untag the shadow stack on function
   prologue and epilogue.

On entry to each function we tag the relevant shadow stack region for
each stack variable with the tag that was added to each pointer for that
variable.

This is the first patch where we use the HWASAN shadow space, so we need
to add in the libhwasan initialisation code that creates this shadow
memory region into the binary we produce.  This instrumentation is done
in `compile_file`.

When exiting a function we need to ensure the shadow stack for this
function has no remaining tag.  Without clearing the shadow stack area
for this stack frame, later function calls could get false positives
when those later function calls check untagged areas (such as parameters
passed on the stack) against a shadow stack area with left-over tag.

Hence we ensure that the entire stack frame is cleared on function exit.
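
The reason for clearing the frame on exit can be shown with a toy shadow
stack.  This is a sketch only: the array size, the convention that an
untagged pointer has tag zero, and the helper names are assumptions, and
the real prologue and epilogue are emitted as RTL rather than C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_GRANULES 8	/* Toy frame: eight tag granules.  */

static uint8_t frame_shadow[FRAME_GRANULES];

/* Prologue step: tag COUNT granules starting at FIRST with TAG.
   Returns the number of granules tagged.  */
static unsigned
frame_tag_variable (unsigned first, unsigned count, uint8_t tag)
{
  if (first >= FRAME_GRANULES || count > FRAME_GRANULES - first)
    return 0;
  memset (frame_shadow + first, tag, count);
  return count;
}

/* Epilogue step: clear the whole frame.  Returns how many granules
   still carried a tag before clearing.  */
static unsigned
frame_untag_all (void)
{
  unsigned stale = 0;

  for (unsigned i = 0; i < FRAME_GRANULES; i++)
    {
      if (frame_shadow[i] != 0)
	stale++;
      frame_shadow[i] = 0;
    }
  return stale;
}

/* An untagged pointer (tag zero) passes the check only when the shadow
   tag for the granule is zero as well.  */
static int
frame_untagged_access_ok (unsigned granule)
{
  return granule < FRAME_GRANULES && frame_shadow[granule] == 0;
}
```

While a variable is tagged, an untagged access to its granules fails.
After the frame is cleared, a later call that reuses the same stack area
through untagged pointers passes again.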

config/ChangeLog:

* bootstrap-hwasan.mk: Disable random frame tags for
stack-tagging during bootstrap.

gcc/ChangeLog:

* asan.c (hwasan_record_base): New function.
(hwasan_emit_untag_frame): New.
(hwasan_increment_tag): New function.
(hwasan_with_tag): New function.
(hwasan_tag_init): New function.
(initialize_sanitizer_builtins): Define new builtins.
(ATTR_NOTHROW_LIST): New macro.
(hwasan_current_tag): New.
(hwasan_extract_tag): New.
(hwasan_emit_prologue): New.
(hwasan_create_untagged_base): New.
(hwasan_finish_file): New.
(hwasan_ctor_statements): New variable.
(hwasan_sanitize_stack_p): New.
(hwasan_sanitize_p): New.
(hwasan_sanitize_allocas_p): New.
* asan.h (hwasan_record_base): New declaration.
(hwasan_emit_untag_frame): New.
(hwasan_increment_tag): New declaration.
(hwasan_with_tag): New declaration.
(hwasan_sanitize_stack_p): New declaration.
(hwasan_sanitize_allocas_p): New declaration.
(hwasan_tag_init): New declaration.
(hwasan_sanitize_p): New declaration.
(HWASAN_TAG_SIZE): New macro.
(HWASAN_TAG_GRANULE_SIZE): New macro.
(HWASAN_TAG_SHIFT_SIZE): New macro.
(HWASAN_SHIFT): New macro.
(HWASAN_SHIFT_RTX): New macro.
(HWASAN_STACK_BACKGROUND): New macro.
(hwasan_finish_file): New declaration.
(hwasan_current_tag): New declaration.
(hwasan_create_untagged_base): New declaration.
(hwasan_extract_tag): New declaration.
(hwasan_emit_prologue): New declaration.
* cfgexpand.c (struct stack_vars_data): Add information to
record hwasan variable stack offsets.
(expand_stack_vars): Ensure variables are offset from a tagged
base. Record offsets for hwasan. Ensure alignment.
(expand_used_vars): Call function to emit prologue, and get
untagging instructions for function exit.
(align_local_variable): Ensure alignment.
(defer_stack_allocation): Ensure all variables are deferred so
they can be handled by 

[PATCH 7/X] libsanitizer: Add tests

2020-08-17 Thread Matthew Malcomson
Adding hwasan tests.

The only interesting thing here is that we have to make sure the tagging
mechanism is deterministic to avoid flaky tests.

gcc/testsuite/ChangeLog:

* c-c++-common/hwasan/aligned-alloc.c: New test.
* c-c++-common/hwasan/alloca-array-accessible.c: New test.
* c-c++-common/hwasan/alloca-gets-different-tag.c: New test.
* c-c++-common/hwasan/alloca-outside-caught.c: New test.
* c-c++-common/hwasan/arguments.c: New test.
* c-c++-common/hwasan/arguments-1.c: New test.
* c-c++-common/hwasan/arguments-2.c: New test.
* c-c++-common/hwasan/arguments-3.c: New test.
* c-c++-common/hwasan/asan-pr63316.c: New test.
* c-c++-common/hwasan/asan-pr70541.c: New test.
* c-c++-common/hwasan/asan-pr78106.c: New test.
* c-c++-common/hwasan/asan-pr79944.c: New test.
* c-c++-common/hwasan/asan-rlimit-mmap-test-1.c: New test.
* c-c++-common/hwasan/bitfield-1.c: New test.
* c-c++-common/hwasan/bitfield-2.c: New test.
* c-c++-common/hwasan/builtin-special-handling.c: New test.
* c-c++-common/hwasan/check-interface.c: New test.
* c-c++-common/hwasan/halt_on_error-1.c: New test.
* c-c++-common/hwasan/heap-overflow.c: New test.
* c-c++-common/hwasan/hwasan-poison-optimisation.c: New test.
* c-c++-common/hwasan/hwasan-thread-access-parent.c: New test.
* c-c++-common/hwasan/hwasan-thread-basic-failure.c: New test.
* c-c++-common/hwasan/hwasan-thread-clears-stack.c: New test.
* c-c++-common/hwasan/hwasan-thread-success.c: New test.
* c-c++-common/hwasan/kernel-defaults.c: New test.
* c-c++-common/hwasan/large-aligned-0.c: New test.
* c-c++-common/hwasan/large-aligned-1.c: New test.
* c-c++-common/hwasan/large-aligned-untagging-0.c: New test.
* c-c++-common/hwasan/large-aligned-untagging-1.c: New test.
* c-c++-common/hwasan/large-aligned-untagging-2.c: New test.
* c-c++-common/hwasan/large-aligned-untagging-3.c: New test.
* c-c++-common/hwasan/large-aligned-untagging-4.c: New test.
* c-c++-common/hwasan/large-aligned-untagging-5.c: New test.
* c-c++-common/hwasan/large-aligned-untagging-6.c: New test.
* c-c++-common/hwasan/large-aligned-untagging-7.c: New test.
* c-c++-common/hwasan/macro-definition.c: New test.
* c-c++-common/hwasan/no-sanitize-attribute.c: New test.
* c-c++-common/hwasan/param-instrument-reads-and-writes.c: New test.
* c-c++-common/hwasan/param-instrument-reads.c: New test.
* c-c++-common/hwasan/param-instrument-writes.c: New test.
* c-c++-common/hwasan/param-instrument-mem-intrinsics.c: New test.
* c-c++-common/hwasan/random-frame-tag.c: New test.
* c-c++-common/hwasan/sanity-check-pure-c.c: New test.
* c-c++-common/hwasan/setjmp-longjmp-0.c: New test.
* c-c++-common/hwasan/setjmp-longjmp-1.c: New test.
* c-c++-common/hwasan/stack-tagging-basic-0.c: New test.
* c-c++-common/hwasan/stack-tagging-basic-1.c: New test.
* c-c++-common/hwasan/stack-tagging-disable.c: New test.
* c-c++-common/hwasan/unprotected-allocas-0.c: New test.
* c-c++-common/hwasan/unprotected-allocas-1.c: New test.
* c-c++-common/hwasan/use-after-free.c: New test.
* c-c++-common/hwasan/vararray-outside-caught.c: New test.
* c-c++-common/hwasan/vararray-stack-restore-correct.c: New test.
* c-c++-common/hwasan/very-large-objects.c: New test.
* g++.dg/hwasan/hwasan.exp: New file.
* g++.dg/hwasan/rvo-handled.C: New test.
* gcc.dg/hwasan/hwasan.exp: New file.
* gcc.dg/hwasan/nested-functions-0.c: New test.
* gcc.dg/hwasan/nested-functions-1.c: New test.
* gcc.dg/hwasan/nested-functions-2.c: New test.
* lib/hwasan-dg.exp: New file.


hwasan-diff6.patch.gz
Description: application/gzip


[PATCH 1/X] libsanitizer: Tie the hwasan library into our build system

2020-08-17 Thread Matthew Malcomson
This patch tries to tie libhwasan into the GCC build system in the same way
that the other sanitizer runtime libraries are handled.

libsanitizer/ChangeLog:

* Makefile.am:  Build libhwasan.
* Makefile.in:  Build libhwasan.
* asan/Makefile.in:  Build libhwasan.
* configure:  Build libhwasan.
* configure.ac:  Build libhwasan.
* hwasan/Makefile.am: New file.
* hwasan/Makefile.in: New file.
* hwasan/libtool-version: New file.
* interception/Makefile.in: Build libhwasan.
* libbacktrace/Makefile.in: Build libhwasan.
* libsanitizer.spec.in: Build libhwasan.
* lsan/Makefile.in: Build libhwasan.
* sanitizer_common/Makefile.in: Build libhwasan.
* tsan/Makefile.in: Build libhwasan.
* ubsan/Makefile.in: Build libhwasan.



### Attachment also inlined for ease of reply###


diff --git a/libsanitizer/Makefile.am b/libsanitizer/Makefile.am
index 65ed1e712378ef453f820f86c4d3221f9dee5f2c..2a7e8e1debe838719db0f0fad218b2543cc3111b 100644
--- a/libsanitizer/Makefile.am
+++ b/libsanitizer/Makefile.am
@@ -14,11 +14,12 @@ endif
 if LIBBACKTRACE_SUPPORTED
 SUBDIRS += libbacktrace
 endif
-SUBDIRS += lsan asan ubsan
+SUBDIRS += lsan asan ubsan hwasan
 nodist_saninclude_HEADERS += \
   include/sanitizer/lsan_interface.h \
   include/sanitizer/asan_interface.h \
-  include/sanitizer/tsan_interface.h
+  include/sanitizer/tsan_interface.h \
+  include/sanitizer/hwasan_interface.h
 if TSAN_SUPPORTED
 SUBDIRS += tsan
 endif
diff --git a/libsanitizer/Makefile.in b/libsanitizer/Makefile.in
index 02c7f70ac6578a3e93a490ce8bd2c54fc0693c50..2c57d49cbffdb486645aeb5f2c0f85d6e0fad124 100644
--- a/libsanitizer/Makefile.in
+++ b/libsanitizer/Makefile.in
@@ -92,7 +92,8 @@ target_triplet = @target@
 @SANITIZER_SUPPORTED_TRUE@am__append_1 = include/sanitizer/common_interface_defs.h \
 @SANITIZER_SUPPORTED_TRUE@ include/sanitizer/lsan_interface.h \
 @SANITIZER_SUPPORTED_TRUE@ include/sanitizer/asan_interface.h \
-@SANITIZER_SUPPORTED_TRUE@ include/sanitizer/tsan_interface.h
+@SANITIZER_SUPPORTED_TRUE@ include/sanitizer/tsan_interface.h \
+@SANITIZER_SUPPORTED_TRUE@ include/sanitizer/hwasan_interface.h
 @SANITIZER_SUPPORTED_TRUE@@USING_MAC_INTERPOSE_FALSE@am__append_2 = interception
 @LIBBACKTRACE_SUPPORTED_TRUE@@SANITIZER_SUPPORTED_TRUE@am__append_3 = libbacktrace
 @SANITIZER_SUPPORTED_TRUE@@TSAN_SUPPORTED_TRUE@am__append_4 = tsan
@@ -207,7 +208,7 @@ ETAGS = etags
 CTAGS = ctags
 CSCOPE = cscope
 DIST_SUBDIRS = sanitizer_common interception libbacktrace lsan asan \
-   ubsan tsan
+   ubsan hwasan tsan
 ACLOCAL = @ACLOCAL@
 ALLOC_FILE = @ALLOC_FILE@
 AMTAR = @AMTAR@
@@ -329,6 +330,7 @@ install_sh = @install_sh@
 libdir = @libdir@
 libexecdir = @libexecdir@
 link_libasan = @link_libasan@
+link_libhwasan = @link_libhwasan@
 link_liblsan = @link_liblsan@
 link_libtsan = @link_libtsan@
 link_libubsan = @link_libubsan@
@@ -362,7 +364,7 @@ sanincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include/sanitizer
 nodist_saninclude_HEADERS = $(am__append_1)
 @SANITIZER_SUPPORTED_TRUE@SUBDIRS = sanitizer_common $(am__append_2) \
 @SANITIZER_SUPPORTED_TRUE@ $(am__append_3) lsan asan ubsan \
-@SANITIZER_SUPPORTED_TRUE@ $(am__append_4)
+@SANITIZER_SUPPORTED_TRUE@ hwasan $(am__append_4)
 gcc_version := $(shell @get_gcc_base_ver@ $(top_srcdir)/../gcc/BASE-VER)
 
 # Work around what appears to be a GNU make bug handling MAKEFLAGS
diff --git a/libsanitizer/asan/Makefile.in b/libsanitizer/asan/Makefile.in
index 29622bf466a37f819c9fade30e31195adda51190..25c7fd7b7597d6e243005a1bb7de5b6243d2cfcf 100644
--- a/libsanitizer/asan/Makefile.in
+++ b/libsanitizer/asan/Makefile.in
@@ -383,6 +383,7 @@ install_sh = @install_sh@
 libdir = @libdir@
 libexecdir = @libexecdir@
 link_libasan = @link_libasan@
+link_libhwasan = @link_libhwasan@
 link_liblsan = @link_liblsan@
 link_libtsan = @link_libtsan@
 link_libubsan = @link_libubsan@
diff --git a/libsanitizer/configure b/libsanitizer/configure
index 04eca04fbe5e59bae1ba00597de0cf1b7cf1b5fa..9ed9669a85d3cfc2f2f623e796e61a5f8f7e4ded 100755
--- a/libsanitizer/configure
+++ b/libsanitizer/configure
@@ -657,6 +657,7 @@ USING_MAC_INTERPOSE_TRUE
 link_liblsan
 link_libubsan
 link_libtsan
+link_libhwasan
 link_libasan
 LSAN_SUPPORTED_FALSE
 LSAN_SUPPORTED_TRUE
@@ -12361,7 +12362,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12364 "configure"
+#line 12365 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12467,7 +12468,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12470 "configure"
+#line 12471 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15943,6 +15944,10 @@ fi
 link_libasan=$link_sanitizer_common
 
 
+# Set up the set of additional 

[PATCH 2/X] libsanitizer: Only build libhwasan when targeting AArch64

2020-08-17 Thread Matthew Malcomson
Though the library has limited support for x86, we don't have any
support for generating code targeting x86 so there is no point building
for that target.

libsanitizer/ChangeLog:

* Makefile.am: Condition building hwasan directory.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Set HWASAN_SUPPORTED based on target
architecture.
* configure.tgt: Likewise.



### Attachment also inlined for ease of reply###


diff --git a/libsanitizer/Makefile.am b/libsanitizer/Makefile.am
index 2a7e8e1debe838719db0f0fad218b2543cc3111b..065a65e78d49f7689a01ecb64db1f07ca83aa987 100644
--- a/libsanitizer/Makefile.am
+++ b/libsanitizer/Makefile.am
@@ -14,7 +14,7 @@ endif
 if LIBBACKTRACE_SUPPORTED
 SUBDIRS += libbacktrace
 endif
-SUBDIRS += lsan asan ubsan hwasan
+SUBDIRS += lsan asan ubsan
 nodist_saninclude_HEADERS += \
   include/sanitizer/lsan_interface.h \
   include/sanitizer/asan_interface.h \
@@ -23,6 +23,9 @@ nodist_saninclude_HEADERS += \
 if TSAN_SUPPORTED
 SUBDIRS += tsan
 endif
+if HWASAN_SUPPORTED
+SUBDIRS += hwasan
+endif
 endif
 
 ## May be used by toolexeclibdir.
diff --git a/libsanitizer/Makefile.in b/libsanitizer/Makefile.in
index 2c57d49cbffdb486645aeb5f2c0f85d6e0fad124..3873ea4d7050f04a3f7bbd0dd3f2a71e9b65d287 100644
--- a/libsanitizer/Makefile.in
+++ b/libsanitizer/Makefile.in
@@ -97,6 +97,7 @@ target_triplet = @target@
 @SANITIZER_SUPPORTED_TRUE@@USING_MAC_INTERPOSE_FALSE@am__append_2 = interception
 @LIBBACKTRACE_SUPPORTED_TRUE@@SANITIZER_SUPPORTED_TRUE@am__append_3 = libbacktrace
 @SANITIZER_SUPPORTED_TRUE@@TSAN_SUPPORTED_TRUE@am__append_4 = tsan
+@HWASAN_SUPPORTED_TRUE@@SANITIZER_SUPPORTED_TRUE@am__append_5 = hwasan
 subdir = .
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
@@ -208,7 +209,7 @@ ETAGS = etags
 CTAGS = ctags
 CSCOPE = cscope
 DIST_SUBDIRS = sanitizer_common interception libbacktrace lsan asan \
-   ubsan hwasan tsan
+   ubsan tsan hwasan
 ACLOCAL = @ACLOCAL@
 ALLOC_FILE = @ALLOC_FILE@
 AMTAR = @AMTAR@
@@ -364,7 +365,7 @@ sanincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include/sanitizer
 nodist_saninclude_HEADERS = $(am__append_1)
 @SANITIZER_SUPPORTED_TRUE@SUBDIRS = sanitizer_common $(am__append_2) \
 @SANITIZER_SUPPORTED_TRUE@ $(am__append_3) lsan asan ubsan \
-@SANITIZER_SUPPORTED_TRUE@ hwasan $(am__append_4)
+@SANITIZER_SUPPORTED_TRUE@ $(am__append_4) $(am__append_5)
 gcc_version := $(shell @get_gcc_base_ver@ $(top_srcdir)/../gcc/BASE-VER)
 
 # Work around what appears to be a GNU make bug handling MAKEFLAGS
diff --git a/libsanitizer/configure b/libsanitizer/configure
index 9ed9669a85d3cfc2f2f623e796e61a5f8f7e4ded..cc5c229f4aebcdd454e9e2e415a8e16046dc1b1a 100755
--- a/libsanitizer/configure
+++ b/libsanitizer/configure
@@ -659,6 +659,8 @@ link_libubsan
 link_libtsan
 link_libhwasan
 link_libasan
+HWASAN_SUPPORTED_FALSE
+HWASAN_SUPPORTED_TRUE
 LSAN_SUPPORTED_FALSE
 LSAN_SUPPORTED_TRUE
 TSAN_SUPPORTED_FALSE
@@ -12362,7 +12364,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12365 "configure"
+#line 12367 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12468,7 +12470,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12471 "configure"
+#line 12473 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15819,6 +15821,7 @@ fi
 # Get target configury.
 unset TSAN_SUPPORTED
 unset LSAN_SUPPORTED
+unset HWASAN_SUPPORTED
 . ${srcdir}/configure.tgt
  if test "x$TSAN_SUPPORTED" = "xyes"; then
   TSAN_SUPPORTED_TRUE=
@@ -15836,6 +15839,14 @@ else
   LSAN_SUPPORTED_FALSE=
 fi
 
+ if test "x$HWASAN_SUPPORTED" = "xyes"; then
+  HWASAN_SUPPORTED_TRUE=
+  HWASAN_SUPPORTED_FALSE='#'
+else
+  HWASAN_SUPPORTED_TRUE='#'
+  HWASAN_SUPPORTED_FALSE=
+fi
+
 
 # Check for functions needed.
 for ac_func in clock_getres clock_gettime clock_settime lstat readlink
@@ -16818,7 +16829,7 @@ ac_config_files="$ac_config_files Makefile libsanitizer.spec libbacktrace/backtr
 ac_config_headers="$ac_config_headers config.h"
 
 
-ac_config_files="$ac_config_files interception/Makefile sanitizer_common/Makefile libbacktrace/Makefile lsan/Makefile asan/Makefile hwasan/Makefile ubsan/Makefile"
+ac_config_files="$ac_config_files interception/Makefile sanitizer_common/Makefile libbacktrace/Makefile lsan/Makefile asan/Makefile ubsan/Makefile"
 
 
 if test "x$TSAN_SUPPORTED" = "xyes"; then
@@ -16826,6 +16837,11 @@ if test "x$TSAN_SUPPORTED" = "xyes"; then
 
 fi
 
+if test "x$HWASAN_SUPPORTED" = "xyes"; then
+  ac_config_files="$ac_config_files hwasan/Makefile"
+
+fi
+
 
 
 
@@ -17090,6 +17106,10 @@ if test -z "${LSAN_SUPPORTED_TRUE}" && test -z "${LSAN_SUPPORTED_FALSE}"; then
   as_fn_error $? "conditional \"LSAN_SUPPORTED\" was never defined.
 Usually 

[PATCH 3/X] libsanitizer: Add option to bootstrap using HWASAN

2020-08-17 Thread Matthew Malcomson
This option is analogous to --bootstrap-asan for configure.  It allows
bootstrapping GCC using HWASAN.

For the same reasons as for ASAN we have to avoid using the HWASAN
sanitizer when compiling libiberty and the lto-plugin.

Also add a function to query whether -fsanitize=hwaddress has been
passed.

ChangeLog:

* configure: Regenerate.
* configure.ac: Add --bootstrap-hwasan option.

config/ChangeLog:

* bootstrap-hwasan.mk: New file.

gcc/ChangeLog:

* doc/install.texi: Document new option.

libiberty/ChangeLog:

* configure: Regenerate.
* configure.ac: Avoid using sanitizer.

lto-plugin/ChangeLog:

* Makefile.am: Avoid using sanitizer.
* Makefile.in: Regenerate.



### Attachment also inlined for ease of reply###


diff --git a/config/bootstrap-hwasan.mk b/config/bootstrap-hwasan.mk
new file mode 100644
index 
..4f60bed3fd6e98b47a3a38aea6eba2a7c320da25
--- /dev/null
+++ b/config/bootstrap-hwasan.mk
@@ -0,0 +1,8 @@
+# This option enables -fsanitize=hwaddress for stage2 and stage3.
+
+STAGE2_CFLAGS += -fsanitize=hwaddress
+STAGE3_CFLAGS += -fsanitize=hwaddress
+POSTSTAGE1_LDFLAGS += -fsanitize=hwaddress -static-libhwasan \
+ -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/ \
+ -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/hwasan/ \
+ -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/hwasan/.libs
diff --git a/configure b/configure
index a0c5aca9e8d5cae2782c8fe4625a501853dc226a..203319e3f899e8d24429950c3a5d22927fb5150f 100755
--- a/configure
+++ b/configure
@@ -8297,7 +8297,7 @@ fi
 # or bootstrap-ubsan, bootstrap it.
 if echo " ${target_configdirs} " | grep " libsanitizer " > /dev/null 2>&1; then
   case "$BUILD_CONFIG" in
-*bootstrap-asan* | *bootstrap-ubsan* )
+*bootstrap-hwasan* | *bootstrap-asan* | *bootstrap-ubsan* )
   bootstrap_target_libs=${bootstrap_target_libs}target-libsanitizer,
   bootstrap_fixincludes=yes
   ;;
diff --git a/configure.ac b/configure.ac
index 1a53ed418e4d97606356b14a17b50186c79adcd3..9d5c187c31bfc01003e75058896b686807e47643 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2809,7 +2809,7 @@ fi
 # or bootstrap-ubsan, bootstrap it.
 if echo " ${target_configdirs} " | grep " libsanitizer " > /dev/null 2>&1; then
   case "$BUILD_CONFIG" in
-*bootstrap-asan* | *bootstrap-ubsan* )
+*bootstrap-hwasan* | *bootstrap-asan* | *bootstrap-ubsan* )
   bootstrap_target_libs=${bootstrap_target_libs}target-libsanitizer,
   bootstrap_fixincludes=yes
   ;;
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index d581a34653f61a440b3c3b832836fe109e2fbd08..25d041fcbb1f7c16f7ac47b7b5d4ea8308c6f69c 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -2767,6 +2767,11 @@ the build tree.
 Compiles GCC itself using Address Sanitization in order to catch invalid memory
 accesses within the GCC code.
 
+@item @samp{bootstrap-hwasan}
+Compiles GCC itself using HWAddress Sanitization in order to catch invalid
+memory accesses within the GCC code.  This option is only available on AArch64
+targets running a Linux kernel that supports the required ABI (5.4 or later).
+
 @end table
 
 @section Building a cross compiler
diff --git a/libiberty/configure b/libiberty/configure
index 1f8e23f0d235a6a5d5158bf6705023db95ac7023..59e0b73d5838bbd42a5548759084471e97ec254f 100755
--- a/libiberty/configure
+++ b/libiberty/configure
@@ -5264,6 +5264,7 @@ fi
 NOASANFLAG=
 case " ${CFLAGS} " in
   *\ -fsanitize=address\ *) NOASANFLAG=-fno-sanitize=address ;;
+  *\ -fsanitize=hwaddress\ *) NOASANFLAG=-fno-sanitize=hwaddress ;;
 esac
 
 
diff --git a/libiberty/configure.ac b/libiberty/configure.ac
index 4e2599c14a89bafcb8c7e523b9ce5b3d60b8c0f6..ad952963971a31968b5d109661b9cab0aa4b95fc 100644
--- a/libiberty/configure.ac
+++ b/libiberty/configure.ac
@@ -240,6 +240,7 @@ AC_SUBST(PICFLAG)
 NOASANFLAG=
 case " ${CFLAGS} " in
   *\ -fsanitize=address\ *) NOASANFLAG=-fno-sanitize=address ;;
+  *\ -fsanitize=hwaddress\ *) NOASANFLAG=-fno-sanitize=hwaddress ;;
 esac
 AC_SUBST(NOASANFLAG)
 
diff --git a/lto-plugin/Makefile.am b/lto-plugin/Makefile.am
index ba5882df7a7272f65219191c82ecd78ab4d3725e..50d6e09dac881d28d4ff70def47b09ed8c0ea66c 100644
--- a/lto-plugin/Makefile.am
+++ b/lto-plugin/Makefile.am
@@ -11,8 +11,8 @@ AM_CPPFLAGS = -I$(top_srcdir)/../include $(DEFS)
 AM_CFLAGS = @ac_lto_plugin_warn_cflags@ $(CET_HOST_FLAGS)
 AM_LDFLAGS = @ac_lto_plugin_ldflags@
 AM_LIBTOOLFLAGS = --tag=disable-static
-override CFLAGS := $(filter-out -fsanitize=address,$(CFLAGS))
-override LDFLAGS := $(filter-out -fsanitize=address,$(LDFLAGS))
+override CFLAGS := $(filter-out -fsanitize=address -fsanitize=hwaddress,$(CFLAGS))
+override LDFLAGS := $(filter-out -fsanitize=address -fsanitize=hwaddress,$(LDFLAGS))
 
 libexecsub_LTLIBRARIES = liblto_plugin.la
 gcc_build_dir = @gcc_build_dir@
diff --git 

Re: [PATCH] libibery/hashtab: add new functions

2020-08-17 Thread Martin Liška

Adding libiberty maintainer to CC.

On 8/17/20 4:03 PM, Martin Liška wrote:

Hey.

I'm working on binutils, where I would like to port the hash table
implementation in gas/hash.[ch] to the libiberty one.

But it would be handy for me to add 2 new functions.

Thoughts?
Thanks,
Martin

include/ChangeLog:

 * hashtab.h (htab_insert): New function.
 (htab_print_statistics): Likewise.

libiberty/ChangeLog:

 * hashtab.c (htab_insert): New function.
 (htab_print_statistics): Likewise.
---
  include/hashtab.h   |  6 ++
  libiberty/hashtab.c | 23 +++
  2 files changed, 29 insertions(+)

diff --git a/include/hashtab.h b/include/hashtab.h
index 6cca342b989..bcaee909bcf 100644
--- a/include/hashtab.h
+++ b/include/hashtab.h
@@ -37,6 +37,7 @@ extern "C" {
  #endif /* __cplusplus */

  #include "ansidecl.h"
+#include <stdio.h>

  /* The type for a hash code.  */
  typedef unsigned int hashval_t;
@@ -172,6 +173,7 @@ extern void **    htab_find_slot (htab_t, const void *, 
enum insert_option);
  extern void *    htab_find_with_hash (htab_t, const void *, hashval_t);
  extern void **    htab_find_slot_with_hash (htab_t, const void *,
    hashval_t, enum insert_option);
+extern void    htab_insert (htab_t, void *);
  extern void    htab_clear_slot    (htab_t, void **);
  extern void    htab_remove_elt    (htab_t, const void *);
  extern void    htab_remove_elt_with_hash (htab_t, const void *, hashval_t);
@@ -183,6 +185,10 @@ extern size_t    htab_size (htab_t);
  extern size_t    htab_elements (htab_t);
  extern double    htab_collisions    (htab_t);

+extern void    htab_print_statistics (FILE *f, htab_t table,
+   const char *name,
+   const char *prefix);
+
  /* A hash function for pointers.  */
  extern htab_hash htab_hash_pointer;

diff --git a/libiberty/hashtab.c b/libiberty/hashtab.c
index 225e9e540a7..fb3152ec9c6 100644
--- a/libiberty/hashtab.c
+++ b/libiberty/hashtab.c
@@ -704,6 +704,15 @@ htab_find_slot (htab_t htab, const PTR element, enum 
insert_option insert)
     insert);
  }

+/* Insert ELEMENT into HTAB.  If the element exists, it is overwritten.  */
+
+void
+htab_insert (htab_t htab, PTR element)
+{
+  void **slot = htab_find_slot (htab, element, INSERT);
+  *slot = element;
+}
+
  /* This function deletes an element with the given value from hash
     table (the hash is computed from the element).  If there is no matching
     element in the hash table, this function does nothing.  */
@@ -803,6 +812,20 @@ htab_collisions (htab_t htab)
    return (double) htab->collisions / (double) htab->searches;
  }

+/* Print statistics about a hash table.  */
+
+void
+htab_print_statistics (FILE *f, htab_t table, const char *name,
+   const char *prefix)
+{
+  fprintf (f, "%s hash statistics:\n", name);
+  fprintf (f, "%s%u searches\n", prefix, table->searches);
+  fprintf (f, "%s%lu elements\n", prefix, htab_elements (table));
+  fprintf (f, "%s%lu table size\n", prefix, htab_size (table));
+  fprintf (f, "%s%.2f collisions per search\n",
+   prefix, htab_collisions (table));
+}
+
  /* Hash P as a null-terminated string.

     Copied from gcc/hashtable.c.  Zack had the following to say with respect




[PATCH] libibery/hashtab: add new functions

2020-08-17 Thread Martin Liška

Hey.

I'm working on binutils, where I would like to port the hash table
implementation in gas/hash.[ch] to the libiberty one.

But it would be handy for me to add 2 new functions.

Thoughts?
Thanks,
Martin

include/ChangeLog:

* hashtab.h (htab_insert): New function.
(htab_print_statistics): Likewise.

libiberty/ChangeLog:

* hashtab.c (htab_insert): New function.
(htab_print_statistics): Likewise.
---
 include/hashtab.h   |  6 ++
 libiberty/hashtab.c | 23 +++
 2 files changed, 29 insertions(+)

diff --git a/include/hashtab.h b/include/hashtab.h
index 6cca342b989..bcaee909bcf 100644
--- a/include/hashtab.h
+++ b/include/hashtab.h
@@ -37,6 +37,7 @@ extern "C" {
 #endif /* __cplusplus */
 
 #include "ansidecl.h"
+#include <stdio.h>
 
 /* The type for a hash code.  */
 typedef unsigned int hashval_t;
@@ -172,6 +173,7 @@ extern void **  htab_find_slot (htab_t, const void *, 
enum insert_option);
 extern void *  htab_find_with_hash (htab_t, const void *, hashval_t);
 extern void ** htab_find_slot_with_hash (htab_t, const void *,
  hashval_t, enum insert_option);
+extern void	htab_insert (htab_t, void *);
 extern void	htab_clear_slot (htab_t, void **);
 extern void	htab_remove_elt (htab_t, const void *);
 extern void	htab_remove_elt_with_hash (htab_t, const void *, hashval_t);
@@ -183,6 +185,10 @@ extern size_t  htab_size (htab_t);
 extern size_t  htab_elements (htab_t);
 extern double  htab_collisions (htab_t);
 
+extern void	htab_print_statistics (FILE *f, htab_t table,
+  const char *name,
+  const char *prefix);
+
 /* A hash function for pointers.  */
 extern htab_hash htab_hash_pointer;
 
diff --git a/libiberty/hashtab.c b/libiberty/hashtab.c
index 225e9e540a7..fb3152ec9c6 100644
--- a/libiberty/hashtab.c
+++ b/libiberty/hashtab.c
@@ -704,6 +704,15 @@ htab_find_slot (htab_t htab, const PTR element, enum 
insert_option insert)
   insert);
 }
 
+/* Insert ELEMENT into HTAB.  If the element exists, it is overwritten.  */
+
+void
+htab_insert (htab_t htab, PTR element)
+{
+  void **slot = htab_find_slot (htab, element, INSERT);
+  *slot = element;
+}
+
 /* This function deletes an element with the given value from hash
table (the hash is computed from the element).  If there is no matching
element in the hash table, this function does nothing.  */
@@ -803,6 +812,20 @@ htab_collisions (htab_t htab)
   return (double) htab->collisions / (double) htab->searches;
 }
 
+/* Print statistics about a hash table.  */
+
+void
+htab_print_statistics (FILE *f, htab_t table, const char *name,
+  const char *prefix)
+{
+  fprintf (f, "%s hash statistics:\n", name);
+  fprintf (f, "%s%u searches\n", prefix, table->searches);
+  fprintf (f, "%s%lu elements\n", prefix, htab_elements (table));
+  fprintf (f, "%s%lu table size\n", prefix, htab_size (table));
+  fprintf (f, "%s%.2f collisions per search\n",
+  prefix, htab_collisions (table));
+}
+
 /* Hash P as a null-terminated string.
 
Copied from gcc/hashtable.c.  Zack had the following to say with respect

--
2.28.0
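
For readers who have not used the find-slot idiom, the insert-or-overwrite behaviour that htab_insert is meant to provide can be sketched with a toy table. This is only an illustration under stated assumptions: toy_table, toy_hash, toy_find_slot, toy_insert and toy_elements are hypothetical names, the table is a fixed-size open-addressing table of C strings, and none of it is libiberty code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_SIZE 8  /* Fixed capacity; this toy table never grows.  */

/* Hypothetical stand-in for htab_t: open addressing, linear probing.  */
struct toy_table
{
  const char *slots[TOY_SIZE];
};

/* Simple multiplicative string hash for the toy table.  */
static unsigned int
toy_hash (const char *s)
{
  unsigned int h = 0;
  while (*s)
    h = h * 31 + (unsigned char) *s++;
  return h;
}

/* Return the slot holding an entry equal to KEY, or the first empty
   slot on its probe sequence.  Return NULL only if the table is full
   and KEY is absent.  */
static const char **
toy_find_slot (struct toy_table *t, const char *key)
{
  unsigned int start = toy_hash (key) % TOY_SIZE;
  for (unsigned int i = 0; i < TOY_SIZE; i++)
    {
      const char **slot = &t->slots[(start + i) % TOY_SIZE];
      if (*slot == NULL || strcmp (*slot, key) == 0)
        return slot;
    }
  return NULL;
}

/* Insert ELEMENT, overwriting an equal entry if one is present.
   Return 1 on success, 0 if the table is full.  */
static int
toy_insert (struct toy_table *t, const char *element)
{
  assert (t != NULL && element != NULL);
  const char **slot = toy_find_slot (t, element);
  if (slot == NULL)
    return 0;
  *slot = element;
  return 1;
}

/* Count the occupied slots.  */
static size_t
toy_elements (const struct toy_table *t)
{
  size_t n = 0;
  for (size_t i = 0; i < TOY_SIZE; i++)
    n += t->slots[i] != NULL;
  return n;
}
```

Inserting two distinct buffers that compare equal leaves one element in the table, and the slot ends up pointing at the second buffer, which is the "overwritten" case the comment in the patch describes.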



Re: [PATCH] Implement no_stack_protect attribute.

2020-08-17 Thread Martin Liška

PING^4

On 7/23/20 1:10 PM, Martin Liška wrote:

PING^3

On 6/24/20 11:09 AM, Martin Liška wrote:

PING^2

On 6/10/20 10:12 AM, Martin Liška wrote:

PING^1

On 5/25/20 3:10 PM, Martin Liška wrote:

On 5/21/20 4:53 PM, Martin Sebor wrote:

On 5/21/20 5:28 AM, Martin Liška wrote:

On 5/18/20 10:37 PM, Martin Sebor wrote:

I know there are some somewhat complex cases the attribute exclusion
mechanism isn't general enough to handle but this seems simple enough
that it should work.  Unless I'm missing something that makes it not
feasible I would suggest to use it.


Hi Martin.

Do we have a better place where we check for attribute collision?


If by collision you mean the same thing as the mutual exclusion I was
talking about then that's done by creating an attribute_spec::exclusions
array like for instance attr_cold_hot_exclusions in c-attribs.c and
pointing to it from the attribute_spec entries for each of
the mutually exclusive attributes in the attribute table.  Everything
else is handled automatically by decl_attributes.

Martin
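
The table-driven exclusion idea described above can be sketched roughly as follows. This is a simplified model, not GCC's data structures: toy_attr_spec, toy_attr_table, toy_lookup and toy_attrs_conflict are hypothetical names, and an exclusion list is reduced to a NULL-terminated array of attribute names shared by every attribute in the mutually exclusive group.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified attribute table entry.  EXCLUSIONS is a
   NULL-terminated list of names, or NULL if nothing is excluded.  */
struct toy_attr_spec
{
  const char *name;
  const char *const *exclusions;
};

/* One list shared by both members of the mutually exclusive pair.  */
static const char *const cold_hot_exclusions[] = { "cold", "hot", NULL };

static const struct toy_attr_spec toy_attr_table[] = {
  { "cold", cold_hot_exclusions },
  { "hot", cold_hot_exclusions },
  { "noinline", NULL },
};

/* Find the table entry for NAME, or NULL if it is unknown.  */
static const struct toy_attr_spec *
toy_lookup (const char *name)
{
  size_t n = sizeof toy_attr_table / sizeof toy_attr_table[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (toy_attr_table[i].name, name) == 0)
      return &toy_attr_table[i];
  return NULL;
}

/* Return 1 if NEW_ATTR may not be combined with EXISTING.  */
static int
toy_attrs_conflict (const char *existing, const char *new_attr)
{
  assert (existing != NULL && new_attr != NULL);
  const struct toy_attr_spec *spec = toy_lookup (new_attr);
  if (spec == NULL || spec->exclusions == NULL)
    return 0;
  /* Repeating the same attribute is not a conflict in this model.  */
  if (strcmp (existing, new_attr) == 0)
    return 0;
  for (const char *const *p = spec->exclusions; *p != NULL; p++)
    if (strcmp (*p, existing) == 0)
      return 1;
  return 0;
}
```

The point of the design is that adding a new mutually exclusive attribute only means adding a name to the shared list and pointing the new table entry at it; the generic checker needs no per-attribute code.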


Thanks, I'm sending an updated version of the patch that utilizes the conflict
detection.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin










[PATCH] floatformat.h: Add bfloat16 support.

2020-08-17 Thread Felix Willgerodt via Gcc-patches
This change is motivated by a patchset that adds bfloat16 debugging
support for new avx512 instructions to GDB. The gdb thread can be found
here: https://sourceware.org/pipermail/gdb-patches/2020-July/170820.html

include:
2020-08-17  Felix Willgerodt  

* floatformat.h (floatformat_bfloat16_big): New.
(floatformat_bfloat16_little): New.

libiberty:
2020-08-17  Felix Willgerodt  

* floatformat.c (floatformat_bfloat16_big): New.
(floatformat_bfloat16_little): New.
---
 include/floatformat.h   |  3 +++
 libiberty/floatformat.c | 19 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/floatformat.h b/include/floatformat.h
index ce8d6d4add8..630fade0449 100644
--- a/include/floatformat.h
+++ b/include/floatformat.h
@@ -133,6 +133,9 @@ extern const struct floatformat 
floatformat_ia64_quad_little;
 /* IBM long double (double+double).  */
 extern const struct floatformat floatformat_ibm_long_double_big;
 extern const struct floatformat floatformat_ibm_long_double_little;
+/* bfloat16.  */
+extern const struct floatformat floatformat_bfloat16_big;
+extern const struct floatformat floatformat_bfloat16_little;
 
 /* Convert from FMT to a double.
FROM is the address of the extended float.
diff --git a/libiberty/floatformat.c b/libiberty/floatformat.c
index 2fd5e688ec4..6b9b03288e2 100644
--- a/libiberty/floatformat.c
+++ b/libiberty/floatformat.c
@@ -389,7 +389,24 @@ const struct floatformat 
floatformat_ibm_long_double_little =
   floatformat_ibm_long_double_is_valid,
   &floatformat_ieee_double_little
 };
-
+
+const struct floatformat floatformat_bfloat16_big =
+{
+  floatformat_big, 16, 0, 1, 8, 127, 255, 9, 7,
+  floatformat_intbit_no,
+  "floatformat_bfloat16_big",
+  floatformat_always_valid,
+  NULL
+};
+
+const struct floatformat floatformat_bfloat16_little =
+{
+  floatformat_little, 16, 0, 1, 8, 127, 255, 9, 7,
+  floatformat_intbit_no,
+  "floatformat_bfloat16_little",
+  floatformat_always_valid,
+  NULL
+};
 
 #ifndef min
 #define min(a, b) ((a) < (b) ? (a) : (b))
-- 
2.25.4
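
As a reading aid for the field values in the two new descriptors (16 bits total, sign at bit 0, an 8-bit exponent starting at bit 1 with bias 127, a 7-bit mantissa starting at bit 9, bits counted from the most significant end), here is a small sketch that decodes a bfloat16 bit pattern. It is not part of the patch and does not use floatformat.c; the function names are hypothetical, and it assumes the host float is IEEE binary32, of which bfloat16 is the upper half.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The widening trick below only works if float is a 32-bit type.  */
static_assert (sizeof (float) == sizeof (uint32_t),
               "this sketch assumes a 32-bit float");

/* Sign: the most significant bit.  */
static unsigned int
bf16_sign (uint16_t bits)
{
  return bits >> 15;
}

/* Biased exponent: the next 8 bits (bias 127, 255 means Inf/NaN).  */
static unsigned int
bf16_biased_exponent (uint16_t bits)
{
  return (bits >> 7) & 0xff;
}

/* Mantissa: the low 7 bits, with an implicit leading one.  */
static unsigned int
bf16_mantissa (uint16_t bits)
{
  return bits & 0x7f;
}

/* Widen a bfloat16 bit pattern to float by placing it in the upper
   16 bits of a binary32 pattern and zero-filling the rest.  */
static float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float result;
  memcpy (&result, &wide, sizeof result);
  return result;
}
```

For example, the pattern 0x3F80 has a biased exponent of 127 and a zero mantissa, so it decodes to 1.0.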

Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928



Re: [PATCH] openmp: fix UBSAN error at gcc/fortran/openmp.c:4737

2020-08-17 Thread Tobias Burnus

On 8/17/20 11:15 AM, Martin Liška wrote:


I'm suggesting one more clean up that uses static assert
instead of a run-time check.


I concur that compile-time checks are nicer.
LGTM – it should be able to catch this kind of mistake.

Tobias



Thoughts?
Martin

0001-opnemp-add-static-assert-for-clause_names.patch

 From c9aee2c44d5cf7e417d381988b2f4900e9ea8b05 Mon Sep 17 00:00:00 2001
From: Martin Liska
Date: Mon, 17 Aug 2020 11:14:13 +0200
Subject: [PATCH] opnemp: add static assert for clause_names.

gcc/fortran/ChangeLog:

  * openmp.c (resolve_omp_clauses): Add static assert
  for OMP_LIST_NUM and size of clause_names array.
  Remove check that is always true.
---
  gcc/fortran/openmp.c | 8 ++--
  1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 60d8e5573c2..4d33a450a33 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -4371,6 +4371,7 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
*omp_clauses,
  "TO", "FROM", "REDUCTION", "DEVICE_RESIDENT", "LINK", "USE_DEVICE",
  "CACHE", "IS_DEVICE_PTR", "USE_DEVICE_PTR", "USE_DEVICE_ADDR",
  "NONTEMPORAL" };
+  STATIC_ASSERT (ARRAY_SIZE (clause_names) == OMP_LIST_NUM);

if (omp_clauses == NULL)
  return;
@@ -4732,12 +4733,7 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
*omp_clauses,
for (list = 0; list < OMP_LIST_NUM; list++)
  if ((n = omp_clauses->lists[list]) != NULL)
{
- const char *name;
-
- if (list < OMP_LIST_NUM)
-   name = clause_names[list];
- else
-   gcc_unreachable ();
+ const char *name = clause_names[list];

  switch (list)
{
-- 2.28.0
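
Outside of GCC's own STATIC_ASSERT and ARRAY_SIZE macros, the same compile-time check can be sketched in plain C11. The enum, the name table and the TOY_ARRAY_SIZE macro below are hypothetical stand-ins, not the OpenMP clause list from openmp.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list kinds; TOY_LIST_NUM must stay last.  */
enum toy_list
{
  TOY_LIST_PRIVATE,
  TOY_LIST_SHARED,
  TOY_LIST_REDUCTION,
  TOY_LIST_NUM
};

/* One name per list kind, in enum order.  */
static const char *const toy_clause_names[] = {
  "PRIVATE", "SHARED", "REDUCTION"
};

#define TOY_ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

/* Fails to compile if an enumerator is added without a name (or the
   reverse), so indexing by any value below TOY_LIST_NUM is safe.  */
static_assert (TOY_ARRAY_SIZE (toy_clause_names) == TOY_LIST_NUM,
               "toy_clause_names is out of sync with enum toy_list");

/* No run-time bounds check is needed for values below TOY_LIST_NUM.  */
static const char *
toy_clause_name (enum toy_list list)
{
  return toy_clause_names[list];
}
```

Adding a fourth enumerator before TOY_LIST_NUM without extending the array turns into a build failure rather than an out-of-bounds read at run time.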

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


[PATCH v2] C-SKY: Support -mfloat-abi=hard.

2020-08-17 Thread Jojo R
gcc/ChangeLog:

* config/csky/csky.md (CSKY_NPARM_FREGS): New.
(call_value_internal_vs/d): New.
(untyped_call): New.
* config/csky/csky.h (TARGET_SINGLE_FPU): New.
(TARGET_DOUBLE_FPU): New.
(FUNCTION_VARG_REGNO_P): New.
(CSKY_VREG_MODE_P): New.
(FUNCTION_VARG_MODE_P): New.
(CUMULATIVE_ARGS): Add extra regs info.
(INIT_CUMULATIVE_ARGS): Use csky_init_cumulative_args.
(FUNCTION_ARG_REGNO_P): Use FUNCTION_VARG_REGNO_P.
* config/csky/csky-protos.h (csky_init_cumulative_args): Extern.
* config/csky/csky.c (csky_cpu_cpp_builtins): Support 
TARGET_HARD_FLOAT_ABI.
(csky_function_arg): Likewise.
(csky_num_arg_regs): Likewise.
(csky_function_arg_advance): Likewise.
(csky_function_value): Likewise.
(csky_libcall_value): Likewise.
(csky_function_value_regno_p): Likewise.
(csky_arg_partial_bytes): Likewise.
(csky_setup_incoming_varargs): Likewise.
(csky_init_cumulative_args): New.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-apply2.c : Skip if CSKY.
* gcc.dg/torture/stackalign/builtin-apply-2.c : Likewise.

---
 gcc/config/csky/csky-protos.h  |  2 +
 gcc/config/csky/csky.c | 96 +++---
 gcc/config/csky/csky.h | 34 ++--
 gcc/config/csky/csky.md| 84 +++
 gcc/testsuite/gcc.dg/builtin-apply2.c  |  2 +-
 .../gcc.dg/torture/stackalign/builtin-apply-2.c|  2 +-
 6 files changed, 200 insertions(+), 20 deletions(-)

diff --git a/gcc/config/csky/csky-protos.h b/gcc/config/csky/csky-protos.h
index cc1a033..2c02399 100644
--- a/gcc/config/csky/csky-protos.h
+++ b/gcc/config/csky/csky-protos.h
@@ -68,4 +68,6 @@ extern int csky_compute_pushpop_length (rtx *);
 
 extern int csky_default_branch_cost (bool, bool);
 extern bool csky_default_logical_op_non_short_circuit (void);
+
+extern void csky_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
 #endif /* GCC_CSKY_PROTOS_H */
diff --git a/gcc/config/csky/csky.c b/gcc/config/csky/csky.c
index 7ba3ed3..b71291a 100644
--- a/gcc/config/csky/csky.c
+++ b/gcc/config/csky/csky.c
@@ -328,6 +328,16 @@ csky_cpu_cpp_builtins (cpp_reader *pfile)
 {
   builtin_define ("__csky_hard_float__");
   builtin_define ("__CSKY_HARD_FLOAT__");
+  if (TARGET_HARD_FLOAT_ABI)
+{
+  builtin_define ("__csky_hard_float_abi__");
+  builtin_define ("__CSKY_HARD_FLOAT_ABI__");
+}
+  if (TARGET_SINGLE_FPU)
+{
+  builtin_define ("__csky_hard_float_fpu_sf__");
+  builtin_define ("__CSKY_HARD_FLOAT_FPU_SF__");
+}
 }
   else
 {
@@ -1790,9 +1800,22 @@ static rtx
 csky_function_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
 {
   CUMULATIVE_ARGS *pcum = get_cumulative_args (pcum_v);
+  int reg = pcum->reg;
+  machine_mode mode = arg.mode;
 
-  if (*pcum < CSKY_NPARM_REGS)
-return gen_rtx_REG (arg.mode, CSKY_FIRST_PARM_REGNUM + *pcum);
+  if (FUNCTION_VARG_MODE_P(mode)
+  && !pcum->is_stdarg)
+{
+  reg = pcum->freg;
+
+  if (reg < CSKY_NPARM_FREGS)
+return gen_rtx_REG (mode, CSKY_FIRST_VFP_REGNUM + reg);
+  else
+return NULL_RTX;
+}
+
+  if (reg < CSKY_NPARM_REGS)
+return gen_rtx_REG (mode, CSKY_FIRST_PARM_REGNUM + reg);
 
   return NULL_RTX;
 }
@@ -1802,7 +1825,7 @@ csky_function_arg (cumulative_args_t pcum_v, const 
function_arg_info )
MODE and TYPE.  */
 
 static int
-csky_num_arg_regs (machine_mode mode, const_tree type)
+csky_num_arg_regs (machine_mode mode, const_tree type, bool is_stdarg)
 {
   int size;
 
@@ -1811,6 +1834,14 @@ csky_num_arg_regs (machine_mode mode, const_tree type)
   else
 size = GET_MODE_SIZE (mode);
 
+  if (TARGET_HARD_FLOAT_ABI
+  && !is_stdarg)
+{
+  if (CSKY_VREG_MODE_P(mode)
+  && !TARGET_SINGLE_FPU)
+return ((CSKY_NUM_WORDS (size) + 1) / 2);
+}
+
   return CSKY_NUM_WORDS (size);
 }
 
@@ -1822,12 +1853,23 @@ csky_function_arg_advance (cumulative_args_t pcum_v,
   const function_arg_info &arg)
 {
   CUMULATIVE_ARGS *pcum = get_cumulative_args (pcum_v);
-  int param_size = csky_num_arg_regs (arg.mode, arg.type);
+  int *reg = &pcum->reg;
+  machine_mode mode = arg.mode;
+
+  int param_size = csky_num_arg_regs (mode, arg.type, pcum->is_stdarg);
+  int param_regs_nums = CSKY_NPARM_REGS;
+
+  if (FUNCTION_VARG_MODE_P(mode)
+  && !pcum->is_stdarg)
+{
+  reg = &pcum->freg;
+  param_regs_nums = CSKY_NPARM_FREGS;
+}
 
-  if (*pcum + param_size > CSKY_NPARM_REGS)
-*pcum = CSKY_NPARM_REGS;
+  if (*reg + param_size > param_regs_nums)
+*reg = param_regs_nums;
   else
-*pcum += param_size;
+*reg += param_size;
 }
 
 
@@ -1843,6 +1885,12 @@ csky_function_value (const_tree type, const_tree func,
   mode = TYPE_MODE (type);
   size 

Re: [PATCH] x86_64: PR rtl-optimization/92180: class_likely_spilled vs. cant_combine_insn.

2020-08-17 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 17, 2020 at 12:42 PM Roger Sayle  wrote:
>
>
> This patch catches a missed optimization opportunity where GCC currently
> generates worse code than LLVM.  The issue, as nicely analyzed in bugzilla,
> boils down to the following three insns in combine:
>
> (insn 6 5 7 2 (parallel [
> (set (reg:DI 85)
> (ashift:DI (reg:DI 85)
> (const_int 32 [0x20])))
> (clobber (reg:CC 17 flags))
> ]) "pr92180.c":4:10 564 {*ashldi3_1}
>  (expr_list:REG_UNUSED (reg:CC 17 flags)
> (nil)))
> (insn 7 6 14 2 (parallel [
> (set (reg:DI 84)
> (ior:DI (reg:DI 84)
> (reg:DI 85)))
> (clobber (reg:CC 17 flags))
> ]) "pr92180.c":4:10 454 {*iordi_1}
>  (expr_list:REG_DEAD (reg:DI 85)
> (expr_list:REG_UNUSED (reg:CC 17 flags)
> (nil))))
> (insn 14 7 15 2 (set (reg/i:SI 0 ax)
> (subreg:SI (reg:DI 84) 0)) "pr92180.c":5:1 67 {*movsi_internal}
>  (expr_list:REG_DEAD (reg:DI 84)
> (nil)))
>
> Normally, combine/simplify-rtx would notice that insns 6 and 7
> (which update highpart bits) are unnecessary as the final insn 14
> only requires the lowpart bits.  The complication is that insn 14
> sets a hard register in targetm.class_likely_spilled_p which
> prevents combine from performing its simplifications, and removing
> the redundant instructions.
>
> At first glance a fix would appear to require changes to combine,
> potentially affecting code generation on all small register class
> targets...  An alternate (and I think clever) solution is to spot
> that this problematic situation can be avoided by the backend.
>
> At RTL expansion time, the middle-end has a clear separation between
> pseudos and hard registers, so the RTL initially contains:
>
> (insn 9 8 10 2 (set (reg:SI 86)
> (subreg:SI (reg:DI 82 [ _1 ]) 0)) "pr92180.c":6:10 -1
>  (nil))
> (insn 10 9 14 2 (set (reg:SI 83 [  ])
> (reg:SI 86)) "pr92180.c":6:10 -1
>  (nil))
> (insn 14 10 15 2 (set (reg/i:SI 0 ax)
> (reg:SI 83 [  ])) "pr92180.c":7:1 -1
>  (nil))
>
> which can be optimized without problems by combine; it is only the
> intervening passes (initially fwprop1) that propagate computations
> into sets of hard registers, and disable those opportunities.
>
> The solution proposed here is to have the x86 backend/recog prevent
> early RTL passes composing instructions (that set likely_spilled hard
> registers) that they (combine) can't simplify, until after reload.
> We allow sets from pseudo registers, immediate constants and memory
> accesses, but anything more complicated is performed via a temporary
> pseudo.  Not only does this simplify things for the register allocator,
> but any remaining register-to-register moves are easily cleaned up
> by the late optimization passes after reload, such as peephole2 and
> cprop_hardreg.
>
> This patch has been tested on x86_64-pc-linux-gnu with a
> "make bootstrap" and a "make -k check" with no new failures.
> Ok for mainline?

I think that fwprop interferes with the recent change to combine, where
combine won't propagate hard registers anymore. So, following that
change, there is no point for fwprop to create instructions that
combine won't be able to process. Alternatively, perhaps fwprop should
be prevented from propagating likely_spilled hard registers?

Let's ask Segher for his opinion.

Uros.

>
>
> 2020-08-17  Roger Sayle  
>
> gcc/ChangeLog
> PR rtl-optimization/92180
> * config/i386/i386.c (ix86_hardreg_mov_ok): New function to
> determine whether (set DST SRC) should be allowed at this point.
> * config/i386/i386-protos.h (ix86_hardreg_mov_ok): Prototype here.
> * config/i386/i386-expand.c (ix86_expand_move): Check whether
> this is a complex set of a likely spilled hard register, and if
> so place the value in a pseudo, and load the hard reg from it.
> * config/i386/i386.md (*movdi_internal, *movsi_internal,
> *movhi_internal, *movqi_internal): Make these instructions
> conditional on ix86_hardreg_mov_ok.
> (*lea): Make this define_insn_and_split conditional on
> ix86_hardreg_mov_ok.
>
> gcc/testsuite/ChangeLog
> PR rtl-optimization/92180
> * gcc.target/i386/pr92180.c: New test.
>
>
> Thanks in advance,
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>


[PATCH] x86_64: PR rtl-optimization/92180: class_likely_spilled vs. cant_combine_insn.

2020-08-17 Thread Roger Sayle

This patch catches a missed optimization opportunity where GCC currently
generates worse code than LLVM.  The issue, as nicely analyzed in bugzilla,
boils down to the following three insns in combine:

(insn 6 5 7 2 (parallel [
(set (reg:DI 85)
(ashift:DI (reg:DI 85)
(const_int 32 [0x20])))
(clobber (reg:CC 17 flags))
]) "pr92180.c":4:10 564 {*ashldi3_1}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(insn 7 6 14 2 (parallel [
(set (reg:DI 84)
(ior:DI (reg:DI 84)
(reg:DI 85)))
(clobber (reg:CC 17 flags))
]) "pr92180.c":4:10 454 {*iordi_1}
 (expr_list:REG_DEAD (reg:DI 85)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil))))
(insn 14 7 15 2 (set (reg/i:SI 0 ax)
(subreg:SI (reg:DI 84) 0)) "pr92180.c":5:1 67 {*movsi_internal}
 (expr_list:REG_DEAD (reg:DI 84)
(nil)))

Normally, combine/simplify-rtx would notice that insns 6 and 7
(which update highpart bits) are unnecessary as the final insn 14
only requires the lowpart bits.  The complication is that insn 14
sets a hard register in targetm.class_likely_spilled_p which
prevents combine from performing its simplifications, and removing
the redundant instructions.
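
The redundancy can be restated at the C level. The sketch below is not the pr92180.c test case; the function names are made up for illustration. It mirrors the three insns (shift one 64-bit value left by 32, OR it into another, keep the low 32 bits of the result) and shows that the shift and the OR cannot affect what is kept.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors insns 6, 7 and 14: HI is shifted into the upper half,
   ORed into LO, and only the low 32 bits of the result are used.  */
static uint32_t
low_part_long_way (uint64_t lo, uint64_t hi)
{
  uint64_t combined = lo | (hi << 32);
  return (uint32_t) combined;
}

/* What is left once the highpart updates are dropped.  */
static uint32_t
low_part_short_way (uint64_t lo)
{
  return (uint32_t) lo;
}

/* The two forms agree for any inputs, because (hi << 32) has all of
   its low 32 bits clear.  */
static void
check_low_part (uint64_t lo, uint64_t hi)
{
  assert (low_part_long_way (lo, hi) == low_part_short_way (lo));
}
```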

At first glance a fix would appear to require changes to combine,
potentially affecting code generation on all small register class
targets...  An alternate (and I think clever) solution is to spot
that this problematic situation can be avoided by the backend.

At RTL expansion time, the middle-end has a clear separation between
pseudos and hard registers, so the RTL initially contains:

(insn 9 8 10 2 (set (reg:SI 86)
(subreg:SI (reg:DI 82 [ _1 ]) 0)) "pr92180.c":6:10 -1
 (nil))
(insn 10 9 14 2 (set (reg:SI 83 [  ])
(reg:SI 86)) "pr92180.c":6:10 -1
 (nil))
(insn 14 10 15 2 (set (reg/i:SI 0 ax)
(reg:SI 83 [  ])) "pr92180.c":7:1 -1
 (nil))

which can be optimized without problems by combine; it is only the
intervening passes (initially fwprop1) that propagate computations
into sets of hard registers, and disable those opportunities.

The solution proposed here is to have the x86 backend/recog prevent
early RTL passes composing instructions (that set likely_spilled hard
registers) that they (combine) can't simplify, until after reload.
We allow sets from pseudo registers, immediate constants and memory
accesses, but anything more complicated is performed via a temporary
pseudo.  Not only does this simplify things for the register allocator,
but any remaining register-to-register moves are easily cleaned up
by the late optimization passes after reload, such as peephole2 and
cprop_hardreg.

This patch has been tested on x86_64-pc-linux-gnu with a
"make bootstrap" and a "make -k check" with no new failures.
Ok for mainline?


2020-08-17  Roger Sayle  

gcc/ChangeLog
PR rtl-optimization/92180
* config/i386/i386.c (ix86_hardreg_mov_ok): New function to
determine whether (set DST SRC) should be allowed at this point.
* config/i386/i386-protos.h (ix86_hardreg_mov_ok): Prototype here.
* config/i386/i386-expand.c (ix86_expand_move): Check whether
this is a complex set of a likely spilled hard register, and if
so place the value in a pseudo, and load the hard reg from it.
* config/i386/i386.md (*movdi_internal, *movsi_internal,
*movhi_internal, *movqi_internal): Make these instructions
conditional on ix86_hardreg_mov_ok.
(*lea): Make this define_insn_and_split conditional on
ix86_hardreg_mov_ok.

gcc/testsuite/ChangeLog
PR rtl-optimization/92180
* gcc.target/i386/pr92180.c: New test.


Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index f441ba9..e6e4433 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -190,6 +190,17 @@ ix86_expand_move (machine_mode mode, rtx operands[])
   op0 = operands[0];
   op1 = operands[1];
 
+  /* Avoid complex sets of likely spilled hard registers before reload.  */
+  if (!ix86_hardreg_mov_ok (op0, op1))
+{
+  tmp = gen_reg_rtx (mode);
+  operands[0] = tmp;
+  ix86_expand_move (mode, operands);
+  operands[0] = op0;
+  operands[1] = tmp;
+  op1 = tmp;
+}
+
   switch (GET_CODE (op1))
 {
 case CONST:
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index b6088f2..a10bc56 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -161,6 +161,7 @@ extern rtx ix86_find_base_term (rtx);
 extern bool ix86_check_movabs (rtx, int);
 extern bool ix86_check_no_addr_space (rtx);
 extern void ix86_split_idivmod (machine_mode, rtx[], bool);
+extern bool ix86_hardreg_mov_ok (rtx, rtx);
 
 extern rtx assign_386_stack_local (machine_mode, enum 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-17 Thread Uros Bizjak via Gcc-patches
On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu  wrote:
>
> Enable operator or/xor/and/andn/not for mask register, kxnor is not
> enabled since there's no corresponding instruction for general
> registers.
>
> gcc/
> PR target/88808
> * config/i386/i386.md: (*movsi_internal): Adjust constraints
> for mask registers.
> (*movhi_internal): Ditto.
> (*movqi_internal): Ditto.
> (*anddi_1): Support mask register operations
> (*and_1): Ditto.
> (*andqi_1): Ditto.
> (*andn_1): Ditto.
> (*_1): Ditto.
> (*qi_1): Ditto.
> (*one_cmpl2_1): Ditto.
> (*one_cmplsi2_1_zext): Ditto.
> (*one_cmplqi2_1): Ditto.
>
> gcc/testsuite/
> * gcc.target/i386/bitwise_mask_op-1.c: New test.
> * gcc.target/i386/bitwise_mask_op-2.c: New test.
> * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> * gcc.target/i386/avx512f-kmovw-5.c: Ditto.

index 74d207c3711..e8ad79d1b0a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2294,7 +2294,7 @@

 (define_insn "*movsi_internal"
   [(set (match_operand:SI 0 "nonimmediate_operand"
-"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
+"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
 (match_operand:SI 1 "general_operand"
 "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"

I'd rather see *k everywhere, also with *movqi_internal and
*movhi_internal patterns. The "*" means that the allocator won't
allocate a mask register by default, but it will be used to optimize
moves. With the above change, you are risking that during integer
register pressure, the register allocator will allocate zero to a mask
register, and later "optimize" the move with a direct maskreg-intreg
move.

The current strategy is that only general registers get allocated for
integer modes. Let's keep it this way for now.

Otherwise, the patchset LGTM, but please test the suggested changes and repost.

BTW: Do you plan to remove mask operations from sse.md? ATM, they are
used to distinguish mask operations generated from builtins from
generic operations, so I'd like to keep them for a while. The drawback
is that they are not combined with other operations, but at the end
of the day, this is what the programmer asked for by using builtins.

Uros.


Re: PING: Fwd: [PATCH 2/2] Decouple adjust_range_from_scev from vr_values and value_range_equiv.

2020-08-17 Thread Aldy Hernandez via Gcc-patches



On 8/14/20 7:16 PM, Andrew MacLeod wrote:

On 8/14/20 12:05 PM, Aldy Hernandez wrote:

I made some minor changes to the function comments.

gcc/ChangeLog:

* vr-values.c (check_for_binary_op_overflow): Change type of store
to range_query.
(vr_values::adjust_range_with_scev): Abstract most of the code...
(range_of_var_in_loop): ...here.  Remove value_range_equiv uses.
(simplify_using_ranges::simplify_using_ranges): Change type of store
to range_query.
* vr-values.h (class range_query): New.
(class simplify_using_ranges): Use range_query.
(class vr_values): Add OVERRIDE to get_value_range.
(range_of_var_in_loop): New.
---
 gcc/vr-values.c | 150 ++--
 gcc/vr-values.h |  23 ++--
 2 files changed, 88 insertions(+), 85 deletions(-)

diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index 9002d87c14b..5b7bae3bfb7 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -1004,7 +1004,7 @@ vr_values::extract_range_from_comparison 
(value_range_equiv *vr,

    overflow.  */

 static bool
-check_for_binary_op_overflow (vr_values *store,
+check_for_binary_op_overflow (range_query *store,
   enum tree_code subcode, tree type,
   tree op0, tree op1, bool *ovf)
 {
@@ -1737,22 +1737,18 @@ compare_range_with_value (enum tree_code comp, 
const value_range *vr,


   gcc_unreachable ();
 }
-/* Given a range VR, a LOOP and a variable VAR, determine whether it
-   would be profitable to adjust VR using scalar evolution information
-   for VAR.  If so, update VR with the new limits.  */
+
+/* Given a VAR in STMT within LOOP, determine the range of the
+   variable and store it in VR.  If no range can be determined, the
+   resulting range will be set to VARYING.  */

 void
-vr_values::adjust_range_with_scev (value_range_equiv *vr, class loop 
*loop,

-   gimple *stmt, tree var)
+range_of_var_in_loop (irange *vr, range_query *query,
+  class loop *loop, gimple *stmt, tree var)
 {
-  tree init, step, chrec, tmin, tmax, min, max, type, tem;
+  tree init, step, chrec, tmin, tmax, min, max, type;
   enum ev_direction dir;

-  /* TODO.  Don't adjust anti-ranges.  An anti-range may provide
- better opportunities than a regular range, but I'm not sure.  */
-  if (vr->kind () == VR_ANTI_RANGE)
-    return;
-


IIUC, you've switched to using the new API, so the bounds calls will 
basically turn an ANTI range into a varying, so [lbound,ubound] 
will be [MIN, MAX]?
So it's effectively a no-op, except we will not punt on getting a range 
when VR is an anti range anymore, so that's goodness.


Yes.




chrec = instantiate_parameters (loop, analyze_scalar_evolution (loop, 
var));


   /* Like in PR19590, scev can return a constant function.  */
@@ -1763,16 +1759,17 @@ vr_values::adjust_range_with_scev 
(value_range_equiv *vr, class loop *loop,

 }

   if (TREE_CODE (chrec) != POLYNOMIAL_CHREC)
-    return;
+    {
+  vr->set_varying (TREE_TYPE (var));
+  return;
+    }


I'm seeing a lot of this pattern...
Maybe we should set vr to varying upon entry to the function as the 
default return value; then we can just return like it did before in all 
those places.


Better yet, since this routine doesn't "update" anymore and simply 
returns a range, maybe it could instead return a boolean if it finds a 
range rather than the current behaviour...

then those simply become

+    return false;

We won't have to intersect at the caller if we don't need to, and it's 
useful information at other points to know a range was calculated 
without having to see if varying_p () came back from the call.

i.e., the usage pattern would then be

value_range_equiv r;
if (range_of_var_in_loop (&r, this, loop, stmt, var))
    vr->intersect (&r);

This is the pattern we use throughout the ranger.
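
A minimal, self-contained sketch of the "return a boolean, intersect only
on success" pattern described above. The types and names (simple_range,
range_of_var, refine) are hypothetical stand-ins for illustration and are
not GCC's irange or range_query API.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a value range; not GCC's irange.
struct simple_range
{
  long lo;
  long hi;

  // Narrow this range to its overlap with OTHER.
  void intersect (const simple_range &other)
  {
    lo = std::max (lo, other.lo);
    hi = std::min (hi, other.hi);
  }
};

// Hypothetical query: returns true and fills R only when something
// better than "varying" is known.  The analysis is faked with a flag.
bool range_of_var (simple_range *r, bool analyzable)
{
  if (!analyzable)
    return false;   // Nothing useful found; R is left untouched.
  r->lo = 0;
  r->hi = 10;
  return true;
}

// Caller side: intersect only when a range was actually computed.
simple_range refine (simple_range vr, bool analyzable)
{
  simple_range r;
  if (range_of_var (&r, analyzable))
    vr.intersect (r);
  return vr;
}
```

With this shape, every early exit in the query is a plain "return false",
and the caller never pays for an intersect with a varying range.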


Done.






   init = initial_condition_in_loop_num (chrec, loop->num);
-  tem = op_with_constant_singleton_value_range (init);
-  if (tem)
-    init = tem;
+  if (TREE_CODE (init) == SSA_NAME)
+    query->get_value_range (init, stmt)->singleton_p ();
   step = evolution_part_in_loop_num (chrec, loop->num);
-  tem = op_with_constant_singleton_value_range (step);
-  if (tem)
-    step = tem;
+  if (TREE_CODE (step) == SSA_NAME)
+    query->get_value_range (step, stmt)->singleton_p ();


If I read this correctly, we get values for init and step, and if they 
are SSA_NAMEs, then we query ranges; otherwise we use what we got back. 
So that would seem to be the same behaviour as before.

Perhaps a comment is warranted? I had to read it a few times :-)


Indeed.  I am trying to do too much in one line.  I've added a comment.






   /* If STEP is symbolic, we can't know whether INIT will be the
  minimum or maximum value in the range.  Also, unless INIT is
@@ -1781,7 +1778,10 @@ vr_values::adjust_range_with_scev 
(value_range_equiv *vr, class loop *loop,

   if (step == NULL_TREE
   || 

Re: [PATCH 3/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-17 Thread Uros Bizjak via Gcc-patches
On Fri, Aug 14, 2020 at 10:25 AM Hongtao Liu  wrote:
>
> 1. Set cost of movement inside mask registers a bit higher than gpr's.
> 2. Set cost of movement between mask register and gpr much higher than
>    movement inside gpr, but still less than or equal to load/store.
> 3. Set cost of mask register load/store a bit higher than gpr load/store.

I have no comment here (fine tuning costs is a painful task ;) )

Uros.


Re: [PATCH 2/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-17 Thread Uros Bizjak via Gcc-patches
On Fri, Aug 14, 2020 at 10:24 AM Hongtao Liu  wrote:
>
>   Enable direct move between masks and gprs in pass_reload with
> consideration of cost model.
>
> Changelog
> gcc/
> * config/i386/i386.c (inline_secondary_memory_needed):
> No memory is needed between mask regs and gpr.
> (ix86_hard_regno_mode_ok): Add condition TARGET_AVX512F for
> mask regno.
> * config/i386/i386.h (enum reg_class): Add INT_MASK_REGS.
> (REG_CLASS_NAMES): Ditto.
> (REG_CLASS_CONTENTS): Ditto.
> * config/i386/i386.md: Exclude mask register in
> define_peephole2 which is available only for gpr.
>
> gcc/testsuite/
> * gcc.target/i386/pr71453-1.c: New tests.
> * gcc.target/i386/pr71453-2.c: Ditto.
> * gcc.target/i386/pr71453-3.c: Ditto.
> * gcc.target/i386/pr71453-4.c: Ditto.

@@ -18571,9 +18571,7 @@ inline_secondary_memory_needed (machine_mode
mode, reg_class_t class1,
   || MAYBE_SSE_CLASS_P (class1) != SSE_CLASS_P (class1)
   || MAYBE_SSE_CLASS_P (class2) != SSE_CLASS_P (class2)
   || MAYBE_MMX_CLASS_P (class1) != MMX_CLASS_P (class1)
-  || MAYBE_MMX_CLASS_P (class2) != MMX_CLASS_P (class2)
-  || MAYBE_MASK_CLASS_P (class1) != MASK_CLASS_P (class1)
-  || MAYBE_MASK_CLASS_P (class2) != MASK_CLASS_P (class2))
+  || MAYBE_MMX_CLASS_P (class2) != MMX_CLASS_P (class2))
 {
   gcc_assert (!strict || lra_in_progress);
   return true;

No, this is still needed, the reason is explained in the comment above
inline_secondary_memory_needed:

   The function can't work reliably when one of the CLASSES is a class
   containing registers from multiple sets.  We avoid this by never combining
   different sets in a single alternative in the machine description.
   Ensure that this constraint holds to avoid unexpected surprises.

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b24a4557871..74d207c3711 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15051,7 +15051,7 @@
(parallel [(set (reg:CC FLAGS_REG)
(unspec:CC [(match_dup 0)] UNSPEC_PARITY))
   (clobber (match_dup 0))])]
-  ""
+  "!MASK_REGNO_P (REGNO (operands[0]))"
   [(set (reg:CC FLAGS_REG)
 (unspec:CC [(match_dup 1)] UNSPEC_PARITY))])

@@ -15072,6 +15072,7 @@
(label_ref (match_operand 5))
(pc)))]
   "REGNO (operands[2]) == REGNO (operands[3])
+   && !MASK_REGNO_P (REGNO (operands[1]))
&& peep2_reg_dead_p (3, operands[0])
&& peep2_reg_dead_p (3, operands[2])
&& peep2_regno_dead_p (4, FLAGS_REG)"

Actually, there are several (historic?) peephole2 patterns that assume
register_operand means only integer registers. Just change
register_operand to general_reg_operand and, where needed,
nonimmediate_operand to nonimmediate_gr_operand. Do not put additional
predicates into the insn predicate.

Uros.


Re: [PATCH 1/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-17 Thread Uros Bizjak via Gcc-patches
On Fri, Aug 14, 2020 at 10:22 AM Hongtao Liu  wrote:
>
> Hi:
>   First, since AVX512 masks involve both the vector ISA and the general
> part, I have added both maintainers to the mailing list.
>
>   I'm doing this in 4 steps:
>   1 - Add cost model for operation of mask registers.
>   2 - Introduce new cover class INT_MASK_REGS; this will enable direct
> moves between gpr and mask registers in pass_reload, taking the cost
> model into account. This is similar to INT_SSE_REGS.
>   3 - Tune cost model.
>   4 - Enable operators or/xor/and/andn/not for mask registers. kxnor is
> not enabled since there's no corresponding instruction for general
> registers; 64-bit mask ops are not enabled for 32-bit targets.
> kadd/kshift/ktest are not merged into the general versions add/ashl/test
> since I think it would be odd to use mask registers for those
> operations.
>
>   Bootstrap is ok, regression test is ok for i386/x86-64.
>   There's some performance improvement on SPEC2017 tested on SKL;
> I observe many spills from integer registers to mask registers instead
> of to memory, which is the reason for the improvement.

+  if (MASK_CLASS_P (regclass))
+{
+  int index;
+  switch (GET_MODE_SIZE (mode))
+{
+case 1:
+  index = 0;
+  break;
+case 2:
+  index = 1;
+  break;
+default:
+  index = 3;

Max index = 2!

+  break;
+}
+
+  if (in == 2)
+return MAX (ix86_cost->hard_register.mask_load[index],
+ix86_cost->hard_register.mask_store[index]);
+  return in ? ix86_cost->hard_register.mask_load[2]
+: ix86_cost->hard_register.mask_store[2];
+}

Are DImode loads and stores assumed to cost the same as SImode? A
comment would be nice here.
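
To make the "Max index = 2!" remark concrete, here is a small sketch of a
size-to-index mapping that stays inside a three-entry table. The table
contents and the names example_mask_load, mask_cost_index and
example_load_cost are invented for illustration; only the three-entry
shape is taken from the review comment.

```cpp
#include <cassert>

// Hypothetical three-entry cost table, indexed 0..2.  The values are
// made up and do not come from any real cost model.
static const int example_mask_load[3] = { 4, 4, 6 };

// Map an operand size in bytes to a valid table index.  Every size
// above 2 bytes shares the last slot, so the result never exceeds 2.
int mask_cost_index (int size_in_bytes)
{
  switch (size_in_bytes)
    {
    case 1:
      return 0;
    case 2:
      return 1;
    default:
      return 2;
    }
}

// Look up a load cost without ever reading past the end of the table.
int example_load_cost (int size_in_bytes)
{
  return example_mask_load[mask_cost_index (size_in_bytes)];
}
```

Note that in this sketch 4-byte and 8-byte operands land in the same slot,
which is exactly the SImode/DImode question raised above.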

Uros.


Re: PING: Fwd: [PATCH 1/2] Add statement context to get_value_range.

2020-08-17 Thread Aldy Hernandez via Gcc-patches



On 8/14/20 6:03 PM, Andrew MacLeod wrote:

On 8/11/20 7:53 AM, Aldy Hernandez via Gcc-patches wrote:

-- Forwarded message -
From: Aldy Hernandez 
Date: Tue, Aug 4, 2020, 13:55
Subject: [PATCH 1/2] Add statement context to get_value_range.
To: 
Cc: , Aldy Hernandez 


This is in line with the statement context that we have for get_value()
in the substitute_and_fold_engine class.
---
  gcc/vr-values.c | 64 ++---
  gcc/vr-values.h | 14 +--
  2 files changed, 41 insertions(+), 37 deletions(-)

diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index 511342f2f13..9002d87c14b 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -147,7 +147,8 @@ vr_values::get_lattice_entry (const_tree var)
 return NULL.  Otherwise create an empty range if none existed for 
VAR.

*/

  const value_range_equiv *
-vr_values::get_value_range (const_tree var)
+vr_values::get_value_range (const_tree var,
+   gimple *stmt ATTRIBUTE_UNUSED)
  {
    /* If we have no recorded ranges, then return NULL.  */
    if (!vr_value)
@@ -450,7 +451,7 @@ simplify_using_ranges::op_with_boolean_value_range_p
(tree op)

    /* ?? Errr, this should probably check for [0,0] and [1,1] as well
   as [0,1].  */
-  const value_range *vr = get_value_range (op);
+  const value_range *vr = get_value_range (op, NULL);
    return *vr == value_range (build_zero_cst (TREE_TYPE (op)),
  build_one_cst (TREE_TYPE (op)));
  }


I think if we are adding "gimple *stmt" as a parameter, we should make 
it default to NULL.  Then we won't have to change all the callers that 
don't have a need for it.
I get that it helped us find all the places where stmts were 
available/needed originally, but I think that need is no longer relevant 
and we can revert to making it a default parameter now.
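
As a rough illustration of the default-parameter suggestion, the sketch
below uses hypothetical stand-ins (stmt_ctx, query_range_width) rather
than the real vr_values interface. Callers without a statement simply
omit the argument.

```cpp
#include <cassert>

// Hypothetical stand-in for a statement context; not GCC's gimple.
struct stmt_ctx { int id; };

// The context argument defaults to a null pointer, so existing callers
// that have no statement at hand need no change.
int query_range_width (int var, const stmt_ctx *stmt = nullptr)
{
  // Pretend that having a statement lets us give a narrower answer.
  return stmt ? var : var * 2;
}
```

Old call sites such as query_range_width (3) keep compiling, while new
ones can pass a context where one is available.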


Done.



Furthermore, I don't think it should be ATTRIBUTE_UNUSED and then 
pass a NULL further down :)  We should be able to pass stmt.



@@ -972,12 +973,13 @@ vr_values::extract_range_from_cond_expr
(value_range_equiv *vr, gassign *stmt)

  void
  vr_values::extract_range_from_comparison (value_range_equiv *vr,
+ gimple *stmt,
   enum tree_code code,
   tree type, tree op0, tree op1)


Now that we are passing stmt in, and there is only one use of this 
function, I think you can kill the final 4 parameters and just get them 
in the function itself...


Done.




  {
    bool sop;
    tree val
-    = simplifier.vrp_evaluate_conditional_warnv_with_ops (code, op0, 
op1,
+    = simplifier.vrp_evaluate_conditional_warnv_with_ops (stmt, code, 
op0,

op1,
   false, ,
NULL);
    if (val)
  {
@@ -1008,14 +1010,14 @@ check_for_binary_op_overflow (vr_values *store,
  {
    value_range vr0, vr1;
    if (TREE_CODE (op0) == SSA_NAME)
-    vr0 = *store->get_value_range (op0);
+    vr0 = *store->get_value_range (op0, NULL);
    else if (TREE_CODE (op0) == INTEGER_CST)
  vr0.set (op0);
    else
  vr0.set_varying (TREE_TYPE (op0));

    if (TREE_CODE (op1) == SSA_NAME)
-    vr1 = *store->get_value_range (op1);
+    vr1 = *store->get_value_range (op1, NULL);
    else if (TREE_CODE (op1) == INTEGER_CST)
  vr1.set (op1);
    else
@@ -1472,7 +1474,7 @@ vr_values::extract_range_from_assignment
(value_range_equiv *vr, gassign *stmt)
    else if (code == COND_EXPR)
  extract_range_from_cond_expr (vr, stmt);
    else if (TREE_CODE_CLASS (code) == tcc_comparison)
-    extract_range_from_comparison (vr, gimple_assign_rhs_code (stmt),
+    extract_range_from_comparison (vr, stmt, gimple_assign_rhs_code 
(stmt),

    gimple_expr_type (stmt),
    gimple_assign_rhs1 (stmt),
    gimple_assign_rhs2 (stmt));
@@ -1805,7 +1807,7 @@ vr_values::adjust_range_with_scev 
(value_range_equiv

*vr, class loop *loop,
    if (TREE_CODE (step) == INTEGER_CST
    && is_gimple_val (init)
    && (TREE_CODE (init) != SSA_NAME
- || get_value_range (init)->kind () == VR_RANGE))
+ || get_value_range (init, stmt)->kind () == VR_RANGE))
  {
    widest_int nit;

@@ -1838,7 +1840,7 @@ vr_values::adjust_range_with_scev 
(value_range_equiv

*vr, class loop *loop,
   value_range initvr;

   if (TREE_CODE (init) == SSA_NAME)
-   initvr = *(get_value_range (init));
+   initvr = *(get_value_range (init, stmt));
   else if (is_gimple_min_invariant (init))
 initvr.set (init);
   else
@@ -2090,7 +2092,7 @@ const value_range_equiv *
  simplify_using_ranges::get_vr_for_comparison (int i, value_range_equiv
*tem)
  {
    /* Shallow-copy equiv bitmap.  */
-  const value_range_equiv *vr = 

Re: [PATCH] openmp: fix UBSAN error at gcc/fortran/openmp.c:4737

2020-08-17 Thread Martin Liška

On 8/17/20 10:52 AM, Tobias Burnus wrote:

LGTM & thanks! – Sorry for missing it.


That happens.


(I re-checked against the OMP_LIST_* enum and
it seems to be only missing one.)


Good.

I'm suggesting one more clean up that uses static assert
instead of a run-time check.
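
For readers unfamiliar with the idea, a compile-time check of this kind
can be sketched as follows. The enum and table are shortened, hypothetical
examples (example_list, example_names), and standard static_assert is used
here in place of GCC's STATIC_ASSERT macro.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, shortened enum and name table for illustration only.
enum example_list
{
  EX_LIST_PRIVATE,
  EX_LIST_SHARED,
  EX_LIST_NONTEMPORAL,
  EX_LIST_NUM   // Number of real entries above.
};

static const char *const example_names[]
  = { "PRIVATE", "SHARED", "NONTEMPORAL" };

// Fails to compile if an enumerator is added without a matching name.
static_assert (sizeof (example_names) / sizeof (example_names[0])
               == EX_LIST_NUM,
               "example_names must have one entry per example_list value");

// With the assertion in place, indexing by a valid enumerator is safe.
const char *example_name (example_list list)
{
  return example_names[list];
}
```

Adding a new enumerator before EX_LIST_NUM without extending the table
would then be rejected at build time instead of reading out of bounds.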

Thoughts?
Martin
From c9aee2c44d5cf7e417d381988b2f4900e9ea8b05 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 17 Aug 2020 11:14:13 +0200
Subject: [PATCH] openmp: add static assert for clause_names.

gcc/fortran/ChangeLog:

	* openmp.c (resolve_omp_clauses): Add static assert
	for OMP_LIST_NUM and size of clause_names array.
	Remove check that is always true.
---
 gcc/fortran/openmp.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 60d8e5573c2..4d33a450a33 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -4371,6 +4371,7 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 	"TO", "FROM", "REDUCTION", "DEVICE_RESIDENT", "LINK", "USE_DEVICE",
 	"CACHE", "IS_DEVICE_PTR", "USE_DEVICE_PTR", "USE_DEVICE_ADDR",
 	"NONTEMPORAL" };
+  STATIC_ASSERT (ARRAY_SIZE (clause_names) == OMP_LIST_NUM);
 
   if (omp_clauses == NULL)
 return;
@@ -4732,12 +4733,7 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
   for (list = 0; list < OMP_LIST_NUM; list++)
 if ((n = omp_clauses->lists[list]) != NULL)
   {
-	const char *name;
-
-	if (list < OMP_LIST_NUM)
-	  name = clause_names[list];
-	else
-	  gcc_unreachable ();
+	const char *name = clause_names[list];
 
 	switch (list)
 	  {
-- 
2.28.0



RE: [PATCH] driver: Fix several memory leaks

2020-08-17 Thread Alex Coplan
Ping^2.

> -Original Message-
> From: Gcc-patches  On Behalf Of Alex
> Coplan
> Sent: 03 August 2020 16:02
> To: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] driver: Fix several memory leaks
> 
> Ping.
> 
> > -Original Message-
> > From: Jit  On Behalf Of Alex Coplan
> > Sent: 14 July 2020 10:08
> > To: gcc-patches@gcc.gnu.org; j...@gcc.gnu.org
> > Subject: [PATCH] driver: Fix several memory leaks
> >
> > Updating the subject since this is really just a driver change (and
> > therefore needs a review from those who can approve patches there).
> >
> > Thanks,
> > Alex
> >
> > -Original Message-
> > From: Jit  On Behalf Of Alex Coplan
> > Sent: 09 July 2020 21:13
> > To: gcc-patches@gcc.gnu.org; j...@gcc.gnu.org
> > Cc: nd 
> > Subject: [PATCH] libgccjit: Fix several memory leaks in the driver
> >
> > Hello,
> >
> > This patch fixes several memory leaks in the driver, all of which relate
> > to the handling of static specs. We introduce functions
> > set_static_spec_{shared,owned}() which are used to enforce proper memory
> > management when updating the strings in the static_specs table.
> >
> > This is achieved by making use of the alloc_p field in the table
> > entries. Similarly to set_spec(), each time we update an entry, we check
> > whether alloc_p is set, and free the old value if so. We then set
> > alloc_p correctly based on whether we "own" this memory or whether we're
> > just taking a pointer to a shared string which we shouldn't free.
> >
> > The following table shows the number of leaks found by AddressSanitizer
> > when running a minimal libgccjit program on AArch64. The test program
> > does the whole libgccjit compilation cycle in a loop (including acquiring
> > and releasing the context), and the table below shows the number of leaks
> > for different iterations of that loop.
> >
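> > +--------------+-----+-----+------+---------------+
> > | # of runs >  | 1   | 2   | 3    | Leaks per run |
> > +--------------+-----+-----+------+---------------+
> > | Before patch | 463 | 940 | 1417 | 477           |
> > +--------------+-----+-----+------+---------------+
> > | After patch  | 416 | 846 | 1276 | 430           |
> > +--------------+-----+-----+------+---------------+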
> >
> > Ensuring that we minimize "leaks per run" (ultimately eliminating all of
> > them) is important in order for long-running applications to be able to
> > make use of in-process libgccjit.
> >
> > Testing:
> >  * Bootstrap and regtest on aarch64-linux-gnu, x86_64-linux-gnu.
> >  * Bootstrap and regtest on aarch64-linux-gnu with bootstrap-asan config.
> >  * Smoke test of libgccjit, ran regressions on a --disable-bootstrap
> >    build on aarch64-linux-gnu.
> >
> > OK for master?
> >
> > Thanks,
> > Alex
> >
> > ---
> >
> > gcc/ChangeLog:
> >
> > 2020-07-09  Alex Coplan  
> >
> > * gcc.c (set_static_spec): New.
> > (set_static_spec_owned): New.
> > (set_static_spec_shared): New.
> > (driver::maybe_putenv_COLLECT_LTO_WRAPPER): Use
> > set_static_spec_owned() to take ownership of lto_wrapper_file
> > such that it gets freed in driver::finalize.
> > (driver::maybe_run_linker): Use set_static_spec_shared() to
> > ensure that we don't try and free() the static string "ld",
> > also ensuring that any previously-allocated string in
> > linker_name_spec is freed. Likewise with argv0.
> > (driver::finalize): Use set_static_spec_shared() when resetting
> > specs that previously had allocated strings; remove if(0)
> > around call to free().
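
The owned/shared scheme described in the quoted patch can be sketched
roughly as below. This is not the code from gcc.c: spec_entry, set_entry,
set_entry_owned and set_entry_shared are hypothetical names chosen for the
example, and only the alloc_p idea is taken from the description.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical table entry modelled on the description above: alloc_p
// records whether VALUE was heap-allocated and must be freed.
struct spec_entry
{
  const char *value;
  bool alloc_p;
};

// Common helper: release the old string if we owned it, then store the
// new one together with its ownership flag.
static void set_entry (spec_entry *e, const char *val, bool alloc_p)
{
  if (e->alloc_p)
    std::free (const_cast<char *> (e->value));
  e->value = val;
  e->alloc_p = alloc_p;
}

// Take ownership of VAL, which must have been obtained from malloc.
void set_entry_owned (spec_entry *e, char *val)
{
  set_entry (e, val, true);
}

// Point at a string owned elsewhere (e.g. a literal); never freed here.
void set_entry_shared (spec_entry *e, const char *val)
{
  set_entry (e, val, false);
}
```

Routing every update through one helper means an entry can never hold an
allocated string without alloc_p being set, which is what prevents both
leaks and attempts to free a static string.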



[PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-08-17 Thread Alex Coplan
Hello,

Given the following C function:

double *f(double *p, unsigned x)
{
    return p + x;
}

prior to this patch, GCC at -O2 would generate:

f:
        add     x0, x0, x1, uxtw 3
        ret

but this add instruction uses architecturally-invalid syntax: the width
of the third operand conflicts with the width of the extension
specifier. The third operand is only permitted to be an x register when
the extension specifier is (u|s)xtx.

This instruction, and analogous insns for adds, sub, subs, and cmp, are
rejected by clang, but accepted by binutils. Assembling and
disassembling such an insn with binutils gives the architecturally-valid
version in the disassembly:

   0:   8b214c00        add     x0, x0, w1, uxtw #3

This patch fixes several patterns in the AArch64 backend to use the
standard syntax as specified in the Arm ARM such that GCC's output can
be assembled by assemblers other than GAS.
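
The rule stated above (an x register only with uxtx/sxtx, a w register
otherwise) can be written down as a tiny helper. This is only an
illustrative sketch; extended_operand_prefix is a hypothetical name and is
not part of the patch or of GCC.

```cpp
#include <cassert>
#include <cstring>

// Return the register-name prefix for the extended operand of an
// extended-register add/sub/cmp: 'x' only for uxtx/sxtx, otherwise 'w'.
char extended_operand_prefix (const char *ext)
{
  if (std::strcmp (ext, "uxtx") == 0 || std::strcmp (ext, "sxtx") == 0)
    return 'x';
  return 'w';
}
```

For the example function f, the extension is uxtw, so the third operand
should be printed as w1 rather than x1.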

Note that an obvious omission here is that this patch does not touch the
mult patterns such as *add__mult_. I found
that I couldn't hit these patterns with C code since multiplications by
powers of two always get turned into shifts by earlier RTL passes. If
there's a way to reliably hit these patterns, then perhaps these should
be updated as well.

Testing:
 * New test which checks for the correct syntax in all updated
   patterns (fails before and passes after the aarch64.md change).
 * New test can be assembled by both GAS and llvm-mc following the
   change.
 * Bootstrapped and regtested on aarch64-none-linux-gnu.

OK for master?

Thanks,
Alex

---

gcc/ChangeLog:

* config/aarch64/aarch64.md
(*adds__): Ensure extended operand
agrees with width of extension specifier.
(*subs__): Likewise.
(*adds__shift_): Likewise.
(*subs__shift_): Likewise.
(*add__): Likewise.
(*add__shft_): Likewise.
(*add_uxt_shift2): Likewise.
(*sub__): Likewise.
(*sub__shft_): Likewise.
(*sub_uxt_shift2): Likewise.
(*cmp_swp__reg): Likewise.
(*cmp_swp__shft_): Likewise.


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax.
* gcc.target/aarch64/cmp.c: Likewise.
* gcc.target/aarch64/subs3.c: Likewise.
* gcc.target/aarch64/subsp.c: Likewise.
* gcc.target/aarch64/extend-syntax.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 9b20dd0b1a0..b1e83dfda78 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2383,7 +2383,7 @@
(set (match_operand:GPI 0 "register_operand" "=r")
(plus:GPI (ANY_EXTEND:GPI (match_dup 1)) (match_dup 2)))]
   ""
-  "adds\\t%0, %2, %1, xt"
+  "adds\\t%0, %2, %w1, xt"
   [(set_attr "type" "alus_ext")]
 )
 
@@ -2397,7 +2397,7 @@
(set (match_operand:GPI 0 "register_operand" "=r")
(minus:GPI (match_dup 1) (ANY_EXTEND:GPI (match_dup 2]
   ""
-  "subs\\t%0, %1, %2, xt"
+  "subs\\t%0, %1, %w2, xt"
   [(set_attr "type" "alus_ext")]
 )
 
@@ -2415,7 +2415,7 @@
  (match_dup 2))
  (match_dup 3)))]
   ""
-  "adds\\t%0, %3, %1, xt %2"
+  "adds\\t%0, %3, %w1, xt %2"
   [(set_attr "type" "alus_ext")]
 )
 
@@ -2433,7 +2433,7 @@
   (ashift:GPI (ANY_EXTEND:GPI (match_dup 2))
   (match_dup 3]
   ""
-  "subs\\t%0, %1, %2, xt %3"
+  "subs\\t%0, %1, %w2, xt %3"
   [(set_attr "type" "alus_ext")]
 )
 
@@ -2549,7 +2549,7 @@
(plus:GPI (ANY_EXTEND:GPI (match_operand:ALLX 1 "register_operand" "r"))
  (match_operand:GPI 2 "register_operand" "r")))]
   ""
-  "add\\t%0, %2, %1, xt"
+  "add\\t%0, %2, %w1, xt"
   [(set_attr "type" "alu_ext")]
 )
 
@@ -2571,7 +2571,7 @@
  (match_operand 2 "aarch64_imm3" "Ui3"))
  (match_operand:GPI 3 "register_operand" "r")))]
   ""
-  "add\\t%0, %3, %1, xt %2"
+  "add\\t%0, %3, %w1, xt %2"
   [(set_attr "type" "alu_ext")]
 )
 
@@ -2819,7 +2819,7 @@
   "*
   operands[3] = GEN_INT (aarch64_uxt_size (INTVAL(operands[2]),
   INTVAL (operands[3])));
-  return \"add\t%0, %4, %1, uxt%e3 %2\";"
+  return \"add\t%0, %4, %w1, uxt%e3 %2\";"
   [(set_attr "type" "alu_ext")]
 )
 
@@ -3305,7 +3305,7 @@
   (ANY_EXTEND:GPI
(match_operand:ALLX 2 "register_operand" "r"]
   ""
-  "sub\\t%0, %1, %2, xt"
+  "sub\\t%0, %1, %w2, xt"
   [(set_attr "type" "alu_ext")]
 )
 
@@ -3328,7 +3328,7 @@
(match_operand:ALLX 2 "register_operand" "r"))
   (match_operand 3 "aarch64_imm3" "Ui3"]
   ""
-  "sub\\t%0, %1, %2, xt %3"
+  "sub\\t%0, %1, %w2, xt %3"
   [(set_attr "type" "alu_ext")]
 )
 
@@ -3607,7 +3607,7 @@
   "*
   operands[3] = GEN_INT (aarch64_uxt_size (INTVAL (operands[2]),
   INTVAL (operands[3])));
-  return \"sub\t%0, %4, 

[PATCH] download_prerequisites: Add option --proxy

2020-08-17 Thread Mert Kirpici via Gcc-patches
The script contrib/download_prerequisites now accepts the command line
argument '--proxy', which instructs the fetcher program to use the
specified proxy.

Signed-off-by: Mert Kirpici 
---
contrib/download_prerequisites | 23 ++-
1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/contrib/download_prerequisites b/contrib/download_prerequisites
index 7d0c4b5ea8d..4a297f29dd2 100755
--- a/contrib/download_prerequisites
+++ b/contrib/download_prerequisites
@@ -58,11 +58,6 @@ case $OS in
;;
esac

-if type wget > /dev/null ; then
- fetch='wget'
-else
- fetch='curl -LO'
-fi
chksum_extension='sha512'
directory='.'

@@ -74,6 +69,7 @@ GCC source tree and the GCC build will do the right thing.
The following options are available:

--directory=DIR download and unpack packages into DIR instead of '.'
+ --proxy=URL download via specified http proxy server URL
--force download again overwriting existing packages
--no-force do not download existing packages again (default)
--isl download ISL, needed for Graphite loop optimizations (default)
@@ -143,6 +139,12 @@ do
--directory=*)
directory="${arg#--directory=}"
;;
+ --proxy=*)
+ proxy="${arg#--proxy=}"
+ ;;
+ --proxy)
+ argnext='proxy'
+ ;;
--force)
force=1
;;
@@ -202,6 +204,9 @@ do
directory)
directory="${arg}"
;;
+ proxy)
+ proxy="${arg}"
+ ;;
*)
die "The impossible has happened"
;;
@@ -218,6 +223,14 @@ unset arg argnext
[ -d "${directory}" ] \
|| die "No such directory: ${directory}"

+if type wget > /dev/null ; then
+ fetch='wget'
+ [ -z "${proxy}" ] || fetch="${fetch} -e use_proxy=on -e http_proxy=${proxy}"
+else
+ fetch='curl -LO'
+ [ -z "${proxy}" ] || fetch="${fetch} --proxy ${proxy}"
+fi
+
for ar in $(echo_archives)
do
if [ ${force} -gt 0 ]; then rm -f "${directory}/${ar}"; fi
--
2.20.1

Re: [PATCH] openmp: fix UBSAN error at gcc/fortran/openmp.c:4737

2020-08-17 Thread Tobias Burnus

On 8/17/20 10:41 AM, Martin Liška wrote:


Since 21cfe724cbdc30612bf1ef59b26f19ada2210832 there's a new
OMP_LIST_NONTEMPORAL value, but it was missing in
resolve_omp_clauses static array that is defined at the function
beginning:

gcc/fortran/ChangeLog:

* openmp.c (resolve_omp_clauses): Add NONTEMPORAL to clause
names.


LGTM & thanks! – Sorry for missing it.
(I re-checked against the OMP_LIST_* enum and
it seems to be only missing one.)

Tobias


---
 gcc/fortran/openmp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index c44a2530b88..60d8e5573c2 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -4369,7 +4369,8 @@ resolve_omp_clauses (gfc_code *code,
gfc_omp_clauses *omp_clauses,
 = { "PRIVATE", "FIRSTPRIVATE", "LASTPRIVATE", "COPYPRIVATE",
"SHARED",
 "COPYIN", "UNIFORM", "ALIGNED", "LINEAR", "DEPEND", "MAP",
 "TO", "FROM", "REDUCTION", "DEVICE_RESIDENT", "LINK", "USE_DEVICE",
-"CACHE", "IS_DEVICE_PTR", "USE_DEVICE_PTR", "USE_DEVICE_ADDR" };
+"CACHE", "IS_DEVICE_PTR", "USE_DEVICE_PTR", "USE_DEVICE_ADDR",
+"NONTEMPORAL" };

   if (omp_clauses == NULL)
 return;



[PATCH] openmp: fix UBSAN error at gcc/fortran/openmp.c:4737

2020-08-17 Thread Martin Liška

Since 21cfe724cbdc30612bf1ef59b26f19ada2210832 there's a new
OMP_LIST_NONTEMPORAL value, but the matching entry was missing from the
static array of clause names defined at the beginning of
resolve_omp_clauses:

./xgcc -B. 
/home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/gomp/nontemporal-1.f90 
-fopenmp -c
../../gcc/fortran/openmp.c:4737:28: runtime error: index 21 out of bounds for 
type 'char *[21]'
#0 0xbdb956 in resolve_omp_clauses ../../gcc/fortran/openmp.c:4737
#1 0xbeb076 in resolve_omp_do ../../gcc/fortran/openmp.c:6139
#2 0xbf029a in gfc_resolve_omp_directive(gfc_code*, gfc_namespace*) 
../../gcc/fortran/openmp.c:6792
#3 0xcb6363 in gfc_resolve_code(gfc_code*, gfc_namespace*) 
../../gcc/fortran/resolve.c:12185
#4 0xcef8cf in resolve_codes ../../gcc/fortran/resolve.c:17303

Ready for master?
Thanks,
Martin

gcc/fortran/ChangeLog:

* openmp.c (resolve_omp_clauses): Add NONTEMPORAL to clause
names.
---
 gcc/fortran/openmp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index c44a2530b88..60d8e5573c2 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -4369,7 +4369,8 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
*omp_clauses,
 = { "PRIVATE", "FIRSTPRIVATE", "LASTPRIVATE", "COPYPRIVATE", "SHARED",
"COPYIN", "UNIFORM", "ALIGNED", "LINEAR", "DEPEND", "MAP",
"TO", "FROM", "REDUCTION", "DEVICE_RESIDENT", "LINK", "USE_DEVICE",
-   "CACHE", "IS_DEVICE_PTR", "USE_DEVICE_PTR", "USE_DEVICE_ADDR" };
+   "CACHE", "IS_DEVICE_PTR", "USE_DEVICE_PTR", "USE_DEVICE_ADDR",
+   "NONTEMPORAL" };
 
   if (omp_clauses == NULL)

 return;
--
2.28.0



[PATCH] vxworks: Fix GCC selftests for *-wrs-vxworks7-* targets

2020-08-17 Thread Iain Buclaw via Gcc-patches
Hi,

Currently when building a cross-compiler targeting arm-wrs-vxworks7, the
selftests fail unless the VSB_DIR environment variable is set.

The same !nostdinc condition is used for VXWORKS_ADDITIONAL_CPP_SPEC.

OK for mainline?

Iain.

---
gcc/ChangeLog:

* config/vxworks.h (STARTFILE_PREFIX_SPEC): Avoid using VSB_DIR if
-nostdinc is used.
---
 gcc/config/vxworks.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/vxworks.h b/gcc/config/vxworks.h
index d648d2f23cb..065c9e12b88 100644
--- a/gcc/config/vxworks.h
+++ b/gcc/config/vxworks.h
@@ -108,7 +108,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #if TARGET_VXWORKS7
 #undef  STARTFILE_PREFIX_SPEC
-#define STARTFILE_PREFIX_SPEC "%:getenv(VSB_DIR /usr/lib/common)"
+#define STARTFILE_PREFIX_SPEC "%{!nostdinc:%:getenv(VSB_DIR /usr/lib/common)}"
 #define TLS_SYM "-u __tls__"
 #else
 #define TLS_SYM ""
-- 
2.20.1



RE: [PATCH] middle-end: Recognize idioms for bswap32 and bswap64 in match.pd.

2020-08-17 Thread Roger Sayle

Hi Jakub and Marc,
Here's version #3 of the patch to recognize bswap32 and bswap64 that
now also implements Jakub's suggestion to support addition and xor in
addition to bitwise ior when recognizing the union of highpart and
lowpart (and two additional tests to check for these variants).

This revised patch has been tested on x86_64-pc-linux-gnu with a
"make bootstrap" and "make -k check" with no new failures, and
confirming all four new tests pass.
Ok for mainline?

2020-08-17  Roger Sayle  
Marc Glisse  
Jakub Jelinek  

gcc/ChangeLog
* match.pd (((T)bswapX(x)<>C) -> bswapY(x)):
New simplifications to recognize __builtin_bswap{32,64}.

gcc/testsuite/ChangeLog
* gcc.dg/fold-bswap-1.c: New test.
* gcc.dg/fold-bswap-2.c: New test.
* gcc.dg/fold-bswap-3.c: New test.
* gcc.dg/fold-bswap-4.c: New test.


Thanks in advance,
Roger
--

-Original Message-
From: Jakub Jelinek  
Sent: 15 August 2020 14:26
To: Roger Sayle 
Cc: 'GCC Patches' ; 'Marc Glisse'

Subject: Re: [PATCH] middle-end: Recognize idioms for bswap32 and bswap64 in
match.pd.

On Sat, Aug 15, 2020 at 11:09:17AM +0100, Roger Sayle wrote:
> +/* Recognize ((T)bswap32(x)<<32)|bswap32(x>>32) as bswap64(x).  */ 
> +(simplify
> +  (bit_ior:c

Any reason for supporting bit_ior only?  Don't plus:c or bit_xor:c work the
same (i.e. use (for op (bit_ior bit_xor plus) ...)?
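
A quick way to see why ior, xor and plus are interchangeable here: the
two halves occupy disjoint bit positions, so no carries or cancellations
can occur. The sketch below checks this with hand-written byte swaps; the
helper names (swap32, swap64_ref and so on) are made up for the example
and the code does not use the compiler builtins.

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit byte swap, written out by hand.
uint32_t swap32 (uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

// Reference 64-bit byte swap: move one byte at a time.
uint64_t swap64_ref (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    {
      r = (r << 8) | (x & 0xff);
      x >>= 8;
    }
  return r;
}

// High half: swapped low word shifted up.  Low half: swapped high word.
static uint64_t hi_part (uint64_t x)
{
  return (uint64_t) swap32 ((uint32_t) x) << 32;
}

static uint64_t lo_part (uint64_t x)
{
  return swap32 ((uint32_t) (x >> 32));
}

// The three ways of combining the halves; all give the same result
// because hi_part and lo_part never have a set bit in common.
uint64_t swap64_ior (uint64_t x)  { return hi_part (x) | lo_part (x); }
uint64_t swap64_xor (uint64_t x)  { return hi_part (x) ^ lo_part (x); }
uint64_t swap64_plus (uint64_t x) { return hi_part (x) + lo_part (x); }
```

All three variants agree with the byte-by-byte reference for any input,
which is the property the (for op (bit_ior bit_xor plus) ...) form relies
on.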

Jakub

diff --git a/gcc/match.pd b/gcc/match.pd
index c3b8816..3d7a0db 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3410,6 +3410,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(bswap (bitop:c (bswap @0) @1))
(bitop @0 (bswap @1)
 
+/* Recognize ((T)bswap32(x)<<32)|bswap32(x>>32) as bswap64(x).  */
+(for op (bit_ior bit_xor plus)
+  (simplify
+(op:c
+  (lshift (convert (BUILT_IN_BSWAP32 (convert@0 @1)))
+ INTEGER_CST@2)
+  (convert (BUILT_IN_BSWAP32 (convert@3 (rshift @1 @2)
+(if (INTEGRAL_TYPE_P (type)
+&& TYPE_PRECISION (type) == 64
+&& types_match (TREE_TYPE (@1), uint64_type_node)
+&& types_match (TREE_TYPE (@0), uint32_type_node)
+&& types_match (TREE_TYPE (@3), uint32_type_node)
+&& wi::to_widest (@2) == 32)
+  (convert (BUILT_IN_BSWAP64 @1)
+
+/* Recognize ((T)bswap16(x)<<16)|bswap16(x>>16) as bswap32(x).  */
+(for op (bit_ior bit_xor plus)
+  (simplify
+(op:c
+  (lshift
+   (convert (BUILT_IN_BSWAP16 (convert (bit_and @0 INTEGER_CST@1
+   (INTEGER_CST@2))
+  (convert (BUILT_IN_BSWAP16 (convert (rshift @0 @2)
+(if (INTEGRAL_TYPE_P (type)
+&& TYPE_PRECISION (type) == 32
+&& types_match (TREE_TYPE (@0), uint32_type_node)
+&& wi::to_widest (@1) == 65535
+&& wi::to_widest (@2) == 16)
+  (convert (BUILT_IN_BSWAP32 @0)
 
 /* Combine COND_EXPRs and VEC_COND_EXPRs.  */
 
diff --git a/gcc/testsuite/gcc.dg/fold-bswap-1.c 
b/gcc/testsuite/gcc.dg/fold-bswap-1.c
new file mode 100644
index 000..3abb862
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-bswap-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int swap32(unsigned int x)
+{
+  if (sizeof(unsigned int)==4 && sizeof(unsigned short)==2) {
+unsigned int a = __builtin_bswap16(x);
+x >>= 16;
+a <<= 16;
+return __builtin_bswap16(x) | a;
+  } else return __builtin_bswap32(x);
+}
+
+unsigned long swap64(unsigned long x)
+{
+  if (sizeof(unsigned long)==8 && sizeof(unsigned int)==4) {
+unsigned long a = __builtin_bswap32(x);
+x >>= 32;
+a <<= 32;
+return __builtin_bswap32(x) | a;
+  } else return __builtin_bswap64(x);
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_bswap32" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_bswap64" 1 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/fold-bswap-2.c 
b/gcc/testsuite/gcc.dg/fold-bswap-2.c
new file mode 100644
index 000..a581fd6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-bswap-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int swap32(unsigned int x)
+{
+  if (sizeof(int)==4 && sizeof(short)==2) {
+int a = __builtin_bswap16(x);
+x >>= 16;
+a <<= 16;
+return __builtin_bswap16(x) | a;
+  } else return __builtin_bswap32(x);
+}
+
+long swap64(unsigned long x)
+{
+  if (sizeof(long)==8 && sizeof(int)==4) {
+long a = __builtin_bswap32(x);
+x >>= 32;
+a <<= 32;
+return __builtin_bswap32(x) | a;
+  } else return __builtin_bswap64(x);
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_bswap32" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_bswap64" 1 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/fold-bswap-3.c 
b/gcc/testsuite/gcc.dg/fold-bswap-3.c
new file mode 100644
index 000..13bb6eb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-bswap-3.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 

RE: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-08-17 Thread xiezhiheng
> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Wednesday, August 5, 2020 12:26 AM
> To: xiezhiheng 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
> emitted at -O3
> 
> xiezhiheng  writes:
> >> > Sorry, I should have used it.  And I prepare a patch to use
> FLOAT_MODE_P
> >> > macro and add a flag FLAG_SUPPRESS_FP_EXCEPTIONS to suppress
> >> > FLAG_RAISE_FP_EXCEPTIONS for certain intrinsics in future.
> >>
> >> The same thing is true for reading FPCR as well, so I think the flag
> >> should suppress the FLOAT_MODE_P check, instead of fixing up the flags
> >> afterwards.
> >>
> >> I'm struggling to think of a good name though.  How about adding
> >> FLAG_AUTO_FP and making the FLOAT_MODE_P check dependent on
> >> FLAG_AUTO_FP being set?
> >>
> >> We could leave FLAG_AUTO_FP out of FLAG_ALL, since FLAG_ALL already
> >> includes FLAG_FP.  Including it in FLAG_ALL wouldn't do any harm
> >> though.
> >
> > I could not think of a better name either.  So I chose to use FLAG_AUTO_FP
> > to control the FLOAT_MODE_P check.
> >
> > Bootstrapped and tested on aarch64 Linux platform.
> 
> Thanks, pushed to master.
> 
> Richard

As a first try, I added FLAGS for some of the intrinsics in
aarch64-simd-builtins.def, covering all the add/sub arithmetic intrinsics.

For an intrinsic such as faddp, which only handles floating-point operations,
both the FP and NONE flags are suitable, because FLAG_FP will be added later
if the intrinsic handles floating-point operations.  I prefer FP since it is
clearer.

The qadd intrinsics, however, modify the FPSR register, which is a scenario
I missed before.  I am considering adding an additional flag, FLAG_WRITE_FPSR,
to represent it.
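
One way to picture the flag scheme discussed in this thread is as a plain bitmask. The sketch below is only an illustration under stated assumptions, not the code in aarch64-builtins.c. The bit positions are made up. The is_float_mode parameter stands in for the FLOAT_MODE_P check and trapping_math for the corresponding compiler option. It assumes, as the faddp remark implies, that FLAG_FP is added automatically unless FLAG_AUTO_FP is set. FLAG_WRITE_FPSR is the flag proposed above.

```c
#include <stdbool.h>

/* Hypothetical bit assignments, chosen for this sketch only.  */
enum
{
  FLAG_NONE = 0,
  FLAG_READ_FPCR = 1u << 0,
  FLAG_RAISE_FP_EXCEPTIONS = 1u << 1,
  FLAG_AUTO_FP = 1u << 2,       /* turns the automatic FP check off */
  FLAG_WRITE_FPSR = 1u << 3,    /* proposed: intrinsic writes FPSR */
  FLAG_FP = FLAG_READ_FPCR | FLAG_RAISE_FP_EXCEPTIONS
};

/* Return the effective flags of an intrinsic: floating-point modes get
   FLAG_FP added unless the entry opted out with FLAG_AUTO_FP.  */
static unsigned int
call_properties (unsigned int flags, bool is_float_mode)
{
  if (!(flags & FLAG_AUTO_FP) && is_float_mode)
    flags |= FLAG_FP;
  return flags;
}

/* Return true if the intrinsic changes state that other code can
   observe.  Treating raised FP exceptions as a state change only when
   trapping math is enabled is an assumption of this sketch.  */
static bool
modifies_global_state_p (unsigned int flags, bool is_float_mode,
                         bool trapping_math)
{
  flags = call_properties (flags, is_float_mode);
  if (trapping_math && (flags & FLAG_RAISE_FP_EXCEPTIONS))
    return true;
  return (flags & FLAG_WRITE_FPSR) != 0;
}
```

With this layout an integer qadd entry tagged FLAG_WRITE_FPSR is treated as modifying global state even though FLAG_FP is never added for it.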

Bootstrapped and tested on aarch64 Linux platform.

Any suggestions?

Thanks,
XieZhiheng


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9cf1f9733e7..cde50c54d9e 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2020-08-17  Zhiheng Xie  
+
+   * config/aarch64/aarch64-builtins.c (aarch64_modifies_global_state_p):
+   Add flag FLAG_WRITE_FPSR to control attributes.
+   * config/aarch64/aarch64-simd-builtins.def: Add proper FLAGS
+   for intrinsic functions.
+


pr94442-v1.patch
Description: pr94442-v1.patch


[PING][PATCH 6/6] contrib: Add OPT-enable-obsolete to tile*-*-*

2020-08-17 Thread Iain Buclaw via Gcc-patches
Ping.

On 31/05/2020 12:20, Iain Buclaw wrote:
> The tile*-*-* targets were marked as obsolete in SVN r259724.
> 
> OK?
> 
> Regards
> Iain
> 
> ---
> contrib/ChangeLog:
> 
>   * config-list.mk (LIST): Add OPT-enable-obsolete to tilegx-linux-gnu,
>   tilegxbe-linux-gnu, and tilepro-linux-gnu.
> ---
>  contrib/config-list.mk | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/contrib/config-list.mk b/contrib/config-list.mk
> index 5818f7df08b..8a4ce8aca25 100644
> --- a/contrib/config-list.mk
> +++ b/contrib/config-list.mk
> @@ -93,7 +93,8 @@ LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
>sparc64-sun-solaris2.11OPT-with-gnu-ldOPT-with-gnu-asOPT-enable-threads=posix \
>sparc-wrs-vxworks sparc64-elf sparc64-rtems sparc64-linux sparc64-freebsd6 \
>sparc64-netbsd sparc64-openbsd \
> -  tilegx-linux-gnu tilegxbe-linux-gnu tilepro-linux-gnu \
> +  tilegx-linux-gnuOPT-enable-obsolete tilegxbe-linux-gnuOPT-enable-obsolete \
> +  tilepro-linux-gnuOPT-enable-obsolete \
>v850e1-elf v850e-elf v850-elf v850-rtems vax-linux-gnu \
>vax-netbsdelf vax-openbsd visium-elf x86_64-apple-darwin \
>x86_64-pc-linux-gnuOPT-with-fpmath=avx \
> 


[PING][PATCH] tilepro: Update generator file to define IN_TARGET_CODE in target file.

2020-08-17 Thread Iain Buclaw via Gcc-patches
Ping.

On 31/05/2020 12:48, Iain Buclaw wrote:
> Hi,
> 
> The target files tilegx/mul-tables.c and tilepro/mul-tables.c were
> updated in SVN r255743, but the generator file that produces them
> wasn't, so it was reverting this change during builds.
> 
> Only tested by running make all-gcc for all tile*-*-* targets present in
> config-list.mk.
> 
> OK?
> 
> Regards
> Iain
> 
> ---
> gcc/ChangeLog:
> 
>   * config/tilepro/gen-mul-tables.cc (main): Define IN_TARGET_CODE to 1
>   in the target file.
> ---
>  gcc/config/tilepro/gen-mul-tables.cc | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/config/tilepro/gen-mul-tables.cc b/gcc/config/tilepro/gen-mul-tables.cc
> index 2a345023aea..7f9fb65dc2f 100644
> --- a/gcc/config/tilepro/gen-mul-tables.cc
> +++ b/gcc/config/tilepro/gen-mul-tables.cc
> @@ -1252,6 +1252,8 @@ main ()
>printf ("/* Note this file is auto-generated from gen-mul-tables.cc.\n");
>printf ("   Make any required changes there.  */\n");
>printf ("\n");
> +  printf ("#define IN_TARGET_CODE 1\n");
> +  printf ("\n");
>printf ("#include \"config.h\"\n");
>printf ("#include \"system.h\"\n");
>printf ("#include \"coretypes.h\"\n");
> 
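
The generator change quoted above amounts to emitting the define ahead of the first include in the generated file. A minimal sketch of that ordering follows. The helper name emit_prologue and the use of a buffer instead of stdout are assumptions made for illustration; the real generator calls printf directly.

```c
#include <stdio.h>

/* Write the prologue of the generated file into BUF (SIZE bytes) and
   return the number of characters that snprintf reports.  The define
   is emitted before any #include so that every header sees it.  */
static int
emit_prologue (char *buf, size_t size)
{
  return snprintf (buf, size,
                   "/* Note this file is auto-generated from gen-mul-tables.cc.\n"
                   "   Make any required changes there.  */\n"
                   "\n"
                   "#define IN_TARGET_CODE 1\n"
                   "\n"
                   "#include \"config.h\"\n"
                   "#include \"system.h\"\n"
                   "#include \"coretypes.h\"\n");
}
```

Because the generator writes the target file during the build, a fix applied only to the generated mul-tables.c would be overwritten again, which is the problem the ping describes.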


[PATCH] Fortran : get_environment_variable runtime error PR96486

2020-08-17 Thread Mark Eggleston

Please find attached a fix for PR96486.

OK to commit?

[PATCH] Fortran  :  get_environment_variable runtime error PR96486

A runtime error occurs when the type of the value argument is
character(0):  "Zero-length string passed as value...".
The status argument, intent(out), will contain -1 if the value
of the environment variable is too large to fit in the value argument.
This is the case if the type is character(0), so there is no reason to
produce a runtime error if the value argument is zero length.
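
The behaviour described above can be sketched in isolation as follows. This is a simplified stand-in rather than the libgfortran routine: the function name copy_env_value and the status names are hypothetical, and the extra length guard before the truncating copy is added in this sketch only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; -1 mirrors the "too short" status
   mentioned in the description.  */
enum { VALUE_OK = 0, VALUE_TOO_SHORT = -1 };

/* Copy RES (RES_LEN bytes) into the blank-padded buffer VALUE of
   VALUE_LEN bytes and return the status.  A zero-length VALUE is not
   an error: nothing is written and the status reports truncation if
   RES is non-empty.  */
static int
copy_env_value (char *value, size_t value_len,
                const char *res, size_t res_len)
{
  int stat = VALUE_OK;

  if (value_len > 0)
    memset (value, ' ', value_len);     /* Blank the string.  */

  if (res_len > value_len)
    {
      if (value_len > 0)                /* guard added in this sketch */
        memcpy (value, res, value_len);
      stat = VALUE_TOO_SHORT;
    }
  else if (res_len > 0)
    memcpy (value, res, res_len);

  return stat;
}
```

For a character(0) value and a non-empty variable such as HOME, the sketch returns -1 without touching the buffer, which matches what the new test checks through the status argument.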

2020-08-17  Mark Eggleston 

libgfortran/

    PR fortran/96486
    * intrinsics/env.c: If value_len is > 0 blank the string.
    Copy the result only if its length is > 0.

2020-08-17  Mark Eggleston 

gcc/testsuite/

    PR fortran/96486
    * gfortran.dg/pr96486.f90

--
https://www.codethink.co.uk/privacy.html

From 63827120e6286181652c72501f927599125a0508 Mon Sep 17 00:00:00 2001
From: Mark Eggleston 
Date: Mon, 10 Aug 2020 08:07:39 +0100
Subject: [PATCH] Fortran  :  get_environment_variable runtime error PR96486

A runtime error occurs when the type of the value argument is
character(0):  "Zero-length string passed as value...".
The status argument, intent(out), will contain -1 if the value
of the environment variable is too large to fit in the value argument.
This is the case if the type is character(0), so there is no reason to
produce a runtime error if the value argument is zero length.

2020-08-17  Mark Eggleston  

libgfortran/

	PR fortran/96486
	* intrinsics/env.c: If value_len is > 0 blank the string.
	Copy the result only if its length is > 0.

2020-08-17  Mark Eggleston  

gcc/testsuite/

	PR fortran/96486
	* gfortran.dg/pr96486.f90
---
 gcc/testsuite/gfortran.dg/pr96486.f90 | 9 +++++++++
 libgfortran/intrinsics/env.c          | 7 ++-----
 2 files changed, 11 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr96486.f90

diff --git a/gcc/testsuite/gfortran.dg/pr96486.f90 b/gcc/testsuite/gfortran.dg/pr96486.f90
new file mode 100644
index 000..fdc7025d61c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr96486.f90
@@ -0,0 +1,9 @@
+! { dg-do run }
+
+program test
+  implicit none
+  character(0) :: value
+  integer :: l, stat
+  call get_environment_variable("HOME",value,length=l,status=stat)
+  if (stat.ne.-1) stop 1
+end program test
diff --git a/libgfortran/intrinsics/env.c b/libgfortran/intrinsics/env.c
index b7837b30873..7ab0b443897 100644
--- a/libgfortran/intrinsics/env.c
+++ b/libgfortran/intrinsics/env.c
@@ -110,10 +110,7 @@ get_environment_variable_i4 (char *name, char *value, GFC_INTEGER_4 *length,
 
   if (value != NULL)
 { 
-  if (value_len < 1)
-	runtime_error ("Zero-length string passed as value to "
-		   "get_environment_variable.");
-  else
+  if (value_len > 0)
 	memset (value, ' ', value_len); /* Blank the string.  */
 }
 
@@ -138,7 +135,7 @@ get_environment_variable_i4 (char *name, char *value, GFC_INTEGER_4 *length,
 	  memcpy (value, res, value_len);
 	  stat = GFC_VALUE_TOO_SHORT;
 	}
-	  else
+	  else if (res_len > 0)
 	memcpy (value, res, res_len);
 	}
 }
-- 
2.11.0



[PATCH] Fortran : rejected f0.d edit descriptor PR96436

2020-08-17 Thread Mark Eggleston

Please find attached a patch for PR96436.

OK to commit?

[PATCH] Fortran  : rejected f0.d edit descriptor PR96436

Zero width f format descriptors are valid in Fortran 95 and later,
zero width g format descriptors in Fortran 2008 and later, and
zero width D, E, EN and ES descriptors in Fortran 2018 and later.
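
The rule above can be written down as a small lookup. The sketch below is illustrative only: the enum names and the two helpers are hypothetical, and comparing standards by enum order is a simplification of however libgfortran actually reports standard violations.

```c
#include <stdbool.h>

/* Hypothetical descriptor tokens and language levels, oldest first.  */
enum fmt_token { FMT_F, FMT_G, FMT_D, FMT_E, FMT_EN, FMT_ES };
enum fortran_std { STD_F95, STD_F2003, STD_F2008, STD_F2018 };

/* Return the oldest standard that accepts a zero width for
   descriptor T.  */
static enum fortran_std
zero_width_standard (enum fmt_token t)
{
  switch (t)
    {
    case FMT_F:
      return STD_F95;
    case FMT_G:
      return STD_F2008;
    default:                    /* D, E, EN and ES */
      return STD_F2018;
    }
}

/* Return true if a zero width for T is accepted when the program is
   checked against SELECTED.  */
static bool
zero_width_allowed (enum fmt_token t, enum fortran_std selected)
{
  return selected >= zero_width_standard (t);
}
```

Under these assumptions f0.2 is accepted with -std=f95 and -std=f2003, while es0.2 with -std=f2008 is rejected, which is the split the new tests exercise.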

2020-08-10  Mark Eggleston 

libgfortran/io/

    PR fortran/96436
    * format.c (parse_format_list):  Add new local variable
    "standard" to hold the required standard to check. If the
    format width is zero select standard depending on descriptor.
    Call notification_std using the new standard variable.

2020-08-10  Mark Eggleston 

gcc/testsuite/

    PR fortran/96436
    * gfortran.dg/pr96436_1.f90
    * gfortran.dg/pr96436_2.f90
    * gfortran.dg/pr96436_3.f90
    * gfortran.dg/pr96436_4.f90
    * gfortran.dg/pr96436_5.f90
    * gfortran.dg/pr96436_6.f90
    * gfortran.dg/pr96436_7.f90
    * gfortran.dg/pr96436_8.f90
    * gfortran.dg/pr96436_9.f90
    * gfortran.dg/pr96436_10.f90

--
https://www.codethink.co.uk/privacy.html

From 9f60ccd71e0c675b48d6614141d1aeddaa863191 Mon Sep 17 00:00:00 2001
From: Mark Eggleston 
Date: Tue, 4 Aug 2020 14:10:08 +0100
Subject: [PATCH] Fortran  : rejected f0.d edit descriptor PR96436

Zero width f format descriptors are valid in Fortran 95 and later,
zero width g format descriptors in Fortran 2008 and later, and
zero width D, E, EN and ES descriptors in Fortran 2018 and later.

2020-08-10  Mark Eggleston  

libgfortran/io/

	PR fortran/96436
	* format.c (parse_format_list):  Add new local variable
	"standard" to hold the required standard to check. If the
	format width is zero select standard depending on descriptor.
	Call notification_std using the new standard variable.

2020-08-10  Mark Eggleston  

gcc/testsuite/

	PR fortran/96436
	* gfortran.dg/pr96436_1.f90
	* gfortran.dg/pr96436_2.f90
	* gfortran.dg/pr96436_3.f90
	* gfortran.dg/pr96436_4.f90
	* gfortran.dg/pr96436_5.f90
	* gfortran.dg/pr96436_6.f90
	* gfortran.dg/pr96436_7.f90
	* gfortran.dg/pr96436_8.f90
	* gfortran.dg/pr96436_9.f90
	* gfortran.dg/pr96436_10.f90
---
 gcc/testsuite/gfortran.dg/pr96436_1.f90  | 10 ++++++++++
 gcc/testsuite/gfortran.dg/pr96436_10.f90 | 10 ++++++++++
 gcc/testsuite/gfortran.dg/pr96436_2.f90  | 10 ++++++++++
 gcc/testsuite/gfortran.dg/pr96436_3.f90  | 13 +++++++++++++
 gcc/testsuite/gfortran.dg/pr96436_4.f90  | 25 +++++++++++++++++++++++++
 gcc/testsuite/gfortran.dg/pr96436_5.f90  | 25 +++++++++++++++++++++++++
 gcc/testsuite/gfortran.dg/pr96436_6.f90  | 10 ++++++++++
 gcc/testsuite/gfortran.dg/pr96436_7.f90  | 10 ++++++++++
 gcc/testsuite/gfortran.dg/pr96436_8.f90  | 10 ++++++++++
 gcc/testsuite/gfortran.dg/pr96436_9.f90  | 10 ++++++++++
 libgfortran/io/format.c                  | 10 +++++++++-
 11 files changed, 142 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr96436_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/pr96436_10.f90
 create mode 100644 gcc/testsuite/gfortran.dg/pr96436_2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/pr96436_3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/pr96436_4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/pr96436_5.f90
 create mode 100644 gcc/testsuite/gfortran.dg/pr96436_6.f90
 create mode 100644 gcc/testsuite/gfortran.dg/pr96436_7.f90
 create mode 100644 gcc/testsuite/gfortran.dg/pr96436_8.f90
 create mode 100644 gcc/testsuite/gfortran.dg/pr96436_9.f90

diff --git a/gcc/testsuite/gfortran.dg/pr96436_1.f90 b/gcc/testsuite/gfortran.dg/pr96436_1.f90
new file mode 100644
index 000..7cc6a0a69b1
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr96436_1.f90
@@ -0,0 +1,10 @@
+! { dg-do run }
+! { dg-options "-std=f95 -pedantic" }
+
+character(20) :: fmt
+character(9) :: buffer
+fmt = "(1a1,f0.2,1a1)"
+write(buffer,fmt) ">", 3.0, "<"
+if (buffer.ne.">3.00<") stop 1
+end
+
diff --git a/gcc/testsuite/gfortran.dg/pr96436_10.f90 b/gcc/testsuite/gfortran.dg/pr96436_10.f90
new file mode 100644
index 000..3bd30a9f16b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr96436_10.f90
@@ -0,0 +1,10 @@
+! { dg-do run }
+! { dg-options "-std=f2008 -pedantic" }
+! { dg-shouldfail "Zero width in format descriptor" }
+
+character(10) :: fmt = "(es0.2)"
+print fmt, 3.
+end
+
+! { dg-output "Fortran runtime error: Zero width in format descriptor" }
+
diff --git a/gcc/testsuite/gfortran.dg/pr96436_2.f90 b/gcc/testsuite/gfortran.dg/pr96436_2.f90
new file mode 100644
index 000..d2d6caffbfe
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr96436_2.f90
@@ -0,0 +1,10 @@
+! { dg-do run }
+! { dg-options "-std=f2003 -pedantic" }
+
+character(20) :: fmt
+character(9) :: buffer
+fmt = "(1a1,f0.2,1a1)"
+write(buffer,fmt) ">", 3.0, "<"
+if (buffer.ne.">3.00<") stop 1
+end
+
diff --git a/gcc/testsuite/gfortran.dg/pr96436_3.f90 b/gcc/testsuite/gfortran.dg/pr96436_3.f90
new file mode 100644
index 000..2750231312f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr96436_3.f90
@@ -0,0 +1,13 @@
+! { dg-do run }
+! { dg-options 

*PING* – Re: [Patch] OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390)

2020-08-17 Thread Tobias Burnus

On 8/3/20 5:37 PM, Tobias Burnus wrote:

It turned out that the omp_discover_declare_target_tgt_fn_r
discovered all nodes – but as it tagged the C++ alias nodes
and not the streamed-out nodes, no device function was created
and one got link errors if offloading devices were configured.
(Only with -O0 as otherwise inlining happened.)

(Testcase is based on a sollve_vv testcase which in turn was
based on an LLVM bugreport.)

OK?

Tobias


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [PATCH] C-SKY: Support -mfloat-abi=hard.

2020-08-17 Thread Xianmiao Qu

Hi Jojo,


On 8/17/20 9:52 AM, Jojo R wrote:

--- a/gcc/config/csky/csky.c
+++ b/gcc/config/csky/csky.c
@@ -328,6 +328,10 @@ csky_cpu_cpp_builtins (cpp_reader *pfile)
  {
builtin_define ("__csky_hard_float__");
builtin_define ("__CSKY_HARD_FLOAT__");
+  if (TARGET_HARD_FLOAT_ABI)
+builtin_define ("__CSKY_HARD_FLOAT_ABI__");
+  if (TARGET_SINGLE_FPU)
+builtin_define ("__CSKY_HARD_FLOAT_FPU_SF__");
  }

These two builtin definitions should also have lowercase variants, like the
existing __csky_hard_float__ define.
  
diff --git a/gcc/config/csky/csky.md b/gcc/config/csky/csky.md

@@ -3310,6 +3312,88 @@
 force_reg (Pmode, XEXP (operands[1], 0)));
}")
  
+;; Call subroutine returning any type.

+
+(define_expand "untyped_call"
+  [(parallel [(call (match_operand 0 "" "")
+(const_int 0))
+(match_operand 1 "" "")
+(match_operand 2 "" "")])]
+  "TARGET_HARD_FLOAT_ABI"
+{
+  int i;
+
+  emit_call_insn (gen_call (operands[0], const0_rtx));
+
+  for (i = 0; i < XVECLEN (operands[2], 0); i++)
+{
+  rtx set = XVECEXP (operands[2], 0, i);
+  emit_move_insn (SET_DEST (set), SET_SRC (set));
+}
+
+  /* The optimizer does not know that the call sets the function value
+ registers we stored in the result block.  We avoid problems by
+ claiming that all hard registers are used and clobbered at this
+ point.  */
+  emit_insn (gen_blockage ());
+
+  DONE;
+})

Why is untyped_call only supported with -mfloat-abi=hard? I think
it should be supported for every float ABI.