Re: [PATCH] rs6000: inefficient 64-bit constant generation for consecutive 1-bits

2020-09-14 Thread Alan Modra via Gcc-patches
On Thu, Sep 10, 2020 at 04:58:03PM -0500, Peter Bergner via Gcc-patches wrote:
> +unsigned long
> +test0 (void)
> +{
> +   return 0x0000UL;
> +}
> +
> +unsigned long
> +test1 (void)
> +{
> +   return 0x0000UL;
> +}
> +
> +unsigned long
> +test2 (void)
> +{
> +   return 0x0000UL;
> +}
> +
> +unsigned long
> +test3 (void)
> +{
> +   return 0xff00UL;
> +}
> +
> +unsigned long
> +test4 (void)
> +{
> +   return 0xff00UL;
> +}
> +
> +unsigned long
> +test5 (void)
> +{
> +   return 0x0100UL;
> +}
> +
> +/* { dg-final { scan-assembler "rldic r?\[0-9\]+,r?\[0-9\]+,8,8" } } */
> +/* { dg-final { scan-assembler "rldic r?\[0-9\]+,r?\[0-9\]+,24,8" } } */
> +/* { dg-final { scan-assembler "rldic r?\[0-9\]+,r?\[0-9\]+,40,8" } } */
> +/* { dg-final { scan-assembler "rldic r?\[0-9\]+,r?\[0-9\]+,40,48" } } */
> +/* { dg-final { scan-assembler "rldic r?\[0-9\]+,r?\[0-9\]+,40,23" } } */

Just a comment, I don't really see too much reason to change anything,
but scan-assembler tests can be a maintenance pain in cases like these
where there are multiple ways to generate a constant in two
instructions.  For example,

test3:
li 3,-1
rldicr 3,3,0,23
blr
and

test5:
li 3,16384
rotldi 3,3,26
blr

would be two valid possibilities for test3 and test5 that don't use
rldic.  Ideally the test would verify the actual values generated by
the test functions and count instructions.
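
Such a run-time check could look like this (a sketch only; the expected
values below are the ones implied by the two alternative sequences above,
assuming I've decoded them correctly, and instruction counting would still
need scan-assembler-times or similar):

/* { dg-do run } */
extern unsigned long test3 (void);
extern unsigned long test5 (void);

int
main (void)
{
  /* li 3,-1; rldicr 3,3,0,23 keeps the top 24 bits set.  */
  if (test3 () != 0xffffff0000000000UL)
    __builtin_abort ();
  /* li 3,16384; rotldi 3,3,26 is 0x4000 << 26, i.e. bit 40 set.  */
  if (test5 () != 0x0000010000000000UL)
    __builtin_abort ();
  return 0;
}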

And having said that I probably ought to post such a testcase for
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553927.html
I'm sure I had one lying around somewhere...

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-14 Thread luoxhu via Gcc-patches



On 2020/9/14 17:47, Richard Biener wrote:

On Mon, Sep 14, 2020 at 10:05 AM luoxhu  wrote:



Not sure whether this reflects the issues you discussed above.
I constructed the test cases below and tested with and without this patch;
only in the "a+c" case (which means store only) does performance get worse
with this patch, due to the extra load/store (about 50% slower).
For "v = vec_insert (i, v, n);" usage, v is always loaded after the store, so
this patch always gets a big benefit.
But for "v[n % 4] = i;" usage, it depends on whether "v" is used immediately
inside the function or soon after outside of it.  Does this mean that
unifying the two usages into the same gimple code is sometimes not a good
idea?  Or is it possible to detect whether the destination "v" of the
generated IFN ".VEC_INSERT (&v, i_4(D), _1);" is used nearby, inside or
outside of the function?


#define TYPE int

vector TYPE v = {1, 2, 3, 4};   // a. global vector.
vector TYPE s = {4, 3, 2, 1};

__attribute__ ((noinline))
vector TYPE
test (vector TYPE u, TYPE i, size_t n)
{
   v[n % 4] = i;


^^^

this should be

u[n % 4] = i;

I guess.  Is the % 4 mandated by the vec_insert semantics btw?


Yes.  Segher pasted the builtin description in his reply.  "v = vec_insert (i, u, n);"
is a bit different from "u[n % 4] = i;" since it returns a copy of u instead of
modifying u.  The adjustment when lowering __builtin_vec_insert_xxx to gimple
makes a copy of u first to meet that requirement.
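
A scalar sketch of the difference, as I read the builtin semantics (my own
example, not code from the thread):

#include <altivec.h>

/* vec_insert returns a modified copy; the argument vector u is not
   changed by the builtin itself.  */
vector signed int
by_value (vector signed int u, signed int i, unsigned long n)
{
  return vec_insert (i, u, n);	/* the index is taken modulo 4 */
}

/* The subscript form stores into u directly, no copy involved.  */
vector signed int
in_place (vector signed int u, signed int i, unsigned long n)
{
  u[n % 4] = i;
  return u;
}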



If you tested with the above error you probably need to re-measure.


No, I did test with u as a local instead of a global before.  If u is not used
very soon, the performance is almost the same whether we generate a single
store or an IFN_SET insert with a variable index.

source:
__attribute__ ((noinline)) vector TYPE
test (vector TYPE u, TYPE i, size_t n)
{
 u[n % 4] = i;
 vector TYPE r = {0};
 for (long k = 0; k < 100; k++)
   {
 r += v;
   }
 return u+r;
}

=> store hit load is relieved due to long distance.

ASM:
0:  addis 2,12,.TOC.-.LCF0@ha
   addi 2,2,.TOC.-.LCF0@l
   .localentry test,.-test
   addis 10,2,.LANCHOR0@toc@ha
   li 8,50
   xxspltib 32,0
   addi 9,1,-16
   rldic 6,6,2,60
   stxv 34,-16(1)
   addi 10,10,.LANCHOR0@toc@l
   mtctr 8
   xxlor 33,32,32
   stwx 5,9,6  // short store
   lxv 45,0(10)
   .p2align 4,,15
.L2:
   vadduwm 0,0,13
   vadduwm 1,1,13
   bdnz .L2
   vadduwm 0,0,1
   lxv 33,-16(1)   // wide load
   vadduwm 2,0,1
   blr


Then I intended to use "v" there as a global-memory insert test, to separate
the store and the load into different functions ("v" is stored in this
function but loaded outside of it, which gives different performance for a
single store versus lxv+xxperm+xxsel+stxv; the inner function doesn't know the
distance between the load and the store across functions).

Do you mean that we don't need to generate IFN_SET if "v" is global memory?
I only see VAR_DECL and PARM_DECL; is there any function to check whether the
tree variable is global?  I added DECL_REGISTER, but the RTL still expands to
the stack:

gcc/internal-fn.c: rtx to_rtx = expand_expr (view_op0, NULL_RTX, VOIDmode, 
EXPAND_WRITE);

(gdb) p view_op0
$584 = 
(gdb) p DECL_REGISTER(view_op0)
$585 = 1
(gdb) p to_rtx
$586 = (rtx_def *) (mem/c:V4SI (reg/f:DI 112 virtual-stack-vars) [1 D.3190+0 
S16 A128])




On gimple the above function (after fixing it) looks like

   VIEW_CONVERT_EXPR(u)[_1] = i_4(D);

and the IFN idea I had would - for non-global memory 'u' only - transform
this to

   vector_register_2 = u;
   vector_register_3 = .IFN_VEC_SET (vector_register_2, _1, i_4(D));
   u = vector_register_3;

if vec_set can handle variable indexes.  This then becomes a
vec_set on a register and if that was the only variable indexing of 'u'
will also cause 'u' to be expanded as register rather than stack memory.

Note we can't use the direct-optab method here since the vec_set optab
modifies operand 0 which isn't possible in SSA form.  That might hint
at that we eventually want to extend BIT_INSERT_EXPR to handle
a non-constant bit position but for experiments using an alternate
internal function is certainly easier.



My current implementation does:

1)  v = vec_insert (i, u, n);

=>gimple:
{
 register __vector signed int D.3190;
 D.3190 = u;   // *new decl and copy u first.*
 _1 = n & 3;
 VIEW_CONVERT_EXPR(D.3190)[_1] = i;   // *update op0 of VIEW_CONVERT_EXPR*
 _2 = D.3190;
 ...
}

=>isel:
{
 register __vector signed int D.3190;
 D.3190 = u_4(D);
 _1 = n_6(D) & 3;
 .VEC_SET (&D.3190, i_7(D), _1);
 _2 = D.3190;
 ...
}


2) u[n % 4] = i;

=>gimple:
{
 __vector signed int D.3191;
 _1 = n & 3;
 VIEW_CONVERT_EXPR(u)[_1] = i;   // *update op0 of VIEW_CONVERT_EXPR*
 D.3191 = u;   
...

}

=>isel:
{
 D.3190 = u_4(D);
 _1 = n_6(D) & 3;
 .VEC_SET (&D.3190, i_7(D), _1);
 _2 = D.3190;
 v = _2;
}

The IFN ".VEC_SET" behavior quite like the other IFN STORE functions and doesn't
require a dest operand to be set?  Both 1) and 2) are modifying operand 0 of
VIEW_CONVERT_EXPR just 

Re: Ping: [PATCH 1/2] Fold plusminus_mult expr with multi-use operands (PR 94234)

2020-09-14 Thread Feng Xue OS via Gcc-patches
>@@ -3426,8 +3426,16 @@ dt_simplify::gen_1 (FILE *f, int indent, bool
>gimple, operand *result)
>  /* Re-fold the toplevel result.  It's basically an embedded
> gimple_build w/o actually building the stmt.  */
>  if (!is_predicate)
>-   fprintf_indent (f, indent,
>-   "res_op->resimplify (lseq, valueize);\n");
>+   {
>+ fprintf_indent (f, indent,
>+ "res_op->resimplify (lseq, valueize);\n");
>+ if (e->force_leaf)
>+   {
>+ fprintf_indent (f, indent,
>+ "if (!maybe_push_res_to_seq (res_op, NULL))\n");
>+ fprintf_indent (f, indent + 2, "return false;\n");
>
>please use "goto %s;\n", fail_label)  here.  OK with that change.
Ok.

>
>I've tried again to think about sth prettier to cover these kind of
>single-use checks but failed to come up with sth.
Maybe we need a smart combiner that can deduce cost globally, and
remove these single-use specifiers from the rule descriptions.

Feng


From: Richard Biener 
Sent: Monday, September 14, 2020 9:39 PM
To: Feng Xue OS
Cc: gcc-patches@gcc.gnu.org
Subject: Re: Ping: [PATCH 1/2] Fold plusminus_mult expr with multi-use operands 
(PR 94234)

On Mon, Sep 14, 2020 at 5:17 AM Feng Xue OS via Gcc-patches
 wrote:
>
> Thanks,

@@ -3426,8 +3426,16 @@ dt_simplify::gen_1 (FILE *f, int indent, bool
gimple, operand *result)
  /* Re-fold the toplevel result.  It's basically an embedded
 gimple_build w/o actually building the stmt.  */
  if (!is_predicate)
-   fprintf_indent (f, indent,
-   "res_op->resimplify (lseq, valueize);\n");
+   {
+ fprintf_indent (f, indent,
+ "res_op->resimplify (lseq, valueize);\n");
+ if (e->force_leaf)
+   {
+ fprintf_indent (f, indent,
+ "if (!maybe_push_res_to_seq (res_op, NULL))\n");
+ fprintf_indent (f, indent + 2, "return false;\n");

please use "goto %s;\n", fail_label)  here.  OK with that change.

I've tried again to think about sth prettier to cover these kind of
single-use checks but failed to come up with sth.

Thanks and sorry for the delay,
Richard.

> Feng
>
> 
> From: Feng Xue OS
> Sent: Thursday, September 3, 2020 2:06 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 1/2] Fold plusminus_mult expr with multi-use operands (PR 
> 94234)
>
> For pattern A * C +- B * C -> (A +- B) * C, simplification is disabled
> when A and B are not single-use. This patch is a minor enhancement
> on the pattern, which allows folding if final result is found to be a
> simple gimple value (constant/existing SSA).
>
> Bootstrapped/regtested on x86_64-linux and aarch64-linux.
>
> Feng
> ---
> 2020-09-03  Feng Xue  
>
> gcc/
> PR tree-optimization/94234
> * genmatch.c (dt_simplify::gen_1): Emit check on final simplification
> result when "!" is specified on toplevel output expr.
> * match.pd ((A * C) +- (B * C) -> (A +- B) * C): Allow folding for
> expr with multi-use operands if final result is a simple gimple value.
>
> gcc/testsuite/
> PR tree-optimization/94234
> * gcc.dg/pr94234-2.c: New test.
> ---


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-14 Thread Qing Zhao via Gcc-patches



> On Sep 14, 2020, at 6:09 PM, Segher Boessenkool  
> wrote:
> 
> On Fri, Sep 11, 2020 at 05:41:47PM -0500, Qing Zhao wrote:
>>> On Sep 11, 2020, at 4:51 PM, Segher Boessenkool 
>>>  wrote:
>>> It is definitely *not* effective if there are gadgets that set rax to
>>> a value the attacker wants and then do a syscall.
>> 
>> You mean the following gadget:
>> 
>> 
>> Gadget 1:
>> 
>> mov  rax,  value
>> syscall
>> ret
> 
> No, just
> 
> mov rax,59
> syscall
> 
> (no ret necessary!)

But for ROP, a typical gadget should end with a “ret” (or an indirect
branch), right?

Qing
> 
> I.e. just anything that already does an execve.
> 
> 
> Segher



[PATCH] Return mask <-> integer cost for non-AVX512 micro-architecture.

2020-09-14 Thread Hongtao Liu via Gcc-patches
Hi:
  This patch avoids spilling gprs to mask registers for non-AVX512
micro-architectures and fixes the regression in PR96744.

  Bootstrap is ok, and the regression test for the i386/x86-64 backend is ok.
  No big performance impact on SPEC2017.

gcc/ChangeLog:

PR target/96744
* config/i386/x86-tune-costs.h (struct processor_costs):
Increase mask <-> integer cost for non-AVX512 targets to avoid
spilling gprs to mask registers.  Also retune mask <-> integer and
mask_load/store for skylake_cost.


-- 
BR,
Hongtao
From 66549572467fe5dc5c4221e7857f3051d4f51554 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Mon, 24 Aug 2020 20:36:52 +0800
Subject: [PATCH] Retune mask <->integer cost for non-AVX512
 micro-architecture.

gcc/ChangeLog:

	PR target/96744
	* config/i386/x86-tune-costs.h (struct processor_costs):
	Increase mask <-> integer cost for non-AVX512 targets to avoid
	spilling gprs to mask registers.  Also retune mask <-> integer and
	mask_load/store for skylake_cost.
---
 gcc/config/i386/x86-tune-costs.h | 88 
 1 file changed, 44 insertions(+), 44 deletions(-)

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index a782a9dd9e3..0ad4b28903c 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -58,8 +58,8 @@ struct processor_costs ix86_size_cost = {/* costs for tuning for size */
 	   in 32,64,128,256 and 512-bit */
   {3, 3, 3, 3, 3},			/* cost of storing SSE registers
 	   in 32,64,128,256 and 512-bit */
-  3, 3,	/* SSE->integer and integer->SSE moves */
-  2, 2,/* mask->integer and integer->mask moves */
+  3, 3,/* SSE->integer and integer->SSE moves */
+  3, 3,/* mask->integer and integer->mask moves */
   {2, 2, 2},/* cost of loading mask register
 	   in QImode, HImode, SImode.  */
   {2, 2, 2},/* cost if storing mask register
@@ -169,8 +169,8 @@ struct processor_costs i386_cost = {	/* 386 specific costs */
 	   in 32,64,128,256 and 512-bit */
   {4, 8, 16, 32, 64},			/* cost of storing SSE registers
 	   in 32,64,128,256 and 512-bit */
-  3, 3,	/* SSE->integer and integer->SSE moves */
-  2, 2,/* mask->integer and integer->mask moves */
+  3, 3,/* SSE->integer and integer->SSE moves */
+  3, 3,/* mask->integer and integer->mask moves */
   {2, 4, 2},/* cost of loading mask register
 	   in QImode, HImode, SImode.  */
   {2, 4, 2},/* cost if storing mask register
@@ -277,8 +277,8 @@ struct processor_costs i486_cost = {	/* 486 specific costs */
 	   in 32,64,128,256 and 512-bit */
   {4, 8, 16, 32, 64},			/* cost of storing SSE registers
 	   in 32,64,128,256 and 512-bit */
-  3, 3,	/* SSE->integer and integer->SSE moves */
-  2, 2,/* mask->integer and integer->mask moves */
+  3, 3,/* SSE->integer and integer->SSE moves */
+  3, 3,/* mask->integer and integer->mask moves */
   {2, 4, 2},/* cost of loading mask register
 	   in QImode, HImode, SImode.  */
   {2, 4, 2},/* cost if storing mask register
@@ -387,8 +387,8 @@ struct processor_costs pentium_cost = {
 	   in 32,64,128,256 and 512-bit */
   {4, 8, 16, 32, 64},			/* cost of storing SSE registers
 	   in 32,64,128,256 and 512-bit */
-  3, 3,	/* SSE->integer and integer->SSE moves */
-  2, 2,/* mask->integer and integer->mask moves */
+  3, 3,/* SSE->integer and integer->SSE moves */
+  3, 3,/* mask->integer and integer->mask moves */
   {2, 4, 2},/* cost of loading mask register
 	   in QImode, HImode, SImode.  */
   {2, 4, 2},/* cost if storing mask register
@@ -488,8 +488,8 @@ struct processor_costs lakemont_cost = {
 	   in 32,64,128,256 and 512-bit */
   {4, 8, 16, 32, 64},			/* cost of storing SSE registers
 	   in 32,64,128,256 and 512-bit */
-  3, 3,	/* SSE->integer and integer->SSE moves */
-  2, 2,/* mask->integer and integer->mask moves */
+  3, 3,/* SSE->integer and integer->SSE moves */
+  3, 3,/* mask->integer and integer->mask moves */
   {2, 4, 2},/* cost of loading mask register
 	   in QImode, HImode, SImode.  */
   {2, 4, 2},/* cost if storing mask register
@@ -604,8 +604,8 @@ struct processor_costs pentiumpro_cost = {
 	   in 32,64,128,256 and 512-bit */
   {4, 8, 16, 32, 64},			/* cost of storing SSE registers
 	   in 32,64,128,256 and 512-bit */
-  3, 3,	/* SSE->integer and integer->SSE moves */
-  2, 2,/* mask->integer and integer->mask moves */
+  3, 3,/* SSE->integer and integer->SSE moves */
+  3, 3,/* mask->integer and integer->mask moves */
   {4, 4, 4},/* cost of loading mask register
 	   in QImode, HImode, SImode.  */
   {2, 2, 2},/* cost if storing mask register
@@ -711,8 +711,8 @@ struct processor_costs geode_cost = {
 	   in 32,64,128,256 and 512-bit */
   {2, 2, 8, 16, 32},			/* cost of storing SSE registers
 	   in 32,64,128,256 and 512-bit */
-  6, 6,	/* SSE->integer and integer->SSE moves */
-  

Re: [PATCH v2] c, c++: Implement -Wsizeof-array-div [PR91741]

2020-09-14 Thread Marek Polacek via Gcc-patches
On Mon, Sep 14, 2020 at 08:39:36PM +, Joseph Myers wrote:
> On Mon, 14 Sep 2020, Marek Polacek via Gcc-patches wrote:
> 
> > so I followed suit.  In the C++ FE this was rather easy, because
> > finish_parenthesized_expr already set TREE_NO_WARNING.  In the C FE
> > it was trickier; I've added a NOP_EXPR to discern between the non-()
> > and () versions.
> 
> This sort of thing is normally handled for C via original_code in c_expr.  
> I suppose that doesn't work in this case because the code dealing with 
> parenthesized expressions has a special case for sizeof:
> 
>   if (expr.original_code != C_MAYBE_CONST_EXPR
>   && expr.original_code != SIZEOF_EXPR)
> expr.original_code = ERROR_MARK;
> 
> Handling this in some way via c_expr seems better to me than generating 
> NOP_EXPR.  I suppose you could invent a PAREN_SIZEOF_EXPR used by (sizeof 
> foo) and ((sizeof foo)) etc. as an original_code setting (and handled the 
> same as SIZEOF_EXPR by whatever other warnings look for SIZEOF_EXPR 
> there), or else add fields to c_expr to allow more such information to be 
> tracked there.

I went with the last option.  But there was a snag: c_expr doesn't have
a default constructor, so the new member would remain uninitialized.
Instead of initializing .parenthesized_p everywhere we initialize
.original_{code,type}, I only initialize .parenthesized_p when creating
a SIZEOF_EXPR.

It would seem appropriate to add a default constructor/NSDMIs for c_expr,
and then clean up the codebase so that we don't initialize c_expr's fields
manually.  Would you accept such a patch?  It's clearly out of scope for
this patch though.  Such a change would also mean that c_expr is no longer
a trivial type, meaning that we can't use memset on it, and that we couldn't
skip its initialization like we do e.g. in c_parser_alignof_expression.
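
For reference, the rough shape of the new member (a sketch from the
description above, not the actual patch; the existing members are
abbreviated):

struct c_expr
{
  tree value;
  enum tree_code original_code;
  tree original_type;
  /* ... source-range members elided ...  */

  /* True for a parenthesized SIZEOF_EXPR such as "(sizeof (short))".
     Initialized by c_expr_sizeof_expr/c_expr_sizeof_type, set by
     c_parser_postfix_expression, and checked by -Wsizeof-array-div.  */
  bool parenthesized_p;
};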

Thanks for the review.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This patch implements a new warning, -Wsizeof-array-div.  It warns about
code like

  int arr[10];
  sizeof (arr) / sizeof (short);

where we have a division of two sizeof expressions, where the first
argument is an array, and the second sizeof does not equal the size
of the array element.  See e.g. .

Clang makes it possible to suppress the warning by parenthesizing the
second sizeof like this:

  sizeof (arr) / (sizeof (short));

so I followed suit.  In the C++ FE this was rather easy, because
finish_parenthesized_expr already set TREE_NO_WARNING.  In the C FE
I've added a parenthesized_p member to c_expr to discern between the
non-() and () versions.

This warning is enabled by -Wall.  An example of the output:

x.c:5:23: warning: expression does not compute the number of elements in this 
array; element type is ‘int’, not ‘short int’ [-Wsizeof-array-div]
5 |   return sizeof (arr) / sizeof (short);
  |  ~^~~~
x.c:5:25: note: add parentheses around ‘sizeof (short int)’ to silence this 
warning
5 |   return sizeof (arr) / sizeof (short);
  | ^~
  | ( )
x.c:4:7: note: array ‘arr’ declared here
4 |   int arr[10];
  |   ^~~

gcc/c-family/ChangeLog:

PR c++/91741
* c-common.h (maybe_warn_sizeof_array_div): Declare.
* c-warn.c (sizeof_pointer_memaccess_warning): Unwrap NOP_EXPRs.
(maybe_warn_sizeof_array_div): New function.
* c.opt (Wsizeof-array-div): New option.

gcc/c/ChangeLog:

PR c++/91741
* c-parser.c (c_parser_binary_expression): Implement -Wsizeof-array-div.
(c_parser_postfix_expression): Set expr.parenthesized_p.
* c-tree.h (struct c_expr): Add parenthesized_p member.
(char_type_p): Declare.
* c-typeck.c (c_expr_sizeof_expr): Initialize ret.parenthesized_p.
(c_expr_sizeof_type): Likewise.
(char_type_p): No longer static.

gcc/cp/ChangeLog:

PR c++/91741
* typeck.c (cp_build_binary_op): Implement -Wsizeof-array-div.

gcc/ChangeLog:

PR c++/91741
* doc/invoke.texi: Document -Wsizeof-array-div.

gcc/testsuite/ChangeLog:

PR c++/91741
* c-c++-common/Wsizeof-pointer-div.c: Add dg-warning.
* c-c++-common/Wsizeof-array-div1.c: New test.
* g++.dg/warn/Wsizeof-array-div1.C: New test.
* g++.dg/warn/Wsizeof-array-div2.C: New test.
---
 gcc/c-family/c-common.h   |  1 +
 gcc/c-family/c-warn.c | 47 
 gcc/c-family/c.opt|  5 ++
 gcc/c/c-parser.c  | 33 ++-
 gcc/c/c-tree.h|  6 ++
 gcc/c/c-typeck.c  |  4 +-
 gcc/cp/typeck.c   | 10 +++-
 gcc/doc/invoke.texi   | 19 +++
 

[RS6000] rs6000_rtx_costs for PLUS/MINUS constant

2020-09-14 Thread Alan Modra via Gcc-patches
These functions do behave a little differently for SImode, so the
mode should be passed.

* config/rs6000/rs6000.c (rs6000_rtx_costs): Pass mode to
reg_or_add_cint_operand and reg_or_sub_cint_operand.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6fc86f8185a..32044d33977 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21189,9 +21189,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
  return true;
}
   else if ((outer_code == PLUS
-   && reg_or_add_cint_operand (x, VOIDmode))
+   && reg_or_add_cint_operand (x, mode))
   || (outer_code == MINUS
-  && reg_or_sub_cint_operand (x, VOIDmode))
+  && reg_or_sub_cint_operand (x, mode))
   || ((outer_code == SET
|| outer_code == IOR
|| outer_code == XOR)


[RS6000] rs6000_rtx_costs for AND

2020-09-14 Thread Alan Modra via Gcc-patches
The existing "case AND" in this function is not sufficient for
optabs.c:avoid_expensive_constant usage, where the AND is passed in
outer_code.
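
For instance (my own illustration, not a test from the patch), the AND below
can be done with a single rldicr, so the constant should be costed as cheap
when it is queried with outer_code == AND:

/* avoid_expensive_constant asks for the cost of 0xffffff0000000000
   with OUTER_CODE set to AND; if that is costed as free, the mask is
   not forced into a register before the AND.  */
unsigned long
keep_mask (unsigned long x)
{
  return x & 0xffffff0000000000UL;
}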

* config/rs6000/rs6000.c (rs6000_rtx_costs): Move costing for
AND to CONST_INT case.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 32044d33977..523d029800a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21150,16 +21150,13 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
|| outer_code == MINUS)
   && (satisfies_constraint_I (x)
   || satisfies_constraint_L (x)))
- || (outer_code == AND
- && (satisfies_constraint_K (x)
- || (mode == SImode
- ? satisfies_constraint_L (x)
- : satisfies_constraint_J (x))))
- || ((outer_code == IOR || outer_code == XOR)
+ || ((outer_code == AND || outer_code == IOR || outer_code == XOR)
  && (satisfies_constraint_K (x)
  || (mode == SImode
  ? satisfies_constraint_L (x)
  : satisfies_constraint_J (x))))
+ || (outer_code == AND
+ && rs6000_is_valid_and_mask (x, mode))
  || outer_code == ASHIFT
  || outer_code == ASHIFTRT
  || outer_code == LSHIFTRT
@@ -21196,7 +21193,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
|| outer_code == IOR
|| outer_code == XOR)
   && (INTVAL (x)
-  & ~ (unsigned HOST_WIDE_INT) 0x) == 0))
+  & ~ (unsigned HOST_WIDE_INT) 0x) == 0)
+  || (outer_code == AND
+  && rs6000_is_valid_2insn_and (x, mode)))
{
  *total = COSTS_N_INSNS (1);
  return true;
@@ -21334,26 +21333,6 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
  *total += COSTS_N_INSNS (1);
  return true;
}
-
- /* rotate-and-mask (no rotate), andi., andis.: 1 insn.  */
- HOST_WIDE_INT val = INTVAL (XEXP (x, 1));
- if (rs6000_is_valid_and_mask (XEXP (x, 1), mode)
- || (val & 0x) == val
- || (val & 0x) == val
- || ((val & 0x) == 0 && mode == SImode))
-   {
- *total = rtx_cost (left, mode, AND, 0, speed);
- *total += COSTS_N_INSNS (1);
- return true;
-   }
-
- /* 2 insns.  */
- if (rs6000_is_valid_2insn_and (XEXP (x, 1), mode))
-   {
- *total = rtx_cost (left, mode, AND, 0, speed);
- *total += COSTS_N_INSNS (2);
- return true;
-   }
}
 
   *total = COSTS_N_INSNS (1);


[RS6000] rs6000_rtx_costs comment

2020-09-14 Thread Alan Modra via Gcc-patches
Prior patches in this series were small bug fixes.  This lays out the
ground rules for following patches.

* config/rs6000/rs6000.c (rs6000_rtx_costs): Expand comment.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 523d029800a..5b3c0ee0e8c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21133,7 +21133,45 @@ rs6000_cannot_copy_insn_p (rtx_insn *insn)
 
 /* Compute a (partial) cost for rtx X.  Return true if the complete
cost has been computed, and false if subexpressions should be
-   scanned.  In either case, *TOTAL contains the cost result.  */
+   scanned.  In either case, *TOTAL contains the cost result.
+
+   1) Calls from places like optabs.c:avoid_expensive_constant will
+   come here with OUTER_CODE set to an operation such as AND with X
+   being a CONST_INT or other CONSTANT_P type.  This will be compared
+   against set_src_cost, where we'll come here with OUTER_CODE as SET
+   and X the same constant.
+
+   2) Calls from places like combine:distribute_and_simplify_rtx are
+   asking whether a possibly quite complex SET_SRC can be implemented
+   more cheaply than some other logically equivalent SET_SRC.
+
+   3) Calls from places like default_noce_conversion_profitable_p will
+   come here via seq_cost and pass the pattern of a SET insn in X.
+   Presuming the insn is valid and set_dest a reg, rs6000_rtx_costs
+   will next see the SET_SRC.  The overall cost should be comparable
+   to rs6000_insn_cost since the code is comparing one insn sequence
+   (some of which may be costed by insn_cost) against another insn
+   sequence.
+
+   4) Calls from places like cprop.c:try_replace_reg will come here
+   with OUTER_CODE as INSN, and X either a valid pattern of a SET or
+   one where some registers have been replaced with constants.  The
+   replacements may make the SET invalid, for example if
+ (set (reg1) (and (reg2) (const_int 0xfff)))
+   replaces reg2 as
+ (set (reg1) (and (symbol_ref) (const_int 0xfff)))
+   then the replacement can't be implemented in one instruction and
+   really the cost should be higher by one instruction.  However,
+   the cost for invalid insns doesn't matter much except that a
+   higher cost may lead to their rejection earlier.
+
+   5) fwprop.c:should_replace_address puts yet another wrinkle on this
+   function, where we prefer an address calculation that is more
+   complex yet has the same address_cost.  In this case "more
+   complex" is determined by having a higher set_src_cost.  So for
+   example, if we want a plain (reg) address to be replaced with
+   (plus (reg) (const)) when possible then PLUS needs to cost more
+   than zero here.  */
 
 static bool
 rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,


[RS6000] rs6000_rtx_costs reduce cost for SETs

2020-09-14 Thread Alan Modra via Gcc-patches
Also use rs6000_cost only for speed.

* config/rs6000/rs6000.c (rs6000_rtx_costs): Reduce cost for SETs
when insn operation cost handled on recursive call.  Only use
rs6000_cost for speed.  Tidy break/return.  Tidy AND costing.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index fb5fe7969a3..86c90c4d756 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21277,15 +21277,19 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
 
 case PLUS:
 case MINUS:
-  if (FLOAT_MODE_P (mode))
+  if (speed && FLOAT_MODE_P (mode))
*total = rs6000_cost->fp;
   else
*total = COSTS_N_INSNS (1);
   return false;
 
 case MULT:
-  if (CONST_INT_P (XEXP (x, 1))
- && satisfies_constraint_I (XEXP (x, 1)))
+  if (!speed)
+   /* A little more than one insn so that nothing is tempted to
+  turn a shift left into a multiply.  */
+   *total = COSTS_N_INSNS (1) + 1;
+  else if (CONST_INT_P (XEXP (x, 1))
+  && satisfies_constraint_I (XEXP (x, 1)))
{
  if (INTVAL (XEXP (x, 1)) >= -256
  && INTVAL (XEXP (x, 1)) <= 255)
@@ -21304,18 +21308,22 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   return false;
 
 case FMA:
-  if (mode == SFmode)
+  if (!speed)
+   *total = COSTS_N_INSNS (1) + 1;
+  else if (mode == SFmode)
*total = rs6000_cost->fp;
   else
*total = rs6000_cost->dmul;
-  break;
+  return false;
 
 case DIV:
 case MOD:
   if (FLOAT_MODE_P (mode))
{
- *total = mode == DFmode ? rs6000_cost->ddiv
- : rs6000_cost->sdiv;
+ if (!speed)
+   *total = COSTS_N_INSNS (1) + 2;
+ else
+   *total = mode == DFmode ? rs6000_cost->ddiv : rs6000_cost->sdiv;
  return false;
}
   /* FALLTHRU */
@@ -21334,7 +21342,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
}
   else
{
- if (GET_MODE (XEXP (x, 1)) == DImode)
+ if (!speed)
+   *total = COSTS_N_INSNS (1) + 2;
+ else if (GET_MODE (XEXP (x, 1)) == DImode)
*total = rs6000_cost->divdi;
  else
*total = rs6000_cost->divsi;
@@ -21368,6 +21378,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   return false;
 
 case AND:
+  *total = COSTS_N_INSNS (1);
   right = XEXP (x, 1);
   if (CONST_INT_P (right))
{
@@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   || left_code == LSHIFTRT)
  && rs6000_is_valid_shift_mask (right, left, mode))
{
- *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
- if (!CONST_INT_P (XEXP (left, 1)))
-   *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, 
speed);
- *total += COSTS_N_INSNS (1);
+ rtx reg_op = XEXP (left, 0);
+ if (!REG_P (reg_op))
+   *total += rtx_cost (reg_op, mode, left_code, 0, speed);
+ reg_op = XEXP (left, 1);
+ if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
+   *total += rtx_cost (reg_op, mode, left_code, 1, speed);
  return true;
}
}
-
-  *total = COSTS_N_INSNS (1);
   return false;
 
 case IOR:
@@ -21519,7 +21530,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   if (outer_code == TRUNCATE
  && GET_CODE (XEXP (x, 0)) == MULT)
{
- if (mode == DImode)
+ if (!speed)
+   *total = COSTS_N_INSNS (1) + 1;
+ else if (mode == DImode)
*total = rs6000_cost->muldi;
  else
*total = rs6000_cost->mulsi;
@@ -21554,11 +21567,16 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
 case FIX:
 case UNSIGNED_FIX:
 case FLOAT_TRUNCATE:
-  *total = rs6000_cost->fp;
+  if (!speed)
+   *total = COSTS_N_INSNS (1);
+  else
+   *total = rs6000_cost->fp;
   return false;
 
 case FLOAT_EXTEND:
-  if (mode == DFmode)
+  if (!speed)
+   *total = COSTS_N_INSNS (1);
+  else if (mode == DFmode)
*total = rs6000_cost->sfdf_convert;
   else
*total = rs6000_cost->fp;
@@ -21576,7 +21594,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
  *total = rs6000_cost->fp;
  return false;
}
-  break;
+  return false;
 
 case NE:
 case EQ:
@@ -21614,13 +21632,32 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
  *total = 0;
  return true;
}
-  break;
+  return false;
+
+case SET:
+  /* The default cost of a SET is the number of general purpose
+regs being set multiplied by COSTS_N_INSNS (1).  That only
+works where the 

[RS6000] rs6000_rtx_costs multi-insn constants

2020-09-14 Thread Alan Modra via Gcc-patches
This small patch to rs6000_rtx_costs considerably improves the code
generated for large constants in 64-bit code, teaching gcc that it is
better to load a constant from memory than to generate a sequence of
up to five dependent instructions.  Note that the rs6000 backend does
generate large constants as loads from memory at expand time, but
optimisation passes replace them with SETs of the value due to not
having correct costs.
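
For illustration (example value mine, not from the patch):

/* With wrong costs this is rebuilt as the five dependent instructions
   lis; ori; sldi; oris; ori, instead of keeping the expand-time load
   from memory (a single ld from the TOC).  */
unsigned long
big_constant (void)
{
  return 0x123456789abcdef1UL;
}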

PR 94393
* config/rs6000/rs6000.c (rs6000_rtx_costs): Cost multi-insn
constants.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 5b3c0ee0e8c..8c300b82b11 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21181,7 +21181,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
 
   switch (code)
 {
-  /* On the RS/6000, if it is valid in the insn, it is free.  */
+  /* (reg) is costed at zero by rtlanal.c:rtx_cost.  That sets a
+baseline for rtx costs:  If a constant is valid in an insn,
+it is free.  */
 case CONST_INT:
   if (((outer_code == SET
|| outer_code == PLUS
@@ -21242,6 +21244,17 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
 
 case CONST_DOUBLE:
 case CONST_WIDE_INT:
+  /* Subtract one insn here for consistency with the above code
+that returns one less than the actual number of insns for
+SETs.  Don't subtract one for other than SETs, because other
+operations will require the constant to be loaded to a
+register before performing the operation.  All special cases
+for codes other than SET must be handled before we get
+here.  */
+  *total = COSTS_N_INSNS (num_insns_constant (x, mode)
+ - (outer_code == SET ? 1 : 0));
+  return true;
+
 case CONST:
 case HIGH:
 case SYMBOL_REF:


[RS6000] rotate and mask constants

2020-09-14 Thread Alan Modra via Gcc-patches
Implement more two insn constants.

PR 94393
* config/rs6000/rs6000.c (rotate_and_mask_constant): New function.
(num_insns_constant_multi, rs6000_emit_set_long_const): Use it here.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 86c90c4d756..1848cb57ef8 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1112,6 +1112,8 @@ static tree rs6000_handle_altivec_attribute (tree *, 
tree, tree, int, bool *);
 static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_builtin_vectorized_libmass (combined_fn, tree, tree);
 static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
+static bool rotate_and_mask_constant (unsigned HOST_WIDE_INT, HOST_WIDE_INT *,
+ int *, unsigned HOST_WIDE_INT *);
 static int rs6000_memory_move_cost (machine_mode, reg_class_t, bool);
 static bool rs6000_debug_rtx_costs (rtx, machine_mode, int, int, int *, bool);
 static int rs6000_debug_address_cost (rtx, machine_mode, addr_space_t,
@@ -5789,7 +5791,8 @@ num_insns_constant_multi (HOST_WIDE_INT value, 
machine_mode mode)
  /* We won't get more than 2 from num_insns_constant_gpr
 except when TARGET_POWERPC64 and mode is DImode or
 wider, so the register mode must be DImode.  */
- && rs6000_is_valid_and_mask (GEN_INT (low), DImode))
+ && (rs6000_is_valid_and_mask (GEN_INT (low), DImode)
+ || rotate_and_mask_constant (low, NULL, NULL, NULL)))
insns = 2;
   total += insns;
   /* If BITS_PER_WORD is the number of bits in HOST_WIDE_INT, doing
@@ -9420,6 +9423,82 @@ rs6000_emit_set_const (rtx dest, rtx source)
   return true;
 }
 
+/* Detect cases where a constant can be formed by li; rldicl, li; rldicr,
+   or lis; rldicl.  */
+
+static bool
+rotate_and_mask_constant (unsigned HOST_WIDE_INT c,
+ HOST_WIDE_INT *val, int *shift,
+ unsigned HOST_WIDE_INT *mask)
+{
+  /* We know C can't be formed by lis,addi so that puts constraints
+ on the max leading zeros.  lead_zeros won't be larger than
+ HOST_BITS_PER_WIDE_INT - 31.  */
+  int lead_zeros = wi::clz (c);
+  int non_zero = HOST_BITS_PER_WIDE_INT - lead_zeros;
+  /* 00...01xx0..00 (up to 14 x's, any number of leading
+ and trailing 0's) can be implemented as a li, rldicl.  */
+  if ((c & ~(HOST_WIDE_INT_UC (0x7fff) << (non_zero - 15))) == 0)
+{
+  /* eg. c = 1100   ... 
+-> val = 0x3000, shift = 49, mask = -1ull.  */
+  if (val)
+   {
+ *val = c >> (non_zero - 15);
+ *shift = non_zero - 15;
+ *mask = HOST_WIDE_INT_M1U;
+   }
+  return true;
+}
+  /* 00...01xxx1..11 (up to 15 x's, any number of leading
+ 0's and trailing 1's) can be implemented as a li, rldicl.  */
+  if ((c | (HOST_WIDE_INT_M1U << (non_zero - 16))) == HOST_WIDE_INT_M1U)
+{
+  /* eg. c =  1011   ... 
+-> val = sext(0xbfff), shift = 44, mask = 0x0fff.  */
+  if (val)
+   {
+ *val = (((c >> (non_zero - 16)) & 0x) ^ 0x8000) - 0x8000;
+ *shift = non_zero - 16;
+ *mask = HOST_WIDE_INT_M1U >> lead_zeros;
+   }
+  return true;
+}
+  /* 00...01xxx00..01..11 (up to 15 x's followed by 16 0's,
+ any number of leading 0's and trailing 1's) can be implemented as
+ lis, rldicl.  */
+  if (non_zero >= 32
+  && (c & ((HOST_WIDE_INT_1U << (non_zero - 16))
+  - (HOST_WIDE_INT_1U << (non_zero - 32)))) == 0
+  && (c | (HOST_WIDE_INT_M1U << (non_zero - 32))) == HOST_WIDE_INT_M1U)
+{
+  if (val)
+   {
+ *val = (((c >> (non_zero - 32)) & 0x)
+ ^ 0x8000) - 0x8000;
+ *shift = non_zero - 32;
+ *mask = HOST_WIDE_INT_M1U >> lead_zeros;
+   }
+  return true;
+}
+  /* 11..1xxx0..0 (up to 15 x's, any number of leading 1's
+ and trailing 0's) can be implemented as a li, rldicr.  */
+  int trail_zeros = wi::ctz (c);
+  if (trail_zeros >= 48
+  || ((c | ((HOST_WIDE_INT_1U << (trail_zeros + 15)) - 1))
+ == HOST_WIDE_INT_M1U))
+{
+  if (val)
+   {
+ *val = (((c >> trail_zeros) & 0x) ^ 0x8000) - 0x8000;
+ *shift = trail_zeros;
+ *mask = HOST_WIDE_INT_M1U << trail_zeros;
+   }
+  return true;
+}
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -9429,6 +9508,9 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 {
   rtx temp;
   HOST_WIDE_INT ud1, ud2, ud3, ud4;
+  HOST_WIDE_INT val = c;
+  int shift;
+  unsigned HOST_WIDE_INT mask;
 
   ud1 = c & 0x;
   c = c >> 16;
@@ -9454,6 +9536,15 @@ rs6000_emit_set_long_const (rtx dest, 

[RS6000] rs6000_rtx_costs cost IOR

2020-09-14 Thread Alan Modra via Gcc-patches
* config/rs6000/rs6000.c (rs6000_rtx_costs): Cost IOR.

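An example of the insert forms now being costed (my own illustration, not
from the patch); an IOR of two ANDs with complementary masks is a single
rldimi:

/* Matches the rotldi3_insert_6/7 shape: the two mask constants sum
   to -1, so this merges the high half of A with the low half of B.  */
unsigned long
merge_halves (unsigned long a, unsigned long b)
{
  return (a & 0xffffffff00000000UL) | (b & 0x00000000ffffffffUL);
}
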
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8c300b82b11..fb5fe7969a3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21177,6 +21177,7 @@ static bool
 rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
  int opno ATTRIBUTE_UNUSED, int *total, bool speed)
 {
+  rtx left, right;
   int code = GET_CODE (x);
 
   switch (code)
@@ -21367,16 +21368,17 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   return false;
 
 case AND:
-  if (CONST_INT_P (XEXP (x, 1)))
+  right = XEXP (x, 1);
+  if (CONST_INT_P (right))
{
- rtx left = XEXP (x, 0);
+ left = XEXP (x, 0);
  rtx_code left_code = GET_CODE (left);
 
  /* rotate-and-mask: 1 insn.  */
  if ((left_code == ROTATE
   || left_code == ASHIFT
   || left_code == LSHIFTRT)
- && rs6000_is_valid_shift_mask (XEXP (x, 1), left, mode))
+ && rs6000_is_valid_shift_mask (right, left, mode))
{
  *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
  if (!CONST_INT_P (XEXP (left, 1)))
@@ -21390,9 +21392,106 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   return false;
 
 case IOR:
-  /* FIXME */
   *total = COSTS_N_INSNS (1);
-  return true;
+  left = XEXP (x, 0);
+  if (GET_CODE (left) == AND
+ && CONST_INT_P (XEXP (left, 1)))
+   {
+ right = XEXP (x, 1);
+ if (GET_CODE (right) == AND
+ && CONST_INT_P (XEXP (right, 1))
+ && UINTVAL (XEXP (left, 1)) + UINTVAL (XEXP (right, 1)) + 1 == 0)
+   {
+ rtx leftop = XEXP (left, 0);
+ rtx rightop = XEXP (right, 0);
+
+ // rotlsi3_insert_5
+ if (REG_P (leftop)
+ && REG_P (rightop)
+ && mode == SImode
+ && UINTVAL (XEXP (left, 1)) != 0
+ && UINTVAL (XEXP (right, 1)) != 0
+ && rs6000_is_valid_mask (XEXP (left, 1), NULL, NULL, mode))
+   return true;
+ // rotldi3_insert_6
+ if (REG_P (leftop)
+ && REG_P (rightop)
+ && mode == DImode
+ && exact_log2 (-UINTVAL (XEXP (left, 1))) > 0)
+   return true;
+ // rotldi3_insert_7
+ if (REG_P (leftop)
+ && REG_P (rightop)
+ && mode == DImode
+ && exact_log2 (-UINTVAL (XEXP (right, 1))) > 0)
+   return true;
+
+ rtx mask = 0;
+ rtx shift = leftop;
+ rtx_code shift_code = GET_CODE (shift);
+ // rotl3_insert
+ if (shift_code == ROTATE
+ || shift_code == ASHIFT
+ || shift_code == LSHIFTRT)
+   mask = right;
+ else
+   {
+ shift = rightop;
+ shift_code = GET_CODE (shift);
+ // rotl3_insert_2
+ if (shift_code == ROTATE
+ || shift_code == ASHIFT
+ || shift_code == LSHIFTRT)
+   mask = left;
+   }
+ if (mask
+ && CONST_INT_P (XEXP (shift, 1))
+ && rs6000_is_valid_insert_mask (XEXP (mask, 1), shift, mode))
+   {
+ /* Test both regs even though the one in the mask is
+constrained to be equal to the output.  Increasing
+cost may well result in rejecting an invalid insn
+earlier.  */
+ rtx reg_op = XEXP (shift, 0);
+ if (!REG_P (reg_op))
+   *total += rtx_cost (reg_op, mode, shift_code, 0, speed);
+ reg_op = XEXP (mask, 0);
+ if (!REG_P (reg_op))
+   *total += rtx_cost (reg_op, mode, AND, 0, speed);
+ return true;
+   }
+   }
+ // rotl3_insert_3
+ if (GET_CODE (right) == ASHIFT
+ && CONST_INT_P (XEXP (right, 1))
+ && (INTVAL (XEXP (right, 1))
+ == exact_log2 (UINTVAL (XEXP (left, 1)) + 1)))
+   {
+ rtx reg_op = XEXP (left, 0);
+ if (!REG_P (reg_op))
+   *total += rtx_cost (reg_op, mode, AND, 0, speed);
+ reg_op = XEXP (right, 0);
+ if (!REG_P (reg_op))
+   *total += rtx_cost (reg_op, mode, ASHIFT, 0, speed);
+ return true;
+   }
+ // rotl3_insert_4
+ if (GET_CODE (right) == LSHIFTRT
+ && CONST_INT_P (XEXP (right, 1))
+ && mode == SImode
+ && (INTVAL (XEXP (right, 1))
+ + exact_log2 (-UINTVAL (XEXP (left, 1)))) == 32)
+   {
+ rtx reg_op 

[RS6000] rtx_costs

2020-09-14 Thread Alan Modra via Gcc-patches
This patch series fixes a number of issues in rs6000_rtx_costs, the
aim being to provide costing somewhat closer to reality.  Probably the
most important patch of the series is patch 4, which just adds a
comment.  Without the analysis that went into that comment, I found
myself making what seemed to be good changes but which introduced
regressions.

So far these changes have not introduced any testsuite regressions
on --with-cpu=power8 and --with-cpu=power9 all lang bootstraps on
powerpc64le-linux.  Pat spec tested on power9 against a baseline
master from a few months ago, seeing a few small improvements and no
degradations above the noise.

Some notes:

Examination of varasm.o shows quite a number of cases where
if-conversion succeeds due to different seq_cost.  One example:

extern int foo ();
int
default_assemble_integer (unsigned size)
{
  extern unsigned long rs6000_isa_flags;

  if (size > (!((rs6000_isa_flags & (1UL << 35)) != 0) ? 4 : 8))
return 0;
  return foo ();
}

This rather horrible code turns the rs6000_isa_flags value into either
4 or 8:
rldicr 9,9,28,0
srdi 9,9,28
addic 9,9,-1
subfe 9,9,9
rldicr 9,9,0,61
addi 9,9,8
Better would be
rldicl 9,9,29,63
sldi 9,9,2
addi 9,9,4

There is also a "rlwinm ra,rb,3,0,26" instead of "rldicr ra,rb,3,60",
and "li r31,0x4000; rotldi r31,r31,17" vs.
"lis r31,0x8000; clrldi r31,r31,32".
Neither of these is a real change.  I saw one occurrence of a 5 insn
sequence being replaced with a load from memory in
default_function_rodata_section, for ".rodata", and others elsewhere.

Sometimes correct insn cost leads to unexpected results.  For
example:

extern unsigned bar (void);
unsigned
f1 (unsigned a)
{
  if ((a & 0x01000200) == 0x01000200)
return bar ();
  return 0;
}

emits for a & 0x01000200
 (set (reg) (and (reg) (const_int 0x01000200)))
at expand time (two rlwinm insns) rather than the older
 (set (reg) (const_int 0x01000200))
 (set (reg) (and (reg) (reg)))
which is three insns.  However, since 0x01000200 is needed later the
older code after optimisation is smaller.


[RS6000] Count rldimi constant insns

2020-09-14 Thread Alan Modra via Gcc-patches
rldimi is generated by rs6000_emit_set_long_const when the high and
low 32 bits of a 64-bit constant are equal.

* config/rs6000/rs6000.c (num_insns_constant_gpr): Count rldimi
constants correctly.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 20a4ba382bc..6fc86f8185a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5731,7 +5731,7 @@ direct_return (void)
 
 /* Helper for num_insns_constant.  Calculate number of instructions to
load VALUE to a single gpr using combinations of addi, addis, ori,
-   oris and sldi instructions.  */
+   oris, sldi and rldimi instructions.  */
 
 static int
 num_insns_constant_gpr (HOST_WIDE_INT value)
@@ -5759,7 +5759,7 @@ num_insns_constant_gpr (HOST_WIDE_INT value)
 
   high >>= 1;
 
-  if (low == 0)
+  if (low == 0 || low == high)
return num_insns_constant_gpr (high) + 1;
   else if (high == 0)
return num_insns_constant_gpr (low) + 1;
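
A constant of the shape this counts, for illustration (value mine, not from
the patch):

/* The high and low 32-bit halves are equal, so the sequence is
   lis; ori to build the low half, then one rldimi to duplicate it into
   the high half -- three insns, where the old code counted more.  */
unsigned long
equal_halves (void)
{
  return 0x1234567812345678UL;
}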


Re: PSA: Default C++ dialect is now C++17

2020-09-14 Thread Marek Polacek via Gcc-patches
On Mon, Sep 14, 2020 at 11:13:18AM -0400, Jason Merrill via Gcc-patches wrote:
> On Mon, Jun 29, 2020 at 1:25 PM Martin Liška  wrote:
> >
> > On 6/29/20 4:57 PM, Marek Polacek wrote:
> > > On Mon, Jun 29, 2020 at 09:51:57AM +0200, Martin Liška wrote:
> > >> On 6/26/20 9:34 PM, Marek Polacek via Gcc-patches wrote:
> > >>> As discussed last month:
> > >>> 
> > >>> it's time to change the C++ default to gnu++17.  I've committed the 
> > >>> patch after
> > >>> testing x86_64-pc-linux-gnu and powerpc64le-unknown-linux-gnu.  Brace 
> > >>> yourselves!
> > >>>
> > >>> Marek
> > >>>
> > >>
> > >> Just a small note that 510.parest_r SPEC 2017 benchmark can't be built 
> > >> now
> > >> with default changed to -std=c++17. The spec config needs to be adjusted.
> > >
> > > Interesting, do you know why?  Does it use the register keyword?
> >
> > Apparently it needs -fno-new-ttp-matching for successful compilation.
> > There's a reduced test-case I made:
> >
> > cat fe.ii
> > template  class FiniteElement;
> > template  class DoFHandler;
> > class FETools {
> >template 
> >void back_interpolate(const DoFHandler &, const InVector 
> > &,
> >  const FiniteElement &, OutVector &);
> >template  class DH, class InVector, class 
> > OutVector,
> >  int spacedim>
> >void back_interpolate(const DH &, InVector,
> >  const FiniteElement &, OutVector);
> > };
> > template  class DoFHandler;
> > template  class FiniteElement;
> > template 
> > void FETools::back_interpolate(const DoFHandler &,
> > const InVector &,
> > const FiniteElement &,
> > OutVector &) {}
> > template void FETools::back_interpolate(const DoFHandler<3> &, const float 
> > &,
> >  const FiniteElement<3> &, float &);
> 
> Hmm, looks like I never sent this.
> 
> Further reduced:
> 
> template <int N> class A;
> template <int N> void fn(A<N> &) {}
> template <template <int> class TT, int N> void fn(TT<N> &);
> template void fn(A<3> &);
> 
> This breaks due to the C++17 changes to template template parameters
> causing A to now be considered a valid argument for TT; with that
> change both function templates are valid candidates, and neither is
> more specialized than the other, so it's ambiguous.
> 
> There are still some open core issues around these changes.

Thanks.  I just pushed a patch to introduce GCC 11 porting_to:
 and documented this change.

Let me know if you have any comments.

Marek



[r11-3192 Regression] FAIL: libgomp.c++/udr-3.C execution test on Linux/x86_64 (-m64)

2020-09-14 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

e9fdb9a73249f95f3da2d7fde6f268ae12d0d22c is the first bad commit
commit e9fdb9a73249f95f3da2d7fde6f268ae12d0d22c
Author: Nathan Sidwell 
Date:   Mon Sep 14 09:42:29 2020 -0700

c++: local externs in templates do not get template head

caused

FAIL: libgomp.c++/udr-3.C execution test

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3192/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c++/udr-3.C --target_board='unix{-m64}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3192 Regression] FAIL: libgomp.c++/udr-13.C execution test on Linux/x86_64 (-m64)

2020-09-14 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

e9fdb9a73249f95f3da2d7fde6f268ae12d0d22c is the first bad commit
commit e9fdb9a73249f95f3da2d7fde6f268ae12d0d22c
Author: Nathan Sidwell 
Date:   Mon Sep 14 09:42:29 2020 -0700

c++: local externs in templates do not get template head

caused

FAIL: libgomp.c++/udr-13.C execution test

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3192/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c++/udr-13.C --target_board='unix{-m64}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-14 Thread Segher Boessenkool
On Mon, Sep 14, 2020 at 05:33:33PM +0100, Richard Sandiford wrote:
> > However, for the cases on Power that Segher mentioned, there are also some
> > scratch registers used for other purposes; not sure whether we can
> > correctly generate zeroing in the middle end for Power?
> 
> Segher would be better placed to answer that, but I think the process
> above has to give a conservatively-accurate list of live registers.
> If it misses a register, the other late rtl passes could clobber
> that same register.

It will zero a whole bunch of registers that are overwritten later, that
are not parameter passing registers either.

Doing this with the limited information the middle end has is not the
best idea.


Segher


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-14 Thread Segher Boessenkool
Hi!

On Fri, Sep 11, 2020 at 05:24:58PM -0500, Qing Zhao wrote:
> > So IMO the pass should just search for all the
> > returns in a function and insert the zeroing instructions before
> > each one.
> 
> I was considering this approach too for some time; however, there is one
> major issue with it, as Segher mentioned: the middle end does not know some
> details about the registers, and lacking such detailed information might
> result in incorrect code generation in the middle end.
> 
> For example, on the x86_64 target, when returning with a pop, the scratch
> register “ECX” will be used for the return; then it is incorrect to zero
> “ecx” before generating the return.  Since the middle end doesn't have such
> information, it cannot avoid zeroing “ecx”, and therefore incorrect code
> might be generated.
> 
> Segher also mentioned that on Power some scratch registers are also used
> for other purposes; clearing them before the return is not correct.

Depending where you insert those insns, it can be non-harmful, but in
most places it will not be useful.


What you can do (easy and safe) is change the RTL return instructions to
clear all necessary registers (by outputting extra assembler
instructions).  I still have big doubts how effective that will be, and
esp. compared with how expensive that is, but at least its effect on the
compiler is very local, and it does not get in the way of most things.

(This also works with shrink-wrapping and similar.)

(The effectiveness of this whole scheme depends a *lot* on specifics of
the ABI, btw; in that way it is not generic at all!)


Segher


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-14 Thread Segher Boessenkool
On Fri, Sep 11, 2020 at 05:41:47PM -0500, Qing Zhao wrote:
> > On Sep 11, 2020, at 4:51 PM, Segher Boessenkool 
> >  wrote:
> > It is definitely *not* effective if there are gadgets that set rax to
> > a value the attacker wants and then do a syscall.
> 
> You mean the following gadget:
> 
> 
> Gadget 1:
> 
> mov  rax,  value
> syscall
> ret

No, just

mov rax,59
syscall

(no ret necessary!)

I.e. just anything that already does an execve.


Segher


Re: [PATCH] -fprofile-reproducible: fix option value handling

2020-09-14 Thread Sergei Trofimovich via Gcc-patches
On Mon, 14 Sep 2020 09:34:08 +0200
Richard Biener  wrote:

> On Fri, Sep 11, 2020 at 11:56 PM Sergei Trofimovich via Gcc-patches
>  wrote:
> >
> > From: Sergei Trofimovich 
> >
> > Before the change option handling did not accept an argument:
> >   xgcc: error: unknown profile reproducibility method '=serial'
> >   xgcc: note: valid arguments to '-fprofile-reproducible' are: 
> > multithreaded parallel-runs serial; did you mean 'serial'?
> >
> > The change also includes trailing '=' as part of option prefix.  
> 
> Does it still work without an option then?

'-fprofile-reproducible' without a value seems to be unacceptable now.

Initially, when I sent the patch, I thought there was no way to pass
the option to gcc. But now I understand how to do it (case 4):
Before:
  1 $ gcc-11.0.0 -c -fprofile-reproducible a.c -o a
  gcc-11.0.0: error: missing argument to '-fprofile-reproducible'
  2 $ gcc-11.0.0 -c -fprofile-reproducible= a.c -o a
  gcc-11.0.0: error: unknown profile reproducibility method '='
  gcc-11.0.0: note: valid arguments to '-fprofile-reproducible' are: 
multithreaded parallel-runs serial
  3 $ gcc-11.0.0 -c -fprofile-reproducible=serial a.c -o a
  gcc-11.0.0: error: unknown profile reproducibility method '=serial'
  gcc-11.0.0: note: valid arguments to '-fprofile-reproducible' are: 
multithreaded parallel-runs serial; did you mean 'serial'?
  4 $ gcc-11.0.0 -c -fprofile-reproducibleserial a.c -o a
  # ok

Note: case 4 was a way to pass the option.

After:
  1 $ ./xgcc -B. -c -fprofile-reproducible a.c -o a.o
  xgcc: error: unrecognized command-line option '-fprofile-reproducible'; did 
you mean '-fprofile-reproducible='?
  2 $ ./xgcc -B. -c -fprofile-reproducible= a.c -o a.o
  xgcc: error: missing argument to '-fprofile-reproducible='
  3 $ ./xgcc -B. -c -fprofile-reproducible=serial a.c -o a.o
  # ok
  4 $ ./xgcc -B. -c -fprofile-reproducibleserial a.c -o a.o
  xgcc: error: unrecognized command-line option '-fprofile-reproducibleserial'; 
did you mean '-fprofile-reproducible=serial'?

Note: two problems here:
a) case 2 got a worse diagnostic
b) case 4 broke something that worked before

I'll look at "a)" to check if it can be easily fixed.  Is "b)" worth handling
as well?  I'll need a hint or an example of how to handle an alias like that.

> OK if so.
> 
> > gcc/ChangeLog:
> >
> > * common.opt: Fix handling of '-fprofile-reproducible' option.
> > ---
> >  gcc/common.opt | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index dd68c61ae1d..84bf521128d 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -2228,7 +2228,7 @@ Enum(profile_reproducibility) String(parallel-runs) 
> > Value(PROFILE_REPRODUCIBILIT
> >  EnumValue
> >  Enum(profile_reproducibility) String(multithreaded) 
> > Value(PROFILE_REPRODUCIBILITY_MULTITHREADED)
> >
> > -fprofile-reproducible
> > +fprofile-reproducible=
> >  Common Joined RejectNegative Var(flag_profile_reproducible) 
> > Enum(profile_reproducibility) Init(PROFILE_REPRODUCIBILITY_SERIAL)
> >  -fprofile-reproducible=[serial|parallel-runs|multithreaded]Control 
> > level of reproducibility of profile gathered by -fprofile-generate.
> >
> > --
> > 2.28.0
> >  


-- 

  Sergei


Re: [Patch] OpenMP/Fortran: Fix (re)mapping of allocatable/pointer arrays [PR96668]

2020-09-14 Thread Tobias Burnus

On 9/14/20 1:03 PM, Jakub Jelinek wrote:

+
+  if (flag_openmp && !(TYPE_QUALS (TREE_TYPE (ptr)) & TYPE_QUAL_RESTRICT))
+    always_modifier = true;

I think we don't want to depend on flag_openmp here, ... Guess for now
an openacc flag with (ctx->region_type & ORT_ACC) != 0 passed to it
seems easiest.


Done so. For gimplify_omp_for, I pass the bool as argument as
gimplify_omp_ctxp is often NULL.


+#pragma GCC optimize("O0")

Not this change please ;).

True ;-)

Formatting (move 1 space before false from after it).


Fixed. Thanks for the quick first review.

New version attached.

Tobias

PS: Regtesting still on-going but most parts already tested
successfully.

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
OpenMP/Fortran: Fix (re)mapping of allocatable/pointer arrays [PR96668]

gcc/cp/ChangeLog:

	PR fortran/96668
	* cp-gimplify.c (cxx_omp_finish_clause): Add bool openacc arg.
	* cp-tree.h (cxx_omp_finish_clause): Likewise.
	* semantics.c (handle_omp_for_class_iterator): Update call.

gcc/fortran/ChangeLog:

	PR fortran/96668
	* trans.h (gfc_omp_finish_clause): Add bool openacc arg.
	* trans-openmp.c (gfc_omp_finish_clause): Ditto. Use
	GOMP_MAP_ALWAYS_POINTER with PSET for pointers.
	(gfc_trans_omp_clauses): Like the latter and also if the always
	modifier is used.

gcc/ChangeLog:

	PR fortran/96668
	* gimplify.c (gimplify_omp_for): Add 'bool openacc' argument;
	update omp_finish_clause calls.
	(gimplify_adjust_omp_clauses_1, gimplify_adjust_omp_clauses,
	gimplify_expr, gimplify_omp_loop): Update omp_finish_clause
	and/or gimplify_for calls.
	* langhooks-def.h (lhd_omp_finish_clause): Add bool openacc arg.
	* langhooks.c (lhd_omp_finish_clause): Likewise.
	* langhooks.h (lhd_omp_finish_clause): Likewise.
	* omp-low.c (scan_sharing_clauses): Keep GOMP_MAP_TO_PSET cause for
	'declare target' vars.

include/ChangeLog:

	PR fortran/96668
	* gomp-constants.h (GOMP_MAP_ALWAYS_POINTER_P):

libgomp/ChangeLog:

	PR fortran/96668
	* libgomp.h (struct target_var_desc):
	* target.c (gomp_map_vars_existing):
	(gomp_map_fields_existing):
	(gomp_map_vars_internal):
	(GOMP_target_enter_exit_data):
	* testsuite/libgomp.fortran/map-alloc-ptr-1.f90: New test.
	* testsuite/libgomp.fortran/map-alloc-ptr-2.f90: New test.

 gcc/cp/cp-gimplify.c   |   2 +-
 gcc/cp/cp-tree.h   |   2 +-
 gcc/cp/semantics.c |   4 +-
 gcc/fortran/trans-openmp.c |  31 +++-
 gcc/fortran/trans.h|   2 +-
 gcc/gimplify.c |  31 ++--
 gcc/langhooks-def.h|   2 +-
 gcc/langhooks.c|   2 +-
 gcc/langhooks.h|   2 +-
 gcc/omp-low.c  |   1 +
 include/gomp-constants.h   |   3 +
 libgomp/libgomp.h  |   3 +
 libgomp/target.c   | 184 -
 .../testsuite/libgomp.fortran/map-alloc-ptr-1.f90  | 114 +
 .../testsuite/libgomp.fortran/map-alloc-ptr-2.f90  |  86 ++
 15 files changed, 408 insertions(+), 61 deletions(-)

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index f869583..b2befa7 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -2357,7 +2357,7 @@ cxx_omp_predetermined_mapping (tree decl)
 /* Finalize an implicitly determined clause.  */
 
 void
-cxx_omp_finish_clause (tree c, gimple_seq *)
+cxx_omp_finish_clause (tree c, gimple_seq *, bool /* openacc */)
 {
   tree decl, inner_type;
   bool make_shared = false;
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 5923574..6e4de7d 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7749,7 +7749,7 @@ extern tree cxx_omp_clause_default_ctor		(tree, tree, tree);
 extern tree cxx_omp_clause_copy_ctor		(tree, tree, tree);
 extern tree cxx_omp_clause_assign_op		(tree, tree, tree);
 extern tree cxx_omp_clause_dtor			(tree, tree);
-extern void cxx_omp_finish_clause		(tree, gimple_seq *);
+extern void cxx_omp_finish_clause		(tree, gimple_seq *, bool);
 extern bool cxx_omp_privatize_by_reference	(const_tree);
 extern bool cxx_omp_disregard_value_expr	(tree, bool);
 extern void cp_fold_function			(tree);
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index dafb403..4ca2a2f 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -8770,7 +8770,7 @@ handle_omp_for_class_iterator (int i, location_t locus, enum tree_code code,
 	{
 	  tree ivc = build_omp_clause (locus, OMP_CLAUSE_FIRSTPRIVATE);
 	  OMP_CLAUSE_DECL (ivc) = iter;
-	  cxx_omp_finish_clause (ivc, NULL);
+	  cxx_omp_finish_clause (ivc, NULL, false);
 	  OMP_CLAUSE_CHAIN (ivc) = clauses;
 	  clauses = ivc;
 	}
@@ -8802,7 +8802,7 @@ 

Re: c++: local externs in templates do not get template head

2020-09-14 Thread Tobias Burnus

This patch causes run-time failures for
  g++ -fopenmp libgomp/testsuite/libgomp.c++/udr-13.C

The follow-up fix does not help.

Namely, in udr-3.C:115:

115 if (t.s != 11 || v.v != 9 || q != 0 || d != 3.0) abort ();
(gdb) p t.s
$1 = 11
(gdb) p v.v
$2 = 4
(gdb) p q
$3 = 0
(gdb) p d
$4 = 3

Could you check?

Thanks,

Tobias

On 9/14/20 6:45 PM, Nathan Sidwell wrote:


Now we consistently mark local externs with DECL_LOCAL_DECL_P, we can
teach the template machinery not to give them a TEMPLATE_DECL head,
and the instantiation machinery treat them as the local specialiations
they are.  (openmp UDRs also fall into this category, and are dealt
with similarly.)
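
To illustrate the kind of declaration involved (a hypothetical sketch,
not a testcase from the patch):

  // A block-scope ("local") extern declaration inside a template.
  // With this change it is treated as a local specialization rather
  // than being given a TEMPLATE_DECL head of its own.
  template <typename T>
  T f (T t)
  {
    extern T g (T);   // local extern declaration
    return g (t);
  }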

gcc/cp/
* pt.c (push_template_decl_real): Don't attach a template head to
local externs.
(tsubst_function_decl): Add support for headless local extern
decls.
(tsubst_decl): Add support for headless local extern decls.

pushed to trunk

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


[r11-3192 Regression] FAIL: libgomp.c++/udr-13.C execution test on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-14 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

e9fdb9a73249f95f3da2d7fde6f268ae12d0d22c is the first bad commit
commit e9fdb9a73249f95f3da2d7fde6f268ae12d0d22c
Author: Nathan Sidwell 
Date:   Mon Sep 14 09:42:29 2020 -0700

c++: local externs in templates do not get template head

caused

FAIL: libgomp.c++/udr-13.C execution test

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3192/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c++/udr-13.C --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3192 Regression] FAIL: libgomp.c++/udr-3.C execution test on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-14 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

e9fdb9a73249f95f3da2d7fde6f268ae12d0d22c is the first bad commit
commit e9fdb9a73249f95f3da2d7fde6f268ae12d0d22c
Author: Nathan Sidwell 
Date:   Mon Sep 14 09:42:29 2020 -0700

c++: local externs in templates do not get template head

caused

FAIL: libgomp.c++/udr-3.C execution test

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3192/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c++/udr-3.C --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[PATCH] bb-reorder: Fix for ICEs caused by 69ca5f3a9882

2020-09-14 Thread Segher Boessenkool
After the previous patch we are left with an unreachable BB.  This will
ICE if either we have -fschedule-fusion, or we do not have peephole2.

This fixes it.  Okay for trunk?


Segher


2020-09-14  Segher Boessenkool  

PR rtl-optimization/96475
* bb-reorder.c (duplicate_computed_gotos): If we did anything, run
cleanup_cfg.

---
 gcc/bb-reorder.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 76e56b5..2ad5197 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -2760,6 +2760,10 @@ duplicate_computed_gotos (function *fun)
 if (computed_jump_p (BB_END (bb)) && can_duplicate_block_p (bb))
   changed |= maybe_duplicate_computed_goto (bb, max_size);
 
+  /* Some blocks may have become unreachable.  */
+  if (changed)
+cleanup_cfg (0);
+
   /* Duplicating blocks will redirect edges and may cause hot blocks
 previously reached by both hot and cold blocks to become dominated
 only by cold blocks.  */
-- 
1.8.3.1



Re: [PATCH] correct handling of indices into arrays with elements larger than 1 (PR c++/96511)

2020-09-14 Thread Martin Sebor via Gcc-patches

On 9/4/20 11:14 AM, Jason Merrill wrote:

On 9/3/20 2:44 PM, Martin Sebor wrote:

On 9/1/20 1:22 PM, Jason Merrill wrote:

On 8/11/20 12:19 PM, Martin Sebor via Gcc-patches wrote:

-Wplacement-new handles array indices and pointer offsets the same:
by adjusting them by the size of the element.  That's correct for
the latter but wrong for the former, causing false positives when
the element size is greater than one.

In addition, the warning doesn't even attempt to handle arrays of
arrays.  I'm not sure if I forgot or if I simply didn't think of
it.
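
For illustration, something along these lines could warn spuriously
before the fix (my sketch, not a testcase from the patch):

  #include <new>

  void f ()
  {
    int arr[10];
    // In bounds: element 2 of a 10-element int array.  With the array
    // index mishandled, -Wplacement-new could conclude the int does
    // not fit and warn even though the write is valid.
    new (&arr[2]) int (42);
  }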

The attached patch corrects these oversights by replacing most
of the -Wplacement-new code with a call to compute_objsize which
handles all this correctly (plus more), and is also better tested.
But even compute_objsize has bugs: it trips up while converting
wide_int to offset_int for some pointer offset ranges.  Since
handling the C++ IL required changes in this area the patch also
fixes that.

For review purposes, the patch affects just the middle end.
The C++ diff pretty much just removes code from the front end.


The C++ changes are OK.


Thank you for looking at the rest as well.




-compute_objsize (tree ptr, int ostype, access_ref *pref,
-    bitmap *visited, const vr_values *rvals /* = NULL */)
+compute_objsize (tree ptr, int ostype, access_ref *pref, bitmap *visited,
+    const vr_values *rvals)


This reformatting seems unnecessary, and I prefer to keep the comment 
about the default argument.


This overload doesn't take a default argument.  (There was a stray
declaration of a similar function at the top of the file that had
one.  I've removed it.)


Ah, true.


-  if (!size || TREE_CODE (size) != INTEGER_CST)
-   return false;

 >...

You change some failure cases in compute_objsize to return success 
with a maximum range, while others continue to return failure.  This 
needs commentary about the design rationale.


This is too much for a comment in the code but the background is
this: compute_objsize initially returned the object size as a constant.
Recently, I have enhanced it to return a range to improve warnings for
allocated objects.  With that, a failure can be turned into success by
having the function set the range to that of the largest object.  That
should simplify the function's callers and could even improve
the detection of some invalid accesses.  Once this change is made
it might even be possible to change its return type to void.

The change that caught your eye is necessary to make the function
a drop-in replacement for the C++ front end code which makes this
same assumption.  Without it, a number of test cases that exercise
VLAs fail in g++.dg/warn/Wplacement-new-size-5.C.  For example:

   void f (int n)
   {
 char a[n];
 new (a - 1) int ();
   }

Changing any of the other places isn't necessary for existing tests
to pass (and I didn't want to introduce too much churn).  But I do
want to change the rest of the function along the same lines at some
point.


Please do change the other places to be consistent; better to have more 
churn than to leave the function half-updated.  That can be a separate 
patch if you prefer, but let's do it now rather than later.


I've made most of these changes in the other patch (also attached).
I'm quite happy with the result but it turned out to be a lot more
work than either of us expected, mostly due to the amount of testing.

I've left a couple of failing cases in place mainly as reminders
to handle them better (which means I also didn't change the caller
to avoid testing for failures).  I've also added TODO notes with
reminders to handle some of the new codes more completely.




+  special_array_member sam{ };


sam is always set by component_ref_size, so I don't think it's 
necessary to initialize it at the declaration.


I find initializing pass-by-pointer local variables helpful but
I don't insist on it.




@@ -187,7 +187,7 @@ decl_init_size (tree decl, bool min)
   tree last_type = TREE_TYPE (last);
   if (TREE_CODE (last_type) != ARRAY_TYPE
   || TYPE_SIZE (last_type))
-    return size;
+    return size ? size : TYPE_SIZE_UNIT (type);


This change seems to violate the comment for the function.


By my reading (and writing) the change is covered by the first
sentence:

    Returns the size of the object designated by DECL considering
    its initializer if it either has one or if it would not affect
    its size, ...


OK, I see it now.


It handles a number of cases in Wplacement-new-size.C fail that
construct a larger object in an extern declaration of a template,
like this:

   template  struct S { char c; };
   extern S s;

   void f ()
   {
 new () int ();
   }

I don't know why DECL_SIZE isn't set here (I don't think it can
be anything but equal to TYPE_SIZE, can it?) and other than struct
objects with a flexible array member where this identity doesn't
hold I can't think of others.  Am I missing something?


Good question.  The 

Re: [PATCH] Fix overflow handling in std::align

2020-09-14 Thread Thomas Rodgers



> On Sep 14, 2020, at 7:30 AM, Ville Voutilainen via Libstdc++ 
>  wrote:
> 
> On Mon, 14 Sep 2020 at 15:49, Glen Fernandes  wrote:
>> 
>> On Mon, Sep 14, 2020 at 5:52 AM Ville Voutilainen wrote:
>>> On Mon, 14 Sep 2020 at 12:51, Ville Voutilainen
>>> wrote:
 On Mon, 14 Sep 2020 at 09:18, Glen Fernandes
>>> wrote:
> Edit; Correct patch this time.
> 
> Fix overflow handling in align
 
 Should the test verify that space is unmodified when nullptr is returned?
>>> 
>>> ..and same for ptr.
>> 
>> Sounds like a good idea. Updated patch attached.
> 
> Looks good to me.

Agree.

libbacktrace: Add support for MiniDebugInfo

2020-09-14 Thread Ian Lance Taylor via Gcc-patches
This patch to libbacktrace adds support for MiniDebugInfo, as
requested in PR 93608.

MiniDebugInfo stores compressed symbol tables for an executable, where
the executable is otherwise stripped.  It is documented at
https://fedoraproject.org/wiki/Features/MiniDebugInfo and
https://sourceware.org/gdb/current/onlinedocs/gdb/MiniDebugInfo.html.

Unfortunately, although debug info processors are already required to
handle data compressed using zlib aka gzip in order to handle zdebug
and SHT_COMPRESSED, the MiniDebugInfo implementation choose to instead
compress the debug information with xz which uses LZMA2.  This in
effect forces all debug info processors to add a second decompression
algorithm.  Note to future developers: do not do this.

This libbacktrace implementation includes a new small LZMA
decompressor that just decompresses from one buffer to another.  It is
not heavily optimized, and on my laptop runs at about 78% of the speed
of using liblzma directly.  I think this is a decent tradeoff for
keeping the code reasonably small.
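
For orientation, a rough sketch of how the new entry point might be
called; the signature here is my assumption, modeled on the existing
zdebug uncompressor, so consult internal.h for the real declaration:

  /* Sketch only: hand a .gnu_debugdata payload to the LZMA
     uncompressor and get back the embedded ELF image.  */
  static int
  read_gnu_debugdata (struct backtrace_state *state,
                      const unsigned char *compressed, size_t compressed_size,
                      backtrace_error_callback error_callback, void *data)
  {
    unsigned char *uncompressed;
    size_t uncompressed_size;

    if (!backtrace_uncompress_lzma (state, compressed, compressed_size,
                                    error_callback, data,
                                    &uncompressed, &uncompressed_size))
      return 0;
    /* "uncompressed" now holds the ELF image that elf_add can process
       for symbols and debug info.  */
    return 1;
  }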

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian

PR libbacktrace/93608
Add support for MiniDebugInfo.
* elf.c (struct elf_view): Define.  Replace most uses of
backtrace_view with elf_view.
(elf_get_view): New static functions.  Replace most calls of
backtrace_get_view with elf_get_view.
(elf_release_view): New static functions.  Replace most calls of
backtrace_release_view with elf_release_view.
(elf_uncompress_failed): Rename from elf_zlib_failed.  Change all
callers.
(LZMA_STATES, LZMA_POS_STATES, LZMA_DIST_STATES): Define.
(LZMA_DIST_SLOTS, LZMA_DIST_MODEL_START): Define.
(LZMA_DIST_MODEL_END, LZMA_FULL_DISTANCES): Define.
(LZMA_ALIGN_SIZE, LZMA_LEN_LOW_SYMBOLS): Define.
(LZMA_LEN_MID_SYMBOLS, LZMA_LEN_HIGH_SYMBOLS): Define.
(LZMA_LITERAL_CODERS_MAX, LZMA_LITERAL_CODER_SIZE): Define.
(LZMA_PROB_IS_MATCH_LEN, LZMA_PROB_IS_REP_LEN): Define.
(LZMA_PROB_IS_REP0_LEN, LZMA_PROB_IS_REP1_LEN): Define.
(LZMA_PROB_IS_REP2_LEN, LZMA_PROB_IS_REP0_LONG_LEN): Define.
(LZMA_PROB_DIST_SLOT_LEN, LZMA_PROB_DIST_SPECIAL_LEN): Define.
(LZMA_PROB_DIST_ALIGN_LEN): Define.
(LZMA_PROB_MATCH_LEN_CHOICE_LEN): Define.
(LZMA_PROB_MATCH_LEN_CHOICE2_LEN): Define.
(LZMA_PROB_MATCH_LEN_LOW_LEN): Define.
(LZMA_PROB_MATCH_LEN_MID_LEN): Define.
(LZMA_PROB_MATCH_LEN_HIGH_LEN): Define.
(LZMA_PROB_REP_LEN_CHOICE_LEN): Define.
(LZMA_PROB_REP_LEN_CHOICE2_LEN): Define.
(LZMA_PROB_REP_LEN_LOW_LEN): Define.
(LZMA_PROB_REP_LEN_MID_LEN): Define.
(LZMA_PROB_REP_LEN_HIGH_LEN): Define.
(LZMA_PROB_LITERAL_LEN): Define.
(LZMA_PROB_IS_MATCH_OFFSET, LZMA_PROB_IS_REP_OFFSET): Define.
(LZMA_PROB_IS_REP0_OFFSET, LZMA_PROB_IS_REP1_OFFSET): Define.
(LZMA_PROB_IS_REP2_OFFSET): Define.
(LZMA_PROB_IS_REP0_LONG_OFFSET): Define.
(LZMA_PROB_DIST_SLOT_OFFSET): Define.
(LZMA_PROB_DIST_SPECIAL_OFFSET): Define.
(LZMA_PROB_DIST_ALIGN_OFFSET): Define.
(LZMA_PROB_MATCH_LEN_CHOICE_OFFSET): Define.
(LZMA_PROB_MATCH_LEN_CHOICE2_OFFSET): Define.
(LZMA_PROB_MATCH_LEN_LOW_OFFSET): Define.
(LZMA_PROB_MATCH_LEN_MID_OFFSET): Define.
(LZMA_PROB_MATCH_LEN_HIGH_OFFSET): Define.
(LZMA_PROB_REP_LEN_CHOICE_OFFSET): Define.
(LZMA_PROB_REP_LEN_CHOICE2_OFFSET): Define.
(LZMA_PROB_REP_LEN_LOW_OFFSET): Define.
(LZMA_PROB_REP_LEN_MID_OFFSET): Define.
(LZMA_PROB_REP_LEN_HIGH_OFFSET): Define.
(LZMA_PROB_LITERAL_OFFSET): Define.
(LZMA_PROB_TOTAL_COUNT): Define.
(LZMA_IS_MATCH, LZMA_IS_REP, LZMA_IS_REP0): Define.
(LZMA_IS_REP1, LZMA_IS_REP2, LZMA_IS_REP0_LONG): Define.
(LZMA_DIST_SLOT, LZMA_DIST_SPECIAL, LZMA_DIST_ALIGN): Define.
(LZMA_MATCH_LEN_CHOICE, LZMA_MATCH_LEN_CHOICE2): Define.
(LZMA_MATCH_LEN_LOW, LZMA_MATCH_LEN_MID): Define.
(LZMA_MATCH_LEN_HIGH, LZMA_REP_LEN_CHOICE): Define.
(LZMA_REP_LEN_CHOICE2, LZMA_REP_LEN_LOW): Define.
(LZMA_REP_LEN_MID, LZMA_REP_LEN_HIGH, LZMA_LITERAL): Define.
(elf_lzma_varint): New static function.
(elf_lzma_range_normalize): New static function.
(elf_lzma_bit, elf_lzma_integer): New static functions.
(elf_lzma_reverse_integer): New static function.
(elf_lzma_len, elf_uncompress_lzma_block): New static functions.
(elf_uncompress_lzma): New static function.
(backtrace_uncompress_lzma): New function.
(elf_add): Add memory and memory_size parameters.  Change all
callers.  Look for .gnu_debugdata section, and, if found,
decompress it and use it for symbols and debug info.  Permit the
descriptor parameter to be -1.
* internal.h (backtrace_uncompress_lzma): Declare.
* mtest.c: New file.
* xztest.c: New file.
* configure.ac: Check for nm, xz, and comm programs.  Check for
liblzma library.
(HAVE_MINIDEBUG): Define.
* Makefile.am (mtest_SOURCES): Define.
(mtest_CFLAGS, mtest_LDADD): Define.
(TESTS): Add mtest_minidebug if HAVE_MINIDEBUG.
(%_minidebug): New pattern rule, if HAVE_MINIDEBUG.
(xztest_SOURCES, xztest_CFLAGS, xztest_LDADD): Define.
(xztest_alloc_SOURCES, xztest_alloc_CFLAGS): Define
(xztest_alloc_LDADD): Define.
(BUILDTESTS): Add mtest, xztest, xztest_alloc.
(CLEANFILES): Add files created 

Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-14 Thread Segher Boessenkool
On Mon, Sep 14, 2020 at 11:47:52AM +0100, Richard Sandiford wrote:
> Would it be worth changing the optab so that the input and output are
> separate?  Having a single operand would be justified if the operation
> was only supposed to touch the selected bytes, but most targets wouldn't
> guarantee that for memory operands, even as things stand.

You have my vote.

> Or maybe the idea was to force the RA's hand by making the input and
> output tied even before RA, with separate moves where necessary.
> But I'm not sure why vec_set is so special that it requires this
> treatment and other optabs don't.

Yeah.  The register allocator is normally very good in using the same
reg in both places, if that is useful.  And it also handles the case
where your machine insns require the two to be the same pretty well.
Not restricting this stuff before RA should be a win.


Segher


Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-14 Thread Segher Boessenkool
On Mon, Sep 14, 2020 at 11:47:56AM +0200, Richard Biener wrote:
> this should be
> 
>u[n % 4] = i;
> 
> I guess.  Is the % 4 mandated by the vec_insert semantics btw?

Yes:

  VEC_INSERT (ARG1, ARG2, ARG3)
  Purpose: Returns a copy of vector ARG2 with element ARG3 replaced by
  the value of ARG1.  Result value: A copy of vector ARG2 with element
  ARG3 replaced by the value of ARG1. This function uses modular
  arithmetic on ARG3 to determine the element number. For example, if
  ARG3 is out of range, the compiler uses ARG3 modulo the number of
  elements in the vector to determine the element position.

The builtin requires it.  The machine insns work like that, too, e.g.:

  vinswlx VRT,RA,RB

  if MSR.VEC=0 then Vector_Unavailable()
  index ← GPR[RA].bit[60:63]
  VSR[VRT+32].byte[index:index+3] ← GPR[RB].bit[32:63]

  Let index be the contents of bits 60:63 of GPR[RA].

  The contents of bits 32:63 of GPR[RB] are placed into
  byte elements index:index+3 of VSR[VRT+32].

  All other byte elements of VSR[VRT+32] are not
  modified.

  If index is greater than 12, the result is undefined.
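
For concreteness, a small illustration of that modular indexing (my
example, not from the thread):

  #include <altivec.h>

  vector int
  g (vector int v, int i)
  {
    /* A vector int has 4 elements, so position 5 is taken modulo 4
       and element 1 is replaced.  */
    return vec_insert (i, v, 5);
  }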


Segher


Re: [PATCH] c, c++: Implement -Wsizeof-array-div [PR91741]

2020-09-14 Thread Joseph Myers
On Mon, 14 Sep 2020, Marek Polacek via Gcc-patches wrote:

> so I followed suit.  In the C++ FE this was rather easy, because
> finish_parenthesized_expr already set TREE_NO_WARNING.  In the C FE
> it was trickier; I've added a NOP_EXPR to discern between the non-()
> and () versions.

This sort of thing is normally handled for C via original_code in c_expr.  
I suppose that doesn't work in this case because the code dealing with 
parenthesized expressions has a special case for sizeof:

  if (expr.original_code != C_MAYBE_CONST_EXPR
  && expr.original_code != SIZEOF_EXPR)
expr.original_code = ERROR_MARK;

Handling this in some way via c_expr seems better to me than generating 
NOP_EXPR.  I suppose you could invent a PAREN_SIZEOF_EXPR used by (sizeof 
foo) and ((sizeof foo)) etc. as an original_code setting (and handled the 
same as SIZEOF_EXPR by whatever other warnings look for SIZEOF_EXPR 
there), or else add fields to c_expr to allow more such information to be 
tracked there.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATH 3/3] libstdc++: Add std::advance ostreambuf_iterator overload

2020-09-14 Thread François Dumont via Gcc-patches

On 10/09/20 5:19 pm, Jonathan Wakely wrote:

On 09/09/20 22:12 +0200, François Dumont via Libstdc++ wrote:

libstdc++: Add std::advance overload for ostreambuf_iterator

Implement std::advance overload for ostreambuf_iterator using 
basic_streambuf

pubseekof.

libstdc++-v3/ChangeLog:

        * include/bits/streambuf_iterator.h (ostreambuf_iterator): Add
        std::advance friend declaration.
        (advance(ostreambuf_iterator<>&, _Distance)): New.
        * testsuite/25_algorithms/advance/ostreambuf_iterator/char/1.cc:
        New test.
        * testsuite/25_algorithms/advance/ostreambuf_iterator/char/1_neg.cc:
        New test.
        * testsuite/25_algorithms/advance/ostreambuf_iterator/char/2.cc:
        New test.
        * testsuite/25_algorithms/advance/ostreambuf_iterator/char/2_neg.cc:
        New test.

Tested under Linux x86_64.

Ok to commit ?


I think relying on seeking here is a bad idea for the same reason as
in my previous email. We don't know what seek does for an arbitrary
derived streambuf, or even if it is possible at all.


After implementing it similarly to the overload on 
istreambuf_iterator, I wrote a test to compare the std::advance behavior 
with a manual increment. That is how I realized that incrementing an 
ostreambuf_iterator is a no-op, and so std::advance must be one too 
unless told otherwise.
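
A tiny illustration of that no-op increment (my example):

  #include <iostream>
  #include <iterator>

  int main ()
  {
    std::ostreambuf_iterator<char> it (std::cout);
    ++it;       // no-op: there is no put position to move
    *it = 'x';  // assignment is what actually writes a character
  }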


There was a reason for this overload to be missing.

Thanks for the review,

François



Re: [PATCH] Ignore the clobbered stack pointer in asm statement

2020-09-14 Thread H.J. Lu via Gcc-patches
On Mon, Sep 14, 2020 at 10:05 AM Jakub Jelinek  wrote:
>
> On Mon, Sep 14, 2020 at 08:57:18AM -0700, H.J. Lu via Gcc-patches wrote:
> > Something like this for GCC 8 and 9.
>
> Guess my preference would be to do this everywhere and then let's discuss if

The same patch also works on master branch.

> we change the warning into error there or keep it being deprecated.
>
> Though, let's see what others want to say about this.
>
> > Add sp_is_clobbered_by_asm to rtl_data to inform backends that the stack
> > pointer is clobbered by asm statement.
> >
> > gcc/
> >
> >   PR target/97032
> >   * cfgexpand.c (asm_clobber_reg_kind): Set sp_is_clobbered_by_asm
> >   to true if the stack pointer is clobbered by asm statement.
> >   * emit-rtl.h (rtl_data): Add sp_is_clobbered_by_asm.
> >   * config/i386/i386.c (ix86_get_drap_rtx): Set need_drap to true
> >   if the stack pointer is clobbered by asm statement.
> >
> > gcc/testsuite/
> >
> >   PR target/97032
> >   * gcc.target/i386/pr97032.c: New test.
>
> Jakub
>
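
For context, the construct at issue is an asm statement that lists the
stack pointer in its clobber list; a minimal sketch (mine, not the PR's
testcase):

  void
  f (void)
  {
    /* Clobbering the stack pointer; this is what the new
       sp_is_clobbered_by_asm flag records for the backend.  */
    asm volatile ("" : : : "rsp");
  }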


-- 
H.J.


Re: [PATCH] mixing of labels and code in C2X

2020-09-14 Thread Joseph Myers
On Sun, 13 Sep 2020, Uecker, Martin wrote:

> Hi Joseph,
> 
> here is the (unfinished) patch to support
> mixing of labels in C2X.

I think there should be explicit tests for old standard versions 
(c11-labels-1.c etc.) that these usages are errors with -pedantic-errors 
with the old -std option, are warnings with -pedantic, and are quietly 
accepted with neither.  In addition to using -pedantic-errors with the new 
standard version test to confirm it's not diagnosed, and a test with the 
new version and -Wc11-c2x-compat.
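
A hypothetical shape for one of those tests (file name, options and
diagnostic text are illustrative only, not from the patch):

  /* c11-labels-1.c: a label followed by a declaration is a C2x
     feature, so it must be rejected under -std=c11 -pedantic-errors.  */
  /* { dg-do compile } */
  /* { dg-options "-std=c11 -pedantic-errors" } */

  void
  f (void)
  {
    label: int a; /* { dg-error "label" } */
    (void) a;
  }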

By way of example of further places that I think need changing in the 
patch: at present, c_parser_label gives an error (that you correctly 
change to a pedwarn_c11) if the label is followed by a declaration - and 
then parses the declaration itself rather than leaving it to be parsed in 
the caller.  So c_parser_compound_statement_nostart would parse a label 
followed by a declaration, and at that point last_label would be set to 
true, meaning that a second declaration would be rejected, when in C2x it 
should be accepted.  You can see this even without the patch with a test 
such as:

void
f (void)
{
 label : int a; int b; 
}

I think that instead c_parser_label should never try to parse a 
declaration following the label; that should be left for the caller to 
handle.  To avoid c_parser_label needing to return information about 
standard attributes on a following declaration, maybe it shouldn't parse 
standard attributes either (note that means that c_parser_all_labels would 
need to handle parsing and warning about and discarding standard 
attributes after each label instead - such attributes might be ones on a 
statement, or ones on the next label in a sequence of labels).

Then of course the checks of last_label in 
c_parser_compound_statement_nostart would need to become checks for 
whether to pedwarn_c11 about the use of a label in a given context, once 
the code knows whether there is a declaration, rather than preventing 
parsing a declaration there at all.  So probably c_parser_label would no 
longer have the pedwarn_c11 either; that would all be left to its callers.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-14 Thread Qing Zhao via Gcc-patches



> On Sep 14, 2020, at 2:20 PM, Richard Sandiford  
> wrote:
> 
> Qing Zhao <qing.z...@oracle.com> writes:
>>> On Sep 14, 2020, at 11:33 AM, Richard Sandiford  
>>> wrote:
>>> 
>>> Qing Zhao  writes:
> Like I mentioned earlier though, passes that run after
> pass_thread_prologue_and_epilogue can use call-clobbered registers that
> weren't previously used.  For example, on x86_64, the function might
> not use %r8 when the prologue, epilogue and returns are generated,
> but pass_regrename might later introduce a new use of %r8.  AIUI,
> the “used” version of the new command-line option is supposed to clear
> %r8 in these circumstances, but it wouldn't do so if the data was
> collected at the point that the return is generated.
 
 Thanks for the information.
 
> 
> That's why I think it's more robust to do this later (at the beginning
> of pass_late_compilation) and insert the zeroing before returns that
> already exist.
 
 Yes, looks like it’s not correct to insert the zeroing at the time when 
 prologue, epilogue and return are generated.
 As I also checked, “return” might also be generated as late as pass 
 “pass_delay_slots”, so shall we move the
 new pass as late as possible?
>>> 
>>> If we insert the zeroing before pass_delay_slots and describe the
>>> result correctly, pass_delay_slots should do the right thing.
>>> 
>>> Describing the result correctly includes ensuring that the cleared
>>> registers are treated as live on exit from the function, so that the
>>> zeroing doesn't get deleted again, or skipped by pass_delay_slots.
>> 
>> In the current implementation for x86, when we generate a zeroing insn like 
>> the following:
>> 
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>(const_int 0 [0])) "t10.c":11:1 -1
>> (nil))
>> (insn 19 18 20 2 (unspec_volatile [
>>(reg:SI 1 dx)
>>] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>> (nil))
>> 
>> i.e., after each zeroing insn, the register that is zeroed is marked as 
>> “UNSPECV_PRO_EPILOGUE_USE”. 
>> By doing this, we prevent this zeroing insn from being deleted or skipped. 
>> 
>> Is doing this enough to describe the result correctly?
>> Is there anything else we need to do in addition to this?
> 
> I guess that works, but I think it would be better to abstract
> EPILOGUE_USES into a new target-independent wrapper function that
> (a) returns true if EPILOGUE_USES itself returns true and (b) returns
> true for registers that need to be zero on return, if the zeroing
> instructions have already been inserted.  The places that currently
> test EPILOGUE_USES should then test this new wrapper function instead.

Okay, I see. 
Looks like EPILOGUE_USES is used in df-scan.c to compute the data flow 
information. If EPILOGUE_USES returns true
for registers that need to be zeroed on return, those registers will be 
included in the data flow information; as a result, later
passes will not be able to delete them. 

This sounds like a cleaner approach than the current one that marks the 
registers with “UNSPECV_PRO_EPILOGUE_USE”. 

A more detailed implementation question on this: 
where should I put this new target-independent wrapper function? Which 
header file would be a proper place to hold it?

> 
> After inserting the zeroing instructions, the pass should recompute the
> live-out sets based on this.

Is computing the live-out sets of only the block that includes the return insn 
enough? Or should we re-compute the whole procedure? 

Which utility routine I should use to recompute the live-out sets?

> 
> But the dataflow information has to be correct between
> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
> any pass in that region could clobber the registers in the same way.
 
 You mean, the data flow information will not be correct after 
 pass_free_cfg? 
 “pass_delay_slots” is after “pass_free_cfg”,  and there might be new 
 “return” generated in “pass_delay_slots”, 
 If we want to generate zeroing for the new “return” which was generated in 
 “pass_delay_slots”, can we correctly do so?
>>> 
>>> …the zeroing has to be done before pass_free_cfg, because the information
>>> isn't reliable after that point.  I think it would make sense to do it
>>> before pass_compute_alignments, because inserting the zeros will affect
>>> alignment.
>> 
>> Okay. 
>> 
>> Then there is another problem:  what about the new “return”s that are 
>> generated at pass_delay_slots?
>> 
>> Should we generate the zeroing for these new returns? Since the data flow 
>> information might not be correct at this pass,
>> it looks like there is no correct way to add the zeroing insn for these 
>> new “return”s; what should we do about this?
> 
> pass_delay_slots isn't a problem.  It doesn't change *what* happens
> on each return path, it just changes how the instructions to achieve
> it 

Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-14 Thread Segher Boessenkool
On Thu, Sep 10, 2020 at 12:08:44PM +0200, Richard Biener wrote:
> On Wed, Sep 9, 2020 at 6:03 PM Segher Boessenkool
>  wrote:
> > There often are problems over function calls (where the compiler cannot
> > usually *see* how something is used).
> 
> Yep.  The best way would be to use small loads and larger stores
> which is what CPUs usually tend to handle fine (with alignment
> constraints, etc.).  Of course that's not what either of the "solutions"
> can do.

Yes, and yes.

> That said, since you seem to be "first" in having an instruction
> to insert into a vector at a variable position the idea that we'd
> have to spill anyway for this to be expanded and thus we expand
> the vector to a stack location in the first place falls down.  And
> that's where I'd first try to improve things.
> 
> So what can the CPU actually do?

Both immediate and variable inserts, of 1, 2, 4, or 8 bytes.  The
inserted part is not allowed to cross the 16B boundary (all aligned
stuff never has that problem).  Variable inserts look at only the low
bits of the GPR that says where to insert (4 bits for bytes, 3 bits
for halves, etc.)


Segher


[PATCH] PR/fortran 96983 - ICE compiling gfortran.dg/pr96711.f90

2020-09-14 Thread Harald Anlauf
Dear all,

my fix for PR fortran/96711 adds a testcase that failed on powerpc64-*-*
as well as sparc*-sun-solaris2.11.  This is a consequence of the, say,
mess on x86, where we have real kinds 4,8,10,16, with kind=10 being long
double and kind=16 being represented as float128, while on sparc real(16)
is mapped to long double.  I'm not going to comment on power, see PR for
details.

A minimal solution to the issue would extend the intermediate conversion
to float128 that is done for e.g. x86 into a conversion to long double
on platforms where the TYPE_PRECISION of long double equals 128.  This
solves the issue on sparc*-sun-solaris2.11, as confirmed by Rainer.

The attached patch does just this, and disables the testcase for target
powerpc*-*-*.  I expect a more sophisticated solution being needed for
that platform.

That said, the patch regtests cleanly on x86_64-pc-linux-gnu, and as
reported in the PR, sparc-sun-solaris2.11 and i386-pc-solaris2.11.

OK for master?  Or does anyone with better knowledge of the affected
platforms want to take over?  As this is a technical regression, it
should and could be fixed before the gcc-11 release.

My intention is to either close the PR after committing, change it
into some appropriate state (SUSPENDED?) so that it can be handled
later, or rather open a new PR that is ultimately a target issue.

Thanks,
Harald


PR/fortran 96983 - ICE compiling gfortran.dg/pr96711.f90

The fix for PR fortran/96711 introduced an intermediate conversion to
the float128 type that is used e.g. on x86 but not on some other
targets.  Use a conversion to long double on targets where this type
has a type precision of 128 bit.

gcc/fortran/ChangeLog:

* trans-intrinsic.c (build_round_expr): Use conversion to long
double when this type has a type precision of 128 bit.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr96711.f90: Skip testcase for target powerpc*-*-*.

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 32fe9886c57..552cdf5f19c 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -395,6 +395,13 @@ build_round_expr (tree arg, tree restype)
 fn = builtin_decl_for_precision (BUILT_IN_LROUND, argprec);
   else if (resprec <= LONG_LONG_TYPE_SIZE)
 fn = builtin_decl_for_precision (BUILT_IN_LLROUND, argprec);
+  else if (resprec == TYPE_PRECISION (long_double_type_node)
+	   && resprec >= argprec)
+{
+  int kind = TYPE_PRECISION (long_double_type_node) / 8;
+  arg = fold_convert (long_double_type_node, arg);
+  fn = gfc_builtin_decl_for_float_kind (BUILT_IN_ROUND, kind);
+}
   else if (resprec >= argprec && resprec == 128)
 {
   /* Search for a real kind suitable as temporary for conversion.  */
diff --git a/gcc/testsuite/gfortran.dg/pr96711.f90 b/gcc/testsuite/gfortran.dg/pr96711.f90
index 3761a8ea416..13b6e829ed6 100644
--- a/gcc/testsuite/gfortran.dg/pr96711.f90
+++ b/gcc/testsuite/gfortran.dg/pr96711.f90
@@ -3,6 +3,8 @@
 ! { dg-require-effective-target fortran_real_16 }
 ! { dg-additional-options "-fdump-tree-original" }
 ! { dg-final { scan-tree-dump-times "_gfortran_stop_numeric" 2 "original" } }
+! { dg-skip-if "" { "powerpc*-*-*" } }
+! This test expects a target that supports IEEE128.
 !
 ! PR fortran/96711 - ICE on NINT() Function



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-14 Thread Richard Sandiford
Qing Zhao  writes:
>> On Sep 14, 2020, at 11:33 AM, Richard Sandiford  
>> wrote:
>> 
>> Qing Zhao  writes:
 Like I mentioned earlier though, passes that run after
 pass_thread_prologue_and_epilogue can use call-clobbered registers that
 weren't previously used.  For example, on x86_64, the function might
 not use %r8 when the prologue, epilogue and returns are generated,
 but pass_regrename might later introduce a new use of %r8.  AIUI,
 the “used” version of the new command-line option is supposed to clear
 %r8 in these circumstances, but it wouldn't do so if the data was
 collected at the point that the return is generated.
>>> 
>>> Thanks for the information.
>>> 
 
 That's why I think it's more robust to do this later (at the beginning
 of pass_late_compilation) and insert the zeroing before returns that
 already exist.
>>> 
>>> Yes, looks like it’s not correct to insert the zeroing at the time when 
>>> prologue, epilogue and return are generated.
>>> As I also checked, “return” might also be generated as late as pass 
>>> “pass_delay_slots”, so shall we move the
>>> new pass as late as possible?
>> 
>> If we insert the zeroing before pass_delay_slots and describe the
>> result correctly, pass_delay_slots should do the right thing.
>> 
>> Describing the result correctly includes ensuring that the cleared
>> registers are treated as live on exit from the function, so that the
>> zeroing doesn't get deleted again, or skipped by pass_delay_slots.
>
> In the current implementation for x86, when we generate a zeroing insn like 
> the following:
>
> (insn 18 16 19 2 (set (reg:SI 1 dx)
> (const_int 0 [0])) "t10.c":11:1 -1
>  (nil))
> (insn 19 18 20 2 (unspec_volatile [
> (reg:SI 1 dx)
> ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>  (nil))
>
> i.e., after each zeroing insn, the register that is zeroed is marked as 
> “UNSPECV_PRO_EPILOGUE_USE”. 
> By doing this, we prevent this zeroing insn from being deleted or skipped. 
>
> Is doing this enough to describe the result correctly?
> Is there anything else we need to do in addition to this?

I guess that works, but I think it would be better to abstract
EPILOGUE_USES into a new target-independent wrapper function that
(a) returns true if EPILOGUE_USES itself returns true and (b) returns
true for registers that need to be zero on return, if the zeroing
instructions have already been inserted.  The places that currently
test EPILOGUE_USES should then test this new wrapper function instead.

After inserting the zeroing instructions, the pass should recompute the
live-out sets based on this.
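
Something along these lines, perhaps -- the function name and the
tracking flag below are invented for illustration, not actual GCC
interfaces:

  /* Hypothetical wrapper; places that currently test EPILOGUE_USES
     would call this instead.  */
  static bool
  epilogue_uses_p (unsigned int regno)
  {
    if (EPILOGUE_USES (regno))
      return true;
    /* Once the zeroing instructions have been inserted, the zeroed
       registers must be treated as live on exit so that the zeroing
       is not deleted again.  */
    return zero_call_used_regs_inserted_p
           && TEST_HARD_REG_BIT (zeroed_call_used_regs, regno);
  }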

 But the dataflow information has to be correct between
 pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
 any pass in that region could clobber the registers in the same way.
>>> 
>>> You mean, the data flow information will not be correct after 
>>> pass_free_cfg? 
>>> “pass_delay_slots” is after “pass_free_cfg”,  and there might be new 
>>> “return” generated in “pass_delay_slots”, 
>>> If we want to generate zeroing for the new “return” which was generated in 
>>> “pass_delay_slots”, can we correctly do so?
>> 
>> …the zeroing has to be done before pass_free_cfg, because the information
>> isn't reliable after that point.  I think it would make sense to do it
>> before pass_compute_alignments, because inserting the zeros will affect
>> alignment.
>
> Okay. 
>
> Then there is another problem:  what about the new “return”s that are 
> generated at pass_delay_slots?
>
> Should we generate the zeroing for these new returns? Since the data flow 
> information might not be correct at this pass,
> it looks like there is no correct way to add the zeroing insn for these 
> new “return”s; what should we do about this?

pass_delay_slots isn't a problem.  It doesn't change *what* happens
on each return path, it just changes how the instructions to achieve
it are arranged.

So e.g. if every path through the function clears register R before
pass_delay_slots, and if that clearing is represented as being necessary,
then every path through the function will clear register R after the pass
as well.

>> For extra safety, you could/should also check targetm.hard_regno_scratch_ok
>> to see whether there's a target-specific reason why a register can't
>> be clobbered.
>
> /* Return true if is OK to use a hard register REGNO as scratch register
>in peephole2.  */
> DEFHOOK
> (hard_regno_scratch_ok,
>
>
> Is this checking only valid for pass_peephole2?

No, that comment looks out of date.  The hook is already used in
postreload, for example.

Thanks,
Richard


[COMMITTED 1/2] bpf: use the expected instruction for NOPs

2020-09-14 Thread Jose E. Marchesi via Gcc-patches
The BPF ISA doesn't have a no-operation instruction, but in practice
the Linux kernel verifier performs some optimizations that rely on
these instructions being encoded in a particular way.  As it turns
out, we were using the "wrong" instruction in GCC.

This patch makes GCC generate the expected instruction for NOP (a
`ja 0') and also adds a test to make sure this is the case.

Tested in bpf-unknown-none targets.

2020-09-14  Jose E. Marchesi  

gcc/

* config/bpf/bpf.md ("nop"): Re-define as `ja 0'.

gcc/testsuite/

* gcc.target/bpf/nop-1.c: New test.
---
 gcc/config/bpf/bpf.md|  7 ++-
 gcc/testsuite/gcc.target/bpf/nop-1.c | 14 ++
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/nop-1.c

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 41bb4fcd9a7..769d8ea0096 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -82,10 +82,15 @@ (define_mode_attr msuffix [(SI "32") (DI "")])
 
  NOPs
 
+;; The Linux kernel verifier performs some optimizations that rely on
+;; nop instructions to be encoded as `ja 0', i.e. a jump to offset 0,
+;; which actually means to jump to the next instruction, since in BPF
+;; offsets are expressed in 64-bit words _minus one_.
+
 (define_insn "nop"
   [(const_int 0)]
   ""
-  "mov\t%%r0,%%r0"
+  "ja\t0"
   [(set_attr "type" "alu")])
 
  Arithmetic/Logical
diff --git a/gcc/testsuite/gcc.target/bpf/nop-1.c 
b/gcc/testsuite/gcc.target/bpf/nop-1.c
new file mode 100644
index 000..c4d274f6bad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/nop-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99 --patchable-function-entry=2,1" } */
+
+/* The purpose of this test is to make sure the right instruction is
+   generated for NOPs.  See bpf.md for a description on why this is
+   important.  */
+
+int
+foo ()
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler "foo:\n\t*ja\t0" } } */
-- 
2.25.0.2.g232378479e



Re: [PATCH 1/2] AArch64: Cleanup CPU option processing code

2020-09-14 Thread Wilco Dijkstra
Hi Richard,

>On 14/09/2020 15:19, Wilco Dijkstra wrote:
>> The --with-cpu/--with-arch configure option processing not only checks valid 
>> arguments
>> but also sets TARGET_CPU_DEFAULT with a CPU and extension bitmask.  This 
>> isn't used
>> however since a --with-cpu is translated into a -mcpu option which is 
>> processed as if
>> written on the command-line (so TARGET_CPU_DEFAULT is never accessed).
>> 
>> So remove all the complex processing and bitmask, and just validate the 
>> option.
>> Fix a bug that always reports valid architecture extensions as invalid.  As 
>> a result
>> the CPU processing in aarch64.c can be simplified.
>
> Doesn't this change the default behaviour if cc1 is run directly?  I'm
> not saying this is the wrong thing to do (I think we rely on this in the
> arm port), but I just want to understand by what you mean when you say
> 'never used'.

Yes, it does change the default behaviour of cc1, but I don't think it matters.
I bootstrapped and regression-tested with an assert to verify TARGET_CPU_DEFAULT
is never accessed if there is a --with-cpu configure option.  Using cc1 directly
is not standard practice anyway (and I believe most other configuration options
are not baked into cc1 either).

How do we rely on it in the Arm port? That doesn't sound right...

Cheers,
Wilco


Re: c++: local externs in templates do not get template head

2020-09-14 Thread Marek Polacek via Gcc-patches
On Mon, Sep 14, 2020 at 12:53:18PM -0400, Nathan Sidwell wrote:
> On 9/14/20 12:49 PM, Marek Polacek wrote:
> > On Mon, Sep 14, 2020 at 12:45:33PM -0400, Nathan Sidwell wrote:
> > > Now we consistently mark local externs with DECL_LOCAL_DECL_P, we can
> > > teach the template machinery not to give them a TEMPLATE_DECL head,
> > > and the instantiation machinery treat them as the local specialiations
> > > they are.  (openmp UDRs also fall into this category, and are dealt
> > > with similarly.)
> > > 
> > >  gcc/cp/
> > >  * pt.c (push_template_decl_real): Don't attach a template head to
> > >  local externs.
> > >  (tsubst_function_decl): Add support for headless local extern
> > >  decls.
> > >  (tsubst_decl): Add support for headless local extern decls.
> > > 
> > > pushed to trunk
> > > -- 
> > > Nathan Sidwell
> > 
> > > diff --git i/gcc/cp/pt.c w/gcc/cp/pt.c
> > > index 0f52a9ed77d..8124efcbe24 100644
> > > --- i/gcc/cp/pt.c
> > > +++ w/gcc/cp/pt.c
> > > @@ -6071,7 +6071,11 @@ push_template_decl_real (tree decl, bool is_friend)
> > >   {
> > > if (is_primary)
> > >   retrofit_lang_decl (decl);
> > > -  if (DECL_LANG_SPECIFIC (decl))
> > > +  if (DECL_LANG_SPECIFIC (decl)
> > > +   && ((TREE_CODE (decl) != VAR_DECL
> > > +&& TREE_CODE (decl) != FUNCTION_DECL)
> > 
> > This is !VAR_OR_FUNCTION_DECL_P.  Want me to "fix" that as obvious?
> 
> ah, thanks -- great.  I knew of VAR_P and the lack of FUNCTION_P, but not
> that one.  (bah, who needs consistency!)

Yup...  I keep wishing we had ARRAY_TYPE_P, for example.

Anyway, I just pushed this:

gcc/cp/ChangeLog:

* pt.c (push_template_decl_real): Use VAR_OR_FUNCTION_DECL_P.
---
 gcc/cp/pt.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 8124efcbe24..c630ef5a070 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -6072,8 +6072,7 @@ push_template_decl_real (tree decl, bool is_friend)
   if (is_primary)
retrofit_lang_decl (decl);
   if (DECL_LANG_SPECIFIC (decl)
- && ((TREE_CODE (decl) != VAR_DECL
-  && TREE_CODE (decl) != FUNCTION_DECL)
+ && (!VAR_OR_FUNCTION_DECL_P (decl)
  || !ctx
  || !DECL_LOCAL_DECL_P (decl)))
DECL_TEMPLATE_INFO (decl) = info;

base-commit: 5bcc0fa05ef713594f6c6d55d5c837e13a9c9803
-- 
2.26.2



[pushed] Darwin, X86, testsuite: Fix pr87767 tests for Darwin.

2020-09-14 Thread Iain Sandoe
Hi

The tests assume non-PIC for m32 (which means that they fail on default
PIC targets, like Darwin), also on Linux with -fpic, FWIW.

There is also a missing space before the closing '}' on one of the
dg-require-effective-target directives, which means that test fails on
machines without avx512f.

Tested on m32 and m64 on 
 x86_64-darwin16 (no AVX512)
 x86_64-darwin17 (with AVX512),
 i686-darwin9 (no AVX support)
 x86_64-linux-gnu (AVX512, no changes).

pushed to master as obvious.
thanks
Iain

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512f-broadcast-pr87767-1.c: Make the test
run as non-dynamic for m32 Darwin.
* gcc.target/i386/avx512f-broadcast-pr87767-3.c: Likewise.
* gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512f-broadcast-pr87767-7.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-3.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512f-broadcast-pr87767-6.c: Adjust dg-requires
clause.
---
 gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c  | 1 +
 gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-3.c  | 1 +
 gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c  | 1 +
 gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-6.c  | 2 +-
 gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-7.c  | 1 +
 gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c | 1 +
 gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-3.c | 1 +
 gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c | 1 +
 8 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c 
b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
index a8ee5f5faf1..0563e696316 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
@@ -1,6 +1,7 @@
 /* PR target/87767 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512f -mavx512dq" } */
+/* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } */
 /* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 5 } }  */
 /* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 5 } }  */
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-3.c 
b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-3.c
index c2f22c4ee5a..e57a5682c31 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-3.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-3.c
@@ -1,6 +1,7 @@
 /* PR target/87767 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512f" } */
+/* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } */
 /* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 4 } }  */
 /* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 4 } }  */
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c 
b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c
index 72e1098ccbe..ffbe95980ca 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c
@@ -1,6 +1,7 @@
 /* PR target/87767 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512f" } */
+/* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } */
 /* { dg-final { scan-assembler-times "\[^n\n\]*\\\{1to8\\\}" 4 } }  */
 /* { dg-final { scan-assembler-times "\[^n\n\]*\\\{1to16\\\}" 4 } }  */
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-6.c 
b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-6.c
index f288f83158c..5bd9a4db339 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-6.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-6.c
@@ -1,7 +1,7 @@
 /* PR target/87767 */
 /* { dg-do run } */
 /* { dg-options "-O2 -mavx512f" } */
-/* { dg-require-effective-target avx512f} */
+/* { dg-require-effective-target avx512f } */
 
 #include "avx512f-helper.h"
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-7.c 
b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-7.c
index a8f145d8589..194d888093b 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-7.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-7.c
@@ -1,6 +1,7 @@
 /* PR target/87767 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512f -mavx512vl" } */
+/* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } */
 /* { dg-final { scan-assembler-times "vadd\[^\n\]*\\\{1to2\\\}" 1 } }  */
 /* { dg-final { scan-assembler-times "vadd\[^\n\]*\\\{1to4\\\}" 2 } }  */
 /* { dg-final { scan-assembler-times "vadd\[^\n\]*\\\{1to8\\\}" 2 } }  */
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c 

Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-14 Thread Qing Zhao via Gcc-patches



> On Sep 14, 2020, at 11:33 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
>>> Like I mentioned earlier though, passes that run after
>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>> weren't previously used.  For example, on x86_64, the function might
>>> not use %r8 when the prologue, epilogue and returns are generated,
>>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>>> the “used” version of the new command-line option is supposed to clear
>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>> collected at the point that the return is generated.
>> 
>> Thanks for the information.
>> 
>>> 
>>> That's why I think it's more robust to do this later (at the beginning
>>> of pass_late_compilation) and insert the zeroing before returns that
>>> already exist.
>> 
>> Yes, looks like it’s not correct to insert the zeroing at the time when 
>> prologue, epilogue and return are generated.
>> As I also checked, “return” might also be generated as late as pass 
>> “pass_delay_slots”, so shall we move the
>> new pass as late as possible?
> 
> If we insert the zeroing before pass_delay_slots and describe the
> result correctly, pass_delay_slots should do the right thing.
> 
> Describing the result correctly includes ensuring that the cleared
> registers are treated as live on exit from the function, so that the
> zeroing doesn't get deleted again, or skipped by pass_delay_slots.

In the current implementation for x86, when we generate a zeroing insn like the 
following:

(insn 18 16 19 2 (set (reg:SI 1 dx)
(const_int 0 [0])) "t10.c":11:1 -1
 (nil))
(insn 19 18 20 2 (unspec_volatile [
(reg:SI 1 dx)
] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
 (nil))

i.e., after each zeroing insn, the register that is zeroed is marked as 
“UNSPECV_PRO_EPILOGUE_USE”. 
By doing this, we prevent this zeroing insn from being deleted or skipped. 

Is doing this enough to describe the result correctly?
Is there anything else we need to do in addition to this?

> 
>> Can I put it immediately before “pass_final”? What’s the latest place
>> I can put it?
> 
> Like you say here…
> 
> So IMO the pass should just search for all the
> returns in a function and insert the zeroing instructions before
> each one.
 
 I was considering this approach too for some time; however, there is one 
 major issue with it, as 
 Segher mentioned: the middle end does not know some details about the 
 registers, and lacking such 
 detailed information might result in incorrect code generation in the middle end.
 
 For example, on the x86_64 target, when “return” is done with a pop, the scratch 
 register “ECX” will be 
 used for returning; then it’s incorrect to zero “ecx” before generating 
 the return. Since the middle end
 doesn't have such information, it cannot avoid zeroing “ecx”. Therefore 
 incorrect code might be 
 generated. 
 
 Segher also mentioned that on Power, some scratch registers are 
 also used for 
 other purposes, so clearing them before return is not correct. 
>>> 
>>> But the dataflow information has to be correct between
>>> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
>>> any pass in that region could clobber the registers in the same way.
>> 
>> You mean, the data flow information will not be correct after pass_free_cfg? 
>> “pass_delay_slots” is after “pass_free_cfg”,  and there might be new 
>> “return” generated in “pass_delay_slots”, 
>> If we want to generate zeroing for the new “return” which was generated in 
>> “pass_delay_slots”, can we correctly do so?
> 
> …the zeroing has to be done before pass_free_cfg, because the information
> isn't reliable after that point.  I think it would make sense to do it
> before pass_compute_alignments, because inserting the zeros will affect
> alignment.

Okay. 

Then there is another problem:  what about the new “return”s that are generated 
at pass_delay_slots?

Should we generate the zeroing for these new returns? Since the data flow 
information might not be correct at this pass,
it looks like there is no correct way to add the zeroing insn for these 
new “return”s; what should we do about this?

> 
>>> To get the registers that are live before the return, you can start with
>>> the registers that are live out from the block that contains the return,
>>> then “simulate” the return instruction backwards to get the set of
>>> registers that are live before the return instruction
>>> (see df_simulate_one_insn_backwards).
>> 
>> Okay. 
>> Currently, I am using the following to check whether a reg is live out of the 
>> block that contains the return:
>> 
>> /* Check whether the hard register REGNO is live at the exit block
>> * of the current routine.  */
>> static bool
>> is_live_reg_at_exit (unsigned int regno)
>> {
>>  edge e;
>>  edge_iterator ei;
>> 
>>  FOR_EACH_EDGE (e, ei, 

[PATCH] c, c++: Implement -Wsizeof-array-div [PR91741]

2020-09-14 Thread Marek Polacek via Gcc-patches
This patch implements a new warning, -Wsizeof-array-div.  It warns about
code like

  int arr[10];
  sizeof (arr) / sizeof (short);

where we have a division of two sizeof expressions, where the first
argument is an array and the second sizeof does not equal the size
of the array element.

Clang makes it possible to suppress the warning by parenthesizing the
second sizeof like this:

  sizeof (arr) / (sizeof (short));

so I followed suit.  In the C++ FE this was rather easy, because
finish_parenthesized_expr already set TREE_NO_WARNING.  In the C FE
it was trickier; I've added a NOP_EXPR to discern between the non-()
and () versions.

This warning is enabled by -Wall.  An example of the output:

x.c:5:23: warning: expression does not compute the number of elements in this 
array; element type is ‘int’, not ‘short int’ [-Wsizeof-array-div]
5 |   return sizeof (arr) / sizeof (short);
  |  ~^~~~
x.c:5:25: note: add parentheses around ‘sizeof (short int)’ to silence this 
warning
5 |   return sizeof (arr) / sizeof (short);
  | ^~
  | ( )
x.c:4:7: note: array ‘arr’ declared here
4 |   int arr[10];
  |   ^~~

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/c-family/ChangeLog:

PR c++/91741
* c-common.h (maybe_warn_sizeof_array_div): Declare.
* c-warn.c (sizeof_pointer_memaccess_warning): Unwrap NOP_EXPRs.
(maybe_warn_sizeof_array_div): New function.
* c.opt (Wsizeof-array-div): New option.

gcc/c/ChangeLog:

PR c++/91741
* c-parser.c (c_parser_binary_expression): Implement -Wsizeof-array-div.
(c_parser_postfix_expression): Wrap a SIZEOF_EXPR in a NOP_EXPR.
* c-tree.h (char_type_p): Declare.
* c-typeck.c (char_type_p): No longer static.

gcc/cp/ChangeLog:

PR c++/91741
* typeck.c (cp_build_binary_op): Implement -Wsizeof-array-div.

gcc/ChangeLog:

PR c++/91741
* doc/invoke.texi: Document -Wsizeof-array-div.

gcc/testsuite/ChangeLog:

PR c++/91741
* c-c++-common/Wsizeof-pointer-div.c: Add dg-warning.
* c-c++-common/Wsizeof-array-div1.c: New test.
* g++.dg/warn/Wsizeof-array-div1.C: New test.
* g++.dg/warn/Wsizeof-array-div2.C: New test.
---
 gcc/c-family/c-common.h   |  1 +
 gcc/c-family/c-warn.c | 51 +
 gcc/c-family/c.opt|  5 ++
 gcc/c/c-parser.c  | 46 ++-
 gcc/c/c-tree.h|  1 +
 gcc/c/c-typeck.c  |  2 +-
 gcc/cp/typeck.c   | 10 +++-
 gcc/doc/invoke.texi   | 19 +++
 .../c-c++-common/Wsizeof-array-div1.c | 56 +++
 .../c-c++-common/Wsizeof-pointer-div.c|  2 +-
 .../g++.dg/warn/Wsizeof-array-div1.C  | 37 
 .../g++.dg/warn/Wsizeof-array-div2.C  | 15 +
 12 files changed, 228 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/Wsizeof-array-div1.c
 create mode 100644 gcc/testsuite/g++.dg/warn/Wsizeof-array-div1.C
 create mode 100644 gcc/testsuite/g++.dg/warn/Wsizeof-array-div2.C

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 4fc64bc4aa6..443aaa2ca9c 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1321,6 +1321,7 @@ extern void c_do_switch_warnings (splay_tree, location_t, 
tree, tree, bool);
 extern void warn_for_omitted_condop (location_t, tree);
 extern bool warn_for_restrict (unsigned, tree *, unsigned);
 extern void warn_for_address_or_pointer_of_packed_member (tree, tree);
+extern void maybe_warn_sizeof_array_div (location_t, tree, tree, tree, tree);
 
 /* Places where an lvalue, or modifiable lvalue, may be required.
Used to select diagnostic messages in lvalue_error and
diff --git a/gcc/c-family/c-warn.c b/gcc/c-family/c-warn.c
index c32d8228b5c..e0f0cf65a57 100644
--- a/gcc/c-family/c-warn.c
+++ b/gcc/c-family/c-warn.c
@@ -864,6 +864,10 @@ sizeof_pointer_memaccess_warning (location_t 
*sizeof_arg_loc, tree callee,
   || error_operand_p (sizeof_arg[idx]))
 return;
 
+  /* Unwrap the NOP_EXPR signalling that the sizeof was parenthesized.  */
+  if (TREE_CODE (sizeof_arg[idx]) == NOP_EXPR)
+sizeof_arg[idx] = TREE_OPERAND (sizeof_arg[idx], 0);
+
   type = TYPE_P (sizeof_arg[idx])
 ? sizeof_arg[idx] : TREE_TYPE (sizeof_arg[idx]);
 
@@ -3099,3 +3103,50 @@ warn_for_address_or_pointer_of_packed_member (tree type, 
tree rhs)
 
   check_and_warn_address_or_pointer_of_packed_member (type, rhs);
 }
+
+/* Warn about divisions of two sizeof operators when the first one is applied
+   to an array and the divisor does not equal the size of the array element.
+   For instance:
+

Re: [PATCH]rs6000: Remove useless insns fed into lvx/stvx [PR97019]

2020-09-14 Thread Segher Boessenkool
Hi!

On Mon, Sep 14, 2020 at 05:53:13PM +0800, Kewen.Lin wrote:
> This patch is to extend the existing function find_alignment_op to
> check that all definitions of base_reg are AND operations with mask -16B
> to force the alignment.  If all are satisfied, it passes all AND
> operations and instructions in one vector to recombine_lvx_pattern
> and recombine_stvx_pattern, which will then remove all the useless ANDs.

Great :-)

>   * config/rs6000/rs6000-p8swap.c (insn_rtx_pair_t): New type.

Please don't do that.  The "first" and "second" are completely
meaningless.  Also, keeping it as separate arrays can very well result in
better machine code, and certainly makes for easier-to-read source code.

> +static bool
> +find_alignment_op (rtx_insn *insn, rtx base_reg,
> +vec *and_insn_vec)

Don't name vecs "_vec" (just keep it "and_insn" here, or sometimes
and_insns is clearer).

> -  rtx and_operation = 0;
> +  rtx and_operation = NULL_RTX;

Don't change code randomly (to something arguably worse, even).

> -  if (and_operation != 0)
> - break;
> +   rtx_insn *and_insn = DF_REF_INSN (base_def_link->ref);
> +   and_operation = alignment_mask (and_insn);
> +
> +   /* Stop if we find any one which doesn't align.  */
> +   if (and_operation == NULL_RTX)
> + return false;

No.   if (!and_operation)  would be fine though :-)

> -  return and_operation;
> +  return and_operation != NULL_RTX;

It was fine already.  You can write  !!and_operation  if you want to
explicitly convert it to bool (but that is not helpful usually).

> +  bool all_and_p = find_alignment_op (insn, base_reg, &and_insn_vec);

This is not a predicate.  Do not name it _p please.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97019.c
> @@ -0,0 +1,79 @@
> +/* { dg-do compile { target { powerpc_p8vector_ok && le } } } */

Why only on LE?  (If there is a reason, the testcase should say; if
there is not, well, it shouldn't say it does then :-) )

Please resend with those things fixed.


Segher


Re: [PATCH] Ignore the clobbered stack pointer in asm statement

2020-09-14 Thread Jakub Jelinek via Gcc-patches
On Mon, Sep 14, 2020 at 08:57:18AM -0700, H.J. Lu via Gcc-patches wrote:
> Something like this for GCC 8 and 9.

Guess my preference would be to do this everywhere and then let's discuss if
we change the warning into error there or keep it being deprecated.

Though, let's see what others want to say about this.

> Add sp_is_clobbered_by_asm to rtl_data to inform backends that the stack
> pointer is clobbered by asm statement.
> 
> gcc/
> 
>   PR target/97032
>   * cfgexpand.c (asm_clobber_reg_kind): Set sp_is_clobbered_by_asm
>   to true if the stack pointer is clobbered by asm statement.
>   * emit-rtl.h (rtl_data): Add sp_is_clobbered_by_asm.
>   * config/i386/i386.c (ix86_get_drap_rtx): Set need_drap to true
>   if the stack pointer is clobbered by asm statement.
> 
> gcc/testsuite/
> 
>   PR target/97032
>   * gcc.target/i386/pr97032.c: New test.

Jakub



Re: c++: local externs in templates do not get template head

2020-09-14 Thread Nathan Sidwell

On 9/14/20 12:49 PM, Marek Polacek wrote:

On Mon, Sep 14, 2020 at 12:45:33PM -0400, Nathan Sidwell wrote:

Now that we consistently mark local externs with DECL_LOCAL_DECL_P, we can
teach the template machinery not to give them a TEMPLATE_DECL head,
and the instantiation machinery to treat them as the local specializations
they are.  (openmp UDRs also fall into this category, and are dealt
with similarly.)

 gcc/cp/
 * pt.c (push_template_decl_real): Don't attach a template head to
 local externs.
 (tsubst_function_decl): Add support for headless local extern
 decls.
 (tsubst_decl): Add support for headless local extern decls.

pushed to trunk
--
Nathan Sidwell



diff --git i/gcc/cp/pt.c w/gcc/cp/pt.c
index 0f52a9ed77d..8124efcbe24 100644
--- i/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -6071,7 +6071,11 @@ push_template_decl_real (tree decl, bool is_friend)
  {
if (is_primary)
retrofit_lang_decl (decl);
-  if (DECL_LANG_SPECIFIC (decl))
+  if (DECL_LANG_SPECIFIC (decl)
+ && ((TREE_CODE (decl) != VAR_DECL
+  && TREE_CODE (decl) != FUNCTION_DECL)


This is !VAR_OR_FUNCTION_DECL_P.  Want me to "fix" that as obvious?


ah, thanks -- great.  I knew of VAR_P and the lack of FUNCTION_P, but 
not that one.  (bah, who needs consistency!)



--
Nathan Sidwell


Re: c++: local externs in templates do not get template head

2020-09-14 Thread Marek Polacek via Gcc-patches
On Mon, Sep 14, 2020 at 12:45:33PM -0400, Nathan Sidwell wrote:
> Now that we consistently mark local externs with DECL_LOCAL_DECL_P, we can
> teach the template machinery not to give them a TEMPLATE_DECL head,
> and the instantiation machinery to treat them as the local specializations
> they are.  (openmp UDRs also fall into this category, and are dealt
> with similarly.)
> 
> gcc/cp/
> * pt.c (push_template_decl_real): Don't attach a template head to
> local externs.
> (tsubst_function_decl): Add support for headless local extern
> decls.
> (tsubst_decl): Add support for headless local extern decls.
> 
> pushed to trunk
> -- 
> Nathan Sidwell

> diff --git i/gcc/cp/pt.c w/gcc/cp/pt.c
> index 0f52a9ed77d..8124efcbe24 100644
> --- i/gcc/cp/pt.c
> +++ w/gcc/cp/pt.c
> @@ -6071,7 +6071,11 @@ push_template_decl_real (tree decl, bool is_friend)
>  {
>if (is_primary)
>   retrofit_lang_decl (decl);
> -  if (DECL_LANG_SPECIFIC (decl))
> +  if (DECL_LANG_SPECIFIC (decl)
> +   && ((TREE_CODE (decl) != VAR_DECL
> +&& TREE_CODE (decl) != FUNCTION_DECL)

This is !VAR_OR_FUNCTION_DECL_P.  Want me to "fix" that as obvious?

Marek



c++: local externs in templates do not get template head

2020-09-14 Thread Nathan Sidwell

Now that we consistently mark local externs with DECL_LOCAL_DECL_P, we can
teach the template machinery not to give them a TEMPLATE_DECL head,
and the instantiation machinery to treat them as the local specializations
they are.  (openmp UDRs also fall into this category, and are dealt
with similarly.)

gcc/cp/
* pt.c (push_template_decl_real): Don't attach a template head to
local externs.
(tsubst_function_decl): Add support for headless local extern
decls.
(tsubst_decl): Add support for headless local extern decls.

pushed to trunk
--
Nathan Sidwell
diff --git i/gcc/cp/pt.c w/gcc/cp/pt.c
index 0f52a9ed77d..8124efcbe24 100644
--- i/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -6071,7 +6071,11 @@ push_template_decl_real (tree decl, bool is_friend)
 {
   if (is_primary)
 	retrofit_lang_decl (decl);
-  if (DECL_LANG_SPECIFIC (decl))
+  if (DECL_LANG_SPECIFIC (decl)
+	  && ((TREE_CODE (decl) != VAR_DECL
+	   && TREE_CODE (decl) != FUNCTION_DECL)
+	  || !ctx
+	  || !DECL_LOCAL_DECL_P (decl)))
 	DECL_TEMPLATE_INFO (decl) = info;
 }
 
@@ -13701,14 +13705,20 @@ static tree
 tsubst_function_decl (tree t, tree args, tsubst_flags_t complain,
 		  tree lambda_fntype)
 {
-  tree gen_tmpl, argvec;
+  tree gen_tmpl = NULL_TREE, argvec = NULL_TREE;
   hashval_t hash = 0;
   tree in_decl = t;
 
   /* Nobody should be tsubst'ing into non-template functions.  */
-  gcc_assert (DECL_TEMPLATE_INFO (t) != NULL_TREE);
+  gcc_assert (DECL_TEMPLATE_INFO (t) != NULL_TREE
+	  || DECL_LOCAL_DECL_P (t));
 
-  if (TREE_CODE (DECL_TI_TEMPLATE (t)) == TEMPLATE_DECL)
+  if (DECL_LOCAL_DECL_P (t))
+{
+  if (tree spec = retrieve_local_specialization (t))
+	return spec;
+}
+  else if (TREE_CODE (DECL_TI_TEMPLATE (t)) == TEMPLATE_DECL)
 {
   /* If T is not dependent, just return it.  */
   if (!uses_template_parms (DECL_TI_ARGS (t))
@@ -13958,6 +13968,11 @@ tsubst_function_decl (tree t, tree args, tsubst_flags_t complain,
 	  && !uses_template_parms (argvec))
 	tsubst_default_arguments (r, complain);
 }
+  else if (DECL_LOCAL_DECL_P (r))
+{
+  if (!cp_unevaluated_operand)
+	register_local_specialization (r, t);
+}
   else
 DECL_TEMPLATE_INFO (r) = NULL_TREE;
 
@@ -14503,11 +14518,8 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
   {
 	tree argvec = NULL_TREE;
 	tree gen_tmpl = NULL_TREE;
-	tree spec;
 	tree tmpl = NULL_TREE;
-	tree ctx;
 	tree type = NULL_TREE;
-	bool local_p;
 
 	if (TREE_TYPE (t) == error_mark_node)
 	  RETURN (error_mark_node);
@@ -14529,19 +14541,13 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 
 	/* Check to see if we already have the specialization we
 	   need.  */
-	spec = NULL_TREE;
-	if (DECL_CLASS_SCOPE_P (t) || DECL_NAMESPACE_SCOPE_P (t))
+	tree spec = NULL_TREE;
+	bool local_p = false;
+	tree ctx = DECL_CONTEXT (t);
+	if (!(VAR_P (t) && DECL_LOCAL_DECL_P (t))
+	&& (DECL_CLASS_SCOPE_P (t) || DECL_NAMESPACE_SCOPE_P (t)))
 	  {
-	/* T is a static data member or namespace-scope entity.
-	   We have to substitute into namespace-scope variables
-	   (not just variable templates) because of cases like:
-
-	 template <class T> void f() { extern T t; }
-
-	   where the entity referenced is not known until
-	   instantiation time.  */
 	local_p = false;
-	ctx = DECL_CONTEXT (t);
 	if (DECL_CLASS_SCOPE_P (t))
 	  {
 		ctx = tsubst_aggr_type (ctx, args,
@@ -14581,10 +14587,11 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 	  }
 	else
 	  {
+	if (!(VAR_P (t) && DECL_LOCAL_DECL_P (t)))
+	  /* Subsequent calls to pushdecl will fill this in.  */
+	  ctx = NULL_TREE;
 	/* A local variable.  */
 	local_p = true;
-	/* Subsequent calls to pushdecl will fill this in.  */
-	ctx = NULL_TREE;
 	/* Unless this is a reference to a static variable from an
 	   enclosing function, in which case we need to fill it in now.  */
 	if (TREE_STATIC (t))


[PATCH] aarch64: Fix ICE on fpsr fpcr getters [PR96968]

2020-09-14 Thread Andrea Corallo
Hi all,

I'd like to submit this patch to fix PR96968 and add the corresponding
test.

The fix makes sure the target of these getters is a register so the
insn can be pattern matched correctly.

Regtested and bootstrapped on aarch64-linux-gnu.

Okay for trunk?

Thanks

  Andrea

From 74f5223724fe8ff2649ce6a0860f415052340a04 Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Mon, 14 Sep 2020 14:47:24 +0100
Subject: [PATCH] aarch64: Fix ICE on fpsr fpcr getters [PR96968]

gcc/ChangeLog

2020-09-14  Andrea Corallo  

PR target/96968
* config/aarch64/aarch64-builtins.c
(aarch64_expand_fpsr_fpcr_setter): Fix comment nit.
(aarch64_expand_fpsr_fpcr_getter): New function, expand these
getters and have the target in a register.
(aarch64_general_expand_builtin): Make use of.

gcc/testsuite/ChangeLog

2020-09-14  Andrea Corallo  

PR target/96968
* gcc.target/aarch64/pr96968.c: New test.
---
 gcc/config/aarch64/aarch64-builtins.c  | 24 ---
 gcc/testsuite/gcc.target/aarch64/pr96968.c | 28 ++
 2 files changed, 43 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr96968.c

diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index 4f33dd936c7..f640a9e5de1 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -2024,7 +2024,7 @@ aarch64_expand_builtin_memtag (int fcode, tree exp, rtx 
target)
   return target;
 }
 
-/* Expand an expression EXP as fpsr or cpsr setter (depending on
+/* Expand an expression EXP as fpsr or fpcr setter (depending on
UNSPEC) using MODE.  */
 static void
 aarch64_expand_fpsr_fpcr_setter (int unspec, machine_mode mode, tree exp)
@@ -2034,6 +2034,16 @@ aarch64_expand_fpsr_fpcr_setter (int unspec, 
machine_mode mode, tree exp)
   emit_insn (gen_aarch64_set (unspec, mode, op));
 }
 
+/* Expand a fpsr or fpcr getter (depending on UNSPEC) using MODE.
+   Return the target.  */
+static rtx
+aarch64_expand_fpsr_fpcr_getter (int unspec, machine_mode mode)
+{
+  rtx target = gen_reg_rtx (mode);
+  emit_insn (gen_aarch64_get (unspec, mode, target));
+  return target;
+}
+
 /* Expand an expression EXP that calls built-in function FCODE,
with result going to TARGET if that's convenient.  IGNORE is true
if the result of the builtin is ignored.  */
@@ -2048,26 +2058,22 @@ aarch64_general_expand_builtin (unsigned int fcode, 
tree exp, rtx target,
   switch (fcode)
 {
 case AARCH64_BUILTIN_GET_FPCR:
-  emit_insn (gen_aarch64_get (UNSPECV_GET_FPCR, SImode, target));
-  return target;
+  return aarch64_expand_fpsr_fpcr_getter (UNSPECV_GET_FPCR, SImode);
 case AARCH64_BUILTIN_SET_FPCR:
   aarch64_expand_fpsr_fpcr_setter (UNSPECV_SET_FPCR, SImode, exp);
   return target;
 case AARCH64_BUILTIN_GET_FPSR:
-  emit_insn (gen_aarch64_get (UNSPECV_GET_FPSR, SImode, target));
-  return target;
+  return aarch64_expand_fpsr_fpcr_getter (UNSPECV_GET_FPSR, SImode);
 case AARCH64_BUILTIN_SET_FPSR:
   aarch64_expand_fpsr_fpcr_setter (UNSPECV_SET_FPSR, SImode, exp);
   return target;
 case AARCH64_BUILTIN_GET_FPCR64:
-  emit_insn (gen_aarch64_get (UNSPECV_GET_FPCR, DImode, target));
-  return target;
+  return aarch64_expand_fpsr_fpcr_getter (UNSPECV_GET_FPCR, DImode);
 case AARCH64_BUILTIN_SET_FPCR64:
   aarch64_expand_fpsr_fpcr_setter (UNSPECV_SET_FPCR, DImode, exp);
   return target;
 case AARCH64_BUILTIN_GET_FPSR64:
-  emit_insn (gen_aarch64_get (UNSPECV_GET_FPSR, DImode, target));
-  return target;
+  return aarch64_expand_fpsr_fpcr_getter (UNSPECV_GET_FPSR, DImode);
 case AARCH64_BUILTIN_SET_FPSR64:
   aarch64_expand_fpsr_fpcr_setter (UNSPECV_SET_FPSR, DImode, exp);
   return target;
diff --git a/gcc/testsuite/gcc.target/aarch64/pr96968.c 
b/gcc/testsuite/gcc.target/aarch64/pr96968.c
new file mode 100644
index 000..21ffd955153
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr96968.c
@@ -0,0 +1,28 @@
+/* { dg-options "-O1" } */
+
+void
+fpsr_getter (void)
+{
+  unsigned int fpsr = __builtin_aarch64_get_fpsr ();
+}
+
+void
+fpsr64_getter (void)
+{
+  unsigned long fpsr = __builtin_aarch64_get_fpsr64 ();
+}
+
+void
+fpcr_getter (void)
+{
+  unsigned int fpcr = __builtin_aarch64_get_fpcr ();
+}
+
+void
+fpcr64_getter (void)
+{
+  unsigned long fpcr = __builtin_aarch64_get_fpcr64 ();
+}
+
+/* { dg-final { scan-assembler-times {\tmrs\tx0, fpsr\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tmrs\tx0, fpcr\n} 2 } } */
-- 
2.17.1



[committed] analyzer: add -param=analyzer-max-constraints=

2020-09-14 Thread David Malcolm via Gcc-patches
On attempting to run the full test suite with -fanalyzer via
  make check RUNTESTFLAGS="-v -v --target_board=unix/-fanalyzer"
I saw it get stuck on:
  gcc.c-torture/compile/20001226-1.c
It turns out this was on a debug build, rather than a release build;
but a release build with -fanalyzer took:
  real 1m33.689s
  user 1m30.239s
  sys  0m2.727s
as compared to:
  real 0m2.361s
  user 0m2.107s
  sys  0m0.214s
without -fanalyzer.

This torture test performs 64 * 64 uniquely-coded comparisons between
elements of a pair of arrays until it finds an element that's different,
leading to an accumulation of 4096 constraints along the path where
no difference is found.

"perf" shows most of the time is spent in canonicalizing and copying
constraint_manager instances, presumably as it copies and merges states
with increasingly more complex sets of constraints as it analyzes
further along the "no differences yet" path.
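
A reduced sketch of the kind of code involved (illustrative only, not the
actual torture test):

  /* Each element comparison adds a constraint along the
     "no difference found yet" path.  */
  int
  compare (const int *a, const int *b)
  {
    if (a[0] != b[0]) return 0;
    if (a[1] != b[1]) return 1;
    /* ... thousands more uniquely-coded comparisons ... */
    return -1;
  }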

This patch crudely works around this by adding a
  -param=analyzer-max-constraints=
limit, defaulting to 20, above which constraints will be silently
dropped.  With this -fanalyzer takes:
  real 0m6.935s
  user 0m6.413s
  sys  0m0.396s
on the above case.
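
(For reference, the limit can be adjusted per invocation with the usual param
syntax, e.g. "gcc -fanalyzer --param=analyzer-max-constraints=40 test.c".)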

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as 05ab8befe1230c46116aae37d44f2ce0933e8dae.

gcc/analyzer/ChangeLog:
* analyzer.opt (-param=analyzer-max-constraints=): New param.
* constraint-manager.cc
(constraint_manager::add_constraint_internal): Silently reject
attempts to add constraints when the above limit is reached.
---
 gcc/analyzer/analyzer.opt  | 4 
 gcc/analyzer/constraint-manager.cc | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 07bff339217..94a686db6b3 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -30,6 +30,10 @@ The maximum number of 'after supernode' exploded nodes 
within the analyzer per s
 Common Joined UInteger Var(param_analyzer_max_enodes_per_program_point) 
Init(8) Param
 The maximum number of exploded nodes per program point within the analyzer, 
before terminating analysis of that point.
 
+-param=analyzer-max-constraints=
+Common Joined UInteger Var(param_analyzer_max_constraints) Init(20) Param
+The maximum number of constraints per state.
+
 -param=analyzer-max-recursion-depth=
 Common Joined UInteger Var(param_analyzer_max_recursion_depth) Init(2) Param
 The maximum number of times a callsite can appear in a call stack within the 
analyzer, before terminating analysis of a call that would recurse deeper.
diff --git a/gcc/analyzer/constraint-manager.cc 
b/gcc/analyzer/constraint-manager.cc
index c10b770f294..e578e0502f2 100644
--- a/gcc/analyzer/constraint-manager.cc
+++ b/gcc/analyzer/constraint-manager.cc
@@ -940,6 +940,9 @@ constraint_manager::add_constraint_internal (equiv_class_id 
lhs_id,
  enum constraint_op c_op,
  equiv_class_id rhs_id)
 {
+  if (m_constraints.length () >= param_analyzer_max_constraints)
+return;
+
   constraint new_c (lhs_id, c_op, rhs_id);
 
   /* Remove existing constraints that would be implied by the
-- 
2.26.2



[committed] analyzer: fix constraint explosion on many-cased-switch [PR96653]

2020-09-14 Thread David Malcolm via Gcc-patches
PR analyzer/96653 reports a CPU-time and memory explosion in -fanalyzer
seen in Linux 5.9-rc1:drivers/media/v4l2-core/v4l2-ctrls.c on a switch
statement with many cases.

The issue is some old code in constraint_manager::get_or_add_equiv_class
for ensuring that comparisons between equivalence classes containing
constants work correctly.  The old code added constraints for every
pair of ECs containing constants, leading to O(N^2) constraints (for
N constants).  Given that get_or_add_equiv_class also involves O(N)
comparisons, this led to at least O(N^3) CPU time, and O(N^2) memory
consumption when handling the "default" case, where N is the number of
other cases in the switch statement.

The state rewrite of r11-2694-g808f4dfeb3a95f50f15e71148e5c1067f90a126d
added checking for comparisons between constants, making these explicit
constraints redundant, but failed to remove the code mentioned above.

This patch removes it, fixing the blow-up of constraints in the default
case.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r11-3190-g799dd4e10047a4aa772fd69c910c5c6a96c36b9f.

gcc/analyzer/ChangeLog:
PR analyzer/96653
* constraint-manager.cc
(constraint_manager::get_or_add_equiv_class): Don't accumulate
transitive closure of all constraints on constants.

gcc/testsuite/ChangeLog:
PR analyzer/96653
* gcc.dg/analyzer/pr96653.c: New test.
---
 gcc/analyzer/constraint-manager.cc  |   33 -
 gcc/testsuite/gcc.dg/analyzer/pr96653.c | 1105 +++
 2 files changed, 1105 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr96653.c

diff --git a/gcc/analyzer/constraint-manager.cc 
b/gcc/analyzer/constraint-manager.cc
index 2f7a6537493..c10b770f294 100644
--- a/gcc/analyzer/constraint-manager.cc
+++ b/gcc/analyzer/constraint-manager.cc
@@ -1160,39 +1160,6 @@ constraint_manager::get_or_add_equiv_class (const svalue 
*sval)
 
   equiv_class_id new_id (m_equiv_classes.length () - 1);
 
-  if (sval->maybe_get_constant ())
-{
-  /* If we have a new EC for a constant, add constraints comparing this
-to other constants we may have (so that we accumulate the transitive
-closure of all constraints on constants as the constants are
-added).  */
-  for (equiv_class_id other_id (0); other_id.m_idx < new_id.m_idx;
-  other_id.m_idx++)
-   {
- const equiv_class &other_ec = other_id.get_obj (*this);
- if (other_ec.m_constant
- && types_compatible_p (TREE_TYPE (new_ec->m_constant),
-TREE_TYPE (other_ec.m_constant)))
-   {
- /* If we have two ECs, both with constants, the constants must be
-non-equal (or they would be in the same EC).
-Determine the direction of the inequality, and record that
-fact.  */
- tree lt
-   = fold_binary (LT_EXPR, boolean_type_node,
-  new_ec->m_constant, other_ec.m_constant);
- if (lt == boolean_true_node)
-   add_constraint_internal (new_id, CONSTRAINT_LT, other_id);
- else if (lt == boolean_false_node)
-   add_constraint_internal (other_id, CONSTRAINT_LT, new_id);
- /* Refresh new_id, in case ECs were merged.  SVAL should always
-be present by now, so this should never lead to a
-recursion.  */
- new_id = get_or_add_equiv_class (sval);
-   }
-   }
-}
-
   return new_id;
 }
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96653.c 
b/gcc/testsuite/gcc.dg/analyzer/pr96653.c
new file mode 100644
index 000..494c5312c6b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96653.c
@@ -0,0 +1,1105 @@
+/* Examples of switch statements with many cases (with default values).
+   Adapted from Linux 5.9-rc1:drivers/media/v4l2-core/v4l2-ctrls.c.  */
+
+/* { dg-additional-options "-O1 -Wno-analyzer-too-complex" } */
+// TODO: remove need for -Wno-analyzer-too-complex
+
+typedef unsigned int u32;
+typedef long long s64;
+typedef unsigned long long u64;
+
+enum v4l2_ctrl_type {
+  V4L2_CTRL_TYPE_INTEGER = 1,
+  V4L2_CTRL_TYPE_BOOLEAN = 2,
+  V4L2_CTRL_TYPE_MENU = 3,
+  V4L2_CTRL_TYPE_BUTTON = 4,
+  V4L2_CTRL_TYPE_INTEGER64 = 5,
+  V4L2_CTRL_TYPE_CTRL_CLASS = 6,
+  V4L2_CTRL_TYPE_STRING = 7,
+  V4L2_CTRL_TYPE_BITMASK = 8,
+  V4L2_CTRL_TYPE_INTEGER_MENU = 9,
+
+  V4L2_CTRL_COMPOUND_TYPES = 0x0100,
+  V4L2_CTRL_TYPE_U8 = 0x0100,
+  V4L2_CTRL_TYPE_U16 = 0x0101,
+  V4L2_CTRL_TYPE_U32 = 0x0102,
+  V4L2_CTRL_TYPE_AREA = 0x0106,
+};
+
+const char *v4l2_ctrl_get_name(u32 id) {
+  switch (id) {
+  case (0x0098 | 1):
+return "User Controls";
+  case ((0x0098 | 0x900) + 0):
+return "Brightness";
+  case ((0x0098 | 0x900) + 1):
+return "Contrast";
+  case ((0x0098 | 0x900) + 2):
+return "Saturation";
+  case ((0x0098 | 

[committed] analyzer: add regression test for leak false positive

2020-09-14 Thread David Malcolm via Gcc-patches
Downstream bug report:
  https://bugzilla.redhat.com/show_bug.cgi?id=1878600
describes a false positive from -Wanalyzer-file-leak seen with
gcc 10.2, but which has been fixed in gcc 11.

This patch adds the reproducer as a regression test.

Successfully tested on x86_64-pc-linux-gnu.
Pushed to master as 00a65689d995d8bdf306d0850c852ff0fd25.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/rhbz1878600.c: New test.
---
 gcc/testsuite/gcc.dg/analyzer/rhbz1878600.c | 34 +
 1 file changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/rhbz1878600.c

diff --git a/gcc/testsuite/gcc.dg/analyzer/rhbz1878600.c 
b/gcc/testsuite/gcc.dg/analyzer/rhbz1878600.c
new file mode 100644
index 000..9f6ccb6dc27
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/rhbz1878600.c
@@ -0,0 +1,34 @@
+#include <stdio.h>
+
+#define INI_MAX_LINE 200
+
+typedef char* (*ini_reader)(char* str, int num, void* stream);
+
+int ini_parse(const char* filename);
+
+static int ini_parse_stream(ini_reader reader, void* stream)
+{
+char line[INI_MAX_LINE];
+int max_line = INI_MAX_LINE;
+while (reader(line, max_line, stream) != NULL)
+   ;
+return 0;
+}
+
+static int ini_parse_file(FILE* file)
+{
+return ini_parse_stream((ini_reader)fgets, file);
+}
+
+int ini_parse(const char* filename)
+{
+FILE* file;
+int error;
+
+file = fopen(filename, "r");
+if (!file)
+return -1;
+error = ini_parse_file(file);
+fclose(file);
+return error;
+}
-- 
2.26.2



[committed] analyzer: fix ICE on setjmp with non-pointer-type [PR97029]

2020-09-14 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r11-3188-g35e3f0829d8e9cdc7ea19917c9f3a7add3f14847.

gcc/analyzer/ChangeLog:
PR analyzer/97029
* analyzer.cc (is_setjmp_call_p): Require the initial arg to be a
pointer.
* region-model.cc (region_model::deref_rvalue): Assert that the
svalue is of pointer type.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/pr97029.c: New test.
---
 gcc/analyzer/analyzer.cc| 4 +++-
 gcc/analyzer/region-model.cc| 2 ++
 gcc/testsuite/gcc.dg/analyzer/pr97029.c | 7 +++
 3 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr97029.c

diff --git a/gcc/analyzer/analyzer.cc b/gcc/analyzer/analyzer.cc
index 814f6248992..82d487858dc 100644
--- a/gcc/analyzer/analyzer.cc
+++ b/gcc/analyzer/analyzer.cc
@@ -204,7 +204,9 @@ is_setjmp_call_p (const gcall *call)
 {
   if (is_special_named_call_p (call, "setjmp", 1)
   || is_special_named_call_p (call, "sigsetjmp", 2))
-return true;
+/* region_model::on_setjmp requires a pointer.  */
+if (POINTER_TYPE_P (TREE_TYPE (gimple_call_arg (call, 0))))
+  return true;
 
   return false;
 }
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 75f4eae3083..d53272e4332 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1446,6 +1446,7 @@ region_model::region_exists_p (const region *reg) const
 
 /* Get a region for referencing PTR_SVAL, creating a region if need be, and
potentially generating warnings via CTXT.
+   PTR_SVAL must be of pointer type.
PTR_TREE if non-NULL can be used when emitting diagnostics.  */
 
 const region *
@@ -1453,6 +1454,7 @@ region_model::deref_rvalue (const svalue *ptr_sval, tree 
ptr_tree,
region_model_context *ctxt)
 {
   gcc_assert (ptr_sval);
+  gcc_assert (POINTER_TYPE_P (ptr_sval->get_type ()));
 
   /* If we're dereferencing PTR_SVAL, assume that it is non-NULL; add this
  as a constraint.  This suppresses false positives from
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr97029.c 
b/gcc/testsuite/gcc.dg/analyzer/pr97029.c
new file mode 100644
index 000..ff83ad4d56e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr97029.c
@@ -0,0 +1,7 @@
+struct vj {};
+
+void
+setjmp (struct vj pl)
+{
+  setjmp (pl);
+}
-- 
2.26.2



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-14 Thread Richard Sandiford
Qing Zhao  writes:
>> Like I mentioned earlier though, passes that run after
>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>> weren't previously used.  For example, on x86_64, the function might
>> not use %r8 when the prologue, epilogue and returns are generated,
>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>> the “used” version of the new command-line option is supposed to clear
>> %r8 in these circumstances, but it wouldn't do so if the data was
>> collected at the point that the return is generated.
>
> Thanks for the information.
>
>> 
>> That's why I think it's more robust to do this later (at the beginning
>> of pass_late_compilation) and insert the zeroing before returns that
>> already exist.
>
> Yes, it looks like it’s not correct to insert the zeroing at the time when
> the prologue, epilogue and return are generated.
> As I also checked, a “return” might also be generated as late as pass
> “pass_delay_slots”.  So, shall we move the
> new pass as late as possible?

If we insert the zeroing before pass_delay_slots and describe the
result correctly, pass_delay_slots should do the right thing.

Describing the result correctly includes ensuring that the cleared
registers are treated as live on exit from the function, so that the
zeroing doesn't get deleted again, or skipped by pass_delay_slots.

> Can I put it immediately before “pass_final”? What’s the latest place
> I can put it?

Like you say here…

 So IMO the pass should just search for all the
 returns in a function and insert the zeroing instructions before
 each one.
>>> 
>>> I was considering this approach too for some time; however, there is one
>>> major issue with this, as
>>> Segher mentioned: the middle end does not know some details of the
>>> registers, and lacking such
>>> detailed information might result in incorrect code generation at middle end.
>>> 
>>> For example, on the x86_64 target, when the “return” is performed with a pop,
>>> the scratch register “ECX” will be
>>> used for returning, so it’s incorrect to zero “ecx” before generating the
>>> return.  Since the middle end
>>> doesn't have such information, it cannot avoid zeroing “ecx”, and therefore
>>> incorrect code might be
>>> generated.
>>> 
>>> Segher also mentioned that on Power there are some scratch registers that
>>> are also used for
>>> other purposes; clearing them before the return is not correct.
>> 
>> But the dataflow information has to be correct between
>> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
>> any pass in that region could clobber the registers in the same way.
>
> You mean the data flow information will not be correct after pass_free_cfg?
> “pass_delay_slots” is after “pass_free_cfg”, and there might be new
> “return”s generated in “pass_delay_slots”.
> If we want to generate zeroing for a new “return” that was generated in
> “pass_delay_slots”, can we do so correctly?

…the zeroing has to be done before pass_free_cfg, because the information
isn't reliable after that point.  I think it would make sense to do it
before pass_compute_alignments, because inserting the zeros will affect
alignment.

>> To get the registers that are live before the return, you can start with
>> the registers that are live out from the block that contains the return,
>> then “simulate” the return instruction backwards to get the set of
>> registers that are live before the return instruction
>> (see df_simulate_one_insn_backwards).
>
> Okay. 
> Currently, I am using the following to check whether a reg is live out the 
> block that contains the return:
>
> /* Check whether the hard register REGNO is live at the exit block
>  * of the current routine.  */
> static bool
> is_live_reg_at_exit (unsigned int regno)
> {
>   edge e;
>   edge_iterator ei;
>
>   FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> {
>   bitmap live_out = df_get_live_out (e->src);
>   if (REGNO_REG_SET_P (live_out, regno))
> return true;
> }
>
>   return false;
> }
>
> Is this correct?

df_get_live_out is the right way to get the set of live registers
on exit from a block.  But if we search for return instructions
and find a return instruction R, we should do:

  basic_block bb = BLOCK_FOR_INSN (R);
  auto_bitmap live_regs;
  bitmap_copy (live_regs, df_get_live_out (bb));
  df_simulate_one_insn_backwards (bb, R, live_regs);

and then use LIVE_REGS as the set of registers that are live before R,
and so can't be clobbered.

For extra safety, you could/should also check targetm.hard_regno_scratch_ok
to see whether there's a target-specific reason why a register can't
be clobbered.
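
Putting the two checks together, a rough sketch (the helper name is
hypothetical, not existing GCC code):

  /* Return true if hard register REGNO can safely be zeroed before a
     return insn, given LIVE_REGS computed as above.  */
  static bool
  ok_to_zero_before_return_p (unsigned int regno, bitmap live_regs)
  {
    return (!REGNO_REG_SET_P (live_regs, regno)
            && targetm.hard_regno_scratch_ok (regno));
  }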

>> In the x86_64 case you mention, the pattern is:
>> 
>> (define_insn "*simple_return_indirect_internal"
>>  [(simple_return)
>>   (use (match_operand:W 0 "register_operand" "r"))]
>>  "reload_completed"
>>  …)
>> 
>> This (use …) tells the df machinery that the instruction needs
>> operand 0 (= ecx).  The process above would therefore 

Re: [PATCH] Ignore the clobbered stack pointer in asm statement

2020-09-14 Thread H.J. Lu via Gcc-patches
On Mon, Sep 14, 2020 at 8:12 AM H.J. Lu  wrote:
>
> On Mon, Sep 14, 2020 at 7:35 AM Jakub Jelinek  wrote:
> >
> > On Sat, Sep 12, 2020 at 10:11:36AM -0700, H.J. Lu via Gcc-patches wrote:
> > Clobbering the stack pointer in asm statement has been deprecated.  Adding
> > the stack pointer register to the clobber list has traditionally had some
> > undocumented and somewhat obscure side-effects, including ICE.  Issue
> > a warning and ignore the clobbered stack pointer in asm statement.
> > >
> > > gcc/
> > >
> > >   PR target/97032
> > >   * cfgexpand.c (asm_clobber_reg_kind): New enum.
> > >   (asm_clobber_reg_is_valid): Renamed to ...
> > >   (get_asm_clobber_reg_kind): This.  Ignore the stack pointer.
> > >   (expand_asm_stmt): Replace asm_clobber_reg_is_valid with
> > >   get_asm_clobber_reg_kind.  Skip ignored clobbered register.
> >
> > AFAIK in the PR52813 discussions it was mentioned that the sp clobbers
> > had some user visible effects like forcing no red-zone, or (at least for
> > some GCC versions) forcing frame pointer in the function containing the asm.
> > Such clobbers are deprecated since GCC 9, so I think e.g. this patch isn't
> > really backportable to 9 branch.  Are we ready to stop that behavior?
> > And if yes, shouldn't we instead just error on this because a warning
> > clearly doesn't help, as users ignore warnings (at least in firefox this
> > isn't fixed yet)?
> > Is there some other way to fix the ICE (mark functions containing that and
> > perhaps penalize the code, in this case e.g. by perhaps forcing unnecessary
> > realignment, but not ICE)?
>
> For GCC 8/9, we can inform the backend that SP is clobbered by asm statement
> and each backend can mitigate its impact.
>

Something like this for GCC 8 and 9.

-- 
H.J.
From 72c62a2576808d715c1232cc6816483229d542e9 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 14 Sep 2020 08:52:27 -0700
Subject: [PATCH] [GCC 9] rtl_data: Add sp_is_clobbered_by_asm

Add sp_is_clobbered_by_asm to rtl_data to inform backends that the stack
pointer is clobbered by asm statement.
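
For illustration, a minimal sketch of the kind of asm statement that sets
the new flag (the actual testcase is gcc.target/i386/pr97032.c):

	void
	foo (void)
	{
	  /* Deprecated: lists the stack pointer in the clobber list.  */
	  asm volatile ("" : : : "esp");
	}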

gcc/

	PR target/97032
	* cfgexpand.c (asm_clobber_reg_is_valid): Set sp_is_clobbered_by_asm
	to true if the stack pointer is clobbered by asm statement.
	* emit-rtl.h (rtl_data): Add sp_is_clobbered_by_asm.
	* config/i386/i386.c (ix86_get_drap_rtx): Set need_drap to true
	if the stack pointer is clobbered by asm statement.

gcc/testsuite/

	PR target/97032
	* gcc.target/i386/pr97032.c: New test.
---
 gcc/cfgexpand.c | 14 +-
 gcc/config/i386/i386.c  |  6 --
 gcc/emit-rtl.h  |  3 +++
 gcc/testsuite/gcc.target/i386/pr97032.c | 23 +++
 4 files changed, 39 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97032.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index e252975f546..1b0591193c3 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2879,11 +2879,15 @@ asm_clobber_reg_is_valid (int regno, int nregs, const char *regname)
  as it was before, so no asm can validly clobber the stack pointer in
  the usual sense.  Adding the stack pointer to the clobber list has
  traditionally had some undocumented and somewhat obscure side-effects.  */
-  if (overlaps_hard_reg_set_p (regset, Pmode, STACK_POINTER_REGNUM)
-  && warning (OPT_Wdeprecated, "listing the stack pointer register"
-		  " %qs in a clobber list is deprecated", regname))
-inform (input_location, "the value of the stack pointer after an %<asm%>"
-	" statement must be the same as it was before the statement");
+  if (overlaps_hard_reg_set_p (regset, Pmode, STACK_POINTER_REGNUM))
+{
+  crtl->sp_is_clobbered_by_asm = true;
+  if (warning (OPT_Wdeprecated, "listing the stack pointer register"
+		   " %qs in a clobber list is deprecated", regname))
+	inform (input_location, "the value of the stack pointer after"
+		" an %<asm%> statement must be the same as it was before"
+		" the statement");
+}
 
   return is_valid;
 }
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ae5046ee6a0..6a0b356803e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12283,10 +12283,12 @@ ix86_update_stack_boundary (void)
 static rtx
 ix86_get_drap_rtx (void)
 {
-  /* We must use DRAP if there are outgoing arguments on stack and
+  /* We must use DRAP if there are outgoing arguments on stack or
+ the stack pointer register is clobbered by asm statement and
  ACCUMULATE_OUTGOING_ARGS is false.  */
   if (ix86_force_drap
-  || (cfun->machine->outgoing_args_on_stack
+  || ((cfun->machine->outgoing_args_on_stack
+	   || crtl->sp_is_clobbered_by_asm)
 	  && !ACCUMULATE_OUTGOING_ARGS))
 crtl->need_drap = true;
 
diff --git a/gcc/emit-rtl.h b/gcc/emit-rtl.h
index 7b1cecd3c44..b8a07d21d37 100644
--- a/gcc/emit-rtl.h
+++ b/gcc/emit-rtl.h
@@ -266,6 +266,9 @@ struct GTY(()) rtl_data {
  pass_stack_ptr_mod has 

Re: [PATCH] [PATCH] PR rtl-optimization/96791 Check precision of partial modes

2020-09-14 Thread Segher Boessenkool
On Mon, Sep 14, 2020 at 09:46:11AM +0200, Richard Biener wrote:
> On Fri, Sep 11, 2020 at 4:18 PM Segher Boessenkool
>  wrote:
> > Until 2014 (and documented just days ago ;-) ) all bits of a partial
> > integer mode were considered unknown.
> 
> All bits or all bits outside of its precision?  I hope the latter ;)

All bits.  Many things in GCC still follow that older definition, btw.

> > I have looked at a lot of it in
> > our code the past weeks, and we still treat it like that in most places.

Oh I said that already, heh.

> > We now see bootstrap problems if we use POImode in some contexts (that's
> > this PR96791).  POImode can only live in pairs of VSX registers; taking
> > a subreg of POImode that would not be valid on one VSX register is not
> > okay.
> 
> I guess the same applies to i?86 DImode living in two gpr regs.  Or any
> multi-reg pseudo.  It certainly shouldn't be dependent on whether we're
> dealing with a partial integer mode or not.

If some mode can be in GPRs, then taking subregs of it works fine.

> > Maybe we are missing some hooks or macros?
> 
> So this problem must be "solved" in some way already.  How do we assess
> subreg validity?  Through recog in the end?

No, we ICE.  See the PR?  (PR96791).


Segher


Re: PSA: Default C++ dialect is now C++17

2020-09-14 Thread Jason Merrill via Gcc-patches
On Mon, Jun 29, 2020 at 1:25 PM Martin Liška  wrote:
>
> On 6/29/20 4:57 PM, Marek Polacek wrote:
> > On Mon, Jun 29, 2020 at 09:51:57AM +0200, Martin Liška wrote:
> >> On 6/26/20 9:34 PM, Marek Polacek via Gcc-patches wrote:
> >>> As discussed last month:
> >>> 
> >>> it's time to change the C++ default to gnu++17.  I've committed the patch 
> >>> after
> >>> testing x86_64-pc-linux-gnu and powerpc64le-unknown-linux-gnu.  Brace 
> >>> yourselves!
> >>>
> >>> Marek
> >>>
> >>
> >> Just a small note that 510.parest_r SPEC 2017 benchmark can't be built now
> >> with default changed to -std=c++17. The spec config needs to be adjusted.
> >
> > Interesting, do you know why?  Does it use the register keyword?
>
> Apparently it needs -fno-new-ttp-matching for successful compilation.
> There's a reduced test-case I made:
>
> cat fe.ii
> template <int dim, int spacedim = dim> class FiniteElement;
> template <int dim, int spacedim = dim> class DoFHandler;
> class FETools {
>    template <int dim, class InVector, class OutVector, int spacedim>
>    void back_interpolate(const DoFHandler<dim, spacedim> &, const InVector &,
>                          const FiniteElement<dim, spacedim> &, OutVector &);
>    template <int dim, template <int> class DH, class InVector, class OutVector,
>              int spacedim>
>    void back_interpolate(const DH<dim> &, InVector,
>                          const FiniteElement<dim, spacedim> &, OutVector);
> };
> template <int dim, int spacedim> class DoFHandler;
> template <int dim, int spacedim> class FiniteElement;
> template <int dim, class InVector, class OutVector, int spacedim>
> void FETools::back_interpolate(const DoFHandler<dim, spacedim> &,
>                                const InVector &,
>                                const FiniteElement<dim, spacedim> &,
>                                OutVector &) {}
> template void FETools::back_interpolate(const DoFHandler<3> &, const float &,
>                                         const FiniteElement<3> &, float &);

Hmm, looks like I never sent this.

Further reduced:

template <int, int = 0> class A;
template <int N> void fn(A<N> &) {}
template <template <int> class TT> void fn(TT<3> &);
template void fn(A<3> &);

This breaks due to the C++17 changes to template template parameters
causing A to now be considered a valid argument for TT; with that
change both function templates are valid candidates, and neither is
more specialized than the other, so it's ambiguous.

There are still some open core issues around these changes.
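
For anyone hitting this in the meantime, the workaround mentioned above is a
plain compile-line switch, e.g. (against the reduced test, for illustration):

  g++ -std=gnu++17 -fno-new-ttp-matching fe.ii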


Jason



Re: [PATCH] Ignore the clobbered stack pointer in asm statement

2020-09-14 Thread H.J. Lu via Gcc-patches
On Mon, Sep 14, 2020 at 7:35 AM Jakub Jelinek  wrote:
>
> On Sat, Sep 12, 2020 at 10:11:36AM -0700, H.J. Lu via Gcc-patches wrote:
> > Clobbering the stack pointer in asm statement has been deprecated.  Adding
> > the stack pointer register to the clobber list has traditionally had some
> > undocumented and somewhat obscure side-effects, including ICE.  Issue
> > a warning and ignore the clobbered stack pointer in asm statement.
> >
> > gcc/
> >
> >   PR target/97032
> >   * cfgexpand.c (asm_clobber_reg_kind): New enum.
> >   (asm_clobber_reg_is_valid): Renamed to ...
> >   (get_asm_clobber_reg_kind): This.  Ignore the stack pointer.
> >   (expand_asm_stmt): Replace asm_clobber_reg_is_valid with
> >   get_asm_clobber_reg_kind.  Skip ignored clobbered register.
>
> AFAIK in the PR52813 discussions it was mentioned that the sp clobbers
> had some user visible effects like forcing no red-zone, or (at least for
> some GCC versions) forcing frame pointer in the function containing the asm.
> Such clobbers are deprecated since GCC 9, so I think e.g. this patch isn't
> really backportable to 9 branch.  Are we ready to stop that behavior?
> And if yes, shouldn't we instead just error on this because a warning
> clearly doesn't help, as users ignore warnings (at least in firefox this
> isn't fixed yet)?
> Is there some other way to fix the ICE (mark functions containing that and
> perhaps penalize the code, in this case e.g. by perhaps forcing unnecessary
> realignment, but not ICE)?

For GCC 8/9, we can inform the backend that SP is clobbered by asm statement
and each backend can mitigate its impact.

-- 
H.J.


Re: [PATCH] AArch64: Enable fast shifts on Neoverse N1

2020-09-14 Thread Richard Earnshaw (lists)
On 14/09/2020 13:01, Wilco Dijkstra wrote:
> Enable the fast shift feature in Neoverse N1 tuning - this means additions 
> with
> a shift left by 1-4 are as fast as addition. This improves multiply by 
> constant
> expansions, eg. x * 25 is now emitted using shifts rather than a multiply:
> 
> add w0, w0, w0, lsl 2
> add w0, w0, w0, lsl 2
> 
> Bootstrap OK, regress pass, OK for commit?
> 
> ChangeLog:
> 2020-09-11  Wilco Dijkstra  
> 
> * config/aarch64/aarch64.c (neoversen1_tunings):
> Enable AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND.
> 
> ---
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 803562df25751f2eb6dbe18b67cae46ea7c478dd..cbbdefa436bf11e9631c90631fb621e90e60754a
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1332,7 +1332,7 @@ static const struct tune_params neoversen1_tunings =
>2, /* min_div_recip_mul_df.  */
>0, /* max_case_values.  */
>tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> +  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),   /* tune_flags.  */
>   &generic_prefetch_tune
>  };
>  
> 

OK.

R.


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-14 Thread Qing Zhao via Gcc-patches
Hi, Richard,

> On Sep 11, 2020, at 5:56 PM, Richard Sandiford  
> wrote:
> 
> Qing Zhao <qing.z...@oracle.com> writes:
>>> On Sep 11, 2020, at 4:44 PM, Richard Sandiford  
>>> wrote:
>>> 
>>> Qing Zhao  writes:
> On Sep 11, 2020, at 12:32 PM, Richard Sandiford 
>  >>  If we go for (2), then I think it would 
> be better to do
> it at the start of pass_late_compilation instead.  (Some targets wouldn't
> cope with doing it later.)  The reason for doing it so late is that the
> set of used “volatile”/caller-saved registers is not fixed at prologue
> and epilogue generation: later optimisation passes can introduce uses
> of volatile registers that weren't used previously.  (Sorry if this
> has already been suggested.)
 
 Yes, I agree.
 
 I thought that it might be better to move this task to the very end of
 the RTL stage, i.e., before the “final” phase.
 
 Another solution is (discussed with Hongjiu):
 
 1. Define a new target hook:
 
 targetm.return_with_zeroing(bool simple_return_p, HARD_REG_SET 
 need_zeroed_hardregs, bool gpr_only)
 
 2. Add the following routine in middle end:
 
 rtx_insn *
 generate_return_rtx (bool simple_return_p)
 {
   if (targetm.return_with_zeroing)
     {
       /* Compute the hardregs set for clearing into need_zeroed_hardregs.  */
       return targetm.return_with_zeroing (simple_return_p,
                                           need_zeroed_hardregs, gpr_only);
     }
   else
     {
       if (simple_return_p)
         return targetm.gen_simple_return ();
       else
         return targetm.gen_return ();
     }
 }
 
 Then replace all call to “targetm.gen_simple_return” and 
 “targetm.gen_return” to “generate_return_rtx()”.
 
 3. In the target, 
 Implement “return_with_zeroing”.
 
 
 Let me know your comments on this.
>>> 
>>> I think having a separate pass is better.  We don't normally know
>>> at the point of generating the return which registers will need
>>> to be cleared.  
>> 
>> At the point of generating the return, we can compute the 
>> “need_zeroed_hardregs” HARD_REG_SET 
>> by using data flow information, the function abi of the routine, and also 
>> the user option and source code 
>> attribute information together.  This information should be available at 
>> each point when generating the return.
> 
> Like I mentioned earlier though, passes that run after
> pass_thread_prologue_and_epilogue can use call-clobbered registers that
> weren't previously used.  For example, on x86_64, the function might
> not use %r8 when the prologue, epilogue and returns are generated,
> but pass_regrename might later introduce a new use of %r8.  AIUI,
> the “used” version of the new command-line option is supposed to clear
> %r8 in these circumstances, but it wouldn't do so if the data was
> collected at the point that the return is generated.

Thanks for the information.

> 
> That's why I think it's more robust to do this later (at the beginning
> of pass_late_compilation) and insert the zeroing before returns that
> already exist.

Yes, it looks like it’s not correct to insert the zeroing at the time when
the prologue, epilogue and return are generated.
As I also checked, a “return” might also be generated as late as pass
“pass_delay_slots”.  So, shall we move the
new pass as late as possible?

Can I put it immediately before “pass_final”? What’s the latest place I can put 
it?


> 
>>> So IMO the pass should just search for all the
>>> returns in a function and insert the zeroing instructions before
>>> each one.
>> 
>> I was considering this approach too for some time; however, there is one
>> major issue with this, as
>> Segher mentioned: the middle end does not know some details of the
>> registers, and lacking such
>> detailed information might result in incorrect code generation at middle end.
>> 
>> For example, on the x86_64 target, when the “return” is performed with a pop,
>> the scratch register “ECX” will be
>> used for returning, so it’s incorrect to zero “ecx” before generating the
>> return.  Since the middle end
>> doesn't have such information, it cannot avoid zeroing “ecx”, and therefore
>> incorrect code might be
>> generated.
>> 
>> Segher also mentioned that on Power there are some scratch registers that
>> are also used for
>> other purposes; clearing them before the return is not correct.
> 
> But the dataflow information has to be correct between
> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
> any pass in that region could clobber the registers in the same way.

You mean the data flow information will not be correct after pass_free_cfg?
“pass_delay_slots” is after “pass_free_cfg”, and there might be new “return”s
generated in “pass_delay_slots”.
If we want to generate zeroing for a new “return” that was generated in
“pass_delay_slots”, can we do so correctly?

> 
> To get the registers that are live before the return, you can start with

[PATCH] libstdc++: Pretty printers for std::_Bit_reference, std::_Bit_iterator and std::_Bit_const_iterator

2020-09-14 Thread Michael Weghorn via Gcc-patches
Hi,

the attached patch implements pretty printers relevant for iteration
over std::vector<bool>, thus handling the TODO
added in commit 36d0dada6773d7fd7c5ace64c90e723930a3b81e
("Have std::vector printer's iterator return bool for vector<bool>",
2019-06-19):

TODO add printer for vector<bool>'s _Bit_iterator and
_Bit_const_iterator

Tested on x86_64-pc-linux-gnu (Debian testing).

I haven't filed any copyright assignment for GCC yet, but I'm happy to
do so when pointed to the right place.

Best regards,
Michael
From a279887aa80971acc2e92157550138eff6c2a4c8 Mon Sep 17 00:00:00 2001
From: Michael Weghorn 
Date: Mon, 14 Sep 2020 15:20:36 +0200
Subject: [PATCH] libstdc++: Pretty printers for std::_Bit_reference,
 std::_Bit_iterator

... and 'std::_Bit_const_iterator'.

'std::_Bit_iterator' and 'std::_Bit_const_iterator' are the iterators
used by 'std::vector<bool>'.
'std::_Bit_reference' is e.g. used in range-based for loops over
'std::vector<bool>' like

std::vector<bool> vb {true, false, false};
for (auto b : vb) {
// b is of type std::_Bit_reference here
// ...
}

Like iterators of vectors for other types, the actual value is printed.
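
For example (illustrative; the exact values checked are in the testsuite
changes below):

std::vector<bool> vb2 {true, false};
auto it = vb2.begin();    // std::_Bit_iterator; printed as "true"
auto cit = vb2.cbegin();  // std::_Bit_const_iterator; printed as "true"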

Add corresponding tests to the testsuite.

libstdc++-v3/ChangeLog:

	* python/libstdcxx/v6/printers.py (StdBitIteratorPrinter,
	StdBitReferencePrinter): Add pretty-printers for
std::_Bit_reference, std::_Bit_iterator and std::_Bit_const_iterator.
* testsuite/libstdc++-prettyprinters/simple.cc: Test
std::_Bit_reference, std::_Bit_iterator
* testsuite/libstdc++-prettyprinters/simple11.cc: Test
std::_Bit_reference, std::_Bit_iterator and std::_Bit_const_iterator
---
 libstdc++-v3/python/libstdcxx/v6/printers.py  | 28 -
 .../libstdc++-prettyprinters/simple.cc| 28 +
 .../libstdc++-prettyprinters/simple11.cc  | 31 +++
 3 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index c0f061f79c1..478e44eefdf 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -479,7 +479,27 @@ class StdVectorIteratorPrinter:
             return 'non-dereferenceable iterator for std::vector'
         return str(self.val['_M_current'].dereference())
 
-# TODO add printer for vector<bool>'s _Bit_iterator and _Bit_const_iterator
+class StdBitIteratorPrinter:
+    "Print std::vector<bool>'s _Bit_iterator and _Bit_const_iterator"
+
+    def __init__(self, typename, val):
+        self.val = val
+
+    def to_string(self):
+        if not self.val['_M_p']:
+            return 'non-dereferenceable iterator for std::vector<bool>'
+        return bool(self.val['_M_p'].dereference() & (1 << self.val['_M_offset']))
+
+class StdBitReferencePrinter:
+    "Print std::_Bit_reference"
+
+    def __init__(self, typename, val):
+        self.val = val
+
+    def to_string(self):
+        if not self.val['_M_p']:
+            return 'invalid std::_Bit_reference'
+        return bool(self.val['_M_p'].dereference() & (self.val['_M_mask']))
 
 class StdTuplePrinter:
 "Print a std::tuple"
@@ -1965,6 +1985,12 @@ def build_libstdcxx_dictionary ():
                                   StdDequeIteratorPrinter)
     libstdcxx_printer.add_version('__gnu_cxx::', '__normal_iterator',
                                   StdVectorIteratorPrinter)
+    libstdcxx_printer.add_version('std::', '_Bit_iterator',
+                                  StdBitIteratorPrinter)
+    libstdcxx_printer.add_version('std::', '_Bit_const_iterator',
+                                  StdBitIteratorPrinter)
+    libstdcxx_printer.add_version('std::', '_Bit_reference',
+                                  StdBitReferencePrinter)
     libstdcxx_printer.add_version('__gnu_cxx::', '_Slist_iterator',
                                   StdSlistIteratorPrinter)
     libstdcxx_printer.add_container('std::', '_Fwd_list_iterator',
diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
index 4b44be594f5..4cfbb7857d1 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
@@ -127,6 +127,34 @@ main()
   vb.erase(vb.begin());
 // { dg-final { regexp-test vb {std::(__debug::)?vector<bool> of length 5, capacity 128 = \\{true, true, false, false, true\\}} } }
 
+  std::vector<bool>::iterator vbIt = vb.begin();
+// { dg-final { note-test vbIt {true} } }
+  std::vector<bool>::iterator vbIt2 = ++vbIt;
+// { dg-final { note-test vbIt2 {true} } }
+  std::vector<bool>::iterator vbIt3 = ++vbIt;
+// { dg-final { note-test vbIt3 {false} } }
+  std::vector<bool>::iterator vbIt4 = ++vbIt;
+// { dg-final { note-test vbIt4 {false} } }
+  std::vector<bool>::iterator vbIt5 = ++vbIt;
+// { dg-final { note-test vbIt5 {true} } }
+
+  std::vector<bool>::iterator vbIt0;
+// { dg-final { note-test 

Re: [PATCH] Ignore the clobbered stack pointer in asm statement

2020-09-14 Thread Jakub Jelinek via Gcc-patches
On Sat, Sep 12, 2020 at 10:11:36AM -0700, H.J. Lu via Gcc-patches wrote:
> Clobbering the stack pointer in asm statement has been deprecated.  Adding
> the stack pointer register to the clobber list has traditionally had some
> undocumented and somewhat obscure side-effects, including ICE.  Issue
> a warning and ignore the clobbered stack pointer in asm statement.
> 
> gcc/
> 
>   PR target/97032
>   * cfgexpand.c (asm_clobber_reg_kind): New enum.
>   (asm_clobber_reg_is_valid): Renamed to ...
>   (get_asm_clobber_reg_kind): This.  Ignore the stack pointer.
>   (expand_asm_stmt): Replace asm_clobber_reg_is_valid with
>   get_asm_clobber_reg_kind.  Skip ignored clobbered register.

AFAIK in the PR52813 discussions it was mentioned that the sp clobbers
had some user visible effects like forcing no red-zone, or (at least for
some GCC versions) forcing frame pointer in the function containing the asm.
Such clobbers are deprecated since GCC 9, so I think e.g. this patch isn't
really backportable to 9 branch.  Are we ready to stop that behavior?
And if yes, shouldn't we instead just error on this because a warning
clearly doesn't help, as users ignore warnings (at least in firefox this
isn't fixed yet)?
Is there some other way to fix the ICE (mark functions containing that and
perhaps penalize the code, in this case e.g. by perhaps forcing unnecessary
realignment, but not ICE)?

Jakub



Re: [PATCH] Fix overflow handling in std::align

2020-09-14 Thread Ville Voutilainen via Gcc-patches
On Mon, 14 Sep 2020 at 15:49, Glen Fernandes  wrote:
>
> On Mon, Sep 14, 2020 at 5:52 AM Ville Voutilainen wrote:
> > On Mon, 14 Sep 2020 at 12:51, Ville Voutilainen
> > wrote:
> > > On Mon, 14 Sep 2020 at 09:18, Glen Fernandes
> >  wrote:
> > > > Edit; Correct patch this time.
> > > >
> > > > Fix overflow handling in align
> > >
> > > Should the test verify that space is unmodified when nullptr is returned?
> >
> > ..and same for ptr.
>
> Sounds like a good idea. Updated patch attached.

Looks good to me.


Re: [PATCH 1/2] AArch64: Cleanup CPU option processing code

2020-09-14 Thread Richard Earnshaw (lists)
On 14/09/2020 15:19, Wilco Dijkstra wrote:
> The --with-cpu/--with-arch configure option processing not only checks
> valid arguments but also sets TARGET_CPU_DEFAULT with a CPU and extension
> bitmask.  This isn't used however since a --with-cpu is translated into a
> -mcpu option which is processed as if written on the command-line (so
> TARGET_CPU_DEFAULT is never accessed).
> 
> So remove all the complex processing and bitmask, and just validate the
> option.  Fix a bug that always reports valid architecture extensions as
> invalid.  As a result the CPU processing in aarch64.c can be simplified.
> 
> Bootstrap OK, regress pass, OK for commit?

Doesn't this change the default behaviour if cc1 is run directly?  I'm
not saying this is the wrong thing to do (I think we rely on this in the
arm port), but I just want to understand what you mean when you say
'never used'.

R.

> 
> ChangeLog:
> 2020-09-03  Wilco Dijkstra  
> 
> * config.gcc (aarch64*-*-*): Simplify --with-cpu and --with-arch
> processing.  Add support for architectural extensions.
> * config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Remove
> AARCH64_CPU_DEFAULT_FLAGS.
> * config/aarch64/aarch64.c (AARCH64_CPU_DEFAULT_FLAGS): Remove define.
> (get_tune_cpu): Assert CPU is always valid.
> (get_arch): Assert architecture is always valid.
> (aarch64_override_options): Cleanup CPU selection code and simplify 
> logic.
> 
> ---
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 
> 88e428fd2ad0fb605a53f5dfa58d9f14603b4302..918320573ade712ddc252045e0b70fb8b65e0c66
>  100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -4067,8 +4067,6 @@ case "${target}" in
> pattern=AARCH64_CORE
>   fi
>  
> - ext_mask=AARCH64_CPU_DEFAULT_FLAGS
> -
>   # Find the base CPU or ARCH id in aarch64-cores.def or
>   # aarch64-arches.def
>   if [ x"$base_val" = x ] \
> @@ -4076,23 +4074,6 @@ case "${target}" in
>   ${srcdir}/config/aarch64/$def \
>   > /dev/null; then
>  
> -   if [ $which = arch ]; then
> - base_id=`grep "^$pattern(\"$base_val\"," \
> -   ${srcdir}/config/aarch64/$def | \
> -   sed -e 's/^[^,]*,[]*//' | \
> -   sed -e 's/,.*$//'`
> - # Extract the architecture flags from 
> aarch64-arches.def
> - ext_mask=`grep "^$pattern(\"$base_val\"," \
> -${srcdir}/config/aarch64/$def | \
> -sed -e 's/)$//' | \
> -sed -e 's/^.*,//'`
> -   else
> - base_id=`grep "^$pattern(\"$base_val\"," \
> -   ${srcdir}/config/aarch64/$def | \
> -   sed -e 's/^[^,]*,[]*//' | \
> -   sed -e 's/,.*$//'`
> -   fi
> -
> # Use the pre-processor to strip flatten the options.
> # This makes the format less rigid than if we use
> # grep and sed directly here.
> @@ -4117,25 +4098,7 @@ case "${target}" in
>   grep "^\"$base_ext\""`
>  
>   if [ x"$base_ext" = x ] \
> - || [[ -n $opt_line ]]; then
> -
> -   # These regexp extract the elements based on
> -   # their group match index in the regexp.
> -   ext_canon=`echo -e "$opt_line" | \
> - sed -e "s/$sed_patt/\2/"`
> -   ext_on=`echo -e "$opt_line" | \
> - sed -e "s/$sed_patt/\3/"`
> -   ext_off=`echo -e "$opt_line" | \
> - sed -e "s/$sed_patt/\4/"`
> -
> -   if [ $ext = $base_ext ]; then
> - # Adding extension
> - ext_mask="("$ext_mask") | ("$ext_on" | 
> "$ext_canon")"
> -   else
> - # Removing extension
> - ext_mask="("$ext_mask") & ~("$ext_off" 
> | "$ext_canon")"
> -   fi
> -
> + || [ x"$opt_line" != x ]; then
> true
>   else
> echo "Unknown extension used in 
> --with-$which=$val" 1>&2
> @@ -4144,10 +4107,6 @@ case "${target}" in
>   ext_val=`echo 

[PATCH 2/2] AArch64: Add support for --with-tune

2020-09-14 Thread Wilco Dijkstra
Add support for --with-tune. Like --with-cpu and --with-arch, the argument is
validated and transformed into a -mtune option to be processed like any other
command-line option.  --with-tune has no effect if a -mcpu or -mtune option
is used. The validating code didn't allow --with-cpu=native, so explicitly
allow that.

Co-authored-by:  Delia Burduv  

Bootstrap OK, regress pass, OK to commit?

ChangeLog
2020-09-03  Wilco Dijkstra  

* config.gcc
(aarch64*-*-*): Add --with-tune. Support --with-cpu=native.
* config/aarch64/aarch64.h (OPTION_DEFAULT_SPECS): Add --with-tune.

2020-09-03  Wilco Dijkstra  

* gcc/testsuite/lib/target-supports.exp:
(check_effective_target_tune_cortex_a76): New effective target test.
* gcc.target/aarch64/with-tune-config.c: New test.
* gcc.target/aarch64/with-tune-march.c: Likewise.
* gcc.target/aarch64/with-tune-mcpu.c: Likewise.
* gcc.target/aarch64/with-tune-mtune.c: Likewise.

---

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 
918320573ade712ddc252045e0b70fb8b65e0c66..947cebb0960d403a9ce40b65bef948fbbe8916a9
 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4052,8 +4052,8 @@ fi
 supported_defaults=
 case "${target}" in
aarch64*-*-*)
-   supported_defaults="abi cpu arch"
-   for which in cpu arch; do
+   supported_defaults="abi cpu arch tune"
+   for which in cpu arch tune; do
 
eval "val=\$with_$which"
base_val=`echo $val | sed -e 's/\+.*//'`
@@ -4074,6 +4074,12 @@ case "${target}" in
${srcdir}/config/aarch64/$def \
> /dev/null; then
 
+ # Disallow extensions in --with-tune=cortex-a53+crc.
+ if [ $which = tune ] && [ x"$ext_val" != x ]; then
+   echo "Architecture extensions not supported in 
--with-$which=$val" 1>&2
+   exit 1
+ fi
+
  # Use the pre-processor to strip flatten the options.
  # This makes the format less rigid than if we use
  # grep and sed directly here.
@@ -4109,8 +4115,13 @@ case "${target}" in
 
  true
else
- echo "Unknown $which used in --with-$which=$val" 1>&2
- exit 1
+ # Allow --with-$which=native.
+ if [ "$val" = native ]; then
+   true
+ else
+   echo "Unknown $which used in --with-$which=$val" 
1>&2
+   exit 1
+ fi
fi
done
;;
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
30b3a28a6d2893cc29bec4aa5b7cfbe1fd51e0b7..09d1f7f8a84e726129f8ec7f59dacae7fe132eaa
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -1200,12 +1200,14 @@ extern enum aarch64_code_model aarch64_cmodel;
 #define ENDIAN_LANE_N(NUNITS, N) \
   (BYTES_BIG_ENDIAN ? NUNITS - 1 - N : N)
 
-/* Support for a configure-time default CPU, etc.  We currently support
-   --with-arch and --with-cpu.  Both are ignored if either is specified
-   explicitly on the command line at run time.  */
+/* Support for configure-time --with-arch, --with-cpu and --with-tune.
+   --with-arch and --with-cpu are ignored if either -mcpu or -march is used.
+   --with-tune is ignored if either -mtune or -mcpu is used (but is not
+   affected by -march).  */
 #define OPTION_DEFAULT_SPECS   \
   {"arch", "%{!march=*:%{!mcpu=*:-march=%(VALUE)}}" }, \
-  {"cpu",  "%{!march=*:%{!mcpu=*:-mcpu=%(VALUE)}}" },
+  {"cpu",  "%{!march=*:%{!mcpu=*:-mcpu=%(VALUE)}}" },   \
+  {"tune", "%{!mcpu=*:%{!mtune=*:-mtune=%(VALUE)}}"},
 
 #define MCPU_TO_MARCH_SPEC \
" %{mcpu=*:-march=%:rewrite_mcpu(%{mcpu=*:%*})}"
diff --git a/gcc/testsuite/gcc.target/aarch64/with-tune-config.c 
b/gcc/testsuite/gcc.target/aarch64/with-tune-config.c
new file mode 100644
index 
..0940e9eea892770fada0b9bc0e05e22bebef1167
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/with-tune-config.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { tune_cortex_a76 } } } */
+/* { dg-additional-options " -dA " } */
+
+void foo ()
+{}
+
+/* { dg-final { scan-assembler "//.tune cortex-a76" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/with-tune-march.c 
b/gcc/testsuite/gcc.target/aarch64/with-tune-march.c
new file mode 100644
index 
..61039adea71878222dc27633d0818bac2daefeea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/with-tune-march.c
@@ -0,0 +1,8 @@
+/* { dg-do compile { target { tune_cortex_a76 } } } */
+/* { dg-additional-options 

[PATCH 1/2] AArch64: Cleanup CPU option processing code

2020-09-14 Thread Wilco Dijkstra
The --with-cpu/--with-arch configure option processing not only checks valid
arguments but also sets TARGET_CPU_DEFAULT with a CPU and extension bitmask.
This isn't used however since a --with-cpu is translated into a -mcpu option
which is processed as if written on the command-line (so TARGET_CPU_DEFAULT
is never accessed).

So remove all the complex processing and bitmask, and just validate the option.
Fix a bug that always reports valid architecture extensions as invalid.  As a
result the CPU processing in aarch64.c can be simplified.

Bootstrap OK, regress pass, OK for commit?

ChangeLog:
2020-09-03  Wilco Dijkstra  

* config.gcc (aarch64*-*-*): Simplify --with-cpu and --with-arch
processing.  Add support for architectural extensions.
* config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Remove
AARCH64_CPU_DEFAULT_FLAGS.
* config/aarch64/aarch64.c (AARCH64_CPU_DEFAULT_FLAGS): Remove define.
(get_tune_cpu): Assert CPU is always valid.
(get_arch): Assert architecture is always valid.
(aarch64_override_options): Cleanup CPU selection code and simplify 
logic.

---

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 
88e428fd2ad0fb605a53f5dfa58d9f14603b4302..918320573ade712ddc252045e0b70fb8b65e0c66
 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4067,8 +4067,6 @@ case "${target}" in
  pattern=AARCH64_CORE
fi
 
-   ext_mask=AARCH64_CPU_DEFAULT_FLAGS
-
# Find the base CPU or ARCH id in aarch64-cores.def or
# aarch64-arches.def
if [ x"$base_val" = x ] \
@@ -4076,23 +4074,6 @@ case "${target}" in
${srcdir}/config/aarch64/$def \
> /dev/null; then
 
- if [ $which = arch ]; then
-   base_id=`grep "^$pattern(\"$base_val\"," \
- ${srcdir}/config/aarch64/$def | \
- sed -e 's/^[^,]*,[]*//' | \
- sed -e 's/,.*$//'`
-   # Extract the architecture flags from 
aarch64-arches.def
-   ext_mask=`grep "^$pattern(\"$base_val\"," \
-  ${srcdir}/config/aarch64/$def | \
-  sed -e 's/)$//' | \
-  sed -e 's/^.*,//'`
- else
-   base_id=`grep "^$pattern(\"$base_val\"," \
- ${srcdir}/config/aarch64/$def | \
- sed -e 's/^[^,]*,[]*//' | \
- sed -e 's/,.*$//'`
- fi
-
  # Use the pre-processor to strip flatten the options.
  # This makes the format less rigid than if we use
  # grep and sed directly here.
@@ -4117,25 +4098,7 @@ case "${target}" in
grep "^\"$base_ext\""`
 
if [ x"$base_ext" = x ] \
-   || [[ -n $opt_line ]]; then
-
- # These regexp extract the elements based on
- # their group match index in the regexp.
- ext_canon=`echo -e "$opt_line" | \
-   sed -e "s/$sed_patt/\2/"`
- ext_on=`echo -e "$opt_line" | \
-   sed -e "s/$sed_patt/\3/"`
- ext_off=`echo -e "$opt_line" | \
-   sed -e "s/$sed_patt/\4/"`
-
- if [ $ext = $base_ext ]; then
-   # Adding extension
-   ext_mask="("$ext_mask") | ("$ext_on" | 
"$ext_canon")"
- else
-   # Removing extension
-   ext_mask="("$ext_mask") & ~("$ext_off" 
| "$ext_canon")"
- fi
-
+   || [ x"$opt_line" != x ]; then
  true
else
  echo "Unknown extension used in 
--with-$which=$val" 1>&2
@@ -4144,10 +4107,6 @@ case "${target}" in
ext_val=`echo $ext_val | sed -e 
's/[a-z0-9]\+//'`
  done
 
- ext_mask="(("$ext_mask") << 6)"
- if [ x"$base_id" != x ]; then
-   target_cpu_cname="TARGET_CPU_$base_id | 
$ext_mask"
- fi
  true
else
  

Re: [patch] Fix dangling references in thunks at -O0

2020-09-14 Thread Richard Biener via Gcc-patches
On Mon, Sep 14, 2020 at 1:31 PM Eric Botcazou  wrote:
>
> > In fact I can probably reuse cfun->tail_call_marked for this purpose.
>
> Like so.

Looks reasonable to me.

Richard.
>
> * cgraphunit.c (cgraph_node::expand_thunk): Make sure to set
> cfun->tail_call_marked when forcing a tail call.
> * function.c (assign_parm_setup_reg): Always use a register to
> retrieve a parameter passed by reference if cfun->tail_call_marked.
>
> --
> Eric Botcazou


Re: Ping: [PATCH 1/2] Fold plusminus_mult expr with multi-use operands (PR 94234)

2020-09-14 Thread Richard Biener via Gcc-patches
On Mon, Sep 14, 2020 at 5:17 AM Feng Xue OS via Gcc-patches
 wrote:
>
> Thanks,

@@ -3426,8 +3426,16 @@ dt_simplify::gen_1 (FILE *f, int indent, bool
gimple, operand *result)
  /* Re-fold the toplevel result.  It's basically an embedded
 gimple_build w/o actually building the stmt.  */
  if (!is_predicate)
-   fprintf_indent (f, indent,
-   "res_op->resimplify (lseq, valueize);\n");
+   {
+ fprintf_indent (f, indent,
+ "res_op->resimplify (lseq, valueize);\n");
+ if (e->force_leaf)
+   {
+ fprintf_indent (f, indent,
+ "if (!maybe_push_res_to_seq (res_op, NULL))\n");
+ fprintf_indent (f, indent + 2, "return false;\n");

please use "goto %s;\n", fail_label)  here.  OK with that change.

I've tried again to think of something prettier to cover these kinds of
single-use checks but failed to come up with anything.

Thanks and sorry for the delay,
Richard.

> Feng
>
> 
> From: Feng Xue OS
> Sent: Thursday, September 3, 2020 2:06 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 1/2] Fold plusminus_mult expr with multi-use operands (PR 
> 94234)
>
> For pattern A * C +- B * C -> (A +- B) * C, simplification is disabled
> when A and B are not single-use. This patch is a minor enhancement
> on the pattern, which allows folding if final result is found to be a
> simple gimple value (constant/existing SSA).
>
> Bootstrapped/regtested on x86_64-linux and aarch64-linux.
>
> Feng
> ---
> 2020-09-03  Feng Xue  
>
> gcc/
> PR tree-optimization/94234
> * genmatch.c (dt_simplify::gen_1): Emit check on final simplification
> result when "!" is specified on toplevel output expr.
> * match.pd ((A * C) +- (B * C) -> (A +- B) * C): Allow folding for
> expr with multi-use operands if final result is a simple gimple value.
>
> gcc/testsuite/
> PR tree-optimization/94234
> * gcc.dg/pr94234-2.c: New test.
> ---
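
An illustrative case of the relaxed rule (my own sketch, not the
patch's actual testcase): both multiplication operands have extra
uses, but their sum folds to a constant, i.e. a "simple gimple value",
so the transform is now allowed:

  extern void use (int);

  int f (int a, int c)
  {
    int b = 16 - a;        /* b is not single-use: it is also used below */
    use (a);
    use (b);
    return a * c + b * c;  /* -> (a + b) * c -> 16 * c */
  }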


Re: [Patch] OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390)

2020-09-14 Thread Tobias Burnus

Hello Jakub, hi Honza,

On 8/31/20 5:53 PM, Jakub Jelinek wrote:

On Mon, Aug 03, 2020 at 05:37:40PM +0200, Tobias Burnus wrote:

It turned out that the omp_discover_declare_target_tgt_fn_r
discovered all nodes – but as it tagged the C++ alias nodes
and not the streamed-out nodes, no device function was created

...

if (node != NULL)
 {
+  if (node->cpp_implicit_alias)
+{
+  node = node->get_alias_target ();
+  if (!omp_declare_target_fn_p (node->decl))
+((vec<tree> *) data)->safe_push (node->decl);
+  DECL_ATTRIBUTES (node->decl)
+= tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (node->decl));
+}
   node->offloadable = 1;

I don't see what is special on cpp_implicit_alias here, compared to any
other aliases.
So, I wonder if the code shouldn't do:
...


Granted that cpp_implicit_alias is not special; however,
for the alias attribute, the name is different and, thus,
the decl is in a different sym node.

Updated version attached. Does it seem to make sense?

Tobias



   tree decl = *tp;
   symtab_node *node = symtab_node::get (decl);
   if (node != NULL)
  {
symtab_node *anode = node->ultimate_alias_target ();
if (anode && anode != node)
  {
decl = anode->decl;
gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
if (omp_declare_target_fn_p (*tp)
|| lookup_attribute ("omp declare target host",
 DECL_ATTRIBUTES (decl)))
  return NULL_TREE;
node = anode;
  }
  }
   tree id = get_identifier ("omp declare target");
   if (!DECL_EXTERNAL (decl) && DECL_SAVED_TREE (decl))
 ((vec<tree> *) data)->safe_push (decl);
   DECL_ATTRIBUTES (decl) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES 
(decl));
   if (node != NULL)
 {
   node->offloadable = 1;
   if (ENABLE_OFFLOADING)
 g->have_offload = true;
 }

Otherwise, if we have say:
void foo () { }
void bar () __attribute__((alias ("foo")));
void baz () __attribute__((alias ("bar")));
int
main ()
{
   #pragma omp target
   baz ();
}
we won't mark foo as being declare target.
Though, maybe that is not enough and we need to mark all the aliases from
node to the ultimate alias that way (so perhaps instead iterate one
get_alias_target (if node->alias) by one and mark all the decls the way the
code marks right now (i.e. pushes those non-DECL_EXTERNAL with
DECL_SAVED_TREE for further processing and add attributes to all of them?

  Jakub

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390)

gcc/ChangeLog:

	PR middle-end/96390
	* omp-offload.c (omp_discover_declare_target_tgt_fn_r): Handle
	alias nodes.

libgomp/ChangeLog:

	PR middle-end/96390
	* testsuite/libgomp.c++/pr96390.C: New test.
	* testsuite/libgomp.c-c++-common/pr96390.c: New test.

 gcc/omp-offload.c| 27 +
 libgomp/testsuite/libgomp.c++/pr96390.C  | 49 
 libgomp/testsuite/libgomp.c-c++-common/pr96390.c | 17 
 3 files changed, 86 insertions(+), 7 deletions(-)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 32c2485abd4..1a3e7a384a8 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -196,21 +196,34 @@ omp_declare_target_var_p (tree decl)
 static tree
 omp_discover_declare_target_tgt_fn_r (tree *tp, int *walk_subtrees, void *data)
 {
-  if (TREE_CODE (*tp) == FUNCTION_DECL
-  && !omp_declare_target_fn_p (*tp)
-  && !lookup_attribute ("omp declare target host", DECL_ATTRIBUTES (*tp)))
+  if (TREE_CODE (*tp) == FUNCTION_DECL)
 {
-  tree id = get_identifier ("omp declare target");
-  if (!DECL_EXTERNAL (*tp) && DECL_SAVED_TREE (*tp))
-	((vec<tree> *) data)->safe_push (*tp);
-  DECL_ATTRIBUTES (*tp) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (*tp));
+  tree decl = *tp;
   symtab_node *node = symtab_node::get (*tp);
   if (node != NULL)
 	{
+	  while (node->alias_target)
+	node = symtab_node::get (node->alias_target);
+	  node = node->ultimate_alias_target ();
+	  decl = node->decl;
+	  if (omp_declare_target_fn_p (decl)
+	  || lookup_attribute ("omp declare target host",
+   DECL_ATTRIBUTES (decl)))
+	return NULL_TREE;
+
 	  node->offloadable = 1;
 	  if (ENABLE_OFFLOADING)
 	g->have_offload = true;
 	}
+  else if (omp_declare_target_fn_p (decl)
+	   || lookup_attribute ("omp declare target host",
+DECL_ATTRIBUTES (decl)))
+	return NULL_TREE;
+
+  tree id = get_identifier ("omp declare target");
+  if (!DECL_EXTERNAL (decl) && DECL_SAVED_TREE (decl))
+	((vec<tree> *) data)->safe_push (decl);
+  DECL_ATTRIBUTES (decl) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES 

Re: [PATCH] Fix overflow handling in std::align

2020-09-14 Thread Glen Fernandes via Gcc-patches
On Mon, Sep 14, 2020 at 5:52 AM Ville Voutilainen wrote:
> On Mon, 14 Sep 2020 at 12:51, Ville Voutilainen
> wrote:
> > On Mon, 14 Sep 2020 at 09:18, Glen Fernandes
>  wrote:
> > > Edit; Correct patch this time.
> > >
> > > Fix overflow handling in align
> >
> > Should the test verify that space is unmodified when nullptr is returned?
>
> ..and same for ptr.

Sounds like a good idea. Updated patch attached.

Glen
commit 5ebb97628f888bbc8e6617f2a7eea83aa40c1f37
Author: Glen Joseph Fernandes 
Date:   Mon Sep 14 01:21:27 2020 -0400

Fix overflow handling in align

2020-09-12  Glen Joseph Fernandes  

* include/bits/align.h (align): Fix overflow handling.
* testsuite/20_util/align/3.cc: New tests.

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index 0878f31562e..e25770ce5ca 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,8 @@
+2020-09-12  Glen Joseph Fernandes  
+
+* include/bits/align.h (align): Fix overflow handling.
+* testsuite/20_util/align/3.cc: New tests.
+
 2020-09-11  Thomas Rodgers  
 
* include/std/memory: Move #include  inside C++11
diff --git a/libstdc++-v3/include/bits/align.h 
b/libstdc++-v3/include/bits/align.h
index c3267f22934..b9b81fd785d 100644
--- a/libstdc++-v3/include/bits/align.h
+++ b/libstdc++-v3/include/bits/align.h
@@ -60,6 +60,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline void*
 align(size_t __align, size_t __size, void*& __ptr, size_t& __space) noexcept
 {
+  if (__space < __size)
+return nullptr;
 #ifdef _GLIBCXX_USE_C99_STDINT_TR1
   const auto __intptr = reinterpret_cast(__ptr);
 #else
@@ -70,7 +72,7 @@ align(size_t __align, size_t __size, void*& __ptr, size_t& 
__space) noexcept
 #endif
   const auto __aligned = (__intptr - 1u + __align) & -__align;
   const auto __diff = __aligned - __intptr;
-  if ((__size + __diff) > __space)
+  if (!(__diff <= (__space - __size)))
 return nullptr;
   else
 {
diff --git a/libstdc++-v3/testsuite/20_util/align/3.cc 
b/libstdc++-v3/testsuite/20_util/align/3.cc
new file mode 100644
index 000..39bff3472ce
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/align/3.cc
@@ -0,0 +1,53 @@
+// { dg-do run { target c++11 } }
+
+// 2020-09-12 Glen Joseph Fernandes 
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the terms
+// of the GNU General Public License as published by the Free Software
+// Foundation; either version 3, or (at your option) any later
+// version.
+
+// This library is distributed in the hope that it will be useful, but
+// WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+// General Public License for more details.
+
+// You should have received a copy of the GNU General Public License
+// along with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// C++11 [ptr.align] (20.6.5): std::align
+
+#include <memory>
+#include <testsuite_hooks.h>
+
+void test01()
+{
+  void* p1 = reinterpret_cast<void*>(5);
+  void* p2 = p1;
+  std::size_t s1 = 3072;
+  std::size_t s2 = s1;
+  VERIFY(std::align(1024, static_cast<std::size_t>(-1), p1, s1) == nullptr);
+  VERIFY(p1 == p2);
+  VERIFY(s1 == s2);
+}
+
+void test02()
+{
+  void* p1 = reinterpret_cast<void*>(1);
+  void* p2 = p1;
+  std::size_t s1 = -1;
+  std::size_t s2 = s1;
+  VERIFY(std::align(2, static_cast<std::size_t>(-1), p1, s1) == nullptr);
+  VERIFY(p1 == p2);
+  VERIFY(s1 == s2);
+}
+
+int main()
+{
+  test01();
+  test02();
+}
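
To make the overflow concrete, here is a small standalone sketch
(hypothetical values, not part of the patch):

  #include <cstddef>
  #include <cstdint>

  int main()
  {
    std::size_t size  = SIZE_MAX; // requested size, as in test01 above
    std::size_t diff  = 3;        // bytes skipped to reach alignment
    std::size_t space = 3072;
    // Old check: size + diff wraps around to 2, and "2 > 3072" is
    // false, so the request wrongly appeared to fit.
    bool old_rejects = (size + diff) > space;                  // false
    // New checks: rejected up front because space < size; after that
    // guard, space - size cannot wrap.
    bool new_rejects = (space < size) || diff > space - size;  // true
    return old_rejects == new_rejects ? 1 : 0;                 // 0 here
  }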


Re: i386: Fix array index in expander

2020-09-14 Thread Nathan Sidwell

On 9/14/20 6:10 AM, Hongtao Liu wrote:

On Mon, Sep 14, 2020 at 3:51 PM Richard Biener via Gcc-patches
 wrote:


On Fri, Sep 11, 2020 at 11:19 PM Nathan Sidwell  wrote:


I noticed a compiler warning about out-of-bound access.  Fixed thusly.

  gcc/
 * config/i386/sse.md (mov<mode>): Fix operand indices.

pushed as obvious


looks like wrong-code as well so could you push to affected branches?



GCC11 and GCC10 are affected.


pushed to 10


--
Nathan Sidwell


[PATCH] AArch64: Enable fast shifts on Neoverse N1

2020-09-14 Thread Wilco Dijkstra
Enable the fast shift feature in Neoverse N1 tuning - this means additions with
a shift left by 1-4 are as fast as a plain addition.  This improves
multiply-by-constant expansions, e.g. x * 25 is now emitted using shifts rather
than a multiply:

add w0, w0, w0, lsl 2
add w0, w0, w0, lsl 2

Bootstrap OK, regress pass, OK for commit?

ChangeLog:
2020-09-11  Wilco Dijkstra  

* config/aarch64/aarch64.c (neoversen1_tunings):
Enable AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND.

---

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
803562df25751f2eb6dbe18b67cae46ea7c478dd..cbbdefa436bf11e9631c90631fb621e90e60754a
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1332,7 +1332,7 @@ static const struct tune_params neoversen1_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags.  */
  &generic_prefetch_tune
 };
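
As a sketch of the source pattern that benefits (my own example, not
from the patch), multiplies by constants of the form (2^a + 1)(2^b + 1)
decompose into two shift-and-add instructions:

  int mul25 (int x)
  {
    /* x * 25 == (x * 5) * 5; each step is one add with LSL #2,
       matching the two-instruction sequence quoted above.  */
    return x * 25;
  }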
 


Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-14 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Sep 14, 2020 at 12:47 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener via Gcc-patches  writes:
>> > On gimple the above function (after fixing it) looks like
>> >
>> >   VIEW_CONVERT_EXPR<vector(4) int>(u)[_1] = i_4(D);
>> >
>> > and the IFN idea I had would - for non-global memory 'u' only - transform
>> > this to
>> >
>> >   vector_register_2 = u;
>> >   vector_register_3 = .IFN_VEC_SET (vector_register_2, _1, i_4(D));
>> >   u = vector_register_3;
>> >
>> > if vec_set can handle variable indexes.  This then becomes a
>> > vec_set on a register and if that was the only variable indexing of 'u'
>> > will also cause 'u' to be expanded as register rather than stack memory.
>> >
>> > Note we can't use the direct-optab method here since the vec_set optab
>> > modifies operand 0 which isn't possible in SSA form.
>>
>> Would it be worth changing the optab so that the input and output are
>> separate?  Having a single operand would be justified if the operation
>> was only supposed to touch the selected bytes, but most targets wouldn't
>> guarantee that for memory operands, even as things stand.
>
> I thought about this as well, just didn't want to bother Xiong Hu Luo with
> it for the experiments.
>
>> Or maybe the idea was to force the RA's hand by making the input and
>> output tied even before RA, with separate moves where necessary.
>> But I'm not sure why vec_set is so special that it requires this
>> treatment and other optabs don't.
>
> Certainly the define_expand do not have to be define_insn so the target
> can force the RAs hand just fine if it likes to.
>
> The more interesting question of course is how to query vec_set whether
> it accepts variable indices w/o building too much garbage RTL.

Probably easiest to do something like can_vcond_compare_p, i.e.:

  machine_mode mode = insn_data[icode].operand[N].mode;
  rtx reg = alloca_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1);
  … insn_operand_matches (icode, N, reg) …

Thanks,
Richard
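
Filling in that outline as a self-contained sketch (my guess at the
shape of such a query; the function name is made up, but the GCC
internals used are real):

  /* Return true if the vec_set optab for VEC_MODE accepts a variable
     (register) index, probing it the way can_vcond_compare_p probes
     its operands.  */
  static bool
  can_vec_set_var_idx_p (machine_mode vec_mode)
  {
    if (!VECTOR_MODE_P (vec_mode))
      return false;
    enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
    if (icode == CODE_FOR_nothing)
      return false;
    /* Operand 2 of vec_set is the element index; build a throw-away
       virtual register in its mode and ask the operand predicate.  */
    machine_mode idx_mode = insn_data[icode].operand[2].mode;
    rtx idx = alloca_raw_REG (idx_mode, LAST_VIRTUAL_REGISTER + 1);
    return insn_operand_matches (icode, 2, idx);
  }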


Re: [patch] Fix dangling references in thunks at -O0

2020-09-14 Thread Eric Botcazou
> In fact I can probably reuse cfun->tail_call_marked for this purpose.

Like so.

* cgraphunit.c (cgraph_node::expand_thunk): Make sure to set
cfun->tail_call_marked when forcing a tail call.
* function.c (assign_parm_setup_reg): Always use a register to
retrieve a parameter passed by reference if cfun->tail_call_marked.

-- 
Eric Botcazoudiff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 26d3995a0c0..bedb6e2eea1 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2171,7 +2171,10 @@ cgraph_node::expand_thunk (bool output_asm_thunks, bool force_gimple_thunk)
 		}
 	}
 	  else
-	gimple_call_set_tail (call, true);
+	{
+	  gimple_call_set_tail (call, true);
+	  cfun->tail_call_marked = true;
+	}
 
 	  /* Build return value.  */
 	  if (!DECL_BY_REFERENCE (resdecl))
@@ -2184,6 +2187,7 @@ cgraph_node::expand_thunk (bool output_asm_thunks, bool force_gimple_thunk)
   else
 	{
 	  gimple_call_set_tail (call, true);
+	  cfun->tail_call_marked = true;
 	  remove_edge (single_succ_edge (bb));
 	}
 
diff --git a/gcc/function.c b/gcc/function.c
index 2c8fa217f1f..314fd01c195 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -3322,13 +3322,15 @@ assign_parm_setup_reg (struct assign_parm_data_all *all, tree parm,
   else
 emit_move_insn (parmreg, validated_mem);
 
-  /* If we were passed a pointer but the actual value can safely live
- in a register, retrieve it and use it directly.  */
+  /* If we were passed a pointer but the actual value can live in a register,
+ retrieve it and use it directly.  Note that we cannot use nominal_mode,
+ because it will have been set to Pmode above, we must use the actual mode
+ of the parameter instead.  */
   if (data->arg.pass_by_reference && TYPE_MODE (TREE_TYPE (parm)) != BLKmode)
 {
-  /* We can't use nominal_mode, because it will have been set to
-	 Pmode above.  We must use the actual mode of the parm.  */
-  if (use_register_for_decl (parm))
+  /* Use a stack slot for debugging purposes, except if a tail call is
+	 involved because this would create a dangling reference.  */
+  if (use_register_for_decl (parm) || cfun->tail_call_marked)
 	{
 	  parmreg = gen_reg_rtx (TYPE_MODE (TREE_TYPE (parm)));
 	  mark_user_reg (parmreg);


Re: [patch] Fix dangling references in thunks at -O0

2020-09-14 Thread Richard Biener via Gcc-patches
On Mon, Sep 14, 2020 at 12:56 PM Eric Botcazou  wrote:
>
> > No, so you're right - validity analysis is split.  Still the cause you
> > reference comes from RTL expansion which means RTL expansion should
> > possibly do the disqualification here.  The question is what makes the case
> > you run into at -O0 impossible to trigger with -O1+?
>
> The key test in use_register_for_decl is:
>
>   if (optimize)
> return true;
>
> I think that all the preceding tests that return false cannot trigger in this
> context - a parameter passed by invisible reference - so, by returning true
> here, you're pretty much covered I think.
>
> > Maybe we can also avoid this spilling at -O0?
>
> The code wants to retrieve the parameter at all optimization levels.  The
> question is: register or stack slot?  It uses use_register_for_decl to decide
> as in other circumstances, but so you want to always force a register?
>
> Note that, since cgraph_node::expand_thunk has expanded the thunk in GIMPLE,
> the function is no longer a thunk for the middle-end (cfun->is_thunk false).
>
> That being said, I can add another bit to cfun, e.g. is_gimple_thunk, and
> force the use of a register above only in case the bit is true.

  /* We don't set DECL_IGNORED_P for the function_result_decl.  */
  if (optimize)
return true;
  /* We don't set DECL_REGISTER for the function_result_decl.  */
  return false;

or maybe set (and check) DECL_REGISTER?  But the above certainly
looks like we can pick as we wish.

> > Yeah, but the asm thunk tail-calls also at -O0.  I guess tail-calling is
> > forced here because gimple analysis sometimes would not mark the call.
>
> The low-level assembly thunks simply support the most simple thunks, see the
> condition in expand_thunk.  Moreover targets can opt out as they wish.
>
> --
> Eric Botcazou
>
>


Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-14 Thread Richard Biener via Gcc-patches
On Mon, Sep 14, 2020 at 12:47 PM Richard Sandiford
 wrote:
>
> Richard Biener via Gcc-patches  writes:
> > On gimple the above function (after fixing it) looks like
> >
> >   VIEW_CONVERT_EXPR<vector(4) int>(u)[_1] = i_4(D);
> >
> > and the IFN idea I had would - for non-global memory 'u' only - transform
> > this to
> >
> >   vector_register_2 = u;
> >   vector_register_3 = .IFN_VEC_SET (vector_register_2, _1, i_4(D));
> >   u = vector_register_3;
> >
> > if vec_set can handle variable indexes.  This then becomes a
> > vec_set on a register and if that was the only variable indexing of 'u'
> > will also cause 'u' to be expanded as register rather than stack memory.
> >
> > Note we can't use the direct-optab method here since the vec_set optab
> > modifies operand 0 which isn't possible in SSA form.
>
> Would it be worth changing the optab so that the input and output are
> separate?  Having a single operand would be justified if the operation
> was only supposed to touch the selected bytes, but most targets wouldn't
> guarantee that for memory operands, even as things stand.

I thought about this as well, just didn't want to bother Xiong Hu Luo with
it for the experiments.

> Or maybe the idea was to force the RA's hand by making the input and
> output tied even before RA, with separate moves where necessary.
> But I'm not sure why vec_set is so special that it requires this
> treatment and other optabs don't.

Certainly the define_expand do not have to be define_insn so the target
can force the RAs hand just fine if it likes to.

The more interesting question of course is how to query vec_set whether
it accepts variable indices w/o building too much garbage RTL.

Richard.

>
> Thanks,
> Richard
>
>
> > That might hint at that we eventually want to extend BIT_INSERT_EXPR
> > to handle a non-constant bit position but for experiments using an
> > alternate internal function is certainly easier.
> >
> > Richard.


Re: RISC-V: fix a typo in riscv.h

2020-09-14 Thread Yeting Kuo via Gcc-patches
Hi Kito,

> > Could you provide a test case for that?
>
I added the test case and updated the commit message.
RISC-V: fix a typo in riscv.h

The missing parentheses made the shorten-memrefs pass compute a
wrong base when the load/store offset is not a multiple of 4.

2020-09-14 Yeting Kuo 

gcc/ChangeLog:
* config/riscv/riscv.h (CSW_MAX_OFFSET): Add missing parentheses.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/shorten-memrefs-8.c: New test.

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 9f67d82e74e..b7b4a1c88a5 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -941,7 +941,7 @@ extern unsigned riscv_stack_boundary;

 /* This is the maximum value that can be represented in a compressed
load/store offset (an unsigned 5-bit value scaled by 4).  */
-#define CSW_MAX_OFFSET ((4LL << C_S_BITS) - 1) & ~3
+#define CSW_MAX_OFFSET (((4LL << C_S_BITS) - 1) & ~3)

 /* Called from RISCV_REORG, this is defined in riscv-sr.c.  */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-8.c
b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-8.c
new file mode 100644
index 000..f7428bd86cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-8.c
@@ -0,0 +1,26 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */
+
+/* shorten_memrefs should use a correct base address.  */
+
+void
+store (char *p, int k)
+{
+  *(int *)(p + 17) = k;
+  *(int *)(p + 21) = k;
+  *(int *)(p + 25) = k;
+  *(int *)(p + 29) = k;
+}
+
+int
+load (char *p)
+{
+  int a = 0;
+  a += *(int *)(p + 17);
+  a += *(int *)(p + 21);
+  a += *(int *)(p + 25);
+  a += *(int *)(p + 29);
+  return a;
+}
+
+/* { dg-final { scan-assembler "store:\n\taddi\ta\[0-7\],a\[0-7\],1" } } */
+/* { dg-final { scan-assembler "load:\n\taddi\ta\[0-7\],a\[0-7\],1" } } */

Thanks,
Yeting
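
A short sketch of why the parentheses matter (the use site below is
hypothetical; the exact expression in the pass may differ):

  #define C_S_BITS 5
  #define CSW_MAX_OFFSET_OLD  ((4LL << C_S_BITS) - 1) & ~3   /* before */
  #define CSW_MAX_OFFSET_NEW  (((4LL << C_S_BITS) - 1) & ~3) /* after  */

  /* Masking an offset down to a base: the old macro expands to
     off & ~((4LL << 5) - 1) & ~3 == off & ~127 & ~3, which also
     clears bits 0-1; the intended off & ~124 keeps them.  Hence the
     wrong base whenever the offset is not a multiple of 4, e.g. the
     17/21/25/29 offsets in the test above.  */
  long long base_old (long long off) { return off & ~CSW_MAX_OFFSET_OLD; }
  long long base_new (long long off) { return off & ~CSW_MAX_OFFSET_NEW; }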


Re: [patch] Fix dangling references in thunks at -O0

2020-09-14 Thread Eric Botcazou
> That being said, I can add another bit to cfun, e.g. is_gimple_thunk, and
> force the use of a register above only in case the bit is true.

In fact I can probably reuse cfun->tail_call_marked for this purpose.

-- 
Eric Botcazou




Re: [Patch] OpenMP/Fortran: Fix (re)mapping of allocatable/pointer arrays [PR96668]

2020-09-14 Thread Jakub Jelinek via Gcc-patches
On Mon, Sep 14, 2020 at 09:50:08AM +0200, Tobias Burnus wrote:
> --- a/gcc/fortran/trans-openmp.c
> +++ b/gcc/fortran/trans-openmp.c
> @@ -1357,6 +1357,15 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
>tree type = TREE_TYPE (decl);
>tree ptr = gfc_conv_descriptor_data_get (decl);
>  
> +  /* OpenMP: automatically map pointer targets with the pointer;
> +  hence, always update the descriptor/pointer itself.
> +  NOTE: This also remaps the pointer for allocatable arrays with
> +  'target' attribute which also don't have the 'restrict' qualifier.  */
> +  bool always_modifier = false;
> +
> +  if (flag_openmp && !(TYPE_QUALS (TREE_TYPE (ptr)) & 
> TYPE_QUAL_RESTRICT))
> + always_modifier = true;

I think we don't want to depend on flag_openmp here, because
one can compile with -fopenmp -fopenacc and simply encounter OpenMP or
OpenACC code or none of that (or both as long as we don't reject it due to
invalid mixing).
So, if we need to handle the two languages differently, we probably need to
change the langhook's arguments and pass from the gimplifier a flag whether
it is OpenMP or OpenACC construct (or tree code of the construct or whatever
else).  Guess for now an openacc flag with (ctx->region_type & ORT_ACC) != 0
passed to it seems easiest.

> diff --git a/libgomp/target.c b/libgomp/target.c
> index 3e292eb..f037151 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -1,3 +1,4 @@
> +#pragma GCC optimize("O0")
>  /* Copyright (C) 2013-2020 Free Software Foundation, Inc.
> Contributed by Jakub Jelinek .
>  

Not this change please ;).

> @@ -472,8 +474,8 @@ gomp_map_fields_existing (struct target_mem_desc *tgt,
> && n2->host_start - n->host_start
>== n2->tgt_offset - n->tgt_offset)
>   {
> -   gomp_map_vars_existing (devicep, aq, n2, &cur_node,
> -   &tgt->list[i], kind & typemask, cbuf);
> +   gomp_map_vars_existing (devicep, aq, n2, &cur_node, &tgt->list[i],
> +   kind & typemask,false,  cbuf);

Formatting (move 1 space before false from after it).

Jakub



Re: [patch] Fix dangling references in thunks at -O0

2020-09-14 Thread Eric Botcazou
> No, so you're right - validity analysis is split.  Still the cause you
> reference comes from RTL expansion which means RTL expansion should
> possibly do the disqualification here.  The question is what makes the case
> you run into at -O0 impossible to trigger with -O1+?

The key test in use_register_for_decl is:

  if (optimize)
return true;

I think that all the preceding tests that return false cannot trigger in this 
context - a parameter passed by invisible reference - so, by returning true 
here, you're pretty much covered I think.

> Maybe we can also avoid this spilling at -O0?

The code wants to retrieve the parameter at all optimization levels.  The 
question is: register or stack slot?  It uses use_register_for_decl to decide 
as in other circumstances, but so you want to always force a register?

Note that, since cgraph_node::expand_thunk has expanded the thunk in GIMPLE, 
the function is no longer a thunk for the middle-end (cfun->is_thunk false).

That being said, I can add another bit to cfun, e.g. is_gimple_thunk, and 
force the use of a register above only in case the bit is true.

> Yeah, but the asm thunk tail-calls also at -O0.  I guess tail-calling is
> forced here because gimple analysis sometimes would not mark the call.

The low-level assembly thunks simply support the most simple thunks, see the 
condition in expand_thunk.  Moreover targets can opt out as they wish.

-- 
Eric Botcazou




Re: [PATCH] rtlanal: fix subreg handling in set_noop_p ()

2020-09-14 Thread Richard Sandiford
Ilya Leoshkevich  writes:
> Bootstrapped and regtested on x86_64-redhat-linux.  Ok for master?
>
>
>
> The following s390 rtx is erroneously considered a no-op:
>
> (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))
>
> Here, SET_DEST is a second register in a floating-point register pair,
> and SET_SRC is the second half of a vector register, so they refer to
> different bits.
>
> Fix by treating subregs of registers in different modes conservatively.
>
> gcc/ChangeLog:
>
> 2020-09-11  Ilya Leoshkevich  
>
>   * rtlanal.c (set_noop_p): Treat subregs of registers in
>   different modes conservatively.
> ---
>  gcc/rtlanal.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index 5ae38b7966d..ed3360562d5 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -1619,6 +1619,10 @@ set_noop_p (const_rtx set)
>   return 0;
>src = SUBREG_REG (src);
>dst = SUBREG_REG (dst);
> +  if (GET_MODE (src) != GET_MODE (dst))
> + /* It is hard to tell whether subregs refers to the same bits, so act
> +  * conservatively and return 0.  */

Sorry for the nit, but GCC style is not to have leading “*”s on second
and subsequent lines, so just;

/* It is hard to tell whether subregs refers to the same bits, so act
   conservatively and return 0.  */

OK with that change, thanks.

Richard

> + return 0;
>  }
>  
>/* It is a NOOP if destination overlaps with selected src vector


Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-14 Thread Richard Sandiford
Richard Biener via Gcc-patches  writes:
> On gimple the above function (after fixing it) looks like
>
>   VIEW_CONVERT_EXPR<vector(4) int>(u)[_1] = i_4(D);
>
> and the IFN idea I had would - for non-global memory 'u' only - transform
> this to
>
>   vector_register_2 = u;
>   vector_register_3 = .IFN_VEC_SET (vector_register_2, _1, i_4(D));
>   u = vector_register_3;
>
> if vec_set can handle variable indexes.  This then becomes a
> vec_set on a register and if that was the only variable indexing of 'u'
> will also cause 'u' to be expanded as register rather than stack memory.
>
> Note we can't use the direct-optab method here since the vec_set optab
> modifies operand 0 which isn't possible in SSA form.

Would it be worth changing the optab so that the input and output are
separate?  Having a single operand would be justified if the operation
was only supposed to touch the selected bytes, but most targets wouldn't
guarantee that for memory operands, even as things stand.

Or maybe the idea was to force the RA's hand by making the input and
output tied even before RA, with separate moves where necessary.
But I'm not sure why vec_set is so special that it requires this
treatment and other optabs don't.

Thanks,
Richard


> That might hint at that we eventually want to extend BIT_INSERT_EXPR
> to handle a non-constant bit position but for experiments using an
> alternate internal function is certainly easier.
>
> Richard.


Re: i386: Fix array index in expander

2020-09-14 Thread Hongtao Liu via Gcc-patches
On Mon, Sep 14, 2020 at 3:51 PM Richard Biener via Gcc-patches
 wrote:
>
> On Fri, Sep 11, 2020 at 11:19 PM Nathan Sidwell  wrote:
> >
> > I noticed a compiler warning about out-of-bound access.  Fixed thusly.
> >
> >  gcc/
> >  * config/i386/sse.md (mov<mode>): Fix operand indices.
> >
> > pushed as obvious
>
> looks like wrong-code as well so could you push to affected branches?
>

GCC11 and GCC10 are affected.

> Thanks,
> Richard.
>
> > --
> > Nathan Sidwell



-- 
BR,
Hongtao


Re: [patch] Fix dangling references in thunks at -O0

2020-09-14 Thread Richard Biener via Gcc-patches
On Mon, Sep 14, 2020 at 10:36 AM Eric Botcazou  wrote:
>
> > ISTR the tailcall flag is only a hint and RTL expansion can decide to
> > not tailcall based on targets.  So to me it looks like a missed
> > disqualification on the RTL expansion side.
>
> That's IMO debatable: the GIMPLE tailcall optimization pass e.g. does:
>
>   /* Make sure the tail invocation of this function does not indirectly
>  refer to local variables.  (Passing variables directly by value
>  is OK.)  */
>   FOR_EACH_LOCAL_DECL (cfun, idx, var)
> {
>   if (TREE_CODE (var) != PARM_DECL
>   && auto_var_in_fn_p (var, cfun->decl)
>   && may_be_aliased (var)
>   && (ref_maybe_used_by_stmt_p (call, var)
>   || call_may_clobber_ref_p (call, var)))
>
> Do you want to replace it with something at the RTL level too?  I don't think
> that you can disqualify things at the RTL as easily as at the GIMPLE level.

No, so you're right - validity analysis is split.  Still the cause you reference
comes from RTL expansion which means RTL expansion should possibly
do the disqualification here.  The question is what makes the case you
run into at -O0 impossible to trigger with -O1+?

Maybe we can also avoid this spilling at -O0?

> > Or do we, besides from this very single spot, simply never tailcall at -O0
> > and thus never hit this latent issue?
>
> Presumably yes, tail calling is an optimization like the others.

Yeah, but the asm thunk tail-calls also at -O0.  I guess tail-calling is forced
here because gimple analysis sometimes would not mark the call.

> > How does this change the debug experience at -O0 when GIMPLE thunks
> > are used?
>
> In Ada this doesn't change much since thunks have line debug info.

I see, so as long as you then don't see extra frames it should be fine
with regard to debug info.

I'm not against the patch but let's leave it a bit for others to comment.

Richard.

> --
> Eric Botcazou
>
>


[PATCH]rs6000: Remove useless insns fed into lvx/stvx [PR97019]

2020-09-14 Thread Kewen.Lin via Gcc-patches
Hi,

This patch extends the existing function find_alignment_op to check
that all definitions of base_reg are AND operations with mask -16B
to force the alignment.  If they all are, it passes all the AND
operations and their instructions in one vector to recombine_lvx_pattern
and recombine_stvx_pattern, which then remove all the useless ANDs.

Bootstrapped/regtested on powerpc64le-linux-gnu P8.

Is it OK for trunk?

BR,
Kewen
-
gcc/ChangeLog:

* config/rs6000/rs6000-p8swap.c (insn_rtx_pair_t): New type.
(find_alignment_op): Adjust to support multiple definitions which
are all AND operations with the mask -16B.
(recombine_lvx_pattern): Adjust to handle multiple AND operations
from find_alignment_op.
(recombine_stvx_pattern): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr97019.c: New test.
diff --git a/gcc/config/rs6000/rs6000-p8swap.c 
b/gcc/config/rs6000/rs6000-p8swap.c
index 3d5dc7d8aae..2de2edeab67 100644
--- a/gcc/config/rs6000/rs6000-p8swap.c
+++ b/gcc/config/rs6000/rs6000-p8swap.c
@@ -183,6 +183,8 @@ class swap_web_entry : public web_entry_base
   unsigned int will_delete : 1;
 };
 
+typedef std::pair<rtx_insn *, rtx> insn_rtx_pair_t;
+
 enum special_handling_values {
   SH_NONE = 0,
   SH_CONST_VECTOR,
@@ -2095,15 +2097,19 @@ alignment_mask (rtx_insn *insn)
   return alignment_with_canonical_addr (SET_SRC (body));
 }
 
-/* Given INSN that's a load or store based at BASE_REG, look for a
-   feeding computation that aligns its address on a 16-byte boundary.
-   Return the rtx and its containing AND_INSN.  */
-static rtx
-find_alignment_op (rtx_insn *insn, rtx base_reg, rtx_insn **and_insn)
+/* Given INSN that's a load or store based at BASE_REG, check if
+   all of its feeding computations align its address on a 16-byte
+   boundary.  If so, return true and record each computation as an
+   {and_insn, and_operation} pair in AND_INSN_VEC.  Otherwise,
+   return false.  */
+
+static bool
+find_alignment_op (rtx_insn *insn, rtx base_reg,
+  vec<insn_rtx_pair_t> *and_insn_vec)
 {
   df_ref base_use;
   struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
-  rtx and_operation = 0;
+  rtx and_operation = NULL_RTX;
 
   FOR_EACH_INSN_INFO_USE (base_use, insn_info)
 {
@@ -2111,22 +2117,30 @@ find_alignment_op (rtx_insn *insn, rtx base_reg, 
rtx_insn **and_insn)
continue;
 
   struct df_link *base_def_link = DF_REF_CHAIN (base_use);
-  if (!base_def_link || base_def_link->next)
-   break;
+  if (!base_def_link)
+   return false;
 
-  /* With stack-protector code enabled, and possibly in other
-circumstances, there may not be an associated insn for 
-the def.  */
-  if (DF_REF_IS_ARTIFICIAL (base_def_link->ref))
-   break;
+  while (base_def_link)
+   {
+ /* With stack-protector code enabled, and possibly in other
+circumstances, there may not be an associated insn for
+the def.  */
+ if (DF_REF_IS_ARTIFICIAL (base_def_link->ref))
+   return false;
 
-  *and_insn = DF_REF_INSN (base_def_link->ref);
-  and_operation = alignment_mask (*and_insn);
-  if (and_operation != 0)
-   break;
+ rtx_insn *and_insn = DF_REF_INSN (base_def_link->ref);
+ and_operation = alignment_mask (and_insn);
+
+ /* Stop if we find any one which doesn't align.  */
+ if (and_operation == NULL_RTX)
+   return false;
+
+ and_insn_vec->safe_push (std::make_pair (and_insn, and_operation));
+ base_def_link = base_def_link->next;
+   }
 }
 
-  return and_operation;
+  return and_operation != NULL_RTX;
 }
 
 struct del_info { bool replace; rtx_insn *replace_insn; };
@@ -2143,10 +2157,10 @@ recombine_lvx_pattern (rtx_insn *insn, del_info 
*to_delete)
   rtx mem = XEXP (SET_SRC (body), 0);
   rtx base_reg = XEXP (mem, 0);
 
-  rtx_insn *and_insn;
-  rtx and_operation = find_alignment_op (insn, base_reg, &and_insn);
+  auto_vec<insn_rtx_pair_t> and_insn_vec;
+  bool all_and_p = find_alignment_op (insn, base_reg, &and_insn_vec);
 
-  if (and_operation != 0)
+  if (all_and_p)
 {
   df_ref def;
   struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
@@ -2168,25 +2182,34 @@ recombine_lvx_pattern (rtx_insn *insn, del_info 
*to_delete)
  to_delete[INSN_UID (swap_insn)].replace = true;
  to_delete[INSN_UID (swap_insn)].replace_insn = swap_insn;
 
- /* However, first we must be sure that we make the
-base register from the AND operation available
-in case the register has been overwritten.  Copy
-the base register to a new pseudo and use that
-as the base register of the AND operation in
-the new LVX instruction.  */
- rtx and_base = XEXP (and_operation, 0);
- rtx new_reg = gen_reg_rtx (GET_MODE (and_base));
- rtx copy = gen_rtx_SET (new_reg, and_base);
- rtx_insn *new_insn 

Re: [PATCH] Fix overflow handling in std::align

2020-09-14 Thread Ville Voutilainen via Gcc-patches
On Mon, 14 Sep 2020 at 12:51, Ville Voutilainen
 wrote:
>
> On Mon, 14 Sep 2020 at 09:18, Glen Fernandes via Libstdc++
>  wrote:
> >
> > Edit; Correct patch this time.
> >
> > Fix overflow handling in align
>
> Should the test verify that space is unmodified when nullptr is returned?

..and same for ptr.


Re: [PATCH] Fix overflow handling in std::align

2020-09-14 Thread Ville Voutilainen via Gcc-patches
On Mon, 14 Sep 2020 at 09:18, Glen Fernandes via Libstdc++
 wrote:
>
> Edit; Correct patch this time.
>
> Fix overflow handling in align

Should the test verify that space is unmodified when nullptr is returned?


Re: [PATCH] options, lto: Optimize streaming of optimization nodes

2020-09-14 Thread Jakub Jelinek via Gcc-patches
On Mon, Sep 14, 2020 at 11:02:26AM +0200, Jan Hubicka wrote:
> Especially for the new param machinery, most of streamed values are
> probably going to be the default values.  Perhaps somehow we could
> stream them more effectively.

Ah, that seems like a good idea; it brings further savings: the size
goes down from 574 bytes to 273 bytes, i.e. less than half.
Here is an updated version that does that.  Not trying to handle
enums because the code doesn't know if (enum ...) 10 is even valid,
similarly non-parameters because those really generally don't have large
initializers, and params without Init (those are 0 initialized and thus
don't need to be handled).

2020-09-14  Jakub Jelinek  

* optc-save-gen.awk: Initialize var_opt_init.  In
cl_optimization_stream_out use bp_pack_var_len_{int,unsigned} instead
of bp_pack_value and for params with default values larger than 10, xor
the default value with the actual parameter value.  In
cl_optimization_stream_in use bp_unpack_var_len_{int,unsigned}
instead of bp_unpack_value and repeat the above xor.  Formatting fix.

--- gcc/optc-save-gen.awk.jj2020-09-14 10:51:54.493740942 +0200
+++ gcc/optc-save-gen.awk   2020-09-14 11:39:39.441602594 +0200
@@ -1186,6 +1186,7 @@ for (i = 0; i < n_opts; i++) {
var_opt_val_type[n_opt_val] = otype;
var_opt_val[n_opt_val] = "x_" name;
var_opt_hash[n_opt_val] = flag_set_p("Optimization", flags[i]);
+   var_opt_init[n_opt_val] = opt_args("Init", flags[i]);
n_opt_val++;
}
 }
@@ -1257,8 +1258,21 @@ for (i = 0; i < n_opt_val; i++) {
otype = var_opt_val_type[i];
if (otype ~ "^const char \\**$")
print "  bp_pack_string (ob, bp, ptr->" name", true);";
-   else
-   print "  bp_pack_value (bp, ptr->" name", 64);";
+   else {
+   if (otype ~ "^unsigned") {
+   sgn = "unsigned";
+   } else {
+   sgn = "int";
+   }
+   if (name ~ "^x_param" && !(otype ~ "^enum ") && 
var_opt_init[i]) {
+   print "  if (" var_opt_init[i] " > (" 
var_opt_val_type[i] ") 10)";
+   print "bp_pack_var_len_" sgn " (bp, ptr->" name" ^ 
" var_opt_init[i] ");";
+   print "  else";
+   print "bp_pack_var_len_" sgn " (bp, ptr->" name");";
+   } else {
+   print "  bp_pack_var_len_" sgn " (bp, ptr->" name");";
+   }
+   }
 }
 print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
(ptr->explicit_mask[0]); i++)";
 print "bp_pack_value (bp, ptr->explicit_mask[i], 64);";
@@ -1274,14 +1288,23 @@ print "{";
 for (i = 0; i < n_opt_val; i++) {
name = var_opt_val[i]
otype = var_opt_val_type[i];
-   if (otype ~ "^const char \\**$")
-   {
- print "  ptr->" name" = bp_unpack_string (data_in, bp);";
- print "  if (ptr->" name")";
- print "ptr->" name" = xstrdup (ptr->" name");";
+   if (otype ~ "^const char \\**$") {
+   print "  ptr->" name" = bp_unpack_string (data_in, bp);";
+   print "  if (ptr->" name")";
+   print "ptr->" name" = xstrdup (ptr->" name");";
+   }
+   else {
+   if (otype ~ "^unsigned") {
+   sgn = "unsigned";
+   } else {
+   sgn = "int";
+   }
+   print "  ptr->" name" = (" var_opt_val_type[i] ") 
bp_unpack_var_len_" sgn " (bp);";
+   if (name ~ "^x_param" && !(otype ~ "^enum ") && 
var_opt_init[i]) {
+   print "  if (" var_opt_init[i] " > (" 
var_opt_val_type[i] ") 10)";
+   print "ptr->" name" ^= " var_opt_init[i] ";";
+   }
}
-   else
- print "  ptr->" name" = (" var_opt_val_type[i] ") bp_unpack_value 
(bp, 64);";
 }
 print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
(ptr->explicit_mask[0]); i++)";
 print "ptr->explicit_mask[i] = bp_unpack_value (bp, 64);";


Jakub
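
The trick in isolation (my illustration, not the generated streaming
code):

  /* Most streamed params still hold their compile-time default, so
     XOR each value with its default before variable-length packing:
     the common case becomes 0, which encodes in the fewest bits.
     Decoding applies the same XOR to recover the value.  */
  unsigned encode (unsigned value, unsigned default_val)
  {
    return value ^ default_val;   /* 0 whenever value == default_val */
  }

  unsigned decode (unsigned streamed, unsigned default_val)
  {
    return streamed ^ default_val;
  }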



Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-14 Thread Richard Biener via Gcc-patches
On Mon, Sep 14, 2020 at 10:05 AM luoxhu  wrote:
>
>
>
> On 2020/9/10 18:08, Richard Biener wrote:
> > On Wed, Sep 9, 2020 at 6:03 PM Segher Boessenkool
> >  wrote:
> >>
> >> On Wed, Sep 09, 2020 at 04:28:19PM +0200, Richard Biener wrote:
> >>> On Wed, Sep 9, 2020 at 3:49 PM Segher Boessenkool
> >>>  wrote:
> 
>  Hi!
> 
>  On Tue, Sep 08, 2020 at 10:26:51AM +0200, Richard Biener wrote:
> > Hmm, yeah - I guess that's what should be addressed first then.
> > I'm quite sure that in case 'v' is not on the stack but in memory like
> > in my case a SImode store is better than what we get from
> > vec_insert - in fact vec_insert will likely introduce a RMW cycle
> > which is prone to inserting store-data-races?
> 
>  The other way around -- if it is in memory, and was stored as vector
>  recently, then reading back something shorter from it is prone to
>  SHL/LHS problems.  There is nothing special about the stack here, except
>  of course it is more likely to have been stored recently if on the
>  stack.  So it depends how often it has been stored recently which option
>  is best.  On newer CPUs, although they can avoid SHL/LHS flushes more
>  often, the penalty is relatively bigger, so memory does not often win.
> 
>  I.e.: it needs to be measured.  Intuition is often wrong here.
> >>>
> >>> But the current method would simply do a direct store to memory
> >>> without a preceeding read of the whole vector.
> >>
> >> The problem is even worse the other way: you do a short store here, but
> >> so a full vector read later.  If the store and read are far apart, that
> >> is fine, but if they are close (that is on the order of fifty or more
> >> insns), there can be problems.
> >
> > Sure, but you can't simply load/store a whole vector when the code didn't
> > unless you know it will not introduce data races and it will not trap (thus
> > the whole vector needs to be at least naturally aligned).
> >
> > Also if there's a preceeding short store you will now load the whole vector
> > to avoid the short store ... catch-22
> >
> >> There often are problems over function calls (where the compiler cannot
> >> usually *see* how something is used).
> >
> > Yep.  The best way would be to use small loads and larger stores
> > which is what CPUs usually tend to handle fine (with alignment
> > constraints, etc.).  Of course that's not what either of the "solutions"
> > can do.
> >
> > That said, since you seem to be "first" in having an instruction
> > to insert into a vector at a variable position the idea that we'd
> > have to spill anyway for this to be expanded and thus we expand
> > the vector to a stack location in the first place falls down.  And
> > that's where I'd first try to improve things.
> >
> > So what can the CPU actually do?
> >
>
> Not sure whether this reflects the issues you discussed above.
> I constructed below test cases and tested with and without this patch,
> only if "a+c"(which means store only), the performance is getting bad with
> this patch due to extra load/store(about 50% slower).
> For "v = vec_insert (i, v, n);" usage, v is always loaded after store, so this
> patch will always get big benefit.
> But for "v[n % 4] = i;" usage, it depends on whether "v" is used immediately
> inside the function or out of the function soon.  Does this mean unify the two
> usage to same gimple code not a good idea sometimes?  Or is it possible to
> detect the generated IFN ".VEC_INSERT (, i_4(D), _1);" destination "v" is
> used not far away inside or out side of the function?
>
>
> #define TYPE int
>
> vector TYPE v = {1, 2, 3, 4};   // a. global vector.
> vector TYPE s = {4, 3, 2, 1};
>
> __attribute__ ((noinline))
> vector TYPE
> test (vector TYPE u, TYPE i, size_t n)
> {
>   v[n % 4] = i;

^^^

this should be

   u[n % 4] = i;

I guess.  Is the % 4 mandated by the vec_insert semantics btw?

If you tested with the above error you probably need to re-measure.

On gimple the above function (after fixing it) looks like

  VIEW_CONVERT_EXPR<vector(4) int>(u)[_1] = i_4(D);

and the IFN idea I had would - for non-global memory 'u' only - transform
this to

  vector_register_2 = u;
  vector_register_3 = .IFN_VEC_SET (vector_register_2, _1, i_4(D));
  u = vector_register_3;

if vec_set can handle variable indexes.  This then becomes a
vec_set on a register and if that was the only variable indexing of 'u'
will also cause 'u' to be expanded as register rather than stack memory.

Note we can't use the direct-optab method here since the vec_set optab
modifies operand 0 which isn't possible in SSA form.  That might hint
at that we eventually want to extend BIT_INSERT_EXPR to handle
a non-constant bit position but for experiments using an alternate
internal function is certainly easier.
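
To make the round trip concrete, here is a rough C model of what expanding
through a stack temporary amounts to (a sketch assuming a VMX-style
target, not the actual expander output):

  vector int
  insert_via_stack (vector int u, int i, size_t n)
  {
    union { vector int v; int a[4]; } tmp;  /* the stack slot */
    tmp.v = u;          /* full 16-byte vector store */
    tmp.a[n % 4] = i;   /* narrow scalar store into the slot */
    return tmp.v;       /* full vector load right after the narrow
                           store, i.e. the store-forwarding (LHS)
                           hazard discussed above */
  }

A register-based .VEC_SET avoids both the spill and the hazard whenever
the target really has a variable-index insert instruction.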

Richard.

>   return u;
> }
>
> int main ()
> {
>   //vector TYPE v = {1, 2, 3, 4};// b. local vector.
>   //vector TYPE s = {4, 3, 2, 1};
>   vector TYPE r = {0};
>   for 

[PATCH] tree-optimization/97043 - fix latent wrong-code with SLP vectorization

2020-09-14 Thread Richard Biener
When the unrolling decision comes late and would have prevented
eliding a SLP load permutation we can end up generating aligned
loads when the load is in fact unaligned.  Most of the time
alignment analysis figures out the load is in fact unaligned
but that cannot be relied upon.

The following removes the SLP load permutation eliding based on
the still premature vectorization factor.

This is only necessary on branches since on trunk this optimization
is delayed to a point where the vectorization factor is final.
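
For illustration, the classic shape that needs a load permutation (a
generic example, not necessarily the PR 97043 reproducer):

  void
  f (double *a, double *b)
  {
    a[0] = b[1];   /* the SLP group loads b[0] and b[1] ...  */
    a[1] = b[0];   /* ... but consumes them swapped  */
  }

Eliding that permutation based on a still-premature vectorization factor
is what could leave an unaligned vector load marked aligned.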

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

2020-09-14  Richard Biener  

PR tree-optimization/97043
* tree-vect-slp.c (vect_analyze_slp_instance): Do not
elide a load permutation if the current vectorization
factor is one.
---
 gcc/tree-vect-slp.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index f6331eeea86..3fdf56f9335 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2309,9 +2309,8 @@ vect_analyze_slp_instance (vec_info *vinfo,
  /* The load requires permutation when unrolling exposes
 a gap either because the group is larger than the SLP
	      group-size or because there is a gap between the groups.  */
- && (known_eq (unrolling_factor, 1U)
- || (group_size == DR_GROUP_SIZE (first_stmt_info)
- && DR_GROUP_GAP (first_stmt_info) == 0)))
+ && group_size == DR_GROUP_SIZE (first_stmt_info)
+ && DR_GROUP_GAP (first_stmt_info) == 0)
{
  SLP_TREE_LOAD_PERMUTATION (load_node).release ();
  continue;
-- 
2.26.2


[PATCH] libiberty/pex-win32.c: Initialize orig_err

2020-09-14 Thread Christophe Lyon via Gcc-patches
Initializing orig_err avoids a warning: "may be used uninitialized".
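
The pattern the warning fires on, reduced to a hypothetical sketch (not
the actual pex-win32.c code):

  int orig_err;
  if (separate_stderr)
    orig_err = _dup (2);     /* saved on this path only */
  /* ... redirect fds, spawn the child ... */
  if (separate_stderr)
    _dup2 (orig_err, 2);     /* flow analysis cannot always prove the
                                two guards match, hence the warning */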

2020-09-14  Torbjörn SVENSSON 
Christophe Lyon  

libiberty/
* pex-win32.c (pex_win32_exec_child): Initialize orig_err.
---
 libiberty/pex-win32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libiberty/pex-win32.c b/libiberty/pex-win32.c
index 331067b..f5fafaa 100644
--- a/libiberty/pex-win32.c
+++ b/libiberty/pex-win32.c
@@ -771,7 +771,7 @@ pex_win32_exec_child (struct pex_obj *obj ATTRIBUTE_UNUSED, int flags,
   OSVERSIONINFO version_info;
   STARTUPINFO si;
   PROCESS_INFORMATION pi;
-  int orig_out, orig_in, orig_err;
+  int orig_out, orig_in, orig_err = 0;
   BOOL separate_stderr = !(flags & PEX_STDERR_TO_STDOUT);
 
   /* Ensure we have inheritable descriptors to pass to the child.  */
-- 
2.7.4



[PATCH V2] vec: don't select partial vectors when looping on full vectors

2020-09-14 Thread Andrea Corallo
Hi all,

here is the updated version of the patch, implementing the suggestions.

The check for 'vect_need_peeling_or_partial_vectors_p' (and its
comment) has also been moved just before, so we can short-circuit the
partial vector handling if we know we are using full vectors.
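
As a worked example of the case this targets: with -msve-vector-bits=256
and int elements the vectorization factor is 256 / 32 = 8, and a trip
count of n * 8 satisfies (n * 8) % 8 == 0 for every n, so neither peeling
nor whilelo-style partial vectors are ever needed (this is what the new
cost_model_10.c below checks).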

Bootstrapped and regtested on aarch64-linux-gnu.

Okay for trunk?

Thanks

  Andrea

From 45b5e45a7ab2eecfa8489d2a7b8341556e9e8d7c Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Fri, 28 Aug 2020 16:01:15 +0100
Subject: [PATCH] vec: don't select partial vectors when unnecessary

gcc/ChangeLog

2020-09-09  Andrea Corallo  

* tree-vect-loop.c (vect_need_peeling_or_partial_vectors_p): New
function.
(vect_analyze_loop_2): Make use of it not to select partial
vectors if no peel is required.
(determine_peel_for_niter): Move out some logic into
'vect_need_peeling_or_partial_vectors_p'.

gcc/testsuite/ChangeLog

2020-09-09  Andrea Corallo  

* gcc.target/aarch64/sve/cost_model_10.c: New test.
* gcc.target/aarch64/sve/clastb_8.c: Update test for new
vectorization strategy.
* gcc.target/aarch64/sve/cost_model_5.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_14.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_15.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_16.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_17.c: Likewise.
---
 .../gcc.target/aarch64/sve/clastb_8.c |  5 +-
 .../gcc.target/aarch64/sve/cost_model_10.c| 12 +++
 .../gcc.target/aarch64/sve/cost_model_5.c |  4 +-
 .../gcc.target/aarch64/sve/struct_vect_14.c   |  8 +-
 .../gcc.target/aarch64/sve/struct_vect_15.c   |  8 +-
 .../gcc.target/aarch64/sve/struct_vect_16.c   |  8 +-
 .../gcc.target/aarch64/sve/struct_vect_17.c   |  8 +-
 gcc/tree-vect-loop.c  | 86 +++
 8 files changed, 81 insertions(+), 58 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cost_model_10.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clastb_8.c b/gcc/testsuite/gcc.target/aarch64/sve/clastb_8.c
index 57c42082449..e61ff4ac92d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/clastb_8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/clastb_8.c
@@ -23,7 +23,4 @@ TEST_TYPE (uint64_t);
 /* { dg-final { scan-assembler {\tclastb\t(h[0-9]+), p[0-7], \1, z[0-9]+\.h\n} } } */
 /* { dg-final { scan-assembler {\tclastb\t(s[0-9]+), p[0-7], \1, z[0-9]+\.s\n} } } */
 /* { dg-final { scan-assembler {\tclastb\t(d[0-9]+), p[0-7], \1, z[0-9]+\.d\n} } } */
-/* { dg-final { scan-assembler {\twhilelo\tp[0-9]+\.b,} } } */
-/* { dg-final { scan-assembler {\twhilelo\tp[0-9]+\.h,} } } */
-/* { dg-final { scan-assembler {\twhilelo\tp[0-9]+\.s,} } } */
-/* { dg-final { scan-assembler {\twhilelo\tp[0-9]+\.d,} } } */
+/* { dg-final { scan-assembler-times {\tptrue\tp[0-9]+\.b,} 4 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_10.c b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_10.c
new file mode 100644
index 000..bfac09ed1c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_10.c
@@ -0,0 +1,12 @@
+/* { dg-options "-O3 -msve-vector-bits=256" } */
+
+void
+f (int *restrict x, int *restrict y, unsigned int n)
+{
+  for (unsigned int i = 0; i < n * 8; ++i)
+    x[i] += y[i];
+}
+
+/* { dg-final { scan-assembler-not {\twhilelo\t} } } */
+/* { dg-final { scan-assembler {\tptrue\tp} } } */
+/* { dg-final { scan-assembler {\tcmp\tx[0-9]+, x[0-9]+\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_5.c
index 250ca837324..f3a29fc38a1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_5.c
@@ -9,5 +9,5 @@ vset (int *restrict dst, int *restrict src, int count)
   *dst++ = 1;
 }
 
-/* { dg-final { scan-assembler-not {\tst1w\tz} } } */
-/* { dg-final { scan-assembler-times {\tstp\tq} 2 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz} 2 } } */
+/* { dg-final { scan-assembler-not {\tstp\tq} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_14.c b/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_14.c
index a16a79e51c0..45644b67bda 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_14.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_14.c
@@ -43,12 +43,12 @@
 #undef NAME
 #undef TYPE
 
-/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+, x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+, x[0-9]+\]\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+, x[0-9]+\]\n} 1 

Re: [PATCH] options, lto: Optimize streaming of optimization nodes

2020-09-14 Thread Richard Biener
On Mon, 14 Sep 2020, Jakub Jelinek wrote:

> On Mon, Sep 14, 2020 at 09:31:52AM +0200, Richard Biener wrote:
> > But does it make any noticeable difference in the end?  Using
> 
> Yes.
> 
> > bp_pack_var_len_unsigned just causes us to [u]leb encode half-bytes
> > rather than full bytes.  Using hardcoded 8/16/32/64 makes it still
> > dependent on what 'int' is at maximum on the host.
> > 
> > That is, I'd indeed prefer bp_pack_var_len_unsigned over hard-coding
> > 8, 16, etc., but can you share a size comparison of the bitpack?
> > I guess with bp_pack_var_len_unsigned it might shrink in half
> > compared to the current code and streaming standard -O2?
> 
> So, I've tried
> --- gcc/tree-streamer-out.c.jj	2020-07-28 15:39:10.079755251 +0200
> +++ gcc/tree-streamer-out.c   2020-09-14 10:31:29.106957258 +0200
> @@ -489,7 +489,11 @@ streamer_write_tree_bitfields (struct ou
>  pack_ts_translation_unit_decl_value_fields (ob, &bp, expr);
>  
>if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
> +{
> +long ts = ob->main_stream->total_size;
> cl_optimization_stream_out (ob, &bp, TREE_OPTIMIZATION (expr));
> +fprintf (stderr, "total_size %ld\n", (long) (ob->main_stream->total_size - ts));
> +}
>  
>if (CODE_CONTAINS_STRUCT (code, TS_CONSTRUCTOR))
>  bp_pack_var_len_unsigned (&bp, CONSTRUCTOR_NELTS (expr));
> hack without and with the following patch on a simple small testcase with
> -O2 -flto.
> Got 574 bytes without the optc-save-gen.awk change and 454 bytes with it,
> that is ~ 21% saving on the TREE_OPTIMIZATION streaming.

OK, not spectacular but I guess good enough.  So I'd say the below
patch using bp_pack_var_len_{int,unsigned} is OK.
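
For the archives, the win comes from LEB-style variable-length encoding; a
generic ULEB128 encoder looks roughly like this (GCC's bitpack variant
differs in detail, e.g. in the group size it packs at a time):

  /* Sketch: 7 value bits per byte plus a continuation bit, so small
     values, the common case for option fields, take one byte instead
     of a fixed 64 bits.  */
  static size_t
  uleb_encode (unsigned long val, unsigned char *buf)
  {
    size_t n = 0;
    do
      {
        unsigned char byte = val & 0x7f;
        val >>= 7;
        if (val != 0)
          byte |= 0x80;  /* more groups follow */
        buf[n++] = byte;
      }
    while (val != 0);
    return n;
  }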

Thanks,
Richard.

> 2020-09-14  Jakub Jelinek  
> 
>   * optc-save-gen.awk: In cl_optimization_stream_out use
>   bp_pack_var_len_{int,unsigned} instead of bp_pack_value.  In
>   cl_optimization_stream_in use bp_unpack_var_len_{int,unsigned}
>   instead of bp_unpack_value.  Formatting fix.
> 
> --- gcc/optc-save-gen.awk.jj  2020-09-14 09:04:35.879854156 +0200
> +++ gcc/optc-save-gen.awk 2020-09-14 10:38:47.722424942 +0200
> @@ -1257,8 +1257,10 @@ for (i = 0; i < n_opt_val; i++) {
>   otype = var_opt_val_type[i];
>   if (otype ~ "^const char \\**$")
>   print "  bp_pack_string (ob, bp, ptr->" name", true);";
> + else if (otype ~ "^unsigned")
> + print "  bp_pack_var_len_unsigned (bp, ptr->" name");";
>   else
> - print "  bp_pack_value (bp, ptr->" name", 64);";
> + print "  bp_pack_var_len_int (bp, ptr->" name");";
>  }
>  print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
> (ptr->explicit_mask[0]); i++)";
>  print "bp_pack_value (bp, ptr->explicit_mask[i], 64);";
> @@ -1274,14 +1276,15 @@ print "{";
>  for (i = 0; i < n_opt_val; i++) {
>   name = var_opt_val[i]
>   otype = var_opt_val_type[i];
> - if (otype ~ "^const char \\**$")
> - {
> -   print "  ptr->" name" = bp_unpack_string (data_in, bp);";
> -   print "  if (ptr->" name")";
> -   print "ptr->" name" = xstrdup (ptr->" name");";
> + if (otype ~ "^const char \\**$") {
> + print "  ptr->" name" = bp_unpack_string (data_in, bp);";
> + print "  if (ptr->" name")";
> + print "ptr->" name" = xstrdup (ptr->" name");";
>   }
> + else if (otype ~ "^unsigned")
> + print "  ptr->" name" = (" var_opt_val_type[i] ") 
> bp_unpack_var_len_unsigned (bp);";
>   else
> -   print "  ptr->" name" = (" var_opt_val_type[i] ") bp_unpack_value 
> (bp, 64);";
> + print "  ptr->" name" = (" var_opt_val_type[i] ") 
> bp_unpack_var_len_int (bp);";
>  }
>  print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
> (ptr->explicit_mask[0]); i++)";
>  print "ptr->explicit_mask[i] = bp_unpack_value (bp, 64);";
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


[PATCH] rtlanal: fix subreg handling in set_noop_p ()

2020-09-14 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on x86_64-redhat-linux.  Ok for master?



The following s390 rtx is erroneously considered a no-op:

(set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))

Here, SET_DEST is the second register of a floating-point register pair,
and SET_SRC is the second half of a vector register, so they refer to
different bits.
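
For illustration (s390 register layout as I understand it):

  TF   in %f0: a register pair     -> bytes 0..7 in %f0, bytes 8..15 in %f2
  V1TF in %f0: one vector register -> bytes 0..15 all in %v0

so (subreg:DF (reg:TF %f0) 8) is the pair's second register (%f2), while
(subreg:DF (reg:V1TF %f0) 8) is the second half of %v0: the same
SUBREG_BYTE, but different bits.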

Fix by treating subregs of registers in different modes conservatively.

gcc/ChangeLog:

2020-09-11  Ilya Leoshkevich  

* rtlanal.c (set_noop_p): Treat subregs of registers in
different modes conservatively.
---
 gcc/rtlanal.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 5ae38b7966d..ed3360562d5 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1619,6 +1619,10 @@ set_noop_p (const_rtx set)
return 0;
   src = SUBREG_REG (src);
   dst = SUBREG_REG (dst);
+  if (GET_MODE (src) != GET_MODE (dst))
+   /* It is hard to tell whether the subregs refer to the same bits, so
+      act conservatively and return 0.  */
+   return 0;
 }
 
   /* It is a NOOP if destination overlaps with selected src vector
-- 
2.25.4



Re: [PATCH] options, lto: Optimize streaming of optimization nodes

2020-09-14 Thread Jan Hubicka
> On Mon, Sep 14, 2020 at 09:31:52AM +0200, Richard Biener wrote:
> > But does it make any noticeable difference in the end?  Using
> 
> Yes.
> 
> > bp_pack_var_len_unsigned just causes us to [u]leb encode half-bytes
> > rather than full bytes.  Using hardcoded 8/16/32/64 makes it still
> > dependent on what 'int' is at maximum on the host.
> > 
> > That is, I'd indeed prefer bp_pack_var_len_unsigned over hard-coding
> > 8, 16, etc., but can you share a size comparison of the bitpack?
> > I guess with bp_pack_var_len_unsigned it might shrink in half
> > compared to the current code and streaming standard -O2?
> 
> So, I've tried
> --- gcc/tree-streamer-out.c.jj	2020-07-28 15:39:10.079755251 +0200
> +++ gcc/tree-streamer-out.c   2020-09-14 10:31:29.106957258 +0200
> @@ -489,7 +489,11 @@ streamer_write_tree_bitfields (struct ou
>  pack_ts_translation_unit_decl_value_fields (ob, &bp, expr);
>  
>if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
> +{
> +long ts = ob->main_stream->total_size;
> cl_optimization_stream_out (ob, &bp, TREE_OPTIMIZATION (expr));
> +fprintf (stderr, "total_size %ld\n", (long) (ob->main_stream->total_size - ts));
> +}

You should be able to read the sizes from the streaming dump file as well.
>  
>if (CODE_CONTAINS_STRUCT (code, TS_CONSTRUCTOR))
>  bp_pack_var_len_unsigned (&bp, CONSTRUCTOR_NELTS (expr));
> hack without and with the following patch on a simple small testcase with
> -O2 -flto.
> Got 574 bytes without the optc-save-gen.awk change and 454 bytes with it,
> that is ~ 21% saving on the TREE_OPTIMIZATION streaming.
> 
> 2020-09-14  Jakub Jelinek  
> 
>   * optc-save-gen.awk: In cl_optimization_stream_out use
>   bp_pack_var_len_{int,unsigned} instead of bp_pack_value.  In
>   cl_optimization_stream_in use bp_unpack_var_len_{int,unsigned}
>   instead of bp_unpack_value.  Formatting fix.
> 
> --- gcc/optc-save-gen.awk.jj  2020-09-14 09:04:35.879854156 +0200
> +++ gcc/optc-save-gen.awk 2020-09-14 10:38:47.722424942 +0200
> @@ -1257,8 +1257,10 @@ for (i = 0; i < n_opt_val; i++) {
>   otype = var_opt_val_type[i];
>   if (otype ~ "^const char \\**$")
>   print "  bp_pack_string (ob, bp, ptr->" name", true);";
> + else if (otype ~ "^unsigned")
> + print "  bp_pack_var_len_unsigned (bp, ptr->" name");";
>   else
> - print "  bp_pack_value (bp, ptr->" name", 64);";
> + print "  bp_pack_var_len_int (bp, ptr->" name");";
>  }
>  print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
> (ptr->explicit_mask[0]); i++)";
>  print "bp_pack_value (bp, ptr->explicit_mask[i], 64);";
> @@ -1274,14 +1276,15 @@ print "{";
>  for (i = 0; i < n_opt_val; i++) {
>   name = var_opt_val[i]
>   otype = var_opt_val_type[i];
> - if (otype ~ "^const char \\**$")
> - {
> -   print "  ptr->" name" = bp_unpack_string (data_in, bp);";
> -   print "  if (ptr->" name")";
> -   print "ptr->" name" = xstrdup (ptr->" name");";
> + if (otype ~ "^const char \\**$") {
> + print "  ptr->" name" = bp_unpack_string (data_in, bp);";
> + print "  if (ptr->" name")";
> + print "ptr->" name" = xstrdup (ptr->" name");";
>   }
> + else if (otype ~ "^unsigned")
> + print "  ptr->" name" = (" var_opt_val_type[i] ") 
> bp_unpack_var_len_unsigned (bp);";
>   else
> -   print "  ptr->" name" = (" var_opt_val_type[i] ") bp_unpack_value 
> (bp, 64);";
> + print "  ptr->" name" = (" var_opt_val_type[i] ") 
> bp_unpack_var_len_int (bp);";

Not making a difference between signed/unsigned was implementation
laziness on my part at the time the code was added.  So this looks like a
nice cleanup.

Especially for the new param machinery, most of the streamed values are
probably going to be the default values.  Perhaps somehow we could
stream them more effectively.
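
One possible shape of that, as a sketch only (field and default names
hypothetical): emit a presence bit per field and the value itself only
when it differs from the compiled-in default:

  bp_pack_value (bp, ptr->x_param_foo != default_param_foo, 1);
  if (ptr->x_param_foo != default_param_foo)
    bp_pack_var_len_unsigned (bp, ptr->x_param_foo);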

Overall we should not get much more than 1 optimize/target node per unit,
so the size should show up only when you stream a lot of very small .o
files.

Honza
>  }
>  print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
> (ptr->explicit_mask[0]); i++)";
>  print "ptr->explicit_mask[i] = bp_unpack_value (bp, 64);";
> 
> 
>   Jakub
> 


Re: [PATCH] options, lto: Optimize streaming of optimization nodes

2020-09-14 Thread Jakub Jelinek via Gcc-patches
On Mon, Sep 14, 2020 at 09:31:52AM +0200, Richard Biener wrote:
> But does it make any noticeable difference in the end?  Using

Yes.

> bp_pack_var_len_unsigned just causes us to [u]leb encode half-bytes
> rather than full bytes.  Using hardcoded 8/16/32/64 makes it still
> dependent on what 'int' is at maximum on the host.
> 
> That is, I'd indeed prefer bp_pack_var_len_unsigned over hard-coding
> 8, 16, etc., but can you share a size comparison of the bitpack?
> I guess with bp_pack_var_len_unsigned it might shrink in half
> compared to the current code and streaming standard -O2?

So, I've tried
--- gcc/tree-streamer-out.c.jj  2020-07-28 15:39:10.079755251 +0200
+++ gcc/tree-streamer-out.c 2020-09-14 10:31:29.106957258 +0200
@@ -489,7 +489,11 @@ streamer_write_tree_bitfields (struct ou
 pack_ts_translation_unit_decl_value_fields (ob, &bp, expr);
 
   if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
+{
+long ts = ob->main_stream->total_size;
 cl_optimization_stream_out (ob, &bp, TREE_OPTIMIZATION (expr));
+fprintf (stderr, "total_size %ld\n", (long) (ob->main_stream->total_size - ts));
+}
 
   if (CODE_CONTAINS_STRUCT (code, TS_CONSTRUCTOR))
 bp_pack_var_len_unsigned (&bp, CONSTRUCTOR_NELTS (expr));
hack without and with the following patch on a simple small testcase with
-O2 -flto.
Got 574 bytes without the optc-save-gen.awk change and 454 bytes with it,
that is ~ 21% saving on the TREE_OPTIMIZATION streaming.

2020-09-14  Jakub Jelinek  

* optc-save-gen.awk: In cl_optimization_stream_out use
bp_pack_var_len_{int,unsigned} instead of bp_pack_value.  In
cl_optimization_stream_in use bp_unpack_var_len_{int,unsigned}
instead of bp_unpack_value.  Formatting fix.

--- gcc/optc-save-gen.awk.jj2020-09-14 09:04:35.879854156 +0200
+++ gcc/optc-save-gen.awk   2020-09-14 10:38:47.722424942 +0200
@@ -1257,8 +1257,10 @@ for (i = 0; i < n_opt_val; i++) {
otype = var_opt_val_type[i];
if (otype ~ "^const char \\**$")
print "  bp_pack_string (ob, bp, ptr->" name", true);";
+   else if (otype ~ "^unsigned")
+   print "  bp_pack_var_len_unsigned (bp, ptr->" name");";
else
-   print "  bp_pack_value (bp, ptr->" name", 64);";
+   print "  bp_pack_var_len_int (bp, ptr->" name");";
 }
 print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
(ptr->explicit_mask[0]); i++)";
 print "bp_pack_value (bp, ptr->explicit_mask[i], 64);";
@@ -1274,14 +1276,15 @@ print "{";
 for (i = 0; i < n_opt_val; i++) {
name = var_opt_val[i]
otype = var_opt_val_type[i];
-   if (otype ~ "^const char \\**$")
-   {
- print "  ptr->" name" = bp_unpack_string (data_in, bp);";
- print "  if (ptr->" name")";
- print "ptr->" name" = xstrdup (ptr->" name");";
+   if (otype ~ "^const char \\**$") {
+   print "  ptr->" name" = bp_unpack_string (data_in, bp);";
+   print "  if (ptr->" name")";
+   print "ptr->" name" = xstrdup (ptr->" name");";
}
+   else if (otype ~ "^unsigned")
+   print "  ptr->" name" = (" var_opt_val_type[i] ") 
bp_unpack_var_len_unsigned (bp);";
else
- print "  ptr->" name" = (" var_opt_val_type[i] ") bp_unpack_value 
(bp, 64);";
+   print "  ptr->" name" = (" var_opt_val_type[i] ") 
bp_unpack_var_len_int (bp);";
 }
 print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
(ptr->explicit_mask[0]); i++)";
 print "ptr->explicit_mask[i] = bp_unpack_value (bp, 64);";


Jakub



RE: [PATCH] arm: Fix up gcc.target/arm/lto/pr96939_* FAIL

2020-09-14 Thread Kyrylo Tkachov
Hi Jakub,

> -----Original Message-----
> From: Jakub Jelinek 
> Sent: 13 September 2020 09:39
> To: Richard Earnshaw ; Kyrylo Tkachov
> 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [PATCH] arm: Fix up gcc.target/arm/lto/pr96939_* FAIL
> 
> Hi!
> 
> The following patch on top of the
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553801.html
> patch fixes the gcc.target/arm/lto/pr96939_* test in certain ARM
> configurations.
> As said in the above mentioned patch, the generic code takes care of
> saving/restoring TargetVariables or Target Save options, so this just
> arranges for the generic code to save those instead of needing the
> arm backend to do it manually.
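> 
> Roughly, for an option marked Save the generated options-save code now
> does the equivalent of what the removed hooks did by hand, e.g. (a
> sketch, assuming the usual generated x_-prefixed naming):
> 
>   ptr->x_arm_arch_string = opts->x_arm_arch_string;   /* save */
>   opts->x_arm_arch_string = ptr->x_arm_arch_string;   /* restore */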
> 
> Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk?

Ok.
Thanks,
Kyrill

> 
> 2020-09-13  Jakub Jelinek  
> 
>   * config/arm/arm.opt (x_arm_arch_string, x_arm_cpu_string,
>   x_arm_tune_string): Remove TargetSave entries.
>   (march=, mcpu=, mtune=): Add Save keyword.
>   * config/arm/arm.c (arm_option_save): Remove.
>   (TARGET_OPTION_SAVE): Don't redefine.
>   (arm_option_restore): Don't restore x_arm_*_string here.
> 
> --- gcc/config/arm/arm.opt.jj 2020-01-12 11:54:36.273415521 +0100
> +++ gcc/config/arm/arm.opt2020-09-12 10:45:51.239935884 +0200
> @@ -21,15 +21,6 @@
>  HeaderInclude
>  config/arm/arm-opts.h
> 
> -TargetSave
> -const char *x_arm_arch_string
> -
> -TargetSave
> -const char *x_arm_cpu_string
> -
> -TargetSave
> -const char *x_arm_tune_string
> -
>  Enum
>  Name(tls_type) Type(enum arm_tls_type)
>  TLS dialect to use:
> @@ -82,7 +73,7 @@ mapcs-stack-check
>  Target Report Mask(APCS_STACK) Undocumented
> 
>  march=
> -Target RejectNegative Negative(march=) ToLower Joined Var(arm_arch_string)
> +Target Save RejectNegative Negative(march=) ToLower Joined Var(arm_arch_string)
>  Specify the name of the target architecture.
> 
>  ; Other arm_arch values are loaded from arm-tables.opt
> @@ -107,7 +98,7 @@ Target Report Mask(CALLER_INTERWORKING)
>  Thumb: Assume function pointers may go to non-Thumb aware code.
> 
>  mcpu=
> -Target RejectNegative Negative(mcpu=) ToLower Joined Var(arm_cpu_string)
> +Target Save RejectNegative Negative(mcpu=) ToLower Joined Var(arm_cpu_string)
>  Specify the name of the target CPU.
> 
>  mfloat-abi=
> @@ -232,7 +223,7 @@ Target Report Mask(TPCS_LEAF_FRAME)
>  Thumb: Generate (leaf) stack frames even if not needed.
> 
>  mtune=
> -Target RejectNegative Negative(mtune=) ToLower Joined Var(arm_tune_string)
> +Target Save RejectNegative Negative(mtune=) ToLower Joined Var(arm_tune_string)
>  Tune code for the given processor.
> 
>  mprint-tune-info
> --- gcc/config/arm/arm.c.jj   2020-09-11 17:44:28.643014087 +0200
> +++ gcc/config/arm/arm.c  2020-09-12 10:48:09.951888347 +0200
> @@ -247,8 +247,6 @@ static tree arm_build_builtin_va_list (v
>  static void arm_expand_builtin_va_start (tree, rtx);
>  static tree arm_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq
> *);
>  static void arm_option_override (void);
> -static void arm_option_save (struct cl_target_option *, struct gcc_options *,
> -  struct gcc_options *);
>  static void arm_option_restore (struct gcc_options *, struct gcc_options *,
>   struct cl_target_option *);
>  static void arm_override_options_after_change (void);
> @@ -443,9 +441,6 @@ static const struct attribute_spec arm_a
>  #undef TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE
>  #define TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE arm_override_options_after_change
> 
> -#undef TARGET_OPTION_SAVE
> -#define TARGET_OPTION_SAVE arm_option_save
> -
>  #undef TARGET_OPTION_RESTORE
>  #define TARGET_OPTION_RESTORE arm_option_restore
> 
> @@ -3042,24 +3037,11 @@ arm_override_options_after_change (void)
>   arm_override_options_after_change_1 (&global_options, &global_options_set);
>  }
> 
> -/* Implement TARGET_OPTION_SAVE.  */
> -static void
> -arm_option_save (struct cl_target_option *ptr, struct gcc_options *opts,
> -  struct gcc_options */* opts_set */)
> -{
> -  ptr->x_arm_arch_string = opts->x_arm_arch_string;
> -  ptr->x_arm_cpu_string = opts->x_arm_cpu_string;
> -  ptr->x_arm_tune_string = opts->x_arm_tune_string;
> -}
> -
>  /* Implement TARGET_OPTION_RESTORE.  */
>  static void
>  arm_option_restore (struct gcc_options *opts, struct gcc_options *opts_set,
>   struct cl_target_option *ptr)
>  {
> -  opts->x_arm_arch_string = ptr->x_arm_arch_string;
> -  opts->x_arm_cpu_string = ptr->x_arm_cpu_string;
> -  opts->x_arm_tune_string = ptr->x_arm_tune_string;
>   arm_configure_build_target (&arm_active_target, ptr, opts_set, false);
>  }
> 
> 
> 
>   Jakub


