Re: [PATCH AArch64] Fix test failure for pr84682-2.c

2018-05-16 Thread Kyrill Tkachov

Hi Bin,


On 22/03/18 11:07, Bin.Cheng wrote:

On Sat, Mar 17, 2018 at 8:54 AM, Richard Sandiford
 wrote:
> Kyrill  Tkachov  writes:
>> Hi Bin,
>>
>> On 16/03/18 11:42, Bin Cheng wrote:
>>> Hi,
>>> This simple patch fixes test case failure for pr84682-2.c by returning
>>> false on wrong mode rtx in aarch64_classify_address, rather than assert.
>>>
>>> Bootstrap and test on aarch64.  Is it OK?
>>>
>>> Thanks,
>>> bin
>>>
>>> 2018-03-16  Bin Cheng 
>>>
>>> * config/aarch64/aarch64.c (aarch64_classify_address): Return false
>>> on wrong mode rtx, rather than assert.
>>
>> This looks ok to me in light of
>> https://gcc.gnu.org/ml/gcc-patches/2018-03/msg00633.html
>> This function is used to validate inline asm operands too, not just
>> internally-generated addresses.
>> Therefore all kinds of garbage must be rejected gracefully rather than ICEing.
>>
>> You'll need an approval from an AArch64 maintainer though.
>
> IMO we should make address_operand itself check something like:
>
>   (GET_MODE (x) == VOIDmode || SCALAR_INT_MODE_P (GET_MODE (x)))
>
> Target-independent code fundamentally assumes that an address will not
> be a float, so I think the check should be in target-independent code
> rather than copied to each individual backend.
>
> This was only caught on aarch64 because we added the assert, but I think
> some backends ignore the mode of the address and so would actually accept
> simple float rtxes.
Hi Richard,
Thanks for the suggestion about generalizing the fix.  Here is the updated patch.
Bootstrap and test on x86_64 and AArch64, is it OK?



I guess you need a midend maintainer to ok this now.
CC'ing Jeff...

Thanks,
Kyrill


Thanks,
bin

2018-03-22  Bin Cheng  

* recog.c (address_operand): Return false on wrong mode for address.
* config/aarch64/aarch64.c (aarch64_classify_address): Remove assert
since it's checked in general code now.

>
> Thanks,
> Richard




Support fused multiply-adds in fully-masked reductions

2018-05-16 Thread Richard Sandiford
This patch adds support for fusing a conditional add or subtract
with a multiplication, so that we can use fused multiply-add and
multiply-subtract operations for fully-masked reductions.  E.g.
for SVE we vectorise:

  double res = 0.0;
  for (int i = 0; i < n; ++i)
res += x[i] * y[i];

using a fully-masked loop in which the loop body has the form:

  res_1 = PHI<0(preheader), res_2(latch)>;
  avec = IFN_MASK_LOAD (loop_mask, a)
  bvec = IFN_MASK_LOAD (loop_mask, b)
  prod = avec * bvec;
  res_2 = IFN_COND_ADD (loop_mask, res_1, prod);

where the last statement does the equivalent of:

  res_2 = loop_mask ? res_1 + prod : res_1;

(operating elementwise).  The point of the patch is to convert the last
two statements into a single internal function that is the equivalent of:

  res_2 = loop_mask ? fma (avec, bvec, res_1) : res_1;

(again operating elementwise).

All current conditional X operations have the form "do X or don't do X
to the first operand" (add/don't add to first operand, etc.).  However,
the FMA optabs and functions are ordered so that the accumulator comes
last.  There were two obvious ways of resolving this: break the
convention for conditional operators and have "add/don't add to the
final operand" or break the convention for FMA and put the accumulator
first.  The patch goes for the latter, but adds _REV to make it obvious
that the operands are in a different order.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2018-05-16  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* doc/md.texi (cond_fma_rev, cond_fnma_rev): Document.
* optabs.def (cond_fma_rev, cond_fnma_rev): New optabs.
* internal-fn.def (COND_FMA_REV, COND_FNMA_REV): New internal
functions.
* internal-fn.h (can_interpret_as_conditional_op_p): Declare.
* internal-fn.c (cond_ternary_direct): New macro.
(expand_cond_ternary_optab_fn): Likewise.
(direct_cond_ternary_optab_supported_p): Likewise.
(FOR_EACH_CODE_MAPPING): Likewise.
(get_conditional_internal_fn): Use FOR_EACH_CODE_MAPPING.
(conditional_internal_fn_code): New function.
(can_interpret_as_conditional_op_p): Likewise.
* tree-ssa-math-opts.c (fused_cond_internal_fn): New function.
(convert_mult_to_fma_1): Transform calls to IFN_COND_ADD to
IFN_COND_FMA_REV and calls to IFN_COND_SUB to IFN_COND_FNMA_REV.
(convert_mult_to_fma): Handle calls to IFN_COND_ADD and IFN_COND_SUB.
* genmatch.c (commutative_op): Handle CFN_COND_FMA_REV and
CFN_COND_FNMA_REV.
* config/aarch64/iterators.md (UNSPEC_COND_FMLA): New unspec.
(UNSPEC_COND_FMLS): Likewise.
(optab, sve_fp_op): Handle them.
(SVE_COND_INT_OP): Rename to...
(SVE_COND_INT2_OP): ...this.
(SVE_COND_FP_OP): Rename to...
(SVE_COND_FP2_OP): ...this.
(SVE_COND_FP3_OP): New iterator.
* config/aarch64/aarch64-sve.md (cond_): Update
for new iterator names.  Add a pattern for SVE_COND_FP3_OP.

gcc/testsuite/
* gcc.target/aarch64/sve/reduc_4.c: New test.
* gcc.target/aarch64/sve/reduc_6.c: Likewise.
* gcc.target/aarch64/sve/reduc_7.c: Likewise.

Index: gcc/doc/md.texi
===
--- gcc/doc/md.texi 2018-05-16 10:23:03.590853492 +0100
+++ gcc/doc/md.texi 2018-05-16 10:23:03.886838736 +0100
@@ -6367,6 +6367,32 @@ be in a normal C @samp{?:} condition.
 Operands 0, 2 and 3 all have mode @var{m}, while operand 1 has the mode
 returned by @code{TARGET_VECTORIZE_GET_MASK_MODE}.
 
+@cindex @code{cond_fma_rev@var{mode}} instruction pattern
+@item @samp{cond_fma_rev@var{mode}}
+Similar to @samp{cond_add@var{m}}, but compute:
+@smallexample
+op0 = op1 ? fma (op3, op4, op2) : op2;
+@end smallexample
+for scalars and:
+@smallexample
+op0[I] = op1[I] ? fma (op3[I], op4[I], op2[I]) : op2[I];
+@end smallexample
+for vectors.  The @samp{_rev} indicates that the addend (operand 2)
+comes first.
+
+@cindex @code{cond_fnma_rev@var{mode}} instruction pattern
+@item @samp{cond_fnma_rev@var{mode}}
+Similar to @samp{cond_fma_rev@var{m}}, but negate operand 3 before
+multiplying it.  That is, the instruction performs:
+@smallexample
+op0 = op1 ? fma (-op3, op4, op2) : op2;
+@end smallexample
+for scalars and:
+@smallexample
+op0[I] = op1[I] ? fma (-op3[I], op4[I], op2[I]) : op2[I];
+@end smallexample
+for vectors.
+
 @cindex @code{neg@var{mode}cc} instruction pattern
 @item @samp{neg@var{mode}cc}
 Similar to @samp{mov@var{mode}cc} but for conditional negation.  Conditionally
Index: gcc/optabs.def
===
--- gcc/optabs.def  2018-05-16 10:23:03.590853492 +0100
+++ gcc/optabs.def  2018-05-16 10:23:03.887838686 

Re: RFA (tree.c): PATCH to make warn_deprecated_use return bool

2018-05-16 Thread Richard Biener
On Wed, May 16, 2018 at 3:01 AM Jason Merrill  wrote:

> The function "warning" returns bool to indicate whether or not any
> diagnostic was actually emitted; warn_deprecated_use should as well.

> It's also unnecessary to duplicate the warning code between the cases
> of null or non-null "decl", since the actual warnings were the same.
> The only thing that's different is whether we indicate the source
> location of "decl".

> Tested x86_64-pc-linux-gnu.  OK for trunk?

OK.

Richard.


Re: RFA (ipa-prop): PATCHes to avoid use of deprecated copy ctor and op=

2018-05-16 Thread Richard Biener
On Wed, May 16, 2018 at 2:58 AM Jason Merrill  wrote:

> In C++11 and up, the implicitly-declared copy constructor and
> assignment operator are deprecated if one of them, or the destructor,
> is user-provided.  Implementing that in G++ turned up a few dodgy uses
> in the compiler.

> In general it's unsafe to copy an ipa_edge_args, because if one of the
> pointers is non-null you get two copies of a vec pointer, and when one
> of the objects is destroyed it frees the vec and leaves the other
> object pointing to freed memory.  This specific example is safe
> because it only copies from an object with null pointers, but it would
> be better to avoid the copy.  OK for trunk?

> It's unsafe to copy a releasing_vec for the same reason.  There are a
> few places where the copy constructor is nominally used to initialize
> a releasing_vec variable from a temporary returned from a function; in
> these cases no actual copy is done, and the function directly
> initializes the variable, so it's safe.  I made this clearer by
> declaring the copy constructor but not defining it, so uses that get
> elided are accepted, but uses that actually want to copy will fail to
> link.

> In cp_expr we defined the copy constructor to do the same thing that
> the implicit definition would do, causing the copy assignment operator
> to be deprecated.  We don't need the copy constructor, so let's remove
> it.

> Tested x86_64-pc-linux-gnu.  Are the ipa-prop bits OK for trunk?

Yes.

Richard.


Re: [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful

2018-05-16 Thread Kyrill Tkachov


On 16/05/18 10:42, Richard Biener wrote:

On Wed, May 16, 2018 at 10:37 AM Kyrill Tkachov

wrote:



On 15/05/18 10:58, Richard Biener wrote:

On Tue, May 15, 2018 at 10:20 AM Kyrill Tkachov

wrote:


Hi all,
This is a respin of James's patch from:

https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html

The original patch was approved and committed but was later reverted
because of failures on big-endian.

This tweaked version fixes the big-endian failures in
aarch64_expand_vector_init by picking the right element of VALS to move
into the low part of the vector register depending on endianness.  The
rest of the patch stays the same.  I'm looking for approval on the
aarch64 parts, as they are the ones that have changed since the last
approved version of the patch.
---
In the testcase in this patch we create an SLP vector with only two
elements. Our current vector initialisation code will first duplicate
the first element to both lanes, then overwrite the top lane with a new
value.
This duplication can be clunky and wasteful.
Better would be to simply use the fact that we will always be
overwriting
the remaining bits, and simply move the first element to the correct
place
(implicitly zeroing all other bits).
This reduces the code generation for this case, and can allow more
efficient addressing modes, and other second order benefits for AArch64
code which has been vectorized to V2DI mode.
Note that the change is generic enough to catch the case for any vector
mode, but is expected to be most useful for 2x64-bit vectorization.
Unfortunately, on its own, this would cause failures in
gcc.target/aarch64/load_v2vec_lanes_1.c and
gcc.target/aarch64/store_v2vec_lanes.c , which expect to see many more
vec_merge and vec_duplicate for their simplifications to apply. To fix
this,
add a special case to the AArch64 code if we are loading from two
memory addresses, and use the load_pair_lanes patterns directly.
We also need a new pattern in simplify-rtx.c:simplify_ternary_operation,
to catch:
  (vec_merge:OUTER
 (vec_duplicate:OUTER x:INNER)
 (subreg:OUTER y:INNER 0)
 (const_int N))
And simplify it to:
  (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
This is similar to the existing patterns which are tested in this
function,
without requiring the second operand to also be a vec_duplicate.
Bootstrapped and tested on aarch64-none-linux-gnu and tested on
aarch64-none-elf.
Note that this requires
https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
if we don't want to ICE creating broken vector zero extends.
Are the non-AArch64 parts OK?

Is (vec_merge (subreg ..) (vec_duplicate)) canonicalized to the form
you handle?  I see the (vec_merge (vec_duplicate...) (vec_concat)) case
also doesn't handle the swapped operand case.

Otherwise the middle-end parts look ok.

I don't see any explicit canonicalisation code for it.
I've updated the simplify-rtx part to handle the swapped operand case.
Is the attached patch better in this regard? I couldn't think of a clean
way to avoid duplicating some logic (beyond creating a new function away
from the callsite).

Works for me.  Were you able to actually create such RTL from testcases?
Segher, do you know where canonicalization rules are documented?
IIRC we do not actively try to canonicalize in most cases.


The documentation we have for RTL canonicalisation is at:
https://gcc.gnu.org/onlinedocs/gccint/Insn-Canonicalizations.html#Insn-Canonicalizations

It doesn't mention anything about vec_merge AFAICS so I couldn't convince 
myself that there
is a canonicalisation that we enforce (though maybe someone can prove me wrong).

Kyrill


Richard.


Thanks,
Kyrill

Thanks,
Richard.


Thanks,
James
---
2018-05-15  James Greenhalgh  
Kyrylo Tkachov  
* config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify
code generation for cases where splatting a value is not useful.
* simplify-rtx.c (simplify_ternary_operation): Simplify
vec_merge across a vec_duplicate and a paradoxical subreg forming
a vector mode to a vec_concat.
2018-05-15  James Greenhalgh  
* gcc.target/aarch64/vect-slp-dup.c: New.




Use conditional internal functions in if-conversion

2018-05-16 Thread Richard Sandiford
This patch uses IFN_COND_* to vectorise conditionally-executed,
potentially-trapping arithmetic, such as most floating-point
ops with -ftrapping-math.  E.g.:

if (cond) { ... x = a + b; ... }

becomes:

...
x = IFN_COND_ADD (cond, a, b);
...

When this transformation is done on its own, the value of x for
!cond isn't important.

However, the patch also looks for the equivalent of:

y = cond ? x : a;

in which the "then" value is the result of the conditionally-executed
operation and the "else" value is the first operand of that operation.
This "else" value is the one guaranteed by IFN_COND_* and so we can
replace y with x.

The patch also adds new conditional functions for multiplication
and division, which previously weren't needed.  This enables an
extra fully-masked reduction (of dubious value) in gcc.dg/vect/pr53773.c.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2018-05-16  Richard Sandiford  

gcc/
* internal-fn.def (IFN_COND_MUL, IFN_COND_DIV, IFN_COND_MOD): New
internal functions.
* internal-fn.h (vectorized_internal_fn_supported_p): Declare.
* internal-fn.c (FOR_EACH_CODE_MAPPING): Handle IFN_COND_MUL,
IFN_COND_DIV and IFN_COND_MOD.
(get_conditional_internal_fn): Handle RDIV_EXPR.
(can_interpret_as_conditional_op_p): Use RDIV_EXPR for floating-point
divisions.
(internal_fn_mask_index): Handle conditional internal functions.
(vectorized_internal_fn_supported_p): New function.
* optabs.def (cond_smul_optab, cond_sdiv_optab, cond_smod_optab)
(cond_udiv_optab, cond_umod_optab): New optabs.
* tree-if-conv.c: Include internal-fn.h.
(any_pred_load_store): Replace with...
(need_to_predicate): ...this new variable.
(redundant_ssa_names): New variable.
(ifcvt_can_use_mask_load_store): Move initial checks to...
(ifcvt_can_predicate): ...this new function.  Handle tree codes
for which a conditional internal function exists.
(if_convertible_gimple_assign_stmt_p): Use ifcvt_can_predicate
instead of ifcvt_can_use_mask_load_store.  Update after variable
name change.
(predicate_load_or_store): New function, split out from
predicate_mem_writes.
(check_redundant_cond_expr, predicate_rhs_code): New functions.
(predicate_mem_writes): Rename to...
(predicate_statements): ...this.  Use predicate_load_or_store
and predicate_rhs_code.
(combine_blocks, tree_if_conversion): Update after above name changes.
(ifcvt_local_dce): Handle redundant_ssa_names.
* tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Handle
general conditional functions.
* tree-vect-stmts.c (vectorizable_call): Likewise.
* config/aarch64/aarch64-sve.md (cond_): New pattern
for SVE_COND_INT2_SD_OP.
* config/aarch64/iterators.md (UNSPEC_COND_MUL, UNSPEC_COND_SDIV)
(UNSPEC_COND_UDIV): New unspecs.
(SVE_COND_INT2_OP): Include UNSPEC_COND_MUL.
(SVE_COND_INT2_SD_OP): New int iterator.
(SVE_COND_FP2_OP): Include UNSPEC_COND_MUL and UNSPEC_COND_SDIV.
(optab, sve_int_op): Handle UNSPEC_COND_MUL, UNSPEC_COND_SDIV
and UNSPEC_COND_UDIV.
(sve_fp_op): Handle UNSPEC_COND_MUL and UNSPEC_COND_SDIV.

gcc/testsuite/
* gcc.dg/vect/pr53773.c: Do not expect a scalar tail when using
fully-masked loops with a fixed vector length.
* gcc.target/aarch64/sve/cond_arith_1.c: New test.
* gcc.target/aarch64/sve/cond_arith_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_2.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_3.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_3_run.c: Likewise.

Index: gcc/internal-fn.def
===
--- gcc/internal-fn.def 2018-05-16 11:06:14.191592902 +0100
+++ gcc/internal-fn.def 2018-05-16 11:06:14.513574219 +0100
@@ -149,6 +149,11 @@ DEF_INTERNAL_OPTAB_FN (COND_FNMA_REV, EC
 
 DEF_INTERNAL_OPTAB_FN (COND_ADD, ECF_CONST, cond_add, cond_binary)
 DEF_INTERNAL_OPTAB_FN (COND_SUB, ECF_CONST, cond_sub, cond_binary)
+DEF_INTERNAL_OPTAB_FN (COND_MUL, ECF_CONST, cond_smul, cond_binary)
+DEF_INTERNAL_SIGNED_OPTAB_FN (COND_DIV, ECF_CONST, first,
+ cond_sdiv, cond_udiv, cond_binary)
+DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MOD, ECF_CONST, first,
+ cond_smod, cond_umod, cond_binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MIN, ECF_CONST, first,
  cond_smin, cond_umin, cond_binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MAX, ECF_CONST, first,
Index: gcc/internal-fn.h
===
--- gcc/internal-fn.h   2018-05-16 

Implement SLP of internal functions

2018-05-16 Thread Richard Sandiford
SLP of calls was previously restricted to built-in functions.
This patch extends it to internal functions.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2018-05-16  Richard Sandiford  

gcc/
* internal-fn.h (vectorizable_internal_fn_p): New function.
* tree-vect-slp.c (compatible_calls_p): Likewise.
(vect_build_slp_tree_1): Remove nops argument.  Handle calls
to internal functions.
(vect_build_slp_tree_2): Update call to vect_build_slp_tree_1.

gcc/testsuite/
* gcc.target/aarch64/sve/cond_arith_4.c: New test.
* gcc.target/aarch64/sve/cond_arith_4_run.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_5.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_5_run.c: Likewise.
* gcc.target/aarch64/sve/slp_14.c: Likewise.
* gcc.target/aarch64/sve/slp_14_run.c: Likewise.

Index: gcc/internal-fn.h
===
--- gcc/internal-fn.h   2018-05-16 11:06:14.513574219 +0100
+++ gcc/internal-fn.h   2018-05-16 11:12:11.872116220 +0100
@@ -158,6 +158,17 @@ direct_internal_fn_p (internal_fn fn)
   return direct_internal_fn_array[fn].type0 >= -1;
 }
 
+/* Return true if FN is a direct internal function that can be vectorized by
+   converting the return type and all argument types to vectors of the same
+   number of elements.  E.g. we can vectorize an IFN_SQRT on floats as an
+   IFN_SQRT on vectors of N floats.  */
+
+inline bool
+vectorizable_internal_fn_p (internal_fn fn)
+{
+  return direct_internal_fn_array[fn].vectorizable;
+}
+
 /* Return optab information about internal function FN.  Only meaningful
if direct_internal_fn_p (FN).  */
 
Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c 2018-05-16 11:02:46.262494712 +0100
+++ gcc/tree-vect-slp.c 2018-05-16 11:12:11.873116180 +0100
@@ -564,6 +564,41 @@ vect_get_and_check_slp_defs (vec_info *v
   return 0;
 }
 
+/* Return true if call statements CALL1 and CALL2 are similar enough
+   to be combined into the same SLP group.  */
+
+static bool
+compatible_calls_p (gcall *call1, gcall *call2)
+{
+  unsigned int nargs = gimple_call_num_args (call1);
+  if (nargs != gimple_call_num_args (call2))
+    return false;
+
+  if (gimple_call_combined_fn (call1) != gimple_call_combined_fn (call2))
+    return false;
+
+  if (gimple_call_internal_p (call1))
+    {
+      if (TREE_TYPE (gimple_call_lhs (call1))
+          != TREE_TYPE (gimple_call_lhs (call2)))
+        return false;
+      for (unsigned int i = 0; i < nargs; ++i)
+        if (TREE_TYPE (gimple_call_arg (call1, i))
+            != TREE_TYPE (gimple_call_arg (call2, i)))
+          return false;
+    }
+  else
+    {
+      if (!operand_equal_p (gimple_call_fn (call1),
+                            gimple_call_fn (call2), 0))
+        return false;
+
+      if (gimple_call_fntype (call1) != gimple_call_fntype (call2))
+        return false;
+    }
+  return true;
+}
+
 /* A subroutine of vect_build_slp_tree for checking VECTYPE, which is the
caller's attempt to find the vector type in STMT with the narrowest
element type.  Return true if VECTYPE is nonnull and if it is valid
@@ -625,8 +660,8 @@ vect_record_max_nunits (vec_info *vinfo,
 static bool
 vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
                       vec<gimple *> stmts, unsigned int group_size,
-  unsigned nops, poly_uint64 *max_nunits,
-  bool *matches, bool *two_operators)
+  poly_uint64 *max_nunits, bool *matches,
+  bool *two_operators)
 {
   unsigned int i;
   gimple *first_stmt = stmts[0], *stmt = stmts[0];
@@ -698,7 +733,9 @@ vect_build_slp_tree_1 (vec_info *vinfo,
   if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
     {
       rhs_code = CALL_EXPR;
-      if (gimple_call_internal_p (call_stmt)
+      if ((gimple_call_internal_p (call_stmt)
+           && (!vectorizable_internal_fn_p
+               (gimple_call_internal_fn (call_stmt))))
           || gimple_call_tail_p (call_stmt)
           || gimple_call_noreturn_p (call_stmt)
           || !gimple_call_nothrow_p (call_stmt)
@@ -833,11 +870,8 @@ vect_build_slp_tree_1 (vec_info *vinfo,
   if (rhs_code == CALL_EXPR)
     {
       gimple *first_stmt = stmts[0];
-      if (gimple_call_num_args (stmt) != nops
-          || !operand_equal_p (gimple_call_fn (first_stmt),
-                               gimple_call_fn (stmt), 0)
-          || gimple_call_fntype (first_stmt)
-             != gimple_call_fntype (stmt))
+      if (!compatible_calls_p (as_a <gcall *> (first_stmt),
+                               as_a <gcall *> (stmt)))
         {
           if (dump_enabled_p ())
             {
@@ -1166,8 +1200,7 @@ 

Re: [PATCH 1/2] Introduce prefetch-minimum stride option

2018-05-16 Thread Kyrill Tkachov


On 15/05/18 12:12, Luis Machado wrote:

Hi,

On 05/15/2018 06:37 AM, Kyrill Tkachov wrote:

Hi Luis,

On 14/05/18 22:18, Luis Machado wrote:

Hi,

Here's an updated version of the patch (now reverted) that addresses the 
previous bootstrap problem (signedness and long long/int conversion).

I've checked that it bootstraps properly on both aarch64-linux and x86_64-linux 
and that tests look sane.

James, would you please give this one a try to see if you can still reproduce 
PR85682? I couldn't reproduce it in multiple attempts.



The patch doesn't hit the regressions in PR85682 from what I can see.
I have a comment on the patch below.



Great. Thanks for checking Kyrill.


--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
  static bool
  should_issue_prefetch_p (struct mem_ref *ref)
  {
+  /* Some processors may have a hardware prefetcher that may conflict with
+ prefetch hints for a range of strides.  Make sure we don't issue
+ prefetches for such cases if the stride is within this particular
+ range.  */
+  if (cst_and_fits_in_hwi (ref->group->step)
+  && abs_hwi (int_cst_value (ref->group->step)) <
+  (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
+{

The '<' should go on the line below together with PREFETCH_MINIMUM_STRIDE.


I've fixed this locally now.


Thanks. I haven't followed the patch in detail, are you looking for midend 
changes approval since the last version?
Or do you need aarch64 approval?

Kyrill



Re: [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful

2018-05-16 Thread Richard Biener
On Wed, May 16, 2018 at 10:37 AM Kyrill Tkachov

wrote:


> On 15/05/18 10:58, Richard Biener wrote:
> > On Tue, May 15, 2018 at 10:20 AM Kyrill Tkachov
> > 
> > wrote:
> >
> >> Hi all,
> >> This is a respin of James's patch from:
> > https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
> >> The original patch was approved and committed but was later reverted
> > because of failures on big-endian.
> >> This tweaked version fixes the big-endian failures in
> > aarch64_expand_vector_init by picking the right
> >> element of VALS to move into the low part of the vector register
> > depending on endianness. The rest of the patch
> >> stays the same. I'm looking for approval on the aarch64 parts, as they
> > are the ones that have changed
> >> since the last approved version of the patch.
> >> ---
> >> In the testcase in this patch we create an SLP vector with only two
> >> elements. Our current vector initialisation code will first duplicate
> >> the first element to both lanes, then overwrite the top lane with a new
> >> value.
> >> This duplication can be clunky and wasteful.
> >> Better would be to simply use the fact that we will always be
> >> overwriting
> >> the remaining bits, and simply move the first element to the correct
> >> place
> >> (implicitly zeroing all other bits).
> >> This reduces the code generation for this case, and can allow more
> >> efficient addressing modes, and other second order benefits for AArch64
> >> code which has been vectorized to V2DI mode.
> >> Note that the change is generic enough to catch the case for any vector
> >> mode, but is expected to be most useful for 2x64-bit vectorization.
> >> Unfortunately, on its own, this would cause failures in
> >> gcc.target/aarch64/load_v2vec_lanes_1.c and
> >> gcc.target/aarch64/store_v2vec_lanes.c , which expect to see many more
> >> vec_merge and vec_duplicate for their simplifications to apply. To fix
> >> this,
> >> add a special case to the AArch64 code if we are loading from two
memory
> >> addresses, and use the load_pair_lanes patterns directly.
> >> We also need a new pattern in simplify-rtx.c:simplify_ternary_operation
> >> , to
> >> catch:
> >>  (vec_merge:OUTER
> >> (vec_duplicate:OUTER x:INNER)
> >> (subreg:OUTER y:INNER 0)
> >> (const_int N))
> >> And simplify it to:
> >>  (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
> >> This is similar to the existing patterns which are tested in this
> >> function,
> >> without requiring the second operand to also be a vec_duplicate.
> >> Bootstrapped and tested on aarch64-none-linux-gnu and tested on
> >> aarch64-none-elf.
> >> Note that this requires
> >> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
> >> if we don't want to ICE creating broken vector zero extends.
> >> Are the non-AArch64 parts OK?
> > Is (vec_merge (subreg ..) (vec_duplicate)) canonicalized to the form
> > you handle?  I see the (vec_merge (vec_duplicate...) (vec_concat)) case
> > also doesn't handle the swapped operand case.
> >
> > Otherwise the middle-end parts look ok.

> I don't see any explicit canonicalisation code for it.
> I've updated the simplify-rtx part to handle the swapped operand case.
> Is the attached patch better in this regard? I couldn't think of a clean
way to avoid
> duplicating some logic (beyond creating a new function away from the
callsite).

Works for me.  Were you able to actually create such RTL from testcases?
Segher, do you know where canonicalization rules are documented?
IIRC we do not actively try to canonicalize in most cases.

Richard.

> Thanks,
> Kyrill

> > Thanks,
> > Richard.
> >
> >> Thanks,
> >> James
> >> ---
> >> 2018-05-15  James Greenhalgh  
> >>Kyrylo Tkachov  
> >>* config/aarch64/aarch64.c (aarch64_expand_vector_init):
Modify
> >>code generation for cases where splatting a value is not
useful.
> >>* simplify-rtx.c (simplify_ternary_operation): Simplify
> >>vec_merge across a vec_duplicate and a paradoxical subreg
> > forming a vector
> >>mode to a vec_concat.
> >> 2018-05-15  James Greenhalgh  
> >>* gcc.target/aarch64/vect-slp-dup.c: New.


Re: [PR63185][RFC] Improve DSE with branches

2018-05-16 Thread Richard Biener
On Tue, 15 May 2018, Richard Biener wrote:

> On Tue, 15 May 2018, Richard Biener wrote:
> 
> > On Tue, 15 May 2018, Richard Biener wrote:
> > 
> > > On Tue, 15 May 2018, Richard Biener wrote:
> > > 
> > > > On Tue, 15 May 2018, Richard Biener wrote:
> > > > 
> > > > > On Mon, 14 May 2018, Kugan Vivekanandarajah wrote:
> > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > Attached patch handles PR63185 when we reach PHI with temp != NULL.
> > > > > > We could see the PHI, and if there aren't any uses of the PHI that
> > > > > > are interesting, we could ignore it?
> > > > > > 
> > > > > > Bootstrapped and regression tested on x86_64-linux-gnu.
> > > > > > Is this OK?
> > > > > 
> > > > > No, as Jeff said we can't do it this way.
> > > > > 
> > > > > If we end up with multiple VDEFs in the walk of defvar immediate
> > > > > uses we know we are dealing with a CFG fork.  We can't really
> > > > > ignore any of the paths but we have to
> > > > > 
> > > > >  a) find the merge point (and the associated VDEF)
> > > > >  b) verify for each chain of VDEFs with associated VUSEs
> > > > > up to that merge VDEF that we have no uses of the to classify
> > > > > store and collect (partial) kills
> > > > >  c) intersect kill info and continue walking from the merge point
> > > > > 
> > > > > in b) there's the optional possibility to find sinking opportunities
> > > > > in case we have kills on some paths but uses on others.  This is why
> > > > > DSE should be really merged with (store) sinking.
> > > > > 
> > > > > So if we want to enhance DSEs handling of branches then we need
> > > > > to refactor the simple dse_classify_store function.  Let me take
> > > > > an attempt at this today.
> > > > 
> > > > First (baby) step is the following - it arranges to collect the
> > > > defs we need to continue walking from and implements trivial
> > > > reduction by stopping at (full) kills.  This allows us to handle
> > > > the new testcase (which was already handled in the very late DSE
> > > > pass with the help of sinking the store).
> > > > 
> > > > I took the opportunity to kill the use_stmt parameter of
> > > > dse_classify_store as the only user is only looking for whether
> > > > the kills were all clobbers which I added a new parameter for.
> > > > 
> > > > I didn't adjust the byte-tracking case fully (I'm not fully 
> > > > understanding
> > > > the code in the case of a use and I'm not sure whether it's worth
> > > > doing the def reduction with byte-tracking).
> > > > 
> > > > Your testcase can be handled by reducing the PHI and the call def
> > > > by seeing that the only use of a candidate def is another def
> > > > we have already processed.  Not sure if worth special-casing though,
> > > > I'd rather have a go at "recursing".  That will be the next
> > > > patch.
> > > > 
> > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> > > 
> > > Applied.
> > > 
> > > Another intermediate one below, fixing the byte-tracking for
> > > stmt with uses.  This also re-does the PHI handling by simply
> > > avoiding recursion by means of a visited bitmap and stopping
> > > at the DSE classify stmt when re-visiting it instead of failing.
> > > This covers Prathamesh's loop case for which I added ssa-dse-33.c.
> > > For the do-while loop this still runs into the inability to
> > > handle two defs to walk from.
> > > 
> > > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> > 
> > Ok, loop handling doesn't work in general since we run into the
> > issue that SSA form across the backedge is not representing the
> > same values.  Consider
> > 
> >  
> >  # .MEM_22 = PHI <.MEM_12(D)(2), .MEM_13(4)>
> >  # n_20 = PHI <0(2), n_7(4)>
> >  # .MEM_13 = VDEF <.MEM_22>
> >  bytes[n_20] = _4;
> >  if (n_20 > 7)
> >goto ;
> > 
> >  
> >  n_7 = n_20 + 1;
> >  # .MEM_15 = VDEF <.MEM_13>
> >  bytes[n_20] = _5;
> >  goto ;
> > 
> > then when classifying the store in bb4, visiting the PHI node
> > gets us to the store in bb3 which appears to be killing.
> > 
> >if (gimple_code (temp) == GIMPLE_PHI)
> > -   defvar = PHI_RESULT (temp);
> > +   {
> > + /* If we visit this PHI by following a backedge then reset
> > +any info in ref that may refer to SSA names which we'd need
> > +to PHI translate.  */
> > + if (gimple_bb (temp) == gimple_bb (stmt)
> > + || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt),
> > +gimple_bb (temp)))
> > +   /* ???  ref->ref may not refer to SSA names or it may only
> > +  refer to SSA names that are invariant with respect to the
> > +  loop represented by this PHI node.  */
> > +   ref->ref = NULL_TREE;
> > + defvar = PHI_RESULT (temp);
> > + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar));
> > +   }
> > 
> > should be a workable solution for that.  I'm checking that, but
> > eventually you can think of other things that might prevent 

Re: [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful

2018-05-16 Thread Kyrill Tkachov


On 15/05/18 10:58, Richard Biener wrote:

On Tue, May 15, 2018 at 10:20 AM Kyrill Tkachov wrote:


Hi all,

This is a respin of James's patch from:
https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html

The original patch was approved and committed but was later reverted
because of failures on big-endian.

This tweaked version fixes the big-endian failures in
aarch64_expand_vector_init by picking the right element of VALS to move
into the low part of the vector register depending on endianness. The
rest of the patch stays the same. I'm looking for approval on the
aarch64 parts, as they are the ones that have changed since the last
approved version of the patch.
---
In the testcase in this patch we create an SLP vector with only two
elements. Our current vector initialisation code will first duplicate
the first element to both lanes, then overwrite the top lane with a new
value.
This duplication can be clunky and wasteful.
Better would be to simply use the fact that we will always be overwriting
the remaining bits, and simply move the first element to the correct place
(implicitly zeroing all other bits).
This reduces the code generation for this case, and can allow more
efficient addressing modes, and other second order benefits for AArch64
code which has been vectorized to V2DI mode.
Note that the change is generic enough to catch the case for any vector
mode, but is expected to be most useful for 2x64-bit vectorization.
Unfortunately, on its own, this would cause failures in
gcc.target/aarch64/load_v2vec_lanes_1.c and
gcc.target/aarch64/store_v2vec_lanes.c, which expect to see many more
vec_merge and vec_duplicate for their simplifications to apply. To fix
this, add a special case to the AArch64 code if we are loading from two
memory addresses, and use the load_pair_lanes patterns directly.
We also need a new pattern in simplify-rtx.c:simplify_ternary_operation,
to catch:
 (vec_merge:OUTER
(vec_duplicate:OUTER x:INNER)
(subreg:OUTER y:INNER 0)
(const_int N))
And simplify it to:
 (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
This is similar to the existing patterns which are tested in this function,
without requiring the second operand to also be a vec_duplicate.
Bootstrapped and tested on aarch64-none-linux-gnu and tested on
aarch64-none-elf.
Note that this requires
https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
if we don't want to ICE creating broken vector zero extends.
Are the non-AArch64 parts OK?

Is (vec_merge (subreg ..) (vec_duplicate)) canonicalized to the form
you handle?  I see the (vec_merge (vec_duplicate...) (vec_concat)) case
also doesn't handle the swapped operand case.

Otherwise the middle-end parts looks ok.


I don't see any explicit canonicalisation code for it.
I've updated the simplify-rtx part to handle the swapped operand case.
Is the attached patch better in this regard? I couldn't think of a clean
way to avoid duplicating some logic (beyond creating a new function away
from the callsite).

Thanks,
Kyrill


Thanks,
Richard.


Thanks,
James
---
2018-05-15  James Greenhalgh  
   Kyrylo Tkachov  
   * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify
   code generation for cases where splatting a value is not useful.
   * simplify-rtx.c (simplify_ternary_operation): Simplify
   vec_merge across a vec_duplicate and a paradoxical subreg
   forming a vector mode to a vec_concat.
2018-05-15  James Greenhalgh  
   * gcc.target/aarch64/vect-slp-dup.c: New.


diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a2003fe52875f1653d644347bafd7773d1f01e91..6bf6c05535b61eef1021d46bcd8448fb3a0b25f4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13916,9 +13916,54 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 	maxv = matches[i][1];
 	  }
 
-  /* Create a duplicate of the most common element.  */
-  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
-  aarch64_emit_move (target, gen_vec_duplicate (mode, x));
+  /* Create a duplicate of the most common element, unless all elements
+	 are equally useless to us, in which case just immediately set the
+	 vector register using the first element.  */
+
+  if (maxv == 1)
+	{
+	  /* For vectors of two 64-bit elements, we can do even better.  */
+	  if (n_elts == 2
+	  && (inner_mode == E_DImode
+		  || inner_mode == E_DFmode))
+
+	{
+	  rtx x0 = XVECEXP (vals, 0, 0);
+	  rtx x1 = XVECEXP (vals, 0, 1);
+	  /* Combine can pick up this case, but handling it directly
+		 here leaves clearer RTL.
+
+		 This is load_pair_lanes, and also gives us a clean-up
+		 for store_pair_lanes.  */
+	  if (memory_operand (x0, 

[PATCH] Support lower and upper limit for -fdbg-cnt flag.

2018-05-16 Thread Martin Liška
Hi.

I consider it handy sometimes to trigger just a single invocation of
an optimization driven by a debug counter. Doing that, one needs to
be able to set both the lower and the upper limit of a counter. It's
implemented in the patch.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Martin

gcc/ChangeLog:

2018-05-15  Martin Liska  

* dbgcnt.c (limit_low): Renamed from limit.
(limit_high): New variable.
(dbg_cnt_is_enabled): Check for upper limit.
(dbg_cnt): Adjust dumping.
(dbg_cnt_set_limit_by_index): Add new argument for high
value.
(dbg_cnt_set_limit_by_name): Likewise.
(dbg_cnt_process_single_pair): Parse new format.
(dbg_cnt_process_opt): Use strtok.
(dbg_cnt_list_all_counters): Remove 'value' and add
'limit_high'.
* doc/invoke.texi: Document changes.

gcc/testsuite/ChangeLog:

2018-05-15  Martin Liska  

* gcc.dg/ipa/ipa-icf-39.c: New test.
* gcc.dg/pr68766.c: Adjust pruned output.
---
 gcc/common.opt|   2 +-
 gcc/dbgcnt.c  | 113 ++
 gcc/doc/invoke.texi   |   9 +--
 gcc/testsuite/gcc.dg/ipa/ipa-icf-39.c |  33 ++
 gcc/testsuite/gcc.dg/pr68766.c|   2 +-
 5 files changed, 112 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-icf-39.c


diff --git a/gcc/common.opt b/gcc/common.opt
index d6ef85928f3..d2f8736a62d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1171,7 +1171,7 @@ List all available debugging counters with their limits and counts.
 
 fdbg-cnt=
 Common RejectNegative Joined Var(common_deferred_options) Defer
--fdbg-cnt=<counter>:<limit>[,<counter>:<limit>,...]	Set the debug counter limit.
+-fdbg-cnt=<counter>[:<lower_limit>]:<upper_limit>[,<counter>:<lower_limit>:<upper_limit>,...]	Set the debug counter limit.
 
 fdebug-prefix-map=
 Common Joined RejectNegative Var(common_deferred_options) Defer
diff --git a/gcc/dbgcnt.c b/gcc/dbgcnt.c
index 96b3df28f5e..13d0ad8190a 100644
--- a/gcc/dbgcnt.c
+++ b/gcc/dbgcnt.c
@@ -41,53 +41,72 @@ static struct string2counter_map map[debug_counter_number_of_counters] =
 #undef DEBUG_COUNTER
 
 #define DEBUG_COUNTER(a) UINT_MAX,
-static unsigned int limit[debug_counter_number_of_counters] =
+static unsigned int limit_high[debug_counter_number_of_counters] =
 {
 #include "dbgcnt.def"
 };
 #undef DEBUG_COUNTER
 
+#define DEBUG_COUNTER(a) 1,
+static unsigned int limit_low[debug_counter_number_of_counters] =
+{
+#include "dbgcnt.def"
+};
+#undef DEBUG_COUNTER
+
+
 static unsigned int count[debug_counter_number_of_counters];
 
 bool
 dbg_cnt_is_enabled (enum debug_counter index)
 {
-  return count[index] <= limit[index];
+  return limit_low[index] <= count[index] && count[index] <= limit_high[index];
 }
 
 bool
 dbg_cnt (enum debug_counter index)
 {
   count[index]++;
-  if (dump_file && count[index] == limit[index])
-fprintf (dump_file, "***dbgcnt: limit reached for %s.***\n",
-	 map[index].name);
+
+  if (dump_file)
+{
+  /* Do not print the info for default lower limit.  */
+  if (count[index] == limit_low[index] && limit_low[index] > 1)
+	fprintf (dump_file, "***dbgcnt: lower limit %d reached for %s.***\n",
+		 limit_low[index], map[index].name);
+  else if (count[index] == limit_high[index])
+	fprintf (dump_file, "***dbgcnt: upper limit %d reached for %s.***\n",
+		 limit_high[index], map[index].name);
+}
 
   return dbg_cnt_is_enabled (index);
 }
 
-
 static void
-dbg_cnt_set_limit_by_index (enum debug_counter index, int value)
+dbg_cnt_set_limit_by_index (enum debug_counter index, int low, int high)
 {
-  limit[index] = value;
+  limit_low[index] = low;
+  limit_high[index] = high;
 
-  fprintf (stderr, "dbg_cnt '%s' set to %d\n", map[index].name, value);
+  fprintf (stderr, "dbg_cnt '%s' set to %d-%d\n", map[index].name, low, high);
 }
 
 static bool
-dbg_cnt_set_limit_by_name (const char *name, int len, int value)
+dbg_cnt_set_limit_by_name (const char *name, int low, int high)
 {
+  if (high < low)
+error ("-fdbg-cnt=%s:%d:%d has smaller upper limit than the lower",
+	   name, low, high);
+
   int i;
   for (i = debug_counter_number_of_counters - 1; i >= 0; i--)
-if (strncmp (map[i].name, name, len) == 0
-&& map[i].name[len] == '\0')
+if (strcmp (map[i].name, name) == 0)
   break;
 
   if (i < 0)
 return false;
 
-  dbg_cnt_set_limit_by_index ((enum debug_counter) i, value);
+  dbg_cnt_set_limit_by_index ((enum debug_counter) i, low, high);
   return true;
 }
 
@@ -96,42 +115,53 @@ dbg_cnt_set_limit_by_name (const char *name, int len, int value)
Returns NULL if there's no valid pair is found.
Otherwise returns a pointer to the end of the pair. */
 
-static const char *
+static bool
 dbg_cnt_process_single_pair (const char *arg)
 {
-   const char *colon = strchr (arg, ':');
-   char *endptr = NULL;
-   int value;
-
-   if (colon == NULL)
- return NULL;
-
-   

Re: [PATCH ARM] Fix armv8-m multilib build failure with stdint.h

2018-05-16 Thread Jérôme Lambourg
Hello Kyrill,

> Thanks for the patch! To validate your changes you can also look at the 
> disassembly
> of the cmse.c binary in the build tree. If the binary changes with your patch 
> then that
> would indicate some trouble.

Good idea. So I just did that and the assembly of both objects are identical
(before and after the patch).

> There are places in arm_cmse.h that use intptr_t. You should replace those as 
> well.
> Look for the cmse_nsfptr_create and cmse_is_nsfptr macros...

Indeed, good catch. I did not see those as this part is not included in the 
armv8-m.

Below the updated patch and modified changelog.

2018-05-16  Jerome Lambourg  
gcc/
* config/arm/arm_cmse.h (cmse_nsfptr_create, cmse_is_nsfptr): Remove
#include <stdint.h>.  Replace intptr_t with __INTPTR_TYPE__.

libgcc/
* config/arm/cmse.c (cmse_check_address_range): Replace
UINTPTR_MAX with __UINTPTR_MAX__ and uintptr_t with __UINTPTR_TYPE__.



gcc.patch
Description: Binary data


Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-05-16 Thread Richard Earnshaw (lists)
On 15/05/18 17:58, Wilco Dijkstra wrote:
> Hi,
> 
>> Which doesn't appear to have been approved.  Did you follow up with Jeff?
> 
> I'll get back to that one at some point - it'll take some time to agree on a 
> way
> forward with the callback.
> 
> Wilco
> 
> 

So it seems to me that this should then be queued until that is resolved.

R.


Re: [PR63185][RFC] Improve DSE with branches

2018-05-16 Thread Richard Biener
On Wed, 16 May 2018, Richard Biener wrote:

> On Tue, 15 May 2018, Richard Biener wrote:
> 
> > On Tue, 15 May 2018, Richard Biener wrote:
> > 
> > > On Tue, 15 May 2018, Richard Biener wrote:
> > > 
> > > > On Tue, 15 May 2018, Richard Biener wrote:
> > > > 
> > > > > On Tue, 15 May 2018, Richard Biener wrote:
> > > > > 
> > > > > > On Mon, 14 May 2018, Kugan Vivekanandarajah wrote:
> > > > > > 
> > > > > > > Hi,
> > > > > > > 
> > > > > > > Attached patch handles PR63185 when we reach PHI with temp != 
> > > > > > > NULL.
> > > > > > > We could see the PHI and if there isn't any uses for PHI that is
> > > > > > > interesting, we could ignore that ?
> > > > > > > 
> > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu.
> > > > > > > Is this OK?
> > > > > > 
> > > > > > No, as Jeff said we can't do it this way.
> > > > > > 
> > > > > > If we end up with multiple VDEFs in the walk of defvar immediate
> > > > > > uses we know we are dealing with a CFG fork.  We can't really
> > > > > > ignore any of the paths but we have to
> > > > > > 
> > > > > >  a) find the merge point (and the associated VDEF)
> > > > > >  b) verify for each each chain of VDEFs with associated VUSEs
> > > > > > up to that merge VDEF that we have no uses of the to classify
> > > > > > store and collect (partial) kills
> > > > > >  c) intersect kill info and continue walking from the merge point
> > > > > > 
> > > > > > in b) there's the optional possibility to find sinking opportunities
> > > > > > in case we have kills on some paths but uses on others.  This is why
> > > > > > DSE should be really merged with (store) sinking.
> > > > > > 
> > > > > > So if we want to enhance DSEs handling of branches then we need
> > > > > > to refactor the simple dse_classify_store function.  Let me take
> > > > > > an attempt at this today.
> > > > > 
> > > > > First (baby) step is the following - it arranges to collect the
> > > > > defs we need to continue walking from and implements trivial
> > > > > reduction by stopping at (full) kills.  This allows us to handle
> > > > > the new testcase (which was already handled in the very late DSE
> > > > > pass with the help of sinking the store).
> > > > > 
> > > > > I took the opportunity to kill the use_stmt parameter of
> > > > > dse_classify_store as the only user is only looking for whether
> > > > > the kills were all clobbers which I added a new parameter for.
> > > > > 
> > > > > I didn't adjust the byte-tracking case fully (I'm not fully 
> > > > > understanding
> > > > > the code in the case of a use and I'm not sure whether it's worth
> > > > > doing the def reduction with byte-tracking).
> > > > > 
> > > > > Your testcase can be handled by reducing the PHI and the call def
> > > > > by seeing that the only use of a candidate def is another def
> > > > > we have already processed.  Not sure if worth special-casing though,
> > > > > I'd rather have a go at "recursing".  That will be the next
> > > > > patch.
> > > > > 
> > > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> > > > 
> > > > Applied.
> > > > 
> > > > Another intermediate one below, fixing the byte-tracking for
> > > > stmt with uses.  This also re-does the PHI handling by simply
> > > > avoiding recursion by means of a visited bitmap and stopping
> > > > at the DSE classify stmt when re-visiting it instead of failing.
> > > > This covers Prathamesh's loop case for which I added ssa-dse-33.c.
> > > > For the do-while loop this still runs into the inability to
> > > > handle two defs to walk from.
> > > > 
> > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> > > 
> > > Ok, loop handling doesn't work in general since we run into the
> > > issue that SSA form across the backedge is not representing the
> > > same values.  Consider
> > > 
> > >  <bb 3>:
> > >  # .MEM_22 = PHI <.MEM_12(D)(2), .MEM_13(4)>
> > >  # n_20 = PHI <0(2), n_7(4)>
> > >  # .MEM_13 = VDEF <.MEM_22>
> > >  bytes[n_20] = _4;
> > >  if (n_20 > 7)
> > >goto ;
> > > 
> > >  <bb 4>:
> > >  n_7 = n_20 + 1;
> > >  # .MEM_15 = VDEF <.MEM_13>
> > >  bytes[n_20] = _5;
> > >  goto <bb 3>;
> > > 
> > > then when classifying the store in bb4, visiting the PHI node
> > > gets us to the store in bb3 which appears to be killing.
> > > 
> > >if (gimple_code (temp) == GIMPLE_PHI)
> > > -   defvar = PHI_RESULT (temp);
> > > +   {
> > > + /* If we visit this PHI by following a backedge then reset
> > > +any info in ref that may refer to SSA names which we'd need
> > > +to PHI translate.  */
> > > + if (gimple_bb (temp) == gimple_bb (stmt)
> > > + || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt),
> > > +gimple_bb (temp)))
> > > +   /* ???  ref->ref may not refer to SSA names or it may only
> > > +  refer to SSA names that are invariant with respect to the
> > > +  loop represented by this PHI node.  */
> > > +   

Re: [PATCH] Support lower and upper limit for -fdbg-cnt flag.

2018-05-16 Thread Martin Liška
On 05/16/2018 03:39 PM, Alexander Monakov wrote:
> On Wed, 16 May 2018, Martin Liška wrote:
>>> Hm, is the off-by-one in the new explanatory text really intended? I think
>>> the previous text was accurate, and the new text should say "9th and 10th"
>>> and then "first 10 invocations", unless I'm missing something?
>>
>> I've reconsidered that once more time and having zero-based values:
>> * -fdbg-cnt=event:N - trigger event N-times
>> * -fdbg-cnt=event:N:(N+M) - skip event N-times and then enable it M-1 times
>>
>> Does that make sense?
> 
> Yes, I like this, but I think the implementation does not match. New docs say:
> 
>> -For example, with @option{-fdbg-cnt=dce:10,tail_call:0},
>> -@code{dbg_cnt(dce)} returns true only for first 10 invocations.
>> +For example, with @option{-fdbg-cnt=dce:2:4,tail_call:10},
>> +@code{dbg_cnt(dce)} returns true only for third and fourth invocation.
>> +For @code{dbg_cnt(tail_call)} true is returned for first 10 invocations.
> 
> which is good, but the implementation reads:
> 
>>  bool
>>  dbg_cnt_is_enabled (enum debug_counter index)
>>  {
>> -  return count[index] <= limit[index];
>> +  unsigned v = count[index];
>> +  return v >= limit_low[index] && v < limit_high[index];
>>  }
> 
> which I believe is misaligned with the docs' intention. It should be the
> other way around:
> 
>   return v > limit_low[index] && v <= limit_high[index];

Note that I changed count[index]++ to happen after dbg_cnt_is_enabled.
I'm reverting that and now it works fine with your condition.

Martin

> 
> Alexander
> 

From 8d21c709fdc956f951454e910557cd86f73d6a98 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 15 May 2018 15:04:30 +0200
Subject: [PATCH] Support lower and upper limit for -fdbg-cnt flag.

gcc/ChangeLog:

2018-05-15  Martin Liska  

	* dbgcnt.c (limit_low): Renamed from limit.
	(limit_high): New variable.
	(dbg_cnt_is_enabled): Check for upper limit.
	(dbg_cnt): Adjust dumping.
	(dbg_cnt_set_limit_by_index): Add new argument for high
	value.
	(dbg_cnt_set_limit_by_name): Likewise.
	(dbg_cnt_process_single_pair): Parse new format.
	(dbg_cnt_process_opt): Use strtok.
	(dbg_cnt_list_all_counters): Remove 'value' and add
	'limit_high'.
	* doc/invoke.texi: Document changes.

gcc/testsuite/ChangeLog:

2018-05-15  Martin Liska  

	* gcc.dg/ipa/ipa-icf-39.c: New test.
	* gcc.dg/pr68766.c: Adjust pruned output.
---
 gcc/common.opt|   2 +-
 gcc/dbgcnt.c  | 125 +++---
 gcc/doc/invoke.texi   |  13 ++--
 gcc/testsuite/gcc.dg/ipa/ipa-icf-39.c |  33 +
 gcc/testsuite/gcc.dg/pr68766.c|   2 +-
 5 files changed, 127 insertions(+), 48 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-icf-39.c

diff --git a/gcc/common.opt b/gcc/common.opt
index d6ef85928f3..13ab5c65d43 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1171,7 +1171,7 @@ List all available debugging counters with their limits and counts.
 
 fdbg-cnt=
 Common RejectNegative Joined Var(common_deferred_options) Defer
--fdbg-cnt=<counter>:<limit>[,<counter>:<limit>,...]	Set the debug counter limit.
+-fdbg-cnt=<counter>[:<lower_limit>]:<upper_limit>[,<counter>:...]	Set the debug counter limit.
 
 fdebug-prefix-map=
 Common Joined RejectNegative Var(common_deferred_options) Defer
diff --git a/gcc/dbgcnt.c b/gcc/dbgcnt.c
index 96b3df28f5e..ddb0e8e76d9 100644
--- a/gcc/dbgcnt.c
+++ b/gcc/dbgcnt.c
@@ -41,53 +41,84 @@ static struct string2counter_map map[debug_counter_number_of_counters] =
 #undef DEBUG_COUNTER
 
 #define DEBUG_COUNTER(a) UINT_MAX,
-static unsigned int limit[debug_counter_number_of_counters] =
+static unsigned int limit_high[debug_counter_number_of_counters] =
 {
 #include "dbgcnt.def"
 };
 #undef DEBUG_COUNTER
 
+static unsigned int limit_low[debug_counter_number_of_counters];
+
 static unsigned int count[debug_counter_number_of_counters];
 
 bool
 dbg_cnt_is_enabled (enum debug_counter index)
 {
-  return count[index] <= limit[index];
+  unsigned v = count[index];
+  return v > limit_low[index] && v <= limit_high[index];
 }
 
 bool
 dbg_cnt (enum debug_counter index)
 {
   count[index]++;
-  if (dump_file && count[index] == limit[index])
-fprintf (dump_file, "***dbgcnt: limit reached for %s.***\n",
-	 map[index].name);
+
+  if (dump_file)
+{
+  /* Do not print the info for default lower limit.  */
+  if (count[index] == limit_low[index] && limit_low[index] > 0)
+	fprintf (dump_file, "***dbgcnt: lower limit %d reached for %s.***\n",
+		 limit_low[index], map[index].name);
+  else if (count[index] == limit_high[index])
+	fprintf (dump_file, "***dbgcnt: upper limit %d reached for %s.***\n",
+		 limit_high[index], map[index].name);
+}
 
   return dbg_cnt_is_enabled (index);
 }
 
-
 static void
-dbg_cnt_set_limit_by_index (enum debug_counter index, int value)
+dbg_cnt_set_limit_by_index (enum debug_counter index, int low, int high)
 {
-  limit[index] = value;
+  limit_low[index] = low;
+  

[PATCH PR85793] Fix ICE by loading vector(1) scalar_type for 1 element-wise case

2018-05-16 Thread Bin Cheng
Hi,
This patch fixes ICE by loading vector(1) scalar_type if it's 1 element-wise 
for VMAT_ELEMENTWISE.
Bootstrap and test on x86_64 and AArch64 ongoing.  Is it OK?

Thanks,
bin
2018-05-16  Bin Cheng  
Richard Biener  

PR tree-optimization/85793
* tree-vect-stmts.c (vectorizable_load): Handle 1 element-wise load
for VMAT_ELEMENTWISE.

gcc/testsuite
2018-05-16  Bin Cheng  

PR tree-optimization/85793
* gcc.dg/vect/pr85793.c: New test.
From 85ef7f0c6ee0cb89804f1cd9d5a39ba26f8aaba3 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 16 May 2018 14:30:06 +0100
Subject: [PATCH] pr85793-20180515

---
 gcc/testsuite/gcc.dg/vect/pr85793.c | 12 
 gcc/tree-vect-stmts.c   |  4 
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr85793.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr85793.c b/gcc/testsuite/gcc.dg/vect/pr85793.c
new file mode 100644
index 000..9b5d518
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr85793.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_perm } */
+
+int a, c, d;
+long b[6];
+void fn1() {
+  for (; a < 2; a++) {
+c = 0;
+for (; c <= 5; c++)
+  d &= b[a * 3];
+  }
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 1e8ccbc..64a157d 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7662,6 +7662,10 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
 	}
 	  ltype = build_aligned_type (ltype, TYPE_ALIGN (TREE_TYPE (vectype)));
 	}
+  /* Load vector(1) scalar_type if it's 1 element-wise vectype.  */
+  else if (nloads == 1)
+	ltype = vectype;
+
   if (slp)
 	{
 	  /* For SLP permutation support we need to load the whole group,
-- 
1.9.1



C++ PATCH for c++/85363, wrong-code with throwing list-initializer

2018-05-16 Thread Marek Polacek
This PR has been on my mind for quite some time but I've not found a solution
that I'd like.  Maybe one of you can think of something better.

The problem in this test is that in C++11, .eh optimizes out the catch,
so the exception is never caught.  That is because lower_catch doesn't
think that the region may throw (eh_region_may_contain_throw).  That's
so because P::P () is marked as TREE_NOTHROW, which is wrong, because
it calls X::X() which calls init() with throw.  TREE_NOTHROW is set in
finish_function:

  /* If this function can't throw any exceptions, remember that.  */
  if (!processing_template_decl
  && !cp_function_chain->can_throw
  && !flag_non_call_exceptions
  && !decl_replaceable_p (fndecl))
TREE_NOTHROW (fndecl) = 1;

P::P() should have been marked as can_throw in set_flags_from_callee, but when
processing X::X() cfun is null, so we can't set it.  P::P() is created only
later via implicitly_declare_fn.  So one way to fix it would be to remember
that the class has a subobject whose constructor may throw, so the class's
constructor can throw, too.  I added a test with more nested classes, too.

I dislike adding a new flag for this scenario; anybody see a better way to
approach this?

Bootstrapped/regtested on x86_64-linux.

2018-05-16  Marek Polacek  

PR c++/85363
* cp-tree.h (struct lang_type): Add ctor_may_throw.
(CLASSTYPE_CTOR_MAY_THROW): New.
* call.c (set_flags_from_callee): Set it.
* decl.c (finish_function): Check it.

* g++.dg/cpp0x/initlist-throw1.C: New test.
* g++.dg/cpp0x/initlist-throw2.C: New test.

diff --git gcc/cp/call.c gcc/cp/call.c
index 09a3618b007..f839d943443 100644
--- gcc/cp/call.c
+++ gcc/cp/call.c
@@ -332,6 +332,8 @@ set_flags_from_callee (tree call)
 
   if (!nothrow && at_function_scope_p () && cfun && cp_function_chain)
 cp_function_chain->can_throw = 1;
+  else if (!nothrow && at_class_scope_p () && decl && DECL_CONSTRUCTOR_P (decl))
+    CLASSTYPE_CTOR_MAY_THROW (current_class_type) = true;
 
   if (decl && TREE_THIS_VOLATILE (decl) && cfun && cp_function_chain)
 current_function_returns_abnormally = 1;
diff --git gcc/cp/cp-tree.h gcc/cp/cp-tree.h
index cab926028b8..7a781a9a8a3 100644
--- gcc/cp/cp-tree.h
+++ gcc/cp/cp-tree.h
@@ -2025,6 +2025,7 @@ struct GTY(()) lang_type {
   unsigned has_constexpr_ctor : 1;
   unsigned unique_obj_representations : 1;
   unsigned unique_obj_representations_set : 1;
+  unsigned ctor_may_throw : 1;
 
   /* When adding a flag here, consider whether or not it ought to
  apply to a template instance if it applies to the template.  If
@@ -2033,7 +2034,7 @@ struct GTY(()) lang_type {
   /* There are some bits left to fill out a 32-bit word.  Keep track
  of this by updating the size of this bitfield whenever you add or
  remove a flag.  */
-  unsigned dummy : 4;
+  unsigned dummy : 3;
 
   tree primary_base;
   vec *vcall_indices;
@@ -2105,6 +2106,10 @@ struct GTY(()) lang_type {
 #define CLASSTYPE_LAZY_DESTRUCTOR(NODE) \
   (LANG_TYPE_CLASS_CHECK (NODE)->lazy_destructor)
 
+/* Nonzero means that NODE (a class type) has a constructor that can throw.  */
+#define CLASSTYPE_CTOR_MAY_THROW(NODE) \
+  (LANG_TYPE_CLASS_CHECK (NODE)->ctor_may_throw)
+
 /* Nonzero means that NODE (a class type) is final */
 #define CLASSTYPE_FINAL(NODE) \
   TYPE_FINAL_P (NODE)
diff --git gcc/cp/decl.c gcc/cp/decl.c
index 10e3079beed..c5799459210 100644
--- gcc/cp/decl.c
+++ gcc/cp/decl.c
@@ -15651,7 +15651,11 @@ finish_function (bool inline_p)
   if (!processing_template_decl
   && !cp_function_chain->can_throw
   && !flag_non_call_exceptions
-  && !decl_replaceable_p (fndecl))
+  && !decl_replaceable_p (fndecl)
+  /* If FNDECL is a constructor of a class that can call a throwing
+	 constructor, don't mark it as non-throwing.  */
+  && (!DECL_CONSTRUCTOR_P (fndecl)
+	  || !CLASSTYPE_CTOR_MAY_THROW (DECL_CONTEXT (fndecl)))
 TREE_NOTHROW (fndecl) = 1;
 
   /* This must come after expand_function_end because cleanups might
diff --git gcc/testsuite/g++.dg/cpp0x/initlist-throw1.C gcc/testsuite/g++.dg/cpp0x/initlist-throw1.C
index e69de29bb2d..264c6c7a7a0 100644
--- gcc/testsuite/g++.dg/cpp0x/initlist-throw1.C
+++ gcc/testsuite/g++.dg/cpp0x/initlist-throw1.C
@@ -0,0 +1,29 @@
+// PR c++/85363
+// { dg-do run { target c++11 } }
+
+int
+init (int f)
+{
+  throw f;
+}
+
+struct X {
+  X (int f) : n {init (f)} {}
+  int n;
+};
+
+struct P {
+  X x{20};
+};
+
+int
+main ()
+{
+  try {
+P p {};
+  }
+  catch (int n) {
+return 0;
+  }
+  return 1;
+}
diff --git gcc/testsuite/g++.dg/cpp0x/initlist-throw2.C gcc/testsuite/g++.dg/cpp0x/initlist-throw2.C
index e69de29bb2d..24906374fa4 100644
--- gcc/testsuite/g++.dg/cpp0x/initlist-throw2.C
+++ gcc/testsuite/g++.dg/cpp0x/initlist-throw2.C
@@ -0,0 +1,33 @@
+// PR c++/85363
+// { dg-do run { target c++11 } }
+
+int
+init (int f)
+{
+  throw 

Re: C++ PATCH for c++/85363, wrong-code with throwing list-initializer

2018-05-16 Thread Jason Merrill
On Wed, May 16, 2018 at 11:15 AM, Marek Polacek  wrote:
> This PR has been on my mind for quite some time but I've not found a solution
> that I'd like.  Maybe one of you can think of something better.
>
> The problem in this test is that in C++11, .eh optimizes out the catch,
> so the exception is never caught.  That is because lower_catch doesn't
> think that the region may throw (eh_region_may_contain_throw).  That's
> so because P::P () is marked as TREE_NOTHROW, which is wrong, because
> it calls X::X() which calls init() with throw.  TREE_NOTHROW is set in
> finish_function:
>
>   /* If this function can't throw any exceptions, remember that.  */
>   if (!processing_template_decl
>   && !cp_function_chain->can_throw
>   && !flag_non_call_exceptions
>   && !decl_replaceable_p (fndecl))
> TREE_NOTHROW (fndecl) = 1;
>
> P::P() should have been marked as can_throw in set_flags_from_callee, but when
> processing X::X() cfun is null, so we can't set it.  P::P() is created only
> later via implicitly_declare_fn.

This should be handled by bot_manip (under break_out_target_exprs,
under get_nsdmi), but it seems that it currently only calls
set_flags_from_callee for CALL_EXPR, not for AGGR_INIT_EXPR as we have
in this case.

Jason


Re: [PATCH] Support lower and upper limit for -fdbg-cnt flag.

2018-05-16 Thread Alexander Monakov
On Wed, 16 May 2018, Martin Liška wrote:
> > Hm, is the off-by-one in the new explanatory text really intended? I think
> > the previous text was accurate, and the new text should say "9th and 10th"
> > and then "first 10 invocations", unless I'm missing something?
> 
> I've reconsidered that once more time and having zero-based values:
> * -fdbg-cnt=event:N - trigger event N-times
> * -fdbg-cnt=event:N:(N+M) - skip event N-times and then enable it M-1 times
> 
> Does that make sense?

Yes, I like this, but I think the implementation does not match. New docs say:

> -For example, with @option{-fdbg-cnt=dce:10,tail_call:0},
> -@code{dbg_cnt(dce)} returns true only for first 10 invocations.
> +For example, with @option{-fdbg-cnt=dce:2:4,tail_call:10},
> +@code{dbg_cnt(dce)} returns true only for third and fourth invocation.
> +For @code{dbg_cnt(tail_call)} true is returned for first 10 invocations.

which is good, but the implementation reads:

>  bool
>  dbg_cnt_is_enabled (enum debug_counter index)
>  {
> -  return count[index] <= limit[index];
> +  unsigned v = count[index];
> +  return v >= limit_low[index] && v < limit_high[index];
>  }

which I believe is misaligned with the docs' intention. It should be the
other way around:

  return v > limit_low[index] && v <= limit_high[index];

Alexander

Re: [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful

2018-05-16 Thread Segher Boessenkool
On Wed, May 16, 2018 at 11:10:55AM +0100, Kyrill Tkachov wrote:
> On 16/05/18 10:42, Richard Biener wrote:
> >Segher, do you know where canonicalization rules are documented?
> >IIRC we do not actively try to canonicalize in most cases.
> 
> The documentation we have for RTL canonicalisation is at:
> https://gcc.gnu.org/onlinedocs/gccint/Insn-Canonicalizations.html#Insn-Canonicalizations
> 
> It doesn't mention anything about vec_merge AFAICS so I couldn't convince 
> myself that there
> is a canonicalisation that we enforce (though maybe someone can prove me 
> wrong).

Many canonicalisations aren't documented, it's never clear which of the
canonicalisations are how canonical :-/


Segher


[PATCH, rs6000] Fixes for builtin_prefetch for AIX compatibility.

2018-05-16 Thread Carl Love
GCC Maintainers:

The previous patch to map dcbtstt, dcbtt to n2=0 for __builtin_prefetch
builtin caused issues on AIX.  The issue is AIX does not support
the dcbtstt and dcbtt extended mnemonics.  Unfortunately, the AIX
assembler also does not support the three operand form of dcbt and
dcbtst on Power 7.  

This patch fixes up the support for dcbtstt and dcbtt to make it
compatible with Linux and AIX.  The new support now starts with Power 8
rather than Power 7 on both systems for simplicity.

The patch has been tested on 

   powerpc64le-unknown-linux-gnu (Power 8 LE)
   AIX 7.2.0.0   Power 8

Please let me know if the fix is acceptable for trunk.  Thanks.

   Carl Love
-

gcc/ChangeLog:

2018-05-16  Carl Love  

* config/rs6000/rs6000.md (prefetch): Generate ISA 2.06 instructions
dcbt and dcbtstt with TH=16 if operands[2] is 0 and Power 8 or newer.
---
 gcc/config/rs6000/rs6000.md | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 8536c89..19b4465 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -13233,22 +13233,27 @@
 (match_operand:SI 2 "const_int_operand" "n"))]
   ""
 {
-  /* dcbtstt, dcbtt and TM=0b1 support starts with ISA 2.06.  */
-  int inst_select = INTVAL (operands[2]) || !TARGET_POPCNTD;
+
+
+  /* dcbtstt, dcbtt and TH=0b1 support starts with ISA 2.06 (Power7).
+ AIX does not support the dcbtstt and dcbtt extended mnemonics.
+ The AIX assembler does not support the three operand form of dcbt
+ and dcbtst on Power 7 (-mpwr7).  */
+  int inst_select = INTVAL (operands[2]) || !TARGET_DIRECT_MOVE;
 
   if (REG_P (operands[0]))
 {
   if (INTVAL (operands[1]) == 0)
-return inst_select ? "dcbt 0,%0" : "dcbtt 0,%0";
+return inst_select ? "dcbt 0,%0" : "dcbt 0,%0,16";
   else
-return inst_select ? "dcbtst 0,%0" : "dcbtstt 0,%0";
+return inst_select ? "dcbtst 0,%0" : "dcbtst 0,%0,16";
 }
   else
 {
   if (INTVAL (operands[1]) == 0)
-return inst_select ? "dcbt %a0" : "dcbtt %a0";
+return inst_select ? "dcbt %a0" : "dcbt %a0,16";
   else
-return inst_select ? "dcbtst %a0" : "dcbtstt %a0";
+return inst_select ? "dcbtst %a0" : "dcbtst %a0,16";
 }
 }
   [(set_attr "type" "load")])
-- 
2.7.4



[patch][i386] Goldmont Plus -march/-mtune options

2018-05-16 Thread Makhotina, Olga
Hi,

This patch implements Goldmont Plus -march/-mtune.

2018-05-16  Olga Makhotina  

gcc/

* config.gcc: Support "goldmont-plus".
* config/i386/driver-i386.c (host_detect_local_cpu): Detect 
"goldmont-plus".
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
PROCESSOR_GOLDMONT_PLUS.
* config/i386/i386.c (m_GOLDMONT_PLUS): Define.
(processor_target_table): Add "goldmont-plus".
(PTA_GOLDMONT_PLUS): Define.
(ix86_lea_outperforms): Add TARGET_GOLDMONT_PLUS.
(get_builtin_code_for_version): Handle PROCESSOR_GOLDMONT_PLUS.
(fold_builtin_cpu): Add M_INTEL_GOLDMONT_PLUS.
(fold_builtin_cpu): Add "goldmont-plus".
(ix86_add_stmt_cost): Add TARGET_GOLDMONT_PLUS.
(ix86_option_override_internal): Add "goldmont-plus".
* config/i386/i386.h (processor_costs): Define TARGET_GOLDMONT_PLUS.
(processor_type): Add PROCESSOR_GOLDMONT_PLUS.
* config/i386/x86-tune.def: Add m_GOLDMONT_PLUS.
* doc/invoke.texi: Add goldmont-plus as x86 -march=/-mtune= CPU type.

libgcc/

* config/i386/cpuinfo.h (processor_types): Add INTEL_GOLDMONT_PLUS.
* config/i386/cpuinfo.c (get_intel_cpu): Detect Goldmont Plus.

gcc/testsuite/

* gcc.target/i386/builtin_target.c: Test goldmont-plus.
* gcc.target/i386/funcspec-56.inc: Test arch=goldmont-plus.

Is it Ok?

Thanks.


0001-goldmont-plus.patch


Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-05-16 Thread Wilco Dijkstra
Richard Earnshaw wrote:
 
>>> Which doesn't appear to have been approved.  Did you follow up with Jeff?
>> 
>> I'll get back to that one at some point - it'll take some time to agree on
>> a way forward with the callback.
>> 
>> Wilco
>> 
>> 
>
> So it seems to me that this should then be queued until that is resolved.

Why? The patch as is doesn't at all depend on the resolution of how to improve
the callback. If we stopped all patches until GCC is 100% perfect we'd never
make any progress. 

Wilco

Re: [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful

2018-05-16 Thread Segher Boessenkool
On Wed, May 16, 2018 at 11:42:39AM +0200, Richard Biener wrote:
> Works for me.  Were you able to actually create such RTL from testcases?
> Segher, do you know where canonicalization rules are documented?
> IIRC we do not actively try to canonicalize in most cases.

md.texi, node "Insn Canonicalizations"?


Segher


Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-05-16 Thread Richard Earnshaw (lists)
On 16/05/18 14:56, Wilco Dijkstra wrote:
> Richard Earnshaw wrote:
>  
 Which doesn't appear to have been approved.  Did you follow up with Jeff?
>>>
>>> I'll get back to that one at some point - it'll take some time to agree on
>>> a way forward with the callback.
>>>
>>> Wilco
>>>  
>>>
>>
>> So it seems to me that this should then be queued until that is resolved.
> 
> Why? The patch as is doesn't at all depend on the resolution of how to improve
> the callback. If we stopped all patches until GCC is 100% perfect we'd never
> make any progress. 
> 
> Wilco
> 

Because we don't want to build up technical debt for things that should
and can be fixed properly.

R.


Re: [PATCH] Support lower and upper limit for -fdbg-cnt flag.

2018-05-16 Thread Martin Liška
On 05/16/2018 09:47 AM, Alexander Monakov wrote:
> On Wed, 16 May 2018, Martin Liška wrote:
> 
>> Hi.
>>
>> I consider it handy sometimes to trigger just a single invocation of
>> an optimization driven by a debug counter. Doing that one needs to
>> be able to limit both lower and upper limit of a counter. It's implemented
>> in the patch.
> 
> I'd like to offer some non-reviewer comments on the patch (below)

Hi.

I always appreciate these comments!

> 
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -1171,7 +1171,7 @@ List all available debugging counters with their 
>> limits and counts.
>>  
>>  fdbg-cnt=
>>  Common RejectNegative Joined Var(common_deferred_options) Defer
>> --fdbg-cnt=:[,:,...] Set the debug counter 
>> limit.
>> +-fdbg-cnt=[:]:[,::,...]
>>Set the debug counter limit.
> 
> This line has gotten quite long and repeating the same thing in the second
> brackets is not very helpful. Can we use something simpler like this?
> 
> -fdbg-cnt=[:]:[,:...]

Yes, it's a nice simplification.

> 
>> +#define DEBUG_COUNTER(a) 1,
>> +static unsigned int limit_low[debug_counter_number_of_counters] =
>> +{
>> +#include "dbgcnt.def"
>> +};
>> +#undef DEBUG_COUNTER
>> +
>> +
>>  static unsigned int count[debug_counter_number_of_counters];
>>  
>>  bool
>>  dbg_cnt_is_enabled (enum debug_counter index)
>>  {
>> -  return count[index] <= limit[index];
>> +  return limit_low[index] <= count[index] && count[index] <= 
>> limit_high[index];
> 
> I recall Jakub recently applied a tree-wide change of A < B && B < C to read
> B > A && B < C.

Can you please point to a revision where it was done?

> 
> Please consider making limit_low non-inclusive by testing for strict 
> inequality
> count[index] > limit_low[index]. This will allow to make limit_low[] array
> zero-initialized (taking up space only in BSS).

Sure, nice idea. I did it, now all -fdbg-cnt values are zero-based. I consider
it easier to work with. And as the usage is quite internal I hope we can adjust
the logic to be zero-based.

> 
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -14326,13 +14326,14 @@ Print the name and the counter upper bound for all 
>> debug counters.
>>  
>>  @item -fdbg-cnt=@var{counter-value-list}
>>  @opindex fdbg-cnt
>> -Set the internal debug counter upper bound.  @var{counter-value-list}
>> -is a comma-separated list of @var{name}:@var{value} pairs
>> -which sets the upper bound of each debug counter @var{name} to @var{value}.
>> +Set the internal debug counter lower and upper bound.  
>> @var{counter-value-list}
>> +is a comma-separated list of @var{name}:@var{lower_bound}:@var{upper_bound}
>> +tuples which sets the lower and the upper bound of each debug
>> +counter @var{name}.
> 
> Shouldn't this mention that lower bound is optional?

Yes, fixed.

> 
>>  All debug counters have the initial upper bound of @code{UINT_MAX};
>>  thus @code{dbg_cnt} returns true always unless the upper bound
>>  is set by this option.
>> -For example, with @option{-fdbg-cnt=dce:10,tail_call:0},
>> +For example, with @option{-fdbg-cnt=dce:9:10,tail_call:0},
>>  @code{dbg_cnt(dce)} returns true only for first 10 invocations.
> 
> This seems confusing, you added a lower bound to the 'dce' counter,
> but the following text remains unchanged and says it's enabled for
> first 10 calls?

Yes, now fixed.

Thanks again,
Martin

> 
> Alexander
> 

>From c57144bbe6cb339230f887918615b7a206716b82 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 15 May 2018 15:04:30 +0200
Subject: [PATCH] Support lower and upper limit for -fdbg-cnt flag.

gcc/ChangeLog:

2018-05-15  Martin Liska  

	* dbgcnt.c (limit_low): Renamed from limit.
	(limit_high): New variable.
	(dbg_cnt_is_enabled): Check for upper limit.
	(dbg_cnt): Adjust dumping.
	(dbg_cnt_set_limit_by_index): Add new argument for high
	value.
	(dbg_cnt_set_limit_by_name): Likewise.
	(dbg_cnt_process_single_pair): Parse new format.
	(dbg_cnt_process_opt): Use strtok.
	(dbg_cnt_list_all_counters): Remove 'value' and add
	'limit_high'.
	* doc/invoke.texi: Document changes.

gcc/testsuite/ChangeLog:

2018-05-15  Martin Liska  

	* gcc.dg/ipa/ipa-icf-39.c: New test.
	* gcc.dg/pr68766.c: Adjust pruned output.
---
 gcc/common.opt|   2 +-
 gcc/dbgcnt.c  | 135 +++---
 gcc/doc/invoke.texi   |  13 ++--
 gcc/testsuite/gcc.dg/ipa/ipa-icf-39.c |  33 +
 gcc/testsuite/gcc.dg/pr68766.c|   2 +-
 5 files changed, 135 insertions(+), 50 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-icf-39.c

diff --git a/gcc/common.opt b/gcc/common.opt
index d6ef85928f3..13ab5c65d43 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1171,7 +1171,7 @@ List all available debugging counters with their limits and counts.
 
 fdbg-cnt=
 Common RejectNegative Joined Var(common_deferred_options) Defer
--fdbg-cnt=:[,:,...]	Set the debug counter limit.

Re: RFA (ipa-prop): PATCHes to avoid use of deprecated copy ctor and op=

2018-05-16 Thread Martin Jambor
Hi,

On Tue, May 15 2018, Jason Merrill wrote:
> In C++11 and up, the implicitly-declared copy constructor and
> assignment operator are deprecated if one of them, or the destructor,
> is user-provided.  Implementing that in G++ turned up a few dodgy uses
> in the compiler.
>
> In general it's unsafe to copy an ipa_edge_args, because if one of the
> pointers is non-null you get two copies of a vec pointer, and when one
> of the objects is destroyed it frees the vec and leaves the other
> object pointing to freed memory.  This specific example is safe
> because it only copies from an object with null pointers, but it would
> be better to avoid the copy.  OK for trunk?

I have had a look and found out that the function in question
(ipa_free_edge_args_substructures) has no uses, apparently I forgot to
remove it when I did the conversion of jump functions to be stored in
call graph edge summaries.  So thanks a lot for spotting this, but I'd
prefer the following (compiled but untested) patch:

Martin


2018-05-16  Martin Jambor  

* ipa-prop.c (ipa_free_all_edge_args): Remove.
* ipa-prop.h (ipa_free_all_edge_args): Likewise.


diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 38441cc49bc..19d55cda009 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -3708,16 +3708,6 @@ ipa_check_create_edge_args (void)
 ipa_vr_hash_table = hash_table::create_ggc (37);
 }
 
-/* Frees all dynamically allocated structures that the argument info points
-   to.  */
-
-void
-ipa_free_edge_args_substructures (struct ipa_edge_args *args)
-{
-  vec_free (args->jump_functions);
-  *args = ipa_edge_args ();
-}
-
 /* Free all ipa_edge structures.  */
 
 void
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index a61e06135e3..dc45cea9c71 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -664,7 +664,6 @@ extern GTY(()) vec 
*ipcp_transformations;
 void ipa_create_all_node_params (void);
 void ipa_create_all_edge_args (void);
 void ipa_check_create_edge_args (void);
-void ipa_free_edge_args_substructures (struct ipa_edge_args *);
 void ipa_free_all_node_params (void);
 void ipa_free_all_edge_args (void);
 void ipa_free_all_structures_after_ipa_cp (void);
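
The copy hazard Jason describes can be made concrete with a small standalone
sketch (all names here are hypothetical, not the real ipa-prop types): a
defaulted copy would duplicate the raw owning pointer, so destroying one copy
frees the vec and leaves the other copy dangling. Deleting the copy operations
turns any accidental copy into a compile-time error instead:

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Illustrative stand-in for a type that owns a heap-allocated vector
// through a raw pointer.  A defaulted copy would share the pointer and
// lead to a double free; deleting the copy operations rejects such
// copies at compile time.
struct edge_args_sketch
{
  std::vector<int> *jump_functions = nullptr;

  edge_args_sketch () = default;
  edge_args_sketch (const edge_args_sketch &) = delete;
  edge_args_sketch &operator= (const edge_args_sketch &) = delete;
  ~edge_args_sketch () { delete jump_functions; }
};

// Helper so the property can be checked at run time as well.
static bool
copies_allowed ()
{
  return std::is_copy_constructible<edge_args_sketch>::value;
}
```

With the copy constructor deleted, `copies_allowed ()` is false, and any
code that tried the unsafe copy fails to compile rather than corrupting
memory at run time.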




Re: [PATCH] Support lower and upper limit for -fdbg-cnt flag.

2018-05-16 Thread Martin Liška
On 05/16/2018 02:56 PM, Alexander Monakov wrote:
> On Wed, 16 May 2018, Martin Liška wrote:
>>> I recall Jakub recently applied a tree-wide change of A < B && B < C to read
>>> B > A && B < C.
>>
>> Can you please point to a revision where it was done?
> 
> It is SVN r255831, mailing list thread here ("Replace Yoda conditions"):
> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg01257.html

Thanks, fixed according to that.

> 
> In your revised patch:
> 
>> --- a/gcc/dbgcnt.c
>> +++ b/gcc/dbgcnt.c
>> @@ -41,53 +41,90 @@ static struct string2counter_map 
>> map[debug_counter_number_of_counters] =
>>  #undef DEBUG_COUNTER
>>  
>>  #define DEBUG_COUNTER(a) UINT_MAX,
>> -static unsigned int limit[debug_counter_number_of_counters] =
>> +static unsigned int limit_high[debug_counter_number_of_counters] =
>>  {
>>  #include "dbgcnt.def"
>>  };
>>  #undef DEBUG_COUNTER
>>  
>> +#define DEBUG_COUNTER(a) 0,
>> +static unsigned int limit_low[debug_counter_number_of_counters] =
>> +{
>> +#include "dbgcnt.def"
>> +};
>> +#undef DEBUG_COUNTER
> 
> No need to spell out the all-zeros initializer of a static object:
> 
> static unsigned int limit_low[debug_counter_number_of_counters];

Got it.

> 
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -14326,14 +14326,17 @@ Print the name and the counter upper bound for all 
>> debug counters.
>>  
>>  @item -fdbg-cnt=@var{counter-value-list}
>>  @opindex fdbg-cnt
>> -Set the internal debug counter upper bound.  @var{counter-value-list}
>> -is a comma-separated list of @var{name}:@var{value} pairs
>> -which sets the upper bound of each debug counter @var{name} to @var{value}.
>> +Set the internal debug counter lower and upper bound.  
>> @var{counter-value-list}
>> +is a comma-separated list of @var{name}:@var{lower_bound}:@var{upper_bound}
>> +tuples which sets the lower and the upper bound of each debug
>> +counter @var{name}.  The @var{lower_bound} is optional and is zero
>> +initialized if not set.
>>  All debug counters have the initial upper bound of @code{UINT_MAX};
>>  thus @code{dbg_cnt} returns true always unless the upper bound
>>  is set by this option.
>> -For example, with @option{-fdbg-cnt=dce:10,tail_call:0},
>> -@code{dbg_cnt(dce)} returns true only for first 10 invocations.
>> +For example, with @option{-fdbg-cnt=dce:9:10,tail_call:10},
>> +@code{dbg_cnt(dce)} returns true only for 10th and 11th invocation.
>> +For @code{dbg_cnt(tail_call)} true is returned for first 11 invocations.
> 
> Hm, is the off-by-one in the new explanatory text really intended? I think
> the previous text was accurate, and the new text should say "9th and 10th"
> and then "first 10 invocations", unless I'm missing something?

I've reconsidered that one more time; with zero-based values:
* -fdbg-cnt=event:N - trigger the event N times
* -fdbg-cnt=event:N:(N+M) - skip the event N times and then enable it M-1 times

Does that make sense?
Martin
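
To illustrate one possible zero-based reading (whether the bounds end up
inclusive or exclusive is exactly what we are discussing here, and the names
below are invented): the first limit_low invocations are skipped, and
invocations whose zero-based index falls in [limit_low, limit_high) are
enabled.

```cpp
#include <cassert>

// Sketch of zero-based counter semantics: with limit_low = 2 and
// limit_high = 5, the first two invocations are skipped and the next
// three are enabled.
static unsigned count_zb;
static const unsigned limit_low_zb = 2;
static const unsigned limit_high_zb = 5;

static bool
dbg_cnt_zero_based_sketch ()
{
  unsigned v = count_zb++;              // zero-based invocation index
  return v >= limit_low_zb && v < limit_high_zb;
}
```

Six invocations yield false, false, true, true, true, false under this
reading.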


> 
> Alexander
> 

>From 49f185588d1f5c796f83bb9b546c6199c7e80d2f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 15 May 2018 15:04:30 +0200
Subject: [PATCH] Support lower and upper limit for -fdbg-cnt flag.

gcc/ChangeLog:

2018-05-15  Martin Liska  

	* dbgcnt.c (limit_low): Renamed from limit.
	(limit_high): New variable.
	(dbg_cnt_is_enabled): Check for upper limit.
	(dbg_cnt): Adjust dumping.
	(dbg_cnt_set_limit_by_index): Add new argument for high
	value.
	(dbg_cnt_set_limit_by_name): Likewise.
	(dbg_cnt_process_single_pair): Parse new format.
	(dbg_cnt_process_opt): Use strtok.
	(dbg_cnt_list_all_counters): Remove 'value' and add
	'limit_high'.
	* doc/invoke.texi: Document changes.

gcc/testsuite/ChangeLog:

2018-05-15  Martin Liska  

	* gcc.dg/ipa/ipa-icf-39.c: New test.
	* gcc.dg/pr68766.c: Adjust pruned output.
---
 gcc/common.opt|   2 +-
 gcc/dbgcnt.c  | 129 ++
 gcc/doc/invoke.texi   |  13 ++--
 gcc/testsuite/gcc.dg/ipa/ipa-icf-39.c |  33 +
 gcc/testsuite/gcc.dg/pr68766.c|   2 +-
 5 files changed, 129 insertions(+), 50 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-icf-39.c

diff --git a/gcc/common.opt b/gcc/common.opt
index d6ef85928f3..13ab5c65d43 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1171,7 +1171,7 @@ List all available debugging counters with their limits and counts.
 
 fdbg-cnt=
 Common RejectNegative Joined Var(common_deferred_options) Defer
--fdbg-cnt=:[,:,...]	Set the debug counter limit.
+-fdbg-cnt=[:]:[,:...]	Set the debug counter limit.
 
 fdebug-prefix-map=
 Common Joined RejectNegative Var(common_deferred_options) Defer
diff --git a/gcc/dbgcnt.c b/gcc/dbgcnt.c
index 96b3df28f5e..e7c6da2fac0 100644
--- a/gcc/dbgcnt.c
+++ b/gcc/dbgcnt.c
@@ -41,53 +41,84 @@ static struct string2counter_map map[debug_counter_number_of_counters] =
 #undef DEBUG_COUNTER
 
 #define DEBUG_COUNTER(a) UINT_MAX,
-static unsigned int 

Re: Display priority in "Serious" bugs for gcc 8 from web page

2018-05-16 Thread Richard Sandiford
Thomas Koenig  writes:
> Hello world,
>
> whenever I look at the list of serious bugs, I find myself changing the
> columns to add the priority field.
>
> What do you think about adding the priority field when clicking on that
> link?  A patch is attached.

I don't think anyone replied to this so far, but +1 FWIW.  It seemed
strange that the priority was one of the main search criteria but wasn't
included in the results.

The "Resolution" field seems a bit redundant when the search is only
for open bugs, so if people are worried about having too many columns,
maybe it would be OK to do a swap?

Thanks,
Richard


[PATCH] Enhance [VEC_]COND_EXPR verification

2018-05-16 Thread Richard Biener

PR85794 shows we do not verify those.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2018-05-16  Richard Biener  

* tree-cfg.c (verify_gimple_assign_ternary): Properly
verify the [VEC_]COND_EXPR embedded comparison.

Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c  (revision 260280)
+++ gcc/tree-cfg.c  (working copy)
@@ -4137,6 +4137,12 @@ verify_gimple_assign_ternary (gassign *s
}
   /* Fallthrough.  */
 case COND_EXPR:
+  if (!is_gimple_val (rhs1)
+ && verify_gimple_comparison (TREE_TYPE (rhs1),
+  TREE_OPERAND (rhs1, 0),
+  TREE_OPERAND (rhs1, 1),
+  TREE_CODE (rhs1)))
+   return true;
   if (!useless_type_conversion_p (lhs_type, rhs2_type)
  || !useless_type_conversion_p (lhs_type, rhs3_type))
{


Re: [PATCH 2/2] df-scan: remove ad-hoc handling of global regs in asms

2018-05-16 Thread Alexander Monakov


On Mon, 23 Apr 2018, Alexander Monakov wrote:

> As discussed in the cover letter, the code removed in this patch is 
> unnecessary,
> references to global reg vars from inline asms do not work reliably, and so we
> should simply require that inline asms use constraints to make such references
> properly visible to the compiler.
> 
> Bootstrapped/regtested on powerpc64, will retest on ppc64le and x86 in stage 
> 1.
> 
> PR rtl-optimization/79985
>   * df-scan.c (df_insn_refs_collect): Remove special case for
> global registers and asm statements.

Ping. I've retested once on ppc64le since posting.

> ---
>  gcc/df-scan.c | 11 ---
>  1 file changed, 11 deletions(-)
> 
> diff --git a/gcc/df-scan.c b/gcc/df-scan.c
> index 95e1e0df2d5..cbb08fc36ae 100644
> --- a/gcc/df-scan.c
> +++ b/gcc/df-scan.c
> @@ -3207,17 +3207,6 @@ df_insn_refs_collect (struct df_collection_rec 
> *collection_rec,
>if (CALL_P (insn_info->insn))
>  df_get_call_refs (collection_rec, bb, insn_info, flags);
>  
> -  if (asm_noperands (PATTERN (insn_info->insn)) >= 0)
> -for (unsigned i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> -  if (global_regs[i])
> -   {
> - /* As with calls, asm statements reference all global regs. */
> - df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
> -NULL, bb, insn_info, DF_REF_REG_USE, flags);
> - df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
> -NULL, bb, insn_info, DF_REF_REG_DEF, flags);
> -   }
> -
>/* Record other defs.  These should be mostly for DF_REF_REGULAR, so
>   that a qsort on the defs is unnecessary in most cases.  */
>df_defs_record (collection_rec,
> 


[PATCH] When using -fprofile-generate=/some/path mangle absolute path of file (PR lto/85759).

2018-05-16 Thread Martin Liška
On 12/21/2017 10:13 AM, Martin Liška wrote:
> On 12/20/2017 06:45 PM, Jakub Jelinek wrote:
>> Another thing is that the "/" in there is wrong, so
>>   const char dir_separator_str[] = { DIR_SEPARATOR, '\0' };
>>   char *b = concat (profile_data_prefix, dir_separator_str, pwd, NULL);
>> needs to be used instead.
> 
> This looks much nicer, I forgot about DIR_SEPARATOR.
> 
>> Does profile_data_prefix have any dir separators stripped from the end?
> 
> That's easy to achieve..
> 
>> Is pwd guaranteed to be relative in this case?
> 
> .. however this is an absolute path, which would be problematic on a DOS-based
> FS.
> Maybe we should do the same path mangling as we do for purpose of gcov:
> 
> https://github.com/gcc-mirror/gcc/blob/master/gcc/gcov.c#L2424

Hi.

I decided to implement that. Which means for:

$ gcc -fprofile-generate=/tmp/myfolder empty.c -O2 && ./a.out 

we get following file:
/tmp/myfolder/#home#marxin#Programming#testcases#tmp#empty.gcda

That guarantees we have a unique file path. As seen in the PR it
can produce a funny ICE.

I've been testing the patch.
Ready after it finishes tests?

Martin

> 
> What do you think about it?
> Regarding the string manipulation: I'm not an expert, but working with strings
> in C is always a pain for me :)
> 
> Martin
> 

>From 386a4561a4d1501e8959871791289e95f6a89af5 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 16 Aug 2017 10:22:57 +0200
Subject: [PATCH] When using -fprofile-generate=/some/path mangle absolute path
 of file (PR lto/85759).

gcc/ChangeLog:

2018-05-16  Martin Liska  

	PR lto/85759
	* coverage.c (coverage_init): Mangle full path name.
	* doc/invoke.texi: Document the change.
	* gcov-io.c (mangle_path): New.
	* gcov-io.h (mangle_path): Likewise.
	* gcov.c (mangle_name): Use mangle_path for path mangling.
---
 gcc/coverage.c  | 20 ++--
 gcc/doc/invoke.texi |  3 +++
 gcc/gcov-io.c   | 49 +
 gcc/gcov-io.h   |  1 +
 gcc/gcov.c  | 37 +
 5 files changed, 72 insertions(+), 38 deletions(-)

diff --git a/gcc/coverage.c b/gcc/coverage.c
index 32ef298a11f..6e621c3ff96 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -1227,8 +1227,24 @@ coverage_init (const char *filename)
 g->get_passes ()->get_pass_profile ()->static_pass_number;
   g->get_dumps ()->dump_start (profile_pass_num, NULL);
 
-  if (!profile_data_prefix && !IS_ABSOLUTE_PATH (filename))
-profile_data_prefix = getpwd ();
+  if (!IS_ABSOLUTE_PATH (filename))
+{
+  /* When a profile_data_prefix is provided, then mangle full path
+	 of filename in order to prevent file path clashing.  */
+  if (profile_data_prefix)
+	{
+#if HAVE_DOS_BASED_FILE_SYSTEM
+	  const char separator = "\\";
+#else
+	  const char *separator = "/";
+#endif
+	  filename = concat (getpwd (), separator, filename, NULL);
+	  filename = mangle_path (filename);
+	  len = strlen (filename);
+	}
+  else
+	profile_data_prefix = getpwd ();
+}
 
   if (profile_data_prefix)
 prefix_len = strlen (profile_data_prefix);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ca3772bbebf..4859cec0ab5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11253,6 +11253,9 @@ and used by @option{-fprofile-use} and @option{-fbranch-probabilities}
 and its related options.  Both absolute and relative paths can be used.
 By default, GCC uses the current directory as @var{path}, thus the
 profile data file appears in the same directory as the object file.
+In order to prevent filename clashing, if object file name is not an absolute
+path, we mangle absolute path of @file{@var{sourcename}.gcda} file and
+use it as file name of a @file{.gcda} file.
 
 @item -fprofile-generate
 @itemx -fprofile-generate=@var{path}
diff --git a/gcc/gcov-io.c b/gcc/gcov-io.c
index 3fe1e613ebc..68660d6d3cf 100644
--- a/gcc/gcov-io.c
+++ b/gcc/gcov-io.c
@@ -576,6 +576,55 @@ gcov_read_counter (void)
   return value;
 }
 
+/* Mangle filename path of BASE and output new allocated pointer with
+   mangled path.  */
+
+char *
+mangle_path (char const *base)
+{
+  /* Convert '/' to '#', convert '..' to '^',
+ convert ':' to '~' on DOS based file system.  */
+  const char *probe;
+  char *buffer = (char *)xmalloc (strlen (base) + 10);
+  char *ptr = buffer;
+
+#if HAVE_DOS_BASED_FILE_SYSTEM
+  if (base[0] && base[1] == ':')
+{
+  ptr[0] = base[0];
+  ptr[1] = '~';
+  ptr += 2;
+  base += 2;
+}
+#endif
+  for (; *base; base = probe)
+{
+  size_t len;
+
+  for (probe = base; *probe; probe++)
+	if (*probe == '/')
+	  break;
+  len = probe - base;
+  if (len == 2 && base[0] == '.' && base[1] == '.')
+	*ptr++ = '^';
+  else
+	{
+	  memcpy (ptr, base, len);
+	  ptr += len;
+	}
+  if (*probe)
+	{
+	  *ptr++ = '#';
+	  probe++;
+	}
+}
+
+  /* Terminate the string.  */
+  *ptr = '\0';
+
+  return buffer;
+}
+
 /* We need 

Re: RFA (ipa-prop): PATCHes to avoid use of deprecated copy ctor and op=

2018-05-16 Thread Andreas Schwab
On Mai 15 2018, Jason Merrill <ja...@redhat.com> wrote:

> commit 648ffd02e23ac2695de04ab266b4f8862df6c2ed
> Author: Jason Merrill <ja...@redhat.com>
> Date:   Tue May 15 20:46:54 2018 -0400
>
> * cp-tree.h (cp_expr): Remove copy constructor.
> 
> * mangle.c (struct releasing_vec): Declare copy constructor.

I'm getting an ICE on ia64 during the stage1 build of libstdc++ (perhaps
related to the fact that this uses gcc 4.8 as the bootstrap compiler):

/usr/local/gcc/gcc-20180516/Build/./gcc/xgcc -shared-libgcc 
-B/usr/local/gcc/gcc-20180516/Build/./gcc -nostdinc++ 
-L/usr/local/gcc/gcc-20180516/Build/ia64-suse-linux/libstdc++-v3/src 
-L/usr/local/gcc/gcc-20180516/Build/ia64-suse-linux/libstdc++-v3/src/.libs 
-L/usr/local/gcc/gcc-20180516/Build/ia64-suse-linux/libstdc++-v3/libsupc++/.libs
 -B/usr/ia64-suse-linux/bin/ -B/usr/ia64-suse-linux/lib/ -isystem 
/usr/ia64-suse-linux/include -isystem /usr/ia64-suse-linux/sys-include   
-fno-checking -x c++-header -nostdinc++ -O2 -g -D_GNU_SOURCE  
-I/usr/local/gcc/gcc-20180516/Build/ia64-suse-linux/libstdc++-v3/include/ia64-suse-linux
 -I/usr/local/gcc/gcc-20180516/Build/ia64-suse-linux/libstdc++-v3/include 
-I/usr/local/gcc/gcc-20180516/libstdc++-v3/libsupc++  -O2 -g -std=gnu++0x 
/usr/local/gcc/gcc-20180516/libstdc++-v3/include/precompiled/stdc++.h \
-o ia64-suse-linux/bits/stdc++.h.gch/O2ggnu++0x.gch
In file included from 
/usr/local/gcc/gcc-20180516/Build/ia64-suse-linux/libstdc++-v3/include/cmath:42,
 from 
/usr/local/gcc/gcc-20180516/libstdc++-v3/include/precompiled/stdc++.h:41:
/usr/local/gcc/gcc-20180516/Build/ia64-suse-linux/libstdc++-v3/include/bits/cpp_type_traits.h:89:34:
 internal compiler error: in cp_parser_lookup_name, at cp/parser.c:26107
   enum { __value = bool(_Sp::__value) || bool(_Tp::__value) };
  ^~~
0x403a5cff cp_parser_lookup_name
../../gcc/cp/parser.c:26107
0x403d576f cp_parser_primary_expression
../../gcc/cp/parser.c:5517
0x403f1daf cp_parser_postfix_expression
../../gcc/cp/parser.c:7008
0x4040534f cp_parser_unary_expression
../../gcc/cp/parser.c:8300
0x403b2abf cp_parser_cast_expression
../../gcc/cp/parser.c:9068
0x403b3f9f cp_parser_binary_expression
../../gcc/cp/parser.c:9170
0x403b518f cp_parser_assignment_expression
../../gcc/cp/parser.c:9466
0x403bbf9f cp_parser_parenthesized_expression_list
../../gcc/cp/parser.c:7740
0x403bde1f cp_parser_functional_cast
../../gcc/cp/parser.c:27387
0x403f1c9f cp_parser_postfix_expression
../../gcc/cp/parser.c:6931
0x4040534f cp_parser_unary_expression
../../gcc/cp/parser.c:8300
0x403b2abf cp_parser_cast_expression
../../gcc/cp/parser.c:9068
0x403b3f9f cp_parser_binary_expression
../../gcc/cp/parser.c:9170
0x403b518f cp_parser_assignment_expression
../../gcc/cp/parser.c:9466
0x403b775f cp_parser_constant_expression
../../gcc/cp/parser.c:9748
0x403ce8bf cp_parser_enumerator_definition
../../gcc/cp/parser.c:18410
0x403ce8bf cp_parser_enumerator_list
../../gcc/cp/parser.c:18350
0x403ce8bf cp_parser_enum_specifier
../../gcc/cp/parser.c:18277
0x403ce8bf cp_parser_type_specifier
../../gcc/cp/parser.c:16736
0x403fad7f cp_parser_decl_specifier_seq
../../gcc/cp/parser.c:13606

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH 1/2] Introduce prefetch-minimum stride option

2018-05-16 Thread Luis Machado



On 05/16/2018 06:08 AM, Kyrill Tkachov wrote:


On 15/05/18 12:12, Luis Machado wrote:

Hi,

On 05/15/2018 06:37 AM, Kyrill Tkachov wrote:

Hi Luis,

On 14/05/18 22:18, Luis Machado wrote:

Hi,

Here's an updated version of the patch (now reverted) that addresses 
the previous bootstrap problem (signedness and long long/int 
conversion).


I've checked that it bootstraps properly on both aarch64-linux and 
x86_64-linux and that tests look sane.


James, would you please give this one a try to see if you can still 
reproduce PR85682? I couldn't reproduce it in multiple attempts.




The patch doesn't hit the regressions in PR85682 from what I can see.
I have a comment on the patch below.



Great. Thanks for checking Kyrill.


--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
  static bool
  should_issue_prefetch_p (struct mem_ref *ref)
  {
+  /* Some processors may have a hardware prefetcher that may conflict with
+ prefetch hints for a range of strides.  Make sure we don't issue
+ prefetches for such cases if the stride is within this particular
+ range.  */
+  if (cst_and_fits_in_hwi (ref->group->step)
+  && abs_hwi (int_cst_value (ref->group->step)) <
+  (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
+    {

The '<' should go on the line below together with PREFETCH_MINIMUM_STRIDE.


I've fixed this locally now.


Thanks. I haven't followed the patch in detail; are you looking for
midend changes approval since the last version?

Or do you need aarch64 approval?


The changes are not substantial, but midend approval is what I was aiming at.

Also confirmation that PR85682 no longer reproduces.

Luis
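
For reference, the stride filter under discussion boils down to something like
the following sketch (the threshold value and names are placeholders; the real
PREFETCH_MINIMUM_STRIDE comes from the target's tuning parameters):

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical threshold, in bytes: strides below it are assumed to be
// covered by the hardware prefetcher, so issuing software prefetch hints
// for them would only conflict with it.
const long PREFETCH_MINIMUM_STRIDE = 256;

static bool
should_issue_prefetch_sketch (long step)
{
  // Only issue a software prefetch when the absolute stride reaches
  // the minimum; both directions of traversal count.
  return std::labs (step) >= PREFETCH_MINIMUM_STRIDE;
}
```

So a constant stride of 64 bytes (either direction) would be filtered out,
while a 512-byte stride still gets a software prefetch.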


Re: [PATCH] Support lower and upper limit for -fdbg-cnt flag.

2018-05-16 Thread Alexander Monakov
On Wed, 16 May 2018, Martin Liška wrote:
> > I recall Jakub recently applied a tree-wide change of A < B && B < C to read
> > B > A && B < C.
> 
> Can you please point to a revision where it was done?

It is SVN r255831, mailing list thread here ("Replace Yoda conditions"):
https://gcc.gnu.org/ml/gcc-patches/2017-12/msg01257.html

In your revised patch:

> --- a/gcc/dbgcnt.c
> +++ b/gcc/dbgcnt.c
> @@ -41,53 +41,90 @@ static struct string2counter_map 
> map[debug_counter_number_of_counters] =
>  #undef DEBUG_COUNTER
>  
>  #define DEBUG_COUNTER(a) UINT_MAX,
> -static unsigned int limit[debug_counter_number_of_counters] =
> +static unsigned int limit_high[debug_counter_number_of_counters] =
>  {
>  #include "dbgcnt.def"
>  };
>  #undef DEBUG_COUNTER
>  
> +#define DEBUG_COUNTER(a) 0,
> +static unsigned int limit_low[debug_counter_number_of_counters] =
> +{
> +#include "dbgcnt.def"
> +};
> +#undef DEBUG_COUNTER

No need to spell out the all-zeros initializer of a static object:

static unsigned int limit_low[debug_counter_number_of_counters];

> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -14326,14 +14326,17 @@ Print the name and the counter upper bound for all 
> debug counters.
>  
>  @item -fdbg-cnt=@var{counter-value-list}
>  @opindex fdbg-cnt
> -Set the internal debug counter upper bound.  @var{counter-value-list}
> -is a comma-separated list of @var{name}:@var{value} pairs
> -which sets the upper bound of each debug counter @var{name} to @var{value}.
> +Set the internal debug counter lower and upper bound.  
> @var{counter-value-list}
> +is a comma-separated list of @var{name}:@var{lower_bound}:@var{upper_bound}
> +tuples which sets the lower and the upper bound of each debug
> +counter @var{name}.  The @var{lower_bound} is optional and is zero
> +initialized if not set.
>  All debug counters have the initial upper bound of @code{UINT_MAX};
>  thus @code{dbg_cnt} returns true always unless the upper bound
>  is set by this option.
> -For example, with @option{-fdbg-cnt=dce:10,tail_call:0},
> -@code{dbg_cnt(dce)} returns true only for first 10 invocations.
> +For example, with @option{-fdbg-cnt=dce:9:10,tail_call:10},
> +@code{dbg_cnt(dce)} returns true only for 10th and 11th invocation.
> +For @code{dbg_cnt(tail_call)} true is returned for first 11 invocations.

Hm, is the off-by-one in the new explanatory text really intended? I think
the previous text was accurate, and the new text should say "9th and 10th"
and then "first 10 invocations", unless I'm missing something?

Alexander

Re: RFA (ipa-prop): PATCHes to avoid use of deprecated copy ctor and op=

2018-05-16 Thread Jason Merrill
On Wed, May 16, 2018 at 6:35 AM, Martin Jambor  wrote:
> On Tue, May 15 2018, Jason Merrill wrote:
>> In C++11 and up, the implicitly-declared copy constructor and
>> assignment operator are deprecated if one of them, or the destructor,
>> is user-provided.  Implementing that in G++ turned up a few dodgy uses
>> in the compiler.
>>
>> In general it's unsafe to copy an ipa_edge_args, because if one of the
>> pointers is non-null you get two copies of a vec pointer, and when one
>> of the objects is destroyed it frees the vec and leaves the other
>> object pointing to freed memory.  This specific example is safe
>> because it only copies from an object with null pointers, but it would
>> be better to avoid the copy.  OK for trunk?
>
> I have had a look and found out that the function in question
> (ipa_free_edge_args_substructures) has no uses, apparently I forgot to
> remove it when I did the conversion of jump functions to be stored in
> call graph edge summaries.  So thanks lot for spotting this but I'd
> prefer the following (compiled but untested) patch:
>
> Martin
>
>
> 2018-05-16  Martin Jambor  
>
> * ipa-prop.c (ipa_free_all_edge_args): Remove.
> * ipa-prop.h (ipa_free_all_edge_args): Likewise.

That works for me, too.

Jason


Re: RFA (ipa-prop): PATCHes to avoid use of deprecated copy ctor and op=

2018-05-16 Thread Martin Jambor
Hi,

On Wed, May 16 2018, Jason Merrill wrote:
> On Wed, May 16, 2018 at 6:35 AM, Martin Jambor  wrote:
>> On Tue, May 15 2018, Jason Merrill wrote:
>>> In C++11 and up, the implicitly-declared copy constructor and
>>> assignment operator are deprecated if one of them, or the destructor,
>>> is user-provided.  Implementing that in G++ turned up a few dodgy uses
>>> in the compiler.
>>>
>>> In general it's unsafe to copy an ipa_edge_args, because if one of the
>>> pointers is non-null you get two copies of a vec pointer, and when one
>>> of the objects is destroyed it frees the vec and leaves the other
>>> object pointing to freed memory.  This specific example is safe
>>> because it only copies from an object with null pointers, but it would
>>> be better to avoid the copy.  OK for trunk?
>>
>> I have had a look and found out that the function in question
>> (ipa_free_edge_args_substructures) has no uses, apparently I forgot to
>> remove it when I did the conversion of jump functions to be stored in
>> call graph edge summaries.  So thanks lot for spotting this but I'd
>> prefer the following (compiled but untested) patch:
>>
>> Martin
>>
>>
>> 2018-05-16  Martin Jambor  
>>
>> * ipa-prop.c (ipa_free_all_edge_args): Remove.
>> * ipa-prop.h (ipa_free_all_edge_args): Likewise.
>
> That works for me, too.
>
> Jason


OK, so after bootstrapping and testing on x86_64, I committed the patch
as obvious.

Thanks,

Martin


2018-05-16  Martin Jambor  

* ipa-prop.c (ipa_free_all_edge_args): Remove.
* ipa-prop.h (ipa_free_all_edge_args): Likewise.
---
 gcc/ipa-prop.c | 10 --
 gcc/ipa-prop.h |  1 -
 2 files changed, 11 deletions(-)

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 38441cc49bc..19d55cda009 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -3708,16 +3708,6 @@ ipa_check_create_edge_args (void)
 ipa_vr_hash_table = hash_table::create_ggc (37);
 }
 
-/* Frees all dynamically allocated structures that the argument info points
-   to.  */
-
-void
-ipa_free_edge_args_substructures (struct ipa_edge_args *args)
-{
-  vec_free (args->jump_functions);
-  *args = ipa_edge_args ();
-}
-
 /* Free all ipa_edge structures.  */
 
 void
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index a61e06135e3..dc45cea9c71 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -664,7 +664,6 @@ extern GTY(()) vec 
*ipcp_transformations;
 void ipa_create_all_node_params (void);
 void ipa_create_all_edge_args (void);
 void ipa_check_create_edge_args (void);
-void ipa_free_edge_args_substructures (struct ipa_edge_args *);
 void ipa_free_all_node_params (void);
 void ipa_free_all_edge_args (void);
 void ipa_free_all_structures_after_ipa_cp (void);
-- 
2.16.3



Re: C++ PATCH for c++/85363, wrong-code with throwing list-initializer

2018-05-16 Thread Jason Merrill
On Wed, May 16, 2018 at 1:37 PM, Marek Polacek  wrote:
> On Wed, May 16, 2018 at 11:35:56AM -0400, Jason Merrill wrote:
>> On Wed, May 16, 2018 at 11:15 AM, Marek Polacek  wrote:
>> > This PR has been on my mind for quite some time but I've not found a 
>> > solution
>> > that I'd like.  Maybe one of you can think of something better.
>> >
>> > The problem in this test is that in C++11, .eh optimizes out the catch,
>> > so the exception is never caught.  That is because lower_catch doesn't
>> > think that the region may throw (eh_region_may_contain_throw).  That's
>> > so because P::P () is marked as TREE_NOTHROW, which is wrong, because
>> > it calls X::X() which calls init() with throw.  TREE_NOTHROW is set in
>> > finish_function:
>> >
>> >   /* If this function can't throw any exceptions, remember that.  */
>> >   if (!processing_template_decl
>> >   && !cp_function_chain->can_throw
>> >   && !flag_non_call_exceptions
>> >   && !decl_replaceable_p (fndecl))
>> > TREE_NOTHROW (fndecl) = 1;
>> >
>> > P::P() should have been marked as can_throw in set_flags_from_callee, but 
>> > when
>> > processing X::X() cfun is null, so we can't set it.  P::P() is created only
>> > later via implicitly_declare_fn.
>>
>> This should be handled by bot_manip (under break_out_target_exprs,
>> under get_nsdmi), but it seems that it currently only calls
>> set_flags_from_callee for CALL_EXPR, not for AGGR_INIT_EXPR as we have
>> in this case.
>
> Ah, nice!  So, this tweaks set_flags_from_callee to also work for
> AGGR_INIT_EXPRs.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2018-05-16  Marek Polacek  
>
> PR c++/85363
> * call.c (set_flags_from_callee): Handle AGGR_INIT_EXPRs too.
> * tree.c (bot_manip): Call set_flags_from_callee for
> AGGR_INIT_EXPRs too.
>
> * g++.dg/cpp0x/initlist-throw1.C: New test.
> * g++.dg/cpp0x/initlist-throw2.C: New test.
>
> diff --git gcc/cp/call.c gcc/cp/call.c
> index 09a3618b007..11b40747932 100644
> --- gcc/cp/call.c
> +++ gcc/cp/call.c
> @@ -319,16 +319,23 @@ build_call_n (tree function, int n, ...)
>  void
>  set_flags_from_callee (tree call)
>  {
> -  bool nothrow;
> -  tree decl = get_callee_fndecl (call);
> +  /* Handle both CALL_EXPRs and AGGR_INIT_EXPRs.  */
> +  tree decl = cp_get_callee_fndecl_nofold (call);
>
>/* We check both the decl and the type; a function may be known not to
>   throw without being declared throw().  */
> -  nothrow = decl && TREE_NOTHROW (decl);
> -  if (CALL_EXPR_FN (call))
> -  nothrow |= TYPE_NOTHROW_P (TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (call))));
> -  else if (internal_fn_flags (CALL_EXPR_IFN (call)) & ECF_NOTHROW)
> -nothrow = true;
> +  bool nothrow = decl && TREE_NOTHROW (decl);
> +  if (TREE_CODE (call) == CALL_EXPR)
> +{
> +  if (CALL_EXPR_FN (call))
> +   nothrow
> + |= TYPE_NOTHROW_P (TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (call))));
> +  else if (internal_fn_flags (CALL_EXPR_IFN (call)) & ECF_NOTHROW)
> +   nothrow = true;
> +}
> +  else if (AGGR_INIT_EXPR_FN (call))
> +nothrow
> +  |= TYPE_NOTHROW_P (TREE_TYPE (TREE_TYPE (AGGR_INIT_EXPR_FN (call))));

You should be able to avoid duplication here by using cp_get_callee
rather than *_FN.

Jason


Re: [PATCH 1/2, expr.c] Optimize switch with sign-extended index.

2018-05-16 Thread Jim Wilson
On Wed, May 9, 2018 at 2:20 PM, Jim Wilson  wrote:
> On Wed, May 2, 2018 at 3:05 PM, Jim Wilson  wrote:
>> This improves the code for a switch statement on targets that sign-extend
>> function arguments, such as RISC-V.  Given a simple testcase
>> ...
>> gcc/
>> * expr.c (do_tablejump): When converting index to Pmode, if we have a
>> sign extended promoted subreg, and the range does not have the sign 
>> bit
>> set, then do a sign extend.

ping^2

https://gcc.gnu.org/ml/gcc-patches/2018-05/msg00118.html

Jim


Re: [PATCH] DWARF: Emit DWARF5 forms for indirect addresses and string offsets.

2018-05-16 Thread Jason Merrill
OK.

On Mon, May 14, 2018 at 9:42 AM, Mark Wielaard  wrote:
> On Mon, 2018-04-30 at 14:35 +0200, Mark Wielaard wrote:
>> We already emit DWARF5 attributes and tables for indirect addresses
>> and string offsets, but still use GNU forms. Add a new helper function
>> dwarf_FORM () for emitting the right form.
>>
>> Currently we only use the uleb128 forms. But DWARF5 also allows
>> 1, 2, 3 and 4 byte forms (DW_FORM_strx[1234] and DW_FORM_addrx[1234])
>> which might be more space efficient.
>
> Ping.
>
> gcc/ChangeLog:
>
> * dwarf2out.c (dwarf_FORM): New function.
> (set_indirect_string): Use dwarf_FORM.
> (reset_indirect_string): Likewise.
> (size_of_die): Likewise.
> (value_format): Likewise.
> (output_die): Likewise.
> (add_skeleton_AT_string): Likewise.
> (output_macinfo_op): Likewise.
> (index_string): Likewise.
> (output_index_string_offset): Likewise.
> (output_index_string): Likewise.
> (count_index_strings): Likewise.
>
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index 340de5b..85a1a8b 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -246,7 +246,7 @@ static GTY (()) hash_table 
> *debug_line_str_hash;
> That is, the comp_dir and dwo_name will appear in both places.
>
> 2) Strings can use four forms: DW_FORM_string, DW_FORM_strp,
> -   DW_FORM_line_strp or DW_FORM_GNU_str_index.
> +   DW_FORM_line_strp or DW_FORM_strx/GNU_str_index.
>
> 3) GCC chooses the form to use late, depending on the size and
> reference count.
> @@ -1757,6 +1757,28 @@ dwarf_TAG (enum dwarf_tag tag)
>return tag;
>  }
>
> +/* And similarly for forms.  */
> +static inline enum dwarf_form
> +dwarf_FORM (enum dwarf_form form)
> +{
> +  switch (form)
> +{
> +case DW_FORM_addrx:
> +  if (dwarf_version < 5)
> +   return DW_FORM_GNU_addr_index;
> +  break;
> +
> +case DW_FORM_strx:
> +  if (dwarf_version < 5)
> +   return DW_FORM_GNU_str_index;
> +  break;
> +
> +default:
> +  break;
> +}
> +  return form;
> +}
> +
>  static unsigned long int get_base_type_offset (dw_die_ref);
>
>  /* Return the size of a location descriptor.  */
> @@ -4387,8 +4409,8 @@ AT_class (dw_attr_node *a)
>  }
>
>  /* Return the index for any attribute that will be referenced with a
> -   DW_FORM_GNU_addr_index or DW_FORM_GNU_str_index.  String indices
> -   are stored in dw_attr_val.v.val_str for reference counting
> +   DW_FORM_addrx/GNU_addr_index or DW_FORM_strx/GNU_str_index.  String
> +   indices are stored in dw_attr_val.v.val_str for reference counting
> pruning.  */
>
>  static inline unsigned int
> @@ -4652,7 +4674,7 @@ set_indirect_string (struct indirect_string_node *node)
>/* Already indirect is a no op.  */
>if (node->form == DW_FORM_strp
>|| node->form == DW_FORM_line_strp
> -  || node->form == DW_FORM_GNU_str_index)
> +  || node->form == dwarf_FORM (DW_FORM_strx))
>  {
>gcc_assert (node->label);
>return;
> @@ -4668,7 +4690,7 @@ set_indirect_string (struct indirect_string_node *node)
>  }
>else
>  {
> -  node->form = DW_FORM_GNU_str_index;
> +  node->form = dwarf_FORM (DW_FORM_strx);
>node->index = NO_INDEX_ASSIGNED;
>  }
>  }
> @@ -4681,7 +4703,7 @@ int
>  reset_indirect_string (indirect_string_node **h, void *)
>  {
>struct indirect_string_node *node = *h;
> -  if (node->form == DW_FORM_strp || node->form == DW_FORM_GNU_str_index)
> +  if (node->form == DW_FORM_strp || node->form == dwarf_FORM (DW_FORM_strx))
>  {
>free (node->label);
>node->label = NULL;
> @@ -9419,7 +9441,7 @@ size_of_die (dw_die_ref die)
>form = AT_string_form (a);
>   if (form == DW_FORM_strp || form == DW_FORM_line_strp)
> size += DWARF_OFFSET_SIZE;
> - else if (form == DW_FORM_GNU_str_index)
> + else if (form == dwarf_FORM (DW_FORM_strx))
> size += size_of_uleb128 (AT_index (a));
>   else
> size += strlen (a->dw_attr_val.v.val_str->str) + 1;
> @@ -9666,7 +9688,7 @@ value_format (dw_attr_node *a)
> case DW_AT_entry_pc:
> case DW_AT_trampoline:
>return (AT_index (a) == NOT_INDEXED
> -  ? DW_FORM_addr : DW_FORM_GNU_addr_index);
> +  ? DW_FORM_addr : dwarf_FORM (DW_FORM_addrx));
> default:
>   break;
> }
> @@ -9839,7 +9861,7 @@ value_format (dw_attr_node *a)
>return DW_FORM_data;
>  case dw_val_class_lbl_id:
>return (AT_index (a) == NOT_INDEXED
> -  ? DW_FORM_addr : DW_FORM_GNU_addr_index);
> +  ? DW_FORM_addr : dwarf_FORM (DW_FORM_addrx));
>  case dw_val_class_lineptr:
>  case dw_val_class_macptr:
>  case dw_val_class_loclistsptr:
> @@ -10807,7 +10829,7 @@ output_die (dw_die_ref die)
>a->dw_attr_val.v.val_str->label,
>   

Re: C++ PATCH for c++/85363, wrong-code with throwing list-initializer

2018-05-16 Thread Marek Polacek
On Wed, May 16, 2018 at 01:53:50PM -0400, Jason Merrill wrote:
> You should be able to avoid duplication here by using cp_get_callee
> rather than *_FN.

Even better, thanks!

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-05-16  Marek Polacek  

PR c++/85363
* call.c (set_flags_from_callee): Handle AGGR_INIT_EXPRs too.
* tree.c (bot_manip): Call set_flags_from_callee for
AGGR_INIT_EXPRs too.

* g++.dg/cpp0x/initlist-throw1.C: New test.
* g++.dg/cpp0x/initlist-throw2.C: New test.

diff --git gcc/cp/call.c gcc/cp/call.c
index 09a3618b007..4d04785f2b9 100644
--- gcc/cp/call.c
+++ gcc/cp/call.c
@@ -319,15 +319,17 @@ build_call_n (tree function, int n, ...)
 void
 set_flags_from_callee (tree call)
 {
-  bool nothrow;
-  tree decl = get_callee_fndecl (call);
+  /* Handle both CALL_EXPRs and AGGR_INIT_EXPRs.  */
+  tree decl = cp_get_callee_fndecl_nofold (call);
 
   /* We check both the decl and the type; a function may be known not to
  throw without being declared throw().  */
-  nothrow = decl && TREE_NOTHROW (decl);
-  if (CALL_EXPR_FN (call))
-  nothrow |= TYPE_NOTHROW_P (TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (call))));
-  else if (internal_fn_flags (CALL_EXPR_IFN (call)) & ECF_NOTHROW)
+  bool nothrow = decl && TREE_NOTHROW (decl);
+  tree callee = cp_get_callee (call);
+  if (callee)
+nothrow |= TYPE_NOTHROW_P (TREE_TYPE (TREE_TYPE (callee)));
+  else if (TREE_CODE (call) == CALL_EXPR
+  && internal_fn_flags (CALL_EXPR_IFN (call)) & ECF_NOTHROW)
 nothrow = true;
 
   if (!nothrow && at_function_scope_p () && cfun && cp_function_chain)
diff --git gcc/cp/tree.c gcc/cp/tree.c
index ecb88df23b9..db81da91676 100644
--- gcc/cp/tree.c
+++ gcc/cp/tree.c
@@ -2987,7 +2987,7 @@ bot_manip (tree* tp, int* walk_subtrees, void* data_)
 
   /* Make a copy of this node.  */
   t = copy_tree_r (tp, walk_subtrees, NULL);
-  if (TREE_CODE (*tp) == CALL_EXPR)
+  if (TREE_CODE (*tp) == CALL_EXPR || TREE_CODE (*tp) == AGGR_INIT_EXPR)
 set_flags_from_callee (*tp);
   if (data.clear_location && EXPR_HAS_LOCATION (*tp))
 SET_EXPR_LOCATION (*tp, input_location);
diff --git gcc/testsuite/g++.dg/cpp0x/initlist-throw1.C 
gcc/testsuite/g++.dg/cpp0x/initlist-throw1.C
index e69de29bb2d..264c6c7a7a0 100644
--- gcc/testsuite/g++.dg/cpp0x/initlist-throw1.C
+++ gcc/testsuite/g++.dg/cpp0x/initlist-throw1.C
@@ -0,0 +1,29 @@
+// PR c++/85363
+// { dg-do run { target c++11 } }
+
+int
+init (int f)
+{
+  throw f;
+}
+
+struct X {
+  X (int f) : n {init (f)} {}
+  int n;
+};
+
+struct P {
+  X x{20};
+};
+
+int
+main ()
+{
+  try {
+P p {};
+  }
+  catch (int n) {
+return 0;
+  }
+  return 1;
+}
diff --git gcc/testsuite/g++.dg/cpp0x/initlist-throw2.C 
gcc/testsuite/g++.dg/cpp0x/initlist-throw2.C
index e69de29bb2d..2bb05834d9e 100644
--- gcc/testsuite/g++.dg/cpp0x/initlist-throw2.C
+++ gcc/testsuite/g++.dg/cpp0x/initlist-throw2.C
@@ -0,0 +1,33 @@
+// PR c++/85363
+// { dg-do run { target c++11 } }
+
+int
+init (int f)
+{
+  throw f;
+}
+
+struct X {
+  X () : n {init (42)} {}
+  int n;
+};
+
+struct P {
+  struct R {
+struct Q {
+  X x = {};
+} q;
+  } r;
+};
+
+int
+main ()
+{
+  try {
+P p {};
+  }
+  catch (int n) {
+return 0;
+  }
+  return 1;
+}


Re: [PATCH, rs6000] Fixes for builtin_prefetch for AIX compatability.

2018-05-16 Thread Segher Boessenkool
Hi Carl,

On Wed, May 16, 2018 at 08:53:24AM -0700, Carl Love wrote:
> The previous patch to map dcbtstt, dcbtt to n2=0 for __builtin_prefetch
> builtin caused issues on AIX.  The issue is AIX does not support
> the dcbtstt and dcbtt extended mnemonics.  Unfortunately, the AIX
> assembler also does not support the three operand form of dcbt and
> dcbtst on Power 7.  
> 
> This patch fixes up the support for dcbtstt and dcbtt to make it
> compatible with Linux and AIX.  The new support now starts with Power 8
> rather than Power 7 on both systems for simplicity.

Okay for trunk.  Thank you!


Segher


> 2018-05-16  Carl Love  
> 
>   * config/rs6000/rs6000.md (prefetch): Generate ISA 2.06 instructions
>   dcbt and dcbtstt with TH=16 if operands[2] is 0 and Power 8 or newer.


Re: [PATCH] DWARF: Add header for .debug_str_offsets table for dwarf_version 5.

2018-05-16 Thread Jason Merrill
On Mon, Apr 30, 2018 at 8:34 AM, Mark Wielaard  wrote:
> DWARF5 defines a small header for .debug_str_offsets.  Since we only use
> it for split dwarf .dwo files we don't need to keep track of the actual
> index offset in an attribute.
>
> gcc/ChangeLog:
>
> * dwarf2out.c (count_index_strings): New function.
> (output_indirect_strings): Call count_index_strings and generate
> header for dwarf_version >= 5.

OK.

Jason


Re: C++ PATCH for c++/85363, wrong-code with throwing list-initializer

2018-05-16 Thread Marek Polacek
On Wed, May 16, 2018 at 11:35:56AM -0400, Jason Merrill wrote:
> On Wed, May 16, 2018 at 11:15 AM, Marek Polacek  wrote:
> > This PR has been on my mind for quite some time but I've not found a 
> > solution
> > that I'd like.  Maybe one of you can think of something better.
> >
> > The problem in this test is that in C++11, .eh optimizes out the catch,
> > so the exception is never caught.  That is because lower_catch doesn't
> > think that the region may throw (eh_region_may_contain_throw).  That's
> > so because P::P () is marked as TREE_NOTHROW, which is wrong, because
> > it calls X::X() which calls init() with throw.  TREE_NOTHROW is set in
> > finish_function:
> >
> >   /* If this function can't throw any exceptions, remember that.  */
> >   if (!processing_template_decl
> >   && !cp_function_chain->can_throw
> >   && !flag_non_call_exceptions
> >   && !decl_replaceable_p (fndecl))
> > TREE_NOTHROW (fndecl) = 1;
> >
> > P::P() should have been marked as can_throw in set_flags_from_callee, but 
> > when
> > processing X::X() cfun is null, so we can't set it.  P::P() is created only
> > later via implicitly_declare_fn.
> 
> This should be handled by bot_manip (under break_out_target_exprs,
> under get_nsdmi), but it seems that it currently only calls
> set_flags_from_callee for CALL_EXPR, not for AGGR_INIT_EXPR as we have
> in this case.

Ah, nice!  So, this tweaks set_flags_from_callee to also work for
AGGR_INIT_EXPRs.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-05-16  Marek Polacek  

PR c++/85363
* call.c (set_flags_from_callee): Handle AGGR_INIT_EXPRs too.
* tree.c (bot_manip): Call set_flags_from_callee for
AGGR_INIT_EXPRs too.

* g++.dg/cpp0x/initlist-throw1.C: New test.
* g++.dg/cpp0x/initlist-throw2.C: New test.

diff --git gcc/cp/call.c gcc/cp/call.c
index 09a3618b007..11b40747932 100644
--- gcc/cp/call.c
+++ gcc/cp/call.c
@@ -319,16 +319,23 @@ build_call_n (tree function, int n, ...)
 void
 set_flags_from_callee (tree call)
 {
-  bool nothrow;
-  tree decl = get_callee_fndecl (call);
+  /* Handle both CALL_EXPRs and AGGR_INIT_EXPRs.  */
+  tree decl = cp_get_callee_fndecl_nofold (call);
 
   /* We check both the decl and the type; a function may be known not to
  throw without being declared throw().  */
-  nothrow = decl && TREE_NOTHROW (decl);
-  if (CALL_EXPR_FN (call))
-  nothrow |= TYPE_NOTHROW_P (TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (call))));
-  else if (internal_fn_flags (CALL_EXPR_IFN (call)) & ECF_NOTHROW)
-nothrow = true;
+  bool nothrow = decl && TREE_NOTHROW (decl);
+  if (TREE_CODE (call) == CALL_EXPR)
+{
+  if (CALL_EXPR_FN (call))
+   nothrow
+ |= TYPE_NOTHROW_P (TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (call))));
+  else if (internal_fn_flags (CALL_EXPR_IFN (call)) & ECF_NOTHROW)
+   nothrow = true;
+}
+  else if (AGGR_INIT_EXPR_FN (call))
+nothrow
+  |= TYPE_NOTHROW_P (TREE_TYPE (TREE_TYPE (AGGR_INIT_EXPR_FN (call))));
 
   if (!nothrow && at_function_scope_p () && cfun && cp_function_chain)
 cp_function_chain->can_throw = 1;
diff --git gcc/cp/tree.c gcc/cp/tree.c
index ecb88df23b9..db81da91676 100644
--- gcc/cp/tree.c
+++ gcc/cp/tree.c
@@ -2987,7 +2987,7 @@ bot_manip (tree* tp, int* walk_subtrees, void* data_)
 
   /* Make a copy of this node.  */
   t = copy_tree_r (tp, walk_subtrees, NULL);
-  if (TREE_CODE (*tp) == CALL_EXPR)
+  if (TREE_CODE (*tp) == CALL_EXPR || TREE_CODE (*tp) == AGGR_INIT_EXPR)
 set_flags_from_callee (*tp);
   if (data.clear_location && EXPR_HAS_LOCATION (*tp))
 SET_EXPR_LOCATION (*tp, input_location);
diff --git gcc/testsuite/g++.dg/cpp0x/initlist-throw1.C 
gcc/testsuite/g++.dg/cpp0x/initlist-throw1.C
index e69de29bb2d..264c6c7a7a0 100644
--- gcc/testsuite/g++.dg/cpp0x/initlist-throw1.C
+++ gcc/testsuite/g++.dg/cpp0x/initlist-throw1.C
@@ -0,0 +1,29 @@
+// PR c++/85363
+// { dg-do run { target c++11 } }
+
+int
+init (int f)
+{
+  throw f;
+}
+
+struct X {
+  X (int f) : n {init (f)} {}
+  int n;
+};
+
+struct P {
+  X x{20};
+};
+
+int
+main ()
+{
+  try {
+P p {};
+  }
+  catch (int n) {
+return 0;
+  }
+  return 1;
+}
diff --git gcc/testsuite/g++.dg/cpp0x/initlist-throw2.C 
gcc/testsuite/g++.dg/cpp0x/initlist-throw2.C
index e69de29bb2d..2bb05834d9e 100644
--- gcc/testsuite/g++.dg/cpp0x/initlist-throw2.C
+++ gcc/testsuite/g++.dg/cpp0x/initlist-throw2.C
@@ -0,0 +1,33 @@
+// PR c++/85363
+// { dg-do run { target c++11 } }
+
+int
+init (int f)
+{
+  throw f;
+}
+
+struct X {
+  X () : n {init (42)} {}
+  int n;
+};
+
+struct P {
+  struct R {
+struct Q {
+  X x = {};
+} q;
+  } r;
+};
+
+int
+main ()
+{
+  try {
+P p {};
+  }
+  catch (int n) {
+return 0;
+  }
+  return 1;
+}


[PATCH] RISC-V: Minor pattern name cleanup.

2018-05-16 Thread Jim Wilson
This just fixes a minor nit with an earlier patch.  I added some new patterns
that are never directly called, and forgot to put the asterisk in the name.

Tested with riscv{32,64}-{elf,linux} builds to verify that the compiler still
builds OK.

Committed.

Jim

gcc/
* config/riscv/riscv.md (si3_mask, si3_mask_1): Prepend
asterisk to name.
(di3_mask, di3_mask_1): Likewise.
---
 gcc/config/riscv/riscv.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 9d222731a06..56fe516dbcf 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1504,7 +1504,7 @@
   [(set_attr "type" "shift")
(set_attr "mode" "SI")])
 
-(define_insn_and_split "si3_mask"
+(define_insn_and_split "*si3_mask"
   [(set (match_operand:SI 0 "register_operand" "= r")
(any_shift:SI
(match_operand:SI 1 "register_operand" "  r")
@@ -1523,7 +1523,7 @@
   [(set_attr "type" "shift")
(set_attr "mode" "SI")])
 
-(define_insn_and_split "si3_mask_1"
+(define_insn_and_split "*si3_mask_1"
   [(set (match_operand:SI 0 "register_operand" "= r")
(any_shift:SI
(match_operand:SI 1 "register_operand" "  r")
@@ -1559,7 +1559,7 @@
   [(set_attr "type" "shift")
(set_attr "mode" "DI")])
 
-(define_insn_and_split "di3_mask"
+(define_insn_and_split "*di3_mask"
   [(set (match_operand:DI 0 "register_operand" "= r")
(any_shift:DI
(match_operand:DI 1 "register_operand" "  r")
@@ -1579,7 +1579,7 @@
   [(set_attr "type" "shift")
(set_attr "mode" "DI")])
 
-(define_insn_and_split "di3_mask_1"
+(define_insn_and_split "*di3_mask_1"
   [(set (match_operand:DI 0 "register_operand" "= r")
(any_shift:DI
(match_operand:DI 1 "register_operand" "  r")
-- 
2.14.1



Re: [PATCH] Support lower and upper limit for -fdbg-cnt flag.

2018-05-16 Thread Alexander Monakov
On Wed, 16 May 2018, Martin Liška wrote:

> Hi.
> 
> I consider it handy sometimes to trigger just a single invocation of
> an optimization driven by a debug counter. Doing that one needs to
> be able to limit both lower and upper limit of a counter. It's implemented
> in the patch.

I'd like to offer some non-reviewer comments on the patch (below)

> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1171,7 +1171,7 @@ List all available debugging counters with their limits 
> and counts.
>  
>  fdbg-cnt=
>  Common RejectNegative Joined Var(common_deferred_options) Defer
> --fdbg-cnt=:[,:,...]  Set the debug counter 
> limit.
> +-fdbg-cnt=[:]:[,::,...]
> Set the debug counter limit.

This line has gotten quite long and repeating the same thing in the second
brackets is not very helpful. Can we use something simpler like this?

-fdbg-cnt=[:]:[,:...]

> +#define DEBUG_COUNTER(a) 1,
> +static unsigned int limit_low[debug_counter_number_of_counters] =
> +{
> +#include "dbgcnt.def"
> +};
> +#undef DEBUG_COUNTER
> +
> +
>  static unsigned int count[debug_counter_number_of_counters];
>  
>  bool
>  dbg_cnt_is_enabled (enum debug_counter index)
>  {
> -  return count[index] <= limit[index];
> +  return limit_low[index] <= count[index] && count[index] <= 
> limit_high[index];

I recall Jakub recently applied a tree-wide change of A < B && B < C to read
B > A && B < C.

Please consider making limit_low non-inclusive by testing for strict inequality
count[index] > limit_low[index]. This will allow to make limit_low[] array
zero-initialized (taking up space only in BSS).

> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -14326,13 +14326,14 @@ Print the name and the counter upper bound for all 
> debug counters.
>  
>  @item -fdbg-cnt=@var{counter-value-list}
>  @opindex fdbg-cnt
> -Set the internal debug counter upper bound.  @var{counter-value-list}
> -is a comma-separated list of @var{name}:@var{value} pairs
> -which sets the upper bound of each debug counter @var{name} to @var{value}.
> +Set the internal debug counter lower and upper bound.  
> @var{counter-value-list}
> +is a comma-separated list of @var{name}:@var{lower_bound}:@var{upper_bound}
> +tuples which sets the lower and the upper bound of each debug
> +counter @var{name}.

Shouldn't this mention that lower bound is optional?

>  All debug counters have the initial upper bound of @code{UINT_MAX};
>  thus @code{dbg_cnt} returns true always unless the upper bound
>  is set by this option.
> -For example, with @option{-fdbg-cnt=dce:10,tail_call:0},
> +For example, with @option{-fdbg-cnt=dce:9:10,tail_call:0},
>  @code{dbg_cnt(dce)} returns true only for first 10 invocations.

This seems confusing, you added a lower bound to the 'dce' counter,
but the following text remains unchanged and says it's enabled for
first 10 calls?

Alexander

Re: [RFC][PR82479] missing popcount builtin detection

2018-05-16 Thread Kugan Vivekanandarajah
Hi Richard,

On 6 March 2018 at 02:24, Richard Biener  wrote:
> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah
>  wrote:
>> Hi Richard,
>>
>> On 1 February 2018 at 23:21, Richard Biener  
>> wrote:
>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah
>>>  wrote:
 Hi Richard,

 On 31 January 2018 at 21:39, Richard Biener  
 wrote:
> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah
>  wrote:
>> Hi Richard,
>>
>> Thanks for the review.
>> On 25 January 2018 at 20:04, Richard Biener  
>> wrote:
>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah
>>>  wrote:
 Hi All,

 Here is a patch for popcount builtin detection similar to LLVM. I
 would like to queue this for review for next stage 1.

 1. This is done part of loop-distribution and effective for -O3 and 
 above.
 2. This does not distribute loop to detect popcount (like
 memcpy/memmove). I dont think that happens in practice. Please correct
 me if I am wrong.
>>>
>>> But then it has no business inside loop distribution but instead is
>>> doing final value
>>> replacement, right?  You are pattern-matching the whole loop after all. 
>>>  I think
>>> final value replacement would already do the correct thing if you
>>> teached number of
>>> iteration analysis that niter for
>>>
>>>[local count: 955630224]:
>>>   # b_11 = PHI 
>>>   _1 = b_11 + -1;
>>>   b_8 = _1 & b_11;
>>>   if (b_8 != 0)
>>> goto ; [89.00%]
>>>   else
>>> goto ; [11.00%]
>>>
>>>[local count: 850510900]:
>>>   goto ; [100.00%]
>>
>> I am looking into this approach. What should be the scalar evolution
>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me
>> and can this be represented with the scev?
>
> No, it's not affine and thus cannot be represented.  You only need the
> scalar evolution of the counting IV which is already handled and
> the number of iteration analysis needs to handle the above IV - this
> is the missing part.
 Thanks for the clarification. I am now matching this loop pattern in
 number_of_iterations_exit when number_of_iterations_exit_assumptions
 fails. If the pattern matches, I am inserting the _builtin_popcount in
the loop preheader and setting the loop niter with this. This will be
 used by the final value replacement. Is this what you wanted?
>>>
>>> No, you shouldn't insert a popcount stmt but instead the niter
>>> GENERIC tree should be a CALL_EXPR to popcount with the
>>> appropriate argument.
>>
>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if
>> niter in tree_niter_desc can take such.
>>
>> Attached patch now does this. Also had to add support for CALL_EXPR in
>> few places to handle niter with CALL_EXPR. Does this look OK?
>
> Overall this looks ok - the patch includes changes in places that I don't 
> think
> need changes such as chrec_convert_1 or extract_ops_from_tree.
> The expression_expensive_p change should be more specific than making
> all calls inexpensive as well.

Changed it.

>
> The verify_ssa change looks bogus, you do
>
> +  dest = gimple_phi_result (count_phi);
> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);
> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
> +
> +  var = build_call_expr (fn, 1, src);
> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,
> +   build_int_cst (TREE_TYPE (dest), 1));
>
> why do you allocate a new SSA name here?  It seems unused
> as you overwrive 'var' with the CALL_EXPR immediately.
Changed now.

>
> I didn't review the pattern matching thoroughly nor the exact place you
> call it.  But
>
> +  if (check_popcount_pattern (loop, &count))
> +   {
> + niter->assumptions = boolean_false_node;
> + niter->control.base = NULL_TREE;
> + niter->control.step = NULL_TREE;
> + niter->control.no_overflow = false;
> + niter->niter = count;
> + niter->assumptions = boolean_true_node;
> + niter->may_be_zero = boolean_false_node;
> + niter->max = -1;
> + niter->bound = NULL_TREE;
> + niter->cmp = ERROR_MARK;
> + return true;
> +   }
>
> simply setting may_be_zero to false looks fishy.
Should I set this to (argument to popcount == zero)?

> Try with -fno-tree-loop-ch.
I changed the pattern matching to handle loops without header copying
too. It looks a bit complicated checking all the conditions; I wonder
whether this can be done in a simpler, easier-to-read way.

>  Also max should not 

[RFC][PR64946] "abs" vectorization fails for char/short types

2018-05-16 Thread Kugan Vivekanandarajah
As mentioned in the PR, I am trying to add ABSU_EXPR to fix this
issue. In the attached patch, in fold_cond_expr_with_comparison I am
generating ABSU_EXPR for these cases. As I understand it, ABSU_EXPR is
well defined in RTL, so the issue is generating ABSU_EXPR and
transferring it to RTL in the correct way. I am not sure I am doing
all that is needed; I will clean up and add more test cases based on
the feedback.

Thanks,
Kugan


gcc/ChangeLog:

2018-05-13  Kugan Vivekanandarajah  

* expr.c (expand_expr_real_2): Handle ABSU_EXPR.
* fold-const.c (fold_cond_expr_with_comparison): Generate ABSU_EXPR
(fold_unary_loc): Handle ABSU_EXPR.
* optabs-tree.c (optab_for_tree_code): Likewise.
* tree-cfg.c (verify_expr): Likewise.
(verify_gimple_assign_unary):  Likewise.
* tree-if-conv.c (fold_build_cond_expr):  Likewise.
* tree-inline.c (estimate_operator_cost):  Likewise.
* tree-pretty-print.c (dump_generic_node):  Likewise.
* tree.def (ABSU_EXPR): New.

gcc/testsuite/ChangeLog:

2018-05-13  Kugan Vivekanandarajah  

* gcc.dg/absu.c: New test.
diff --git a/gcc/expr.c b/gcc/expr.c
index 5e3d9a5..67f8dd1 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9063,6 +9063,7 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
   return REDUCE_BIT_FIELD (temp);
 
 case ABS_EXPR:
+case ABSU_EXPR:
   op0 = expand_expr (treeop0, subtarget,
 VOIDmode, EXPAND_NORMAL);
   if (modifier == EXPAND_STACK_PARM)
@@ -9074,7 +9075,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 
   /* Unsigned abs is simply the operand.  Testing here means we don't
 risk generating incorrect code below.  */
-  if (TYPE_UNSIGNED (type))
+  if (TYPE_UNSIGNED (type)
+ && (code != ABSU_EXPR))
return op0;
 
   return expand_abs (mode, op0, target, unsignedp,
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 3a99b66..6e80178 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -5324,8 +5324,17 @@ fold_cond_expr_with_comparison (location_t loc, tree type,
   case GT_EXPR:
if (TYPE_UNSIGNED (TREE_TYPE (arg1)))
  break;
-   tem = fold_build1_loc (loc, ABS_EXPR, TREE_TYPE (arg1), arg1);
-   return fold_convert_loc (loc, type, tem);
+   if (TREE_CODE (arg1) == NOP_EXPR)
+ {
+   arg1 = TREE_OPERAND (arg1, 0);
+   tem = fold_build1_loc (loc, ABSU_EXPR, unsigned_type_for (arg1_type), arg1);
+   return fold_convert_loc (loc, type, tem);
+ }
+   else
+ {
+   tem = fold_build1_loc (loc, ABS_EXPR, TREE_TYPE (arg1), arg1);
+   return fold_convert_loc (loc, type, tem);
+ }
   case UNLE_EXPR:
   case UNLT_EXPR:
if (flag_trapping_math)
@@ -7698,7 +7707,8 @@ fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0)
   if (arg0)
 {
   if (CONVERT_EXPR_CODE_P (code)
- || code == FLOAT_EXPR || code == ABS_EXPR || code == NEGATE_EXPR)
+ || code == FLOAT_EXPR || code == ABS_EXPR
+ || code == ABSU_EXPR || code == NEGATE_EXPR)
{
  /* Don't use STRIP_NOPS, because signedness of argument type
 matters.  */
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 71e172c..2b812e5 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -235,6 +235,7 @@ optab_for_tree_code (enum tree_code code, const_tree type,
   return trapv ? negv_optab : neg_optab;
 
 case ABS_EXPR:
+case ABSU_EXPR:
   return trapv ? absv_optab : abs_optab;
 
 default:
diff --git a/gcc/testsuite/gcc.dg/absu.c b/gcc/testsuite/gcc.dg/absu.c
index e69de29..43e651b 100644
--- a/gcc/testsuite/gcc.dg/absu.c
+++ b/gcc/testsuite/gcc.dg/absu.c
@@ -0,0 +1,39 @@
+
+/* { dg-do run  } */
+/* { dg-options "-O0" } */
+
+#include <limits.h>
+#define ABS(x) (((x) >= 0) ? (x) : -(x))
+
+#define DEF_TEST(TYPE) \
+void foo_##TYPE (signed TYPE x, unsigned TYPE y){  \
+TYPE t = ABS (x);  \
+if (t != y)\
+   __builtin_abort (); \
+}  \
+
+DEF_TEST (char);
+DEF_TEST (short);
+DEF_TEST (int);
+DEF_TEST (long);
+void main ()
+{
+  foo_char (SCHAR_MIN + 1, SCHAR_MAX);
+  foo_char (0, 0);
+  foo_char (SCHAR_MAX, SCHAR_MAX);
+
+  foo_int (-1, 1);
+  foo_int (0, 0);
+  foo_int (INT_MAX, INT_MAX);
+  foo_int (INT_MIN + 1, INT_MAX);
+
+  foo_short (-1, 1);
+  foo_short (0, 0);
+  foo_short (SHRT_MAX, SHRT_MAX);
+  foo_short (SHRT_MIN + 1, SHRT_MAX);
+
+  foo_long (-1, 1);
+  foo_long (0, 0);
+  foo_long (LONG_MAX, LONG_MAX);
+  foo_long (LONG_MIN + 1, LONG_MAX);
+}
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 9485f73..59a115c 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3167,6 +3167,9 @@ verify_expr (tree *tp, int *walk_subtrees, void *data 

Re: [RFC][PR64946] "abs" vectorization fails for char/short types

2018-05-16 Thread Andrew Pinski
On Wed, May 16, 2018 at 7:14 PM, Kugan Vivekanandarajah
 wrote:
> As mentioned in the PR, I am trying to add ABSU_EXPR to fix this
> issue. In the attached patch, in fold_cond_expr_with_comparison I am
> generating ABSU_EXPR for these cases. As I understand it, ABSU_EXPR is
> well defined in RTL, so the issue is generating ABSU_EXPR and
> transferring it to RTL in the correct way. I am not sure I am doing
> all that is needed; I will clean up and add more test cases based on
> the feedback.


diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 71e172c..2b812e5 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -235,6 +235,7 @@ optab_for_tree_code (enum tree_code code, const_tree type,
   return trapv ? negv_optab : neg_optab;

 case ABS_EXPR:
+case ABSU_EXPR:
   return trapv ? absv_optab : abs_optab;


This part is not correct; it should be something like this:

 case ABS_EXPR:
   return trapv ? absv_optab : abs_optab;
+case ABSU_EXPR:
+   return abs_optab;

Because ABSU is not undefined at the TYPE_MAX.

Thanks,
Andrew

>
> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2018-05-13  Kugan Vivekanandarajah  
>
> * expr.c (expand_expr_real_2): Handle ABSU_EXPR.
> * fold-const.c (fold_cond_expr_with_comparison): Generate ABSU_EXPR
> (fold_unary_loc): Handle ABSU_EXPR.
> * optabs-tree.c (optab_for_tree_code): Likewise.
> * tree-cfg.c (verify_expr): Likewise.
> (verify_gimple_assign_unary):  Likewise.
> * tree-if-conv.c (fold_build_cond_expr):  Likewise.
> * tree-inline.c (estimate_operator_cost):  Likewise.
> * tree-pretty-print.c (dump_generic_node):  Likewise.
> * tree.def (ABSU_EXPR): New.
>
> gcc/testsuite/ChangeLog:
>
> 2018-05-13  Kugan Vivekanandarajah  
>
> * gcc.dg/absu.c: New test.


Re: [PR63185][RFC] Improve DSE with branches

2018-05-16 Thread Jeff Law
On 05/15/2018 08:42 AM, Richard Biener wrote:
> On Tue, 15 May 2018, Richard Biener wrote:
> 
>> On Tue, 15 May 2018, Richard Biener wrote:
>>
>>> On Tue, 15 May 2018, Richard Biener wrote:
>>>
 On Tue, 15 May 2018, Richard Biener wrote:

> On Mon, 14 May 2018, Kugan Vivekanandarajah wrote:
>
>> Hi,
>>
>> Attached patch handles PR63185 when we reach PHI with temp != NULL.
>> We could see the PHI and if there isn't any uses for PHI that is
>> interesting, we could ignore that ?
>>
>> Bootstrapped and regression tested on x86_64-linux-gnu.
>> Is this OK?
>
> No, as Jeff said we can't do it this way.
>
> If we end up with multiple VDEFs in the walk of defvar immediate
> uses we know we are dealing with a CFG fork.  We can't really
> ignore any of the paths but we have to
>
>  a) find the merge point (and the associated VDEF)
>  b) verify for each chain of VDEFs with associated VUSEs
> up to that merge VDEF that we have no uses of the to classify
> store and collect (partial) kills
>  c) intersect kill info and continue walking from the merge point
>
> in b) there's the optional possibility to find sinking opportunities
> in case we have kills on some paths but uses on others.  This is why
> DSE should be really merged with (store) sinking.
>
> So if we want to enhance DSEs handling of branches then we need
> to refactor the simple dse_classify_store function.  Let me take
> an attempt at this today.

 First (baby) step is the following - it arranges to collect the
 defs we need to continue walking from and implements trivial
 reduction by stopping at (full) kills.  This allows us to handle
 the new testcase (which was already handled in the very late DSE
 pass with the help of sinking the store).

 I took the opportunity to kill the use_stmt parameter of
 dse_classify_store as the only user is only looking for whether
 the kills were all clobbers which I added a new parameter for.

 I didn't adjust the byte-tracking case fully (I don't fully understand
 the code in the case of a use, and I'm not sure whether it's worth
 doing the def reduction with byte-tracking).

 Your testcase can be handled by reducing the PHI and the call def
 by seeing that the only use of a candidate def is another def
 we have already processed.  Not sure if worth special-casing though,
 I'd rather have a go at "recursing".  That will be the next
 patch.

 Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>>>
>>> Applied.
>>>
>>> Another intermediate one below, fixing the byte-tracking for
>>> stmt with uses.  This also re-does the PHI handling by simply
>>> avoiding recursion by means of a visited bitmap and stopping
>>> at the DSE classify stmt when re-visiting it instead of failing.
>>> This covers Prathamesh's loop case for which I added ssa-dse-33.c.
>>> For the do-while loop this still runs into the inability to
>>> handle two defs to walk from.
>>>
>>> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>>
>> Ok, loop handling doesn't work in general since we run into the
>> issue that SSA form across the backedge is not representing the
>> same values.  Consider
>>
>>  
>>  # .MEM_22 = PHI <.MEM_12(D)(2), .MEM_13(4)>
>>  # n_20 = PHI <0(2), n_7(4)>
>>  # .MEM_13 = VDEF <.MEM_22>
>>  bytes[n_20] = _4;
>>  if (n_20 > 7)
>>goto ;
>>
>>  
>>  n_7 = n_20 + 1;
>>  # .MEM_15 = VDEF <.MEM_13>
>>  bytes[n_20] = _5;
>>  goto ;
>>
>> then when classifying the store in bb4, visiting the PHI node
>> gets us to the store in bb3 which appears to be killing.
>>
>>if (gimple_code (temp) == GIMPLE_PHI)
>> -   defvar = PHI_RESULT (temp);
>> +   {
>> + /* If we visit this PHI by following a backedge then reset
>> +any info in ref that may refer to SSA names which we'd need
>> +to PHI translate.  */
>> + if (gimple_bb (temp) == gimple_bb (stmt)
>> + || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt),
>> +gimple_bb (temp)))
>> +   /* ???  ref->ref may not refer to SSA names or it may only
>> +  refer to SSA names that are invariant with respect to the
>> +  loop represented by this PHI node.  */
>> +   ref->ref = NULL_TREE;
>> + defvar = PHI_RESULT (temp);
>> + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar));
>> +   }
>>
>> should be a workable solution for that.  I'm checking that, but
>> eventually you can think of other things that might prevent us from
>> handling backedges.  Note the current code tries to allow
>> looking across loops but not handle backedges of loops the
>> original stmt belongs to.
> 
> Just to mention before I leave for the day I think I've identified
> a latent issue where I just fail to produce a testcase right now
> 

Re: [PR63185][RFC] Improve DSE with branches

2018-05-16 Thread Jeff Law
On 05/16/2018 04:12 AM, Richard Biener wrote:
> On Tue, 15 May 2018, Richard Biener wrote:
> 
>> On Tue, 15 May 2018, Richard Biener wrote:
>>
>>> On Tue, 15 May 2018, Richard Biener wrote:
>>>
 On Tue, 15 May 2018, Richard Biener wrote:

> On Tue, 15 May 2018, Richard Biener wrote:
>
>> On Mon, 14 May 2018, Kugan Vivekanandarajah wrote:
>>
>>> Hi,
>>>
>>> Attached patch handles PR63185 when we reach PHI with temp != NULL.
>>> We could see the PHI and if there isn't any uses for PHI that is
>>> interesting, we could ignore that ?
>>>
>>> Bootstrapped and regression tested on x86_64-linux-gnu.
>>> Is this OK?
>>
>> No, as Jeff said we can't do it this way.
>>
>> If we end up with multiple VDEFs in the walk of defvar immediate
>> uses we know we are dealing with a CFG fork.  We can't really
>> ignore any of the paths but we have to
>>
>>  a) find the merge point (and the associated VDEF)
>>  b) verify for each chain of VDEFs with associated VUSEs
>> up to that merge VDEF that we have no uses of the to classify
>> store and collect (partial) kills
>>  c) intersect kill info and continue walking from the merge point
>>
>> in b) there's the optional possibility to find sinking opportunities
>> in case we have kills on some paths but uses on others.  This is why
>> DSE should be really merged with (store) sinking.
>>
>> So if we want to enhance DSEs handling of branches then we need
>> to refactor the simple dse_classify_store function.  Let me take
>> an attempt at this today.
>
> First (baby) step is the following - it arranges to collect the
> defs we need to continue walking from and implements trivial
> reduction by stopping at (full) kills.  This allows us to handle
> the new testcase (which was already handled in the very late DSE
> pass with the help of sinking the store).
>
> I took the opportunity to kill the use_stmt parameter of
> dse_classify_store as the only user is only looking for whether
> the kills were all clobbers which I added a new parameter for.
>
> I didn't adjust the byte-tracking case fully (I don't fully understand
> the code in the case of a use, and I'm not sure whether it's worth
> doing the def reduction with byte-tracking).
>
> Your testcase can be handled by reducing the PHI and the call def
> by seeing that the only use of a candidate def is another def
> we have already processed.  Not sure if worth special-casing though,
> I'd rather have a go at "recursing".  That will be the next
> patch.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.

 Applied.

 Another intermediate one below, fixing the byte-tracking for
 stmt with uses.  This also re-does the PHI handling by simply
 avoiding recursion by means of a visited bitmap and stopping
 at the DSE classify stmt when re-visiting it instead of failing.
 This covers Prathamesh's loop case for which I added ssa-dse-33.c.
 For the do-while loop this still runs into the inability to
 handle two defs to walk from.

 Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>>>
>>> Ok, loop handling doesn't work in general since we run into the
>>> issue that SSA form across the backedge is not representing the
>>> same values.  Consider
>>>
>>>  
>>>  # .MEM_22 = PHI <.MEM_12(D)(2), .MEM_13(4)>
>>>  # n_20 = PHI <0(2), n_7(4)>
>>>  # .MEM_13 = VDEF <.MEM_22>
>>>  bytes[n_20] = _4;
>>>  if (n_20 > 7)
>>>goto ;
>>>
>>>  
>>>  n_7 = n_20 + 1;
>>>  # .MEM_15 = VDEF <.MEM_13>
>>>  bytes[n_20] = _5;
>>>  goto ;
>>>
>>> then when classifying the store in bb4, visiting the PHI node
>>> gets us to the store in bb3 which appears to be killing.
>>>
>>>if (gimple_code (temp) == GIMPLE_PHI)
>>> -   defvar = PHI_RESULT (temp);
>>> +   {
>>> + /* If we visit this PHI by following a backedge then reset
>>> +any info in ref that may refer to SSA names which we'd need
>>> +to PHI translate.  */
>>> + if (gimple_bb (temp) == gimple_bb (stmt)
>>> + || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt),
>>> +gimple_bb (temp)))
>>> +   /* ???  ref->ref may not refer to SSA names or it may only
>>> +  refer to SSA names that are invariant with respect to the
>>> +  loop represented by this PHI node.  */
>>> +   ref->ref = NULL_TREE;
>>> + defvar = PHI_RESULT (temp);
>>> + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar));
>>> +   }
>>>
>>> should be a workable solution for that.  I'm checking that, but
>>> eventually you can think of other things that might prevent us from
>>> handling backedges.  Note the current code tries to allow
>>> looking across loops but not handle backedges of loops the
>>> 

Re: [PATCH] ARC: Add multilib support for linux targets

2018-05-16 Thread Andrew Burgess
* Alexey Brodkin  [2018-05-16 22:42:36 +0300]:

> We used to build baremetal (AKA Elf32) multilibbed toolchains for years
> now but never made that for Linux targets since there were problems with
> uClibc in a multilib setup. Now with the help of Crosstool-NG it is finally
> possible to create uClibc-based multilibbed toolchains, and so we add
> relevant CPUs for multilib in case of configuration for "arc*-*-linux*".
> 
> This will be especially useful for glibc-based multilibbed toolchains
> in the future.
> 
> gcc/
> 2018-05-16  Alexey Brodkin 
> 
> * config.gcc: Add arc/t-multilib-linux to tmake_file for
> arc*-*-linux*.
> * config/arc/t-multilib-linux: Specify MULTILIB_OPTIONS and
> MULTILIB_DIRNAMES

Looks good.

Thanks,
Andrew

> ---
>  gcc/config.gcc  |  2 +-
>  gcc/config/arc/t-multilib-linux | 25 +
>  2 files changed, 26 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/config/arc/t-multilib-linux
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index a5defb0f0058..8e038a72f613 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -1059,7 +1059,7 @@ arc*-*-elf*)
>   ;;
>  arc*-*-linux*)
> tm_file="arc/arc-arch.h dbxelf.h elfos.h gnu-user.h linux.h linux-android.h glibc-stdint.h arc/linux.h ${tm_file}"
> - tmake_file="${tmake_file} arc/t-arc"
> + tmake_file="${tmake_file} arc/t-multilib-linux arc/t-arc"
>   extra_gcc_objs="driver-arc.o"
>   if test "x$with_cpu" != x; then
>   tm_defines="${tm_defines} TARGET_CPU_BUILD=PROCESSOR_$with_cpu"
> diff --git a/gcc/config/arc/t-multilib-linux b/gcc/config/arc/t-multilib-linux
> new file mode 100644
> index ..f357cfc3f926
> --- /dev/null
> +++ b/gcc/config/arc/t-multilib-linux
> @@ -0,0 +1,25 @@
> +# Copyright (C) 2018 Free Software Foundation, Inc.
> +#
> +# This file is part of GCC.
> +#
> +# GCC is free software; you can redistribute it and/or modify it under
> +# the terms of the GNU General Public License as published by the Free
> +# Software Foundation; either version 3, or (at your option) any later
> +# version.
> +#
> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +# for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# <http://www.gnu.org/licenses/>.
> +
> +MULTILIB_OPTIONS = mcpu=hs/mcpu=archs/mcpu=hs38/mcpu=hs38_linux/mcpu=arc700/mcpu=nps400
> +
> +MULTILIB_DIRNAMES = hs archs hs38 hs38_linux arc700 nps400
> +
> +# Aliases:
> +MULTILIB_MATCHES += mcpu?arc700=mA7
> +MULTILIB_MATCHES += mcpu?arc700=mARC700
> -- 
> 2.17.0
> 


[PATCH] ARC: Add multilib support for linux targets

2018-05-16 Thread Alexey Brodkin
We used to build baremetal (AKA Elf32) multilibbed toolchains for years
now but never made that for Linux targets since there were problems with
uClibc in a multilib setup. Now with the help of Crosstool-NG it is finally
possible to create uClibc-based multilibbed toolchains, and so we add
relevant CPUs for multilib in case of configuration for "arc*-*-linux*".

This will be especially useful for glibc-based multilibbed toolchains
in the future.

gcc/
2018-05-16  Alexey Brodkin 

* config.gcc: Add arc/t-multilib-linux to tmake_file for
arc*-*-linux*.
* config/arc/t-multilib-linux: Specify MULTILIB_OPTIONS and
MULTILIB_DIRNAMES
---
 gcc/config.gcc  |  2 +-
 gcc/config/arc/t-multilib-linux | 25 +
 2 files changed, 26 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/arc/t-multilib-linux

diff --git a/gcc/config.gcc b/gcc/config.gcc
index a5defb0f0058..8e038a72f613 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1059,7 +1059,7 @@ arc*-*-elf*)
;;
 arc*-*-linux*)
tm_file="arc/arc-arch.h dbxelf.h elfos.h gnu-user.h linux.h linux-android.h glibc-stdint.h arc/linux.h ${tm_file}"
-   tmake_file="${tmake_file} arc/t-arc"
+   tmake_file="${tmake_file} arc/t-multilib-linux arc/t-arc"
extra_gcc_objs="driver-arc.o"
if test "x$with_cpu" != x; then
tm_defines="${tm_defines} TARGET_CPU_BUILD=PROCESSOR_$with_cpu"
diff --git a/gcc/config/arc/t-multilib-linux b/gcc/config/arc/t-multilib-linux
new file mode 100644
index ..f357cfc3f926
--- /dev/null
+++ b/gcc/config/arc/t-multilib-linux
@@ -0,0 +1,25 @@
+# Copyright (C) 2018 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+#
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+MULTILIB_OPTIONS = mcpu=hs/mcpu=archs/mcpu=hs38/mcpu=hs38_linux/mcpu=arc700/mcpu=nps400
+
+MULTILIB_DIRNAMES = hs archs hs38 hs38_linux arc700 nps400
+
+# Aliases:
+MULTILIB_MATCHES += mcpu?arc700=mA7
+MULTILIB_MATCHES += mcpu?arc700=mARC700
-- 
2.17.0



Re: C++ PATCH for c++/85363, wrong-code with throwing list-initializer

2018-05-16 Thread Jason Merrill
OK.

On Wed, May 16, 2018 at 2:44 PM, Marek Polacek  wrote:
> On Wed, May 16, 2018 at 01:53:50PM -0400, Jason Merrill wrote:
>> You should be able to avoid duplication here by using cp_get_callee
>> rather than *_FN.
>
> Even better, thanks!
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2018-05-16  Marek Polacek  
>
> PR c++/85363
> * call.c (set_flags_from_callee): Handle AGGR_INIT_EXPRs too.
> * tree.c (bot_manip): Call set_flags_from_callee for
> AGGR_INIT_EXPRs too.
>
> * g++.dg/cpp0x/initlist-throw1.C: New test.
> * g++.dg/cpp0x/initlist-throw2.C: New test.
>
> diff --git gcc/cp/call.c gcc/cp/call.c
> index 09a3618b007..4d04785f2b9 100644
> --- gcc/cp/call.c
> +++ gcc/cp/call.c
> @@ -319,15 +319,17 @@ build_call_n (tree function, int n, ...)
>  void
>  set_flags_from_callee (tree call)
>  {
> -  bool nothrow;
> -  tree decl = get_callee_fndecl (call);
> +  /* Handle both CALL_EXPRs and AGGR_INIT_EXPRs.  */
> +  tree decl = cp_get_callee_fndecl_nofold (call);
>
>/* We check both the decl and the type; a function may be known not to
>   throw without being declared throw().  */
> -  nothrow = decl && TREE_NOTHROW (decl);
> -  if (CALL_EXPR_FN (call))
> -nothrow |= TYPE_NOTHROW_P (TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (call))));
> -  else if (internal_fn_flags (CALL_EXPR_IFN (call)) & ECF_NOTHROW)
> +  bool nothrow = decl && TREE_NOTHROW (decl);
> +  tree callee = cp_get_callee (call);
> +  if (callee)
> +nothrow |= TYPE_NOTHROW_P (TREE_TYPE (TREE_TYPE (callee)));
> +  else if (TREE_CODE (call) == CALL_EXPR
> +  && internal_fn_flags (CALL_EXPR_IFN (call)) & ECF_NOTHROW)
>  nothrow = true;
>
>if (!nothrow && at_function_scope_p () && cfun && cp_function_chain)
> diff --git gcc/cp/tree.c gcc/cp/tree.c
> index ecb88df23b9..db81da91676 100644
> --- gcc/cp/tree.c
> +++ gcc/cp/tree.c
> @@ -2987,7 +2987,7 @@ bot_manip (tree* tp, int* walk_subtrees, void* data_)
>
>/* Make a copy of this node.  */
>t = copy_tree_r (tp, walk_subtrees, NULL);
> -  if (TREE_CODE (*tp) == CALL_EXPR)
> +  if (TREE_CODE (*tp) == CALL_EXPR || TREE_CODE (*tp) == AGGR_INIT_EXPR)
>  set_flags_from_callee (*tp);
>if (data.clear_location && EXPR_HAS_LOCATION (*tp))
>  SET_EXPR_LOCATION (*tp, input_location);
> diff --git gcc/testsuite/g++.dg/cpp0x/initlist-throw1.C gcc/testsuite/g++.dg/cpp0x/initlist-throw1.C
> index e69de29bb2d..264c6c7a7a0 100644
> --- gcc/testsuite/g++.dg/cpp0x/initlist-throw1.C
> +++ gcc/testsuite/g++.dg/cpp0x/initlist-throw1.C
> @@ -0,0 +1,29 @@
> +// PR c++/85363
> +// { dg-do run { target c++11 } }
> +
> +int
> +init (int f)
> +{
> +  throw f;
> +}
> +
> +struct X {
> +  X (int f) : n {init (f)} {}
> +  int n;
> +};
> +
> +struct P {
> +  X x{20};
> +};
> +
> +int
> +main ()
> +{
> +  try {
> +P p {};
> +  }
> +  catch (int n) {
> +return 0;
> +  }
> +  return 1;
> +}
> diff --git gcc/testsuite/g++.dg/cpp0x/initlist-throw2.C gcc/testsuite/g++.dg/cpp0x/initlist-throw2.C
> index e69de29bb2d..2bb05834d9e 100644
> --- gcc/testsuite/g++.dg/cpp0x/initlist-throw2.C
> +++ gcc/testsuite/g++.dg/cpp0x/initlist-throw2.C
> @@ -0,0 +1,33 @@
> +// PR c++/85363
> +// { dg-do run { target c++11 } }
> +
> +int
> +init (int f)
> +{
> +  throw f;
> +}
> +
> +struct X {
> +  X () : n {init (42)} {}
> +  int n;
> +};
> +
> +struct P {
> +  struct R {
> +struct Q {
> +  X x = {};
> +} q;
> +  } r;
> +};
> +
> +int
> +main ()
> +{
> +  try {
> +P p {};
> +  }
> +  catch (int n) {
> +return 0;
> +  }
> +  return 1;
> +}


[PATCH , rs6000] Add missing builtin test cases, fix arguments to match specifications.

2018-05-16 Thread Carl Love
GCC maintainers:

The following patch adds various missing builtin test cases.  I also
went through the various test files and made sure that each test had a
corresponding instruction count test if appropriate.  In some cases, I
had to add count tests.  For one of the tests, I had to create a BE and
LE version as the instruction counts are different on the two
platforms.

The patch has been tested on:

powerpc64le-unknown-linux-gnu (Power 8 LE)
powerpc64le-unknown-linux-gnu (Power 9 LE)
powerpc64-unknown-linux-gnu (Power 8 BE)

With no regressions.

Please let me know if the patch looks OK for GCC mainline.

 Carl Love
---

gcc/testsuite/ChangeLog:

2018-05-15  Carl Love  

* gcc.target/powerpc/altivec-12.c (main): Change vector char ucz to
vector unsigned char ucz.
* gcc.target/powerpc/altivec-7-be.c (dg-do): Fix target.
Update instruction counts.
* gcc.target/powerpc/altivec-7-le.c (dg-final): Update instruction
counts.
* gcc.target/powerpc/altivec-7.h (main): Add vec_unpackh and
vec_unpackl tests.
* gcc.target/powerpc/builtins-1-le.c (Do not override) Change target
to LE.
(scan-assembler-times): Clean up arguments.  Add instruction counts
for new tests.
* gcc.target/powerpc/builtins-1-be.c (scan-assembler-times): Clean up
arguments.
Add instruction counts for new tests.
* gcc.target/powerpc/builtins-1.h (main): Add test case for vec_and.
vec_round, vec_rsqrt, vec_rsqrte, vec_mergee, vec_mergh, vec_mergo.
Remove vec_ctf tests returning double.  Remove vec_cts with
double args. Remove vec_sel with invalid arguments. Add tests for
vec_splat.
* gcc.target/powerpc/builtins-3-runnable.c (main): Add test for
vec_doublee, vec_doubleo, vec_doublel, vec_doubleh, vec_signed,
vec_unsigned.
* gcc.target/powerpc/builtins-3.c: Rename to builtins-3-be.h.
Add tests test_sll_vuill_vuill_vuc, test_sll_vsill_vsill_vuc.
Move dg-final checks for BE to builtins-3-be.c.
Move dg-final checks for LE to builtins-3-le.c.
* gcc.target/powerpc/builtins-3-be.c: New file.
* gcc.target/powerpc/builtins-3-le.c: New file.
* gcc.target/powerpc/p9-xxbr-2.c (rev_bool_long_long): Added test for
vec_revb.
* gcc.target/powerpc/vsx-7-be.c (dg-do): Make target BE. Clean up
scan-assembler-times arguments.
* gcc.target/powerpc/vsx-builtin-7.c: Add test functions splat_sc_s8,
splat_uc_u8, splat_ssi_s16, splat_usi_s16, splat_si_s32, splat_ui_u32,
splat_sll, splat_uc, splat_int128, splat_uint128.
Make second argument of vec_extract and vec_insert a signed int.
* gcc.target/powerpc/vsx-vector-5.c (vrint): Add vrint test for float
argument.
---
 gcc/testsuite/gcc.target/powerpc/altivec-12.c  |   2 +-
 gcc/testsuite/gcc.target/powerpc/altivec-7-be.c|  23 +-
 gcc/testsuite/gcc.target/powerpc/altivec-7-le.c|  25 +-
 gcc/testsuite/gcc.target/powerpc/altivec-7.h   |  17 +
 gcc/testsuite/gcc.target/powerpc/builtins-1-be.c   | 110 ---
 gcc/testsuite/gcc.target/powerpc/builtins-1-le.c   | 117 ---
 gcc/testsuite/gcc.target/powerpc/builtins-1.h  |  53 +++-
 gcc/testsuite/gcc.target/powerpc/builtins-3-be.c   |  77 +
 gcc/testsuite/gcc.target/powerpc/builtins-3-le.c   |  77 +
 .../gcc.target/powerpc/builtins-3-runnable.c   |  23 +-
 gcc/testsuite/gcc.target/powerpc/builtins-3.c  | 342 -
 gcc/testsuite/gcc.target/powerpc/builtins-3.h  | 309 +++
 gcc/testsuite/gcc.target/powerpc/p9-xxbr-2.c   |   8 +-
 gcc/testsuite/gcc.target/powerpc/vsx-7-be.c|  16 +-
 gcc/testsuite/gcc.target/powerpc/vsx-builtin-7.c   | 135 
 gcc/testsuite/gcc.target/powerpc/vsx-vector-5.c|  17 +-
 16 files changed, 820 insertions(+), 531 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-3-be.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-3-le.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-3.h

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-12.c b/gcc/testsuite/gcc.target/powerpc/altivec-12.c
index b0267b5..1f3175f 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-12.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-12.c
@@ -18,7 +18,7 @@ vector char scz;
 vector unsigned char uca = {0,4,8,1,5,9,2,6,10,3,7,11,15,12,14,13};
 vector unsigned char ucb = {6,4,8,3,1,9,2,6,10,3,7,11,15,12,14,13};
 vector unsigned char uc_expected = {3,4,8,2,3,9,2,6,10,3,7,11,15,12,14,13};
-vector char ucz;
+vector unsigned char ucz;
 
 vector short int ssia = {9, 16, 25, 36};
 vector short int ssib = {-8, -27, -64, -125};
diff --git 

Re: [PATCH] PR gcc/84923 - gcc.dg/attr-weakref-1.c failed on aarch64

2018-05-16 Thread vladimir . mezentsev
Ping.

-Vladimir


On 05/10/2018 11:30 PM, vladimir.mezent...@oracle.com wrote:
> From: Vladimir Mezentsev 
>
> When weakref_targets is not empty a target cannot be removed from the weak 
> list.
> A small example is below when 'wv12' is removed from the weak list on aarch64:
>   static vtype Wv12 __attribute__((weakref ("wv12")));
>   extern vtype wv12 __attribute__((weak));
>
> Bootstrapped on aarch64-unknown-linux-gnu including (c,c++ and go).
> Tested on aarch64-linux-gnu.
> No regression. The attr-weakref-1.c test passed.
>
> ChangeLog:
> 2018-05-10  Vladimir Mezentsev  
>
> PR gcc/84923
* varasm.c (weak_finish): Clean up weak_decls.
> ---
>  gcc/varasm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index 85296b4..8cf6e1e 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -5652,7 +5652,8 @@ weak_finish (void)
>tree alias_decl = TREE_PURPOSE (t);
>tree target = ultimate_transparent_alias_target (&TREE_VALUE (t));
>  
> -  if (! TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (alias_decl)))
> +  if (! TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (alias_decl))
> + || TREE_SYMBOL_REFERENCED (target))
>   /* Remove alias_decl from the weak list, but leave entries for
>  the target alone.  */
>   target = NULL_TREE;



Re: [PATCH] Add missing _mm512_set{_epi16,_epi8,zero} intrinsics

2018-05-16 Thread Kirill Yukhin
Hello Jakub,
On 08 May 17:14, Jakub Jelinek wrote:
> Hi!
> 
> While working on PR85323 testsuite coverage, I've noticed we lack these
> intrinsics.  ICC and since Mar 2017 also clang do have these.
> 
> The _mm512_setzero is just a misnamed alias to another intrinsic, but for
> compatibility we likely want to have it too.
> 
> Surprisingly, the _mm512_setr_epi{8,16} intrinsics one would expect too
> are missing in the ICC I have around.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Your patch is OK for trunk.

--
Regards, Kirill Yukhin


Re: [PATCH, i386]: Implement usadv64qi

2018-05-16 Thread Kirill Yukhin
Hello Uroš,
On 09 May 13:07, Uros Bizjak wrote:
> This patch adds usadv64qi expander, so the compiler is able to
> vectorize with 512bit vpsadbw insn.
> 
> 2018-05-09  Uros Bizjak  
> 
> PR target/85693
> * config/i386/sse.md (usadv64qi): New expander.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> OK for mainline?
Patch is OK.
> 
> Uros.

--
Thanks, K


Re: [PATCH] Add constant folding for x86 shift builtins by vector

2018-05-16 Thread Kirill Yukhin
Hello Jakub,
On 09 May 22:54, Jakub Jelinek wrote:
> Hi!
> 
> The following patch on top of the earlier ix86_*fold_builtin patch adds
> folding also for the *s{ll,rl,ra}v* builtins.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Your patch is OK for trunk.

--
Thanks, K


Re: [PATCH] Constant folding of x86 vector shift by scalar builtins (PR target/85323)

2018-05-16 Thread Kirill Yukhin
Hello Jakub,
On 08 May 17:29, Jakub Jelinek wrote:
> Hi!
> 
> The following patch adds folding for vector shift by scalar builtins.
> If they are masked, so far we only optimize them only if the mask is all
> ones.  ix86_fold_builtin handles the all constant argument cases, where the
> effect of the instructions can be fully precomputed at compile time and can
> be useful even in constant expressions etc.
> The ix86_gimple_fold_builtin deals with the cases where the first argument
> is an arbitrary vector, but we can still optimize the cases:
> 1) if the shift count is 0, just return the first argument directly
> 2) if the shift count is equal or higher than element precision and the
> shift is not arithmetic right shift, the result is 0
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Your patch is OK for trunk.

--
Thanks, K


Re: [PATCH] Constant fold even masked shift builtins

2018-05-16 Thread Kirill Yukhin
Hello Jakub,
On 11 May 10:10, Jakub Jelinek wrote:
> Hi!
> 
> On top of the earlier 3 pending patches, this patch adds constant folding
> of the shifts even when the mask is not all ones (as long as the orig value
> argument is VECTOR_CST too).  Then we can just do the blend according to the
> constant mask.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Patch is OK for main trunk.

--
Thanks, K