Re: [PATCH 5/7 v6] vect: Support vector load/store with length in vectorizer

2020-07-01 Thread Kewen.Lin via Gcc-patches
Hi Richard,

on 2020/7/1 11:17 PM, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> on 2020/7/1 3:53 AM, Richard Sandiford wrote:
>>> "Kewen.Lin"  writes:
poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
 +  tree length_limit = NULL_TREE;
 +  /* For length, we need length_limit to check length in range.  */
 +  if (!vect_for_masking)
 +{
 +  poly_uint64 len_limit = nscalars_per_ctrl * rgc->factor;
 +  length_limit = build_int_cst (compare_type, len_limit);
 +}
  
/* Calculate the maximum number of scalar values that the rgroup
   handles in total, the number that it handles for each iteration
 @@ -434,12 +445,12 @@ vect_set_loop_controls_directly (class loop *loop, 
 loop_vec_info loop_vinfo,
tree nscalars_total = niters;
tree nscalars_step = build_int_cst (iv_type, vf);
tree nscalars_skip = niters_skip;
 -  if (nscalars_per_iter != 1)
 +  if (nscalars_per_iter_ft != 1)
  {
/* We checked before setting LOOP_VINFO_USING_PARTIAL_VECTORS_P that
 these multiplications don't overflow.  */
 -  tree compare_factor = build_int_cst (compare_type, 
 nscalars_per_iter);
 -  tree iv_factor = build_int_cst (iv_type, nscalars_per_iter);
 +  tree compare_factor = build_int_cst (compare_type, 
 nscalars_per_iter_ft);
 +  tree iv_factor = build_int_cst (iv_type, nscalars_per_iter_ft);
nscalars_total = gimple_build (preheader_seq, MULT_EXPR, 
 compare_type,
 nscalars_total, compare_factor);
nscalars_step = gimple_build (preheader_seq, MULT_EXPR, iv_type,
 @@ -509,7 +520,7 @@ vect_set_loop_controls_directly (class loop *loop, 
 loop_vec_info loop_vinfo,
 NSCALARS_SKIP to that cannot overflow.  */
  tree const_limit = build_int_cst (compare_type,
LOOP_VINFO_VECT_FACTOR (loop_vinfo)
 -  * nscalars_per_iter);
 +  * nscalars_per_iter_ft);
  first_limit = gimple_build (preheader_seq, MIN_EXPR, compare_type,
  nscalars_total, const_limit);
  first_limit = gimple_build (preheader_seq, PLUS_EXPR, compare_type,
>>>
>>> It looks odd that we don't need to adjust the other nscalars_* values too.
>>> E.g. the above seems to be comparing an unscaled nscalars_total with
>>> a scaled nscalars_per_iter.  I think the units ought to “agree”,
>>> both here and in the rest of the function.
>>>
>>
>> Sorry, I didn't quite follow this comment.  Both nscalars_total and
>> nscalars_step are scaled here.  The only remaining related nscalars_*
>> variable seems to be nscalars_skip, but length can't support skip.
> 
> Hmm, OK.  But in that case can you update the names of the variables
> to match?  It's confusing to have some nscalars_* variables actually
> count scalars (and thus have “nitems” equivalents) and other nscalars_*
> variables count something else (and thus effectively be nitems_* variables
> themselves).
> 

OK.  I'll update names like nscalars_total/nscalars_step and their equivalents
to nitems_total/... (or would nunits_total be better?)

>>
}
  
 +  /* First iteration is full.  */
>>>
>>> This comment belongs inside the “if”.
>>>
>>
>> Sorry, I might be missing something, but doesn't this apply to both?
> 
> I meant it should be…
> 
>>
if (!init_ctrl)
 -  /* First iteration is full.  */
 -  init_ctrl = build_minus_one_cst (ctrl_type);
 +  {
 +if (vect_for_masking)
 +  init_ctrl = build_minus_one_cst (ctrl_type);
 +else
 +  init_ctrl = length_limit;
 +  }
> 
>   if (!init_ctrl)
> {
>   /* First iteration is full.  */
>   if (vect_for_masking)
> init_ctrl = build_minus_one_cst (ctrl_type);
>   else
> init_ctrl = length_limit;
> }
> 
> since the comment only applies to the “!init_ctrl” case.  The point
> of a nonnull init_ctrl is to create cases in which the first vector
> is not a full vector.
> 

Got it, will fix it.

  
 […]
 @@ -2568,7 +2608,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
 niters, tree nitersm1,
if (vect_epilogues
&& LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
&& prolog_peeling >= 0
 -  && known_eq (vf, lowest_vf))
 +  && known_eq (vf, lowest_vf)
 +  && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (epilogue_vinfo))
  {
unsigned HOST_WIDE_INT eiters
= (LOOP_VINFO_INT_NITERS (loop_vinfo)
>>>
>>> I'm still not really convinced that this check is right.  It feels
>>> like it's hiding a problem elsewhere.
>>>
>>
>> The comments above this hunk is that:
>>
>>   /* If we know the number of scalar iterations for the main loop we should
>>  check whether after the main loop there are enough iterations 

Re: [PATCH v3] Add -fuse-ld= to specify an arbitrary executable as the linker

2020-07-01 Thread Fāng-ruì Sòng via Gcc-patches

On 2020-07-01, Martin Liška wrote:

On 6/30/20 5:32 PM, Fāng-ruì Sòng wrote:

There is some concern about clang's -fuse-ld=path
http://lists.llvm.org/pipermail/cfe-dev/2020-June/065710.html and about the use
of COMPILER_PATH vs PATH.
Shall we introduce another option like -fld-path=path (where PATH is searched
but COMPILER_PATH is not)?


I would recommend first landing a patch to LLVM and then we can do
a corresponding change to GCC.

Martin


Thanks a lot for your welcoming words! This is what I intend to add for clang:
https://reviews.llvm.org/D83015

I'll create a GCC patch superseding this one later.



On Tue, Jun 30, 2020 at 6:04 AM Martin Liška  wrote:


PING^1

On 5/29/20 3:10 AM, Fangrui Song wrote:

On 2020-05-25, Martin Liška wrote:

On 5/22/20 6:42 AM, Fangrui Song wrote:

but I can't fix this one because joining two lines will break the 80-column 
rule.


What about this:

diff --git a/gcc/collect2.c b/gcc/collect2.c
index cc57a20e08b..e5b54b080f7 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -1138,8 +1138,8 @@ main (int argc, char **argv)
  /* Search the ordinary system bin directories
 for `ld' (if native linking) or `TARGET-ld' (if cross).  */
  if (ld_file_name == 0)
-ld_file_name =
-  find_a_file (, full_ld_suffixes[selected_linker], X_OK);
+ld_file_name
+  = find_a_file (, full_ld_suffixes[selected_linker], X_OK);
}
#ifdef REAL_NM_FILE_NAME

Apart from that, the patch is fine.

Martin


Adding people who may be able to approve and commit on my behalf.

This formatting issue seems small enough. Hopefully a maintainer can do
it for me.


Re: [PATCH] RISC-V: Preserve arch version info during normalizing arch string

2020-07-01 Thread Kito Cheng
Hi Jim:
> I think it is reasonable to start a discussion in riscv-isa-manual, to
> try to get an official answer for what is valid and how to interpret
> it, instead of just making up our own rules.

Agreed, issue [1] has been created on riscv-isa-manual.

[1] https://github.com/riscv/riscv-isa-manual/issues/533


Re: [PATCH v2] RISC-V: Handle multi-letter extension for multilib-generator

2020-07-01 Thread Kito Cheng
Hi Jim:

Committed, thanks.

On Thu, Jul 2, 2020 at 7:28 AM Jim Wilson  wrote:
>
> On Wed, Jul 1, 2020 at 12:13 AM Kito Cheng  wrote:
> > * config/riscv/multilib-generator (arch_canonicalize): Handle
> > multi-letter extension.
> > Using underline as separator between different extensions.
>
> Looks fine to me.  Though I was expecting you to just commit the patch
> with or without the change I suggested.

Oh, OK, I'll treat `LGTM/This looks good to me.` as an OK-to-commit
signal in the future :)

>
> Jim


Re: [PATCH V2] PING^2 correct COUNT and PROB for unrolled loop

2020-07-01 Thread Jiufu Guo via Gcc-patches
Jiufu Guo  writes:

I would like to reping this patch.
Since this corrects COUNT and PROB for hot blocks, it helps some
cases.

https://gcc.gnu.org/legacy-ml/gcc-patches/2020-02/msg00927.html

Thanks,
Jiufu Guo

> Jiufu Guo  writes:
>
> Gentle ping.
> https://gcc.gnu.org/legacy-ml/gcc-patches/2020-02/msg00927.html
>
> BR,
> Jiufu Guo
>
>> Jiufu Guo via Gcc-patches  writes:
>>
>> Hi,
>>
>> I would like to reping this, hope to get approval for this patch.
>> https://gcc.gnu.org/legacy-ml/gcc-patches/2020-02/msg00927.html
>>
>> BR,
>> Jiufu Guo
>>
>>> Jiufu Guo  writes:
>>>
>>> Hi,
>>>
>>> I'd like to ping this patch for trunk on stage 1.
>>>
>>> This patch fixes the issue of incorrect COUNT/FREQUENCIES on loop
>>> unrolled blocks, and could also help improve the cold/hot classification
>>> of the unrolled loops.
>>>
>>> patch is also at
>>> https://gcc.gnu.org/legacy-ml/gcc-patches/2020-02/msg00927.html
>>>
>>> Thanks,
>>> Jiufu
>>>
 Jiufu Guo  writes:

 Hi!

 I'd like to ping the following patch.  As we are near the end of GCC 10
 stage 4, it seems I should ask for approval for GCC 11 trunk instead.

 Thanks,
 Jiufu Guo

> Hi Honza and all,
>
> I updated the patch a little as below. Bootstrap and regtest are ok
> on powerpc64le.
>
> Is OK for trunk?
>
> Thanks for comments.
> Jiufu
>
> diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
> index 727e951..ded0046 100644
> --- a/gcc/cfgloopmanip.c
> +++ b/gcc/cfgloopmanip.c
> @@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimplify-me.h"
>  #include "tree-ssa-loop-manip.h"
>  #include "dumpfile.h"
> +#include "cfgrtl.h"
>  
>  static void copy_loops_to (class loop **, int,
>  class loop *);
> @@ -1258,14 +1259,30 @@ duplicate_loop_to_header_edge (class loop *loop, 
> edge e,
> /* If original loop is executed COUNT_IN times, the unrolled
>loop will account SCALE_MAIN_DEN times.  */
> scale_main = count_in.probability_in (scale_main_den);
> +
> +   /* If we are guessing at the number of iterations and count_in
> +  becomes unrealistically small, reset probability.  */
> +   if (!(count_in.reliable_p () || loop->any_estimate))
> + {
> +   profile_count new_count_in = count_in.apply_probability 
> (scale_main);
> +   profile_count preheader_count = loop_preheader_edge (loop)->count 
> ();
> +   if (new_count_in.apply_scale (1, 10) < preheader_count)
> + scale_main = profile_probability::likely ();
> + }
> +
> scale_act = scale_main * prob_pass_main;
>   }
>else
>   {
> +   profile_count new_loop_count;
> profile_count preheader_count = e->count ();
> -   for (i = 0; i < ndupl; i++)
> - scale_main = scale_main * scale_step[i];
> scale_act = preheader_count.probability_in (count_in);
> +   /* Compute final preheader count after peeling NDUPL copies.  */
> +   for (i = 0; i < ndupl; i++)
> + preheader_count = preheader_count.apply_probability (scale_step[i]);
> +   /* Subtract out exit(s) from peeled copies.  */
> +   new_loop_count = count_in - (e->count () - preheader_count);
> +   scale_main = new_loop_count.probability_in (count_in);
>   }
>  }
>  
> @@ -1381,6 +1398,38 @@ duplicate_loop_to_header_edge (class loop *loop, 
> edge e,
> scale_bbs_frequencies (new_bbs, n, scale_act);
> scale_act = scale_act * scale_step[j];
>   }
> +
> +  /* Need to update PROB of exit edge and corresponding COUNT.  */
> +  if (orig && is_latch && (!bitmap_bit_p (wont_exit, j + 1))
> +   && bbs_to_scale)
> + {
> +   edge new_exit = new_spec_edges[SE_ORIG];
> +   profile_count new_count_in = new_exit->src->count;
> +   profile_count preheader_count = loop_preheader_edge (loop)->count ();
> +   edge e;
> +   edge_iterator ei;
> +
> +   FOR_EACH_EDGE (e, ei, new_exit->src->succs)
> + if (e != new_exit)
> +   break;
> +
> +   gcc_assert (e && e != new_exit);
> +
> +   new_exit->probability = preheader_count.probability_in (new_count_in);
> +   e->probability = new_exit->probability.invert ();
> +
> +   profile_count new_latch_count
> + = new_exit->src->count.apply_probability (e->probability);
> +   profile_count old_latch_count = e->dest->count;
> +
> +   EXECUTE_IF_SET_IN_BITMAP (bbs_to_scale, 0, i, bi)
> + scale_bbs_frequencies_profile_count (new_bbs + i, 1,
> +  new_latch_count,
> +  old_latch_count);
> +
> +   if (current_ir_type () != IR_GIMPLE)
> + update_br_prob_note (e->src);
> + }
>  }
>free (new_bbs);
>free (orig_loops);

Re: [PATCH] RISC-V: Preserve arch version info during normalizing arch string

2020-07-01 Thread Jim Wilson
On Tue, Jun 30, 2020 at 8:16 PM Kito Cheng  wrote:
> I agree the version of G is kind of problematic for the GCC implementation.
> That reminds me there was a long discussion[1] last year;
> the conclusion was that a version on G is too confusing, so we might just
> not accept any version for G.
> I thought we could deprecate the version for G in GCC 11 and then drop
> it in GCC 12?
> For riscv-isa-manual, we could ask them to add a note that G
> doesn't accept a version?
> What do you think about this?

I think it is reasonable to start a discussion in riscv-isa-manual, to
try to get an official answer for what is valid and how to interpret
it, instead of just making up our own rules.

Jim


Re: [PATCH v2] RISC-V: Handle multi-letter extension for multilib-generator

2020-07-01 Thread Jim Wilson
On Wed, Jul 1, 2020 at 12:13 AM Kito Cheng  wrote:
> * config/riscv/multilib-generator (arch_canonicalize): Handle
> multi-letter extension.
> Using underline as separator between different extensions.

Looks fine to me.  Though I was expecting you to just commit the patch
with or without the change I suggested.

Jim


[PATCH] PowerPC: Optimize DImode -> vector store.

2020-07-01 Thread Michael Meissner via Gcc-patches
This patch fixes a PR that I noticed several years ago during power8
development.  I noticed that the compiler would often create a two-element
vector and store the vector.

Particularly for DImode on power8, this could involve two direct moves and a
XXPERMDI to glue the two parts together.  On power9, there is a single direct
move instruction that combines the two elements.

Originally I had the optimization for DFmode as well as DImode.  I found that
if the values were already in vector registers, it was generally faster to do
the XXPERMDI and vector store.

So I rewrote this patch to only optimize the DImode case, where the assumption
is that the DImode values will be in GPRs.  I have done bootstraps with/without
the patch, and there were no regressions.  I did the builds on a little endian
power9 Linux system and a big endian power8 system (with both 32/64-bit support
on big endian).  Can I check this change into the master branch?

gcc/
2020-06-30  Michael Meissner  

PR target/81594
* config/rs6000/predicates.md (ds_form_memory): New predicate.
* config/rs6000/vsx.md (concatv2di_store): New insn.
(dupv2di_store): New insn.

gcc/testsuite/
2020-06-30  Michael Meissner  

PR target/81594
* gcc.target/powerpc/pr81594.c: New test.
---
 gcc/config/rs6000/predicates.md| 42 +++
 gcc/config/rs6000/vsx.md   | 84 ++
 gcc/testsuite/gcc.target/powerpc/pr81594.c | 61 ++
 3 files changed, 187 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr81594.c

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 9762855..4f7e313 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1856,3 +1856,45 @@ (define_predicate "prefixed_memory"
 {
   return address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
 })
+
+;; Return true if the operand is a valid memory operand with an offsettable
+;; address that can be split into 2 sub-addresses, each of which is a valid
+;; DS-form (bottom 2 bits of the offset are 0).  This is used to optimize
+;; creating a vector of two DImode elements and then storing the vector.  We
+;; want to eliminate the direct moves from GPRs to form the vector and do the
+;; store directly from the GPRs.
+
+(define_predicate "ds_form_memory"
+  (match_code "mem")
+{
+  if (!memory_operand (op, mode))
+return false;
+
+  rtx addr = XEXP (op, 0);
+
+  if (REG_P (addr) || SUBREG_P (addr))
+return true;
+
+  if (GET_CODE (addr) != PLUS)
+return false;
+
+  if (!base_reg_operand (XEXP (addr, 0), Pmode))
+return false;
+
+  rtx offset = XEXP (addr, 1);
+  if (!CONST_INT_P (offset))
+return false;
+
+  HOST_WIDE_INT value = INTVAL (offset);
+
+  if (TARGET_PREFIXED)
+return SIGNED_34BIT_OFFSET_EXTRA_P (value, GET_MODE_SIZE (DImode));
+
+  /* If we don't support prefixed addressing, ensure that the two addresses
+ created would each be valid for doing a STD instruction (which is a
+ DS-form instruction that requires the bottom 2 bits to be 0).  */
+  if ((value & 0x3) != 0)
+return false;
+
+  return SIGNED_16BIT_OFFSET_EXTRA_P (value, GET_MODE_SIZE (DImode));
+})
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 732a548..a9ebd24 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2896,6 +2896,90 @@ (define_insn "*vsx_concat__3"
 }
   [(set_attr "type" "vecperm")])
 
+;; If the only use for a VEC_CONCAT is to store 2 64-bit values, replace it
+;; with two stores.  Only do this on DImode, since it saves doing 1 direct move
+;; on power9, and 2 direct moves + XXPERMDI on power8 to form the vector so we
+;; can do a vector store.  This typically shows up with -O3 where two stores
+;; are combined into a vector.
+;;
+;; Typically DFmode would generate XXPERMDI and a vector store.  Benchmarks
+;; like Spec show that is typically the same speed or faster than doing the two
+;; scalar DFmode stores.
+(define_insn_and_split "*concatv2di_store"
+  [(set (match_operand:V2DI 0 "memory_operand" "=m,m,m,m")
+   (vec_concat:V2DI
+(match_operand:DI 1 "gpc_reg_operand" "r,wa,r,wa")
+(match_operand:DI 2 "gpc_reg_operand" "r,wa,wa,r")))
+   (clobber (match_scratch:DI 3 "=,,,"))]
+  "TARGET_DIRECT_MOVE_64BIT"
+  "#"
+  "&& 1"
+  [(set (match_dup 4)
+   (match_dup 5))
+   (set (match_dup 6)
+   (match_dup 7))]
+{
+  rtx mem = operands[0];
+
+  /* If the address can't be used directly for both stores, copy it to the
+ temporary base register.  */
+  if (!ds_form_memory (mem, V2DImode))
+{
+  rtx old_addr = XEXP (mem, 0);
+  rtx new_addr = operands[3];
+  if (GET_CODE (new_addr) == SCRATCH)
+   new_addr = gen_reg_rtx (Pmode);
+
+  emit_move_insn (new_addr, old_addr);
+  mem = change_address (mem, VOIDmode, new_addr);
+}
+
+  /* Because we are creating scalar stores, we don't have to swap the 

Re: [PATCH v2] RS6000, add VSX mask manipulation support

2020-07-01 Thread Segher Boessenkool
On Wed, May 27, 2020 at 08:50:43AM -0700, Carl Love wrote:
> The following patch adds support for builtins vec_genbm(),  vec_genhm(),
> vec_genwm(), vec_gendm(), vec_genqm(), vec_cntm(), vec_expandm(),
> vec_extractm().  Support for instructions mtvsrbm, mtvsrhm, mtvsrwm,
> mtvsrdm, mtvsrqm, cntm, vexpandm, vextractm.

> +;; Mode attribute to give the suffix for the mask instruction
> +(define_mode_attr VSX_MM_SUFFIX [(V16QI "b") (V8HI "h") (V4SI "w") (V2DI 
> "d") (V1TI "q")])

Please shorten that line?  It doesn't have to be one line ;-)

> +(define_expand "vec_mtvsrbm_mtvsrbmi"
> +
> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=v")
> +(unspec:V16QI [(match_operand:DI 1 "gpc_reg_operand" "b")]
> +   UNSPEC_MTVSBM))]
> +   "TARGET_FUTURE"
> + {
> +  /* Six bit constant operand.  */
> +  if (IN_RANGE (INTVAL (operands[1]), 0, 63))
> +emit_insn (gen_vec_mtvsrbmi (operands[0], operands[1]));
> +  else
> +emit_insn (gen_vec_mtvsr_v16qi (operands[0], operands[1]));

operands[1] isn't a CONST_INT (it is a REG), so this won't work (INTVAL
on it will ICE with checking, and do something non-sensical otherwise).

So needs a test first?  Could just use u6bit_cint_operand even, and lose
the explicit IN_RANGE.

> +(define_insn "vec_mtvsr_"
> +  [(set (match_operand:VSX_MM 0 "vsx_register_operand" "=v")
> +(unspec:VSX_MM [(match_operand:DI 1 "gpc_reg_operand" "b")]
> +UNSPEC_MTVSBM))]
> +  "TARGET_FUTURE"
> +  "mtvsrm %0,%1";
> +  [(set_attr "type" "vecsimple")])

vsx_register_operand together with a "v" constraint is curious, btw.
It is used in a few more places, and it probably works, but would
altivec_register_operand be better?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx_mask-count-runnable.c
> @@ -0,0 +1,149 @@
> +/* { dg-do run } */
> +/* { dg-options "-mcpu=future -O2 -save-temps" } */
> +/* { dg-require-effective-target powerpc_future_hw } */

Drop the -save-temps?  (Same in the other tests.)

Looks good otherwise.


Segher


[PATCH 2/2] PowerPC: Add ISA 3.1 IEEE 128-bit min, max, and cmove.

2020-07-01 Thread Michael Meissner via Gcc-patches
This patch adds support for the IEEE 128-bit floating point minimum, maximum,
and compare instructions generating a mask.  These instructions were added in
ISA 3.1 (i.e. power10).

Compared to the last time I submitted the patches, I changed from using
-mcpu=future to -mcpu=power10.  Along with the previous patch, I have done
bootstrap builds with/without the patches and there were no regressions.  I
did this on a little endian power9 system and a big endian power8 system.  The
power8 system had support for both 32/64-bit.  Can I check these patches into
the master branch?

gcc/
2020-06-30  Michael Meissner  

* config/rs6000/rs6000.c (emit_fp_min_max_insn): Update comment.
(emit_fp_cmove_with_mask_xxsel): Update comment.
(rs6000_emit_cmove): Add support for IEEE 128-bit min, max, and
comparisons on ISA 3.1.
(rs6000_emit_minmax): Add support for IEEE 128-bit min/max on ISA
3.1.
* config/rs6000/rs6000.md (s3, IEEE128 iterator):
New insns for IEEE 128-bit min/max.
(movcc, IEEE128 iterator): New insns for IEEE 128-bit
conditional move.
(movcc_future, IEEE128 iterator): New insns for IEEE 128-bit
conditional move.
(movcc_invert_future, IEEE128 iterator): New insns for IEEE
128-bit conditional move.
(fpmask, IEEE128 iterator): New insns for IEEE 128-bit
conditional move.

gcc/testsuite/
2020-06-30  Michael Meissner  

* gcc.target/powerpc/float128-minmax-2.c: New test.
---
 gcc/config/rs6000/rs6000.c |  23 +++-
 gcc/config/rs6000/rs6000.md| 121 +
 .../gcc.target/powerpc/float128-minmax-2.c |  70 
 3 files changed, 211 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 754431f..1c8d7c3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -14965,7 +14965,7 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, 
rtx op_false,
 }
 
 /* Min/max subcase to emit an appropriate instruction for SF/DF scalars on ISA
-   3.0.
+   3.0 and for IEEE 128-bit scalars on ISA 3.1.
 
Move TRUE_COND to DEST if OP of the operands of the last comparison is
nonzero/true, FALSE_COND if it is zero/false.  Return 0 if the hardware has
@@ -15009,7 +15009,8 @@ emit_fp_min_max_insn (rtx dest, rtx op, rtx true_cond, 
rtx false_cond)
 }
 
 /* Conditional move subcase to emit a floating point compare setting a mask
-   instruction and a XXSEL select instruction for SF/DF scalars on ISA 3.0.
+   instruction and a XXSEL select instruction for SF/DF scalars on ISA 3.0 and
+   for IEEE 128-bit scalars on ISA 3.1.
 
Move TRUE_COND to DEST if OP of the operands of the last comparison is
nonzero/true, FALSE_COND if it is zero/false.  Return 0 if the hardware has
@@ -15105,6 +15106,21 @@ rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, 
rtx false_cond)
return 1;
 }
 
+  /* See if we can use the ISA 3.1 min/max/compare instructions for IEEE
+ 128-bit floating point.  At present, don't worry about doing conditional
+ moves with different types for the comparison and movement (unlike SF/DF,
+ where you can do a conditional test between double and use float as the
+ if/then parts. */
+  if (TARGET_FLOAT128_HW && TARGET_POWER10 && FLOAT128_IEEE_P (compare_mode)
+  && compare_mode == result_mode)
+{
+  if (emit_fp_min_max_insn (dest, op, true_cond, false_cond))
+   return 1;
+
+  if (emit_fp_cmove_with_mask_xxsel (dest, op, true_cond, false_cond))
+   return 1;
+}
+
   /* Don't allow using floating point comparisons for integer results for
  now.  */
   if (FLOAT_MODE_P (compare_mode) && !FLOAT_MODE_P (result_mode))
@@ -15328,7 +15344,8 @@ rs6000_emit_minmax (rtx dest, enum rtx_code code, rtx 
op0, rtx op1)
   /* VSX/altivec have direct min/max insns.  */
   if ((code == SMAX || code == SMIN)
   && (VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
- || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode
+ || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode))
+ || (TARGET_FLOAT128_HW && TARGET_POWER10 && FLOAT128_IEEE_P (mode
 {
   emit_insn (gen_rtx_SET (dest, gen_rtx_fmt_ee (code, mode, op0, op1)));
   return;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 86c8c02..0964891 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -14646,6 +14646,127 @@ (define_insn "*cmp_hw"
"xscmpuqp %0,%1,%2"
   [(set_attr "type" "veccmp")
(set_attr "size" "128")])
+
+;; IEEE 128-bit min/max
+(define_insn "s3"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+   (fp_minmax:IEEE128
+(match_operand:IEEE128 1 "altivec_register_operand" "v")
+(match_operand:IEEE128 2 "altivec_register_operand" "v")))]
+  

[PATCH 1/2] PowerPC: Rename FP min/max/cmove functions.

2020-07-01 Thread Michael Meissner via Gcc-patches
This patch changes the name of two functions that were added to support power9
instructions, so that the name of these functions are no longer specific to
power9.

The next patch will add support for the power10 IEEE 128-bit minimum, maximum,
and conditional move instructions.  This patch renames the support functions
that generate the appropriate instructions.

I have done bootstraps with/without these patches on a little endian power9
system and a big endian power8 system.  The big endian system has both
32/64-bit support.  There were no regressions in the patches.  Can I check this
patch into the master branch?

gcc/
2020-06-30  Michael Meissner  

* config/rs6000/rs6000.c (emit_fp_min_max_insn): Rename
rs6000_emit_p9_fp_minmax.
(emit_fp_cmove_with_mask_xxsel): Rename rs6000_emit_p9_fp_cmove.
(rs6000_emit_cmove): Update to use renamed functions.
---
 gcc/config/rs6000/rs6000.c | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index fef7288..754431f 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -14964,13 +14964,15 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, 
rtx op_false,
   return 1;
 }
 
-/* ISA 3.0 (power9) minmax subcase to emit a XSMAXCDP or XSMINCDP instruction
-   for SF/DF scalars.  Move TRUE_COND to DEST if OP of the operands of the last
-   comparison is nonzero/true, FALSE_COND if it is zero/false.  Return 0 if the
-   hardware has no such operation.  */
+/* Min/max subcase to emit an appropriate instruction for SF/DF scalars on ISA
+   3.0.
+
+   Move TRUE_COND to DEST if OP of the operands of the last comparison is
+   nonzero/true, FALSE_COND if it is zero/false.  Return 0 if the hardware has
+   no such operation.  */
 
 static int
-rs6000_emit_p9_fp_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond)
+emit_fp_min_max_insn (rtx dest, rtx op, rtx true_cond, rtx false_cond)
 {
   enum rtx_code code = GET_CODE (op);
   rtx op0 = XEXP (op, 0);
@@ -15006,13 +15008,15 @@ rs6000_emit_p9_fp_minmax (rtx dest, rtx op, rtx 
true_cond, rtx false_cond)
   return 1;
 }
 
-/* ISA 3.0 (power9) conditional move subcase to emit XSCMP{EQ,GE,GT,NE}DP and
-   XXSEL instructions for SF/DF scalars.  Move TRUE_COND to DEST if OP of the
-   operands of the last comparison is nonzero/true, FALSE_COND if it is
-   zero/false.  Return 0 if the hardware has no such operation.  */
+/* Conditional move subcase to emit a floating point compare setting a mask
+   instruction and a XXSEL select instruction for SF/DF scalars on ISA 3.0.
+
+   Move TRUE_COND to DEST if OP of the operands of the last comparison is
+   nonzero/true, FALSE_COND if it is zero/false.  Return 0 if the hardware has
+   no such operation.  */
 
 static int
-rs6000_emit_p9_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
+emit_fp_cmove_with_mask_xxsel (rtx dest, rtx op, rtx true_cond, rtx false_cond)
 {
   enum rtx_code code = GET_CODE (op);
   rtx op0 = XEXP (op, 0);
@@ -15094,10 +15098,10 @@ rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, 
rtx false_cond)
   && (compare_mode == SFmode || compare_mode == DFmode)
   && (result_mode == SFmode || result_mode == DFmode))
 {
-  if (rs6000_emit_p9_fp_minmax (dest, op, true_cond, false_cond))
+  if (emit_fp_min_max_insn (dest, op, true_cond, false_cond))
return 1;
 
-  if (rs6000_emit_p9_fp_cmove (dest, op, true_cond, false_cond))
+  if (emit_fp_cmove_with_mask_xxsel (dest, op, true_cond, false_cond))
return 1;
 }
 
-- 
1.8.3.1



PowerPC ISA 3.1 IEEE 128-bit support introduction

2020-07-01 Thread Michael Meissner via Gcc-patches
The two patches that will be mailed as replies to this patch add support for
the IEEE 128-bit floating point minimum, maximum, and set mask based on
comparison instructions.

These patches were previously submitted as:
https://gcc.gnu.org/pipermail/gcc-patches/2020-June/546992.html
https://gcc.gnu.org/pipermail/gcc-patches/2020-June/546991.html

Based on the feedback, I have changed the names of the functions.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH] Fortran : Fill in missing array dimensions using the lower, bound (for review)

2020-07-01 Thread Jerry DeLisle via Gcc-patches




On 6/27/20 1:40 AM, Thomas Koenig via Fortran wrote:

Hi Mark,


Use -fdec-add-missing-indexes to enable the feature.  It is also enabled by -fdec.
A warning that the lower bound is being used for a missing dimension
is output unless suppressed by using -Wno-missing-index.


This is... seriously problematic.  I foresee all sorts of not-so-funny
interactions with more modern features.

What do people actually do with this kind of code?  What kind of test
cases do you have that "work" with this?

And people would actually want to save a few keystrokes so they don't
have to write A(N,M,1) instead of A(N,M)?  And is this even the right
fix?  How sure are you of the semantics; is there documentation for
this feature (maybe on Bitsavers)?  If not, this should not be done.

If this goes in at all, I want this rejected with any modern Fortran
feature, i.e. it should not contain

- allocatable arrays
- coarrays
- pointers
- derived types
- CLASS
- assumed-shape arrays
- assumed-rank arrays (well, it probably doesn't make sense)
- KIND=4 characters
- as an argument to any of the array intrinsics like MATMUL,
  EOSHIFT, ...

but even with these restrictions, I will still take a lot of convincing
that this makes sense.

Just imagine what will happen if people specify -fdec for some
relatively benign reason (for example because they want -fdec-math)
and start not finding bugs in their code because of this feature.

Best regards

Thomas


Please stop fixing problematic DEC programs by using the compiler as a
pet tool.  Use an editor or python or some suitable tool to initialize
arrays properly.


I appreciate the effort, but things like this need to be run by here before
the work is done, so the effort can be spent on real compiler bugs rather
than on the wishes of perhaps a paying customer.


We should never have caved on the previous DEC enhancement.

Just my humble opinion.

Jerry


[PATCH] ipa-sra: Prevent constructing debug info from wrong argument

2020-07-01 Thread Martin Jambor
Hi,

the mechanism generating debug info for removed parameters did not
adjust the index of the argument in the call statement to take into
account extra arguments IPA-SRA might have produced when splitting a
structure.  This patch addresses that omission; it stops gdb from
showing an incorrect value for the removed parameter and makes it say
"value optimized out" instead.  The guality testcase will end up as
UNSUPPORTED in the results which is how Richi told me on IRC we deal
with this.

It is possible to generate debug info to actually show the value of the
removed parameter but so far my approaches to do that seem too
controversial
(https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546705.html), so
before I come up with something better I'd like to push this to master
and the gcc-10 branch in time for the GCC 10.2 release.

Bootstrapped and tested on master on x86_64-linux; a bootstrap on top of
the gcc-10 branch is underway.  OK for both if it passes?

Thanks,

Martin


gcc/ChangeLog:

2020-07-01  Martin Jambor  

PR guality/95343
* ipa-param-manipulation.c (ipa_param_adjustments::modify_call): Adjust
argument index if necessary.

gcc/testsuite/ChangeLog:

2020-07-01  Martin Jambor  

PR guality/95343
* gcc.dg/guality/pr95343.c: New test.
---
 gcc/ipa-param-manipulation.c   |  6 +++-
 gcc/testsuite/gcc.dg/guality/pr95343.c | 45 ++
 2 files changed, 50 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/guality/pr95343.c

diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 2cc4bc79dc1..5fc0de56556 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -790,7 +790,11 @@ ipa_param_adjustments::modify_call (gcall *stmt,
  if (!is_gimple_reg (old_parm) || kept[i])
continue;
  tree origin = DECL_ORIGIN (old_parm);
- tree arg = gimple_call_arg (stmt, i);
+ tree arg;
+ if (transitive_remapping)
+   arg = gimple_call_arg (stmt, index_map[i]);
+ else
+   arg = gimple_call_arg (stmt, i);
 
  if (!useless_type_conversion_p (TREE_TYPE (origin), TREE_TYPE (arg)))
{
diff --git a/gcc/testsuite/gcc.dg/guality/pr95343.c 
b/gcc/testsuite/gcc.dg/guality/pr95343.c
new file mode 100644
index 000..a3e57decda8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/guality/pr95343.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-options "-g -fno-ipa-icf" } */
+
+volatile int v;
+
+int __attribute__((noipa))
+get_val0 (void)  {return 0;}
+int __attribute__((noipa))
+get_val2 (void)  {return 2;}
+
+struct S
+{
+  int a, b, c;
+};
+
+static int __attribute__((noinline))
+bar (struct S s, int x, int y, int z, int i)
+{
+  int r;
+  v = s.a + s.b;   /* { dg-final { gdb-test . "i+1" "3" } } */
+  return r;
+}
+
+static int __attribute__((noinline))
+foo (struct S s, int i)
+{
+  int r;
+  r = bar (s, 3, 4, 5, i);
+  return r;
+}
+
+
+int
+main (void)
+{
+  struct S s;
+  int i;
+  i = get_val2 ();
+  s.a = get_val0 ();
+  s.b = get_val0 ();
+  s.c = get_val0 ();
+  int r = foo (s, i);
+  v = r + i;
+  return 0;
+}
-- 
2.27.0



Re: [PATCH] avoid -Wnonnull for lambda stubs (PR c++/95984)

2020-07-01 Thread Jason Merrill via Gcc-patches

On 7/1/20 3:31 PM, Martin Sebor wrote:

The attached patch avoids null pointer checking for the first
argument to calls to the member operator() of lambda objects
emitted by the C++ front end.  This avoids both the spurious
-Wnonnull warnings for such calls as well as the ICE reported
in the bug.

In addition, the patch also avoids function argument checking
for any calls when the tf_warning bit isn't set.  This isn't
strictly necessary to avoid the ICE but it seems like a good
precaution against something similar occurring in other checks
in the future(*).

Finally, while testing the fix I noticed that the common code
doesn't recognize nullptr as a pointer when processing templates.
I've extended the handling to cover it as well.


Any possible value of nullptr_t is null, so I think ignoring it is 
appropriate.



[*] It seems to me that a more robust solution to prevent
the diagnostic subsystem from being re-entered as a result
of callbacks into the front end would be to have the pretty
printer disable all warnings prior to the callbacks and
re-enable them afterwards.  That would require an efficient
and reliable way of controlling all warnings (as well as
querying their state), which I think would be a useful
feature to have in any case.  For one thing, it would make
handling warnings and responding to #pragma GCC diagnostics
more robust.



+ const char *arg0str = IDENTIFIER_POINTER (arg0name);
+ closure = !strcmp (arg0str, "__closure");


Let's use id_equal here.

Jason



Re: [patch] Extend store merging to STRING_CST

2020-07-01 Thread Eric Botcazou
> Hmm, that's a good question - so would your patch be replaceable by
> simply amending var_decl_component_p by
> 
>   (var_decl_component_p (TREE_OPERAND (src, 0))
> 
>|| TREE_CODE (TREE_OPERAND (src, 0)) == STRING_CST)
> 
> ?

The gimple_call_alloca_for_var_p (stmt) kludge is still needed, though, and the 
end of the transformation needs to be adjusted, so I'm not sure it's better.

-- 
Eric Botcazou

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 4e3de95d2d2..6d5670c8e06 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -840,6 +840,32 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
 	}
 	}
 
+  /* If this was a variable-sized copy operation that has been turned
+	 into a fixed-size string store, typically after inlining, then do
+	 the fixed-size store explicitly.  We might be able to concatenate
+	 several of them later into a single string store.  */
+  if (tree_fits_uhwi_p (len)
+	  && gimple_call_alloca_for_var_p (stmt)
+	  && TREE_CODE (src) == ADDR_EXPR
+	  && TREE_CODE (TREE_OPERAND (src, 0)) == STRING_CST
+	  && tree_int_cst_equal
+	 (TYPE_SIZE_UNIT (TREE_TYPE (TREE_OPERAND (src, 0))), len)
+	  && BITS_PER_UNIT == 8)
+	{
+	  tree new_src = TREE_OPERAND (src, 0);
+	  tree new_dest
+	= fold_build2 (MEM_REF, TREE_TYPE (new_src), dest, off0);
+	  gimple *new_stmt = gimple_build_assign (new_dest, new_src);
+	  gimple_move_vops (new_stmt, stmt);
+	  if (!lhs)
+	{
+	  gsi_replace (gsi, new_stmt, false);
+	  return true;
+	}
+	  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+	  goto done;
+	}
+
   if (code == BUILT_IN_MEMMOVE)
 	{
 	  /* Both DEST and SRC must be pointer types.
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 4e3de95d2d2..52c54ac14be 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1002,7 +1002,9 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
 	  || src_align >= TYPE_ALIGN (desttype)))
 	destvar = fold_build2 (MEM_REF, desttype, dest, off0);
   else if (TREE_CODE (src) == ADDR_EXPR
-	   && var_decl_component_p (TREE_OPERAND (src, 0))
+	   && (var_decl_component_p (TREE_OPERAND (src, 0))
+		   || (TREE_CODE (TREE_OPERAND (src, 0)) == STRING_CST
+		   && gimple_call_alloca_for_var_p (stmt)))
 	   && tree_int_cst_equal (TYPE_SIZE_UNIT (srctype), len)
 	   && src_align >= TYPE_ALIGN (srctype)
 	   && (is_gimple_reg_type (srctype)
@@ -1071,19 +1073,34 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
 	  goto set_vop_and_replace;
 	}
 
-  /* We get an aggregate copy.  Use an unsigned char[] type to
-	 perform the copying to preserve padding and to avoid any issues
-	 with TREE_ADDRESSABLE types or float modes behavior on copying.  */
-  desttype = build_array_type_nelts (unsigned_char_type_node,
-	 tree_to_uhwi (len));
-  srctype = desttype;
-  if (src_align > TYPE_ALIGN (srctype))
-	srctype = build_aligned_type (srctype, src_align);
+  /* We get an aggregate copy.  If the source is a STRING_CST, then
+	 directly use its type to perform the copy.  */
+  if (TREE_CODE (src) == ADDR_EXPR
+	  && TREE_CODE (TREE_OPERAND (src, 0)) == STRING_CST
+	  && tree_int_cst_equal (TYPE_SIZE_UNIT (srctype), len))
+	{
+	  desttype = srctype;
+	  srcvar = TREE_OPERAND (src, 0);
+	}
+
+  /* Or else, use an unsigned char[] type to perform the copy in order
+	 to preserve padding and to avoid any issues with TREE_ADDRESSABLE
+	 types or float modes behavior on copying.  */
+  else
+	{
+	  desttype = build_array_type_nelts (unsigned_char_type_node,
+	 tree_to_uhwi (len));
+	  srctype = desttype;
+	  if (src_align > TYPE_ALIGN (srctype))
+	srctype = build_aligned_type (srctype, src_align);
+	  srcvar = fold_build2 (MEM_REF, srctype, src, off0);
+	}
+
   if (dest_align > TYPE_ALIGN (desttype))
 	desttype = build_aligned_type (desttype, dest_align);
-  new_stmt
-	= gimple_build_assign (fold_build2 (MEM_REF, desttype, dest, off0),
-			   fold_build2 (MEM_REF, srctype, src, off0));
+  destvar = fold_build2 (MEM_REF, desttype, dest, off0);
+  new_stmt = gimple_build_assign (destvar, srcvar);
+
 set_vop_and_replace:
   gimple_move_vops (new_stmt, stmt);
   if (!lhs)


Re: [PATCH] libstdc++: std::variant with multiple identical types assignment fail to compile [PR91807]

2020-07-01 Thread Ville Voutilainen via Gcc-patches
On Wed, 1 Jul 2020 at 23:53, Jonathan Wakely  wrote:
>
> On 01/07/20 23:32 +0300, Ville Voutilainen via Libstdc++ wrote:
> >On Wed, 1 Jul 2020 at 21:09, Ville Voutilainen
> > wrote:
> >> And sure, s/move-construction/move-assignment/.
> >
> >And with dg-options.
>
> OK for master and gcc-10, thanks.
>
> Does it apply cleanly to gcc-9 too?

Yes. I managed to misremember where the new variant implementation
landed and misread the
bug title/version, so backporting there, too.


Re: [PATCH] Fix unnecessary register spill that depends on function ordering

2020-07-01 Thread Richard Sandiford
Omar Tahir  writes:
> Hi Richard,
>
> From: Richard Sandiford 
>> > @@ -3719,6 +3722,7 @@ static unsigned int rest_of_handle_sched (void) 
>> > { #ifdef INSN_SCHEDULING
>> > +  first_moveable_pseudo = last_moveable_pseudo;
>> >if (flag_selective_scheduling
>> >&& ! maybe_skip_selective_scheduling ())
>> >  run_selective_scheduling ();
>> 
>> I think instead we should zero out both variables at the end of IRA.
>> There are other places besides the scheduler that call into the IRA code, so 
>> tackling the problem there seems more general.
>
> If you zero first_moveable_pseudo and last_moveable_pseudo after IRA then
> they'll be zero for the second scheduler pass, which uses them.

Are you sure?  It shouldn't be doing that, since there are no pseudos
left when the second scheduling pass runs.  RA replaces all pseudos
with hard registers.

So if the values in the variables has a noticeable effect on sched2,
I think that's even more reason to clear them after IRA :-)

Thanks,
Richard


Re: [PATCH] Prefer simple case changes in spelling suggestions

2020-07-01 Thread Jeff Law via Gcc-patches
On Fri, 2020-06-05 at 18:23 +, Pip Cet via Gcc-patches wrote:
> David Malcolm  writes:
> 
> > On Sat, 2020-05-30 at 18:51 +, Pip Cet wrote:
> > > How's this?
> > 
> > Thanks; looks good to me.  Hopefully this doesn't clash with Tom's
> > patch.
> 
> It doesn't, but I hope I got the commit message right this time.
> 
> (I don't have git access, so if someone could commit this if it's
> approved, that would be great).
Thanks.  Pushed.
jeff



Re: [PATCH] aarch64: Fix missing BTI instruction in trampolines

2020-07-01 Thread Richard Sandiford
Omar Tahir  writes:
> Hi,
>
> Got a small bugfix here regarding BTIs and trampolines.
>
> If two functions require trampolines, and the first has BTI enabled while the
> second doesn't, the generated template will be lacking a BTI instruction.
> This patch fixes this by always adding a BTI instruction, which is safe as BTI
> instructions are ignored on unsupported architecture versions.
>
> I don't have write access, so could someone commit for me?
>
> Bootstrapped and tested on aarch64 with no regressions.
>
> gcc/ChangeLog:
>
> 2020-06-29  Omar Tahir omar.ta...@arm.com
>
> * config/aarch64/aarch64.c (aarch64_asm_trampoline_template): Always
> generate a BTI instruction.
>
> gcc/testsuite/ChangeLog:
>
> 2020-06-29  Omar Tahir omar.ta...@arm.com
>
> * gcc.target/aarch64/bti-4.c: New test.

Thanks for the patch, pushed to master.

Richard


Re: [PATCH] libstdc++: std::variant with multiple identical types assignment fail to compile [PR91807]

2020-07-01 Thread Jonathan Wakely via Gcc-patches

On 01/07/20 23:32 +0300, Ville Voutilainen via Libstdc++ wrote:

On Wed, 1 Jul 2020 at 21:09, Ville Voutilainen
 wrote:

And sure, s/move-construction/move-assignment/.


And with dg-options.


OK for master and gcc-10, thanks.

Does it apply cleanly to gcc-9 too?




Re: [EXT] Re: [PATCH] Optimize and+or+sub into xor+not (PR94882)

2020-07-01 Thread Jeff Law via Gcc-patches
On Mon, 2020-06-15 at 09:32 +0200, Richard Biener via Gcc-patches wrote:
> On Thu, Jun 4, 2020 at 5:09 PM Naveen Hurugalawadi  
> wrote:
> > Hi,
> > 
> > Thanks for reviewing the patch and sharing your comments.
> > 
> > > > nop_convert4 cannot happen, constants will have been constant folded 
> > > > here.
> > Removed.
> > 
> > > > So I think it should be and the other patterns adjusted accordingly.
> > Modified the remaining patterns accordingly.
> > 
> > Please find attached the modified patch as per your suggestions.
> > Bootstrapped and regression tested on x86_64-pc-linux-gnu.
> 
> OK.
> 
> Thanks and sorry for the delay.
I went ahead and committed the changes to the trunk.

jeff



[committed] libstdc++: Remove noexcept from match_results comparisons (PR 94627)

2020-07-01 Thread Jonathan Wakely via Gcc-patches
These functions can't be noexcept because the iterators stored in the
sub_match objects can throw on any operation.

libstdc++-v3/ChangeLog:

PR libstdc++/94627
* include/bits/regex.h (operator==, operator!=): Remove noexcept
equality comparisons for match_results.
* testsuite/28_regex/match_results/94627.cc: New test.

Tested x86_64-linux, committed to master.

Backports to follow.

commit a1a0dc4548979f8a340a7ea71624a52a20e1e0b3
Author: Jonathan Wakely 
Date:   Wed Jul 1 21:01:15 2020 +0100

libstdc++: Remove noexcept from match_results comparisons (PR 94627)

These functions can't be noexcept because the iterators stored in the
sub_match objects can throw on any operation.

libstdc++-v3/ChangeLog:

PR libstdc++/94627
* include/bits/regex.h (operator==, operator!=): Remove noexcept
equality comparisons for match_results.
* testsuite/28_regex/match_results/94627.cc: New test.

diff --git a/libstdc++-v3/include/bits/regex.h 
b/libstdc++-v3/include/bits/regex.h
index 6db05889e8c..4032fd7559b 100644
--- a/libstdc++-v3/include/bits/regex.h
+++ b/libstdc++-v3/include/bits/regex.h
@@ -2099,7 +2099,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   template<typename _Bi_iter, typename _Alloc>
 inline bool
 operator==(const match_results<_Bi_iter, _Alloc>& __m1,
-  const match_results<_Bi_iter, _Alloc>& __m2) noexcept
+  const match_results<_Bi_iter, _Alloc>& __m2)
 {
   if (__m1.ready() != __m2.ready())
return false;
@@ -2124,7 +2124,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   template<typename _Bi_iter, typename _Alloc>
 inline bool
 operator!=(const match_results<_Bi_iter, _Alloc>& __m1,
-  const match_results<_Bi_iter, _Alloc>& __m2) noexcept
+  const match_results<_Bi_iter, _Alloc>& __m2)
 { return !(__m1 == __m2); }
 #endif
 
diff --git a/libstdc++-v3/testsuite/28_regex/match_results/94627.cc 
b/libstdc++-v3/testsuite/28_regex/match_results/94627.cc
new file mode 100644
index 000..dc4883c19a0
--- /dev/null
+++ b/libstdc++-v3/testsuite/28_regex/match_results/94627.cc
@@ -0,0 +1,75 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target c++11 } }
+
+#include <regex>
+#include <testsuite_hooks.h>
+
+struct iterator
+{
+  using value_type = char;
+  using difference_type = std::ptrdiff_t;
+  using reference = char&;
+  using pointer = char*;
+  using iterator_category = std::bidirectional_iterator_tag;
+
+  iterator() : ptr() { }
+  explicit iterator(pointer p) : ptr(p) { }
+
+  iterator& operator++() { if (bang) throw 1; ++ptr; return *this; }
+  iterator operator++(int) { auto copy = *this; ++*this; return copy; }
+  iterator& operator--() { if (bang) throw 1; --ptr; return *this; }
+  iterator operator--(int) { auto copy = *this; --*this; return copy; }
+
+  reference operator*() const noexcept { return *ptr; }
+  pointer operator->() const noexcept { return ptr; }
+
+  bool operator==(iterator rhs) const noexcept { return ptr == rhs.ptr; }
+  bool operator!=(iterator rhs) const noexcept { return ptr != rhs.ptr; }
+
+  static bool bang;
+
+private:
+  pointer ptr;
+};
+
+bool iterator::bang = false;
+
+int main()
+{
+  char str[] = "abc";
+  std::regex r(str);
+  std::match_results<iterator> m;
+  std::regex_match(iterator(str), iterator(str+3), m, r);
+  iterator::bang = true;
+  bool caught = false;
+  try {
+(void) (m == m);
+  } catch (int) {
+caught = true;
+  }
+  VERIFY( caught );
+  caught = false;
+
+  try {
+(void) (m != m);
+  } catch (int) {
+caught = true;
+  }
+  VERIFY( caught );
+}


Re: [PATCH] libstdc++: std::variant with multiple identical types assignment fail to compile [PR91807]

2020-07-01 Thread Ville Voutilainen via Gcc-patches
On Wed, 1 Jul 2020 at 21:09, Ville Voutilainen
 wrote:
> And sure, s/move-construction/move-assignment/.

And with dg-options.

2020-07-01  Ville Voutilainen  

PR libstdc++/91807
* include/std/variant
(_Copy_assign_base::operator=(const _Copy_assign_base&):
Do the move-assignment from a temporary so that the temporary
is constructed with an explicit index.
* testsuite/20_util/variant/91807.cc: New.
diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index c9504914365..eb3d6779205 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -611,7 +611,8 @@ namespace __variant
 		  this->_M_destructive_copy(__rhs_index, __rhs_mem);
 		else
 		  __variant_cast<_Types...>(*this)
-			= variant<_Types...>(__rhs_mem);
+			= variant<_Types...>(std::in_place_index<__rhs_index>,
+	 __rhs_mem);
 		  }
 	  }
 	else
diff --git a/libstdc++-v3/testsuite/20_util/variant/91807.cc b/libstdc++-v3/testsuite/20_util/variant/91807.cc
new file mode 100644
index 000..04bb5d7c807
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/variant/91807.cc
@@ -0,0 +1,35 @@
+// { dg-options "-std=gnu++17" }
+// { dg-do compile { target c++17 } }
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include <variant>
+
+struct me_data {
+  me_data() = default;
+
+  me_data(const me_data &) {};
+  me_data(me_data &&) noexcept {};
+  me_data& operator=(const me_data &) = default;
+};
+
+int main() {
+  std::variant<me_data, me_data> v1, v2;
+
+  v2 = v1;
+}


Re: [PATCH] tree-ssa-threadbackward.c (profitable_jump_thread_path): Do not allow __builtin_constant_p.

2020-07-01 Thread Jeff Law via Gcc-patches
On Mon, 2020-06-29 at 08:58 +0200, Richard Biener wrote:
> On Sat, Jun 27, 2020 at 4:52 PM Marc Glisse  wrote:
> > On Fri, 26 Jun 2020, Jeff Law via Gcc-patches wrote:
> > 
> > > In theory yes, but there are cases where paths converge (like you've 
> > > shown) where
> > > you may have evaluated to a constant on the paths, but it's not a 
> > > constant at the
> > > convergence point.  You have to be very careful using b_c_p like this and 
> > > it's
> > > been a regular source of kernel bugs.
> > > 
> > > 
> > > I'd recommend looking at the .ssa dump and walk forward from there if the 
> > > .ssa
> > > dump looks correct.
> > 
> > Here is the last dump before thread1 (105t.mergephi2). I don't see
> > anything incorrect in it.
> > 
> > ledtrig_cpu (_Bool is_active)
> > {
> >int old;
> >int iftmp.0_1;
> >int _5;
> > 
> > [local count: 1073741824]:
> >if (is_active_2(D) != 0)
> >  goto ; [50.00%]
> >else
> >  goto ; [50.00%]
> > 
> > [local count: 536870913]:
> > 
> > [local count: 1073741824]:
> ># iftmp.0_1 = PHI <1(2), -1(3)>
> >_5 = __builtin_constant_p (iftmp.0_1);
> >if (_5 != 0)
> >  goto ; [50.00%]
> >else
> >  goto ; [50.00%]
> > 
> > [local count: 536870913]:
> >if (iftmp.0_1 >= -128)
> >  goto ; [50.00%]
> >else
> >  goto ; [50.00%]
> > 
> > [local count: 268435456]:
> >if (iftmp.0_1 <= 127)
> >  goto ; [34.00%]
> >else
> >  goto ; [66.00%]
> > 
> > [local count: 91268056]:
> >__asm__ __volatile__("asi %0,%1
> > " : "ptr" "=Q" MEM[(int *)_active_cpus] : "val" "i" iftmp.0_1, "Q" 
> > MEM[(int *)_active_cpus] : "memory", "cc");
> >goto ; [100.00%]
> > 
> > [local count: 982473769]:
> >__asm__ __volatile__("laa %0,%2,%1
> > " : "old" "=d" old_8, "ptr" "=Q" MEM[(int *)_active_cpus] : "val" "d" 
> > iftmp.0_1, "Q" MEM[(int *)_active_cpus] : "memory", "cc");
> > 
> > [local count: 1073741824]:
> >return;
> > 
> > }
> > 
> > There is a single _b_c_p, the immediate asm argument is exactly the
> > argument of _b_c_p, and it is in the branch protected by _b_c_p.
> > 
> > Now the thread1 dump, for comparison
> > 
> > ledtrig_cpu (_Bool is_active)
> > {
> >int old;
> >int iftmp.0_4;
> >int iftmp.0_6;
> >int _7;
> >int _12;
> >int iftmp.0_13;
> >int iftmp.0_14;
> > 
> > [local count: 1073741824]:
> >if (is_active_2(D) != 0)
> >  goto ; [50.00%]
> >else
> >  goto ; [50.00%]
> > 
> > [local count: 536870912]:
> ># iftmp.0_6 = PHI <1(2)>
> >_7 = __builtin_constant_p (iftmp.0_6);
> >if (_7 != 0)
> >  goto ; [50.00%]
> >else
> >  goto ; [50.00%]
> > 
> > [local count: 536870912]:
> ># iftmp.0_4 = PHI <-1(2)>
> >_12 = __builtin_constant_p (iftmp.0_4);
> >if (_12 != 0)
> >  goto ; [50.00%]
> >else
> >  goto ; [50.00%]
> > 
> > [local count: 268435456]:
> >if (iftmp.0_4 >= -128)
> >  goto ; [20.00%]
> >else
> >  goto ; [80.00%]
> > 
> > [local count: 214748364]:
> >if (iftmp.0_6 <= 127)
> >  goto ; [12.00%]
> >else
> >  goto ; [88.00%]
> > 
> > [local count: 91268056]:
> ># iftmp.0_13 = PHI 
> >__asm__ __volatile__("asi %0,%1
> > " : "ptr" "=Q" MEM[(int *)_active_cpus] : "val" "i" iftmp.0_13, "Q" 
> > MEM[(int *)_active_cpus] : "memory", "cc");
> >goto ; [100.00%]
> > 
> > [local count: 982473769]:
> ># iftmp.0_14 = PHI  > iftmp.0_6(3)>
> >__asm__ __volatile__("laa %0,%2,%1
> > " : "old" "=d" old_8, "ptr" "=Q" MEM[(int *)_active_cpus] : "val" "d" 
> > iftmp.0_14, "Q" MEM[(int *)_active_cpus] : "memory", "cc");
> > 
> > [local count: 1073741824]:
> >return;
> > 
> > }
> > 
> > Thread1 decides to separate the paths is_active and !is_active
> > (surprisingly, for one it optimizes out the comparison <= 127 and for the
> > other the comparison >= -128, while it could optimize both in both cases).
> > And it decides to converge after the comparisons, but before the asm.
> > 
> > What the pass did does seem to hurt. It looks like if we duplicate _b_c_p,
> > we may need to duplicate far enough to include all the blocks dominated by
> > _b_c_p==true (including the asm, here). Otherwise, any _b_c_p can be
> > optimized to true, because for a boolean
> > 
> > b is the same as b ? true : false
> > __builtin_constant_p(b ? true : false) would be the same as b ?
> > __builtin_constant_p(true) : __builtin_constant_p(false), i.e. true.
> > 
> > It is too bad we don't have any optimization pass using ranges between IPA
> > and thread1, that would have gotten rid of the comparisons, and hence the
> > temptation to thread. Adding always_inline on atomic_add (or flatten on
> > the caller) does help: EVRP removes the comparisons.
> > 
> > Do you see a way forward without changing what thread1 does or declaring
> > the testcase as unsupported?
> 
> Most of the cases I've seen involve transforms that make _b_c_p constant
> on one 

Re: [PATCH] tree-ssa-threadbackward.c (profitable_jump_thread_path): Do not allow __builtin_constant_p.

2020-07-01 Thread Jeff Law via Gcc-patches
On Sat, 2020-06-27 at 01:03 +0200, Ilya Leoshkevich wrote:
> On Fri, 2020-06-26 at 16:04 -0600, Jeff Law wrote:
> > On Fri, 2020-06-26 at 23:54 +0200, Ilya Leoshkevich wrote:
> > > How should this work ideally?  Suppose we have:
> > > 
> > > static inline void foo (int i)
> > > {
> > >   if (__builtin_is_constant_p (i))
> > > asm volatile ("bar" :: "i" (i))
> > >   else
> > > asm volatile ("baz" :: "d" (i));
> > > }
> > > 
> > > First of all, this is a supported use case, right?
> > Yes, this is a supported case.
> > 
> > > Then, the way I see it, is that at least for a while there must
> > > exist
> > > trees like the ones above, regardless of whether their asm
> > > arguments
> > > match constraints.  But ultimately dead code elimination should get
> > > rid
> > > of the wrong ones before they reach RTL.
> > > E.g. in the example above, the non-inlined version should have
> > > `!__builtin_is_constant_p (i)`, so `bar` should not survive until
> > > RTL.  The same for inlined `foo (1)` version's `baz`.
> > In theory yes, but there are cases where paths converge (like you've
> > shown) where
> > you may have evaluated to a constant on the paths, but it's not a
> > constant at the
> > convergence point.  You have to be very careful using b_c_p like this
> > and it's
> > been a regular source of kernel bugs.
> 
> Is there something specific that a compiler user should look out for?
> For example, here is the kernel code, from which the test was derived:
> 
> static inline void atomic_add(int i, atomic_t *v)
> {
> #ifdef CONFIG_HAVE_MARCH_Z196_FEATURES
> if (__builtin_constant_p(i) && (i > -129) && (i < 128)) {
> __atomic_add_const(i, &v->counter);
> return;
> }
> #endif
> __atomic_add(i, &v->counter);
> }
>   
> It looks very straightforward - can there still be something wrong
> with its usage of b_c_p?
> 
> > I'd recommend looking at the .ssa dump and walk forward from there if
> > the .ssa
> > dump looks correct.
> > 
> > jeff
> 
> Well, 021t.ssa already has:
> 
> __attribute__((gnu_inline))
> __atomic_add_const (intD.6 valD.2269, intD.6 * ptrD.2270)
> {
>   intD.6 val_3(D) = valD.2269;
>   intD.6 * ptr_2(D) = ptrD.2270;
> ;;   basic block 2, loop depth 0, maybe hot
> ;;prev block 0, next block 1, flags: (NEW)
> ;;pred:   ENTRY (FALLTHRU)
>   # .MEM_4 = VDEF <.MEM_1(D)>
>   __asm__ __volatile__("asi %0,%1
> " : "ptr" "=Q" *ptr_2(D) : "val" "i" val_3(D), "Q" *ptr_2(D) :
> "memory", "cc");
>   # VUSE <.MEM_4>
>   return;
> ;;succ:   EXIT
> 
> }
> 
> which is, strictly speaking, not correct, because val_3(D) and
> valD.2269 are not constant.  But as far as I understand we are willing
> to tolerate trees like this until a certain point.
> 
> What is this point supposed to be?  If I understood you right, 
> 106t.thread1 is already too late - why is it so?
Well, you need to show more context to know if it's strictly OK.

Here's what I get with a cross in the .ssa dump:

;;   basic block 2, loop depth 0, maybe hot
;;prev block 0, next block 3, flags: (NEW)
;;pred:   ENTRY (FALLTHRU)
  _1 = __builtin_constant_p (i_5(D));
  if (_1 != 0)
goto ; [INV]
  else
goto ; [INV]
;;succ:   3 (TRUE_VALUE)
;;6 (FALSE_VALUE)

;;   basic block 3, loop depth 0, maybe hot
;;prev block 2, next block 4, flags: (NEW)
;;pred:   2 (TRUE_VALUE)
  if (i_5(D) >= -128)
goto ; [INV]
  else
goto ; [INV]
;;succ:   4 (TRUE_VALUE)
;;6 (FALSE_VALUE)

;;   basic block 4, loop depth 0, maybe hot
;;prev block 3, next block 5, flags: (NEW)
;;pred:   3 (TRUE_VALUE)
  if (i_5(D) <= 127)
goto ; [INV]
  else
goto ; [INV]
;;succ:   5 (TRUE_VALUE)
;;6 (FALSE_VALUE)

;;   basic block 5, loop depth 0, maybe hot
;;prev block 4, next block 6, flags: (NEW)
;;pred:   4 (TRUE_VALUE)
  _2 = _6(D)->counter;
  __atomic_add_const (i_5(D), _2);
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]
;;succ:   7 (FALLTHRU)

And that's OK.  In particular we have the _b_c_p call on i_5 and the only
path to the call to __atomic_add_const is predicated on the result of that
b_c_p call.  Furthermore, we use i_5 in the call to __atomic_add_const.

Contrast that to the .thread1 dump:


;;   basic block 3, loop depth 0
;;pred:   2
  # iftmp.0_6 = PHI <1(2)>
  _7 = __builtin_constant_p (iftmp.0_6);
  if (_7 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]
;;succ:   6
;;8

;;   basic block 4, loop depth 0
;;pred:   2
  # iftmp.0_4 = PHI <-1(2)>
  _12 = __builtin_constant_p (iftmp.0_4);
  if (_12 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]
;;succ:   5
;;8

;;   basic block 5, loop depth 0
;;pred:   4
  if (iftmp.0_4 >= -128)
goto ; [20.00%]
  else
goto ; [80.00%]
;;succ:   7
;;8

;;   basic block 6, loop depth 0

Re: [PATCH v4 1/2] asan: specify alignment for LASANPC labels

2020-07-01 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2020-07-01 at 11:57 -0600, Jeff Law wrote:
> On Wed, 2020-07-01 at 14:29 +0200, Ilya Leoshkevich via Gcc-patches
> wrote:
> > gcc/ChangeLog:
> > 
> > 2020-06-30  Ilya Leoshkevich  
> > 
> > * asan.c (asan_emit_stack_protection): Use CODE_LABEL_BOUNDARY.
> > * defaults.h (CODE_LABEL_BOUNDARY): New macro.
> > * doc/tm.texi: Document CODE_LABEL_BOUNDARY.
> > * doc/tm.texi.in: Likewise.
> Don't we already have the ability to set label alignments?  See
> LABEL_ALIGN.

The following works with -falign-labels=2:

--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1524,7 +1524,7 @@ asan_emit_stack_protection (rtx base, rtx pbase,
unsigned int alignb,
   DECL_INITIAL (decl) = decl;
   TREE_ASM_WRITTEN (decl) = 1;
   TREE_ASM_WRITTEN (id) = 1;
-  SET_DECL_ALIGN (decl, CODE_LABEL_BOUNDARY);
+  SET_DECL_ALIGN (decl, (1 << LABEL_ALIGN (gen_label_rtx ())) * BITS_PER_UNIT);
   emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
   shadow_base = expand_binop (Pmode, lshr_optab, base,
			      gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),

In order to go this way, we would need to raise `-falign-labels=`
default to 2 for s390, which is not incorrect, but would unnecessarily
clutter asm with `.align 2` before each label.  So IMHO it would be
nicer to simply ask the backend "what is your target's instruction
alignment?".



[PATCH] avoid -Wnonnull for lambda stubs (PR c++/95984)

2020-07-01 Thread Martin Sebor via Gcc-patches

The attached patch avoids null pointer checking for the first
argument to calls to the member operator() of lambda objects
emitted by the C++ front end.  This avoids both the spurious
-Wnonnull warnings for such calls as well as the ICE reported
in the bug.

In addition, the patch also avoids function argument checking
for any calls when the tf_warning bit isn't set.  This isn't
strictly necessary to avoid the ICE but it seems like a good
precaution against something similar occurring in other checks
in the future(*).

Finally, while testing the fix I noticed that the common code
doesn't recognize nullptr as a pointer when processing templates.
I've extended the handling to cover it as well.

Martin

[*] It seems to me that a more robust solution to prevent
the diagnostic subsystem from being re-entered as a result
of callbacks into the front end would be to have the pretty
printer disable all warnings prior to the callbacks and
re-enable them afterwards.  That would require an efficient
and reliable way of controlling all warnings (as well as
querying their state), which I think would be a useful
feature to have in any case.  For one thing, it would make
handling warnings and responding to #pragma GCC diagnostics
more robust.
Exclude calls to variadic lambda stubs from -Wnonnull checking (PR c++/95984).

Resolves:
PR c++/95984 - Internal compiler error: Error reporting routines re-entered in -Wnonnull on a variadic lambda
PR c++/96021 - missing -Wnonnull passing nullptr to a nonnull variadic lambda

gcc/c-family/ChangeLog:

	PR c++/95984
	* c-common.c (check_function_nonnull):
	(check_nonnull_arg):

gcc/cp/ChangeLog:

	PR c++/95984
	* call.c (build_over_call):

gcc/testsuite/ChangeLog:

	PR c++/95984
	* g++.dg/warn/Wnonnull6.C: New test.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index aae1ddb6b89..695e6d27262 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -5308,12 +5308,29 @@ check_function_nonnull (nonnull_arg_ctx &ctx, int nargs, tree *argarray)
   int firstarg = 0;
   if (TREE_CODE (ctx.fntype) == METHOD_TYPE)
 {
+  bool closure = false;
+  if (ctx.fndecl)
+	{
+	  /* For certain lambda expressions the C++ front end emits calls
+	     that pass a null this pointer as an argument named __closure
+	     to the member operator() of the lambda's closure object.
+	     Detect those and avoid checking them, but proceed to check
+	     the remaining arguments.  */
+	  tree arg0 = DECL_ARGUMENTS (ctx.fndecl);
+	  if (tree arg0name = DECL_NAME (arg0))
+	{
+	  const char *arg0str = IDENTIFIER_POINTER (arg0name);
+	  closure = !strcmp (arg0str, "__closure");
+	}
+	}
+
   /* In calls to C++ non-static member functions check the this
 	 pointer regardless of whether the function is declared with
 	 attribute nonnull.  */
   firstarg = 1;
-  check_function_arguments_recurse (check_nonnull_arg, &ctx, argarray[0],
-					firstarg);
+  if (!closure)
+	check_function_arguments_recurse (check_nonnull_arg, &ctx, argarray[0],
+					  firstarg);
 }
 
   tree attrs = lookup_attribute ("nonnull", TYPE_ATTRIBUTES (ctx.fntype));
@@ -5503,7 +5520,9 @@ check_nonnull_arg (void *ctx, tree param, unsigned HOST_WIDE_INT param_num)
  happen if the "nonnull" attribute was given without an operand
  list (which means to check every pointer argument).  */
 
-  if (TREE_CODE (TREE_TYPE (param)) != POINTER_TYPE)
+  tree paramtype = TREE_TYPE (param);
+  if (TREE_CODE (paramtype) != POINTER_TYPE
+  && TREE_CODE (paramtype) != NULLPTR_TYPE)
 return;
 
   /* Diagnose the simple cases of null arguments.  */
diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index d8923be1d68..5341a572980 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8842,15 +8842,16 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
   gcc_assert (j <= nargs);
   nargs = j;
 
-  /* Avoid to do argument-transformation, if warnings for format, and for
- nonnull are disabled.  Just in case that at least one of them is active
+  /* Avoid performing argument transformation if warnings are disabled.
+ When tf_warning is set and at least one of the warnings is active
  the check_function_arguments function might warn about something.  */
 
   bool warned_p = false;
-  if (warn_nonnull
-  || warn_format
-  || warn_suggest_attribute_format
-  || warn_restrict)
+  if ((complain & tf_warning)
+  && (warn_nonnull
+	  || warn_format
+	  || warn_suggest_attribute_format
+	  || warn_restrict))
 {
   tree *fargs = (!nargs ? argarray
 			: (tree *) alloca (nargs * sizeof (tree)));
diff --git a/gcc/testsuite/g++.dg/warn/Wnonnull6.C b/gcc/testsuite/g++.dg/warn/Wnonnull6.C
new file mode 100644
index 000..dae6dd2d912
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wnonnull6.C
@@ -0,0 +1,37 @@
+/* PR c++/95984 - Internal compiler error: Error reporting routines re-entered
+   in -Wnonnull on a variadic lambda
+   PR c++/96021 - 

Re: Fortran : Fortran translation issues PR52279

2020-07-01 Thread David Edelsohn via Gcc-patches
This patch breaks bootstrap.

It is not portable to use _( ... ) to initialize an array.

In file included from /nasfarm/edelsohn/src/src/gcc/fortran/gfortran.h:52,
 from /nasfarm/edelsohn/src/src/gcc/fortran/check.c:32:
/nasfarm/edelsohn/src/src/gcc/fortran/check.c: In function 'bool gfc_invalid_boz
(const char*, locus*)':
/nasfarm/edelsohn/src/src/gcc/intl.h:51:27: error: initializer fails
to determine size of 'hint'
   51 | # define _(msgid) gettext (msgid)
  |   ^~~
/nasfarm/edelsohn/src/src/gcc/fortran/check.c:70:23: note: in expansion of macro
 '_'
   70 |   const char hint[] = _(" [see %<-fno-allow-invalid-boz%>]");
  |   ^
/nasfarm/edelsohn/src/src/gcc/intl.h:51:27: error: array must be
initialized with a brace-enclosed initializer


Re: [PATCH v4 2/2] S/390: Define CODE_LABEL_BOUNDARY

2020-07-01 Thread Andreas Krebbel via Gcc-patches
On 01.07.20 14:29, Ilya Leoshkevich wrote:
...
> gcc/ChangeLog:
> 
> 2020-06-30  Ilya Leoshkevich  
> 
> * config/s390/s390.h (CODE_LABEL_BOUNDARY): Specify that s390
> requires code labels to be aligned on a 2-byte boundary.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-06-30  Ilya Leoshkevich  
> 
> * gcc.target/s390/asan-no-gotoff.c: New test.

Ok. Thanks!

Andreas


Re: [PATCH] libstdc++: std::variant with multiple identical types assignment fail to compile [PR91807]

2020-07-01 Thread Ville Voutilainen via Gcc-patches
On Wed, 1 Jul 2020 at 20:46, Ville Voutilainen
 wrote:
>
> Looks like just a small thinko. We construct a temporary and move-construct
> from it, but we should construct the temporary with the right index.
>
> OK for trunk and gcc-10 if full tests pass?
>
> 2020-07-01  Ville Voutilainen  
>
> PR libstdc++/91807
> * include/std/variant
> (_Copy_assign_base::operator=(const _Copy_assign_base&):
> Do the move-construction from a temporary so that the temporary
> is constructed with an explicit index.
> * testsuite/20_util/variant/91807.cc: New.

And sure, s/move-construction/move-assignment/.


Re: [PATCH v4 1/2] asan: specify alignment for LASANPC labels

2020-07-01 Thread Jeff Law via Gcc-patches
On Wed, 2020-07-01 at 14:29 +0200, Ilya Leoshkevich via Gcc-patches wrote:
> gcc/ChangeLog:
> 
> 2020-06-30  Ilya Leoshkevich  
> 
>   * asan.c (asan_emit_stack_protection): Use CODE_LABEL_BOUNDARY.
>   * defaults.h (CODE_LABEL_BOUNDARY): New macro.
>   * doc/tm.texi: Document CODE_LABEL_BOUNDARY.
>   * doc/tm.texi.in: Likewise.
Don't we already have the ability to set label alignments?  See LABEL_ALIGN.



jeff
> 



Re: Fortran : Fortran translation issues PR52279

2020-07-01 Thread Martin Sebor via Gcc-patches

Hi Mark,

I suspect this change may be behind a bootstrap failure I'm seeing.
My guess is that the hint needs to be a pointer rather than an array.

Thanks
Martin

In file included from /ssd/test/src/gcc/trunk/gcc/fortran/gfortran.h:52,
 from /ssd/test/src/gcc/trunk/gcc/fortran/check.c:32:
/ssd/test/src/gcc/trunk/gcc/fortran/check.c: In function ‘bool 
gfc_invalid_boz(const char*, locus*)’:
/ssd/test/src/gcc/trunk/gcc/intl.h:51:27: error: initializer fails to 
determine size of ‘hint’

   51 | # define _(msgid) gettext (msgid)
  |   ^~~
/ssd/test/src/gcc/trunk/gcc/fortran/check.c:70:23: note: in expansion of 
macro ‘_’

   70 |   const char hint[] = _(" [see %<-fno-allow-invalid-boz%>]");
  |   ^
/ssd/test/src/gcc/trunk/gcc/intl.h:51:27: error: array must be 
initialized with a brace-enclosed initializer

   51 | # define _(msgid) gettext (msgid)
  |   ^~~
/ssd/test/src/gcc/trunk/gcc/fortran/check.c:70:23: note: in expansion of 
macro ‘_’

   70 |   const char hint[] = _(" [see %<-fno-allow-invalid-boz%>]");
  |   ^


On 6/30/20 6:59 AM, Mark Eggleston wrote:

ping!

On 25/06/2020 07:28, Mark Eggleston wrote:
Patch to fix PR52279.  Marked strings that are indirectly used in 
error and warning messages so that they are marked for translation.


OK to commit?

Commit message:

Fortran  : Fortran translation issues PR52279

Mark strings for translation by enclosing in G_() and _().

2020-06-24  Mark Eggleston 

gcc/fortran/

    PR fortran/52279
    * arith.c (reduce_binary_aa): Mark for translation the string
    parameter to gfc_check_conformance with G_().
    * check.c (gfc_invalid_boz): Mark hint for translation using
    _().  (gfc_check_achar): Mark for translation the message
    parameter to gfc_invalid_boz using G_(). (gfc_check_char):
    Mark for translation the message parameter to gfc_invalid_boz
    using G_().  (gfc_check_complex): Mark for translation the
    message parameter to gfc_invalid_boz using G_().
    (gfc_check_float): Mark for translation the message
    parameter to gfc_invalid_boz using G_(). (check_rest): Mark
    for translation the string parameter to gfc_check_conformance
    with _().  (gfc_check_minloc_maxloc): Mark for translation
    the string parameter to gfc_check_conformance with _().
    (gfc_check_findloc): Mark for translation the string parameter
    to gfc_check_conformance with _(). (check_reduction): Mark
    for translation the string parameter to gfc_check_conformance
    with _().  (gfc_check_pack): Mark for translation the string
    parameter to gfc_check_conformance with _().
    * decl.c (match_old_style_init): Mark for translation the
    message parameter to gfc_invalid_boz using G_().
    * decl.c (gfc_check_assign): Mark for translation the string
    parameter to gfc_check_conformance with _().
    * intrinsic.c (check_specific): Mark for translation the string
    parameter to gfc_check_conformance with _().
    (gfc_check_intrinsic_standard): Mark symstd_msg strings for
    translation using G_(). No need to mark symstd_msg for
    translation in call to gfc_warning or when setting symstd.
    * io.c (check_open_constraints):  Mark strings for translation
    using G_() in all calls to warn_or_error. (match_io_element):
    Mark for translation the message parameter to gfc_invalid_boz
    using G_().
    * primary.c (match_boz_constant): Mark for translation the
    message parameter to gfc_invalid_boz using G_().
    * resolve.c (resolve_elemental_actual):  Mark for translation
    the string parameter to gfc_check_conformance with _().
    (resolve_operator):  Mark for translation the string parameter
    to gfc_check_conformance with _().  Mark translation strings
    assigned to msg using G_() for use in a call to gfc_warning.





Re: [PATCH] analyzer: Fix -Wanalyzer-possible-null-argument warning

2020-07-01 Thread Nathan Sidwell

On 7/1/20 1:29 PM, Jonathan Wakely wrote:

On 30/06/20 17:43 +0100, Jonathan Wakely wrote:

gcc/testsuite/ChangeLog:

* g++.dg/analyzer/pr94028.C: Make operator new non-throwing so
that the compiler doesn't implicitly mark it as returning
non-null.

Fixes these:

FAIL: g++.dg/analyzer/pr94028.C  -std=c++98 (test for excess errors)
FAIL: g++.dg/analyzer/pr94028.C  -std=c++14 (test for excess errors)
FAIL: g++.dg/analyzer/pr94028.C  -std=c++17 (test for excess errors)
FAIL: g++.dg/analyzer/pr94028.C  -std=c++2a (test for excess errors)


Updated to add PR 96014 to the commit log.

OK for master?


ok


--
Nathan Sidwell


[PATCH] libstdc++: std::variant with multiple identical types assignment fail to compile [PR91807]

2020-07-01 Thread Ville Voutilainen via Gcc-patches
Looks like just a small thinko. We construct a temporary and move-construct
from it, but we should construct the temporary with the right index.

OK for trunk and gcc-10 if full tests pass?

2020-07-01  Ville Voutilainen  

PR libstdc++/91807
* include/std/variant
(_Copy_assign_base::operator=(const _Copy_assign_base&):
Do the move-construction from a temporary so that the temporary
is constructed with an explicit index.
* testsuite/20_util/variant/91807.cc: New.
diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index c9504914365..eb3d6779205 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -611,7 +611,8 @@ namespace __variant
 		  this->_M_destructive_copy(__rhs_index, __rhs_mem);
 		else
 		  __variant_cast<_Types...>(*this)
-			= variant<_Types...>(__rhs_mem);
+			= variant<_Types...>(std::in_place_index<__rhs_index>,
+	 __rhs_mem);
 		  }
 	  }
 	else
diff --git a/libstdc++-v3/testsuite/20_util/variant/91807.cc b/libstdc++-v3/testsuite/20_util/variant/91807.cc
new file mode 100644
index 000..ddede7c9b32
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/variant/91807.cc
@@ -0,0 +1,34 @@
+// { dg-do compile { target c++17 } }
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include <variant>
+
+struct me_data {
+  me_data() = default;
+  
+  me_data(const me_data &) {};
+  me_data(me_data &&) noexcept {};
+  me_data& operator=(const me_data &) = default;
+};
+
+int main() {
+  std::variant<me_data, me_data> v1, v2;
+  
+  v2 = v1;
+}


Re: [PATCH] analyzer: Fix -Wanalyzer-possible-null-argument warning

2020-07-01 Thread Jonathan Wakely via Gcc-patches

On 30/06/20 17:43 +0100, Jonathan Wakely wrote:

gcc/testsuite/ChangeLog:

* g++.dg/analyzer/pr94028.C: Make operator new non-throwing so
that the compiler doesn't implicitly mark it as returning
non-null.

Fixes these:

FAIL: g++.dg/analyzer/pr94028.C  -std=c++98 (test for excess errors)
FAIL: g++.dg/analyzer/pr94028.C  -std=c++14 (test for excess errors)
FAIL: g++.dg/analyzer/pr94028.C  -std=c++17 (test for excess errors)
FAIL: g++.dg/analyzer/pr94028.C  -std=c++2a (test for excess errors)


Updated to add PR 96014 to the commit log.

OK for master?

commit 8fedcc43382868ffc345a9410584f668ff9c04c0
Author: Jonathan Wakely 
Date:   Tue Jun 30 17:40:08 2020 +0100

analyzer: Fix -Wanalyzer-possible-null-argument warning (PR 96014)

gcc/testsuite/ChangeLog:

PR testsuite/96014
* g++.dg/analyzer/pr94028.C: Make operator new non-throwing so
that the compiler doesn't implicitly mark it as returning
non-null.

diff --git a/gcc/testsuite/g++.dg/analyzer/pr94028.C b/gcc/testsuite/g++.dg/analyzer/pr94028.C
index 0a222d1b991..c0c35d65829 100644
--- a/gcc/testsuite/g++.dg/analyzer/pr94028.C
+++ b/gcc/testsuite/g++.dg/analyzer/pr94028.C
@@ -12,7 +12,7 @@ enum e {} i;
 
 struct j
 {
-  void *operator new (__SIZE_TYPE__ b)
+  void *operator new (__SIZE_TYPE__ b) throw()
   {
 return calloc (b, sizeof (int)); // { dg-warning "leak" }
   }


RE: [PATCH 5/6 ver 3] rs6000, Add vector splat builtin support

2020-07-01 Thread Carl Love via Gcc-patches
On Wed, 2020-07-01 at 12:00 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Jun 29, 2020 at 03:31:48PM -0700, Carl Love wrote:
> > On Mon, 2020-06-29 at 16:58 -0500, Segher Boessenkool wrote:
> > > On Mon, Jun 29, 2020 at 02:29:54PM -0700, Carl Love wrote:
> > > > Segher:
> > > > 
> > > > On Thu, 2020-06-25 at 17:39 -0500, Segher Boessenkool wrote:
> > > > > > +;; Return 1 if op is a constant 32-bit floating point
> > > > > > value
> > > > > > +(define_predicate "f32bit_const_operand"
> > > > > > +  (match_code "const_double")
> > > > > > +{
> > > > > > +  if (GET_MODE (op) == SFmode)
> > > > > > +return 1;
> > > > > > +
> > > > > > +  else if ((GET_MODE (op) == DFmode) && ((UINTVAL (op) >>
> > > > > > 32)
> > > > > > ==
> > > > > > 0))
> > > > > > +   {
> > > > > > +/* Value fits in 32-bits */
> > > > > > +return 1;
> > > > > > +}
> > > > > > +  else
> > > > > > +/* Not the expected mode.  */
> > > > > > +return 0;
> > > > > > +})
> > > > > 
> > > > > I don't think this is the correct test.  What you want to see
> > > > > is
> > > > > if
> > > > > the
> > > > > number in "op" can be converted to an IEEE single-precision
> > > > > number,
> > > > > and
> > > > > back again, losslessly.  (And subnormal SP numbers aren't
> > > > > allowed
> > > > > either, but NaNs and infinities are).
> > > > 
> > > > The predicate is used with the xxsplitw_v4sf
> > > > define_expand.  The
> > > > "user"
> > > > claims the given immediate bit pattern is the bit pattern for a
> > > > single
> > > > precision floating point number.  The immediate value is not
> > > > converted
> > > > to a float.  Rather we are splatting a bit pattern that the
> > > > "user"
> > > > already claims represents a 32-bit floating point value.  I
> > > > just
> > > > need
> > > > to make sure the immediate value actually fits into 32-bits.
> > > > 
> > > > I don't see that I need to check that the value can be
> > > > converted to
> > > > IEEE float and back.  
> > > 
> > > Ah, okay.  Can you please put that in the function comment
> > > then?  Right
> > > now it says
> > > ;; Return 1 if op is a constant 32-bit floating point value
> > > and that is quite misleading.
> > 
> > Would the following be more clear
> > 
> > ;; Return 1 if op is a constant bit pattern representing a floating
> > ;; point value that fits in 32-bits. 
> 
> Yes...  But I still don't get it.
> 
> Say you have the number 785.066650390625.  As a single-precision
> IEEE float, that is 0x4444_4444.  But as a double-precision IEEE
> float, it is 0x4088_8888_8000_0000, which does not fit in 32 bits!

The value 0x4444_4444 would be mode SFmode and the test would return 1.
We would be able to splat that 32-bit wide pattern.  

The value 0x4088_8888_8000_0000 would be DFmode but when you shift it
right 32 bits it would be non-zero and

   else if ((GET_MODE (op) == DFmode) && ((UINTVAL (op) >> 32)

would be true and the function would return 1 incorrectly.  Argh!

So yes, we have a problem.  The second part is supposed to be checking
for all zero not non zero.   The test should be:

   else if ((GET_MODE (op) == DFmode) && (((UINTVAL (op) >> 32) == 0)

  Carl 





[PATCH] Add power10 BRD/BRW/BRH support.

2020-07-01 Thread Michael Meissner via Gcc-patches
This patch adds support for the Power10 BRD, BRW, and BRH instructions that do
byte swapping on values in GPRs.

This patch is a revision of the byte swap patch proposed in:
https://gcc.gnu.org/pipermail/gcc-patches/2020-June/546990.html

I changed the DI swap function bswapdi2_brd instead of bswapdi2_hw that I used
in the previous patch as requested.

I have changed all of the places that used 'future' to be 'power10'.

I have split the tests into 3 tests, one that tests BRH, one that tests BRW,
and one that tests BRD.  The BRD test must have a 64-bit ABI, but the other two
do not.

I used '*' for the length attribute where it generates a single instruction.

I did not change the tests in the splitter as I don't see any easier way to do
the test.  For reference, there are 3 cases:

   1) Power10 instruction swapping a GPR which does not get split
   2) Non-power10 instruction splitting the insn into component parts; (and)
   3) Power9 instruction swapping a vector register which does not get split.

I have done bootstrap compilers and run make check with and without this patch
on a little endian power9 system running Linux.  I have also done the bootstrap
on a big endian power8 system with both 32/64-bit support, and the tests pass.
Can I check this patch into the master branch?


gcc/
2020-06-30  Michael Meissner  

* config/rs6000/rs6000.md (bswaphi2_reg): Generate the BRH
instruction on ISA 3.1.
(bswapsi2_reg): Generate the BRW instruction on ISA 3.1.
(bswapdi2): Rename bswapdi2_xxbrd to bswapdi2_brd.
(bswapdi2_brd): Rename from bswapdi2_xxbrd.  Generate the BRD
instruction on ISA 3.1.

gcc/testsuite/
2020-06-30  Michael Meissner  

* gcc.target/powerpc/bswap-brd.c: New test.
* gcc.target/powerpc/bswap-brw.c: New test.
* gcc.target/powerpc/bswap-brh.c: New test.
---
 gcc/config/rs6000/rs6000.md  | 44 +++-
 gcc/testsuite/gcc.target/powerpc/bswap-brd.c | 23 +++
 gcc/testsuite/gcc.target/powerpc/bswap-brh.c | 12 
 gcc/testsuite/gcc.target/powerpc/bswap-brw.c | 23 +++
 4 files changed, 82 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bswap-brd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bswap-brh.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bswap-brw.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 7baaa61..86c8c02 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -2586,15 +2586,16 @@ (define_insn "bswap<mode>2_store"
   [(set_attr "type" "store")])
 
 (define_insn_and_split "bswaphi2_reg"
-  [(set (match_operand:HI 0 "gpc_reg_operand" "=&r,wa")
+  [(set (match_operand:HI 0 "gpc_reg_operand" "=r,&r,wa")
 	(bswap:HI
-	  (match_operand:HI 1 "gpc_reg_operand" "r,wa")))
-   (clobber (match_scratch:SI 2 "=&r,X"))]
+	  (match_operand:HI 1 "gpc_reg_operand" "r,r,wa")))
+   (clobber (match_scratch:SI 2 "=X,&r,X"))]
   ""
   "@
+   brh %0,%1
#
xxbrh %x0,%x1"
-  "reload_completed && int_reg_operand (operands[0], HImode)"
+  "reload_completed && !TARGET_POWER10 && int_reg_operand (operands[0], HImode)"
   [(set (match_dup 3)
(and:SI (lshiftrt:SI (match_dup 4)
 (const_int 8))
@@ -2610,21 +2611,22 @@ (define_insn_and_split "bswaphi2_reg"
   operands[3] = simplify_gen_subreg (SImode, operands[0], HImode, 0);
   operands[4] = simplify_gen_subreg (SImode, operands[1], HImode, 0);
 }
-  [(set_attr "length" "12,4")
-   (set_attr "type" "*,vecperm")
-   (set_attr "isa" "*,p9v")])
+  [(set_attr "length" "*,12,*")
+   (set_attr "type" "shift,*,vecperm")
+   (set_attr "isa" "p10,*,p9v")])
 
 ;; We are always BITS_BIG_ENDIAN, so the bit positions below in
 ;; zero_extract insns do not change for -mlittle.
 (define_insn_and_split "bswapsi2_reg"
-  [(set (match_operand:SI 0 "gpc_reg_operand" "=&r,wa")
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=r,&r,wa")
 	(bswap:SI
-	  (match_operand:SI 1 "gpc_reg_operand" "r,wa")))]
+	  (match_operand:SI 1 "gpc_reg_operand" "r,r,wa")))]
   ""
   "@
+   brw %0,%1
#
xxbrw %x0,%x1"
-  "reload_completed && int_reg_operand (operands[0], SImode)"
+  "reload_completed && !TARGET_POWER10 && int_reg_operand (operands[0], SImode)"
   [(set (match_dup 0)  ; DABC
(rotate:SI (match_dup 1)
   (const_int 24)))
@@ -2641,9 +2643,9 @@ (define_insn_and_split "bswapsi2_reg"
(and:SI (match_dup 0)
(const_int -256]
   ""
-  [(set_attr "length" "12,4")
-   (set_attr "type" "*,vecperm")
-   (set_attr "isa" "*,p9v")])
+  [(set_attr "length" "4,12,4")
+   (set_attr "type" "shift,*,vecperm")
+   (set_attr "isa" "p10,*,p9v")])
 
 ;; On systems with LDBRX/STDBRX generate the loads/stores directly, just like
 ;; we do for L{H,W}BRX and ST{H,W}BRX above.  If not, we have to generate more
@@ -2676,7 

Re: [PATCH 5/6 ver 3] rs6000, Add vector splat builtin support

2020-07-01 Thread Segher Boessenkool
Hi!

On Mon, Jun 29, 2020 at 03:31:48PM -0700, Carl Love wrote:
> On Mon, 2020-06-29 at 16:58 -0500, Segher Boessenkool wrote:
> > On Mon, Jun 29, 2020 at 02:29:54PM -0700, Carl Love wrote:
> > > Segher:
> > > 
> > > On Thu, 2020-06-25 at 17:39 -0500, Segher Boessenkool wrote:
> > > > > +;; Return 1 if op is a constant 32-bit floating point value
> > > > > +(define_predicate "f32bit_const_operand"
> > > > > +  (match_code "const_double")
> > > > > +{
> > > > > +  if (GET_MODE (op) == SFmode)
> > > > > +return 1;
> > > > > +
> > > > > +  else if ((GET_MODE (op) == DFmode) && ((UINTVAL (op) >> 32)
> > > > > ==
> > > > > 0))
> > > > > +   {
> > > > > +/* Value fits in 32-bits */
> > > > > +return 1;
> > > > > +}
> > > > > +  else
> > > > > +/* Not the expected mode.  */
> > > > > +return 0;
> > > > > +})
> > > > 
> > > > I don't think this is the correct test.  What you want to see is
> > > > if
> > > > the
> > > > number in "op" can be converted to an IEEE single-precision
> > > > number,
> > > > and
> > > > back again, losslessly.  (And subnormal SP numbers aren't allowed
> > > > either, but NaNs and infinities are).
> > > 
> > > The predicate is used with the xxsplitw_v4sf define_expand.  The
> > > "user"
> > > claims the given immediate bit pattern is the bit pattern for a
> > > single
> > > precision floating point number.  The immediate value is not
> > > converted
> > > to a float.  Rather we are splatting a bit pattern that the "user"
> > > already claims represents a 32-bit floating point value.  I just
> > > need
> > > to make sure the immediate value actually fits into 32-bits.
> > > 
> > > I don't see that I need to check that the value can be converted to
> > > IEEE float and back.  
> > 
> > Ah, okay.  Can you please put that in the function comment
> > then?  Right
> > now it says
> > ;; Return 1 if op is a constant 32-bit floating point value
> > and that is quite misleading.
> 
> Would the following be more clear
> 
> ;; Return 1 if op is a constant bit pattern representing a floating
> ;; point value that fits in 32-bits. 

Yes...  But I still don't get it.

Say you have the number 785.066650390625.  As a single-precision IEEE
float, that is 0x4444_4444.  But as a double-precision IEEE float, it is
0x4088_8888_8000_0000, which does not fit in 32 bits!


Segher


[PATCH V3] Practical Improvement to libgcc Complex Divide

2020-07-01 Thread Patrick McGehearty via Gcc-patches
(Version 3)

(Added in version 3)
Support for half, float, extended, and long double precision has
been added to the prior work for double precision. Since half precision
is computed with float precision as per current libgcc practice,
the enhanced underflow/overflow tests provide no benefit for half
precision and would cost performance. Therefore half precision is
left unchanged.

The existing constants for each precision:
float: FLT_MAX, FLT_MIN;
double: DBL_MAX, DBL_MIN;
extended and/or long double: LDBL_MAX, LDBL_MIN
are used for avoiding the more common overflow/underflow cases.

Additional investigation showed that testing for when both parts of
the denominator had exponents roughly small enough to allow shifting
any subnormal values to normal values, all input values could be
scaled up without risking unnecessary overflow and gaining a clear
improvement in accuracy. The test and scaling values used all fit
within the allowed exponent range for each precision required by the C
standard. The remaining number of troubling results in version 3 is
measurably smaller than in versions 1 and 2.

The timing and precision tables below have been revised appropriately
to match the algorithms used in this version for double precision
and additional tables added to include results for other precisions.

In prior versions, I omitted mention of the bug report that started me
on this project: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714
complex division is surprising on targets with FMA (was: on aarch64)
With the proposed method, whether using FMA or not, dividing
1.0+3.0i by 1.0+3.0i correctly returns 1.0+0.0i.

I also have added a reference to Beebe's "The Mathematical Function
Computation Handbook" [4] which was my starting point for research
into better complex divide methods.

(Added for Version 2)
In my initial research, I missed Elen Kalda's proposed patch
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg01629.html [3]
Thanks to Joseph Myers for providing me with the pointer.
This version includes performance and accuracy comparisions
between Elen's proposed patch and my latest patch version for
double precision.

(from earlier Versions)

The following patch to libgcc/libgcc2.c __divdc3 provides an
opportunity to gain important improvements to the quality of answers
for the default complex divide routine (half, float, double, extended,
long double precisions) when dealing with very large or very small exponents.

The current code correctly implements Smith's method (1962) [1]
further modified by c99's requirements for dealing with NaN (not a
number) results. When working with input values where the exponents
are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are
substantially different from the answers provided by quad precision
more than 1% of the time. This error rate may be unacceptable for many
applications that cannot a priori restrict their computations to the
safe range. The proposed method reduces the frequency of
"substantially different" answers by more than 99% for double
precision at a modest cost of performance.

Differences between current gcc methods and the new method will be
described. Then accuracy and performance differences will be discussed.

NOTATION

For all of the following, the notation is:
Input complex values:
  a+bi  (a= real part, b= imaginary part)
  c+di
Output complex value:
  e+fi = (a+bi)/(c+di)

For the result tables:
current = current method (SMITH)
b1div = method proposed by Elen Kalda
b2div = alternate method considered by Elen Kalda
new1 = new method using 1 divide and 2 multiplies
new = new method proposed by this patch

DESCRIPTIONS of different complex divide methods:

NAIVE COMPUTATION (-fcx-limited-range):
  e = (a*c + b*d)/(c*c + d*d)
  f = (b*c - a*d)/(c*c + d*d)

Note that c*c and d*d will overflow or underflow if either
c or d is outside the range 2^-538 to 2^512.

This method is available in gcc when the switch -fcx-limited-range is
used. That switch is also enabled by -ffast-math. Only one who has a
clear understanding of the maximum range of intermediate values
generated by a computation should consider using this switch.

SMITH's METHOD (current libgcc):
  if (fabs(c) < fabs(d)) {
  r = c/d;
  denom = (c*r) + d;
  e = (a*r + b) / denom;
  f = (b*r - a) / denom;
  } else {
  r = d/c;
  denom = c + (d*r);
  e = (a + b*r) / denom;
  f = (b - a*r) / denom;
  }

NEW METHOD (proposed by this patch), shown for the fabs(c) < fabs(d) case:

  /* prevent overflow when arguments are near max representable */
  if ((FABS (c) > RBIG) || (FABS (d) > RBIG) || (FABS (a) > RBIG) || (FABS (b) > RBIG) ) {
  a = a * 0.5;
  b = b * 0.5;
  c = c * 0.5;
  d = d * 0.5;
  }
  /* minimize overflow/underflow issues when c and d are small */
  else if (FABS (d) < RMIN2) {
  a = a * RMINSCAL;
  b = b * RMINSCAL;
  c = c * RMINSCAL;
  d = d * RMINSCAL;
  }
  r = c/d;
  denom = (c*r) + d;
  if( r > RMIN ) {
  e = (a*r + b) / denom;
  f = (b*r - a) / denom;
  } else {
  e = (c * (a/d) + b) / denom;
  f = (c * (b/d) - a) / denom;
  }
[ only presenting the fabs(c) < fabs(d) case here, full code in patch. ]

Before any computation of the answer, the code checks for any input
values near maximum to allow down scaling to avoid overflow.  These
scalings almost never harm the accuracy since 

Re: [PATCH] libgccjit: Add new gcc_jit_context_new_blob entry point

2020-07-01 Thread Andrea Corallo
Andrea Corallo  writes:

>> It occurred to me that the entrypoint is combining two things:
>> - creating a global char[]
>> - creating an initializer for that global
>>
>> which got me wondering if we should instead have a way to add
>> initializers for globals.
>>
>> My first thought was something like:
>>
>> gcc_jit_context_new_global_with_initializer
>>
>> which would be like gcc_jit_context_new_global but would have an
>> additional gcc_jit_rvalue *init_value param?
>> The global would have to be of kind GCC_JIT_GLOBAL_EXPORTED or
>> GCC_JIT_GLOBAL_INTERNAL, not GCC_JIT_GLOBAL_IMPORTED.
>>
>> Alternatively, maybe it would be better to have
>>
>> gcc_jit_global_set_initializer (gcc_jit_lvalue *global,
>>  gcc_jit_rvalue *init_val);
>>
>> to make the API more flexible.
>>
>> But even if we had this, we'd still need some way to create the rvalue
>> for that initial value.  Also, maybe there ought to be a distinction
>> between rvalues that can vary at runtime vs those that can be computed
>> at compile-time (and are thus suitable for use in static
>> initialization).
>>
>> I suspect you may have gone through the same thought process and come
>> up with a simpler approach.   (I'm mostly just "thinking out loud"
>> here, sorry if it isn't very coherent).
>
> Yes I had kind of similar thoughs.
>
> Ideally would be good to have a generic solution, the complication is
> that as you mentioned not every rvalue is suitable for initializing
> every global, but rather the opposite.  My fear was that the space to be
> covered would be non trivial for a robust solution in this case.
>
> Also I believe we currently have no way to express in libgccjit rvalues
> an array with some content, so how to use this as initializer?  Perhaps
> also we should have a new type gcc_jit_initializer?
>
> On the other hand I thought that for simple things like integers the
> problem is tipically already solved with an assignment in some init code
> (infact I think so far nobody complained) while the real standing
> limitation is for blobs (perhaps I'm wrong).  And in the end if you can
> stuff some data in, you can use it later for any scope.
>
> Another "hybrid" solution would be to have specific entry point for each
> type of the subset we want to allow for static initialization.  This way
> we still control the creation of the initializer internally so it's
> safe.  In this view this blob entry point would be just one of these
> (probably with a different name like
> 'gcc_jit_context_new_glob_init_char_array').
>

Hi Dave,

wanted to ask if you formed an opinion about the patch and/or more in
general the problem of static initialize data.

Thanks

  Andrea


[committed] Fix m68k bootstrap

2020-07-01 Thread Jeff Law via Gcc-patches

Similar to what we've seen elsewhere: the change to C++17 by default is causing
m68k to fail to bootstrap due to uses of the "register" keyword.  This patch
removes the half-dozen or so instances I immediately saw.  I'll spin up another
m68k bootstrap attempt today/tonight.

Jeff


commit fb43b412502fd975b5ff40aa3cb4b3fdd5d7b39a
Author: Jeff Law 
Date:   Wed Jul 1 10:09:48 2020 -0600

Fix bootstrap for m68k.

gcc/
* config/m68k/m68k.c (m68k_output_btst): Drop "register" keyword.
(emit_move_sequence, output_iorsi3, output_xorsi3): Likewise.

diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c
index 67b109447b3..36a0e34fb06 100644
--- a/gcc/config/m68k/m68k.c
+++ b/gcc/config/m68k/m68k.c
@@ -1946,7 +1946,7 @@ m68k_output_btst (rtx countop, rtx dataop, rtx_code code, 
int signpos)
 
   if (GET_CODE (countop) == CONST_INT)
 {
-  register int count = INTVAL (countop);
+  int count = INTVAL (countop);
   /* If COUNT is bigger than size of storage unit in use,
 advance to the containing unit of same size.  */
   if (count > signpos)
@@ -3850,9 +3850,9 @@ fp_reg_operand (rtx op, machine_mode mode 
ATTRIBUTE_UNUSED)
 int
 emit_move_sequence (rtx *operands, machine_mode mode, rtx scratch_reg)
 {
-  register rtx operand0 = operands[0];
-  register rtx operand1 = operands[1];
-  register rtx tem;
+  rtx operand0 = operands[0];
+  rtx operand1 = operands[1];
+  rtx tem;
 
   if (scratch_reg
   && reload_in_progress && GET_CODE (operand0) == REG
@@ -5474,7 +5474,7 @@ output_andsi3 (rtx *operands)
 const char *
 output_iorsi3 (rtx *operands)
 {
-  register int logval;
+  int logval;
   CC_STATUS_INIT;
   if (GET_CODE (operands[2]) == CONST_INT
   && INTVAL (operands[2]) >> 16 == 0
@@ -5513,7 +5513,7 @@ output_iorsi3 (rtx *operands)
 const char *
 output_xorsi3 (rtx *operands)
 {
-  register int logval;
+  int logval;
   CC_STATUS_INIT;
   if (GET_CODE (operands[2]) == CONST_INT
   && INTVAL (operands[2]) >> 16 == 0


RE: [PATCH][GCC-10 Backport] arm: Fix the failing mve scalar shift execution tests.

2020-07-01 Thread Kyrylo Tkachov



> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 22 June 2020 17:21
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH][GCC-10 Backport] arm: Fix the failing mve scalar shift
> execution tests.
> 
> Hello,
> 
> In the GCC testsuite, the MVE scalar shift execution tests
> (mve_scalar_shifts[1-4].c) are failing because they are executed on
> target hardware that doesn't support MVE instructions.  This patch
> restricts those tests to execute only on target hardware that supports
> MVE instructions.
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for GCC-10 branch?
> 

Ok.
Thanks,
Kyrill

> Thanks,
> Srinath.
> 
> 2020-06-18  Srinath Parvathaneni  
> 
> gcc/
>   * doc/sourcebuild.texi (arm_v8_1m_mve_fp_ok): Add item.
>   (arm_mve_hw): Likewise.
> 
> gcc/testsuite/
>   * gcc.target/arm/mve/intrinsics/mve_scalar_shifts1.c: Modify.
>   * gcc.target/arm/mve/intrinsics/mve_scalar_shifts2.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/mve_scalar_shifts3.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/mve_scalar_shifts4.c: Likewise.
>   * lib/target-supports.exp (check_effective_target_arm_mve_hw):
> Define.
> 
> (cherry picked from commit 99abb146fd0923ebda2c7e7681adb18e6798a90c)
> 
> 
> ### Attachment also inlined for ease of reply
> ###
> 
> 
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index
> 240d6e4b08e2ed1a3742c4e82b98d284b9774216..563d29506e725562c593bb
> f6a067c06111f84f0e 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -1915,6 +1915,15 @@ ARM target supports options to generate
> instructions from ARMv8.1-M with
>  the M-Profile Vector Extension (MVE). Some multilibs may be incompatible
>  with these options.
> 
> +@item arm_v8_1m_mve_fp_ok
> +ARM target supports options to generate instructions from ARMv8.1-M with
> +the Half-precision floating-point instructions (HP), Floating-point Extension
> +(FP) along with M-Profile Vector Extension (MVE). Some multilibs may be
> +incompatible with these options.
> +
> +@item arm_mve_hw
> +Test system supports executing MVE instructions.
> +
>  @item arm_v8m_main_cde
>  ARM target supports options to generate instructions from ARMv8-M with
>  the Custom Datapath Extension (CDE). Some multilibs may be incompatible
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_scalar_shifts1.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_scalar_shifts1.c
> index
> e1c136e7f302c1824f0b00b5e7bc468ff5fcfe27..db2335fc76ed3204fcfc033ef67
> 77dc85dbd465f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_scalar_shifts1.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_scalar_shifts1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-require-effective-target arm_mve_hw } */
>  /* { dg-options "-O2" } */
>  /* { dg-add-options arm_v8_1m_mve } */
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_scalar_shifts2.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_scalar_shifts2.c
> index
> 0b5a8edb15849913d4f2849ab86decb286692386..5a63d385cb7e7d18345974
> 421035f3090f409554 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_scalar_shifts2.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_scalar_shifts2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-require-effective-target arm_mve_hw } */
>  /* { dg-options "-O2" } */
>  /* { dg-add-options arm_v8_1m_mve } */
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_scalar_shifts3.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_scalar_shifts3.c
> index
> 7e3da54f5e62467e2cdaabd85df3db127f608802..f0d5ee32f829c482cbeb4f1fb
> 1f714e01b43da87 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_scalar_shifts3.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_scalar_shifts3.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-require-effective-target arm_mve_hw } */
>  /* { dg-options "-O2" } */
>  /* { dg-add-options arm_v8_1m_mve } */
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_scalar_shifts4.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_scalar_shifts4.c
> index
> 8bee12f7fdfaef51daf7e2f5d3c1e284115d2649..283742fcf982b6cf2746e7745a
> 3c5258e75d3982 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_scalar_shifts4.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_scalar_shifts4.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-require-effective-target arm_mve_hw } */
>  /* { dg-options "-O2" } */
>  /* { dg-add-options arm_v8_1m_mve } */
> 
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-
> supports.exp
> index
> 

c++: Expose cloning form predicates

2020-07-01 Thread Nathan Sidwell

A further adjustment of the function cloning.  Rather than have
copy_fndecl_with_name deduce whether a particular cdtor needs a
vtt_parm and/or has inherited parms to drop, pass that information in
from the caller.  In particular, build_cdtor_clones knows when it's
building the particular cdtors that might need these.  On the modules
branch I need to clone cdtors before the underlying class information
is necessarily complete; there, build_cdtor_clones is externally
callable to facilitate that.

nathan

--
Nathan Sidwell
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index b0cc027e0de..7b5f1669d04 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -182,7 +182,6 @@ static void dfs_accumulate_vtbl_inits (tree, tree, tree, tree, tree,
 static void build_rtti_vtbl_entries (tree, vtbl_init_data *);
 static void build_vcall_and_vbase_vtbl_entries (tree, vtbl_init_data *);
 static void clone_constructors_and_destructors (tree);
-static tree build_clone (tree, tree);
 static void update_vtable_entry_for_fn (tree, tree, tree, tree *, unsigned);
 static void build_ctor_vtbl_group (tree, tree);
 static void build_vtt (tree);
@@ -4697,7 +4696,8 @@ check_methods (tree t)
 }
 
 static tree
-copy_fndecl_with_name (tree fn, tree name)
+copy_fndecl_with_name (tree fn, tree name, tree_code code,
+		   bool need_vtt_parm_p, bool omit_inherited_parms_p)
 {
   /* Copy the function.  */
   tree clone = copy_decl (fn);
@@ -4714,23 +4714,24 @@ copy_fndecl_with_name (tree fn, tree name)
   DECL_PENDING_INLINE_INFO (clone) = NULL;
   DECL_PENDING_INLINE_P (clone) = 0;
 
-  /* The base-class destructor is not virtual.  */
   if (name == base_dtor_identifier)
 {
+  /* The base-class destructor is not virtual.  */
   DECL_VIRTUAL_P (clone) = 0;
   DECL_VINDEX (clone) = NULL_TREE;
 }
-  else if (IDENTIFIER_OVL_OP_P (name))
+  else if (code != ERROR_MARK)
 {
-  const ovl_op_info_t *ovl_op = IDENTIFIER_OVL_OP_INFO (name);
+  /* Set the operator code.  */
+  const ovl_op_info_t *ovl_op = OVL_OP_INFO (false, code);
   DECL_OVERLOADED_OPERATOR_CODE_RAW (clone) = ovl_op->ovl_op_code;
-}
 
-  if (DECL_VIRTUAL_P (clone))
-IDENTIFIER_VIRTUAL_P (name) = true;
+  /* The operator could be virtual.  */
+  if (DECL_VIRTUAL_P (clone))
+	IDENTIFIER_VIRTUAL_P (name) = true;
+   }
 
-  bool ctor_omit_inherited_parms_p = ctor_omit_inherited_parms (clone);
-  if (ctor_omit_inherited_parms_p)
+  if (omit_inherited_parms_p)
 gcc_assert (DECL_HAS_IN_CHARGE_PARM_P (clone));
 
   /* If there was an in-charge parameter, drop it from the function
@@ -4744,13 +4745,12 @@ copy_fndecl_with_name (tree fn, tree name)
   /* Skip the in-charge parameter.  */
   parmtypes = TREE_CHAIN (parmtypes);
   /* And the VTT parm, in a complete [cd]tor.  */
-  if (DECL_HAS_VTT_PARM_P (fn)
-	  && ! DECL_NEEDS_VTT_PARM_P (clone))
+  if (DECL_HAS_VTT_PARM_P (fn) && !need_vtt_parm_p)
 	parmtypes = TREE_CHAIN (parmtypes);
-  if (ctor_omit_inherited_parms_p)
+  if (omit_inherited_parms_p)
 	{
 	  /* If we're omitting inherited parms, that just leaves the VTT.  */
-	  gcc_assert (DECL_NEEDS_VTT_PARM_P (clone));
+	  gcc_assert (need_vtt_parm_p);
 	  parmtypes = tree_cons (NULL_TREE, vtt_parm_type, void_list_node);
 	}
   TREE_TYPE (clone)
@@ -4766,6 +4766,7 @@ copy_fndecl_with_name (tree fn, tree name)
 
   /* Copy the function parameters.  */
   DECL_ARGUMENTS (clone) = copy_list (DECL_ARGUMENTS (clone));
+
   /* Remove the in-charge parameter.  */
   if (DECL_HAS_IN_CHARGE_PARM_P (clone))
 {
@@ -4773,10 +4774,11 @@ copy_fndecl_with_name (tree fn, tree name)
 	= DECL_CHAIN (DECL_CHAIN (DECL_ARGUMENTS (clone)));
   DECL_HAS_IN_CHARGE_PARM_P (clone) = 0;
 }
+
   /* And the VTT parm, in a complete [cd]tor.  */
   if (DECL_HAS_VTT_PARM_P (fn))
 {
-  if (DECL_NEEDS_VTT_PARM_P (clone))
+  if (need_vtt_parm_p)
 	DECL_HAS_VTT_PARM_P (clone) = 1;
   else
 	{
@@ -4788,7 +4790,7 @@ copy_fndecl_with_name (tree fn, tree name)
 
   /* A base constructor inheriting from a virtual base doesn't get the
  arguments.  */
-  if (ctor_omit_inherited_parms_p)
+  if (omit_inherited_parms_p)
 DECL_CHAIN (DECL_CHAIN (DECL_ARGUMENTS (clone))) = NULL_TREE;
 
   for (tree parms = DECL_ARGUMENTS (clone); parms; parms = DECL_CHAIN (parms))
@@ -4809,7 +4811,8 @@ copy_fndecl_with_name (tree fn, tree name)
 tree
 copy_operator_fn (tree fn, tree_code code)
 {
-  return copy_fndecl_with_name (fn, ovl_op_identifier (code));
+  return copy_fndecl_with_name (fn, ovl_op_identifier (code),
+code, false, false);
 }
 
 /* FN is a constructor or destructor.  Clone the declaration to create
@@ -4817,7 +4820,8 @@ copy_operator_fn (tree fn, tree_code code)
NAME.  */
 
 static tree
-build_clone (tree fn, tree name)
+build_clone (tree fn, tree name, bool need_vtt_parm_p,
+	 bool omit_inherited_parms_p)
 {
   tree clone;
 
@@ -4827,7 +4831,8 @@ build_clone (tree fn, tree name)
   clone = 

Re: [PATCH v2] aarch64: Add 64 bits fpcr fpsr getter/setter builtins

2020-07-01 Thread Andrea Corallo
Richard Sandiford  writes:

> OK with those changes, thanks.
>
> Richard

Hi Richard,

thanks for the quick review.

I've installed the patch with the suggested changes as:

0d7e5fa655e aarch64: Add 64 bit setter getter fpsr fpcr

Regards

  Andrea

gcc/ChangeLog

* config/aarch64/aarch64-builtins.c (aarch64_builtins): Add enums
for 64bits fpsr/fpcr getter setters builtin variants.
(aarch64_init_fpsr_fpcr_builtins): New function.
(aarch64_general_init_builtins): Modify to make use of the latter.
(aarch64_expand_fpsr_fpcr_setter): New function.
(aarch64_general_expand_builtin): Modify to make use of the latter.
* config/aarch64/aarch64.md (@aarch64_set_)
(@aarch64_get_): New patterns replacing and
generalizing 'get_fpcr', 'set_fpsr'.
* config/aarch64/iterators.md (GET_FPSCR, SET_FPSCR): New int
iterators.
(fpscr_name): New int attribute.
* doc/extend.texi (__builtin_aarch64_get_fpcr64)
(__builtin_aarch64_set_fpcr64, __builtin_aarch64_get_fpsr64)
(__builtin_aarch64_set_fpsr64): Add into AArch64 Built-in
Functions.

gcc/testsuite/ChangeLog

* gcc.target/aarch64/get_fpcr64_1.c: New test.
* gcc.target/aarch64/set_fpcr64_1.c: New test.
* gcc.target/aarch64/get_fpsr64_1.c: New test.
* gcc.target/aarch64/set_fpsr64_1.c: New test.


[PATCH] libgomp: defined OMPD library-wide functions.

2020-07-01 Thread y2s1982 via Gcc-patches
This patch provides an initial set of definitions for the rest of the
library-wide OMPD functions.

It addresses all previous feedback.

2020-07-01  Tony Sim  

libgomp/ChangeLog:

* libgompd.h: Include omp-tools.h.
(gompd_callbacks): New extern declaration.
* ompd-lib.c (gompd_callbacks): New declaration.
(ompd_initialized): New declaration.
(ompd_initialize): Check callbacks and assign them to gompd_callbacks.
(ompd_finalize): New function.

---
 libgomp/libgompd.h |  4 
 libgomp/ompd-lib.c | 16 ++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/libgomp/libgompd.h b/libgomp/libgompd.h
index 9782828bff5..3a428e1c1e4 100644
--- a/libgomp/libgompd.h
+++ b/libgomp/libgompd.h
@@ -29,9 +29,13 @@
 #ifndef LIBGOMPD_H
 #define LIBGOMPD_H 1
 
+#include "omp-tools.h"
+
 #define ompd_stringify(x) ompd_str2(x)
 #define ompd_str2(x) #x
 
 #define OMPD_VERSION 201811
 
+extern ompd_callbacks_t gompd_callbacks;
+
 #endif /* LIBGOMPD_H */
diff --git a/libgomp/ompd-lib.c b/libgomp/ompd-lib.c
index f0ae9e85a7e..d5350e1045c 100644
--- a/libgomp/ompd-lib.c
+++ b/libgomp/ompd-lib.c
@@ -29,6 +29,9 @@
 #include "omp-tools.h"
 #include "libgompd.h"
 
+ompd_callbacks_t gompd_callbacks;
+static int ompd_initialized = 0;
+
 ompd_rc_t
 ompd_get_api_version (ompd_word_t *version)
 {
@@ -47,15 +50,24 @@ ompd_get_version_string (const char **string)
 ompd_rc_t
 ompd_initialize (ompd_word_t api_version, const ompd_callbacks_t *callbacks)
 {
-  static int ompd_initialized = 0;
+  if (!callbacks)
+return ompd_rc_bad_input;
 
   if (ompd_initialized)
 return ompd_rc_error;
 
+  gompd_callbacks = *callbacks;
+
   (void) api_version;
-  (void) callbacks;
 
   ompd_initialized = 1;
 
   return ompd_rc_ok;
 }
+
+ompd_rc_t
+ompd_finalize (void)
+{
+  ompd_initialized = 0;
+  return ompd_rc_ok;
+}
-- 
2.27.0



Re: [PATCH 5/7 v6] vect: Support vector load/store with length in vectorizer

2020-07-01 Thread Richard Sandiford
"Kewen.Lin"  writes:
> on 2020/7/1 3:53 AM, Richard Sandiford wrote:
>> "Kewen.Lin"  writes:
>>>poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>> +  tree length_limit = NULL_TREE;
>>> +  /* For length, we need length_limit to check length in range.  */
>>> +  if (!vect_for_masking)
>>> +{
>>> +  poly_uint64 len_limit = nscalars_per_ctrl * rgc->factor;
>>> +  length_limit = build_int_cst (compare_type, len_limit);
>>> +}
>>>  
>>>/* Calculate the maximum number of scalar values that the rgroup
>>>   handles in total, the number that it handles for each iteration
>>> @@ -434,12 +445,12 @@ vect_set_loop_controls_directly (class loop *loop, 
>>> loop_vec_info loop_vinfo,
>>>tree nscalars_total = niters;
>>>tree nscalars_step = build_int_cst (iv_type, vf);
>>>tree nscalars_skip = niters_skip;
>>> -  if (nscalars_per_iter != 1)
>>> +  if (nscalars_per_iter_ft != 1)
>>>  {
>>>/* We checked before setting LOOP_VINFO_USING_PARTIAL_VECTORS_P that
>>>  these multiplications don't overflow.  */
>>> -  tree compare_factor = build_int_cst (compare_type, 
>>> nscalars_per_iter);
>>> -  tree iv_factor = build_int_cst (iv_type, nscalars_per_iter);
>>> +  tree compare_factor = build_int_cst (compare_type, 
>>> nscalars_per_iter_ft);
>>> +  tree iv_factor = build_int_cst (iv_type, nscalars_per_iter_ft);
>>>nscalars_total = gimple_build (preheader_seq, MULT_EXPR, 
>>> compare_type,
>>>  nscalars_total, compare_factor);
>>>nscalars_step = gimple_build (preheader_seq, MULT_EXPR, iv_type,
>>> @@ -509,7 +520,7 @@ vect_set_loop_controls_directly (class loop *loop, 
>>> loop_vec_info loop_vinfo,
>>>  NSCALARS_SKIP to that cannot overflow.  */
>>>   tree const_limit = build_int_cst (compare_type,
>>> LOOP_VINFO_VECT_FACTOR (loop_vinfo)
>>> -   * nscalars_per_iter);
>>> +   * nscalars_per_iter_ft);
>>>   first_limit = gimple_build (preheader_seq, MIN_EXPR, compare_type,
>>>   nscalars_total, const_limit);
>>>   first_limit = gimple_build (preheader_seq, PLUS_EXPR, compare_type,
>> 
>> It looks odd that we don't need to adjust the other nscalars_* values too.
>> E.g. the above seems to be comparing an unscaled nscalars_total with
>> a scaled nscalars_per_iter.  I think the units ought to “agree”,
>> both here and in the rest of the function.
>> 
>
> Sorry, I didn't quite follow this comment.  Both nscalars_total and
> nscalars_step are scaled here.  The only remaining related nscalars_*
> variable seems to be nscalars_skip, but length can't support skip.

Hmm, OK.  But in that case can you update the names of the variables
to match?  It's confusing to have some nscalars_* variables actually
count scalars (and thus have “nitems” equivalents) and other nscalars_*
variables count something else (and thus effectively be nitems_* variables
themselves).

>
>>> }
>>>  
>>> +  /* First iteration is full.  */
>> 
>> This comment belongs inside the “if”.
>> 
>
> Sorry, I might miss something, but isn't this applied for both?

I meant it should be…

>
>>>if (!init_ctrl)
>>> -   /* First iteration is full.  */
>>> -   init_ctrl = build_minus_one_cst (ctrl_type);
>>> +   {
>>> + if (vect_for_masking)
>>> +   init_ctrl = build_minus_one_cst (ctrl_type);
>>> + else
>>> +   init_ctrl = length_limit;
>>> +   }

  if (!init_ctrl)
{
  /* First iteration is full.  */
  if (vect_for_masking)
init_ctrl = build_minus_one_cst (ctrl_type);
  else
init_ctrl = length_limit;
}

since the comment only applies to the “!init_ctrl” case.  The point
of a nonnull init_ctrl is to create cases in which the first vector
is not a full vector.

>>>  
>>> […]
>>> @@ -2568,7 +2608,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
>>> niters, tree nitersm1,
>>>if (vect_epilogues
>>>&& LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>>>&& prolog_peeling >= 0
>>> -  && known_eq (vf, lowest_vf))
>>> +  && known_eq (vf, lowest_vf)
>>> +  && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (epilogue_vinfo))
>>>  {
>>>unsigned HOST_WIDE_INT eiters
>>> = (LOOP_VINFO_INT_NITERS (loop_vinfo)
>> 
>> I'm still not really convinced that this check is right.  It feels
>> like it's hiding a problem elsewhere.
>> 
>
> The comments above this hunk is that:
>
>   /* If we know the number of scalar iterations for the main loop we should
>  check whether after the main loop there are enough iterations left over
>  for the epilogue.  */
>
> So it's checking whether the ones in loop_vinfo->epilogue_vinfos can be
> removed.  And the main work in the loop is to remove epil_info from
> epilogue_vinfos.

Oops, I think I misread it as checking loop_vinfo rather than
epilogue_vinfo.  It makes more sense now. :-)

>>> +
>>> +  

Re: [PATCH 00/28] rs6000: Auto-generate builtins from descriptions

2020-07-01 Thread Bill Schmidt via Gcc-patches
Gentle ping to keep this in your inbox. :)  I know there are still many more
important things ahead of this in the queue.


Bill

On 6/17/20 2:46 PM, Bill Schmidt via Gcc-patches wrote:

I posted a version of these patches back in stage 4 (February),
but we agreed that holding off until stage 1 was a better idea.
Since then I've made more progress and reorganized the patches
accordingly.  This group of patches lays groundwork, but does not
actually change GCC's behavior yet, other than to generate the
new initialization information and ignore it.

The current built-in support in the rs6000 back end requires at least
a master's degree in spelunking to comprehend.  It's full of cruft,
redundancy, and unused bits of code, and long overdue for a
replacement.  This is the first part of my project to do that.

My intent is to make adding new built-in functions as simple as adding
a few lines to a couple of files, and automatically generating as much
of the initialization, overload resolution, and expansion logic as
possible.  This patch series establishes the format of the input files
and creates a new program (rs6000-gen-builtins) to:

  * Parse the input files into an internal representation;
  * Generate a file of #defines (rs6000-vecdefines.h) for eventual
inclusion into altivec.h; and
  * Generate an initialization file to create and initialize tables of
built-in functions and overloads.

Patches 1, 3-7, and 9-19 contain the logic for rs6000-gen-builtins.
Patch 8 provides balanced tree search support for parsing scalability.
Patches 2 and 21-27 provide a first cut at the input files.
Patch 20 incorporates the new code into the GCC build.
Patch 28 adds comments to some existing files that will help
during the transition from the previous builtin mechanism.

The patch series is constructed so that any prefix set of the patches
can be upstreamed without breaking anything, so we can take the
reviews slowly.  There's still plenty of work left, but I think it
will be helpful to get this big chunk of patches upstream to make
further progress easier.

Thanks in advance for your reviews!


Bill Schmidt (28):
   rs6000: Initial create of rs6000-gen-builtins.c
   rs6000: Add initial input files
   rs6000: Add file support and functions for diagnostic support
   rs6000: Add helper functions for parsing
   rs6000: Add functions for matching types, part 1 of 3
   rs6000: Add functions for matching types, part 2 of 3
   rs6000: Add functions for matching types, part 3 of 3
   rs6000: Red-black tree implementation for balanced tree search
   rs6000: Main function with stubs for parsing and output
   rs6000: Parsing built-in input file, part 1 of 3
   rs6000: Parsing built-in input file, part 2 of 3
   rs6000: Parsing built-in input file, part 3 of 3
   rs6000: Parsing of overload input file
   rs6000: Build and store function type identifiers
   rs6000: Write output to the vector definition include file
   rs6000: Write output to the builtins header file
   rs6000: Write output to the builtins init file, part 1 of 3
   rs6000: Write output to the builtins init file, part 2 of 3
   rs6000: Write output to the builtins init file, part 3 of 3
   rs6000: Incorporate new builtins code into the build machinery
   rs6000: Add remaining MASK_ALTIVEC builtins
   rs6000: Add MASK_VSX builtins
   rs6000: Add available-everywhere and ancient builtins
   rs6000: Add Power7 builtins
   rs6000: Add MASK_P8_VECTOR builtins
   rs6000: Add MASK_P9_VECTOR and MASK_P9_MISC builtins
   rs6000: Add remaining builtins
   rs6000: Add comments to help with transition

  gcc/config.gcc   |3 +-
  gcc/config/rs6000/rbtree.c   |  233 ++
  gcc/config/rs6000/rbtree.h   |   51 +
  gcc/config/rs6000/rs6000-builtin-new.def | 2965 ++
  gcc/config/rs6000/rs6000-builtin.def |   15 +
  gcc/config/rs6000/rs6000-call.c  |  166 ++
  gcc/config/rs6000/rs6000-gen-builtins.c  | 2586 +++
  gcc/config/rs6000/rs6000-overload.def|   57 +
  gcc/config/rs6000/t-rs6000   |   25 +-
  9 files changed, 6099 insertions(+), 2 deletions(-)
  create mode 100644 gcc/config/rs6000/rbtree.c
  create mode 100644 gcc/config/rs6000/rbtree.h
  create mode 100644 gcc/config/rs6000/rs6000-builtin-new.def
  create mode 100644 gcc/config/rs6000/rs6000-gen-builtins.c
  create mode 100644 gcc/config/rs6000/rs6000-overload.def



RE: Chemical industries list

2020-07-01 Thread Brittnay Wall
Hi,

How are you doing today?

I am following up with my previous email to check if you are interested in
our services.

Did you receive my email? Please let me know your thoughts.

Thank you and I look forward to hearing from you.

Regards

Brittany Wall

*From:* Brittnay Wall [mailto:brittany.w...@informativesearchus.com]
*Sent:* Tuesday, June 09, 2020 12:18 PM
*To:* 'gcc-patches@gcc.gnu.org'
*Subject:* Chemical industries list



Hi,



Hope you are safe and doing great.



Would you be interested in acquiring the records of Chemical industries
list to promote your product and service?



*Key contacts*: Ceo, Engineers, Chemists, Health and Safety Specialists,
Soil and Plant Scientists and more.



If you are interested please let me know your target criteria, so that I
can filter the counts and revert with pricing option.



Looking forward to hear from you.



Regards,

*Brittany Wall |Manager Demand Generation|*



If you do not wish to receive further mail please reply with “Good Day” in
your subject line.


Re: [PATCH] nvptx: Add support for subword compare-and-swap

2020-07-01 Thread Tom de Vries
On 6/15/20 10:28 PM, Kwok Cheung Yeung wrote:
> Hello
> 
> This patch adds support on nvptx for __sync_val_compare_and_swap
> operations on 1- and 2-byte values. The implementation is a straight
> copy of the version for AMD GCN.

Actually it also adds support for __sync_bool_compare_and_swap, be sure
to mention this in the commit log.

> I have added a new libgomp test that exercises the new operation. I have
> also verified that the new code does not cause any regressions on the
> nvptx offloading tests, and that the new test passes with both nvptx and
> amdgcn as offload targets.
> 

AFAICT, that exercises __sync_val_compare_and_swap, but not
__sync_bool_compare_and_swap.

You've covered offloading, but now consider the nvptx standalone target.

I was surprised to find only one generic test-case exercising char/short
__sync_val_compare_and_swap:
...
$ find gcc/testsuite/ -type f | xargs grep -l sync_char_short | xargs
grep sync_val
gcc/testsuite/gcc.dg/pr49696.c:  __sync_val_compare_and_swap (x, 1, 0);
...
and that's not a run test.

So, I think gcc needs a copy of (some of) the
gcc/testsuite/gcc.dg/ia64-sync-*.c tests for effective target
sync_char_short.

However, since this patch only adds partial support, we cannot enable
sync_char_short for nvptx yet.  So, if you stick to partial support, you
should add a char/short copy of ia64-sync-3.c to gcc.target/nvptx (which
ideally could be an include of a generic test-case that is active for
sync_char_short only, with mention that it can be removed once
sync_char_short is enabled for nvptx).
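A char/short copy of ia64-sync-3.c along those lines might look roughly like
the following (a hedged sketch, not the actual testsuite file; the dg-
directives and driver boilerplate are omitted):

```c
/* Minimal runtime check of subword compare-and-swap, loosely modelled
   on gcc.dg/ia64-sync-3.c.  Illustrative only.  */
#include <assert.h>

static char c = 10;
static short s = 20;

void
test_subword_cas (void)
{
  /* A successful swap returns the old value and updates memory.  */
  assert (__sync_val_compare_and_swap (&c, 10, 11) == 10 && c == 11);
  assert (__sync_val_compare_and_swap (&s, 20, 21) == 20 && s == 21);

  /* A failed swap returns the current value and leaves memory alone.  */
  assert (__sync_val_compare_and_swap (&c, 10, 12) == 11 && c == 11);
  assert (__sync_val_compare_and_swap (&s, 20, 22) == 21 && s == 21);

  /* The boolean variant reports success or failure directly.  */
  assert (__sync_bool_compare_and_swap (&c, 11, 12) && c == 12);
  assert (!__sync_bool_compare_and_swap (&s, 20, 22) && s == 21);
}
```

Such a test would be gated on the sync_char_short effective target so it only
runs where the builtins are supported.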

> Okay for master and OG10?
> 

I looked at the implementation, and it looks ok to me, though I think we
need to make explicit in a comment what the assumptions are:
- that we have read and write access to the entire word, and
- that the word is not volatile.
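Those two assumptions matter because a subword CAS on a target with only
word-sized atomics is typically emulated by a read-modify-CAS loop on the
containing word.  A minimal sketch of the technique (illustrative names, not
the actual libgcc code; the shift computation assumes a little-endian layout):

```c
#include <stdint.h>

/* Emulate a 1-byte __sync_val_compare_and_swap with a word-sized CAS
   loop.  Assumes we may read and write the entire containing word and
   that the word is not volatile.  */
uint8_t
subword_cas_u8 (uint8_t *ptr, uint8_t expected, uint8_t desired)
{
  uintptr_t addr = (uintptr_t) ptr;
  uint32_t *wordp = (uint32_t *) (addr & ~(uintptr_t) 3);
  unsigned shift = (addr & 3) * 8;      /* Little-endian byte offset.  */
  uint32_t mask = (uint32_t) 0xff << shift;

  for (;;)
    {
      uint32_t oldword = *wordp;
      uint8_t oldbyte = (oldword & mask) >> shift;
      if (oldbyte != expected)
        return oldbyte;   /* Comparison failed; report current value.  */
      uint32_t newword = (oldword & ~mask)
                         | ((uint32_t) desired << shift);
      /* On nvptx this would be the native 32-bit atom.cas; the GCC
         builtin stands in for it here.  */
      if (__sync_bool_compare_and_swap (wordp, oldword, newword))
        return oldbyte;   /* Success; report the old value.  */
    }
}
```

If neighbouring bytes within the word could be written through non-atomic
stores, or the word were volatile, the loop would be unsound, which is why
stating the assumptions in a comment is worthwhile.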

As for the discussion about libgcc vs backend implementation, I think
it's good to have both.  In particular, a backend implementation might
be able to take advantage of the address space and issue an atom.shared.cas.

But since the current patch fixes something, I don't want to hold it
until a backend implementation is done.

So, config/nvptx part okay for master, provided:
- nvptx test-case added
- nvptx standalone target testing done
- assumption comment added to implementation.
This bit can be committed as a separate patch, without the oacc test-case.

As for the oacc test-case, you could add the __int128 bit, perhaps along
the lines of how things are done in
libgomp/testsuite/libgomp.c++/target-8.C ?

[ FWIW, it would be nice if the choice between critical section and
atomic builtin was postponed till after the host/offload compiler split.
 Then the nvptx compiler would just fall back on the critical section,
and the test-case would have worked out of the box, and there would be
no need for the target to pretend to support unsupported atomic operators.
Conversely, we could upgrade the omp variables from char/short to int
somehow to make sure the assumptions for using non-native subword
__sync_val_compare_and_swap are met. ]

Thanks,
- Tom

> Kwok
> 
> 
> nvptx_subword.patch
> 
> commit 7c3a9c23ba9f5b8fe953aa5492ae75617f2444a3
> Author: Kwok Cheung Yeung 
> Date:   Mon Jun 15 12:34:55 2020 -0700
> 
> nvptx: Add support for subword compare-and-swap
> 
> 2020-06-15  Kwok Cheung Yeung  
> 
>   libgcc/
>   * config/nvptx/atomic.c: New.
>   * config/nvptx/t-nvptx (LIB2ADD): Add atomic.c.
> 
>   libgomp/
>   * testsuite/libgomp.c-c++-common/reduction-16.c: New.
> 
> diff --git a/libgcc/config/nvptx/atomic.c b/libgcc/config/nvptx/atomic.c
> new file mode 100644
> index 000..4becbd2
> --- /dev/null
> +++ b/libgcc/config/nvptx/atomic.c
> @@ -0,0 +1,59 @@
> +/* NVPTX atomic operations
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +   Contributed by Mentor Graphics.
> +
> +   This file is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by the
> +   Free Software Foundation; either version 3, or (at your option) any
> +   later version.
> +
> +   This file is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   .  */
> +
> +#include 
> +
> +#define __SYNC_SUBWORD_COMPARE_AND_SWAP(TYPE, SIZE)   \
> +

Re: [PATCH][RFC] Do not stream all zeros for gcda files.

2020-07-01 Thread Jan Hubicka
> On 7/1/20 3:15 AM, Martin Liška wrote:
> > On 6/30/20 3:24 PM, Nathan Sidwell wrote:
> > > On 6/30/20 8:53 AM, Martin Liška wrote:
> 
> > Yes, if there's a real need for a solid compression, then I would use a
> > generic compressor of a final streamed file.
> 
> for avoidance of doubt, I'm good with your current solution

I think special-casing zeros is good even in combination with a generic
compressor, because it is a very common special case to deal with (for
programs that exit frequently and have a lot of basic blocks, like GCC,
a generic compressor may easily turn out to be a bottleneck if fed an
overwhelming quantity of redundant data).  I am also happy with the
current solution.
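Concretely, the zero special case amounts to run-length-encoding zero
counters before (or instead of) handing the stream to a generic compressor.
A hedged sketch of such an encoder (illustrative only, not the real gcda
format):

```c
#include <stdint.h>
#include <stddef.h>

/* Encode an array of 64-bit counters, replacing each run of zeros with
   a (0, run-length) pair.  Nonzero counters are written as (1, value)
   pairs.  Returns the number of words written to OUT, which must have
   room for 2 * N words in the worst case.  */
size_t
encode_counters (const uint64_t *in, size_t n, uint64_t *out)
{
  size_t o = 0;
  for (size_t i = 0; i < n; )
    {
      if (in[i] == 0)
        {
          size_t run = 0;
          while (i + run < n && in[i + run] == 0)
            run++;
          out[o++] = 0;            /* Zero-run marker.  */
          out[o++] = run;          /* Run length.  */
          i += run;
        }
      else
        {
          out[o++] = 1;            /* Literal marker.  */
          out[o++] = in[i++];      /* Counter value.  */
        }
    }
  return o;
}
```

For a mostly-zero counter array this already shrinks the stream dramatically,
so a generic compressor downstream sees far less redundant input.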

Thanks for working on it!
Honza
> 
> nathan
> 
> 
> -- 
> Nathan Sidwell


Re: [PATCH][RFC] Do not stream all zeros for gcda files.

2020-07-01 Thread Nathan Sidwell

On 7/1/20 3:15 AM, Martin Liška wrote:

On 6/30/20 3:24 PM, Nathan Sidwell wrote:

On 6/30/20 8:53 AM, Martin Liška wrote:


Yes, if there's a real need for a solid compression, then I would use a 
generic compressor of a final streamed file.


for avoidance of doubt, I'm good with your current solution

nathan


--
Nathan Sidwell


[wwwdocs] Update C++ core issues table

2020-07-01 Thread Marek Polacek via Gcc-patches
Pushed.

commit 59ae91dc7fff5d0745ab3410ce9e71d127c4b148
Author: Marek Polacek 
Date:   Wed Jul 1 09:17:17 2020 -0400

C++ DRs: Add new CWGs, update existing ones.

diff --git a/htdocs/projects/cxx-dr-status.html 
b/htdocs/projects/cxx-dr-status.html
index 85924fd1..8a107dfd 100644
--- a/htdocs/projects/cxx-dr-status.html
+++ b/htdocs/projects/cxx-dr-status.html
@@ -4092,7 +4092,7 @@
 
 
   https://wg21.link/cwg581;>581
-  DR
+  DRWP
   Can a templated constructor be explicitly instantiated or 
specialized?
   ?
   
@@ -4799,7 +4799,7 @@
 
 
   https://wg21.link/cwg682;>682
-  tentatively ready
+  DRWP
   Missing description of lookup of template aliases
   ?
   
@@ -11375,11 +11375,11 @@
   -
   
 
-
+
   https://wg21.link/cwg1621;>1621
-  drafting
+  DRWP
   Member initializers in anonymous unions
-  -
+  ?
   
 
 
@@ -13589,14 +13589,14 @@
 
 
   https://wg21.link/cwg1937;>1937
-  DR
+  DRWP
   Incomplete specification of function pointer from lambda
   ?
   
 
 
   https://wg21.link/cwg1938;>1938
-  DR
+  DRWP
   Should hosted/freestanding be implementation-defined?
   ?
   
@@ -14170,7 +14170,7 @@
 
 
   https://wg21.link/cwg2020;>2020
-  DR
+  DRWP
   Inadequate description of odr-use of implicitly-invoked 
functions
   ?
   
@@ -14387,7 +14387,7 @@
 
 
   https://wg21.link/cwg2051;>2051
-  DR
+  DRWP
   Simplifying alias rules
   ?
   
@@ -14399,11 +14399,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg2053;>2053
-  drafting
+  DR
   auto in non-generic lambdas
-  -
+  ?
   
 
 
@@ -14611,7 +14611,7 @@
 
 
   https://wg21.link/cwg2083;>2083
-  DR
+  DRWP
   Incorrect cases of odr-use
   ?
   
@@ -14751,7 +14751,7 @@
 
 
   https://wg21.link/cwg2103;>2103
-  DR
+  DRWP
   Lvalue-to-rvalue conversion is irrelevant in odr-use of a 
reference
   ?
   dup of 2083
@@ -14910,11 +14910,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg2126;>2126
-  drafting
+  DRWP
   Lifetime-extended temporaries in constant expressions
-  -
+  ?
   
 
 
@@ -15220,7 +15220,7 @@
 
 
   https://wg21.link/cwg2170;>2170
-  DR
+  DRWP
   Unclear definition of odr-use for arrays
   ?
   dup of 2083
@@ -15479,7 +15479,7 @@
 
 
   https://wg21.link/cwg2207;>2207
-  tentatively ready
+  DRWP
   Alignment of allocation function return value
   ?
   
@@ -15822,14 +15822,14 @@
 
 
   https://wg21.link/cwg2256;>2256
-  DR
+  DRWP
   Lifetime of trivially-destructible objects
   ?
   
 
 
   https://wg21.link/cwg2257;>2257
-  DR
+  DRWP
   Lifetime extension of references vs exceptions
   ?
   
@@ -15892,14 +15892,14 @@
 
 
   https://wg21.link/cwg2266;>2266
-  DR
+  DRWP
   Has dependent type vs is type-dependent
   Yes
   
 
 
   https://wg21.link/cwg2267;>2267
-  DR
+  DRWP
   Copy-initialization of temporary in reference 
direct-initialization
   Yes
   
@@ -15976,7 +15976,7 @@
 
 
   https://wg21.link/cwg2278;>2278
-  DR
+  DRWP
   Copy elision in constant expressions reconsidered
   ?
   
@@ -15990,7 +15990,7 @@
 
 
   https://wg21.link/cwg2280;>2280
-  review
+  DRWP
   Matching a usual deallocation function with placement new
   ?
   
@@ -16002,11 +16002,11 @@
   -
   
 
-
+
   https://wg21.link/cwg2282;>2282
-  drafting
+  DRWP
   Consistency with mismatched aligned/non-over-aligned 
allocation/deallocation functions
-  -
+  ?
   
 
 
@@ -16053,7 +16053,7 @@
 
 
   https://wg21.link/cwg2289;>2289
-  DR
+  DRWP
   Uniqueness of decomposition declaration names
   11
   https://gcc.gnu.org/PR94553;>PR94553
@@ -16130,7 +16130,7 @@
 
 
   https://wg21.link/cwg2300;>2300
-  tentatively ready
+  DRWP
   Lambdas in multiple definitions
   ?
   
@@ -16151,7 +16151,7 @@
 
 
   https://wg21.link/cwg2303;>2303
-  DR
+  DRWP
   Partial ordering and recursive variadic inheritance
   ?
   
@@ -16193,14 +16193,14 @@
 
 
   https://wg21.link/cwg2309;>2309
-  DR
+  DRWP
   Restrictions on nested statements within constexpr 
functions
   ?
   
 
 
   https://wg21.link/cwg2310;>2310
-  DR
+  DRWP
   Type completeness and derived-to-base pointer conversions
   No
   https://gcc.gnu.org/PR94302;>PR94302
@@ -16249,14 +16249,14 @@
 
 
   

[PATCH 1/7 v8] ifn/optabs: Support vector load/store with length

2020-07-01 Thread Kewen.Lin via Gcc-patches
Hi Richard,

on 2020/6/30 11:32 PM, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> Hi Richard,
>>
>> Thanks for the comments!
>>
>> on 2020/6/29 6:07 PM, Richard Sandiford wrote:
>>> Thanks for the update.  I agree with the summary of the IRC discussion
>>> except for…
>>>
>>> "Kewen.Lin"  writes:
 Hi Richard S./Richi/Jim/Segher,

 Thanks a lot for your comments to make this patch more solid.

 Based on our discussion, for the vector load/store with length
 optab, the length unit would be measured in lanes by default.
 For the targets which support length measured in bytes like Power,
 they should only define VnQI modes to wrap the other same size
 vector modes.  If the length is larger than total lane/byte count
 of the given mode, it's taken to load all lanes/bytes implicitly.
>>>
>>> …this last bit.  IMO the behaviour of the optab should be undefined
>>> when the supplied length is greater than the number of lanes.
>>>
>>> I think that also makes things better for the lxvl implementation,
>>> which ignores the upper 56 bits of the length.  It sounds like the
>>> above semantics would instead require Power to saturate the value
>>> at 255 before shifting it.
>>>
>>
>> Good catch, I just realized that this part is inconsistent to what I
>> implemented in patch 5/7, where the function vect_gen_len still does
>> the min operation between the given length and length_limit.
>>
>> This patch is updated accordingly to state the behavior to be undefined.
>> The others aren't required to change.
>>
>> Could you have a further look? Thanks in advance!
>>
>> v6/v7: Updated optab descriptions.
>>
>> v5:
>>   - Updated lenload/lenstore optab to len_load/len_store and the docs.
>>   - Rename expand_mask_{load,store}_optab_fn to 
>> expand_partial_{load,store}_optab_fn
>>   - Added/updated macros for expand_mask_{load,store}_optab_fn
>> and expand_len_{load,store}_optab_fn
>>
>> v4: Update len_load_direct/len_store_direct to align with direct optab.
>>
>> v3: Get rid of length mode hook.
> 
> Thanks, mostly looks good, just some comments about the documentation…
> 

Thanks here again!!!

V8 attached with updates according to your comments!  

Could you have a check again?  Thanks!

-

v6/v7/v8: Updated optab descriptions.

v5:
  - Updated lenload/lenstore optab to len_load/len_store and the docs.
  - Rename expand_mask_{load,store}_optab_fn to 
expand_partial_{load,store}_optab_fn
  - Added/updated macros for expand_mask_{load,store}_optab_fn
and expand_len_{load,store}_optab_fn

v4: Update len_load_direct/len_store_direct to align with direct optab.

v3: Get rid of length mode hook.

BR,
Kewen
-
gcc/ChangeLog:

2020-MM-DD  Kewen Lin  

* doc/md.texi (len_load_@var{m}): Document.
(len_store_@var{m}): Likewise.
* internal-fn.c (len_load_direct): New macro.
(len_store_direct): Likewise.
(expand_len_load_optab_fn): Likewise.
(expand_len_store_optab_fn): Likewise.
(direct_len_load_optab_supported_p): Likewise.
(direct_len_store_optab_supported_p): Likewise.
(expand_mask_load_optab_fn): New macro.  Original renamed to ...
(expand_partial_load_optab_fn): ... here.  Add handlings for
len_load_optab.
(expand_mask_store_optab_fn): New macro.  Original renamed to ...
(expand_partial_store_optab_fn): ... here. Add handlings for
len_store_optab.
(internal_load_fn_p): Handle IFN_LEN_LOAD.
(internal_store_fn_p): Handle IFN_LEN_STORE.
(internal_fn_stored_value_index): Handle IFN_LEN_STORE.
* internal-fn.def (LEN_LOAD): New internal function.
(LEN_STORE): Likewise.
* optabs.def (len_load_optab, len_store_optab): New optab.
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2c67c818da5..2b462869437 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5167,6 +5167,32 @@ mode @var{n}.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{len_load_@var{m}} instruction pattern
+@item @samp{len_load_@var{m}}
+Load the number of vector elements specified by operand 2 from memory
+operand 1 into vector register operand 0, setting the other elements of
+operand 0 to undefined values.  Operands 0 and 1 have mode @var{m},
+which must be a vector mode.  Operand 2 has whichever integer mode the
+target prefers.  If operand 2 exceeds the number of elements in mode
+@var{m}, the behavior is undefined.  If the target prefers the length
+to be measured in bytes rather than elements, it should only implement
+this pattern for vectors of @code{QI} elements.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{len_store_@var{m}} instruction pattern
+@item @samp{len_store_@var{m}}
+Store the number of vector elements specified by operand 2 from vector
+register operand 1 into memory operand 0, leaving the other elements of
+operand 0 unchanged.  Operands 0 and 1 have mode @var{m}, which must be
+a vector mode.  

Re: [PATCH 5/7 v6] vect: Support vector load/store with length in vectorizer

2020-07-01 Thread Kewen.Lin via Gcc-patches
Hi Richard,

Many thanks for your great review comments!

on 2020/7/1 3:53 AM, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 06a04e3d7dd..284c15705ea 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -13389,6 +13389,13 @@ by the copy loop headers pass.
>>  @item vect-epilogues-nomask
>>  Enable loop epilogue vectorization using smaller vector size.
>>  
>> +@item vect-with-length-scope
> 
> In principle there's nothing length-specific about this option.
> We could do the same for masks or for any future loop control
> mechanism.  So how about vect-partial-vector-usage instead?
> 

Sounds good, will update it. 

[snip] 

I will also update according to the comments in the snipped parts (where
present); I snipped some of them for readability.

>> +  machine_mode vmode;
>> +  /* Check whether the related VnQI vector mode exists, as well as
>> + optab supported.  */
>> +  if (related_vector_mode (mode, emode, nunits).exists (&vmode)
>> +  && direct_optab_handler (op, vmode))
>> +{
>> +  unsigned int mul;
>> +  scalar_mode orig_emode = GET_MODE_INNER (mode);
>> +  poly_uint64 orig_esize = GET_MODE_SIZE (orig_emode);
>> +
>> +  if (constant_multiple_p (orig_esize, esize, &mul))
>> +*factor = mul;
>> +  else
>> +gcc_unreachable ();
> 
> This is just:
> 
> *factor = GET_MODE_UNIT_SIZE (mode);
> 
> However, I think it would be better to return the vector mode that the
> load or store should use, instead of this factor.  That way we can reuse
> it when generating the load and store statements.
> 
> So maybe call the function get_len_load_store_mode and return an
> opt_machine_mode.
> 

Will improve it.

>> diff --git a/gcc/params.opt b/gcc/params.opt
>> index 9b564bb046c..daa6e8a2beb 100644
>> --- a/gcc/params.opt
>> +++ b/gcc/params.opt
>> @@ -968,4 +968,8 @@ Bound on number of runtime checks inserted by the 
>> vectorizer's loop versioning f
>>  Common Joined UInteger Var(param_vect_max_version_for_alignment_checks) 
>> Init(6) Param Optimization
>>  Bound on number of runtime checks inserted by the vectorizer's loop 
>> versioning for alignment check.
>>  
>> +-param=vect-with-length-scope=
>> +Common Joined UInteger Var(param_vect_with_length_scope) Init(0) 
>> IntegerRange(0, 2) Param Optimization
>> +Control the vector with length exploitation scope.
> 
> Think this should be a bit more descriptive, at least saying what the
> three values are (but in a more abbreviated form than the .texi above).
> 
> I think the default should be 2, with targets actively turning it down
> where necessary.  That way, the decision to turn it down is more likely
> to have a comment explaining why.
> 

Will update both.

>> +
>>tree ctrl_type = rgc->type;
>> -  unsigned int nscalars_per_iter = rgc->max_nscalars_per_iter;
>> +  /* Scale up nscalars per iteration with factor.  */
>> +  unsigned int nscalars_per_iter_ft = rgc->max_nscalars_per_iter * 
>> rgc->factor;
> 
> Maybe “scaled_nscalars_per_iter”?  Not sure the comment really adds
> anything here.
> 
> Or maybe “nitems_per_iter”, to keep the names shorter?
> 

Will use the short one.

>>poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>> +  tree length_limit = NULL_TREE;
>> +  /* For length, we need length_limit to check length in range.  */
>> +  if (!vect_for_masking)
>> +{
>> +  poly_uint64 len_limit = nscalars_per_ctrl * rgc->factor;
>> +  length_limit = build_int_cst (compare_type, len_limit);
>> +}
>>  
>>/* Calculate the maximum number of scalar values that the rgroup
>>   handles in total, the number that it handles for each iteration
>> @@ -434,12 +445,12 @@ vect_set_loop_controls_directly (class loop *loop, 
>> loop_vec_info loop_vinfo,
>>tree nscalars_total = niters;
>>tree nscalars_step = build_int_cst (iv_type, vf);
>>tree nscalars_skip = niters_skip;
>> -  if (nscalars_per_iter != 1)
>> +  if (nscalars_per_iter_ft != 1)
>>  {
>>/* We checked before setting LOOP_VINFO_USING_PARTIAL_VECTORS_P that
>>   these multiplications don't overflow.  */
>> -  tree compare_factor = build_int_cst (compare_type, nscalars_per_iter);
>> -  tree iv_factor = build_int_cst (iv_type, nscalars_per_iter);
>> +  tree compare_factor = build_int_cst (compare_type, 
>> nscalars_per_iter_ft);
>> +  tree iv_factor = build_int_cst (iv_type, nscalars_per_iter_ft);
>>nscalars_total = gimple_build (preheader_seq, MULT_EXPR, compare_type,
>>   nscalars_total, compare_factor);
>>nscalars_step = gimple_build (preheader_seq, MULT_EXPR, iv_type,
>> @@ -509,7 +520,7 @@ vect_set_loop_controls_directly (class loop *loop, 
>> loop_vec_info loop_vinfo,
>>   NSCALARS_SKIP to that cannot overflow.  */
>>tree const_limit = build_int_cst (compare_type,
>>  LOOP_VINFO_VECT_FACTOR (loop_vinfo)

[PATCH] nvptx: : Add support for popcount and widening multiply instructions

2020-07-01 Thread Roger Sayle

The following patch adds support for the popc and mul.wide instructions to the 
nvptx backend.
I've a follow-up patch for supporting mul.hi instructions, but those changes 
require some minor
tweaks to GCC's middle-end, so I'll submit those pieces separately.

Tested by "make" and "make -k check" on --build=nvptx-none hosted on 
x86_64-pc-linux-gnu with
no new regressions.

2020-07-01  Roger Sayle  

gcc/ChangeLog:
* config/nvptx/nvptx.md (popcount2): New instructions.
(mulhishi3, mulsidi3, umulhisi3, umulsidi3): New instructions.

gcc/testsuite/ChangeLog:
* gcc.target/nvptx/popc-1.c: New test.
* gcc.target/nvptx/popc-2.c: New test.
* gcc.target/nvptx/popc-3.c: New test.
* gcc.target/nvptx/mul-wide.c: New test.
* gcc.target/nvptx/umul-wide.c: New test.


Ok for mainline?

Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 089cdf0..5ceeac7 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -493,6 +493,50 @@
   DONE;
 })
 
+(define_insn "popcount2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+   (popcount:SI (match_operand:SDIM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tpopc.b%T1\\t%0, %1;")
+
+;; Multiplication variants
+
+(define_insn "mulhisi3"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+   (mult:SI (sign_extend:SI
+ (match_operand:HI 1 "nvptx_register_operand" "R"))
+(sign_extend:SI
+ (match_operand:HI 2 "nvptx_register_operand" "R"))))]
+  ""
+  "%.\\tmul.wide.s16\\t%0, %1, %2;")
+
+(define_insn "mulsidi3"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
+   (mult:DI (sign_extend:DI
+ (match_operand:SI 1 "nvptx_register_operand" "R"))
+(sign_extend:DI
+ (match_operand:SI 2 "nvptx_register_operand" "R"))))]
+  ""
+  "%.\\tmul.wide.s32\\t%0, %1, %2;")
+
+(define_insn "umulhisi3"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+   (mult:SI (zero_extend:SI
+ (match_operand:HI 1 "nvptx_register_operand" "R"))
+(zero_extend:SI
+ (match_operand:HI 2 "nvptx_register_operand" "R"))))]
+  ""
+  "%.\\tmul.wide.u16\\t%0, %1, %2;")
+
+(define_insn "umulsidi3"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
+   (mult:DI (zero_extend:DI
+ (match_operand:SI 1 "nvptx_register_operand" "R"))
+(zero_extend:DI
+ (match_operand:SI 2 "nvptx_register_operand" "R"))))]
+  ""
+  "%.\\tmul.wide.u32\\t%0, %1, %2;")
+
 ;; Shifts
 
 (define_insn "ashl3"
/* { dg-do compile } */
/* { dg-options "-O2" } */

unsigned int foo(unsigned int x)
{
  return __builtin_popcount(x);
}

/* { dg-final { scan-assembler-times "popc.b32" 1 } } */
/* { dg-do compile } */
/* { dg-options "-O2" } */

unsigned long foo(unsigned long x)
{
  return __builtin_popcountl(x);
}

/* { dg-final { scan-assembler-times "popc.b64" 1 } } */
/* { dg-final { scan-assembler-times "cvt.s64.s32" 1 } } */

/* { dg-do compile } */
/* { dg-options "-O2" } */

unsigned int foo(unsigned long x)
{
  return __builtin_popcountl(x);
}

/* { dg-final { scan-assembler-times "popc.b64" 1 } } */
/* { dg-final { scan-assembler-times "cvt.s64.s32" 0 } } */

/* { dg-do compile } */
/* { dg-options "-O2" } */

int mulhisi3(short x, short y)
{
  return (int)x * (int)y;
}

long mulsidi3(int x, int y)
{
  return (long)x * (long)y;
}

/* { dg-final { scan-assembler-times "mul.wide.s16" 1 } } */
/* { dg-final { scan-assembler-times "mul.wide.s32" 1 } } */

/* { dg-do compile } */
/* { dg-options "-O2" } */

unsigned int umulhisi3(unsigned short x, unsigned short y)
{
  return (unsigned int)x * (unsigned int)y;
}

unsigned long umulsidi3(unsigned int x, unsigned int y)
{
  return (unsigned long)x * (unsigned long)y;
}

/* { dg-final { scan-assembler-times "mul.wide.u16" 1 } } */
/* { dg-final { scan-assembler-times "mul.wide.u32" 1 } } */



Re: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182

2020-07-01 Thread Richard Sandiford
"Yangfei (Felix)"  writes:
>> On June 30, 2020 4:23:03 PM GMT+02:00, Richard Sandiford
>>  wrote:
>> >Richard Biener  writes:
>> >> So it seems odd to somehow put in the number of vectors...  so to me
>> >> it would have made sense if it did
>> >>
>> >>   possible_npeel_number = lower_bound (nscalars);
>> >>
>> >> or whateveris necessary to make the polys happy.  Thus simply elide
>> >> the vect_get_num_vectors call?  But it's been very long since I've
>> >> dived into the alignment peeling code...
>> >
>> >Ah, I see what you mean.  So rather than:
>> >
>> >  /* Save info about DR in the hash table.  Also include peeling
>> > amounts according to the explanation above.  */
>> >  for (j = 0; j < possible_npeel_number; j++)
>> >{
>> >  vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
>> >dr_info, npeel_tmp);
>> >  npeel_tmp += target_align / dr_size;
>> >}
>> >
>> >just have something like:
>> >
>> >  while (known_le (npeel_tmp, nscalars))
>> >{
>> >  …
>> >}
>> >
>> >?
>> 
>> Yeah.
>
> Not sure if I understand correctly.  I am supposing the following check in 
> the original code is not necessary if we go like that.
>
> 1822   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
>
> Is that correct?

I think we still need it.  I guess there are two choices:

- make nscalars default to npeel_tmp before the “if” above.
- put the loop inside the “if” and add a single call to
  vect_peeling_hash_insert as a new “else”.

Thanks,
Richard


Re: [PATCH] arm: Warn if IRQ handler is not compiled with -mgeneral-regs-only [PR target/94743]

2020-07-01 Thread Christophe Lyon via Gcc-patches
On Tue, 30 Jun 2020 at 15:34, Kyrylo Tkachov  wrote:
>
>
>
> > -Original Message-
> > From: Christophe Lyon 
> > Sent: 30 June 2020 14:32
> > To: Kyrylo Tkachov 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] arm: Warn if IRQ handler is not compiled with -
> > mgeneral-regs-only [PR target/94743]
> >
> > On Tue, 30 Jun 2020 at 15:16, Kyrylo Tkachov 
> > wrote:
> > >
> > > Hi Christophe,
> > >
> > > Sorry for the delay.
> > >
> > > > -Original Message-
> > > > From: Gcc-patches  On Behalf Of
> > > > Christophe Lyon via Gcc-patches
> > > > Sent: 29 April 2020 16:19
> > > > To: gcc-patches@gcc.gnu.org
> > > > Subject: [PATCH] arm: Warn if IRQ handler is not compiled with -
> > mgeneral-
> > > > regs-only [PR target/94743]
> > > >
> > > > The interrupt attribute does not guarantee that the FP registers are
> > > > saved, which can result in problems difficult to debug.
> > > >
> > > > Saving the FP registers and status registers can be a large penalty,
> > > > so it's probably not desirable to do that all the time.
> > > >
> > > > If the handler calls other functions, we'd likely need to save all of
> > > > them, for lack of knowledge of which registers they actually use.
> > > >
> > > > This is even more obscure for the end-user when the compiler inserts
> > > > calls to helper functions such as memcpy (some multilibs do use FP
> > > > registers to speed it up).
> > > >
> > > > In the PR, we discussed adding routines in libgcc to save the FP
> > > > context and saving only locally-clobbered FP registers, but I think
> > > > this is too intrusive for stage 4.
> > > >
> > > > In the mean time, emit a warning to suggest re-compiling with
> > > > -mgeneral-regs-only. Note that this can lead to errors if the code
> > > > uses floating-point and -mfloat-abi=hard, eg:
> > > > argument of type 'double' not permitted with -mgeneral-regs-only
> > > >
> > > > This can be troublesome for the user, but at least this would make
> > > > them aware of the latent issue.
> > > >
> > > > The patch adds two testcases:
> > > > - pr94734-1.c checks that a warning is emitted. One function can make
> > > >   implicit calls to runtime floating-point routines, the other one
> > > >   doesn't. We can improve the diagnostic later not to warn in the
> > > >   second case.
> > > >
> > > > - pr94734-2.c checks that no warning is emitted when using
> > > >   -mgeneral-regs-only.
> > > >
> > > > 2020-04-29  Christophe Lyon  
> > > >
> > > >   PR target/94743
> > > >   gcc/
> > > >   * config/arm/arm.c (arm_handle_isr_attribute): Warn if
> > > >   -mgeneral-regs-only is not used.
> > > >
> > > >   gcc/testsuite/
> > > >   * gcc.target/arm/pr94743-1.c: New test.
> > > >   * gcc.target/arm/pr94743-2.c: New test.
> > > > ---
> > > >  gcc/config/arm/arm.c |  5 +
> > > >  gcc/testsuite/gcc.target/arm/pr94743-1.c | 20 
> > > >  gcc/testsuite/gcc.target/arm/pr94743-2.c | 17 +
> > > >  3 files changed, 42 insertions(+)
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/pr94743-1.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/pr94743-2.c
> > > >
> > > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > > > index 6a6e804..34aad1d 100644
> > > > --- a/gcc/config/arm/arm.c
> > > > +++ b/gcc/config/arm/arm.c
> > > > @@ -7176,6 +7176,11 @@ arm_handle_isr_attribute (tree *node, tree
> > name,
> > > > tree args, int flags,
> > > >  name);
> > > > *no_add_attrs = true;
> > > >   }
> > > > +  else if (TARGET_VFP_BASE)
> > > > + {
> > > > +   warning (OPT_Wattributes, "FP registers might be clobbered
> > > > despite %qE attribute: compile with -mgeneral-regs-only",
> > > > +name);
> > > > + }
> > >
> > > Let's use %< and %> to quote -mgeneral-regs-only properly.
> > > Ok with that change.
> >
> > Hi Kyrill,
> >
> > Thanks for the review, however I have posted v2 here:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545747.html
> > (you approved v1)
> >
> > It includes a few more testcases and updates to existing ones.
> >
> > Is v2 OK with the quotation marks fixed?
> >
>
> Oops, sorry. Yes that looks ok too (with the quotation fixed).
> Kyrill
>

Thanks, pushed as r11-1732.
There were two follow-ups: r11-1752 (because I forgot to update the
expected warning message in the testcases after I changed the
quotation) and r11-1759 (because I missed that
gcc.target/arm/handler-align.c has to be compiled with
-mgeneral-regs-only like the other testcases).

Sorry for the noise,

Christophe

> > Thanks
> >
> > Christophe
> >
> > > Thanks,
> > > Kyrill
> > >
> > > >/* FIXME: the argument if any is checked for type attributes;
> > > >should it be checked for decl ones?  */
> > > >  }
> > > > diff --git a/gcc/testsuite/gcc.target/arm/pr94743-1.c
> > > > b/gcc/testsuite/gcc.target/arm/pr94743-1.c
> > > > new file mode 100644
> > > > index 

RE: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182

2020-07-01 Thread Yangfei (Felix)
Hi,

> -Original Message-
> From: Richard Biener [mailto:rguent...@suse.de]
> Sent: Tuesday, June 30, 2020 10:50 PM
> To: Richard Sandiford 
> Cc: Richard Biener ; Yangfei (Felix)
> ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182
> 
> On June 30, 2020 4:23:03 PM GMT+02:00, Richard Sandiford
>  wrote:
> >Richard Biener  writes:
> >> On Tue, 30 Jun 2020, Richard Sandiford wrote:
> >>
> >>> Richard Biener  writes:
> >>> > On Tue, Jun 30, 2020 at 2:18 PM Richard Sandiford
> >>> >  wrote:
> >>> >>
> >>> >> "Yangfei (Felix)"  writes:
> >>> >> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> >b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> >>> >> > index e050db1a2e4..ea39fcac0e0 100644
> >>> >> > --- a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> >>> >> > +++ b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> >>> >> > @@ -1,6 +1,7 @@
> >>> >> >  /* { dg-do compile } */
> >>> >> >  /* { dg-additional-options "-O3" } */
> >>> >> >  /* { dg-additional-options "-mavx2" { target { i?86-*-*
> >x86_64-*-* } } } */
> >>> >> > +/* { dg-additional-options "-march=armv8.2-a+sve
> >-fno-vect-cost-model" { target aarch64*-*-* } } */
> >>> >> >
> >>> >> >  typedef struct {
> >>> >> >  unsigned short mprr_2[5][16][16];
> >>> >>
> >>> >> This test is useful for Advanced SIMD too, so I think we should
> >continue
> >>> >> to test it with whatever options the person running the testsuite
> >chose.
> >>> >> Instead we could duplicate the test in gcc.target/aarch64/sve
> >with
> >>> >> appropriate options.
> >>> >>
> >>> >> > diff --git a/gcc/tree-vect-data-refs.c
> >b/gcc/tree-vect-data-refs.c
> >>> >> > index eb8288e7a85..b30a7d8a3bb 100644
> >>> >> > --- a/gcc/tree-vect-data-refs.c
> >>> >> > +++ b/gcc/tree-vect-data-refs.c
> >>> >> > @@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment
> >(loop_vec_info loop_vinfo)
> >>> >> >   {
> >>> >> > poly_uint64 nscalars = (STMT_SLP_TYPE
> >(stmt_info)
> >>> >> > ? vf * DR_GROUP_SIZE
> >(stmt_info) : vf);
> >>> >> > -   possible_npeel_number
> >>> >> > - = vect_get_num_vectors (nscalars, vectype);
> >>> >> > +   if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS
> >(vectype)))
> >>> >> > + possible_npeel_number = 0;
> >>> >> > +   else
> >>> >> > + possible_npeel_number
> >>> >> > +   = vect_get_num_vectors (nscalars, vectype);
> >>> >> >
> >>> >> > /* NPEEL_TMP is 0 when there is no
> >misalignment, but also
> >>> >> >allow peeling NELEMENTS.  */
> >>> >>
> >>> >> OK, so this is coming from:
> >>> >>
> >>> >>   int s[16][2];
> >>> >>   …
> >>> >>   … =s[j][1];
> >>> >>
> >>> >> and an SLP node containing 16 instances of “s[j][1]”.  The
> >DR_GROUP_SIZE
> >>> >> is 2 because that's the inner dimension of “s”.
> >>> >>
> >>> >> I don't think maybe_lt is right here though.  The same problem
> >could in
> >>> >> principle happen for cases in which NSCALARS >
> >TYPE_VECTOR_SUBPARTS,
> >>> >> e.g. for different inner dimensions of “s”.
> >>> >>
> >>> >> I think the testcase shows that using DR_GROUP_SIZE in this
> >calculation
> >>> >> is flawed.  I'm not sure whether we can really do better given
> >the current
> >>> >> representation though.  This is one case where having a separate
> >dr_vec_info
> >>> >> per SLP node would help.
> >>> >
> >>> > I guess what the code likes to know is what we now have in
> >SLP_TREE_LANES
> >>> > (or formerly group_size) but that's not necessarily connected to
> >DR_GROUP_SIZE.
> >>> > Given we only see a stmt_info here and there's no 1:1 mapping of
> >SLP node
> >>> > to stmt_info (and the reverse mapping doesn't even exist) I do not
> >have
> >>> > any good idea either.
> >>> >
> >>> > Honestly I don't really see what this code tries to do ... doesn't
> >it
> >>> > simply want to set possible_npeel_number to
> TYPE_VECTOR_SUBPARTS
> >(vectype)?!
> >>>
> >>> I think it's trying to set possible_npeel_number to the number of
> >>> vector stmts per iteration (i.e. ncopies for non-SLP stuff):
> >>>
> >>>   /* For multiple types, it is possible that the bigger
> >type access
> >>>  will have more than one peeling option.  E.g., a
> >loop with two
> >>>  types: one of size (vector size / 4), and the other
> >one of
> >>>  size (vector size / 8).  Vectorization factor will
> >8.  If both
> >>>  accesses are misaligned by 3, the first one needs
> >one scalar
> >>>  iteration to be aligned, and the second one needs
> >5.  But the
> >>>  first one will be aligned also by peeling 5 scalar
> >>>  iterations, and in that case both accesses will be
> >aligned.
> >>>  Hence, except for the immediate peeling amount, we
> >also want
> >>>  to try to add full 

[PATCH v4 2/2] S/390: Define CODE_LABEL_BOUNDARY

2020-07-01 Thread Ilya Leoshkevich via Gcc-patches
Currently s390 emits the following sequence to store a frame_pc:

a:
.LASANPC0:

lg  %r1,.L5-.L4(%r13)
la  %r1,0(%r1,%r12)
stg %r1,176(%r11)

.L5:
.quad   .LASANPC0@GOTOFF

The reason GOT indirection is used instead of larl is that gcc does not
know that .LASANPC0, being a code label, is aligned on a 2-byte
boundary, and larl can load only even addresses.

Define CODE_LABEL_BOUNDARY in order to get rid of GOT indirection:

larl%r1,.LASANPC0
stg %r1,176(%r11)

gcc/ChangeLog:

2020-06-30  Ilya Leoshkevich  

* config/s390/s390.h (CODE_LABEL_BOUNDARY): Specify that s390
requires code labels to be aligned on a 2-byte boundary.

gcc/testsuite/ChangeLog:

2019-06-30  Ilya Leoshkevich  

* gcc.target/s390/asan-no-gotoff.c: New test.
---
 gcc/config/s390/s390.h |  3 +++
 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++
 2 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c

diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index e4ef63e4080..08ce500ab00 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -343,6 +343,9 @@ extern const char *s390_host_detect_local_cpu (int argc, 
const char **argv);
 /* Allocation boundary (in *bits*) for the code of a function.  */
 #define FUNCTION_BOUNDARY 64
 
+/* Alignment required for a code label, in bits.  */
+#define CODE_LABEL_BOUNDARY 16
+
 /* There is no point aligning anything to a rounder boundary than this.  */
 #define BIGGEST_ALIGNMENT 64
 
diff --git a/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c 
b/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
new file mode 100644
index 000..f555e4e96f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
@@ -0,0 +1,15 @@
+/* Test that ASAN labels are referenced without unnecessary indirections.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fPIE -O2 -fsanitize=kernel-address --param asan-stack=1" } */
+
+extern void c (int *);
+
+void a ()
+{
+  int b;
  c (&b);
+}
+
+/* { dg-final { scan-assembler {\tlarl\t%r\d+,\.LASANPC\d+} } } */
+/* { dg-final { scan-assembler-not {\.LASANPC\d+@GOTOFF} } } */
-- 
2.25.4



[PATCH v4 1/2] asan: specify alignment for LASANPC labels

2020-07-01 Thread Ilya Leoshkevich via Gcc-patches
gcc/ChangeLog:

2020-06-30  Ilya Leoshkevich  

* asan.c (asan_emit_stack_protection): Use CODE_LABEL_BOUNDARY.
* defaults.h (CODE_LABEL_BOUNDARY): New macro.
* doc/tm.texi: Document CODE_LABEL_BOUNDARY.
* doc/tm.texi.in: Likewise.
---
 gcc/asan.c | 1 +
 gcc/defaults.h | 5 +
 gcc/doc/tm.texi| 4 
 gcc/doc/tm.texi.in | 4 
 4 files changed, 14 insertions(+)

diff --git a/gcc/asan.c b/gcc/asan.c
index 9c9aa4cae35..cc06afb9ddc 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1524,6 +1524,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned 
int alignb,
   DECL_INITIAL (decl) = decl;
   TREE_ASM_WRITTEN (decl) = 1;
   TREE_ASM_WRITTEN (id) = 1;
+  SET_DECL_ALIGN (decl, CODE_LABEL_BOUNDARY);
   emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
   shadow_base = expand_binop (Pmode, lshr_optab, base,
  gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
diff --git a/gcc/defaults.h b/gcc/defaults.h
index f1a38626624..e5a9139bbbe 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1473,4 +1473,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 typedef TARGET_UNIT target_unit;
 #endif
 
+/* Alignment required for a code label, in bits.  */
+#ifndef CODE_LABEL_BOUNDARY
+#define CODE_LABEL_BOUNDARY BITS_PER_UNIT
+#endif
+
 #endif  /* ! GCC_DEFAULTS_H */
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 6e7d9dc54a9..16e48ce59d8 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -1011,6 +1011,10 @@ to a value equal to or larger than @code{STACK_BOUNDARY}.
 Alignment required for a function entry point, in bits.
 @end defmac
 
+@defmac CODE_LABEL_BOUNDARY
+Alignment required for a code label, in bits.
+@end defmac
+
 @defmac BIGGEST_ALIGNMENT
 Biggest alignment that any data type can require on this machine, in
 bits.  Note that this is not the biggest alignment that is supported,
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 3be984bbd5c..12f8c05f5dd 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -965,6 +965,10 @@ to a value equal to or larger than @code{STACK_BOUNDARY}.
 Alignment required for a function entry point, in bits.
 @end defmac
 
+@defmac CODE_LABEL_BOUNDARY
+Alignment required for a code label, in bits.
+@end defmac
+
 @defmac BIGGEST_ALIGNMENT
 Biggest alignment that any data type can require on this machine, in
 bits.  Note that this is not the biggest alignment that is supported,
-- 
2.25.4



[PATCH v4 0/2] S/390: Improve storing asan frame_pc

2020-07-01 Thread Ilya Leoshkevich via Gcc-patches
v1: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525016.html
v2: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525069.html
v3: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548338.html

Andreas suggested that I should split the patch in two, so that common
code and S/390-specific parts can be reviewed independently.

Patch 1: Add the ability to specify CODE_LABEL alignment.
Patch 2: Specify CODE_LABEL alignment for S/390.

Ilya Leoshkevich (2):
  asan: specify alignment for LASANPC labels
  S/390: Define CODE_LABEL_BOUNDARY

 gcc/asan.c                                     |  1 +
 gcc/config/s390/s390.h                         |  3 +++
 gcc/defaults.h                                 |  5 +++++
 gcc/doc/tm.texi                                |  4 ++++
 gcc/doc/tm.texi.in                             |  4 ++++
 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++++++++++++++
 6 files changed, 32 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c

-- 
2.25.4



Re: [patch] Extend store merging to STRING_CST

2020-07-01 Thread Richard Biener via Gcc-patches
On Wed, Jul 1, 2020 at 12:32 PM Eric Botcazou  wrote:
>
> > + && TREE_CODE (TREE_OPERAND (src, 0)) == STRING_CST
> > + && tree_int_cst_equal
> > +(TYPE_SIZE_UNIT (TREE_TYPE (TREE_OPERAND (src, 0))), len))
> > +   {
> >
> > I guess we miss a BITS_PER_UNIT == 8 check here?
>
> OK, added.
>
> > + gimple_set_vuse (new_stmt, gimple_vuse (stmt));
> > + gimple_set_vdef (new_stmt, gimple_vdef (stmt));
> > + if (gimple_vdef (new_stmt)
> > + && TREE_CODE (gimple_vdef (new_stmt)) == SSA_NAME)
> > +   SSA_NAME_DEF_STMT (gimple_vdef (new_stmt)) = new_stmt;
> >
> > you can use gimple_move_vops (new_stmt, stmt) for this.
>
> Indeed, changed.
>
> > I wonder if there are objects beside STRING_CSTs that could have their
> > sizes fixed via inlining, thus, whether you can omit the == STRING_CST
> > check?  That is, I see this change independent of the store-merging
> > optimization.
>
> Will that not interfere with the subsequent cases though?  For STRING_CSTs,
> this is not the case since they are not var_decl_component_p.  I can replace
> STRING_CST with !var_decl_component_p but I'm not sure what we will gain.

Hmm, that's a good question - so would your patch be replaceable by
simply amending var_decl_component_p by

  (var_decl_component_p (TREE_OPERAND (src, 0))
   || TREE_CODE (TREE_OPERAND (src, 0)) == STRING_CST)

?

> > Otherwise the memcpy folding part looks OK to me, I skimmed the
> > store-merging change and didn't find anything suspicious but I wonder
> > whether not special-casing STRING_CSTs would actually simplify
> > the code?
>
> This would mean implementing the 1) though and, in particular, rejecting or
> splitting long strings, which is precisely what we do not want to do in Ada.

Oh, indeed, yeah - so the separate handling looks better.  I thought
of cases like merging "x\0" with a subsequent short integer but those may
be already handled since they are power-of-two sized.

Richard.

> --
> Eric Botcazou


Re: [PATCH] gcov: shorted one option help message

2020-07-01 Thread Martin Liška

On 7/1/20 1:37 PM, Andreas Schwab wrote:

On Jul 01 2020, Martin Liška wrote:


+  fnotice (file, "  -j, --json-format   Output JSON intermediate format \n\


Please avoid the trailing space.


Thanks for heads up, fixed.

Martin



Andreas.





Re: [PATCH] gcov: shorted one option help message

2020-07-01 Thread Andreas Schwab
On Jul 01 2020, Martin Liška wrote:

> +  fnotice (file, "  -j, --json-format   Output JSON intermediate format \n\

Please avoid the trailing space.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH] tree-optimization/95839 - teach SLP vectorization about vector inputs

2020-07-01 Thread Richard Biener
On Fri, 26 Jun 2020, Richard Biener wrote:

> (sorry for the duplicate, forgot to copy the list)
> 
> This teaches SLP analysis about vector typed externals that are
> fed into the SLP operations via lane extracting BIT_FIELD_REFs.
> It shows that there's currently no good representation for
> vector code on the SLP side so I went a half way and represent
> such vector externals uses always using a SLP permutation node
> with a single external SLP child which has a non-standard
> representation of no scalar defs but only a vector def.  That
> works best for shielding the rest of the vectorizer from it.
> 
> I'm not sure it's actually worth the trouble and what real-world
> cases benefit from this.  In theory vectorized unrolled code  
>   
> interfacing with scalar code might be one case but there
> we necessarily go through memory and there's no intermediate  
>   
> pass transforming that to registers [to make BB vectorization
> cheaper]. 
>   
> 
> It's also not even close to ready for re-vectorizing vectorized   
>   
> code with a larger VF.
>   
>   
> Any opinions?   

I have now installed this.

Richard.

> Bootstrapped / tested on x86_64-unknown-linux-gnu.
> 
> Thanks,
> Richard.
> 
> 2020-06-26  Richard Biener  
> 
>   PR tree-optimization/95839
>   * tree-vect-slp.c (vect_slp_tree_uniform_p): Pre-existing
>   vectors are not uniform.
>   (vect_build_slp_tree_1): Handle BIT_FIELD_REFs of
>   vector registers.
>   (vect_build_slp_tree_2): For groups of lane extracts
>   from a vector register generate a permute node
>   with a special child representing the pre-existing vector.
>   (vect_prologue_cost_for_slp): Pre-existing vectors cost nothing.
>   (vect_slp_analyze_node_operations): Use SLP_TREE_LANES.
>   (vectorizable_slp_permutation): Do not generate or cost identity
>   permutes.
>   (vect_schedule_slp_instance): Handle pre-existing vector
>   that are function arguments.
> 
>   * gcc.dg/vect/bb-slp-pr95839-2.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c |  20 
>  gcc/tree-vect-slp.c  | 119 ---
>  2 files changed, 124 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c
> new file mode 100644
> index 000..49e75d8c95c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-additional-options "-w -Wno-psabi" } */
> +
> +typedef double __attribute__((vector_size(16))) v2df;
> +
> +v2df f(v2df a, v2df b)
> +{
> +  return (v2df){a[0] + b[0], a[1] + b[1]};
> +}
> +
> +v2df g(v2df a, v2df b)
> +{
> +  return (v2df){a[0] + b[1], a[1] + b[0]};
> +}
> +
> +/* Verify we manage to vectorize this with using the original vectors
> +   and do not end up with any vector CTORs.  */
> +/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp2" } } */
> +/* { dg-final { scan-tree-dump-not "vect_cst" "slp2" } } */
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index b223956e3af..83ec382ee0d 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -247,6 +247,10 @@ vect_slp_tree_uniform_p (slp_tree node)
>gcc_assert (SLP_TREE_DEF_TYPE (node) == vect_constant_def
> || SLP_TREE_DEF_TYPE (node) == vect_external_def);
>  
> +  /* Pre-existing vectors.  */
> +  if (SLP_TREE_SCALAR_OPS (node).is_empty ())
> +return false;
> +
>unsigned i;
>tree op, first = NULL_TREE;
>FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (node), i, op)
> @@ -838,7 +842,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>else
>   {
> rhs_code = gimple_assign_rhs_code (stmt);
> -   load_p = TREE_CODE_CLASS (rhs_code) == tcc_reference;
> +   load_p = gimple_vuse (stmt);
>   }
>  
>/* Check the operation.  */
> @@ -899,6 +903,22 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>need_same_oprnds = true;
>first_op1 = gimple_assign_rhs2 (stmt);
>  }
> +   else if (!load_p
> +&& rhs_code == BIT_FIELD_REF)
> + {
> +   tree vec = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
> +   if (TREE_CODE (vec) != SSA_NAME
> +   || !types_compatible_p (vectype, TREE_TYPE (vec)))
> + {
> +   if (dump_enabled_p ())
> + dump_printf_loc 

[PATCH] move ILS include to system.h

2020-07-01 Thread Richard Biener


This is something I had lying around since late stage4, re-checked
and pushed now.

Richard.


This moves ISL system header includes to system.h.

* system.h (INCLUDE_ISL): New guarded include.
* graphite-dependences.c: Use it.
* graphite-isl-ast-to-gimple.c: Likewise.
* graphite-optimize-isl.c: Likewise.
* graphite-poly.c: Likewise.
* graphite-scop-detection.c: Likewise.
* graphite-sese-to-poly.c: Likewise.
* graphite.c: Likewise.
* graphite.h: Drop the includes here.
---
 gcc/graphite-dependences.c   |  2 +-
 gcc/graphite-isl-ast-to-gimple.c |  2 +-
 gcc/graphite-optimize-isl.c  |  2 +-
 gcc/graphite-poly.c  |  2 +-
 gcc/graphite-scop-detection.c|  2 +-
 gcc/graphite-sese-to-poly.c  | 11 +--
 gcc/graphite.c   |  2 +-
 gcc/graphite.h   | 16 
 gcc/system.h | 23 +--
 9 files changed, 28 insertions(+), 34 deletions(-)

diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index 7078c949800..b8480217e6d 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -19,7 +19,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#define USES_ISL
+#define INCLUDE_ISL
 
 #include "config.h"
 
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index ef93fda2233..81f7b48887c 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -18,7 +18,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#define USES_ISL
+#define INCLUDE_ISL
 
 #include "config.h"
 
diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 0ca3c4fab2c..f9043f0aeed 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -18,7 +18,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#define USES_ISL
+#define INCLUDE_ISL
 
 #include "config.h"
 
diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index 42acffc0c5f..de32ab648ef 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -19,7 +19,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#define USES_ISL
+#define INCLUDE_ISL
 
 #include "config.h"
 
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 75f81227f8a..31f837c11b1 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -19,7 +19,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#define USES_ISL
+#define INCLUDE_ISL
 
 #include "config.h"
 
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index c42415e0554..afce6f0eb70 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -18,7 +18,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#define USES_ISL
+#define INCLUDE_ISL
 
 #include "config.h"
 
@@ -46,15 +46,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "domwalk.h"
 #include "tree-ssa-propagate.h"
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
 #include "graphite.h"
 
 /* Return an isl identifier for the polyhedral basic block PBB.  */
diff --git a/gcc/graphite.c b/gcc/graphite.c
index 27f1e486e1f..1c702d09500 100644
--- a/gcc/graphite.c
+++ b/gcc/graphite.c
@@ -27,7 +27,7 @@ along with GCC; see the file COPYING3.  If not see
The wiki page http://gcc.gnu.org/wiki/Graphite contains pointers to
the related work.  */
 
-#define USES_ISL
+#define INCLUDE_ISL
 
 #include "config.h"
 #include "system.h"
diff --git a/gcc/graphite.h b/gcc/graphite.h
index 3fe1345cf96..8c06b81e32d 100644
--- a/gcc/graphite.h
+++ b/gcc/graphite.h
@@ -23,22 +23,6 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_GRAPHITE_POLY_H
 
 #include "sese.h"
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
 
 typedef struct poly_dr *poly_dr_p;
 
diff --git a/gcc/system.h b/gcc/system.h
index 544f7ba427f..5f740e3b82b 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -736,6 +736,27 @@ extern int vsnprintf (char *, size_t, const char *, va_list);
 #endif
 #endif
 
+#ifdef INCLUDE_ISL
+#ifdef HAVE_isl
+#include 
+#include 
+#include 
+#include 

[PATCH] gcov: shorted one option help message

2020-07-01 Thread Martin Liška

Installed to master.

gcc/ChangeLog:

* gcov.c (print_usage): Shorted option description for -j
option.
---
 gcc/gcov.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/gcov.c b/gcc/gcov.c
index c60f5112d2c..99c52f6a318 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -898,7 +898,8 @@ print_usage (int error_p)
   fnotice (file, "  -d, --display-progress  Display progress information\n");
   fnotice (file, "  -f, --function-summariesOutput summaries for each function\n");
   fnotice (file, "  -h, --help  Print this help, then exit\n");
-  fnotice (file, "  -j, --json-format   Output JSON intermediate format into .gcov.json.gz file\n");
+  fnotice (file, "  -j, --json-format   Output JSON intermediate format \n\
+into .gcov.json.gz file\n");
   fnotice (file, "  -H, --human-readableOutput human readable numbers\n");
   fnotice (file, "  -k, --use-colorsEmit colored output\n");
   fnotice (file, "  -l, --long-file-names   Use long output file names for included\n\
--
2.27.0



Re: [PATCH] gcov: rename 2 options.

2020-07-01 Thread Martin Liška

On 7/1/20 9:49 AM, Richard Biener wrote:

Please keep the old options as aliases, documented as obsoleted in --help.


Good idea! I've done that and made an alias for -i -> -j option.
I'm going to push it to master.

Martin
From f0564852b5acbc2fcde604308e61109f22815ac1 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 30 Jun 2020 15:48:03 +0200
Subject: [PATCH] gcov: rename 2 options.

gcc/ChangeLog:

	* doc/gcov.texi: Rename 2 options.
	* gcov.c (print_usage): Rename -i,--json-format to
	-j,--json-format and -j,--human-readable to -H,--human-readable.
	(process_args): Fix up parsing.  Document obsolete options and
	how are they changed.

gcc/testsuite/ChangeLog:

	* g++.dg/gcov/loop.C: Use -H option instead of -j option.
---
 gcc/doc/gcov.texi                |  8 ++++----
 gcc/gcov.c                       | 16 ++++++++++------
 gcc/testsuite/g++.dg/gcov/loop.C |  2 +-
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
index 22e42da2ea6..00f0cdc45f9 100644
--- a/gcc/doc/gcov.texi
+++ b/gcc/doc/gcov.texi
@@ -124,8 +124,8 @@ gcov [@option{-v}|@option{--version}] [@option{-h}|@option{--help}]
  [@option{-c}|@option{--branch-counts}]
  [@option{-d}|@option{--display-progress}]
  [@option{-f}|@option{--function-summaries}]
- [@option{-i}|@option{--json-format}]
- [@option{-j}|@option{--human-readable}]
+ [@option{-j}|@option{--json-format}]
+ [@option{-H}|@option{--human-readable}]
  [@option{-k}|@option{--use-colors}]
  [@option{-l}|@option{--long-file-names}]
  [@option{-m}|@option{--demangled-names}]
@@ -180,7 +180,7 @@ Output summaries for each function in addition to the file level summary.
 Display help about using @command{gcov} (on the standard output), and
 exit without doing any further processing.
 
-@item -i
+@item -j
 @itemx --json-format
 Output gcov file in an easy-to-parse JSON intermediate format
 which does not require source code for generation.  The JSON
@@ -339,7 +339,7 @@ Fields of the @var{branch} element have following semantics:
 @var{throw}: true when the branch is an exceptional branch
 @end itemize
 
-@item -j
+@item -H
 @itemx --human-readable
 Write counts in human readable format (like 24.6k).
 
diff --git a/gcc/gcov.c b/gcc/gcov.c
index ef93758b26f..c60f5112d2c 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -898,8 +898,8 @@ print_usage (int error_p)
   fnotice (file, "  -d, --display-progress  Display progress information\n");
   fnotice (file, "  -f, --function-summariesOutput summaries for each function\n");
   fnotice (file, "  -h, --help  Print this help, then exit\n");
-  fnotice (file, "  -i, --json-format   Output JSON intermediate format into .gcov.json.gz file\n");
-  fnotice (file, "  -j, --human-readableOutput human readable numbers\n");
+  fnotice (file, "  -j, --json-format   Output JSON intermediate format into .gcov.json.gz file\n");
+  fnotice (file, "  -H, --human-readableOutput human readable numbers\n");
   fnotice (file, "  -k, --use-colorsEmit colored output\n");
   fnotice (file, "  -l, --long-file-names   Use long output file names for included\n\
 source files\n");
@@ -915,6 +915,9 @@ print_usage (int error_p)
   fnotice (file, "  -v, --version   Print version number, then exit\n");
   fnotice (file, "  -w, --verbose   Print verbose informations\n");
   fnotice (file, "  -x, --hash-filenamesHash long pathnames\n");
+  fnotice (file, "\nObsolete options:\n");
+  fnotice (file, "  -i, --json-format   Replaced with -j, --json-format\n");
+  fnotice (file, "  -j, --human-readableReplaced with -H, --human-readable\n");
   fnotice (file, "\nFor bug reporting instructions, please see:\n%s.\n",
 	   bug_report_url);
   exit (status);
@@ -942,8 +945,8 @@ static const struct option options[] =
   { "all-blocks",   no_argument,   NULL, 'a' },
   { "branch-probabilities", no_argument,   NULL, 'b' },
   { "branch-counts",no_argument,   NULL, 'c' },
-  { "json-format",	no_argument,   NULL, 'i' },
-  { "human-readable",	no_argument,   NULL, 'j' },
+  { "json-format",	no_argument,   NULL, 'j' },
+  { "human-readable",	no_argument,   NULL, 'H' },
   { "no-output",no_argument,   NULL, 'n' },
   { "long-file-names",  no_argument,   NULL, 'l' },
   { "function-summaries",   no_argument,   NULL, 'f' },
@@ -969,7 +972,7 @@ process_args (int argc, char **argv)
 {
   int opt;
 
-  const char *opts = "abcdfhijklmno:pqrs:tuvwx";
+  const char *opts = "abcdfhHijklmno:pqrs:tuvwx";
   while ((opt = getopt_long (argc, argv, opts, options, NULL)) != -1)
 {
   switch (opt)
@@ -992,7 +995,7 @@ process_args (int argc, char **argv)
 	case 'l':
 	  flag_long_names = 1;
 	  break;
-	case 'j':

Re: [patch] Extend store merging to STRING_CST

2020-07-01 Thread Eric Botcazou
> + && TREE_CODE (TREE_OPERAND (src, 0)) == STRING_CST
> + && tree_int_cst_equal
> +(TYPE_SIZE_UNIT (TREE_TYPE (TREE_OPERAND (src, 0))), len))
> +   {
> 
> I guess we miss a BITS_PER_UNIT == 8 check here?

OK, added.

> + gimple_set_vuse (new_stmt, gimple_vuse (stmt));
> + gimple_set_vdef (new_stmt, gimple_vdef (stmt));
> + if (gimple_vdef (new_stmt)
> + && TREE_CODE (gimple_vdef (new_stmt)) == SSA_NAME)
> +   SSA_NAME_DEF_STMT (gimple_vdef (new_stmt)) = new_stmt;
> 
> you can use gimple_move_vops (new_stmt, stmt) for this.

Indeed, changed.

> I wonder if there are objects beside STRING_CSTs that could have their
> sizes fixed via inlining, thus, whether you can omit the == STRING_CST
> check?  That is, I see this change independent of the store-merging
> optimization.

Will that not interfere with the subsequent cases though?  For STRING_CSTs, 
this is not the case since they are not var_decl_component_p.  I can replace 
STRING_CST with !var_decl_component_p but I'm not sure what we will gain.

> Otherwise the memcpy folding part looks OK to me, I skimmed the
> store-merging change and didn't find anything suspicious but I wonder
> whether not special-casing STRING_CSTs would actually simplify
> the code?

This would mean implementing the 1) though and, in particular, rejecting or 
splitting long strings, which is precisely what we do not want to do in Ada.

-- 
Eric Botcazou


RE: [PATCH] Fix unnecessary register spill that depends on function ordering

2020-07-01 Thread Omar Tahir
Hi Richard,

From: Richard Sandiford 
> > @@ -3719,6 +3722,7 @@ static unsigned int rest_of_handle_sched (void)
> > {
> > #ifdef INSN_SCHEDULING
> > +  first_moveable_pseudo = last_moveable_pseudo;
> >if (flag_selective_scheduling
> >&& ! maybe_skip_selective_scheduling ())
> >  run_selective_scheduling ();
> 
> I think instead we should zero out both variables at the end of IRA.
> There are other places besides the scheduler that call into the IRA code, so 
> tackling the problem there seems more general.

If you zero first_moveable_pseudo and last_moveable_pseudo after IRA then
they'll be zero for the second scheduler pass, which uses them.

In fact, I've just realised that the GCSE and move_loop_invariants passes
also use them (they both call ira_set_pseudo_classes which calls
find_costs_and_classes which uses these variables).

They need to be zeroed or set equal to each other before the first pass that
uses them (move_loop_invariants), but kept alive until after the last pass
that uses them (GCSE). So if there's a function that sets things up right
before the RTL passes start then I think that's a good location candidate.

> > +/* We have plenty of spare registers, so check nothing has been spilled. */
> > +/* { dg-final { scan-assembler-not "str" } } */
> 
> The testcase looks good, but it's probably better to make that “\tstr\t”.
> The problem with plain “str” is that assembly output can often include 
> pathnames and version strings, and it's possible that one of those could 
> contain “str”.

Good catch, I'll keep that tip in mind for future!

Thanks,
Omar


Re: [PATCH] Split load permutation

2020-07-01 Thread Richard Biener
On Tue, 30 Jun 2020, Richard Sandiford wrote:

> Richard Biener  writes:
> >> Another thing we'd like for SVE in general is to allow vectors *with*
> >> gaps throughout the SLP graph, since with predication it's simple to
> >> ignore any inactive element where necessary.  This is often much cheaper
> >> than packing the active elements together to remove gaps.  For example:
> >> 
> >>   a[0] += 1;
> >>   a[2] += 1;
> >>   a[3] += 1;
> >> 
> >> should just be a predicated load, add and store, with no shuffling.
> >> 
> >> Even on targets without predication, it might be better to leave
> >> the gaps in for part of the graph, e.g. if two loads feed a single
> >> permute.  So it would be good if the node representation allowed
> >> arbitrary permutations, including with “dead” elements at any
> >> position.  (I realise that isn't news and that keeping the gap
> >> removal with the load was just “for now” :-))
> >
> > Hmm, OK.  So I'm still experimenting with how to incrementally push
> > these kind of changes to master.  Unfortunately it resists any
> > "serious" change because we've painted us into a corner with respect
> > to load and store handling ;)  Well, it just means the change will
> > need to be bigger than anticipated.
> >
> > As for inactive lanes, for SVE this means a mask register for each
> > operation, correct?
> 
> Certainly for potentially trapping ops and reductions, yeah.
> For others it really doesn't matter (and those wouldn't require
> a target that supports predication).
> 
> So I don't think having gaps necessarily means we have a mask.
> Being able to attach masks to nodes (perhaps independently of gaps)
> would be useful though.
> 
> > At the moment the SLP graph is a representation
> > of the scalar GIMPLE IL and we cannot really change that until we
> > commit to a vectorization factor and more.  So an inactive lane
> > is simply not there and a "load" is simply another way of building
> > up a vector from scalar stmt results much like those "built from scalars"
> > SLP nodes but we of course make them special because we have those
> > DR groups that are used.
> >
> > So with the patch as posted the "gaps" are represented in the
> > load permutation of the "loads" which is where you could create
> > mask registers from and simply push them to SLP parents (up to
> > the point you get to a parent whose childs have differing masks...).
> 
> OK.  But I'd argue that's true of permutations too.  At the point
> that the SLP graph just represents scalar gimple IL, we can simply
> permute SLP_TREE_SCALAR_STMTS and not have any explicit permute
> operations in the graph.

That's true when the only permutes are in leafs.  The 
SLP_TREE_SCALAR_STMTS impose an order of lanes so once
different parents need a different order we need explicit permute
nodes swapping SLP_TREE_SCALAR_STMTS.  Of course that's only
because we have combined multiple scalar stmts into a single
SLP node which really is ...

> Permutations and gaps only come into play once we add more
> vector-related information or restrictions.

... what those "vector-related" restrictions are right now.

> > I think we're going to have a set of post-processing steps on the
> > initial SLP graph for both "optimizing" where (and if) to do permutations
> > and whether to elide gap removal in favor of masks (and we'd then
> > annotate the SLP nodes with the active mask).
> 
> So we'd start out without any permutations and gaps, and then add
> them as post-processing step based on what the target can handle?
> If so, sounds good to me FWIW.

If you start with the notion on loads and stores being
compositions/extracts then yes - based on how we want to
vectorize those operations we'd expose any required permutation
explicitely so we can then combine & optimize those.

For the existing patches I tried to relate loads involving the
same data-ref group by introducing a shared SLP node accessing the
whole group, so that's not starting with no permute and it comes
with some issues.  If we don't relate loads from the same dataref
group early (during SLP discovery, that is), then we have to
try (again based on cost) doing that during that post-processing
which then becomes more complicated.

As said - this is work in progress and I'm experimenting in
multiple directions ...

> > All of this requires pushing back some of the early decisions
> > (I've mostly identified vectorization factor determining) but also
> > do load/store classification early.  For example if we end up
> > with strided loads or stores such operations can fuse with any
> > permutation at no cost.
> >
> > At the moment I'm continuing to poke the code for its least
> > resistance for introducing at least parts of the machinery.
> > I'm targeting post-processing for merging of identical
> > permutes across operations like it appears for
> >
> >   a[0] = b[1] + c[1];
> >   a[1] = b[0] + c[0];
> >
> >> I guess this to some extent feeds into a long-standing TODO to allow
> >> “don't 

Re: [PATCH v2] aarch64: Add 64 bits fpcr fpsr getter/setter builtins

2020-07-01 Thread Richard Sandiford
Andrea Corallo  writes:
> +/* Initialize all builtins in the AARCH64_BUILTIN_GENERAL group.  */
> +
> +void
> +aarch64_general_init_builtins (void)
> +{
> +

Excess blank line here.

> +  aarch64_init_fpsr_fpcr_builtins ();
> +
>aarch64_init_fp16_types ();
>  
>aarch64_init_bf16_types ();
> @@ -1876,6 +1913,14 @@ aarch64_expand_builtin_memtag (int fcode, tree exp, rtx target)
>return target;
>  }
>  
> +static void
> +aarch64_expand_fpsr_fpcr_setter (int unspec, machine_mode mode, tree exp)

Should have a function comment here, even though the function is tiny.

> +{
> +  tree arg = CALL_EXPR_ARG (exp, 0);
> +  rtx op = force_reg (mode, expand_normal (arg));
> +  emit_insn (gen_aarch64_set (unspec, mode, op));
> +}
> +
>  /* Expand an expression EXP that calls built-in function FCODE,
> with result going to TARGET if that's convenient.  IGNORE is true
> if the result of the builtin is ignored.  */
> […]
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index deca0004fedc..75f9e9e97e8b 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -7230,37 +7230,21 @@
>[(set_attr "length" "12")
> (set_attr "type" "multiple")])
>  
> -;; Write Floating-point Control Register.
> -(define_insn "set_fpcr"
> -  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")] UNSPECV_SET_FPCR)]
> +;; Write into Floating-point Status Register.

“Status or Control”.  IMO reads better with “the” added.

> +(define_insn "@aarch64_set_<fpscr_name>"
> +  [(unspec_volatile [(match_operand:GPI 0 "register_operand" "r")] SET_FPSCR)]
>""
> -  "msr\\tfpcr, %0"
> +  "msr\\t<fpscr_name>, %0"
>[(set_attr "type" "mrs")])
>  
>  ;; Read Floating-point Control Register.

Same here.

> -(define_insn "get_fpcr"
> -  [(set (match_operand:SI 0 "register_operand" "=r")
> -(unspec_volatile:SI [(const_int 0)] UNSPECV_GET_FPCR))]
> -  ""
> -  "mrs\\t%0, fpcr"
> -  [(set_attr "type" "mrs")])
> -
> -;; Write Floating-point Status Register.
> -(define_insn "set_fpsr"
> -  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")] UNSPECV_SET_FPSR)]
> -  ""
> -  "msr\\tfpsr, %0"
> -  [(set_attr "type" "mrs")])
> -
> -;; Read Floating-point Status Register.
> -(define_insn "get_fpsr"
> -  [(set (match_operand:SI 0 "register_operand" "=r")
> -(unspec_volatile:SI [(const_int 0)] UNSPECV_GET_FPSR))]
> +(define_insn "@aarch64_get_<fpscr_name>"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +(unspec_volatile:GPI [(const_int 0)] GET_FPSCR))]
>""
> -  "mrs\\t%0, fpsr"
> +  "mrs\\t%0, <fpscr_name>"
>[(set_attr "type" "mrs")])
>  
> -
>  ;; Define the subtract-one-and-jump insns so loop.c
>  ;; knows what to generate.
>  (define_expand "doloop_end"
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index a568cf21b99d..9a5191689634 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -3453,3 +3453,17 @@
>  
>  (define_int_attr unspec [(UNSPEC_WHILERW "UNSPEC_WHILERW")
>(UNSPEC_WHILEWR "UNSPEC_WHILEWR")])
> +
> +;; Iterators and attributes for fpcr fpsr getter setters
> +
> +(define_int_iterator GET_FPSCR
> +  [UNSPECV_GET_FPSR UNSPECV_GET_FPCR])
> +
> +(define_int_iterator SET_FPSCR
> +  [UNSPECV_SET_FPSR UNSPECV_SET_FPCR])
> +
> +(define_int_attr fpscr_name
> +  [(UNSPECV_GET_FPSR "fpsr")
> +   (UNSPECV_SET_FPSR "fpsr")
> +   (UNSPECV_GET_FPCR "fpcr")
> +   (UNSPECV_SET_FPCR "fpcr")])
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 10dc32e6d2d4..29a6635ad134 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -13880,6 +13880,12 @@ unsigned int __builtin_aarch64_get_fpcr ()
>  void __builtin_aarch64_set_fpcr (unsigned int)
>  unsigned int __builtin_aarch64_get_fpsr ()
>  void __builtin_aarch64_set_fpsr (unsigned int)
> +
> +unsigned long long __builtin_aarch64_get_fpcr64 ()
> +void __builtin_aarch64_set_fpcr64 (unsigned long long)
> +unsigned long long __builtin_aarch64_get_fpsr64 ()
> +void __builtin_aarch64_set_fpsr64 (unsigned long long)
> +

Excess blank line.

>  @end smallexample
>  
>  @node Alpha Built-in Functions
> diff --git a/gcc/testsuite/gcc.target/aarch64/get_fpcr64.c 
> b/gcc/testsuite/gcc.target/aarch64/get_fpcr64.c

Very minor, but it's probably more future-proof to add _1 to the filenames
of the tests.

> new file mode 100644
> index ..9fed91fe2053
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/get_fpcr64.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +long long unsigned
> +get_fpcr64 ()
> +{
> +  return __builtin_aarch64_get_fpcr64 ();
> +}
> +
> +/* { dg-final { scan-assembler-times "mrs.*fpcr" 1 } } */

Probably safer as: {\tmrs\t[^\n]*fpcr}

(“.” matches newlines in this context.)

Although since the test is running at -O2, we should be able to rely on
x0 being chosen, so I think we should just use: {\tmrs\tx0, fpcr\n}

Same idea for the other 

[PATCH v2] aarch64: Add 64 bits fpcr fpsr getter/setter builtins

2020-07-01 Thread Andrea Corallo
Hi all,

Second version of the patch addressing comments.

This introduces the following 64-bit builtin variants as FPCR and FPSR
register getters/setters:

unsigned long long __builtin_aarch64_get_fpcr64 ()
void __builtin_aarch64_set_fpcr64 (unsigned long long)
unsigned long long __builtin_aarch64_get_fpsr64 ()
void __builtin_aarch64_set_fpsr64 (unsigned long long)

Bootstrapped and regression tested on aarch64-unknown-linux-gnu.

Regards
  Andrea

gcc/ChangeLog

2020-??-??  Andrea Corallo  

* config/aarch64/aarch64-builtins.c (aarch64_builtins): Add enums
for 64-bit fpsr/fpcr getter/setter builtin variants.
(aarch64_init_fpsr_fpcr_builtins): New function.
(aarch64_general_init_builtins): Modify to make use of the latter.
(aarch64_expand_fpsr_fpcr_setter): New function.
(aarch64_general_expand_builtin): Modify to make use of the latter.
* config/aarch64/aarch64.md (@aarch64_set_<fpscr_name>)
(@aarch64_get_<fpscr_name>): New patterns replacing and
generalizing 'get_fpcr', 'set_fpsr'.
* config/aarch64/iterators.md (GET_FPSCR, SET_FPSCR): New int
iterators.
(fpscr_name): New int attribute.
* doc/extend.texi (__builtin_aarch64_get_fpcr64)
(__builtin_aarch64_set_fpcr64, __builtin_aarch64_get_fpsr64)
(__builtin_aarch64_set_fpsr64): Add into AArch64 Built-in
Functions.

gcc/testsuite/ChangeLog

2020-??-??  Andrea Corallo  

* gcc.target/aarch64/get_fpcr64.c: New test.
* gcc.target/aarch64/set_fpcr64.c: New test.
* gcc.target/aarch64/get_fpsr64.c: New test.
* gcc.target/aarch64/set_fpsr64.c: New test.

>From 96a285d0a80cbf3e1e06d7c5a6d4d2e37dd28c65 Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Thu, 28 May 2020 08:49:42 +0100
Subject: [PATCH] aarch64: Add 64 bit setter getter fpsr fpcr

gcc/ChangeLog

	* config/aarch64/aarch64-builtins.c (aarch64_builtins): Add enums
	for 64-bit fpsr/fpcr getter/setter builtin variants.
	(aarch64_init_fpsr_fpcr_builtins): New function.
	(aarch64_general_init_builtins): Modify to make use of the latter.
	(aarch64_expand_fpsr_fpcr_setter): New function.
	(aarch64_general_expand_builtin): Modify to make use of the latter.
	* config/aarch64/aarch64.md (@aarch64_set_<fpscr_name>)
	(@aarch64_get_<fpscr_name>): New patterns replacing and
	generalizing 'get_fpcr', 'set_fpsr'.
	* config/aarch64/iterators.md (GET_FPSCR, SET_FPSCR): New int
	iterators.
	(fpscr_name): New int attribute.
	* doc/extend.texi (__builtin_aarch64_get_fpcr64)
	(__builtin_aarch64_set_fpcr64, __builtin_aarch64_get_fpsr64)
	(__builtin_aarch64_set_fpsr64): Add into AArch64 Built-in
	Functions.

gcc/testsuite/ChangeLog

2020-05-28  Andrea Corallo  

	* gcc.target/aarch64/get_fpcr64.c: New test.
	* gcc.target/aarch64/set_fpcr64.c: New test.
	* gcc.target/aarch64/get_fpsr64.c: New test.
	* gcc.target/aarch64/set_fpsr64.c: New test.
---
 gcc/config/aarch64/aarch64-builtins.c | 103 +-
 gcc/config/aarch64/aarch64.md |  32 ++
 gcc/config/aarch64/iterators.md   |  14 +++
 gcc/doc/extend.texi   |   6 +
 gcc/testsuite/gcc.target/aarch64/get_fpcr64.c |  10 ++
 gcc/testsuite/gcc.target/aarch64/get_fpsr64.c |  10 ++
 gcc/testsuite/gcc.target/aarch64/set_fpcr64.c |  10 ++
 gcc/testsuite/gcc.target/aarch64/set_fpsr64.c |  10 ++
 8 files changed, 142 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/get_fpcr64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/get_fpsr64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/set_fpcr64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/set_fpsr64.c

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 95213cd70c84..2990026cfb73 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -448,6 +448,11 @@ enum aarch64_builtins
   AARCH64_BUILTIN_GET_FPSR,
   AARCH64_BUILTIN_SET_FPSR,
 
+  AARCH64_BUILTIN_GET_FPCR64,
+  AARCH64_BUILTIN_SET_FPCR64,
+  AARCH64_BUILTIN_GET_FPSR64,
+  AARCH64_BUILTIN_SET_FPSR64,
+
   AARCH64_BUILTIN_RSQRT_DF,
   AARCH64_BUILTIN_RSQRT_SF,
   AARCH64_BUILTIN_RSQRT_V2DF,
@@ -1245,33 +1250,65 @@ aarch64_init_memtag_builtins (void)
 #undef AARCH64_INIT_MEMTAG_BUILTINS_DECL
 }
 
-/* Initialize all builtins in the AARCH64_BUILTIN_GENERAL group.  */
+/* Initialize fpsr fpcr getters and setters.  */
 
-void
-aarch64_general_init_builtins (void)
+static void
+aarch64_init_fpsr_fpcr_builtins (void)
 {
-  tree ftype_set_fpr
+  tree ftype_set
 = build_function_type_list (void_type_node, unsigned_type_node, NULL);
-  tree ftype_get_fpr
+  tree ftype_get
 = build_function_type_list (unsigned_type_node, NULL);
 
   aarch64_builtin_decls[AARCH64_BUILTIN_GET_FPCR]
 = aarch64_general_add_builtin ("__builtin_aarch64_get_fpcr",
-   ftype_get_fpr,
+   ftype_get,
    AARCH64_BUILTIN_GET_FPCR);
   aarch64_builtin_decls[AARCH64_BUILTIN_SET_FPCR]
 

Re: [PATCH] gcov: rename 2 options.

2020-07-01 Thread Richard Biener via Gcc-patches
On Wed, Jul 1, 2020 at 9:12 AM Martin Liška  wrote:
>
> Hey.
>
> Even though an option renaming is a problematic change, I still believe
> the option names were selected poorly and this patch is an attempt to improve it.
>
> Thoughts?

Please keep the old options as aliases, documented as obsoleted in --help.

Richard.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * doc/gcov.texi: Rename 2 options.
> * gcov.c (print_usage): Rename -i,--json-format to
> -j,--json-format and -j,--human-readable to -H,--human-readable.
> (process_args): Fix up parsing.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/gcov/loop.C: Use -H option instead of -j option.
> ---
>   gcc/doc/gcov.texi|  8 
>   gcc/gcov.c   | 14 +++---
>   gcc/testsuite/g++.dg/gcov/loop.C |  2 +-
>   3 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
> index 22e42da2ea6..00f0cdc45f9 100644
> --- a/gcc/doc/gcov.texi
> +++ b/gcc/doc/gcov.texi
> @@ -124,8 +124,8 @@ gcov [@option{-v}|@option{--version}] 
> [@option{-h}|@option{--help}]
>[@option{-c}|@option{--branch-counts}]
>[@option{-d}|@option{--display-progress}]
>[@option{-f}|@option{--function-summaries}]
> - [@option{-i}|@option{--json-format}]
> - [@option{-j}|@option{--human-readable}]
> + [@option{-j}|@option{--json-format}]
> + [@option{-H}|@option{--human-readable}]
>[@option{-k}|@option{--use-colors}]
>[@option{-l}|@option{--long-file-names}]
>[@option{-m}|@option{--demangled-names}]
> @@ -180,7 +180,7 @@ Output summaries for each function in addition to the 
> file level summary.
>   Display help about using @command{gcov} (on the standard output), and
>   exit without doing any further processing.
>
> -@item -i
> +@item -j
>   @itemx --json-format
>   Output gcov file in an easy-to-parse JSON intermediate format
>   which does not require source code for generation.  The JSON
> @@ -339,7 +339,7 @@ Fields of the @var{branch} element have following 
> semantics:
>   @var{throw}: true when the branch is an exceptional branch
>   @end itemize
>
> -@item -j
> +@item -H
>   @itemx --human-readable
>   Write counts in human readable format (like 24.6k).
>
> diff --git a/gcc/gcov.c b/gcc/gcov.c
> index ef93758b26f..21c42c72be1 100644
> --- a/gcc/gcov.c
> +++ b/gcc/gcov.c
> @@ -898,8 +898,8 @@ print_usage (int error_p)
> fnotice (file, "  -d, --display-progress  Display progress 
> information\n");
> fnotice (file, "  -f, --function-summariesOutput summaries for 
> each function\n");
> fnotice (file, "  -h, --help  Print this help, then 
> exit\n");
> -  fnotice (file, "  -i, --json-format   Output JSON intermediate 
> format into .gcov.json.gz file\n");
> -  fnotice (file, "  -j, --human-readableOutput human readable 
> numbers\n");
> +  fnotice (file, "  -j, --json-format   Output JSON intermediate 
> format into .gcov.json.gz file\n");
> +  fnotice (file, "  -H, --human-readableOutput human readable 
> numbers\n");
> fnotice (file, "  -k, --use-colorsEmit colored output\n");
> fnotice (file, "  -l, --long-file-names   Use long output file 
> names for included\n\
>   source files\n");
> @@ -942,8 +942,8 @@ static const struct option options[] =
> { "all-blocks",   no_argument,   NULL, 'a' },
> { "branch-probabilities", no_argument,   NULL, 'b' },
> { "branch-counts",no_argument,   NULL, 'c' },
> -  { "json-format", no_argument,   NULL, 'i' },
> -  { "human-readable",  no_argument,   NULL, 'j' },
> +  { "json-format", no_argument,   NULL, 'j' },
> +  { "human-readable",  no_argument,   NULL, 'H' },
> { "no-output",no_argument,   NULL, 'n' },
> { "long-file-names",  no_argument,   NULL, 'l' },
> { "function-summaries",   no_argument,   NULL, 'f' },
> @@ -969,7 +969,7 @@ process_args (int argc, char **argv)
>   {
> int opt;
>
> -  const char *opts = "abcdfhijklmno:pqrs:tuvwx";
> +  const char *opts = "abcdfhHjklmno:pqrs:tuvwx";
> while ((opt = getopt_long (argc, argv, opts, options, NULL)) != -1)
>   {
> switch (opt)
> @@ -992,7 +992,7 @@ process_args (int argc, char **argv)
> case 'l':
>   flag_long_names = 1;
>   break;
> -   case 'j':
> +   case 'H':
>   flag_human_readable_numbers = 1;
>   break;
> case 'k':
> @@ -1023,7 +1023,7 @@ process_args (int argc, char **argv)
> case 'u':
>   flag_unconditional = 1;
>   break;
> -   case 'i':
> +   case 'j':
>   flag_json_format = 1;
>   flag_gcov_file = 1;
>   break;
> diff --git a/gcc/testsuite/g++.dg/gcov/loop.C 
> 

Re: [patch] Extend store merging to STRING_CST

2020-07-01 Thread Richard Biener via Gcc-patches
On Wed, Jul 1, 2020 at 8:50 AM Eric Botcazou  wrote:
>
> Hi,
>
> the GIMPLE store merging pass doesn't merge STRING_CSTs in the general case,
> although they are accepted by native_encode_expr; the reason is that the pass
> only works with integral modes, i.e. with chunks whose size is a power of two.
>
> There are two possible ways of extending it to handle STRING_CSTs: 1) lift the
> condition of integral modes and treat STRING_CSTs as other _CST nodes but with
> arbitrary size; 2) implement a specific and separate handling for STRING_CSTs.
>
> The attached patch implements 2) for the following reasons: on the one hand,
> even in Ada where character strings are first-class citizens, cases where
> merging STRING_CSTs with other *_CST nodes would be possible are quite rare in
> practice; on the other hand, string concatenations happen more naturally and
> frequently thanks to the "&" operator, giving rise to merging opportunities.
>
> These opportunities generally occur after inlining, when a string argument
> passed in a subprogram call is a STRING_CST and is concatenated with other
> STRING_CSTs in the called subprogram.  In this case, the concatenation is
> originally expanded by means of a VLA and calls to BUILT_IN_MEMCPY on the
> array, and becomes of fixed length after inlining; in order to avoid having to
> deal with the calls to BUILT_IN_MEMCPY in the GIMPLE store merging pass, the
> patch also adds an optimization to gimple_fold_builtin_memory_op that turns
> calls to BUILT_IN_MEMCPY of STRING_CSTs into direct assignment of STRING_CSTs.
> Unfortunately, doing this in the general case discombobulates the C-oriented
> string-related GIMPLE passes of the optimizer, so it is done only for the
> calls to BUILT_IN_MEMCPY that have been generated during gimplification for
> variable-sized assignments, which are thus marked with a new flag.
>
> Tested on x86-64/Linux, OK for the mainline?

+ && TREE_CODE (TREE_OPERAND (src, 0)) == STRING_CST
+ && tree_int_cst_equal
+(TYPE_SIZE_UNIT (TREE_TYPE (TREE_OPERAND (src, 0))), len))
+   {

I guess we miss a BITS_PER_UNIT == 8 check here?

+ gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+ gimple_set_vdef (new_stmt, gimple_vdef (stmt));
+ if (gimple_vdef (new_stmt)
+ && TREE_CODE (gimple_vdef (new_stmt)) == SSA_NAME)
+   SSA_NAME_DEF_STMT (gimple_vdef (new_stmt)) = new_stmt;

you can use gimple_move_vops (new_stmt, stmt) for this.

I wonder if there are objects beside STRING_CSTs that could have their
sizes fixed via inlining, thus, whether you can omit the == STRING_CST
check?  That is, I see this change independent of the store-merging
optimization.

Otherwise the memcpy folding part looks OK to me, I skimmed the
store-merging change and didn't find anything suspicious but I wonder
whether not special-casing STRING_CSTs would actually simplify
the code?

Thanks,
Richard.


>
> 2020-07-01  Eric Botcazou  
>
> * gimple-fold.c (gimple_fold_builtin_memory_op): Fold calls that were
> initially created for the assignment of a variable-sized object.
> * gimple-ssa-store-merging.c (struct merged_store_group): Document
> STRING_CST for rhs_code field.
> Add string_concatenation boolean field.
> (merged_store_group::merged_store_group): Initialize it as well as
> bit_insertion here.
> (merged_store_group::do_merge): Set it upon seeing a STRING_CST.  Also
> set bit_insertion here upon seeing a BIT_INSERT_EXPR.
> (merged_store_group::apply_stores): Clear it for small regions.  Do 
> not
> create a power-of-2-sized buffer if it is still true.  And do not set
> bit_insertion here again.
> (encode_tree_to_bitpos): Deal with BLKmode for the expression.
> (merged_store_group::can_be_merged_into): Deal with STRING_CST.
> (imm_store_chain_info::coalesce_immediate_stores): Set bit_insertion
> to true after changing MEM_REF stores into BIT_INSERT_EXPR stores.
> (count_multiple_uses): Return 0 for STRING_CST.
> (split_group): Do not split the group for a string concatenation.
> (imm_store_chain_info::output_merged_store): Constify and rename some
> local variables.  Build an array type as destination type for a string
> concatenation, as well as a zero mask, and call build_string to build
> the source.
> (lhs_valid_for_store_merging_p): Return true for VIEW_CONVERT_EXPR.
> (pass_store_merging::process_store): Accept STRING_CST on the RHS.
> * gimple.h (gimple_call_alloca_for_var_p): New accessor function.
> * gimplify.c (gimplify_modify_expr_to_memcpy): Set alloca_for_var.
> * tree.h (CALL_ALLOCA_FOR_VAR_P): Document it for BUILT_IN_MEMCPY.
>
>
> 2020-07-01  Eric Botcazou  
>
> * gnat.dg/opt87.adb: New test.
> * gnat.dg/opt87_pkg.ad[sb]: New helper.
>
> --
> Eric Botcazou


Re: [PATCH][RFC] Do not stream all zeros for gcda files.

2020-07-01 Thread Martin Liška

On 6/30/20 3:24 PM, Nathan Sidwell wrote:

On 6/30/20 8:53 AM, Martin Liška wrote:

Hey.

The bug reporter confirmed that using the direct merging provides a solid
speed-up, so I don't insist on this patch. Using a generic compression
algorithm would be a better solution anyway.


Thanks for the reminder.  I won't insist on the generic compression scheme, 
because it's more work.  And you've got this one now.


Yes, I've got a working patch that brings solid compression. However, it
complicates streaming (in and out) and makes working with the format more
complicated. That's the main reason why I don't like the patch much.


While we do (or I tried at least) insist that the gcov format is an internal
implementation detail, that does tend to get ignored.  It seems possible we
could add zlib (or other) compression later in a compatible manner.


Yes, if there's a real need for solid compression, then I would use a generic
compressor on the final streamed file.

Martin



nathan





[PATCH v2] RISC-V: Handle multi-letter extension for multilib-generator

2020-07-01 Thread Kito Cheng
 - The order of the multi-lib config could be wrong if multi-letter
   extensions are used, e.g. `./multilib-generator rv32izfh-ilp32--c` would
   expect rv32ic_zfh/ilp32 to reuse rv32i_zfh/ilp32; however, the
   multi-letter extension is not handled correctly, so it generates a reuse
   rule for rv32izfhc/ilp32, which is an invalid arch configuration.

 - Remove reuse rule generation for g/imafd, because we canonicalize the
   -march in the gcc driver too, so we don't need to handle 'g' for multilib now.

gcc/ChangeLog:

* config/riscv/multilib-generator (arch_canonicalize): Handle
multi-letter extension.
Use underscore as the separator between different extensions.
---
 gcc/config/riscv/multilib-generator | 30 +
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/multilib-generator 
b/gcc/config/riscv/multilib-generator
index b9194e6d3cc1..8f4df183db21 100755
--- a/gcc/config/riscv/multilib-generator
+++ b/gcc/config/riscv/multilib-generator
@@ -39,12 +39,12 @@ reuse = []
 canonical_order = "mafdgqlcbjtpvn"
 
 def arch_canonicalize(arch):
-  # TODO: Support Z, S, H, or X extensions.
   # TODO: Support implied extensions, e.g. D implied F in latest spec.
   # TODO: Support extension version.
   new_arch = ""
   if arch[:5] in ['rv32e', 'rv32i', 'rv32g', 'rv64i', 'rv64g']:
-new_arch = arch[:5]
+# TODO: We should expand g to imadzifencei once we support newer spec.
+new_arch = arch[:5].replace("g", "imafd")
   else:
 raise Exception("Unexpected arch: `%s`" % arch[:5])
 
@@ -56,30 +56,44 @@ def arch_canonicalize(arch):
   long_ext_prefixes_idx = list(filter(lambda x: x != -1, 
long_ext_prefixes_idx))
   if long_ext_prefixes_idx:
 first_long_ext_idx = min(long_ext_prefixes_idx)
-long_exts = arch[first_long_ext_idx:]
+long_exts = arch[first_long_ext_idx:].split("_")
 std_exts = arch[5:first_long_ext_idx]
   else:
-long_exts = ""
+long_exts = []
 std_exts = arch[5:]
 
+  # Single letter extension might appear in the long_exts list,
+  # because we just append the extensions list to the arch string.
+  std_exts += "".join(filter(lambda x:len(x) == 1, long_exts))
+
+  # Multi-letter extension must be in lexicographic order.
+  long_exts = sorted(filter(lambda x:len(x) != 1, long_exts))
+
   # Put extensions in canonical order.
   for ext in canonical_order:
 if ext in std_exts:
   new_arch += ext
 
+  # Check every extension is processed.
+  for ext in std_exts:
+if ext == '_':
+  continue
+if ext not in canonical_order:
+  raise Exception("Unsupported extension `%s`" % ext)
+
   # Concat rest of the multi-char extensions.
-  new_arch += long_exts
+  if long_exts:
+new_arch += "_" + "_".join(long_exts)
   return new_arch
 
 for cfg in sys.argv[1:]:
   (arch, abi, extra, ext) = cfg.split('-')
+  arch = arch_canonicalize (arch)
   arches[arch] = 1
   abis[abi] = 1
   extra = list(filter(None, extra.split(',')))
   ext = list(filter(None, ext.split(',')))
-  alts = sum([[x] + [x + y for y in ext] for x in [arch] + extra], [])
-  # TODO: We should expand g to imadzifencei once we support newer spec.
-  alts = alts + [x.replace('imafd', 'g') for x in alts if 'imafd' in x]
+  alts = sum([[x] + [x + "_" + y for y in ext] for x in [arch] + extra], [])
   alts = list(map(arch_canonicalize, alts))
   for alt in alts[1:]:
 arches[alt] = 1
-- 
2.27.0
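For reference, the canonicalization above can be exercised stand-alone; the following is a simplified sketch of `arch_canonicalize` as patched (the multi-letter prefix list and some error handling are abbreviated assumptions):

```python
CANONICAL_ORDER = "mafdgqlcbjtpvn"
LONG_EXT_PREFIXES = ('z', 's', 'h', 'x')

def arch_canonicalize(arch):
    # Accept only the base ISA strings the generator knows about.
    if arch[:5] not in ('rv32e', 'rv32i', 'rv32g', 'rv64i', 'rv64g'):
        raise Exception("Unexpected arch: `%s`" % arch[:5])
    new_arch = arch[:5].replace('g', 'imafd')
    # Split the extension string at the first multi-letter prefix.
    idxs = [i for i in (arch.find(p, 5) for p in LONG_EXT_PREFIXES) if i != -1]
    if idxs:
        first = min(idxs)
        long_exts = arch[first:].split('_')
        std_exts = arch[5:first]
    else:
        long_exts = []
        std_exts = arch[5:]
    # Single-letter extensions may end up in long_exts when extensions are
    # appended with '_'; move them back, then sort the true multi-letter ones.
    std_exts += ''.join(e for e in long_exts if len(e) == 1)
    long_exts = sorted(e for e in long_exts if len(e) != 1)
    # Emit single-letter extensions in canonical order.
    for ext in CANONICAL_ORDER:
        if ext in std_exts:
            new_arch += ext
    # Concatenate multi-letter extensions with underscores.
    if long_exts:
        new_arch += '_' + '_'.join(long_exts)
    return new_arch

print(arch_canonicalize("rv32izfh_c"))  # rv32ic_zfh
```

This reproduces the reuse case from the commit message: `rv32izfh` with extension `c` appended canonicalizes to `rv32ic_zfh` rather than the invalid `rv32izfhc`.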



Re: [PATCH v3] Add -fuse-ld= to specify an arbitrary executable as the linker

2020-07-01 Thread Martin Liška

On 6/30/20 5:32 PM, Fāng-ruì Sòng wrote:

There is some concern about clang's -fuse-ld=path
http://lists.llvm.org/pipermail/cfe-dev/2020-June/065710.html and use
of COMPILER_PATH vs PATH.
Shall we introduce another option like -fld-path=path (PATH is used,
COMPILER_PATH is not used)?


I would recommend first landing a patch to LLVM and then we can do
a corresponding change to GCC.

Martin



On Tue, Jun 30, 2020 at 6:04 AM Martin Liška  wrote:


PING^1

On 5/29/20 3:10 AM, Fangrui Song wrote:

On 2020-05-25, Martin Liška wrote:

On 5/22/20 6:42 AM, Fangrui Song wrote:

but I can't fix this one because joining two lines will break the 80-column 
rule.


What about this:

diff --git a/gcc/collect2.c b/gcc/collect2.c
index cc57a20e08b..e5b54b080f7 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -1138,8 +1138,8 @@ main (int argc, char **argv)
   /* Search the ordinary system bin directories
  for `ld' (if native linking) or `TARGET-ld' (if cross).  */
   if (ld_file_name == 0)
-ld_file_name =
-  find_a_file (, full_ld_suffixes[selected_linker], X_OK);
+ld_file_name
+  = find_a_file (, full_ld_suffixes[selected_linker], X_OK);
 }
#ifdef REAL_NM_FILE_NAME

Apart from that, the patch is fine.

Martin


Adding people who may be able to approve and commit on my behalf.

This formatting issue seems small enough. Hopefully a maintainer can do
it for me.









[PATCH] gcov: rename 2 options.

2020-07-01 Thread Martin Liška

Hey.

Even though an option renaming is a problematic change, I still believe
the option names were selected poorly and this patch is an attempt to improve it.

Thoughts?
Thanks,
Martin

gcc/ChangeLog:

* doc/gcov.texi: Rename 2 options.
* gcov.c (print_usage): Rename -i,--json-format to
-j,--json-format and -j,--human-readable to -H,--human-readable.
(process_args): Fix up parsing.

gcc/testsuite/ChangeLog:

* g++.dg/gcov/loop.C: Use -H option instead of -j option.
---
 gcc/doc/gcov.texi|  8 
 gcc/gcov.c   | 14 +++---
 gcc/testsuite/g++.dg/gcov/loop.C |  2 +-
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
index 22e42da2ea6..00f0cdc45f9 100644
--- a/gcc/doc/gcov.texi
+++ b/gcc/doc/gcov.texi
@@ -124,8 +124,8 @@ gcov [@option{-v}|@option{--version}] 
[@option{-h}|@option{--help}]
  [@option{-c}|@option{--branch-counts}]
  [@option{-d}|@option{--display-progress}]
  [@option{-f}|@option{--function-summaries}]
- [@option{-i}|@option{--json-format}]
- [@option{-j}|@option{--human-readable}]
+ [@option{-j}|@option{--json-format}]
+ [@option{-H}|@option{--human-readable}]
  [@option{-k}|@option{--use-colors}]
  [@option{-l}|@option{--long-file-names}]
  [@option{-m}|@option{--demangled-names}]
@@ -180,7 +180,7 @@ Output summaries for each function in addition to the file 
level summary.
 Display help about using @command{gcov} (on the standard output), and
 exit without doing any further processing.
 
-@item -i
+@item -j
 @itemx --json-format
 Output gcov file in an easy-to-parse JSON intermediate format
 which does not require source code for generation.  The JSON
@@ -339,7 +339,7 @@ Fields of the @var{branch} element have following semantics:
 @var{throw}: true when the branch is an exceptional branch
 @end itemize
 
-@item -j
+@item -H
 @itemx --human-readable
 Write counts in human readable format (like 24.6k).
 
diff --git a/gcc/gcov.c b/gcc/gcov.c
index ef93758b26f..21c42c72be1 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -898,8 +898,8 @@ print_usage (int error_p)
   fnotice (file, "  -d, --display-progress  Display progress 
information\n");
   fnotice (file, "  -f, --function-summariesOutput summaries for each 
function\n");
   fnotice (file, "  -h, --help  Print this help, then 
exit\n");
-  fnotice (file, "  -i, --json-format   Output JSON intermediate format 
into .gcov.json.gz file\n");
-  fnotice (file, "  -j, --human-readableOutput human readable 
numbers\n");
+  fnotice (file, "  -j, --json-format   Output JSON intermediate format 
into .gcov.json.gz file\n");
+  fnotice (file, "  -H, --human-readableOutput human readable 
numbers\n");
   fnotice (file, "  -k, --use-colorsEmit colored output\n");
   fnotice (file, "  -l, --long-file-names   Use long output file names 
for included\n\
 source files\n");
@@ -942,8 +942,8 @@ static const struct option options[] =
   { "all-blocks",   no_argument,   NULL, 'a' },
   { "branch-probabilities", no_argument,   NULL, 'b' },
   { "branch-counts",no_argument,   NULL, 'c' },
-  { "json-format",   no_argument,   NULL, 'i' },
-  { "human-readable",no_argument,   NULL, 'j' },
+  { "json-format",   no_argument,   NULL, 'j' },
+  { "human-readable",no_argument,   NULL, 'H' },
   { "no-output",no_argument,   NULL, 'n' },
   { "long-file-names",  no_argument,   NULL, 'l' },
   { "function-summaries",   no_argument,   NULL, 'f' },
@@ -969,7 +969,7 @@ process_args (int argc, char **argv)
 {
   int opt;
 
-  const char *opts = "abcdfhijklmno:pqrs:tuvwx";
+  const char *opts = "abcdfhHjklmno:pqrs:tuvwx";
   while ((opt = getopt_long (argc, argv, opts, options, NULL)) != -1)
 {
   switch (opt)
@@ -992,7 +992,7 @@ process_args (int argc, char **argv)
case 'l':
  flag_long_names = 1;
  break;
-   case 'j':
+   case 'H':
  flag_human_readable_numbers = 1;
  break;
case 'k':
@@ -1023,7 +1023,7 @@ process_args (int argc, char **argv)
case 'u':
  flag_unconditional = 1;
  break;
-   case 'i':
+   case 'j':
  flag_json_format = 1;
  flag_gcov_file = 1;
  break;
diff --git a/gcc/testsuite/g++.dg/gcov/loop.C b/gcc/testsuite/g++.dg/gcov/loop.C
index 24f580634d9..e63cb92e6e6 100644
--- a/gcc/testsuite/g++.dg/gcov/loop.C
+++ b/gcc/testsuite/g++.dg/gcov/loop.C
@@ -24,4 +24,4 @@ int main(int argc, char **argv)
   return 0;  /* count(1) */
 }
 
-/* { dg-final { run-gcov branches { -abj loop.C } } } */
+/* { dg-final { run-gcov branches { -abH loop.C } } } */
--
2.27.0
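Since the JSON report (now behind `-j`) is written gzip-compressed as `.gcov.json.gz`, a small consumer sketch may help; the file name and the `format_version`/`files` fields used here are assumptions for illustration only:

```python
import gzip
import json
import os
import tempfile

def read_gcov_json(path):
    """Load a gzip-compressed gcov JSON report, as produced by `gcov -j`."""
    with gzip.open(path, "rt") as f:
        return json.load(f)

# Self-contained demo: fabricate a tiny report, then read it back.
sample = {"format_version": "1", "files": [{"file": "loop.C", "lines": []}]}
path = os.path.join(tempfile.mkdtemp(), "loop.C.gcov.json.gz")
with gzip.open(path, "wt") as f:
    json.dump(sample, f)

report = read_gcov_json(path)
print(report["files"][0]["file"])
```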



[PATCH] Fortran : ICE in gfc_find_array_ref(): No ref found PR95981

2020-07-01 Thread Mark Eggleston

Please find attached a patch to fix PR95981.  Original patch by Steve Kargl.

OK to commit and backport?

Commit message:

Fortran  :  ICE in gfc_find_array_ref(): No ref found PR95981

When looking for an array reference, allow NULL references.  If
no array reference is found, dim_rank_check should return false.

2020-07-01  Steven G. Kargl  

gcc/fortran/

    PR fortran/95981
    * check.c (dim_rank_check): Allow NULL references in call to
    gfc_find_array_ref and return false if no reference is found.

2020-07-01  Mark Eggleston 

gcc/testsuite/

    PR fortran/95981
    * gfortran.dg/pr95981.f90: New test.

--
https://www.codethink.co.uk/privacy.html

>From 137ae7db6e494eca7ddf4e8a526b0343683b461b Mon Sep 17 00:00:00 2001
From: Mark Eggleston 
Date: Tue, 30 Jun 2020 10:15:05 +0100
Subject: [PATCH] Fortran  :  ICE in gfc_find_array_ref(): No ref found PR95981

When looking for an array reference, allow NULL references.  If
no array reference is found, dim_rank_check should return false.

2020-07-01  Steven G. Kargl  

gcc/fortran/

	PR fortran/95981
	* check.c (dim_rank_check): Allow NULL references in call to
	gfc_find_array_ref and return false if no reference is found.

2020-07-01  Mark Eggleston  

gcc/testsuite/

	PR fortran/95981
	* gfortran.dg/pr95981.f90: New test.
---
 gcc/fortran/check.c   | 4 +++-
 gcc/testsuite/gfortran.dg/pr95981.f90 | 9 +
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr95981.f90

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index de9a45fe4f9..9e792a847bc 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -1142,7 +1142,9 @@ dim_rank_check (gfc_expr *dim, gfc_expr *array, int allow_assumed)
 
   if (array->expr_type == EXPR_VARIABLE)
 {
-  ar = gfc_find_array_ref (array);
+  ar = gfc_find_array_ref (array, true);
+  if (!ar)
+	return false;
   if (ar->as->type == AS_ASSUMED_SIZE
 	  && !allow_assumed
 	  && ar->type != AR_ELEMENT
diff --git a/gcc/testsuite/gfortran.dg/pr95981.f90 b/gcc/testsuite/gfortran.dg/pr95981.f90
new file mode 100644
index 000..7da6e9bd3dd
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr95981.f90
@@ -0,0 +1,9 @@
+! { dg-do compile }
+
+program p
+   type t
+   end type
+   class(t) :: x(:)! { dg-error "must be dummy, allocatable or pointer" }
+   type(t) :: y(size(x,1)) ! { dg-error "must be constant of INTEGER type" }
+end
+
-- 
2.11.0



[patch] Extend store merging to STRING_CST

2020-07-01 Thread Eric Botcazou
Hi,

the GIMPLE store merging pass doesn't merge STRING_CSTs in the general case, 
although they are accepted by native_encode_expr; the reason is that the pass 
only works with integral modes, i.e. with chunks whose size is a power of two.

There are two possible ways of extending it to handle STRING_CSTs: 1) lift the 
condition of integral modes and treat STRING_CSTs as other _CST nodes but with 
arbitrary size; 2) implement a specific and separate handling for STRING_CSTs.

The attached patch implements 2) for the following reasons: on the one hand, 
even in Ada where character strings are first-class citizens, cases where 
merging STRING_CSTs with other *_CST nodes would be possible are quite rare in 
practice; on the other hand, string concatenations happen more naturally and 
frequently thanks to the "&" operator, giving rise to merging opportunities.

These opportunities generally occur after inlining, when a string argument 
passed in a subprogram call is a STRING_CST and is concatenated with other 
STRING_CSTs in the called subprogram.  In this case, the concatenation is 
originally expanded by means of a VLA and calls to BUILT_IN_MEMCPY on the 
array, and becomes of fixed length after inlining; in order to avoid having to 
deal with the calls to BUILT_IN_MEMCPY in the GIMPLE store merging pass, the 
patch also adds an optimization to gimple_fold_builtin_memory_op that turns 
calls to BUILT_IN_MEMCPY of STRING_CSTs into direct assignment of STRING_CSTs.
Unfortunately, doing this in the general case discombobulates the C-oriented 
string-related GIMPLE passes of the optimizer, so it is done only for the 
calls to BUILT_IN_MEMCPY that have been generated during gimplification for 
variable-sized assignments, which are thus marked with a new flag.

Tested on x86-64/Linux, OK for the mainline?


2020-07-01  Eric Botcazou  

* gimple-fold.c (gimple_fold_builtin_memory_op): Fold calls that were
initially created for the assignment of a variable-sized object.
* gimple-ssa-store-merging.c (struct merged_store_group): Document
STRING_CST for rhs_code field.
Add string_concatenation boolean field.
(merged_store_group::merged_store_group): Initialize it as well as
bit_insertion here.
(merged_store_group::do_merge): Set it upon seeing a STRING_CST.  Also
set bit_insertion here upon seeing a BIT_INSERT_EXPR.
(merged_store_group::apply_stores): Clear it for small regions.  Do not
create a power-of-2-sized buffer if it is still true.  And do not set
bit_insertion here again.
(encode_tree_to_bitpos): Deal with BLKmode for the expression.
(merged_store_group::can_be_merged_into): Deal with STRING_CST.
(imm_store_chain_info::coalesce_immediate_stores): Set bit_insertion
to true after changing MEM_REF stores into BIT_INSERT_EXPR stores.
(count_multiple_uses): Return 0 for STRING_CST.
(split_group): Do not split the group for a string concatenation.
(imm_store_chain_info::output_merged_store): Constify and rename some
local variables.  Build an array type as destination type for a string
concatenation, as well as a zero mask, and call build_string to build
the source.
(lhs_valid_for_store_merging_p): Return true for VIEW_CONVERT_EXPR.
(pass_store_merging::process_store): Accept STRING_CST on the RHS.
* gimple.h (gimple_call_alloca_for_var_p): New accessor function.
* gimplify.c (gimplify_modify_expr_to_memcpy): Set alloca_for_var.
* tree.h (CALL_ALLOCA_FOR_VAR_P): Document it for BUILT_IN_MEMCPY.


2020-07-01  Eric Botcazou  

* gnat.dg/opt87.adb: New test.
* gnat.dg/opt87_pkg.ad[sb]: New helper.

-- 
Eric Botcazou

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 4e3de95d2d2..9a467e80aae 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -840,6 +840,35 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
 	}
 	}
 
+  /* If this was a variable-sized copy operation that has been turned
+	 into a fixed-size string store, typically after inlining, then do
+	 the fixed-size store explicitly.  We might be able to concatenate
+	 several of them later into a single string store.  */
+  if (tree_fits_uhwi_p (len)
+	  && gimple_call_alloca_for_var_p (stmt)
+	  && TREE_CODE (src) == ADDR_EXPR
+	  && TREE_CODE (TREE_OPERAND (src, 0)) == STRING_CST
+	  && tree_int_cst_equal
+	 (TYPE_SIZE_UNIT (TREE_TYPE (TREE_OPERAND (src, 0))), len))
+	{
+	  tree new_src = TREE_OPERAND (src, 0);
+	  tree new_dest
+	= fold_build2 (MEM_REF, TREE_TYPE (new_src), dest, off0);
+	  gimple *new_stmt = gimple_build_assign (new_dest, new_src);
+	  gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+	  gimple_set_vdef (new_stmt, gimple_vdef (stmt));
+	  if (gimple_vdef (new_stmt)
+	  && TREE_CODE (gimple_vdef (new_stmt)) == SSA_NAME)
+	SSA_NAME_DEF_STMT (gimple_vdef (new_stmt)) = new_stmt;

[PATCH] Fortran : ICE in gfc_check_pointer_assign PR95612

2020-07-01 Thread Mark Eggleston
Please find attached a patch that fixes PR95612.  The original
patch is by Steve Kargl.


OK to commit and backport?

Commit message:

Fortran  : ICE in gfc_check_pointer_assign PR95612

Output an error if the right-hand value is a zero-sized array or
does not have a symbol tree; otherwise continue checking.

2020-07-01  Steven G. Kargl  

gcc/fortran/

    PR fortran/95612
    * expr.c (gfc_check_pointer_assign): Output an error if
    rvalue is a zero-sized array or if rvalue does not have
    a symbol tree; otherwise continue checking.

2020-07-01  Mark Eggleston 

gcc/testsuite/

    PR fortran/95612
    * gfortran.dg/pr95612.f90: New test.

--
https://www.codethink.co.uk/privacy.html

From a9b7de8b2002605e1bfc9c11a7fb3708bc1ef371 Mon Sep 17 00:00:00 2001
From: Mark Eggleston 
Date: Thu, 11 Jun 2020 11:05:40 +0100
Subject: [PATCH] Fortran  : ICE in gfc_check_pointer_assign PR95612

Output an error if the right-hand value is a zero-sized array or
does not have a symbol tree; otherwise continue checking.

2020-07-01  Steven G. Kargl  

gcc/fortran/

	PR fortran/95612
	* expr.c (gfc_check_pointer_assign): Output an error if
	rvalue is a zero-sized array or if rvalue does not have
	a symbol tree; otherwise continue checking.

2020-07-01  Mark Eggleston  

gcc/testsuite/

	PR fortran/95612
	* gfortran.dg/pr95612.f90: New test.
---
 gcc/fortran/expr.c                    | 15 ++++++++++++++-
 gcc/testsuite/gfortran.dg/pr95612.f90 |  7 +++++++
 2 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr95612.f90

diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index 8daa7bb8d06..569f4d9bf06 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -4271,7 +4271,20 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_expr *rvalue,
   gfc_symbol *sym;
   bool target;
 
-  gcc_assert (rvalue->symtree);
+  if (gfc_is_size_zero_array (rvalue))
+	{
+	  gfc_error ("Zero-sized array detected at %L where an entity with "
+		 "the TARGET attribute is expected", &rvalue->where);
+	  return false;
+	}
+  else if (!rvalue->symtree)
+	{
+	  gfc_error ("Pointer assignment target in initialization expression "
+		 "does not have the TARGET attribute at %L",
+		 &rvalue->where);
+	  return false;
+	}
+
   sym = rvalue->symtree->n.sym;
 
   if (sym->ts.type == BT_CLASS && sym->attr.class_ok)
diff --git a/gcc/testsuite/gfortran.dg/pr95612.f90 b/gcc/testsuite/gfortran.dg/pr95612.f90
new file mode 100644
index 0000000..b3cac8c1d81
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr95612.f90
@@ -0,0 +1,7 @@
+! { dg-do compile }
+
+program p
+   integer, pointer :: y(:) => shape(1)   ! { dg-error "Zero-sized array detected at .1. where an entity with the TARGET attribute is expected" }
+   integer, pointer :: z(:) => shape([1]) ! { dg-error "Pointer assignment target in initialization expression does not have the TARGET attribute at .1." }
+end
+
-- 
2.11.0