Re: [PATCH] cache compute_objsize results in strlen/sprintf (PR 97373)

2020-11-04 Thread Richard Biener via Gcc-patches
On Thu, Nov 5, 2020 at 1:59 AM Martin Sebor via Gcc-patches
 wrote:
>
> To determine the target of a pointer expression and the offset into
> it, the increasingly widely used compute_objsize function traverses
> the IL following the DEF statements of pointer variables, aggregating
> offsets from POINTER_PLUS assignments along the way.  It does that
> for many statements that involve pointers, including calls to
> built-in functions and (so far only) accesses to char arrays.  When
> a function has many such statements with pointers to the same objects
> but with different offsets, the traversal ends up visiting the same
> pointer assignments repeatedly and unnecessarily.
>
> To avoid this repeated traversal, the attached patch adds the ability
> to cache results obtained in prior calls to the function.  The cache
> is optional and only used when enabled.
>
> To exercise the cache I have enabled it for the strlen pass (which
> is probably the heaviest compute_objsize user).  That happens to
> resolve PR 97373 which tracks the pass' failure to detect sprintf
> overflowing allocated buffers at a non-constant offset.  I thought
> about making this a separate patch but the sprintf/strlen changes
> are completely mechanical so it didn't seem worth the effort.
>
> In the benchmarking I've done the cache isn't a huge win there but
> it does have a measurable difference in the project I'm wrapping up
> where most pointer assignments need to be examined.  The space used
> for the cache is negligible on average: fewer than 20 entries per
> Glibc function and about 6 for GCC.  The worst case in Glibc is
> 6 thousand entries and 10k in GCC.  Since the entries are sizable
> (216 bytes each) the worst case memory consumption can be reduced
> by adding a level of indirection.  A further savings can be obtained
> by replacing some of the offset_int members of the entries with
> HOST_WIDE_INT.
>
> The efficiency benefits of the cache should increase further as more
> of the access checking code is integrated into the same pass.  This
> should eventually include the checking currently done in the built-in
> expanders.
>
> Tested on x86_64-linux, along with Glibc and Binutils/GDB.

I'm quite sure the objsz pass already has a cache, why not
re-use it instead of piggy-backing another one onto its machinery?

Richard.

> Martin
>
> PS The patch adds the new pointer_query class (loosely modeled on
> range_query) to builtins.{h,c}.  This should be only temporary,
> until the access checking code is moved into a file (and ultimately
> a pass) of its own.


Re: [PATCH 4/4] IBM Z: Test long doubles in vector registers

2020-11-04 Thread Andreas Krebbel via Gcc-patches
On 04.11.20 23:19, Ilya Leoshkevich wrote:
> On Wed, 2020-11-04 at 18:28 +0100, Andreas Krebbel wrote:
>> These tests all use the -mzvector option but do not appear to make
>> use of the z vector language extensions. I think that option could
>> be removed. Then these tests should be moved to the vector subdir.
> 
> Will change, thanks!
> 
>> You could do the asm scanning also in dg-do run tests.
> 
> This doesn't seem to work.  For example, if I add 
> 
> /* { dg-final { scan-assembler-times {aaa} 999 } } */
> 
> to long-double-from-double-run.c, it won't fail.

You will have to add --save-temps to dg-options to make it work. Otherwise
the scan test will stay unresolved.
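
For illustration, a run test that also scans the assembly might look like the
sketch below. The instruction name in the scan pattern (wflld) and the option
set are assumptions made for the example, not taken from the patch; the point
is only that --save-temps keeps the .s file around so the dg-final scan can
be resolved.

```c
/* { dg-do run } */
/* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */
#include <assert.h>

/* Widen a double to long double.  noinline keeps the conversion in its
   own function so that the scan below has something to match.  */
__attribute__ ((noinline)) static long double
long_double_from_double (double x)
{
  return x;
}

int
main (void)
{
  assert (long_double_from_double (42.0) == 42.0L);
  return 0;
}

/* Assumed instruction name; adjust to whatever the test should match.  */
/* { dg-final { scan-assembler-times {\n\twflld\t} 1 } } */
```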

Andreas

> 
>>
>> Andreas
>>
>>
>> On 03.11.20 22:46, Ilya Leoshkevich wrote:
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2020-11-03  Ilya Leoshkevich  
>>>
>>> * gcc.target/s390/zvector/long-double-callee-abi-scan.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-caller-abi-run.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-caller-abi-scan.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-copysign-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-copysign-scan.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-fprx2-constant.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-from-double-run.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-from-double-scan.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-from-float-run.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-from-float-scan.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-from-i16-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-from-i16-scan.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-from-i32-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-from-i32-scan.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-from-i64-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-from-i64-scan.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-from-i8-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-from-i8-scan.c: New test.
>>> * gcc.target/s390/zvector/long-double-from-u16-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-from-u16-scan.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-from-u32-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-from-u32-scan.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-from-u64-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-from-u64-scan.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-from-u8-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-from-u8-scan.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-double-run.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-to-double-scan.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-to-float-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-float-scan.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-to-i16-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-i16-scan.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-i32-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-i32-scan.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-i64-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-i64-scan.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-i8-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-i8-scan.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-u16-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-u16-scan.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-u32-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-u32-scan.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-u64-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-u64-scan.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-u8-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-to-u8-scan.c: New test.
>>> * gcc.target/s390/zvector/long-double-vec-duplicate.c: New
>>> test.
>>> * gcc.target/s390/zvector/long-double-wf.h: New test.
>>> * gcc.target/s390/zvector/long-double-wfaxb-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-wfaxb-scan.c: New test.
>>> * gcc.target/s390/zvector/long-double-wfaxb.c: New test.
>>> * gcc.target/s390/zvector/long-double-wfcxb-0001.c: New test.
>>> * gcc.target/s390/zvector/long-double-wfcxb-0111.c: New test.
>>> * gcc.target/s390/zvector/long-double-wfcxb-1011.c: New test.
>>> * gcc.target/s390/zvector/long-double-wfcxb-1101.c: New test.
>>> * gcc.target/s390/zvector/long-double-wfdxb-run.c: New test.
>>> * gcc.target/s390/zvector/long-double-wfdxb-scan.c: New test.
>>> * 

Re: [PATCH] middle-end: Store and use the SLP instance kind when aborting load/store lanes

2020-11-04 Thread Richard Biener
On Thu, 5 Nov 2020, Tamar Christina wrote:

> Hi All,
> 
> This patch stores the SLP instance kind in the SLP instance so that we can use
> it later when detecting load/store lanes support.
> 
> This also changes the load/store lane support check to only check if the SLP
> kind is a store.  This means that in order for the load/lanes to work all
> instances must be of kind store.
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> 
> Ok for master?

OK.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop.c (vect_analyze_loop_2): Check kind.
>   * tree-vect-slp.c (vect_build_slp_instance): New.
>   (enum slp_instance_kind): Move to...
>   * tree-vectorizer.h (enum slp_instance_kind): .. Here
>   (SLP_INSTANCE_KIND): New.
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: [PATCH 3/4] IBM Z: Store long doubles in vector registers when possible

2020-11-04 Thread Andreas Krebbel via Gcc-patches
On 04.11.20 23:12, Ilya Leoshkevich wrote:
> On Wed, 2020-11-04 at 18:16 +0100, Andreas Krebbel wrote:
>> On 03.11.20 22:45, Ilya Leoshkevich wrote:
>>> On z14+, there are instructions for working with 128-bit floats (long
>>> doubles) in vector registers.  It's beneficial to use them instead of
>>> instructions that operate on floating point register pairs, because
>>> this allows storing 4 times more data in registers at a time, relieving
>>> register pressure.  The performance of the new instructions is almost
>>> the same.
>>>
>>> Implement by storing TFmode values in vector registers on z14+.  Since
>>> not all operations are available with the new instructions, keep the
>>> old ones using the new FPRX2 mode, and convert between it and TFmode
>>> when necessary (these are called "forwarder" expanders below).  Change
>>> the existing TFmode expanders to call either new- or old-style ones
>>> depending on whether we are on z14+ or older machines ("dispatcher"
>>> expanders).
>>>
>>> gcc/ChangeLog:
>>>
>>> 2020-11-03  Ilya Leoshkevich  
>>>
>>> * config/s390/s390-modes.def (FPRX2): New mode.
>>> * config/s390/s390-protos.h (s390_fma_allowed_p): New function.
>>> * config/s390/s390.c (s390_fma_allowed_p): Likewise.
>>> (s390_build_signbit_mask): Support 128-bit masks.
>>> (print_operand): Support printing the second word of a TFmode
>>> operand as vector register.
>>> (constant_modes): Add FPRX2mode.
>>> (s390_class_max_nregs): Return 1 for TFmode on z14+.
>>> (s390_is_fpr128): New function.
>>> (s390_is_vr128): Likewise.
>>> (s390_can_change_mode_class): Use s390_is_fpr128 and
>>> s390_is_vr128 in order to determine whether mode refers to a
>>> FPR
>>> pair or to a VR.
>>> * config/s390/s390.h (EXPAND_MOVTF): New macro.
>>> (EXPAND_TF): Likewise.
>>> * config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF
>>> alias.
>>> (ALL): Add FPRX2.
>>> (FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-.
>>> (FP): Likewise.
>>> (FP_ANYTF): New mode iterator.
>>> (BFP): Add FPRX2 for z14+, restrict TFmode to z13-.
>>> (TD_TF): Likewise.
>>> (xde): Add FPRX2.
>>> (nBFP): Likewise.
>>> (nDFP): Likewise.
>>> (DSF): Likewise.
>>> (DFDI): Likewise.
>>> (SFSI): Likewise.
>>> (DF): Likewise.
>>> (SF): Likewise.
>>> (fT0): Likewise.
>>> (bt): Likewise.
>>> (_d): Likewise.
>>> (HALF_TMODE): Likewise.
>>> (tf_fpr): New mode_attr.
>>> (type): New mode_attr.
>>> (*cmp_ccz_0): Use type instead of mode with fsimp.
>>> (*cmp_ccs_0_fastmath): Likewise.
>>> (*cmptf_ccs): New pattern for wfcxb.
>>> (*cmptf_ccsfps): New pattern for wfkxb.
>>> (mov): Rename to mov.
>>> (signbit2): Rename to signbit2.
>>> (isinf2): Renamed to isinf2.
>>> (*TDC_insn_): Use type instead of mode with fsimp.
>>> (fixuns_trunc2): Rename to
>>> fixuns_trunc2.
>>> (fix_trunctf2): Rename to fix_trunctf2_fpr.
>>> (floatdi2): Rename to floatdi2, use type
>>> instead of mode with itof.
>>> (floatsi2): Rename to floatsi2, use type
>>> instead of mode with itof.
>>> (*floatuns2): Use type instead of mode for
>>> itof.
>>> (floatuns2): Rename to
>>> floatuns2.
>>> (trunctf2): Rename to trunctf2_fpr, use type
>>> instead
>>> of mode with fsimp.
>>> (extend2): Rename to
>>> extend2.
>>> (2): Rename to
>>> 2, use type instead of
>>> mode with fsimp.
>>> (rint2): Rename to rint2, use
>>> type instead of mode with fsimp.
>>> (2): Use type instead of mode for
>>> fsimp.
>>> (rint2): Likewise.
>>> (trunc2): Rename to
>>> trunc2.
>>> (trunc2): Rename to
>>> trunc2.
>>> (extend2): Rename to
>>> extend2.
>>> (extend2): Rename to
>>> extend2.
>>> (add3): Rename to add3, use type instead of
>>> mode with fsimp.
>>> (*add3_cc): Use type instead of mode with fsimp.
>>> (*add3_cconly): Likewise.
>>> (sub3): Rename to sub3, use type instead of
>>> mode with fsimp.
>>> (*sub3_cc): Use type instead of mode with fsimp.
>>> (*sub3_cconly): Likewise.
>>> (mul3): Rename to mul3, use type instead of
>>> mode with fsimp.
>>> (fma4): Restrict using s390_fma_allowed_p.
>>> (fms4): Restrict using s390_fma_allowed_p.
>>> (div3): Rename to div3, use type instead of
>>> mode with fdiv.
>>> (neg2): Rename to neg2.
>>> (*neg2_cc): Use type instead of mode with fsimp.
>>> (*neg2_cconly): Likewise.
>>> (*neg2_nocc): Likewise.
>>> (*neg2): Likewise.
>>> (abs2): Rename to abs2, use type instead of
>>> mode with fdiv.
>>> (*abs2_cc): Use type instead of mode with fsimp.
>>> (*abs2_cconly): Likewise.
>>> (*abs2_nocc): Likewise.
>>> (*abs2): Likewise.
>>> (*negabs2_cc): Likewise.
>>> (*negabs2_cconly): Likewise.
>>> (*negabs2_nocc): Likewise.
>>> (*negabs2): 

Re: [00/32] C++ 20 Modules

2020-11-04 Thread Boris Kolpackov
Nathan Sidwell  writes:

> Here is the implementation of C++20 modules that I have been developing 
> on the devel/c++-modules branch over the last few years.

Congrats on reaching this point.


> It is not a complete implementation.  The major missing pieces are:
> 
> [...]

Building C++20 modules requires non-trivial integration between the
compiler and the build system. This patch set introduces a module
mapper, a novel mechanism for such integration. Has it been tried
by any non-toy build system and on any real project?

If the answer is "no", then by shipping modules in GCC 11 are we
making any likely changes in this area impossible or unnecessarily
difficult?

To give an example of such a likely change, currently the mapper
has a notion of the central module repository directory that is
used to resolve all the relative CMI (compiled module interface[1])
paths (even paths like ./foo.gcm). However, this model will not
apply to all build systems. For example, in build2 (the build
system I am involved with), there can be no such central place
since a project can pull dependencies that are built in other
places. Currently, the only way to disable this repository
semantics is to use absolute CMI paths throughout.
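
To make the repository semantics concrete, here is a minimal sketch of the
resolution rule as I described it above. It is not the mapper's code:
resolve_cmi_path and its parameters are hypothetical, and only POSIX-style
paths are handled.

```c
#include <stdio.h>

/* Hypothetical helper: write into BUF the path that would be opened for
   the CMI name CMI.  Relative names, including ones such as "./foo.gcm",
   are taken relative to the repository directory REPO; absolute names
   are used unchanged.  Returns 0 on success, -1 if BUF is too small.  */
static int
resolve_cmi_path (const char *repo, const char *cmi, char *buf, size_t size)
{
  int n;

  if (cmi[0] == '/')
    n = snprintf (buf, size, "%s", cmi);
  else
    n = snprintf (buf, size, "%s/%s", repo, cmi);

  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

With a build system that has no single repository directory, every name
passed in would have to take the first branch, which is the "absolute CMI
paths throughout" workaround mentioned above.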

Also, FWIW, I've attempted such a build system integration with
build2 back in 2019. While the overall idea of the module mapper
worked well, I had to make substantial extensions in my own
branch[2] of Nathan's c++-modules (also described in this[3]
WG21 paper). AFAIK, these extensions haven't yet been considered
for merging into c++-modules.

[1] BTW, SG15 seems to have settled on the BMI (built module
interface) term instead of CMI:


https://github.com/cplusplus/modules-ecosystem-tr/blob/master/definitions.tex

[2] https://github.com/boris-kolpackov/gcc-cxx-modules-ex

The branch used to live on gcc.gnu.org/git but was dropped as
part of the svn-to-git migration.

[3] https://wg21.link/P1842


Re: [PATCH v3] pass: Run cleanup passes before SLP [PR96789]

2020-11-04 Thread Kewen.Lin via Gcc-patches
Hi Lyon,

Thanks for reporting and sorry for the failure.

>> The patch was updated per your comments above, re-tested on Power8
>> and committed in r11-4637.
>>
> 
> The new test gcc.dg/tree-ssa/pr96789.c fails on arm:
> FAIL: gcc.dg/tree-ssa/pr96789.c scan-tree-dump dse3 "Deleted dead store:.*tmp"
> 
> Can you check?
> 

Could you provide the configure command you used?

I tried the one from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96376#c3,
but I can't reproduce it with options -O2 -funroll-loops -ftree-vectorize
-fdump-tree-dse-details.  I guess I must be missing something.

BR,
Kewen


[PATCH] Replace dep_list_size with dep_list_costs for better scheduling

2020-11-04 Thread Jojo R
gcc/
* haifa-sched.c (dep_list_costs): New.
(rank_for_schedule): Use dep_list_costs.

---
 gcc/haifa-sched.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 350178c82b8..62d1816a55d 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -1584,6 +1584,28 @@ dep_list_size (rtx_insn *insn, sd_list_types_def list)
   return nodbgcount;
 }
 
+/* Compute the costs of nondebug deps in list LIST for INSN.  */
+
+static int
+dep_list_costs (rtx_insn *insn, sd_list_types_def list)
+{
+  sd_iterator_def sd_it;
+  dep_t dep;
+  int costs = 0;
+
+  FOR_EACH_DEP (insn, list, sd_it, dep)
+    {
+      if (!DEBUG_INSN_P (DEP_CON (dep))
+          && !DEBUG_INSN_P (DEP_PRO (dep)))
+        {
+          if (DEP_COST (dep) != 0)
+            costs++;
+        }
+    }
+
+  return costs;
+}
+
 bool sched_fusion;
 
 /* Compute the priority number for INSN.  */
@@ -2795,8 +2817,8 @@ rank_for_schedule (const void *x, const void *y)
  This gives the scheduler more freedom when scheduling later
  instructions at the expense of added register pressure.  */
 
-  val = (dep_list_size (tmp2, SD_LIST_FORW)
-         - dep_list_size (tmp, SD_LIST_FORW));
+  val = (dep_list_costs (tmp2, SD_LIST_FORW)
+         - dep_list_costs (tmp, SD_LIST_FORW));
 
   if (flag_sched_dep_count_heuristic && val != 0)
 return rfs_result (RFS_DEP_COUNT, val, tmp, tmp2);
-- 
2.24.3 (Apple Git-128)
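
To illustrate what the new function changes in the comparison, here is a
standalone sketch with a simplified dependency record in place of dep_t
(struct toy_dep and both helpers are invented for the example, and the
debug-insn checks are reduced to a flag). As posted, dep_list_costs counts
the dependencies whose cost is nonzero rather than summing their costs, so
it differs from dep_list_size only when some dependencies have zero cost.

```c
#include <stddef.h>

/* Simplified stand-in for a scheduler dependency.  */
struct toy_dep
{
  int is_debug;  /* Producer or consumer is a debug insn.  */
  int cost;      /* Cost of the dependency; 0 means free.  */
};

/* Mirror of dep_list_size: count all nondebug deps.  */
static int
toy_dep_list_size (const struct toy_dep *deps, size_t n)
{
  int count = 0;
  for (size_t i = 0; i < n; i++)
    if (!deps[i].is_debug)
      count++;
  return count;
}

/* Mirror of the posted dep_list_costs: count nondebug deps that have
   a nonzero cost.  */
static int
toy_dep_list_costs (const struct toy_dep *deps, size_t n)
{
  int costs = 0;
  for (size_t i = 0; i < n; i++)
    if (!deps[i].is_debug && deps[i].cost != 0)
      costs++;
  return costs;
}
```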



RE: [PATCH 1/5] [PR target/96342] Change field "simdlen" into poly_uint64

2020-11-04 Thread yangyang (ET)
> > Thanks for installing the patch. As you mentioned in the PR, stage1 of
> > GCC 11 is going to close in a few weeks, and the GCC Development Plan
> > describes stage3 as "During this two-month period, the only
> > (non-documentation) changes that may be made are changes that fix bugs
> > or new ports which do not require changes to other parts of the
> > compiler.  New functionality may not be introduced during this period."
> > So does it mean that the remaining four patches of this feature need to
> > wait for GCC 12 stage1 to get installed?
> 
> Any target-independent patches would need to be posted by the end of next
> week to get into GCC 11.  There's a bit more leeway for SVE-specific pieces in
> config/aarch64, since those have a lower impact.
> 
> Thanks,
> Richard

Thanks, all the remaining patches seem to contain changes to the shared code,
and I haven't finished them yet. So I might wait for GCC 12 stage1 to get them in.

Thanks,
Yang Yang


Re: PowerPC: Map IEEE 128-bit long double built-in functions

2020-11-04 Thread Michael Meissner via Gcc-patches
On Wed, Nov 04, 2020 at 06:13:57PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Oct 22, 2020 at 06:03:46PM -0400, Michael Meissner wrote:
> > To map the scanf functions,  is mapped to __isoc99_ieee128.
> 
> Is that correct?  What if you are compiling for c90?

That is the name in GLIBC.

> > * config/rs6000/rs6000.c (rs6000_mangle_decl_assembler_name): Add
> > support for mapping built-in function names for long double
> > built-in functions if long double is IEEE 128-bit.
> 
> "Map the built-in function names" etc.
> 
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -26893,56 +26893,127 @@ rs6000_globalize_decl_name (FILE * stream, tree decl)
> > library before you can switch the real*16 type at compile time.
> >  
> > We use the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change this name.  We
> > -   only do this if the default is that long double is IBM extended double, and
> > -   the user asked for IEEE 128-bit.  */
> > +   only do this transformation if the __float128 type is enabled.  This
> > +   prevents us from doing the transformation on older 32-bit ports that might
> > +   have enabled using IEEE 128-bit floating point as the default long double
> > +   type.  */
> 
> I don't see why that is the right thing to do?  You'll have exactly
> these same problems on 32-bit!

The comment is referring to the remote possibility that 32-bit VxWorks was
configured to use IEEE long double as the only long double type.  However, you
had to use several non-default options to do this.  All other historic 32-bit
implementations did not allow IEEE long double.

> 
> Hrm, we talked about that before I guess?  Do you just need to change
> this comment now?
> 
> > +   default:
> > + break;
> > +   }
> 
> That is useless, just leave it out?  The end of a switch will always
> fall through, and it is normal idiom to use that.

And it will get a warning that some enumeration elements did not have case
elements.

> > +  /* Update the __builtin_*printf && __builtin_*scanf functions.  */
> 
> "and" :-)
> 
> > + else if (name[len - 1] == 'l')
> > +   {
> > + bool uses_ieee128_p = false;
> > + tree type = TREE_TYPE (decl);
> > + machine_mode ret_mode = TYPE_MODE (type);
> > +
> > + /* See if the function returns a IEEE 128-bit floating point type or
> > +complex type.  */
> > + if (ret_mode == TFmode || ret_mode == TCmode)
> > +   uses_ieee128_p = true;
> > + else
> > {
> > - machine_mode arg_mode = TYPE_MODE (arg);
> > - if (arg_mode == TFmode || arg_mode == TCmode)
> > + function_args_iterator args_iter;
> > + tree arg;
> 
> (declare that right before the FOREACH)
> 
> > +
> > + /* See if the function passes a IEEE 128-bit floating point type
> > +or complex type.  */
> > + FOREACH_FUNCTION_ARGS (type, arg, args_iter)
> > {
> > - uses_ieee128_p = true;
> > - break;
> > + machine_mode arg_mode = TYPE_MODE (arg);
> > + if (arg_mode == TFmode || arg_mode == TCmode)
> > +   {
> > + uses_ieee128_p = true;
> > + break;
> > +   }
> > }
> > }
> 
> There is no point in doing all these early-outs in an initialisation
> function, making it much harder to read :-(
> 
> > + /* If we passed or returned an IEEE 128-bit floating point type,
> > +change the name.  Use __ieee128, instead of l.  */
> > + if (uses_ieee128_p)
> > +   newname = xasprintf ("__%.*sieee128", (int)(len - 1), name);
> 
> (int) (len - 1)
> 
> Please comment what the - 1 does, and/or what this is for at all.  (In
> the code / in a comment, not to me, I figured it out after a while.)

The comment just above the line says what it does.  I.e. it does not copy the
'l' in the name (i.e. sinl).
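
As a standalone illustration of that format string (a sketch, not the
rs6000.c code: map_long_double_name is a made-up helper, and snprintf into
a caller-supplied buffer stands in for xasprintf):

```c
#include <stdio.h>
#include <string.h>

/* Illustrative only: build "__<NAME without its trailing 'l'>ieee128"
   in BUF, using the same "%.*s" precision trick as the patch.  Returns 0
   on success, -1 if NAME does not end in 'l' or BUF is too small.  */
static int
map_long_double_name (const char *name, char *buf, size_t size)
{
  size_t len = strlen (name);
  if (len == 0 || name[len - 1] != 'l')
    return -1;

  /* The precision LEN - 1 copies everything but the final 'l'.  */
  int n = snprintf (buf, size, "__%.*sieee128", (int) (len - 1), name);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

So "sinl" becomes "__sinieee128" and "sqrtl" becomes "__sqrtieee128".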

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Ping: [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET

2020-11-04 Thread Xionghu Luo via Gcc-patches

Ping.

On 2020/10/10 16:08, Xionghu Luo wrote:

Originated from
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554240.html
with patch split and some refinement per review comments.

Patch of IFN VEC_SET for ARRAY_REF(VIEW_CONVERT_EXPR) is committed,
this patch set enables expanding IFN VEC_SET for Power9 and Power8
with specific instruction sequences.

Xionghu Luo (4):
   rs6000: Change rs6000_expand_vector_set param
   rs6000: Support variable insert and Expand vec_insert in expander [PR79251]
   rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8
   rs6000: Update testcases' instruction count

  gcc/config/rs6000/rs6000-c.c  |  44 +++--
  gcc/config/rs6000/rs6000-call.c   |   2 +-
  gcc/config/rs6000/rs6000-protos.h |   3 +-
  gcc/config/rs6000/rs6000.c| 181 +-
  gcc/config/rs6000/vector.md   |   4 +-
  .../powerpc/fold-vec-insert-char-p8.c |   8 +-
  .../powerpc/fold-vec-insert-char-p9.c |  12 +-
  .../powerpc/fold-vec-insert-double.c  |  11 +-
  .../powerpc/fold-vec-insert-float-p8.c|   6 +-
  .../powerpc/fold-vec-insert-float-p9.c|  10 +-
  .../powerpc/fold-vec-insert-int-p8.c  |   6 +-
  .../powerpc/fold-vec-insert-int-p9.c  |  11 +-
  .../powerpc/fold-vec-insert-longlong.c|  10 +-
  .../powerpc/fold-vec-insert-short-p8.c|   6 +-
  .../powerpc/fold-vec-insert-short-p9.c|   8 +-
  .../gcc.target/powerpc/pr79251-run.c  |  28 +++
  gcc/testsuite/gcc.target/powerpc/pr79251.h|  19 ++
  gcc/testsuite/gcc.target/powerpc/pr79251.p8.c |  17 ++
  gcc/testsuite/gcc.target/powerpc/pr79251.p9.c |  18 ++
  .../gcc.target/powerpc/vsx-builtin-7.c|   4 +-
  20 files changed, 337 insertions(+), 71 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251-run.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.h
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.p9.c



--
Thanks,
Xionghu


[PATCH] cache compute_objsize results in strlen/sprintf (PR 97373)

2020-11-04 Thread Martin Sebor via Gcc-patches

To determine the target of a pointer expression and the offset into
it, the increasingly widely used compute_objsize function traverses
the IL following the DEF statements of pointer variables, aggregating
offsets from POINTER_PLUS assignments along the way.  It does that
for many statements that involve pointers, including calls to
built-in functions and (so far only) accesses to char arrays.  When
a function has many such statements with pointers to the same objects
but with different offsets, the traversal ends up visiting the same
pointer assignments repeatedly and unnecessarily.

To avoid this repeated traversal, the attached patch adds the ability
to cache results obtained in prior calls to the function.  The cache
is optional and only used when enabled.
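
As a rough illustration of the caching idea (this is not the code in the
patch: struct ptr_node, walk_defs and the fixed-size cache array below are
invented for the example), a walk over a chain of pointer definitions can
record the base object and aggregate offset for each pointer it visits, so
that a later query for the same pointer returns without re-walking the
chain. Passing a null cache gives the uncached behavior.

```c
#include <stddef.h>

/* Hypothetical stand-in for an SSA pointer: either a base object
   (def == NULL) or "def + off", like a POINTER_PLUS assignment.  */
struct ptr_node
{
  int id;                      /* Cache index; must be < MAX_PTRS.  */
  const struct ptr_node *def;  /* Pointer this one is derived from.  */
  long off;                    /* Offset added by this definition.  */
};

struct ref_entry
{
  int valid;
  const struct ptr_node *base; /* Object ultimately pointed into.  */
  long off;                    /* Aggregate offset into BASE.  */
};

enum { MAX_PTRS = 64 };

struct ptr_cache
{
  struct ref_entry entries[MAX_PTRS];
  unsigned hits, misses;
};

/* Return the base object of P and store the aggregate offset in *OFF.
   CACHE may be null, in which case nothing is cached.  */
static const struct ptr_node *
walk_defs (const struct ptr_node *p, long *off, struct ptr_cache *cache)
{
  if (cache && cache->entries[p->id].valid)
    {
      cache->hits++;
      *off = cache->entries[p->id].off;
      return cache->entries[p->id].base;
    }

  const struct ptr_node *base = p;
  long total = 0;
  if (p->def)
    {
      /* Resolve the pointer P is derived from, then add P's offset.  */
      base = walk_defs (p->def, &total, cache);
      total += p->off;
    }

  if (cache)
    {
      cache->misses++;
      cache->entries[p->id].valid = 1;
      cache->entries[p->id].base = base;
      cache->entries[p->id].off = total;
    }
  *off = total;
  return base;
}
```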

To exercise the cache I have enabled it for the strlen pass (which
is probably the heaviest compute_objsize user).  That happens to
resolve PR 97373 which tracks the pass' failure to detect sprintf
overflowing allocated buffers at a non-constant offset.  I thought
about making this a separate patch but the sprintf/strlen changes
are completely mechanical so it didn't seem worth the effort.

In the benchmarking I've done the cache isn't a huge win there but
it does have a measurable difference in the project I'm wrapping up
where most pointer assignments need to be examined.  The space used
for the cache is negligible on average: fewer than 20 entries per
Glibc function and about 6 for GCC.  The worst case in Glibc is
6 thousand entries and 10k in GCC.  Since the entries are sizable
(216 bytes each) the worst case memory consumption can be reduced
by adding a level of indirection.  A further savings can be obtained
by replacing some of the offset_int members of the entries with
HOST_WIDE_INT.
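
For a sense of scale, the worst cases above work out to 6000 * 216 =
1,296,000 bytes for Glibc and 10000 * 216 = 2,160,000 bytes for GCC. The
sketch below compares that with one possible form of indirection, in which
each slot holds only a small index and full entries exist for the slots
actually used. The index width and the number of used slots are assumptions
made for the example, not measurements.

```c
#include <stddef.h>

enum { ENTRY_BYTES = 216 };   /* Entry size quoted above.  */

/* Bytes used when every slot holds a full entry.  */
static size_t
direct_bytes (size_t slots)
{
  return slots * ENTRY_BYTES;
}

/* Bytes used when each slot holds only an INDEX_BYTES-wide index and
   full entries exist for just the USED slots.  */
static size_t
indirect_bytes (size_t slots, size_t used, size_t index_bytes)
{
  return slots * index_bytes + used * ENTRY_BYTES;
}
```

For example, with 10000 slots, 4-byte indices and a hypothetical 1000 used
slots, the indirect layout needs 40,000 + 216,000 = 256,000 bytes.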

The efficiency benefits of the cache should increase further as more
of the access checking code is integrated into the same pass.  This
should eventually include the checking currently done in the built-in
expanders.

Tested on x86_64-linux, along with Glibc and Binutils/GDB.

Martin

PS The patch adds the new pointer_query class (loosely modeled on
range_query) to builtins.{h,c}.  This should be only temporary,
until the access checking code is moved into a file (and ultimately
a pass) of its own.

PR middle-end/97373 - missing warning on sprintf into allocated destination

gcc/ChangeLog:

	PR middle-end/97373
	* builtins.c (compute_objsize): Rename...
	(compute_objsize_r): to this.  Change order and types of arguments.
	Use new argument.  Adjust calls to self.
	(access_ref::get_ref): New member function.
	(pointer_query::pointer_query): New member function.
	(pointer_query::get_ref): Same.
	(pointer_query::put_ref): Same.
	(handle_min_max_size): Change order and types of arguments.
	(maybe_emit_free_warning): Add a test.
	* builtins.h (class pointer_query): New class.
	(compute_objsize): Declare an overload.
	* gimple-ssa-sprintf.c (get_destination_size):
	(handle_printf_call):
	* tree-ssa-strlen.c (adjust_last_stmt): Add an argument and use it.
	(maybe_warn_overflow): Same.
	(handle_builtin_strcpy): Same.
	(maybe_diag_stxncpy_trunc): Same.
	(handle_builtin_memcpy): Change argument type.  Adjust calls.
	(handle_builtin_strcat): Same.
	(handle_builtin_memset): Same.
	(handle_store): Same.
	(strlen_check_and_optimize_call): Same.
	(check_and_optimize_stmt): Same.
	(strlen_dom_walker): Add new data members.
	(strlen_dom_walker::before_dom_children): Use new member.
	(printf_strlen_execute): Dump cache performance counters.
	* tree-ssa-strlen.h (maybe_diag_stxncpy_trunc): Add argument.
	(handle_printf_call): Change argument type.

gcc/testsuite/ChangeLog:

	PR middle-end/97373
	* gcc.dg/warn-strnlen-no-nul.c:
	* g++.dg/warn/Wmismatched-new-delete-2.C: New test.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-25.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 1b8a5b82dac..d79dab05671 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -185,7 +185,7 @@ static void maybe_emit_chk_warning (tree, enum built_in_function);
 static void maybe_emit_sprintf_chk_warning (tree, enum built_in_function);
 static tree fold_builtin_object_size (tree, tree);
 static bool check_read_access (tree, tree, tree = NULL_TREE, int = 1);
-static bool compute_objsize (tree, int, access_ref *, bitmap *, range_query *);
+static bool compute_objsize_r (tree, access_ref *, pointer_query *, bitmap *);
 
 unsigned HOST_WIDE_INT target_newline;
 unsigned HOST_WIDE_INT target_percent;
@@ -241,15 +241,14 @@ access_ref::phi () const
 
 /* Determine and return the largest object to which REF refers.  If REF
refers to a PHI and PREF is nonnull, fill *PREF with the details of
-   the object determined by compute_objsize(ARG, OSTYPE) for each PHI
-   argument ARG.  */
+   the object determined by compute_objsize(ARG, ...) for each PHI argument
+   ARG.  */
 
 tree
 access_ref::get_ref (vec *all_refs,
 		 access_ref *pref /* = NULL */,
-		 int ostype /* = 1 */,
-		 bitmap 

builtins: Add DFP signaling NaN built-in functions

2020-11-04 Thread Joseph Myers
Add built-in functions __builtin_nansd32, __builtin_nansd64 and
__builtin_nansd128 to return signaling NaNs of decimal floating-point
types, analogous to the functions already present for binary
floating-point types.
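
A minimal usage sketch follows. The helper names are made up for the
example, and the decimal variant is guarded so that the file still compiles
with a compiler that lacks the new built-ins or decimal floating-point
support.

```c
/* Existing binary built-in: return nonzero if it yields a NaN.  */
static int
binary_snan_is_nan (void)
{
  double x = __builtin_nans ("");
  return __builtin_isnan (x);
}

#if defined __has_builtin
# if __has_builtin (__builtin_nansd32)
/* Decimal counterpart added by this patch.  A NaN compares unequal to
   itself; with a signaling NaN the comparison also raises "invalid"
   where floating-point exceptions are supported for DFP.  */
static int
decimal_snan_is_nan (void)
{
  _Decimal32 x = __builtin_nansd32 ("");
  return x != x;
}
# endif
#endif
```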

This patch, independent of

(pending review), is in preparation for adding the  macros
for such signaling NaNs that are in C2x, analogous to the macros for
other types that are in that patch.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  Also ran
the new tests for powerpc64le-linux-gnu to confirm they do work in the
case (hardware DFP) where floating-point exceptions are supported for
DFP.  OK to commit?

gcc/
2020-11-05  Joseph Myers  

* builtins.def (BUILT_IN_NANSD32, BUILT_IN_NANSD64)
(BUILT_IN_NANSD128): New built-in functions.
* fold-const-call.c (fold_const_call): Handle the new built-in
functions.
* doc/extend.texi (__builtin_nansd32, __builtin_nansd64)
(__builtin_nansd128): Document.
* doc/sourcebuild.texi (Effective-Target Keywords): Document
fenv_exceptions_dfp.

gcc/testsuite/
2020-11-05  Joseph Myers  

* lib/target-supports.exp
(check_effective_target_fenv_exceptions_dfp): New.
* gcc.dg/dfp/builtin-snan-1.c, gcc.dg/dfp/builtin-snan-2.c: New
tests.

diff --git a/gcc/builtins.def b/gcc/builtins.def
index 68f2da6cda4..b4494c712a1 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -518,6 +518,9 @@ DEF_GCC_BUILTIN(BUILT_IN_NANSF, "nansf", BT_FN_FLOAT_CONST_STRING, ATTR_
 DEF_GCC_BUILTIN(BUILT_IN_NANSL, "nansl", BT_FN_LONGDOUBLE_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
 DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_NANS, "nans", NAN_TYPE, ATTR_CONST_NOTHROW_NONNULL)
 #undef NAN_TYPE
+DEF_GCC_BUILTIN(BUILT_IN_NANSD32, "nansd32", BT_FN_DFLOAT32_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
+DEF_GCC_BUILTIN(BUILT_IN_NANSD64, "nansd64", BT_FN_DFLOAT64_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
+DEF_GCC_BUILTIN(BUILT_IN_NANSD128, "nansd128", BT_FN_DFLOAT128_CONST_STRING, ATTR_CONST_NOTHROW_NONNULL)
 DEF_C99_BUILTIN(BUILT_IN_NEARBYINT, "nearbyint", BT_FN_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_NEARBYINTF, "nearbyintf", BT_FN_FLOAT_FLOAT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_NEARBYINTL, "nearbyintl", BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7a6ecce6a84..e6a9bdf1099 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -13865,6 +13865,18 @@ to be a signaling NaN@.  The @code{nans} function is 
proposed by
 @uref{http://www.open-std.org/jtc1/sc22/wg14/www/docs/n965.htm,,WG14 N965}.
 @end deftypefn
 
+@deftypefn {Built-in Function} _Decimal32 __builtin_nansd32 (const char *str)
+Similar to @code{__builtin_nans}, except the return type is @code{_Decimal32}.
+@end deftypefn
+
+@deftypefn {Built-in Function} _Decimal64 __builtin_nansd64 (const char *str)
+Similar to @code{__builtin_nans}, except the return type is @code{_Decimal64}.
+@end deftypefn
+
+@deftypefn {Built-in Function} _Decimal128 __builtin_nansd128 (const char *str)
+Similar to @code{__builtin_nans}, except the return type is @code{_Decimal128}.
+@end deftypefn
+
 @deftypefn {Built-in Function} float __builtin_nansf (const char *str)
 Similar to @code{__builtin_nans}, except the return type is @code{float}.
 @end deftypefn
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 49316a5d0ff..b3c5e530423 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2356,6 +2356,11 @@ Target provides @file{fenv.h} include file.
 Target supports @file{fenv.h} with all the standard IEEE exceptions
 and floating-point exceptions are raised by arithmetic operations.
 
+@item fenv_exceptions_dfp
+Target supports @file{fenv.h} with all the standard IEEE exceptions
+and floating-point exceptions are raised by arithmetic operations for
+decimal floating point.
+
 @item fileio
 Target offers such file I/O library functions as @code{fopen},
 @code{fclose}, @code{tmpnam}, and @code{remove}.  This is a link-time
diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index 11ed47db3d9..3548fab78cd 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -1300,6 +1300,9 @@ fold_const_call (combined_fn fn, tree type, tree arg)
 
 CASE_CFN_NANS:
 CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NANS):
+case CFN_BUILT_IN_NANSD32:
+case CFN_BUILT_IN_NANSD64:
+case CFN_BUILT_IN_NANSD128:
   return fold_const_builtin_nan (type, arg, false);
 
 case CFN_REDUC_PLUS:
diff --git a/gcc/testsuite/gcc.dg/dfp/builtin-snan-1.c 
b/gcc/testsuite/gcc.dg/dfp/builtin-snan-1.c
new file mode 100644
index 000..49a32c87546
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/dfp/builtin-snan-1.c
@@ -0,0 +1,23 @@
+/* Test __builtin_nansd* functions.  

Re: PowerPC: Map IEEE 128-bit long double built-in functions

2020-11-04 Thread Segher Boessenkool
Hi!

On Thu, Oct 22, 2020 at 06:03:46PM -0400, Michael Meissner wrote:
> To map the scanf functions,  is mapped to __isoc99_ieee128.

Is that correct?  What if you are compiling for c90?

>   * config/rs6000/rs6000.c (rs6000_mangle_decl_assembler_name): Add
>   support for mapping built-in function names for long double
>   built-in functions if long double is IEEE 128-bit.

"Map the built-in function names" etc.

> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -26893,56 +26893,127 @@ rs6000_globalize_decl_name (FILE * stream, tree 
> decl)
> library before you can switch the real*16 type at compile time.
>  
> We use the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change this name.  We
> -   only do this if the default is that long double is IBM extended double, 
> and
> -   the user asked for IEEE 128-bit.  */
> +   only do this transformation if the __float128 type is enabled.  This
> +   prevents us from doing the transformation on older 32-bit ports that might
> +   have enabled using IEEE 128-bit floating point as the default long double
> +   type.  */

I don't see why that is the right thing to do?  You'll have exactly
these same problems on 32-bit!

Hrm, we talked about that before I guess?  Do you just need to change
this comment now?

> + default:
> +   break;
> + }

That is useless, just leave it out?  The end of a switch will always
fall through, and it is normal idiom to use that.

> +  /* Update the __builtin_*printf && __builtin_*scanf functions.  */

"and" :-)

> +   else if (name[len - 1] == 'l')
> + {
> +   bool uses_ieee128_p = false;
> +   tree type = TREE_TYPE (decl);
> +   machine_mode ret_mode = TYPE_MODE (type);
> +
> +   /* See if the function returns a IEEE 128-bit floating point type 
> or
> +  complex type.  */
> +   if (ret_mode == TFmode || ret_mode == TCmode)
> + uses_ieee128_p = true;
> +   else
>   {
> -   machine_mode arg_mode = TYPE_MODE (arg);
> -   if (arg_mode == TFmode || arg_mode == TCmode)
> +   function_args_iterator args_iter;
> +   tree arg;

(declare that right before the FOREACH)

> +
> +   /* See if the function passes a IEEE 128-bit floating point 
> type
> +  or complex type.  */
> +   FOREACH_FUNCTION_ARGS (type, arg, args_iter)
>   {
> -   uses_ieee128_p = true;
> -   break;
> +   machine_mode arg_mode = TYPE_MODE (arg);
> +   if (arg_mode == TFmode || arg_mode == TCmode)
> + {
> +   uses_ieee128_p = true;
> +   break;
> + }
>   }
>   }

There is no point in doing all these early-outs in an initialisation
function, making it much harder to read :-(

> +   /* If we passed or returned an IEEE 128-bit floating point type,
> +  change the name.  Use __ieee128, instead of l.  */
> +   if (uses_ieee128_p)
> + newname = xasprintf ("__%.*sieee128", (int)(len - 1), name);

(int) (len - 1)

Please comment what the - 1 does, and/or what this is for at all.  (In
the code / in a comment, not to me, I figured it out after a while.)

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/float128-longdouble-math.c
> @@ -0,0 +1,567 @@
> +/* { dg-require-effective-target ppc_float128_hw } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power9 -mno-pcrel -O2 -Wno-psabi 
> -mabi=ieeelongdouble" } */

So why do you need power10_ok?  You need power9_ok of course, but why
power10?  You can specify -mno-pcrel always.

> +  /* { dg-final { scan-assembler {\mxsrqpi +[0-9]+,[0-9]+,[0-9]+,2\M} } }  */

The " +" should be just a " ".
Not every assembler variant uses plain numbers for registers.  That is
not currently an issue of course, but why not just do it right in the
first place :-)

You can do
  {\mxsrqpi r?[0-9]+,r?[0-9]+,r?[0-9]+,2\M}
or simpler,
  {\mxsrqpi r?\d+,r?\d+,r?\d+,2\M}
or even simpler
  {(?n)\mxsrqpi .*,2\M}
(if you don't mind the number of operands in that last case).

> +  /* { dg-final { scan-assembler {\mxsrqpi +[0-9]+,[0-9]+,[0-9]+,3\M} } }  */

(same as above)

> +  /* lgammaf128 mentioned previously.  */
> +  *p++ = BUILTIN1 (lgammal, *q++);

What does that comment mean?  Should you have used scan-assembler-times
perhaps?

> +  /* { dg-final { scan-assembler {\mxsrqpi +[0-9]+,[0-9]+,[0-9]+,1\M} } }  */

(one more)

> +  /* remainderf128 mentioned previously.  */
> +  *p++ = BUILTIN2 (remainderl, *q++, *r++);

(similar to the lgammal case)

> +  /* { dg-final { scan-assembler {\m__scalbnieee128\M} } }  */
> +  *p   = BUILTIN2 (scalbl, *q, *r);  

(trailing spaces)

> +  /* { dg-final { scan-assembler {\m__cabsieee128\M} } }  */
> +  *p++ = BUILTIN1 

[PATCH] middle-end: Store and use the SLP instance kind when aborting load/store lanes

2020-11-04 Thread Tamar Christina via Gcc-patches
Hi All,

This patch stores the SLP instance kind in the SLP instance so that we can use
it later when detecting load/store lanes support.

This also changes the load/store lane support check to only check if the SLP
kind is a store.  This means that in order for load/store lanes to work, all
instances must be of kind store.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop.c (vect_analyze_loop_2): Check kind.
* tree-vect-slp.c (vect_build_slp_instance): New.
(enum slp_instance_kind): Move to...
* tree-vectorizer.h (enum slp_instance_kind): ...here.
(SLP_INSTANCE_KIND): New.

-- 
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c9de82b35bee123a93847dc1e79280ebc3d81893..1a6f52e6a261ebbfae6c2d7760232e341389ab4d 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2394,6 +2394,7 @@ start_over:
 	  /* If the loads and stores can be handled with load/store-lane
 	 instructions record it and move on to the next instance.  */
 	  if (loads_permuted
+	  && SLP_INSTANCE_KIND (instance) == slp_inst_kind_store
 	  && vect_store_lanes_supported (vectype, group_size, false))
 	{
 	  FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (instance), i, load_node)
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index d498cd466eb7e8d98230f6b662437cf4fa312f27..420c3c93374b788d96779bf0b730d1bc47a98f58 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2171,13 +2171,6 @@ calculate_unrolling_factor (poly_uint64 nunits, unsigned int group_size)
   return exact_div (common_multiple (nunits, group_size), group_size);
 }
 
-enum slp_instance_kind {
-slp_inst_kind_store,
-slp_inst_kind_reduc_group,
-slp_inst_kind_reduc_chain,
-slp_inst_kind_ctor
-};
-
 static bool
 vect_analyze_slp_instance (vec_info *vinfo,
 			   scalar_stmts_to_slp_tree_map_t *bst_map,
@@ -2253,6 +2246,7 @@ vect_build_slp_instance (vec_info *vinfo,
 	  SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor;
 	  SLP_INSTANCE_LOADS (new_instance) = vNULL;
 	  SLP_INSTANCE_ROOT_STMT (new_instance) = root_stmt_info;
+	  SLP_INSTANCE_KIND (new_instance) = kind;
 	  new_instance->reduc_phis = NULL;
 	  new_instance->cost_vec = vNULL;
 	  new_instance->subgraph_entries = vNULL;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index fbf5291cf065f3944040937db92d3997acd45f23..26988f78143621dc9f21698b105127c88f9b0212 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -174,6 +174,15 @@ struct _slp_tree {
   static void operator delete (void *, size_t);
 };
 
+/* The enum describes the type of operations that an SLP instance
+   can perform. */
+
+enum slp_instance_kind {
+slp_inst_kind_store,
+slp_inst_kind_reduc_group,
+slp_inst_kind_reduc_chain,
+slp_inst_kind_ctor
+};
 
 /* SLP instance is a sequence of stmts in a loop that can be packed into
SIMD stmts.  */
@@ -202,6 +211,9 @@ public:
  entries into the same subgraph, including itself.  */
   vec<_slp_instance *> subgraph_entries;
 
+  /* The type of operation the SLP instance is performing.  */
+  slp_instance_kind kind;
+
   dump_user_location_t location () const;
 } *slp_instance;
 
@@ -211,6 +223,7 @@ public:
 #define SLP_INSTANCE_UNROLLING_FACTOR(S) (S)->unrolling_factor
 #define SLP_INSTANCE_LOADS(S)(S)->loads
 #define SLP_INSTANCE_ROOT_STMT(S)(S)->root_stmt
+#define SLP_INSTANCE_KIND(S) (S)->kind
 
 #define SLP_TREE_CHILDREN(S) (S)->children
 #define SLP_TREE_SCALAR_STMTS(S) (S)->stmts



Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-04 Thread H.J. Lu via Gcc-patches
On Wed, Nov 4, 2020 at 3:00 PM Hans-Peter Nilsson  wrote:
>
> On Wed, 4 Nov 2020, H.J. Lu wrote:
> > On Wed, Nov 4, 2020 at 1:56 PM Hans-Peter Nilsson  wrote:
> > > On Wed, 4 Nov 2020, H.J. Lu wrote:
> > >
> > > > On Wed, Nov 4, 2020 at 1:03 PM Hans-Peter Nilsson  
> > > > wrote:
> > > > >
> > > > > On Wed, 4 Nov 2020, H.J. Lu wrote:
> > > > > > On Wed, Nov 4, 2020 at 10:09 AM Hans-Peter Nilsson 
> > > > > >  wrote:
> > > > > > > I'm not much more than a random voice, but an assembly directive
> > > > > > > that specifies the symbol (IIUC your .retain directive) to
> > > > > >
> > > > > > But .retain directive DOES NOT adjust symbol attribute.
> > >
> > > I see I missed to point out that I was speaking about the *gcc
> > > symbol* attribute "used".
> >
> > There is no such corresponding symbol attribute in ELF.
>
> I have not missed that, nor that SHF_GNU_RETAIN is so new that
> it's not in binutils master.  I have also not missed that gcc
> caters to other object formats too.  A common symbol-specific
> directive such as .retain, would be better than messing with
> section attributes, for gcc.

This is totally irrelevant to SHF_GNU_RETAIN.

> > > It's cleaner to the compiler if it can pass on to the assembler
> > > the specific symbol that needs to be kept.
> > >
> >
> > SHF_GNU_RETAIN is for section and GCC should place the symbol,
> > which should be kept, in the SHF_GNU_RETAIN section directly, not
> > through .retain directive.
>
> This is where opinions differ.  Anyway, this is now repetition;
> I'm done.

.retain is ill-defined.   For example,

[hjl@gnu-cfl-2 gcc]$ cat /tmp/x.c
static int xyzzy __attribute__((__used__));
[hjl@gnu-cfl-2 gcc]$ ./xgcc -B./ -S /tmp/x.c -fcommon
[hjl@gnu-cfl-2 gcc]$ cat x.s
.file "x.c"
.text
.retain xyzzy  <<<<<<<<< What does it do?
.local xyzzy
.comm xyzzy,4,4
.ident "GCC: (GNU) 11.0.0 20201103 (experimental)"
.section .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-2 gcc]$

A symbol directive should operate on the symbol table.
With the 'R' flag, we get

.file "x.c"
.text
.section .bss.xyzzy,"awR",@nobits
.align 4
.type xyzzy, @object
.size xyzzy, 4
xyzzy:
.zero 4
.ident "GCC: (GNU) 11.0.0 20201104 (experimental)"
.section .note.GNU-stack,"",@progbits

-- 
H.J.


Re: [PATCH] libstdc++: Implement C++20 features for <sstream>

2020-11-04 Thread Jonathan Wakely via Gcc-patches

On 04/11/20 21:45 +, Jonathan Wakely wrote:

On 04/11/20 12:43 -0800, Thomas Rodgers wrote:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97719


On Nov 4, 2020, at 11:54 AM, Stephan Bergmann  wrote:

On 07/10/2020 18:55, Thomas Rodgers wrote:

From: Thomas Rodgers 
New ctors and ::view() accessor for -
 * basic_stringbuf
 * basic_istringstream
 * basic_ostringstream
 * basic_stringstream
New ::get_allocator() accessor for basic_stringbuf.

I found that this
"libstdc++: Implement C++20 features for <sstream>" changed the behavior of


$ cat test.cc
#include <iostream>
#include <iterator>
#include <sstream>
int main() {
std::stringstream s("a");
std::istreambuf_iterator<char> i(s);
if (i != std::istreambuf_iterator<char>()) std::cout << *i << '\n';
}
$ g++ -std=c++20 test.cc
$ ./a.out


from printing "a" to printing nothing.  (The `i != ...` comparison appears to change i 
from pointing at "a" to pointing to null, and returns false.)

I ran into this when building LibreOffice, and I hope test.cc is a faithfully 
minimized reproducer.  However, I know little about std::istreambuf_iterator, 
so it may well be that the code isn't even valid.



I'm testing this patch.


Tested powerpc64le-linux. Pushed now.



commit 1ca2fe0fc85403c6ea4e0775b5da051ff0eebc96
Author: Jonathan Wakely 
Date:   Wed Nov 4 21:44:05 2020

   libstdc++: Fix default mode of new basic_stringstream constructor [PR 97719]
   
   libstdc++-v3/ChangeLog:
   
   PR libstdc++/97719

   * include/std/sstream (basic_stringstream(string_type&&, openmode)):
   Fix default argument.
   * testsuite/27_io/basic_stringstream/cons/char/97719.cc: New test.

diff --git a/libstdc++-v3/include/std/sstream b/libstdc++-v3/include/std/sstream
index 33a00486606c..8acf1eb259ab 100644
--- a/libstdc++-v3/include/std/sstream
+++ b/libstdc++-v3/include/std/sstream
@@ -976,7 +976,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11

  explicit
  basic_stringstream(__string_type&& __str,
-ios_base::openmode __mode = ios_base::out
+ios_base::openmode __mode = ios_base::in
 | ios_base::out)
  : __iostream_type(), _M_stringbuf(std::move(__str), __mode)
  { this->init(std::__addressof(_M_stringbuf)); }
diff --git a/libstdc++-v3/testsuite/27_io/basic_stringstream/cons/char/97719.cc 
b/libstdc++-v3/testsuite/27_io/basic_stringstream/cons/char/97719.cc
new file mode 100644
index ..fa523a803b6d
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/basic_stringstream/cons/char/97719.cc
@@ -0,0 +1,40 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do run { target c++2a } }
+
+#include <sstream>
+#include <iterator>
+#include <testsuite_hooks.h>
+
+void
+test01()
+{
+  // PR libstdc++/97719
+  std::string str = "a";
+  std::stringstream s(std::move(str));
+  std::istreambuf_iterator<char> i(s);
+  VERIFY( i != std::istreambuf_iterator<char>() );
+  VERIFY( *i == 'a' );
+}
+
+int
+main()
+{
+  test01();
+}




Re: [PATCH] i386: Cleanup i386/i386elf.h and align its return convention with the SVR4 ABI

2020-11-04 Thread Pat Bernardi
> So, since the unpatched compiler crashes with an example that would
> make a difference, I think the patch is OK as it is.

Thanks for taking the time to look at that Uros, and apologies for not getting 
back to you sooner.

With regards to your other question:

> So, is it necessary to define DEFAULT_PCC_STRUCT_RETURN ?

It’s not necessary, but a number of other i386 targets like GNU and NetBSD have 
it explicitly defined. I wasn’t sure if it was for legacy reasons or to 
document clearly the choice. I can prepare a patch to remove 
DEFAULT_PCC_STRUCT_RETURN from i386elf.h if you think it would be clearer not 
to explicitly define it.

Thanks,

Pat Bernardi

Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-04 Thread Hans-Peter Nilsson
On Wed, 4 Nov 2020, H.J. Lu wrote:
> On Wed, Nov 4, 2020 at 1:56 PM Hans-Peter Nilsson  wrote:
> > On Wed, 4 Nov 2020, H.J. Lu wrote:
> >
> > > On Wed, Nov 4, 2020 at 1:03 PM Hans-Peter Nilsson  
> > > wrote:
> > > >
> > > > On Wed, 4 Nov 2020, H.J. Lu wrote:
> > > > > On Wed, Nov 4, 2020 at 10:09 AM Hans-Peter Nilsson 
> > > > >  wrote:
> > > > > > I'm not much more than a random voice, but an assembly directive
> > > > > > that specifies the symbol (IIUC your .retain directive) to
> > > > >
> > > > > But .retain directive DOES NOT adjust symbol attribute.
> >
> > I see I missed to point out that I was speaking about the *gcc
> > symbol* attribute "used".
>
> There is no such corresponding symbol attribute in ELF.

I have not missed that, nor that SHF_GNU_RETAIN is so new that
it's not in binutils master.  I have also not missed that gcc
caters to other object formats too.  A common symbol-specific
directive such as .retain, would be better than messing with
section attributes, for gcc.

> > It's cleaner to the compiler if it can pass on to the assembler
> > the specific symbol that needs to be kept.
> >
>
> SHF_GNU_RETAIN is for section and GCC should place the symbol,
> which should be kept, in the SHF_GNU_RETAIN section directly, not
> through .retain directive.

This is where opinions differ.  Anyway, this is now repetition;
I'm done.

brgds, H-P


Re: Testsuite fails on PowerPC with: Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all])

2020-11-04 Thread Tobias Burnus

On 04.11.20 20:00, Segher Boessenkool wrote:

But why are tests in gcc.target/i386/ run for other targets at all?!


Those under gcc.target/i386/ contain assembler checks, so they are run
only for x86 and are suitable only for x86.

But the failing ones for PowerPC (PR97680) and ARM (PR97699) are under
c-c++-common/zero-scratch-regs-*.c (only {9,10,11} fail here, I did
see a "8" fail at gcc-testresults). And those tests apply to all targets.

On 04.11.20 23:01, Segher Boessenkool wrote:

If the default implementation doesn’t work ... on PowerPC, ... the Maintainer 
... to decide
Whether to skip these testing case on this platform or add a PowerPC 
implementation.

Yeah, we will deal with it.  In stage 3:-)


Thanks!

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [PATCH 4/4] IBM Z: Test long doubles in vector registers

2020-11-04 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2020-11-04 at 18:28 +0100, Andreas Krebbel wrote:
> These tests all use the -mzvector option but do not appear to make
> use of the z vector language
> extensions. I think that option could be removed. Then these tests
> should be moved to the vector subdir.

Will change, thanks!

> You could do the asm scanning also in dg-do run tests.

This doesn't seem to work.  For example, if I add 

/* { dg-final { scan-assembler-times {aaa} 999 } } */

to long-double-from-double-run.c, it won't fail.

> 
> Andreas
> 
> 
> On 03.11.20 22:46, Ilya Leoshkevich wrote:
> > gcc/testsuite/ChangeLog:
> > 
> > 2020-11-03  Ilya Leoshkevich  
> > 
> > * gcc.target/s390/zvector/long-double-callee-abi-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-caller-abi-run.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-caller-abi-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-copysign-run.c: New test.
> > * gcc.target/s390/zvector/long-double-copysign-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-fprx2-constant.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-double-run.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-double-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-float-run.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-float-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-i16-run.c: New test.
> > * gcc.target/s390/zvector/long-double-from-i16-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-i32-run.c: New test.
> > * gcc.target/s390/zvector/long-double-from-i32-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-i64-run.c: New test.
> > * gcc.target/s390/zvector/long-double-from-i64-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-i8-run.c: New test.
> > * gcc.target/s390/zvector/long-double-from-i8-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-from-u16-run.c: New test.
> > * gcc.target/s390/zvector/long-double-from-u16-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-u32-run.c: New test.
> > * gcc.target/s390/zvector/long-double-from-u32-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-u64-run.c: New test.
> > * gcc.target/s390/zvector/long-double-from-u64-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-u8-run.c: New test.
> > * gcc.target/s390/zvector/long-double-from-u8-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-to-double-run.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-to-double-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-to-float-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-float-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-to-i16-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-i16-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-to-i32-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-i32-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-to-i64-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-i64-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-to-i8-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-i8-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-to-u16-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-u16-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-to-u32-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-u32-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-to-u64-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-u64-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-to-u8-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-u8-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-vec-duplicate.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-wf.h: New test.
> > * gcc.target/s390/zvector/long-double-wfaxb-run.c: New test.
> > * gcc.target/s390/zvector/long-double-wfaxb-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-wfaxb.c: New test.
> > * gcc.target/s390/zvector/long-double-wfcxb-0001.c: New test.
> > * gcc.target/s390/zvector/long-double-wfcxb-0111.c: New test.
> > * gcc.target/s390/zvector/long-double-wfcxb-1011.c: New test.
> > * gcc.target/s390/zvector/long-double-wfcxb-1101.c: New test.
> > * gcc.target/s390/zvector/long-double-wfdxb-run.c: New test.
> > * gcc.target/s390/zvector/long-double-wfdxb-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-wfdxb.c: New test.
> > * gcc.target/s390/zvector/long-double-wfixb.c: New test.
> > * gcc.target/s390/zvector/long-double-wfkxb-0111.c: New test.
> >   

Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-04 Thread H.J. Lu via Gcc-patches
On Wed, Nov 4, 2020 at 1:56 PM Hans-Peter Nilsson  wrote:
>
>
>
> On Wed, 4 Nov 2020, H.J. Lu wrote:
>
> > On Wed, Nov 4, 2020 at 1:03 PM Hans-Peter Nilsson  wrote:
> > >
> > > On Wed, 4 Nov 2020, H.J. Lu wrote:
> > > > On Wed, Nov 4, 2020 at 10:09 AM Hans-Peter Nilsson  
> > > > wrote:
> > > > >
> > > > > On Wed, 4 Nov 2020, Jozef Lawrynowicz wrote:
> > > > > > I personally do not see the problem with the .retain attribute, 
> > > > > > however
> > > > > > if it is going to be a barrier to getting the functionality 
> > > > > > committed, I
> > > > > > am happy to change it, since I really just want the functionality in
> > > > > > upstream sources.
> > > > > >
> > > > > > If a global maintainer would comment on whether any of the proposed
> > > > > > approaches are acceptable, then I will try to block out time from 
> > > > > > other
> > > > > > deadlines so I can work on the fixups and submit a patch in time 
> > > > > > for the
> > > > > > GCC 11 freeze.
> > > > > >
> > > > > > Thanks,
> > > > > > Jozef
> > > > >
> > > > > I'm not much more than a random voice, but an assembly directive
> > > > > that specifies the symbol (IIUC your .retain directive) to
> > > >
> > > > But .retain directive DOES NOT adjust symbol attribute.
>
> I see I missed to point out that I was speaking about the *gcc
> symbol* attribute "used".

There is no such corresponding symbol attribute in ELF.

> > > > Instead, it sets
> > > > the SHF_GNU_RETAIN bit on the section which contains the symbol
> > > > definition.  The same section can have many unrelated symbols.
> > >
> > > That's an implementation detail *left to the assembler and
> > > linker*.  It's not something the compiler needs to know, and
> > > theoretically it could even change.
> > >
> >
> > The ELF extension is SHF_GNU_RETAIN.  .retain directive is a hack
> > which I strongly objected and showed that it wasn't needed to implement
> > SHF_GNU_RETAIN in binutils.
>
> It's still an implementation detail better kept in the
> assembler, that the mechanism used to retain a symbol for the
> compiler, happens to map to a section attribute.  Some may call
> *that* a hack.
>
> It's cleaner to the compiler if it can pass on to the assembler
> the specific symbol that needs to be kept.
>

SHF_GNU_RETAIN is for section and GCC should place the symbol,
which should be kept, in the SHF_GNU_RETAIN section directly, not
through .retain directive.

-- 
H.J.


Re: [PATCH 3/4] IBM Z: Store long doubles in vector registers when possible

2020-11-04 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2020-11-04 at 18:16 +0100, Andreas Krebbel wrote:
> On 03.11.20 22:45, Ilya Leoshkevich wrote:
> > On z14+, there are instructions for working with 128-bit floats
> > (long
> > doubles) in vector registers.  It's beneficial to use them instead
> > of
> > instructions that operate on floating point register pairs, because
> > it
> > allows to store 4 times more data in registers at a time,
> > relieving
> > register pressure.  The performance of new instructions is almost
> > the
> > same.
> > 
> > Implement by storing TFmode values in vector registers on
> > z14+.  Since
> > not all operations are available with the new instructions, keep
> > the old
> > ones using the new FPRX2 mode, and convert between it and TFmode
> > when
> > necessary (this is called "forwarder" expanders below).  Change the
> > existing TFmode expanders to call either new- or old-style ones
> > depending on whether we are on z14+ or older machines ("dispatcher"
> > expanders).
> > 
> > gcc/ChangeLog:
> > 
> > 2020-11-03  Ilya Leoshkevich  
> > 
> > * config/s390/s390-modes.def (FPRX2): New mode.
> > * config/s390/s390-protos.h (s390_fma_allowed_p): New function.
> > * config/s390/s390.c (s390_fma_allowed_p): Likewise.
> > (s390_build_signbit_mask): Support 128-bit masks.
> > (print_operand): Support printing the second word of a TFmode
> > operand as vector register.
> > (constant_modes): Add FPRX2mode.
> > (s390_class_max_nregs): Return 1 for TFmode on z14+.
> > (s390_is_fpr128): New function.
> > (s390_is_vr128): Likewise.
> > (s390_can_change_mode_class): Use s390_is_fpr128 and
> > s390_is_vr128 in order to determine whether mode refers to a
> > FPR
> > pair or to a VR.
> > * config/s390/s390.h (EXPAND_MOVTF): New macro.
> > (EXPAND_TF): Likewise.
> > * config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF
> > alias.
> > (ALL): Add FPRX2.
> > (FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-.
> > (FP): Likewise.
> > (FP_ANYTF): New mode iterator.
> > (BFP): Add FPRX2 for z14+, restrict TFmode to z13-.
> > (TD_TF): Likewise.
> > (xde): Add FPRX2.
> > (nBFP): Likewise.
> > (nDFP): Likewise.
> > (DSF): Likewise.
> > (DFDI): Likewise.
> > (SFSI): Likewise.
> > (DF): Likewise.
> > (SF): Likewise.
> > (fT0): Likewise.
> > (bt): Likewise.
> > (_d): Likewise.
> > (HALF_TMODE): Likewise.
> > (tf_fpr): New mode_attr.
> > (type): New mode_attr.
> > (*cmp_ccz_0): Use type instead of mode with fsimp.
> > (*cmp_ccs_0_fastmath): Likewise.
> > (*cmptf_ccs): New pattern for wfcxb.
> > (*cmptf_ccsfps): New pattern for wfkxb.
> > (mov): Rename to mov.
> > (signbit2): Rename to signbit2.
> > (isinf2): Renamed to isinf2.
> > (*TDC_insn_): Use type instead of mode with fsimp.
> > (fixuns_trunc2): Rename to
> > fixuns_trunc2.
> > (fix_trunctf2): Rename to fix_trunctf2_fpr.
> > (floatdi2): Rename to floatdi2, use type
> > instead of mode with itof.
> > (floatsi2): Rename to floatsi2, use type
> > instead of mode with itof.
> > (*floatuns2): Use type instead of mode for
> > itof.
> > (floatuns2): Rename to
> > floatuns2.
> > (trunctf2): Rename to trunctf2_fpr, use type
> > instead
> > of mode with fsimp.
> > (extend2): Rename to
> > extend2.
> > (2): Rename to
> > 2, use type instead of
> > mode with fsimp.
> > (rint2): Rename to rint2, use
> > type instead of mode with fsimp.
> > (2): Use type instead of mode for
> > fsimp.
> > (rint2): Likewise.
> > (trunc2): Rename to
> > trunc2.
> > (trunc2): Rename to
> > trunc2.
> > (extend2): Rename to
> > extend2.
> > (extend2): Rename to
> > extend2.
> > (add3): Rename to add3, use type instead of
> > mode with fsimp.
> > (*add3_cc): Use type instead of mode with fsimp.
> > (*add3_cconly): Likewise.
> > (sub3): Rename to sub3, use type instead of
> > mode with fsimp.
> > (*sub3_cc): Use type instead of mode with fsimp.
> > (*sub3_cconly): Likewise.
> > (mul3): Rename to mul3, use type instead of
> > mode with fsimp.
> > (fma4): Restrict using s390_fma_allowed_p.
> > (fms4): Restrict using s390_fma_allowed_p.
> > (div3): Rename to div3, use type instead of
> > mode with fdiv.
> > (neg2): Rename to neg2.
> > (*neg2_cc): Use type instead of mode with fsimp.
> > (*neg2_cconly): Likewise.
> > (*neg2_nocc): Likewise.
> > (*neg2): Likewise.
> > (abs2): Rename to abs2, use type instead of
> > mode with fdiv.
> > (*abs2_cc): Use type instead of mode with fsimp.
> > (*abs2_cconly): Likewise.
> > (*abs2_nocc): Likewise.
> > (*abs2): Likewise.
> > (*negabs2_cc): Likewise.
> > (*negabs2_cconly): Likewise.
> > (*negabs2_nocc): Likewise.
> > (*negabs2): Likewise.
> > (sqrt2): Rename to sqrt2, 

Re: Testsuite fails on PowerPC with: Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all])

2020-11-04 Thread Segher Boessenkool
On Wed, Nov 04, 2020 at 01:58:26PM -0600, Qing Zhao wrote:
> > On Nov 4, 2020, at 1:00 PM, Segher Boessenkool  
> > wrote:
> > On Wed, Nov 04, 2020 at 01:20:58PM +, Richard Sandiford wrote:
> >> Tobias Burnus  writes:
> >>> Three of the testcases fail on PowerPC: 
> >>> gcc.target/i386/zero-scratch-regs-{9,10,11}.c
> >>>   powerpc64le-linux-gnu/default/gcc.d/zero-scratch-regs-10.c:77:1: sorry, 
> >>> unimplemented: '-fzero-call-used_regs' not supported on this target
> >>> 
> >>> Did you miss some dg-require-effective-target ?
> >> 
> >> No, these are a signal to target maintainers that they need
> >> to decide whether to add support or accept the status quo
> >> (in which case a new effective-target will be needed).  See:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557595.html
> >>  :
> >> 
> >>The new tests are likely to fail on some targets with the sorry()
> >>message, but I think target maintainers are best placed to decide
> >>whether (a) that's a fundamental restriction of the target and the
> >>tests should just be skipped or (b) the target needs to implement
> >>the new hook.
> > 
> > But why are tests in gcc.target/i386/ run for other targets at all?!
> 
> No,  tests in gcc.target/i386 should not run for PowerPC.
> 
> What Tobias Burnus mentioned are the following tests:
> 
> powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
> -Wc++-compat  (test for excess errors)
> powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
> -Wc++-compat  (test for excess errors)
> powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
> -Wc++-compat  (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
> -std=gnu++98 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
> -std=gnu++14 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
> -std=gnu++17 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
> -std=gnu++2a (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
> -std=gnu++98 (test for excess errors)
> 
> 
> They are under c-c++-common, not gcc.target/i386. 

Ah, good.  But the mail said

> >>> Three of the testcases fail on PowerPC: 
> >>> gcc.target/i386/zero-scratch-regs-{9,10,11}.c
> >>>   powerpc64le-linux-gnu/default/gcc.d/zero-scratch-regs-10.c:77:1: sorry, 
> >>> unimplemented: '-fzero-call-used_regs' not supported on this target

so :-)

> These test cases are added intentionally on all platforms in order to check 
> whether the current middle-end default implementation for
> -fzero-call-used-regs works on the specific platform.
> 
> If the default implementation doesn’t work for the specific platform, for 
> example, on PowerPC, it’s better for the maintainer of PowerPC to decide
> whether to skip these test cases on this platform or add a PowerPC 
> implementation.

Yeah, we will deal with it.  In stage 3 :-)


Segher


Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-04 Thread Hans-Peter Nilsson



On Wed, 4 Nov 2020, H.J. Lu wrote:

> On Wed, Nov 4, 2020 at 1:03 PM Hans-Peter Nilsson  wrote:
> >
> > On Wed, 4 Nov 2020, H.J. Lu wrote:
> > > On Wed, Nov 4, 2020 at 10:09 AM Hans-Peter Nilsson  
> > > wrote:
> > > >
> > > > On Wed, 4 Nov 2020, Jozef Lawrynowicz wrote:
> > > > > I personally do not see the problem with the .retain attribute, 
> > > > > however
> > > > > if it is going to be a barrier to getting the functionality 
> > > > > committed, I
> > > > > am happy to change it, since I really just want the functionality in
> > > > > upstream sources.
> > > > >
> > > > > If a global maintainer would comment on whether any of the proposed
> > > > > approaches are acceptable, then I will try to block out time from 
> > > > > other
> > > > > deadlines so I can work on the fixups and submit a patch in time for 
> > > > > the
> > > > > GCC 11 freeze.
> > > > >
> > > > > Thanks,
> > > > > Jozef
> > > >
> > > > I'm not much more than a random voice, but an assembly directive
> > > > that specifies the symbol (IIUC your .retain directive) to
> > >
> > > But .retain directive DOES NOT adjust symbol attribute.

I see I neglected to point out that I was speaking about the *gcc
symbol* attribute "used".

> > > Instead, it sets
> > > the SHF_GNU_RETAIN bit on the section which contains the symbol
> > > definition.  The same section can have many unrelated symbols.
> >
> > That's an implementation detail *left to the assembler and
> > linker*.  It's not something the compiler needs to know, and
> > theoretically it could even change.
> >
>
> The ELF extension is SHF_GNU_RETAIN.  The .retain directive is a hack
> which I strongly objected to, and I showed that it wasn't needed to
> implement SHF_GNU_RETAIN in binutils.

It's still an implementation detail better kept in the
assembler, that the mechanism used to retain a symbol for the
compiler, happens to map to a section attribute.  Some may call
*that* a hack.

It's cleaner to the compiler if it can pass on to the assembler
the specific symbol that needs to be kept.

brgds, H-P
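
For readers following the thread, here is a minimal sketch of what the GCC "used" attribute looks like at the source level. The names `build_tag`, `retained_helper` and `read_build_tag` are made up for illustration. The attribute only tells the compiler to emit the definition even though nothing references it; whether the linker then keeps it under section garbage collection is exactly the point under discussion above, and this sketch takes no position on the .retain versus SHF_GNU_RETAIN question.

```cpp
// Hypothetical names throughout; only __attribute__((used)) is a real
// GCC/Clang feature.

// Internal-linkage definitions that nothing else needs to reference.
// "used" asks the compiler to emit them into the object file anyway.
__attribute__((used)) static const int build_tag = 42;

__attribute__((used)) static int retained_helper()
{
  return build_tag;
}

// An ordinary externally visible function, so the value can be observed.
int read_build_tag()
{
  return build_tag;
}
```

Calling `read_build_tag()` returns the tag value; `retained_helper` is never called, yet the compiler still emits it because of the attribute.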


Re: [PATCH] libstdc++: Implement C++20 features for <sstream>

2020-11-04 Thread Jonathan Wakely via Gcc-patches

On 04/11/20 12:43 -0800, Thomas Rodgers wrote:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97719


On Nov 4, 2020, at 11:54 AM, Stephan Bergmann  wrote:

On 07/10/2020 18:55, Thomas Rodgers wrote:

From: Thomas Rodgers 
New ctors and ::view() accessor for -
  * basic_stringbuf
  * basic_istringstream
  * basic_ostringstream
  * basic_stringstream
New ::get_allocator() accessor for basic_stringbuf.

I found that this 
 
"libstdc++: Implement C++20 features for <sstream>" changed the behavior of


$ cat test.cc
#include <iostream>
#include <iterator>
#include <sstream>
int main() {
 std::stringstream s("a");
 std::istreambuf_iterator<char> i(s);
 if (i != std::istreambuf_iterator<char>()) std::cout << *i << '\n';
}
$ g++ -std=c++20 test.cc
$ ./a.out


from printing "a" to printing nothing.  (The `i != ...` comparison appears to change i 
from pointing at "a" to pointing to null, and returns false.)

I ran into this when building LibreOffice, and I hope test.cc is a faithfully 
minimized reproducer.  However, I know little about std::istreambuf_iterator, 
so it may well be that the code isn't even valid.



I'm testing this patch.


commit 1ca2fe0fc85403c6ea4e0775b5da051ff0eebc96
Author: Jonathan Wakely 
Date:   Wed Nov 4 21:44:05 2020

libstdc++: Fix default mode of new basic_stringstream constructor [PR 97719]

libstdc++-v3/ChangeLog:

PR libstdc++/97719
* include/std/sstream (basic_stringstream(string_type&&, openmode)):
Fix default argument.
* testsuite/27_io/basic_stringstream/cons/char/97719.cc: New test.

diff --git a/libstdc++-v3/include/std/sstream b/libstdc++-v3/include/std/sstream
index 33a00486606c..8acf1eb259ab 100644
--- a/libstdc++-v3/include/std/sstream
+++ b/libstdc++-v3/include/std/sstream
@@ -976,7 +976,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
   explicit
   basic_stringstream(__string_type&& __str,
-	 ios_base::openmode __mode = ios_base::out
+	 ios_base::openmode __mode = ios_base::in
 		 | ios_base::out)
   : __iostream_type(), _M_stringbuf(std::move(__str), __mode)
   { this->init(std::__addressof(_M_stringbuf)); }
diff --git a/libstdc++-v3/testsuite/27_io/basic_stringstream/cons/char/97719.cc b/libstdc++-v3/testsuite/27_io/basic_stringstream/cons/char/97719.cc
new file mode 100644
index ..fa523a803b6d
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/basic_stringstream/cons/char/97719.cc
@@ -0,0 +1,40 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do run { target c++2a } }
+
+#include <sstream>
+#include <iterator>
+#include <testsuite_hooks.h>
+
+void
+test01()
+{
+  // PR libstdc++/97719
+  std::string str = "a";
+  std::stringstream s(std::move(str));
+  std::istreambuf_iterator<char> i(s);
+  VERIFY( i != std::istreambuf_iterator<char>() );
+  VERIFY( *i == 'a' );
+}
+
+int
+main()
+{
+  test01();
+}
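
The effect of the open mode on the reported symptom can be sketched in standard C++ without the new constructor. This is an illustration only: `first_char_readable` is a made-up helper, and it passes the mode explicitly to the long-standing `const std::string&` constructor to imitate what a default of `ios_base::out` alone would do. A string buffer without `ios_base::in` has no readable sequence, so an `istreambuf_iterator` over it compares equal to the end iterator.

```cpp
#include <ios>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: reports whether the first character of a
// stringstream opened with `mode` is reachable through an
// istreambuf_iterator.
bool first_char_readable(std::ios_base::openmode mode)
{
  std::stringstream s(std::string("a"), mode);
  std::istreambuf_iterator<char> i(s);
  // A default-constructed iterator is the end-of-stream iterator.
  return i != std::istreambuf_iterator<char>();
}
```

With `std::ios_base::out` alone the helper returns false, matching the "prints nothing" behaviour in the report; with `std::ios_base::in | std::ios_base::out` it returns true.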


Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-04 Thread H.J. Lu via Gcc-patches
On Wed, Nov 4, 2020 at 1:03 PM Hans-Peter Nilsson  wrote:
>
> On Wed, 4 Nov 2020, H.J. Lu wrote:
> > On Wed, Nov 4, 2020 at 10:09 AM Hans-Peter Nilsson  
> > wrote:
> > >
> > > On Wed, 4 Nov 2020, Jozef Lawrynowicz wrote:
> > > > I personally do not see the problem with the .retain attribute, however
> > > > if it is going to be a barrier to getting the functionality committed, I
> > > > am happy to change it, since I really just want the functionality in
> > > > upstream sources.
> > > >
> > > > If a global maintainer would comment on whether any of the proposed
> > > > approaches are acceptable, then I will try to block out time from other
> > > > deadlines so I can work on the fixups and submit a patch in time for the
> > > > GCC 11 freeze.
> > > >
> > > > Thanks,
> > > > Jozef
> > >
> > > I'm not much more than a random voice, but an assembly directive
> > > that specifies the symbol (IIUC your .retain directive) to
> >
> > But .retain directive DOES NOT adjust symbol attribute.  Instead, it sets
> > the SHF_GNU_RETAIN bit on the section which contains the symbol
> > definition.  The same section can have many unrelated symbols.
>
> That's an implementation detail *left to the assembler and
> linker*.  It's not something the compiler needs to know, and
> theoretically it could even change.
>

The ELF extension is SHF_GNU_RETAIN.  The .retain directive is a hack
which I strongly objected to, and I showed that it wasn't needed to
implement SHF_GNU_RETAIN in binutils.


-- 
H.J.


Re: [PATCH v5] rtl: builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-11-04 Thread Joseph Myers
On Wed, 4 Nov 2020, Richard Biener wrote:

> AFAICS you do nothing to marshall with the actually used libc
> implementation which AFAIU can choose arbitrary values for
> the FE_* macros.  I'm not sure we require the compiler to be
> configured for one specific C library and for example require
> matching FE_* macro definitions for all uses of the built
> compiler.

The compiler is definitely expected to match a given C library.  This 
applies for  and other typedefs, for example (various of which 
are used for printf format checking).  It also applies to FE_* in some 
cases where relevant for __atomic_feraiseexcept for floating-point atomic 
compound assignment.

> Now, I wonder whether _GCC_ should provide the FE_* macros, thus
> move (parts of) fenv.h to GCC like we do for stdint.h?

I think that would be a bad idea.  fenv.h involves library functionality 
that can sometimes need to do things beyond simply modifying hardware 
registers.  Consider e.g. the TLS exception and rounding mode state for 
soft-float powerpc-linux-gnu.  Or the TLS decimal rounding mode in libdfp.  
Or how exception enabling can involve a prctl call on powerpc.  Getting 
libgcc involved in storing such TLS state seems problematic.  And whether 
an FE_* macro should be defined may depend on whether the library supports 
the underlying functionality (consider the case of FE_TONEARESTFROMZERO 
for RISC-V, where defining it should mean library code actually works in 
that rounding mode, not just that hardware supports it).

The natural way to handle the rule in C2x that "The strictly conforming 
programs that shall be accepted by a conforming freestanding 
implementation that defines __STDC_IEC_60559_BFP__ or 
__STDC_IEC_60559_DFP__ may also use features in the contents of the 
standard headers <fenv.h> and <math.h> and the numeric conversion 
functions (7.22.1) of the standard header <stdlib.h>." would be to say 
that GCC provides the compiler pieces of a freestanding implementation, 
not necessarily the whole freestanding implementation.  (Those macros 
would only be defined via implicit preinclusion of stdc-predef.h anyway.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v5] rtl: builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-11-04 Thread Joseph Myers
On Wed, 4 Nov 2020, Raoni Fassina Firmino via Gcc-patches wrote:

> IMHO, it seems like it is not necessary if there is not a libc that has
> different values for the FE_* macros. I didn't check other archs, but if
> that is the case for some other arch I think it could be changed if and when
> some other arch implements expands for these builtins.

SPARC is the case I know of where the FE_* values vary depending on target 
libc (see the SPARC_LOW_FE_EXCEPT_VALUES target macro).

-- 
Joseph S. Myers
jos...@codesourcery.com
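
As a small illustration of why the FE_* values matter, the sketch below goes through the C library's own `<cfenv>` interface, so the macro values always come from the libc headers actually in use. The helper names `raise_and_test` and `rounds_to_nearest` are made up, and the sketch assumes a hosted target whose libc defines `FE_DIVBYZERO` and `FE_TONEAREST`. A compiler builtin that expanded these calls inline would have to agree with those header values.

```cpp
#include <cfenv>

// Hypothetical helper: raise the given exception flags through the C
// library and report whether they are then observed as set.
bool raise_and_test(int excepts)
{
  std::feclearexcept(FE_ALL_EXCEPT);
  std::feraiseexcept(excepts);
  const bool seen = std::fetestexcept(excepts) == excepts;
  std::feclearexcept(FE_ALL_EXCEPT);  // leave the environment clean
  return seen;
}

// Hypothetical helper: true when the current rounding mode is the usual
// program-startup default, round-to-nearest.
bool rounds_to_nearest()
{
  return std::fegetround() == FE_TONEAREST;
}
```

For example, `raise_and_test(FE_DIVBYZERO)` should report true on a libc that supports that flag, whatever numeric value the macro happens to have there.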


Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-04 Thread Hans-Peter Nilsson
On Wed, 4 Nov 2020, H.J. Lu wrote:
> On Wed, Nov 4, 2020 at 10:09 AM Hans-Peter Nilsson  wrote:
> >
> > On Wed, 4 Nov 2020, Jozef Lawrynowicz wrote:
> > > I personally do not see the problem with the .retain attribute, however
> > > if it is going to be a barrier to getting the functionality committed, I
> > > am happy to change it, since I really just want the functionality in
> > > upstream sources.
> > >
> > > If a global maintainer would comment on whether any of the proposed
> > > approaches are acceptable, then I will try to block out time from other
> > > deadlines so I can work on the fixups and submit a patch in time for the
> > > GCC 11 freeze.
> > >
> > > Thanks,
> > > Jozef
> >
> > I'm not much more than a random voice, but an assembly directive
> > that specifies the symbol (IIUC your .retain directive) to
>
> But .retain directive DOES NOT adjust symbol attribute.  Instead, it sets
> the SHF_GNU_RETAIN bit on the section which contains the symbol
> definition.  The same section can have many unrelated symbols.

That's an implementation detail *left to the assembler and
linker*.  It's not something the compiler needs to know, and
theoretically it could even change.

> > adjust a symbol attribute sounds cleaner to me, than requiring
> > gcc to know that this requires it to adjust what it knows about
> > section flags (again, IIUC).
> >
> > brgds, H-P
>
>
>
> --
> H.J.
>


Re: [PATCH] PowerPC: PR libgcc/97543, build libgcc with -mno-gnu-attribute

2020-11-04 Thread Segher Boessenkool
Hi!

On Tue, Nov 03, 2020 at 10:25:05PM -0500, Michael Meissner wrote:
> On Sat, Oct 31, 2020 at 11:39:23PM +1030, Alan Modra wrote:
> > Why is this wrong?  If you are configuring using
> > --without-long-double-128 then that doesn't mean 128-bit long doubles
> > are unsupported, it just selects the default to be 64-bit long double.
> > A compiler built using --without-long-double-128 can generate code
> > for 128-bit long double by simply using -mlong-double-128.  In which
> > case you need the libgcc support for 128-bit long doubles.  Well, I
> > suppose you are passing -mlong-double-128 for those objects that need
> > it, but I can't see any harm in passing -mlong-double-128 everywhere
> > in libgcc.

Yeah.

> It was more just a case for explicitly adding -mabi=ibmlongdouble for the
> modules that need it, and letting the default option be used for the modules
> that don't use IBM long doubles.
> 
> Because we only have one set of GNU attributes for an entire shared library, it is
> important that all 3 cases for long double not set the attributes.
> 
> I first took off the -mno-gnu-attributes options, and built libgcc with 3
> different compilers (long double = 64 bits, long double = IEEE 128-bits, and
> long double = IBM 128-bit).  The modules in the patch were the ones that set
> gnu attribute #4 to 5 (i.e. use IBM 128-bit floating point).
> 
> I then looked in the libgcc built with the configure option:
>   --without-long-double-128 
> 
> and a bunch of modules showed setting gnu attribute #4 to 9 (i.e. use of long
> double as double).
> 
> The reason is within GCC if you compile with -mlong-double-64 (or build with a
> compiler configured with --without-long-double-128), every time you use double
> or _Complex double, it will set gnu attribute #4 to 9.  It is due to this code
> in rs6000_emit_move in rs6000.c:
> 
> #ifdef HAVE_AS_GNU_ATTRIBUTE
>   /* If we use a long double type, set the flags in .gnu_attribute that say
>  what the long double type is.  This is to allow the linker's warning
>  message for the wrong long double to be useful, even if the function does
>  not do a call (for example, doing a 128-bit add on power9 if the long
>  double type is IEEE 128-bit.  Do not set this if __ibm128 or __float128
>  are used if they aren't the default long double type.  */
>   if (rs6000_gnu_attr && (HAVE_LD_PPC_GNU_ATTR_LONG_DOUBLE || TARGET_64BIT))
> {
>   if (TARGET_LONG_DOUBLE_128 && (mode == TFmode || mode == TCmode))
>   rs6000_passes_float = rs6000_passes_long_double = true;
> 
>   else if (!TARGET_LONG_DOUBLE_128 && (mode == DFmode || mode == DCmode))
>   rs6000_passes_float = rs6000_passes_long_double = true;
> }
> #endif

Yeah that is just bad.  Fix that?

Modes are not types.

> The code in rs6000-call.c (init_cumulative_args and
> rs6000_function_arg_advance_1) actually does look at the types and only sets
> rs6000_passes_long_double if the type is the long double type.
> 
> So I figured it was better just to build libgcc where we explicitly enable the
> IBM floating point and just turn off gnu attributes globally in the library.

It is much better to just fix the bug.  Please try that, it is a real
bug that needs real fixing no matter what, so might as well do it now
(in a separate patch) and not have to work around it for the one place
you know it to hurt now.

> The code in rs6000_emit_move is perhaps well intentioned, but by looking at
> just modes and not types, it can miss things.  However, that is a giant can of
> worms, and we can play whack-a-mole trying to get everything right.  Perhaps
> somebody (else) can tackle it.  But I would prefer to just fix the issue at
> hand, rather than possibly having an endless battle to get things right.

It does very much the wrong thing now.  It needs fixing.  You need it
fixed for this patch, to do this patch properly.  Please just fix it.

> As I recall, what the code in rs6000_emit_move is trying to do is to catch
> places where you use long double, but no call is made.  I.e.
> 
>   struct foo {
> /* ... */
> long double ld;
> /* ... */
>   };
> 
>   void bar (struct foo *p)
>   {
> /* ... */
> (p->ld)++;
> /* ... */
>   }
> 
> On a power9/10 system with IEEE long double or on a system with a 64-bit long
> double, that code would not generate a call, but it does depend on the long
> double format (and hence should set the gnu attribute #4).

Yes, but setting it for any DFmode (with 64-bit long double) is just
wrong.


Segher
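
The "modes are not types" objection can be made concrete with a toy model. Nothing below is GCC code: `Mode`, `Type`, `flags_by_mode` and `flags_by_type` are hypothetical stand-ins, with `flags_by_mode` mirroring only the shape of the quoted rs6000_emit_move condition. When long double is 64 bits wide, plain double shares its mode, so a mode-based test cannot tell the two apart.

```cpp
// Toy model, all names hypothetical. DF and TF stand in for DFmode and
// TFmode; the complex modes are left out for brevity.
enum class Mode { DF, TF };

struct Type {
  Mode mode;            // machine mode the type is laid out in
  bool is_long_double;  // true only for the long double type itself
};

// Mode-based test with the same shape as the quoted condition: the
// attribute is set for TF when long double is 128-bit, else for DF.
bool flags_by_mode(Mode m, bool long_double_128)
{
  return long_double_128 ? m == Mode::TF : m == Mode::DF;
}

// Type-based test: only a real long double sets the attribute.
bool flags_by_type(const Type& t)
{
  return t.is_long_double;
}
```

With a 64-bit long double, a plain double (`Type{Mode::DF, false}`) is flagged by the mode-based test but not by the type-based one, which is the false positive described above.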


Re: [PATCH] libstdc++: Implement C++20 features for <sstream>

2020-11-04 Thread Thomas Rodgers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97719

> On Nov 4, 2020, at 11:54 AM, Stephan Bergmann  wrote:
> 
> On 07/10/2020 18:55, Thomas Rodgers wrote:
>> From: Thomas Rodgers 
>> New ctors and ::view() accessor for -
>>   * basic_stringbuf
>>   * basic_istringstream
>>   * basic_ostringstream
>>   * basic_stringstream
>> New ::get_allocator() accessor for basic_stringbuf.
> I found that this 
> 
>  "libstdc++: Implement C++20 features for <sstream>" changed the behavior of
> 
>> $ cat test.cc
>> #include <iostream>
>> #include <iterator>
>> #include <sstream>
>> int main() {
>>  std::stringstream s("a");
>>  std::istreambuf_iterator<char> i(s);
>>  if (i != std::istreambuf_iterator<char>()) std::cout << *i << '\n';
>> }
>> $ g++ -std=c++20 test.cc
>> $ ./a.out
> 
> from printing "a" to printing nothing.  (The `i != ...` comparison appears to 
> change i from pointing at "a" to pointing to null, and returns false.)
> 
> I ran into this when building LibreOffice, and I hope test.cc is a faithfully 
> minimized reproducer.  However, I know little about std::istreambuf_iterator, 
> so it may well be that the code isn't even valid.
> 



[PATCH, rs6000] Update instruction attributes for Power10

2020-11-04 Thread Pat Haugen via Gcc-patches
Update instruction attributes for Power10.


This patch updates the type/prefixed/dot/size attributes for various new 
instructions (and a couple existing that were incorrect) in preparation for the 
Power10 scheduling patch that will be following.

Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok 
for trunk?

-Pat


2020-11-04  Pat Haugen  

gcc/
* config/rs6000/altivec.md (vsdb_, xxspltiw_v4si,
xxspltiw_v4sf_inst, xxspltidp_v2df_inst, xxsplti32dx_v4si_inst,
xxsplti32dx_v4sf_inst, xxblend_, xxpermx_inst,
vstrir_code_, vstrir_p_code_, vstril_code_,
vstril_p_code_, altivec_lvsl_reg, altivec_lvsl_direct,
altivec_lvsr_reg, altivec_lvsr_direct, xxeval, vcfuged, vclzdm,
vctzdm, vpdepd, vpextd, vgnb, vclrlb, vclrrb): Update instruction
attributes for Power10.
* config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1,
floatditd2, ftrunc2, fixdi2, dfp_ddedpd_,
dfp_denbcd_, dfp_dxex_, dfp_diex_,
*dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Likewise.
* config/rs6000/mma.md (*movpoi, mma_, mma_,
mma_, mma_, mma_, mma_,
mma_, mma_, mma_, mma_):
Likewise.
* config/rs6000/rs6000.c (rs6000_final_prescan_insn): Only add 'p' for
PREFIXED_YES.
* config/rs6000/rs6000.md (define_attr "size"): Add 256.
(define_attr "prefixed"): Add 'always'.
(define_mode_attr bits): Add DD/TD modes.
(cfuged, cntlzdm, cnttzdm, pdepd, pextd, bswaphi2_reg, bswapsi2_reg,
bswapdi2_brd, setbc_signed_,
*setbcr_signed_, *setnbc_signed_,
*setnbcr_signed_): Update instruction attributes for
Power10.
* config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti,
store_conditionalpti): Update instruction attributes for Power10.
* config/rs6000/vsx.md (*xvtlsbb_internal, xxgenpcvm__internal,
vextractl_internal, vextractr_internal,
vinsertvl_internal_, vinsertvr_internal_,
vinsertgl_internal_, vinsertgr_internal_,
vreplace_elt__inst): Likewise.

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 0a2e634d6b0..76191ba4107 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -819,7 +819,7 @@ (define_insn "vsdb_"
  VSHIFT_DBL_LR))]
   "TARGET_POWER10"
   "vsdbi %0,%1,%2,%3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecperm")])
 
 (define_insn "xxspltiw_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
@@ -827,7 +827,8 @@ (define_insn "xxspltiw_v4si"
 UNSPEC_XXSPLTIW))]
  "TARGET_POWER10"
  "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")])
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "always")])
 
 (define_expand "xxspltiw_v4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=wa")
@@ -846,7 +847,8 @@ (define_insn "xxspltiw_v4sf_inst"
 UNSPEC_XXSPLTIW))]
  "TARGET_POWER10"
  "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")])
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "always")])
 
 (define_expand "xxspltidp_v2df"
   [(set (match_operand:V2DF 0 "register_operand" )
@@ -865,7 +867,8 @@ (define_insn "xxspltidp_v2df_inst"
 UNSPEC_XXSPLTID))]
   "TARGET_POWER10"
   "xxspltidp %x0,%1"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "always")])
 
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
@@ -894,7 +897,8 @@ (define_insn "xxsplti32dx_v4si_inst"
 UNSPEC_XXSPLTI32DX))]
   "TARGET_POWER10"
   "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "always")])
 
 (define_expand "xxsplti32dx_v4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=wa")
@@ -922,7 +926,8 @@ (define_insn "xxsplti32dx_v4sf_inst"
 UNSPEC_XXSPLTI32DX))]
   "TARGET_POWER10"
   "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "always")])
 
 (define_insn "xxblend_"
   [(set (match_operand:VM3 0 "register_operand" "=wa")
@@ -932,7 +937,8 @@ (define_insn "xxblend_"
UNSPEC_XXBLEND))]
   "TARGET_POWER10"
   "xxblendv %x0,%x1,%x2,%x3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "always")])
 
 (define_expand "xxpermx"
   [(set (match_operand:V2DI 0 "register_operand" "+wa")
@@ -976,7 +982,8 @@ (define_insn "xxpermx_inst"
 UNSPEC_XXPERMX))]
   "TARGET_POWER10"
   "xxpermx %x0,%x1,%x2,%x3,%4"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "always")])
 
 (define_expand "vstrir_"
   [(set (match_operand:VIshort 0 "altivec_register_operand")
@@ -998,7 +1005,7 @@ (define_insn "vstrir_code_"
   UNSPEC_VSTRIR))]
   "TARGET_POWER10"
   

Re: [PATCH] arm: Implement vceqq_p64, vceqz_p64 and vceqzq_p64 intrinsics

2020-11-04 Thread Christophe Lyon via Gcc-patches
ping?
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556299.html

On Fri, 23 Oct 2020 at 19:20, Christophe Lyon
 wrote:
>
> ping?
>
> On Fri, 16 Oct 2020 at 10:41, Christophe Lyon
>  wrote:
> >
> > On Thu, 15 Oct 2020 at 20:10, Andrea Corallo  wrote:
> > >
> > > Hi Christophe,
> > >
> > > I've spotted two very minor issues.
> > >
> > > Christophe Lyon via Gcc-patches  writes:
> > >
> > > [...]
> > >
> > > > +/* For vceqq_p64, we rely on vceq_p64 for each of the two elements.  */
> > > > +__extension__ extern __inline uint64x2_t
> > > > +__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
> > > > +vceqq_p64 (poly64x2_t __a, poly64x2_t __b)
> > > > +{
> > > > +  poly64_t __high_a = vget_high_p64 (__a);
> > > > +  poly64_t __high_b = vget_high_p64 (__b);
> > > > +  uint64x1_t __high = vceq_p64(__high_a, __high_b);
> > > ^^^
> > >space
> >
> > Thanks for catching this, I'll fix it before committing if the rest is 
> > approved.
> >
> > Christophe
> >
> > > > +
> > > > +  poly64_t __low_a = vget_low_p64 (__a);
> > > > +  poly64_t __low_b = vget_low_p64 (__b);
> > > > +  uint64x1_t __low = vceq_p64(__low_a, __low_b);
> > >
> > > Same
> > >
> > > > +  return vcombine_u64 (__low, __high);
> > > > +}
> > > > +
> > > > +__extension__ extern __inline uint64x2_t
> > > > +__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
> > > > +vceqzq_p64 (poly64x2_t __a)
> > > > +{
> > > > +  poly64x2_t __b = vreinterpretq_p64_u32 (vdupq_n_u32 (0));
> > > > +  return vceqq_p64 (__a, __b);
> > > > +}
> > >
> > > Thanks
> > >
> > >   Andrea
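
The structure of the quoted intrinsics (compare each 64-bit half, then combine) can be modelled in portable C++. This is a sketch of the technique only, not the arm_neon.h code: `Lanes2`, `lane_eq`, `lanes_eq` and `lanes_eqz` are made-up names, and a two-element array stands in for a 128-bit vector. As with NEON equality compares, an equal lane yields all ones and an unequal lane yields zero.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a vector of two 64-bit lanes.
using Lanes2 = std::array<std::uint64_t, 2>;

// Single-lane compare: all ones when equal, zero otherwise.
std::uint64_t lane_eq(std::uint64_t a, std::uint64_t b)
{
  return a == b ? ~std::uint64_t{0} : std::uint64_t{0};
}

// Two-lane compare built from the single-lane one, low lane first,
// then both results combined into one value.
Lanes2 lanes_eq(const Lanes2& a, const Lanes2& b)
{
  const std::uint64_t low = lane_eq(a[0], b[0]);
  const std::uint64_t high = lane_eq(a[1], b[1]);
  return Lanes2{{low, high}};
}

// Compare-with-zero expressed through the general compare.
Lanes2 lanes_eqz(const Lanes2& a)
{
  return lanes_eq(a, Lanes2{{0, 0}});
}
```

Comparing `{1, 2}` with `{1, 3}` gives all ones in the low lane and zero in the high lane.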


Re: [Patch][i386][PR97715]: Fix a bug when adding -fzero-call-used-regs=all with -mno-80387

2020-11-04 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 4, 2020 at 9:16 PM Qing Zhao  wrote:
>
> As we discussed in the bug report, we should not zero stack registers when 
> there are no x87 registers available.
>
> The following is the fix per Jakub’s suggestion.
>
> And I have tested it on X86.
>
> Okay for commit?
>
> thanks.
>
> Qing
>
> From 0080f104df2dc752969a1949981ba343f276e802 Mon Sep 17 00:00:00 2001
> From: qing zhao 
> Date: Wed, 4 Nov 2020 20:46:15 +0100
> Subject: [PATCH] i386: Fix PR97715
>
> This change fixes a bug in the i386 backend when adding
> -fzero-call-used-regs=all on a target that has no x87
> registers.
>
> When there are no x87 registers available, we should not
> zero stack registers.
>
> gcc/ChangeLog:
>
> PR target/97715
> * config/i386/i386.c (zero_all_st_registers): Return
> earlier when the FPU is disabled.
>
> gcc/testsuite/ChangeLog:
>
> PR target/97715
> * gcc.target/i386/zero-scratch-regs-32.c: New test.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.c   |  5 +
>  gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c | 11 +++
>  2 files changed, 16 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 6fc6228a26e..789ef727cf8 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -3640,6 +3640,11 @@ zero_all_vector_registers (HARD_REG_SET 
> need_zeroed_hardregs)
>  static bool
>  zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
>  {
> +
> +  /* If the FPU is disabled, no need to zero all st registers.  */
> +  if (! (TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387))
> +return false;
> +
>unsigned int num_of_st = 0;
>for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>  if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c 
> b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c
> new file mode 100644
> index 000..ca3261fe5ea
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -mno-80387" } */
> +
> +int
> +foo (int x)
> +{
> +  return (x + 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "fldz" } } */
> +
> --
> 2.11.0
>


[Patch][i386][PR97715]: Fix a bug when adding -fzero-call-used-regs=all with -mno-80387

2020-11-04 Thread Qing Zhao via Gcc-patches
As we discussed in the bug report, we should not zero stack registers when 
there are no x87 registers available. 

The following is the fix per Jakub’s suggestion. 

And I have tested it on X86.

Okay for commit?

thanks.

Qing

From 0080f104df2dc752969a1949981ba343f276e802 Mon Sep 17 00:00:00 2001
From: qing zhao 
Date: Wed, 4 Nov 2020 20:46:15 +0100
Subject: [PATCH] i386: Fix PR97715

This change fixes a bug in the i386 backend when adding
-fzero-call-used-regs=all on a target that has no x87
registers.

When there are no x87 registers available, we should not
zero stack registers.

gcc/ChangeLog:

PR target/97715
* config/i386/i386.c (zero_all_st_registers): Return
earlier when the FPU is disabled.

gcc/testsuite/ChangeLog:

PR target/97715
* gcc.target/i386/zero-scratch-regs-32.c: New test.
---
 gcc/config/i386/i386.c   |  5 +
 gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c | 11 +++
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6fc6228a26e..789ef727cf8 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3640,6 +3640,11 @@ zero_all_vector_registers (HARD_REG_SET 
need_zeroed_hardregs)
 static bool
 zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
 {
+
+  /* If the FPU is disabled, no need to zero all st registers.  */
+  if (! (TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387))
+return false;
+
   unsigned int num_of_st = 0;
   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c 
b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c
new file mode 100644
index 000..ca3261fe5ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -mno-80387" } */
+
+int
+foo (int x)
+{
+  return (x + 1);
+}
+
+/* { dg-final { scan-assembler-not "fldz" } } */
+
-- 
2.11.0



Re: [PATCH] [PING] Asan changes for RISC-V.

2020-11-04 Thread Jim Wilson
On Wed, Oct 28, 2020 at 4:59 PM Jim Wilson  wrote:

> We have only riscv64 asan support, there is no riscv32 support as yet.  So I
> need to be able to conditionally enable asan support for the riscv target.  I
> implemented this by returning zero from the asan_shadow_offset function.  This
> requires a change to toplev.c and docs in target.def.
>
> The asan support works on a 5.5 kernel, but does not work on a 4.15 kernel.
> The problem is that the asan high memory region is a small wedge below
> 0x40.  The new kernel puts shared libraries at 0x3f and going
> down which works.  But the old kernel puts shared libraries at 0x20
> and going up which does not work, as it isn't in any recognized memory
> region.  This might be fixable with more asan work, but we don't really need
> support for old kernel versions.
>
> The asan port is curious in that it uses 1<<29 for the shadow offset, but all
> other 64-bit targets use a number larger than 1<<32.  But what we have is
> working OK for now.
>
> I did a make check RUNTESTFLAGS="asan.exp" on Fedora rawhide image running on
> qemu and the results look reasonable.
>
> === gcc Summary ===
>
> # of expected passes            1905
> # of unexpected failures        11
> # of unsupported tests          224
>
> === g++ Summary ===
>
> # of expected passes            2002
> # of unexpected failures        6
> # of unresolved testcases       1
> # of unsupported tests          175
>
> OK?
>
> Jim
>
> 2020-10-28  Jim Wilson  
>
> gcc/
> * config/riscv/riscv.c (riscv_asan_shadow_offset): New.
> (TARGET_ASAN_SHADOW_OFFSET): New.
> * doc/tm.texi: Regenerated.
> * target.def (asan_shadow_offset): Mention that it can return zero.
> * toplev.c (process_options): Check for and handle zero return from
> targetm.asan_shadow_offset call.
>
> Co-Authored-By: cooper.joshua 
> ---
>  gcc/config/riscv/riscv.c | 16 
>  gcc/doc/tm.texi  |  3 ++-
>  gcc/target.def   |  3 ++-
>  gcc/toplev.c |  3 ++-
>  4 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index 989a9f15250..6909e200de1 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -5299,6 +5299,19 @@ riscv_gpr_save_operation_p (rtx op)
>return true;
>  }
>
> +/* Implement TARGET_ASAN_SHADOW_OFFSET.  */
> +
> +static unsigned HOST_WIDE_INT
> +riscv_asan_shadow_offset (void)
> +{
> +  /* We only have libsanitizer support for RV64 at present.
> +
> + This number must match kRiscv*_ShadowOffset* in the file
> + libsanitizer/asan/asan_mapping.h which is currently 1<<29 for rv64,
> + even though 1<<36 makes more sense.  */
> +  return TARGET_64BIT ? (HOST_WIDE_INT_1 << 29) : 0;
> +}
> +
>  /* Initialize the GCC target structure.  */
>  #undef TARGET_ASM_ALIGNED_HI_OP
>  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> @@ -5482,6 +5495,9 @@ riscv_gpr_save_operation_p (rtx op)
>  #undef TARGET_NEW_ADDRESS_PROFITABLE_P
>  #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p
>
> +#undef TARGET_ASAN_SHADOW_OFFSET
> +#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>
>  #include "gt-riscv.h"
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 24c37f655c8..39c596b647a 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -12078,7 +12078,8 @@ is zero, which disables this optimization.
>  @deftypefn {Target Hook} {unsigned HOST_WIDE_INT}
> TARGET_ASAN_SHADOW_OFFSET (void)
>  Return the offset bitwise ored into shifted address to get corresponding
>  Address Sanitizer shadow memory address.  NULL if Address Sanitizer is not
> -supported by the target.
> +supported by the target.  May return 0 if Address Sanitizer is not
> supported
> +by a subtarget.
>  @end deftypefn
>
>  @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_MEMMODEL_CHECK
> (unsigned HOST_WIDE_INT @var{val})
> diff --git a/gcc/target.def b/gcc/target.def
> index ed2da154e30..268b56b6ebd 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -4452,7 +4452,8 @@ DEFHOOK
>  (asan_shadow_offset,
>   "Return the offset bitwise ored into shifted address to get
> corresponding\n\
>  Address Sanitizer shadow memory address.  NULL if Address Sanitizer is
> not\n\
> -supported by the target.",
> +supported by the target.  May return 0 if Address Sanitizer is not
> supported\n\
> +by a subtarget.",
>   unsigned HOST_WIDE_INT, (void),
>   NULL)
>
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 20e231f4d2a..cf89598252c 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1834,7 +1834,8 @@ process_options (void)
>  }
>
>if ((flag_sanitize & SANITIZE_USER_ADDRESS)
> -  && targetm.asan_shadow_offset == NULL)
> +  && ((targetm.asan_shadow_offset == NULL)
> + || (targetm.asan_shadow_offset () == 0)))
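
The convention this hunk introduces (a target may leave the hook unset, or set it and return zero for an unsupported subtarget) can be sketched in isolation. This is only an illustration: `target_hooks`, `rv32_offset`, `rv64_offset` and `asan_supported_p` are hypothetical names, not GCC's, and the real check lives in process_options.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the target hook table; only the one hook
// relevant here is modelled.
struct target_hooks
{
  std::uint64_t (*asan_shadow_offset) ();
};

// Hypothetical hooks: an RV32-like subtarget reports zero, an RV64-like
// one reports the 1<<29 offset mentioned in the patch.
static std::uint64_t rv32_offset () { return 0; }
static std::uint64_t rv64_offset () { return std::uint64_t (1) << 29; }

// Same shape as the new condition: asan is unusable when the hook is
// absent or when it reports a zero offset.
static bool
asan_supported_p (const target_hooks &t)
{
  return t.asan_shadow_offset != nullptr && t.asan_shadow_offset () != 0;
}
```

With these stand-ins, `asan_supported_p` is false for a table with a null hook and for one using `rv32_offset`, and true for one using `rv64_offset`.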

Go patch committed: Turn off -fipa-icf-functions

2020-11-04 Thread Ian Lance Taylor via Gcc-patches
Go code expects to be able to do a reliable backtrace and get correct
file/line information of callers.  This is broken by
-fipa-icf-functions, so this Go frontend patch disables that option by
default.  Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Committed to mainline.

Ian

* go-lang.c (go_langhook_post_options): Disable
-fipa-icf-functions if it was not explicitly enabled.
diff --git a/gcc/go/go-lang.c b/gcc/go/go-lang.c
index 2cfb41042bd..08c1f38a2c1 100644
--- a/gcc/go/go-lang.c
+++ b/gcc/go/go-lang.c
@@ -306,6 +306,12 @@ go_langhook_post_options (const char **pfilename ATTRIBUTE_UNUSED)
   SET_OPTION_IF_UNSET (&global_options, &global_options_set,
   flag_partial_inlining, 0);
 
+  /* Go programs expect runtime.Callers to give the right answers,
+ which means that we can't combine functions even if they look the
+ same.  */
+  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+  flag_ipa_icf_functions, 0);
+
   /* If the debug info level is still 1, as set in init_options, make
  sure that some debugging type is selected.  */
   if (global_options.x_debug_info_level == DINFO_LEVEL_TERSE
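
The "disable unless explicitly enabled" behaviour described in the ChangeLog entry can be sketched as follows. This is a simplified model under stated assumptions: `options`, `options_set`, `set_if_unset` and `final_flag` are hypothetical names for illustration, not the GCC macro or its data structures.

```cpp
#include <cassert>

// Hypothetical, heavily simplified model of an options structure and its
// companion "was this set explicitly?" structure.
struct options { int ipa_icf_functions; };
struct options_set { bool ipa_icf_functions; };

// Apply a front-end default only when the user gave no explicit setting.
static void
set_if_unset (options &opts, const options_set &set, int value)
{
  if (!set.ipa_icf_functions)
    opts.ipa_icf_functions = value;
}

// Return the flag's final value after the front-end default (0) is applied.
static int
final_flag (int initial, bool explicitly_set)
{
  options opts = { initial };
  options_set set = { explicitly_set };
  set_if_unset (opts, set, 0);
  return opts.ipa_icf_functions;
}
```

In this model a flag that was on only by default ends up off, while a flag the user turned on explicitly is left alone.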


Re: Testsuite fails on PowerPC with: Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all])

2020-11-04 Thread Qing Zhao via Gcc-patches



> On Nov 4, 2020, at 1:00 PM, Segher Boessenkool  
> wrote:
> 
> On Wed, Nov 04, 2020 at 01:20:58PM +, Richard Sandiford wrote:
>> Tobias Burnus  writes:
>>> Three of the testcases fail on PowerPC: 
>>> gcc.target/i386/zero-scratch-regs-{9,10,11}.c
>>>   powerpc64le-linux-gnu/default/gcc.d/zero-scratch-regs-10.c:77:1: sorry, 
>>> unimplemented: '-fzero-call-used_regs' not supported on this target
>>> 
>>> Did you miss some dg-require-effective-target ?
>> 
>> No, these are a signal to target maintainers that they need
>> to decide whether to add support or accept the status quo
>> (in which case a new effective-target will be needed).  See:
>> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557595.html:
>> 
>>The new tests are likely to fail on some targets with the sorry()
>>message, but I think target maintainers are best placed to decide
>>whether (a) that's a fundamental restriction of the target and the
>>tests should just be skipped or (b) the target needs to implement
>>the new hook.
> 
> But why are tests in gcc.target/i386/ run for other targets at all?!

No,  tests in gcc.target/i386 should not run for PowerPC.

What Tobias Burnus mentioned are the following tests:

powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
-Wc++-compat  (test for excess errors)
powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
-Wc++-compat  (test for excess errors)
powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
-Wc++-compat  (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
-std=gnu++98 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
-std=gnu++14 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
-std=gnu++17 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
-std=gnu++2a (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
-std=gnu++98 (test for excess errors)


They are under c-c++-common, not gcc.target/i386. 

These test cases are intentionally added for all platforms in order to check 
whether the current middle-end default implementation of
-fzero-call-used-regs works on the specific platform.

If the default implementation doesn't work for a specific platform, for 
example PowerPC, it's better for the PowerPC maintainer to decide
whether to skip these test cases on that platform or add a PowerPC 
implementation.

Qing
> 
> 
> Segher



Re: [PATCH] libstdc++: Implement C++20 features for <sstream>

2020-11-04 Thread Stephan Bergmann via Gcc-patches

On 07/10/2020 18:55, Thomas Rodgers wrote:

From: Thomas Rodgers 

New ctors and ::view() accessor for -
   * basic_stringbuf
   * basic_istringstream
   * basic_ostringstream
   * basic_stringstream

New ::get_allocator() accessor for basic_stringbuf.
I found that this 
"libstdc++: Implement C++20 features for <sstream>" changed the behavior of



$ cat test.cc
#include <iostream>
#include <iterator>
#include <sstream>
int main() {
  std::stringstream s("a");
  std::istreambuf_iterator<char> i(s);
  if (i != std::istreambuf_iterator<char>()) std::cout << *i << '\n';
}

$ g++ -std=c++20 test.cc
$ ./a.out


from printing "a" to printing nothing.  (The `i != ...` comparison 
appears to change i from pointing at "a" to pointing to null, and 
returns false.)


I ran into this when building LibreOffice, and I hope test.cc is a 
faithfully minimized reproducer.  However, I know little about 
std::istreambuf_iterator, so it may well be that the code isn't even valid.




Re: [PATCH] libstdc++: Add support for C++20 barriers

2020-11-04 Thread Thomas Rodgers



> On Nov 4, 2020, at 10:50 AM, Jonathan Wakely  wrote:
> 
> On 04/11/20 09:29 -0800, Thomas Rodgers wrote:
>> From: Thomas Rodgers 
>> 
>> Adds <barrier>
>> 
>> libstdc++/ChangeLog:
>> 
>>  * include/Makefile.am (std_headers): Add new header.
>>  * include/Makefile.in: Regenerate.
>>  * include/std/barrier: New file.
>>  * testsuite/30_thread/barrier/1.cc: New test.
>>  * testsuite/30_thread/barrier/2.cc: Likewise.
>>  * testsuite/30_thread/barrier/arrive_and_drop.cc: Likewise.
>>  * testsuite/30_thread/barrier/arrive_and_wait.cc: Likewise.
>>  * testsuite/30_thread/barrier/arrive.cc: Likewise.
>>  * testsuite/30_thread/barrier/completion.cc: Likewise.
>>  * testsuite/30_thread/barrier/max.cc: Likewise.
>> ---
>> libstdc++-v3/include/std/barrier  | 248 ++
>> .../testsuite/30_threads/barrier/1.cc |  27 ++
>> .../testsuite/30_threads/barrier/2.cc |  27 ++
>> .../testsuite/30_threads/barrier/arrive.cc|  51 
>> .../30_threads/barrier/arrive_and_drop.cc |  49 
>> .../30_threads/barrier/arrive_and_wait.cc |  51 
>> .../30_threads/barrier/completion.cc  |  54 
>> .../testsuite/30_threads/barrier/max.cc   |  44 
>> 8 files changed, 551 insertions(+)
>> create mode 100644 libstdc++-v3/include/std/barrier
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/1.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/2.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
>> create mode 100644 
>> libstdc++-v3/testsuite/30_threads/barrier/arrive_and_drop.cc
>> create mode 100644 
>> libstdc++-v3/testsuite/30_threads/barrier/arrive_and_wait.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/completion.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/max.cc
>> 
>> diff --git a/libstdc++-v3/include/std/barrier 
>> b/libstdc++-v3/include/std/barrier
>> new file mode 100644
>> index 000..80e6d668cf5
>> --- /dev/null
>> +++ b/libstdc++-v3/include/std/barrier
>> @@ -0,0 +1,248 @@
>> +//  -*- C++ -*-
>> +
>> +// Copyright (C) 2020 Free Software Foundation, Inc.
>> +//
>> +// This file is part of the GNU ISO C++ Library.  This library is free
>> +// software; you can redistribute it and/or modify it under the
>> +// terms of the GNU General Public License as published by the
>> +// Free Software Foundation; either version 3, or (at your option)
>> +// any later version.
>> +
>> +// This library is distributed in the hope that it will be useful,
>> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +// GNU General Public License for more details.
>> +
>> +// You should have received a copy of the GNU General Public License along
>> +// with this library; see the file COPYING3.  If not see
>> +// <http://www.gnu.org/licenses/>.
>> +
>> +// This implementation is based on libcxx/include/barrier
>> +//===-- barrier.h --===//
>> +//
>> +// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
>> Exceptions.
>> +// See https://llvm.org/LICENSE.txt for license information.
>> +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
>> +//
>> +//===---===//
>> +
>> +#ifndef _GLIBCXX_BARRIER
>> +#define _GLIBCXX_BARRIER 1
>> +
>> +#pragma GCC system_header
>> +
>> +#if __cplusplus > 201703L
>> +#define __cpp_lib_barrier 201907L
> 
> This feature test macro will be defined unconditionally, even if
> _GLIBCXX_HAS_GTHREADS is not defined. It should be inside the check
> for gthreads.
> 
> You're also missing an edit to <version> (which should depend on the
> same conditions).
> 
> 
>> +#include 
>> +
>> +#if defined(_GLIBCXX_HAS_GTHREADS)
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
>> +namespace std _GLIBCXX_VISIBILITY(default)
>> +{
>> +_GLIBCXX_BEGIN_NAMESPACE_VERSION
>> +
>> +  struct __empty_completion
>> +  {
>> +_GLIBCXX_ALWAYS_INLINE void
>> +operator()() noexcept
>> +{ }
>> +  };
>> +
>> +/*
>> +
>> +The default implementation of __barrier_base is a classic tree barrier.
>> +
>> +It looks different from literature pseudocode for two main reasons:
>> + 1. Threads that call into std::barrier functions do not provide indices,
>> +so a numbering step is added before the actual barrier algorithm,
>> +appearing as an N+1 round to the N rounds of the tree barrier.
>> + 2. A great deal of attention has been paid to avoid cache line thrashing
>> +by flattening the tree structure into cache-line sized arrays, that
>> +are indexed in an efficient way.
>> +
>> +*/
>> +
>> +  using __barrier_phase_t = uint8_t;
> 
> Please add  or  since you're using uint8_t
> (it's currently included by  but that could
> change).
> 
> Would it work to use a scoped enumeration type here 

[10/32] config

2020-11-04 Thread Nathan Sidwell

I managed to flub sending this yesterday.

These are the gcc/configure.ac changes (rebuild configure and config.h.in 
after applying).  Generally just checking for network-related 
functionality.  If it's not available, those features of the module 
mapper will be unavailable.


nathan
--
Nathan Sidwell
diff --git c/gcc/configure.ac w/gcc/configure.ac
index 73034bb902b..168a3bc3625 100644
--- c/gcc/configure.ac
+++ w/gcc/configure.ac
@@ -1417,8 +1419,8 @@ define(gcc_UNLOCKED_FUNCS, clearerr_unlocked feof_unlocked dnl
   putchar_unlocked putc_unlocked)
 AC_CHECK_FUNCS(times clock kill getrlimit setrlimit atoq \
 	popen sysconf strsignal getrusage nl_langinfo \
-	gettimeofday mbstowcs wcswidth mmap setlocale \
-	gcc_UNLOCKED_FUNCS madvise mallinfo mallinfo2)
+	gettimeofday mbstowcs wcswidth mmap memrchr posix_fallocate setlocale \
+	gcc_UNLOCKED_FUNCS madvise mallinfo execv mallinfo2 fstatat)
 
 if test x$ac_cv_func_mbstowcs = xyes; then
   AC_CACHE_CHECK(whether mbstowcs works, gcc_cv_func_mbstowcs_works,
@@ -1440,6 +1442,10 @@ fi
 
 AC_CHECK_TYPE(ssize_t, int)
 AC_CHECK_TYPE(caddr_t, char *)
+AC_CHECK_TYPE(sighandler_t,
+  AC_DEFINE(HAVE_SIGHANDLER_T, 1,
+[Define if <signal.h> defines sighandler_t]),
+,signal.h)
 
 GCC_AC_FUNC_MMAP_BLACKLIST
 
@@ -1585,6 +1591,146 @@ if test $ac_cv_f_setlkw = yes; then
   [Define if F_SETLKW supported by fcntl.])
 fi
 
+# Check if O_CLOEXEC is defined by fcntl
+AC_CACHE_CHECK(for O_CLOEXEC, ac_cv_o_cloexec, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <fcntl.h>]], [[
+return open ("/dev/null", O_RDONLY | O_CLOEXEC);]])],
+[ac_cv_o_cloexec=yes],[ac_cv_o_cloexec=no])])
+if test $ac_cv_o_cloexec = yes; then
+  AC_DEFINE(HOST_HAS_O_CLOEXEC, 1,
+  [Define if O_CLOEXEC supported by fcntl.])
+fi
+
+# C++ Modules would like some networking features to provide the mapping
+# server.  You can still use modules without them though.
+# The following network-related checks could probably do with some
+# Windows and other non-linux defenses and checking.
+
+# Local socket connectivity wants AF_UNIX networking
+# Check for AF_UNIX networking
+AC_CACHE_CHECK(for AF_UNIX, ac_cv_af_unix, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <netinet/in.h>]],[[
+sockaddr_un un;
+un.sun_family = AF_UNSPEC;
+int fd = socket (AF_UNIX, SOCK_STREAM, 0);
+connect (fd, (sockaddr *)&un, sizeof (un));]])],
+[ac_cv_af_unix=yes],
+[ac_cv_af_unix=no])])
+if test $ac_cv_af_unix = yes; then
+  AC_DEFINE(HAVE_AF_UNIX, 1,
+  [Define if AF_UNIX supported.])
+fi
+
+# Remote socket connectivity wants AF_INET6 networking
+# Check for AF_INET6 networking
+AC_CACHE_CHECK(for AF_INET6, ac_cv_af_inet6, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <netdb.h>]],[[
+sockaddr_in6 in6;
+in6.sin6_family = AF_UNSPEC;
+struct addrinfo *addrs = 0;
+struct addrinfo hints;
+hints.ai_flags = 0;
+hints.ai_family = AF_INET6;
+hints.ai_socktype = SOCK_STREAM;
+hints.ai_protocol = 0;
+hints.ai_canonname = 0;
+hints.ai_addr = 0;
+hints.ai_next = 0;
+int e = getaddrinfo ("localhost", 0, &hints, &addrs);
+const char *str = gai_strerror (e);
+freeaddrinfo (addrs);
+int fd = socket (AF_INET6, SOCK_STREAM, 0);
+connect (fd, (sockaddr *)&in6, sizeof (in6));]])],
+[ac_cv_af_inet6=yes],
+[ac_cv_af_inet6=no])])
+if test $ac_cv_af_inet6 = yes; then
+  AC_DEFINE(HAVE_AF_INET6, 1,
+  [Define if AF_INET6 supported.])
+fi
+
+# Efficient server response wants epoll
+# Check for epoll_create, epoll_ctl, epoll_pwait
+AC_CACHE_CHECK(for epoll, ac_cv_epoll, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <sys/epoll.h>]],[[
+int fd = epoll_create (1);
+epoll_event ev;
+ev.events = EPOLLIN;
+ev.data.fd = 0;
+epoll_ctl (fd, EPOLL_CTL_ADD, 0, &ev);
+epoll_pwait (fd, 0, 0, -1, 0);]])],
+[ac_cv_epoll=yes],
+[ac_cv_epoll=no])])
+if test $ac_cv_epoll = yes; then
+  AC_DEFINE(HAVE_EPOLL, 1,
+  [Define if epoll_create, epoll_ctl, epoll_pwait provided.])
+fi
+
+# If we can't use epoll, try pselect.
+# Check for pselect
+AC_CACHE_CHECK(for pselect, ac_cv_pselect, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <sys/select.h>]],[[
+pselect (0, 0, 0, 0, 0, 0);]])],
+[ac_cv_pselect=yes],
+[ac_cv_pselect=no])])
+if test $ac_cv_pselect = yes; then
+  AC_DEFINE(HAVE_PSELECT, 1,
+  [Define if pselect provided.])
+fi
+
+# And failing that, use good old select.
+# If we can't even use this, the server is serialized.
+# Check for select
+AC_CACHE_CHECK(for select, ac_cv_select, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <sys/select.h>]],[[
+select (0, 0, 0, 0, 0);]])],
+[ac_cv_select=yes],
+[ac_cv_select=no])])
+if test $ac_cv_select = yes; then
+  AC_DEFINE(HAVE_SELECT, 1,
+  [Define if select provided.])
+fi
+
+# Avoid some fcntl calls by using accept4, when available.
+# Check for accept4
+AC_CACHE_CHECK(for accept4, ac_cv_accept4, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <sys/socket.h>]],[[
+int err = accept4 (1, 0, 0, SOCK_NONBLOCK);]])],
+[ac_cv_accept4=yes],
+[ac_cv_accept4=no])])
+if test $ac_cv_accept4 = yes; then
+  AC_DEFINE(HAVE_ACCEPT4, 1,
+  [Define if accept4 provided.])

[PATCH] c++: Use two levels of caching in satisfy_atom

2020-11-04 Thread Patrick Palka via Gcc-patches
[ This patch depends on

  c++: Reuse identical ATOMIC_CONSTRs during normalization

  https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557929.html  ]

This improves the effectiveness of caching in satisfy_atom by querying
the cache again after we've instantiated the atom's parameter mapping.

Before instantiating its mapping, the identity of an (atom,args) pair
within the satisfaction cache is determined by idiosyncratic things such
as the level and index of each template parameter used in targets of the
parameter mapping.  For example, the associated constraints of foo in

  template <class T> concept range = range_v<T>;
  template <class U, class V> void foo () requires range<U> && range<V>;

are range_v<T> (with mapping T -> U) /\ range_v<T> (with mapping T -> V).
If during satisfaction the template arguments supplied for U and V are
the same, then the satisfaction value of these two atoms will be the
same (despite their uninstantiated parameter mappings being different).

But sat_cache doesn't see this because it compares the uninstantiated
parameter mapping and the supplied template arguments of sat_entries
independently.  So satisfy_atom currently will end up fully evaluating
the latter atom instead of reusing the satisfaction value of the former.

But there is a point when the two atoms do look the same to sat_cache,
and that's after instantiating their parameter mappings.  By querying
the cache again at this point, we're at least able to avoid substituting
the instantiated mapping into the second atom's expression.
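
The two-level lookup described above can be modelled with ordinary containers. This is a rough sketch under loud assumptions: atoms are reduced to an id, an expression string and a single mapped parameter, template arguments are strings, and `atom`, `targs`, `evaluate`, `sat_cache_sketch` and `evaluations_for_foo` are hypothetical names that only mimic the shape of satisfy_atom, not its implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical model of an atom: an identity, an expression, and the
// template parameter its mapping targets (T -> target).
struct atom { int id; std::string expr; std::string target; };
using targs = std::map<std::string, std::string>;

// Stand-in for the expensive step: substituting into the expression and
// evaluating it.  Here anything but "void" is considered satisfied.
static bool
evaluate (const std::string &, const std::string &type)
{
  return type != "void";
}

struct sat_cache_sketch
{
  // First level: keyed on the atom's identity plus the supplied arguments.
  std::map<std::pair<int, targs>, bool> level1;
  // Second level: keyed on the expression plus the instantiated mapping.
  std::map<std::pair<std::string, std::string>, bool> level2;
  int full_evaluations = 0;

  bool satisfy (const atom &a, const targs &args)
  {
    auto k1 = std::make_pair (a.id, args);
    auto it1 = level1.find (k1);
    if (it1 != level1.end ())
      return it1->second;

    // Instantiate the mapping, e.g. T -> U becomes T -> int.
    std::string inst = args.at (a.target);
    auto k2 = std::make_pair (a.expr, inst);
    auto it2 = level2.find (k2);
    bool result;
    if (it2 != level2.end ())
      result = it2->second;          // second query hits: skip evaluation
    else
      {
        ++full_evaluations;
        result = evaluate (a.expr, inst);
        level2[k2] = result;
      }
    level1[k1] = result;             // cache under both keys
    return result;
  }
};

// Count full evaluations for foo<U, V> from the example above.
static int
evaluations_for_foo (const std::string &u, const std::string &v)
{
  sat_cache_sketch cache;
  targs args = { { "U", u }, { "V", v } };
  atom first = { 1, "range_v<T>", "U" };
  atom second = { 2, "range_v<T>", "V" };
  cache.satisfy (first, args);
  cache.satisfy (second, args);
  return cache.full_evaluations;
}
```

In this model, when U and V are the same type the second atom misses the first-level cache but hits the second, so only one full evaluation happens; with different types both atoms are evaluated.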

With this patch, compile time and memory usage for the cmcstl2 test
test/algorithm/set_symmetric_diference4.cpp drop from 11s/1.4GB to
8.5s/1.2GB with an --enable-checking=release compiler.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

* cp-tree.h (ATOMIC_CONSTR_MAP_INSTANTIATED_P): Define this flag
for ATOMIC_CONSTRs.
* constraint.cc (sat_hasher::hash): Use hash_atomic_constraint
if the flag is set, otherwise keep using a pointer hash.
(sat_hasher::equal): Return false if the flag's setting differs
on two atoms.  Call atomic_constraints_identical_p if the flag
is set, otherwise keep using a pointer equality test.
(satisfy_atom): After instantiating the parameter mapping, form
another ATOMIC_CONSTR using the instantiated mapping and query
the cache again.  Cache the satisfaction value of both atoms.
(diagnose_atomic_constraint): Simplify now that the supplied
atom has an instantiated mapping.
---
 gcc/cp/constraint.cc | 47 +++-
 gcc/cp/cp-tree.h |  6 ++
 2 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 55dba362ca5..c612bfba13b 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2315,12 +2315,32 @@ struct sat_hasher : ggc_ptr_hash<sat_entry>
 {
   static hashval_t hash (sat_entry *e)
   {
+if (ATOMIC_CONSTR_MAP_INSTANTIATED_P (e->constr))
+  {
+   gcc_assert (!e->args);
+   return hash_atomic_constraint (e->constr);
+  }
+
 hashval_t value = htab_hash_pointer (e->constr);
 return iterative_hash_template_arg (e->args, value);
   }
 
   static bool equal (sat_entry *e1, sat_entry *e2)
   {
+if (ATOMIC_CONSTR_MAP_INSTANTIATED_P (e1->constr)
+   != ATOMIC_CONSTR_MAP_INSTANTIATED_P (e2->constr))
+  return false;
+
+if (ATOMIC_CONSTR_MAP_INSTANTIATED_P (e1->constr))
+  {
+   /* Atoms with instantiated mappings are built in satisfy_atom.  */
+   gcc_assert (!e1->args && !e2->args);
+   return atomic_constraints_identical_p (e1->constr, e2->constr);
+  }
+
+/* Atoms with uninstantiated mappings are built in normalize_atom.
+   Their identity is determined by their pointer value due to
+   the caching of ATOMIC_CONSTRs performed therein.  */
 if (e1->constr != e2->constr)
   return false;
 return template_args_equal (e1->args, e2->args);
@@ -2614,6 +2634,18 @@ satisfy_atom (tree t, tree args, subst_info info)
   return cache.save (boolean_false_node);
 }
 
+  /* Now build a new atom using the instantiated mapping.  We use
+ this atom as a second key to the satisfaction cache, and we
+ also pass it to diagnose_atomic_constraint so that diagnostics
+ which refer to the atom display the instantiated mapping.  */
+  t = copy_node (t);
+  ATOMIC_CONSTR_MAP (t) = map;
+  gcc_assert (!ATOMIC_CONSTR_MAP_INSTANTIATED_P (t));
+  ATOMIC_CONSTR_MAP_INSTANTIATED_P (t) = true;
+  satisfaction_cache inst_cache (t, /*args=*/NULL_TREE, info.complain);
+  if (tree r = inst_cache.get ())
+return cache.save (r);
+
   /* Rebuild the argument vector from the parameter mapping.  */
   args = get_mapped_args (map);
 
@@ -2626,19 +2658,19 @@ satisfy_atom (tree t, tree args, subst_info info)
 is not satisfied. Replay the substitution.  */
   if (info.noisy ())

Re: [patch] Add dg-require-effective-target fpic to an aarch64 specific test in gcc.dg

2020-11-04 Thread Richard Sandiford via Gcc-patches
Olivier Hainque  writes:
> Hello,
>
> This patch adds dg-require-effective-target fpic
> to an aarch64 specific gcc.dg test using -fPIC,
> which helps circumvent a failure we observed while
> testing the aarch64 port for VxWorks.
>
> ok to commit ?

OK, thanks.  Also OK for any other current or future aarch64 test that
has -fpic or -fPIC in the options and forgets to do this.

Richard


Re: [ping] aarch64: move and adjust PROBE_STACK_*_REG

2020-11-04 Thread Richard Sandiford via Gcc-patches
Olivier Hainque  writes:
> Ping, please ?
>
> Patch re-attached for convenience.

Looks OK to me, and I assume Richard would have spoken up by now if
he didn't think the patch did what he wanted.

> +;; The pair of scratch registers used for stack probing with -fstack-check.
> +;; Leave R9 alone as a possible choice for the static chain.
> +(PROBE_STACK_FIRST_REGNUM  10)
> +(PROBE_STACK_SECOND_REGNUM 11)
>  ;; Scratch register used by stack clash protection to calculate
>  ;; SVE CFA offsets during probing.
>  (STACK_CLASH_SVE_CFA_REGNUM 11)

It's a bit concerning that the second register now overlaps
STACK_CLASH_SVE_CFA_REGNUM, but I agree that isn't a problem
in practice, since the two uses are currently mutually-exclusive.
I think it might be worth having a comment about that.  So maybe add:

;; Note that the use of these registers is mutually exclusive with the use
;; of STACK_CLASH_SVE_CFA_REGNUM, which is for -fstack-clash-protection
;; rather than -fstack-check.

to the new comment above.

OK with that change, thanks.  Sorry for the long delay in the review.

Richard


Re: Testsuite fails on PowerPC with: Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all])

2020-11-04 Thread Segher Boessenkool
On Wed, Nov 04, 2020 at 01:20:58PM +, Richard Sandiford wrote:
> Tobias Burnus  writes:
> > Three of the testcases fail on PowerPC: 
> > gcc.target/i386/zero-scratch-regs-{9,10,11}.c
> >powerpc64le-linux-gnu/default/gcc.d/zero-scratch-regs-10.c:77:1: sorry, 
> > unimplemented: '-fzero-call-used_regs' not supported on this target
> >
> > Did you miss some dg-require-effective-target ?
> 
> No, these are a signal to target maintainers that they need
> to decide whether to add support or accept the status quo
> (in which case a new effective-target will be needed).  See:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557595.html:
> 
> The new tests are likely to fail on some targets with the sorry()
> message, but I think target maintainers are best placed to decide
> whether (a) that's a fundamental restriction of the target and the
> tests should just be skipped or (b) the target needs to implement
> the new hook.

But why are tests in gcc.target/i386/ run for other targets at all?!


Segher


Re: [PATCH] libstdc++: Add support for C++20 barriers

2020-11-04 Thread Thomas Rodgers



> On Nov 4, 2020, at 10:52 AM, Jonathan Wakely  wrote:
> 
> On 04/11/20 10:41 -0800, Thomas Rodgers wrote:
>> From: Thomas Rodgers 
>> 
>> IGNORE the previous version of this patch please.
> 
> OK, but all my comments seem to apply to this one too.
> 

Sure :)

>> Adds <barrier>
>> 
>> libstdc++/ChangeLog:
>> 
>>  * include/Makefile.am (std_headers): Add new header.
>>  * include/Makefile.in: Regenerate.
>>  * include/std/barrier: New file.
>>  * testsuite/30_thread/barrier/1.cc: New test.
>>  * testsuite/30_thread/barrier/2.cc: Likewise.
>>  * testsuite/30_thread/barrier/arrive_and_drop.cc: Likewise.
>>  * testsuite/30_thread/barrier/arrive_and_wait.cc: Likewise.
>>  * testsuite/30_thread/barrier/arrive.cc: Likewise.
>>  * testsuite/30_thread/barrier/completion.cc: Likewise.
>>  * testsuite/30_thread/barrier/max.cc: Likewise.
>> ---
>> libstdc++-v3/include/Makefile.am  |   1 +
>> libstdc++-v3/include/Makefile.in  |   1 +
>> libstdc++-v3/include/bits/atomic_base.h   |  11 +-
>> libstdc++-v3/include/std/barrier  | 248 ++
>> libstdc++-v3/include/std/version  |   1 +
>> .../testsuite/30_threads/barrier/1.cc |  27 ++
>> .../testsuite/30_threads/barrier/2.cc |  27 ++
>> .../testsuite/30_threads/barrier/arrive.cc|  51 
>> .../30_threads/barrier/arrive_and_drop.cc |  49 
>> .../30_threads/barrier/arrive_and_wait.cc |  51 
>> .../30_threads/barrier/completion.cc  |  54 
>> .../testsuite/30_threads/barrier/max.cc   |  44 
>> 12 files changed, 562 insertions(+), 3 deletions(-)
>> create mode 100644 libstdc++-v3/include/std/barrier
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/1.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/2.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
>> create mode 100644 
>> libstdc++-v3/testsuite/30_threads/barrier/arrive_and_drop.cc
>> create mode 100644 
>> libstdc++-v3/testsuite/30_threads/barrier/arrive_and_wait.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/completion.cc
>> create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/max.cc
>> 
>> diff --git a/libstdc++-v3/include/Makefile.am 
>> b/libstdc++-v3/include/Makefile.am
>> index 382e94322c1..9e497835ee0 100644
>> --- a/libstdc++-v3/include/Makefile.am
>> +++ b/libstdc++-v3/include/Makefile.am
>> @@ -30,6 +30,7 @@ std_headers = \
>>  ${std_srcdir}/any \
>>  ${std_srcdir}/array \
>>  ${std_srcdir}/atomic \
>> +${std_srcdir}/barrier \
>>  ${std_srcdir}/bit \
>>  ${std_srcdir}/bitset \
>>  ${std_srcdir}/charconv \
>> diff --git a/libstdc++-v3/include/bits/atomic_base.h 
>> b/libstdc++-v3/include/bits/atomic_base.h
>> index dd4db926592..1ad34719d3e 100644
>> --- a/libstdc++-v3/include/bits/atomic_base.h
>> +++ b/libstdc++-v3/include/bits/atomic_base.h
>> @@ -603,13 +603,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>  }
>> 
>> #if __cplusplus > 201703L
>> +  template
>> +_GLIBCXX_ALWAYS_INLINE void
>> +_M_wait(__int_type __old, const _Func& __fn) const noexcept
>> +{ std::__atomic_wait(&_M_i, __old, __fn); }
>> +
>>  _GLIBCXX_ALWAYS_INLINE void
>>  wait(__int_type __old,
>>memory_order __m = memory_order_seq_cst) const noexcept
>>  {
>> -std::__atomic_wait(&_M_i, __old,
>> -   [__m, this, __old]
>> -   { return this->load(__m) != __old; });
>> +_M_wait(__old,
>> +[__m, this, __old]
>> +{ return this->load(__m) != __old; });
>>  }
> 
> This looks like it's not meant to be part of this patch.
> 
> It also looks wrong for any patch, because it adds _M_wait as a public
> member.
> 
> Not sure what this piece is for :-)
> 

It is used at include/std/barrier:197 to keep the implementation as close as 
possible to the libc++ version upon which it is based.


> 



Re: [PATCH] libstdc++: Add support for C++20 barriers

2020-11-04 Thread Jonathan Wakely via Gcc-patches

On 04/11/20 10:41 -0800, Thomas Rodgers wrote:

From: Thomas Rodgers 

IGNORE the previous version of this patch please.


OK, but all my comments seem to apply to this one too.


Adds <barrier>

libstdc++/ChangeLog:

* include/Makefile.am (std_headers): Add new header.
* include/Makefile.in: Regenerate.
* include/std/barrier: New file.
* testsuite/30_threads/barrier/1.cc: New test.
* testsuite/30_threads/barrier/2.cc: Likewise.
* testsuite/30_threads/barrier/arrive_and_drop.cc: Likewise.
* testsuite/30_threads/barrier/arrive_and_wait.cc: Likewise.
* testsuite/30_threads/barrier/arrive.cc: Likewise.
* testsuite/30_threads/barrier/completion.cc: Likewise.
* testsuite/30_threads/barrier/max.cc: Likewise.
---
libstdc++-v3/include/Makefile.am  |   1 +
libstdc++-v3/include/Makefile.in  |   1 +
libstdc++-v3/include/bits/atomic_base.h   |  11 +-
libstdc++-v3/include/std/barrier  | 248 ++
libstdc++-v3/include/std/version  |   1 +
.../testsuite/30_threads/barrier/1.cc |  27 ++
.../testsuite/30_threads/barrier/2.cc |  27 ++
.../testsuite/30_threads/barrier/arrive.cc|  51 
.../30_threads/barrier/arrive_and_drop.cc |  49 
.../30_threads/barrier/arrive_and_wait.cc |  51 
.../30_threads/barrier/completion.cc  |  54 
.../testsuite/30_threads/barrier/max.cc   |  44 
12 files changed, 562 insertions(+), 3 deletions(-)
create mode 100644 libstdc++-v3/include/std/barrier
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/1.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/2.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_drop.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_wait.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/completion.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/max.cc

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 382e94322c1..9e497835ee0 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -30,6 +30,7 @@ std_headers = \
${std_srcdir}/any \
${std_srcdir}/array \
${std_srcdir}/atomic \
+   ${std_srcdir}/barrier \
${std_srcdir}/bit \
${std_srcdir}/bitset \
${std_srcdir}/charconv \
diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index dd4db926592..1ad34719d3e 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -603,13 +603,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  }

#if __cplusplus > 201703L
+  template<typename _Func>
+   _GLIBCXX_ALWAYS_INLINE void
+   _M_wait(__int_type __old, const _Func& __fn) const noexcept
+   { std::__atomic_wait(&_M_i, __old, __fn); }
+
  _GLIBCXX_ALWAYS_INLINE void
  wait(__int_type __old,
  memory_order __m = memory_order_seq_cst) const noexcept
  {
-   std::__atomic_wait(&_M_i, __old,
-  [__m, this, __old]
-  { return this->load(__m) != __old; });
+   _M_wait(__old,
+   [__m, this, __old]
+   { return this->load(__m) != __old; });
  }


This looks like it's not meant to be part of this patch.

It also looks wrong for any patch, because it adds _M_wait as a public
member.

Not sure what this piece is for :-)




Re: [PATCH] libstdc++: Add support for C++20 barriers

2020-11-04 Thread Jonathan Wakely via Gcc-patches

On 04/11/20 09:29 -0800, Thomas Rodgers wrote:

From: Thomas Rodgers 

Adds <barrier>

libstdc++/ChangeLog:

* include/Makefile.am (std_headers): Add new header.
* include/Makefile.in: Regenerate.
* include/std/barrier: New file.
* testsuite/30_threads/barrier/1.cc: New test.
* testsuite/30_threads/barrier/2.cc: Likewise.
* testsuite/30_threads/barrier/arrive_and_drop.cc: Likewise.
* testsuite/30_threads/barrier/arrive_and_wait.cc: Likewise.
* testsuite/30_threads/barrier/arrive.cc: Likewise.
* testsuite/30_threads/barrier/completion.cc: Likewise.
* testsuite/30_threads/barrier/max.cc: Likewise.
---
libstdc++-v3/include/std/barrier  | 248 ++
.../testsuite/30_threads/barrier/1.cc |  27 ++
.../testsuite/30_threads/barrier/2.cc |  27 ++
.../testsuite/30_threads/barrier/arrive.cc|  51 
.../30_threads/barrier/arrive_and_drop.cc |  49 
.../30_threads/barrier/arrive_and_wait.cc |  51 
.../30_threads/barrier/completion.cc  |  54 
.../testsuite/30_threads/barrier/max.cc   |  44 
8 files changed, 551 insertions(+)
create mode 100644 libstdc++-v3/include/std/barrier
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/1.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/2.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_drop.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_wait.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/completion.cc
create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/max.cc

diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
new file mode 100644
index 000..80e6d668cf5
--- /dev/null
+++ b/libstdc++-v3/include/std/barrier
@@ -0,0 +1,248 @@
+//  -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// This implementation is based on libcxx/include/barrier
+//===-- barrier.h --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===---===//
+
+#ifndef _GLIBCXX_BARRIER
+#define _GLIBCXX_BARRIER 1
+
+#pragma GCC system_header
+
+#if __cplusplus > 201703L
+#define __cpp_lib_barrier 201907L


This feature test macro will be defined unconditionally, even if
_GLIBCXX_HAS_GTHREADS is not defined. It should be inside the check
for gthreads.

You're also missing an edit to <version> (which should depend on the
same conditions).
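
As a minimal sketch of the guard ordering being asked for (the macro names
DEMO_HAS_THREADS and demo_lib_barrier are hypothetical stand-ins for
_GLIBCXX_HAS_GTHREADS and __cpp_lib_barrier, not the real libstdc++ macros),
the feature test macro is only defined once both conditions hold:

```cpp
#include <cassert>

// Hypothetical stand-in for _GLIBCXX_HAS_GTHREADS; assumed to be set by
// the library configuration when thread support is available.
#define DEMO_HAS_THREADS 1

#if __cplusplus > 201703L
# if defined(DEMO_HAS_THREADS)
// Hypothetical stand-in for __cpp_lib_barrier: defined only inside BOTH
// checks, so it is never advertised when the header would be empty.
#  define demo_lib_barrier 201907L
# endif
#endif

// Reports whether the (stand-in) feature test macro ended up defined.
constexpr bool barrier_advertised()
{
#ifdef demo_lib_barrier
  return true;
#else
  return false;
#endif
}

// Small self-check: with threads assumed available, the macro is
// advertised exactly when the language version is new enough.
inline void check_guard()
{ assert(barrier_advertised() == (__cplusplus > 201703L)); }
```

The same pair of conditions would then be repeated wherever else the macro
is defined, so the two definitions cannot disagree.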



+#include 
+
+#if defined(_GLIBCXX_HAS_GTHREADS)
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  struct __empty_completion
+  {
+_GLIBCXX_ALWAYS_INLINE void
+operator()() noexcept
+{ }
+  };
+
+/*
+
+The default implementation of __barrier_base is a classic tree barrier.
+
+It looks different from literature pseudocode for two main reasons:
+ 1. Threads that call into std::barrier functions do not provide indices,
+so a numbering step is added before the actual barrier algorithm,
+appearing as an N+1 round to the N rounds of the tree barrier.
+ 2. A great deal of attention has been paid to avoid cache line thrashing
+by flattening the tree structure into cache-line sized arrays, that
+are indexed in an efficient way.
+
+*/
+
+  using __barrier_phase_t = uint8_t;


Please add <stdint.h> or <cstdint> since you're using uint8_t
(it's currently included indirectly, but that could change).

Would it work to use a scoped enumeration type here instead? That
would prevent people accidentally doing arithmetic on it, or passing
it to functions taking an integer (and prevent it promoting to int in
arithmetic).

e.g. define it similar to std::byte:

enum class __barrier_phase_t : unsigned char { };

and then cast to an integer on the way in and the way out, so that the
implementation works with its numeric value, but users have a
non-arithmetic type that 
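
A rough sketch of the scoped-enumeration idea above, using hypothetical
names (phase_t, to_value, next_phase) rather than anything from the patch:
the value is cast to an integer on the way in and back to the opaque type
on the way out, and the type itself refuses implicit arithmetic.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the suggested __barrier_phase_t,
// defined in the same style as std::byte.
enum class phase_t : unsigned char { };

// Cast to the numeric value on the way in...
constexpr unsigned char to_value(phase_t p) noexcept
{ return static_cast<unsigned char>(p); }

// ...do the arithmetic on the integer, and cast back on the way out.
// The increment wraps modulo 256 because of the cast to unsigned char.
constexpr phase_t next_phase(phase_t p) noexcept
{ return static_cast<phase_t>(static_cast<unsigned char>(to_value(p) + 1)); }

// The scoped enum neither converts implicitly to int nor counts as an
// arithmetic type, which is the property the review comment is after.
static_assert(!std::is_convertible<phase_t, int>::value,
              "no implicit conversion to int");
static_assert(!std::is_arithmetic<phase_t>::value,
              "not an arithmetic type");

// Small self-check of the round trip and the wrap-around.
inline void check_phase()
{
  assert(to_value(next_phase(phase_t{})) == 1);
  assert(to_value(next_phase(static_cast<phase_t>(255))) == 0);
}
```

With this shape, an expression such as `phase + 1` on a phase_t fails to
compile, so any arithmetic has to go through the explicit helpers.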

[ping*n] aarch64: move and adjust PROBE_STACK_*_REG

2020-11-04 Thread Olivier Hainque
Hello,

Another ping for this as a new end of stage1 approaches,
please?

While this may ring the bell of a more involved issue
with ABIs and the use of R18, this particular change doesn't
have that kind of implication.

Thanks a lot in advance!

With Kind Regards,

Olivier

> On 26 Oct 2020, at 12:08, Olivier Hainque  wrote:
> 
>> On 15 Oct 2020, at 08:38, Olivier Hainque  wrote:
>> 
>>> On 24 Sep 2020, at 11:46, Olivier Hainque  wrote:
>>> 
>>> Re-proposing this patch after re-testing with a recent
>>> mainline on aarch64-linux (bootstrap and regression test
>>> with --enable-languages=all), and more than a year of in-house
>>> use in production for a few aarch64 ports on a gcc-9 base.
>>> 
>>> The change moves the definitions of PROBE_STACK_FIRST_REG
>>> and PROBE_STACK_SECOND_REG to a more appropriate place for such
>>> items (here, in aarch64.md as suggested by Richard), and adjusts
>>> their value from r9/r10 to r10/r11 to free r9 for a possibly
>>> more general purpose (e.g. as a static chain at least on targets
>>> which have a private use of r18, such as Windows or Vxworks).
>>> 
>>> OK to commit?
>>> 
>>> Thanks in advance,
>>> 
>>> With Kind Regards,
>>> 
>>> Olivier
>>> 
>>> 2020-11-07  Olivier Hainque  
>>> 
>>> * config/aarch64/aarch64.md: Define PROBE_STACK_FIRST_REGNUM
>>> and PROBE_STACK_SECOND_REGNUM constants, designating r10/r11.
>>> Replacements for the PROBE_STACK_FIRST/SECOND_REG constants in
>>> aarch64.c.
>>> * config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG): Remove.
>>> (PROBE_STACK_SECOND_REG): Remove.
>>> (aarch64_emit_probe_stack_range): Adjust to the _REG -> _REGNUM
>>> suffix update for PROBE_STACK register numbers.
>> 
>> 
> 



Re: [PATCH][AArch64] Use intrinsics for upper saturating shift right

2020-11-04 Thread Richard Sandiford via Gcc-patches
Thanks for the patch, looks good.

David Candler  writes:
> diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
> index 4f33dd936c7..f93f4e29c89 100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -254,6 +254,10 @@ aarch64_types_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  #define TYPES_GETREG (aarch64_types_binop_imm_qualifiers)
>  #define TYPES_SHIFTIMM (aarch64_types_binop_imm_qualifiers)
>  static enum aarch64_type_qualifiers
> +aarch64_types_ternop_s_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_none, qualifier_none, qualifier_none, qualifier_immediate};
> +#define TYPES_SHIFT2IMM (aarch64_types_ternop_s_imm_qualifiers)
> +static enum aarch64_type_qualifiers
>  aarch64_types_shift_to_unsigned_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_unsigned, qualifier_none, qualifier_immediate };
>  #define TYPES_SHIFTIMM_USS (aarch64_types_shift_to_unsigned_qualifiers)
> @@ -265,14 +269,16 @@ static enum aarch64_type_qualifiers
>  aarch64_types_unsigned_shift_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_unsigned, qualifier_unsigned, qualifier_immediate };
>  #define TYPES_USHIFTIMM (aarch64_types_unsigned_shift_qualifiers)
> +#define TYPES_USHIFT2IMM (aarch64_types_ternopu_imm_qualifiers)
> +static enum aarch64_type_qualifiers
> +aarch64_types_shift2_to_unsigned_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_unsigned, qualifier_unsigned, qualifier_none, qualifier_immediate };
> +#define TYPES_SHIFT2IMM_UUSS (aarch64_types_shift2_to_unsigned_qualifiers)
>  
>  static enum aarch64_type_qualifiers
>  aarch64_types_ternop_s_imm_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_none, qualifier_none, qualifier_poly, qualifier_immediate};
>  #define TYPES_SETREGP (aarch64_types_ternop_s_imm_p_qualifiers)
> -static enum aarch64_type_qualifiers
> -aarch64_types_ternop_s_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> -  = { qualifier_none, qualifier_none, qualifier_none, qualifier_immediate};
>  #define TYPES_SETREG (aarch64_types_ternop_s_imm_qualifiers)
>  #define TYPES_SHIFTINSERT (aarch64_types_ternop_s_imm_qualifiers)
>  #define TYPES_SHIFTACC (aarch64_types_ternop_s_imm_qualifiers)

Very minor, but I think it would be better to keep
aarch64_types_ternop_s_imm_qualifiers where it is and define
TYPES_SHIFT2IMM here rather than above.  For better or worse,
the current style seems to be to keep the defines next to the
associated arrays, rather than group them based on the TYPES_* name.

> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
> index d1b21102b2f..0b82b9c072b 100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -285,6 +285,13 @@
>BUILTIN_VSQN_HSDI (USHIFTIMM, uqshrn_n, 0, ALL)
>BUILTIN_VSQN_HSDI (SHIFTIMM, sqrshrn_n, 0, ALL)
>BUILTIN_VSQN_HSDI (USHIFTIMM, uqrshrn_n, 0, ALL)
> +  /* Implemented by aarch64_qshrn2_n.  */
> +  BUILTIN_VQN (SHIFT2IMM_UUSS, sqshrun2_n, 0, ALL)
> +  BUILTIN_VQN (SHIFT2IMM_UUSS, sqrshrun2_n, 0, ALL)
> +  BUILTIN_VQN (SHIFT2IMM, sqshrn2_n, 0, ALL)
> +  BUILTIN_VQN (USHIFT2IMM, uqshrn2_n, 0, ALL)
> +  BUILTIN_VQN (SHIFT2IMM, sqrshrn2_n, 0, ALL)
> +  BUILTIN_VQN (USHIFT2IMM, uqrshrn2_n, 0, ALL)

Using ALL is a holdover from the time (until a few weeks ago) when we
didn't record function attributes.  New intrinsics should therefore
have something more specific than ALL.

We discussed offline whether the Q flag side effect of the intrinsics
should be observable or not, and the conclusion was that it shouldn't.
I think we can therefore treat these functions as pure functions,
meaning that they should have flags NONE rather than ALL.

For that reason, I think we should also remove the Set_Neon_Cumulative_Sat
and CHECK_CUMULATIVE_SAT parts of the test (sorry).

Other than that, the patch looks good to go.

Thanks,
Richard


[PATCH] libstdc++: Add support for C++20 barriers

2020-11-04 Thread Thomas Rodgers
From: Thomas Rodgers 

IGNORE the previous version of this patch please.

Adds <barrier>

libstdc++/ChangeLog:

* include/Makefile.am (std_headers): Add new header.
* include/Makefile.in: Regenerate.
* include/std/barrier: New file.
* testsuite/30_threads/barrier/1.cc: New test.
* testsuite/30_threads/barrier/2.cc: Likewise.
* testsuite/30_threads/barrier/arrive_and_drop.cc: Likewise.
* testsuite/30_threads/barrier/arrive_and_wait.cc: Likewise.
* testsuite/30_threads/barrier/arrive.cc: Likewise.
* testsuite/30_threads/barrier/completion.cc: Likewise.
* testsuite/30_threads/barrier/max.cc: Likewise.
---
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 libstdc++-v3/include/bits/atomic_base.h   |  11 +-
 libstdc++-v3/include/std/barrier  | 248 ++
 libstdc++-v3/include/std/version  |   1 +
 .../testsuite/30_threads/barrier/1.cc |  27 ++
 .../testsuite/30_threads/barrier/2.cc |  27 ++
 .../testsuite/30_threads/barrier/arrive.cc|  51 
 .../30_threads/barrier/arrive_and_drop.cc |  49 
 .../30_threads/barrier/arrive_and_wait.cc |  51 
 .../30_threads/barrier/completion.cc  |  54 
 .../testsuite/30_threads/barrier/max.cc   |  44 
 12 files changed, 562 insertions(+), 3 deletions(-)
 create mode 100644 libstdc++-v3/include/std/barrier
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/1.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/2.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_drop.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_wait.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/completion.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/max.cc

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 382e94322c1..9e497835ee0 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -30,6 +30,7 @@ std_headers = \
${std_srcdir}/any \
${std_srcdir}/array \
${std_srcdir}/atomic \
+   ${std_srcdir}/barrier \
${std_srcdir}/bit \
${std_srcdir}/bitset \
${std_srcdir}/charconv \
diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index dd4db926592..1ad34719d3e 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -603,13 +603,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
 #if __cplusplus > 201703L
+  template<typename _Func>
+   _GLIBCXX_ALWAYS_INLINE void
+   _M_wait(__int_type __old, const _Func& __fn) const noexcept
+   { std::__atomic_wait(&_M_i, __old, __fn); }
+
   _GLIBCXX_ALWAYS_INLINE void
   wait(__int_type __old,
  memory_order __m = memory_order_seq_cst) const noexcept
   {
-   std::__atomic_wait(&_M_i, __old,
-  [__m, this, __old]
-  { return this->load(__m) != __old; });
+   _M_wait(__old,
+   [__m, this, __old]
+   { return this->load(__m) != __old; });
   }
 
   // TODO add const volatile overload
diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
new file mode 100644
index 000..50654b00a0c
--- /dev/null
+++ b/libstdc++-v3/include/std/barrier
@@ -0,0 +1,248 @@
+//  -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// This implementation is based on libcxx/include/barrier
+//===-- barrier.h --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===---===//
+
+#ifndef _GLIBCXX_BARRIER
+#define _GLIBCXX_BARRIER 1
+
+#pragma GCC system_header
+
+#if __cplusplus > 201703L
+#define __cpp_lib_barrier 201907L
+
+#include 
+
+#if 

Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-04 Thread H.J. Lu via Gcc-patches
On Wed, Nov 4, 2020 at 10:09 AM Hans-Peter Nilsson  wrote:
>
> On Wed, 4 Nov 2020, Jozef Lawrynowicz wrote:
> > I personally do not see the problem with the .retain attribute, however
> > if it is going to be a barrier to getting the functionality committed, I
> > am happy to change it, since I really just want the functionality in
> > upstream sources.
> >
> > If a global maintainer would comment on whether any of the proposed
> > approaches are acceptable, then I will try to block out time from other
> > deadlines so I can work on the fixups and submit a patch in time for the
> > GCC 11 freeze.
> >
> > Thanks,
> > Jozef
>
> I'm not much more than a random voice, but an assembly directive
> that specifies the symbol (IIUC your .retain directive) to

But the .retain directive DOES NOT adjust a symbol attribute.  Instead, it
sets the SHF_GNU_RETAIN bit on the section which contains the symbol
definition.  The same section can have many unrelated symbols.

> adjust a symbol attribute sounds cleaner to me, than requiring
> gcc to know that this requires it to adjust what it knows about
> section flags (again, IIUC).
>
> brgds, H-P



-- 
H.J.


[PATCH] Add Ranger temporal cache

2020-11-04 Thread Andrew MacLeod via Gcc-patches
PR 97515 highlighted a bit of silliness that results when we calculate a 
bunch of ranges by traversing a back edge, and set some values.  Then we 
eventually visit that block during the DOM walk, and discover the value 
can be improved, sometimes dramatically.  It is already cached, so 
unfortunately we don't visit it again...


The situation is described in comment 4:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97515#c4


I have created a temporal cache for the ranger that basically adds a 
timestamp to the global cache.


The timestamp records when a global range was calculated 
(monotonically increasing, based on "set"), together with a list of up 
to 2 directly dependent ssa-names.  Whenever we access the global value, 
the timestamps of those dependents are checked to see if they are newer.  
Any time the global value for a dependent is newer, the current global 
value is considered stale and the ranger will recalculate using the 
newer values of the dependent.


What does that mean?

Using the PR testcase,  a back edge calculation for a PHI requires a 
range for ui_8:

  ui_8 = ~xe_3;
and at this time, we only know that xe_3 is <=0 based on the branch 
feeding the statement.  Thus ui_8 is calculated as [-1, +INF] globally 
and stored.

As the calculation continues, we actually discover that xe_3 has to be -1.

When the EVRP dom walk eventually gets to this statement, we know xe_3 
evaluates to [-1, -1] and we fold this statement to

  ui_8 = -1
Unfortunately, the global cache still has [-1, +INF] for ui_8 and has 
no way to know to reevaluate.


With the temporal cache operating, when we figure out that xe_3 
evaluates to [-1, -1], xe_3 gets a timestamp that is newer than that of ui_8.
When range_of_stmt is called on ui_8 now, it fails the "current" check, 
and the ranger proceeds to recalculate ui_8 using the new value of xe_3, 
and we get the proper result of [-1, -1] stored for the global value.
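
The staleness check described above can be sketched roughly as follows.
This is a simplified, hypothetical model (the class temporal_cache_sketch,
its integer "names" and its members are illustrative assumptions, not the
code in gimple-range-cache.cc): each name carries a monotonically
increasing timestamp and up to 2 direct dependencies, and a value is
current only if no dependency has a newer timestamp.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical, simplified model of a timestamped cache; ssa-names are
// represented by plain ints purely for illustration.
class temporal_cache_sketch
{
  static constexpr int no_dep = -1;

  struct entry
  {
    std::uint64_t time = 0;                    // when the value was last set
    std::array<int, 2> deps{{no_dep, no_dep}}; // up to 2 direct dependencies
  };

  std::unordered_map<int, entry> m_entries;
  std::uint64_t m_clock = 0;                   // monotonically increasing

public:
  // Record that NAME was just (re)calculated: it gets the newest timestamp.
  void set_timestamp(int name)
  { m_entries[name].time = ++m_clock; }

  // Record up to two names that NAME's value directly depends on.
  void set_dependency(int name, int dep1, int dep2 = -1)
  { m_entries[name].deps = {{dep1, dep2}}; }

  // True if NAME has a value and no direct dependency is newer than it.
  bool current_p(int name) const
  {
    auto it = m_entries.find(name);
    if (it == m_entries.end())
      return false;                            // never calculated
    for (int dep : it->second.deps)
      {
        if (dep == no_dep)
          continue;
        auto d = m_entries.find(dep);
        if (d != m_entries.end() && d->second.time > it->second.time)
          return false;                        // dependency is newer: stale
      }
    return true;
  }
};

// Replays the ui_8 = ~xe_3 scenario from the text with made-up ids.
inline void replay_pr_scenario()
{
  const int xe_3 = 3, ui_8 = 8;
  temporal_cache_sketch cache;

  cache.set_timestamp(xe_3);          // xe_3 first known as <= 0
  cache.set_dependency(ui_8, xe_3);
  cache.set_timestamp(ui_8);          // ui_8 stored as [-1, +INF]
  assert(cache.current_p(ui_8));

  cache.set_timestamp(xe_3);          // xe_3 refined to [-1, -1]
  assert(!cache.current_p(ui_8));     // ui_8 is now stale

  cache.set_timestamp(ui_8);          // ui_8 recalculated as [-1, -1]
  assert(cache.current_p(ui_8));
}
```

As in the description, only direct dependencies are consulted here, so a
change further up a dependency chain would not mark a value stale.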

With this patch, this testcase now comes out of EVRP  looking like:

e7 (int gg)
{
  int ui;
  int xe;
  _Bool _1;
  int _2;

   :

   :
  _1 = 0;
  _2 = (int) _1;
  goto ; [INV]

}

Timewise it's pretty good.  It basically consumes the time I saved with 
the previous cache tweaks, and the overall time now is almost exactly 
the same as it was before.  So we're still pretty zippy.


This integrates with the previous cache changes so that when this 
global value for ui_8 is updated, any changes are automatically 
propagated into the global cache as well.


I have also updated the testcase to ensure that it now produces the 
above code with a single goto.



This bootstraps on x86_64-pc-linux-gnu, no regressions, and pushed.

Andrew

PS.  Before the next stage 1, I intend to use the preexisting dependency 
chains in the GORI engine instead of this one-or-two name timestamp entry.
Currently the drawback is that only dependent values are checked, so 
intervening calculations will not trigger a recalculation.  If we use 
the GORI dependency chains, then everything in the dependency chain will 
be recognized as stale, and we'll get even more cases. Combined with 
improvements planned for how dependency chain ranges are calculated by 
GORI, we could get even more interesting results.
commit e86fd6a17cdb26710d1f13c9a47a3878c76028f9
Author: Andrew MacLeod 
Date:   Wed Nov 4 12:59:15 2020 -0500

Add Ranger temporal cache

Add a timestamp to supplement the global range cache to detect when a value
may become stale.

gcc/
PR tree-optimization/97515
* gimple-range-cache.h (class ranger_cache): New prototypes plus
temporal cache pointer.
* gimple-range-cache.cc (struct range_timestamp): New.
(class temporal_cache): New.
(temporal_cache::temporal_cache): New.
(temporal_cache::~temporal_cache): New.
(temporal_cache::get_timestamp): New.
(temporal_cache::set_dependency): New.
(temporal_cache::temporal_value): New.
(temporal_cache::current_p): New.
(temporal_cache::set_timestamp): New.
(temporal_cache::set_always_current): New.
(ranger_cache::ranger_cache): Allocate the temporal cache.
(ranger_cache::~ranger_cache): Free temporal cache.
(ranger_cache::get_non_stale_global_range): New.
(ranger_cache::set_global_range): Add a timestamp.
(ranger_cache::register_dependency): New.  Add timestamp dependency.
* gimple-range.cc (gimple_ranger::range_of_range_op): Add operand
dependencies.
(gimple_ranger::range_of_phi): Ditto.
(gimple_ranger::range_of_stmt): Check if global range is stale, and
recalculate if so.
gcc/testsuite/
* gcc.dg/pr97515.c: Check listing for folding of entire function.

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index cca9025abba..b01563c83f9 100644
--- 

Re: [PATCH,rs6000] Add patterns for combine to support p10 fusion

2020-11-04 Thread Aaron Sawdey via Gcc-patches
Ping.

Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
 

> On Oct 26, 2020, at 4:44 PM, acsaw...@linux.ibm.com wrote:
> 
> From: Aaron Sawdey 
> 
> This patch adds the first couple patterns to support p10 fusion. These
> will allow combine to create a single insn for a pair of instructions
> that power10 can fuse and execute. These particular ones have the
> requirement that only cr0 can be used when fusing a load with a compare
> immediate of -1/0/1, so we want combine to put that requirement in, and
> if it doesn't work out later the splitter can get used.
> 
> This also adds option -mpower10-fusion which defaults on for power10 and
> will gate all these fusion patterns. In addition I have added an
> undocumented option -mpower10-fusion-ld-cmpi (which may be removed later)
> that just controls the load+compare-immediate patterns. I have made
> these default on for power10 but they are not disallowed for earlier
> processors because it is still valid code. This allows us to test the
> correctness of fusion code generation by turning it on explicitly.
> 
> The intention is to work through more patterns of this style to support
> the rest of the power10 fusion pairs.
> 
> Bootstrap and regtest looks good on ppc64le power9 with these patterns
> enabled in stage2/stage3 and for regtest. Ok for trunk?
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/predicates.md: Add const_m1_to_1_operand.
>   * config/rs6000/rs6000-cpus.def: Add OPTION_MASK_P10_FUSION and
>   OPTION_MASK_P10_FUSION_LD_CMPI to ISA_3_1_MASKS_SERVER.
>   * config/rs6000/rs6000-protos.h (address_ok_for_form): Add
>   prototype.
>   * config/rs6000/rs6000.c (rs6000_option_override_internal):
>   automatically set -mpower10-fusion and -mpower10-fusion-ld-cmpi
>   if target is power10.  (rs6000_opt_masks): Allow -mpower10-fusion
>   in function attributes.  (address_ok_for_form): New function.
>   * config/rs6000/rs6000.h: Add MASK_P10_FUSION.
>   * config/rs6000/rs6000.md (*ld_cmpi_cr0): New
>   define_insn_and_split.
>   (*lwa_cmpdi_cr0): New define_insn_and_split.
>   (*lwa_cmpwi_cr0): New define_insn_and_split.
>   * config/rs6000/rs6000.opt: Add -mpower10-fusion
>   and -mpower10-fusion-ld-cmpi.
> ---
> gcc/config/rs6000/predicates.md   |  5 +++
> gcc/config/rs6000/rs6000-cpus.def |  6 ++-
> gcc/config/rs6000/rs6000-protos.h |  2 +
> gcc/config/rs6000/rs6000.c| 34 
> gcc/config/rs6000/rs6000.h|  1 +
> gcc/config/rs6000/rs6000.md   | 68 +++
> gcc/config/rs6000/rs6000.opt  |  8 
> 7 files changed, 123 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 4c2fe7fa312..b75c1ddfb69 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -297,6 +297,11 @@ (define_predicate "const_0_to_1_operand"
>   (and (match_code "const_int")
>(match_test "IN_RANGE (INTVAL (op), 0, 1)")))
> 
> +;; Match op = -1, op = 0, or op = 1.
> +(define_predicate "const_m1_to_1_operand"
> +  (and (match_code "const_int")
> +   (match_test "IN_RANGE (INTVAL (op), -1, 1)")))
> +
> ;; Match op = 0..3.
> (define_predicate "const_0_to_3_operand"
>   (and (match_code "const_int")
> diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
> index 8d2c1ffd6cf..3e65289d8df 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -82,7 +82,9 @@
> 
> #define ISA_3_1_MASKS_SERVER  (ISA_3_0_MASKS_SERVER   \
>| OPTION_MASK_POWER10  \
> -  | OTHER_POWER10_MASKS)
> +  | OTHER_POWER10_MASKS  \
> +  | OPTION_MASK_P10_FUSION   \
> +  | OPTION_MASK_P10_FUSION_LD_CMPI)
> 
> /* Flags that need to be turned off if -mno-power9-vector.  */
> #define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW\
> @@ -129,6 +131,8 @@
>| OPTION_MASK_FLOAT128_KEYWORD \
>| OPTION_MASK_FPRND\
>| OPTION_MASK_POWER10  \
> +  | OPTION_MASK_P10_FUSION   \
> +  | OPTION_MASK_P10_FUSION_LD_CMPI   \
>| OPTION_MASK_HTM  \
>| OPTION_MASK_ISEL \
>| OPTION_MASK_MFCRF\
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index 25fa5dd57cd..d8a344245e6 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -190,6 +190,8 @@ enum 

Re: [PATCH, rs6000] Optimize pcrel access of globals (updated, ping)

2020-11-04 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey 

Ping, as it has been a while.
This also includes a slight fix to make sure that all references can get
optimized.

This patch implements a RTL pass that looks for pc-relative loads of the
address of an external variable using the PCREL_GOT relocation and a
single load or store that uses that external address.

Produced by a cast of thousands:
 * Michael Meissner
 * Peter Bergner
 * Bill Schmidt
 * Alan Modra
 * Segher Boessenkool
 * Aaron Sawdey

Passes bootstrap/regtest on ppc64le power10. OK for trunk?

gcc/ChangeLog:

* config.gcc: Add pcrel-opt.o.
* config/rs6000/pcrel-opt.c: New file.
* config/rs6000/pcrel-opt.md: New file.
* config/rs6000/predicates.md: Add d_form_memory predicate.
* config/rs6000/rs6000-cpus.def: Add OPTION_MASK_PCREL_OPT.
* config/rs6000/rs6000-passes.def: Add pass_pcrel_opt.
* config/rs6000/rs6000-protos.h: Add reg_to_non_prefixed(),
offsettable_non_prefixed_memory(), output_pcrel_opt_reloc(),
and make_pass_pcrel_opt().
* config/rs6000/rs6000.c (reg_to_non_prefixed): Make global.
(rs6000_option_override_internal): Add pcrel-opt.
(rs6000_delegitimize_address): Support pcrel-opt.
(rs6000_opt_masks): Add pcrel-opt.
(offsettable_non_prefixed_memory): New function.
(reg_to_non_prefixed): Make global.
(rs6000_asm_output_opcode): Reset next_insn_prefixed_p.
(output_pcrel_opt_reloc): New function.
* config/rs6000/rs6000.md (loads_extern_addr): New attr.
(pcrel_extern_addr): Set loads_extern_addr.
Add include for pcrel-opt.md.
* config/rs6000/rs6000.opt: Add -mpcrel-opt.
* config/rs6000/t-rs6000: Add rules for pcrel-opt.c and
pcrel-opt.md.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pcrel-opt-inc-di.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-df.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-di.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-hi.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-qi.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-sf.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-si.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-vector.c: New test.
* gcc.target/powerpc/pcrel-opt-st-df.c: New test.
* gcc.target/powerpc/pcrel-opt-st-di.c: New test.
* gcc.target/powerpc/pcrel-opt-st-hi.c: New test.
* gcc.target/powerpc/pcrel-opt-st-qi.c: New test.
* gcc.target/powerpc/pcrel-opt-st-sf.c: New test.
* gcc.target/powerpc/pcrel-opt-st-si.c: New test.
* gcc.target/powerpc/pcrel-opt-st-vector.c: New test.
---
 gcc/config.gcc|   6 +-
 gcc/config/rs6000/pcrel-opt.c | 888 ++
 gcc/config/rs6000/pcrel-opt.md| 386 
 gcc/config/rs6000/predicates.md   |  23 +
 gcc/config/rs6000/rs6000-cpus.def |   2 +
 gcc/config/rs6000/rs6000-passes.def   |   8 +
 gcc/config/rs6000/rs6000-protos.h |   4 +
 gcc/config/rs6000/rs6000.c| 116 ++-
 gcc/config/rs6000/rs6000.md   |   8 +-
 gcc/config/rs6000/rs6000.opt  |   4 +
 gcc/config/rs6000/t-rs6000|   7 +-
 .../gcc.target/powerpc/pcrel-opt-inc-di.c |  18 +
 .../gcc.target/powerpc/pcrel-opt-ld-df.c  |  36 +
 .../gcc.target/powerpc/pcrel-opt-ld-di.c  |  43 +
 .../gcc.target/powerpc/pcrel-opt-ld-hi.c  |  42 +
 .../gcc.target/powerpc/pcrel-opt-ld-qi.c  |  42 +
 .../gcc.target/powerpc/pcrel-opt-ld-sf.c  |  42 +
 .../gcc.target/powerpc/pcrel-opt-ld-si.c  |  41 +
 .../gcc.target/powerpc/pcrel-opt-ld-vector.c  |  36 +
 .../gcc.target/powerpc/pcrel-opt-st-df.c  |  36 +
 .../gcc.target/powerpc/pcrel-opt-st-di.c  |  37 +
 .../gcc.target/powerpc/pcrel-opt-st-hi.c  |  42 +
 .../gcc.target/powerpc/pcrel-opt-st-qi.c  |  42 +
 .../gcc.target/powerpc/pcrel-opt-st-sf.c  |  36 +
 .../gcc.target/powerpc/pcrel-opt-st-si.c  |  41 +
 .../gcc.target/powerpc/pcrel-opt-st-vector.c  |  36 +
 26 files changed, 2013 insertions(+), 9 deletions(-)
 create mode 100644 gcc/config/rs6000/pcrel-opt.c
 create mode 100644 gcc/config/rs6000/pcrel-opt.md
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-si.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-vector.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-df.c
 create mode 100644 

Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-04 Thread Hans-Peter Nilsson
On Wed, 4 Nov 2020, Jozef Lawrynowicz wrote:
> I personally do not see the problem with the .retain attribute, however
> if it is going to be a barrier to getting the functionality committed, I
> am happy to change it, since I really just want the functionality in
> upstream sources.
>
> If a global maintainer would comment on whether any of the proposed
> approaches are acceptable, then I will try to block out time from other
> deadlines so I can work on the fixups and submit a patch in time for the
> GCC 11 freeze.
>
> Thanks,
> Jozef

I'm not much more than a random voice, but an assembly directive
that specifies the symbol (IIUC your .retain directive) to
adjust a symbol attribute sounds cleaner to me, than requiring
gcc to know that this requires it to adjust what it knows about
section flags (again, IIUC).

brgds, H-P


[PATCH] libstdc++: Add support for C++20 barriers

2020-11-04 Thread Thomas Rodgers
From: Thomas Rodgers 

Adds <barrier>

libstdc++/ChangeLog:

* include/Makefile.am (std_headers): Add new header.
* include/Makefile.in: Regenerate.
* include/std/barrier: New file.
* testsuite/30_threads/barrier/1.cc: New test.
* testsuite/30_threads/barrier/2.cc: Likewise.
* testsuite/30_threads/barrier/arrive_and_drop.cc: Likewise.
* testsuite/30_threads/barrier/arrive_and_wait.cc: Likewise.
* testsuite/30_threads/barrier/arrive.cc: Likewise.
* testsuite/30_threads/barrier/completion.cc: Likewise.
* testsuite/30_threads/barrier/max.cc: Likewise.
---
 libstdc++-v3/include/std/barrier  | 248 ++
 .../testsuite/30_threads/barrier/1.cc |  27 ++
 .../testsuite/30_threads/barrier/2.cc |  27 ++
 .../testsuite/30_threads/barrier/arrive.cc|  51 
 .../30_threads/barrier/arrive_and_drop.cc |  49 
 .../30_threads/barrier/arrive_and_wait.cc |  51 
 .../30_threads/barrier/completion.cc  |  54 
 .../testsuite/30_threads/barrier/max.cc   |  44 
 8 files changed, 551 insertions(+)
 create mode 100644 libstdc++-v3/include/std/barrier
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/1.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/2.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_drop.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_wait.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/completion.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/max.cc

diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
new file mode 100644
index 000..80e6d668cf5
--- /dev/null
+++ b/libstdc++-v3/include/std/barrier
@@ -0,0 +1,248 @@
+//  -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// This implementation is based on libcxx/include/barrier
+//===-- barrier.h --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===---===//
+
+#ifndef _GLIBCXX_BARRIER
+#define _GLIBCXX_BARRIER 1
+
+#pragma GCC system_header
+
+#if __cplusplus > 201703L
+#define __cpp_lib_barrier 201907L
+
+#include 
+
+#if defined(_GLIBCXX_HAS_GTHREADS)
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  struct __empty_completion
+  {
+_GLIBCXX_ALWAYS_INLINE void
+operator()() noexcept
+{ }
+  };
+
+/*
+
+The default implementation of __barrier_base is a classic tree barrier.
+
+It looks different from literature pseudocode for two main reasons:
+ 1. Threads that call into std::barrier functions do not provide indices,
+so a numbering step is added before the actual barrier algorithm,
+appearing as an N+1 round to the N rounds of the tree barrier.
+ 2. A great deal of attention has been paid to avoid cache line thrashing
+by flattening the tree structure into cache-line sized arrays, that
+are indexed in an efficient way.
+
+*/
+
+  using __barrier_phase_t = uint8_t;
+
+  template
+class __barrier_base
+{
+  struct alignas(64) /* naturally-align the heap state */ __state_t
+  {
+   struct
+   {
+ __atomic_base<__barrier_phase_t> __phase = ATOMIC_VAR_INIT(0);
+   } __tickets[64];
+  };
+
+  ptrdiff_t _M_expected;
+  unique_ptr _M_state_allocation;
+  __state_t*   _M_state;
+  __atomic_base _M_expected_adjustment;
+  _CompletionF _M_completion;
+  __atomic_base<__barrier_phase_t> _M_phase;
+
+  static __gthread_t
+  _S_get_tid() noexcept
+  {
+#ifdef __GLIBC__
+   // For the GNU C library pthread_self() is usable without linking to
+   // libpthread.so but returns 0, so we cannot use it in single-threaded
+   // programs, because this_thread::get_id() != thread::id{} must be true.
+  
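
For readers who want the observable behaviour without the tree structure, here
is a minimal sketch of a central (single-counter) barrier.  It is NOT the
algorithm in the patch: it assumes a mutex and condition variable instead of
the flattened ticket arrays, and simple_barrier, phase() and run_rounds are
hypothetical names used only for illustration.  The 8-bit phase counter mirrors
the __barrier_phase_t idea from the quoted header.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical central barrier; NOT the tree barrier used by the patch.
class simple_barrier
{
  std::mutex mutex_;
  std::condition_variable cv_;
  const std::ptrdiff_t expected_;   // threads per phase
  std::ptrdiff_t remaining_;        // arrivals still missing in this phase
  std::uint8_t phase_ = 0;          // wraps around, like an 8-bit phase type

public:
  explicit simple_barrier(std::ptrdiff_t expected)
  : expected_(expected), remaining_(expected)
  { assert(expected > 0); }

  // Block until `expected` threads have arrived in the current phase.
  void arrive_and_wait()
  {
    std::unique_lock<std::mutex> lock(mutex_);
    const std::uint8_t my_phase = phase_;
    if (--remaining_ == 0)
      {
        // Last arrival: reset the count, open the next phase, wake everyone.
        remaining_ = expected_;
        ++phase_;
        cv_.notify_all();
      }
    else
      cv_.wait(lock, [&] { return phase_ != my_phase; });
  }

  // Number of completed phases modulo 256.
  unsigned phase()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return phase_;
  }
};

// Run `nthreads` workers through `rounds` phases and return how often a
// worker left the barrier before all threads of that round had arrived.
int run_rounds(int nthreads, int rounds)
{
  simple_barrier bar(nthreads);
  std::atomic<long> arrivals{0};
  std::atomic<int> violations{0};

  auto worker = [&] {
    for (int r = 0; r < rounds; ++r)
      {
        arrivals.fetch_add(1);
        bar.arrive_and_wait();
        // After round r every thread has arrived at least r + 1 times.
        if (arrivals.load() < static_cast<long>(r + 1) * nthreads)
          violations.fetch_add(1);
      }
  };

  std::vector<std::thread> pool;
  for (int i = 0; i < nthreads; ++i)
    pool.emplace_back(worker);
  for (auto& t : pool)
    t.join();
  return violations.load();
}
```

Calling run_rounds (4, 100) should return 0 if the barrier holds.  A waiter can
only ever be one phase behind, which is why an 8-bit phase that wraps is enough
for the comparison in arrive_and_wait.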

Re: [PATCH 4/4] IBM Z: Test long doubles in vector registers

2020-11-04 Thread Andreas Krebbel via Gcc-patches
These tests all use the -mzvector option but do not appear to make use of the z
vector language extensions. I think that option could be removed. Then these
tests should be moved to the vector subdir.

You could do the asm scanning also in dg-do run tests.

Andreas


On 03.11.20 22:46, Ilya Leoshkevich wrote:
> gcc/testsuite/ChangeLog:
> 
> 2020-11-03  Ilya Leoshkevich  
> 
>   * gcc.target/s390/zvector/long-double-callee-abi-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-caller-abi-run.c: New test.
>   * gcc.target/s390/zvector/long-double-caller-abi-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-copysign-run.c: New test.
>   * gcc.target/s390/zvector/long-double-copysign-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-fprx2-constant.c: New test.
>   * gcc.target/s390/zvector/long-double-from-double-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-double-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-float-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-float-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-i16-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-i16-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-i32-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-i32-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-i64-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-i64-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-i8-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-i8-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-u16-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-u16-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-u32-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-u32-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-u64-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-u64-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-from-u8-run.c: New test.
>   * gcc.target/s390/zvector/long-double-from-u8-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-double-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-double-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-float-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-float-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-i16-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-i16-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-i32-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-i32-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-i64-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-i64-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-i8-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-i8-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-u16-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-u16-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-u32-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-u32-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-u64-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-u64-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-to-u8-run.c: New test.
>   * gcc.target/s390/zvector/long-double-to-u8-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-vec-duplicate.c: New test.
>   * gcc.target/s390/zvector/long-double-wf.h: New test.
>   * gcc.target/s390/zvector/long-double-wfaxb-run.c: New test.
>   * gcc.target/s390/zvector/long-double-wfaxb-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-wfaxb.c: New test.
>   * gcc.target/s390/zvector/long-double-wfcxb-0001.c: New test.
>   * gcc.target/s390/zvector/long-double-wfcxb-0111.c: New test.
>   * gcc.target/s390/zvector/long-double-wfcxb-1011.c: New test.
>   * gcc.target/s390/zvector/long-double-wfcxb-1101.c: New test.
>   * gcc.target/s390/zvector/long-double-wfdxb-run.c: New test.
>   * gcc.target/s390/zvector/long-double-wfdxb-scan.c: New test.
>   * gcc.target/s390/zvector/long-double-wfdxb.c: New test.
>   * gcc.target/s390/zvector/long-double-wfixb.c: New test.
>   * gcc.target/s390/zvector/long-double-wfkxb-0111.c: New test.
>   * gcc.target/s390/zvector/long-double-wfkxb-1011.c: New test.
>   * gcc.target/s390/zvector/long-double-wfkxb-1101.c: New test.
>   * gcc.target/s390/zvector/long-double-wflcxb.c: New test.
>   * gcc.target/s390/zvector/long-double-wflpxb.c: New test.
>   * gcc.target/s390/zvector/long-double-wfmaxb-2.c: New test.
> 

Re: [PATCH 3/4] IBM Z: Store long doubles in vector registers when possible

2020-11-04 Thread Andreas Krebbel via Gcc-patches
On 03.11.20 22:45, Ilya Leoshkevich wrote:
> On z14+, there are instructions for working with 128-bit floats (long
> doubles) in vector registers.  It's beneficial to use them instead of
> instructions that operate on floating point register pairs, because it
> allows storing 4 times more data in registers at a time, relieving
> register pressure.  The performance of new instructions is almost the
> same.
> 
> Implement by storing TFmode values in vector registers on z14+.  Since
> not all operations are available with the new instructions, keep the old
> ones using the new FPRX2 mode, and convert between it and TFmode when
> necessary (this is called "forwarder" expanders below).  Change the
> existing TFmode expanders to call either new- or old-style ones
> depending on whether we are on z14+ or older machines ("dispatcher"
> expanders).
> 
> gcc/ChangeLog:
> 
> 2020-11-03  Ilya Leoshkevich  
> 
>   * config/s390/s390-modes.def (FPRX2): New mode.
>   * config/s390/s390-protos.h (s390_fma_allowed_p): New function.
>   * config/s390/s390.c (s390_fma_allowed_p): Likewise.
>   (s390_build_signbit_mask): Support 128-bit masks.
>   (print_operand): Support printing the second word of a TFmode
>   operand as vector register.
>   (constant_modes): Add FPRX2mode.
>   (s390_class_max_nregs): Return 1 for TFmode on z14+.
>   (s390_is_fpr128): New function.
>   (s390_is_vr128): Likewise.
>   (s390_can_change_mode_class): Use s390_is_fpr128 and
>   s390_is_vr128 in order to determine whether mode refers to a FPR
>   pair or to a VR.
>   * config/s390/s390.h (EXPAND_MOVTF): New macro.
>   (EXPAND_TF): Likewise.
>   * config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF
>   alias.
>   (ALL): Add FPRX2.
>   (FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-.
>   (FP): Likewise.
>   (FP_ANYTF): New mode iterator.
>   (BFP): Add FPRX2 for z14+, restrict TFmode to z13-.
>   (TD_TF): Likewise.
>   (xde): Add FPRX2.
>   (nBFP): Likewise.
>   (nDFP): Likewise.
>   (DSF): Likewise.
>   (DFDI): Likewise.
>   (SFSI): Likewise.
>   (DF): Likewise.
>   (SF): Likewise.
>   (fT0): Likewise.
>   (bt): Likewise.
>   (_d): Likewise.
>   (HALF_TMODE): Likewise.
>   (tf_fpr): New mode_attr.
>   (type): New mode_attr.
>   (*cmp_ccz_0): Use type instead of mode with fsimp.
>   (*cmp_ccs_0_fastmath): Likewise.
>   (*cmptf_ccs): New pattern for wfcxb.
>   (*cmptf_ccsfps): New pattern for wfkxb.
>   (mov): Rename to mov.
>   (signbit2): Rename to signbit2.
>   (isinf2): Renamed to isinf2.
>   (*TDC_insn_): Use type instead of mode with fsimp.
>   (fixuns_trunc2): Rename to
>   fixuns_trunc2.
>   (fix_trunctf2): Rename to fix_trunctf2_fpr.
>   (floatdi2): Rename to floatdi2, use type
>   instead of mode with itof.
>   (floatsi2): Rename to floatsi2, use type
>   instead of mode with itof.
>   (*floatuns2): Use type instead of mode for
>   itof.
>   (floatuns2): Rename to
>   floatuns2.
>   (trunctf2): Rename to trunctf2_fpr, use type instead
>   of mode with fsimp.
>   (extend2): Rename to
>   extend2.
>   (2): Rename to
>   2, use type instead of
>   mode with fsimp.
>   (rint2): Rename to rint2, use
>   type instead of mode with fsimp.
>   (2): Use type instead of mode for
>   fsimp.
>   (rint2): Likewise.
>   (trunc2): Rename to
>   trunc2.
>   (trunc2): Rename to
>   trunc2.
>   (extend2): Rename to
>   extend2.
>   (extend2): Rename to
>   extend2.
>   (add3): Rename to add3, use type instead of
>   mode with fsimp.
>   (*add3_cc): Use type instead of mode with fsimp.
>   (*add3_cconly): Likewise.
>   (sub3): Rename to sub3, use type instead of
>   mode with fsimp.
>   (*sub3_cc): Use type instead of mode with fsimp.
>   (*sub3_cconly): Likewise.
>   (mul3): Rename to mul3, use type instead of
>   mode with fsimp.
>   (fma4): Restrict using s390_fma_allowed_p.
>   (fms4): Restrict using s390_fma_allowed_p.
>   (div3): Rename to div3, use type instead of
>   mode with fdiv.
>   (neg2): Rename to neg2.
>   (*neg2_cc): Use type instead of mode with fsimp.
>   (*neg2_cconly): Likewise.
>   (*neg2_nocc): Likewise.
>   (*neg2): Likewise.
>   (abs2): Rename to abs2, use type instead of
>   mode with fdiv.
>   (*abs2_cc): Use type instead of mode with fsimp.
>   (*abs2_cconly): Likewise.
>   (*abs2_nocc): Likewise.
>   (*abs2): Likewise.
>   (*negabs2_cc): Likewise.
>   (*negabs2_cconly): Likewise.
>   (*negabs2_nocc): Likewise.
>   (*negabs2): Likewise.
>   (sqrt2): Rename to sqrt2, use type instead
>   of mode with fsqrt.
>   (cbranch4): Use FP_ANYTF instead of FP.
>   (copysign3): Rename to copysign3, use 

Re: [PATCH 2/4] IBM Z: Unhardcode NR_C_MODES

2020-11-04 Thread Andreas Krebbel via Gcc-patches
On 03.11.20 22:45, Ilya Leoshkevich wrote:
> gcc/ChangeLog:
> 
> 2020-11-03  Ilya Leoshkevich  
> 
>   * config/s390/s390.c (NR_C_MODES): Unhardcode.
>   (s390_alloc_pool): Use size_t for iterating from 0 to
>   NR_C_MODES.
>   (s390_add_constant): Likewise.
>   (s390_find_constant): Likewise.
>   (s390_dump_pool): Likewise.
>   (s390_free_pool): Likewise.

Ok. Thanks!

Andreas
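
A minimal sketch of what "unhardcoding" a table size usually looks like,
assuming the count is derived from the array itself.  The table contents and
the helper find_mode_index are hypothetical stand-ins; only the names
constant_modes and NR_C_MODES come from the ChangeLog entries in this series.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the list of modes kept in the constant pool.
enum pool_mode { MODE_TF, MODE_DF, MODE_SF, MODE_DI, MODE_SI };

static const pool_mode constant_modes[] =
  { MODE_TF, MODE_DF, MODE_SF, MODE_DI, MODE_SI };

// Derived from the table, so adding an entry cannot leave the count stale.
#define NR_C_MODES (sizeof (constant_modes) / sizeof (constant_modes[0]))

// Return the index of MODE in constant_modes, or NR_C_MODES if absent.
// The index is a size_t because NR_C_MODES now has an unsigned type;
// an int index would mix signed and unsigned in the loop condition.
std::size_t
find_mode_index (pool_mode mode)
{
  for (std::size_t i = 0; i < NR_C_MODES; i++)
    if (constant_modes[i] == mode)
      return i;
  return NR_C_MODES;
}
```

This is presumably also why the ChangeLog switches the loops in the pool
helpers to size_t.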



Re: [PATCH 1/4] IBM Z: Remove unused RRe and RXe mode_attrs

2020-11-04 Thread Andreas Krebbel via Gcc-patches
On 03.11.20 22:36, Ilya Leoshkevich wrote:
> gcc/ChangeLog:
> 
> 2020-11-03  Ilya Leoshkevich  
> 
>   * config/s390/s390.md (RRe): Remove.
>   (RXe): Remove.

Ok. Thanks!

Andreas


RE: [PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-11-04 Thread Carl Love via Gcc-patches
David:

I have reworked the patch moving the new vector instruction patterns to
vsx.md.  Also, cleaned up the vector division instructions.  The
div3 pattern definitions are the only ones that should be
defined.  

I have retested the patch on:

   powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regressions. Additionally the new test case was compiled and
executed by hand on Mambo to verify the test case passes.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl Love

--

2020-11-02  Carl Love  

gcc/
* config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
defines.
* config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
* config/rs6000/rs6000-builtin.def (VDIVES_V4SI, VDIVES_V2DI,
VDIVEU_V4SI, VDIVEU_V2DI, VDIVS_V4SI, VDIVS_V2DI, VDIVU_V4SI,
VDIVU_V2DI, VMODS_V2DI, VMODS_V4SI, VMODU_V2DI, VMODU_V4SI,
VMULHS_V2DI, VMULHS_V4SI, VMULHU_V2DI, VMULHU_V4SI, VMULLD_V2DI):
Add builtin define.
(VMUL, VMULH, VDIVE, VMOD):  Add new BU_P10_OVERLOAD_2 definitions.
* config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV,
P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH):
New overloaded definitions.
(builtin_function_type) [P10V_BUILTIN_VDIVEU_V4SI,
P10V_BUILTIN_VDIVEU_V2DI, P10V_BUILTIN_VDIVU_V4SI,
P10V_BUILTIN_VDIVU_V2DI, P10V_BUILTIN_VMODU_V2DI,
P10V_BUILTIN_VMODU_V4SI, P10V_BUILTIN_VMULHU_V2DI,
P10V_BUILTIN_VMULHU_V4SI, P10V_BUILTIN_VMULLD_V2DI]: Add case
statement for builtins.
* config/rs6000/vsx.md (VIlong_char): Add define_mode_attr.
(UNSPEC_VDIVES, UNSPEC_VDIVEU,
UNSPEC_VMULHS, UNSPEC_VMULHU, UNSPEC_VMULLD): Add enum for UNSPECs.
(vsx_mul_v2di, vsx_udiv_v2di): Add if TARGET_POWER10 statement.
(vdives_, vdiveu_, vdiv3, uuvdiv3,
vmods_, vmodu_, vmulhs_, vmulhu_, mulv2di3):
Add define_insn, mode is VIlong.
* doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
builtin descriptions.

gcc/testsuite/
* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h   |   5 +
 gcc/config/rs6000/altivec.md  |   2 -
 gcc/config/rs6000/rs6000-builtin.def  |  23 ++
 gcc/config/rs6000/rs6000-call.c   |  49 +++
 gcc/config/rs6000/vsx.md  | 205 +++---
 gcc/doc/extend.texi   | 120 ++
 .../powerpc/builtins-1-p10-runnable.c | 378 ++
 7 files changed, 730 insertions(+), 52 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index e1884f51bd8..d8f1d2cfc55 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -750,6 +750,11 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_strir_p(a) __builtin_vec_strir_p (a)
 #define vec_stril_p(a) __builtin_vec_stril_p (a)
 
+#define vec_mulh(a, b) __builtin_vec_mulh (a, b)
+#define vec_div(a, b) __builtin_vec_div (a, b)
+#define vec_dive(a, b) __builtin_vec_dive (a, b)
+#define vec_mod(a, b) __builtin_vec_mod (a, b)
+
 /* VSX Mask Manipulation builtin. */
 #define vec_genbm __builtin_vec_mtvsrbm
 #define vec_genhm __builtin_vec_mtvsrhm
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 6a6ce0f84ed..f10f1cdd8a7 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -193,8 +193,6 @@
 
 ;; Short vec int modes
 (define_mode_iterator VIshort [V8HI V16QI])
-;; Longer vec int modes for rotate/mask ops
-(define_mode_iterator VIlong [V2DI V4SI])
 ;; Vec float modes
 (define_mode_iterator VF [V4SF])
 ;; Vec modes, pity mode iterators are not composable
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index a58102c3785..7663465b755 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2877,6 +2877,24 @@ BU_P10V_AV_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, vsrdb_v8hi)
 BU_P10V_AV_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
 BU_P10V_AV_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
 
+BU_P10V_AV_2 (VDIVES_V4SI, "vdivesw", CONST, vdives_v4si)
+BU_P10V_AV_2 (VDIVES_V2DI, "vdivesd", CONST, vdives_v2di)
+BU_P10V_AV_2 (VDIVEU_V4SI, "vdiveuw", CONST, vdiveu_v4si)
+BU_P10V_AV_2 (VDIVEU_V2DI, "vdiveud", CONST, vdiveu_v2di)
+BU_P10V_AV_2 (VDIVS_V4SI, "vdivsw", CONST, divv4si3)
+BU_P10V_AV_2 (VDIVS_V2DI, "vdivsd", CONST, divv2di3)
+BU_P10V_AV_2 (VDIVU_V4SI, "vdivuw", CONST, udivv4si3)
+BU_P10V_AV_2 (VDIVU_V2DI, "vdivud", CONST, udivv2di3)
+BU_P10V_AV_2 (VMODS_V2DI, "vmodsd", CONST, vmods_v2di)
+BU_P10V_AV_2 (VMODS_V4SI, "vmodsw", CONST, vmods_v4si)
+BU_P10V_AV_2 (VMODU_V2DI, "vmodud", CONST, vmodu_v2di)
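
As a reading aid, here is a scalar reference model for one 32-bit lane of the
multiply-high, divide-extended and modulo operations, as I understand the
Power ISA 3.1 instructions named in the builtin table.  The function names are
hypothetical, and cases the ISA leaves undefined (division by zero, quotient
not representable) are turned into assertions here rather than modelled.

```cpp
#include <cassert>
#include <cstdint>

// Signed multiply-high: upper 32 bits of the 64-bit product.
// Relies on arithmetic right shift of negative values (as GCC provides).
inline std::int32_t
mulh_s32 (std::int32_t a, std::int32_t b)
{
  return static_cast<std::int32_t> ((static_cast<std::int64_t> (a) * b) >> 32);
}

// Unsigned multiply-high: upper 32 bits of the 64-bit product.
inline std::uint32_t
mulhu_u32 (std::uint32_t a, std::uint32_t b)
{
  return static_cast<std::uint32_t> ((static_cast<std::uint64_t> (a) * b) >> 32);
}

// Unsigned divide-extended: the dividend is A with 32 zero bits appended.
// The quotient fits in 32 bits exactly when a < b.
inline std::uint32_t
dive_u32 (std::uint32_t a, std::uint32_t b)
{
  assert (b != 0 && a < b);
  return static_cast<std::uint32_t> ((static_cast<std::uint64_t> (a) << 32) / b);
}

// Signed modulo: remainder takes the sign of the dividend.
inline std::int32_t
mod_s32 (std::int32_t a, std::int32_t b)
{
  assert (b != 0 && !(a == INT32_MIN && b == -1));
  return a % b;
}
```

A vector builtin would apply the same operation to each lane independently;
the V2DI variants follow the same pattern with 64-bit lanes and a 128-bit
intermediate.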

Re: [PATCH 5/X] libsanitizer: mid-end: Introduce stack variable handling for HWASAN

2020-11-04 Thread Richard Sandiford via Gcc-patches
Matthew Malcomson  writes:
> Hi Richard,
>
> I'm sending up the revised patch 5 (introducing stack variable handling)
> without the other changes to other patches.
>
> I figure there's been quite a lot of changes to this patch and I wanted
> to give you time to review them while I worked on finishing the less
> widespread changes in patch 6 and before I ran the more exhaustive (and
> time-consuming) tests in case you didn't like the changes and those
> exhaustive tests would just have to get repeated.

Thanks, the new approach looks good to me.  Most of the comments below
are just minor.

> […]
> @@ -75,6 +89,26 @@ extern hash_set  *asan_used_labels;
>  
>  #define ASAN_USE_AFTER_SCOPE_ATTRIBUTE   "use after scope memory"
>  
> +/* NOTE: The values below and the hooks under targetm.memtag define an ABI and
> +   are hard-coded to these values in libhwasan, hence they can't be changed
> +   independently here.  */
> +/* How many bits are used to store a tag in a pointer.
> +   The default version uses the entire top byte of a pointer (i.e. 8 bits).  */
> +#define HWASAN_TAG_SIZE targetm.memtag.tag_size ()
> +/* Tag Granule of HWASAN shadow stack.
> +   This is the size in real memory that each byte in the shadow memory refers
> +   to.  I.e. if a variable is X bytes long in memory then it's tag in shadow

s/it's/its/

> +   memory will span X / HWASAN_TAG_GRANULE_SIZE bytes.
> +   Most variables will need to be aligned to this amount since two variables
> +   that are neighbours in memory and share a tag granule would need to share

s/neighbours/neighbors/

> +   the same tag (the shared tag granule can only store one tag).  */
> +#define HWASAN_TAG_GRANULE_SIZE targetm.memtag.granule_size ()
> +/* Define the tag for the stack background.
> +   This defines what tag the stack pointer will be and hence what tag all
> +   variables that are not given special tags are (e.g. spilled registers,
> +   and parameters passed on the stack).  */
> +#define HWASAN_STACK_BACKGROUND gen_int_mode (0, QImode)
> +
>  /* Various flags for Asan builtins.  */
>  enum asan_check_flags
>  {
> […]
> @@ -1352,6 +1393,28 @@ asan_redzone_buffer::flush_if_full (void)
>  flush_redzone_payload ();
>  }
>  
> +/* Returns whether we are tagging pointers and checking those tags on memory
> +   access.  */
> +bool
> +hwasan_sanitize_p ()
> +{
> +  return sanitize_flags_p (SANITIZE_HWADDRESS);
> +}
> +
> +/* Are we tagging the stack?  */
> +bool
> +hwasan_sanitize_stack_p ()
> +{
> +  return (hwasan_sanitize_p () && param_hwasan_instrument_stack);
> +}
> +
> +/* Are we protecting alloca objects?  */

Same comment as before about avoiding the word “protect”, both in the
comment and the option name.  Maybe s/protect/sanitize/ or s/protect/tag/.

> +bool
> +hwasan_sanitize_allocas_p (void)
> +{
> +  return (hwasan_sanitize_stack_p () && param_hwasan_protect_allocas);
> +}
> +
>  /* Insert code to protect stack vars.  The prologue sequence should be emitted
> directly, epilogue sequence returned.  BASE is the register holding the
> stack base, against which OFFSETS array offsets are relative to, OFFSETS
> […]
> @@ -3702,4 +3772,330 @@ make_pass_asan_O0 (gcc::context *ctxt)
>return new pass_asan_O0 (ctxt);
>  }
>  
> +/* For stack tagging:
> +
> +   Return the offset from the frame base tag that the "next" expanded object
> +   should have.  */
> +uint8_t
> +hwasan_current_frame_tag ()
> +{
> +  return hwasan_frame_tag_offset;
> +}
> +
> +/* For stack tagging:
> +
> +   Return the 'base pointer' for this function.  If that base pointer has not
> +   yet been created then we create a register to hold it and initialise that
> +   value with a possibly random tag and the value of the
> +   virtual_stack_vars_rtx.  */

As discussed offline, I think the old approach of generating the
initialisation in hwasan_emit_prologue was safer, although I agree
there doesn't seem to be a specific problem with doing things this way.

> +rtx
> +hwasan_frame_base ()
> +{
> +  if (! hwasan_frame_base_ptr)
> +{
> +  hwasan_frame_base_ptr
> + = targetm.memtag.insert_random_tag (virtual_stack_vars_rtx);
> +}

Nit: should be no braces around single statements, even if they span
multiple lines.

> +
> +  return hwasan_frame_base_ptr;
> +}
> +
> +/* Record a compile-time constant size stack variable that HWASAN will need to
> +   tag.  This record of the range of a stack variable will be used by
> +   `hwasan_emit_prologue` to emit the RTL at the start of each frame which will
> +   set tags in the shadow memory according to the assigned tag for each object.
> +
> +   The range that the object spans in stack space should be described by the
> +   bounds `untagged_base + nearest` and `untagged_base + farthest`.
> +   `tagged_base` is the base address which contains the "base frame tag" for
> +   this frame, and from which the value to address this object with will be
> +   calculated.
> +
> +   We record the 
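
The tag arithmetic described in the quoted comments can be sketched as plain
integer code.  This is only an illustration under stated assumptions: the
granule size of 16 bytes and all function names are hypothetical (in the patch
the values come from the targetm.memtag hooks), the tag is taken to occupy the
entire top byte as the quoted comment says for the default, and wrapping the
frame tag offset within the tag width is my assumption.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative values only; GCC queries the target hooks instead.
constexpr unsigned tag_size_bits = 8;        // entire top byte of a pointer
constexpr std::uint64_t granule_size = 16;   // assumption, bytes per shadow byte
constexpr unsigned tag_shift = 64 - tag_size_bits;

// Shadow bytes needed for an object of SIZE bytes.  Objects are aligned to
// the granule so that no two of them share one, hence the round-up.
constexpr std::uint64_t
shadow_bytes_for (std::uint64_t size)
{
  return (size + granule_size - 1) / granule_size;
}

// Replace the tag bits of ADDR with TAG.
constexpr std::uint64_t
set_tag (std::uint64_t addr, std::uint8_t tag)
{
  return (addr & ~(std::uint64_t{0xff} << tag_shift))
	 | (std::uint64_t{tag} << tag_shift);
}

// Extract the tag bits of ADDR.
constexpr std::uint8_t
get_tag (std::uint64_t addr)
{
  return static_cast<std::uint8_t> (addr >> tag_shift);
}

// Tag of a stack object: frame base tag plus the object's tag offset,
// wrapping within the tag width (assumed behaviour).
constexpr std::uint8_t
object_tag (std::uint8_t base_tag, std::uint8_t offset)
{
  return static_cast<std::uint8_t> (base_tag + offset);
}
```

For example, a 17-byte variable needs two shadow bytes under these
assumptions, which is the reason neighbouring variables cannot share a granule
without also sharing a tag.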

[committed] libstdc++: Fix test failure with --disable-linux-futex

2020-11-04 Thread Jonathan Wakely via Gcc-patches
As noted in PR 96817 this new test fails if the library is built without
futexes. That's expected of course, but we might as well fail more
obviously than a deadlock that eventually times out.

libstdc++-v3/ChangeLog:

* testsuite/18_support/96817.cc: Fail fail if the library is
configured to not use futexes.

Tested powerpc64le-linux. Committed to trunk.

I've just realised the changelog above should say "Fail fast", I'll
fix that in the ChangeLog tomorrow.


commit 9c1125c121423a9948fa39e71ef89ba4059a2fad
Author: Jonathan Wakely 
Date:   Wed Nov 4 15:24:47 2020

libstdc++: Fix test failure with --disable-linux-futex

As noted in PR 96817 this new test fails if the library is built without
futexes. That's expected of course, but we might as well fail more
obviously than a deadlock that eventually times out.

libstdc++-v3/ChangeLog:

* testsuite/18_support/96817.cc: Fail fail if the library is
configured to not use futexes.

diff --git a/libstdc++-v3/testsuite/18_support/96817.cc b/libstdc++-v3/testsuite/18_support/96817.cc
index f03329678313..4591a7288a57 100644
--- a/libstdc++-v3/testsuite/18_support/96817.cc
+++ b/libstdc++-v3/testsuite/18_support/96817.cc
@@ -24,6 +24,10 @@
 #include 
 #include 
 
+#ifndef _GLIBCXX_HAVE_LINUX_FUTEX
+# error "This test requires futex support in the library"
+#endif
+
 int init()
 {
 #if __has_include()


RE: [PATCH v2 10/18]middle-end simplify lane permutes which selects from loads from the same DR.

2020-11-04 Thread Tamar Christina via Gcc-patches



> -Original Message-
> From: rguent...@c653.arch.suse.de  On
> Behalf Of Richard Biener
> Sent: Wednesday, November 4, 2020 3:12 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> Subject: RE: [PATCH v2 10/18]middle-end simplify lane permutes which
> selects from loads from the same DR.
> 
> On Wed, 4 Nov 2020, Tamar Christina wrote:
> 
> > Hi Richi,
> >
> > > -Original Message-
> > > From: rguent...@c653.arch.suse.de  On
> > > Behalf Of Richard Biener
> > > Sent: Wednesday, November 4, 2020 1:36 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> > > Subject: Re: [PATCH v2 10/18]middle-end simplify lane permutes which
> > > selects from loads from the same DR.
> > >
> > > On Tue, 3 Nov 2020, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This change allows one to simplify lane permutes that select from
> > > > multiple load leafs that load from the same DR group by promoting
> > > > the VEC_PERM node into a load itself and pushing the lane permute
> > > > into it as a
> > > load permute.
> > > >
> > > > This saves us from having to calculate where to materialize a new load
> node.
> > > > If the resulting loads are now unused they are freed and are
> > > > removed from the graph.
> > > >
> > > > This allows us to handle cases where we would have generated:
> > > >
> > > > movi    v4.4s, 0
> > > > adrp    x3, .LC0
> > > > ldr q5, [x3, #:lo12:.LC0]
> > > > mov x3, 0
> > > > .p2align 3,,7
> > > > .L2:
> > > > mov v0.16b, v4.16b
> > > > mov v3.16b, v4.16b
> > > > ldr q1, [x1, x3]
> > > > ldr q2, [x0, x3]
> > > > fcmla   v0.4s, v2.4s, v1.4s, #0
> > > > fcmla   v3.4s, v1.4s, v2.4s, #0
> > > > fcmla   v0.4s, v2.4s, v1.4s, #270
> > > > fcmla   v3.4s, v1.4s, v2.4s, #270
> > > > mov v1.16b, v3.16b
> > > > tbl v0.16b, {v0.16b - v1.16b}, v5.16b
> > > > str q0, [x2, x3]
> > > > add x3, x3, 16
> > > > cmp x3, 1600
> > > > bne .L2
> > > > ret
> > > >
> > > > and instead generate
> > > >
> > > > mov x3, 0
> > > > .p2align 3,,7
> > > > .L27:
> > > > ldr q0, [x2, x3]
> > > > ldr q1, [x0, x3]
> > > > ldr q2, [x1, x3]
> > > > fcmla   v0.2d, v1.2d, v2.2d, #0
> > > > fcmla   v0.2d, v1.2d, v2.2d, #270
> > > > str q0, [x2, x3]
> > > > add x3, x3, 16
> > > > cmp x3, 512
> > > > bne .L27
> > > > ret
> > > >
> > > > This runs as a pre step such that permute simplification can still
> > > > inspect whether this permute is needed.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > Tests are included as part of the final patch as they need the SLP
> > > > pattern matcher to insert permutes in between.
> > > >
> > > > Ok for master?
> > >
> > > So I think this is too specialized for the general issue that we're
> > > doing a bad job in CSEing the load part of different permutes of the
> > > same group.  I've played with fixing this half a year ago (again) in
> > > multiple general ways but they all caused some regressions.
> > >
> > > So you're now adding some heuristics as to when to anticipate "CSE"
> > > (or merging with followup permutes).
> > >
> > > To quickly recap what I did consider two loads (V2DF) one { a[0],
> > > a[1] } and the other { a[1], a[0] }.  They currently are two SLP
> > > nodes and one with a load_permutation.
> > > My original attempts focused on trying to get rid of
> > > load_permutation in favor of lane_permute nodes and thus during SLP
> > > discovery I turned the second into { a[0], a[1] } (magically unified
> > > with the other load) and a followup lane-permute node.
> > >
> > > So for your case you have IIUC { a[0], a[0] } and { a[1], a[1] }
> > > which eventually will (due to patterns) be lane-permuted into {
> > > a[0], a[1] }, right?  So generalizing this as a single { a[0], a[1]
> > > > } plus two lane-permute nodes { 0, 0 } and { 1, 1 } early would solve
> > > > the issue as well?
> >
> > Correct, I did wonder why it was generating two different nodes
> > > instead of a lane permute but didn't pay much attention that it was
> > > just a shortcoming.
> >
> > > Now, in general it might be
> > > more profitable to generate the { a[0], a[0] } and { a[1], a[1] }
> > > > via scalar-load-and-splat rather than vector load and permute so we
> > > > have to be careful to not over-optimize here or be prepared to do
> > > > the reverse transform.
> >
> > This in principle can be done in optimize_slp then right? Since it
> > would do a lot of the same work already and find the materialization points.
> >
> > >
> > > The patch itself is a bit ugly since it modifies the SLP graph when
> > > we already produced the graphds graph so I would do any of this
> > > before.  I did consider 

RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding.

2020-11-04 Thread Richard Biener
On Wed, 4 Nov 2020, Tamar Christina wrote:

> > -Original Message-
> > From: rguent...@c653.arch.suse.de  On
> > Behalf Of Richard Biener
> > Sent: Wednesday, November 4, 2020 2:04 PM
> > To: Tamar Christina 
> > Cc: Richard Sandiford ; nd ;
> > gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> > scaffolding.
> > 
> > On Wed, 4 Nov 2020, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: rguent...@c653.arch.suse.de  On
> > > > Behalf Of Richard Biener
> > > > Sent: Wednesday, November 4, 2020 12:41 PM
> > > > To: Tamar Christina 
> > > > Cc: Richard Sandiford ; nd ;
> > > > gcc-patches@gcc.gnu.org
> > > > Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern
> > > > matching scaffolding.
> > > >
> > > > On Tue, 3 Nov 2020, Tamar Christina wrote:
> > > >
> > > > > Hi Richi,
> > > > >
> > > > > This is a respin which includes the changes you requested.
> > > >
> > > > Comments randomly ordered, I'm pasting in pieces of the patch -
> > > > sending it inline would help to get pieces properly quoted and in-order.
> > > >
> > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > index 4bd454cfb185d7036843fc7140b073f525b2ec6a..b813508d3ceaf4c54f612bc10f9aa42ffe0ce0dd 100644
> > > > --- a/gcc/tree-vectorizer.h
> > > > +++ b/gcc/tree-vectorizer.h
> > > > ...
> > > >
> > > > I miss comments in this file, see tree-vectorizer.h where we try to
> > > > document purpose of classes and fields.
> > > >
> > > > Things that stick out to me:
> > > >
> > > > +uint8_t m_arity;
> > > > +uint8_t m_num_args;
> > > >
> > > > why uint8_t and not simply unsigned int?  Not knowing what arity /
> > > > num_args should be here ;)
> > >
> > > I think I can remove arity, but num_args is how many operands the
> > > created internal function call should take.  Since we can't vectorize
> > > calls with more than
> > > 4 arguments at the moment it seemed like 255 would be a safe limit :).
> > >
> > > >
> > > > +vec_info *m_vinfo;
> > > > ...
> > > > +vect_pattern (slp_tree *node, vec_info *vinfo)
> > > >
> > > > so this looks like something I freed stmt_vec_info of -
> > > > back-pointers in the "wrong" direction of the logical hierarchy.  I
> > > > suppose it's just to avoid passing down vinfo where we need it?
> > > > Please do that instead - pass down vinfo as everything else does.
> > > >
> > > > The class seems to expose both very high-level (build () it!) and
> > > > very low level details (get_ifn).  The high-level one suggests that
> > > > a pattern _not_ being represented by an ifn is possible but there's
> > > > too much implementation detail already in the vect_pattern class to
> > > > make that impossible.  I guess the IFN details could be pushed down
> > > > to the simple matching class (and that be called vect_ifn_pattern or so).
> > > >
> > > > +static bool
> > > > +vect_match_slp_patterns (slp_tree *ref_node, vec_info *vinfo) {
> > > > +  DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> > > > +  bool found_p = false;
> > > > +
> > > > +  if (dump_enabled_p ())
> > > > +{
> > > > +  dump_printf_loc (MSG_NOTE, vect_location, "-- before patt match --\n");
> > > > +  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > > > +  dump_printf_loc (MSG_NOTE, vect_location, "-- end patt --\n");
> > > > +}
> > > >
> > > > we dumped all instances after their analysis.  Maybe just refer to
> > > > the instance with its address (dump_print %p) so lookup in the
> > > > (already large) dump file is easy.
> > > >
> > > > +  hash_set *visited = new hash_set ();
> > > > +  for (unsigned x = 0; x < num__slp_patterns; x++)
> > > > +{
> > > > +  visited->empty ();
> > > > +  found_p |= vect_match_slp_patterns_2 (ref_node, vinfo, slp_patterns[x],
> > > > +   visited);
> > > > +}
> > > > +
> > > > +  delete visited;
> > > >
> > > > no need to new / delete, just do
> > > >
> > > >   hash_set visited;
> > > >
> > > > like everyone else.  Btw, do you really want to scan pieces of the
> > > > SLP graph (with instances being graph entries) multiple times?  If
> > > > not then you should move the visited set to the caller instead.
> > > >
> > > > +  /* TODO: Remove in final version, only here for generating debug dot graphs
> > > > +  from SLP tree.  */
> > > > +
> > > > +  if (dump_enabled_p ())
> > > > +{
> > > > +  dump_printf_loc (MSG_NOTE, vect_location, "-- start dot --\n");
> > > > +  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > > > +  dump_printf_loc (MSG_NOTE, vect_location, "-- end dot --\n");
> > > > +}
> > > >
> > > > now, if there was some pattern matched it is probably useful to dump
> > > > the graph (entry) again.  But only conditional on that I think.  So
> > > > can you instead make the dump conditional on 

RE: [PATCH v2 10/18]middle-end simplify lane permutes which selects from loads from the same DR.

2020-11-04 Thread Richard Biener
On Wed, 4 Nov 2020, Tamar Christina wrote:

> Hi Richi,
> 
> > -Original Message-
> > From: rguent...@c653.arch.suse.de  On
> > Behalf Of Richard Biener
> > Sent: Wednesday, November 4, 2020 1:36 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> > Subject: Re: [PATCH v2 10/18]middle-end simplify lane permutes which
> > selects from loads from the same DR.
> > 
> > On Tue, 3 Nov 2020, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This change allows one to simplify lane permutes that select from
> > > multiple load leafs that load from the same DR group by promoting the
> > > VEC_PERM node into a load itself and pushing the lane permute into it as a
> > load permute.
> > >
> > > This saves us from having to calculate where to materialize a new load 
> > > node.
> > > If the resulting loads are now unused they are freed and are removed
> > > from the graph.
> > >
> > > This allows us to handle cases where we would have generated:
> > >
> > >   moviv4.4s, 0
> > >   adrpx3, .LC0
> > >   ldr q5, [x3, #:lo12:.LC0]
> > >   mov x3, 0
> > >   .p2align 3,,7
> > > .L2:
> > >   mov v0.16b, v4.16b
> > >   mov v3.16b, v4.16b
> > >   ldr q1, [x1, x3]
> > >   ldr q2, [x0, x3]
> > >   fcmla   v0.4s, v2.4s, v1.4s, #0
> > >   fcmla   v3.4s, v1.4s, v2.4s, #0
> > >   fcmla   v0.4s, v2.4s, v1.4s, #270
> > >   fcmla   v3.4s, v1.4s, v2.4s, #270
> > >   mov v1.16b, v3.16b
> > >   tbl v0.16b, {v0.16b - v1.16b}, v5.16b
> > >   str q0, [x2, x3]
> > >   add x3, x3, 16
> > >   cmp x3, 1600
> > >   bne .L2
> > >   ret
> > >
> > > and instead generate
> > >
> > >   mov x3, 0
> > >   .p2align 3,,7
> > > .L27:
> > >   ldr q0, [x2, x3]
> > >   ldr q1, [x0, x3]
> > >   ldr q2, [x1, x3]
> > >   fcmla   v0.2d, v1.2d, v2.2d, #0
> > >   fcmla   v0.2d, v1.2d, v2.2d, #270
> > >   str q0, [x2, x3]
> > >   add x3, x3, 16
> > >   cmp x3, 512
> > >   bne .L27
> > >   ret
> > >
> > > This runs as a pre step such that permute simplification can still
> > > inspect this permute is needed
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > Tests are included as part of the final patch as they need the SLP
> > > pattern matcher to insert permutes in between.
> > >
> > > Ok for master?
> > 
> > So I think this is too specialized for the general issue that we're doing a 
> > bad
> > job in CSEing the load part of different permutes of the same group.  I've
> > played with fixing this half a year ago (again) in multiple general ways but
> > they all caused some regressions.
> > 
> > So you're now adding some heuristics as to when to anticipate "CSE" (or
> > merging with followup permutes).
> > 
> > To quickly recap what I did consider two loads (V2DF) one { a[0], a[1] } and
> > the other { a[1], a[0] }.  They currently are two SLP nodes and one with a
> > load_permutation.
> > My original attempts focused on trying to get rid of load_permutation in
> > favor of lane_permute nodes and thus during SLP discovery I turned the
> > second into { a[0], a[1] } (magically unified with the other load) and a
> > followup lane-permute node.
> > 
> > So for your case you have IIUC { a[0], a[0] } and { a[1], a[1] } which 
> > eventually
> > will (due to patterns) be lane-permuted into { a[0], a[1] }, right?  So
> > generalizing this as a single { a[0], a[1] } plus two lane-permute nodes  { 
> > 0, 0 }
> > and { 1, 1 } early would solve the issue as well?
> 
> Correct, I did wonder why it was generating two different nodes instead of a
> lane permute but didn't pay much attention that it was just a shortcoming.
> 
> > Now, in general it might be
> > more profitable to generate the { a[0], a[0] } and { a[1], a[1] } via 
> > scalar-load-
> > and-splat rather than vector load and permute so we have to be careful to
> > not over-optimize here or be prepared to do the reverse transform.
> 
> This in principle can be done in optimize_slp then right? Since it would do
> a lot of the same work already and find the materialization points. 
> 
> > 
> > The patch itself is a bit ugly since it modifies the SLP graph when we 
> > already
> > produced the graphds graph so I would do any of this before.  I did consider
> > gathering all loads nodes loading from a group and then trying to apply some
> > heuristic to alter the SLP graph so it can be better optimized.  In fact 
> > when we
> > want to generate the same code as the non-SLP interleaving scheme does
> > we do have to look at those since we have to unify loads there.
> > 
> 
> Yes.. I will concede the patch isn't my finest work.. I also don't like the
> fact that I had to keep leafs intact lest I break things later. But wanted
> feedback :)
> 
> > I'd put this after vect_slp_build_vertices but before the new_graph call -
> > altering 'vertices' / 'leafs' should be more easily possible and the 
> > 'leafs' array
> > contains all loads already 

Re: [PATCH v5] rtl: builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-11-04 Thread Raoni Fassina Firmino via Gcc-patches
On Wed, Nov 04, 2020 at 10:35:03AM +0100, Richard Biener wrote:
> > +/* Expand call EXP to the fegetround builtin (from C99 fenv.h), returning the
> > +   result and setting it in TARGET.  Otherwise return NULL_RTX on failure.  */
> > +static rtx
> > +expand_builtin_fegetround (tree exp, rtx target, machine_mode target_mode)
> > +{
> > +  if (!validate_arglist (exp, VOID_TYPE))
> > +return NULL_RTX;
> > +
> > +  insn_code icode = direct_optab_handler (fegetround_optab, SImode);
> > +  if (icode == CODE_FOR_nothing)
> > +return NULL_RTX;
> > +
> > +  if (target == 0
> > +  || GET_MODE (target) != target_mode
> > +  || !(*insn_data[icode].operand[0].predicate) (target, target_mode))
> > +target = gen_reg_rtx (target_mode);
> > +
> > +  rtx pat = GEN_FCN (icode) (target);
> > +  if (!pat)
> > +return NULL_RTX;
> > +  emit_insn (pat);
> 
> I think you need to verify whether the expansion ended up in 'target'
> and otherwise emit a move since usually 'target' is just a hint.

I thought the "if (target == 0 ..." took care of that. The expands do
emit a move, if that helps.

For feclearexcept and feraiseexcept I included tests that vary
'target', including none, but now I see that I did not do the same for
fegetround. I can add them if necessary, but the tests do check
that the return value is correct, so I don't know.


> > +@cindex @code{fegetround@var{m}} instruction pattern
> > +@item @samp{fegetround@var{m}}
> > +Store the current machine floating-point rounding mode into operand 0.
> > +Operand 0 has mode @var{m}, which is scalar.  This pattern is used to
> > +implement the @code{fegetround} function from the ISO C99 standard.
> 
> I think this needs to elaborate on the format of the "rounding mode".
> 
> AFAICS you do nothing to marshall with the actually used libc
> implementation which AFAIU can choose arbitrary values for
> the FE_* macros.  I'm not sure we require the compiler to be
> configured for one specific C library and for example require
> matching FE_* macro definitions for all uses of the built
> compiler.
> 
> For the patch at hand you seem to assume the libc "format"
> matches the hardware one (which would of course be reasonable).
> 
> Does that actually hold up when looking at libcs other than 
> glibc supporting powerpc?

I checked some other libc implementations that have POWER support and
all have the same values as glibc for the four rounding modes and the five
exception flags. The libc implementations I checked are:

 - musl
 - uclibc & uclibc-ng
 - freebsd

Are there any others I am missing?
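
To repeat this kind of check on another libc, a small program along the
following lines could be built against it. This is only a sketch and not part
of the patch: the helper names rounding_mode_name and
rounding_modes_round_trip are made up for illustration, and it assumes the
platform defines all four FE_* rounding macros. It prints the numeric value
of each macro (which can then be compared between libcs) and checks that
fegetround reports back what fesetround installed.

```cpp
#include <cfenv>
#include <cstdio>

// Hypothetical helper (not from the patch): name the FE_* macro that
// corresponds to a value returned by fegetround ().
const char *rounding_mode_name (int mode)
{
  switch (mode)
    {
    case FE_TONEAREST:  return "FE_TONEAREST";
    case FE_TOWARDZERO: return "FE_TOWARDZERO";
    case FE_UPWARD:     return "FE_UPWARD";
    case FE_DOWNWARD:   return "FE_DOWNWARD";
    default:            return "unknown";
    }
}

// Install each standard rounding mode in turn, print its numeric value,
// and verify that fegetround () returns the mode that was just set.
// The original rounding mode is restored before returning.
bool rounding_modes_round_trip ()
{
  const int modes[] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD };
  const int saved = std::fegetround ();
  bool ok = true;
  for (int mode : modes)
    {
      if (std::fesetround (mode) != 0 || std::fegetround () != mode)
        ok = false;
      std::printf ("%s = %d\n", rounding_mode_name (mode), mode);
    }
  std::fesetround (saved);
  return ok;
}
```

Comparing the printed values between two libcs on the same target shows
whether their FE_* definitions agree.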


> If all of these are non-issues then the middle-end pieces look OK.
> If we need any such "translation" layer then I guess we need
> to either have additional operands to the optabs specifying
> all of the FE_* values relevant for the respective call or
> provide a side-channel (target hook) to implement the
> translation on the expansion side.

IMHO, it seems like it is not necessary if there is no libc that has
different values for the FE_* macros. I didn't check other archs, but if
that is the case for some other arch I think it could be changed if and when
that arch implements expands for these builtins.


o/
Raoni


Re: [00/32] C++ 20 Modules

2020-11-04 Thread Nathan Sidwell

On 11/4/20 9:15 AM, Jason Merrill wrote:
On Wed, Nov 4, 2020 at 8:50 AM Nathan Sidwell > wrote:



We can; apparently the necessary incantation is to

#define INCLUDE_ALGORITHM


thanks, that's fixed the build problem.  And working around the i386
error I get a working toolchain.  modules tests all pass except a trivial
one detecting va_list looks different.  I must have messed up the target
check there.


so i686-linux is now known good

nathan

--
Nathan Sidwell


Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-04 Thread H.J. Lu via Gcc-patches
On Wed, Nov 4, 2020 at 6:41 AM Jozef Lawrynowicz
 wrote:
>
> On Wed, Nov 04, 2020 at 05:47:28AM -0800, H.J. Lu wrote:
> > On Tue, Nov 3, 2020 at 2:11 PM H.J. Lu  wrote:
> > >
> > > On Tue, Nov 3, 2020 at 1:57 PM Jozef Lawrynowicz
> > >  wrote:
> > > >
> > > > On Tue, Nov 03, 2020 at 01:09:43PM -0800, H.J. Lu via Gcc-patches wrote:
> > > > > On Tue, Nov 3, 2020 at 1:00 PM H.J. Lu  wrote:
> > > > > >
> > > > > > On Tue, Nov 3, 2020 at 12:46 PM Jozef Lawrynowicz
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Nov 03, 2020 at 11:58:04AM -0800, H.J. Lu via Gcc-patches 
> > > > > > > wrote:
> > > > > > > > On Tue, Nov 3, 2020 at 10:22 AM Jozef Lawrynowicz
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Nov 03, 2020 at 09:57:58AM -0800, H.J. Lu via 
> > > > > > > > > Gcc-patches wrote:
> > > > > > > > > > On Tue, Nov 3, 2020 at 9:41 AM Jozef Lawrynowicz
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > The attached patch implements 
> > > > > > > > > > > TARGET_ASM_MARK_DECL_PRESERVED for ELF GNU
> > > > > > > > > > > OSABI targets, so that declarations that have the "used" 
> > > > > > > > > > > attribute
> > > > > > > > > > > applied will be saved from linker garbage collection.
> > > > > > > > > > >
> > > > > > > > > > > TARGET_ASM_MARK_DECL_PRESERVED will emit an assembler 
> > > > > > > > > > > ".retain"
> > > > > > > > > >
> > > > > > > > > > Can you use the "R" flag instead?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > For the benefit of this mailing list, I have copied my 
> > > > > > > > > response from the
> > > > > > > > > Binutils mailing list regarding this.
> > > > > > > > > > The "comm_section" example I gave is actually inaccurate, but
> > > > > > > > > you can
> > > > > > > > > see the examples of the variety of sections that would need 
> > > > > > > > > to be
> > > > > > > > > handled by doing
> > > > > > > > >
> > > > > > > > > $ git grep -A2 "define.*SECTION_ASM_OP" gcc/ | grep "\".*\."
> > > > > > > > >
> > > > > > > > > > ... snip ...
> > > > > > > > > > Secondly, for seamless integration with the "used" 
> > > > > > > > > > attribute, we must be
> > > > > > > > > > > able to mark the symbol with the used attribute applied
> > > > > > > > > > as "retained"
> > > > > > > > > > without changing its section name. For GCC "named" 
> > > > > > > > > > sections, this is
> > > > > > > > > > straightforward, but for "unnamed" sections it is a giant 
> > > > > > > > > > mess.
> > > > > > > > > >
> > > > > > > > > > The section name for a GCC "unnamed" section is not readily 
> > > > > > > > > > available,
> > > > > > > > > > instead a string which contains the full assembly code to 
> > > > > > > > > > switch to one
> > > > > > > > > > of these text/data/bss/rodata/comm etc. sections is encoded 
> > > > > > > > > > in the
> > > > > > > > > > structure.
> > > > > > > > > >
> > > > > > > > > > Backends define the assembly code to switch to these 
> > > > > > > > > > sections (some
> > > > > > > > > > "*ASM_OP*" macro) in a variety of ways. For example, the 
> > > > > > > > > > unnamed section
> > > > > > > > > > "comm_section", might correspond to a .bss section, or emit 
> > > > > > > > > > a .comm
> > > > > > > > > > directive. I even looked at trying to parse them to extract 
> > > > > > > > > > what the
> > > > > > > > > > name of a section will be, but it would be very messy and 
> > > > > > > > > > not robust.
> > > > > > > > > >
> > > > > > > > > > Meanwhile, having a .retain  directive is a 
> > > > > > > > > > > very simple
> > > > > > > > > > solution, and keeps the GCC implementation really concise 
> > > > > > > > > > (patch
> > > > > > > > > > attached). The assembler will know for sure what the 
> > > > > > > > > > section containing
> > > > > > > > > > the symbol will be, and can apply the SHF_GNU_RETAIN flag 
> > > > > > > > > > directly.
> > > > > > > > > >
> > > > > > > >
> > > > > > > > Please take a look at
> > > > > > > >
> > > > > > > > https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/elf/shf_retain
> > > > > > > >
> > > > > > > > which is built in top of
> > > > > > > >
> > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-February/539963.html
> > > > > > > >
> > > > > > > > I think SECTION2_RETAIN matches SHF_GNU_RETAIN well.  If you
> > > > > > > > want, you extract my flags2 change and use it for 
> > > > > > > > SHF_GNU_RETAIN.
> > > > > > >
> > > > > > > In your patch you have to make the assumption that data_section, 
> > > > > > > always
> > > > > > > corresponds to a section named .data. For just this example, c6x 
> > > > > > > (which
> > > > > > > supports the GNU ELF OSABI) does not fit the rule:
> > > > > > >
> > > > > > > > c6x/elf-common.h:#define DATA_SECTION_ASM_OP 
> > > > > > > > "\t.section\t\".fardata\",\"aw\""
> > > > > > >
> > > > > > > data_section for c6x corresponds to .fardata, not .data. So the 
> > > > > > > use of
> > > > > > > "used" on a data declaration would place it in a 

Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-04 Thread Jozef Lawrynowicz
On Wed, Nov 04, 2020 at 05:47:28AM -0800, H.J. Lu wrote:
> On Tue, Nov 3, 2020 at 2:11 PM H.J. Lu  wrote:
> >
> > On Tue, Nov 3, 2020 at 1:57 PM Jozef Lawrynowicz
> >  wrote:
> > >
> > > On Tue, Nov 03, 2020 at 01:09:43PM -0800, H.J. Lu via Gcc-patches wrote:
> > > > On Tue, Nov 3, 2020 at 1:00 PM H.J. Lu  wrote:
> > > > >
> > > > > On Tue, Nov 3, 2020 at 12:46 PM Jozef Lawrynowicz
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Nov 03, 2020 at 11:58:04AM -0800, H.J. Lu via Gcc-patches 
> > > > > > wrote:
> > > > > > > On Tue, Nov 3, 2020 at 10:22 AM Jozef Lawrynowicz
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, Nov 03, 2020 at 09:57:58AM -0800, H.J. Lu via 
> > > > > > > > Gcc-patches wrote:
> > > > > > > > > On Tue, Nov 3, 2020 at 9:41 AM Jozef Lawrynowicz
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > The attached patch implements 
> > > > > > > > > > TARGET_ASM_MARK_DECL_PRESERVED for ELF GNU
> > > > > > > > > > OSABI targets, so that declarations that have the "used" 
> > > > > > > > > > attribute
> > > > > > > > > > applied will be saved from linker garbage collection.
> > > > > > > > > >
> > > > > > > > > > TARGET_ASM_MARK_DECL_PRESERVED will emit an assembler 
> > > > > > > > > > ".retain"
> > > > > > > > >
> > > > > > > > > Can you use the "R" flag instead?
> > > > > > > > >
> > > > > > > >
> > > > > > > > For the benefit of this mailing list, I have copied my response 
> > > > > > > > from the
> > > > > > > > Binutils mailing list regarding this.
> > > > > > > > > The "comm_section" example I gave is actually inaccurate, but
> > > > > > > > you can
> > > > > > > > see the examples of the variety of sections that would need to 
> > > > > > > > be
> > > > > > > > handled by doing
> > > > > > > >
> > > > > > > > $ git grep -A2 "define.*SECTION_ASM_OP" gcc/ | grep "\".*\."
> > > > > > > >
> > > > > > > > > ... snip ...
> > > > > > > > > Secondly, for seamless integration with the "used" attribute, 
> > > > > > > > > we must be
> > > > > > > > > > able to mark the symbol with the used attribute applied as
> > > > > > > > > "retained"
> > > > > > > > > without changing its section name. For GCC "named" sections, 
> > > > > > > > > this is
> > > > > > > > > straightforward, but for "unnamed" sections it is a giant 
> > > > > > > > > mess.
> > > > > > > > >
> > > > > > > > > The section name for a GCC "unnamed" section is not readily 
> > > > > > > > > available,
> > > > > > > > > instead a string which contains the full assembly code to 
> > > > > > > > > switch to one
> > > > > > > > > of these text/data/bss/rodata/comm etc. sections is encoded 
> > > > > > > > > in the
> > > > > > > > > structure.
> > > > > > > > >
> > > > > > > > > Backends define the assembly code to switch to these sections 
> > > > > > > > > (some
> > > > > > > > > "*ASM_OP*" macro) in a variety of ways. For example, the 
> > > > > > > > > unnamed section
> > > > > > > > > "comm_section", might correspond to a .bss section, or emit a 
> > > > > > > > > .comm
> > > > > > > > > directive. I even looked at trying to parse them to extract 
> > > > > > > > > what the
> > > > > > > > > name of a section will be, but it would be very messy and not 
> > > > > > > > > robust.
> > > > > > > > >
> > > > > > > > > Meanwhile, having a .retain  directive is a very 
> > > > > > > > > > simple
> > > > > > > > > solution, and keeps the GCC implementation really concise 
> > > > > > > > > (patch
> > > > > > > > > attached). The assembler will know for sure what the section 
> > > > > > > > > containing
> > > > > > > > > the symbol will be, and can apply the SHF_GNU_RETAIN flag 
> > > > > > > > > directly.
> > > > > > > > >
> > > > > > >
> > > > > > > Please take a look at
> > > > > > >
> > > > > > > https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/elf/shf_retain
> > > > > > >
> > > > > > > which is built in top of
> > > > > > >
> > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-February/539963.html
> > > > > > >
> > > > > > > I think SECTION2_RETAIN matches SHF_GNU_RETAIN well.  If you
> > > > > > > want, you extract my flags2 change and use it for SHF_GNU_RETAIN.
> > > > > >
> > > > > > In your patch you have to make the assumption that data_section, 
> > > > > > always
> > > > > > corresponds to a section named .data. For just this example, c6x 
> > > > > > (which
> > > > > > supports the GNU ELF OSABI) does not fit the rule:
> > > > > >
> > > > > > > c6x/elf-common.h:#define DATA_SECTION_ASM_OP 
> > > > > > > "\t.section\t\".fardata\",\"aw\""
> > > > > >
> > > > > > data_section for c6x corresponds to .fardata, not .data. So the use of
> > > > > > "used" on a data declaration would place it in a different section
> > > > > > than if the "used" attribute was not applied.
> > > > > >
> > > > > > For c6x and mips, readonly_data_section does not correspond to 
> > > > > > .rodata,
> > > > > > so that assumption cannot be made either:
> > > > > > > c6x/elf-common.h:#define 

RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding.

2020-11-04 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: rguent...@c653.arch.suse.de  On
> Behalf Of Richard Biener
> Sent: Wednesday, November 4, 2020 2:04 PM
> To: Tamar Christina 
> Cc: Richard Sandiford ; nd ;
> gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> scaffolding.
> 
> On Wed, 4 Nov 2020, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: rguent...@c653.arch.suse.de  On
> > > Behalf Of Richard Biener
> > > Sent: Wednesday, November 4, 2020 12:41 PM
> > > To: Tamar Christina 
> > > Cc: Richard Sandiford ; nd ;
> > > gcc-patches@gcc.gnu.org
> > > Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern
> > > matching scaffolding.
> > >
> > > On Tue, 3 Nov 2020, Tamar Christina wrote:
> > >
> > > > Hi Richi,
> > > >
> > > > This is a respin which includes the changes you requested.
> > >
> > > Comments randomly ordered, I'm pasting in pieces of the patch -
> > > sending it inline would help to get pieces properly quoted and in-order.
> > >
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > index 4bd454cfb185d7036843fc7140b073f525b2ec6a..b813508d3ceaf4c54f612bc10f9aa42ffe0ce0dd 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > ...
> > >
> > > I miss comments in this file, see tree-vectorizer.h where we try to
> > > document purpose of classes and fields.
> > >
> > > Things that stick out to me:
> > >
> > > +uint8_t m_arity;
> > > +uint8_t m_num_args;
> > >
> > > why uint8_t and not simply unsigned int?  Not knowing what arity /
> > > num_args should be here ;)
> >
> > I think I can remove arity, but num_args is how many operands the
> > created internal function call should take.  Since we can't vectorize
> > calls with more than
> > 4 arguments at the moment it seemed like 255 would be a safe limit :).
> >
> > >
> > > +vec_info *m_vinfo;
> > > ...
> > > +vect_pattern (slp_tree *node, vec_info *vinfo)
> > >
> > > so this looks like something I freed stmt_vec_info of -
> > > back-pointers in the "wrong" direction of the logical hierarchy.  I
> > > suppose it's just to avoid passing down vinfo where we need it?
> > > Please do that instead - pass down vinfo as everything else does.
> > >
> > > The class seems to expose both very high-level (build () it!) and
> > > very low level details (get_ifn).  The high-level one suggests that
> > > a pattern _not_ being represented by an ifn is possible but there's
> > > too much implementation detail already in the vect_pattern class to
> > > make that impossible.  I guess the IFN details could be pushed down
> > > to the simple matching class (and that be called vect_ifn_pattern or so).
> > >
> > > +static bool
> > > +vect_match_slp_patterns (slp_tree *ref_node, vec_info *vinfo) {
> > > +  DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> > > +  bool found_p = false;
> > > +
> > > +  if (dump_enabled_p ())
> > > +{
> > > +  dump_printf_loc (MSG_NOTE, vect_location, "-- before patt match --\n");
> > > +  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > > +  dump_printf_loc (MSG_NOTE, vect_location, "-- end patt --\n");
> > > +}
> > >
> > > we dumped all instances after their analysis.  Maybe just refer to
> > > the instance with its address (dump_print %p) so lookup in the
> > > (already large) dump file is easy.
> > >
> > > +  hash_set *visited = new hash_set ();
> > > +  for (unsigned x = 0; x < num__slp_patterns; x++)
> > > +{
> > > +  visited->empty ();
> > > +  found_p |= vect_match_slp_patterns_2 (ref_node, vinfo, slp_patterns[x],
> > > +   visited);
> > > +}
> > > +
> > > +  delete visited;
> > >
> > > no need to new / delete, just do
> > >
> > >   hash_set visited;
> > >
> > > like everyone else.  Btw, do you really want to scan pieces of the
> > > SLP graph (with instances being graph entries) multiple times?  If
> > > not then you should move the visited set to the caller instead.
> > >
> > > +  /* TODO: Remove in final version, only here for generating debug dot graphs
> > > +  from SLP tree.  */
> > > +
> > > +  if (dump_enabled_p ())
> > > +{
> > > +  dump_printf_loc (MSG_NOTE, vect_location, "-- start dot --\n");
> > > +  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > > +  dump_printf_loc (MSG_NOTE, vect_location, "-- end dot --\n");
> > > +}
> > >
> > > now, if there was some pattern matched it is probably useful to dump
> > > the graph (entry) again.  But only conditional on that I think.  So
> > > can you instead make the dump conditional on found_p and remove the
> > > start dot/end dot markers as said in the comment?
> > >
> > > + if (dump_enabled_p ())
> > > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +"transformation for %s not valid due to post "
> > > +

Re: deprecations in OpenMP 5.0

2020-11-04 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 04, 2020 at 02:23:17PM +, Kwok Cheung Yeung wrote:
> I have used Tobias' recently added patch for Fortran deprecation support to
> mark omp_get_nested and omp_set_nested as deprecated. If the omp_lock_hint_*
> integer parameters are marked though, then the deprecation warnings will
> fire the moment omp_lib is used from a Fortran program, even if they are not
> referenced in the program itself - a bug perhaps?
> 
> I have added '-cpp' (for preprocessor support) and '-fopenmp' (for the
> _OPENMP define) to the Makefile when compiling the omp_lib.f90.
> 
> Would a warning message be acceptable if OMP_NESTED is used? Obviously this
> cannot be done at compile-time.

I'd strongly prefer no runtime warnings.

> 2020-11-04  Ulrich Drepper  
>   Kwok Cheung Yeung  
> 
>   libgomp/
>   * Makefile.am (%.mod): Add -cpp and -fopenmp to compile flags.
>   * Makefile.in: Regenerate.
>   * fortran.c: Wrap uses of omp_set_nested and omp_get_nested with
>   pragmas to ignore -Wdeprecated-declarations warnings.
>   * icv.c: Likewise.
>   * omp.h.in (__GOMP_DEPRECATED_5_0): Define.
>   Mark omp_lock_hint_* enum values, omp_lock_hint_t, omp_set_nested,
>   and omp_get_nested with __GOMP_DEPRECATED_5_0.
>   * omp_lib.f90.in: Mark omp_get_nested and omp_set_nested as
>   deprecated.

LGTM, except:

> +  omp_lock_hint_contended __GOMP_DEPRECATED_5_0 = omp_sync_hint_contended,
>omp_sync_hint_nonspeculative = 4,
> -  omp_lock_hint_nonspeculative = omp_sync_hint_nonspeculative,
> +  omp_lock_hint_nonspeculative __GOMP_DEPRECATED_5_0 = omp_sync_hint_nonspeculative,

The above line is too long and needs wrapping.

But it would be nice to also add -Wno-deprecated to dg-additional-options of
tests that do use those.
Perhaps for testing replace the 201811 temporarily with 201511 and run make
check.
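
For individual call sites (as opposed to whole test files, where the
dg-additional-options route applies), the ChangeLog's approach of wrapping
uses in pragmas might look roughly like the sketch below. The function
legacy_get_nested is a made-up stand-in for a deprecated entry point, not the
real libgomp declaration; the pragmas themselves are the usual GCC diagnostic
pragmas.

```cpp
// Hypothetical stand-in for a deprecated library entry point; the name
// `legacy_get_nested` exists only for this illustration.
__attribute__((__deprecated__)) int legacy_get_nested ();

int legacy_get_nested ()
{
  return 0;
}

// Call the deprecated function while locally silencing
// -Wdeprecated-declarations, then restore the previous diagnostic state.
int call_without_warning ()
{
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
  int nested = legacy_get_nested ();
#pragma GCC diagnostic pop
  return nested;
}
```

The push/pop pair keeps the suppression limited to the one call, so other
uses of deprecated declarations in the same file still warn.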

> --- a/libgomp/omp_lib.f90.in
> +++ b/libgomp/omp_lib.f90.in
> @@ -644,4 +644,8 @@
>end function
>  end interface
>  
> +#if _OPENMP >= 201811
> +!GCC$ ATTRIBUTES DEPRECATED :: omp_get_nested, omp_set_nested
> +#endif
> +
>end module omp_lib

Also, what about omp_lib.h?  Do you plan to change it only when we switch
_OPENMP macro?  I mean, we can't rely on preprocessing in that case...

Jakub



Re: deprecations in OpenMP 5.0

2020-11-04 Thread Kwok Cheung Yeung

On 28/10/2020 4:06 pm, Jakub Jelinek wrote:

On Wed, Oct 28, 2020 at 03:41:25PM +, Kwok Cheung Yeung wrote:

What if we made the definition of __GOMP_DEPRECATED in the original patch
conditional on the current value of __OPENMP__? i.e. Something like:

+#if defined(__GNUC__) && __OPENMP__ >= 201811L
+# define __GOMP_DEPRECATED __attribute__((__deprecated__))
+#else
+# define __GOMP_DEPRECATED
+#endif

In that case, __GOMP_DEPRECATED will not do anything until __OPENMP__ is
updated to reflect OpenMP 5.0, but when it is, the functions will
immediately be marked deprecated without any further work.


That could work, but the macro name would need to incorporate the exact
OpenMP version.
Because some APIs can be deprecated in OpenMP 5.0, others in 5.1 or in 5.2
(all to be removed in 6.0), others in 6.0/6.1 etc. to be removed in 7.0 etc.


I've renamed __GOMP_DEPRECATED to __GOMP_DEPRECATED_5_0.
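
For readers following along, a minimal sketch of the version-gated macro
idea, under stated assumptions: SKETCH_DEPRECATED_5_0 and the sketch_*
functions are hypothetical stand-ins rather than the libgomp declarations,
and the guard uses _OPENMP, the spelling of the predefined macro that the
patch itself tests.  The second half shows how an internal caller can keep
using a deprecated entry point without tripping the warning, in the same
spirit as the pragmas added to fortran.c and icv.c.

```cpp
/* Sketch only: SKETCH_DEPRECATED_5_0 and the sketch_* functions are
   hypothetical stand-ins, not libgomp's actual declarations.  */
#if defined(__GNUC__) && defined(_OPENMP) && _OPENMP >= 201811L
# define SKETCH_DEPRECATED_5_0 __attribute__((__deprecated__))
#else
# define SKETCH_DEPRECATED_5_0 /* expands to nothing until _OPENMP is bumped */
#endif

static int sketch_nested_flag = 0;

/* The declarations carry the attribute; one macro per deprecating version
   keeps APIs deprecated in 5.0, 5.1, ... independent of each other.  */
static void sketch_set_nested (int) SKETCH_DEPRECATED_5_0;
static int sketch_get_nested (void) SKETCH_DEPRECATED_5_0;

static void
sketch_set_nested (int val)
{
  sketch_nested_flag = (val != 0);
}

static int
sketch_get_nested (void)
{
  return sketch_nested_flag;
}

/* Internal callers that must keep using the old entry points silence the
   warning locally instead of dropping the attribute.  */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
static int
sketch_toggle_nested (void)
{
  sketch_set_nested (!sketch_get_nested ());
  return sketch_get_nested ();
}
#pragma GCC diagnostic pop
```

Without -fopenmp, or with an _OPENMP value below 201811, the macro expands
to nothing and the code compiles exactly as before, which is the "no effect
until the version is bumped" behaviour discussed above.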



However, GFortran does not support the deprecated attribute, so how should
it behave? My first thought would be to print out a warning message at
runtime the first time a deprecated function is called (printing it out
every time would probably be too annoying), and maybe add an environment
variable that can be set to disable the warning. A similar runtime warning
could also be printed if the OMP_NESTED environment variable is set. Again,
printing these warnings could be suppressed until the value of __OPENMP__ is
bumped up.


I'm against such runtime diagnostics, that is perhaps good for some
sanitization, but not normal usage.  Perhaps better implement deprecated
attribute in gfortran?



I have used Tobias' recently added patch for Fortran deprecation support to mark 
omp_get_nested and omp_set_nested as deprecated. If the omp_lock_hint_* integer 
parameters are marked though, then the deprecation warnings will fire the moment 
omp_lib is used from a Fortran program, even if they are not referenced in the 
program itself - a bug perhaps?


I have added '-cpp' (for preprocessor support) and '-fopenmp' (for the _OPENMP 
define) to the Makefile when compiling the omp_lib.f90.


Would a warning message be acceptable if OMP_NESTED is used? Obviously this 
cannot be done at compile-time.


Is this patch okay for trunk? We could add the deprecations for omp_lock_hint_* 
later when the deprecations for parameters are fixed. I have checked that it 
bootstraps on x86_64.


Kwok
From 6e8fc46bdcaf44da11d46968a488fdd990ae Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Wed, 4 Nov 2020 03:59:44 -0800
Subject: [PATCH] openmp: Mark deprecated symbols in OpenMP 5.0

2020-11-04  Ulrich Drepper  
Kwok Cheung Yeung  

libgomp/
* Makefile.am (%.mod): Add -cpp and -fopenmp to compile flags.
* Makefile.in: Regenerate.
* fortran.c: Wrap uses of omp_set_nested and omp_get_nested with
pragmas to ignore -Wdeprecated-declarations warnings.
* icv.c: Likewise.
* omp.h.in (__GOMP_DEPRECATED_5_0): Define.
Mark omp_lock_hint_* enum values, omp_lock_hint_t, omp_set_nested,
and omp_get_nested with __GOMP_DEPRECATED_5_0.
* omp_lib.f90.in: Mark omp_get_nested and omp_set_nested as
deprecated.
---
 libgomp/Makefile.am|  2 +-
 libgomp/Makefile.in|  2 +-
 libgomp/fortran.c  | 13 +++--
 libgomp/icv.c  | 10 --
 libgomp/omp.h.in   | 22 ++
 libgomp/omp_lib.f90.in |  4 
 6 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 586c930..4cf1f58 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -92,7 +92,7 @@ openacc_kinds.mod: openacc.mod
 openacc.mod: openacc.lo
:
 %.mod: %.f90
-   $(FC) $(FCFLAGS) -fsyntax-only $<
+   $(FC) $(FCFLAGS) -cpp -fopenmp -fsyntax-only $<
 fortran.lo: libgomp_f.h
 fortran.o: libgomp_f.h
 env.lo: libgomp_f.h
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 00d5e29..eb868b3 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -1382,7 +1382,7 @@ openacc_kinds.mod: openacc.mod
 openacc.mod: openacc.lo
:
 %.mod: %.f90
-   $(FC) $(FCFLAGS) -fsyntax-only $<
+   $(FC) $(FCFLAGS) -cpp -fopenmp -fsyntax-only $<
 fortran.lo: libgomp_f.h
 fortran.o: libgomp_f.h
 env.lo: libgomp_f.h
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index 029dec1..cd719f9 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -47,10 +47,13 @@ ialias_redirect (omp_test_lock)
 ialias_redirect (omp_test_nest_lock)
 # endif
 ialias_redirect (omp_set_dynamic)
-ialias_redirect (omp_set_nested)
-ialias_redirect (omp_set_num_threads)
 ialias_redirect (omp_get_dynamic)
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
+ialias_redirect (omp_set_nested)
 ialias_redirect (omp_get_nested)
+#pragma GCC diagnostic pop
+ialias_redirect (omp_set_num_threads)
 ialias_redirect (omp_in_parallel)
 ialias_redirect (omp_get_max_threads)
 

Re: [PATCH 1/x] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics

2020-11-04 Thread Andrea Corallo via Gcc-patches
Christophe Lyon  writes:

> On Tue, 3 Nov 2020 at 11:27, Kyrylo Tkachov via Gcc-patches
>  wrote:
>>
>> Hi Andrea,
>>
>> > -Original Message-
>> > From: Andrea Corallo 
>> > Sent: 26 October 2020 15:59
>> > To: gcc-patches@gcc.gnu.org
>> > Cc: Kyrylo Tkachov ; Richard Earnshaw
>> > ; nd 
>> > Subject: [PATCH 1/x] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics
>> >
>> > Hi all,
>> >
>> > I'd like to submit the following patch implementing the bfloat16_t
>> > neon related load intrinsics: vld1_lane_bf16, vld1q_lane_bf16.
>> >
>> > Please refer to:
>> > ACLE 
>> > ISA  
>> >
>> > Regtested and bootstrapped.
>> >
>> > Okay for trunk?
>>
>
> I think you need to add -mfloat-abi=hard to the dg-additional-options
> otherwise vld1_lane_bf16_1.c
> fails on targets with a soft float-abi default (eg arm-linux-gnueabi).
>
> See bf16_vldn_1.c.
>
> BTW, why did you use a different naming scheme for the tests?
> (bf16_vldn_1.c vs vld1_lane_bf16_1.c)

Nothing special, it made more sense to me to use the name of the
intrinsic directly, as it already includes the bf16 information.  I believe
we have both schemes in the aarch64 & arm backends.  I've no problem with
renaming the tests if we feel it is important.

  Andrea


Re: [00/32] C++ 20 Modules

2020-11-04 Thread Jason Merrill via Gcc-patches
On Wed, Nov 4, 2020 at 8:50 AM Nathan Sidwell  wrote:

> On 11/4/20 7:30 AM, Nathan Sidwell wrote:
>
> > rechecking the compile-farm page, I see gcc45 is a 686 machine, I'll try
> > that.
>
> yeah, that didn't work.  There are compilation errors in
> ../../../src/gcc/config/i386/x86-tune-costs.h about missing
> initializers.  and then ...
>
> In file included from
> /usr/lib/gcc/i586-linux-gnu/4.9/include/xmmintrin.h:34:0,
>   from
> /usr/lib/gcc/i586-linux-gnu/4.9/include/x86intrin.h:31,
>   from
> /usr/include/i386-linux-gnu/c++/4.9/bits/opt_random.h:33,
>   from /usr/include/c++/4.9/random:50,
>   from /usr/include/c++/4.9/bits/stl_algo.h:66,
>   from /usr/include/c++/4.9/algorithm:62,
>   from ../../../src/gcc/cp/mapper-resolver.cc:26:
> ./mm_malloc.h:42:12: error: attempt to use poisoned "malloc"
>   return malloc (__size);
>  ^
> Makefile:1127: recipe for target 'cp/mapper-resolver.o' failed
>
> it's a little unfortunate we can't use the standard library :(  I'll see
> what I can do about avoiding algorithm.
>

We can; apparently the necessary incantation is to

#define INCLUDE_ALGORITHM

before

#include "system.h"
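
A minimal sketch of why the ordering matters, assuming the usual
poison-after-include arrangement: once an identifier has been poisoned, any
later mention of it is a hard error, so every header that mentions it has to
be pulled in beforehand.  To keep the example self-contained the poisoned
name here is a hypothetical sketch_raw_alloc rather than malloc itself, and
all sketch_* names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdlib>

/* Step 1: everything that legitimately mentions the raw allocator comes
   first.  sketch_raw_alloc is a hypothetical stand-in for malloc.  */
static void *
sketch_raw_alloc (std::size_t n)
{
  return std::calloc (1, n);
}

/* Checked wrapper, playing the role an xmalloc-style function would.  */
static void *
sketch_checked_alloc (std::size_t n)
{
  void *p = sketch_raw_alloc (n ? n : 1);
  if (!p)
    std::abort ();
  return p;
}

/* Step 2: poison the raw name.  Any later mention, including one inside a
   header included after this point, is diagnosed as an error.  */
#pragma GCC poison sketch_raw_alloc

/* Step 3: code below can only reach the allocator through the wrapper.  */
static int *
sketch_make_counter (void)
{
  int *p = static_cast<int *> (sketch_checked_alloc (sizeof (int)));
  *p = 0;
  return p;
}
```

The mm_malloc.h failure quoted above is the same situation in reverse: a
header that calls malloc was reached only after the poison had taken effect.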

Jason


Re: [PATCH 1/x] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics

2020-11-04 Thread Andrea Corallo via Gcc-patches
Christophe Lyon  writes:

> On Wed, 4 Nov 2020 at 14:29, Christophe Lyon  
> wrote:
>>
>> On Tue, 3 Nov 2020 at 11:27, Kyrylo Tkachov via Gcc-patches
>>  wrote:
>> >
>> > Hi Andrea,
>> >
>> > > -Original Message-
>> > > From: Andrea Corallo 
>> > > Sent: 26 October 2020 15:59
>> > > To: gcc-patches@gcc.gnu.org
>> > > Cc: Kyrylo Tkachov ; Richard Earnshaw
>> > > ; nd 
>> > > Subject: [PATCH 1/x] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics
>> > >
>> > > Hi all,
>> > >
>> > > I'd like to submit the following patch implementing the bfloat16_t
>> > > neon related load intrinsics: vld1_lane_bf16, vld1q_lane_bf16.
>> > >
>> > > Please refer to:
>> > > ACLE 
>> > > ISA  
>> > >
>> > > Regtested and bootstrapped.
>> > >
>> > > Okay for trunk?
>> >
>>
>> I think you need to add -mfloat-abi=hard to the dg-additional-options
>> otherwise vld1_lane_bf16_1.c
>> fails on targets with a soft float-abi default (eg arm-linux-gnueabi).
>>
>> See bf16_vldn_1.c.
>
> Actually that's not sufficient because in turn we get:
> /sysroot-arm-none-linux-gnueabi/usr/include/gnu/stubs.h:10:11: fatal
> error: gnu/stubs-hard.h: No such file or directory
>
> So you should check that -mfloat-abi=hard is supported.
>
> Ditto for the vst tests.
>

Hi Christophe,

thanks for catching this, I'll prepare a patch.

  Andrea


RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding.

2020-11-04 Thread Richard Biener
On Wed, 4 Nov 2020, Tamar Christina wrote:

> > -Original Message-
> > From: rguent...@c653.arch.suse.de  On
> > Behalf Of Richard Biener
> > Sent: Wednesday, November 4, 2020 12:41 PM
> > To: Tamar Christina 
> > Cc: Richard Sandiford ; nd ;
> > gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> > scaffolding.
> > 
> > On Tue, 3 Nov 2020, Tamar Christina wrote:
> > 
> > > Hi Richi,
> > >
> > > This is a respin which includes the changes you requested.
> > 
> > Comments randomly ordered, I'm pasting in pieces of the patch - sending it
> > inline would help to get pieces properly quoted and in-order.
> > 
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> > 4bd454cfb185d7036843fc7140b073f525b2ec6a..b813508d3ceaf4c54f612bc10f9
> > aa42ffe0ce0dd
> > 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > ...
> > 
> > I miss comments in this file, see tree-vectorizer.h where we try to document
> > purpose of classes and fields.
> > 
> > Things that stick out to me:
> > 
> > +uint8_t m_arity;
> > +uint8_t m_num_args;
> > 
> > why uint8_t and not simply unsigned int?  Not knowing what arity /
> > num_args should be here ;)
> 
> I think I can remove arity, but num_args is how many operands the created
> internal function call should take.  Since we can't vectorize calls with more 
> than
> 4 arguments at the moment it seemed like 255 would be a safe limit :).
> 
> > 
> > +vec_info *m_vinfo;
> > ...
> > +vect_pattern (slp_tree *node, vec_info *vinfo)
> > 
> > so this looks like something I freed stmt_vec_info of - back-pointers in the
> > "wrong" direction of the logical hierarchy.  I suppose it's just to avoid 
> > passing
> > down vinfo where we need it?  Please do that instead - pass down vinfo as
> > everything else does.
> > 
> > The class seems to expose both very high-level (build () it!) and very low
> > level details (get_ifn).  The high-level one suggests that a pattern _not_
> > being represented by an ifn is possible but there's too much implementation
> > detail already in the vect_pattern class to make that impossible.  I guess 
> > the
> > IFN details could be pushed down to the simple matching class (and that be
> > called vect_ifn_pattern or so).
> > 
> > +static bool
> > +vect_match_slp_patterns (slp_tree *ref_node, vec_info *vinfo) {
> > +  DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> > +  bool found_p = false;
> > +
> > +  if (dump_enabled_p ())
> > +{
> > +  dump_printf_loc (MSG_NOTE, vect_location, "-- before patt match
> > --\n");
> > +  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > +  dump_printf_loc (MSG_NOTE, vect_location, "-- end patt --\n");
> > +}
> > 
> > we dumped all instances after their analysis.  Maybe just refer to the
> > instance with its address (dump_print %p) so lookup in the (already large)
> > dump file is easy.
> > 
> > +  hash_set *visited = new hash_set ();  for
> > + (unsigned x = 0; x < num__slp_patterns; x++)
> > +{
> > +  visited->empty ();
> > +  found_p |= vect_match_slp_patterns_2 (ref_node, vinfo,
> > slp_patterns[x],
> > +   visited);
> > +}
> > +
> > +  delete visited;
> > 
> > no need to new / delete, just do
> > 
> >   hash_set visited;
> > 
> > like everyone else.  Btw, do you really want to scan pieces of the SLP graph
> > (with instances being graph entries) multiple times?  If not then you should
> > move the visited set to the caller instead.
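
The two suggestions above (an automatic visited set instead of new/delete,
and optionally one set owned by the caller so shared subgraphs are walked
once) can be sketched as follows.  This is only an illustration under
assumptions: it uses std::unordered_set in place of GCC's hash_set, and
sketch_node and the sketch_* functions are hypothetical, not the vectorizer's
types.

```cpp
#include <unordered_set>
#include <vector>

/* Hypothetical stand-in for an SLP node: just a list of children.  */
struct sketch_node
{
  std::vector<sketch_node *> children;
};

/* Visit NODE and everything below it once; return how many nodes were
   seen for the first time.  */
static unsigned
sketch_walk (sketch_node *node, std::unordered_set<sketch_node *> &visited)
{
  if (!visited.insert (node).second)
    return 0;
  unsigned n = 1;
  for (sketch_node *child : node->children)
    n += sketch_walk (child, visited);
  return n;
}

/* Per-entry set with automatic storage: no new/delete needed.  */
static unsigned
sketch_count_nodes (sketch_node *root)
{
  std::unordered_set<sketch_node *> visited;
  return sketch_walk (root, visited);
}

/* Set owned by the caller and shared across graph entries: nodes reachable
   from several entries are scanned only once.  */
static unsigned
sketch_count_nodes_shared (const std::vector<sketch_node *> &roots)
{
  std::unordered_set<sketch_node *> visited;
  unsigned n = 0;
  for (sketch_node *root : roots)
    n += sketch_walk (root, visited);
  return n;
}
```

Which variant is right depends on whether rescanning shared pieces of the
graph per instance is actually wanted, which is the question raised above.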
> > 
> > +  /* TODO: Remove in final version, only here for generating debug dot
> > graphs
> > +  from SLP tree.  */
> > +
> > +  if (dump_enabled_p ())
> > +{
> > +  dump_printf_loc (MSG_NOTE, vect_location, "-- start dot --\n");
> > +  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > +  dump_printf_loc (MSG_NOTE, vect_location, "-- end dot --\n");
> > +}
> > 
> > now, if there was some pattern matched it is probably useful to dump the
> > graph (entry) again.  But only conditional on that I think.  So can you 
> > instead
> > make the dump conditional on found_p and remove the start dot/end dot
> > markers as said in the comment?
> > 
> > + if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +"transformation for %s not valid due to
> > + post
> > "
> > +"condition\n",
> > 
> > not really a MSG_MISSED_OPTIMIZATION, use MSG_NOTE.
> > MSG_MISSED_OPTIMIZATION should be used for things (likely) making
> > vectorization fail.
> > 
> > +  /* Perform recursive matching, it's important to do this after
> > + matching
> > things
> > 
> > before matching things?
> > 
> > + in the current node as the matches here may re-order the nodes
> > + below
> > it.
> > + As such the pattern that needs to be subsequently match may change.
> > 
> > and this is no longer 

RE: [PATCH v2 10/18]middle-end simplify lane permutes which selects from loads from the same DR.

2020-11-04 Thread Tamar Christina via Gcc-patches
Hi Richi,

> -Original Message-
> From: rguent...@c653.arch.suse.de  On
> Behalf Of Richard Biener
> Sent: Wednesday, November 4, 2020 1:36 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> Subject: Re: [PATCH v2 10/18]middle-end simplify lane permutes which
> selects from loads from the same DR.
> 
> On Tue, 3 Nov 2020, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This change allows one to simplify lane permutes that select from
> > multiple load leafs that load from the same DR group by promoting the
> > VEC_PERM node into a load itself and pushing the lane permute into it as a
> load permute.
> >
> > This saves us from having to calculate where to materialize a new load node.
> > If the resulting loads are now unused they are freed and are removed
> > from the graph.
> >
> > This allows us to handle cases where we would have generated:
> >
> > movi    v4.4s, 0
> > adrp    x3, .LC0
> > ldr q5, [x3, #:lo12:.LC0]
> > mov x3, 0
> > .p2align 3,,7
> > .L2:
> > mov v0.16b, v4.16b
> > mov v3.16b, v4.16b
> > ldr q1, [x1, x3]
> > ldr q2, [x0, x3]
> > fcmla   v0.4s, v2.4s, v1.4s, #0
> > fcmla   v3.4s, v1.4s, v2.4s, #0
> > fcmla   v0.4s, v2.4s, v1.4s, #270
> > fcmla   v3.4s, v1.4s, v2.4s, #270
> > mov v1.16b, v3.16b
> > tbl v0.16b, {v0.16b - v1.16b}, v5.16b
> > str q0, [x2, x3]
> > add x3, x3, 16
> > cmp x3, 1600
> > bne .L2
> > ret
> >
> > and instead generate
> >
> > mov x3, 0
> > .p2align 3,,7
> > .L27:
> > ldr q0, [x2, x3]
> > ldr q1, [x0, x3]
> > ldr q2, [x1, x3]
> > fcmla   v0.2d, v1.2d, v2.2d, #0
> > fcmla   v0.2d, v1.2d, v2.2d, #270
> > str q0, [x2, x3]
> > add x3, x3, 16
> > cmp x3, 512
> > bne .L27
> > ret
> >
> > This runs as a pre-step so that permute simplification can still
> > inspect whether this permute is needed.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > Tests are included as part of the final patch as they need the SLP
> > pattern matcher to insert permutes in between.
> >
> > Ok for master?
> 
> So I think this is too specialized for the general issue that we're doing a 
> bad
> job in CSEing the load part of different permutes of the same group.  I've
> played with fixing this half a year ago (again) in multiple general ways but
> they all caused some regressions.
> 
> So you're now adding some heuristics as to when to anticipate "CSE" (or
> merging with followup permutes).
> 
> To quickly recap what I did consider two loads (V2DF) one { a[0], a[1] } and
> the other { a[1], a[0] }.  They currently are two SLP nodes and one with a
> load_permutation.
> My original attempts focused on trying to get rid of load_permutation in
> favor of lane_permute nodes and thus during SLP discovery I turned the
> second into { a[0], a[1] } (magically unified with the other load) and a
> followup lane-permute node.
> 
> So for your case you have IIUC { a[0], a[0] } and { a[1], a[1] } which 
> eventually
> will (due to patterns) be lane-permuted into { a[0], a[1] }, right?  So
> generalizing this as a single { a[0], a[1] } plus two lane-permute nodes  { 
> 0, 0 }
> and { 1, 1 } early would solve the issue as well?

Correct, I did wonder why it was generating two different nodes instead of a
lane permute, but didn't pay much attention, assuming it was just a
shortcoming.
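
To make the { a[0], a[1] } plus lane-permute idea concrete, here is a small
model of the data flow only (not of the SLP graph or of any GCC internals).
The sketch_* names and the two-lane typedefs are hypothetical.

```cpp
#include <array>
#include <cstddef>

/* Two-lane vector of doubles (think V2DF) and a two-lane selector.  */
using sketch_v2df = std::array<double, 2>;
using sketch_sel2 = std::array<unsigned, 2>;

/* The single unpermuted load { a[0], a[1] } that every user shares.  */
static sketch_v2df
sketch_load2 (const double *a)
{
  return sketch_v2df{{a[0], a[1]}};
}

/* Lane permute: output lane i takes input lane sel[i].  Selector entries
   are assumed to be 0 or 1.  */
static sketch_v2df
sketch_lane_permute (const sketch_v2df &v, const sketch_sel2 &sel)
{
  sketch_v2df out{};
  for (std::size_t i = 0; i < out.size (); ++i)
    out[i] = v[sel[i]];
  return out;
}
```

With one shared load, { 0, 0 } yields { a[0], a[0] }, { 1, 1 } yields
{ a[1], a[1] } and { 1, 0 } yields the swapped pair, so the different
"loads" differ only in their selector.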

> Now, in general it might be
> more profitable to generate the { a[0], a[0] } and { a[1], a[1] } via 
> scalar-load-
> and-splat rather than vector load and permute so we have to be careful to
> not over-optimize here or be prepared to do the reverse transform.

This in principle can be done in optimize_slp then right? Since it would do
a lot of the same work already and find the materialization points. 

> 
> The patch itself is a bit ugly since it modifies the SLP graph when we already
> produced the graphds graph so I would do any of this before.  I did consider
> gathering all loads nodes loading from a group and then trying to apply some
> heuristic to alter the SLP graph so it can be better optimized.  In fact when 
> we
> want to generate the same code as the non-SLP interleaving scheme does
> we do have to look at those since we have to unify loads there.
> 

Yes.. I will concede the patch isn't my finest work.. I also don't like the
fact that I had to keep leafs intact lest I break things later. But wanted
feedback :)

> I'd put this after vect_slp_build_vertices but before the new_graph call -
> altering 'vertices' / 'leafs' should be more easily possible and the 'leafs' 
> array
> contains all loads already (vect_slp_build_vertices could be massaged to
> provide a map from DR_GROUP_FIRST_ELEMENT to slp_tree, giving us the
> meta we want).
> 
> That said, I'd like to see something more forward-looking rather than the ad-
> hoc special-casing of what you run into with the 

Re: [00/32] C++ 20 Modules

2020-11-04 Thread Nathan Sidwell

On 11/4/20 7:30 AM, Nathan Sidwell wrote:

rechecking the compile-farm page, I see gcc45 is a 686 machine, I'll try 
that.


yeah, that didn't work.  There are compilation errors in
../../../src/gcc/config/i386/x86-tune-costs.h about missing 
initializers.  and then ...


In file included from 
/usr/lib/gcc/i586-linux-gnu/4.9/include/xmmintrin.h:34:0,
 from 
/usr/lib/gcc/i586-linux-gnu/4.9/include/x86intrin.h:31,
 from 
/usr/include/i386-linux-gnu/c++/4.9/bits/opt_random.h:33,

 from /usr/include/c++/4.9/random:50,
 from /usr/include/c++/4.9/bits/stl_algo.h:66,
 from /usr/include/c++/4.9/algorithm:62,
 from ../../../src/gcc/cp/mapper-resolver.cc:26:
./mm_malloc.h:42:12: error: attempt to use poisoned "malloc"
 return malloc (__size);
^
Makefile:1127: recipe for target 'cp/mapper-resolver.o' failed

it's a little unfortunate we can't use the standard library :(  I'll see 
what I can do about avoiding algorithm.


nathan

--
Nathan Sidwell
make[2]: Entering directory '/home/nathan/egcs/modules/obj/i686/gcc'
g++ -std=gnu++11  -fno-PIE -c   -g -O2 -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../../src/gcc -I../../../src/gcc/. -I../../../src/gcc/../include -I../../../src/gcc/../libcpp/include -I../../../src/gcc/../libcody -I/home/nathan/egcs/modules/obj/i686/./gmp -I/home/nathan/egcs/modules/src/gmp -I/home/nathan/egcs/modules/obj/i686/./mpfr/src -I/home/nathan/egcs/modules/src/mpfr/src -I/home/nathan/egcs/modules/src/mpc/src  -I../../../src/gcc/../libdecnumber -I../../../src/gcc/../libdecnumber/bid -I../libdecnumber -I../../../src/gcc/../libbacktrace -I/home/nathan/egcs/modules/obj/i686/./isl/include -I/home/nathan/egcs/modules/src/isl/include  -o i386-options.o -MT i386-options.o -MMD -MP -MF ./.deps/i386-options.TPo ../../../src/gcc/config/i386/i386-options.c
In file included from ../../../src/gcc/config/i386/i386-options.c:94:0:
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: error: uninitialized const member 'stringop_algs::stringop_strategy::max'
   {rep_prefix_1_byte, {{-1, rep_prefix_1_byte, false;
^
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::max' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: error: uninitialized const member 'stringop_algs::stringop_strategy::alg'
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::alg' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::noalign' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: error: uninitialized const member 'stringop_algs::stringop_strategy::max'
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::max' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: error: uninitialized const member 'stringop_algs::stringop_strategy::alg'
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::alg' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::noalign' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: error: uninitialized const member 'stringop_algs::stringop_strategy::max'
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::max' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: error: uninitialized const member 'stringop_algs::stringop_strategy::alg'
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::alg' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::noalign' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: error: uninitialized const member 'stringop_algs::stringop_strategy::max'
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: warning: missing initializer for member 'stringop_algs::stringop_strategy::max' [-Wmissing-field-initializers]
../../../src/gcc/config/i386/x86-tune-costs.h:32:56: error: uninitialized const member 'stringop_algs::stringop_strategy::alg'

Re: [PATCH] "used" attribute saves decl from linker garbage collection

2020-11-04 Thread H.J. Lu via Gcc-patches
On Tue, Nov 3, 2020 at 2:11 PM H.J. Lu  wrote:
>
> On Tue, Nov 3, 2020 at 1:57 PM Jozef Lawrynowicz
>  wrote:
> >
> > On Tue, Nov 03, 2020 at 01:09:43PM -0800, H.J. Lu via Gcc-patches wrote:
> > > On Tue, Nov 3, 2020 at 1:00 PM H.J. Lu  wrote:
> > > >
> > > > On Tue, Nov 3, 2020 at 12:46 PM Jozef Lawrynowicz
> > > >  wrote:
> > > > >
> > > > > On Tue, Nov 03, 2020 at 11:58:04AM -0800, H.J. Lu via Gcc-patches 
> > > > > wrote:
> > > > > > On Tue, Nov 3, 2020 at 10:22 AM Jozef Lawrynowicz
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Nov 03, 2020 at 09:57:58AM -0800, H.J. Lu via Gcc-patches 
> > > > > > > wrote:
> > > > > > > > On Tue, Nov 3, 2020 at 9:41 AM Jozef Lawrynowicz
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > The attached patch implements TARGET_ASM_MARK_DECL_PRESERVED 
> > > > > > > > > for ELF GNU
> > > > > > > > > OSABI targets, so that declarations that have the "used" 
> > > > > > > > > attribute
> > > > > > > > > applied will be saved from linker garbage collection.
> > > > > > > > >
> > > > > > > > > TARGET_ASM_MARK_DECL_PRESERVED will emit an assembler 
> > > > > > > > > ".retain"
> > > > > > > >
> > > > > > > > Can you use the "R" flag instead?
> > > > > > > >
> > > > > > >
> > > > > > > For the benefit of this mailing list, I have copied my response 
> > > > > > > from the
> > > > > > > Binutils mailing list regarding this.
> > > > > > > > The "comm_section" example I gave is actually inaccurate, but you 
> > > > > > > can
> > > > > > > see the examples of the variety of sections that would need to be
> > > > > > > handled by doing
> > > > > > >
> > > > > > > $ git grep -A2 "define.*SECTION_ASM_OP" gcc/ | grep "\".*\."
> > > > > > >
> > > > > > > > ... snip ...
> > > > > > > > Secondly, for seamless integration with the "used" attribute, 
> > > > > > > > we must be
> > > > > > > > > able to mark the symbol with the used attribute applied as 
> > > > > > > > "retained"
> > > > > > > > without changing its section name. For GCC "named" sections, 
> > > > > > > > this is
> > > > > > > > straightforward, but for "unnamed" sections it is a giant mess.
> > > > > > > >
> > > > > > > > The section name for a GCC "unnamed" section is not readily 
> > > > > > > > available,
> > > > > > > > instead a string which contains the full assembly code to 
> > > > > > > > switch to one
> > > > > > > > of these text/data/bss/rodata/comm etc. sections is encoded in 
> > > > > > > > the
> > > > > > > > structure.
> > > > > > > >
> > > > > > > > Backends define the assembly code to switch to these sections 
> > > > > > > > (some
> > > > > > > > "*ASM_OP*" macro) in a variety of ways. For example, the 
> > > > > > > > unnamed section
> > > > > > > > "comm_section", might correspond to a .bss section, or emit a 
> > > > > > > > .comm
> > > > > > > > directive. I even looked at trying to parse them to extract 
> > > > > > > > what the
> > > > > > > > name of a section will be, but it would be very messy and not 
> > > > > > > > robust.
> > > > > > > >
> > > > > > > > Meanwhile, having a .retain  directive is a very 
> > > > > > > > > simple
> > > > > > > > solution, and keeps the GCC implementation really concise (patch
> > > > > > > > attached). The assembler will know for sure what the section 
> > > > > > > > containing
> > > > > > > > the symbol will be, and can apply the SHF_GNU_RETAIN flag 
> > > > > > > > directly.
> > > > > > > >
> > > > > >
> > > > > > Please take a look at
> > > > > >
> > > > > > https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/elf/shf_retain
> > > > > >
> > > > > > which is built on top of
> > > > > >
> > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-February/539963.html
> > > > > >
> > > > > > I think SECTION2_RETAIN matches SHF_GNU_RETAIN well.  If you
> > > > > > want, you can extract my flags2 change and use it for SHF_GNU_RETAIN.
> > > > >
> > > > > In your patch you have to make the assumption that data_section, 
> > > > > always
> > > > > corresponds to a section named .data. For just this example, c6x 
> > > > > (which
> > > > > supports the GNU ELF OSABI) does not fit the rule:
> > > > >
> > > > > > c6x/elf-common.h:#define DATA_SECTION_ASM_OP 
> > > > > > "\t.section\t\".fardata\",\"aw\""
> > > > >
> > > > > data_section for c6x corresponds to .fardata, not .data. So the use of
> > > > > "used" on a data declaration would place it in a different section, 
> > > > > that
> > > > > if the "used" attribute was not applied.
> > > > >
> > > > > For c6x and mips, readonly_data_section does not correspond to 
> > > > > .rodata,
> > > > > so that assumption cannot be made either:
> > > > > > c6x/elf-common.h:#define READONLY_DATA_SECTION_ASM_OP 
> > > > > > "\t.section\t\".const\",\"a\",@progbits"
> > > > > > mips/mips.h:#define READONLY_DATA_SECTION_ASM_OP"\t.rdata"  
> > > > > > /* read-only data */
> > > > >
> > > > > The same can be said for bss_section for c6x as well.
> > > >
> > > > Just add and use named_xxx_section.
> > > >
> >
> > 

[PATCH] testsuite: Clean up lto and offload dump files

2020-11-04 Thread Frederik Harwath

Hi,

Dump files produced from an offloading compiler through
"-foffload=-fdump-..." do not get removed by gcc-dg.exp and other
exp-files of the testsuite that use the cleanup code from this file
(e.g.  libgomp). This can lead to problems if scan-dump detects leftover
dumps from previous runs of a test case.

This patch adapts the existing cleanup logic for "-flto" to handle
"-flto" and "-foffload" in a uniform way. The glob pattern that is used
for matching the "ltrans" files is also changed since the existing
pattern failed to remove some LTO ("ltrans0.ltrans.") dump files.
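
As a reading aid, the way the patch assembles the per-testcase glob can be
paraphrased in C++ roughly as below.  This is a sketch of the logic only;
sketch_cleanup_glob is a hypothetical name, and the real code is the Tcl in
schedule-cleanups.

```cpp
#include <cstddef>
#include <string>
#include <vector>

/* Build the brace-glob used to match a test's dump files: the source
   extension always, plus the LTO and offload variants when requested.  */
static std::string
sketch_cleanup_glob (const std::string &stem, const std::string &ext,
                     bool ltrans, bool mkoffload)
{
  std::vector<std::string> extensions{ext};
  if (ltrans)
    extensions.push_back ("ltrans[0-9]*.ltrans");
  if (mkoffload)
    /* The * stands for the offloading target's name.  */
    extensions.push_back ("*.mkoffload");

  /* Join with commas, as the Tcl "join $extensions ," does.  */
  std::string joined;
  for (std::size_t i = 0; i < extensions.size (); ++i)
    {
      if (i > 0)
        joined += ",";
      joined += extensions[i];
    }

  /* Braces are only needed when there is more than one alternative.  */
  if (extensions.size () > 1)
    joined = "{" + joined + "}";

  return stem + "." + joined;
}
```

For a test named t.c this gives "t.c" with neither option, and
"t.{c,ltrans[0-9]*.ltrans,*.mkoffload}" when both -flto and -foffload= are
seen.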


This patch gets rid of at least one unresolved libgomp test result that
would otherwise be introduced by the patch discussed in this thread:

https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557889.html


diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index e8ad3052657..e0560af205f 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -194,31 +194,47 @@ proc schedule-cleanups { opts } {

[...]

-   lappend tfiles "$stem.{$basename_ext,exe}"

I do not understand why "exe" should be included here. I have removed it
and I did not notice any files matching the resulting pattern being left
behind by "make check-gcc".


Best regards,
Frederik

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
From 9eb5da60e8822e1f6fa90b32bff6123ed62c146c Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Wed, 4 Nov 2020 14:09:46 +0100
Subject: [PATCH] testsuite: Clean up lto and offload dump files

Dump files produced from an offloading compiler through
"-foffload=-fdump-..." do not get removed by gcc-dg.exp and other
exp-files of the testsuite that use the cleanup code from this file
(e.g.  libgomp). This can lead to problems if scan-dump detects
leftover dumps from previous runs of a test case.

This patch adapts the existing cleanup logic for "-flto" to handle
"-flto" and "-foffload" in a uniform way. The glob pattern that is
used for matching the "ltrans" files is also changed since the
existing pattern failed to match some dump files.

2020-11-04  Frederik Harwath  

gcc/testsuite/ChangeLog:

	* lib/gcc-dg.exp (proc schedule-cleanups): Adapt "-flto" handling,
	add "-foffload" handling.
---
 gcc/testsuite/lib/gcc-dg.exp | 50 
 1 file changed, 33 insertions(+), 17 deletions(-)

diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index e8ad3052657..e0560af205f 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -194,31 +194,47 @@ proc schedule-cleanups { opts } {
 # stem.ext..
 # (tree)passes can have multiple instances, thus optional trailing *
 set ptn "\[0-9\]\[0-9\]\[0-9\]$ptn.*"
+set ltrans no
+set mkoffload no
+
 # Handle ltrans files around -flto
 if [regexp -- {(^|\s+)-flto(\s+|$)} $opts] {
 	verbose "Cleanup -flto seen" 4
-	set ltrans "{ltrans\[0-9\]*.,}"
-} else {
-	set ltrans ""
+	set ltrans yes
+}
+
+if [regexp -- {(^|\s+)-foffload=} $opts] {
+	verbose "Cleanup -foffload seen" 4
+	set mkoffload yes
 }
-set ptn "$ltrans$ptn"
+
 verbose "Cleanup final ptn: $ptn" 4
 set tfiles {}
 foreach src $testcases {
-	set basename [file tail $src]
-	if { $ltrans != "" } {
-	# ??? should we use upvar 1 output_file instead of this (dup ?)
-	set stem [file rootname $basename]
-	set basename_ext [file extension $basename]
-	if {$basename_ext != ""} {
-		regsub -- {^.*\.} $basename_ext {} basename_ext
-	}
-	lappend tfiles "$stem.{$basename_ext,exe}"
-	unset basename_ext
-	} else {
-	lappend tfiles $basename
-	}
+set basename [file tail $src]
+set stem [file rootname $basename]
+set basename_ext [file extension $basename]
+if {$basename_ext != ""} {
+regsub -- {^.*\.} $basename_ext {} basename_ext
+}
+set extensions [list $basename_ext]
+
+if { $ltrans == yes } {
+lappend extensions "ltrans\[0-9\]*.ltrans"
+}
+if { $mkoffload == yes} {
+# The * matches the offloading target's name, e.g. "xnvptx-none".
+lappend extensions "*.mkoffload"
+}
+
+set extensions_ptn [join $extensions ","]
+if { [llength $extensions] > 1 } {
+set extensions_ptn "{$extensions_ptn}"
+}
+
+  	lappend tfiles "$stem.$extensions_ptn"
 }
+
 if { [llength $tfiles] > 1 } {
 	set tfiles [join $tfiles ","]
 	set tfiles "{$tfiles}"
-- 
2.17.1



RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding.

2020-11-04 Thread Tamar Christina via Gcc-patches
> -----Original Message-----
> From: rguent...@c653.arch.suse.de  On
> Behalf Of Richard Biener
> Sent: Wednesday, November 4, 2020 12:41 PM
> To: Tamar Christina 
> Cc: Richard Sandiford ; nd ;
> gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> scaffolding.
> 
> On Tue, 3 Nov 2020, Tamar Christina wrote:
> 
> > Hi Richi,
> >
> > This is a respin which includes the changes you requested.
> 
> Comments randomly ordered, I'm pasting in pieces of the patch - sending it
> inline would help to get pieces properly quoted and in-order.
> 
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> 4bd454cfb185d7036843fc7140b073f525b2ec6a..b813508d3ceaf4c54f612bc10f9
> aa42ffe0ce0dd
> 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> ...
> 
> I miss comments in this file, see tree-vectorizer.h where we try to document
> purpose of classes and fields.
> 
> Things that stick out to me:
> 
> +uint8_t m_arity;
> +uint8_t m_num_args;
> 
> why uint8_t and not simply unsigned int?  Not knowing what arity /
> num_args should be here ;)

I think I can remove arity, but num_args is how many operands the created
internal function call should take.  Since we can't vectorize calls with more
than 4 arguments at the moment it seemed like 255 would be a safe limit :).

> 
> +vec_info *m_vinfo;
> ...
> +vect_pattern (slp_tree *node, vec_info *vinfo)
> 
> so this looks like something I freed stmt_vec_info of - back-pointers in the
> "wrong" direction of the logical hierarchy.  I suppose it's just to avoid
> passing down vinfo where we need it?  Please do that instead - pass down
> vinfo as everything else does.
> 
> The class seems to expose both very high-level (build () it!) and very low
> level details (get_ifn).  The high-level one suggests that a pattern _not_
> being represented by an ifn is possible but there's too much implementation
> detail already in the vect_pattern class to make that impossible.  I guess the
> IFN details could be pushed down to the simple matching class (and that be
> called vect_ifn_pattern or so).
> 
> +static bool
> +vect_match_slp_patterns (slp_tree *ref_node, vec_info *vinfo) {
> +  DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> +  bool found_p = false;
> +
> +  if (dump_enabled_p ())
> +{
> +  dump_printf_loc (MSG_NOTE, vect_location, "-- before patt match --\n");
> +  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> +  dump_printf_loc (MSG_NOTE, vect_location, "-- end patt --\n");
> +}
> 
> we dumped all instances after their analysis.  Maybe just refer to the
> instance with its address (dump_print %p) so lookup in the (already large)
> dump file is easy.
> 
> +  hash_set *visited = new hash_set ();
> +  for (unsigned x = 0; x < num__slp_patterns; x++)
> +{
> +  visited->empty ();
> +  found_p |= vect_match_slp_patterns_2 (ref_node, vinfo, slp_patterns[x],
> +   visited);
> +}
> +
> +  delete visited;
> 
> no need to new / delete, just do
> 
>   hash_set visited;
> 
> like everyone else.  Btw, do you really want to scan pieces of the SLP graph
> (with instances being graph entries) multiple times?  If not then you should
> move the visited set to the caller instead.
> 
> +  /* TODO: Remove in final version, only here for generating debug dot graphs
> +  from SLP tree.  */
> +
> +  if (dump_enabled_p ())
> +{
> +  dump_printf_loc (MSG_NOTE, vect_location, "-- start dot --\n");
> +  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> +  dump_printf_loc (MSG_NOTE, vect_location, "-- end dot --\n");
> +}
> 
> now, if there was some pattern matched it is probably useful to dump the
> graph (entry) again.  But only conditional on that I think.  So can you
> instead make the dump conditional on found_p and remove the start dot/end
> dot markers as said in the comment?
> 
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +    "transformation for %s not valid due to post "
> +    "condition\n",
> not really a MSG_MISSED_OPTIMIZATION, use MSG_NOTE.
> MSG_MISSED_OPTIMIZATION should be used for things (likely) making
> vectorization fail.
> 
> +  /* Perform recursive matching, it's important to do this after matching things
> 
> before matching things?
> 
> + in the current node as the matches here may re-order the nodes below it.
> + As such the pattern that needs to be subsequently match may change.
> 
> and this is no longer true?
> 
> */
> +
> +  if (SLP_TREE_CHILDREN (node).exists ()) {
> 
> elide this check, the loop will simply not run if empty
> 
> +slp_tree child;
> +FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> 
> I think you want to perform the recursion in the caller so you do it only once
> and not once for each 

Re: [PATCH 1/x] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics

2020-11-04 Thread Christophe Lyon via Gcc-patches
On Wed, 4 Nov 2020 at 14:29, Christophe Lyon  wrote:
>
> On Tue, 3 Nov 2020 at 11:27, Kyrylo Tkachov via Gcc-patches
>  wrote:
> >
> > Hi Andrea,
> >
> > > -----Original Message-----
> > > From: Andrea Corallo 
> > > Sent: 26 October 2020 15:59
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Kyrylo Tkachov ; Richard Earnshaw
> > > ; nd 
> > > Subject: [PATCH 1/x] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics
> > >
> > > Hi all,
> > >
> > > I'd like to submit the following patch implementing the bfloat16_t
> > > neon related load intrinsics: vld1_lane_bf16, vld1q_lane_bf16.
> > >
> > > Please refer to:
> > > ACLE 
> > > ISA  
> > >
> > > Regtested and bootstrapped.
> > >
> > > Okay for trunk?
> >
>
> I think you need to add -mfloat-abi=hard to the dg-additional-options
> otherwise vld1_lane_bf16_1.c
> fails on targets with a soft float-abi default (eg arm-linux-gnueabi).
>
> See bf16_vldn_1.c.

Actually that's not sufficient because in turn we get:
/sysroot-arm-none-linux-gnueabi/usr/include/gnu/stubs.h:10:11: fatal error: gnu/stubs-hard.h: No such file or directory

So you should check that -mfloat-abi=hard is supported.

Ditto for the vst tests.

>
> BTW, why did you use a different naming scheme for the tests?
> (bf16_vldn_1.c vs vld1_lane_bf16_1.c)
>
> Christophe
>
> > Ok.
> > Thanks,
> > Kyrill
> >
> >
> > >
> > >   Andrea
> >


[committed] libstdc++: Define new C++17 std::search overload for Parallel Mode [PR 94971]

2020-11-04 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

PR libstdc++/94971
* include/bits/stl_algo.h (search(FIter, FIter, const Searcher):
Adjust #if condition.
* include/parallel/algo.h (search(FIter, FIter, const Searcher&):
Define new overload for C++17.

Tested powerpc64le-linux. Committed to trunk.
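
For context, a minimal sketch of a call that uses the C++17 Searcher
overload of std::search.  The helper name find_with_searcher is made up
for this example and the code does not depend on Parallel Mode; it only
shows the overload returning searcher(first, last).first.  It needs
-std=c++17 or later.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Returns the index of the first occurrence of needle in haystack,
// or haystack.size() if there is none.
std::size_t find_with_searcher(const std::string& haystack,
                               const std::string& needle) {
  // C++17 overload: std::search (first, last, searcher).
  auto it = std::search(
      haystack.begin(), haystack.end(),
      std::boyer_moore_searcher(needle.begin(), needle.end()));
  return static_cast<std::size_t>(it - haystack.begin());
}
```

Any other Searcher, for example std::default_searcher, can be passed in
the same way.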

commit e0af865ab9d9d5b6b3ac7fdde26cf9bbf635b6b4
Author: Jonathan Wakely 
Date:   Wed Nov 4 13:36:32 2020

    libstdc++: Define new C++17 std::search overload for Parallel Mode [PR 94971]

libstdc++-v3/ChangeLog:

PR libstdc++/94971
* include/bits/stl_algo.h (search(FIter, FIter, const Searcher):
Adjust #if condition.
* include/parallel/algo.h (search(FIter, FIter, const Searcher&):
Define new overload for C++17.

diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index 621c6331422e..6efc99035b7d 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -4243,7 +4243,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
__gnu_cxx::__ops::__iter_comp_val(__binary_pred, __val));
 }
 
-#if __cplusplus > 201402L
+#if __cplusplus >= 201703L
   /** @brief Search a sequence using a Searcher object.
*
*  @param  __first    A forward iterator.
diff --git a/libstdc++-v3/include/parallel/algo.h b/libstdc++-v3/include/parallel/algo.h
index cec6fd003c38..4b6dcc841191 100644
--- a/libstdc++-v3/include/parallel/algo.h
+++ b/libstdc++-v3/include/parallel/algo.h
@@ -1049,6 +1049,21 @@ namespace __parallel
 std::__iterator_category(__begin2));
 }
 
+#if __cplusplus >= 201703L
+  /** @brief Search a sequence using a Searcher object.
+   *
+   *  @param  __first    A forward iterator.
+   *  @param  __last     A forward iterator.
+   *  @param  __searcher A callable object.
+   *  @return @p __searcher(__first,__last).first
+  */
+  template<typename _ForwardIterator, typename _Searcher>
+inline _ForwardIterator
+search(_ForwardIterator __first, _ForwardIterator __last,
+  const _Searcher& __searcher)
+{ return __searcher(__first, __last).first; }
+#endif
+
   // Sequential fallback
   template
 inline _FIterator


[patch, committed] targhooks.c: Fix -fzero-call-used-regs 'sorry' typo

2020-11-04 Thread Tobias Burnus

As also remarked by Christophe in PR97699.
Committed as obvious.

Tobias

commit 243492e2c69741b91dbfe3bba9b772f65fc9354c
Author: Tobias Burnus 
Date:   Wed Nov 4 14:31:34 2020 +0100

targhooks.c: Fix -fzero-call-used-regs 'sorry' typo

gcc/ChangeLog:

* targhooks.c (default_zero_call_used_regs): Fix flag-name typo
in sorry.

diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 4e4d100c547..5b68a2ad7d4 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1011,7 +1011,7 @@ default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 	  {
 		issued_error = true;
 		sorry ("%qs not supported on this target",
-			"-fzero-call-used_regs");
+			"-fzero-call-used-regs");
 	  }
 	delete_insns_since (last_insn);
 	  }


Re: [PATCH v2 10/18]middle-end simplify lane permutes which selects from loads from the same DR.

2020-11-04 Thread Richard Biener
On Tue, 3 Nov 2020, Tamar Christina wrote:

> Hi All,
> 
> This change allows one to simplify lane permutes that select from multiple
> load leafs that load from the same DR group by promoting the VEC_PERM node
> into a load itself and pushing the lane permute into it as a load permute.
> 
> This saves us from having to calculate where to materialize a new load node.
> If the resulting loads are now unused they are freed and are removed from the
> graph.
> 
> This allows us to handle cases where we would have generated:
> 
>   movi    v4.4s, 0
>   adrp    x3, .LC0
>   ldr q5, [x3, #:lo12:.LC0]
>   mov x3, 0
>   .p2align 3,,7
> .L2:
>   mov v0.16b, v4.16b
>   mov v3.16b, v4.16b
>   ldr q1, [x1, x3]
>   ldr q2, [x0, x3]
>   fcmla   v0.4s, v2.4s, v1.4s, #0
>   fcmla   v3.4s, v1.4s, v2.4s, #0
>   fcmla   v0.4s, v2.4s, v1.4s, #270
>   fcmla   v3.4s, v1.4s, v2.4s, #270
>   mov v1.16b, v3.16b
>   tbl v0.16b, {v0.16b - v1.16b}, v5.16b
>   str q0, [x2, x3]
>   add x3, x3, 16
>   cmp x3, 1600
>   bne .L2
>   ret
> 
> and instead generate
> 
>   mov x3, 0
>   .p2align 3,,7
> .L27:
>   ldr q0, [x2, x3]
>   ldr q1, [x0, x3]
>   ldr q2, [x1, x3]
>   fcmla   v0.2d, v1.2d, v2.2d, #0
>   fcmla   v0.2d, v1.2d, v2.2d, #270
>   str q0, [x2, x3]
>   add x3, x3, 16
>   cmp x3, 512
>   bne .L27
>   ret
> 
> This runs as a pre step such that permute simplification can still inspect
> whether this permute is needed.
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> Tests are included as part of the final patch as they need the SLP pattern
> matcher to insert permutes in between.
> 
> Ok for master?

So I think this is too specialized for the general issue that we're
doing a bad job in CSEing the load part of different permutes of
the same group.  I've played with fixing this half a year ago (again)
in multiple general ways but they all caused some regressions.

So you're now adding some heuristics as to when to anticipate
"CSE" (or merging with followup permutes).

To quickly recap what I did, consider two loads (V2DF),
one { a[0], a[1] } and the other { a[1], a[0] }.  They
currently are two SLP nodes, one of them with a load_permutation.
My original attempts focused on trying to get rid of load_permutation
in favor of lane_permute nodes and thus during SLP discovery
I turned the second into { a[0], a[1] } (magically unified with
the other load) and a followup lane-permute node.

So for your case you have IIUC { a[0], a[0] } and { a[1], a[1] }
which eventually will (due to patterns) be lane-permuted
into { a[0], a[1] }, right?  So generalizing this as
a single { a[0], a[1] } plus two lane-permute nodes  { 0, 0 }
and { 1, 1 } early would solve the issue as well?  Now,
in general it might be more profitable to generate the
{ a[0], a[0] } and { a[1], a[1] } via scalar-load-and-splat
rather than vector load and permute so we have to be careful
to not over-optimize here or be prepared to do the reverse
transform.
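
As a toy illustration of the generalization described above (not
vectorizer code; lane_permute is a hypothetical helper), a single
unified load { a[0], a[1] } followed by lane permutes reproduces the
splat and swapped forms:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a lane permute: result[i] = input[perm[i]].
// Throws std::out_of_range if a lane index is outside the input.
std::vector<double> lane_permute(const std::vector<double>& input,
                                 const std::vector<std::size_t>& perm) {
  std::vector<double> result;
  result.reserve(perm.size());
  for (std::size_t lane : perm)
    result.push_back(input.at(lane));
  return result;
}
```

With load = { a[0], a[1] }, the permutes { 0, 0 } and { 1, 1 } give
{ a[0], a[0] } and { a[1], a[1] }, and { 1, 0 } gives the swapped load
from the recap.  Whether a scalar-load-and-splat would be cheaper is a
separate cost question that this model ignores.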

The patch itself is a bit ugly since it modifies the SLP
graph when we already produced the graphds graph so I
would do any of this before.  I did consider gathering
all load nodes loading from a group and then trying to
apply some heuristic to alter the SLP graph so it can
be better optimized.  In fact when we want to generate
the same code as the non-SLP interleaving scheme does
we do have to look at those since we have to unify
loads there.

I'd put this after vect_slp_build_vertices but before
the new_graph call - altering 'vertices' / 'leafs' should
be more easily possible and the 'leafs' array contains
all loads already (vect_slp_build_vertices could be massaged
to provide a map from DR_GROUP_FIRST_ELEMENT to slp_tree,
giving us the meta we want).

That said, I'd like to see something more forward-looking
rather than the ad-hoc special-casing of what you run into
with the pattern matching.

In case we want to still go with the special-casing it
should IMHO be done in a pre-order walk simply
looking for lane permute nodes with children that all
load from the same group performing what you do before
any of the vertices/graph stuff is built.  That's
probably easiest at this point and it can be done
when the bst_map is still around so you can properly
CSE the new load you build.
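
A rough sketch of that special case, using a toy data model rather than
the real slp_tree: LoadNode, LanePermute and fold_permute_into_load are
hypothetical names, and group_id stands in for DR_GROUP_FIRST_ELEMENT.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SLP load node: which DR group it reads
// and which group elements end up in its lanes.
struct LoadNode {
  int group_id;
  std::vector<std::size_t> load_permutation;
};

// Each lane permute entry selects (child index, lane within that child).
using LanePermute = std::vector<std::pair<std::size_t, std::size_t>>;

// If every child loads from the same group, return a single load node
// whose load permutation already includes the lane permute.  Otherwise
// return nothing, meaning the graph should be left alone.
std::optional<LoadNode> fold_permute_into_load(
    const std::vector<LoadNode>& children, const LanePermute& permute) {
  if (children.empty())
    return std::nullopt;
  for (const LoadNode& child : children)
    if (child.group_id != children.front().group_id)
      return std::nullopt;

  LoadNode merged{children.front().group_id, {}};
  for (const auto& [child, lane] : permute) {
    if (child >= children.size()
        || lane >= children[child].load_permutation.size())
      return std::nullopt;
    merged.load_permutation.push_back(children[child].load_permutation[lane]);
  }
  return merged;
}
```

For children { a[0], a[0] } and { a[1], a[1] } and the lane permute
{ (0,0), (1,0) } this produces one load with load permutation { 0, 1 }.
A real implementation would also have to look the new load up in
bst_map to CSE it, which the sketch leaves out.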

Thanks,
Richard.



> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-slp.c (vect_optimize_slp): Promote permutes.
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: [PATCH 1/x] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics

2020-11-04 Thread Christophe Lyon via Gcc-patches
On Tue, 3 Nov 2020 at 11:27, Kyrylo Tkachov via Gcc-patches
 wrote:
>
> Hi Andrea,
>
> > -----Original Message-----
> > From: Andrea Corallo 
> > Sent: 26 October 2020 15:59
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov ; Richard Earnshaw
> > ; nd 
> > Subject: [PATCH 1/x] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics
> >
> > Hi all,
> >
> > I'd like to submit the following patch implementing the bfloat16_t
> > neon related load intrinsics: vld1_lane_bf16, vld1q_lane_bf16.
> >
> > Please refer to:
> > ACLE 
> > ISA  
> >
> > Regtested and bootstrapped.
> >
> > Okay for trunk?
>

I think you need to add -mfloat-abi=hard to the dg-additional-options
otherwise vld1_lane_bf16_1.c
fails on targets with a soft float-abi default (eg arm-linux-gnueabi).

See bf16_vldn_1.c.

BTW, why did you use a different naming scheme for the tests?
(bf16_vldn_1.c vs vld1_lane_bf16_1.c)

Christophe

> Ok.
> Thanks,
> Kyrill
>
>
> >
> >   Andrea
>


[5/7] LTO Dead field elimination and field reordering

2020-11-04 Thread Erick Ochoa

From bad08833616e9dd7a212e55b93503200393da942 Mon Sep 17 00:00:00 2001
From: Erick Ochoa 
Date: Sun, 30 Aug 2020 10:21:35 +0200
Subject: [PATCH 5/7] Abort if Gimple produced from C++ or Fortran sources is
 found.

2020-11-04  Erick Ochoa  

* gcc/ipa-field-reorder: Add flag to exit transformation
* gcc/ipa-type-escape-analysis: Same

---
 gcc/ipa-field-reorder.c|  3 +-
 gcc/ipa-type-escape-analysis.c | 53 --
 gcc/ipa-type-escape-analysis.h |  2 ++
 3 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/gcc/ipa-field-reorder.c b/gcc/ipa-field-reorder.c
index 611089ecf24..c23e6a3f818 100644
--- a/gcc/ipa-field-reorder.c
+++ b/gcc/ipa-field-reorder.c
@@ -590,6 +590,7 @@ lto_fr_execute ()
 {
   log ("here in field reordering \n");
   // Analysis.
+  detected_incompatible_syntax = false;
   tpartitions_t escaping_nonescaping_sets
 = partition_types_into_escaping_nonescaping ();
   record_field_map_t record_field_map = find_fields_accessed ();
@@ -597,7 +598,7 @@ lto_fr_execute ()
 = obtain_nonescaping_unaccessed_fields (escaping_nonescaping_sets,
record_field_map, 0);

-  if (record_field_offset_map.empty ())
+  if (detected_incompatible_syntax || record_field_offset_map.empty ())
 return 0;

   // Prepare for transformation.
diff --git a/gcc/ipa-type-escape-analysis.c b/gcc/ipa-type-escape-analysis.c
index 9944580da6c..b06f33e24fb 100644
--- a/gcc/ipa-type-escape-analysis.c
+++ b/gcc/ipa-type-escape-analysis.c
@@ -170,6 +170,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-type-escape-analysis.h"
 #include "ipa-dfe.h"

+#define ABORT_IF_NOT_C true
+
+bool detected_incompatible_syntax = false;
+
 // Main function that drives dfe.
 static unsigned int
 lto_dfe_execute ();
@@ -256,13 +260,14 @@ static void
 lto_dead_field_elimination ()
 {
   // Analysis.
+  detected_incompatible_syntax = false;
   tpartitions_t escaping_nonescaping_sets
 = partition_types_into_escaping_nonescaping ();
   record_field_map_t record_field_map = find_fields_accessed ();
   record_field_offset_map_t record_field_offset_map
 = obtain_nonescaping_unaccessed_fields (escaping_nonescaping_sets,
record_field_map, OPT_Wdfa);
-  if (record_field_offset_map.empty ())
+  if (detected_incompatible_syntax || record_field_offset_map.empty ())
 return;

 // Prepare for transformation.
@@ -589,6 +594,7 @@ TypeWalker::_walk (const_tree type)
   // Improve, verify that having a type is an invariant.
   // I think there was a specific example which didn't
   // allow for it
+  if (detected_incompatible_syntax) return;
   if (!type)
 return;

@@ -642,9 +648,9 @@ TypeWalker::_walk (const_tree type)
 case POINTER_TYPE:
   this->walk_POINTER_TYPE (type);
   break;
-case REFERENCE_TYPE:
-  this->walk_REFERENCE_TYPE (type);
-  break;
+//case REFERENCE_TYPE:
+//  this->walk_REFERENCE_TYPE (type);
+//  break;
 case ARRAY_TYPE:
   this->walk_ARRAY_TYPE (type);
   break;
@@ -654,18 +660,24 @@ TypeWalker::_walk (const_tree type)
 case FUNCTION_TYPE:
   this->walk_FUNCTION_TYPE (type);
   break;
-case METHOD_TYPE:
-  this->walk_METHOD_TYPE (type);
-  break;
+//case METHOD_TYPE:
+  //this->walk_METHOD_TYPE (type);
+  //break;
 // Since we are dealing only with C at the moment,
 // we don't care about QUAL_UNION_TYPE nor LANG_TYPEs
 // So fail early.
+case REFERENCE_TYPE:
+case METHOD_TYPE:
 case QUAL_UNION_TYPE:
 case LANG_TYPE:
 default:
   {
log ("missing %s\n", get_tree_code_name (code));
+#ifdef ABORT_IF_NOT_C
+   detected_incompatible_syntax = true;
+#else
gcc_unreachable ();
+#endif
   }
   break;
 }
@@ -848,6 +860,7 @@ TypeWalker::_walk_arg (const_tree t)
 void
 ExprWalker::walk (const_tree e)
 {
+  if (detected_incompatible_syntax) return;
   _walk_pre (e);
   _walk (e);
   _walk_post (e);
@@ -932,7 +945,11 @@ ExprWalker::_walk (const_tree e)
 default:
   {
log ("missing %s\n", get_tree_code_name (code));
+#ifdef ABORT_IF_NOT_C
+   detected_incompatible_syntax = true;
+#else
gcc_unreachable ();
+#endif
   }
   break;
 }
@@ -1165,6 +1182,7 @@ GimpleWalker::walk ()
   cgraph_node *node = NULL;
   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
 {
+  if (detected_incompatible_syntax) return;
   node->get_untransformed_body ();
   tree decl = node->decl;
   gcc_assert (decl);
@@ -1411,7 +1429,11 @@ GimpleWalker::_walk_gimple (gimple *stmt)
   // Break if something is unexpected.
   const char *name = gimple_code_name[code];
   log ("gimple code name %s\n", name);
+#ifdef ABORT_IF_NOT_C
+  detected_incompatible_syntax = true;
+#else
   gcc_unreachable ();
+#endif
 }

 void
@@ -2947,6 +2969,8 @@ TypeStringifier::stringify (const_tree t)
 return 

[2/7] LTO Dead field elimination and field reordering

2020-11-04 Thread Erick Ochoa

From 09feb1cc82a5d9851a6b524e37c32554b923b1c4 Mon Sep 17 00:00:00 2001
From: Erick Ochoa 
Date: Thu, 6 Aug 2020 14:07:20 +0200
Subject: [PATCH 2/7] Add Dead Field Elimination

Using the Dead Field Analysis, Dead Field Elimination
automatically transforms gimple to eliminate fields that
are never read.
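
To make the effect concrete, here is a hand-written before/after pair.
The struct names and the rescaling rule are assumptions made for this
example; the pass itself works on GIMPLE and RECORD_TYPE trees, not on
source-level structs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record before the transformation: 'unused' is never read.
struct PointBefore {
  double x;
  double unused;
  double y;
};

// What a rewritten record could look like once the dead field is dropped.
struct PointAfter {
  double x;
  double y;
};

// The element index is preserved, so a byte offset into an array of the
// old record has to be rescaled to the size of the new record.
std::size_t rescale_array_offset(std::size_t old_byte_offset) {
  return old_byte_offset / sizeof(PointBefore) * sizeof(PointAfter);
}
```

The rescaling is one plausible reading of why pointer arithmetic needs
adjusting once the record shrinks.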

2020-11-04  Erick Ochoa  

* gcc/Makefile.in: add file to list of sources
* gcc/ipa-dfe.c: New
* gcc/ipa-dfe.h: Same
* gcc/ipa-type-escape-analysis.h: Export code used in dfe.
* gcc/ipa-type-escape-analysis.c: Call transformation

---
 gcc/Makefile.in|1 +
 gcc/ipa-dfe.c  | 1280 
 gcc/ipa-dfe.h  |  250 +++
 gcc/ipa-type-escape-analysis.c |   21 +-
 gcc/ipa-type-escape-analysis.h |   10 +
 5 files changed, 1553 insertions(+), 9 deletions(-)
 create mode 100644 gcc/ipa-dfe.c
 create mode 100644 gcc/ipa-dfe.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8b18c9217a2..8ef6047870b 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1416,6 +1416,7 @@ OBJS = \
init-regs.o \
internal-fn.o \
ipa-type-escape-analysis.o \
+   ipa-dfe.o \
ipa-cp.o \
ipa-sra.o \
ipa-devirt.o \
diff --git a/gcc/ipa-dfe.c b/gcc/ipa-dfe.c
new file mode 100644
index 000..c048fac8621
--- /dev/null
+++ b/gcc/ipa-dfe.c
@@ -0,0 +1,1280 @@
+/* IPA Type Escape Analysis and Dead Field Elimination
+   Copyright (C) 2019-2020 Free Software Foundation, Inc.
+
+  Contributed by Erick Ochoa 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+/* Interprocedural dead field elimination (IPA-DFE)
+
+   The goal of this transformation is to
+
+   1) Create new types to replace RECORD_TYPEs which hold dead fields.
+   2) Substitute instances of old RECORD_TYPEs for new RECORD_TYPEs.
+   3) Substitute instances of old FIELD_DECLs for new FIELD_DECLs.
+   4) Fix some instances of pointer arithmetic.
+   5) Relayout where needed.
+
+   First stage - DFA
+   =
+   =================
+   Use DFA to compute the set of FIELD_DECLs which can be deleted.
+
+   Second stage - Reconstruct Types
+   ================================
+
+   This stage is done by two families of classes, the SpecificTypeCollector
+   and the TypeReconstructor.
+
+   The SpecificTypeCollector collects all TYPE_P trees which point to
+   RECORD_TYPE trees returned by DFA.  The TypeReconstructor will create
+   new RECORD_TYPE trees and new TYPE_P trees replacing the old RECORD_TYPE
+   trees with the new RECORD_TYPE trees.
+
+   Third stage - Substitute Types and Relayout
+   ===========================================
+
+   This stage is handled by ExprRewriter and GimpleRewriter.
+   Some pointer arithmetic is fixed here to take into account those eliminated
+   FIELD_DECLS.
+ */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple-expr.h"
+#include "predict.h"
+#include "alloc-pool.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "fold-const.h"
+#include "gimple-fold.h"
+#include "symbol-summary.h"
+#include "tree-vrp.h"
+#include "ipa-prop.h"
+#include "tree-pretty-print.h"
+#include "tree-inline.h"
+#include "ipa-fnsummary.h"
+#include "ipa-utils.h"
+#include "tree-ssa-ccp.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "basic-block.h" //needed for gimple.h
+#include "function.h"//needed for gimple.h
+#include "gimple.h"
+#include "stor-layout.h"
+#include "cfg.h" // needed for gimple-iterator.h
+#include "gimple-iterator.h"
+#include "gimplify.h"  //unshare_expr
+#include "value-range.h"   // make_ssa_name dependency
+#include "tree-ssanames.h" // make_ssa_name
+#include "ssa.h"
+#include "tree-into-ssa.h"
+#include "gimple-ssa.h" // update_stmt
+#include "tree.h"
+#include "gimple-expr.h"
+#include "predict.h"
+#include "alloc-pool.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "fold-const.h"
+#include "gimple-fold.h"
+#include "symbol-summary.h"
+#include "tree-vrp.h"
+#include "ipa-prop.h"
+#include "tree-pretty-print.h"
+#include "tree-inline.h"
+#include "ipa-fnsummary.h"
+#include "ipa-utils.h"
+#include "tree-ssa-ccp.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "tree-ssa-alias.h"
+#include "tree-ssanames.h"
+#include "gimple.h"

[6/7] LTO Dead field elimination and field reordering

2020-11-04 Thread Erick Ochoa

From 1609f4713b6d0ab2e84e52b4fbd6f645f10a95e7 Mon Sep 17 00:00:00 2001
From: Erick Ochoa 
Date: Fri, 16 Oct 2020 08:49:08 +0200
Subject: [PATCH 6/7] Add heuristic to take into account void* pattern.

We add a heuristic in order to be able to transform functions which
receive void* arguments as a way to generalize over arguments. An
example of this is qsort. The heuristic works by first inspecting
leaves in the call graph. If a leaf only contains references
to a single RECORD_TYPE then we color that node in the call graph
as "casts are safe in this function and it does not call externally
visible functions". We propagate this property up the call graph
until a fixed point is reached. This will later be changed to
use ipa-modref.
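
The propagation can be sketched on a toy call graph as below.  This is
not the code in the patch: FunctionInfo and color_call_graph are
hypothetical, and the sketch pushes the "unsafe" color from callees to
callers, assuming a function is safe only if it and all of its callees
pass the per-function check.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical per-function summary; the pass would derive this from GIMPLE.
struct FunctionInfo {
  bool single_record_type;           // body references at most one RECORD_TYPE
  bool calls_external;               // calls something outside the link unit
  std::vector<std::size_t> callees;  // valid indices into the function table
};

// Returns safe[i] == true when casts in function i are considered safe.
std::vector<bool> color_call_graph(const std::vector<FunctionInfo>& fns) {
  const std::size_t n = fns.size();
  std::vector<std::vector<std::size_t>> callers(n);
  for (std::size_t f = 0; f < n; ++f)
    for (std::size_t callee : fns[f].callees)
      callers[callee].push_back(f);

  // Start from the local property, then push "unsafe" up to the callers.
  std::vector<bool> safe(n);
  std::queue<std::size_t> worklist;
  for (std::size_t f = 0; f < n; ++f) {
    safe[f] = fns[f].single_record_type && !fns[f].calls_external;
    if (!safe[f])
      worklist.push(f);
  }
  while (!worklist.empty()) {
    std::size_t f = worklist.front();
    worklist.pop();
    for (std::size_t caller : callers[f])
      if (safe[caller]) {
        safe[caller] = false;  // a caller of an unsafe function is unsafe
        worklist.push(caller);
      }
  }
  return safe;
}
```

Each function changes color at most once, so the loop terminates even
for recursive call graphs.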

2020-11-04  Erick Ochoa  

* ipa-type-escape-analysis.c : Add new heuristic
* ipa-field-reorder.c : Use heuristic
* ipa-type-escape-analysis.h : Change signatures
---
 gcc/ipa-field-reorder.c|   3 +-
 gcc/ipa-type-escape-analysis.c | 186 +++--
 gcc/ipa-type-escape-analysis.h |  72 -
 3 files changed, 246 insertions(+), 15 deletions(-)

diff --git a/gcc/ipa-field-reorder.c b/gcc/ipa-field-reorder.c
index c23e6a3f818..5dcc5a38958 100644
--- a/gcc/ipa-field-reorder.c
+++ b/gcc/ipa-field-reorder.c
@@ -591,8 +591,9 @@ lto_fr_execute ()
   log ("here in field reordering \n");
   // Analysis.
   detected_incompatible_syntax = false;
+  std::map whitelisted = get_whitelisted_nodes();
   tpartitions_t escaping_nonescaping_sets
-= partition_types_into_escaping_nonescaping ();
+= partition_types_into_escaping_nonescaping (whitelisted);
   record_field_map_t record_field_map = find_fields_accessed ();
   record_field_offset_map_t record_field_offset_map
 = obtain_nonescaping_unaccessed_fields (escaping_nonescaping_sets,
diff --git a/gcc/ipa-type-escape-analysis.c b/gcc/ipa-type-escape-analysis.c
index b06f33e24fb..fe68eaf70c7 100644
--- a/gcc/ipa-type-escape-analysis.c
+++ b/gcc/ipa-type-escape-analysis.c
@@ -166,6 +166,7 @@ along with GCC; see the file COPYING3.  If not see
 #include 
 #include 
 #include 
+#include 

 #include "ipa-type-escape-analysis.h"
 #include "ipa-dfe.h"
@@ -249,6 +250,99 @@ lto_dfe_execute ()
   return 0;
 }

+/* Heuristic to determine if casting is allowed in a function.
+ * This heuristic attempts to allow casting in functions which follow the
+ * pattern where a struct pointer or array pointer is casted to void* or
+ * char*.  The heuristic works as follows:
+ *
+ * There is a simple per-function analysis that determines whether there
+ * is more than 1 type of struct referenced in the body of the method.
+ * If there is more than 1 type of struct referenced in the body,
+ * then the layout of the structures referenced within the body
+ * cannot be casted.  However, if there's only one type of struct referenced
+ * in the body of the function, casting is allowed in the function itself.
+ * The logic behind this is that if the code follows good programming
+ * practices, the only way the memory should be accessed is via a singular
+ * type. There is also another requisite to this per-function analysis, and
+ * that is that the function can only call colored functions or functions
+ * which are available in the linking unit.
+ *
+ * Using this per-function analysis, we then start coloring leaf nodes in the
+ * call graph as ``safe'' or ``unsafe''.  The color is propagated to the
+ * callers of the functions until a fixed point is reached.
+ */
+std::map
+get_whitelisted_nodes ()
+{
+  cgraph_node *node = NULL;
+  std::set nodes;
+  std::set leaf_nodes;
+  std::set leaf_nodes_decl;
+  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
+  {
+node->get_untransformed_body ();
+nodes.insert(node);
+if (node->callees) continue;
+
+leaf_nodes.insert (node);
+leaf_nodes_decl.insert (node->decl);
+  }
+
+  std::queue worklist;
+  for (std::set::iterator i = leaf_nodes.begin (),
+e = leaf_nodes.end (); i != e; ++i)
+  {
+if (dump_file) fprintf (dump_file, "is a leaf node %s\n", (*i)->name ());
+worklist.push (*i);
+  }
+
+  for (std::set::iterator i = nodes.begin (),
+e = nodes.end (); i != e; ++i)
+  {
+worklist.push (*i);
+  }
+
+  std::map map;
+  while (!worklist.empty ())
+  {
+
+if (detected_incompatible_syntax) return map;
+cgraph_node *i = worklist.front ();
+worklist.pop ();
+if (dump_file) fprintf (dump_file, "analyzing %s %p\n", i->name (), i);
+GimpleWhiteLister whitelister;
+whitelister._walk_cnode (i);
+bool no_external = whitelister.does_not_call_external_functions (i, map);
+bool before_in_map = map.find (i->decl) != map.end ();
+bool place_callers_in_worklist = !before_in_map;
+if (!before_in_map)
+{
+  map.insert(std::pair(i->decl, no_external));
+} else
+{
+  map[i->decl] = no_external;
+}
+bool previous_value = map[i->decl];
+place_callers_in_worklist |= 

[7/7] LTO Dead field elimination and field reordering

2020-11-04 Thread Erick Ochoa

From 747b13bf2c6f5b17bc46316998f01483f8039548 Mon Sep 17 00:00:00 2001
From: Erick Ochoa 
Date: Wed, 4 Nov 2020 13:42:35 +0100
Subject: [PATCH 7/7] Getting rid of warnings


2020-11-04  Erick Ochoa  

* gcc/ipa-dfe.c : Change const_tree to tree
* gcc/ipa-dfe.h : same
* gcc/ipa-field-reorder.h : same
* gcc/ipa-type-escape-analysis.c : same, add unused attribute
* gcc/ipa-type-escape-analysis.h : same, add unused attribute

---
 gcc/ipa-dfe.c  | 164 -
 gcc/ipa-dfe.h  |  80 ++---
 gcc/ipa-field-reorder.c|  72 ++--
 gcc/ipa-type-escape-analysis.c | 612 -
 gcc/ipa-type-escape-analysis.h | 312 -
 5 files changed, 621 insertions(+), 619 deletions(-)

diff --git a/gcc/ipa-dfe.c b/gcc/ipa-dfe.c
index 16f594a36b9..e163a32617c 100644
--- a/gcc/ipa-dfe.c
+++ b/gcc/ipa-dfe.c
@@ -126,22 +126,22 @@ along with GCC; see the file COPYING3.  If not see
  * Find all non_escaping types which point to RECORD_TYPEs in
  * record_field_offset_map.
  */
-std::set
+std::set
 get_all_types_pointing_to (record_field_offset_map_t record_field_offset_map,
   tpartitions_t casting)
 {
   const tset_t &non_escaping = casting.non_escaping;

-  std::set<const_tree> specific_types;
+  std::set<tree> specific_types;
   TypeStringifier stringifier;

   // Here we are just placing the types of interest in a set.
-  for (std::map::const_iterator i
+  for (std::map::const_iterator i
= record_field_offset_map.begin (),
e = record_field_offset_map.end ();
i != e; ++i)
 {
-  const_tree record = i->first;
+  tree record = i->first;
   std::string name = stringifier.stringify (record);
   specific_types.insert (record);
 }
@@ -150,16 +150,16 @@ get_all_types_pointing_to 
(record_field_offset_map_t record_field_offset_map,


   // SpecificTypeCollector will collect all types which point to the 
types in

   // the set.
-  for (std::set<const_tree>::const_iterator i = non_escaping.begin (),
+  for (std::set<tree>::const_iterator i = non_escaping.begin (),
e = non_escaping.end ();
i != e; ++i)
 {
-  const_tree type = *i;
+  tree type = *i;
   specifier.walk (type);
 }

   // These are all the types which need modifications.
-  std::set<const_tree> to_modify = specifier.get_set ();
+  std::set<tree> to_modify = specifier.get_set ();
   return to_modify;
 }

@@ -178,24 +178,24 @@ get_all_types_pointing_to 
(record_field_offset_map_t record_field_offset_map,

  */
 reorg_maps_t
 get_types_replacement (record_field_offset_map_t record_field_offset_map,
-  std::set<const_tree> to_modify)
+  std::set<tree> to_modify)
 {
   TypeStringifier stringifier;

   TypeReconstructor reconstructor (record_field_offset_map, "reorg");
-  for (std::set<const_tree>::const_iterator i = to_modify.begin (),
+  for (std::set<tree>::const_iterator i = to_modify.begin (),
e = to_modify.end ();
i != e; ++i)
 {
-  const_tree record = *i;
+  tree record = *i;
   reconstructor.walk (TYPE_MAIN_VARIANT (record));
 }

-  for (std::set<const_tree>::const_iterator i = to_modify.begin (),
+  for (std::set<tree>::const_iterator i = to_modify.begin (),
e = to_modify.end ();
i != e; ++i)
 {
-  const_tree record = *i;
+  tree record = *i;
   reconstructor.walk (record);
 }

@@ -205,11 +205,11 @@ get_types_replacement (record_field_offset_map_t 
record_field_offset_map,
   // Here, we are just making sure that we are not doing anything too 
crazy.

   // Also, we found some types for which TYPE_CACHED_VALUES_P is not being
   // rewritten.  This is probably indicative of a bug in 
TypeReconstructor.

-  for (std::map::const_iterator i = map.begin (),
+  for (std::map::const_iterator i = map.begin (),
  e = map.end ();
i != e; ++i)
 {
-  const_tree o_record = i->first;
+  tree o_record = i->first;
   std::string o_name = stringifier.stringify (o_record);
   log ("original: %s\n", o_name.c_str ());
   tree r_record = i->second;
@@ -220,7 +220,7 @@ get_types_replacement (record_field_offset_map_t 
record_field_offset_map,

continue;
   tree m_record = TYPE_MAIN_VARIANT (r_record);
   // Info: We had a bug where some TYPED_CACHED_VALUES were preserved?
-  tree _o_record = const_tree_to_tree (o_record);
+  tree _o_record = tree_to_tree (o_record);
   TYPE_CACHED_VALUES_P (_o_record) = false;
   TYPE_CACHED_VALUES_P (m_record) = false;

@@ -252,44 +252,44 @@ substitute_types_in_program (reorg_record_map_t map,
 /* Return a set of trees which point to the set of trees
  * that can be modified.
  */
-std::set<const_tree>
+std::set<tree>
 SpecificTypeCollector::get_set ()
 {
   return to_return;
 }

 void
-SpecificTypeCollector::_walk_POINTER_TYPE_pre 

[3/7] LTO Dead field elimination and field reordering

2020-11-04 Thread Erick Ochoa

From 91947eea01a41bd7b17e501ad7d53dfb6499eefc Mon Sep 17 00:00:00 2001
From: Erick Ochoa 
Date: Sun, 9 Aug 2020 10:22:49 +0200
Subject: [PATCH 3/7] Add Field Reordering

Field reordering of structs at link-time

2020-11-04  Erick Ochoa  

* gcc/Makefile.in: add new file to list of sources
* gcc/common.opt: add new flag for field reordering
* gcc/passes.def: add new pass
* gcc/tree-pass.h: same
* gcc/ipa-field-reorder.c: New file
* gcc/ipa-type-escape-analysis.c: Export common functions
* gcc/ipa-type-escape-analysis.h: Same

---
 gcc/Makefile.in|   1 +
 gcc/common.opt |   4 +
 gcc/ipa-dfe.c  |  84 -
 gcc/ipa-dfe.h  |  26 +-
 gcc/ipa-field-reorder.c| 625 +
 gcc/ipa-type-escape-analysis.c |  44 ++-
 gcc/ipa-type-escape-analysis.h |  12 +-
 gcc/passes.def |   1 +
 gcc/tree-pass.h|   2 +
 9 files changed, 751 insertions(+), 48 deletions(-)
 create mode 100644 gcc/ipa-field-reorder.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8ef6047870b..2184bd0fc3d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1417,6 +1417,7 @@ OBJS = \
internal-fn.o \
ipa-type-escape-analysis.o \
ipa-dfe.o \
+   ipa-field-reorder.o \
ipa-cp.o \
ipa-sra.o \
ipa-devirt.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 39bb6e100c3..035c1e8850f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3484,4 +3484,8 @@ fprint-access-analysis
 Common Report Var(flag_print_access_analysis) Optimization
 This flag is used to print the access analysis (if field is read or 
written to).


+fipa-field-reorder
+Common Report Var(flag_ipa_field_reorder) Optimization
+Reorder fields.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/ipa-dfe.c b/gcc/ipa-dfe.c
index c048fac8621..16f594a36b9 100644
--- a/gcc/ipa-dfe.c
+++ b/gcc/ipa-dfe.c
@@ -242,9 +242,9 @@ get_types_replacement (record_field_offset_map_t 
record_field_offset_map,

  */
 void
 substitute_types_in_program (reorg_record_map_t map,
-reorg_field_map_t field_map)
+reorg_field_map_t field_map, bool _delete)
 {
-  GimpleTypeRewriter rewriter (map, field_map);
+  GimpleTypeRewriter rewriter (map, field_map, _delete);
   rewriter.walk ();
   rewriter._rewrite_function_decl ();
 }
@@ -358,8 +358,11 @@ TypeReconstructor::set_is_not_modified_yet 
(const_tree t)

 return;

   tree type = _reorg_map[tt];
-  const bool is_modified
+  bool is_modified
 = strstr (TypeStringifier::get_type_identifier (type).c_str (), 
".reorg");

+  is_modified
+|= (bool) strstr (TypeStringifier::get_type_identifier (type).c_str (),
+ ".reorder");
   if (!is_modified)
 return;

@@ -405,14 +408,20 @@ TypeReconstructor::is_memoized (const_tree t)
   return already_changed;
 }

-static tree
-get_new_identifier (const_tree type)
+const char *
+TypeReconstructor::get_new_suffix ()
+{
+  return _suffix;
+}
+
+tree
+get_new_identifier (const_tree type, const char *suffix)
 {
   const char *identifier = TypeStringifier::get_type_identifier 
(type).c_str ();

-  const bool is_new_type = strstr (identifier, "reorg");
+  const bool is_new_type = strstr (identifier, suffix);
   gcc_assert (!is_new_type);
   char *new_name;
-  asprintf (&new_name, "%s.reorg", identifier);
+  asprintf (&new_name, "%s.%s", identifier, suffix);
   return get_identifier (new_name);
 }

@@ -468,7 +477,9 @@ TypeReconstructor::_walk_ARRAY_TYPE_post (const_tree t)
   TREE_TYPE (copy) = build_variant_type_copy (TREE_TYPE (copy));
   copy = is_modified ? build_distinct_type_copy (copy) : copy;
   TREE_TYPE (copy) = is_modified ? _reorg_map[TREE_TYPE (t)] : 
TREE_TYPE (copy);
-  TYPE_NAME (copy) = is_modified ? get_new_identifier (copy) : 
TYPE_NAME (copy);

+  TYPE_NAME (copy) = is_modified
+  ? get_new_identifier (copy, this->get_new_suffix ())
+  : TYPE_NAME (copy);
   // This is useful so that we go again through type layout
   TYPE_SIZE (copy) = is_modified ? NULL : TYPE_SIZE (copy);
   tree domain = TYPE_DOMAIN (t);
@@ -521,7 +532,9 @@ TypeReconstructor::_walk_POINTER_TYPE_post 
(const_tree t)


   copy = is_modified ? build_variant_type_copy (copy) : copy;
   TREE_TYPE (copy) = is_modified ? _reorg_map[TREE_TYPE (t)] : 
TREE_TYPE (copy);
-  TYPE_NAME (copy) = is_modified ? get_new_identifier (copy) : 
TYPE_NAME (copy);

+  TYPE_NAME (copy) = is_modified
+  ? get_new_identifier (copy, this->get_new_suffix ())
+  : TYPE_NAME (copy);
   TYPE_CACHED_VALUES_P (copy) = false;

   tree _t = const_tree_to_tree (t);
@@ -616,7 +629,8 @@ TypeReconstructor::_walk_RECORD_TYPE_post (const_tree t)
   tree main = TYPE_MAIN_VARIANT (t);
   tree main_reorg = _reorg_map[main];
   tree copy_variant = 

[4/7] LTO Dead field elimination and field reordering

2020-11-04 Thread Erick Ochoa

From a8c4d5b99d5c4168ede79054396cba514fdf23b5 Mon Sep 17 00:00:00 2001
From: Erick Ochoa 
Date: Mon, 10 Aug 2020 09:10:37 +0200
Subject: [PATCH 4/7] Add documentation for dead field elimination

2020-11-04  Erick Ochoa  

* gcc/Makefile.in: Add file to documentation sources
* gcc/doc/dfe.texi: New section
* gcc/doc/gccint.texi: Include new section

---
 gcc/Makefile.in |   3 +-
 gcc/doc/dfe.texi| 187 
 gcc/doc/gccint.texi |   2 +
 3 files changed, 191 insertions(+), 1 deletion(-)
 create mode 100644 gcc/doc/dfe.texi

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2184bd0fc3d..7e4c442416d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3275,7 +3275,8 @@ TEXI_GCCINT_FILES = gccint.texi gcc-common.texi 
gcc-vers.texi		\

 gnu.texi gpl_v3.texi fdl.texi contrib.texi languages.texi  \
 sourcebuild.texi gty.texi libgcc.texi cfg.texi tree-ssa.texi   \
 loop.texi generic.texi gimple.texi plugins.texi optinfo.texi   \
-match-and-simplify.texi analyzer.texi ux.texi poly-int.texi
+match-and-simplify.texi analyzer.texi ux.texi poly-int.texi\
+dfe.texi

 TEXI_GCCINSTALL_FILES = install.texi install-old.texi fdl.texi \
 gcc-common.texi gcc-vers.texi
diff --git a/gcc/doc/dfe.texi b/gcc/doc/dfe.texi
new file mode 100644
index 000..e8d01d817d3
--- /dev/null
+++ b/gcc/doc/dfe.texi
@@ -0,0 +1,187 @@
+@c Copyright (C) 2001 Free Software Foundation, Inc.
+@c This is part of the GCC manual.
+@c For copying conditions, see the file gcc.texi.
+
+@node Dead Field Elimination
+@chapter Dead Field Elimination
+
+@node Dead Field Elimination Internals
+@section Dead Field Elimination Internals
+
+@subsection Introduction
+
+Dead field elimination is a compiler transformation that removes fields 
from structs. There are several challenges to removing fields from 
structs at link time but, depending on the workload of the compiled 
program and the architecture where the program runs, dead field 
elimination might be a worthwhile transformation to apply. Generally 
speaking, when the bottleneck of an application is the memory 
bandwidth of the host system and the memory requested is of a struct 
which can be reduced in size, then that combination of workload, program 
and architecture can benefit from applying dead field elimination. The 
benefits come from removing unnecessary fields from structures and thus 
reducing the memory/cache requirements to represent a structure.

+
+
+
+While challenges exist to fully automate a dead field elimination 
transformation, similar and more powerful optimizations have been 
implemented in the past. Chakrabarti et al [0] implement struct peeling, 
splitting into hot and cold parts of a structure, and field reordering. 
Golovanevsky et al [1] also show efforts to implement data layout 
optimizations at link time. Unlike the work of Chakrabarti and 
Golovanevsky, this text only talks about dead field elimination. This 
doesn't mean that the implementation can't be expanded to perform other 
link-time layout optimizations, it just means that dead field 
elimination is the only transformation that is implemented at the time 
of this writing.

+
+[0] Chakrabarti, Gautam, Fred Chow, and L. PathScale. "Structure layout 
optimizations in the open64 compiler: Design, implementation and 
measurements." Open64 Workshop at the International Symposium on Code 
Generation and Optimization. 2008.

+
+[1] Golovanevsky, Olga, and Ayal Zaks. "Struct-reorg: current status 
and future perspectives." Proceedings of the GCC Developers’ Summit. 2007.

+
+@subsection Overview
+
+The dead field implementation is structured in the following way:
+
+
+@itemize @bullet
+@item
+Collect all types which can refer to a @code{RECORD_TYPE}. This means 
that if we have a pointer to a record, we also collect this pointer, and 
likewise an array or a union that refers to a record.

+@item
+Mark types as escaping. More of this in the following section.
+@item
+Find fields which can be deleted. (Iterate over all gimple code and 
find which fields are read.)

+@item
+Create new types with removed fields (and reference these types in 
pointers, arrays, etc.)

+@item
+Modify gimple to include these types.
+@end itemize
+
+
+Most of this code relies on the visitor pattern. Types, Expr, and 
Gimple statements are visited using this pattern. You can find the base 
classes in @file{type-walker.c}, @file{expr-walker.c}, and 
@file{gimple-walker.c}. There are assertions in place where a type, 
expr, or gimple code is encountered which has not been encountered 
before during the testing of this transformation. This facilitates 
fuzzing of the transformation.

+
+@subsubsection Implementation Details: Is a global variable escaping?
+
+How does the analysis determine whether a global variable is visible to 
code outside the current linking unit? In the file 
@file{gimple-escaper.c} we have a simple 

[0/7] LTO Dead field elimination and field reordering

2020-11-04 Thread Erick Ochoa

Hi,

I've been working on several implementations of data layout 
optimizations for GCC, and I am again kindly requesting a review of 
the type-escape-based dead field elimination and field reordering.


This patchset is organized in the following way:

* Adds a link-time warning if dead fields are detected
* Allows for the dead-field elimination transformation to be applied
* Reorganizes fields in structures.
* Adds some documentation
* Gracefully does not apply transformation if unknown syntax is detected.
* Adds a heuristic to handle void* casts
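
To make the first three items concrete, here is a hand-written sketch of
the intended effect. It is not output of the pass: the struct names, the
field order chosen for the result, and the premise that `unused` is never
read are all made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical input type: assume whole-program analysis shows that
// 'unused' is never read and that the type does not escape.
struct record_before
{
  char tag;
  double value;
  char flag;
  int unused;
};

// Hand-written picture of one possible result: the dead field is gone
// and the remaining fields are ordered to reduce padding.
struct record_after
{
  double value;
  char tag;
  char flag;
};

// Bytes saved per object by the rewritten layout.
constexpr std::size_t
bytes_saved ()
{
  return sizeof (record_before) - sizeof (record_after);
}

// Small self-check: the rewritten record must be strictly smaller.
inline void
self_check ()
{
  assert (sizeof (record_after) < sizeof (record_before));
  assert (bytes_saved () > 0);
}
```

On a typical LP64 target this would shrink each object from 24 to 16
bytes, though the exact numbers depend on the ABI.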

I have tested these transformations as extensively as I can. The way to 
trigger these transformations is:


-fipa-field-reorder and -fipa-type-escape-analysis

Having said that, I welcome all criticisms and will try to address those 
criticisms which I can. Please let me know if you have any questions or 
comments, I will try to answer in a timely manner.


There has been some initial discussion on the GCC mailing list but I'm 
submitting the patches to the patches mailing list now. Some of the 
initial criticisms mentioned on the GCC mailing list previously will be 
addressed in the following days, and I believe there is definitely 
enough time to address them all during Stage 1.


I had to add one last commit to account for some differences in the build 
script on master. I will be working today to squash it, but I still 
wanted to submit these patches in order to start the review process.


I have bootstrapped on aarch64-linux


Re: Testsuite fails on PowerPC with: Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all])

2020-11-04 Thread Richard Sandiford via Gcc-patches
Tobias Burnus  writes:
> Three of the testcases fail on PowerPC: 
> gcc.target/i386/zero-scratch-regs-{9,10,11}.c
>powerpc64le-linux-gnu/default/gcc.d/zero-scratch-regs-10.c:77:1: sorry, 
> unimplemented: '-fzero-call-used_regs' not supported on this target
>
> Did you miss some dg-require-effective-target ?

No, these are a signal to target maintainers that they need
to decide whether to add support or accept the status quo
(in which case a new effective-target will be needed).  See:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557595.html:

The new tests are likely to fail on some targets with the sorry()
message, but I think target maintainers are best placed to decide
whether (a) that's a fundamental restriction of the target and the
tests should just be skipped or (b) the target needs to implement
the new hook.

Thanks,
Richard


Re: [PATCH v3] pass: Run cleanup passes before SLP [PR96789]

2020-11-04 Thread Christophe Lyon via Gcc-patches
On Tue, 3 Nov 2020 at 07:39, Kewen.Lin via Gcc-patches
 wrote:
>
> Hi Richard,
>
> Thanks again for your review!
>
> on 2020/11/2 下午6:23, Richard Sandiford wrote:
> > "Kewen.Lin"  writes:
> >> diff --git a/gcc/function.c b/gcc/function.c
> >> index 2c8fa217f1f..3e92ee9c665 100644
> >> --- a/gcc/function.c
> >> +++ b/gcc/function.c
> >> @@ -4841,6 +4841,8 @@ allocate_struct_function (tree fndecl, bool 
> >> abstract_p)
> >>   binding annotations among them.  */
> >>cfun->debug_nonbind_markers = lang_hooks.emits_begin_stmt
> >>  && MAY_HAVE_DEBUG_MARKER_STMTS;
> >> +
> >> +  cfun->pending_TODOs = 0;
> >
> > The field is cleared on allocation.  I think it would be better
> > to drop this, to avoid questions about why other fields aren't
> > similarly zero-initialised.
> >
> >>  }
> >>
> >>  /* This is like allocate_struct_function, but pushes a new cfun for FNDECL
> >> diff --git a/gcc/function.h b/gcc/function.h
> >> index d55cbddd0b5..ffed6520bf9 100644
> >> --- a/gcc/function.h
> >> +++ b/gcc/function.h
> >> @@ -269,6 +269,13 @@ struct GTY(()) function {
> >>/* Value histograms attached to particular statements.  */
> >>htab_t GTY((skip)) value_histograms;
> >>
> >> +  /* Different from normal TODO_flags which are handled right at the
> >> + begin or the end of one pass execution, the pending_TODOs are
> >
> > beginning
> >
> >> + passed down in the pipeline until one of its consumers can
> >> + perform the requested action.  Consumers should then clear the
> >> + flags for the actions that they have taken.  */
> >> +  unsigned int pending_TODOs;
> >> +
> >>/* For function.c.  */
> >>
> >>/* Points to the FUNCTION_DECL of this function.  */
> >> […]
> >> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
> >> index 298ab215530..9a9076cee67 100644
> >> --- a/gcc/tree-ssa-loop-ivcanon.c
> >> +++ b/gcc/tree-ssa-loop-ivcanon.c
> >> @@ -1411,6 +1411,13 @@ tree_unroll_loops_completely_1 (bool 
> >> may_increase_size, bool unroll_outer,
> >>bitmap_clear (father_bbs);
> >>bitmap_set_bit (father_bbs, loop_father->header->index);
> >>  }
> >> +  else if (unroll_outer
> >> +   && !(cfun->pending_TODOs
> >> +& PENDING_TODO_force_next_scalar_cleanup))
> >> +{
> >> +  /* Trigger scalar cleanup once any outermost loop gets unrolled.  */
> >> +  cfun->pending_TODOs |= PENDING_TODO_force_next_scalar_cleanup;
> >> +}
> >
> > I can see it would make sense to test whether the flag is already set
> > if we were worried about polluting the cache line.  But this test and
> > set isn't performance-sensitive, so I think it would be clearer to
> > remove the “&& …” part of the condition.
> >
> > Nit: there should be no braces around the block, since it's a single
> > statement.
> >
> > OK with those changes, thanks.
>
> The patch was updated per your comments above, re-tested on Power8
> and committed in r11-4637.
>

The new test gcc.dg/tree-ssa/pr96789.c fails on arm:
FAIL: gcc.dg/tree-ssa/pr96789.c scan-tree-dump dse3 "Deleted dead store:.*tmp"

Can you check?


> BR,
> Kewen


RE: [PATCH v2 9/18]middle-end optimize slp simplify back to back permutes.

2020-11-04 Thread Richard Biener
On Wed, 4 Nov 2020, Tamar Christina wrote:

> Hi Richi,
> 
> > -Original Message-
> > From: rguent...@c653.arch.suse.de  On
> > Behalf Of Richard Biener
> > Sent: Wednesday, November 4, 2020 1:00 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> > Subject: Re: [PATCH v2 9/18]middle-end optimize slp simplify back to back
> > permutes.
> > 
> > On Tue, 3 Nov 2020, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This optimizes sequential permutes, i.e. if there are two permutes
> > > back to back this function applies the permute of the parent to the
> > > child and removes the parent.
> > >
> > > If the resulting permute in the child is now a no-op, then the child
> > > is also dropped from the graph and the parent's parent is attached to
> > > the child's child.
> > >
> > > This relies on the materialization point calculation in optimize SLP.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > Tests are included as part of the final patch as they need the SLP
> > > pattern matcher to insert permutes in between.
> > >
> > > This allows us to remove useless permutes such as
> > >
> > >   ldr q0, [x0, x3]
> > >   ldr q2, [x1, x3]
> > >   trn1v1.4s, v0.4s, v0.4s
> > >   trn2v0.4s, v0.4s, v0.4s
> > >   trn1v0.4s, v1.4s, v0.4s
> > >   mov v1.16b, v3.16b
> > >   fcmla   v1.4s, v0.4s, v2.4s, #0
> > >   fcmla   v1.4s, v0.4s, v2.4s, #90
> > >   str q1, [x2, x3]
> > >
> > > from the sequence the vectorizer puts out and give
> > >
> > >   ldr q0, [x0, x3]
> > >   ldr q2, [x1, x3]
> > >   mov v1.16b, v3.16b
> > >   fcmla   v1.4s, v0.4s, v2.4s, #0
> > >   fcmla   v1.4s, v0.4s, v2.4s, #90
> > >   str q1, [x2, x3]
> > >
> > > instead
> > >
> > > Ok for master?
> > 
> > + /* If the remaining permute is a no-op then we can just drop the
> > +node instead of materializing it.  */
> > + if (vect_slp_tree_permute_noop_p (node))
> > +   {
> > + if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_NOTE, vect_location,
> > +"removing unneeded permute node %p\n",
> > +node);
> > +
> > +  unsigned idx = SLP_TREE_LANE_PERMUTATION (node)[0].first;
> > +  slp_tree value = SLP_TREE_CHILDREN (node)[idx];
> > +  unsigned src = slpg->vertices[node->vertex].pred->src;
> > +  slp_tree prev = vertices[src];
> > +  unsigned dest;
> > +  slp_tree tmp;
> > +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (prev), dest, tmp)
> > +if (tmp == node)
> > +  {
> > + SLP_TREE_CHILDREN (prev)[dest] = value;
> > + break;
> > +   }
> > 
> > so I don't think this will work reliably since we do not update the
> > graph when inserting permute nodes and thus the "parent" can refer to
> > a permute rather than the original node now (we're just walking over
> > all vertices in no specific order during materialization - guess using
> > IPO might fix this apart from in cycles).  You would also need to
> > iterate over preds here (pred_next).
> > I guess removing no-op permutes is only important for costing?
> > They should not cause any actual code generation?
> 
> Yeah, it's just for costing, the simplification of the permute part is
> the one fixing the codegen. I could just remove the lane permute (as in,
> clear it) and change the costing function to not cost VEC_PERMS with no
> lane permutes (if it doesn't already do that).

I think clearing the lane permute is not even necessary.  The vec
perm code generation should already not cost anything here
since it is also able to elide costs when the permute aligns
naturally with vector boundaries as in { [0, 2], [0, 3], [0, 0], [0, 1] }
for two-element vectors. 
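
One way to read that example, as a simplified model and not the actual
logic in the vectorizer: a lane permutation costs nothing when every group
of nunits lanes is a whole, aligned vector taken from a single child, so
the existing vectors are only re-ordered. The names below
(lane_permutation, aligns_with_vector_boundaries, nunits) are hypothetical
and do not use GCC's own types.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a lane permutation: each entry is
// (index of the child operand, lane taken from that child).
using lane_permutation = std::vector<std::pair<unsigned, unsigned> >;

// Return true if PERM only re-orders whole vectors of NUNITS lanes:
// each group of NUNITS entries must come from one child, start at a
// lane that is a multiple of NUNITS, and then count up by one.
bool
aligns_with_vector_boundaries (const lane_permutation &perm, unsigned nunits)
{
  if (nunits == 0 || perm.size () % nunits != 0)
    return false;

  for (std::size_t start = 0; start < perm.size (); start += nunits)
    {
      if (perm[start].second % nunits != 0)
        return false;
      for (unsigned k = 1; k < nunits; ++k)
        if (perm[start + k].first != perm[start].first
            || perm[start + k].second != perm[start].second + k)
          return false;
    }
  return true;
}

// Usage: the permutation quoted above versus one that crosses a boundary.
inline void
self_check ()
{
  assert (aligns_with_vector_boundaries ({{0, 2}, {0, 3}, {0, 0}, {0, 1}}, 2));
  assert (!aligns_with_vector_boundaries ({{0, 1}, {0, 2}, {0, 3}, {0, 0}}, 2));
}
```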

> > 
> > You also need to adjust reference counts when you change
> > SLP_TREE_CHILDREN (prev)[dest], first add to that of VALUE and then
> > slp_tree_free node itself (which might be tricky at this point).
> > 
> > +static bool
> > +vect_slp_tree_permute_noop_p (slp_tree node) {
> > +  gcc_assert (SLP_TREE_CODE (node) == VEC_PERM_EXPR);
> > +
> > +  if (!SLP_TREE_LANE_PERMUTATION (node).exists ())
> > +return true;
> > +
> > +  unsigned x, seed;
> > +  lane_permutation_t perms = SLP_TREE_LANE_PERMUTATION (node);
> > +  seed = perms[0].second;
> > +  for (x = 1; x < perms.length (); x++)
> > +if (perms[x].first != perms[0].first || perms[x].second != ++seed)
> > +  return false;
> > 
> > 'seed' needs to be zero to be a noop permute and SLP_TREE_LANES
> > (SLP_TREE_CHILDREN (node)[perms[0].first]) needs to be the same as
> > SLP_TREE_LANES (node).  Otherwise you'll make permutes that select parts
> > of a vector no-op.
> > 
> > Maybe 

RE: [PATCH v2 9/18]middle-end optimize slp simplify back to back permutes.

2020-11-04 Thread Tamar Christina via Gcc-patches
Hi Richi,

> -Original Message-
> From: rguent...@c653.arch.suse.de  On
> Behalf Of Richard Biener
> Sent: Wednesday, November 4, 2020 1:00 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; o...@ucw.cz
> Subject: Re: [PATCH v2 9/18]middle-end optimize slp simplify back to back
> permutes.
> 
> On Tue, 3 Nov 2020, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This optimizes sequential permutes, i.e. if there are two permutes
> > back to back this function applies the permute of the parent to the
> > child and removes the parent.
> >
> > If the resulting permute in the child is now a no-op, then the child
> > is also dropped from the graph and the parent's parent is attached to
> > the child's child.
> >
> > This relies on the materialization point calculation in optimize SLP.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > Tests are included as part of the final patch as they need the SLP
> > pattern matcher to insert permutes in between.
> >
> > This allows us to remove useless permutes such as
> >
> > ldr q0, [x0, x3]
> > ldr q2, [x1, x3]
> > trn1v1.4s, v0.4s, v0.4s
> > trn2v0.4s, v0.4s, v0.4s
> > trn1v0.4s, v1.4s, v0.4s
> > mov v1.16b, v3.16b
> > fcmla   v1.4s, v0.4s, v2.4s, #0
> > fcmla   v1.4s, v0.4s, v2.4s, #90
> > str q1, [x2, x3]
> >
> > from the sequence the vectorizer puts out and give
> >
> > ldr q0, [x0, x3]
> > ldr q2, [x1, x3]
> > mov v1.16b, v3.16b
> > fcmla   v1.4s, v0.4s, v2.4s, #0
> > fcmla   v1.4s, v0.4s, v2.4s, #90
> > str q1, [x2, x3]
> >
> > instead
> >
> > Ok for master?
> 
> + /* If the remaining permute is a no-op then we can just drop the
> +node instead of materializing it.  */
> + if (vect_slp_tree_permute_noop_p (node))
> +   {
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_NOTE, vect_location,
> +"removing unneeded permute node %p\n",
> +node);
> +
> +  unsigned idx = SLP_TREE_LANE_PERMUTATION (node)[0].first;
> +  slp_tree value = SLP_TREE_CHILDREN (node)[idx];
> +  unsigned src = slpg->vertices[node->vertex].pred->src;
> +  slp_tree prev = vertices[src];
> +  unsigned dest;
> +  slp_tree tmp;
> +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (prev), dest, tmp)
> +if (tmp == node)
> +  {
> + SLP_TREE_CHILDREN (prev)[dest] = value;
> + break;
> +   }
> 
> so I don't think this will work reliably since we do not update the
> graph when inserting permute nodes and thus the "parent" can refer to
> a permute rather than the original node now (we're just walking over
> all vertices in no specific order during materialization - guess using
> IPO might fix this apart from in cycles).  You would also need to
> iterate over preds here (pred_next).
> I guess removing no-op permutes is only important for costing?
> They should not cause any actual code generation?

Yeah, it's just for costing, the simplification of the permute part is
the one fixing the codegen. I could just remove the lane permute (as in,
clear it) and change the costing function to not cost VEC_PERMS with no
lane permutes (if it doesn't already do that).

> 
> You also need to adjust reference counts when you change
> SLP_TREE_CHILDREN (prev)[dest], first add to that of VALUE and then
> slp_tree_free node itself (which might be tricky at this point).
> 
> +static bool
> +vect_slp_tree_permute_noop_p (slp_tree node) {
> +  gcc_assert (SLP_TREE_CODE (node) == VEC_PERM_EXPR);
> +
> +  if (!SLP_TREE_LANE_PERMUTATION (node).exists ())
> +return true;
> +
> +  unsigned x, seed;
> +  lane_permutation_t perms = SLP_TREE_LANE_PERMUTATION (node);
> +  seed = perms[0].second;
> +  for (x = 1; x < perms.length (); x++)
> +if (perms[x].first != perms[0].first || perms[x].second != ++seed)
> +  return false;
> 
> 'seed' needs to be zero to be a noop permute and SLP_TREE_LANES
> (SLP_TREE_CHILDREN (node)[perms[0].first]) needs to be the same as
> SLP_TREE_LANES (node).  Otherwise you'll make permutes that select parts
> of a vector no-op.
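
For reference, the condition described above can be sketched outside GCC
as follows. This is only an illustration under stated assumptions: the
names lane_permutation, permute_is_noop, child_lanes and node_lanes are
hypothetical stand-ins, and the empty-permutation case simply mirrors what
the quoted patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for lane_permutation_t: each entry is
// (index of the child operand, lane taken from that child).
using lane_permutation = std::vector<std::pair<unsigned, unsigned> >;

// CHILD_LANES[i] is the lane count of child i; NODE_LANES is the lane
// count of the permute node itself.
bool
permute_is_noop (const lane_permutation &perm,
                 const std::vector<unsigned> &child_lanes,
                 unsigned node_lanes)
{
  // Mirrors the patch: a node without a lane permutation is a no-op.
  if (perm.empty ())
    return true;

  // The selected child must have exactly as many lanes as the node,
  // otherwise the permute selects only part of a vector.
  unsigned child = perm[0].first;
  if (child >= child_lanes.size () || child_lanes[child] != node_lanes)
    return false;
  if (perm.size () != node_lanes)
    return false;

  // Lane x of the node must be lane x of that same child, so the
  // sequence has to start at zero.
  for (std::size_t x = 0; x < perm.size (); ++x)
    if (perm[x].first != child || perm[x].second != x)
      return false;
  return true;
}

// Usage: an identity permute versus one that picks lanes 1 and 2 of a
// four-lane child, which the original check would have accepted.
inline void
self_check ()
{
  assert (permute_is_noop ({{0, 0}, {0, 1}}, {2}, 2));
  assert (!permute_is_noop ({{0, 1}, {0, 2}}, {4}, 2));
}
```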
> 
> Maybe simplify the patch and do the vect_slp_tree_permute_noop_p check
> in vectorizable_slp_permutation instead?
> 
> The permute node adjustment part is OK, thus up to
> 
> + else if (SLP_TREE_LANE_PERMUTATION (node).exists ())
> +   {
> + /* If the node if already a permute node we just need to
> apply
> +the permutation to the permute node itself.  */
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_NOTE, vect_location,
> +"simplifying permute node 

Re: Testsuite fails on PowerPC with: Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all])

2020-11-04 Thread Christophe Lyon via Gcc-patches
On Wed, 4 Nov 2020 at 11:54, Tobias Burnus  wrote:
>
> Three of the testcases fail on PowerPC: 
> gcc.target/i386/zero-scratch-regs-{9,10,11}.c
>powerpc64le-linux-gnu/default/gcc.d/zero-scratch-regs-10.c:77:1: sorry, 
> unimplemented: '-fzero-call-used_regs' not supported on this target
>
> Did you miss some dg-require-effective-target ?
>
> powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
> -Wc++-compat  (test for excess errors)
> powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
> -Wc++-compat  (test for excess errors)
> powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
> -Wc++-compat  (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
> -std=gnu++98 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
> -std=gnu++14 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
> -std=gnu++17 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  
> -std=gnu++2a (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
> -std=gnu++98 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
> -std=gnu++14 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
> -std=gnu++17 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  
> -std=gnu++2a (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
> -std=gnu++98 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
> -std=gnu++14 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
> -std=gnu++17 (test for excess errors)
> powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  
> -std=gnu++2a (test for excess errors)
>

This was reported as PR97680; see also PR97699 for arm.

> Tobias
>
> On 30.10.20 20:50, Qing Zhao via Gcc-patches wrote:
>
> > FYI.
> >
> > I just committed the patch to gcc11 as:
> >
> > https://gcc.gnu.org/pipermail/gcc-cvs/2020-October/336263.html 
> > 
> >
> > Qing
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, 
> Alexander Walter

