Re: [PATCH v4 1/3] c++: Track lifetimes in constant evaluation [PR70331,PR96630,PR98675]

2023-07-21 Thread Nathaniel Shead via Gcc-patches
On Thu, Jul 20, 2023 at 10:42:29AM -0400, Jason Merrill wrote:
> On 7/20/23 05:35, Nathaniel Shead wrote:
> > This adds rudimentary lifetime tracking in C++ constexpr contexts,
> > allowing the compiler to report errors with using values after their
> > backing has gone out of scope. We don't yet handle other ways of
> > accessing values outside their lifetime (e.g. following explicit
> > destructor calls).
> 
> Incidentally, much of that should be straightforward to handle by no longer
> ignoring clobbers here:
> 
> > case MODIFY_EXPR:
> >   if (cxx_dialect < cxx14)
> > goto fail;
> >   if (!RECUR (TREE_OPERAND (t, 0), any))
> > return false;
> >   /* Just ignore clobbers.  */
> >   if (TREE_CLOBBER_P (TREE_OPERAND (t, 1)))
> > return true;
> 
> Assignment from a clobber represents end of lifetime to the middle-end. This
> can be a follow-up patch.

Thanks, this is very helpful to know. I'll keep this in mind.

> > @@ -7051,10 +7065,17 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
> > return ctx->ctor;
> > if (VAR_P (t))
> > if (tree v = ctx->global->get_value (t))
> > -   {
> > - r = v;
> > - break;
> > -   }
> > + {
> > +   r = v;
> > +   break;
> > + }
> > +  if (ctx->global->is_outside_lifetime (t))
> > +   {
> > + if (!ctx->quiet)
> > +   outside_lifetime_error (loc, t);
> > + *non_constant_p = true;
> > + break;
> > +   }
> 
> Shouldn't this new check also be under the if (VAR_P (t))?  A CONST_DECL
> can't go out of scope.
> 
> Jason
> 

Yup you're right; I didn't properly read the documentation on what a
CONST_DECL was and misunderstood. I'll fix this up for the next version.
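
For readers following along, here is a minimal sketch of the kind of code the lifetime tracking in patch 1/3 is meant to diagnose.  It is not taken from the patch or its testsuite; the function name read_through is made up for illustration, and no diagnostic text is shown because the wording depends on the compiler version.

```cpp
// Hypothetical example, assuming C++14 relaxed constexpr rules.
constexpr int read_through (bool dangle)
{
  int outer = 42;
  int *p = &outer;
  if (dangle)
    {
      int inner = 1;
      p = &inner;
    } // inner's lifetime ends here; p would now dangle
  return *p;
}

// Fine: p still points at 'outer', which is alive at the read.
static_assert (read_through (false) == 42, "read within lifetime");

// Expected to be rejected once lifetimes are tracked, since the read
// happens after 'inner' has gone out of scope:
// constexpr int bad = read_through (true);
```

Making the dangling path depend on a parameter keeps the function usable in constant expressions for the well-defined case, so only the commented-out call exercises the error.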


Re: [PATCH v4 3/3] c++: Improve location information in constant evaluation

2023-07-21 Thread Nathaniel Shead via Gcc-patches
On Fri, Jul 21, 2023 at 3:00 AM Jason Merrill  wrote:
>
> On 7/20/23 05:37, Nathaniel Shead wrote:
> > This patch updates 'input_location' during constant evaluation to ensure
> > that errors in subexpressions that lack location information still
> > provide accurate diagnostics.
> >
> > By itself this change causes some small regressions in diagnostic
> > quality for circumstances where errors used 'input_location' but the
> > location of the parent subexpression doesn't make sense, so this patch
> > also includes a couple of other small diagnostic improvements to improve
> > the most egregious cases.
> >
> > gcc/cp/ChangeLog:
> >
> >   * constexpr.cc (modifying_const_object_error): Find the source
> >   location of the const object's declaration.
> >   (cxx_eval_store_expression): Fall back to the location of the
> >   target object when evaluating initialiser.
>
> I'm skeptical about this workaround being an improvement in general.
> Reverting it, there only seems to be a difference for constexpr-89285.C,
> which seems fine; we see the location as the first line of the class, as
> usual for implicitly declared constructors.
>
> Showing the DMI location might be an improvement, but I think it would
> be better to make that change in perform_member_init so it applies to
> runtime as well.

Makes sense, I'll get rid of it.

> >   (cxx_eval_constant_expression): Update input_location to the location
> >   of the currently evaluated expression, if possible.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   * testsuite/25_algorithms/equal/constexpr_neg.cc: Update diagnostic
> >   locations.
> >   * testsuite/26_numerics/gcd/105844.cc: Likewise.
> >   * testsuite/26_numerics/lcm/105844.cc: Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/cpp0x/constexpr-48089.C: Update diagnostic locations.
> >   * g++.dg/cpp0x/constexpr-70323.C: Likewise.
> >   * g++.dg/cpp0x/constexpr-70323a.C: Likewise.
> >   * g++.dg/cpp0x/constexpr-delete2.C: Likewise.
> >   * g++.dg/cpp0x/constexpr-diag3.C: Likewise.
> >   * g++.dg/cpp0x/constexpr-ice20.C: Likewise.
> >   * g++.dg/cpp0x/constexpr-recursion.C: Likewise.
> >   * g++.dg/cpp0x/overflow1.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-89285.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-89481.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-lifetime1.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-lifetime2.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-lifetime3.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-lifetime4.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-lifetime5.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-tracking-const14.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-tracking-const16.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-tracking-const18.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-tracking-const19.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-tracking-const21.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-tracking-const22.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-tracking-const3.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-tracking-const4.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-tracking-const7.C: Likewise.
> >   * g++.dg/cpp1y/constexpr-union5.C: Likewise.
> >   * g++.dg/cpp1y/pr68180.C: Likewise.
> >   * g++.dg/cpp1z/constexpr-lambda6.C: Likewise.
> >   * g++.dg/cpp1z/constexpr-lambda8.C: Likewise.
> >   * g++.dg/cpp2a/bit-cast11.C: Likewise.
> >   * g++.dg/cpp2a/bit-cast12.C: Likewise.
> >   * g++.dg/cpp2a/bit-cast14.C: Likewise.
> >   * g++.dg/cpp2a/constexpr-98122.C: Likewise.
> >   * g++.dg/cpp2a/constexpr-dynamic17.C: Likewise.
> >   * g++.dg/cpp2a/constexpr-init1.C: Likewise.
> >   * g++.dg/cpp2a/constexpr-new12.C: Likewise.
> >   * g++.dg/cpp2a/constexpr-new3.C: Likewise.
> >   * g++.dg/cpp2a/constinit10.C: Likewise.
> >   * g++.dg/cpp2a/is-corresponding-member4.C: Likewise.
> >   * g++.dg/ext/constexpr-vla2.C: Likewise.
> >   * g++.dg/ext/constexpr-vla3.C: Likewise.
> >   * g++.dg/ubsan/pr63956.C: Likewise.
> >
> > Signed-off-by: Nathaniel Shead 
> > ---
> >   gcc/cp/constexpr.cc   | 46 ++-
> >   gcc/testsuite/g++.dg/cpp0x/constexpr-48089.C  | 10 ++--
> >   gcc/testsuite/g++.dg/cpp0x/constexpr-70323.C  |  8 ++--
> >   gcc/testsuite/g++.dg/cpp0x/constexpr-70323a.C |  8 ++--
> >   .../g++.dg/cpp0x/constexpr-delete2.C  |  5 +-
> >   gcc/testsuite/g++.dg/cpp0x/constexpr-diag3.C  |  2 +-
> >   gcc/testsuite/g++.dg/cpp0x/constexpr-ice20.C  |  1 +
> >   .../g++.dg/cpp0x/constexpr-recursion.C|  6 +--
> >   gcc/testsuite/g++.dg/cpp0x/overflow1.C|  2 +-
> >   gcc/testsuite/g++.dg/cpp1y/constexpr-89285.C  |  5 +-
> >   gcc/testsuite/g++.dg/cpp1y/constexpr-89481.C  |  3 +-
> >   .../g++.dg/cpp1y/constexpr-lifetime1.C|  1 +
> >   .../g++.dg/cpp1y/constexpr-lifetime2.C|  4 +-
> >   

Re: [PATCH v4 2/3] c++: Improve constexpr error for dangling local variables [PR110619]

2023-07-21 Thread Nathaniel Shead via Gcc-patches
On Fri, Jul 21, 2023 at 05:44:51PM -0400, Jason Merrill wrote:
> On 7/21/23 01:39, Nathaniel Shead wrote:
> > On Thu, Jul 20, 2023 at 11:46:47AM -0400, Jason Merrill wrote:
> > > On 7/20/23 05:36, Nathaniel Shead wrote:
> > > > Currently, when typeck discovers that a return statement will refer to a
> > > > local variable it rewrites to return a null pointer. This causes the
> > > > error messages for using the return value in a constant expression to be
> > > > unhelpful, especially for reference return values.
> > > > 
> > > > This patch removes this "optimisation".
> > > 
> > > > This isn't an optimization, it's for safety, removing a way for an
> > > > attacker to get a handle on other data on the stack (CWE-562).
> > > 
> > > But I agree that we need to preserve some element of UB for constexpr
> > > evaluation to see.
> > > 
> > > Perhaps we want to move this transformation to cp_maybe_instrument_return,
> > > so it happens after maybe_save_constexpr_fundef?
> > 
> > Hm, OK. I can try giving this a go. I guess I should move the entire
> > maybe_warn_about_returning_address_of_local function to cp-gimplify.cc
> > to be able to detect this? Or is there a better way of marking that a
> > return expression will return a reference to a local for this
> > transformation? (I guess I can't use whether the warning has been
> > > suppressed or not because the warning might not be enabled at all.)
> 
> You could use a TREE_LANG_FLAG, looks like none of them are used on
> RETURN_EXPR.
> 
> > It looks like this warning is raised also by diag_return_locals in
> > gimple-ssa-isolate-paths, should the transformation also be made here?
> 
> Looks like it already is, in warn_return_addr_local:
> 
> >   tree zero = build_zero_cst (TREE_TYPE (val));
> >   gimple_return_set_retval (return_stmt, zero);
> >   update_stmt (return_stmt);
> 
> ...but, weirdly, only with -fisolate-erroneous-paths-*, even though it isn't
> isolating anything.  Perhaps there should be another flag for this.
> 

I see, thanks. From this I've found that my patch above isn't sufficient
anyway: compiling with -O2 causes the warning to appear twice, because
the suppression I added wasn't enough. As such I'll exclude this patch
from the next revision, since it's not actually necessary for the problem
I was trying to solve, and I'll work on solving this properly a bit
later.

> > I note that the otherwise very similar -Wdangling-pointer warning
> > doesn't do this transformation either, should that also be something I
> > look into fixing here?
> 
> With that same flag, perhaps.  I wonder if it would make sense to remove the
> isolate-paths handling of locals in favor of the dangling-pointer handling?
> I don't know either file much at all.
> 
> Jason
> 


Re: [PATCH] testsuite/110763: Ensure zero return from test

2023-07-21 Thread Jeff Law via Gcc-patches




On 7/21/23 09:16, Siddhesh Poyarekar wrote:

The test deliberately reads beyond bounds to exercise ubsan and the
return value may be anything, based on previous allocations.  The OFF
test caters for it by ANDing the return with 0; do the same for the DYN
test.

gcc/testsuite/ChangeLog:

PR testsuite/110763
* gcc.dg/ubsan/object-size-dyn.c (dyn): New parameter RET.
(main): Use it.

OK
jeff
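
The masking trick described in the patch can be sketched as follows.  The names probe and exit_status are hypothetical, not the ones used in object-size-dyn.c, and the sketch deliberately avoids any real out-of-bounds read; it only shows that the expression is still evaluated while its value no longer influences the exit status.

```cpp
// Hypothetical stand-in for a test helper whose result is unpredictable.
constexpr int probe (int whatever) { return whatever; }

// ANDing with 0 keeps the call in the expression but forces the value
// used as the exit status to zero.
constexpr int exit_status (int whatever) { return probe (whatever) & 0; }

static_assert (exit_status (12345) == 0, "always zero");
static_assert (exit_status (-1) == 0, "always zero");
```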


Re: [PATCH] c++: fix ICE with constexpr ARRAY_REF [PR110382]

2023-07-21 Thread Jason Merrill via Gcc-patches

On 7/21/23 18:38, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/13?

-- >8 --

This code in cxx_eval_array_reference has been hard to get right.
In r12-2304 I added some code; in r13-5693 I removed some of it.

Here the problematic line is "S s = arr[0];" which causes a crash
on the assert in verify_ctor_sanity:

   gcc_assert (!ctx->object || !DECL_P (ctx->object)
   || ctx->global->get_value (ctx->object) == ctx->ctor);

ctx->object is the VAR_DECL 's', which is correct here.  The second
line points to the problem: we replaced ctx->ctor in
cxx_eval_array_reference:

   new_ctx.ctor = build_constructor (elem_type, NULL); // #1


...and this code doesn't also clear(/set) new_ctx.object like everywhere 
else in constexpr.cc that sets new_ctx.ctor.  Fixing that should make 
the testcase work.



which I think we shouldn't have; the CONSTRUCTOR we created in
cxx_eval_constant_expression/DECL_EXPR

   new_ctx.ctor = build_constructor (TREE_TYPE (r), NULL);

had the right type.


Indeed, and using it rather than building a new one seems like a valid 
optimization for trunk.


I also notice that the DECL_EXPR code calls unshare_constructor, which 
should be unnecessary if init == ctx->ctor?



We still need #1 though.  E.g., in constexpr-96241.C, we never
set ctx.ctor/object before calling cxx_eval_array_reference, so
we have to build a CONSTRUCTOR there.  And in constexpr-101371-2.C
we have a ctx.ctor, but it has the wrong type, so we need a new one.

PR c++/110382

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_array_reference): Create a new constructor
only when we don't already have a matching one.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-110382.C: New test.
---
  gcc/cp/constexpr.cc   |  5 -
  gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C | 17 +
  2 files changed, 21 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index fb94f3cefcb..518b7c7a2d5 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -4291,7 +4291,10 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree t,
else
  val = build_value_init (elem_type, tf_warning_or_error);
  
-  if (!SCALAR_TYPE_P (elem_type))

+  if (!SCALAR_TYPE_P (elem_type)
+  /* Create a new constructor only if we don't already have one that
+is suitable.  */
+  && !(ctx->ctor && same_type_p (elem_type, TREE_TYPE (ctx->ctor


We generally use same_type_ignoring_top_level_qualifiers_p in the 
constexpr code.


Jason
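
As a rough analogy in standard C++ (this is not the GCC-internal API), the difference between an exact type comparison and one that ignores top-level cv-qualifiers looks like the sketch below.  same_ignoring_cv is a hypothetical helper built on <type_traits>, and the struct S only mirrors the name used in the discussion above.

```cpp
#include <type_traits>

struct S { double a = 0; };

// Hypothetical helper: compare types after dropping top-level
// const/volatile.  Requires C++14 (variable templates, remove_cv_t).
template <typename A, typename B>
constexpr bool same_ignoring_cv
  = std::is_same<std::remove_cv_t<A>, std::remove_cv_t<B>>::value;

// An exact comparison treats 'const S' and 'S' as different types...
static_assert (!std::is_same<const S, S>::value, "exact match fails");
// ...while the qualifier-insensitive one accepts them.
static_assert (same_ignoring_cv<const S, S>, "qualifiers ignored");
// Genuinely different types still differ.
static_assert (!same_ignoring_cv<S, double>, "different types");
```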



Re: [PATCH v4] Introduce attribute sym

2023-07-21 Thread Fangrui Song via Gcc-patches
On Wed, Jul 19, 2023 at 4:12 PM Alexandre Oliva via Gcc-patches wrote:
>
> On Jul 18, 2023, Richard Biener  wrote:
>
> > I think the __symver__ attribute does something similar already so
> > maybe use __attribute__((__sym__("foo")))?
>
> Cool, thanks, that will do.  Regstrapped on x86_64-linux-gnu.  Ok to
> install?
>
>
> This patch introduces an attribute to add extra asm names (aliases)
> for a decl when its definition is output.  The main goal is to ease
> interfacing C++ with Ada, as C++ mangled names have to be named, and
> in some cases (e.g. when using stdint.h typedefs in function
> arguments) the symbol names may vary across platforms.
>
> The attribute is usable in C and C++, presumably in all C-family
> languages.  It can be attached to global variables and functions.  In
> C++, it can also be attached to class types, namespace-scoped
> variables and functions, static data members, member functions,
> explicit instantiations and specializations of template functions,
> members and classes.
>
> When applied to constructors or destructor, additional sym aliases
> with _Base and _Del suffixes are defined for variants other than
> complete-object ones.  This changes the assumption that clones always
> carry the same attributes as their abstract declarations, so there is
> now a function to adjust them.

I wonder whether this attribute can be named "alias" without arguments.
alias ("target") is an existing attribute that applies to a
declaration. The new "alias" without arguments can apply to
definitions.

I am just thinking that the semantics of "sym" may confuse users who
are familiar with "alias" :)


Re: [PATCH] Use __builtin_trap() for abort() if inhibit_libc

2023-07-21 Thread Andrew Pinski via Gcc-patches
On Tue, Aug 17, 2021 at 1:43 AM Sebastian Huber wrote:
>
> abort() is used in gcc_assert() and gcc_unreachable() which is used by target
> libraries such as libgcov.a.  This patch changes the abort() definition under
> certain conditions.  If inhibit_libc is defined and abort is not already
> defined, then abort() is defined to __builtin_trap().
>
> The inhibit_libc define is usually defined if GCC is built for targets running
> in embedded systems which may optionally use a C standard library.  If
> inhibit_libc is defined, then there may be still a full featured abort()
> available.  abort() is a heavy weight function which depends on signals and
> file streams.  For statically linked applications, this means that a
> dependency on gcc_assert() pulls in the support for signals and file
> streams.  This could prevent using gcov to test low end targets for
> example.  Using __builtin_trap() avoids these dependencies if the target
> implements a "trap" instruction.  The application or operating system
> could use a trap handler to react to failed GCC runtime checks which
> caused a trap.

This also sometimes breaks compiling emutls.c on some targets, such as uclibc.
See https://gcc.gnu.org/PR110775 for some more details there.

Thanks,
Andrew

>
> gcc/
>
> * tsystem.h (abort): Define abort() if inhibit_libc is defined and it
> is not already defined.
> ---
>  gcc/tsystem.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tsystem.h b/gcc/tsystem.h
> index e1e6a96a4f48..5c72c69ff3ed 100644
> --- a/gcc/tsystem.h
> +++ b/gcc/tsystem.h
> @@ -59,7 +59,7 @@ extern int atexit (void (*)(void));
>  #endif
>
>  #ifndef abort
> -extern void abort (void) __attribute__ ((__noreturn__));
> +#define abort() __builtin_trap ()
>  #endif
>
>  #ifndef strlen
> --
> 2.26.2
>
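
A minimal sketch of the idea in the quoted patch, under stated assumptions: SKETCH_ABORT and checked_index are hypothetical names (the patch redefines abort() itself in gcc/tsystem.h), and __builtin_trap is the GCC builtin the patch relies on.

```cpp
// Hypothetical macro name, used here so the real abort() is untouched.
#define SKETCH_ABORT() __builtin_trap ()

// A failed check traps instead of calling the libc abort(), so no
// signal or file-stream support has to be linked in for it.
constexpr int checked_index (int i, int n)
{
  if (i < 0 || i >= n)
    SKETCH_ABORT ();
  return i;
}

static_assert (checked_index (2, 4) == 2, "in range");
```

As noted in the reply above, redefining abort this way can interfere with other code on some targets, so this only illustrates the mechanism.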


[pushed][LRA]: Fix sparc bootstrap after recent patch for fp elimination for avr LRA port

2023-07-21 Thread Vladimir Makarov via Gcc-patches
The following patch fixes sparc solaris bootstrap.  The explanation of
the patch is in the commit message.

The patch was successfully bootstrapped on x86-64, aarch64, and sparc64
solaris.


commit d17be8f7f36abe257a7d026dad61e5f8d14bdafc
Author: Vladimir N. Makarov 
Date:   Fri Jul 21 20:28:50 2023 -0400

[LRA]: Fix sparc bootstrap after recent patch for fp elimination for avr LRA port

The recent patch for fp elimination for avr LRA port modified an assert
which can be wrong for targets using hard frame pointer different from
frame pointer.  Also for such ports spilling pseudos assigned to fp
was wrong too in the new code, although this code is not used for any
target currently using LRA except for avr.  The given patch fixes the
issues.

gcc/ChangeLog:

* lra-eliminations.cc (update_reg_eliminate): Fix the assert.
(lra_update_fp2sp_elimination): Use HARD_FRAME_POINTER_REGNUM
instead of FRAME_POINTER_REGNUM to spill pseudos.

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index cf0aa94b69a..1f4e3fec9e0 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1179,8 +1179,7 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
  gcc_assert (ep->to_rtx != stack_pointer_rtx
  || (ep->from == FRAME_POINTER_REGNUM
  && !elimination_fp2sp_occured_p)
- || (ep->from != FRAME_POINTER_REGNUM
- && ep->from < FIRST_PSEUDO_REGISTER
+ || (ep->from < FIRST_PSEUDO_REGISTER
  && fixed_regs [ep->from]));
 
  /* Mark that is not eliminable anymore.  */
@@ -1398,7 +1397,7 @@ lra_update_fp2sp_elimination (void)
 " Frame pointer can not be eliminated anymore\n");
   frame_pointer_needed = true;
   CLEAR_HARD_REG_SET (set);
-  add_to_hard_reg_set (&set, Pmode, FRAME_POINTER_REGNUM);
+  add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM);
   spill_pseudos (set);
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
 if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)


Re: [PATCH 2/2 ver 5] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-21 Thread Carl Love via Gcc-patches
GCC maintainers:

Version 5, Fixed patch description, the first argument should be of
type vector.  Fixed comment in vsx.md to say "Vector and scalar
extract_elt iterator/attr ".  Removed a few of the changes in
version 4.  Specifically, reverted the names of REPLACE_ELT_V_sh back
to REPLACE_ELT_sh and REPLACE_ELT_V_max back to REPLACE_ELT_max.
Combined the REPLACE_ELT_char and REPLACE_ELT_V_char mode attributes
into REPLACE_ELT_char.  Put the "dg-do link" directive back into the
vec-replace-word-runnable_1.c test file.  The patch was tested with the
updated patch 1 in the series on Power 8 LE/BE, Power 9 LE/BE and Power
10 with no regressions.

Version 4, changed the new RS6000_OVLD_VEC_REPLACE_UN case statement in
rs6000/rs6000-c.cc.  The existing REPLACE_ELT iterator name was changed
to REPLACE_ELT_V along with the associated define_mode_attr.  Renamed
VEC_RU to REPLACE_ELT for the iterator name and VEC_RU_char to
REPLACE_ELT_char.  Fixed the double test in
vec-replace-word-runnable_1.c to be consistent with the other tests.
Removed the "dg-do link" from both tests.  Put in an explicit cast in
test vec-replace-word-runnable_2.c to eliminate the need for the
-flax-vector-conversions dg-option.

Version 3, added code to altivec_resolve_overloaded_builtin so the
correct instruction is selected for the size of the second argument. 
This restores the instruction counts to the original values where the
correct instructions were originally being generated.  The naming of
the overloaded builtin instances and builtin definitions were changed
to reflect the type of the second argument since the type of the first
argument is now the same for all overloaded instances.  A new builtin
test file was added for the case where the first argument is cast to
the unsigned long long type.  This test requires the
-flax-vector-conversions gcc command line option.  Since the other tests
do not require this option, I felt that the new test needed to be in a
separate file.  Finally some formatting fixes were made in the original
test file.  Patch has been retested on Power 10 with no regressions.

Version 2, fixed various typos.  Updated the change log body to say the
instruction counts were updated.  The instruction counts changed as a
result of changing the first argument of the vec_replace_unaligned
builtin call from vector unsigned long long (vull) to vector unsigned
char (vuc).  When the first argument was vull the builtin call
generated the vinsd instruction for the two test cases.  The updated
call with vuc as the first argument generates two vinsw instructions
instead.  Patch was retested on Power 10 with no regressions.

The following patch fixes the first argument in the builtin definition
and the corresponding test cases.  Initially, the builtin specification
was wrong due to a cut-and-paste error.  The documentation was fixed in:

   commit ed3fea09b18f67e757b5768b42cb6e816626f1db
   Author: Bill Schmidt 
   Date:   Fri Feb 4 13:07:17 2022 -0600

   rs6000: Correct function prototypes for vec_replace_unaligned

   Due to a pasto error in the documentation, vec_replace_unaligned was
   implemented with the same function prototypes as vec_replace_elt.
   It was intended that vec_replace_unaligned always specify output
   vectors as having type vector unsigned char, to emphasize that
   elements are potentially misaligned by this built-in function.
   This patch corrects the misimplementation.


This patch fixes the arguments in the definitions and updates the
testcases accordingly.  Additionally, a few minor spacing issues are
fixed.

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable for mainline.  Thanks.

 Carl 






rs6000, fix vec_replace_unaligned built-in arguments

The first argument of the vec_replace_unaligned built-in should always be
of type vector unsigned char, as specified in gcc/doc/extend.texi.

This patch fixes the builtin definitions and updates the test cases to use
the correct arguments.  The original test file is renamed and a second test
file is added for a new test case.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def: Rename
__builtin_altivec_vreplace_un_uv2di as __builtin_altivec_vreplace_un_udi
__builtin_altivec_vreplace_un_uv4si as __builtin_altivec_vreplace_un_usi
__builtin_altivec_vreplace_un_v2df as __builtin_altivec_vreplace_un_df
__builtin_altivec_vreplace_un_v2di as __builtin_altivec_vreplace_un_di
__builtin_altivec_vreplace_un_v4sf as __builtin_altivec_vreplace_un_sf
__builtin_altivec_vreplace_un_v4si as __builtin_altivec_vreplace_un_si.
Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI as
VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF,
VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as
VREPLACE_UN_SF, VREPLACE_UN_V4SI 

[PATCH 1/2 ver 2] rs6000, add argument to function find_instance

2023-07-21 Thread Carl Love via Gcc-patches
GCC maintainers:

Version 2:  Updated a number of formatting and spacing issues.   Added
the NARGS description to the header comment for function find_instance.
This patch was tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE
with no regressions.

The rs6000 function find_instance assumes that it is called for
built-ins with only two arguments.  There is no checking for the actual
number of arguments used in the built-in.  This patch adds an
additional parameter to the function call containing the number of
arguments in the built-in.  The function will now do the needed checks
for all of the arguments.
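
The shape of the change can be sketched in isolation.  Everything below is a hypothetical stand-in rather than the rs6000 code: plain ints play the role of tree types, and operator== plays the role of rs6000_builtin_type_compatible.

```cpp
// Hypothetical analogue of the generalized check: compare NARGS entries
// instead of a hard-coded two.
constexpr bool args_compatible (const int *types, const int *parmtypes,
                                int nargs)
{
  for (int i = 0; i < nargs; i++)
    if (types[i] != parmtypes[i])
      return false;
  return true;
}

constexpr int call_types[] = { 1, 2, 3 };
constexpr int good_parms[] = { 1, 2, 3 };
constexpr int bad_third[]  = { 1, 2, 9 };

// Checking only two arguments misses the mismatch in the third one.
static_assert (args_compatible (call_types, bad_third, 2), "first two match");
static_assert (!args_compatible (call_types, bad_third, 3), "third differs");
static_assert (args_compatible (call_types, good_parms, 3), "all match");
```

The second assertion is the case a two-argument check cannot see, which is presumably what matters for built-ins that take three arguments.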

This fix is needed for the next patch in the series that fixes the
vec_replace_unaligned built-in.c test.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 




-
rs6000, add argument to function find_instance

The function find_instance assumes it is called to check a built-in with
only two arguments.  This patch extends the function by adding a parameter
specifying the number of built-in arguments to check.

gcc/ChangeLog:
* config/rs6000/rs6000-c.cc (find_instance): Add new parameter that
specifies the number of built-in arguments to check.
(altivec_resolve_overloaded_builtin): Update calls to find_instance
to pass the number of built-in arguments to be checked.
---
 gcc/config/rs6000/rs6000-c.cc | 40 +++
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index a353bca19ef..de35490de42 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1668,18 +1668,20 @@ resolve_vec_step (resolution *res, vec<tree, va_gc> *arglist, unsigned nargs)
 /* Look for a matching instance in a chain of instances.  INSTANCE points to
the chain of instances; INSTANCE_CODE is the code identifying the specific
built-in being searched for; FCODE is the overloaded function code; TYPES
-   contains an array of two types that must match the types of the instance's
-   parameters; and ARGS contains an array of two arguments to be passed to
-   the instance.  If found, resolve the built-in and return it, unless the
-   built-in is not supported in context.  In that case, set
-   UNSUPPORTED_BUILTIN to true.  If we don't match, return error_mark_node
-   and leave UNSUPPORTED_BUILTIN alone.  */
+   contains an array of NARGS types that must match the types of the
+   instance's parameters; ARGS contains an array of NARGS arguments to be
+   passed to the instance; and NARGS is the number of built-in arguments to
+   check.  If found, resolve the built-in and return it, unless the built-in
+   is not supported in context.  In that case, set UNSUPPORTED_BUILTIN to
+   true.  If we don't match, return error_mark_node and leave
+   UNSUPPORTED_BUILTIN alone.
+*/
 
 tree
 find_instance (bool *unsupported_builtin, ovlddata **instance,
   rs6000_gen_builtins instance_code,
   rs6000_gen_builtins fcode,
-  tree *types, tree *args)
+  tree *types, tree *args, int nargs)
 {
   while (*instance && (*instance)->bifid != instance_code)
 *instance = (*instance)->next;
@@ -1691,17 +1693,27 @@ find_instance (bool *unsupported_builtin, ovlddata **instance,
   if (!inst->fntype)
 return error_mark_node;
   tree fntype = rs6000_builtin_info[inst->bifid].fntype;
-  tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
-  tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES (fntype)));
+  tree argtype = TYPE_ARG_TYPES (fntype);
+  bool args_compatible = true;
 
-  if (rs6000_builtin_type_compatible (types[0], parmtype0)
-  && rs6000_builtin_type_compatible (types[1], parmtype1))
+  for (int i = 0; i < nargs; i++)
+{
+  tree parmtype = TREE_VALUE (argtype);
+  if (!rs6000_builtin_type_compatible (types[i], parmtype))
+   {
+ args_compatible = false;
+ break;
+   }
+  argtype = TREE_CHAIN (argtype);
+}
+
+  if (args_compatible)
 {
   if (rs6000_builtin_decl (inst->bifid, false) != error_mark_node
  && rs6000_builtin_is_supported (inst->bifid))
{
  tree ret_type = TREE_TYPE (inst->fntype);
- return altivec_build_resolved_builtin (args, 2, fntype, ret_type,
+ return altivec_build_resolved_builtin (args, nargs, fntype, ret_type,
 inst->bifid, fcode);
}
   else
@@ -1921,7 +1933,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
  instance_code = RS6000_BIF_CMPB_32;
 
 tree call = find_instance (&unsupported_builtin, &instance,
-  instance_code, fcode, types, args);
+  instance_code, fcode, types, args, nargs);
if (call != error_mark_node)
  return call;
break;
@@ -1958,7 +1970,7 @@ 

[PATCH 0/2 ver 2] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-21 Thread Carl Love via Gcc-patches
GCC maintainers:

Version 2.  Both patches have been updated.  The first patch was approved
with minor issues to be fixed; I will post the updated version as
version 2 for completeness of the series.  There were a few changes
to the second patch as well, which has not been approved yet.  The
updated version of the second patch is version 5 with the requested
changes made.  The two patches were tested together on Power
8 LE/BE, Power 9 LE/BE and Power 10 LE with no regressions.

In the process of fixing the powerpc/vec-replace-word-runnable.c test I
found there is an existing issue with function find_instance in
rs6000-c.cc.  Per the review comments from Kewen in

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624401.html

the fix for function find_instance was put into a separate patch
followed by a patch for the vec-replace-word-runnable.c test fixes.

The two patches have been tested on Power 10 LE with no regression
failures.

   Carl



Re: [WIP RFC] Add support for keyword-based attributes

2023-07-21 Thread Joseph Myers
On Mon, 17 Jul 2023, Michael Matz via Gcc-patches wrote:

> So, essentially you want unignorable attributes, right?  Then implement 
> exactly that: add one new keyword "__known_attribute__" (invent a better 
> name, maybe :) ), semantics exactly as with __attribute__ (including using 
> the same underlying lists in our data structures), with only one single 
> deviation: instead of the warning you give an error for unhandled 
> attributes.  Done.

Assuming you also want the better-defined standard rules about how [[]] 
attributes appertain to particular entities, rather than the different 
__attribute__ rules, that would suggest something like [[!some::attr]] for 
the case of attributes that can't be ignored but otherwise are handled 
like standard [[]] attributes.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 2/2 ver 4] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-21 Thread Carl Love via Gcc-patches
On Fri, 2023-07-21 at 13:04 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/7/18 03:20, Carl Love wrote:
> > GCC maintainers:
> > 
> > Version 4, changed the new RS6000_OVLD_VEC_REPLACE_UN case
> > statement
> > rs6000/rs6000-c.cc.  The existing REPLACE_ELT iterator name was
> > changed
> > to REPLACE_ELT_V along with the associated
> > define_mode_attr.  Renamed
> > VEC_RU to REPLACE_ELT for the iterator name and VEC_RU_char to
> > REPLACE_ELT_char.  Fixed the double test in vec-replace-word-
> > runnable_1.c to be consistent with the other tests.  Removed the
> > "dg-do 
> > link" from both tests.  Put in an explicit cast in test vec-
> > replace-word-runnable_2.c to eliminate the need for the -flax-
> > vector-conversions dg-option.
> > 
> > Version 3, added code to altivec_resolve_overloaded_builtin so the
> > correct instruction is selected for the size of the second
> > argument. 
> > This restores the instruction counts to the original values where
> > the
> > correct instructions were originally being generated.  The naming
> > of
> > the overloaded builtin instances and builtin definitions were
> > changed
> > to reflect the type of the second argument since the type of the
> > first
> > argument is now the same for all overloaded instances.  A new
> > builtin
> > test file was added for the case where the first argument is cast
> > to
> > the unsigned long long type.  This test requires the -flax-vector-
> > conversions gcc command line option.  Since the other tests do not
> > require this option, I felt that the new test needed to be in a
> > separate file.  Finally some formatting fixes were made in the
> > original
> > test file.  Patch has been retested on Power 10 with no
> > regressions.
> > 
> > Version 2, fixed various typos.  Updated the change log body to say
> > the
> > instruction counts were updated.  The instruction counts changed as
> > a
> > result of changing the first argument of the vec_replace_unaligned
> > builtin call from vector unsigned long long (vull) to vector
> > unsigned
> > char (vuc).  When the first argument was vull the builtin call
> > generated the vinsd instruction for the two test cases.  The
> > updated
> > call with vuc as the first argument generates two vinsw
> > instructions
> > instead.  Patch was retested on Power 10 with no regressions.
> > 
> > The following patch fixes the first argument in the builtin
> > definition
> > and the corresponding test cases.  Initially, the builtin
> > specification
> > was wrong due to a cut and paste error.  The documentation was fixed
> > in:
> > 
> >commit ed3fea09b18f67e757b5768b42cb6e816626f1db
> >Author: Bill Schmidt 
> >Date:   Fri Feb 4 13:07:17 2022 -0600
> > 
> >rs6000: Correct function prototypes for
> > vec_replace_unaligned
> > 
> >Due to a pasto error in the documentation,
> > vec_replace_unaligned was
> >implemented with the same function prototypes as
> > vec_replace_elt.  
> >It was intended that vec_replace_unaligned always specify
> > output
> >vectors as having type vector unsigned char, to emphasize
> > that 
> >elements are potentially misaligned by this built-in
> > function.  
> >This patch corrects the misimplementation.
> > 
> > 
> > This patch fixes the arguments in the definitions and updates the
> > testcases accordingly.  Additionally, a few minor spacing issues
> > are
> > fixed.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable for mainline.  Thanks.
> > 
> >  Carl 
> > 
> > 
> > 
> > rs6000, fix vec_replace_unaligned built-in arguments
> > 
> > The first argument of the vec_replace_unaligned built-in should
> > always be
> > of type unsigned char, as specified in gcc/doc/extend.texi.
> 
> Shouldn't it be "vector unsigned char" instead of "unsigned char"?
> 
> Or am I missing something?

Nope, I missed saying "vector".  Fixed.

> 
> > This patch fixes the builtin definitions and updates the test cases
> > to use
> > the correct arguments.  The original test file is renamed and a
> > second test
> > file is added for a new test case.
> > 
> > gcc/ChangeLog:
> > * config/rs6000/rs6000-builtins.def: Rename
> > __builtin_altivec_vreplace_un_uv2di as
> > __builtin_altivec_vreplace_un_udi
> > __builtin_altivec_vreplace_un_uv4si as
> > __builtin_altivec_vreplace_un_usi
> > __builtin_altivec_vreplace_un_v2df as
> > __builtin_altivec_vreplace_un_df
> > __builtin_altivec_vreplace_un_v2di as
> > __builtin_altivec_vreplace_un_di
> > __builtin_altivec_vreplace_un_v4sf as
> > __builtin_altivec_vreplace_un_sf
> > __builtin_altivec_vreplace_un_v4si as
> > __builtin_altivec_vreplace_un_si.
> > Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI
> > as
> > VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF,
> > VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as
> >  

Re: [PATCH 1/2] rs6000, add argument to function find_instance

2023-07-21 Thread Carl Love via Gcc-patches
On Fri, 2023-07-21 at 10:19 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/7/18 03:19, Carl Love wrote:
> > GCC maintainers:
> > 
> > The rs6000 function find_instance assumes that it is called for
> > built-
> > ins with only two arguments.  There is no checking for the actual
> > number of arguments used in the built-in.  This patch adds an
> > additional parameter to the function call containing the number of
> > arguments in the built-in.  The function will now do the needed
> > checks
> > for all of the arguments.
> > 
> > This fix is needed for the next patch in the series that fixes the
> > vec_replace_unaligned built-in.c test.
> > 
> > Please let me know if this patch is acceptable for
> > mainline.  Thanks.
> > 
> > Carl 
> > 
> > 
> > 
> > rs6000, add argument to function find_instance
> > 
> > The function find_instance assumes it is called to check a built-
> > in  with   

Fixed
> >   ~~ two spaces.
> > only two arguments.  Ths patch extends the function by adding a
> > parameter
>s/Ths/This/
> > specifying the number of buit-in arguments to check.
>   s/bult-in/built-in/
> 
Fixed both typos.

> > gcc/ChangeLog:
> > * config/rs6000/rs6000-c.cc (find_instance): Add new parameter
> > that
> > specifies the number of built-in arguments to check.
> > (altivec_resolve_overloaded_builtin): Update calls to
> > find_instance
> > to pass the number of built-in argument to be checked.
> 
> s/argument/arguments/
fixed
> 
> > ---
> >  gcc/config/rs6000/rs6000-c.cc | 27 +++
> >  1 file changed, 19 insertions(+), 8 deletions(-)
> > 
> > diff --git a/gcc/config/rs6000/rs6000-c.cc
> > b/gcc/config/rs6000/rs6000-c.cc
> > index a353bca19ef..350987b851b 100644
> > --- a/gcc/config/rs6000/rs6000-c.cc
> > +++ b/gcc/config/rs6000/rs6000-c.cc
> > @@ -1679,7 +1679,7 @@ tree
> 
> There is one function comment here describing the meaning of each
> parameter,
> I think we should add a corresponding one for NARGS, maybe something
> like:
> 
> "; and NARGS specifies the number of built-in arguments."
> 
Added NARGS description.

> Also we need to update the below "two"s with "NARGS".
> 
> "TYPES contains an array of two types..." and "ARGS contains an array
> of two arguments..."
> 

Replaced multiple "two" occurrences with NARGS.

> since we already extend this to handle NARGS instead of two.
> 
> >  find_instance (bool *unsupported_builtin, ovlddata **instance,
> >rs6000_gen_builtins instance_code,
> >rs6000_gen_builtins fcode,
> > -  tree *types, tree *args)
> > +  tree *types, tree *args, int nargs)
> >  {
> >while (*instance && (*instance)->bifid != instance_code)
> >  *instance = (*instance)->next;
> > @@ -1691,17 +1691,28 @@ find_instance (bool *unsupported_builtin,
> > ovlddata **instance,
> >if (!inst->fntype)
> >  return error_mark_node;
> >tree fntype = rs6000_builtin_info[inst->bifid].fntype;
> > -  tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
> > -  tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES
> > (fntype)));
> > +  tree argtype = TYPE_ARG_TYPES (fntype);
> > +  tree parmtype;
> 
> Nit: We can move "tree parmtype" into the loop (close to its only
> use).

Moved and combined declaration with assignment as you noted below.

> 
> > +  int args_compatible = true;
> 
> s/int/bool/
Changed.

> 
> >  
> > -  if (rs6000_builtin_type_compatible (types[0], parmtype0)
> > -  && rs6000_builtin_type_compatible (types[1], parmtype1))
> > +  for (int i = 0; i  
> Nit: formatting issue, space before nargs.
> 
> >  {
> > +  parmtype = TREE_VALUE (argtype);
> 
>  tree parmtype = TREE_VALUE (argtype);

Changed

> 
> > +  if (! rs6000_builtin_type_compatible (types[i], parmtype))
> 
> Nit: One unexpected(?) space after "!".

Removed extra space after "!".
> 
> > +   {
> > + args_compatible = false;
> > + break;
> > +   }
> > +  argtype = TREE_CHAIN (argtype);
> > +}
> > +
> > +  if (args_compatible)
> > +  {
> 
> Nit: indent issue for "{".
Fixed indent.

> 
> Ok for trunk with these nits fixed.  Btw, the description doesn't say
> how this was tested, I'm not sure if it's only tested together with
> "patch 2/2", but please ensure it's bootstrapped and regress-tested
> on BE and LE when committing.  Thanks!
> 

Yes, it was tested with patch 2/2 on Power 10 LE.  I did do a test on
Power 9 as well but don't recall if I tested for both BE and LE.  Will
retest on Power 8 LE/BE, Power 9 LE/BE and Power 10.

 Carl
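
For readers following the review above, the generalization from two
hard-coded parameter checks to an NARGS-driven loop can be sketched
outside of GCC roughly as follows.  This is only an illustration under
stated assumptions: ParamNode, type_compatible and args_match are
hypothetical stand-ins (for a TREE_CHAIN-linked parameter list,
rs6000_builtin_type_compatible, and the loop inside find_instance
respectively), and types are modelled as plain strings rather than trees.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a TREE_CHAIN-linked list of parameter types.
struct ParamNode
{
  std::string type;
  const ParamNode *next;
};

// Hypothetical stand-in for rs6000_builtin_type_compatible; here two
// types are "compatible" only when their names are identical.
bool
type_compatible (const std::string &arg, const std::string &parm)
{
  return arg == parm;
}

// Return true if the first NARGS entries of TYPES are compatible with
// the first NARGS parameters in PARMS.  Stops at the first mismatch.
bool
args_match (const std::string *types, const ParamNode *parms, int nargs)
{
  for (int i = 0; i < nargs; i++)
    {
      // A parameter list shorter than NARGS cannot match.
      if (!parms || !type_compatible (types[i], parms->type))
	return false;
      parms = parms->next;
    }
  return true;
}
```

With a three-parameter list, a caller that passes nargs == 3 has every
argument checked, whereas the old two-argument form would have ignored
the third.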



[PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers

2023-07-21 Thread Lewis Hyatt via Gcc-patches
Add a new linemap reason LC_GEN which enables encoding the location of data
that was generated during compilation and does not appear in any source file.
There could be many use cases, such as, for instance, referring to the content
of builtin macros (not yet implemented, but an easy lift after this one.) The
first intended application is to create a place to store the input to a
_Pragma directive, so that proper locations can be assigned to those
tokens. This will be done in a subsequent commit.

The actual change needed to the line-maps API in libcpp is not too large and
requires no space overhead in the line map data structures (on 64-bit systems
that is; one newly added data member to class line_map_ordinary sits inside
former padding bytes.) An LC_GEN map is just an ordinary map like any other,
but the TO_FILE member that normally points to the file name points instead to
the actual data.  This works automatically with PCH as well, for the same
reason that the file name makes its way into a PCH.  In order to avoid
confusion, the member has been renamed from TO_FILE to DATA, and associated
accessors adjusted.

Outside libcpp, there are many small changes but most of them are to
selftests, which are necessarily more sensitive to implementation
details. From the perspective of the user (the "user", here, being a frontend
using line maps or else the diagnostics infrastructure), the chief visible
change is that the function location_get_source_line() should be passed an
expanded_location object instead of a separate filename and line number.  This
is not a big change because in most cases, this information came anyway from a
call to expand_location and the needed expanded_location object is readily
available. The new overload of location_get_source_line() uses the extra
information in the expanded_location object to obtain the data from the
in-memory buffer when it originated from an LC_GEN map.

Until the subsequent patch that starts using LC_GEN maps, none are yet
generated within GCC, hence nothing is added to the testsuite here; but all
relevant selftests have been extended to cover generated data maps in addition
to normal files.
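
As a rough illustration of the idea described above (not the code in
this patch), a location that carries an in-memory buffer can be served
by a source-line lookup without touching the filesystem.  The sketch
below is standalone C++ under stated assumptions: SketchLocation,
line_from_buffer and get_source_line are hypothetical names, the field
names generated_data and generated_data_len merely mirror those used
elsewhere in this series, and the ordinary file-reading path is omitted.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified analogue of an expanded location.
struct SketchLocation
{
  const char *file;		// File name, or null for generated data.
  const char *generated_data;	// In-memory buffer, or null for a file.
  unsigned generated_data_len;	// Length of the buffer in bytes.
  int line;			// 1-based line number.
};

// Return line number LINE (1-based) of the LEN-byte buffer DATA, without
// its trailing newline, or an empty string if the buffer has fewer lines.
std::string
line_from_buffer (const char *data, std::size_t len, int line)
{
  std::size_t start = 0;
  for (int cur = 1; cur < line; cur++)
    {
      while (start < len && data[start] != '\n')
	start++;
      if (start == len)
	return std::string ();
      start++;			// Step over the newline.
    }
  std::size_t end = start;
  while (end < len && data[end] != '\n')
    end++;
  return std::string (data + start, end - start);
}

// Dispatch on the kind of location.  Reading from a real file is left
// out of this sketch, so file-backed locations yield an empty string.
std::string
get_source_line (const SketchLocation &loc)
{
  if (loc.generated_data)
    return line_from_buffer (loc.generated_data, loc.generated_data_len,
			     loc.line);
  return std::string ();
}
```

The point of passing the whole location object rather than a separate
file name and line number is that the callee has enough information to
choose between the two sources.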

libcpp/ChangeLog:

* include/line-map.h (enum lc_reason): Add LC_GEN.
(struct line_map_ordinary): Add new members to support LC_GEN concept.
(ORDINARY_MAP_FILE_NAME): Assert that map really does encode a file
and not generated data.
(ORDINARY_MAP_GENERATED_DATA_P): New function.
(ORDINARY_MAP_GENERATED_DATA): New function.
(ORDINARY_MAP_GENERATED_DATA_LEN): New function.
(ORDINARY_MAP_FILE_NAME_OR_DATA): New function.
(ORDINARY_MAPS_SAME_FILE_P): Declare new function.
(ORDINARY_MAP_CONTAINING_FILE_NAME): Declare new function.
(LINEMAP_FILE): This was always a synonym for ORDINARY_MAP_FILE_NAME;
make this explicit.
(linemap_get_file_highest_location): Adjust prototype.
(linemap_add): Adjust prototype.
(class expanded_location): Add new members to store generated content.
* line-map.cc (ORDINARY_MAP_CONTAINING_FILE_NAME): New function.
(ORDINARY_MAPS_SAME_FILE_P): New function.
(linemap_add): Add new argument DATA_LEN. Support generated data in
LC_GEN maps.
(linemap_check_files_exited): Adapt to API changes supporting LC_GEN.
(linemap_line_start): Likewise.
(linemap_position_for_loc_and_offset): Likewise.
(linemap_get_expansion_filename): Likewise.
(linemap_expand_location): Likewise.
(linemap_dump): Likewise.
(linemap_dump_location): Likewise.
(linemap_get_file_highest_location): Likewise.
* directives.cc (_cpp_do_file_change): Likewise.

gcc/ChangeLog:

* diagnostic-show-locus.cc (make_range): Initialize new fields in
expanded_location.
(compatible_locations_p): Use new ORDINARY_MAPS_SAME_FILE_P ()
function.
(layout::calculate_x_offset_display): Use the new expanded_location
overload of location_get_source_line(), so as to support LC_GEN maps.
(layout::print_line): Likewise.
(source_line::source_line): Likewise.
(line_corrections::add_hint): Likewise.
(class line_corrections): Store the location as an exploc rather than
individual filename, so as to support LC_GEN maps.
(layout::print_trailing_fixits): Use the new exploc constructor for
class line_corrections.
(test_layout_x_offset_display_utf8): Test LC_GEN maps as well as normal.
(test_layout_x_offset_display_tab): Likewise.
(test_diagnostic_show_locus_one_liner): Likewise.
(test_diagnostic_show_locus_one_liner_utf8): Likewise.
(test_add_location_if_nearby): Likewise.
(test_diagnostic_show_locus_fixit_lines): Likewise.
(test_fixit_consolidation): Likewise.
(test_overlapped_fixit_printing): Likewise.

[PATCH v3 3/4] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings

2023-07-21 Thread Lewis Hyatt via Gcc-patches
Currently, the tokens obtained from a destringified _Pragma string do not get
assigned proper locations while they are being lexed.  After the tokens have
been obtained, they are reassigned the same location as the _Pragma token,
which is sufficient to make things like _Pragma("GCC diagnostic ignored...")
operate correctly, but this still results in inferior diagnostics, since the
diagnostics do not point to the problematic tokens.  Further, if a diagnostic
is issued by libcpp during the lexing of the tokens, as opposed to being
issued by the frontend during the processing of the pragma, then the
patched-up location is not yet in place, and the user rather sees an invalid
location that is near to the location of the _Pragma string in some cases, or
potentially very far away, depending on the macro expansion history.  For
example:

=
_Pragma("GCC diagnostic ignored \"oops")
=

produces the diagnostic:

file.cpp:1:24: warning: missing terminating " character
1 | _Pragma("GCC diagnostic ignored \"oops")
  |^

with the caret in a nonsensical location, while this one:

=
 #define S "GCC diagnostic ignored \"oops"
_Pragma(S)
=

produces:

file.cpp:2:24: warning: missing terminating " character
2 | _Pragma(S)
  |^

with both the caret in a nonsensical location, and the actual relevant context
completely absent.

Fix this by assigning proper locations using the new LC_GEN type of linemap.
Now the tokens are given locations inside a generated content buffer, and the
macro expansion stack is modified to be aware that these tokens logically
belong to the "expansion" of the _Pragma directive. For the above examples we
now output:

==
In buffer generated from file.cpp:1:
:1:24: warning: missing terminating " character
1 | GCC diagnostic ignored "oops
  |^
file.cpp:1:1: note: in <_Pragma directive>
1 | _Pragma("GCC diagnostic ignored \"oops")
  | ^~~
==

and

==
:1:24: warning: missing terminating " character
1 | GCC diagnostic ignored "oops
  |^
file.cpp:2:1: note: in <_Pragma directive>
2 | _Pragma(S)
  | ^~~
==

So that carets are pointing to something meaningful and all relevant context
appears in the diagnostic.  For the second example, it would be nice if the
macro expansion also output "in expansion of macro S", however doing that for
a general case of macro expansions makes the logic very complicated, since it
has to be done after the fact when the macro maps have already been
constructed.  It doesn't seem worth it for this case, given that the _Pragma
string has already been output once on the first line.
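
To make the column numbers in the examples above easier to check, here
is a minimal standalone sketch of destringizing an ordinary (non-raw,
unprefixed) _Pragma string literal: the enclosing quotes are dropped,
\" becomes " and \\ becomes \.  The function name destringize is a
hypothetical helper for illustration only; it is not libcpp's
destringize_and_run and it does not handle raw strings or encoding
prefixes.

```cpp
#include <cstddef>
#include <string>

// Destringize LITERAL, which is assumed to begin and end with a double
// quote.  Only the \" and \\ escapes are rewritten; everything else is
// copied through unchanged.
std::string
destringize (const std::string &literal)
{
  std::string out;
  if (literal.size () < 2)
    return out;
  // Walk the characters strictly between the enclosing quotes.
  for (std::size_t i = 1; i + 1 < literal.size (); i++)
    {
      char c = literal[i];
      if (c == '\\' && i + 2 < literal.size ()
	  && (literal[i + 1] == '"' || literal[i + 1] == '\\'))
	c = literal[++i];	// Keep the escaped character only.
      out += c;
    }
  return out;
}
```

Applied to the literal from the first example, this yields
GCC diagnostic ignored "oops, in which the unterminated quote sits at
column 24 of the generated buffer, matching the caret column reported
in the new diagnostic.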

gcc/ChangeLog:

* tree-diagnostic.cc (maybe_unwind_expanded_macro_loc): Add awareness
of _Pragma directive to the macro expansion trace.

libcpp/ChangeLog:

* directives.cc (get_token_no_padding): Add argument to receive the
virtual location of the token.
(get__Pragma_string): Likewise.
(do_pragma): Set pfile->directive_result->src_loc properly, it should
not be a virtual location.
(destringize_and_run): Update to provide proper locations for the
_Pragma string tokens.  Support raw strings.
(_cpp_do__Pragma): Adapt to changes to the helper functions.
* errors.cc (cpp_diagnostic_at): Support
cpp_reader::diagnostic_rebase_loc.
(cpp_diagnostic_with_line): Likewise.
* include/line-map.h (class rich_location): Add new member
forget_cached_expanded_locations().
* internal.h (struct _cpp__Pragma_state): Define new struct.
(_cpp_rebase_diagnostic_location): Declare new function.
(struct cpp_reader): Add diagnostic_rebase_loc member.
(_cpp_push__Pragma_token_context): Declare new function.
(_cpp_do__Pragma): Adjust prototype.
* macro.cc (pragma_str): New static var.
(builtin_macro): Adapt to new implementation of _Pragma processing.
(_cpp_pop_context): Fix the logic for resetting
pfile->top_most_macro_node, which previously was never triggered,
although the error seems to have been harmless.
(_cpp_push__Pragma_token_context): New function.
(_cpp_rebase_diagnostic_location): New function.

gcc/c-family/ChangeLog:

* c-ppoutput.cc (token_streamer::stream): Pass the virtual location of
the _Pragma token to maybe_print_line(), not the spelling location.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Adjust for new
macro tracking output for _Pragma directives.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/cpp/diagnostic-pragma-1.c: Adjust for new macro
tracking output for _Pragma directives.
* c-c++-common/cpp/pr57580.c: Likewise.
* c-c++-common/gomp/pragma-3.c: Likewise.

[PATCH v3 4/4] diagnostics: Support generated data locations in SARIF output

2023-07-21 Thread Lewis Hyatt via Gcc-patches
The diagnostics routines for SARIF output need to read the source code back
in, so that they can generate "snippet" and "content" records, so they need to
be able to cope with generated data locations.  Add support for that in
diagnostic-format-sarif.cc.

gcc/ChangeLog:

* diagnostic-format-sarif.cc (sarif_builder::xloc_to_fb): New function.
(sarif_builder::maybe_make_physical_location_object): Support
generated data locations.
(sarif_builder::make_artifact_location_object): Likewise.
(sarif_builder::maybe_make_region_object_for_context): Likewise.
(sarif_builder::make_artifact_object): Likewise.
(sarif_builder::maybe_make_artifact_content_object): Likewise.
(get_source_lines): Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/diagnostic-format-sarif-file-5.c: New test.
---
 gcc/diagnostic-format-sarif.cc| 115 +++---
 .../diagnostic-format-sarif-file-5.c  |  31 +
 2 files changed, 99 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c

diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 5e483988027..29f614124b2 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -173,7 +173,10 @@ private:
   json::array *maybe_make_kinds_array (diagnostic_event::meaning m) const;
   json::object *maybe_make_physical_location_object (location_t loc);
   json::object *make_artifact_location_object (location_t loc);
-  json::object *make_artifact_location_object (const char *filename);
+
+  typedef std::pair filename_or_buffer;
+  json::object *make_artifact_location_object (filename_or_buffer fb);
+
   json::object *make_artifact_location_object_for_pwd () const;
   json::object *maybe_make_region_object (location_t loc) const;
   json::object *maybe_make_region_object_for_context (location_t loc) const;
@@ -196,16 +199,17 @@ private:
   json::object *make_reporting_descriptor_object_for_cwe_id (int cwe_id) const;
   json::object *
   make_reporting_descriptor_reference_object_for_cwe_id (int cwe_id);
-  json::object *make_artifact_object (const char *filename);
-  json::object *maybe_make_artifact_content_object (const char *filename) 
const;
-  json::object *maybe_make_artifact_content_object (const char *filename,
-   int start_line,
+  json::object *make_artifact_object (filename_or_buffer fb);
+  json::object *
+  maybe_make_artifact_content_object (filename_or_buffer fb) const;
+  json::object *maybe_make_artifact_content_object (expanded_location xloc,
int end_line) const;
   json::object *make_fix_object (const rich_location _loc);
   json::object *make_artifact_change_object (const rich_location );
   json::object *make_replacement_object (const fixit_hint ) const;
   json::object *make_artifact_content_object (const char *text) const;
   int get_sarif_column (expanded_location exploc) const;
+  static filename_or_buffer xloc_to_fb (expanded_location xloc);
 
   diagnostic_context *m_context;
 
@@ -219,7 +223,11 @@ private:
  diagnostic group.  */
   sarif_result *m_cur_group_result;
 
-  hash_set  m_filenames;
+  /* If the second member is >0, then this is a buffer of generated content,
+ with that length, not a filename.  */
+  hash_set ,
+  int_hash  >
+   > m_filenames;
   bool m_seen_any_relative_paths;
   hash_set  m_rule_id_set;
   json::array *m_rules_arr;
@@ -749,6 +757,15 @@ sarif_builder::make_location_object (const 
diagnostic_event )
   return location_obj;
 }
 
+/* Populate a filename_or_buffer pair from an expanded location.  */
+sarif_builder::filename_or_buffer
+sarif_builder::xloc_to_fb (expanded_location xloc)
+{
+  if (xloc.generated_data_len)
+return filename_or_buffer (xloc.generated_data, xloc.generated_data_len);
+  return filename_or_buffer (xloc.file, 0);
+}
+
 /* Make a physicalLocation object (SARIF v2.1.0 section 3.29) for LOC,
or return NULL;
Add any filename to the m_artifacts.  */
@@ -764,7 +781,7 @@ sarif_builder::maybe_make_physical_location_object 
(location_t loc)
   /* "artifactLocation" property (SARIF v2.1.0 section 3.29.3).  */
   json::object *artifact_loc_obj = make_artifact_location_object (loc);
   phys_loc_obj->set ("artifactLocation", artifact_loc_obj);
-  m_filenames.add (LOCATION_FILE (loc));
+  m_filenames.add (xloc_to_fb (expand_location (loc)));
 
   /* "region" property (SARIF v2.1.0 section 3.29.4).  */
   if (json::object *region_obj = maybe_make_region_object (loc))
@@ -788,7 +805,7 @@ sarif_builder::maybe_make_physical_location_object 
(location_t loc)
 json::object *
 sarif_builder::make_artifact_location_object (location_t loc)
 {
-  return make_artifact_location_object (LOCATION_FILE (loc));
+  return make_artifact_location_object (xloc_to_fb (expand_location (loc)));
 }
 
 /* The 

[PATCH v3 2/4] diagnostics: Handle generated data locations in edit_context

2023-07-21 Thread Lewis Hyatt via Gcc-patches
Class edit_context handles outputting fixit hints in diff form that could be
manually or automatically applied by the user. This will not make sense for
generated data locations, such as the contents of a _Pragma string, because
the text to be modified does not appear in the user's input files. We do not
currently ever generate fixit hints in such a context, but for future-proofing
purposes, ignore such locations in edit context now.

gcc/ChangeLog:

* edit-context.cc (edit_context::apply_fixit): Ignore locations in
generated data.
---
 gcc/edit-context.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/edit-context.cc b/gcc/edit-context.cc
index 6f5bc6b9d8f..ae11b6f2e00 100644
--- a/gcc/edit-context.cc
+++ b/gcc/edit-context.cc
@@ -301,8 +301,12 @@ edit_context::apply_fixit (const fixit_hint *hint)
 return false;
   if (start.column == 0)
 return false;
+  if (start.generated_data)
+return false;
   if (next_loc.column == 0)
 return false;
+  if (next_loc.generated_data)
+return false;
 
   edited_file  = get_or_insert_file (start.file);
   if (!m_valid)


[PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens

2023-07-21 Thread Lewis Hyatt via Gcc-patches
Hello-

This is an update to the v2 patch series last sent in January:
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609473.html

While I did not receive any feedback on the v2 patches yet, they did need some
rebasing on top of other recent commits to input.cc, so I thought it would be
helpful to send them again now. The patches have not otherwise changed from
v2, and the above-linked message explains how all the patches fit in with the
original v1 series sent last November.

Dave, I would appreciate it very much if you could please let me know what you
think of this approach? I feel like the diagnostics we currently
output for _Pragmas are worth improving. As a reminder, say for this example:

=
 #define S "GCC diagnostic ignored \"oops"
 _Pragma(S)
=

We currently output:

=
file.cpp:2:24: warning: missing terminating " character
2 | _Pragma(S)
  |^
=

While after these patches, we would output:

==
:1:24: warning: missing terminating " character
1 | GCC diagnostic ignored "oops
  |^
file.cpp:2:1: note: in <_Pragma directive>
2 | _Pragma(S)
  | ^~~
==

Thanks!

-Lewis


[PATCH] c++: fix ICE with constexpr ARRAY_REF [PR110382]

2023-07-21 Thread Marek Polacek via Gcc-patches
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/13?

-- >8 --

This code in cxx_eval_array_reference has been hard to get right.
In r12-2304 I added some code; in r13-5693 I removed some of it.

Here the problematic line is "S s = arr[0];" which causes a crash
on the assert in verify_ctor_sanity:

  gcc_assert (!ctx->object || !DECL_P (ctx->object)
  || ctx->global->get_value (ctx->object) == ctx->ctor);

ctx->object is the VAR_DECL 's', which is correct here.  The second
line points to the problem: we replaced ctx->ctor in
cxx_eval_array_reference:

  new_ctx.ctor = build_constructor (elem_type, NULL); // #1

which I think we shouldn't have; the CONSTRUCTOR we created in
cxx_eval_constant_expression/DECL_EXPR

  new_ctx.ctor = build_constructor (TREE_TYPE (r), NULL);

had the right type.

We still need #1 though.  E.g., in constexpr-96241.C, we never
set ctx.ctor/object before calling cxx_eval_array_reference, so
we have to build a CONSTRUCTOR there.  And in constexpr-101371-2.C
we have a ctx.ctor, but it has the wrong type, so we need a new one.

PR c++/110382

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_array_reference): Create a new constructor
only when we don't already have a matching one.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-110382.C: New test.
---
 gcc/cp/constexpr.cc   |  5 -
 gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C | 17 +
 2 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index fb94f3cefcb..518b7c7a2d5 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -4291,7 +4291,10 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree 
t,
   else
 val = build_value_init (elem_type, tf_warning_or_error);
 
-  if (!SCALAR_TYPE_P (elem_type))
+  if (!SCALAR_TYPE_P (elem_type)
+  /* Create a new constructor only if we don't already have one that
+is suitable.  */
+  && !(ctx->ctor && same_type_p (elem_type, TREE_TYPE (ctx->ctor))))
 {
   new_ctx = *ctx;
   new_ctx.ctor = build_constructor (elem_type, NULL);
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C 
b/gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C
new file mode 100644
index 000..317c5ecfcd5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C
@@ -0,0 +1,17 @@
+// PR c++/110382
+// { dg-do compile { target c++14 } }
+
+struct S {
+  double a = 0;
+};
+
+constexpr double
+g ()
+{
+  S arr[1];
+  S s = arr[0];
+  (void) arr[0];
+  return s.a;
+}
+
+int main() { return  g (); }

base-commit: 87516efcbe28884c39a8c68e600d11cc91ed96c7
-- 
2.41.0



Re: [WIP RFC] analyzer: Add optional trim of the analyzer diagnostics going too deep [PR110543]

2023-07-21 Thread David Malcolm via Gcc-patches
On Fri, 2023-07-21 at 17:35 +0200, Benjamin Priour wrote:
> Hi,
> 
> At David's request, I've included the in-progress patch in the email
> below.
> I hope it makes more sense now.
> 
> Best,
> Benjamin.

Thanks for posting the work-in-progress patch; it makes the idea
clearer.

Some thoughts about this:

- I like the idea of defaulting to *not* showing events within system
headers, which the patch achieves
- I don't like the combination of never/system with maxdepth, in that
it seems complicated and I don't think a user is likely to experiment
with different depths.
- Hence I think it would work better as a simple boolean, perhaps
  "-fanalyzer-show-events-in-system-headers"
  or somesuch?  It seems like the sort of thing that we want to provide
a sensible default for, but have the option of turning off for
debugging the analyzer itself, but I don't expect an end-user to touch
that option.

FWIW the patch seems to have been mangled somewhat via email, so I
don't have a sense of what the actual output from patched analyzer
looks like.  What should we output to the user with -fanalyzer and no
other options for the case in PR 110543?  Currently, for
https://godbolt.org/z/sb9dM9Gqa trunk emits 12 events, of which
probably only this last one is useful:

  (12) dereference of NULL 'a.std::__shared_ptr_access::operator->()'

What does the output look like with your patch?

Thanks
Dave
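
The simple-boolean variant suggested above could behave roughly like the
following standalone sketch.  It is only an illustration under stated
assumptions: PathEvent, its in_system_header field and events_to_show
are hypothetical names, not the analyzer's actual classes, and the
boolean stands in for an option along the lines of
-fanalyzer-show-events-in-system-headers.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of one event in a diagnostic path.
struct PathEvent
{
  std::string description;
  bool in_system_header;
};

// Return the events that would be displayed.  The analysis itself is
// unaffected; only the presentation of the path is trimmed.
std::vector<PathEvent>
events_to_show (const std::vector<PathEvent> &path, bool show_system_events)
{
  std::vector<PathEvent> shown;
  for (const PathEvent &ev : path)
    if (show_system_events || !ev.in_system_header)
      shown.push_back (ev);
  return shown;
}
```

With the flag off by default, a path that mostly walks through library
internals collapses to the events located in user code.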





> 
> -- Forwarded message -
> From: Benjamin Priour 
> Date: Tue, Jul 18, 2023 at 3:30 PM
> Subject: [RFC] analyzer: Add optional trim of the analyzer
> diagnostics
> going too deep [PR110543]
> To: , David Malcolm 
> 
> 
> Hi,
> 
> I'd like to request comments on a patch I am writing for PR110543.
> The goal of this patch is to reduce the noise of the analyzer emitted
> diagnostics when dealing with
> system headers, or simply diagnostic paths that are too long. The new
> option only affects the display
> of the diagnostics, but doesn't hinder the actual analysis.
> 
> I've defaulted the new option to "system", thus preventing the
> diagnostic
> paths from showing system headers.
> "never" corresponds to the pre-patch behavior, whereas you can also
> specify
> an unsigned value 
> that prevents paths to go deeper than  frames.
> 
> fanalyzer-trim-diagnostics=
> > Common Joined RejectNegative ToLower
> > Var(flag_analyzer_trim_diagnostics)
> > Init("system")
> > -fanalyzer-trim-diagnostics=[never|system|] Trim
> > diagnostics
> > path that are too long before emission.
> > 
> 
> Does it sound reasonable and user-friendly?
> 
> Regstrapping was a success against trunk, although one of the newly
> added
> test case fails for c++14.
> Note that the test case below was done with "never", thus behaves
> exactly
> as the pre-patch analyzer
> on x86_64-linux-gnu.
> 
> /* { dg-additional-options "-fdiagnostics-plain-output
> > -fdiagnostics-path-format=inline-events -fanalyzer-trim-
> > diagnostics=never"
> > } */
> > /* { dg-skip-if "" { c++98_only }  } */
> > 
> > #include <memory>
> > struct A {int x; int y;};
> > 
> > int main () {
> >   std::shared_ptr<A> a;
> >   a->x = 4; /* { dg-line deref_a } */
> >   /* { dg-warning "dereference of NULL" "" { target *-*-* } deref_a
> > } */
> > 
> >   return 0;
> > }
> > 
> > /* { dg-begin-multiline-output "" }
> >   'int main()': events 1-2
> >     |
> >     |
> >     +--> 'std::__shared_ptr_access<_Tp, _Lp, ,
> > 
> > > ::element_type* std::__shared_ptr_access<_Tp, _Lp, ,
> >  >::operator->() const [with _Tp = A;
> > __gnu_cxx::_Lock_policy
> > _Lp = __gnu_cxx::_S_atomic; bool  = false; bool
> >  =
> > false]': events 3-4
> >    |
> >    |
> >    +--> 'std::__shared_ptr_access<_Tp, _Lp, ,
> >  >::element_type* std::__shared_ptr_access<_Tp, _Lp,
> > ,  >::_M_get() const [with _Tp = A;
> > __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic; bool
> >  =
> > false; bool  = false]': events 5-6
> >   |
> >   |
> >   +--> 'std::__shared_ptr<_Tp, _Lp>::element_type*
> > std::__shared_ptr<_Tp, _Lp>::get() const [with _Tp = A;
> > __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]': events 7-8
> >  |
> >  |
> >   <--+
> >   |
> >     'std::__shared_ptr_access<_Tp, _Lp, ,
> >  >::element_type* std::__shared_ptr_access<_Tp, _Lp,
> > ,  >::_M_get() const [with _Tp = A;
> > __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic; bool
> >  =
> > false; bool  = false]': event 9
> >   |
> >   |
> >    <--+
> >    |
> >  'std::__shared_ptr_access<_Tp, _Lp, ,
> > 
> > > ::element_type* std::__shared_ptr_access<_Tp, _Lp, ,
> >  >::operator->() const [with _Tp = A;
> > __gnu_cxx::_Lock_policy
> > _Lp = __gnu_cxx::_S_atomic; bool  = false; bool
> >  =
> > false]': event 10
> >    |
> >    |
> >     <--+
> >     |
> >   'int main()': events 11-12
> >     |
> >     

Re: [PATCH v4 2/3] c++: Improve constexpr error for dangling local variables [PR110619]

2023-07-21 Thread Jason Merrill via Gcc-patches

On 7/21/23 01:39, Nathaniel Shead wrote:

On Thu, Jul 20, 2023 at 11:46:47AM -0400, Jason Merrill wrote:

On 7/20/23 05:36, Nathaniel Shead wrote:

Currently, when typeck discovers that a return statement will refer to a
local variable it rewrites to return a null pointer. This causes the
error messages for using the return value in a constant expression to be
unhelpful, especially for reference return values.

This patch removes this "optimisation".


This isn't an optimization, it's for safety, removing a way for an attacker
to get a handle on other data on the stack (CWE-562).

But I agree that we need to preserve some element of UB for constexpr
evaluation to see.

Perhaps we want to move this transformation to cp_maybe_instrument_return,
so it happens after maybe_save_constexpr_fundef?


Hm, OK. I can try giving this a go. I guess I should move the entire
maybe_warn_about_returning_address_of_local function to cp-gimplify.cc
to be able to detect this? Or is there a better way of marking that a
return expression will return a reference to a local for this
transformation? (I guess I can't use whether the warning has been
suppressed or not because the warning might not be enabled at all.)


You could use a TREE_LANG_FLAG, looks like none of them are used on 
RETURN_EXPR.



It looks like this warning is raised also by diag_return_locals in
gimple-ssa-isolate-paths, should the transformation also be made here?


Looks like it already is, in warn_return_addr_local:


  tree zero = build_zero_cst (TREE_TYPE (val));
  gimple_return_set_retval (return_stmt, zero);
  update_stmt (return_stmt);


...but, weirdly, only with -fisolate-erroneous-paths-*, even though it 
isn't isolating anything.  Perhaps there should be another flag for this.



I note that the otherwise very similar -Wdangling-pointer warning
doesn't do this transformation either, should that also be something I
look into fixing here?


With that same flag, perhaps.  I wonder if it would make sense to remove 
the isolate-paths handling of locals in favor of the dangling-pointer 
handling?  I don't know either file much at all.


Jason



[PATCH] testsuite: Adjust g++.dg/gomp/pr58567.C to new compiler message

2023-07-21 Thread Thiago Jung Bauermann via Gcc-patches
Commit 92d1425ca780 "c++: redundant targ coercion for var/alias tmpls"
changed the compiler error message in this testcase from

: In instantiation of 'void foo() [with T = int]':
:14:11:   required from here
:8:22: error: 'int' is not a class, struct, or union type
:8:22: error: 'int' is not a class, struct, or union type
:8:22: error: 'int' is not a class, struct, or union type
:8:3: error: expected iteration declaration or initialization
compiler exited with status 1

to:

: In instantiation of 'void foo() [with T = int]':
:14:11:   required from here
:8:22: error: 'int' is not a class, struct, or union type
:8:3: error: invalid type for iteration variable 'i'
compiler exited with status 1
Excess errors:
:8:3: error: invalid type for iteration variable 'i'

Andrew Pinski analysed the issue in PR 110756 and considered that it was a
testsuite issue in that the error message changed slightly.  Also, it's a
better error message.

Therefore, we only need to adjust the testcase to expect the new message.

gcc/testsuite/ChangeLog:
PR testsuite/110756
g++.dg/gomp/pr58567.C: Adjust to new compiler error message.
---
 gcc/testsuite/g++.dg/gomp/pr58567.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/gomp/pr58567.C 
b/gcc/testsuite/g++.dg/gomp/pr58567.C
index 35a5bb027ffe..866d831c65e4 100644
--- a/gcc/testsuite/g++.dg/gomp/pr58567.C
+++ b/gcc/testsuite/g++.dg/gomp/pr58567.C
@@ -5,7 +5,7 @@
 template<typename T> void foo()
 {
   #pragma omp parallel for
-  for (typename T::X i = 0; i < 100; ++i)  /* { dg-error "'int' is not a 
class, struct, or union type|expected iteration declaration or initialization" 
} */
+  for (typename T::X i = 0; i < 100; ++i)  /* { dg-error "'int' is not a 
class, struct, or union type|invalid type for iteration variable 'i'" } */
 ;
 }
 


[PATCH] libstdc++ Add cstdarg to freestanding

2023-07-21 Thread Paul M. Bendixen via Gcc-patches
P1642 includes the header cstdarg to the freestanding implementation.
This was probably left out by accident, this patch puts it in.
Since this is one of the headers that go in whole cloth, there should be no
further actions needed.
This might be related to PR106953, but since that one touches the partial
headers I'm not sure

/Paul M. Bendixen

-- 
• − − •/• −/• • −/• − • •/− • • •/•/− •/− • •/• •/− • • −/•/− •/• − − •−
•/− − •/− −/• −/• •/• − • •/• − • − • −/− • − •/− − −/− −//
From 5584c194927678067e412aeb19f10b9662e398a6 Mon Sep 17 00:00:00 2001
From: "Paul M. Bendixen" 
Date: Fri, 21 Jul 2023 22:04:23 +0200
Subject: [PATCH] libstdc++: Include cstdarg in freestanding

P1642 includes cstdarg in the full headers to include. Include it.

Signed-off-by: Paul M. Bendixen 
---
 libstdc++-v3/include/Makefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 0ff875b280b..f09f97e2f6b 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -1194,6 +1194,7 @@ c_base_builddir = .
 c_base_freestanding = \
 	${c_base_srcdir}/cfloat \
 	${c_base_srcdir}/climits \
+	${c_base_srcdir}/cstdarg \
 	${c_base_srcdir}/cstddef \
 	${c_base_srcdir}/cstdint \
 	${c_base_srcdir}/cstdlib
@@ -1213,7 +1214,6 @@ c_base_freestanding = \
 @GLIBCXX_HOSTED_TRUE@	${c_base_srcdir}/csetjmp \
 @GLIBCXX_HOSTED_TRUE@	${c_base_srcdir}/csignal \
 @GLIBCXX_HOSTED_TRUE@	${c_base_srcdir}/cstdalign \
-@GLIBCXX_HOSTED_TRUE@	${c_base_srcdir}/cstdarg \
 @GLIBCXX_HOSTED_TRUE@	${c_base_srcdir}/cstdbool \
 @GLIBCXX_HOSTED_TRUE@	${c_base_srcdir}/cstdio \
 @GLIBCXX_HOSTED_TRUE@	${c_base_srcdir}/cstring \
-- 
2.34.1



Re: [PATCH V3] RISC-V: Add TARGET_MIN_VLEN > 4096 check

2023-07-21 Thread Jeff Law via Gcc-patches




On 7/21/23 15:16, Andreas Schwab wrote:

../../gcc/config/riscv/riscv.cc: In function 'void riscv_option_override()':
../../gcc/config/riscv/riscv.cc:6716:7: error: misspelled term 'can not' in 
format; use 'cannot' instead [-Werror=format-diag]
  6716 |   "Current RISC-V GCC can not support VLEN > 4096bit for 'V' 
Extension");
   |   
^
../../gcc/config/riscv/riscv.cc:6716:7: error: unbalanced punctuation character 
'>' in format [-Werror=format-diag]
Thanks.  There's another similar warning with strong accents.  I'll deal 
with both.


jeff


Re: [PATCH V3] RISC-V: Add TARGET_MIN_VLEN > 4096 check

2023-07-21 Thread Andreas Schwab
../../gcc/config/riscv/riscv.cc: In function 'void riscv_option_override()':
../../gcc/config/riscv/riscv.cc:6716:7: error: misspelled term 'can not' in 
format; use 'cannot' instead [-Werror=format-diag]
 6716 |   "Current RISC-V GCC can not support VLEN > 4096bit for 'V' 
Extension");
  |   
^   
../../gcc/config/riscv/riscv.cc:6716:7: error: unbalanced punctuation character 
'>' in format [-Werror=format-diag]

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [C PATCH]: Add Walloc-type to warn about insufficient size in allocations

2023-07-21 Thread Qing Zhao via Gcc-patches


> On Jul 21, 2023, at 7:21 AM, Martin Uecker via Gcc-patches 
>  wrote:
> 
> 
> 
> This patch adds a warning for allocations with insufficient size
> based on the "alloc_size" attribute and the type of the pointer 
> the result is assigned to. While it is theoretically legal to
> assign to the wrong pointer type and cast it to the right type
> later, this almost always indicates an error. Since this catches
> common mistakes and is simple to diagnose, it is suggested to
> add this warning.
> 
> 
> Bootstrapped and regression tested on x86. 
> 
> 
> Martin
> 
> 
> 
> Add option Walloc-type that warns about allocations that have
> insufficient storage for the target type of the pointer the
> storage is assigned to.
> 
> gcc:
>   * doc/invoke.texi: Document -Wstrict-flex-arrays option.

The above should be “Document -Walloc-type option”. -:).

Qing
> 
> gcc/c-family:
> 
>   * c.opt (Walloc-type): New option.
> 
> gcc/c:
>   * c-typeck.cc (convert_for_assignment): Add Walloc-type warning.
> 
> gcc/testsuite:
> 
>   * gcc.dg/Walloc-type-1.c: New test.
> 
> 
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index 4abdc8d0e77..8b9d148582b 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -319,6 +319,10 @@ Walloca
> C ObjC C++ ObjC++ Var(warn_alloca) Warning
> Warn on any use of alloca.
> 
> +Walloc-type
> +C ObjC Var(warn_alloc_type) Warning
> +Warn when allocating insufficient storage for the target type of the
> assigned pointer.
> +
> Walloc-size-larger-than=
> C ObjC C++ LTO ObjC++ Var(warn_alloc_size_limit) Joined Host_Wide_Int
> ByteSize Warning Init(HOST_WIDE_INT_MAX)
> -Walloc-size-larger-than=  Warn for calls to allocation
> functions that
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index 7cf411155c6..2e392f9c952 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -7343,6 +7343,32 @@ convert_for_assignment (location_t location,
> location_t expr_loc, tree type,
>   "request for implicit conversion "
>   "from %qT to %qT not permitted in C++", rhstype,
> type);
> 
> +  /* Warn of new allocations are not big enough for the target
> type.  */
> +  tree fndecl;
> +  if (warn_alloc_type
> +   && TREE_CODE (rhs) == CALL_EXPR
> +   && (fndecl = get_callee_fndecl (rhs)) != NULL_TREE
> +   && DECL_IS_MALLOC (fndecl))
> + {
> +   tree fntype = TREE_TYPE (fndecl);
> +   tree fntypeattrs = TYPE_ATTRIBUTES (fntype);
> +   tree alloc_size = lookup_attribute ("alloc_size",
> fntypeattrs);
> +   if (alloc_size)
> + {
> +   tree args = TREE_VALUE (alloc_size);
> +   int idx = TREE_INT_CST_LOW (TREE_VALUE (args)) - 1;
> +   /* For calloc only use the second argument.  */
> +   if (TREE_CHAIN (args))
> + idx = TREE_INT_CST_LOW (TREE_VALUE (TREE_CHAIN
> (args))) - 1;
> +   tree arg = CALL_EXPR_ARG (rhs, idx);
> +   if (TREE_CODE (arg) == INTEGER_CST
> +   && tree_int_cst_lt (arg, TYPE_SIZE_UNIT (ttl)))
> +  warning_at (location, OPT_Walloc_type, "allocation of
> "
> +  "insufficient size %qE for type %qT with
> "
> +  "size %qE", arg, ttl, TYPE_SIZE_UNIT
> (ttl));
> + }
> + }
> +
>   /* See if the pointers point to incompatible address spaces.  */
>   asl = TYPE_ADDR_SPACE (ttl);
>   asr = TYPE_ADDR_SPACE (ttr);
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 88e3c625030..6869bed64c3 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -8076,6 +8076,15 @@ always leads to a call to another @code{cold}
> function such as wrappers of
> C++ @code{throw} or fatal error reporting functions leading to
> @code{abort}.
> @end table
> 
> +@opindex Wno-alloc-type
> +@opindex Walloc-type
> +@item -Walloc-type
> +Warn about calls to allocation functions decorated with attribute
> +@code{alloc_size} that specify insufficient size for the target type
> of
> +the pointer the result is assigned to, including those to the built-in
> +forms of the functions @code{aligned_alloc}, @code{alloca},
> @code{calloc},
> +@code{malloc}, and @code{realloc}.
> +
> @opindex Wno-alloc-zero
> @opindex Walloc-zero
> @item -Walloc-zero
> diff --git a/gcc/testsuite/gcc.dg/Walloc-type-1.c
> b/gcc/testsuite/gcc.dg/Walloc-type-1.c
> new file mode 100644
> index 000..bc62e5e9aa3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/Walloc-type-1.c
> @@ -0,0 +1,37 @@
> +/* Tests the warnings for insufficient allocation size. 
> +   { dg-do compile }
> + * { dg-options "-Walloc-type" } 
> + * */
> +#include 
> +#include 
> +
> +struct b { int x[10]; };
> +
> +void fo0(void)
> +{
> +struct b *p = malloc(sizeof *p);
> +}
> +
> +void fo1(void)
> +{
> +struct b *p = malloc(sizeof p);  /* { dg-
> warning "allocation of insufficient size" } */
> +}
> +
> +void fo2(void)
> +{
> +struct b *p = 
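
For readers skimming the archive, here is a minimal C sketch of the mistake the proposed warning targets. It is not the remainder of the truncated test file above. Only struct b is taken from the patch; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct b { int x[10]; };

/* The sketch assumes the struct is larger than a pointer, which holds on
   common targets.  */
static_assert (sizeof (struct b) > sizeof (struct b *),
	       "example assumes struct b is larger than a pointer");

/* Correct: sizeof *p is the size of the pointed-to struct.  */
struct b *make_b (void)
{
  struct b *p = malloc (sizeof *p);
  return p;
}

/* The size that the mistaken call malloc (sizeof p) would request:
   only the size of a pointer.  */
size_t pointer_sized_request (void)
{
  struct b *p = NULL;
  return sizeof p;
}

/* The size that malloc (sizeof *p) requests.  */
size_t struct_sized_request (void)
{
  struct b *p = NULL;
  return sizeof *p;
}
```

With the option as described in the patch, assigning the result of malloc (sizeof p) to a struct b * is the case that would be diagnosed, since the constant size argument is smaller than the size of the target type.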

Re: [PATCH 2/1] c++: passing partially inst ttp as ttp [PR110566]

2023-07-21 Thread Jason Merrill via Gcc-patches

On 7/21/23 14:34, Patrick Palka wrote:

(This is a follow-up of
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624951.html)

Bootstrapped and regtested on x86_64-pc-linux-gnu, how does this look?

-- >8 --

The previous fix doesn't work for partially instantiated ttps primarily
because most_general_template doesn't work for them.  This patch fixes
this by giving such ttps a DECL_TEMPLATE_INFO (extending the
r11-734-g2fb595f8348e16 fix) with which we can obtain the original ttp.

This patch additionally makes us be more careful about using the correct
amount of levels from the scope of a ttp argument during
coerce_template_template_parms.

PR c++/110566

gcc/cp/ChangeLog:

* pt.cc (reduce_template_parm_level): Set DECL_TEMPLATE_INFO
on the DECL_TEMPLATE_RESULT of a reduced template template
parameter.
(add_defaults_to_ttp): Also update DECL_TEMPLATE_INFO of the
ttp's DECL_TEMPLATE_RESULT.
(coerce_template_template_parms): Make sure 'scope_args' has
the right amount of levels for the ttp argument.
(most_general_template): Handle template template parameters.

gcc/testsuite/ChangeLog:

* g++.dg/template/ttp39.C: New test.
---
  gcc/cp/pt.cc  | 46 ---
  gcc/testsuite/g++.dg/template/ttp39.C | 16 ++
  2 files changed, 57 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/ttp39.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index e0ed4bc8bbb..be7119dd9a0 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -4570,8 +4570,14 @@ reduce_template_parm_level (tree index, tree type, int 
levels, tree args,
  TYPE_DECL, DECL_NAME (decl), type);
  DECL_TEMPLATE_RESULT (decl) = inner;
  DECL_ARTIFICIAL (inner) = true;
- DECL_TEMPLATE_PARMS (decl) = tsubst_template_parms
-   (DECL_TEMPLATE_PARMS (orig_decl), args, complain);
+ tree parms = tsubst_template_parms (DECL_TEMPLATE_PARMS (orig_decl),
+ args, complain);
+ DECL_TEMPLATE_PARMS (decl) = parms;
+ retrofit_lang_decl (inner);
+ tree orig_inner = DECL_TEMPLATE_RESULT (orig_decl);
+ DECL_TEMPLATE_INFO (inner)
+   = build_template_info (DECL_TI_TEMPLATE (orig_inner),
+  template_parms_to_args (parms));


Should we assert that orig_inner doesn't have its own 
DECL_TEMPLATE_INFO?  I'm wondering if it's possible to reduce the level 
of a TTP more than once.



}
  
/* Attach the TPI to the decl.  */

@@ -7936,6 +7942,19 @@ add_defaults_to_ttp (tree otmpl)
}
  }
  
+  tree oresult = DECL_TEMPLATE_RESULT (otmpl);

+  tree gen_otmpl = DECL_TI_TEMPLATE (oresult);


Hmm, here we're assuming that all TTPs have DECL_TEMPLATE_INFO?


+  tree gen_ntmpl;
+  if (gen_otmpl == otmpl)
+gen_ntmpl = ntmpl;
+  else
+gen_ntmpl = add_defaults_to_ttp (gen_otmpl);
+
+  tree nresult = copy_node (oresult);
+  DECL_TEMPLATE_INFO (nresult) = copy_node (DECL_TEMPLATE_INFO (oresult));
+  DECL_TI_TEMPLATE (nresult) = gen_ntmpl;
+  DECL_TEMPLATE_RESULT (ntmpl) = nresult;
+
hash_map_safe_put (defaulted_ttp_cache, otmpl, ntmpl);
return ntmpl;
  }
@@ -8121,15 +8140,29 @@ coerce_template_template_parms (tree parm_tmpl,
 OUTER_ARGS are not the right outer levels in this case, as they are
 the args we're building up for PARM, and for the coercion we want the
 args for ARG.  If DECL_CONTEXT isn't set for a template template
-parameter, we can assume that it's in the current scope.  In that case
-we might end up adding more levels than needed, but that shouldn't be
-a problem; any args we need to refer to are at the right level.  */
+parameter, we can assume that it's in the current scope.  */
tree ctx = DECL_CONTEXT (arg_tmpl);
if (!ctx && DECL_TEMPLATE_TEMPLATE_PARM_P (arg_tmpl))
ctx = current_scope ();
tree scope_args = NULL_TREE;
if (tree tinfo = get_template_info (ctx))
scope_args = TI_ARGS (tinfo);
+  if (DECL_TEMPLATE_TEMPLATE_PARM_P (arg_tmpl))
+   {
+ int level = TEMPLATE_TYPE_LEVEL (TREE_TYPE (gen_arg_tmpl));
+ int scope_depth = TMPL_ARGS_DEPTH (scope_args);
+ if (scope_depth >= level)
+   /* Only use as many levels from the scope as needed (not
+  including the level of ARG).  */
+   scope_args = strip_innermost_template_args
+ (scope_args, scope_depth - (level - 1));
+
+ /* Add the arguments that appear at the level of ARG.  */
+ tree adj_args = DECL_TI_ARGS (DECL_TEMPLATE_RESULT (arg_tmpl));
+ adj_args = TMPL_ARGS_LEVEL (adj_args, TMPL_ARGS_DEPTH (adj_args) - 1);
+ scope_args = add_to_template_args (scope_args, adj_args);


Maybe we should add an integer parameter to add_to_template_args so we 
can specify 

Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-21 Thread Nathan Sidwell via Gcc-patches

On 7/21/23 10:57, Ben Boeckel wrote:

On Thu, Jul 20, 2023 at 17:00:32 -0400, Nathan Sidwell wrote:

On 7/19/23 20:47, Ben Boeckel wrote:

But it is inhibiting distributed builds because the distributing tool
would need to know:

- what CMIs are actually imported (here, "read the module mapper file"
(in CMake's case, this is only the modules that are needed; a single
massive mapper file for an entire project would have extra entries) or
"act as a proxy for the socket/program specified" for other
approaches);


This information is in the machine (& human) README section of the CMI.


Ok. That leaves it up to distributing build tools to figure out at
least.


- read the CMIs as it sends to the remote side to gather any other CMIs
that may be needed (recursively);

Contrast this with the MSVC and Clang (17+) mechanism where the command
line contains everything that is needed and a single bolus can be sent.


um, the build system needs to create that command line? Where does the build
system get that information?  IIUC it'll need to read some file(s) to do that.


It's chained through the P1689 information in the collator as needed. No
extra files need to be read (at least with CMake's approach); certainly
not CMI files.


It occurs to me that the model I am envisioning is similar to CMake's object 
libraries.  Object libraries are a convenient name for a bunch of object files. 
IIUC they're linked by naming the individual object files (or I think they could
be implemented as a static lib linked with --whole-archive path/to/libfoo.a
--no-whole-archive).  But for this conversation consider them a bunch of separate
object files with a convenient group name.


Consider also that object libraries could themselves contain object libraries (I
don't know if they can, but it seems like a useful concept).  Then one could
create an object library from a collection of object files and object libraries
(recursively).  CMake would handle the transitive graph.


Now, allow an object library to itself have some kind of tangible, on-disk 
representation.  *BUT* not like a static library -- it doesn't include the 
object files.



Now that immediately maps onto modules.

CMI: Object library
Direct imports: Direct object libraries of an object library

This is why I don't understand the need to explicitly indicate the indirect
imports of a CMI.  CMake knows them, because it knows the graph.
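
To make the "it knows the graph" point concrete, here is a rough C sketch of deriving the indirect imports from the direct ones alone. The four-module graph, the module names in the comments, and the function names are all hypothetical; this is not how CMake or GCC represent anything internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_MODULES 4

/* direct[i][j] is true when module i directly imports module j.
   Hypothetical graph: app -> gui, gui -> core and log, log -> core.  */
static const bool direct[N_MODULES][N_MODULES] = {
  /* 0: app  */ { false, true,  false, false },
  /* 1: gui  */ { false, false, true,  true  },
  /* 2: core */ { false, false, false, false },
  /* 3: log  */ { false, false, true,  false },
};

/* Depth-first walk marking every module reachable from M.  */
static void visit (size_t m, bool seen[N_MODULES])
{
  for (size_t j = 0; j < N_MODULES; ++j)
    if (direct[m][j] && !seen[j])
      {
	seen[j] = true;
	visit (j, seen);
      }
}

/* Number of CMIs (direct and indirect) needed to compile ROOT.  */
size_t count_needed_cmis (size_t root)
{
  assert (root < N_MODULES);
  bool seen[N_MODULES] = { false };
  visit (root, seen);

  size_t n = 0;
  for (size_t j = 0; j < N_MODULES; ++j)
    n += seen[j];
  return n;
}
```

Only the direct edges are stored; the indirect ones fall out of the walk, which is the property being argued for here.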





And relocatable is probably fine. How does it interact with reproducible
builds? Or are GCC CMIs not really something anyone should consider for
installation (even as a "here, maybe this can help consumers"
mechanism)?


Module CMIs should be considered a cacheable artifact.  They are neither object
files nor source files.


Sure, cachable sounds fine. What about the installation?

--Ben


--
Nathan Sidwell



Re: [PATCH] analyzer: Add support of placement new and improved operator new [PR105948]

2023-07-21 Thread David Malcolm via Gcc-patches
On Thu, 2023-07-06 at 16:43 +0200, priour...@gmail.com wrote:
> As per David's suggestion.
> - Improved leading comment of "is_placement_new_p"
> - "kf_operator_new::matches_call_types_p" now checks that arg 0 is of
>   integral type and that arg 1, if any, is of pointer type.
> - Changed ambiguous "int" to "int8_t" and "int64_t" in placement-new-
> size.C
>   to trigger a target independent out-of-bounds warning.
>   Other OOB tests were not based on the size of types, but on the
> number
>   elements, so them using "int" didn't lead to any ambiguity.
> 
> contrib/check_GNU_style.sh still complains about a space before
> square
> brackets in string "operator new []", but as before, this one space
> is
> mandatory for a correct recognition of the function.
> 
> Changes successfully regstrapped on x86_64-linux-gnu against trunk
> 3c776fdf1a8.
> 
> Is it OK for trunk ?
> Thanks again,
> Benjamin.

Hi Benjamin, thanks for the updated patch.

As before, this looks close to being ready, but I have some further
comments:

[...snip...]

> diff --git a/gcc/analyzer/kf-lang-cp.cc b/gcc/analyzer/kf-lang-cp.cc
> index 393b4f25e79..ef057da863f 100644
> --- a/gcc/analyzer/kf-lang-cp.cc
> +++ b/gcc/analyzer/kf-lang-cp.cc
> @@ -35,6 +35,49 @@ along with GCC; see the file COPYING3.  If not see
>  
>  #if ENABLE_ANALYZER
>  
> +/* Return true if CALL is a non-allocating operator new or operator new []
> +  that contains no user-defined args, i.e. having any signature of:
> +
> +- void* operator new (std::size_t count, void* ptr);
> +- void* operator new[] (std::size_t count, void* ptr);
> +
> +  See https://en.cppreference.com/w/cpp/memory/new/operator_new.  */
> +
> +bool is_placement_new_p (const gcall *call)
> +{
> +  gcc_assert (call);
> +
> +  tree fndecl = gimple_call_fndecl (call);
> +  if (!fndecl)
> +return false;
> +
> +  if (!is_named_call_p (fndecl, "operator new", call, 2)
> +&& !is_named_call_p (fndecl, "operator new []", call, 2))
> +return false;
> +  tree arg1 = gimple_call_arg (call, 1);
> +
> +  if (!POINTER_TYPE_P (TREE_TYPE (arg1)))
> +return false;
> +
> +  /* We must distinguish between an allocating non-throwing new
> +and a non-allocating new.
> +
> +The former might have one of the following signatures :
> +void* operator new (std::size_t count, const std::nothrow_t& tag);
> +void* operator new[] (std::size_t count, const std::nothrow_t& tag);
> +
> +However, debugging has shown that TAG is actually a POINTER_TYPE,
> +not a REFERENCE_TYPE.
> +
> +Thus, we cannot easily differentiate the types, but we instead have to
> +check if the second argument's type identifies as nothrow_t.  */
> +  tree identifier = TYPE_IDENTIFIER (TREE_TYPE (TREE_TYPE (arg1)));
> +  if (!identifier)
> +return true;
> +  const char *name = IDENTIFIER_POINTER (identifier);
> +  return 0 != strcmp (name, "nothrow_t");
> +}
> +

If we're looking for a simple "void *", wouldn't it be simpler and
cleaner to check for arg1 being a pointer to a type that's VOID_TYPE_P,
rather than this name comparison?

[...snip...]

> diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
> index a8c63eb1ce8..41c313c07dd 100644
> --- a/gcc/analyzer/sm-malloc.cc
> +++ b/gcc/analyzer/sm-malloc.cc
> @@ -754,7 +754,7 @@ public:
>  override
>{
>  if (change.m_old_state == m_sm.get_start_state ()
> - && unchecked_p (change.m_new_state))
> + && (unchecked_p (change.m_new_state) || nonnull_p (change.m_new_state)))
>// TODO: verify that it's the allocation stmt, not a copy
>return label_text::borrow ("allocated here");
>  if (unchecked_p (change.m_old_state)
> @@ -1910,11 +1910,16 @@ malloc_state_machine::on_stmt (sm_context *sm_ctxt,
>   return true;
> }
>  
> - if (is_named_call_p (callee_fndecl, "operator new", call, 1))
> -   on_allocator_call (sm_ctxt, call, _scalar_delete);
> - else if (is_named_call_p (callee_fndecl, "operator new []", call, 1))
> -   on_allocator_call (sm_ctxt, call, _vector_delete);
> - else if (is_named_call_p (callee_fndecl, "operator delete", call, 1)
> + if (!is_placement_new_p (call))
> +   {
> +  bool returns_nonnull = !TREE_NOTHROW (callee_fndecl) && flag_exceptions;
> +  if (is_named_call_p (callee_fndecl, "operator new"))
> +on_allocator_call (sm_ctxt, call, _scalar_delete, returns_nonnull);
> +  else if (is_named_call_p (callee_fndecl, "operator new []"))
> +on_allocator_call (sm_ctxt, call, _vector_delete, returns_nonnull);
> +   }
> +
> + if (is_named_call_p (callee_fndecl, "operator delete", call, 1)
>|| is_named_call_p (callee_fndecl, "operator delete", call, 2))
> {
>   on_deallocator_call (sm_ctxt, node, call,

It looks like something's gone wrong with the indentation in the above:
previously we had tab characters, but now I'm seeing a pair of spaces,
which means this wouldn't line up properly.  This might be a 

Re: [pushed][LRA]: Check and update frame to stack pointer elimination after stack slot allocation

2023-07-21 Thread Vladimir Makarov via Gcc-patches



On 7/20/23 16:45, Rainer Orth wrote:

Hi Vladimir,


The following patch is necessary for porting avr to LRA.

The patch was successfully bootstrapped and tested on x86-64, aarch64, and
ppc64le.

There is still an avr porting problem with reloading of a subreg of the frame
pointer.  I'll address it later this week.

this patch most likely broke sparc-sun-solaris2.11 bootstrap:

/var/gcc/regression/master/11.4-gcc/build/./gcc/xgcc 
-B/var/gcc/regression/master/11.4-gcc/build/./gcc/ 
-B/vol/gcc/sparc-sun-solaris2.11/bin/ -B/vol/gcc/sparc-sun-solaris2.11/lib/ 
-isystem /vol/gcc/sparc-sun-solaris2.11/include -isystem 
/vol/gcc/sparc-sun-solaris2.11/sys-include   -fchecking=1 -c -g -O2   -W -Wall 
-gnatpg -nostdinc   g-alleve.adb -o g-alleve.o
+===GNAT BUG DETECTED==+
| 14.0.0 20230720 (experimental) [master 
506f068e7d01ad2fb107185b8fb204a0ec23785c] (sparc-sun-solaris2.11) GCC error:|
| in update_reg_eliminate, at lra-eliminations.cc:1179 |
| Error detected around g-alleve.adb:4132:8

This is in stage 3.  I haven't investigated further yet.


Thank you for reporting this.  I'll try to fix it this week.  I have a
patch but unfortunately bootstrap is too slow.  If the patch does not
work, I'll revert the original patch.





[committed] Require target lra in gcc.c-torture/compile/asmgoto-6.c

2023-07-21 Thread John David Anglin
The asmgoto feature requires LRA support.

Committed to trunk. Tested on hppa64-hp-hpux11.11.

Dave
---

Require target lra in gcc.c-torture/compile/asmgoto-6.c

2023-07-21  John David Anglin  

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/asmgoto-6.c: Require target lra.

diff --git a/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c 
b/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c
index 0652bd4e4e1..6799b83c20a 100644
--- a/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c
+++ b/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c
@@ -1,5 +1,5 @@
 
-/* { dg-do compile } */
+/* { dg-do compile { target lra } } */
 /* PR middle-end/110420 */
 /* PR middle-end/103979 */
 /* PR middle-end/98619 */




Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Vineet Gupta




On 7/21/23 11:31, Palmer Dabbelt wrote:


IIUC the pattern to emit fmv suffers from the same bug -- it's fixed in the
same way, but I think we might be able to come up with a test for it:
`fmv.d.x FREG, x0` would be the fastest way to generate 0.0, so maybe
something like

   double sum(double *d) {
 double sum = 0;
 for (int i = 0; i < 8; ++i)
   sum += d[i];
 return sum;
   }

would do it?  That's generating the fmv on 13 for me, though, so maybe I'm
missing something?


I don't think we can avoid FMV in this case

    fmv.d.x    fa0,zero #1
    addi    a5,a0,64
.L2:
    fld    fa5,0(a0)
    addi    a0,a0,8
    fadd.d    fa0,fa0,fa5   #2
    bne    a0,a5,.L2
    ret

In #1, the zero needs to be set up in an FP reg (possibly using FMV), since
in #2 it will be used for FP math.


If we change your test slightly,

double zadd(double *d) {
 double sum = 0.0;
 for (int i = 0; i < 8; ++i)
   d[i] = sum;
 return sum;
}

We still get the optimal code for writing to FP 0. The last FMV is 
unavoidable as we need an FP return reg.



    addi    a5,a0,64
.L2:
    sd    zero,0(a0)
    addi    a0,a0,8
    bne    a0,a5,.L2
    fmv.d.x    fa0,zero
    ret
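
As a side note on why the integer store of zero is valid at all: the optimisation depends on +0.0 having an all-zero bit pattern. The following is a small C sketch that checks this, assuming an IEEE 754 binary64 double; is_all_zero_bits is a name invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The sketch assumes a 64-bit double (IEEE 754 binary64).  */
static_assert (sizeof (double) == sizeof (uint64_t),
	       "example assumes a 64-bit double");

/* Return 1 if the object representation of D is all zero bits.  */
int is_all_zero_bits (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits == 0;
}
```

Under that assumption +0.0 passes the check while -0.0 does not, because the sign bit is set, so the integer-zero store only applies to the positive zero constant.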


Re: [PATCH v3] bpf: pseudo-c assembly dialect support

2023-07-21 Thread Cupertino Miranda via Gcc-patches



>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 3063e71c8906..b3be65d3efae 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -946,8 +946,8 @@ Objective-C and Objective-C++ Dialects}.
>>
>>  @emph{eBPF Options}
>>  @gccoptlist{-mbig-endian -mlittle-endian -mkernel=@var{version}
>> --mframe-limit=@var{bytes} -mxbpf -mco-re -mno-co-re
>> --mjmpext -mjmp32 -malu32 -mcpu=@var{version}}
>> +-mframe-limit=@var{bytes} -mxbpf -mco-re -mno-co-re -mjmpext
>> +-mjmp32 -malu32 -mcpu=@var{version} -masm=@var{dialect>}}
>
> There is a spurious > character there.
>
> Other than that, the patch is OK.
> Thanks!

Fixed the extra character and committed.
Thanks !


[PATCH] gcc-13/changes.html: Add and fix URL to -fstrict-flex-array option.

2023-07-21 Thread Qing Zhao via Gcc-patches
Hi,

In the current GCC13 release note, the URL to the option -fstrict-flex-array
is wrong (pointing to -Wstrict-flex-array).
This is the change to correct the URL and also add the URL in another place
where -fstrict-flex-array is mentioned.

I have checked the resulting HTML file, works well.

Okay for committing?

thanks.

Qing
---
 htdocs/gcc-13/changes.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 68e8c5cc..39b63a84 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -46,7 +46,7 @@ You may also want to check out our
   will no longer issue warnings for out of
   bounds accesses to trailing struct members of one-element array type
   anymore. Instead it diagnoses accesses to trailing arrays according to
-  -fstrict-flex-arrays. 
+  https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/C-Dialect-Options.html#index-fstrict-flex-arrays;>-fstrict-flex-arrays.
 
 https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Static-Analyzer-Options.html;>-fanalyzer
   is still only suitable for analyzing C code.
   In particular, using it on C++ is unlikely to give meaningful 
output.
@@ -213,7 +213,7 @@ You may also want to check out our
  flexible array member for the purpose of accessing the elements of such
  an array. By default, all trailing arrays in aggregates are treated as
  flexible array members. Use the new command-line option
- https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Warning-Options.html#index-Wstrict-flex-arrays;>-fstrict-flex-arrays
+ https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/C-Dialect-Options.html#index-fstrict-flex-arrays;>-fstrict-flex-arrays
  to control which array members are treated as flexible arrays.
  
 
-- 
2.31.1
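
For context on what the linked option governs, here is a minimal C sketch of a C99 flexible array member, the form that is treated as a flexible array at every -fstrict-flex-arrays level. The struct msg type and msg_new helper are hypothetical and are not taken from the release notes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct msg {
  size_t len;
  char data[];		/* C99 flexible array member.  */
};

/* Allocate a msg holding a copy of TEXT, including its terminating NUL.
   Returns NULL if the allocation fails.  */
struct msg *msg_new (const char *text)
{
  assert (text != NULL);
  size_t len = strlen (text);
  struct msg *m = malloc (sizeof *m + len + 1);
  if (!m)
    return NULL;
  m->len = len;
  memcpy (m->data, text, len + 1);
  return m;
}
```

Older code often spells the trailing member as data[1] or data[0] instead; whether those forms are still treated as flexible arrays is what the option's level selects.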



Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Vineet Gupta




On 7/21/23 11:31, Palmer Dabbelt wrote:

On Fri, 21 Jul 2023 10:55:52 PDT (-0700), Vineet Gupta wrote:
DF +0.0 is bitwise all zeros so int x0 store to mem can be used to 
optimize it.


void zd(double *d) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret

This came to light when testing the in-flight f-m-o patch where an ICE
was getting triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Apparently this is a regression in gcc-13, introduced by commit
ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
thus is a partial revert of that change.


Given that it can ICE, we should probably backport it to 13.


FWIW ICE is on an in-flight for-gcc-14 patch, not something in tree 
already. And this will merge ahead of that.

I'm fine with backport though.




Ran thru full multilib testsuite, there was 1 false failure due to


Did you run the test with autovec?


I have standard 32/64 multilibs, but no 'v' in arch so autovec despite 
being enabled at -O2 and above will not kick in.

I think we should add a 'v' multilib.


There's also a pmode_reg_or_0_operand, some of those don't appear 
protected from FP values.  So we might need something like


diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cd5b19457f8..d8ce9223343 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -63,7 +63,7 @@ (define_expand "movmisalign"

(define_expand "len_mask_gather_load"
  [(match_operand:VNX1_QHSD 0 "register_operand")
-   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:P 1 "pmode_reg_or_0_operand")
   (match_operand:VNX1_QHSDI 2 "register_operand")
   (match_operand 3 "")
   (match_operand 4 "")

a bunch of times, as there's a ton of them?  I'm not entirely sure if that
could manifest as an actual bug, though...


What does 'P' do here ?


+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */


IIUC the pattern to emit fmv suffers from the same bug -- it's fixed in the same
way, but I think we might be able to come up with a test for it: `fmv.d.x FREG,
x0` would be the fastest way to generate 0.0, so maybe something like

   double sum(double *d) {
     double sum = 0;
     for (int i = 0; i < 8; ++i)
       sum += d[i];
     return sum;
   }

would do it?  That's generating the fmv on 13 for me, though, so maybe I'm
missing something?


I need to unpack this first :-)



diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c

index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
 /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
 /* { dg-final { scan-assembler "fmv.x.w" } } */
 /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */


I think that autovec one is the only possible dependency that might have snuck
in, so we should be safe otherwise.  Thanks!


I'm not sure if this specific comment is related to the xthead test or is
a continuation of the above.
For xthead it is a real issue, since I saw a random "lw" in the lto
assembler output.




Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Jeff Law via Gcc-patches




On 7/21/23 12:31, Palmer Dabbelt wrote:



(define_expand "len_mask_gather_load"
   [(match_operand:VNX1_QHSD 0 "register_operand")
-   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:P 1 "pmode_reg_or_0_operand")
    (match_operand:VNX1_QHSDI 2 "register_operand")
    (match_operand 3 "")
    (match_operand 4 "")

a bunch of times, as there's a ton of them?  I'm not entirely sure if that
could manifest as an actual bug, though...

But won't this cause (const_int 0) to no longer match because CONST_INT
nodes are modeless (VOIDmode)?


Jeff


Re: [COMMITTED] bpf: fixed template for neg (added second operand)

2023-07-21 Thread Cupertino Miranda via Gcc-patches


This patch fixes the define_insn for "neg" to support 2 operands.
The initial implementation assumed the format "neg %0", while the instruction
allows both a destination and a source operand. The second operand can
be either a register or an immediate value.

gcc/ChangeLog:

* config/bpf/bpf.md: fixed template for neg instruction.
---
 gcc/config/bpf/bpf.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 329f62f55c33..adf11e151df1 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -139,10 +139,10 @@

 ;;; Negation
 (define_insn "neg2"
-  [(set (match_operand:AM 0 "register_operand" "=r")
-(neg:AM (match_operand:AM 1 "register_operand" " 0")))]
+  [(set (match_operand:AM 0 "register_operand" "=r,r")
+(neg:AM (match_operand:AM 1 "register_operand" " r,I")))]
   ""
-  "neg\t%0"
+  "neg\t%0,%1"
   [(set_attr "type" "")])

 ;;; Multiplication
--
2.38.1


[PATCH 2/1] c++: passing partially inst ttp as ttp [PR110566]

2023-07-21 Thread Patrick Palka via Gcc-patches
(This is a follow-up of
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624951.html)

Bootstrapped and regtested on x86_64-pc-linux-gnu, how does this look?

-- >8 --

The previous fix doesn't work for partially instantiated ttps primarily
because most_general_template doesn't work for them.  This patch fixes
this by giving such ttps a DECL_TEMPLATE_INFO (extending the
r11-734-g2fb595f8348e16 fix) with which we can obtain the original ttp.

This patch additionally makes us more careful about using the correct
number of levels from the scope of a ttp argument during
coerce_template_template_parms.

PR c++/110566

gcc/cp/ChangeLog:

* pt.cc (reduce_template_parm_level): Set DECL_TEMPLATE_INFO
on the DECL_TEMPLATE_RESULT of a reduced template template
parameter.
(add_defaults_to_ttp): Also update DECL_TEMPLATE_INFO of the
ttp's DECL_TEMPLATE_RESULT.
(coerce_template_template_parms): Make sure 'scope_args' has
the right amount of levels for the ttp argument.
(most_general_template): Handle template template parameters.

gcc/testsuite/ChangeLog:

* g++.dg/template/ttp39.C: New test.
---
 gcc/cp/pt.cc  | 46 ---
 gcc/testsuite/g++.dg/template/ttp39.C | 16 ++
 2 files changed, 57 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/ttp39.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index e0ed4bc8bbb..be7119dd9a0 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -4570,8 +4570,14 @@ reduce_template_parm_level (tree index, tree type, int levels, tree args,
  TYPE_DECL, DECL_NAME (decl), type);
  DECL_TEMPLATE_RESULT (decl) = inner;
  DECL_ARTIFICIAL (inner) = true;
- DECL_TEMPLATE_PARMS (decl) = tsubst_template_parms
-   (DECL_TEMPLATE_PARMS (orig_decl), args, complain);
+ tree parms = tsubst_template_parms (DECL_TEMPLATE_PARMS (orig_decl),
+ args, complain);
+ DECL_TEMPLATE_PARMS (decl) = parms;
+ retrofit_lang_decl (inner);
+ tree orig_inner = DECL_TEMPLATE_RESULT (orig_decl);
+ DECL_TEMPLATE_INFO (inner)
+   = build_template_info (DECL_TI_TEMPLATE (orig_inner),
+  template_parms_to_args (parms));
}
 
   /* Attach the TPI to the decl.  */
@@ -7936,6 +7942,19 @@ add_defaults_to_ttp (tree otmpl)
}
 }
 
+  tree oresult = DECL_TEMPLATE_RESULT (otmpl);
+  tree gen_otmpl = DECL_TI_TEMPLATE (oresult);
+  tree gen_ntmpl;
+  if (gen_otmpl == otmpl)
+gen_ntmpl = ntmpl;
+  else
+gen_ntmpl = add_defaults_to_ttp (gen_otmpl);
+
+  tree nresult = copy_node (oresult);
+  DECL_TEMPLATE_INFO (nresult) = copy_node (DECL_TEMPLATE_INFO (oresult));
+  DECL_TI_TEMPLATE (nresult) = gen_ntmpl;
+  DECL_TEMPLATE_RESULT (ntmpl) = nresult;
+
   hash_map_safe_put (defaulted_ttp_cache, otmpl, ntmpl);
   return ntmpl;
 }
@@ -8121,15 +8140,29 @@ coerce_template_template_parms (tree parm_tmpl,
 OUTER_ARGS are not the right outer levels in this case, as they are
 the args we're building up for PARM, and for the coercion we want the
 args for ARG.  If DECL_CONTEXT isn't set for a template template
-parameter, we can assume that it's in the current scope.  In that case
-we might end up adding more levels than needed, but that shouldn't be
-a problem; any args we need to refer to are at the right level.  */
+parameter, we can assume that it's in the current scope.  */
   tree ctx = DECL_CONTEXT (arg_tmpl);
   if (!ctx && DECL_TEMPLATE_TEMPLATE_PARM_P (arg_tmpl))
ctx = current_scope ();
   tree scope_args = NULL_TREE;
   if (tree tinfo = get_template_info (ctx))
scope_args = TI_ARGS (tinfo);
+  if (DECL_TEMPLATE_TEMPLATE_PARM_P (arg_tmpl))
+   {
+ int level = TEMPLATE_TYPE_LEVEL (TREE_TYPE (gen_arg_tmpl));
+ int scope_depth = TMPL_ARGS_DEPTH (scope_args);
+ if (scope_depth >= level)
+   /* Only use as many levels from the scope as needed (not
+  including the level of ARG).  */
+   scope_args = strip_innermost_template_args
+ (scope_args, scope_depth - (level - 1));
+
+ /* Add the arguments that appear at the level of ARG.  */
+ tree adj_args = DECL_TI_ARGS (DECL_TEMPLATE_RESULT (arg_tmpl));
+ adj_args = TMPL_ARGS_LEVEL (adj_args, TMPL_ARGS_DEPTH (adj_args) - 1);
+ scope_args = add_to_template_args (scope_args, adj_args);
+   }
+
   pargs = add_to_template_args (scope_args, pargs);
 
   pargs = coerce_template_parms (gen_arg_parms, pargs, NULL_TREE, tf_none);
@@ -25985,6 +26018,9 @@ most_general_template (tree decl)
return NULL_TREE;
 }
 
+  if (DECL_TEMPLATE_TEMPLATE_PARM_P (decl))
+return DECL_TI_TEMPLATE (DECL_TEMPLATE_RESULT (decl));
+
   /* Look for more 

Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Palmer Dabbelt

On Fri, 21 Jul 2023 10:55:52 PDT (-0700), Vineet Gupta wrote:

DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.

void zd(double *d) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret

This came to light when testing the in-flight f-m-o patch where an ICE
was getting triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Apparently this is a regression in gcc-13, introduced by commit
ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
thus is a partial revert of that change.


Given that it can ICE, we should probably backport it to 13.


Ran thru full multilib testsuite, there was 1 false failure due to


Did you run the test with autovec?  There's also a 
pmode_reg_or_0_operand, some of those don't appear protected from FP 
values.  So we might need something like


diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cd5b19457f8..d8ce9223343 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -63,7 +63,7 @@ (define_expand "movmisalign"

(define_expand "len_mask_gather_load"
  [(match_operand:VNX1_QHSD 0 "register_operand")
-   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:P 1 "pmode_reg_or_0_operand")
   (match_operand:VNX1_QHSDI 2 "register_operand")
   (match_operand 3 "")
   (match_operand 4 "")

a bunch of times, as there's a ton of them?  I'm not entirely sure if that
could manifest as an actual bug, though...


random string "lw" appearing in lto build assembler output,
which is also fixed in the patch.

gcc/Changelog:

* config/riscv/predicates.md (const_0_operand): Add back
  const_double.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr110748-1.c: New Test.
* gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
  patterns to avoid random string matches.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/predicates.md |  2 +-
 gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
 gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
 3 files changed, 15 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a22c77f0cd0..9db28c2def7e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -58,7 +58,7 @@
(match_test "INTVAL (op) + 1 != 0")))

 (define_predicate "const_0_operand"
-  (and (match_code "const_int,const_wide_int,const_vector")
+  (and (match_code "const_int,const_wide_int,const_double,const_vector")
(match_test "op == CONST0_RTX (GET_MODE (op))")))

 (define_predicate "const_1_operand"
diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c 
b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
new file mode 100644
index ..2f5bc08aae72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
+
+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */


IIUC the pattern to emit fmv suffers from the same bug -- it's fixed in the same
way, but I think we might be able to come up with a test for it: `fmv.d.x FREG,
x0` would be the fastest way to generate 0.0, so maybe something like

   double sum(double *d) {
     double sum = 0;
     for (int i = 0; i < 8; ++i)
       sum += d[i];
     return sum;
   }

would do it?  That's generating the fmv on 13 for me, though, so maybe I'm
missing something?


diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
 /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
 /* { dg-final { scan-assembler "fmv.x.w" } } */
 /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */


I think that autovec one is the only possible dependency that might have snuck
in, so we should be safe otherwise.  Thanks!

Reviewed-by: Palmer Dabbelt 


[PATCH v2] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Vineet Gupta
Fixes: ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT")
(gcc-13 regression)

DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.

void zd(double *d) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret
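
The bit-pattern claim above can be checked outside the compiler. The following is a minimal sketch, assuming IEEE 754 binary64/binary32 representations for double and float; the helper names double_bits, float_bits and check_zero_patterns are illustrative and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative helpers (not part of the patch): return the raw object
   representation of a floating-point value.  memcpy is used instead of
   a pointer cast to stay clear of aliasing problems.  */
static uint64_t double_bits (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}

static uint32_t float_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* +0.0 is all zero bits, so an integer store of zero writes the same
   bytes.  -0.0 has the sign bit set and cannot be handled that way.  */
void check_zero_patterns (void)
{
  assert (double_bits (0.0) == 0);
  assert (float_bits (0.0f) == 0);
  assert (double_bits (-0.0) == UINT64_C (0x8000000000000000));
  assert (float_bits (-0.0f) == UINT32_C (0x80000000));
}
```

Calling check_zero_patterns () from a test driver exercises both cases; the -0.0 assertions show why only positive zero qualifies for the integer store.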

This came to light when testing the in-flight f-m-o patch where an ICE
was getting triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Ran thru full multilib testsuite, there was 1 false failure due to
random string "lw" appearing in lto build assembler output, which is
also fixed in the patch.
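
To illustrate the false failure described above, here is a rough sketch that uses plain substring search as a stand-in for the scan-assembler-not check (the real directive uses Tcl regular expressions). The assembler text in sample_asm is hypothetical, as are the names asm_contains and check_patterns:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for a scan-assembler check: true if PATTERN occurs anywhere
   in ASM_TEXT.  This is plain substring search, not a Tcl regexp.  */
static bool asm_contains (const char *asm_text, const char *pattern)
{
  return strstr (asm_text, pattern) != NULL;
}

/* Hypothetical assembler output: there is no lw instruction, but a
   string constant happens to contain the letters "lw".  */
static const char sample_asm[] =
  "\tfmv.x.w\ta0,fa0\n"
  "\t.string\t\"rxlwq\"\n"
  "\tret\n";

void check_patterns (void)
{
  /* The bare pattern matches inside the unrelated string constant.  */
  assert (asm_contains (sample_asm, "lw"));
  /* The tab-delimited pattern only matches a real mnemonic field.  */
  assert (!asm_contains (sample_asm, "\tlw\t"));
  /* A genuine lw instruction is still detected.  */
  assert (asm_contains ("\tlw\ta5,0(a0)\n", "\tlw\t"));
}
```

The tab-delimited form relies on the mnemonic being preceded and followed by a tab in the emitted assembly, which is the layout shown in the listings above.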

gcc/Changelog:

PR target/110748
* config/riscv/predicates.md (const_0_operand): Add back
  const_double.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr110748-1.c: New Test.
* gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
  patterns to avoid random string matches.

Signed-off-by: Vineet Gupta 
---
Changes since v1:
  - No code changes
  - Updated commitlog: typo, "Fixes:" tag, mention PR in Changelog entry
---
 gcc/config/riscv/predicates.md |  2 +-
 gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
 gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
 3 files changed, 15 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a22c77f0cd0..9db28c2def7e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -58,7 +58,7 @@
(match_test "INTVAL (op) + 1 != 0")))
 
 (define_predicate "const_0_operand"
-  (and (match_code "const_int,const_wide_int,const_vector")
+  (and (match_code "const_int,const_wide_int,const_double,const_vector")
(match_test "op == CONST0_RTX (GET_MODE (op))")))
 
 (define_predicate "const_1_operand"
diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c 
b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
new file mode 100644
index ..2f5bc08aae72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
+
+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
 /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
 /* { dg-final { scan-assembler "fmv.x.w" } } */
 /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */
-- 
2.34.1



Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Vineet Gupta




On 7/21/23 11:15, Philipp Tomsich wrote:

On Fri, 21 Jul 2023 at 19:56, Vineet Gupta  wrote:

DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.

void zd(double *d) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret
This came to light when testing the in-flight f-m-o patch where an ICE
was gettinh triggered due to lack of this pattern but turns out this

typo: "gettinh" -> "getting"


Fixed.


is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Apparently this is a regression in gcc-13, introduced by commit
ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
thus is a partial revert of that change.

Should we add a "Fixes: "?


Sure, although gcc's usage of the Fixes tag seems slightly different from,
say, the Linux kernel's.






Ran thru full multilib testsuite, there was 1 false failure due to
random string "lw" appearing in lto build assembler output,
which is also fixed in the patch.

gcc/Changelog:

PR target/110748


Added.

Thx,
-Vineet





 * config/riscv/predicates.md (const_0_operand): Add back
   const_double.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/pr110748-1.c: New Test.
 * gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
   patterns to avoid random string matches.

Signed-off-by: Vineet Gupta 
---
  gcc/config/riscv/predicates.md |  2 +-
  gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
  gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
  3 files changed, 15 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a22c77f0cd0..9db28c2def7e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -58,7 +58,7 @@
 (match_test "INTVAL (op) + 1 != 0")))

  (define_predicate "const_0_operand"
-  (and (match_code "const_int,const_wide_int,const_vector")
+  (and (match_code "const_int,const_wide_int,const_double,const_vector")
 (match_test "op == CONST0_RTX (GET_MODE (op))")))

  (define_predicate "const_1_operand"
diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c 
b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
new file mode 100644
index ..2f5bc08aae72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
+
+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
  /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
  /* { dg-final { scan-assembler "fmv.x.w" } } */
  /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */
--
2.34.1





Re: [PATCH v4] bpf: fixed template for neg (added second operand)

2023-07-21 Thread Jose E. Marchesi via Gcc-patches


Better with the commit message.
OK.  Thanks.

> This patch fixes define_insn for "neg" to support 2 operands.
> Initial implementation assumed the format "neg %0" while the instruction
> allows both a destination and source operands. The second operand can
> either be a register or an immediate value.
>
> gcc/ChangeLog:
>
>   * config/bpf/bpf.md: fixed template for neg instruction.
> ---
>  gcc/config/bpf/bpf.md | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index 329f62f55c33..adf11e151df1 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -139,10 +139,10 @@
>  
>  ;;; Negation
>  (define_insn "neg2"
> -  [(set (match_operand:AM 0 "register_operand" "=r")
> -(neg:AM (match_operand:AM 1 "register_operand" " 0")))]
> +  [(set (match_operand:AM 0 "register_operand" "=r,r")
> +(neg:AM (match_operand:AM 1 "register_operand" " r,I")))]
>""
> -  "neg\t%0"
> +  "neg\t%0,%1"
>[(set_attr "type" "")])
>  
>  ;;; Multiplication


[PATCH v4] bpf: fixed template for neg (added second operand)

2023-07-21 Thread Cupertino Miranda via Gcc-patches
This patch fixes the define_insn for "neg" to support 2 operands.
The initial implementation assumed the format "neg %0", while the instruction
allows both a destination and a source operand. The second operand can
be either a register or an immediate value.

gcc/ChangeLog:

* config/bpf/bpf.md: fixed template for neg instruction.
---
 gcc/config/bpf/bpf.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 329f62f55c33..adf11e151df1 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -139,10 +139,10 @@
 
 ;;; Negation
 (define_insn "neg2"
-  [(set (match_operand:AM 0 "register_operand" "=r")
-(neg:AM (match_operand:AM 1 "register_operand" " 0")))]
+  [(set (match_operand:AM 0 "register_operand" "=r,r")
+(neg:AM (match_operand:AM 1 "register_operand" " r,I")))]
   ""
-  "neg\t%0"
+  "neg\t%0,%1"
   [(set_attr "type" "")])
 
 ;;; Multiplication
-- 
2.38.1



[PATCH] match.pd, v2: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]

2023-07-21 Thread Drew Ross via Gcc-patches
Adds a simplification for (~X | Y) ^ X to be folded into ~(X & Y).
Also adds the macro bitwise_equal_p for generic and gimple which
returns true iff EXPR1 and EXPR2 have the same value. This helps 
to reduce the number of nop_converts necessary to match the pattern. 
Tested successfully on x86_64 and x86 targets.
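
The identity behind the new simplification can be confirmed by brute force. A minimal sketch, assuming 8-bit unsigned operands so that all 65536 input pairs can be enumerated; the function names lhs, rhs and count_mismatches are illustrative and do not appear in the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Left-hand side of the simplification: (~X | Y) ^ X.  The cast
   truncates the int result of the promoted operands back to 8 bits.  */
static uint8_t lhs (uint8_t x, uint8_t y)
{
  return (uint8_t) ((~x | y) ^ x);
}

/* Right-hand side: ~(X & Y).  */
static uint8_t rhs (uint8_t x, uint8_t y)
{
  return (uint8_t) ~(x & y);
}

/* Count the input pairs for which the two forms disagree.  */
unsigned count_mismatches (void)
{
  unsigned bad = 0;
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      if (lhs ((uint8_t) x, (uint8_t) y) != rhs ((uint8_t) x, (uint8_t) y))
        ++bad;
  return bad;
}
```

Per bit, X = 0 gives 1 on both sides and X = 1 gives ~Y on both sides, so count_mismatches () should return 0. Wider types behave the same way because the operation is purely bitwise.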

PR middle-end/109986

gcc/ChangeLog:

* generic-match-head.cc (bitwise_equal_p): New macro.
* gimple-match-head.cc (bitwise_equal_p): New macro.
(gimple_nop_convert): Declare.
(gimple_bitwise_equal_p): Helper for bitwise_equal_p.
* match.pd ((~X | Y) ^ X -> ~(X & Y)): New simplification.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr109986.c: New test.
* gcc.dg/tree-ssa/pr109986.c: New test.

Co-authored-by: Jakub Jelinek 
---
 gcc/generic-match-head.cc |  17 ++
 gcc/gimple-match-head.cc  |  36 
 gcc/match.pd  |   6 +
 .../gcc.c-torture/execute/pr109986.c  |  41 
 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c  | 177 ++
 5 files changed, 277 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr109986.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c

diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index f011204c5be..b4b5bc88f4b 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -102,3 +102,20 @@ optimize_successive_divisions_p (tree, tree)
 {
   return false;
 }
+
+/* Return true if EXPR1 and EXPR2 have the same value, but not necessarily
+   same type.  The types can differ through nop conversions.  */
+
+static inline bool
+bitwise_equal_p (tree expr1, tree expr2)
+{
+  STRIP_NOPS (expr1);
+  STRIP_NOPS (expr2);
+  if (expr1 == expr2)
+return true;
+  if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
+return false;
+  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
+return wi::to_wide (expr1) == wi::to_wide (expr2);
+  return operand_equal_p (expr1, expr2, 0);
+}
diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index b08cd891a13..f960d6cf0b9 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -224,3 +224,39 @@ optimize_successive_divisions_p (tree divisor, tree inner_div)
 }
   return true;
 }
+
+/* Return true if EXPR1 and EXPR2 have the same value, but not necessarily
+   same type.  The types can differ through nop conversions.  */
+#define bitwise_equal_p(expr1, expr2) gimple_bitwise_equal_p (expr1, expr2, valueize)
+
+bool gimple_nop_convert (tree, tree *, tree (*)(tree));
+
+/* Helper function for bitwise_equal_p macro.  */
+
+static inline bool
+gimple_bitwise_equal_p (tree expr1, tree expr2, tree (*valueize) (tree))
+{
+  if (expr1 == expr2)
+return true;
+  if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
+return false;
+  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
+return wi::to_wide (expr1) == wi::to_wide (expr2);
+  if (operand_equal_p (expr1, expr2, 0))
+return true;
+  tree expr3, expr4;
+  if (!gimple_nop_convert (expr1, &expr3, valueize))
+    expr3 = expr1;
+  if (!gimple_nop_convert (expr2, &expr4, valueize))
+    expr4 = expr2;
+  if (expr1 != expr3)
+{
+  if (operand_equal_p (expr3, expr2, 0))
+return true;
+  if (expr2 != expr4 && operand_equal_p (expr3, expr4, 0))
+return true;
+}
+  if (expr2 != expr4 && operand_equal_p (expr1, expr4, 0))
+return true;
+  return false;
+}
diff --git a/gcc/match.pd b/gcc/match.pd
index a17d6838c14..367e4fc5517 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1627,6 +1627,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
   (convert (bit_and @1 (bit_not @0)))))
 
+/* (~X | Y) ^ X -> ~(X & Y).  */
+(simplify
+ (bit_xor:c (nop_convert1? (bit_ior:c (nop_convert2? (bit_not @0)) @1)) @2)
+ (if (bitwise_equal_p (@0, @2))
+  (convert (bit_not (bit_and @0 (convert @1))))))
+
 /* Convert ~X ^ ~Y to X ^ Y.  */
 (simplify
  (bit_xor (convert1? (bit_not @0)) (convert2? (bit_not @1)))
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr109986.c 
b/gcc/testsuite/gcc.c-torture/execute/pr109986.c
new file mode 100644
index 000..00ee9888539
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr109986.c
@@ -0,0 +1,41 @@
+/* PR middle-end/109986 */
+
+#include "../../gcc.dg/tree-ssa/pr109986.c"
+
+int 
+main ()
+{
+  if (t1 (29789, 29477) != -28678) __builtin_abort ();
+  if (t2 (20196, -18743) != 4294965567) __builtin_abort ();
+  if (t3 (127, 99) != -100) __builtin_abort ();
+  if (t4 (100, 53) != 219) __builtin_abort ();
+  if (t5 (20100, 1283) != -1025) __builtin_abort ();
+  if (t6 (20100, 10283) != 63487) __builtin_abort ();
+  if (t7 (2136614690L, 1136698390L) != -1128276995L) __builtin_abort ();
+  if (t8 (1136698390L, 2136614690L) != -1128276995UL) __builtin_abort ();

Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Philipp Tomsich
On Fri, 21 Jul 2023 at 19:56, Vineet Gupta  wrote:
>
> DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize 
> it.
>
> void zd(double *d) { *d = 0.0; }
>
> currently:
>
> | fmv.d.x fa5,zero
> | fsd fa5,0(a0)
> | ret
>
> With patch
>
> | sd  zero,0(a0)
> | ret
> This came to light when testing the in-flight f-m-o patch where an ICE
> was gettinh triggered due to lack of this pattern but turns out this

typo: "gettinh" -> "getting"

> is an independent optimization of its own [1]
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html
>
> Apparently this is a regression in gcc-13, introduced by commit
> ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
> thus is a partial revert of that change.

Should we add a "Fixes: "?

> Ran thru full multilib testsuite, there was 1 false failure due to
> random string "lw" appearing in lto build assembler output,
> which is also fixed in the patch.
>
> gcc/Changelog:

PR target/110748

>
> * config/riscv/predicates.md (const_0_operand): Add back
>   const_double.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr110748-1.c: New Test.
> * gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
>   patterns to avoid random string matches.
>
> Signed-off-by: Vineet Gupta 
> ---
>  gcc/config/riscv/predicates.md |  2 +-
>  gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
>  gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
>  3 files changed, 15 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c
>
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index 5a22c77f0cd0..9db28c2def7e 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -58,7 +58,7 @@
> (match_test "INTVAL (op) + 1 != 0")))
>
>  (define_predicate "const_0_operand"
> -  (and (match_code "const_int,const_wide_int,const_vector")
> +  (and (match_code "const_int,const_wide_int,const_double,const_vector")
> (match_test "op == CONST0_RTX (GET_MODE (op))")))
>
>  (define_predicate "const_1_operand"
> diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c 
> b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
> new file mode 100644
> index ..2f5bc08aae72
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
> +/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
> +
> +
> +void zd(double *d) { *d = 0.0;  }
> +void zf(float *f)  { *f = 0.0;  }
> +
> +/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
> +/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
> b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> index 1036044291e7..89eb48bed1b9 100644
> --- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> @@ -18,7 +18,7 @@ d2ll (double d)
>  /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
>  /* { dg-final { scan-assembler "fmv.x.w" } } */
>  /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
> -/* { dg-final { scan-assembler-not "sw" } } */
> -/* { dg-final { scan-assembler-not "fld" } } */
> -/* { dg-final { scan-assembler-not "fsd" } } */
> -/* { dg-final { scan-assembler-not "lw" } } */
> +/* { dg-final { scan-assembler-not "\tsw\t" } } */
> +/* { dg-final { scan-assembler-not "\tfld\t" } } */
> +/* { dg-final { scan-assembler-not "\tfsd\t" } } */
> +/* { dg-final { scan-assembler-not "\tlw\t" } } */
> --
> 2.34.1
>


[COMMITTED] MAINTAINERS: Add myself to write after approval

2023-07-21 Thread Cupertino Miranda via Gcc-patches


Hi everyone,

Just to confirm that I pushed the change in MAINTAINERS file, adding
myself to the write after approval list.

Thanks,
Cupertino


Re: [PATCH v3] bpf: fixed template for neg (added second operand)

2023-07-21 Thread Jose E. Marchesi via Gcc-patches


Hi Cuper.
OK.  Thanks!

> From 7756a4becd1934e55d6d14ac4a9fd6d408a4797b Mon Sep 17 00:00:00 2001
> From: Cupertino Miranda 
> Date: Fri, 21 Jul 2023 17:40:07 +0100
> Subject: [PATCH v3] bpf: fixed template for neg (added second operand)
>
> gcc/ChangeLog:
>
>   * config/bpf/bpf.md: fixed template for neg instruction.
> ---
>  gcc/config/bpf/bpf.md | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index 329f62f55c33..adf11e151df1 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -139,10 +139,10 @@
>  
>  ;;; Negation
>  (define_insn "neg2"
> -  [(set (match_operand:AM 0 "register_operand" "=r")
> -(neg:AM (match_operand:AM 1 "register_operand" " 0")))]
> +  [(set (match_operand:AM 0 "register_operand" "=r,r")
> +(neg:AM (match_operand:AM 1 "register_operand" " r,I")))]
>""
> -  "neg\t%0"
> +  "neg\t%0,%1"
>[(set_attr "type" "")])
>  
>  ;;; Multiplication


[PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Vineet Gupta
DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.

void zd(double *d) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret
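
(Editorial illustration, not part of the patch.) The claim that +0.0 is bitwise all zeros can be checked with a small sketch like the one below. It assumes IEEE 754 binary32/binary64 representations, which holds on RISC-V; the helper names double_bits and float_bits are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw 64-bit object representation of a double.
   Assumes double is IEEE 754 binary64.  */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Return the raw 32-bit object representation of a float.
   Assumes float is IEEE 754 binary32.  */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Only positive zero has an all-zero pattern; -0.0 has the sign bit set, so it would not be a candidate for a plain integer zero store.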

This came to light when testing the in-flight f-m-o patch where an ICE
was getting triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Apparently this is a regression in gcc-13, introduced by commit
ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
thus is a partial revert of that change.

Ran thru full multilib testsuite, there was 1 false failure due to
random string "lw" appearing in lto build assembler output,
which is also fixed in the patch.
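
(Editorial illustration, not part of the patch.) The false failure comes from matching a bare mnemonic anywhere in the assembler output. The sketch below approximates that with plain substring search; the real scan-assembler-not directive uses Tcl regular expressions, and the sample output and the helper name contains are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if NEEDLE occurs anywhere in HAYSTACK (plain substring match).  */
static bool contains(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    return strstr(haystack, needle) != NULL;
}

/* Hypothetical assembler output: there is no lw instruction, but an
   unrelated string happens to contain the letters "lw".  */
static const char sample_asm[] =
    "\t.ascii\t\"xqlwz\"\n"
    "\tfmv.x.w\ta0,fa0\n"
    "\tret\n";
```

With this sample, the bare pattern "lw" matches the string data, while the tab-delimited pattern "\tlw\t" only matches a line where lw is the mnemonic.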

gcc/ChangeLog:

* config/riscv/predicates.md (const_0_operand): Add back
  const_double.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr110748-1.c: New Test.
* gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
  patterns to avoid random string matches.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/predicates.md |  2 +-
 gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
 gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
 3 files changed, 15 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a22c77f0cd0..9db28c2def7e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -58,7 +58,7 @@
(match_test "INTVAL (op) + 1 != 0")))
 
 (define_predicate "const_0_operand"
-  (and (match_code "const_int,const_wide_int,const_vector")
+  (and (match_code "const_int,const_wide_int,const_double,const_vector")
(match_test "op == CONST0_RTX (GET_MODE (op))")))
 
 (define_predicate "const_1_operand"
diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c 
b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
new file mode 100644
index ..2f5bc08aae72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
+
+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
 /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
 /* { dg-final { scan-assembler "fmv.x.w" } } */
 /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */
-- 
2.34.1



Re: [PATCH] c++: fix ICE with is_really_empty_class [PR110106]

2023-07-21 Thread Jason Merrill via Gcc-patches

On 7/20/23 15:51, Marek Polacek wrote:

On Thu, Jul 20, 2023 at 02:37:07PM -0400, Jason Merrill wrote:

On 7/20/23 14:13, Marek Polacek wrote:

On Wed, Jul 19, 2023 at 10:11:27AM -0400, Patrick Palka wrote:

On Tue, 18 Jul 2023, Marek Polacek via Gcc-patches wrote:


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk and branches?


Looks reasonable to me.


Thanks.

Though I wonder if we could also fix this by not checking potentiality
at all in this case?  The problematic call to is_rvalue_constant_expression
happens from cp_parser_constant_expression with 'allow_non_constant' != 0
and with 'non_constant_p' being a dummy out argument that comes from
cp_parser_functional_cast, so the result of is_rvalue_constant_expression
is effectively unused in this case, and we should be able to safely elide
it when 'allow_non_constant && non_constant_p == nullptr'.


Sounds plausible.  I think my patch could be applied first since it
removes a tiny bit of code, then I can hopefully remove the flag below,
then maybe go back and optimize the call to is_rvalue_constant_expression.
Does that sound sensible?


Relatedly, ISTM the member cp_parser::non_integral_constant_expression_p
is also effectively unused and could be removed?


It looks that way.  Seems it's only used in cp_parser_constant_expression:
10806   if (allow_non_constant_p)
10807 *non_constant_p = parser->non_integral_constant_expression_p;
but that could be easily replaced by a local var.  I'd be happy to see if
we can actually do away with it.  (I wonder why it was introduced and when
it actually stopped being useful.)


It was for the C++98 notion of constant-expression, which was more of a
parser-level notion, and has been supplanted by the C++11 version.  I'm
happy to remove it, and therefore remove the is_rvalue_constant_expression
call.


Wonderful.  I'll do that next.
  

-- >8 --

is_really_empty_class is liable to crash when it gets an incomplete
or dependent type.  Since r11-557, we pass the yet-uninstantiated
class type S<0> of the PARM_DECL s to is_really_empty_class -- because
of the potential_rvalue_constant_expression -> is_rvalue_constant_expression
change in cp_parser_constant_expression.  Here we're not parsing
a template so we did not check COMPLETE_TYPE_P as we should.

PR c++/110106

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1): Check COMPLETE_TYPE_P
even when !processing_template_decl.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept80.C: New test.
---
   gcc/cp/constexpr.cc |  2 +-
   gcc/testsuite/g++.dg/cpp0x/noexcept80.C | 12 
   2 files changed, 13 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept80.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 6e8f1c2b61e..1f59c5472fb 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -9116,7 +9116,7 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool now,
 if (now && want_rval)
{
  tree type = TREE_TYPE (t);
- if ((processing_template_decl && !COMPLETE_TYPE_P (type))
+ if (!COMPLETE_TYPE_P (type)
  || dependent_type_p (type)


There shouldn't be a problem completing the type here, so it seems to me
that we're missing a call to complete_type_p, at least when
!processing_template_decl.  Probably need to move the dependent_type_p check
up as a result.


Like so?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
is_really_empty_class is liable to crash when it gets an incomplete
or dependent type.  Since r11-557, we pass the yet-uninstantiated
class type S<0> of the PARM_DECL s to is_really_empty_class -- because
of the potential_rvalue_constant_expression -> is_rvalue_constant_expression
change in cp_parser_constant_expression.  Here we're not parsing
a template so we did not check COMPLETE_TYPE_P as we should.

It should work to complete the type before checking COMPLETE_TYPE_P.

PR c++/110106

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1): Try to complete the
type when !processing_template_decl.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept80.C: New test.
---
  gcc/cp/constexpr.cc |  5 +++--
  gcc/testsuite/g++.dg/cpp0x/noexcept80.C | 12 
  2 files changed, 15 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept80.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 6e8f1c2b61e..fb94f3cefcb 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -9116,8 +9116,9 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool now,
if (now && want_rval)
{
  tree type = TREE_TYPE (t);
- if ((processing_template_decl && !COMPLETE_TYPE_P (type))
- || dependent_type_p (type)
+ if (dependent_type_p (type)
+ || 

Re: [PATCH] c++: fix ICE with is_really_empty_class [PR110106]

2023-07-21 Thread Jason Merrill via Gcc-patches

On 7/20/23 17:58, Marek Polacek wrote:

On Thu, Jul 20, 2023 at 03:51:32PM -0400, Marek Polacek wrote:

On Thu, Jul 20, 2023 at 02:37:07PM -0400, Jason Merrill wrote:

On 7/20/23 14:13, Marek Polacek wrote:

On Wed, Jul 19, 2023 at 10:11:27AM -0400, Patrick Palka wrote:

On Tue, 18 Jul 2023, Marek Polacek via Gcc-patches wrote:


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk and branches?


Looks reasonable to me.


Thanks.

Though I wonder if we could also fix this by not checking potentiality
at all in this case?  The problematic call to is_rvalue_constant_expression
happens from cp_parser_constant_expression with 'allow_non_constant' != 0
and with 'non_constant_p' being a dummy out argument that comes from
cp_parser_functional_cast, so the result of is_rvalue_constant_expression
is effectively unused in this case, and we should be able to safely elide
it when 'allow_non_constant && non_constant_p == nullptr'.


Sounds plausible.  I think my patch could be applied first since it
removes a tiny bit of code, then I can hopefully remove the flag below,
then maybe go back and optimize the call to is_rvalue_constant_expression.
Does that sound sensible?


Relatedly, ISTM the member cp_parser::non_integral_constant_expression_p
is also effectively unused and could be removed?


It looks that way.  Seems it's only used in cp_parser_constant_expression:
10806   if (allow_non_constant_p)
10807 *non_constant_p = parser->non_integral_constant_expression_p;
but that could be easily replaced by a local var.  I'd be happy to see if
we can actually do away with it.  (I wonder why it was introduced and when
it actually stopped being useful.)


It was for the C++98 notion of constant-expression, which was more of a
parser-level notion, and has been supplanted by the C++11 version.  I'm
happy to remove it, and therefore remove the is_rvalue_constant_expression
call.


Wonderful.  I'll do that next.


I found a use of parser->non_integral_constant_expression_p:
finish_id_expression_1 can set it to true which then makes
a difference in cp_parser_constant_expression in C++98.  In
cp_parser_constant_expression we set n_i_c_e_p to false, call
cp_parser_assignment_expression in which finish_id_expression_1
sets n_i_c_e_p to true, then back in cp_parser_constant_expression
we skip the cxx11 block, and set *non_constant_p to true.  If I
remove n_i_c_e_p, we lose that.  This can be seen in init/array60.C.


Sure, we would need to use the C++11 code for C++98 mode, which is 
likely fine but is more uncertain.


It's probably simpler to just ignore n_i_c_e_p for C++11 and up, along 
with Patrick's suggestion of allowing null non_constant_p with true 
allow_non_constant_p.


Jason



Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Andrew Pinski via Gcc-patches
On Fri, Jul 21, 2023 at 5:13 AM Matthew Malcomson via Gcc-patches
 wrote:
>
> On some AArch64 bootstrapped builds, we were getting a flaky test
> because the floating point operations in `get_time` were being fused
> with the floating point operations in `timevar_accumulate`.
>
> This meant that the rounding behaviour of our multiplication with
> `ticks_to_msec` was different when used in `timer::start` and when
> performed in `timer::stop`.  These extra inaccuracies led to the
> testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
>
> This change ensures those operations are not fused and hence stops the test
> being flaky on that particular machine.  There is no expected change in the
> generated code.
> Bootstrap & regtest on AArch64 passes with no regressions.

Oh this does explain why powerpc also sees it: https://gcc.gnu.org/PR110316 .
I wonder whether, rather than adding noinline here, we should change the
code to tolerate the fused multiply-subtract instead,
which is kinda related to what I suggested in comment #1 in PR 110316.

Thanks,
Andrew

>
> gcc/ChangeLog:
>
> * timevar.cc (get_time): Make this noinline to avoid fusing
> behaviour and associated test flakyness.
>
>
> N.b. I didn't know who to include as reviewer -- guessed Richard Biener as the
> global reviewer that had the most contributions to this file and Richard
> Sandiford since I've asked him for reviews a lot in the past.
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/gcc/timevar.cc b/gcc/timevar.cc
> index 
> d695297aae7f6b2a6de01a37fe86c2a232338df0..5ea4ec259e114f31f611e7105cd102f4c9552d18
>  100644
> --- a/gcc/timevar.cc
> +++ b/gcc/timevar.cc
> @@ -212,6 +212,7 @@ timer::named_items::print (FILE *fp, const 
> timevar_time_def *total)
> HAVE_WALL_TIME macros.  */
>
>  static void
> +__attribute__((noinline))
>  get_time (struct timevar_time_def *now)
>  {
>now->user = 0;
>
>
>


Re: Fix optimize_mask_stores profile update

2023-07-21 Thread Jan Hubicka via Gcc-patches
> On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches
>  wrote:
> >
> > Hi,
> > While looking into sphinx3 regression I noticed that vectorizer produces
> > BBs with overall probability count 120%.  This patch fixes it.
> > Richi, I don't know how to create a testcase, but having one would
> > be nice.
> >
> > Bootstrapped/regtested x86_64-linux, commited last night (sorry for
> > late email)
> 
> This should trigger with sth like
> 
>   for (i)
> if (cond[i])
>   out[i] = 1.;
> 
> so a masked store and then using AVX2+.  ISTR we disable AVX masked
> stores on zen (but not AVX512).

Richard,
if we know probability of if (cond[i]) to be p,
then we know that the combined conditional is somewhere between
  low = p  (the strategy packing true and falses into VF sized
blocks)
and
  high = min (p*vf,1)
   (the strategy doing only one true per block if possible)
Likely value is

  likely = 1-pow(1-p, vf)

I wonder if we can work out p at least in common cases. 
Making store unlikely as we do right now will place it offline with
extra jump.  Making it likely is better unless p is very small.

I think if p is close to 0 or 1 which may be common case the analysis
above may be useful. If range [low...high] is small, we can use likely
and keep it as reliable.
If it is wide, we can probably just end up with a guessed value close to but
above 50% so the store stays inline.

Honza
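
(Editorial illustration, not from the thread.) The three estimates above can be computed as in the sketch below. It assumes the per-element conditions are independent for the "likely" value, and it uses a multiply loop instead of pow since vf is an integer; the struct and function names are made up for this example.

```c
#include <assert.h>

/* Bounds on the probability that a VF-wide masked store has at least
   one active lane, given per-element probability P.  */
struct mask_store_bounds {
    double low;     /* trues and falses packed into VF-sized blocks */
    double high;    /* at most one true per block, capped at 1 */
    double likely;  /* 1 - (1 - p)^vf, assuming independent lanes */
};

static struct mask_store_bounds mask_store_probability(double p, unsigned vf)
{
    struct mask_store_bounds b;
    double none = 1.0;

    assert(p >= 0.0 && p <= 1.0 && vf >= 1);

    b.low = p;
    b.high = p * vf < 1.0 ? p * vf : 1.0;

    /* Probability that no lane in the block is active.  */
    for (unsigned i = 0; i < vf; i++)
        none *= 1.0 - p;
    b.likely = 1.0 - none;

    return b;
}
```

For p = 0.5 and vf = 4 this gives low = 0.5, high = 1 and likely = 0.9375, so the range is wide; for p near 0 or 1 the three values stay close together.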


Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]

2023-07-21 Thread Andrew Pinski via Gcc-patches
On Fri, Jul 21, 2023 at 8:09 AM Drew Ross via Gcc-patches
 wrote:
>
> Simplifies (x << c) >> c where x is a signed integral type of
> width >= int and c = precision(type) - 1 into -(x & 1). Tested successfully
> on x86_64 and x86 targets.

Thinking about this some more, I think this should be handled in
expand rather than on the gimple level.
It is very much related to PR 110717 even. We are basically truncating
to a signed one bit integer and then sign extending that across the
whole code.

Thanks,
Andrew
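
(Editorial illustration, not from the thread.) The identity under discussion, (x << 31) >> 31 == -(x & 1) for 32-bit signed x, can be checked with the sketch below. The left shift goes through uint32_t to avoid signed-overflow undefined behaviour in ISO C; the conversion back to int32_t and the right shift of a negative value are implementation-defined, and the sketch assumes GCC's behaviour (wrapping conversion, arithmetic shift). The function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* (x << 31) >> 31: keep only bit 0 and sign-extend it across the word.  */
static int32_t shift_pair(int32_t x)
{
    /* Shift as unsigned, then reinterpret as signed (GCC wraps).  */
    int32_t shifted = (int32_t)((uint32_t)x << 31);
    /* Arithmetic right shift replicates the sign bit (GCC behaviour).  */
    return shifted >> 31;
}

/* The proposed replacement form: 0 for even x, -1 for odd x.  */
static int32_t neg_low_bit(int32_t x)
{
    return -(x & 1);
}

/* Assert that both forms agree for one input.  */
static void check_identity(int32_t x)
{
    assert(shift_pair(x) == neg_low_bit(x));
}
```

Both forms amount to truncating x to a signed one-bit integer and sign-extending the result back to the full width.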

>
> PR middle-end/101955
>
> gcc/ChangeLog:
>
> * match.pd (x << c) >> c -> -(x & 1): New simplification.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr101955.c: New test.
> ---
>  gcc/match.pd| 10 +
>  gcc/testsuite/gcc.dg/pr101955.c | 69 +
>  2 files changed, 79 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr101955.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 8543f777a28..820fc890e8e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3766,6 +3766,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>&& (wi::ltu_p (wi::to_wide (@1), element_precision (type
>(bit_and @0 (rshift { build_minus_one_cst (type); } @1
>
> +/* Optimize (X << C) >> C where C = precision(type) - 1 and X is signed
> +   into -(X & 1).  */
> +(simplify
> + (rshift (nop_convert? (lshift @0 uniform_integer_cst_p@1)) @@1)
> + (with { tree cst = uniform_integer_cst_p (@1); }
> + (if (ANY_INTEGRAL_TYPE_P (type)
> +  && !TYPE_UNSIGNED (type)
> +  && wi::eq_p (wi::to_wide (cst), element_precision (type) - 1))
> +  (negate (bit_and (convert @0) { build_one_cst (type); })
> +
>  /* Optimize x >> x into 0 */
>  (simplify
>   (rshift @0 @0)
> diff --git a/gcc/testsuite/gcc.dg/pr101955.c b/gcc/testsuite/gcc.dg/pr101955.c
> new file mode 100644
> index 000..386154911c5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr101955.c
> @@ -0,0 +1,69 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse1 -Wno-psabi" } */
> +
> +typedef int v4si __attribute__((vector_size(4 * sizeof(int;
> +
> +__attribute__((noipa)) int
> +t1 (int x)
> +{
> +  return (x << 31) >> 31;
> +}
> +
> +__attribute__((noipa)) int
> +t2 (int x)
> +{
> +  int y = x << 31;
> +  int z = y >> 31;
> +  return z;
> +}
> +
> +__attribute__((noipa)) int
> +t3 (int x)
> +{
> +  int w = 31;
> +  int y = x << w;
> +  int z = y >> w;
> +  return z;
> +}
> +
> +__attribute__((noipa)) long long
> +t4 (long long x)
> +{
> +  return (x << 63) >> 63;
> +}
> +
> +__attribute__((noipa)) long long
> +t5 (long long x)
> +{
> +  long long y = x << 63;
> +  long long z = y >> 63;
> +  return z;
> +}
> +
> +__attribute__((noipa)) long long
> +t6 (long long x)
> +{
> +  int w = 63;
> +  long long y = x << w;
> +  long long z = y >> w;
> +  return z;
> +}
> +
> +__attribute__((noipa)) v4si
> +t7 (v4si x)
> +{
> +  return (x << 31) >> 31;
> +}
> +
> +__attribute__((noipa)) v4si
> +t8 (v4si x)
> +{
> +  v4si t = {31,31,31,31};
> +  return (x << t) >> t;
> +}
> +
> +/* { dg-final { scan-tree-dump-not " >> " "dse1" } } */
> +/* { dg-final { scan-tree-dump-not " << " "dse1" } } */
> +/* { dg-final { scan-tree-dump-times " -" 8 "dse1" } } */
> +/* { dg-final { scan-tree-dump-times " & " 8 "dse1" } } */
> +
> --
> 2.39.3
>


Re: [PATCH]AArch64 fix regexp for live_1.c sve test

2023-07-21 Thread Richard Sandiford via Gcc-patches
Jan Hubicka  writes:
> Avoid scaling flat loop profiles of vectorized loops
>
> As discussed, when vectorizing loop with static profile, it is not always 
> good idea
> to divide the header frequency by vectorization factor because the profile may
> not realistically represent the expected number of iterations.  Since in such 
> cases
> we default to relatively low iteration counts (based on average for 
> spec2k17), this
> will make vectorized loop body look cold.
>
> This patch makes vectorizer to look for flat profiles and only possibly 
> reduce the
> profile by known upper bound on iteration counts.
>
> Bootstrapp/regtested of x86_64-linux in progress. I intend to commit this 
> after
> testers pick other profile related changes from today.
> Tamar, Richard, it would be nice to know if it fixes the testcase you was 
> looking at
> and possibly turn it into a testcase?

Yeah, it does!  Thanks for the quick fix.

The test was gcc.target/aarch64/sve/live_1.c.  Although it wasn't
originally a profile test, I think it should still be a relatively good
way of testing that the latch is treated as more likely than the exit,
without needing to check for that explicitly.

Richard

>
> gcc/ChangeLog:
>
>   * tree-vect-loop.cc (scale_profile_for_vect_loop): Avoid scaling flat
>   profiles by vectorization factor.
>   (vect_transform_loop): Check for flat profiles.
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index b44fb9c7712..d036a7d4480 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10837,11 +10837,25 @@ vect_get_loop_len (loop_vec_info loop_vinfo, 
> gimple_stmt_iterator *gsi,
>  }
>  
>  /* Scale profiling counters by estimation for LOOP which is vectorized
> -   by factor VF.  */
> +   by factor VF.
> +   If FLAT is true, the loop we started with had unrealistically flat
> +   profile.  */
>  
>  static void
> -scale_profile_for_vect_loop (class loop *loop, unsigned vf)
> +scale_profile_for_vect_loop (class loop *loop, unsigned vf, bool flat)
>  {
> +  /* For flat profiles do not scale down proportionally by VF and only
> + cap by known iteration count bounds.  */
> +  if (flat)
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file,
> +  "Vectorized loop profile seems flat; not scaling iteration "
> +  "count down by the vectorization factor %i\n", vf);
> +  scale_loop_profile (loop, profile_probability::always (),
> +   get_likely_max_loop_iterations_int (loop));
> +  return;
> +}
>/* Loop body executes VF fewer times and exit increases VF times.  */
>edge exit_e = single_exit (loop);
>profile_count entry_count = loop_preheader_edge (loop)->count ();
> @@ -10852,7 +10866,13 @@ scale_profile_for_vect_loop (class loop *loop, 
> unsigned vf)
>while (vf > 1
>&& loop->header->count > entry_count
>&& loop->header->count < entry_count * vf)
> -vf /= 2;
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file,
> +  "Vectorization factor %i seems too large for profile "
> +  "prevoiusly believed to be consistent; reducing.\n", vf);
> +  vf /= 2;
> +}
>  
>if (entry_count.nonzero_p ())
>  set_edge_probability_and_rescale_others
> @@ -11184,6 +11204,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
> *loop_vectorized_call)
>gimple *stmt;
>bool check_profitability = false;
>unsigned int th;
> +  bool flat = maybe_flat_loop_profile (loop);
>  
>DUMP_VECT_SCOPE ("vec_transform_loop");
>  
> @@ -11252,7 +11273,6 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
> *loop_vectorized_call)
> _vector, _vector_mult_vf, th,
> check_profitability, niters_no_overflow,
> );
> -
>if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo)
>&& LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo).initialized_p ())
>  scale_loop_frequencies (LOOP_VINFO_SCALAR_LOOP (loop_vinfo),
> @@ -11545,7 +11565,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
> *loop_vectorized_call)
> assumed_vf) - 1
>: wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed,
>  assumed_vf) - 1);
> -  scale_profile_for_vect_loop (loop, assumed_vf);
> +  scale_profile_for_vect_loop (loop, assumed_vf, flat);
>  
>if (dump_enabled_p ())
>  {


Re: [PATCH v3] bpf: fixed template for neg (added second operand)

2023-07-21 Thread Cupertino Miranda via Gcc-patches
>From 7756a4becd1934e55d6d14ac4a9fd6d408a4797b Mon Sep 17 00:00:00 2001
From: Cupertino Miranda 
Date: Fri, 21 Jul 2023 17:40:07 +0100
Subject: [PATCH v3] bpf: fixed template for neg (added second operand)

gcc/ChangeLog:

	* config/bpf/bpf.md: fixed template for neg instruction.
---
 gcc/config/bpf/bpf.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 329f62f55c33..adf11e151df1 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -139,10 +139,10 @@
 
 ;;; Negation
 (define_insn "neg2"
-  [(set (match_operand:AM 0 "register_operand" "=r")
-(neg:AM (match_operand:AM 1 "register_operand" " 0")))]
+  [(set (match_operand:AM 0 "register_operand" "=r,r")
+(neg:AM (match_operand:AM 1 "register_operand" " r,I")))]
   ""
-  "neg\t%0"
+  "neg\t%0,%1"
   [(set_attr "type" "")])
 
 ;;; Multiplication
-- 
2.30.2



Re: [PATCH]AArch64 fix regexp for live_1.c sve test

2023-07-21 Thread Jan Hubicka via Gcc-patches
Avoid scaling flat loop profiles of vectorized loops

As discussed, when vectorizing a loop with a static profile, it is not always
a good idea to divide the header frequency by the vectorization factor because
the profile may not realistically represent the expected number of iterations.
Since in such cases we default to relatively low iteration counts (based on
the average for spec2k17), this will make the vectorized loop body look cold.

This patch makes the vectorizer look for flat profiles and only possibly
reduce the profile by the known upper bound on iteration counts.

Bootstrap/regtest on x86_64-linux in progress. I intend to commit this after
testers pick up other profile-related changes from today.
Tamar, Richard, it would be nice to know if it fixes the testcase you were
looking at and possibly turn it into a testcase?
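
(Editorial illustration, not part of the patch.) For the non-flat path, the diff below keeps a loop that halves the vectorization factor while the header count is inconsistent with it. A simplified model of just that loop, using plain integers in place of GCC's profile_count and a made-up function name, might look like this:

```c
#include <assert.h>
#include <stdint.h>

/* Halve VF while the header count says the loop iterates more than once
   per entry but fewer than VF times per entry.  Plain integers stand in
   for profile_count; this models only the halving loop.  */
static unsigned clamp_vf_to_profile(uint64_t header_count,
                                    uint64_t entry_count,
                                    unsigned vf)
{
    assert(vf >= 1);
    while (vf > 1
           && header_count > entry_count
           && header_count < entry_count * vf)
        vf /= 2;
    return vf;
}
```

For example, a header count of 300 with an entry count of 100 (about three iterations per entry) reduces a factor of 8 down to 2, while a header count of 1000 leaves a factor of 4 unchanged.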

gcc/ChangeLog:

* tree-vect-loop.cc (scale_profile_for_vect_loop): Avoid scaling flat
profiles by vectorization factor.
(vect_transform_loop): Check for flat profiles.

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b44fb9c7712..d036a7d4480 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10837,11 +10837,25 @@ vect_get_loop_len (loop_vec_info loop_vinfo, 
gimple_stmt_iterator *gsi,
 }
 
 /* Scale profiling counters by estimation for LOOP which is vectorized
-   by factor VF.  */
+   by factor VF.
+   If FLAT is true, the loop we started with had unrealistically flat
+   profile.  */
 
 static void
-scale_profile_for_vect_loop (class loop *loop, unsigned vf)
+scale_profile_for_vect_loop (class loop *loop, unsigned vf, bool flat)
 {
+  /* For flat profiles do not scale down proportionally by VF and only
+ cap by known iteration count bounds.  */
+  if (flat)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"Vectorized loop profile seems flat; not scaling iteration "
+"count down by the vectorization factor %i\n", vf);
+  scale_loop_profile (loop, profile_probability::always (),
+ get_likely_max_loop_iterations_int (loop));
+  return;
+}
   /* Loop body executes VF fewer times and exit increases VF times.  */
   edge exit_e = single_exit (loop);
   profile_count entry_count = loop_preheader_edge (loop)->count ();
@@ -10852,7 +10866,13 @@ scale_profile_for_vect_loop (class loop *loop, 
unsigned vf)
   while (vf > 1
 && loop->header->count > entry_count
 && loop->header->count < entry_count * vf)
-vf /= 2;
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"Vectorization factor %i seems too large for profile "
+"prevoiusly believed to be consistent; reducing.\n", vf);
+  vf /= 2;
+}
 
   if (entry_count.nonzero_p ())
 set_edge_probability_and_rescale_others
@@ -11184,6 +11204,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
*loop_vectorized_call)
   gimple *stmt;
   bool check_profitability = false;
   unsigned int th;
+  bool flat = maybe_flat_loop_profile (loop);
 
   DUMP_VECT_SCOPE ("vec_transform_loop");
 
@@ -11252,7 +11273,6 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
*loop_vectorized_call)
  _vector, _vector_mult_vf, th,
  check_profitability, niters_no_overflow,
  );
-
   if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo)
   && LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo).initialized_p ())
 scale_loop_frequencies (LOOP_VINFO_SCALAR_LOOP (loop_vinfo),
@@ -11545,7 +11565,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
*loop_vectorized_call)
  assumed_vf) - 1
 : wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed,
   assumed_vf) - 1);
-  scale_profile_for_vect_loop (loop, assumed_vf);
+  scale_profile_for_vect_loop (loop, assumed_vf, flat);
 
   if (dump_enabled_p ())
 {


Re: [PATCH] bpf: fixed template for neg (added second operand)

2023-07-21 Thread David Faust via Gcc-patches
Hi Cupertino,

On 7/21/23 09:43, Cupertino Miranda wrote:
> gcc/ChangeLog:
> 
>   * config/bpf/bpf.md: fixed template for neg instruction.
> ---
>  gcc/config/bpf/bpf.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index 329f62f55c33..bb414d8a4428 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -142,7 +142,7 @@
>[(set (match_operand:AM 0 "register_operand" "=r")
>  (neg:AM (match_operand:AM 1 "register_operand" " 0")))]
>""
> -  "neg\t%0"
> +  "neg\t%0,%1"
>[(set_attr "type" "")])

I think you will need to update the constraint for the second
operand as well; it could be any register, or a 32-bit immediate.

>  
>  ;;; Multiplication


Re: [PATCH v2] bpf: fixed template for neg (added second operand)

2023-07-21 Thread Cupertino Miranda via Gcc-patches
>From 9db2044c1d20bd9f05acf3c910ad0ffc9d5fda8f Mon Sep 17 00:00:00 2001
From: Cupertino Miranda 
Date: Fri, 21 Jul 2023 17:40:07 +0100
Subject: [PATCH v2] bpf: fixed template for neg (added second operand)

gcc/ChangeLog:

	* config/bpf/bpf.md: fixed template for neg instruction.
---
 gcc/config/bpf/bpf.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 329f62f55c33..2ba862f3935a 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -139,10 +139,10 @@
 
 ;;; Negation
 (define_insn "neg2"
-  [(set (match_operand:AM 0 "register_operand" "=r")
-(neg:AM (match_operand:AM 1 "register_operand" " 0")))]
+  [(set (match_operand:AM 0 "register_operand" "=r")
+(neg:AM (match_operand:AM 1 "register_operand" " r")))]
   ""
-  "neg\t%0"
+  "neg\t%0,%1"
   [(set_attr "type" "")])
 
 ;;; Multiplication
-- 
2.30.2



[PATCH] bpf: fixed template for neg (added second operand)

2023-07-21 Thread Cupertino Miranda via Gcc-patches
gcc/ChangeLog:

* config/bpf/bpf.md: fixed template for neg instruction.
---
 gcc/config/bpf/bpf.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 329f62f55c33..bb414d8a4428 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -142,7 +142,7 @@
   [(set (match_operand:AM 0 "register_operand" "=r")
 (neg:AM (match_operand:AM 1 "register_operand" " 0")))]
   ""
-  "neg\t%0"
+  "neg\t%0,%1"
   [(set_attr "type" "")])
 
 ;;; Multiplication
-- 
2.30.2



Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Alexander Monakov


On Fri, 21 Jul 2023, Xi Ruoyao wrote:

> > See also PR 99903 for an earlier known issue which appears due to x87
> > excess precision and so tweaking -ffp-contract wouldn't help:
> > 
> >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99903
> 
> Does it affect AArch64 too?

Well, not literally (AArch64 doesn't have excess precision), but absence
of intermediate rounding in FMA is similar to excess precision.

I'm saying it's the same issue manifesting via different pathways on x86
and aarch64. Sorry if I misunderstood your question.

Alexander
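
(Editorial illustration, not from the thread.) The effect of skipping the intermediate rounding can be reproduced without any FMA hardware by working in float and using double for the exact product, since the product of two 24-bit significands fits in 53 bits. This is only a sketch: mul_sub_fused emulates a fused multiply-subtract exactly for the inputs shown in the note after the block, not in general (the double subtraction and final conversion may round twice), and both function names are made up.

```c
#include <assert.h>
#include <float.h>

/* a * b - c with the product rounded to float first.  The volatile
   store forces the intermediate rounding and blocks contraction.  */
static float mul_sub_rounded(float a, float b, float c)
{
    volatile float product = a * b;
    return product - c;
}

/* a * b - c with no rounding of the product, as a fused
   multiply-subtract would do.  */
static float mul_sub_fused(float a, float b, float c)
{
    /* The float-by-float product is exact in double.  */
    assert(2 * FLT_MANT_DIG <= DBL_MANT_DIG);
    double exact = (double)a * (double)b - (double)c;
    return (float)exact;
}
```

With a = b = 0x1.0008p0f (1 + 2^-13) and c = 0x1.001p0f (1 + 2^-12), the rounded version returns 0 while the fused version returns 2^-26: the same source expression yields different values depending on whether the compiler fuses it.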


Fix gcc.dg/tree-ssa/copy-headers-9.c and gcc.dg/tree-ssa/dce-1.c failures

2023-07-21 Thread Jan Hubicka via Gcc-patches
Hi,
this patch fixes template in the two testcases so it matches the output
correctly.  I did not re-test after last changes in the previous patch,
sorry for that.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/copy-headers-9.c: Fix template for 
tree-ssa-loop-ch.cc changes.
* gcc.dg/tree-ssa/dce-1.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c 
b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
index 7cc162ca94d..b49d1fc9576 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
@@ -13,8 +13,7 @@ void test (int m, int n)
}
while (i<10);
 }
-/* { dg-final { scan-tree-dump-times "Duplicating bb . is a win" 1 "ch2" } } */
-/* { dg-final { scan-tree-dump-times "May duplicate bb" 1 "ch2" } } */
-/* { dg-final { scan-tree-dump-times "Duplicating additional BB to obtain 
do-while loop" 1 "ch2" } } */
+/* { dg-final { scan-tree-dump-times "Duplicating bb . is a win" 2 "ch2" } } */
+/* { dg-final { scan-tree-dump-times "Duplicating bb . is a win. it has zero" 
1 "ch2" } } */
 /* { dg-final { scan-tree-dump-times "Will duplicate bb" 2 "ch2" } } */
 /* { dg-final { scan-tree-dump "is now do-while loop" "ch2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/dce-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/dce-1.c
index 91c3bcd6c1c..3ebfa988503 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/dce-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/dce-1.c
@@ -13,6 +13,6 @@ int foo (int b, int j)
 }
 /* Check that empty loop is eliminated in this case.  We should no longer have
the exit condition after the loop.  */
-/* { dg-final { scan-tree-dump-not "999)" "cddce1"} } */
-/* { dg-final { scan-tree-dump-not "1000)" "cddce1"} } */
+/* { dg-final { scan-tree-dump-not "999\\)" "cddce1"} } */
+/* { dg-final { scan-tree-dump-not "1000\\)" "cddce1"} } */
 


Implement flat loop profile detection

2023-07-21 Thread Jan Hubicka via Gcc-patches
Hi,
this patch adds maybe_flat_loop_profile, which can be used in loop profile update
to detect situations where the profile may be unrealistically flat and should
not be downscaled after vectorizing, unrolling and other transforms that
assume that the loop has a high iteration count even if the CFG profile says
otherwise.

Profile is flat if it was statically detected and at that time we had
no idea about actual number of iterations or we artificially capped them.
So the function considers flat all profiles that have guessed or lower
reliability in their count and there is no nb_iteration_bounds/estimate
which would prove that the profile iteration count is high enough.
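
A rough C sketch of the decision described above may help.  It is a
simplification, not the code in the patch: struct loop_summary and its
fields are hypothetical stand-ins for the information kept in the loop
structure, and reliable profiles are simply trusted here, whereas the patch
also cross-checks them against the recorded estimate.  The 9/8 margin
mirrors the sreal (9, -3) factor used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what is known about a loop; these fields only
   mimic the information the patch consults.  */
struct loop_summary
{
  double profile_iterations; /* iteration count implied by the CFG profile */
  bool reliable;             /* profile quality is better than "guessed" */
  bool has_upper_bound;
  uint64_t upper_bound;
  bool has_estimate;
  uint64_t estimate;
};

/* Return true if the profile of L may be unrealistically flat.  */
bool
maybe_flat_profile (const struct loop_summary *l)
{
  assert (l != 0);

  /* Simplification: trust reliable profiles outright.  */
  if (l->reliable)
    return false;

  /* Allow a 9/8 margin of error, then round to nearest.  */
  uint64_t scaled = (uint64_t) (l->profile_iterations * 9.0 / 8.0 + 0.5);

  /* If the profile already reaches a known bound or estimate, it is
     realistic enough and not considered flat.  */
  if (l->has_upper_bound && scaled >= l->upper_bound)
    return false;
  if (l->has_estimate && scaled >= l->estimate)
    return false;
  return true;
}
```

With a guessed profile of about 5 iterations, a loop with no bounds at all
is treated as flat, while the same loop with a known upper bound of 5 is
not.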

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* cfgloop.h (maybe_flat_loop_profile): Declare.
* cfgloopanal.cc (maybe_flat_loop_profile): New function.
* tree-cfg.cc (print_loop_info): Print info about flat profiles.

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 269694c7962..22293e1c237 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -407,6 +407,7 @@ gcov_type expected_loop_iterations_unbounded (const class 
loop *,
 extern bool expected_loop_iterations_by_profile (const class loop *loop,
 sreal *ret,
 bool *reliable = NULL);
+extern bool maybe_flat_loop_profile (const class loop *);
 extern unsigned expected_loop_iterations (class loop *);
 extern rtx doloop_condition_get (rtx_insn *);
 
diff --git a/gcc/cfgloopanal.cc b/gcc/cfgloopanal.cc
index c86a537f024..d8923b27e5d 100644
--- a/gcc/cfgloopanal.cc
+++ b/gcc/cfgloopanal.cc
@@ -303,6 +303,67 @@ expected_loop_iterations_by_profile (const class loop 
*loop, sreal *ret,
   return true;
 }
 
+/* Return true if loop CFG profile may be unrealistically flat.
+   This is a common case, since average loops iterate only about 5 times.
+   In the case we do not have profile feedback or do not know real number of
+   iterations during profile estimation, we are likely going to predict it with
+   similar low iteration count.  For static loop profiles we also artificially
+   cap profile of loops with known large iteration count so they do not appear
+   significantly more hot than other loops with unknown iteration counts.
+
+   For loop optimization heuristics we ignore CFG profile and instead
+   use get_estimated_loop_iterations API which returns estimate
+   only when it is realistic.  For unknown counts some optimizations,
+   like vectorizer or unroller make guess that iteration count will
+   be large.  In this case we need to avoid scaling down the profile
+   after the loop transform.  */
+
+bool
+maybe_flat_loop_profile (const class loop *loop)
+{
+  bool reliable;
+  sreal ret;
+
+  if (!expected_loop_iterations_by_profile (loop, &ret, &reliable))
+return true;
+
+  /* Reliable CFG estimates ought never be flat.  Sanity check with
+ nb_iterations_estimate.  If those differ, it is a bug in profile
+ updating code  */
+  if (reliable)
+{
+  int64_t intret = ret.to_nearest_int ();
+  if (loop->any_estimate
+ && (wi::ltu_p (intret * 2, loop->nb_iterations_estimate)
+ || wi::gtu_p (intret, loop->nb_iterations_estimate * 2)))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+   "Loop %i has inconsistent iterations estimates: "
+   "reliable CFG based iteration estimate is %f "
+   "while nb_iterations_estimate is %i\n",
+   loop->num,
+   ret.to_double (),
+   (int)loop->nb_iterations_estimate.to_shwi ());
+ return true;
+   }
+  return false;
+}
+
+  /* Allow some margin of error and see if we are close to known bounds.
+ sreal (9,-3) is 9/8  */
+  int64_t intret = (ret * sreal (9, -3)).to_nearest_int ();
+  if (loop->any_upper_bound && wi::geu_p (intret, 
loop->nb_iterations_upper_bound))
+return false;
+  if (loop->any_likely_upper_bound
+  && wi::geu_p (intret, loop->nb_iterations_likely_upper_bound))
+return false;
+  if (loop->any_estimate
+  && wi::geu_p (intret, loop->nb_iterations_estimate))
+return false;
+  return true;
+}
+
 /* Returns expected number of iterations of LOOP, according to
measured or guessed profile.
 
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index a6c97a04662..c65af8cc800 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -8523,8 +8523,11 @@ print_loop_info (FILE *file, const class loop *loop, 
const char *prefix)
   bool reliable;
   sreal iterations;
   if (loop->num && expected_loop_iterations_by_profile (loop, &iterations,
 &reliable))
-fprintf (file, "\n%siterations by profile: %f %s", prefix,
-iterations.to_double (), reliable ? "(reliable)" : "(unreliable)");
+{
+  fprintf (file, "\n%siterations by profile: %f (%s%s)", prefix,
+  iterations.to_double (), reliable ? "reliable" : 

[WIP RFC] analyzer: Add optional trim of the analyzer diagnostics going too deep [PR110543]

2023-07-21 Thread Benjamin Priour via Gcc-patches
Hi,

At David's request, I've attached the in-progress patch to the email below.
I hope it makes more sense now.

Best,
Benjamin.

-- Forwarded message -
From: Benjamin Priour 
Date: Tue, Jul 18, 2023 at 3:30 PM
Subject: [RFC] analyzer: Add optional trim of the analyzer diagnostics
going too deep [PR110543]
To: , David Malcolm 


Hi,

I'd like to request comments on a patch I am writing for PR110543.
The goal of this patch is to reduce the noise of the analyzer emitted
diagnostics when dealing with
system headers, or simply diagnostic paths that are too long. The new
option only affects the display
of the diagnostics, but doesn't hinder the actual analysis.

I've defaulted the new option to "system", thus preventing the diagnostic
paths from showing system headers.
"never" corresponds to the pre-patch behavior, whereas you can also specify
an unsigned value 
that prevents paths to go deeper than  frames.
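
To make the three accepted forms concrete, here is a small C sketch of how
such an option value could be decoded.  It is only an illustration under
assumed names: enum trim_mode, struct trim_setting and parse_trim_setting
are hypothetical and are not taken from the patch.  It relies on the value
having already been lowercased, as the ToLower flag in the option
definition suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoded form of the option value.  */
enum trim_mode { TRIM_NEVER, TRIM_SYSTEM, TRIM_DEPTH };

struct trim_setting
{
  enum trim_mode mode;
  unsigned long max_depth; /* only meaningful for TRIM_DEPTH */
};

/* Parse ARG into OUT; return false if ARG is not a recognised value.  */
bool
parse_trim_setting (const char *arg, struct trim_setting *out)
{
  assert (arg != 0 && out != 0);
  out->max_depth = 0;

  if (strcmp (arg, "never") == 0)
    {
      out->mode = TRIM_NEVER;
      return true;
    }
  if (strcmp (arg, "system") == 0)
    {
      out->mode = TRIM_SYSTEM;
      return true;
    }

  /* Otherwise expect a plain unsigned decimal number of frames.  Reject
     signs and leading whitespace, which strtoul would accept.  */
  if (arg[0] < '0' || arg[0] > '9')
    return false;

  char *end;
  errno = 0;
  unsigned long depth = strtoul (arg, &end, 10);
  if (errno != 0 || *end != '\0')
    return false;

  out->mode = TRIM_DEPTH;
  out->max_depth = depth;
  return true;
}
```

Rejecting anything other than the two keywords or a bare number keeps typos
such as "sytem" from being silently treated as a depth.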

fanalyzer-trim-diagnostics=
> Common Joined RejectNegative ToLower Var(flag_analyzer_trim_diagnostics)
> Init("system")
> -fanalyzer-trim-diagnostics=[never|system|] Trim diagnostics
> path that are too long before emission.
>

Does it sound reasonable and user-friendly?

Regstrapping was a success against trunk, although one of the newly added
test cases fails for c++14.
Note that the test case below was done with "never", thus behaves exactly
as the pre-patch analyzer
on x86_64-linux-gnu.

/* { dg-additional-options "-fdiagnostics-plain-output
> -fdiagnostics-path-format=inline-events -fanalyzer-trim-diagnostics=never"
> } */
> /* { dg-skip-if "" { c++98_only }  } */
>
> #include <memory>
> struct A {int x; int y;};
>
> int main () {
>   std::shared_ptr<A> a;
>   a->x = 4; /* { dg-line deref_a } */
>   /* { dg-warning "dereference of NULL" "" { target *-*-* } deref_a } */
>
>   return 0;
> }
>
> /* { dg-begin-multiline-output "" }
>   'int main()': events 1-2
> |
> |
> +--> 'std::__shared_ptr_access<_Tp, _Lp, , 
> >::element_type* std::__shared_ptr_access<_Tp, _Lp, ,
>  >::operator->() const [with _Tp = A; __gnu_cxx::_Lock_policy
> _Lp = __gnu_cxx::_S_atomic; bool  = false; bool  =
> false]': events 3-4
>|
>|
>+--> 'std::__shared_ptr_access<_Tp, _Lp, ,
>  >::element_type* std::__shared_ptr_access<_Tp, _Lp,
> ,  >::_M_get() const [with _Tp = A;
> __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic; bool  =
> false; bool  = false]': events 5-6
>   |
>   |
>   +--> 'std::__shared_ptr<_Tp, _Lp>::element_type*
> std::__shared_ptr<_Tp, _Lp>::get() const [with _Tp = A;
> __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]': events 7-8
>  |
>  |
>   <--+
>   |
> 'std::__shared_ptr_access<_Tp, _Lp, ,
>  >::element_type* std::__shared_ptr_access<_Tp, _Lp,
> ,  >::_M_get() const [with _Tp = A;
> __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic; bool  =
> false; bool  = false]': event 9
>   |
>   |
><--+
>|
>  'std::__shared_ptr_access<_Tp, _Lp, , 
> >::element_type* std::__shared_ptr_access<_Tp, _Lp, ,
>  >::operator->() const [with _Tp = A; __gnu_cxx::_Lock_policy
> _Lp = __gnu_cxx::_S_atomic; bool  = false; bool  =
> false]': event 10
>|
>|
> <--+
> |
>   'int main()': events 11-12
> |
> |
>{ dg-end-multiline-output "" } */
>


The first events "'int main()': events 1-2" vary in c++14 (get events 1-3).

>
> // c++14 with fully detailed output
>   ‘int main()’: events 1-3
> |
> |8 | int main () {
> |  | ^~~~
> |  | |
> |  | (1) entry to ‘main’
> |9 |   std::shared_ptr a;
> |  |  ~
> |  |  |
> |  |  (2)
> ‘a.std::shared_ptr::.std::__shared_ptr __gnu_cxx::_S_atomic>::_M_ptr’ is NULL
> |   10 |   a->x = 4; /* { dg-line deref_a } */
> |  |~~
> |  ||
> |  |(3) calling ‘std::__shared_ptr_access __gnu_cxx::_S_atomic, false, false>::operator->’ from ‘main’
>

whereas c++17 and later give

> // c++17 with fully detailed output
>
// ./xg++ -fanalyzer
>  ../../gcc/gcc/testsuite/g++.dg/analyzer/fanalyzer-trim-diagnostics-never.C
>  -B. -shared-libgcc -fanalyzer-trim-diagnostics=never -std=c++17
>
  ‘int main()’: events 1-2
> |
> |8 | int main () {
> |  | ^~~~
> |  | |
> |  | (1) entry to ‘main’
> |9 |   std::shared_ptr a;
> |   10 |   a->x = 4; /* { dg-line deref_a } */
> |  |~~
> |  ||
> |  |(2) calling ‘std::__shared_ptr_access __gnu_cxx::_S_atomic, false, false>::operator->’ from ‘main’
>

Is there a way to make dg-multiline-output check for a regex? Or would
checking the multiline-output only for c++17 and c++20 be acceptable?
This 

Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Matthew Malcomson via Gcc-patches

On 7/21/23 14:45, Xi Ruoyao wrote:

On Fri, 2023-07-21 at 14:11 +0100, Matthew Malcomson wrote:

My understanding is that this is not a hardware bug and that it's
specified that rounding does not happen on the multiply "sub-part" in
`FNMSUB`, but rounding happens on the `FMUL` that generates some input
to it.


AFAIK the C standard does only say "A floating *expression* may be
contracted".  I.e:

double r = a * b + c;

may be compiled to use FMA because "a * b + c" is a floating point
expression.  But

double t = a * b;
double r = t + c;

is not, because "a * b" and "t + c" are two separate floating point
expressions.

So a contraction across two functions is not allowed.  We now have -ffp-
contract=on (https://gcc.gnu.org/r14-2023) to only allow C-standard
contractions.

Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you
are building GCC 14 snapshot).  The default is "fast" (if no -std=
option is used), which allows some contractions disallowed by the
standard.

But GCC is in C++ and I'm not sure if the C++ standard has the same
definition for allowed contractions as C.



Thanks -- I'll look into whether `-ffp-contract=on` works.



It's possible that the test itself is flaky.  Can you provide some
detail about how it fails?



Sure -- The outline is that `timer::validate_phases` sees the sum of 
sub-part timers as greater than the timer for the "overall" time 
(outside of a tolerance of 1.01).  It then complains and hits 
`gcc_unreachable()`.


While I found it difficult to get enough information out of the test 
that is run in the testsuite, I found that if passing an invalid 
argument to `cc1plus` all sub-parts would be zero, and sometimes the 
"total" would be negative.


This was due to the `times` syscall returning the same clock tick for 
start and end of the "total" timer and the difference in rounding 
between FNMSUB and FMUL means that depending on what that clock tick is 
the "elapsed time" can end up calculated as negative.


I didn't prove it 100% but I believe the same fundamental difference 
(but opposite rounding error) could trigger the testsuite failure -- if 
the "end" of one sub-phase timer is greater than the "start" of another 
sub-phase timer then sum of parts could be greater than total.


There is a "tolerance" in this test that I considered increasing, but 
since that would not affect the "invalid arguments" thing (where the 
total is negative and hence the tolerance multiplication of 1.01 
would have to be supplemented by a positive offset) I suggested avoiding 
the inline.


W.r.t. the x86 bug that Alexander Monakov has pointed to, it's a very 
similar thing but in this case the problem is not bit-precision of 
values after the inlining, but rather a difference between fused and not 
fused operations after the inlining.


Agreed that using integral arithmetic is the more robust solution.
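
For illustration, a minimal C sketch of what integral timing could look
like.  This is not the timevar.cc code: ticks_to_ns, elapsed_ns and
phases_consistent are hypothetical names, and the tick rate is assumed to
be supplied by the caller (for example from sysconf (_SC_CLK_TCK)).  The
point is that two readings of the same clock tick always give an elapsed
time of exactly zero, and the 1.01 tolerance can be checked without any
floating-point rounding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NANOSEC_PER_SEC UINT64_C (1000000000)

/* Convert a tick count to nanoseconds using integer arithmetic only.
   The result is monotonic in TICKS, so a later reading never converts
   to a smaller value than an earlier one.  */
uint64_t
ticks_to_ns (uint64_t ticks, uint64_t ticks_per_sec)
{
  assert (ticks_per_sec != 0);
  /* Split into whole seconds and remainder so the multiplication stays
     well within 64 bits for realistic tick rates.  */
  uint64_t secs = ticks / ticks_per_sec;
  uint64_t rem = ticks % ticks_per_sec;
  return secs * NANOSEC_PER_SEC + rem * NANOSEC_PER_SEC / ticks_per_sec;
}

/* Elapsed time between two tick readings, END_TICKS >= START_TICKS.  */
uint64_t
elapsed_ns (uint64_t start_ticks, uint64_t end_ticks, uint64_t ticks_per_sec)
{
  assert (end_ticks >= start_ticks);
  return ticks_to_ns (end_ticks, ticks_per_sec)
	 - ticks_to_ns (start_ticks, ticks_per_sec);
}

/* Integer form of the check "sum of phases <= total * 1.01".  */
bool
phases_consistent (uint64_t phase_sum_ns, uint64_t total_ns)
{
  return phase_sum_ns <= total_ns + total_ns / 100;
}
```

Because every phase is a difference of the same monotonic integer
conversion, the sum of non-overlapping phases cannot exceed the enclosing
total through rounding alone.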


[PATCH] testsuite/110763: Ensure zero return from test

2023-07-21 Thread Siddhesh Poyarekar
The test deliberately reads beyond bounds to exercise ubsan and the
return value may be anything, based on previous allocations.  The OFF
test caters for it by ANDing the return with 0, do the same for the DYN
test.
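
As a small illustration of the masking idea (not the test itself):
whatever value the out-of-bounds read happens to produce, ANDing it with a
zero RET gives zero, while the read still feeds the result.  The helper
names below are hypothetical, and VALUE merely stands in for the
unpredictable loaded value.

```c
#include <assert.h>
#include <limits.h>

/* Combine the caller-supplied RET with the value that was read.  */
int
mask_result (int ret, int value)
{
  return ret & value;
}

/* Whatever VALUE the read produced, a zero RET forces a zero result.  */
void
check_masking (void)
{
  assert (mask_result (0, 0) == 0);
  assert (mask_result (0, 12345) == 0);
  assert (mask_result (0, -1) == 0);
  assert (mask_result (0, INT_MIN) == 0);
}
```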

gcc/testsuite/ChangeLog:

PR testsuite/110763
* gcc.dg/ubsan/object-size-dyn.c (dyn): New parameter RET.
(main): Use it.

Signed-off-by: Siddhesh Poyarekar 
---
 gcc/testsuite/gcc.dg/ubsan/object-size-dyn.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/ubsan/object-size-dyn.c 
b/gcc/testsuite/gcc.dg/ubsan/object-size-dyn.c
index 0159f5b9820..49c3abe2e72 100644
--- a/gcc/testsuite/gcc.dg/ubsan/object-size-dyn.c
+++ b/gcc/testsuite/gcc.dg/ubsan/object-size-dyn.c
@@ -5,12 +5,12 @@
 
 int
 __attribute__ ((noinline))
-dyn (int size, int i)
+dyn (int size, int i, int ret)
 {
   __builtin_printf ("dyn\n");
   fflush (stdout);
   int *alloc = __builtin_calloc (size, sizeof (int));
-  int ret = alloc[i];
+  ret = ret & alloc[i];
   __builtin_free (alloc);
   return ret;
 }
@@ -28,7 +28,7 @@ off (int size, int i, int ret)
 int
 main (void)
 {
-  int ret = dyn (2, 2);
+  int ret = dyn (2, 2, 0);
 
   ret |= off (4, 4, 0);
 
-- 
2.41.0



Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-07-21 at 16:58 +0300, Alexander Monakov wrote:
> 
> On Fri, 21 Jul 2023, Xi Ruoyao via Gcc-patches wrote:
> 
> > Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you
> > are building GCC 14 snapshot).  The default is "fast" (if no -std=
> > option is used), which allows some contractions disallowed by the
> > standard.
> 
> Not fully, see below.
> 
> > But GCC is in C++ and I'm not sure if the C++ standard has the same
> > definition for allowed contractions as C.
> 
> It doesn't, but in GCC we should aim to provide the same semantics in C++
> as in C.
> 
> > > (Or is the severity of lack of support sufficiently different in the two 
> > > cases that this is fine -- i.e. not compile vs may trigger floating 
> > > point rounding inaccuracies?)
> > 
> > It's possible that the test itself is flaky.  Can you provide some
> > detail about how it fails?
> 
> See also PR 99903 for an earlier known issue which appears due to x87
> excess precision and so tweaking -ffp-contract wouldn't help:
> 
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99903

Does it affect AArch64 too?

> Now that multiple platforms are hitting this, can we _please_ get rid
> of the questionable attempt to compute time in a floating-point variable
> and just use an uint64_t storing nanoseconds?

To me this is the correct thing to do.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v3] bpf: pseudo-c assembly dialect support

2023-07-21 Thread Jose E. Marchesi via Gcc-patches


> Thanks for the suggestions/fixes in changelog.
> Inlined new patch.
>
> Cupertino
>
>>> gcc/ChangeLog:
>>>
>>> * config/bpf/bpf.opt: Added option -masm=.
>>> * config/bpf/bpf-opts.h: Likewize.
>>> * config/bpf/bpf.cc: Changed it to conform with new pseudoc
>>>   dialect support.
>>> * config/bpf/bpf.h: Likewise.
>>> * config/bpf/bpf.md: Added pseudo-c templates.
>>> * doc/invoke.texi: (-masm=DIALECT) New eBPF option item.
>>
>> I think the ChangeLog could be made more useful, and the syntax of the
>> last entry is not entirely right.  I suggest something like:
>>
>>  * config/bpf/bpf.opt: Added option -masm=.
>>  * config/bpf/bpf-opts.h (enum bpf_asm_dialect): New type.
>>  * config/bpf/bpf.cc (bpf_print_register): New function.
>>  (bpf_print_register): Support pseudo-c syntax for registers.
>>  (bpf_print_operand_address): Likewise.
>>  * config/bpf/bpf.h (ASM_SPEC): handle -msasm.
>>  (ASSEMBLER_DIALECT): Define.
>>  * config/bpf/bpf.md: Added pseudo-c templates.
>>  * doc/invoke.texi (-masm=DIALECT): New eBPF option item.
>>
>> Please make sure to run the contrib/gcc-changelog/git_check-commit.py
>> script.
>>
>
> From 6ebe3229a59b32ffb2ed24b3a2cf8c360a807c31 Mon Sep 17 00:00:00 2001
> From: Cupertino Miranda 
> Date: Mon, 17 Jul 2023 17:42:42 +0100
> Subject: [PATCH v3] bpf: pseudo-c assembly dialect support
>
> New pseudo-c BPF assembly dialect already supported by clang and widely
> used in the linux kernel.
>
> gcc/ChangeLog:
>
>   * config/bpf/bpf.opt: Added option -masm=.
>   * config/bpf/bpf-opts.h (enum bpf_asm_dialect): New type.
>   * config/bpf/bpf.cc (bpf_print_register): New function.
>   (bpf_print_register): Support pseudo-c syntax for registers.
>   (bpf_print_operand_address): Likewise.
>   * config/bpf/bpf.h (ASM_SPEC): handle -msasm.
>   (ASSEMBLER_DIALECT): Define.
>   * config/bpf/bpf.md: Added pseudo-c templates.
>   * doc/invoke.texi (-masm=): New eBPF option item.
> ---
>  gcc/config/bpf/bpf-opts.h |  6 +++
>  gcc/config/bpf/bpf.cc | 46 ---
>  gcc/config/bpf/bpf.h  |  5 +-
>  gcc/config/bpf/bpf.md | 97 ---
>  gcc/config/bpf/bpf.opt| 14 ++
>  gcc/doc/invoke.texi   | 21 -
>  6 files changed, 133 insertions(+), 56 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf-opts.h b/gcc/config/bpf/bpf-opts.h
> index 8282351cf045..92db01ec4d54 100644
> --- a/gcc/config/bpf/bpf-opts.h
> +++ b/gcc/config/bpf/bpf-opts.h
> @@ -60,4 +60,10 @@ enum bpf_isa_version
>ISA_V3,
>  };
>  
> +enum bpf_asm_dialect
> +{
> +  ASM_NORMAL,
> +  ASM_PSEUDOC
> +};
> +
>  #endif /* ! BPF_OPTS_H */
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index e0324e1e0e08..1d3936871d60 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -873,16 +873,47 @@ bpf_output_call (rtx target)
>return "";
>  }
>  
> +/* Print register name according to assembly dialect.
> +   In normal syntax registers are printed like %rN where N is the
> +   register number.
> +   In pseudoc syntax, the register names do not feature a '%' prefix.
> +   Additionally, the code 'w' denotes that the register should be printed
> +   as wN instead of rN, where N is the register number, but only when the
> +   value stored in the operand OP is 32-bit wide.  */
> +static void
> +bpf_print_register (FILE *file, rtx op, int code)
> +{
> +  if(asm_dialect == ASM_NORMAL)
> +fprintf (file, "%s", reg_names[REGNO (op)]);
> +  else
> +{
> +  if (code == 'w' && GET_MODE (op) == SImode)
> + {
> +   if (REGNO (op) == BPF_FP)
> + fprintf (file, "w10");
> +   else
> + fprintf (file, "w%s", reg_names[REGNO (op)]+2);
> + }
> +  else
> + {
> +   if (REGNO (op) == BPF_FP)
> + fprintf (file, "r10");
> +   else
> + fprintf (file, "%s", reg_names[REGNO (op)]+1);
> + }
> +}
> +}
> +
>  /* Print an instruction operand.  This function is called in the macro
> PRINT_OPERAND defined in bpf.h */
>  
>  void
> -bpf_print_operand (FILE *file, rtx op, int code ATTRIBUTE_UNUSED)
> +bpf_print_operand (FILE *file, rtx op, int code)
>  {
>switch (GET_CODE (op))
>  {
>  case REG:
> -  fprintf (file, "%s", reg_names[REGNO (op)]);
> +  bpf_print_register (file, op, code);
>break;
>  case MEM:
>output_address (GET_MODE (op), XEXP (op, 0));
> @@ -936,7 +967,9 @@ bpf_print_operand_address (FILE *file, rtx addr)
>switch (GET_CODE (addr))
>  {
>  case REG:
> -  fprintf (file, "[%s+0]", reg_names[REGNO (addr)]);
> +  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
> +  bpf_print_register (file, addr, 0);
> +  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0)");
>break;
>  case PLUS:
>{
> @@ -945,9 +978,11 @@ bpf_print_operand_address (FILE *file, rtx addr)
>  
>   if (GET_CODE 

[PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]

2023-07-21 Thread Drew Ross via Gcc-patches
Simplifies (x << c) >> c where x is a signed integral type of
width >= int and c = precision(type) - 1 into -(x & 1). Tested successfully
on x86_64 and x86 targets.
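
As an informal check of the identity for a 32-bit int, the two forms can be
compared directly in C.  This is a sketch, not part of the patch: the
function names are made up, and the shift form assumes GCC's documented
behaviour for converting out-of-range values to a signed type (wrapping)
and for right-shifting negative values (arithmetic shift).  The left shift
is done on the unsigned type so the C code itself has no signed overflow.

```c
#include <assert.h>
#include <stdint.h>

/* (x << 31) >> 31, with the left shift performed on the unsigned type.  */
int32_t
shift_form (int32_t x)
{
  return (int32_t) ((uint32_t) x << 31) >> 31;
}

/* The simplified form: 0 for even x, -1 for odd x.  */
int32_t
mask_form (int32_t x)
{
  return -(x & 1);
}

/* Compare both forms over a small range and at the extremes.  */
void
check_equivalence (void)
{
  for (int32_t x = -1000; x <= 1000; x++)
    assert (shift_form (x) == mask_form (x));
  assert (shift_form (INT32_MIN) == mask_form (INT32_MIN));
  assert (shift_form (INT32_MAX) == mask_form (INT32_MAX));
}
```

Both forms depend only on the lowest bit of x: the left shift moves it into
the sign position and the arithmetic right shift replicates it across the
whole word.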

PR middle-end/101955

gcc/ChangeLog:

* match.pd (x << c) >> c -> -(x & 1): New simplification.

gcc/testsuite/ChangeLog:

* gcc.dg/pr101955.c: New test.
---
 gcc/match.pd| 10 +
 gcc/testsuite/gcc.dg/pr101955.c | 69 +
 2 files changed, 79 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr101955.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 8543f777a28..820fc890e8e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3766,6 +3766,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && (wi::ltu_p (wi::to_wide (@1), element_precision (type
   (bit_and @0 (rshift { build_minus_one_cst (type); } @1
 
+/* Optimize (X << C) >> C where C = precision(type) - 1 and X is signed
+   into -(X & 1).  */
+(simplify
+ (rshift (nop_convert? (lshift @0 uniform_integer_cst_p@1)) @@1)
+ (with { tree cst = uniform_integer_cst_p (@1); }
+ (if (ANY_INTEGRAL_TYPE_P (type)
+  && !TYPE_UNSIGNED (type)
+  && wi::eq_p (wi::to_wide (cst), element_precision (type) - 1))
+  (negate (bit_and (convert @0) { build_one_cst (type); })))))
+
 /* Optimize x >> x into 0 */
 (simplify
  (rshift @0 @0)
diff --git a/gcc/testsuite/gcc.dg/pr101955.c b/gcc/testsuite/gcc.dg/pr101955.c
new file mode 100644
index 000..386154911c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr101955.c
@@ -0,0 +1,69 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1 -Wno-psabi" } */
+
+typedef int v4si __attribute__((vector_size(4 * sizeof(int))));
+
+__attribute__((noipa)) int
+t1 (int x)
+{
+  return (x << 31) >> 31;
+}
+
+__attribute__((noipa)) int
+t2 (int x)
+{
+  int y = x << 31;
+  int z = y >> 31;
+  return z;
+}
+
+__attribute__((noipa)) int
+t3 (int x)
+{
+  int w = 31;
+  int y = x << w;
+  int z = y >> w;
+  return z;
+}
+
+__attribute__((noipa)) long long
+t4 (long long x)
+{
+  return (x << 63) >> 63;
+}
+
+__attribute__((noipa)) long long
+t5 (long long x)
+{
+  long long y = x << 63;
+  long long z = y >> 63;
+  return z;
+}
+
+__attribute__((noipa)) long long
+t6 (long long x)
+{
+  int w = 63;
+  long long y = x << w;
+  long long z = y >> w;
+  return z;
+}
+
+__attribute__((noipa)) v4si
+t7 (v4si x)
+{
+  return (x << 31) >> 31;
+}
+
+__attribute__((noipa)) v4si
+t8 (v4si x)
+{
+  v4si t = {31,31,31,31};
+  return (x << t) >> t;
+}
+
+/* { dg-final { scan-tree-dump-not " >> " "dse1" } } */
+/* { dg-final { scan-tree-dump-not " << " "dse1" } } */
+/* { dg-final { scan-tree-dump-times " -" 8 "dse1" } } */
+/* { dg-final { scan-tree-dump-times " & " 8 "dse1" } } */
+
-- 
2.39.3



Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-21 Thread Ben Boeckel via Gcc-patches
On Thu, Jul 20, 2023 at 17:00:32 -0400, Nathan Sidwell wrote:
> On 7/19/23 20:47, Ben Boeckel wrote:
> > But it is inhibiting distributed builds because the distributing tool
> > would need to know:
> > 
> > - what CMIs are actually imported (here, "read the module mapper file"
> >(in CMake's case, this is only the modules that are needed; a single
> >massive mapper file for an entire project would have extra entries) or
> >"act as a proxy for the socket/program specified" for other
> >approaches);
> 
> This information is in the machine (& human) README section of the CMI.

Ok. That leaves it up to distributing build tools to figure out at
least.

> > - read the CMIs as it sends to the remote side to gather any other CMIs
> >that may be needed (recursively);
> > 
> > Contrast this with the MSVC and Clang (17+) mechanism where the command
> > line contains everything that is needed and a single bolus can be sent.
> 
> um, the build system needs to create that command line? Where does the build 
> system get that information?  IIUC it'll need to read some file(s) to do that.

It's chained through the P1689 information in the collator as needed. No
extra files need to be read (at least with CMake's approach); certainly
not CMI files.

> > And relocatable is probably fine. How does it interact with reproducible
> > builds? Or are GCC CMIs not really something anyone should consider for
> > installation (even as a "here, maybe this can help consumers"
> > mechanism)?
> 
> Module CMIs should be considered a cacheable artifact.  They are neither 
> object 
> files nor source files.

Sure, cachable sounds fine. What about the installation?

--Ben


Re: [PATCH v3] bpf: pseudo-c assembly dialect support

2023-07-21 Thread Cupertino Miranda via Gcc-patches

Thanks for the suggestions/fixes in changelog.
Inlined new patch.

Cupertino

>> gcc/ChangeLog:
>>
>>  * config/bpf/bpf.opt: Added option -masm=.
>>  * config/bpf/bpf-opts.h: Likewize.
>>  * config/bpf/bpf.cc: Changed it to conform with new pseudoc
>>dialect support.
>>  * config/bpf/bpf.h: Likewise.
>>  * config/bpf/bpf.md: Added pseudo-c templates.
>>  * doc/invoke.texi: (-masm=DIALECT) New eBPF option item.
>
> I think the ChangeLog could be made more useful, and the syntax of the
> last entry is not entirely right.  I suggest something like:
>
>   * config/bpf/bpf.opt: Added option -masm=.
>   * config/bpf/bpf-opts.h (enum bpf_asm_dialect): New type.
>   * config/bpf/bpf.cc (bpf_print_register): New function.
>   (bpf_print_register): Support pseudo-c syntax for registers.
>   (bpf_print_operand_address): Likewise.
>   * config/bpf/bpf.h (ASM_SPEC): handle -msasm.
>   (ASSEMBLER_DIALECT): Define.
>   * config/bpf/bpf.md: Added pseudo-c templates.
>   * doc/invoke.texi (-masm=DIALECT): New eBPF option item.
>
> Please make sure to run the contrib/gcc-changelog/git_check-commit.py
> script.
>

From 6ebe3229a59b32ffb2ed24b3a2cf8c360a807c31 Mon Sep 17 00:00:00 2001
From: Cupertino Miranda 
Date: Mon, 17 Jul 2023 17:42:42 +0100
Subject: [PATCH v3] bpf: pseudo-c assembly dialect support

New pseudo-c BPF assembly dialect already supported by clang and widely
used in the linux kernel.
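
To give a feel for the difference between the two dialects, here is a
simplified C sketch of the naming rules as the patch below describes them:
%rN in the normal syntax, rN (or wN for 32-bit values) in pseudo-c, and
[reg+off] versus (reg+off) for addresses.  The function and enumerator
names are hypothetical, and the frame pointer special case handled in the
patch is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the dialect selected by -masm=.  */
enum asm_dialect { DIALECT_NORMAL, DIALECT_PSEUDOC };

/* Write the name of general register REGNO into BUF.  IS_32BIT only
   matters in the pseudo-c dialect, where it selects wN over rN.  */
void
format_register (char *buf, size_t size, enum asm_dialect dialect,
		 unsigned regno, int is_32bit)
{
  assert (buf != NULL && size > 0);
  if (dialect == DIALECT_NORMAL)
    snprintf (buf, size, "%%r%u", regno);
  else
    snprintf (buf, size, "%c%u", is_32bit ? 'w' : 'r', regno);
}

/* Write a register+offset address: "[%rN+OFF]" in the normal dialect,
   "(rN+OFF)" in pseudo-c.  */
void
format_address (char *buf, size_t size, enum asm_dialect dialect,
		unsigned regno, int offset)
{
  char reg[16];

  assert (buf != NULL && size > 0);
  format_register (reg, sizeof reg, dialect, regno, 0);
  if (dialect == DIALECT_NORMAL)
    snprintf (buf, size, "[%s+%d]", reg, offset);
  else
    snprintf (buf, size, "(%s+%d)", reg, offset);
}
```

For register 1 and offset 0 this produces "[%r1+0]" in the normal dialect
and "(r1+0)" in pseudo-c.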

gcc/ChangeLog:

	* config/bpf/bpf.opt: Added option -masm=.
	* config/bpf/bpf-opts.h (enum bpf_asm_dialect): New type.
	* config/bpf/bpf.cc (bpf_print_register): New function.
	(bpf_print_register): Support pseudo-c syntax for registers.
	(bpf_print_operand_address): Likewise.
	* config/bpf/bpf.h (ASM_SPEC): handle -msasm.
	(ASSEMBLER_DIALECT): Define.
	* config/bpf/bpf.md: Added pseudo-c templates.
	* doc/invoke.texi (-masm=): New eBPF option item.
---
 gcc/config/bpf/bpf-opts.h |  6 +++
 gcc/config/bpf/bpf.cc | 46 ---
 gcc/config/bpf/bpf.h  |  5 +-
 gcc/config/bpf/bpf.md | 97 ---
 gcc/config/bpf/bpf.opt| 14 ++
 gcc/doc/invoke.texi   | 21 -
 6 files changed, 133 insertions(+), 56 deletions(-)

diff --git a/gcc/config/bpf/bpf-opts.h b/gcc/config/bpf/bpf-opts.h
index 8282351cf045..92db01ec4d54 100644
--- a/gcc/config/bpf/bpf-opts.h
+++ b/gcc/config/bpf/bpf-opts.h
@@ -60,4 +60,10 @@ enum bpf_isa_version
   ISA_V3,
 };
 
+enum bpf_asm_dialect
+{
+  ASM_NORMAL,
+  ASM_PSEUDOC
+};
+
 #endif /* ! BPF_OPTS_H */
diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index e0324e1e0e08..1d3936871d60 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -873,16 +873,47 @@ bpf_output_call (rtx target)
   return "";
 }
 
+/* Print register name according to assembly dialect.
+   In normal syntax registers are printed like %rN where N is the
+   register number.
+   In pseudoc syntax, the register names do not feature a '%' prefix.
+   Additionally, the code 'w' denotes that the register should be printed
+   as wN instead of rN, where N is the register number, but only when the
+   value stored in the operand OP is 32-bit wide.  */
+static void
+bpf_print_register (FILE *file, rtx op, int code)
+{
+  if(asm_dialect == ASM_NORMAL)
+fprintf (file, "%s", reg_names[REGNO (op)]);
+  else
+{
+  if (code == 'w' && GET_MODE (op) == SImode)
+	{
+	  if (REGNO (op) == BPF_FP)
+	fprintf (file, "w10");
+	  else
+	fprintf (file, "w%s", reg_names[REGNO (op)]+2);
+	}
+  else
+	{
+	  if (REGNO (op) == BPF_FP)
+	fprintf (file, "r10");
+	  else
+	fprintf (file, "%s", reg_names[REGNO (op)]+1);
+	}
+}
+}
+
 /* Print an instruction operand.  This function is called in the macro
PRINT_OPERAND defined in bpf.h */
 
 void
-bpf_print_operand (FILE *file, rtx op, int code ATTRIBUTE_UNUSED)
+bpf_print_operand (FILE *file, rtx op, int code)
 {
   switch (GET_CODE (op))
 {
 case REG:
-  fprintf (file, "%s", reg_names[REGNO (op)]);
+  bpf_print_register (file, op, code);
   break;
 case MEM:
   output_address (GET_MODE (op), XEXP (op, 0));
@@ -936,7 +967,9 @@ bpf_print_operand_address (FILE *file, rtx addr)
   switch (GET_CODE (addr))
 {
 case REG:
-  fprintf (file, "[%s+0]", reg_names[REGNO (addr)]);
+  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
+  bpf_print_register (file, addr, 0);
+  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0)");
   break;
 case PLUS:
   {
@@ -945,9 +978,11 @@ bpf_print_operand_address (FILE *file, rtx addr)
 
 	if (GET_CODE (op0) == REG && GET_CODE (op1) == CONST_INT)
 	  {
-	fprintf (file, "[%s+", reg_names[REGNO (op0)]);
+	fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
+	bpf_print_register (file, op0, 0);
+	fprintf (file, "+");
 	output_addr_const (file, op1);
-	fputs ("]", file);
+	fprintf (file, asm_dialect == 

[pushed] Darwin: Handle linker '-demangle' option.

2023-07-21 Thread Iain Sandoe via Gcc-patches
Tested with Darwin linker versions that do/do not support the option
and on x86_64-linux-gnu, pushed to trunk, thanks
Iain

--- 8< ---

Most of the Darwin linkers in use support this option which we will
now pass by default (matching the Xcode clang impl.).

Signed-off-by: Iain Sandoe 

gcc/ChangeLog:

* config.in: Regenerate.
* config/darwin.h (DARWIN_LD_DEMANGLE): New.
(LINK_COMMAND_SPEC_A): Add demangle handling.
* configure: Regenerate.
* configure.ac: Detect linker support for '-demangle'.
---
 gcc/config.in   |  9 -
 gcc/config/darwin.h |  7 +++
 gcc/configure   | 19 +++
 gcc/configure.ac| 14 ++
 4 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/gcc/config.in b/gcc/config.in
index 0e62b9fbfc9..5cf51bc1b01 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -2178,6 +2178,12 @@
 #endif
 
 
+/* Define to 1 if ld64 supports '-demangle'. */
+#ifndef USED_FOR_TARGET
+#undef LD64_HAS_DEMANGLE
+#endif
+
+
 /* Define to 1 if ld64 supports '-export_dynamic'. */
 #ifndef USED_FOR_TARGET
 #undef LD64_HAS_EXPORT_DYNAMIC
@@ -2239,7 +2245,8 @@
 #endif
 
 
-/* Define to the sub-directory where libtool stores uninstalled libraries. */
+/* Define to the sub-directory in which libtool stores uninstalled libraries.
+   */
 #ifndef USED_FOR_TARGET
 #undef LT_OBJDIR
 #endif
diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index 1b538c73593..e0e8672a455 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -270,6 +270,12 @@ extern GTY(()) int darwin_ms_struct;
   "%&6; }
 gcc_cv_ld64_major=`echo "$gcc_cv_ld64_version" | sed -e 's/\..*//'`
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld64_major" >&5
 $as_echo "$gcc_cv_ld64_major" >&6; }
+if test "$gcc_cv_ld64_major" -ge 97; then
+  gcc_cv_ld64_demangle=1
+fi
 if test "$gcc_cv_ld64_major" -ge 236; then
   gcc_cv_ld64_export_dynamic=1
 fi
@@ -30517,6 +30521,15 @@ $as_echo_n "checking linker version... " >&6; }
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld64_version" >&5
 $as_echo "$gcc_cv_ld64_version" >&6; }
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking linker for -demangle support" >&5
+$as_echo_n "checking linker for -demangle support... " >&6; }
+gcc_cv_ld64_demangle=1
+if $gcc_cv_ld -demangle < /dev/null 2>&1 | grep 'unknown option' > /dev/null; then
+  gcc_cv_ld64_demangle=0
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld64_demangle" >&5
+$as_echo "$gcc_cv_ld64_demangle" >&6; }
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking linker for -export_dynamic support" >&5
 $as_echo_n "checking linker for -export_dynamic support... " >&6; }
 gcc_cv_ld64_export_dynamic=1
@@ -30545,6 +30558,12 @@ _ACEOF
   fi
 
 
+cat >>confdefs.h <<_ACEOF
+#define LD64_HAS_DEMANGLE $gcc_cv_ld64_demangle
+_ACEOF
+
+
+
 cat >>confdefs.h <<_ACEOF
 #define LD64_HAS_EXPORT_DYNAMIC $gcc_cv_ld64_export_dynamic
 _ACEOF
diff --git a/gcc/configure.ac b/gcc/configure.ac
index e91073ba831..46e58a27661 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -6211,6 +6211,7 @@ if test x"$ld64_flag" = x"yes"; then
   # Set defaults for possibly untestable items.
   gcc_cv_ld64_export_dynamic=0
   gcc_cv_ld64_platform_version=0
+  gcc_cv_ld64_demangle=0
 
   if test "$build" = "$host"; then
 darwin_try_test=1
@@ -6232,6 +6233,9 @@ if test x"$ld64_flag" = x"yes"; then
 AC_MSG_CHECKING(ld64 specified version)
 gcc_cv_ld64_major=`echo "$gcc_cv_ld64_version" | sed -e 's/\..*//'`
 AC_MSG_RESULT($gcc_cv_ld64_major)
+if test "$gcc_cv_ld64_major" -ge 97; then
+  gcc_cv_ld64_demangle=1
+fi
 if test "$gcc_cv_ld64_major" -ge 236; then
   gcc_cv_ld64_export_dynamic=1
 fi
@@ -6246,6 +6250,13 @@ if test x"$ld64_flag" = x"yes"; then
 fi
 AC_MSG_RESULT($gcc_cv_ld64_version)
 
+AC_MSG_CHECKING(linker for -demangle support)
+gcc_cv_ld64_demangle=1
+if $gcc_cv_ld -demangle < /dev/null 2>&1 | grep 'unknown option' > /dev/null; then
+  gcc_cv_ld64_demangle=0
+fi
+AC_MSG_RESULT($gcc_cv_ld64_demangle)
+
 AC_MSG_CHECKING(linker for -export_dynamic support)
 gcc_cv_ld64_export_dynamic=1
 if $gcc_cv_ld -export_dynamic < /dev/null 2>&1 | grep 'unknown option' > /dev/null; then
@@ -6266,6 +6277,9 @@ if test x"$ld64_flag" = x"yes"; then
   [Define to ld64 version.])
   fi
 
+  AC_DEFINE_UNQUOTED(LD64_HAS_DEMANGLE, $gcc_cv_ld64_demangle,
+  [Define to 1 if ld64 supports '-demangle'.])
+
   AC_DEFINE_UNQUOTED(LD64_HAS_EXPORT_DYNAMIC, $gcc_cv_ld64_export_dynamic,
   [Define to 1 if ld64 supports '-export_dynamic'.])
 
-- 
2.39.2 (Apple Git-143)



Fix sreal::to_int and implement sreal::to_nearest_int

2023-07-21 Thread Jan Hubicka via Gcc-patches
Fix sreal::to_int and implement sreal::to_nearest_int

While exploring the new loop estimate dumps, I noticed that a loop iterating 1.8
times by profile is estimated as iterating once instead of 2 by nb_estimate.
While nb_estimate should really be a sreal and I will convert it incrementally,
I found the problem is in the previous patch doing:

+ *nit = (snit + 0.5).to_int ();

This does not work for sreal because it only has a constructor from integer, so
0.5 is first rounded to 0 and then added to snit.

Some code uses sreal(1, -1), which produces 0.5, but it requires an unnecessary
addition, so I decided to add to_nearest_int.  Testing it, I noticed that to_int
is buggy:
  (sreal(3)/2).to_int () == 1
while
  (sreal(-3)/2).to_int () == -2
Probably not a big deal in practice, as we do not do conversions on
negative values.

The fix is easy: we need to do the shift on the positive (absolute) value and
then reapply the sign.  This patch fixes it and adds the to_nearest_int
alternative.
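
To illustrate the difference outside of GCC, here is a minimal sketch.  It
assumes a toy type (toy_real, with value sig * 2^exp) that only models the
negative-exponent case; the names toy_real, toy_abs, toy_sign and to_int_old
are made up for this example and are not part of sreal.

```cpp
#include <cstdint>

// Hypothetical stand-in for sreal: the value is sig * 2^exp.
// Only the exp < 0 case discussed above is modelled here.
struct toy_real
{
  int64_t sig;
  int exp;
};

static int64_t toy_abs (int64_t v) { return v < 0 ? -v : v; }
static int64_t toy_sign (int64_t v) { return v < 0 ? -1 : 1; }

// Old behaviour: an arithmetic right shift of a negative value rounds
// toward minus infinity (implementation-defined before C++20, but this
// is what GCC does).
int64_t to_int_old (toy_real r) { return r.sig >> -r.exp; }

// Fixed behaviour: shift the magnitude, then reapply the sign, which
// truncates toward zero for both signs.
int64_t to_int_fixed (toy_real r)
{
  return toy_sign (r.sig) * (toy_abs (r.sig) >> -r.exp);
}

// Round to nearest: add the most significant discarded bit.
int64_t to_nearest_int (toy_real r)
{
  int64_t mag = toy_abs (r.sig);
  return toy_sign (r.sig) * ((mag >> -r.exp) + ((mag >> (-r.exp - 1)) & 1));
}
```

With toy_real{-3, -1} (that is, -1.5) the old conversion gives -2, the fixed
one gives -1 and the nearest one gives -2; with toy_real{3, -1} they give 1, 1
and 2 respectively.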

Bootstrapped/regtested x86_64-linux, will commit it shortly.

gcc/ChangeLog:

* sreal.cc (sreal::to_nearest_int): New.
(sreal_verify_basics): Verify also to_nearest_int.
(verify_aritmetics): Likewise.
(sreal_verify_conversions): New.
(sreal_cc_tests): Call sreal_verify_conversions.
* sreal.h (sreal::to_nearest_int): Declare.

diff --git a/gcc/sreal.cc b/gcc/sreal.cc
index 8e99d871420..606a571e339 100644
--- a/gcc/sreal.cc
+++ b/gcc/sreal.cc
@@ -116,7 +116,26 @@ sreal::to_int () const
   if (m_exp > 0)
 return sign * (SREAL_ABS ((int64_t)m_sig) << m_exp);
   if (m_exp < 0)
-return m_sig >> -m_exp;
+return sign * (SREAL_ABS ((int64_t)m_sig) >> -m_exp);
+  return m_sig;
+}
+
+/* Return nearest integer value of *this.  */
+
+int64_t
+sreal::to_nearest_int () const
+{
+  int64_t sign = SREAL_SIGN (m_sig);
+
+  if (m_exp <= -SREAL_BITS)
+return 0;
+  if (m_exp >= SREAL_PART_BITS)
+return sign * INTTYPE_MAXIMUM (int64_t);
+  if (m_exp > 0)
+return sign * (SREAL_ABS ((int64_t)m_sig) << m_exp);
+  if (m_exp < 0)
+return sign * ((SREAL_ABS ((int64_t)m_sig) >> -m_exp)
+  + ((SREAL_ABS (m_sig) >> (-m_exp - 1)) & 1));
   return m_sig;
 }
 
@@ -286,6 +305,8 @@ sreal_verify_basics (void)
 
   ASSERT_EQ (INT_MIN/2, minimum.to_int ());
   ASSERT_EQ (INT_MAX/2, maximum.to_int ());
+  ASSERT_EQ (INT_MIN/2, minimum.to_nearest_int ());
+  ASSERT_EQ (INT_MAX/2, maximum.to_nearest_int ());
 
   ASSERT_FALSE (minus_two < minus_two);
   ASSERT_FALSE (seven < seven);
@@ -315,6 +336,10 @@ verify_aritmetics (int64_t a, int64_t b)
   ASSERT_EQ (a - b, (sreal (a) - sreal (b)).to_int ());
   ASSERT_EQ (b + a, (sreal (b) + sreal (a)).to_int ());
   ASSERT_EQ (b - a, (sreal (b) - sreal (a)).to_int ());
+  ASSERT_EQ (a + b, (sreal (a) + sreal (b)).to_nearest_int ());
+  ASSERT_EQ (a - b, (sreal (a) - sreal (b)).to_nearest_int ());
+  ASSERT_EQ (b + a, (sreal (b) + sreal (a)).to_nearest_int ());
+  ASSERT_EQ (b - a, (sreal (b) - sreal (a)).to_nearest_int ());
 }
 
 /* Verify arithmetics for interesting numbers.  */
@@ -377,6 +402,33 @@ sreal_verify_negative_division (void)
   ASSERT_EQ (sreal (1234567) / sreal (-1234567), sreal (-1));
 }
 
+static void
+sreal_verify_conversions (void)
+{
+  ASSERT_EQ ((sreal (11) / sreal (3)).to_int (), 3);
+  ASSERT_EQ ((sreal (11) / sreal (3)).to_nearest_int (), 4);
+  ASSERT_EQ ((sreal (10) / sreal (3)).to_int (), 3);
+  ASSERT_EQ ((sreal (10) / sreal (3)).to_nearest_int (), 3);
+  ASSERT_EQ ((sreal (9) / sreal (3)).to_int (), 3);
+  ASSERT_EQ ((sreal (9) / sreal (3)).to_nearest_int (), 3);
+  ASSERT_EQ ((sreal (-11) / sreal (3)).to_int (), -3);
+  ASSERT_EQ ((sreal (-11) / sreal (3)).to_nearest_int (), -4);
+  ASSERT_EQ ((sreal (-10) / sreal (3)).to_int (), -3);
+  ASSERT_EQ ((sreal (-10) / sreal (3)).to_nearest_int (), -3);
+  ASSERT_EQ ((sreal (-3)).to_int (), -3);
+  ASSERT_EQ ((sreal (-3)).to_nearest_int (), -3);
+  for (int i = -10 ; i < 10; i += 123)
+for (int j = -1 ; j < 10; j += 71)
+  if (j != 0)
+   {
+ sreal sval = ((sreal)i) / (sreal)j;
+ double val = (double)i / (double)j;
+ ASSERT_EQ ((fabs (sval.to_double () - val) < 0.1), true);
+ ASSERT_EQ (sval.to_int (), (int)val);
+ ASSERT_EQ (sval.to_nearest_int (), lround (val));
+   }
+}
+
 /* Run all of the selftests within this file.  */
 
 void sreal_cc_tests ()
@@ -385,6 +437,7 @@ void sreal_cc_tests ()
   sreal_verify_arithmetics ();
   sreal_verify_shifting ();
   sreal_verify_negative_division ();
+  sreal_verify_conversions ();
 }
 
 } // namespace selftest
diff --git a/gcc/sreal.h b/gcc/sreal.h
index 8700807a131..4dbb83c3005 100644
--- a/gcc/sreal.h
+++ b/gcc/sreal.h
@@ -51,6 +51,7 @@ public:
 
   void dump (FILE *) const;
   int64_t to_int () const;
+  int64_t to_nearest_int () const;
   double to_double () const;
   void stream_out (struct output_block *);
   static sreal stream_in (class lto_input_block *);


Re: [PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file

2023-07-21 Thread Lehua Ding
Hi Martin,


Thank you for telling me about the Python code format specification.
I have no idea how to add checks for pushed commits.
In any case, I will first make sure I don't introduce new format errors myself.


Best,
Lehua

Re: [PATCH v2] bpf: pseudo-c assembly dialect support

2023-07-21 Thread Jose E. Marchesi via Gcc-patches


> gcc/ChangeLog:
>
>   * config/bpf/bpf.opt: Added option -masm=.
>   * config/bpf/bpf-opts.h: Likewize.
>   * config/bpf/bpf.cc: Changed it to conform with new pseudoc
> dialect support.
>   * config/bpf/bpf.h: Likewise.
>   * config/bpf/bpf.md: Added pseudo-c templates.
>   * doc/invoke.texi: (-masm=DIALECT) New eBPF option item.

I think the ChangeLog could be made more useful, and the syntax of the
last entry is not entirely right.  I suggest something like:

* config/bpf/bpf.opt: Added option -masm=.
* config/bpf/bpf-opts.h (enum bpf_asm_dialect): New type.
* config/bpf/bpf.cc (bpf_print_register): New function.
(bpf_print_operand): Support pseudo-c syntax for registers.
(bpf_print_operand_address): Likewise.
* config/bpf/bpf.h (ASM_SPEC): Handle -masm.
(ASSEMBLER_DIALECT): Define.
* config/bpf/bpf.md: Added pseudo-c templates.
* doc/invoke.texi (-masm=DIALECT): New eBPF option item.

Please make sure to run the contrib/gcc-changelog/git_check-commit.py
script.

> ---
>  gcc/config/bpf/bpf-opts.h |  6 +++
>  gcc/config/bpf/bpf.cc | 46 ---
>  gcc/config/bpf/bpf.h  |  5 +-
>  gcc/config/bpf/bpf.md | 97 ---
>  gcc/config/bpf/bpf.opt| 14 ++
>  gcc/doc/invoke.texi   | 21 -
>  6 files changed, 133 insertions(+), 56 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf-opts.h b/gcc/config/bpf/bpf-opts.h
> index 8282351cf045..92db01ec4d54 100644
> --- a/gcc/config/bpf/bpf-opts.h
> +++ b/gcc/config/bpf/bpf-opts.h
> @@ -60,4 +60,10 @@ enum bpf_isa_version
>ISA_V3,
>  };
>  
> +enum bpf_asm_dialect
> +{
> +  ASM_NORMAL,
> +  ASM_PSEUDOC
> +};
> +
>  #endif /* ! BPF_OPTS_H */
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index e0324e1e0e08..1d3936871d60 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -873,16 +873,47 @@ bpf_output_call (rtx target)
>return "";
>  }
>  
> +/* Print register name according to assembly dialect.
> +   In normal syntax registers are printed like %rN where N is the
> +   register number.
> +   In pseudoc syntax, the register names do not feature a '%' prefix.
> +   Additionally, the code 'w' denotes that the register should be printed
> +   as wN instead of rN, where N is the register number, but only when the
> +   value stored in the operand OP is 32-bit wide.  */
> +static void
> +bpf_print_register (FILE *file, rtx op, int code)
> +{
> +  if(asm_dialect == ASM_NORMAL)
> +fprintf (file, "%s", reg_names[REGNO (op)]);
> +  else
> +{
> +  if (code == 'w' && GET_MODE (op) == SImode)
> + {
> +   if (REGNO (op) == BPF_FP)
> + fprintf (file, "w10");
> +   else
> + fprintf (file, "w%s", reg_names[REGNO (op)]+2);
> + }
> +  else
> + {
> +   if (REGNO (op) == BPF_FP)
> + fprintf (file, "r10");
> +   else
> + fprintf (file, "%s", reg_names[REGNO (op)]+1);
> + }
> +}
> +}
> +
>  /* Print an instruction operand.  This function is called in the macro
> PRINT_OPERAND defined in bpf.h */
>  
>  void
> -bpf_print_operand (FILE *file, rtx op, int code ATTRIBUTE_UNUSED)
> +bpf_print_operand (FILE *file, rtx op, int code)
>  {
>switch (GET_CODE (op))
>  {
>  case REG:
> -  fprintf (file, "%s", reg_names[REGNO (op)]);
> +  bpf_print_register (file, op, code);
>break;
>  case MEM:
>output_address (GET_MODE (op), XEXP (op, 0));
> @@ -936,7 +967,9 @@ bpf_print_operand_address (FILE *file, rtx addr)
>switch (GET_CODE (addr))
>  {
>  case REG:
> -  fprintf (file, "[%s+0]", reg_names[REGNO (addr)]);
> +  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
> +  bpf_print_register (file, addr, 0);
> +  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0)");
>break;
>  case PLUS:
>{
> @@ -945,9 +978,11 @@ bpf_print_operand_address (FILE *file, rtx addr)
>  
>   if (GET_CODE (op0) == REG && GET_CODE (op1) == CONST_INT)
> {
> - fprintf (file, "[%s+", reg_names[REGNO (op0)]);
> + fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
> + bpf_print_register (file, op0, 0);
> + fprintf (file, "+");
>   output_addr_const (file, op1);
> - fputs ("]", file);
> + fprintf (file, asm_dialect == ASM_NORMAL ? "]" : ")");
> }
>   else
> fatal_insn ("invalid address in operand", addr);
> @@ -1816,7 +1851,6 @@ handle_attr_preserve (function *fn)
>  }
>  }
>  
> -
>  /* This pass finds accesses to structures marked with the BPF target attribute
> __attribute__((preserve_access_index)). For every such access, a CO-RE
> relocation record is generated, to be output in the .BTF.ext section.  */
> diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h
> index 344aca02d1bb..9561bf59b800 100644
> --- a/gcc/config/bpf/bpf.h

Re: [PATCH v2] bpf: pseudo-c assembly dialect support

2023-07-21 Thread Cupertino Miranda via Gcc-patches

Hi Jose,

Thanks for the review.
The new patch is attached inline.

Regards,
Cupertino

Jose E. Marchesi writes:

> Hello Cuper.
>
> Thanks for the patch.
>
> We will need an update for the "eBPF Options" section in the GCC manual,
> documenting -masm=@var{dialect} and the supported values.  Can you
> please add it and re-submit?
>
>
>> Hi everyone,
>>
>> Looking forward to all your reviews.
>>
>> Best regards,
>> Cupertino


From fa227fefd84e6eaaf8edafed698e9960d7b115e6 Mon Sep 17 00:00:00 2001
From: Cupertino Miranda 
Date: Mon, 17 Jul 2023 17:42:42 +0100
Subject: [PATCH v2] bpf: pseudo-c assembly dialect support

New pseudo-c BPF assembly dialect already supported by clang and widely
used in the linux kernel.

gcc/ChangeLog:

	* config/bpf/bpf.opt: Added option -masm=.
	* config/bpf/bpf-opts.h: Likewize.
	* config/bpf/bpf.cc: Changed it to conform with new pseudoc
	  dialect support.
	* config/bpf/bpf.h: Likewise.
	* config/bpf/bpf.md: Added pseudo-c templates.
	* doc/invoke.texi: (-masm=DIALECT) New eBPF option item.
---
 gcc/config/bpf/bpf-opts.h |  6 +++
 gcc/config/bpf/bpf.cc | 46 ---
 gcc/config/bpf/bpf.h  |  5 +-
 gcc/config/bpf/bpf.md | 97 ---
 gcc/config/bpf/bpf.opt| 14 ++
 gcc/doc/invoke.texi   | 21 -
 6 files changed, 133 insertions(+), 56 deletions(-)

diff --git a/gcc/config/bpf/bpf-opts.h b/gcc/config/bpf/bpf-opts.h
index 8282351cf045..92db01ec4d54 100644
--- a/gcc/config/bpf/bpf-opts.h
+++ b/gcc/config/bpf/bpf-opts.h
@@ -60,4 +60,10 @@ enum bpf_isa_version
   ISA_V3,
 };
 
+enum bpf_asm_dialect
+{
+  ASM_NORMAL,
+  ASM_PSEUDOC
+};
+
 #endif /* ! BPF_OPTS_H */
diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index e0324e1e0e08..1d3936871d60 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -873,16 +873,47 @@ bpf_output_call (rtx target)
   return "";
 }
 
+/* Print register name according to assembly dialect.
+   In normal syntax registers are printed like %rN where N is the
+   register number.
+   In pseudoc syntax, the register names do not feature a '%' prefix.
+   Additionally, the code 'w' denotes that the register should be printed
+   as wN instead of rN, where N is the register number, but only when the
+   value stored in the operand OP is 32-bit wide.  */
+static void
+bpf_print_register (FILE *file, rtx op, int code)
+{
+  if(asm_dialect == ASM_NORMAL)
+fprintf (file, "%s", reg_names[REGNO (op)]);
+  else
+{
+  if (code == 'w' && GET_MODE (op) == SImode)
+	{
+	  if (REGNO (op) == BPF_FP)
+	fprintf (file, "w10");
+	  else
+	fprintf (file, "w%s", reg_names[REGNO (op)]+2);
+	}
+  else
+	{
+	  if (REGNO (op) == BPF_FP)
+	fprintf (file, "r10");
+	  else
+	fprintf (file, "%s", reg_names[REGNO (op)]+1);
+	}
+}
+}
+
 /* Print an instruction operand.  This function is called in the macro
PRINT_OPERAND defined in bpf.h */
 
 void
-bpf_print_operand (FILE *file, rtx op, int code ATTRIBUTE_UNUSED)
+bpf_print_operand (FILE *file, rtx op, int code)
 {
   switch (GET_CODE (op))
 {
 case REG:
-  fprintf (file, "%s", reg_names[REGNO (op)]);
+  bpf_print_register (file, op, code);
   break;
 case MEM:
   output_address (GET_MODE (op), XEXP (op, 0));
@@ -936,7 +967,9 @@ bpf_print_operand_address (FILE *file, rtx addr)
   switch (GET_CODE (addr))
 {
 case REG:
-  fprintf (file, "[%s+0]", reg_names[REGNO (addr)]);
+  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
+  bpf_print_register (file, addr, 0);
+  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0)");
   break;
 case PLUS:
   {
@@ -945,9 +978,11 @@ bpf_print_operand_address (FILE *file, rtx addr)
 
 	if (GET_CODE (op0) == REG && GET_CODE (op1) == CONST_INT)
 	  {
-	fprintf (file, "[%s+", reg_names[REGNO (op0)]);
+	fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
+	bpf_print_register (file, op0, 0);
+	fprintf (file, "+");
 	output_addr_const (file, op1);
-	fputs ("]", file);
+	fprintf (file, asm_dialect == ASM_NORMAL ? "]" : ")");
 	  }
 	else
 	  fatal_insn ("invalid address in operand", addr);
@@ -1816,7 +1851,6 @@ handle_attr_preserve (function *fn)
 }
 }
 
-
 /* This pass finds accesses to structures marked with the BPF target attribute
__attribute__((preserve_access_index)). For every such access, a CO-RE
relocation record is generated, to be output in the .BTF.ext section.  */
diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h
index 344aca02d1bb..9561bf59b800 100644
--- a/gcc/config/bpf/bpf.h
+++ b/gcc/config/bpf/bpf.h
@@ -22,7 +22,8 @@
 
 /**** Controlling the Compilation Driver.  */
 
-#define ASM_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL} %{mxbpf:-mxbpf}"
+#define ASM_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL} %{mxbpf:-mxbpf} " \
+  "%{masm=pseudoc:-mdialect=pseudoc}"
 #define LINK_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL}"
 #define LIB_SPEC ""
 #define 

Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Alexander Monakov


On Fri, 21 Jul 2023, Xi Ruoyao via Gcc-patches wrote:

> Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you
> are building GCC 14 snapshot).  The default is "fast" (if no -std=
> option is used), which allows some contractions disallowed by the
> standard.

Not fully, see below.

> But GCC is in C++ and I'm not sure if the C++ standard has the same
> definition for allowed contractions as C.

It doesn't, but in GCC we should aim to provide the same semantics in C++
as in C.

> > (Or is the severity of lack of support sufficiently different in the two 
> > cases that this is fine -- i.e. not compile vs may trigger floating 
> > point rounding inaccuracies?)
> 
> It's possible that the test itself is flaky.  Can you provide some
> detail about how it fails?

See also PR 99903 for an earlier known issue which appears due to x87
excess precision and so tweaking -ffp-contract wouldn't help:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99903

Now that multiple platforms are hitting this, can we _please_ get rid
of the questionable attempt to compute time in a floating-point variable
and just use a uint64_t storing nanoseconds?
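
A minimal sketch of what that could look like, assuming std::chrono is
acceptable as the clock source.  The names now_ns, ns_timer and to_seconds
are hypothetical and are not the timevar.cc interface:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not the timevar.cc API): monotonic time in ns.
inline uint64_t now_ns ()
{
  using namespace std::chrono;
  return static_cast<uint64_t> (
    duration_cast<nanoseconds> (steady_clock::now ().time_since_epoch ())
      .count ());
}

// Hypothetical accumulator: integer subtraction and addition are exact,
// so there is nothing for the compiler to contract or round.
struct ns_timer
{
  uint64_t start_ns = 0;
  uint64_t elapsed_ns = 0;

  void start () { start_ns = now_ns (); }
  void stop () { elapsed_ns += now_ns () - start_ns; }
};

// Convert to floating point only at the reporting boundary.
inline double to_seconds (uint64_t ns) { return ns * 1e-9; }
```

The point of the design is that start and stop see exactly the same integer
values, so excess precision and FMA contraction cannot make them disagree.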

Alexander


Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-07-21 at 14:11 +0100, Matthew Malcomson wrote:
> My understanding is that this is not a hardware bug and that it's 
> specified that rounding does not happen on the multiply "sub-part" in 
> `FNMSUB`, but rounding happens on the `FMUL` that generates some input
> to it.

AFAIK the C standard only says "A floating *expression* may be
contracted".  I.e.:

double r = a * b + c;

may be compiled to use FMA because "a * b + c" is a single floating-point
expression.  But

double t = a * b;
double r = t + c;

may not, because "a * b" and "t + c" are two separate floating-point
expressions.

So a contraction across two functions is not allowed.  We now have
-ffp-contract=on (https://gcc.gnu.org/r14-2023) to only allow C-standard
contractions.
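
To make the difference observable, here is a small sketch that forces each
behaviour explicitly instead of relying on what the compiler chooses.  The
function names are made up for the example; the volatile temporary is only
there to guarantee that the product is rounded to double before the add.

```cpp
#include <cmath>

// Two roundings: the product is rounded to double before the add.
// The volatile store keeps the compiler from fusing the two steps.
double mul_then_add (double a, double b, double c)
{
  volatile double t = a * b;
  return t + c;
}

// One rounding: std::fma computes a * b + c as if to infinite precision
// and rounds only once at the end.
double fused (double a, double b, double c) { return std::fma (a, b, c); }
```

With a = 1 + 2^-30, b = 1 - 2^-30 and c = -1, the exact product is 1 - 2^-60,
which rounds to 1.0 in double.  So mul_then_add should return 0, while fused
should return -2^-60.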

Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you
are building a GCC 14 snapshot).  The default is "fast" (if no -std=
option is used), which allows some contractions disallowed by the
standard.

But GCC is written in C++ and I'm not sure if the C++ standard has the same
definition for allowed contractions as C.

> I can look into `-ffp-contract=off` as you both have recommended.
> One question -- if we have concerns that the host compiler may not be 
> able to handle `__attribute__((noinline))` would we also be concerned that
> this flag may not be supported?

Only use it in BOOT_CFLAGS, i.e. 'make BOOT_CFLAGS="-O2 -g
-ffp-contract=on"' (or "off" instead of "on").  In 3-stage bootstrapping it's
only applied in stages 2 and 3, during which GCC is compiled by itself.

> (Or is the severity of lack of support sufficiently different in the two 
> cases that this is fine -- i.e. not compile vs may trigger floating 
> point rounding inaccuracies?)

It's possible that the test itself is flaky.  Can you provide some
detail about how it fails?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Richard Biener via Gcc-patches



> On 21.07.2023 at 15:12, Matthew Malcomson wrote:
> 
> Responding to two emails at the same time ;-)
> 
>> On 7/21/23 13:47, Richard Biener wrote:
>>> On Fri, 21 Jul 2023, Matthew Malcomson wrote:
>>> On some AArch64 bootstrapped builds, we were getting a flaky test
>>> because the floating point operations in `get_time` were being fused
>>> with the floating point operations in `timevar_accumulate`.
>>> 
>>> This meant that the rounding behaviour of our multiplication with
>>> `ticks_to_msec` was different when used in `timer::start` and when
>>> performed in `timer::stop`.  These extra inaccuracies led to the
>>> testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
>>> 
>>> This change ensures those operations are not fused and hence stops the test
>>> being flaky on that particular machine.  There is no expected change in the
>>> generated code.
>>> Bootstrap & regtest on AArch64 passes with no regressions.
>> I think this is undesirable.  With fused you mean we use FMA?
>> I think you could use -ffp-contract=off for the TU instead.
> 
> Yeah -- we used fused multiply subtract because we combined the multiply in 
> `get_time` with the subtract in `timevar_accumulate`.
> 
>> Note you can't use __attribute__((noinline)) literally since the
>> host compiler might not support this.
>> Richard.
> 
> On 7/21/23 13:49, Xi Ruoyao wrote:
> ...
>> I don't think it's correct.  It will break bootstrapping GCC from other
>> ISO C++11 compilers, you need to at least guard it with #ifdef __GNUC__.
>> And IMO it's just hiding the real problem.
>> We need more info of the "particular machine".  Is this a hardware bug
>> (i.e. the machine violates the AArch64 spec) or a GCC code generation
>> issue?  Or should we generally use -ffp-contract=off in BOOT_CFLAGS?
> 
> My understanding is that this is not a hardware bug and that it's specified 
> that rounding does not happen on the multiply "sub-part" in `FNMSUB`, but 
> rounding happens on the `FMUL` that generates some input to it.
> 
> I was given to understand from discussions with others that this codegen is 
> allowed -- though I honestly didn't confirm the line of reasoning through all 
> the relevant standards.
> 
> 
> 
> W.r.t. both:
> Thanks for pointing out bootstrapping from other ISO C++ compilers -- (didn't 
> realise that was a concern).
> 
> I can look into `-ffp-contract=off` as you both have recommended.
> One question -- if we have concerns that the host compiler may not be able to 
> handle `__attribute__((noinline))` would we also be concerned that this flag may 
> not be supported?
> (Or is the severity of lack of support sufficiently different in the two 
> cases that this is fine -- i.e. not compile vs may trigger floating point 
> rounding inaccuracies?)

I’d only use it in stage2+ flags where we know we’re dealing with GCC 

Richard 

> 
> 


Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Matthew Malcomson via Gcc-patches

Responding to two emails at the same time ;-)

On 7/21/23 13:47, Richard Biener wrote:
> On Fri, 21 Jul 2023, Matthew Malcomson wrote:
>
>> On some AArch64 bootstrapped builds, we were getting a flaky test
>> because the floating point operations in `get_time` were being fused
>> with the floating point operations in `timevar_accumulate`.
>>
>> This meant that the rounding behaviour of our multiplication with
>> `ticks_to_msec` was different when used in `timer::start` and when
>> performed in `timer::stop`.  These extra inaccuracies led to the
>> testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
>>
>> This change ensures those operations are not fused and hence stops the test
>> being flaky on that particular machine.  There is no expected change in the
>> generated code.
>> Bootstrap & regtest on AArch64 passes with no regressions.
>
> I think this is undesirable.  With fused you mean we use FMA?
> I think you could use -ffp-contract=off for the TU instead.

Yeah -- we used fused multiply subtract because we combined the multiply
in `get_time` with the subtract in `timevar_accumulate`.

> Note you can't use __attribute__((noinline)) literally since the
> host compiler might not support this.
>
> Richard.

On 7/21/23 13:49, Xi Ruoyao wrote:
...
> I don't think it's correct.  It will break bootstrapping GCC from other
> ISO C++11 compilers, you need to at least guard it with #ifdef __GNUC__.
> And IMO it's just hiding the real problem.
>
> We need more info of the "particular machine".  Is this a hardware bug
> (i.e. the machine violates the AArch64 spec) or a GCC code generation
> issue?  Or should we generally use -ffp-contract=off in BOOT_CFLAGS?

My understanding is that this is not a hardware bug and that it's
specified that rounding does not happen on the multiply "sub-part" in
`FNMSUB`, but rounding happens on the `FMUL` that generates some input
to it.

I was given to understand from discussions with others that this codegen
is allowed -- though I honestly didn't confirm the line of reasoning
through all the relevant standards.

W.r.t. both:
Thanks for pointing out bootstrapping from other ISO C++ compilers --
(didn't realise that was a concern).

I can look into `-ffp-contract=off` as you both have recommended.
One question -- if we have concerns that the host compiler may not be
able to handle `__attribute__((noinline))` would we also be concerned that
this flag may not be supported?
(Or is the severity of lack of support sufficiently different in the two
cases that this is fine -- i.e. not compile vs may trigger floating
point rounding inaccuracies?)





Re: [PATCH] bpf: pseudo-c assembly dialect support

2023-07-21 Thread Jose E. Marchesi via Gcc-patches


Hello Cuper.

Thanks for the patch.

We will need an update for the "eBPF Options" section in the GCC manual,
documenting -masm=@var{dialect} and the supported values.  Can you
please add it and re-submit?


> Hi everyone,
>
> Looking forward to all your reviews.
>
> Best regards,
> Cupertino
>
> New pseudo-c BPF assembly dialect already supported by clang and widely
> used in the linux kernel.
>
> gcc/ChangeLog:
>
>   * config/bpf/bpf.opt: Added option -masm=.
>   * config/bpf/bpf-opts.h: Likewize.
>   * config/bpf/bpf.cc: Changed it to conform with new pseudoc
> dialect support.
>   * config/bpf/bpf.h: Likewise.
>   * config/bpf/bpf.md: Added pseudo-c templates.
> ---
>  gcc/config/bpf/bpf-opts.h |  6 +++
>  gcc/config/bpf/bpf.cc | 46 ---
>  gcc/config/bpf/bpf.h  |  5 +-
>  gcc/config/bpf/bpf.md | 97 ---
>  gcc/config/bpf/bpf.opt| 14 ++
>  5 files changed, 114 insertions(+), 54 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf-opts.h b/gcc/config/bpf/bpf-opts.h
> index 8282351cf045..92db01ec4d54 100644
> --- a/gcc/config/bpf/bpf-opts.h
> +++ b/gcc/config/bpf/bpf-opts.h
> @@ -60,4 +60,10 @@ enum bpf_isa_version
>ISA_V3,
>  };
>  
> +enum bpf_asm_dialect
> +{
> +  ASM_NORMAL,
> +  ASM_PSEUDOC
> +};
> +
>  #endif /* ! BPF_OPTS_H */
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index e0324e1e0e08..1d3936871d60 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -873,16 +873,47 @@ bpf_output_call (rtx target)
>return "";
>  }
>  
> +/* Print register name according to assembly dialect.
> +   In normal syntax registers are printed like %rN where N is the
> +   register number.
> +   In pseudoc syntax, the register names do not feature a '%' prefix.
> +   Additionally, the code 'w' denotes that the register should be printed
> +   as wN instead of rN, where N is the register number, but only when the
> +   value stored in the operand OP is 32-bit wide.  */
> +static void
> +bpf_print_register (FILE *file, rtx op, int code)
> +{
> +  if(asm_dialect == ASM_NORMAL)
> +fprintf (file, "%s", reg_names[REGNO (op)]);
> +  else
> +{
> +  if (code == 'w' && GET_MODE (op) == SImode)
> + {
> +   if (REGNO (op) == BPF_FP)
> + fprintf (file, "w10");
> +   else
> + fprintf (file, "w%s", reg_names[REGNO (op)]+2);
> + }
> +  else
> + {
> +   if (REGNO (op) == BPF_FP)
> + fprintf (file, "r10");
> +   else
> + fprintf (file, "%s", reg_names[REGNO (op)]+1);
> + }
> +}
> +}
> +
>  /* Print an instruction operand.  This function is called in the macro
> PRINT_OPERAND defined in bpf.h */
>  
>  void
> -bpf_print_operand (FILE *file, rtx op, int code ATTRIBUTE_UNUSED)
> +bpf_print_operand (FILE *file, rtx op, int code)
>  {
>switch (GET_CODE (op))
>  {
>  case REG:
> -  fprintf (file, "%s", reg_names[REGNO (op)]);
> +  bpf_print_register (file, op, code);
>break;
>  case MEM:
>output_address (GET_MODE (op), XEXP (op, 0));
> @@ -936,7 +967,9 @@ bpf_print_operand_address (FILE *file, rtx addr)
>switch (GET_CODE (addr))
>  {
>  case REG:
> -  fprintf (file, "[%s+0]", reg_names[REGNO (addr)]);
> +  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
> +  bpf_print_register (file, addr, 0);
> +  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0)");
>break;
>  case PLUS:
>{
> @@ -945,9 +978,11 @@ bpf_print_operand_address (FILE *file, rtx addr)
>  
>   if (GET_CODE (op0) == REG && GET_CODE (op1) == CONST_INT)
> {
> - fprintf (file, "[%s+", reg_names[REGNO (op0)]);
> + fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
> + bpf_print_register (file, op0, 0);
> + fprintf (file, "+");
>   output_addr_const (file, op1);
> - fputs ("]", file);
> + fprintf (file, asm_dialect == ASM_NORMAL ? "]" : ")");
> }
>   else
> fatal_insn ("invalid address in operand", addr);
> @@ -1816,7 +1851,6 @@ handle_attr_preserve (function *fn)
>  }
>  }
>  
> -
>  /* This pass finds accesses to structures marked with the BPF target attribute
> __attribute__((preserve_access_index)). For every such access, a CO-RE
> relocation record is generated, to be output in the .BTF.ext section.  */
> diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h
> index 344aca02d1bb..9561bf59b800 100644
> --- a/gcc/config/bpf/bpf.h
> +++ b/gcc/config/bpf/bpf.h
> @@ -22,7 +22,8 @@
>  
>  /**** Controlling the Compilation Driver.  */
>  
> -#define ASM_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL} %{mxbpf:-mxbpf}"
> +#define ASM_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL} %{mxbpf:-mxbpf} " \
> +  "%{masm=pseudoc:-mdialect=pseudoc}"
>  #define LINK_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL}"
>  #define LIB_SPEC ""
>  #define STARTFILE_SPEC ""
> @@ -503,4 

Re: loop-ch improvements, part 5

2023-07-21 Thread Jan Hubicka via Gcc-patches
> > The patch requires bit of testsuite changes
> >  - I disabled ch in loop-unswitch-17.c since it tests unswitching of
> >loop invariant conditional.
> >  - pr103079.c needs ch disabled to trigger vrp situation it tests for
> >(otherwise we optimize stuff earlier and better)
> >  - copy-headers-7.c now gets only 2 basic blocks duplicated since
> >last conditional does not seem to benefit from duplicating,
> >so I reordered them.
> > copy-headers-9 tests the new logic.
> >
> > Bootstrapped/regtested x86_64-linux, OK?
> 
> OK.  In case the size heuristics are a bit too optimistic we could avoid the
Thanks!
> peeling in the -Os case?  Did you do any stats on TUs to see whether code
> actually increases in the end?

I only gathered stats on tramp3d and some GCC source files with -O2, where
the new heuristics actually tend to duplicate fewer BBs overall because the
logic stops the duplication chain after the last winning header, while
the previous implementation keeps rolling the loop further.  The difference
is small (below 1%) since most loops are very simple and have only one header
BB to duplicate.  We do, however, handle more loops overall and produce more
do-whiles.

I think there is some potential in making the heuristics more speculative
now and allowing more partial peeling, but the code right now is still
on the safe side.

For -Os we set the code growth limit to 0, so we only duplicate if we know
that one of the two copies will be optimized out.  This is stricter
than what we did previously and I need to get more stats on this - we may
want to bump up the limit, or at least increase it to account for the extra
jump saved by the while -> do-while conversion.

Honza


Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-07-21 at 13:11 +0100, Matthew Malcomson via Gcc-patches
wrote:
> This change ensures those operations are not fused and hence stops the test
> being flaky on that particular machine.  There is no expected change in the
> generated code.
> Bootstrap & regtest on AArch64 passes with no regressions.
> 
> gcc/ChangeLog:
> 
>   * timevar.cc (get_time): Make this noinline to avoid fusing
>   behaviour and associated test flakyness.

I don't think this is correct.  It will break bootstrapping GCC with other
ISO C++11 compilers; you need to at least guard it with #ifdef __GNUC__.
And IMO it's just hiding the real problem.

We need more info about the "particular machine".  Is this a hardware bug
(i.e. the machine violates the AArch64 spec) or a GCC code generation
issue?  Or should we generally use -ffp-contract=off in BOOT_CFLAGS?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Richard Biener via Gcc-patches
On Fri, 21 Jul 2023, Matthew Malcomson wrote:

> On some AArch64 bootstrapped builds, we were getting a flaky test
> because the floating point operations in `get_time` were being fused
> with the floating point operations in `timevar_accumulate`.
> 
> This meant that the rounding behaviour of our multiplication with
> `ticks_to_msec` was different when used in `timer::start` and when
> performed in `timer::stop`.  These extra inaccuracies led to the
> testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
> 
> This change ensures those operations are not fused and hence stops the test
> being flaky on that particular machine.  There is no expected change in the
> generated code.
> Bootstrap & regtest on AArch64 passes with no regressions.

I think this is undesirable.  By fused, do you mean we use FMA?
I think you could use -ffp-contract=off for the TU instead.

Note you can't use __attribute__((noinline)) literally since the
host compiler might not support this.
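
A guarded macro is one way the portability point could be handled; a
minimal sketch follows.  The names TIMEVAR_NOINLINE, time_def_sketch and
get_time_sketch are hypothetical stand-ins, not identifiers from
timevar.cc, and the sketch only covers portability, not the question of
whether noinline is the right fix in the first place.

```c
/* Hypothetical guard: expands to the GNU attribute only when the host
   compiler defines __GNUC__, and to nothing otherwise.  */
#if defined (__GNUC__)
# define TIMEVAR_NOINLINE __attribute__ ((noinline))
#else
# define TIMEVAR_NOINLINE
#endif

/* Stand-in for timevar_time_def; the field names are assumptions.  */
struct time_def_sketch
{
  double user;
  double sys;
  double wall;
};

/* Stand-in for get_time: kept out of line when the attribute exists.  */
static void TIMEVAR_NOINLINE
get_time_sketch (struct time_def_sketch *now)
{
  now->user = 0;
  now->sys = 0;
  now->wall = 0;
}
```

With a host compiler that does not define __GNUC__ the macro expands to
nothing, so the declaration still compiles, just without the guarantee.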

Richard.

> gcc/ChangeLog:
> 
>   * timevar.cc (get_time): Make this noinline to avoid fusing
>   behaviour and associated test flakyness.
> 
> 
> N.b. I didn't know who to include as reviewer -- guessed Richard Biener as the
> global reviewer that had the most contributions to this file and Richard
> Sandiford since I've asked him for reviews a lot in the past.
> 
> 
> ### Attachment also inlined for ease of reply ###
> 
> 
> diff --git a/gcc/timevar.cc b/gcc/timevar.cc
> index d695297aae7f6b2a6de01a37fe86c2a232338df0..5ea4ec259e114f31f611e7105cd102f4c9552d18 100644
> --- a/gcc/timevar.cc
> +++ b/gcc/timevar.cc
> @@ -212,6 +212,7 @@ timer::named_items::print (FILE *fp, const timevar_time_def *total)
> HAVE_WALL_TIME macros.  */
>  
>  static void
> +__attribute__((noinline))
>  get_time (struct timevar_time_def *now)
>  {
>now->user = 0;
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: loop-ch improvements, part 5

2023-07-21 Thread Richard Biener via Gcc-patches
On Fri, Jul 21, 2023 at 1:53 PM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> currently loop-ch skips all do-while loops.  But when loop is not do-while
> in addition to original goal of turning it to do-while it can do additional
> things:
>  1) move out loop invariant computations
>  2) duplicate loop invariant conditionals and eliminate them in loop body.
>  3) prove that some exits are always true in first iteration
> and can be skipped
>
> Most of time 1 can be done by lim (exception is when the invariant computation
> is conditional). For 2 we however don't really have other place doing it 
> except
> for loop unswitching that is more expensive (it will duplicate the loop and
> then optimize out one path to non-loop).
> 3 can be done by loop peeling but it is also more expensive by duplicating 
> full
> loop body.
>
> This patch improves heuristics by not giving up on do-while loops and trying
> to find sequence of BBs to duplicate to obtain one of goals:
>  - turn loop to do-while
>  - eliminate invariant conditional in loop body
>  - do partial "peeling" as long as code optimizes enough so this does not
>increase code size.
> This can be improved upon, but I think this patch should finally get
> heuristics into shape that it does not do weird things.
>
> The patch requires bit of testsuite changes
>  - I disabled ch in loop-unswitch-17.c since it tests unswitching of
>loop invariant conditional.
>  - pr103079.c needs ch disabled to trigger vrp situation it tests for
>(otherwise we optimize stuff earlier and better)
>  - copy-headers-7.c now gets only 2 basic blocks duplicated since
>last conditional does not seem to benefit from duplicating,
>so I reordered them.
> copy-headers-9 tests the new logic.
>
> Bootstrapped/regtested x86_64-linux, OK?

OK.  In case the size heuristics are a bit too optimistic we could avoid the
peeling in the -Os case?  Did you do any stats on TUs to see whether code
actually increases in the end?

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-ssa-loop-ch.cc (enum ch_decision): New enum.
> (should_duplicate_loop_header_p): Return info on profitability.
> (do_while_loop_p): Watch for constant conditionals.
> (update_profile_after_ch): Do not sanity check that all
> static exits are taken.
> (ch_base::copy_headers): Run on all loops.
> (pass_ch::process_loop_p): Improve heuristics by handling also
> do_while loop and duplicating shortest sequence containing all
> winning blocks.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/loop-unswitch-17.c: Disable ch.
> * gcc.dg/pr103079.c: Disable ch.
> * gcc.dg/tree-ssa/copy-headers-7.c: Update so ch behaves
> as expected.
> * gcc.dg/tree-ssa/copy-headers.c: Update template.
> * gcc.dg/tree-ssa/copy-headers-9.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-17.c 
> b/gcc/testsuite/gcc.dg/loop-unswitch-17.c
> index 8655e09a51c..4b806c475b1 100644
> --- a/gcc/testsuite/gcc.dg/loop-unswitch-17.c
> +++ b/gcc/testsuite/gcc.dg/loop-unswitch-17.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-optimized" } */
> +/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-optimized 
> -fno-tree-ch" } */
>
>  int foo (int a)
>  {
> diff --git a/gcc/testsuite/gcc.dg/pr103079.c b/gcc/testsuite/gcc.dg/pr103079.c
> index 7f6632fc669..7b107544725 100644
> --- a/gcc/testsuite/gcc.dg/pr103079.c
> +++ b/gcc/testsuite/gcc.dg/pr103079.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-Os -fdump-tree-vrp2" } */
> +/* { dg-options "-Os -fdump-tree-vrp2 -fno-tree-ch" } */
>
>  int a, b = -2;
>  int main() {
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
> index e2a6c75f2e9..b3df3b6398e 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
> @@ -4,7 +4,7 @@
>  int is_sorted(int *a, int n, int m, int k)
>  {
>if (k > 0)
> -for (int i = 0; i < n - 1 && m && k > i; i++)
> +for (int i = 0; k > i && m && i < n - 1 ; i++)
>if (a[i] > a[i + 1])
> return 0;
>return 1;
> @@ -17,5 +17,4 @@ int is_sorted(int *a, int n, int m, int k)
>  /* { dg-final { scan-tree-dump-times "Conditional combines static and 
> invariant" 0 "ch2" } } */
>  /* { dg-final { scan-tree-dump-times "Will elliminate invariant exit" 1 
> "ch2" } } */
>  /* { dg-final { scan-tree-dump-times "Will eliminate peeled conditional" 1 
> "ch2" } } */
> -/* { dg-final { scan-tree-dump-times "Not duplicating bb .: condition based 
> on non-IV loop variant." 1 "ch2" } } */
>  /* { dg-final { scan-tree-dump-times "Will duplicate bb" 3 "ch2" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
> new file mode 100644
> index 000..7cc162ca94d
> --- 

[PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Matthew Malcomson via Gcc-patches
On some AArch64 bootstrapped builds, we were getting a flaky test
because the floating point operations in `get_time` were being fused
with the floating point operations in `timevar_accumulate`.

This meant that the rounding behaviour of our multiplication with
`ticks_to_msec` was different when used in `timer::start` and when
performed in `timer::stop`.  These extra inaccuracies led to the
testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
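
For illustration only, here is a different way to sidestep this class of
rounding mismatch; it is not what this patch does.  The sketch keeps raw
tick counts as integers, subtracts them exactly, and converts to seconds
once at the end.  All names (tick_timer_sketch and the three functions)
are hypothetical.

```c
#include <stdint.h>

/* Hypothetical timer that stores raw clock ticks instead of seconds.  */
struct tick_timer_sketch
{
  uint64_t start_ticks;    /* tick count at the last start */
  uint64_t elapsed_ticks;  /* ticks accumulated over all start/stop pairs */
};

void
tick_timer_start (struct tick_timer_sketch *t, uint64_t now_ticks)
{
  t->start_ticks = now_ticks;
}

void
tick_timer_stop (struct tick_timer_sketch *t, uint64_t now_ticks)
{
  /* Integer subtraction is exact, so equal readings give exactly zero.  */
  t->elapsed_ticks += now_ticks - t->start_ticks;
}

double
tick_timer_seconds (const struct tick_timer_sketch *t, double ticks_to_sec)
{
  /* The only floating-point work: one conversion and one multiply.  */
  return (double) t->elapsed_ticks * ticks_to_sec;
}
```

Because the start and stop readings never go through separate
floating-point multiplications, how the compiler contracts those
operations cannot make the two sides disagree.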

This change ensures those operations are not fused and hence stops the test
being flaky on that particular machine.  There is no expected change in the
generated code.
Bootstrap & regtest on AArch64 passes with no regressions.

gcc/ChangeLog:

* timevar.cc (get_time): Make this noinline to avoid fusing
behaviour and associated test flakiness.


N.b. I didn't know who to include as reviewer -- guessed Richard Biener as the
global reviewer that had the most contributions to this file and Richard
Sandiford since I've asked him for reviews a lot in the past.


### Attachment also inlined for ease of reply ###


diff --git a/gcc/timevar.cc b/gcc/timevar.cc
index d695297aae7f6b2a6de01a37fe86c2a232338df0..5ea4ec259e114f31f611e7105cd102f4c9552d18 100644
--- a/gcc/timevar.cc
+++ b/gcc/timevar.cc
@@ -212,6 +212,7 @@ timer::named_items::print (FILE *fp, const timevar_time_def *total)
HAVE_WALL_TIME macros.  */
 
 static void
+__attribute__((noinline))
 get_time (struct timevar_time_def *now)
 {
   now->user = 0;





[PATCH] tree-optimization/41320 - remove bogus XFAILed testcase

2023-07-21 Thread Richard Biener via Gcc-patches
gcc.dg/tree-ssa/forwprop-12.c looks for reconstruction of an
ARRAY_REF from pointer arithmetic and dereference.  That's not
safe because ARRAY_REFs carry special semantics we later exploit
during data dependence analysis.
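
To make the two forms concrete, here is a small sketch based on the
removed test (the function names are mine, not from the testsuite).  Both
functions read the same element when the index is in range; the test
expected forwprop to turn the second form into the first, which is the
rewrite described above as unsafe.

```c
struct X { int a[256]; };

/* Array form: the access is written as an index into the array p->a.  */
int
load_array_ref (struct X *p, int i)
{
  return p->a[i + 8];
}

/* Pointer form, as in the removed test's bar(): plain address
   arithmetic on a pointer to the start of the object.  */
int
load_pointer_arith (struct X *p, int i)
{
  return *((int *) p + i + 8);
}
```

For in-range indices the two agree at run time; the difference is in what
the compiler may assume about each form, not in the value loaded.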

The following removes the testcase, closing the bug as WONTFIX.

Pushed.

PR tree-optimization/41320
* gcc.dg/tree-ssa/forwprop-12.c: Remove.
---
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-12.c | 21 -
 1 file changed, 21 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-12.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-12.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-12.c
deleted file mode 100644
index de16c6848f2..00000000000
--- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-12.c
+++ /dev/null
@@ -1,21 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-forwprop1" } */
-
-struct X { int a[256]; };
-
-int foo(struct X *p, __SIZE_TYPE__ i)
-{
-  int *q = &p->a[0];
-  int *q2 = (int *)((void *)q + i*4 + 32);
-  return *q2;
-}
-
-int bar(struct X *p, int i)
-{
-  return *((int *)p + i + 8);
-}
-
-/* We should have propagated the base array address through the
-   address arithmetic into the memory access as an array access.  */
-
-/* { dg-final { scan-tree-dump-times "->a\\\[D\\\." 2 "forwprop1" { xfail *-*-* } } } */
-- 
2.35.3


Re: [PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file

2023-07-21 Thread Martin Jambor
Hello Lehua,

On Fri, Jul 21 2023, Lehua Ding wrote:
> Hi Martin,
>
>
> By the way, is there a standard format required for these Python files?

Generally, our Python coding conventions are at
https://gcc.gnu.org/codingconventions.html#python

> I see that other Python files have similar format error when checked
> using flake8.

For historical reasons (i.e. Martin Liška set it up that way), we
currently use flake8 to check Python formatting of
contrib/gcc-changelog, contrib/mklog.py and
maintainer-scripts/branch_changer.py and use pytest to check
contrib/gcc-changelog and contrib/test_mklog.py.  That is how I found
out.

I guess many of the files predate the coding conventions and so don't
adhere to them.  Patches to fix them are welcome (I guess) but at least
we should not regress (I guess).

> If so, it feels necessary to configure a git hook on git server to do
> this check.

Performing more thorough checks on pushed commits is a much larger topic
than this thread.  FWIW, I would not oppose checking Python scripts
that are known to be OK.

Martin


loop-ch improvements, part 5

2023-07-21 Thread Jan Hubicka via Gcc-patches
Hi,
currently loop-ch skips all do-while loops.  But when a loop is not do-while,
then in addition to the original goal of turning it into a do-while, loop-ch
can do additional things:
 1) move out loop invariant computations
 2) duplicate loop invariant conditionals and eliminate them in loop body.
 3) prove that some exits are always true in first iteration
and can be skipped

Most of the time 1 can be done by lim (the exception is when the invariant
computation is conditional).  For 2, however, we don't really have another
place doing it except for loop unswitching, which is more expensive (it will
duplicate the loop and then optimize out one path to non-loop).
3 can be done by loop peeling, but that is also more expensive since it
duplicates the full loop body.

This patch improves the heuristics by not giving up on do-while loops and
trying to find a sequence of BBs to duplicate to obtain one of these goals:
 - turn loop to do-while
 - eliminate invariant conditional in loop body
 - do partial "peeling" as long as code optimizes enough so this does not
   increase code size.
This can be improved upon, but I think this patch should finally get the
heuristics into a shape where they do not do weird things.
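
As a source-level illustration of the second goal, here is a sketch based
on the loop in copy-headers-9.c.  The function names and the returned
counter are additions for the sake of the example, and the second
function is a hand-written equivalent, not necessarily what the pass
emits.

```c
int a[100];

/* Shape of the loop in copy-headers-9.c, returning the final i so the
   two versions can be compared.  */
int
stores_original (int m)
{
  int i = 0;
  do
    {
      if (m)          /* loop-invariant conditional, tested each iteration */
        break;
      i++;
      a[i] = 0;
    }
  while (i < 10);
  return i;
}

/* After duplicating the block that tests m: the copy in front of the
   loop decides once, so the loop body no longer needs the test.  */
int
stores_header_copied (int m)
{
  int i = 0;
  if (m)
    return i;
  do
    {
      i++;
      a[i] = 0;
    }
  while (i < 10);
  return i;
}
```

Both versions perform the same stores for any m; only the number of times
m is tested differs.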

The patch requires a few testsuite changes
 - I disabled ch in loop-unswitch-17.c since it tests unswitching of
   loop invariant conditional.
 - pr103079.c needs ch disabled to trigger vrp situation it tests for
   (otherwise we optimize stuff earlier and better)
 - copy-headers-7.c now gets only 2 basic blocks duplicated since
   last conditional does not seem to benefit from duplicating,
   so I reordered them.
copy-headers-9 tests the new logic.

Bootstrapped/regtested x86_64-linux, OK?

gcc/ChangeLog:

* tree-ssa-loop-ch.cc (enum ch_decision): New enum.
(should_duplicate_loop_header_p): Return info on profitability.
(do_while_loop_p): Watch for constant conditionals.
(update_profile_after_ch): Do not sanity check that all
static exits are taken.
(ch_base::copy_headers): Run on all loops.
(pass_ch::process_loop_p): Improve heuristics by handling also
do_while loop and duplicating shortest sequence containing all
winning blocks.

gcc/testsuite/ChangeLog:

* gcc.dg/loop-unswitch-17.c: Disable ch.
* gcc.dg/pr103079.c: Disable ch.
* gcc.dg/tree-ssa/copy-headers-7.c: Update so ch behaves
as expected.
* gcc.dg/tree-ssa/copy-headers.c: Update template.
* gcc.dg/tree-ssa/copy-headers-9.c: New test.

diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-17.c 
b/gcc/testsuite/gcc.dg/loop-unswitch-17.c
index 8655e09a51c..4b806c475b1 100644
--- a/gcc/testsuite/gcc.dg/loop-unswitch-17.c
+++ b/gcc/testsuite/gcc.dg/loop-unswitch-17.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-optimized" } */
+/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-optimized -fno-tree-ch" } */
 
 int foo (int a)
 {
diff --git a/gcc/testsuite/gcc.dg/pr103079.c b/gcc/testsuite/gcc.dg/pr103079.c
index 7f6632fc669..7b107544725 100644
--- a/gcc/testsuite/gcc.dg/pr103079.c
+++ b/gcc/testsuite/gcc.dg/pr103079.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Os -fdump-tree-vrp2" } */
+/* { dg-options "-Os -fdump-tree-vrp2 -fno-tree-ch" } */
 
 int a, b = -2;
 int main() {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c 
b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
index e2a6c75f2e9..b3df3b6398e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
@@ -4,7 +4,7 @@
 int is_sorted(int *a, int n, int m, int k)
 {
   if (k > 0)
-for (int i = 0; i < n - 1 && m && k > i; i++)
+for (int i = 0; k > i && m && i < n - 1 ; i++)
   if (a[i] > a[i + 1])
return 0;
   return 1;
@@ -17,5 +17,4 @@ int is_sorted(int *a, int n, int m, int k)
 /* { dg-final { scan-tree-dump-times "Conditional combines static and invariant" 0 "ch2" } } */
 /* { dg-final { scan-tree-dump-times "Will elliminate invariant exit" 1 "ch2" } } */
 /* { dg-final { scan-tree-dump-times "Will eliminate peeled conditional" 1 "ch2" } } */
-/* { dg-final { scan-tree-dump-times "Not duplicating bb .: condition based on non-IV loop variant." 1 "ch2" } } */
 /* { dg-final { scan-tree-dump-times "Will duplicate bb" 3 "ch2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c 
b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
new file mode 100644
index 00000000000..7cc162ca94d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ch-details" } */
+int a[100];
+void test (int m, int n)
+{
+   int i = 0;
+   do
+   {
+   if (m)
+   break;
+   i++;
+   a[i]=0;
+   }
+   while (i<10);
+}
+/* { dg-final { scan-tree-dump-times "Duplicating bb . is a win" 1 "ch2" } } */
+/* { dg-final { 

Re: [PATCH] vect: Don't vectorize a single scalar iteration loop [PR110740]

2023-07-21 Thread Richard Biener via Gcc-patches
On Fri, Jul 21, 2023 at 8:08 AM Kewen.Lin  wrote:
>
> Hi,
>
> The function vect_update_epilogue_niters which has been
> removed by r14-2281 has some code taking care of that if
> there is only one scalar iteration left for epilogue then
> we won't try to vectorize it any more.
>
> Although costing should be able to care about it eventually,
> I think we still want this special casing without costing
> enabled, so this patch is to add it back in function
> vect_analyze_loop_costing, and make it more general for
> both main and epilogue loops as Richi suggested, it can fix
> some exposed failures on Power10:
>
>  - gcc.target/powerpc/p9-vec-length-epil-{1,8}.c
>  - gcc.dg/vect/slp-perm-{1,5,6,7}.c
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu, powerpc64-linux-gnu P8/P9 and
> powerpc64le-linux-gnu P9/P10.
>
> Is it ok for trunk?

OK.
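
For readers skimming the quoted patch, the new rejection test boils down
to the comparison below.  This is a standalone sketch: the function name
is hypothetical and plain integers stand in for GCC's widest_int and the
loop_vec_info accessors.

```c
#include <stdbool.h>

/* Returns true when vectorization would be rejected because at most one
   scalar iteration remains once the iteration peeled for gaps (0 or 1)
   is set aside.  Mirrors `scalar_niters <= peeling_gap + 1`.  */
bool
single_scalar_iteration_p (unsigned long scalar_niters, unsigned peeling_gap)
{
  return scalar_niters <= (unsigned long) peeling_gap + 1;
}
```

So a loop with two iterations is still considered when no gap peeling is
needed, but rejected when one of those iterations has to be peeled.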

Thanks,
Richard.

> BR,
> Kewen
> -
> PR tree-optimization/110740
>
> gcc/ChangeLog:
>
> * tree-vect-loop.cc (vect_analyze_loop_costing): Do not vectorize a
> loop with a single scalar iteration.
> ---
>  gcc/tree-vect-loop.cc | 55 ++-
>  1 file changed, 34 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index b44fb9c7712..92d2abde094 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -2158,8 +2158,7 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
>   epilogue we can also decide whether the main loop leaves us
>   with enough iterations, prefering a smaller vector epilog then
>   also possibly used for the case we skip the vector loop.  */
> -  if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
> -  && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> +  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>  {
>widest_int scalar_niters
> = wi::to_widest (LOOP_VINFO_NITERSM1 (loop_vinfo)) + 1;
> @@ -2182,32 +2181,46 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
>% lowest_vf + gap);
> }
> }
> -
> -  /* Check that the loop processes at least one full vector.  */
> -  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> -  if (known_lt (scalar_niters, vf))
> +  /* Reject vectorizing for a single scalar iteration, even if
> +we could in principle implement that using partial vectors.  */
> +  unsigned peeling_gap = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
> +  if (scalar_niters <= peeling_gap + 1)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -"loop does not have enough iterations "
> -"to support vectorization.\n");
> +"not vectorized: loop only has a single "
> +"scalar iteration.\n");
>   return 0;
> }
>
> -  /* If we need to peel an extra epilogue iteration to handle data
> -accesses with gaps, check that there are enough scalar iterations
> -available.
> -
> -The check above is redundant with this one when peeling for gaps,
> -but the distinction is useful for diagnostics.  */
> -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> - && known_le (scalar_niters, vf))
> +  if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> {
> - if (dump_enabled_p ())
> -   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -"loop does not have enough iterations "
> -"to support peeling for gaps.\n");
> - return 0;
> + /* Check that the loop processes at least one full vector.  */
> + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> + if (known_lt (scalar_niters, vf))
> +   {
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"loop does not have enough iterations "
> +"to support vectorization.\n");
> + return 0;
> +   }
> +
> + /* If we need to peel an extra epilogue iteration to handle data
> +accesses with gaps, check that there are enough scalar iterations
> +available.
> +
> +The check above is redundant with this one when peeling for gaps,
> +but the distinction is useful for diagnostics.  */
> + if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> + && known_le (scalar_niters, vf))
> +   {
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"loop does not have enough iterations "
> +"to support peeling for gaps.\n");
> + return 0;
> +   }
> 

[PATCH] bpf: pseudo-c assembly dialect support

2023-07-21 Thread Cupertino Miranda via Gcc-patches
Hi everyone,

Looking forward to all your reviews.

Best regards,
Cupertino

This patch adds support for the pseudo-C BPF assembly dialect, which is
already supported by clang and widely used in the Linux kernel.
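
As a rough illustration of the difference in address syntax, the sketch
below formats a "register plus constant offset" address in both dialects,
following what bpf_print_operand_address prints in this patch.  The enum
and function names are hypothetical, and the frame-pointer special case
is left out.

```c
#include <stdio.h>
#include <string.h>

enum dialect_sketch { DIALECT_NORMAL, DIALECT_PSEUDOC };

/* Writes the address into BUF (of SIZE bytes) and returns BUF.
   Normal syntax:   [%rN+OFF]
   Pseudo-C syntax: (rN+OFF), with no '%' prefix on the register.  */
const char *
format_address_sketch (char *buf, size_t size, enum dialect_sketch dialect,
                       unsigned regno, long offset)
{
  if (dialect == DIALECT_NORMAL)
    snprintf (buf, size, "[%%r%u+%ld]", regno, offset);
  else
    snprintf (buf, size, "(r%u+%ld)", regno, offset);
  return buf;
}
```

For register 1 and offset 8 this gives "[%r1+8]" in the normal dialect
and "(r1+8)" in pseudo-C.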

gcc/ChangeLog:

* config/bpf/bpf.opt: Added option -masm=.
* config/bpf/bpf-opts.h: Likewise.
* config/bpf/bpf.cc: Changed it to conform with new pseudoc
  dialect support.
* config/bpf/bpf.h: Likewise.
* config/bpf/bpf.md: Added pseudo-c templates.
---
 gcc/config/bpf/bpf-opts.h |  6 +++
 gcc/config/bpf/bpf.cc | 46 ---
 gcc/config/bpf/bpf.h  |  5 +-
 gcc/config/bpf/bpf.md | 97 ---
 gcc/config/bpf/bpf.opt| 14 ++
 5 files changed, 114 insertions(+), 54 deletions(-)

diff --git a/gcc/config/bpf/bpf-opts.h b/gcc/config/bpf/bpf-opts.h
index 8282351cf045..92db01ec4d54 100644
--- a/gcc/config/bpf/bpf-opts.h
+++ b/gcc/config/bpf/bpf-opts.h
@@ -60,4 +60,10 @@ enum bpf_isa_version
   ISA_V3,
 };
 
+enum bpf_asm_dialect
+{
+  ASM_NORMAL,
+  ASM_PSEUDOC
+};
+
 #endif /* ! BPF_OPTS_H */
diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index e0324e1e0e08..1d3936871d60 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -873,16 +873,47 @@ bpf_output_call (rtx target)
   return "";
 }
 
+/* Print register name according to assembly dialect.
+   In normal syntax registers are printed like %rN where N is the
+   register number.
+   In pseudoc syntax, the register names do not feature a '%' prefix.
+   Additionally, the code 'w' denotes that the register should be printed
+   as wN instead of rN, where N is the register number, but only when the
+   value stored in the operand OP is 32-bit wide.  */
+static void
+bpf_print_register (FILE *file, rtx op, int code)
+{
+  if (asm_dialect == ASM_NORMAL)
+fprintf (file, "%s", reg_names[REGNO (op)]);
+  else
+{
+  if (code == 'w' && GET_MODE (op) == SImode)
+   {
+ if (REGNO (op) == BPF_FP)
+   fprintf (file, "w10");
+ else
+   fprintf (file, "w%s", reg_names[REGNO (op)]+2);
+   }
+  else
+   {
+ if (REGNO (op) == BPF_FP)
+   fprintf (file, "r10");
+ else
+   fprintf (file, "%s", reg_names[REGNO (op)]+1);
+   }
+}
+}
+
 /* Print an instruction operand.  This function is called in the macro
PRINT_OPERAND defined in bpf.h */
 
 void
-bpf_print_operand (FILE *file, rtx op, int code ATTRIBUTE_UNUSED)
+bpf_print_operand (FILE *file, rtx op, int code)
 {
   switch (GET_CODE (op))
 {
 case REG:
-  fprintf (file, "%s", reg_names[REGNO (op)]);
+  bpf_print_register (file, op, code);
   break;
 case MEM:
   output_address (GET_MODE (op), XEXP (op, 0));
@@ -936,7 +967,9 @@ bpf_print_operand_address (FILE *file, rtx addr)
   switch (GET_CODE (addr))
 {
 case REG:
-  fprintf (file, "[%s+0]", reg_names[REGNO (addr)]);
+  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
+  bpf_print_register (file, addr, 0);
+  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0)");
   break;
 case PLUS:
   {
@@ -945,9 +978,11 @@ bpf_print_operand_address (FILE *file, rtx addr)
 
if (GET_CODE (op0) == REG && GET_CODE (op1) == CONST_INT)
  {
-   fprintf (file, "[%s+", reg_names[REGNO (op0)]);
+   fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
+   bpf_print_register (file, op0, 0);
+   fprintf (file, "+");
output_addr_const (file, op1);
-   fputs ("]", file);
+   fprintf (file, asm_dialect == ASM_NORMAL ? "]" : ")");
  }
else
  fatal_insn ("invalid address in operand", addr);
@@ -1816,7 +1851,6 @@ handle_attr_preserve (function *fn)
 }
 }
 
-
 /* This pass finds accesses to structures marked with the BPF target attribute
__attribute__((preserve_access_index)). For every such access, a CO-RE
relocation record is generated, to be output in the .BTF.ext section.  */
diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h
index 344aca02d1bb..9561bf59b800 100644
--- a/gcc/config/bpf/bpf.h
+++ b/gcc/config/bpf/bpf.h
@@ -22,7 +22,8 @@
 
 /**** Controlling the Compilation Driver.  */
 
-#define ASM_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL} %{mxbpf:-mxbpf}"
+#define ASM_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL} %{mxbpf:-mxbpf} " \
+  "%{masm=pseudoc:-mdialect=pseudoc}"
 #define LINK_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL}"
 #define LIB_SPEC ""
 #define STARTFILE_SPEC ""
@@ -503,4 +504,6 @@ enum reg_class
 #define DO_GLOBAL_DTORS_BODY   \
   do { } while (0)
 
+#define ASSEMBLER_DIALECT ((int) asm_dialect)
+
 #endif /* ! GCC_BPF_H */
diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index f6be0a212345..0b8f409db687 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -77,6 +77,8 @@
 
 (define_mode_attr mop [(QI "b") (HI "h") (SI "w") (DI "dw")

Re: finite_loop_p tweak

2023-07-21 Thread Richard Biener via Gcc-patches
On Fri, Jul 21, 2023 at 1:45 PM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> we have finite_p flag in loop structure.  finite_loop_p already knows to
> use it, but we also may set the flag when we prove loop to be finite by
> SCEV analysis to avoid duplicated work.
>
> Bootstrapped/regtested x86_64-linux, OK?

OK

> gcc/ChangeLog:
>
> * tree-ssa-loop-niter.cc (finite_loop_p): Reorder to do cheap
> tests first; update finite_p flag.
>
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index 3c4e66291fb..e5985bee235 100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -3338,24 +3338,6 @@ finite_loop_p (class loop *loop)
>widest_int nit;
>int flags;
>
> -  flags = flags_from_decl_or_type (current_function_decl);
> -  if ((flags & (ECF_CONST|ECF_PURE)) && !(flags & ECF_LOOPING_CONST_OR_PURE))
> -{
> -  if (dump_file && (dump_flags & TDF_DETAILS))
> -   fprintf (dump_file, "Found loop %i to be finite: it is within pure or 
> const function.\n",
> -loop->num);
> -  return true;
> -}
> -
> -  if (loop->any_upper_bound
> -  || max_loop_iterations (loop, &nit))
> -{
> -  if (dump_file && (dump_flags & TDF_DETAILS))
> -   fprintf (dump_file, "Found loop %i to be finite: upper bound 
> found.\n",
> -loop->num);
> -  return true;
> -}
> -
>if (loop->finite_p)
>  {
>unsigned i;
> @@ -3368,11 +3350,36 @@ finite_loop_p (class loop *loop)
>   {
> if (dump_file)
>   fprintf (dump_file, "Assume loop %i to be finite: it has an 
> exit "
> -  "and -ffinite-loops is on.\n", loop->num);
> +  "and -ffinite-loops is on or loop was"
> +  " previously finite.\n",
> +  loop->num);
> return true;
>   }
>  }
>
> +  flags = flags_from_decl_or_type (current_function_decl);
> +  if ((flags & (ECF_CONST|ECF_PURE)) && !(flags & ECF_LOOPING_CONST_OR_PURE))
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +   fprintf (dump_file,
> +"Found loop %i to be finite: it is within "
> +"pure or const function.\n",
> +loop->num);
> +  loop->finite_p = true;
> +  return true;
> +}
> +
> +  if (loop->any_upper_bound
> +  /* Loop with no normal exit will not pass max_loop_iterations.  */
> +  || (!loop->finite_p && max_loop_iterations (loop, &nit)))
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +   fprintf (dump_file, "Found loop %i to be finite: upper bound 
> found.\n",
> +loop->num);
> +  loop->finite_p = true;
> +  return true;
> +}
> +
>return false;
>  }
>


finite_loop_p tweak

2023-07-21 Thread Jan Hubicka via Gcc-patches
Hi,
we have a finite_p flag in the loop structure.  finite_loop_p already knows
to use it, but we may also set the flag when we prove a loop to be finite by
SCEV analysis, to avoid duplicated work.
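
The idea is ordinary result caching; a standalone sketch follows.  The
struct, the counter and both function names are hypothetical stand-ins,
and the sketch leaves out the exit checks the real function performs when
finite_p is already set.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bits of the loop structure.  */
struct loop_sketch
{
  bool finite_p;          /* cached "known finite" answer */
  bool any_upper_bound;   /* an iteration bound is already recorded */
  int max_iterations;     /* negative means no bound can be derived */
};

/* Counts how often the costly analysis runs.  */
int expensive_analysis_calls = 0;

/* Stand-in for the SCEV-based bound computation.  */
static bool
expensive_bound_analysis (const struct loop_sketch *loop)
{
  expensive_analysis_calls++;
  return loop->max_iterations >= 0;
}

bool
finite_loop_sketch (struct loop_sketch *loop)
{
  if (loop->finite_p)          /* cheap test first */
    return true;
  if (loop->any_upper_bound || expensive_bound_analysis (loop))
    {
      loop->finite_p = true;   /* remember the result for later queries */
      return true;
    }
  return false;
}
```

A second query on the same loop returns from the first test and never
reaches the analysis again.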

Bootstrapped/regtested x86_64-linux, OK?

gcc/ChangeLog:

* tree-ssa-loop-niter.cc (finite_loop_p): Reorder to do cheap
tests first; update finite_p flag.

diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 3c4e66291fb..e5985bee235 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -3338,24 +3338,6 @@ finite_loop_p (class loop *loop)
   widest_int nit;
   int flags;
 
-  flags = flags_from_decl_or_type (current_function_decl);
-  if ((flags & (ECF_CONST|ECF_PURE)) && !(flags & ECF_LOOPING_CONST_OR_PURE))
-{
-  if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file, "Found loop %i to be finite: it is within pure or const function.\n",
-loop->num);
-  return true;
-}
-
-  if (loop->any_upper_bound
-  || max_loop_iterations (loop, &nit))
-{
-  if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file, "Found loop %i to be finite: upper bound found.\n",
-loop->num);
-  return true;
-}
-
   if (loop->finite_p)
 {
   unsigned i;
@@ -3368,11 +3350,36 @@ finite_loop_p (class loop *loop)
  {
if (dump_file)
  fprintf (dump_file, "Assume loop %i to be finite: it has an exit "
-  "and -ffinite-loops is on.\n", loop->num);
+  "and -ffinite-loops is on or loop was"
+  " previously finite.\n",
+  loop->num);
return true;
  }
 }
 
+  flags = flags_from_decl_or_type (current_function_decl);
+  if ((flags & (ECF_CONST|ECF_PURE)) && !(flags & ECF_LOOPING_CONST_OR_PURE))
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"Found loop %i to be finite: it is within "
+"pure or const function.\n",
+loop->num);
+  loop->finite_p = true;
+  return true;
+}
+
+  if (loop->any_upper_bound
+  /* Loop with no normal exit will not pass max_loop_iterations.  */
+  || (!loop->finite_p && max_loop_iterations (loop, &nit)))
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "Found loop %i to be finite: upper bound found.\n",
+loop->num);
+  loop->finite_p = true;
+  return true;
+}
+
   return false;
 }
 


[C PATCH]: Add Walloc-type to warn about insufficient size in allocations

2023-07-21 Thread Martin Uecker via Gcc-patches



This patch adds a warning for allocations with insufficient size
based on the "alloc_size" attribute and the type of the pointer 
the result is assigned to. While it is theoretically legal to
assign to the wrong pointer type and cast it to the right type
later, this almost always indicates an error. Since this catches
common mistakes and is simple to diagnose, it is suggested to
add this warning.
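
As a minimal sketch of the mistake this targets (not part of the patch; the
helper names too_small_for_b and alloc_b are made up for illustration), the
check boils down to comparing a constant size argument against the size of
the pointed-to type:

```c
#include <stddef.h>
#include <stdlib.h>

struct b { int x[10]; };

/* Hypothetical helper: returns nonzero when a request of `requested`
   bytes cannot hold one struct b.  This mirrors, in spirit only, the
   comparison the warning performs on a constant size argument.  */
int too_small_for_b(size_t requested)
{
    return requested < sizeof(struct b);
}

/* The correct idiom: size the allocation from the pointed-to object,
   so `sizeof *p` is used rather than `sizeof p`.  */
struct b *alloc_b(void)
{
    struct b *p = malloc(sizeof *p);
    return p;
}
```

Here too_small_for_b(sizeof(struct b *)) is nonzero, which corresponds to the
"malloc(sizeof p)" typo the warning is meant to catch.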
 

Bootstrapped and regression tested on x86. 


Martin



Add option Walloc-type that warns about allocations that have
insufficient storage for the target type of the pointer the
storage is assigned to.

gcc:
* doc/invoke.texi: Document -Walloc-type option.

gcc/c-family:

* c.opt (Walloc-type): New option.

gcc/c:
* c-typeck.cc (convert_for_assignment): Add Walloc-type warning.

gcc/testsuite:

* gcc.dg/Walloc-type-1.c: New test.


diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 4abdc8d0e77..8b9d148582b 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -319,6 +319,10 @@ Walloca
 C ObjC C++ ObjC++ Var(warn_alloca) Warning
 Warn on any use of alloca.
 
+Walloc-type
+C ObjC Var(warn_alloc_type) Warning
+Warn when allocating insufficient storage for the target type of the assigned pointer.
+
 Walloc-size-larger-than=
 C ObjC C++ LTO ObjC++ Var(warn_alloc_size_limit) Joined Host_Wide_Int ByteSize Warning Init(HOST_WIDE_INT_MAX)
 -Walloc-size-larger-than=   Warn for calls to allocation functions that
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 7cf411155c6..2e392f9c952 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -7343,6 +7343,32 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 "request for implicit conversion "
 "from %qT to %qT not permitted in C++", rhstype, type);
 
+  /* Warn if new allocations are not big enough for the target type.  */
+  tree fndecl;
+  if (warn_alloc_type
+ && TREE_CODE (rhs) == CALL_EXPR
+ && (fndecl = get_callee_fndecl (rhs)) != NULL_TREE
+ && DECL_IS_MALLOC (fndecl))
+   {
+ tree fntype = TREE_TYPE (fndecl);
+ tree fntypeattrs = TYPE_ATTRIBUTES (fntype);
+ tree alloc_size = lookup_attribute ("alloc_size", fntypeattrs);
+ if (alloc_size)
+   {
+ tree args = TREE_VALUE (alloc_size);
+ int idx = TREE_INT_CST_LOW (TREE_VALUE (args)) - 1;
+ /* For calloc only use the second argument.  */
+ if (TREE_CHAIN (args))
+   idx = TREE_INT_CST_LOW (TREE_VALUE (TREE_CHAIN (args))) - 1;
+ tree arg = CALL_EXPR_ARG (rhs, idx);
+ if (TREE_CODE (arg) == INTEGER_CST
+ && tree_int_cst_lt (arg, TYPE_SIZE_UNIT (ttl)))
+warning_at (location, OPT_Walloc_type, "allocation of "
+"insufficient size %qE for type %qT with "
+"size %qE", arg, ttl, TYPE_SIZE_UNIT (ttl));
+   }
+   }
+
   /* See if the pointers point to incompatible address spaces.  */
   asl = TYPE_ADDR_SPACE (ttl);
   asr = TYPE_ADDR_SPACE (ttr);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 88e3c625030..6869bed64c3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8076,6 +8076,15 @@ always leads to a call to another @code{cold} function such as wrappers of
 C++ @code{throw} or fatal error reporting functions leading to @code{abort}.
 @end table
 
+@opindex Wno-alloc-type
+@opindex Walloc-type
+@item -Walloc-type
+Warn about calls to allocation functions decorated with attribute
+@code{alloc_size} that specify insufficient size for the target type of
+the pointer the result is assigned to, including those to the built-in
+forms of the functions @code{aligned_alloc}, @code{alloca}, @code{calloc},
+@code{malloc}, and @code{realloc}.
+
 @opindex Wno-alloc-zero
 @opindex Walloc-zero
 @item -Walloc-zero
diff --git a/gcc/testsuite/gcc.dg/Walloc-type-1.c b/gcc/testsuite/gcc.dg/Walloc-type-1.c
new file mode 100644
index 000..bc62e5e9aa3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Walloc-type-1.c
@@ -0,0 +1,37 @@
+/* Tests the warnings for insufficient allocation size. 
+   { dg-do compile }
+ * { dg-options "-Walloc-type" } 
+ * */
+#include 
+#include 
+
+struct b { int x[10]; };
+
+void fo0(void)
+{
+struct b *p = malloc(sizeof *p);
+}
+
+void fo1(void)
+{
+    struct b *p = malloc(sizeof p); /* { dg-warning "allocation of insufficient size" } */
+}
+
+void fo2(void)
+{
+    struct b *p = alloca(sizeof p); /* { dg-warning "allocation of insufficient size" } */
+}
+
+void fo3(void)
+{
+    struct b *p = calloc(1, sizeof p); /* { dg-warning "allocation of insufficient size" } */
+}
+
+void g(struct b* p);
+
+void fo4(void)
+{
+    g(malloc(4)); /* { dg-warning "allocation of insufficient size" } */
+}
+
+





[PATCH v2] mklog: handle Signed-Off-By, minor cleanup

2023-07-21 Thread Marc Poulhiès via Gcc-patches
Consider Signed-Off-By lines as part of the ending of the initial
commit to avoid having these in the middle of the log when the
changelog part is injected after.

This is particularly useful with:

 $ git gcc-commit-mklog --amend -s

that can be used to create the changelog and add the Signed-Off-By line.

Also applies most of the shellcheck suggestions on the
prepare-commit-msg hook.

contrib/ChangeLog:

* mklog.py: Leave SOB lines after changelog.
* prepare-commit-msg: Apply most shellcheck suggestions.

Signed-off-by: Marc Poulhiès 
---
Previous version was missing the ChangeLog.

This command is used in particular during the dev of the frontend
for the Rust language (see r13-7099-g4b25fc15b925f8 as an example
of a SoB ending in the middle of the commit message).

Ok for master?

 contrib/mklog.py   | 34 +-
 contrib/prepare-commit-msg | 20 ++--
 2 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/contrib/mklog.py b/contrib/mklog.py
index 777212c98d7..e5cc69e0d0a 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -41,7 +41,34 @@ from unidiff import PatchSet
 
 LINE_LIMIT = 100
 TAB_WIDTH = 8
-CO_AUTHORED_BY_PREFIX = 'co-authored-by: '
+
+# Initial commit:
+#   +------------------------------------------+
+#   | gccrs: Some title                        |
+#   |                                          | This is the "start"
+#   | This is some text explaining the commit. |
+#   | There can be several lines.              |
+#   |                                          |<--->
+#   | Signed-off-by: My Name                   | This is the "end"
+#   +------------------------------------------+
+#
+# Results in:
+#   +------------------------------------------+
+#   | gccrs: Some title                        |
+#   |                                          |
+#   | This is some text explaining the commit. | This is the "start"
+#   | There can be several lines.              |
+#   |                                          |<--->
+#   | gcc/rust/ChangeLog:                      |
+#   |                                          | This is the generated
+#   | * some_file (bla):                       | ChangeLog part
+#   | (foo):                                   |
+#   |                                          |<--->
+#   | Signed-off-by: My Name                   | This is the "end"
+#   +------------------------------------------+
+
+# this regex matches the first line of the "end" in the initial commit message
+FIRST_LINE_OF_END_RE = re.compile('(?i)^(signed-off-by|co-authored-by|#): ')
 
 pr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?PPR [a-z+-]+\/[0-9]+)')
 prnum_regex = re.compile(r'PR (?P[a-z+-]+)/(?P[0-9]+)')
@@ -330,10 +357,7 @@ def update_copyright(data):
 
 
 def skip_line_in_changelog(line):
-if line.lower().startswith(CO_AUTHORED_BY_PREFIX) or line.startswith('#'):
-return False
-return True
-
+return FIRST_LINE_OF_END_RE.match(line) == None
 
 if __name__ == '__main__':
 extra_args = os.getenv('GCC_MKLOG_ARGS')
diff --git a/contrib/prepare-commit-msg b/contrib/prepare-commit-msg
index 48c9dad3c6f..1e94706ba40 100755
--- a/contrib/prepare-commit-msg
+++ b/contrib/prepare-commit-msg
@@ -32,11 +32,11 @@ if ! [ -f "$COMMIT_MSG_FILE" ]; then exit 0; fi
 # Don't do anything unless requested to.
 if [ -z "$GCC_FORCE_MKLOG" ]; then exit 0; fi
 
-if [ -z "$COMMIT_SOURCE" ] || [ $COMMIT_SOURCE = template ]; then
+if [ -z "$COMMIT_SOURCE" ] || [ "$COMMIT_SOURCE" = template ]; then
 # No source or "template" means new commit.
 cmd="diff --cached"
 
-elif [ $COMMIT_SOURCE = message ]; then
+elif [ "$COMMIT_SOURCE" = message ]; then
 # "message" means -m; assume a new commit if there are any changes staged.
 if ! git diff --cached --quiet; then
cmd="diff --cached"
@@ -44,23 +44,23 @@ elif [ $COMMIT_SOURCE = message ]; then
cmd="diff --cached HEAD^"
 fi
 
-elif [ $COMMIT_SOURCE = commit ]; then
+elif [ "$COMMIT_SOURCE" = commit ]; then
 # The message of an existing commit.  If it's HEAD, assume --amend;
 # otherwise, assume a new commit with -C.
-if [ $SHA1 = HEAD ]; then
+if [ "$SHA1" = HEAD ]; then
cmd="diff --cached HEAD^"
if [ "$(git config gcc-config.mklog-hook-type)" = "smart-amend" ]; then
# Check if the existing message still describes the staged changes.
f=$(mktemp /tmp/git-commit.XX) || exit 1
-   git log -1 --pretty=email HEAD > $f
-   printf '\n---\n\n' >> $f
-   git $cmd >> $f
+   git log -1 --pretty=email HEAD > "$f"
+   printf '\n---\n\n' >> "$f"
+   git $cmd >> "$f"
if contrib/gcc-changelog/git_email.py "$f" >/dev/null 2>&1; then
  

Re: Re: [PATCH V2] VECT: Support floating-point in-order reduction for length loop control

2023-07-21 Thread juzhe.zh...@rivai.ai
Oh, sorry for missing that fix. I have now made the change you suggested in V4:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625169.html 

Change it as follows:

  if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
   def0, mask, len, bias);
  else if (mask_reduc_fn == IFN_MASK_FOLD_LEFT_PLUS)
new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
   def0, mask);
  else
new_stmt = gimple_build_call_internal (reduc_fn, 2, reduc_var,
   def0);

Sorry for that.

Bootstrap and regression testing are running.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-07-21 18:51
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V2] VECT: Support floating-point in-order reduction for length loop control
On Fri, 21 Jul 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> Hi, Richard and Richi.
> 
> This patch supports floating-point in-order reduction for length loop control.
> 
> Consider the following case:
> 
> float foo (float *__restrict a, int n)
> {
>   float result = 1.0;
>   for (int i = 0; i < n; i++)
>result += a[i];
>   return result;
> }
> 
> When compiled without -ffast-math on ARM SVE, we end up with:
> 
> loop_mask = WHILE_ULT
> result = MASK_FOLD_LEFT_PLUS (...loop_mask...)
> 
> For RVV, we use length loop control instead of a mask.
> 
> So, with this patch, we expect to see:
> 
> loop_len = SELECT_VL
> result = MASK_LEN_FOLD_LEFT_PLUS (...loop_len...)
> 
> gcc/ChangeLog:
> 
> * tree-vect-loop.cc (get_masked_reduction_fn): Add mask_len_fold_left.
> (vectorize_fold_left_reduction): Ditto.
> (vectorizable_reduction): Ditto.
> (vect_transform_reduction): Ditto.
> 
> ---
>  gcc/tree-vect-loop.cc | 41 -
>  1 file changed, 36 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index b44fb9c7712..59ab7879d55 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -6800,11 +6800,13 @@ static internal_fn
>  get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
>  {
>internal_fn mask_reduc_fn;
> +  internal_fn mask_len_reduc_fn;
>  
>switch (reduc_fn)
>  {
>  case IFN_FOLD_LEFT_PLUS:
>mask_reduc_fn = IFN_MASK_FOLD_LEFT_PLUS;
> +  mask_len_reduc_fn = IFN_MASK_LEN_FOLD_LEFT_PLUS;
>break;
>  
>  default:
> @@ -6814,6 +6816,9 @@ get_masked_reduction_fn (internal_fn reduc_fn, tree 
> vectype_in)
>if (direct_internal_fn_supported_p (mask_reduc_fn, vectype_in,
>OPTIMIZE_FOR_SPEED))
>  return mask_reduc_fn;
> +  if (direct_internal_fn_supported_p (mask_len_reduc_fn, vectype_in,
> +   OPTIMIZE_FOR_SPEED))
> +return mask_len_reduc_fn;
>return IFN_LAST;
>  }
>  
> @@ -6834,7 +6839,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
> gimple *reduc_def_stmt,
> tree_code code, internal_fn reduc_fn,
> tree ops[3], tree vectype_in,
> -int reduc_index, vec_loop_masks *masks)
> +int reduc_index, vec_loop_masks *masks,
> +vec_loop_lens *lens)
>  {
>class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> @@ -6896,8 +6902,18 @@ vectorize_fold_left_reduction (loop_vec_info 
> loop_vinfo,
>  {
>gimple *new_stmt;
>tree mask = NULL_TREE;
> +  tree len = NULL_TREE;
> +  tree bias = NULL_TREE;
>if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>  mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
> +  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> + {
> +   len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
> +i, 1);
> +   signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
> +   bias = build_int_cst (intQI_type_node, biasval);
> +   mask = build_minus_one_cst (truth_type_for (vectype_in));
> + }
>  
>/* Handle MINUS by adding the negative.  */
>if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
> @@ -6917,7 +6933,10 @@ vectorize_fold_left_reduction (loop_vec_info 
> loop_vinfo,
>  the preceding operation.  */
>if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
>  {
> -   if (mask && mask_reduc_fn != IFN_LAST)
> +   if (len && mask && mask_reduc_fn != IFN_LAST)
 
check mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS instead?
 
> + new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
> +def0, mask, len, bias);
> +   else if (mask && mask_reduc_fn != IFN_LAST)
 
Likewise.
 
Otherwise looks good to me.
 
Richard.
 
>  new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
> def0, mask);
>else
> @@ -7979,6 +7998,7 @@ vectorizable_reduction (loop_vec_info 

[PATCH V4] VECT: Support floating-point in-order reduction for length loop control

2023-07-21 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.

This patch supports floating-point in-order reduction for length loop control.

Consider the following case:

float foo (float *__restrict a, int n)
{
  float result = 1.0;
  for (int i = 0; i < n; i++)
   result += a[i];
  return result;
}

When compiled without -ffast-math on ARM SVE, we end up with:

loop_mask = WHILE_ULT
result = MASK_FOLD_LEFT_PLUS (...loop_mask...)

For RVV, we use length loop control instead of a mask.

So, with this patch, we expect to see:

loop_len = SELECT_VL
result = MASK_LEN_FOLD_LEFT_PLUS (...loop_len...)
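
For readers wondering why the order has to be preserved without -ffast-math:
floating-point addition is not associative, so a reassociated sum can differ
from the strict left-to-right one. The following is only an illustrative
sketch (the function names are made up and it is unrelated to the vectorizer
code in this patch); it assumes IEEE single precision with no excess
precision, as on x86-64 or AArch64:

```c
#include <stddef.h>

/* Strict left-to-right accumulation, matching the order of the source
   loop.  This is the order an in-order (fold-left) reduction keeps.  */
float sum_in_order(const float *a, size_t n)
{
    float result = 0.0f;
    for (size_t i = 0; i < n; i++)
        result += a[i];
    return result;
}

/* Reassociated variant: two independent partial sums (even and odd
   indices) combined at the end, similar in spirit to what a reordering
   reduction may do.  */
float sum_reassociated(const float *a, size_t n)
{
    float even = 0.0f;
    float odd = 0.0f;
    for (size_t i = 0; i < n; i++)
    {
        if (i % 2 == 0)
            even += a[i];
        else
            odd += a[i];
    }
    return even + odd;
}
```

With the input {1e8f, 1.0f, -1e8f, 1.0f}, the in-order sum is 1.0f (the first
1.0f is absorbed by 1e8f), while the reassociated sum is 2.0f.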

gcc/ChangeLog:

* tree-vect-loop.cc (get_masked_reduction_fn): Add mask_len_fold_left_plus.
(vectorize_fold_left_reduction): Ditto.
(vectorizable_reduction): Ditto.
(vect_transform_reduction): Ditto.

---
 gcc/tree-vect-loop.cc | 41 -
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b44fb9c7712..3b296d41157 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6800,11 +6800,13 @@ static internal_fn
 get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
 {
   internal_fn mask_reduc_fn;
+  internal_fn mask_len_reduc_fn;
 
   switch (reduc_fn)
 {
 case IFN_FOLD_LEFT_PLUS:
   mask_reduc_fn = IFN_MASK_FOLD_LEFT_PLUS;
+  mask_len_reduc_fn = IFN_MASK_LEN_FOLD_LEFT_PLUS;
   break;
 
 default:
@@ -6814,6 +6816,9 @@ get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
   if (direct_internal_fn_supported_p (mask_reduc_fn, vectype_in,
  OPTIMIZE_FOR_SPEED))
 return mask_reduc_fn;
+  if (direct_internal_fn_supported_p (mask_len_reduc_fn, vectype_in,
+ OPTIMIZE_FOR_SPEED))
+return mask_len_reduc_fn;
   return IFN_LAST;
 }
 
@@ -6834,7 +6839,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
   gimple *reduc_def_stmt,
   tree_code code, internal_fn reduc_fn,
   tree ops[3], tree vectype_in,
-  int reduc_index, vec_loop_masks *masks)
+  int reduc_index, vec_loop_masks *masks,
+  vec_loop_lens *lens)
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
@@ -6896,8 +6902,18 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 {
   gimple *new_stmt;
   tree mask = NULL_TREE;
+  tree len = NULL_TREE;
+  tree bias = NULL_TREE;
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
 mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+   {
+ len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
+  i, 1);
+ signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+ bias = build_int_cst (intQI_type_node, biasval);
+ mask = build_minus_one_cst (truth_type_for (vectype_in));
+   }
 
   /* Handle MINUS by adding the negative.  */
   if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
@@ -6917,7 +6933,10 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 the preceding operation.  */
   if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
{
- if (mask && mask_reduc_fn != IFN_LAST)
+ if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+   new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
+  def0, mask, len, bias);
+ else if (mask_reduc_fn == IFN_MASK_FOLD_LEFT_PLUS)
new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
   def0, mask);
  else
@@ -7979,6 +7998,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   else if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
 {
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   internal_fn cond_fn = get_conditional_internal_fn (op.code, op.type);
 
   if (reduction_type != FOLD_LEFT_REDUCTION
@@ -8006,8 +8026,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
}
   else
-   vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
-  vectype_in, NULL);
+   {
+ internal_fn mask_reduc_fn
+   = get_masked_reduction_fn (reduc_fn, vectype_in);
+
+ if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+   vect_record_loop_len (loop_vinfo, lens, ncopies * vec_num,
+ vectype_in, 1);
+ else
+  

Re: Re: [PATCH V2] VECT: Support floating-point in-order reduction for length loop control

2023-07-21 Thread juzhe.zh...@rivai.ai
Thanks Richi,

I have addressed the comments in V3:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625167.html 

Bootstrap and regression testing are under way.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-07-21 18:51
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V2] VECT: Support floating-point in-order reduction for length loop control
On Fri, 21 Jul 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> Hi, Richard and Richi.
> 
> This patch supports floating-point in-order reduction for length loop control.
> 
> Consider the following case:
> 
> float foo (float *__restrict a, int n)
> {
>   float result = 1.0;
>   for (int i = 0; i < n; i++)
>result += a[i];
>   return result;
> }
> 
> When compiled without -ffast-math on ARM SVE, we end up with:
> 
> loop_mask = WHILE_ULT
> result = MASK_FOLD_LEFT_PLUS (...loop_mask...)
> 
> For RVV, we use length loop control instead of a mask.
> 
> So, with this patch, we expect to see:
> 
> loop_len = SELECT_VL
> result = MASK_LEN_FOLD_LEFT_PLUS (...loop_len...)
> 
> gcc/ChangeLog:
> 
> * tree-vect-loop.cc (get_masked_reduction_fn): Add mask_len_fold_left.
> (vectorize_fold_left_reduction): Ditto.
> (vectorizable_reduction): Ditto.
> (vect_transform_reduction): Ditto.
> 
> ---
>  gcc/tree-vect-loop.cc | 41 -
>  1 file changed, 36 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index b44fb9c7712..59ab7879d55 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -6800,11 +6800,13 @@ static internal_fn
>  get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
>  {
>internal_fn mask_reduc_fn;
> +  internal_fn mask_len_reduc_fn;
>  
>switch (reduc_fn)
>  {
>  case IFN_FOLD_LEFT_PLUS:
>mask_reduc_fn = IFN_MASK_FOLD_LEFT_PLUS;
> +  mask_len_reduc_fn = IFN_MASK_LEN_FOLD_LEFT_PLUS;
>break;
>  
>  default:
> @@ -6814,6 +6816,9 @@ get_masked_reduction_fn (internal_fn reduc_fn, tree 
> vectype_in)
>if (direct_internal_fn_supported_p (mask_reduc_fn, vectype_in,
>OPTIMIZE_FOR_SPEED))
>  return mask_reduc_fn;
> +  if (direct_internal_fn_supported_p (mask_len_reduc_fn, vectype_in,
> +   OPTIMIZE_FOR_SPEED))
> +return mask_len_reduc_fn;
>return IFN_LAST;
>  }
>  
> @@ -6834,7 +6839,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
> gimple *reduc_def_stmt,
> tree_code code, internal_fn reduc_fn,
> tree ops[3], tree vectype_in,
> -int reduc_index, vec_loop_masks *masks)
> +int reduc_index, vec_loop_masks *masks,
> +vec_loop_lens *lens)
>  {
>class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> @@ -6896,8 +6902,18 @@ vectorize_fold_left_reduction (loop_vec_info 
> loop_vinfo,
>  {
>gimple *new_stmt;
>tree mask = NULL_TREE;
> +  tree len = NULL_TREE;
> +  tree bias = NULL_TREE;
>if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>  mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
> +  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> + {
> +   len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
> +i, 1);
> +   signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
> +   bias = build_int_cst (intQI_type_node, biasval);
> +   mask = build_minus_one_cst (truth_type_for (vectype_in));
> + }
>  
>/* Handle MINUS by adding the negative.  */
>if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
> @@ -6917,7 +6933,10 @@ vectorize_fold_left_reduction (loop_vec_info 
> loop_vinfo,
>  the preceding operation.  */
>if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
>  {
> -   if (mask && mask_reduc_fn != IFN_LAST)
> +   if (len && mask && mask_reduc_fn != IFN_LAST)
 
check mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS instead?
 
> + new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
> +def0, mask, len, bias);
> +   else if (mask && mask_reduc_fn != IFN_LAST)
 
Likewise.
 
Otherwise looks good to me.
 
Richard.
 
>  new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
> def0, mask);
>else
> @@ -7979,6 +7998,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>else if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
>  {
>vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
>internal_fn cond_fn = get_conditional_internal_fn (op.code, op.type);
>  
>if (reduction_type != FOLD_LEFT_REDUCTION
> @@ -8006,8 +8026,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>  }
>else
> - vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
> -vectype_in, NULL);
> + {
> +   internal_fn 

[PATCH V3] VECT: Support floating-point in-order reduction for length loop control

2023-07-21 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.

This patch supports floating-point in-order reduction for length loop control.

Consider the following case:

float foo (float *__restrict a, int n)
{
  float result = 1.0;
  for (int i = 0; i < n; i++)
   result += a[i];
  return result;
}

When compiled without -ffast-math on ARM SVE, we end up with:

loop_mask = WHILE_ULT
result = MASK_FOLD_LEFT_PLUS (...loop_mask...)

For RVV, we use length loop control instead of a mask.

So, with this patch, we expect to see:

loop_len = SELECT_VL
result = MASK_LEN_FOLD_LEFT_PLUS (...loop_len...)

gcc/ChangeLog:

* tree-vect-loop.cc (get_masked_reduction_fn): Add mask_len_fold_left_plus.
(vectorize_fold_left_reduction): Ditto.
(vectorizable_reduction): Ditto.
(vect_transform_reduction): Ditto.

---
 gcc/tree-vect-loop.cc | 41 -
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b44fb9c7712..9256bc17c9d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6800,11 +6800,13 @@ static internal_fn
 get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
 {
   internal_fn mask_reduc_fn;
+  internal_fn mask_len_reduc_fn;
 
   switch (reduc_fn)
 {
 case IFN_FOLD_LEFT_PLUS:
   mask_reduc_fn = IFN_MASK_FOLD_LEFT_PLUS;
+  mask_len_reduc_fn = IFN_MASK_LEN_FOLD_LEFT_PLUS;
   break;
 
 default:
@@ -6814,6 +6816,9 @@ get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
   if (direct_internal_fn_supported_p (mask_reduc_fn, vectype_in,
  OPTIMIZE_FOR_SPEED))
 return mask_reduc_fn;
+  if (direct_internal_fn_supported_p (mask_len_reduc_fn, vectype_in,
+ OPTIMIZE_FOR_SPEED))
+return mask_len_reduc_fn;
   return IFN_LAST;
 }
 
@@ -6834,7 +6839,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
   gimple *reduc_def_stmt,
   tree_code code, internal_fn reduc_fn,
   tree ops[3], tree vectype_in,
-  int reduc_index, vec_loop_masks *masks)
+  int reduc_index, vec_loop_masks *masks,
+  vec_loop_lens *lens)
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
@@ -6896,8 +6902,18 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 {
   gimple *new_stmt;
   tree mask = NULL_TREE;
+  tree len = NULL_TREE;
+  tree bias = NULL_TREE;
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
 mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+   {
+ len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
+  i, 1);
+ signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+ bias = build_int_cst (intQI_type_node, biasval);
+ mask = build_minus_one_cst (truth_type_for (vectype_in));
+   }
 
   /* Handle MINUS by adding the negative.  */
   if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
@@ -6917,7 +6933,10 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 the preceding operation.  */
   if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
{
- if (mask && mask_reduc_fn != IFN_LAST)
+ if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+   new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
+  def0, mask, len, bias);
+ else if (mask && mask_reduc_fn != IFN_LAST)
new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
   def0, mask);
  else
@@ -7979,6 +7998,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   else if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
 {
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   internal_fn cond_fn = get_conditional_internal_fn (op.code, op.type);
 
   if (reduction_type != FOLD_LEFT_REDUCTION
@@ -8006,8 +8026,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
}
   else
-   vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
-  vectype_in, NULL);
+   {
+ internal_fn mask_reduc_fn
+   = get_masked_reduction_fn (reduc_fn, vectype_in);
+
+ if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+   vect_record_loop_len (loop_vinfo, lens, ncopies * vec_num,
+ vectype_in, 1);
+ else
+   

Re: [PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file

2023-07-21 Thread Lehua Ding
Hi Martin,


By the way, is there a standard format required for these Python files?
I see that other Python files have similar format errors when checked
using flake8. If so, it seems necessary to configure a git hook on the
git server to run this check.


Best,
Lehua
