[patch, libgfortran] PR114304 - [13/14 Regression] libgfortran I/O – bogus "Semicolon not allowed as separator with DECIMAL='point'"

2024-04-03 Thread Jerry D

Hi all,

The attached log entry and patch (git show) fixes this issue by adding 
logic to handle spaces in eat_separators. One or more spaces by 
themselves are a valid separator. So in this case we look at the 
character following the spaces to see if it is a comma or semicolon.


If so, I change it to the valid separator for the given decimal mode, 
point or comma. This allows the comma or semicolon to be interpreted as 
a null read on the next effective item in the formatted read.


I chose a permissive approach here that allows reads to proceed when the
input line is mal-formed with an incorrect separator as long as there is 
at least one space in front of it.


New test case included. Regression tested on X86-64.

OK for trunk?  Backport to 13 after some time.

Regards,

Jerrycommit 7d1a958d6b099ea88b6c51649baf5dbd5e598909
Author: Jerry DeLisle 
Date:   Wed Apr 3 18:07:30 2024 -0700

libfortran: Fix handling of formatted separators.

PR libfortran/114304

libgfortran/ChangeLog:

* io/list_read.c (eat_separator): Add logic to handle spaces
preceding a comma or semicolon such that that a 'null' read
occurs without error at the end of comma or semicolon
terminated input lines.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr114304.f90: New test.

diff --git a/gcc/testsuite/gfortran.dg/pr114304.f90 b/gcc/testsuite/gfortran.dg/pr114304.f90
new file mode 100644
index 000..57af619246b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr114304.f90
@@ -0,0 +1,49 @@
+! { dg-do run }
+! pr114304
+real :: x(3)
+character(len=1) :: s
+integer :: ios
+
+s = 'x'
+
+open(99, decimal="comma", status='scratch')
+write(99, '(a)') '1,23435 1243,24 13,24 a'
+rewind(99)
+read(99, *, iostat=ios) x, s
+if (ios /= 0 .or. s /= 'a') stop 1
+
+rewind(99)
+write(99, '(a)') '1,23435;1243,24;13,24;a'
+rewind(99)
+read(99, *, iostat=ios) x, s
+if (ios /= 0 .or. s /= 'a') stop 2
+
+! Note: not reading 's'
+rewind(99)
+write(99, '(a)') '1,23435 1243,24 13,24 ,'
+rewind(99)
+read(99, *) x
+if (ios /= 0) stop 3
+
+rewind(99)
+write(99, '(a)') '1,23435;1243,24;13,24 ,'
+rewind(99)
+read(99, *, iostat=ios) x
+if (ios /= 0) stop 4
+
+! Now reading 's'
+s = 'w'
+rewind(99)
+write(99, '(a)') '1,23435 1243,24 13,24 ,'
+rewind(99)
+read(99, *, iostat=ios) x, s
+if (ios /= 0 .or. s /= 'w') stop 5
+
+s = 'w'
+rewind(99)
+write(99, '(a)') '1,23435;1243,24;13,24 ,'
+rewind(99)
+read(99, *, iostat=ios) x, s
+if (ios /= 0 .or. s /= 'w') stop 6
+close(99)
+end
diff --git a/libgfortran/io/list_read.c b/libgfortran/io/list_read.c
index fb3f7dbc34d..f6f169043bf 100644
--- a/libgfortran/io/list_read.c
+++ b/libgfortran/io/list_read.c
@@ -461,11 +461,30 @@ eat_separator (st_parameter_dt *dtp)
   int c, n;
   int err = 0;
 
-  eat_spaces (dtp);
   dtp->u.p.comma_flag = 0;
+  c = next_char (dtp);
+  if (c == ' ')
+{
+  eat_spaces (dtp);
+  c = next_char (dtp);
+  if (c == ',')
+	{
+	  if (dtp->u.p.current_unit->decimal_status == DECIMAL_COMMA)
+	unget_char (dtp, ';');
+	  dtp->u.p.comma_flag = 1;
+	  eat_spaces (dtp);
+	  return err;
+	}
+  if (c == ';')
+	{
+	  if (dtp->u.p.current_unit->decimal_status == DECIMAL_POINT)
+	unget_char (dtp, ',');
+	  dtp->u.p.comma_flag = 1;
+	  eat_spaces (dtp);
+	  return err;
+	}
+}
 
-  if ((c = next_char (dtp)) == EOF)
-return LIBERROR_END;
   switch (c)
 {
 case ',':
@@ -476,7 +495,10 @@ eat_separator (st_parameter_dt *dtp)
 	  unget_char (dtp, c);
 	  break;
 	}
-/* Fall through. */
+  dtp->u.p.comma_flag = 1;
+  eat_spaces (dtp);
+  break;
+
 case ';':
   dtp->u.p.comma_flag = 1;
   eat_spaces (dtp);


Re: [PATCH V3 0/2] aarch64: Place target independent and dependent changed code in one file.

2024-04-03 Thread Richard Sandiford
Alex Coplan  writes:
> On 23/02/2024 16:41, Ajit Agarwal wrote:
>> Hello Richard/Alex/Segher:
>
> Hi Ajit,
>
> Sorry for the delay and thanks for working on this.
>
> Generally this looks like the right sort of approach (IMO) but I've left
> some comments below.
>
> I'll start with a meta comment: in the subject line you have marked this
> as 0/2, but usually 0/n is reserved for the cover letter of a patch
> series and wouldn't contain an actual patch.  I think this might have
> confused the Linaro CI suitably such that it didn't run regression tests
> on the patch.

Alex, thanks for the thorough and in-depth review.  I agree with all the
comments FWIW.  Just to add a couple of things:

> > @@ -138,8 +138,18 @@ struct alt_base
> >poly_int64 offset;
> >  };
> >  
> > +// Virtual base class for load/store walkers used in alias analysis.
> > +struct alias_walker
> > +{
> > +  virtual bool conflict_p (int ) const = 0;
> > +  virtual insn_info *insn () const = 0;
> > +  virtual bool valid () const  = 0;
> > +  virtual void advance () = 0;
> > +};
> > +
> > +
> >  // State used by the pass for a given basic block.
> > -struct ldp_bb_info
> > +struct pair_fusion
>
> As a comment on the high-level design, I think we want a generic class
> for the overall pass, not just for the BB-specific structure.
>
> That is because naturally we want the ldp_fusion_bb function itself to
> be a member of such a class, so that it can access virtual functions to
> query the target e.g. about the load/store pair policy, and whether to
> try and promote writeback pairs.
>
> If we keep all of the virtual functions in such an outer class, then we
> can keep the ldp_fusion_bb class generic (not needing an override for
> each target) and that inner class can perhaps be given a pointer or
> reference to the outer class when it is instantiated in ldp_fusion_bb.

I agree that in general, the new virtual methods should belong to a pass
class rather than the per-bb class.

In principle, if we need to virtualise existing members of ldp_bb_info
(or code contained within existing members of ldp_bb_info), and if that
code accesses members of the bb info, then it might make sense to have
target-specific derivatives of the bb info structure too, with a virtual
function to create the bb info structure for a given bb.

However, it looks like all but one of the virtual functions in the patch
are self-contained (in the sense of depending only on their arguments
and on globals).  The one exception is transform_for_base, but Alex
asked whether that needs to be virtualised.  If it doesn't, then like
Alex says, it seems that all virtuals could belong to the pass class
rather than to the bb info.

>> [...]
>> +  }
>>  };
>>  
>> +bool
>> +store_modifies_mem_p (rtx mem, insn_info *store_insn, int );
>> +bool load_modified_by_store_p (insn_info *load,
>> +  insn_info *store,
>> +  int );
>> +extern insn_info *
>> +try_repurpose_store (insn_info *first,
>> + insn_info *second,
>> + const insn_range_info _range);
>> +
>> +void reset_debug_use (use_info *use);
>> +
>> +extern void
>> +fixup_debug_uses (obstack_watermark ,
>> +  insn_info *insns[2],
>> +  rtx orig_rtl[2],
>> +  insn_info *pair_dst,
>> +  insn_info *trailing_add,
>> +  bool load_p,
>> +  int writeback,
>> +  rtx writeback_effect,
>> +  unsigned base_regno);
>> +
>> +void
>> +fixup_debug_uses_trailing_add (obstack_watermark ,
>> +   insn_info *pair_dst,
>> +   insn_info *trailing_add,
>> +   rtx writeback_effect);
>> +
>> +
>> +extern void
>> +fixup_debug_use (obstack_watermark ,
>> + use_info *use,
>> + def_info *def,
>> + rtx base,
>> + poly_int64 wb_offset);
>> +
>> +extern insn_info *
>> +find_trailing_add (insn_info *insns[2],
>> +   const insn_range_info _range,
>> +   int initial_writeback,
>> +   rtx *writeback_effect,
>> +   def_info **add_def,
>> +   def_info *base_def,
>> +   poly_int64 initial_offset,
>> +   unsigned access_size);
>> +
>> +rtx drop_writeback (rtx mem);
>> +rtx pair_mem_strip_offset (rtx mem, poly_int64 *offset);
>> +bool any_pre_modify_p (rtx x);
>> +bool any_post_modify_p (rtx x);
>> +int encode_lfs (lfs_fields fields);
>> +extern insn_info * latest_hazard_before (insn_info *insn, rtx *ignore,
>> +  insn_info *ignore_insn = nullptr);
>> +insn_info * first_hazard_after (insn_info *insn, rtx *ignore);
>> +bool ranges_overlap_p (const insn_range_info , const insn_range_info 
>> );
>> +insn_range_info get_def_range (def_info *def);
>> +insn_range_info def_downwards_move_range (def_info *def);
>> +insn_range_info def_upwards_move_range (def_info *def);
>> +rtx gen_tombstone (void);
>> 

Re: [PATCH] c++: Implement C++26 P2809R3 - Trivial infinite loops are not Undefined Behavior

2024-04-03 Thread Jason Merrill

On 4/3/24 15:16, Jakub Jelinek wrote:

On Wed, Apr 03, 2024 at 12:58:12PM -0400, Jason Merrill wrote:

On 4/3/24 12:42, Jakub Jelinek wrote:

On Wed, Apr 03, 2024 at 12:07:48PM -0400, Jason Merrill wrote:

Using std::is_constant_evaluated directly in a loop condition is, as the
paper says, unlikely and "horrendous code", so I'm not concerned about
surprising effects, though I guess we should check for it with
maybe_warn_for_constant_evaluated.


Ok, though guess the question is what to say about it.
Because unlike the existing cases in maybe_warn_for_constant_evaluated
where it always evaluates to true or always to false depending on where,
in the trivial empty iteration statements it evaluates to always true or
always false depending or sometimes true, sometimes false, depending on
if the condition is a constant expression that evaluates to true (then it is
always true), or if in immediate function (also always true), or if not
in constexpr function (then always false), or in constexpr function (then
it might be true or false).
Not sure how exactly to word that.
Maybe just say that it is horrendous code to use std::is_constant_evaluated
() in trivial empty iteration statement conditions ;)


Maybe if the condition constant-evaluates to true, warn something like
"% always constant-evaluates to true in the
condition of a trivially empty iteration statement"?


Given the mails on core about this, isn't it actually the case that not all
"trivial infinite loops" actually loop at all or if they loop, don't loop
infinitely?
I.e. what I wrote in the patch would be wrong and that generally we need to
use ANNOTATE_EXPR in some cases?


Apparently so.


As in, for trivially empty iteration statements try to evaluate the
maybe_const_value evaluate condition with mce_true, if it yields true,
try to maybe_const_value evaluate condition with mce_false, if it also
yields true, replace it with true, but if the former returns true and
the latter isn't a constant expression or returns false mark the loop with
ANNOTATE_EXPR as infinite loop regardless of -ffinite-loops and actually
keep invoking the condition in there?


Sure.

Jason



[gcc-13 backport PATCH] RISC-V: Fix C23 (...) functions returning large aggregates [PR114175]

2024-04-03 Thread Edwin Lu
We assume that TYPE_NO_NAMED_ARGS_STDARG_P don't have any named arguments and
there is nothing to advance, but that is not the case for (...) functions
returning by hidden reference which have one such artificial argument.
This causes gcc.dg/c23-stdarg-[68].c to fail

Fix the issue by checking if arg.type is NULL as r14-9503-g218d1749612
explains

Tested on linux rv64gcv.

gcc/ChangeLog:

PR target/114175
* config/riscv/riscv.cc (riscv_setup_incoming_varargs): Only skip
riscv_funciton_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL

(cherry picked from commit 60586710b0646efdbbd77a7f53b93fb5edb87a61)
---
 gcc/config/riscv/riscv.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 01eebc83cc5..cefd3b7b2b2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3961,7 +3961,8 @@ riscv_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 riscv_function_arg_advance (pack_cumulative_args (_cum), arg);
 
   /* Found out how many registers we need to save.  */
-- 
2.34.1



New Swedish PO file for 'gcc' (version 14.1-b20240218)

2024-04-03 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

https://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-14.1-b20240218.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH] c++: Implement C++26 P2809R3 - Trivial infinite loops are not Undefined Behavior

2024-04-03 Thread Jakub Jelinek
On Wed, Apr 03, 2024 at 12:58:12PM -0400, Jason Merrill wrote:
> On 4/3/24 12:42, Jakub Jelinek wrote:
> > On Wed, Apr 03, 2024 at 12:07:48PM -0400, Jason Merrill wrote:
> > > Using std::is_constant_evaluated directly in a loop condition is, as the
> > > paper says, unlikely and "horrendous code", so I'm not concerned about
> > > surprising effects, though I guess we should check for it with
> > > maybe_warn_for_constant_evaluated.
> > 
> > Ok, though guess the question is what to say about it.
> > Because unlike the existing cases in maybe_warn_for_constant_evaluated
> > where it always evaluates to true or always to false depending on where,
> > in the trivial empty iteration statements it evaluates to always true or
> > always false depending or sometimes true, sometimes false, depending on
> > if the condition is a constant expression that evaluates to true (then it is
> > always true), or if in immediate function (also always true), or if not
> > in constexpr function (then always false), or in constexpr function (then
> > it might be true or false).
> > Not sure how exactly to word that.
> > Maybe just say that it is horrendous code to use std::is_constant_evaluated
> > () in trivial empty iteration statement conditions ;)
> 
> Maybe if the condition constant-evaluates to true, warn something like
> "% always constant-evaluates to true in the
> condition of a trivially empty iteration statement"?

Given the mails on core about this, isn't it actually the case that not all
"trivial infinite loops" actually loop at all or if they loop, don't loop
infinitely?
I.e. what I wrote in the patch would be wrong and that generally we need to
use ANNOTATE_EXPR in some cases?
As in, for trivially empty iteration statements try to evaluate the
maybe_const_value evaluate condition with mce_true, if it yields true,
try to maybe_const_value evaluate condition with mce_false, if it also
yields true, replace it with true, but if the former returns true and
the latter isn't a constant expression or returns false mark the loop with
ANNOTATE_EXPR as infinite loop regardless of -ffinite-loops and actually
keep invoking the condition in there?

Jakub



Re: [PATCH] c++/modules: Prefer partition indexes when installing imported entities [PR99377]

2024-04-03 Thread Jason Merrill

On 3/28/24 08:22, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

The testcase in comment 15 of the linked PR is caused because the
following assumption in depset::hash::make_dependency doesn't hold:

   if (DECL_LANG_SPECIFIC (not_tmpl)
   && DECL_MODULE_IMPORT_P (not_tmpl))
 {
   /* Store the module number and index in cluster/section,
  so we don't have to look them up again.  */
   unsigned index = import_entity_index (decl);
   module_state *from = import_entity_module (index);
   /* Remap will be zero for imports from partitions, which
  we want to treat as-if declared in this TU.  */
   if (from->remap)
 {
   dep->cluster = index - from->entity_lwm;
   dep->section = from->remap;
   dep->set_flag_bit ();
 }
 }

This is because at least for template specialisations, we first see the
declaration in the header unit imported from the partition, and then the
instantiation provided by the partition itself.  This means that the
'import_entity_index' lookup doesn't report that the specialisation was
declared in the partition and thus should be considered as-if it was
part of the TU, and get exported.


I think "exported" is the wrong term here; IIUC template specializations 
are not themselves exported, just the template itself.


But if the declaration or point of instantiation of the specialization 
is within a module instantiation unit, it is reachable to any importers, 
including the primary module interface unit importing the partition 
interface unit.


Does this work differently if "check" is a separate module rather than a 
partition?



To fix this, this patch allows, as a special case for installing an
entity from a partition, to overwrite the entity_map entry with the
(later) index into the partition so that this assumption holds again.


Rather than special-casing partitions, would it make sense to override a 
declaration with a definition?


Jason



Re: [PATCH] c++: constexpr error with fn redecl in local scope [PR111132]

2024-04-03 Thread Jason Merrill

On 4/2/24 13:52, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/13?

-- >8 --
We evaluate constexpr functions on the original, pre-genericization bodies.
That means that the function body we're evaluating will not have gone
through cp_genericize_r's "Map block scope extern declarations to visible
declarations with the same name and type in outer scopes if any".  Here:

   constexpr bool bar() { return true; } // #1
   constexpr bool foo() {
 constexpr bool bar(void); // #2
 return bar();
   }

it means that we:
1) register_constexpr_fundef (#1)
2) cp_genericize (#1)
nothing interesting happens
3) register_constexpr_fundef (foo)
does copy_fn, so we have two copies of the BIND_EXPR
4) cp_genericize (foo)
this remaps #2 to #1, but only on one copy of the BIND_EXPR
5) retrieve_constexpr_fundef (foo)
we find it, no problem
6) retrieve_constexpr_fundef (#2)
and here #2 isn't found in constexpr_fundef_table, because
we're working on the BIND_EXPR copy where #2 wasn't mapped to #1
so we fail.  We've only registered #1.

It should work to use DECL_LOCAL_DECL_ALIAS (which used to be
extern_decl_map).  We evaluate constexpr functions on pre-cp_fold
bodies to avoid diagnostic problems, but the remapping I'm proposing
should not interfere with diagnostics.

This is not a problem for a global scope redeclaration; there we go
through duplicate_decls which keeps the DECL_UID:
   DECL_UID (olddecl) = olddecl_uid;
and DECL_UID is what constexpr_fundef_hasher::hash uses.

PR c++/32

gcc/cp/ChangeLog:

* constexpr.cc (get_function_named_in_call): If there's
a DECL_LOCAL_DECL_ALIAS, use it.


Perhaps this function should use cp_get_fndecl_from_callee, and this 
change should be made there instead?


Jason



Re: [PATCH] c++: Implement C++26 P2809R3 - Trivial infinite loops are not Undefined Behavior

2024-04-03 Thread Jason Merrill

On 4/3/24 12:42, Jakub Jelinek wrote:

On Wed, Apr 03, 2024 at 12:07:48PM -0400, Jason Merrill wrote:

Using std::is_constant_evaluated directly in a loop condition is, as the
paper says, unlikely and "horrendous code", so I'm not concerned about
surprising effects, though I guess we should check for it with
maybe_warn_for_constant_evaluated.


Ok, though guess the question is what to say about it.
Because unlike the existing cases in maybe_warn_for_constant_evaluated
where it always evaluates to true or always to false depending on where,
in the trivial empty iteration statements it evaluates to always true or
always false depending or sometimes true, sometimes false, depending on
if the condition is a constant expression that evaluates to true (then it is
always true), or if in immediate function (also always true), or if not
in constexpr function (then always false), or in constexpr function (then
it might be true or false).
Not sure how exactly to word that.
Maybe just say that it is horrendous code to use std::is_constant_evaluated
() in trivial empty iteration statement conditions ;)


Maybe if the condition constant-evaluates to true, warn something like 
 "% always 
constant-evaluates to true in the condition of a trivially empty 
iteration statement"?



What about loops with non-empty bodies or other reasons why they aren't
trivial empty iteration statements?  Shall we do
maybe_warn_for_constant_evaluated for those (with the current wording,
false in non-constexpr fn, true in consteval)?


Sounds good.

Jason



Re: [PATCH] c++: Implement C++26 P2809R3 - Trivial infinite loops are not Undefined Behavior

2024-04-03 Thread Jakub Jelinek
On Wed, Apr 03, 2024 at 12:07:48PM -0400, Jason Merrill wrote:
> Using std::is_constant_evaluated directly in a loop condition is, as the
> paper says, unlikely and "horrendous code", so I'm not concerned about
> surprising effects, though I guess we should check for it with
> maybe_warn_for_constant_evaluated.

Ok, though guess the question is what to say about it.
Because unlike the existing cases in maybe_warn_for_constant_evaluated
where it always evaluates to true or always to false depending on where,
in the trivial empty iteration statements it evaluates to always true or
always false depending or sometimes true, sometimes false, depending on
if the condition is a constant expression that evaluates to true (then it is
always true), or if in immediate function (also always true), or if not
in constexpr function (then always false), or in constexpr function (then
it might be true or false).
Not sure how exactly to word that.
Maybe just say that it is horrendous code to use std::is_constant_evaluated
() in trivial empty iteration statement conditions ;)
What about loops with non-empty bodies or other reasons why they aren't
trivial empty iteration statements?  Shall we do
maybe_warn_for_constant_evaluated for those (with the current wording,
false in non-constexpr fn, true in consteval)?

> > So, the patch below attempts to discover trivially empty iteration
> > statements at cp_fold time if it is the final mce_false folding,
> > attempts to maybe_constant_value with mce_false evaluate the conditions
> > and replaces it with the returned value if constant non-zero.
> 
> Please refactor this code to share most of the implementation between the
> loop types.

Will do.

Jakub



Re: [PATCH v3] tree-profile: Disable indirect call profiling for IFUNC resolvers

2024-04-03 Thread Peter Bergner
On 4/3/24 10:45 AM, Andrew Pinski wrote:
> On Wed, Apr 3, 2024 at 8:32 AM Peter Bergner  wrote:
> I think you misunderstood the patch/situtation. Most ifunc resolves
> don't use TLS at all; what is happening here is that the profiler
> (-fprofile-generate) is adding TLS usage to the ifunc resolver which
> then causes issues. And the use of TLS causes a PLT call to be inside
> the ifun which causes all the fun stuff.
> 
> This is not about ifunc resolves using TLS directly in code but rather
> indirectly via -fprofile-generate.

Ah, ok.  Thanks to you and H.J. for clarifying.

Peter



Re: [PATCH] c++: Implement C++26 P2809R3 - Trivial infinite loops are not Undefined Behavior

2024-04-03 Thread Jason Merrill

On 4/3/24 03:25, Jakub Jelinek wrote:

Hi!

The following patch attempts to implement P2809R3, which has been voted
in as a DR.

The middle-end has its behavior documented:
'-ffinite-loops'
  Assume that a loop with an exit will eventually take the exit and
  not loop indefinitely.  This allows the compiler to remove loops
  that otherwise have no side-effects, not considering eventual
  endless looping as such.

  This option is enabled by default at '-O2' for C++ with -std=c++11
  or higher.

So, the following patch attempts to detect trivial infinite loops and
turn their conditions into INTEGER_CSTs so that they don't really have
exits in the middle-end and so regardless of -ffinite-loops or
-fno-finite-loops they are handled as infinite loops by the middle-end.
Otherwise, if the condition would be a large expression calling various
constexpr functions, I'd be afraid we could e.g. just inline some of them
and not all of them and the middle-end could still see tests in the
condition and with -ffinite-loops optimize it by assuming that such loops
need to be finite.

The "A trivial infinite loop is a trivially empty iteration statement for
which the converted controlling expression is a constant expression, when
interpreted as a constant-expression ([expr.const]), and evaluates to true."
wording isn't clear to me what it implies for manifest constant evaluation
of the expression, especially given the
int x = 42;
while (std::is_constant_evaluated() || --x) ;
example in the rationale.


The "interpreted as a constant-expression" wording was specifically 
intended (per CWG email discussion) to mean manifestly 
constant-evaluated.  I also note that the paper expects


  while (std::is_constant_evaluated() || --x) ;

to be recognized as a trivial infinite loop, which means treating the 
condition as manifestly constant-evaluated.



The patch assumes that the condition expression aren't manifestly constant
evaluated.  If it would be supposed to be manifestly constant evaluated,
then I think the DR would significantly change behavior of existing programs
and have really weird effects.  Before the DR has been voted in, I think
void foo (int x)
{
   if (x == 0)
 while (std::is_constant_evaluated())
   ;
   else
 while (!std::is_constant_evaluated())
   ;
}
would have well defined behavior of zero loop body iterations if x == 0 and
undefined behavior otherwise.  If the condition expression is manifestly
constant evaluated if it evaluates to true, and otherwise can be
non-constant or not manifestly constant evaluated otherwise, then the
behavior would be that for x == 0 it is well defined trvial infinite loop,
while for x != 0 it would keep to be undefined behavior (infinite loop,
as !std::is_constant_evaluated() is false when manifestly constant evaluated
and if we keep the condition as is, evaluates then to true.  I think it
would be fairly strange if both loops are infinite even when their condition
are negated.  Similar for anything that is dependent on if consteval or
std::is_constant_evaluated() inside of it.


Using std::is_constant_evaluated directly in a loop condition is, as the 
paper says, unlikely and "horrendous code", so I'm not concerned about 
surprising effects, though I guess we should check for it with 
maybe_warn_for_constant_evaluated.



So, the patch below attempts to discover trivially empty iteration
statements at cp_fold time if it is the final mce_false folding,
attempts to maybe_constant_value with mce_false evaluate the conditions
and replaces it with the returned value if constant non-zero.


Please refactor this code to share most of the implementation between 
the loop types.



The testcases then try to check if the FE changed the calls in the
conditions into constants.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Or is it really supposed to be mce_true with the above described weird
behavior?  If so, I think the standard at least should mention it in Annex C
(though, where when it is a DR?).


Good question, I'll raise this with CWG.

Jason



Re: [PATCH v3] tree-profile: Disable indirect call profiling for IFUNC resolvers

2024-04-03 Thread Andrew Pinski
On Wed, Apr 3, 2024 at 8:32 AM Peter Bergner  wrote:
>
> On 4/3/24 7:40 AM, H.J. Lu wrote:
> > We can't profile indirect calls to IFUNC resolvers nor their callees as
> > it requires TLS which hasn't been set up yet when the dynamic linker is
> > resolving IFUNC symbols.
> >
> > Add an IFUNC resolver caller marker to cgraph_node and set it if the
> > function is called by an IFUNC resolver.  Disable indirect call profiling
> > for IFUNC resolvers and their callees.
>
> The IFUNC resolvers on Power do not use TLS, so isn't this a little too
> conservative?  Should this be triggered via a target hook so architectures
> that don't use TLS in their IFUNC resolvers could still profile them?

I think you misunderstood the patch/situtation. Most ifunc resolves
don't use TLS at all; what is happening here is that the profiler
(-fprofile-generate) is adding TLS usage to the ifunc resolver which
then causes issues. And the use of TLS causes a PLT call to be inside
the ifun which causes all the fun stuff.

This is not about ifunc resolves using TLS directly in code but rather
indirectly via -fprofile-generate.

Thanks,
Andrew Pinski


>
> Peter
>
>


Re: [PATCH v3] tree-profile: Disable indirect call profiling for IFUNC resolvers

2024-04-03 Thread H.J. Lu
On Wed, Apr 3, 2024 at 8:31 AM Peter Bergner  wrote:
>
> On 4/3/24 7:40 AM, H.J. Lu wrote:
> > We can't profile indirect calls to IFUNC resolvers nor their callees as
> > it requires TLS which hasn't been set up yet when the dynamic linker is
> > resolving IFUNC symbols.
> >
> > Add an IFUNC resolver caller marker to cgraph_node and set it if the
> > function is called by an IFUNC resolver.  Disable indirect call profiling
> > for IFUNC resolvers and their callees.
>
> The IFUNC resolvers on Power do not use TLS, so isn't this a little too
> conservative?  Should this be triggered via a target hook so architectures
> that don't use TLS in their IFUNC resolvers could still profile them?

It is not about IFUNC resolver using TLS.   TLS is used by indirect call
profiling which hasn't been set up yet when the dynamic linker is resolving
IFUNC symbols.   Doesn't Power need to set up TLS before using TLS?

-- 
H.J.


[COMMITTED] Regenerate i386.opt.urls

2024-04-03 Thread Mark Wielaard
LoongArch added an -mtls-dialect option, causing the similarly
named option for i386 get renumbered.

Fixes: b253b4695dda ("LoongArch: Add support for TLS descriptors.")

gcc/ChangeLog:

* config/i386/i386.opt.urls: Regenerate.
---

Pushed as obvious.

 gcc/config/i386/i386.opt.urls | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.opt.urls b/gcc/config/i386/i386.opt.urls
index fa821eba2006..81c5bb9a9270 100644
--- a/gcc/config/i386/i386.opt.urls
+++ b/gcc/config/i386/i386.opt.urls
@@ -128,7 +128,7 @@ mstackrealign
 UrlSuffix(gcc/x86-Options.html#index-mstackrealign)
 
 mtls-dialect=
-UrlSuffix(gcc/x86-Options.html#index-mtls-dialect-1)
+UrlSuffix(gcc/x86-Options.html#index-mtls-dialect-2)
 
 mtls-direct-seg-refs
 UrlSuffix(gcc/x86-Options.html#index-mtls-direct-seg-refs)
-- 
2.44.0



Re: [C PATCH] fix aliasing for structures/unions with incomplete types

2024-04-03 Thread Joseph Myers
On Tue, 2 Apr 2024, Martin Uecker wrote:

> Am Dienstag, dem 02.04.2024 um 20:42 + schrieb Joseph Myers:
> > On Tue, 2 Apr 2024, Martin Uecker wrote:
> > 
> > > [C23]fix aliasing for structures/unions with incomplete types
> > > 
> > > When incomplete structure/union types are completed later, compatibility
> > > of struct types that contain pointers to such types changes.  When forming
> > > equivalence classes for TYPE_CANONICAL, we therefor need to be 
> > > conservative
> > > and treat all structs with the same tag which are pointer targets as
> > > equivalent.
> > 
> > I don't see how what it done is actually about "which are pointer 
> > targets".
> 
> Right, I see now that the description needs to be improved. This refers
> only to targets of pointers included somewhere in the type we process
> for purposes of determining the equivalence class of this type (but
> not for other contexts).

A detailed self-contained description of the TYPE_CANONICAL / aliasing 
logic also needs to go somewhere in a comment in the source code rather 
than only in commit messages / emails.

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [PATCH v3] tree-profile: Disable indirect call profiling for IFUNC resolvers

2024-04-03 Thread Peter Bergner
On 4/3/24 7:40 AM, H.J. Lu wrote:
> We can't profile indirect calls to IFUNC resolvers nor their callees as
> it requires TLS which hasn't been set up yet when the dynamic linker is
> resolving IFUNC symbols.
> 
> Add an IFUNC resolver caller marker to cgraph_node and set it if the
> function is called by an IFUNC resolver.  Disable indirect call profiling
> for IFUNC resolvers and their callees.

The IFUNC resolvers on Power do not use TLS, so isn't this a little too
conservative?  Should this be triggered via a target hook so architectures
that don't use TLS in their IFUNC resolvers could still profile them?

Peter




Re: [PATCH V3 0/2] aarch64: Place target independent and dependent changed code in one file.

2024-04-03 Thread Alex Coplan
On 23/02/2024 16:41, Ajit Agarwal wrote:
> Hello Richard/Alex/Segher:

Hi Ajit,

Sorry for the delay and thanks for working on this.

Generally this looks like the right sort of approach (IMO) but I've left
some comments below.

I'll start with a meta comment: in the subject line you have marked this
as 0/2, but usually 0/n is reserved for the cover letter of a patch
series and wouldn't contain an actual patch.  I think this might have
confused the Linaro CI suitably such that it didn't run regression tests
on the patch.

> 
> This patch adds the changed code for target independent and
> dependent code for load store fusion.
> 
> Common infrastructure of load store pair fusion is
> divided into target independent and target dependent
> changed code.
> 
> Target independent code is the Generic code with
> pure virtual function to interface betwwen target
> independent and dependent code.
> 
> Target dependent code is the implementation of pure
> virtual function for aarch64 target and the call
> to target independent code.
> 
> Bootstrapped for aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> aarch64: Place target independent and dependent changed code in one file.
> 
> Common infrastructure of load store pair fusion is
> divided into target independent and target dependent
> changed code.
> 
> Target independent code is the Generic code with
> pure virtual function to interface betwwen target
> independent and dependent code.
> 
> Target dependent code is the implementation of pure
> virtual function for aarch64 target and the call
> to target independent code.
> 
> 2024-02-23  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-ldp-fusion.cc: Place target
>   independent and dependent changed code.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 437 ---
>  1 file changed, 305 insertions(+), 132 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 22ed95eb743..2ef22ff1e96 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -40,10 +40,10 @@
>  
>  using namespace rtl_ssa;
>  
> -static constexpr HOST_WIDE_INT LDP_IMM_BITS = 7;
> -static constexpr HOST_WIDE_INT LDP_IMM_SIGN_BIT = (1 << (LDP_IMM_BITS - 1));
> -static constexpr HOST_WIDE_INT LDP_MAX_IMM = LDP_IMM_SIGN_BIT - 1;
> -static constexpr HOST_WIDE_INT LDP_MIN_IMM = -LDP_MAX_IMM - 1;
> +static constexpr HOST_WIDE_INT PAIR_MEM_IMM_BITS = 7;
> +static constexpr HOST_WIDE_INT PAIR_MEM_IMM_SIGN_BIT = (1 << 
> (PAIR_MEM_IMM_BITS - 1));
> +static constexpr HOST_WIDE_INT PAIR_MEM_MAX_IMM = PAIR_MEM_IMM_SIGN_BIT - 1;
> +static constexpr HOST_WIDE_INT PAIR_MEM_MIN_IMM = -PAIR_MEM_MAX_IMM - 1;

These constants shouldn't be renamed: they are specific to aarch64 so the
original names should be preserved in this file.

I expect we want to introduce virtual functions to validate an offset
for a paired access.  The aarch64 code could then implement it by
comparing the offset against LDP_{MIN,MAX}_IMM, and the generic code
could use that hook to replace the code that queries those constants
directly (i.e. in find_trailing_add and get_viable_bases).

>  
>  // We pack these fields (load_p, fpsimd_p, and size) into an integer
>  // (LFS) which we use as part of the key into the main hash tables.
> @@ -138,8 +138,18 @@ struct alt_base
>poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int ) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const  = 0;
> +  virtual void advance () = 0;
> +};
> +
> +
>  // State used by the pass for a given basic block.
> -struct ldp_bb_info
> +struct pair_fusion

As a comment on the high-level design, I think we want a generic class
for the overall pass, not just for the BB-specific structure.

That is because naturally we want the ldp_fusion_bb function itself to
be a member of such a class, so that it can access virtual functions to
query the target e.g. about the load/store pair policy, and whether to
try and promote writeback pairs.

If we keep all of the virtual functions in such an outer class, then we
can keep the ldp_fusion_bb class generic (not needing an override for
each target) and that inner class can perhaps be given a pointer or
reference to the outer class when it is instantiated in ldp_fusion_bb.

>  {
>using def_hash = nofree_ptr_hash;
>using expr_key_t = pair_hash>;
> @@ -161,13 +171,13 @@ struct ldp_bb_info
>static const size_t obstack_alignment = sizeof (void *);
>bb_info *m_bb;
>  
> -  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
> +  pair_fusion (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
>{
>  obstack_specify_allocation (_obstack, OBSTACK_CHUNK_SIZE,
>   obstack_alignment, obstack_chunk_alloc,
>  

Re: [PATCH] c++: Keep DECL_SAVED_TREE of destructor instantiations in modules [PR104040]

2024-04-03 Thread Jason Merrill

On 4/2/24 20:57, Nathaniel Shead wrote:

On Tue, Apr 02, 2024 at 01:18:17PM -0400, Jason Merrill wrote:

On 3/28/24 23:21, Nathaniel Shead wrote:

- && !(modules_p () && DECL_DECLARED_INLINE_P (fn)))
+ && !(modules_p ()
+  && (DECL_DECLARED_INLINE_P (fn)
+  || DECL_TEMPLATE_INSTANTIATION (fn


How about using vague_linkage_p?



Right, of course.  How about this?
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

A template instantiation still needs to have its DECL_SAVED_TREE so that
its definition is emitted into the CMI. This way it can be emitted in
the object file of any importers that use it, in case it doesn't end up
getting emitted in this TU.

PR c++/104040

gcc/cp/ChangeLog:

* semantics.cc (expand_or_defer_fn_1): Keep DECL_SAVED_TREE for
all vague linkage functions.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr104040_a.C: New test.
* g++.dg/modules/pr104040_b.C: New test.

Signed-off-by: Nathaniel Shead 
Reviewed-by: Jason Merrill 
---
  gcc/cp/semantics.cc   |  5 +++--
  gcc/testsuite/g++.dg/modules/pr104040_a.C | 14 ++
  gcc/testsuite/g++.dg/modules/pr104040_b.C |  8 
  3 files changed, 25 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/pr104040_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/pr104040_b.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index adb1ba48d29..03800a20b26 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -5033,9 +5033,10 @@ expand_or_defer_fn_1 (tree fn)
/* We don't want to process FN again, so pretend we've written
 it out, even though we haven't.  */
TREE_ASM_WRITTEN (fn) = 1;
-  /* If this is a constexpr function, keep DECL_SAVED_TREE.  */
+  /* If this is a constexpr function, or the body might need to be
+exported from a module CMI, keep DECL_SAVED_TREE.  */
if (!DECL_DECLARED_CONSTEXPR_P (fn)
- && !(modules_p () && DECL_DECLARED_INLINE_P (fn)))
+ && !(modules_p () && vague_linkage_p (fn)))


Also, how about module_maybe_has_cmi_p?  OK with that change.

Jason



Re: [PATCH v3] tree-profile: Disable indirect call profiling for IFUNC resolvers

2024-04-03 Thread H.J. Lu
On Wed, Apr 3, 2024 at 6:38 AM Jan Hubicka  wrote:
>
> > We can't profile indirect calls to IFUNC resolvers nor their callees as
> > it requires TLS which hasn't been set up yet when the dynamic linker is
> > resolving IFUNC symbols.
> >
> > Add an IFUNC resolver caller marker to cgraph_node and set it if the
> > function is called by an IFUNC resolver.  Disable indirect call profiling
> > for IFUNC resolvers and their callees.
> >
> > Tested with profiledbootstrap on Fedora 39/x86-64.
> >
> > gcc/ChangeLog:
> >
> >   PR tree-optimization/114115
> >   * cgraph.h (symtab_node): Add check_ifunc_callee_symtab_nodes.
> >   (cgraph_node): Add called_by_ifunc_resolver.
> >   * cgraphunit.cc (symbol_table::compile): Call
> >   symtab_node::check_ifunc_callee_symtab_nodes.
> >   * symtab.cc (check_ifunc_resolver): New.
> >   (ifunc_ref_map): Likewise.
> >   (is_caller_ifunc_resolver): Likewise.
> >   (symtab_node::check_ifunc_callee_symtab_nodes): Likewise.
> >   * tree-profile.cc (gimple_gen_ic_func_profiler): Disable indirect
> >   call profiling for IFUNC resolvers and their callees.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR tree-optimization/114115
> >   * gcc.dg/pr114115.c: New test.
> > +/* { dg-final { scan-tree-dump-not "__gcov_indirect_call_profiler_v" 
> > "optimized" } } */
> > diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> > index aed13e2b1bc..373dbd60481 100644
> > --- a/gcc/tree-profile.cc
> > +++ b/gcc/tree-profile.cc
> > @@ -520,7 +520,10 @@ gimple_gen_ic_func_profiler (void)
> >gcall *stmt1;
> >tree tree_uid, cur_func, void0;
> >
> > -  if (c_node->only_called_directly_p ())
> > +  /* Disable indirect call profiling for an IFUNC resolver and its
> > + callees.  */
> Please add a comment here referring to the PR and need to have TLS
> initialized.
>
> OK witht that change.
> Honza

I am checking in this patch with the updated comments:

  /* Disable indirect call profiling for an IFUNC resolver and its
 callees since it requires TLS which hasn't been set up yet when
 the dynamic linker is resolving IFUNC symbols.  See
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114115
   */

Thanks.

-- 
H.J.
From 65d320ea9652d6b5f68d8b9057838224a8e2322e Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 26 Feb 2024 08:38:58 -0800
Subject: [PATCH] tree-profile: Disable indirect call profiling for IFUNC
 resolvers

We can't profile indirect calls to IFUNC resolvers nor their callees as
it requires TLS which hasn't been set up yet when the dynamic linker is
resolving IFUNC symbols.

Add an IFUNC resolver caller marker to cgraph_node and set it if the
function is called by an IFUNC resolver.  Disable indirect call profiling
for IFUNC resolvers and their callees.

Tested with profiledbootstrap on Fedora 39/x86-64.

gcc/ChangeLog:

	PR tree-optimization/114115
	* cgraph.h (symtab_node): Add check_ifunc_callee_symtab_nodes.
	(cgraph_node): Add called_by_ifunc_resolver.
	* cgraphunit.cc (symbol_table::compile): Call
	symtab_node::check_ifunc_callee_symtab_nodes.
	* symtab.cc (check_ifunc_resolver): New.
	(ifunc_ref_map): Likewise.
	(is_caller_ifunc_resolver): Likewise.
	(symtab_node::check_ifunc_callee_symtab_nodes): Likewise.
	* tree-profile.cc (gimple_gen_ic_func_profiler): Disable indirect
	call profiling for IFUNC resolvers and their callees.

gcc/testsuite/ChangeLog:

	PR tree-optimization/114115
	* gcc.dg/pr114115.c: New test.
---
 gcc/cgraph.h|  6 +++
 gcc/cgraphunit.cc   |  2 +
 gcc/symtab.cc   | 89 +
 gcc/testsuite/gcc.dg/pr114115.c | 24 +
 gcc/tree-profile.cc |  8 ++-
 5 files changed, 128 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr114115.c

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 47f35e8078d..a8c3224802c 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -479,6 +479,9 @@ public:
  Return NULL if there's no such node.  */
   static symtab_node *get_for_asmname (const_tree asmname);
 
+  /* Check symbol table for callees of IFUNC resolvers.  */
+  static void check_ifunc_callee_symtab_nodes (void);
+
   /* Verify symbol table for internal consistency.  */
   static DEBUG_FUNCTION void verify_symtab_nodes (void);
 
@@ -896,6 +899,7 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   redefined_extern_inline (false), tm_may_enter_irr (false),
   ipcp_clone (false), declare_variant_alt (false),
   calls_declare_variant_alt (false), gc_candidate (false),
+  called_by_ifunc_resolver (false),
   m_uid (uid), m_summary_id (-1)
   {}
 
@@ -1495,6 +1499,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
  is set for local SIMD clones when they are created and cleared if the
  vectorizer uses them.  */
   unsigned gc_candidate : 1;
+  /* Set if the function is called by an IFUNC resolver.  */
+  unsigned 

Re: [PATCH v3] tree-profile: Disable indirect call profiling for IFUNC resolvers

2024-04-03 Thread Jan Hubicka
> We can't profile indirect calls to IFUNC resolvers nor their callees as
> it requires TLS which hasn't been set up yet when the dynamic linker is
> resolving IFUNC symbols.
> 
> Add an IFUNC resolver caller marker to cgraph_node and set it if the
> function is called by an IFUNC resolver.  Disable indirect call profiling
> for IFUNC resolvers and their callees.
> 
> Tested with profiledbootstrap on Fedora 39/x86-64.
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/114115
>   * cgraph.h (symtab_node): Add check_ifunc_callee_symtab_nodes.
>   (cgraph_node): Add called_by_ifunc_resolver.
>   * cgraphunit.cc (symbol_table::compile): Call
>   symtab_node::check_ifunc_callee_symtab_nodes.
>   * symtab.cc (check_ifunc_resolver): New.
>   (ifunc_ref_map): Likewise.
>   (is_caller_ifunc_resolver): Likewise.
>   (symtab_node::check_ifunc_callee_symtab_nodes): Likewise.
>   * tree-profile.cc (gimple_gen_ic_func_profiler): Disable indirect
>   call profiling for IFUNC resolvers and their callees.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/114115
>   * gcc.dg/pr114115.c: New test.
> +/* { dg-final { scan-tree-dump-not "__gcov_indirect_call_profiler_v" 
> "optimized" } } */
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index aed13e2b1bc..373dbd60481 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -520,7 +520,10 @@ gimple_gen_ic_func_profiler (void)
>gcall *stmt1;
>tree tree_uid, cur_func, void0;
>  
> -  if (c_node->only_called_directly_p ())
> +  /* Disable indirect call profiling for an IFUNC resolver and its
> + callees.  */
Please add a comment here referring to the PR and need to have TLS
initialized.

OK witht that change.
Honza


Re: [Patch] lto-wrapper.cc: Add offload target name to 'offload_args' suffix

2024-04-03 Thread Richard Biener
On Wed, 3 Apr 2024, Tobias Burnus wrote:

> Found when working with -save-temps and looking at 'mkoffload'
> with a GCC configured for both nvptx and gcn offloading.
> 
> Before (for 'a.out') for mkoffload:a.offload_args now:
> a.amdgcn-amdhsa.offload_args and a.nvptx-none.offload_args
> OK for mainline?

OK.

Richard.

> Tobias
> 
> PS: The code does not free the 'xmalloc'ed memory, but that's also
> the case of all/most 'concat' in this file; the concat could also
> be skipped when no save_temps is used, in case this optimization
> makes sense.
> 

Re: PING: [PATCH v2] tree-profile: Don't instrument an IFUNC resolver nor its callees

2024-04-03 Thread H.J. Lu
On Tue, Apr 2, 2024 at 10:03 AM Jan Hubicka  wrote:
>
> > > I am bit worried about commonly used functions getting "infected" by
> > > being called once from ifunc resolver.  I think we only use thread local
> > > storage for indirect call profiling, so we may just disable indirect
> > > call profiling for these functions.
> >
> > Will change it.
> >
> > > Also the patch will be noop with -flto -flto-partition=max, so probably
> > > we need to compute this flag at WPA time and stream to partitions.
> > >
> >
> > Why is it a nop with -flto -flto-partition=max? I got
> >
> > (gdb) bt
> > #0  symtab_node::check_ifunc_callee_symtab_nodes ()
> > at /export/gnu/import/git/gitlab/x86-gcc/gcc/symtab.cc:1440
> > #1  0x00e487d3 in symbol_table::compile (this=0x7fffea006000)
> > at /export/gnu/import/git/gitlab/x86-gcc/gcc/cgraphunit.cc:2320
> > #2  0x00d23ecf in lto_main ()
> > at /export/gnu/import/git/gitlab/x86-gcc/gcc/lto/lto.cc:687
> > #3  0x015254d2 in compile_file ()
> > at /export/gnu/import/git/gitlab/x86-gcc/gcc/toplev.cc:449
> > #4  0x015284a4 in do_compile ()
> > at /export/gnu/import/git/gitlab/x86-gcc/gcc/toplev.cc:2154
> > #5  0x01528864 in toplev::main (this=0x7fffd84a, argc=16,
> > argv=0x42261f0) at 
> > /export/gnu/import/git/gitlab/x86-gcc/gcc/toplev.cc:2310
> > #6  0x030a3fe2 in main (argc=16, argv=0x7fffd958)
> > at /export/gnu/import/git/gitlab/x86-gcc/gcc/main.cc:39
> >
> > Do you have a testcase to show that it is a nop?
> Aha, sorry.  I tought this is run during late optimization, but it is
> done early, so LTo partitioning does not mix things up.  So current
> patch modified to disable only instrumentation that needs TLS should be
> fine.
>

Done.  Here is the v3 patch:

https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648733.html

-- 
H.J.


[PATCH v3] tree-profile: Disable indirect call profiling for IFUNC resolvers

2024-04-03 Thread H.J. Lu
We can't profile indirect calls to IFUNC resolvers nor their callees as
it requires TLS which hasn't been set up yet when the dynamic linker is
resolving IFUNC symbols.

Add an IFUNC resolver caller marker to cgraph_node and set it if the
function is called by an IFUNC resolver.  Disable indirect call profiling
for IFUNC resolvers and their callees.

Tested with profiledbootstrap on Fedora 39/x86-64.

gcc/ChangeLog:

PR tree-optimization/114115
* cgraph.h (symtab_node): Add check_ifunc_callee_symtab_nodes.
(cgraph_node): Add called_by_ifunc_resolver.
* cgraphunit.cc (symbol_table::compile): Call
symtab_node::check_ifunc_callee_symtab_nodes.
* symtab.cc (check_ifunc_resolver): New.
(ifunc_ref_map): Likewise.
(is_caller_ifunc_resolver): Likewise.
(symtab_node::check_ifunc_callee_symtab_nodes): Likewise.
* tree-profile.cc (gimple_gen_ic_func_profiler): Disable indirect
call profiling for IFUNC resolvers and their callees.

gcc/testsuite/ChangeLog:

PR tree-optimization/114115
* gcc.dg/pr114115.c: New test.
---
 gcc/cgraph.h|  6 +++
 gcc/cgraphunit.cc   |  2 +
 gcc/symtab.cc   | 89 +
 gcc/testsuite/gcc.dg/pr114115.c | 24 +
 gcc/tree-profile.cc |  5 +-
 5 files changed, 125 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr114115.c

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 47f35e8078d..a8c3224802c 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -479,6 +479,9 @@ public:
  Return NULL if there's no such node.  */
   static symtab_node *get_for_asmname (const_tree asmname);
 
+  /* Check symbol table for callees of IFUNC resolvers.  */
+  static void check_ifunc_callee_symtab_nodes (void);
+
   /* Verify symbol table for internal consistency.  */
   static DEBUG_FUNCTION void verify_symtab_nodes (void);
 
@@ -896,6 +899,7 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public 
symtab_node
   redefined_extern_inline (false), tm_may_enter_irr (false),
   ipcp_clone (false), declare_variant_alt (false),
   calls_declare_variant_alt (false), gc_candidate (false),
+  called_by_ifunc_resolver (false),
   m_uid (uid), m_summary_id (-1)
   {}
 
@@ -1495,6 +1499,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : 
public symtab_node
  is set for local SIMD clones when they are created and cleared if the
  vectorizer uses them.  */
   unsigned gc_candidate : 1;
+  /* Set if the function is called by an IFUNC resolver.  */
+  unsigned called_by_ifunc_resolver : 1;
 
 private:
   /* Unique id of the node.  */
diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc
index d200166f7e9..2bd0289ffba 100644
--- a/gcc/cgraphunit.cc
+++ b/gcc/cgraphunit.cc
@@ -2317,6 +2317,8 @@ symbol_table::compile (void)
 
   symtab_node::checking_verify_symtab_nodes ();
 
+  symtab_node::check_ifunc_callee_symtab_nodes ();
+
   timevar_push (TV_CGRAPHOPT);
   if (pre_ipa_mem_report)
 dump_memory_report ("Memory consumption before IPA");
diff --git a/gcc/symtab.cc b/gcc/symtab.cc
index 4c7e3c135ca..3256133891d 100644
--- a/gcc/symtab.cc
+++ b/gcc/symtab.cc
@@ -1369,6 +1369,95 @@ symtab_node::verify (void)
   timevar_pop (TV_CGRAPH_VERIFY);
 }
 
+/* Return true and set *DATA to true if NODE is an ifunc resolver.  */
+
+static bool
+check_ifunc_resolver (cgraph_node *node, void *data)
+{
+  if (node->ifunc_resolver)
+{
+  bool *is_ifunc_resolver = (bool *) data;
+  *is_ifunc_resolver = true;
+  return true;
+}
+  return false;
+}
+
+static auto_bitmap ifunc_ref_map;
+
+/* Return true if any caller of NODE is an ifunc resolver.  */
+
+static bool
+is_caller_ifunc_resolver (cgraph_node *node)
+{
+  bool is_ifunc_resolver = false;
+
+  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
+{
+  /* Return true if caller is known to be an IFUNC resolver.  */
+  if (e->caller->called_by_ifunc_resolver)
+   return true;
+
+  /* Check for recursive call.  */
+  if (e->caller == node)
+   continue;
+
+  /* Skip if it has been visited.  */
+  unsigned int uid = e->caller->get_uid ();
+  if (bitmap_bit_p (ifunc_ref_map, uid))
+   continue;
+  bitmap_set_bit (ifunc_ref_map, uid);
+
+  if (is_caller_ifunc_resolver (e->caller))
+   {
+ /* Return true if caller is an IFUNC resolver.  */
+ e->caller->called_by_ifunc_resolver = true;
+ return true;
+   }
+
+  /* Check if caller's alias is an IFUNC resolver.  */
+  e->caller->call_for_symbol_and_aliases (check_ifunc_resolver,
+ _ifunc_resolver,
+ true);
+  if (is_ifunc_resolver)
+   {
+ /* Return true if caller's alias is an IFUNC resolver.  */
+ e->caller->called_by_ifunc_resolver = true;
+ return true;
+   }

Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-04-03 Thread Kewen.Lin
on 2024/4/3 19:18, Jakub Jelinek wrote:
> On Wed, Apr 03, 2024 at 07:01:50PM +0800, Kewen.Lin wrote:
>> Thanks for the details on debugging support, but IIUC with this workaround
>> being adopted, the debuggability on hidden args are already broken, aren't?
> 
> No.
> In the correct program case, which should be the usual case, the caller will
> pass the arguments and one should be able to see the values in the debugger
> even if the function doesn't actually use those arguments.
> If the caller is buggy and doesn't pass those arguments, one should be able
> to see garbage values for those arguments and perhaps that way figure out
> that the program is buggy and fix it.

But it's not true with Ajit's current implementation that is lying args are
passed in r11..., so whatever the caller is usual (in argument save area) or
not (not such value), the values are all broken.

> 
>> Since with a usual caller, the actual argument is passed in argument save
>> area, but the callee debug information says the location is %r11 or some
>> other stack slot.
>>
>> I think the difficulty is that: with this workaround, for some arguments we
>> are lying they are not passed in argument save area, then we have to pretend
>> they are passed in r11,r12..., but in fact these registers are not valid to
>> pass arguments, so it's unreasonable and confusing.  With your explanation,
>> I agree that stripping DECL_ARGUMENTS chains isn't a good way to eliminate
>> this confusion, maybe always using GP_ARG_MIN_REG/GP_ARG_MAX_REG for things
>> exceeding GP_ARG_MAX_REG can reduce the unreasonableness (but still confusing
>> IMHO).
> 
> If those arguments aren't passed in r11/r12, but in memory, the workaround
> shouldn't pretend they are passed somewhere where they aren't actually
> passed.

Unfortunately the current implementation doesn't conform this, I misunderstood
you knew that.

> Instead, it should load them from the memory where they are actually
> normally passed.
> What needs to be ensured though is that those arguments are for -O0 loaded
> from those stack slots and saved to different stack slots (inside of the
> callee frame, rather than in caller's frame), for -O1+ just not loaded at
> all and pretended to just live in the caller's frame, and most importantly
> ensure that the callee doesn't try to think there is a parameter save area
> in the caller's frame which it can use for random saving related or
> unrelated data.  So, e.g. REG_EQUAL/REG_EQUIV shouldn't be used, nor tell
> that the 1st-8th arguments could be saved to the parameter save area.
> So, for the 1st-8th arguments it really should act as if there is no
> parameter save area and for the DECL_HIDDEN_STRING_LENGTH ones after it
> as it those are passed in memory, but as if that memory is owned by the
> caller, not callee, so it is not correct to modify that memory.

Now I got your points.  I like this proposal and also believe it makes more
sense on both the resulted assembly and the debuggability support, though
it sounds the implementation has to be more complicated than what's done.

Thanks for all the inputs!!

BR,
Kewen



[Patch] nvptx: In mkoffload.cc, call diagnostic_color_init + gcc_init_libintl

2024-04-03 Thread Tobias Burnus

Nvptx's mkoffload.cc contains 14 'fatal_error' calls and one 'warning_at' call,
which stands out more clearly (color, bold) when enabling
  diagnostic_color_init
which this patch does. — Additionally, the call gcc_init_libintl permits that
the already translated error messages also show up as translation.

OK for mainline?

Tobias

PS: Example: 'nvptx mkoffload:' is bold and 'fatal error:' is in red
in English and some language variants.

nvptx mkoffload: fatal error: COLLECT_GCC must be set.
nvptx mkoffload: 致命的エラー: COLLECT_GCC must be set.
nvptx mkoffload: erreur fatale: COLLECT_GCC doit être défini.
nvptx mkoffload: schwerwiegender Fehler: COLLECT_GCC muss gesetzt sein.

(BTW: It looks as if many languages did not translate the error string
itself, e.g. jp or zh or pl or zh_TW/zh_CN or fi or ...)
nvptx: In mkoffload.cc, call diagnostic_color_init + gcc_init_libintl

gcc/ChangeLog:

	* config/nvptx/mkoffload.cc (main): Call
	gcc_init_libintl and diagnostic_color_init.

 gcc/config/nvptx/mkoffload.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index a7fc28cbd3f..503b1abcefd 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -638,7 +638,9 @@ main (int argc, char **argv)
   const char *outname = 0;
 
   progname = tool_name;
+  gcc_init_libintl ();
   diagnostic_initialize (global_dc, 0);
+  diagnostic_color_init (global_dc);
 
   if (atexit (mkoffload_cleanup) != 0)
 fatal_error (input_location, "atexit failed");


Re: [PATCH] Don't set full_profile in auto-profile [PR113765]

2024-04-03 Thread Richard Biener
On Thu, Mar 28, 2024 at 4:03 AM Eugene Rozenfeld
 wrote:
>
> auto-profile currently doesn't guarantee that it will set probabilities
> on all edges because of zero basic block counts. Normally those edges
> just have probabilities set by the preceding profile_estimate pass but
> under -O0 profile_estimate pass doesn't run. The patch removes setting
> of full_profile to true in auto-profile.

OK.

Thanks,
Richard.

> Tested on x86_64-pc-linux-gnu.
>
> gcc/ChangeLog:
> * auto-profile.cc (afdo_annotate_cfg): Don't set full_profile to true
> ---
>  gcc/auto-profile.cc | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
> index e5407d32fbb..de59b94bcb3 100644
> --- a/gcc/auto-profile.cc
> +++ b/gcc/auto-profile.cc
> @@ -1580,7 +1580,6 @@ afdo_annotate_cfg (const stmt_set _stmts)
>  }
>update_max_bb_count ();
>profile_status_for_fn (cfun) = PROFILE_READ;
> -  cfun->cfg->full_profile = true;
>if (flag_value_profile_transformations)
>  {
>gimple_value_profile_transformations ();
> --
> 2.25.1
>


[Patch] lto-wrapper.cc: Add offload target name to 'offload_args' suffix

2024-04-03 Thread Tobias Burnus

Found when working with -save-temps and looking at 'mkoffload'
with a GCC configured for both nvptx and gcn offloading.

Before (for 'a.out') for mkoffload:a.offload_args now: a.amdgcn-amdhsa.offload_args 
and a.nvptx-none.offload_args

OK for mainline?

Tobias

PS: The code does not free the 'xmalloc'ed memory, but that's also
the case of all/most 'concat' in this file; the concat could also
be skipped when no save_temps is used, in case this optimization
makes sense.
lto-wrapper.cc: Add offload target name to 'offload_args' suffix

lto-wrapper.cc's compile_offload_image calls mkoffload with
an @./a.offload_args argument ('a.' in case of, e.g., 'a.out'). However,
when generating code for both nvptx and gcn, they use the same name
with -save-temps. Hence, this commit adds a  + '.' before
'offload_args' in line with other offload-target-specific files.

gcc/ChangeLog:

	* lto-wrapper.cc (compile_offload_image): Prefix 'offload_args'
	suffix by the target name.

diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index ca53e4b462e..610594cdc2b 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
@@ -993,7 +993,8 @@ compile_offload_image (const char *target, const char *compiler_path,
 
   obstack_ptr_grow (_obstack, NULL);
   argv = XOBFINISH (_obstack, char **);
-  fork_execute (argv[0], argv, true, "offload_args");
+  suffix = concat (target, ".offload_args", NULL);
+  fork_execute (argv[0], argv, true, suffix);
   obstack_free (_obstack, NULL);
 
   free_array_of_ptrs ((void **) paths, n_paths);


Re: [PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis

2024-04-03 Thread Chung-Lin Tang
Hi Richard, Thomas,

On 2023/10/30 8:46 PM, Richard Biener wrote:
>>
>> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
>> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
>> flag.
>>
>> The actual optimization then is done in this second patch.  Chung-Lin
>> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
>> I don't have much experience with most of the following generic code, so
>> would appreciate a helping hand, whether that conceptually makes sense as
>> well as from the implementation point of view:

First of all, I have removed all of the gimplify-stage scanning and setting of
DECL_POINTS_TO_READONLY and SSA_NAME_POINTS_TO_READONLY_MEMORY (so no changes to
gimplify.cc now)

I remember this code was an artifact of earlier attempts to allow struct-member
pointer mappings to also work (e.g. map(readonly:rec.ptr[:N])), but failed 
anyways.
I think the omp_data_* member accesses when building child function side
receiver_refs is blocking points-to analysis from working (didn't try digging 
deeper)

Also during gimplify, VAR_DECLs appeared to be reused (at least in some cases) 
for map
clause decl reference building, so hoping that the variables "happen to be" 
single-use and
DECL_POINTS_TO_READONLY relaying into SSA_NAME_POINTS_TO_READONLY_MEMORY does 
appear to be
a little risky.

However, for firstprivate pointers processed during omp-low, it appears to be 
somewhat different.
(see below description)

> No, I don't think you can use that flag on non-default-defs, nor
> preserve it on copying.  So
> it also doesn't nicely extend to DECLs as done by the patch.  We
> currently _only_ use it
> for incoming parameters.  When used on arbitrary code you can get to for 
> example
> 
> ptr1(points-to-readony-memory) = >x;
> ... access via ptr1 ...
> ptr2 = >x;
> ... access via ptr2 ...
> 
> where both are your OMP regions differently constrained (the constrain is on 
> the
> code in the region, _not_ on the actual protections of the pointed to
> data, much like
> for the fortran case).  But now CSE comes along and happily replaces all ptr2
> with ptr2 in the second region and ... oops!

Richard, I assume what you meant was "happily replaces all ptr2 with ptr1 in 
the second region"?

That doesn't happen, because during omp-lower/expand, OMP target regions (which 
is all that
this applies currently) is separated into different individual child functions.

(Currently, the only "effective" use of DECL_POINTS_TO_READONLY is during 
omp-lower, when
for firstprivate pointers (i.e. 'a' here) we set this bit when constructing the 
first load
of this pointer)

  #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
  {
foo (a, a[8]);
r = a[8];
  }
  #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
  {
foo (a, a[12]);
r = a[12];
  }

After omp-expand (before SSA):

__attribute__((oacc parallel, omp target entrypoint, noclone))
void main._omp_fn.1 (const struct .omp_data_t.3 & restrict .omp_data_i)
{
 ...
   :
  D.2962 = .omp_data_i->D.2947;
  a.8 = D.2962;
  r.1 = (*a.8)[12];
  foo (a.8, r.1);
  r.1 = (*a.8)[12];
  D.2965 = .omp_data_i->r;
  *D.2965 = r.1;
  return;
}

__attribute__((oacc parallel, omp target entrypoint, noclone))
void main._omp_fn.0 (const struct .omp_data_t.2 & restrict .omp_data_i)
{
  ...
   :
  D.2968 = .omp_data_i->D.2939;
  a.4 = D.2968;
  r.0 = (*a.4)[8];
  foo (a.4, r.0);
  r.0 = (*a.4)[8];
  D.2971 = .omp_data_i->r;
  *D.2971 = r.0;
  return;
}

So actually, the creating of DECL_POINTS_TO_READONLY and its relaying to
SSA_NAME_POINTS_TO_READONLY_MEMORY here, is actually quite similar to a 
default-def
for an PARM_DECL, at least conceptually.

(If offloading was structured significantly differently, say if child functions
were separated much earlier before omp-lowering, than this readonly-modifier 
might
possibly be a direct application of 'r' in the "fn spec" attribute)

Other changes since first version of patch include:
1) update of C/C++ FE changes to new style in c-family/c-omp.cc
2) merging of two if cases in fortran/trans-openmp.cc like Thomas suggested
3) Update of readonly-2.c testcase to scan before/after "fre1" pass, to verify 
removal of a MEM load, also as Thomas suggested.

I have re-tested this patch using mainline, with no regressions. Is this okay 
for mainline?

Thanks,
Chung-Lin

2024-04-03  Chung-Lin Tang  

gcc/c-family/ChangeLog:

* c-omp.cc (c_omp_address_inspector::expand_array_base):
Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.
(c_omp_address_inspector::expand_component_selector): Likewise.

gcc/fortran/ChangeLog:

* trans-openmp.cc (gfc_trans_omp_array_section):
Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.

gcc/ChangeLog:

* gimple-expr.cc (copy_var_decl): Copy DECL_POINTS_TO_READONLY
for VAR_DECLs.
* omp-low.cc (lower_omp_target): Set DECL_POINTS_TO_READONLY for
variables of 

Re: [Patch] GCN: install.texi update for Newlib change and LLVM 18 release

2024-04-03 Thread Andrew Stubbs

On 03/04/2024 10:27, Jakub Jelinek wrote:

On Wed, Apr 03, 2024 at 11:09:19AM +0200, Tobias Burnus wrote:

@@ -3954,8 +3956,8 @@ on the GPU.
  To enable support for GCN3 Fiji devices (gfx803), GCC has to be configured 
with
  @option{--with-arch=@code{fiji}} or
  @option{--with-multilib-list=@code{fiji},...}.  Note that support for Fiji
-devices has been removed in ROCm 4.0 and support in LLVM is deprecated and will
-be removed in LLVM 18.
+devices has been removed in ROCm 4.0 and support in LLVM is deprecated and has
+been removed in LLVM 18.


Shouldn't we at configure time then detect the case where fiji can't be
supported and either error if it is included explicitly in multilib list, or
implicitly take it out from that list and arrange error to be emitted when
using -march=fiji/gfx803 ?
Sure, if one configures against LLVM 17 and then updates LLVM to 18, it will
still result in weird errors/LLVM ICEs, but at least in the common case when
one configures GCC 14 against LLVM 18 one won't suffer from those ICEs and
get clear diagnostics that fiji is sadly no longer supported.


One additional point: since our departure from Siemens, we no longer 
have access to any Fiji devices ourselves. I plan to rip that stuff out 
the first chance I get (not necessarily very soon).


In the meantime, Fiji is not included in the default configuration of 
GCC 14, so anyone enabling it is doing so explicitly and a) will have 
read the documentation, and b) would be surprised if Fiji were 
automatically excluded.


We could emit an error at configure time if an unsuitable LLVM is 
detected, but I don't think it's worth the effort for what is a niche 
product that requires drivers so old they were only supported on now-EOL 
OS versions.


I'm happy with Tobias's patch with s/LLVM is deprecated/LLVM was 
deprecated/. The Newlib versions are a bit awkward, but we can't 
recommend 4.5 until it exists.


Andrew


Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-04-03 Thread Jakub Jelinek
On Wed, Apr 03, 2024 at 07:01:50PM +0800, Kewen.Lin wrote:
> Thanks for the details on debugging support, but IIUC with this workaround
> being adopted, the debuggability on hidden args are already broken, aren't?

No.
In the correct program case, which should be the usual case, the caller will
pass the arguments and one should be able to see the values in the debugger
even if the function doesn't actually use those arguments.
If the caller is buggy and doesn't pass those arguments, one should be able
to see garbage values for those arguments and perhaps that way figure out
that the program is buggy and fix it.

> Since with a usual caller, the actual argument is passed in argument save
> area, but the callee debug information says the location is %r11 or some
> other stack slot.
> 
> I think the difficulty is that: with this workaround, for some arguments we
> are lying they are not passed in argument save area, then we have to pretend
> they are passed in r11,r12..., but in fact these registers are not valid to
> pass arguments, so it's unreasonable and confusing.  With your explanation,
> I agree that stripping DECL_ARGUMENTS chains isn't a good way to eliminate
> this confusion, maybe always using GP_ARG_MIN_REG/GP_ARG_MAX_REG for things
> exceeding GP_ARG_MAX_REG can reduce the unreasonableness (but still confusing
> IMHO).

If those arguments aren't passed in r11/r12, but in memory, the workaround
shouldn't pretend they are passed somewhere where they aren't actually
passed.
Instead, it should load them from the memory where they are actually
normally passed.
What needs to be ensured though is that those arguments are for -O0 loaded
from those stack slots and saved to different stack slots (inside of the
callee frame, rather than in caller's frame), for -O1+ just not loaded at
all and pretended to just live in the caller's frame, and most importantly
ensure that the callee doesn't try to think there is a parameter save area
in the caller's frame which it can use for random saving related or
unrelated data.  So, e.g. REG_EQUAL/REG_EQUIV shouldn't be used, nor tell
that the 1st-8th arguments could be saved to the parameter save area.
So, for the 1st-8th arguments it really should act as if there is no
parameter save area and for the DECL_HIDDEN_STRING_LENGTH ones after it
as it those are passed in memory, but as if that memory is owned by the
caller, not callee, so it is not correct to modify that memory.

Jakub



[PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination

2024-04-03 Thread Richard Biener
The following avoids re-walking and re-combining the instructions
between i2 and i3 when the pattern of i2 doesn't change.

Bootstrap and regtest running ontop of a reversal of 
r14-9692-g839bc42772ba7a.

It brings down memory use frmo 9GB to 400MB and compile-time from
80s to 3.5s.  r14-9692-g839bc42772ba7a does better in both metrics
but has shown code generation regressions across acrchitectures.

OK to revert r14-9692-g839bc42772ba7a?

OK to install this instead?

Thanks,
Richard.

PR rtl-optimization/101523
* combine.cc (try_combine): When the pattern of i2 doesn't
change do not re-start combining at i2 or an earlier insn which
had links or notes added.
---
 gcc/combine.cc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..ff25752cac4 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -4186,6 +4186,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   adjust_for_new_dest (i3);
 }
 
+  bool i2_unchanged = false;
+  if (rtx_equal_p (newi2pat, PATTERN (i2)))
+i2_unchanged = true;
+
   /* We now know that we can do this combination.  Merge the insns and
  update the status of registers and LOG_LINKS.  */
 
@@ -4752,6 +4756,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   combine_successes++;
   undo_commit ();
 
+  if (i2_unchanged)
+return i3;
+
   rtx_insn *ret = newi2pat ? i2 : i3;
   if (added_links_insn && DF_INSN_LUID (added_links_insn) < DF_INSN_LUID (ret))
 ret = added_links_insn;
-- 
2.35.3


Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-04-03 Thread Kewen.Lin
Hi!

on 2024/4/3 17:23, Jakub Jelinek wrote:
> On Wed, Apr 03, 2024 at 05:02:40PM +0800, Kewen.Lin wrote:
>> on 2024/4/3 16:35, Jakub Jelinek wrote:
>>> On Wed, Apr 03, 2024 at 01:18:54PM +0800, Kewen.Lin wrote:
> I'd prefer not to remove DECL_ARGUMENTS chains, they are valid arguments 
> that just some
> invalid code doesn't pass.  By removing them you basically always create 
> an
> invalid case, this time in the other direction, valid caller passes more
> arguments than the callee (invalidly) expects.

 Thanks for the comments, do you mean it can affect the arguments 
 validation when there
 is explicit function declaration with interface?  Then can we strip them 
 when we are
 going to expand them (like checking currently_expanding_function_start)?
>>>
>>> I'd prefer not stripping them at all; they are clearly marked as perhaps not
>>> passed in buggy programs (the DECL_HIDDEN_STRING_LENGTH argument) and
>>> removing them implies the decl is a throw away, that after expansion
>>
>> Yes, IMHO it's safe as they are unused.
> 
> But they are still passed in the usual case.
> 
>>> nothing will actually look at it anymore.  I believe that is the case of
>>> function bodies, we expand them into RTL and throw away the GIMPLE, and
>>> after final clear the bodies, but it is not the case of the FUNCTION_DECL
>>> or its DECL_ARGUMENTs etc.  E.g. GIMPLE optimizations or expansion of
>>> callers could be looking at those as well.
>>
>> At expand time GIMPLE optimizations should already finish, so it should be
>> safe to strip them at that time?
> 
> No.
> The IPA/post IPA behavior is that IPA optimizations are performed and then
> cgraph finalizes one function at a time, going there from modifications
> needed from IPA passes, post IPA GIMPLE optimizations, expansion to RTL,
> RTL optimizations, emitting assembly, throwing away the body, then picking
> another function and repeating that etc.
> So, when one function makes it to expansion, if you modify its
> DECL_ARGUMENTS etc., all the post IPA GIMPLE optimization passes of other
> functions might still see such changes.

Thanks for explaining, I agree it's risky from this perspective.

> 
>>  It would surprise me if expansions of
>> callers will look at callee's information, it's more like what should be
>> done in IPA analysis instead?
> 
> Depends on what exactly it is.  E.g. C non-prototyped functions have
> just DECL_ARGUMENTS to check how many arguments the call should have vs.
> what is actually passed.

OK.

> 
>> No, it's not what I was looking for.  Peter's comments made me feel it's not
>> good to have assembly at O0 like:
>>
>> std %r3,112(%r31)
>> std %r4,120(%r31)
>> std %r5,128(%r31)
>> std %r6,136(%r31)
>> std %r7,144(%r31)
>> std %r8,152(%r31)
>> std %r9,160(%r31)
>> std %r10,168(%r31)
>> std %r11,176(%r31) // this mislead people that we pass 9th arg via 
>> r11,
>>// it would be nice not to have it.
>>
>> so I was thinking if there is some way to get rid of it.
> 
> You want to optimize at -O0?  Don't.

I don't really want optimization but try to get rid of the unreasonable
assembly code.  :)

> That will screw up debugging.  The function does have that argument, it
> should show up in debug info; it should show up also at -O2 in debug info
> etc.  If you remove chains from DECL_ARGUMENTS, because we have early dwarf
> these days, DW_TAG_formal_parameter nodes should have been already created,
> but it would mean that DW_AT_location for those arguments likely isn't
> filled.  Now, for -O2 it might be the case that the argument has useful
> location only at the start of the function, could have
> DW_OP_entry_value(%r11) afterwards, but at -O0 it better should have some
> stack slot into which the argument is saved and DW_AT_location should be
> that stack slot.  All that should change with the workaround is that if the
> stack slot would be normally in the argument save area in the caller's
> frame, if such argument save area can't be counted on, then it needs to be
> saved in some other stack slot, like arguments are saved to when there are
> only <= 8 arguments.

Thanks for the details on debugging support, but IIUC with this workaround
being adopted, the debuggability on hidden args are already broken, aren't?
Since with a usual caller, the actual argument is passed in argument save
area, but the callee debug information says the location is %r11 or some
other stack slot.

I think the difficulty is that: with this workaround, for some arguments we
are lying they are not passed in argument save area, then we have to pretend
they are passed in r11,r12..., but in fact these registers are not valid to
pass arguments, so it's unreasonable and confusing.  With your explanation,
I agree that stripping DECL_ARGUMENTS chains isn't a good way to eliminate
this confusion, maybe always using 

[committed] libstdc++: Reverse arguments in constraint for std::optional's <=> [PR104606]

2024-04-03 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. Backports likely to follow.

-- >8 --

This is a workaround for a possible compiler bug that causes constraint
recursion in the operator<=>(const optional&, const U&) overload.

libstdc++-v3/ChangeLog:

PR libstdc++/104606
* include/std/optional (operator<=>(const optional&, const U&)):
Reverse order of three_way_comparable_with template arguments.
* testsuite/20_util/optional/relops/104606.cc: New test.
---
 libstdc++-v3/include/std/optional  |  2 +-
 .../20_util/optional/relops/104606.cc  | 18 ++
 2 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/optional/relops/104606.cc

diff --git a/libstdc++-v3/include/std/optional 
b/libstdc++-v3/include/std/optional
index 1e88e8b8880..3507c36a4d8 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -1430,7 +1430,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef __cpp_lib_three_way_comparison
   template
 requires (!__is_optional_v<_Up>)
-  && three_way_comparable_with<_Tp, _Up>
+  && three_way_comparable_with<_Up, _Tp>
 constexpr compare_three_way_result_t<_Tp, _Up>
 operator<=>(const optional<_Tp>& __x, const _Up& __v)
 { return bool(__x) ? *__x <=> __v : strong_ordering::less; }
diff --git a/libstdc++-v3/testsuite/20_util/optional/relops/104606.cc 
b/libstdc++-v3/testsuite/20_util/optional/relops/104606.cc
new file mode 100644
index 000..2b8df245219
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/optional/relops/104606.cc
@@ -0,0 +1,18 @@
+// { dg-do compile { target c++17 } }
+
+// Bug 104606 comparison operator resolution with std::optional and -std=c++20
+
+#include 
+#include 
+#include 
+
+struct Value : std::variant> { };
+
+struct Comparator {
+  template  bool operator<=(const T &) { return true; }
+};
+
+std::optional o;
+Comparator c;
+
+auto x = c <= o;
-- 
2.44.0



Re: [Patch] GCN: install.texi update for Newlib change and LLVM 18 release

2024-04-03 Thread Tobias Burnus

Hi Jakub, hello world

Jakub Jelinek wrote:

On Wed, Apr 03, 2024 at 11:09:19AM +0200, Tobias Burnus wrote:

@@ -3954,8 +3956,8 @@ on the GPU.
  To enable support for GCN3 Fiji devices (gfx803), GCC has to be configured 
with
  @option{--with-arch=@code{fiji}} or
  @option{--with-multilib-list=@code{fiji},...}.  Note that support for Fiji 
[...]
+devices has been removed in ROCm 4.0 and support in LLVM is deprecated and has
+been removed in LLVM 18.

Shouldn't we at configure time then detect the case where fiji can't be
supported and either error if it is included explicitly in multilib list, or
implicitly take it out from that list and arrange error to be emitted when
using -march=fiji/gfx803 ?


I am not sure that it is really needed for the reasons given below.
And while it would help some specific use (having LLVM 17 and wanting to use 
Fiji),
it will also cause some confusion as GCC 14 will magically behave differently
depending how build.

Additionally:

* I bet most use gcc/config.gcc which works in most cases just fine
  (LLVM >= 17; enabling all but Fiji)

* Fiji itself is old – removed from recent ROCm and LLVM >= 18,
  which also implies that it is seen as not seeing a lot of use

While there is no configure-time check, using Fiji with LLVM 18 will
fail with a semi-clear compile-time error when doing the in-tree newlib
build or the libgomp build.
(This shows up by default as issue with LLVM 18 + GCC 12/13;
 see https://gcc.gnu.org/PR114419)

Likewise, it will fail with LLVM < 15 when building gfx1100/gfx1103.

* * *

Note: The compiler itself is perfectly happy to handle fiji and gfx1100 itself,
just the LLVM MC assembler doesn't support one [< 15] or the other [>=LLVM 18].

* * *

For those tracking GCC or caring, the documentation at
  https://gcc.gnu.org/gcc-14/changes.html#amdgcn
and
  https://gcc.gnu.org/install/specific.html#amdgcn-x-amdhsa
provides some glory details.

And it is also mentioned at https://gcc.gnu.org/wiki/Offloading


Tobias



Re: [Patch] GCN: Fix --with-arch= handling in mkoffload [PR111966]

2024-04-03 Thread Andrew Stubbs

On 03/04/2024 10:05, Tobias Burnus wrote:

This patch handles --with-arch= in GCN's mkoffload.cc

While mkoffload mostly does not know this and passes it through to the 
GCN lto1 compiler,
it writes an .o file with debug information - and here the -march= in 
the ELF flags must
agree with the one in the other files. Hence, it uses now the 
--with-arch= config argument.


Doing so, there is now a diagnostic if the -march= or --with-arch= is 
unknown. While the
latter should be rejected at GCC compile time, the latter was not 
diagnosed in mkoffload

but only later in GCN's compiler.

But as there is now a fatal_error in mkoffload, which comes before the 
GCN-compiler call,
the 'note:' which devices are available were lost. This has been 
reinstated by using
the multilib settings. (That's not identical to the compiler supported 
flags the output

is reasonable, arguable better or worse than lto1.)

Advantage: The output is less cluttered than a later fail.

To make mkoffload errors - and especially this one - more useful, it now 
also initializes

the colorization / bold.

OK for mainline?


OK. Thanks for fixing this.

Andrew



* * *

Example error:

gcn mkoffload: error: unrecognized argument in option '-march=gfx'
gcn mkoffload: note: valid arguments to '-march=' are: gfx906, gfx908, 
gfx90a, gfx1030, gfx1036, gfx1100, gfx1103


where on my TERM=xterm-256color,  'gcn mkoffload:' and the quoted texts 
are in bold,

'error:' is red and 'note:' is cyan.

Compared to cc1, the 'note:' lacks 'fiji', the list is separated by ', '
instead of ' ', and cc1 has a "; did you mean 'gfx1100'?".
And the program name is 'gcn mkoffload' instead of 'cc1'.

Tobias

PS: The generated multilib list could be later changed to be based on 
the gcn-.def file;

or we just keep the multiconfig variant of this patch.


I think a .def file would be more future-proof if we ever have multilibs 
for options other than -march, but this works for now.


Andrew


C++ Patch ping

2024-04-03 Thread Jakub Jelinek
Hi!

I'd like to ping the following patches:

https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647445.html
PR111284 P2

https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648215.html
PR114409 (part of a P1)

https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648381.html
PR114426 P1

Thanks.

Jakub



Re: [PATCH] Include safe-ctype.h after C++ standard headers, to avoid over-poisoning

2024-04-03 Thread Richard Biener
On Wed, 3 Apr 2024, Iain Sandoe wrote:

> Hi Richard,
> 
> > On 7 Mar 2024, at 13:40, FX Coudert  wrote:
> > 
> >> I think it's an obvious change ...
> > 
> > Thanks, pushed.
> > 
> > Dimitry, I suggest you post the second patch for review.
> 
> Given that the two patches here (for 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111632) were considered obvious 
> - and are needed on release branches.
> 
> OK for backporting?

OK.

> (Gerald has volunteered to do the earlier ones, I have already made/tested 
> the gcc-13 case)
> 
> thanks
> Iain
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [Patch] GCN: install.texi update for Newlib change and LLVM 18 release

2024-04-03 Thread Jakub Jelinek
On Wed, Apr 03, 2024 at 11:09:19AM +0200, Tobias Burnus wrote:
> @@ -3954,8 +3956,8 @@ on the GPU.
>  To enable support for GCN3 Fiji devices (gfx803), GCC has to be configured 
> with
>  @option{--with-arch=@code{fiji}} or
>  @option{--with-multilib-list=@code{fiji},...}.  Note that support for Fiji
> -devices has been removed in ROCm 4.0 and support in LLVM is deprecated and 
> will
> -be removed in LLVM 18.
> +devices has been removed in ROCm 4.0 and support in LLVM is deprecated and 
> has
> +been removed in LLVM 18.

Shouldn't we at configure time then detect the case where fiji can't be
supported and either error if it is included explicitly in multilib list, or
implicitly take it out from that list and arrange error to be emitted when
using -march=fiji/gfx803 ?
Sure, if one configures against LLVM 17 and then updates LLVM to 18, it will
still result in weird errors/LLVM ICEs, but at least in the common case when
one configures GCC 14 against LLVM 18 one won't suffer from those ICEs and
get clear diagnostics that fiji is sadly no longer supported.

Jakub



Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-04-03 Thread Jakub Jelinek
On Wed, Apr 03, 2024 at 05:02:40PM +0800, Kewen.Lin wrote:
> on 2024/4/3 16:35, Jakub Jelinek wrote:
> > On Wed, Apr 03, 2024 at 01:18:54PM +0800, Kewen.Lin wrote:
> >>> I'd prefer not to remove DECL_ARGUMENTS chains, they are valid arguments 
> >>> that just some
> >>> invalid code doesn't pass.  By removing them you basically always create 
> >>> an
> >>> invalid case, this time in the other direction, valid caller passes more
> >>> arguments than the callee (invalidly) expects.
> >>
> >> Thanks for the comments, do you mean it can affect the arguments 
> >> validation when there
> >> is explicit function declaration with interface?  Then can we strip them 
> >> when we are
> >> going to expand them (like checking currently_expanding_function_start)?
> > 
> > I'd prefer not stripping them at all; they are clearly marked as perhaps not
> > passed in buggy programs (the DECL_HIDDEN_STRING_LENGTH argument) and
> > removing them implies the decl is a throw away, that after expansion
> 
> Yes, IMHO it's safe as they are unused.

But they are still passed in the usual case.

> > nothing will actually look at it anymore.  I believe that is the case of
> > function bodies, we expand them into RTL and throw away the GIMPLE, and
> > after final clear the bodies, but it is not the case of the FUNCTION_DECL
> > or its DECL_ARGUMENTs etc.  E.g. GIMPLE optimizations or expansion of
> > callers could be looking at those as well.
> 
> At expand time GIMPLE optimizations should already finish, so it should be
> safe to strip them at that time?

No.
The IPA/post IPA behavior is that IPA optimizations are performed and then
cgraph finalizes one function at a time, going there from modifications
needed from IPA passes, post IPA GIMPLE optimizations, expansion to RTL,
RTL optimizations, emitting assembly, throwing away the body, then picking
another function and repeating that etc.
So, when one function makes it to expansion, if you modify its
DECL_ARGUMENTS etc., all the post IPA GIMPLE optimization passes of other
functions might still see such changes.

>  It would surprise me if expansions of
> callers will look at callee's information, it's more like what should be
> done in IPA analysis instead?

Depends on what exactly it is.  E.g. C non-prototyped functions have
just DECL_ARGUMENTS to check how many arguments the call should have vs.
what is actually passed.

> No, it's not what I was looking for.  Peter's comments made me feel it's not
> good to have assembly at O0 like:
> 
> std %r3,112(%r31)
> std %r4,120(%r31)
> std %r5,128(%r31)
> std %r6,136(%r31)
> std %r7,144(%r31)
> std %r8,152(%r31)
> std %r9,160(%r31)
> std %r10,168(%r31)
> std %r11,176(%r31) // this mislead people that we pass 9th arg via 
> r11,
>// it would be nice not to have it.
> 
> so I was thinking if there is some way to get rid of it.

You want to optimize at -O0?  Don't.
That will screw up debugging.  The function does have that argument, it
should show up in debug info; it should show up also at -O2 in debug info
etc.  If you remove chains from DECL_ARGUMENTS, because we have early dwarf
these days, DW_TAG_formal_parameter nodes should have been already created,
but it would mean that DW_AT_location for those arguments likely isn't
filled.  Now, for -O2 it might be the case that the argument has useful
location only at the start of the function, could have
DW_OP_entry_value(%r11) afterwards, but at -O0 it better should have some
stack slot into which the argument is saved and DW_AT_location should be
that stack slot.  All that should change with the workaround is that if the
stack slot would be normally in the argument save area in the caller's
frame, if such argument save area can't be counted on, then it needs to be
saved in some other stack slot, like arguments are saved to when there are
only <= 8 arguments.

Now, sure, if IPA optimizes a call because it can prove it sees a callee and
all its callers and can modify them all, it can optimize away passing of
that argument but 1) it isn't done at -O0/-Og 2) it ensures debug info
reflects the case that the argument isn't passed and arranges for the
callers to provide DW_OP_GNU_parameter_ref/call site info value 3) it
doesn't modify the FUNCTION_DECL itself or its DECL_ARGUMENTS, but makes
a clone of the FUNCTION_DECL and modifies the clones DECL_ARGUMENTS

Jakub



[Patch] GCN: install.texi update for Newlib change and LLVM 18 release

2024-04-03 Thread Tobias Burnus

Update for the GCN Newlib commit 7dd4eb1db "amdgcn: Implement proper locks",
https://sourceware.org/git/?p=newlib-cygwin.git;a=commit;h=7dd4eb1db

And change future to past tense regarding the LLVM 18 release.

OK for mainline?

Thanks,

Tobias
GCN: install.texi update for Newlib change and LLVM 18 release

gcc/ChangeLog:

	* doc/install.texi (amdgcn-*-amdhsa): Update Newlib recommendation
	and update wording for LLVM 18 release.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 269fe7ec870..022bc32901c 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3944,7 +3944,9 @@ Instead of GNU Binutils, you will need to install LLVM 15, or later, and copy
 by specifying a @code{--with-multilib-list=} that does not list @code{gfx1100}
 and @code{gfx1103}.
 
-Use Newlib (4.3.0 or newer; 4.4.0 or later is recommended).
+Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commit
+7dd4eb1db (2025-03-25, post-4.4.0) fixes device console output for GFX10 and
+GFX11 devices).
 
 To run the binaries, install the HSA Runtime from the
 @uref{https://rocm.docs.amd.com/,,ROCm Platform}, and use
@@ -3954,8 +3956,8 @@ on the GPU.
 To enable support for GCN3 Fiji devices (gfx803), GCC has to be configured with
 @option{--with-arch=@code{fiji}} or
 @option{--with-multilib-list=@code{fiji},...}.  Note that support for Fiji
-devices has been removed in ROCm 4.0 and support in LLVM is deprecated and will
-be removed in LLVM 18.
+devices has been removed in ROCm 4.0 and support in LLVM is deprecated and has
+been removed in LLVM 18.
 
 @html
 


[Patch] GCN: Fix --with-arch= handling in mkoffload [PR111966]

2024-04-03 Thread Tobias Burnus

This patch handles --with-arch= in GCN's mkoffload.cc

While mkoffload mostly does not know this and passes it through to the GCN lto1 
compiler,
it writes an .o file with debug information - and here the -march= in the ELF 
flags must
agree with the one in the other files. Hence, it uses now the --with-arch= 
config argument.

Doing so, there is now a diagnostic if the -march= or --with-arch= is unknown. 
While the
latter should be rejected at GCC compile time, the latter was not diagnosed in 
mkoffload
but only later in GCN's compiler.

But as there is now a fatal_error in mkoffload, which comes before the 
GCN-compiler call,
the 'note:' which devices are available were lost. This has been reinstated by 
using
the multilib settings. (That's not identical to the compiler supported flags 
the output
is reasonable, arguable better or worse than lto1.)

Advantage: The output is less cluttered than a later fail.

To make mkoffload errors - and especially this one - more useful, it now also 
initializes
the colorization / bold.

OK for mainline?

* * *

Example error:

gcn mkoffload: error: unrecognized argument in option '-march=gfx'
gcn mkoffload: note: valid arguments to '-march=' are: gfx906, gfx908, gfx90a, 
gfx1030, gfx1036, gfx1100, gfx1103

where on my TERM=xterm-256color,  'gcn mkoffload:' and the quoted texts are in 
bold,
'error:' is red and 'note:' is cyan.

Compared to cc1, the 'note:' lacks 'fiji', the list is separated by ', '
instead of ' ', and cc1 has a "; did you mean 'gfx1100'?".
And the program name is 'gcn mkoffload' instead of 'cc1'.

Tobias

PS: The generated multilib list could be later changed to be based on the 
gcn-.def file;
or we just keep the multiconfig variant of this patch.
GCN: Fix --with-arch= handling in mkoffload [PR111966]

The default -march= setting used in mkoffload did not reflect the modified
default set by GCC's configure-time --with-arch=, causing issues when
generating debug code.

gcc/ChangeLog:

	PR other/111966
	* config/gcn/mkoffload.cc (get_arch): New; moved -march= flag
	handling from ...
	(main): ... here; call it to handle --with-arch config option
	and -march= commandline.

 gcc/config/gcn/mkoffload.cc | 90 -
 1 file changed, 72 insertions(+), 18 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 04356b86195..31266d2099b 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -35,6 +35,8 @@
 #include "gomp-constants.h"
 #include "simple-object.h"
 #include "elf.h"
+#include "configargs.h"  /* For configure_default_options.  */
+#include "multilib.h"  /* For multilib_options.  */
 
 /* These probably won't (all) be in elf.h for a while.  */
 #undef  EM_AMDGPU
@@ -846,6 +848,62 @@ compile_native (const char *infile, const char *outfile, const char *compiler,
   obstack_free (_obstack, NULL);
 }
 
+int
+get_arch (const char *str, const char *with_arch_str)
+{
+  if (strcmp (str, "fiji") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX803;
+  else if (strcmp (str, "gfx900") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX900;
+  else if (strcmp (str, "gfx906") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX906;
+  else if (strcmp (str, "gfx908") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX908;
+  else if (strcmp (str, "gfx90a") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX90a;
+  else if (strcmp (str, "gfx1030") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX1030;
+  else if (strcmp (str, "gfx1036") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX1036;
+  else if (strcmp (str, "gfx1100") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX1100;
+  else if (strcmp (str, "gfx1103") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX1103;
+
+  error ("unrecognized argument in option %<-march=%s%>", str);
+
+  /* The suggestions are based on the configured multilib support; the compiler
+ itself might support more.  */
+  if (multilib_options[0] != '\0')
+{
+  /* Example: "march=gfx900/march=gfx906" */
+  char *args = (char *) alloca (strlen (multilib_options));
+  const char *p = multilib_options, *q = NULL;
+  args[0] = '\0';
+  while (true)
+	{
+	  p = strchr (p, '=');
+	  if (!p)
+	break;
+	  if (q)
+	strcat (args, ", ");
+	  ++p;
+	  q = strchr (p, '/');
+	  if (q)
+	strncat (args, p, q-p);
+	  else
+	strcat (args, p);
+	}
+  inform (UNKNOWN_LOCATION, "valid arguments to %<-march=%> are: %s", args);
+}
+  else if (with_arch_str)
+inform (UNKNOWN_LOCATION, "valid argument to %<-march=%> is %qs", with_arch_str);
+
+  exit (FATAL_EXIT_CODE);
+
+  return 0;
+}
+
 int
 main (int argc, char **argv)
 {
@@ -853,9 +911,21 @@ main (int argc, char **argv)
   FILE *out = stdout;
   FILE *cfile = stdout;
   const char *outname = 0;
+  const char *with_arch_str = NULL;
 
   progname = tool_name;
+  gcc_init_libintl ();
   diagnostic_initialize (global_dc, 0);
+  diagnostic_color_init (global_dc);
+
+  for (size_t i = 0; i < ARRAY_SIZE 

Re: [PATCH] Include safe-ctype.h after C++ standard headers, to avoid over-poisoning

2024-04-03 Thread Iain Sandoe
Hi Richard,

> On 7 Mar 2024, at 13:40, FX Coudert  wrote:
> 
>> I think it's an obvious change ...
> 
> Thanks, pushed.
> 
> Dimitry, I suggest you post the second patch for review.

Given that the two patches here (for 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111632) were considered obvious - 
and are needed on release branches.

OK for backporting?

(Gerald has volunteered to do the earlier ones, I have already made/tested the 
gcc-13 case)

thanks
Iain



Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-04-03 Thread Kewen.Lin
Hi Jakub,

on 2024/4/3 16:35, Jakub Jelinek wrote:
> On Wed, Apr 03, 2024 at 01:18:54PM +0800, Kewen.Lin wrote:
>>> I'd prefer not to remove DECL_ARGUMENTS chains, they are valid arguments 
>>> that just some
>>> invalid code doesn't pass.  By removing them you basically always create an
>>> invalid case, this time in the other direction, valid caller passes more
>>> arguments than the callee (invalidly) expects.
>>
>> Thanks for the comments, do you mean it can affect the arguments validation 
>> when there
>> is explicit function declaration with interface?  Then can we strip them 
>> when we are
>> going to expand them (like checking currently_expanding_function_start)?
> 
> I'd prefer not stripping them at all; they are clearly marked as perhaps not
> passed in buggy programs (the DECL_HIDDEN_STRING_LENGTH argument) and
> removing them implies the decl is a throw away, that after expansion

Yes, IMHO it's safe as they are unused.

> nothing will actually look at it anymore.  I believe that is the case of
> function bodies, we expand them into RTL and throw away the GIMPLE, and
> after final clear the bodies, but it is not the case of the FUNCTION_DECL
> or its DECL_ARGUMENTs etc.  E.g. GIMPLE optimizations or expansion of
> callers could be looking at those as well.

At expand time GIMPLE optimizations should already finish, so it should be
safe to strip them at that time?  It would surprise me if expansions of
callers will look at callee's information, it's more like what should be
done in IPA analysis instead?

> 
>> since from the
>> perspective of resulted assembly, with this workaround, the callee can:
>>   1) pass the hidden args in unexpected GPR like r11, ... at -O0;
>>   2) get rid of such hidden args as they are unused at -O2;
>> This proposal aims to make the assembly at -O0 not to pass with r11... (same 
>> as -O2),
>> comparing to the assembly at O2, the mismatch isn't actually changed.
> 
> The aim for the workaround was just avoid assuming there is a argument save
> area in the caller stack when it is sometimes missing.

Yeah, understood.

> If you are looking for optimizations where nothing actually passes the
> unneeded arguments and nothing expects them to be passed, then it is a task
> for IPA optimizations and should be done solely if IPA determines that all
> callers can be adjusted together with the callee; I think IPA already does
> that in that case for years, regardless if it is DECL_HIDDEN_STRING_LENGTH
> PARM_DECL or not.

No, it's not what I was looking for.  Peter's comments made me feel it's not
good to have assembly at O0 like:

std %r3,112(%r31)
std %r4,120(%r31)
std %r5,128(%r31)
std %r6,136(%r31)
std %r7,144(%r31)
std %r8,152(%r31)
std %r9,160(%r31)
std %r10,168(%r31)
std %r11,176(%r31) // this mislead people that we pass 9th arg via r11,
   // it would be nice not to have it.

so I was thinking if there is some way to get rid of it.

BR,
Kewen



Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-04-03 Thread Jakub Jelinek
On Wed, Apr 03, 2024 at 01:18:54PM +0800, Kewen.Lin wrote:
> > I'd prefer not to remove DECL_ARGUMENTS chains, they are valid arguments 
> > that just some
> > invalid code doesn't pass.  By removing them you basically always create an
> > invalid case, this time in the other direction, valid caller passes more
> > arguments than the callee (invalidly) expects.
> 
> Thanks for the comments, do you mean it can affect the arguments validation 
> when there
> is explicit function declaration with interface?  Then can we strip them when 
> we are
> going to expand them (like checking currently_expanding_function_start)?

I'd prefer not stripping them at all; they are clearly marked as perhaps not
passed in buggy programs (the DECL_HIDDEN_STRING_LENGTH argument) and
removing them implies the decl is a throw away, that after expansion
nothing will actually look at it anymore.  I believe that is the case of
function bodies, we expand them into RTL and throw away the GIMPLE, and
after final clear the bodies, but it is not the case of the FUNCTION_DECL
or its DECL_ARGUMENTs etc.  E.g. GIMPLE optimizations or expansion of
callers could be looking at those as well.

> since from the
> perspective of resulted assembly, with this workaround, the callee can:
>   1) pass the hidden args in unexpected GPR like r11, ... at -O0;
>   2) get rid of such hidden args as they are unused at -O2;
> This proposal aims to make the assembly at -O0 not to pass with r11... (same 
> as -O2),
> comparing to the assembly at O2, the mismatch isn't actually changed.

The aim for the workaround was just avoid assuming there is a argument save
area in the caller stack when it is sometimes missing.
If you are looking for optimizations where nothing actually passes the
unneeded arguments and nothing expects them to be passed, then it is a task
for IPA optimizations and should be done solely if IPA determines that all
callers can be adjusted together with the callee; I think IPA already does
that in that case for years, regardless if it is DECL_HIDDEN_STRING_LENGTH
PARM_DECL or not.

Jakub



[committed] libquadmath: Don't assume the storage for __float128 arguments is aligned [PR114533]

2024-04-03 Thread Jakub Jelinek
On Tue, Mar 12, 2024 at 08:03:52PM +0100, Simon Chopin wrote:
> On x86, this compiles into movdqa which segfaults on unaligned access.
> 
> This kind of failure has been seen when running against glibc 2.39,
> which incidentally changed the printf implementation to move away from
> alloca() for this data to instead append it at the end of an existing
> "scratch buffer", with arbitrary alignement, whereas alloca() was
> probably more likely to be naturally aligned.
> 
> Tested by adding the patch to the Ubuntu gcc-14 package in
> https://launchpad.net/~schopin/+archive/ubuntu/libquadmath

The formatting was incorrect and we need to also change it in another
place.

Here is what I've committed instead:

With the register_printf_type/register_printf_modifier/register_printf_specifier
APIs the C library is just told the size of the argument and is provided with
a callback to fetch the argument from va_list using va_arg into C library 
provided
memory.  The C library isn't told what alignment requirement it has, but we were
using direct load of a __float128 value from that memory which assumes
__alignof (__float128) alignment.

The following patch fixes that by using memcpy instead.

I haven't been able to reproduce an actual crash, tried
 #include 
 #include 
 #include 

int main ()
{
  __float128 r;
  int prec = 20;
  int width = 46;
  char buf[128];

  r = 2.0q;
  r = sqrtq (r);
  int n = quadmath_snprintf (buf, sizeof buf, "%+-#*.20Qe", width, r);
  if ((size_t) n < sizeof buf)
printf ("%s\n", buf);
/* Prints: +1.41421356237309504880e+00 */
  quadmath_snprintf (buf, sizeof buf, "%Qa", r);
  if ((size_t) n < sizeof buf)
printf ("%s\n", buf);
/* Prints: 0x1.6a09e667f3bcc908b2fb1366ea96p+0 */
  n = quadmath_snprintf (NULL, 0, "%+-#46.*Qe", prec, r);
  if (n > -1)
{
  char *str = malloc (n + 1);
  if (str)
{
  quadmath_snprintf (str, n + 1, "%+-#46.*Qe", prec, r);
  printf ("%s\n", str);
  /* Prints: +1.41421356237309504880e+00 */
}
  free (str);
}
  printf ("%+-#*.20Qe\n", width, r);
  printf ("%Qa\n", r);
  printf ("%+-#46.*Qe\n", prec, r);
  printf ("%d %Qe %d %Qe %d %Qe\n", 1, r, 2, r, 3, r);
  return 0;
}
In any case, I think memcpy for loading from it is right.

2024-04-03  Simon Chopin  
Jakub Jelinek  

PR libquadmath/114533
* printf/printf_fp.c (__quadmath_printf_fp): Use memcpy to copy
__float128 out of args.
* printf/printf_fphex.c (__quadmath_printf_fphex): Likewise.

Signed-off-by: Simon Chopin 

--- libquadmath/printf/printf_fp.c.jj   2020-01-12 11:54:39.787362505 +0100
+++ libquadmath/printf/printf_fp.c  2024-04-02 19:28:31.254670746 +0200
@@ -363,7 +363,7 @@ __quadmath_printf_fp (struct __quadmath_
 
   /* Fetch the argument value. */
 {
-  fpnum = **(const __float128 **) args[0];
+  memcpy (, *(const void *const *) args[0], sizeof (fpnum));
 
   /* Check for special values: not a number or infinity.  */
   if (isnanq (fpnum))
--- libquadmath/printf/printf_fphex.c.jj2020-01-12 11:54:39.787362505 
+0100
+++ libquadmath/printf/printf_fphex.c   2024-04-02 19:29:03.968223151 +0200
@@ -163,7 +163,8 @@ __quadmath_printf_fphex (struct __quadma
 
   /* Fetch the argument value. */
 {
-  fpnum.value = **(const __float128 **) args[0];
+  memcpy (, *(const void *const *) args[0],
+ sizeof (fpnum.value));
 
   /* Check for special values: not a number or infinity.  */
   if (isnanq (fpnum.value))


Jakub



Re: [PATCH] c++: Implement C++26 P2809R3 - Trivial infinite loops are not Undefined Behavior

2024-04-03 Thread Jakub Jelinek
On Wed, Apr 03, 2024 at 09:35:07AM +0200, Richard Biener wrote:
> Just in case making the control expression constant to the middle-end
> doesn't scale.

I think we need to evaluate it as constant expression in any case, that is
the only way to determine if it is trivial infinite loop or not.
The constant evaluator already has constexpr_ops_count counter and
doesn't process more than -fconstexpr-ops-limit= operations, then it will
just punt (in this case quietly).
And, when we already evaluate it, just storing the result is very cheap.

Jakub



Re: [PATCH] expr: Fix up emit_push_insn [PR114552]

2024-04-03 Thread Richard Biener
On Wed, 3 Apr 2024, Jakub Jelinek wrote:

> Hi!
> 
> r13-990 added optimizations in multiple spots to optimize during
> expansion storing of constant initializers into targets.
> In the load_register_parameters and expand_expr_real_1 cases,
> it checks it has a tree as the source and so knows we are reading
> that whole decl's value, so the code is fine as is, but in the
> emit_push_insn case it checks for a MEM from which something
> is pushed and checks for SYMBOL_REF as the MEM's address, but
> still assumes the whole object is copied, which as the following
> testcase shows might not always be the case.  In the testcase,
> k is 6 bytes, then 2 bytes of padding, then another 4 bytes,
> while the emit_push_insn wants to store just the 6 bytes.
> 
> The following patch simply verifies it is the whole initializer
> that is being stored, I think that is best thing to do so late
> in GCC 14 cycle as well for backporting.
> 
> For GCC 15, perhaps the code could stop requiring it must be at offset zero,
> nor that the size is a subset, but could use
> get_symbol_constant_value/fold_ctor_reference gimple-fold APIs to actually
> extract just part of the initializer if we e.g. push just some subset
> (of course, still verify that it is a subset).  For sizes which are power
> of two bytes and we have some integer modes, we could use as type for
> fold_ctor_reference corresponding integral types, otherwise dunno, punt
> or use some structure (e.g. try to find one in the initializer?), whatever.
> But even in the other spots it could perhaps handle loading of
> COMPONENT_REFs or MEM_REFs from the .rodata vars.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2024-04-02  Jakub Jelinek  
> 
>   PR middle-end/114552
>   * expr.cc (emit_push_insn): Only use store_constructor for
>   immediate_const_ctor_p if int_expr_size matches size.
> 
>   * gcc.c-torture/execute/pr114552.c: New test.
> 
> --- gcc/expr.cc.jj2024-03-15 10:10:51.209237835 +0100
> +++ gcc/expr.cc   2024-04-02 16:01:39.566744302 +0200
> @@ -5466,6 +5466,7 @@ emit_push_insn (rtx x, machine_mode mode
> /* If source is a constant VAR_DECL with a simple constructor,
>   store the constructor to the stack instead of moving it.  */
> const_tree decl;
> +   HOST_WIDE_INT sz;
> if (partial == 0
> && MEM_P (xinner)
> && SYMBOL_REF_P (XEXP (xinner, 0))
> @@ -5473,9 +5474,11 @@ emit_push_insn (rtx x, machine_mode mode
> && VAR_P (decl)
> && TREE_READONLY (decl)
> && !TREE_SIDE_EFFECTS (decl)
> -   && immediate_const_ctor_p (DECL_INITIAL (decl), 2))
> - store_constructor (DECL_INITIAL (decl), target, 0,
> -int_expr_size (DECL_INITIAL (decl)), false);
> +   && immediate_const_ctor_p (DECL_INITIAL (decl), 2)
> +   && (sz = int_expr_size (DECL_INITIAL (decl))) > 0
> +   && CONST_INT_P (size)
> +   && INTVAL (size) == sz)
> + store_constructor (DECL_INITIAL (decl), target, 0, sz, false);
> else
>   emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM);
>   }
> --- gcc/testsuite/gcc.c-torture/execute/pr114552.c.jj 2024-04-02 
> 16:08:12.959366793 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr114552.c2024-04-02 
> 16:03:49.829963659 +0200
> @@ -0,0 +1,24 @@
> +/* PR middle-end/114552 */
> +
> +struct __attribute__((packed)) S { short b; int c; };
> +struct T { struct S b; int e; };
> +static const struct T k = { { 1, 0 }, 0 };
> +
> +__attribute__((noinline)) void
> +foo (void)
> +{
> +  asm volatile ("" : : : "memory");
> +}
> +
> +__attribute__((noinline)) void
> +bar (struct S n)
> +{
> +  foo ();
> +}
> +
> +int
> +main ()
> +{
> +  bar (k.b);
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] expr: Fix up emit_push_insn [PR114552]

2024-04-03 Thread Jakub Jelinek
Hi!

r13-990 added optimizations in multiple spots to optimize during
expansion storing of constant initializers into targets.
In the load_register_parameters and expand_expr_real_1 cases,
it checks it has a tree as the source and so knows we are reading
that whole decl's value, so the code is fine as is, but in the
emit_push_insn case it checks for a MEM from which something
is pushed and checks for SYMBOL_REF as the MEM's address, but
still assumes the whole object is copied, which as the following
testcase shows might not always be the case.  In the testcase,
k is 6 bytes, then 2 bytes of padding, then another 4 bytes,
while the emit_push_insn wants to store just the 6 bytes.

The following patch simply verifies it is the whole initializer
that is being stored, I think that is best thing to do so late
in GCC 14 cycle as well for backporting.

For GCC 15, perhaps the code could stop requiring it must be at offset zero,
nor that the size is a subset, but could use
get_symbol_constant_value/fold_ctor_reference gimple-fold APIs to actually
extract just part of the initializer if we e.g. push just some subset
(of course, still verify that it is a subset).  For sizes which are power
of two bytes and we have some integer modes, we could use as type for
fold_ctor_reference corresponding integral types, otherwise dunno, punt
or use some structure (e.g. try to find one in the initializer?), whatever.
But even in the other spots it could perhaps handle loading of
COMPONENT_REFs or MEM_REFs from the .rodata vars.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-04-02  Jakub Jelinek  

PR middle-end/114552
* expr.cc (emit_push_insn): Only use store_constructor for
immediate_const_ctor_p if int_expr_size matches size.

* gcc.c-torture/execute/pr114552.c: New test.

--- gcc/expr.cc.jj  2024-03-15 10:10:51.209237835 +0100
+++ gcc/expr.cc 2024-04-02 16:01:39.566744302 +0200
@@ -5466,6 +5466,7 @@ emit_push_insn (rtx x, machine_mode mode
  /* If source is a constant VAR_DECL with a simple constructor,
  store the constructor to the stack instead of moving it.  */
  const_tree decl;
+ HOST_WIDE_INT sz;
  if (partial == 0
  && MEM_P (xinner)
  && SYMBOL_REF_P (XEXP (xinner, 0))
@@ -5473,9 +5474,11 @@ emit_push_insn (rtx x, machine_mode mode
  && VAR_P (decl)
  && TREE_READONLY (decl)
  && !TREE_SIDE_EFFECTS (decl)
- && immediate_const_ctor_p (DECL_INITIAL (decl), 2))
-   store_constructor (DECL_INITIAL (decl), target, 0,
-  int_expr_size (DECL_INITIAL (decl)), false);
+ && immediate_const_ctor_p (DECL_INITIAL (decl), 2)
+ && (sz = int_expr_size (DECL_INITIAL (decl))) > 0
+ && CONST_INT_P (size)
+ && INTVAL (size) == sz)
+   store_constructor (DECL_INITIAL (decl), target, 0, sz, false);
  else
emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM);
}
--- gcc/testsuite/gcc.c-torture/execute/pr114552.c.jj   2024-04-02 
16:08:12.959366793 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr114552.c  2024-04-02 
16:03:49.829963659 +0200
@@ -0,0 +1,24 @@
+/* PR middle-end/114552 */
+
+struct __attribute__((packed)) S { short b; int c; };
+struct T { struct S b; int e; };
+static const struct T k = { { 1, 0 }, 0 };
+
+__attribute__((noinline)) void
+foo (void)
+{
+  asm volatile ("" : : : "memory");
+}
+
+__attribute__((noinline)) void
+bar (struct S n)
+{
+  foo ();
+}
+
+int
+main ()
+{
+  bar (k.b);
+  return 0;
+}

Jakub



Re: [PATCH] c++: Implement C++26 P2809R3 - Trivial infinite loops are not Undefined Behavior

2024-04-03 Thread Richard Biener
On Wed, Apr 3, 2024 at 9:25 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch attempts to implement P2809R3, which has been voted
> in as a DR.
>
> The middle-end has its behavior documented:
> '-ffinite-loops'
>  Assume that a loop with an exit will eventually take the exit and
>  not loop indefinitely.  This allows the compiler to remove loops
>  that otherwise have no side-effects, not considering eventual
>  endless looping as such.
>
>  This option is enabled by default at '-O2' for C++ with -std=c++11
>  or higher.
>
> So, the following patch attempts to detect trivial infinite loops and
> turn their conditions into INTEGER_CSTs so that they don't really have
> exits in the middle-end and so regardless of -ffinite-loops or
> -fno-finite-loops they are handled as infinite loops by the middle-end.
> Otherwise, if the condition would be a large expression calling various
> constexpr functions, I'd be afraid we could e.g. just inline some of them
> and not all of them and the middle-end could still see tests in the
> condition and with -ffinite-loops optimize it by assuming that such loops
> need to be finite.
>
> The "A trivial infinite loop is a trivially empty iteration statement for
> which the converted controlling expression is a constant expression, when
> interpreted as a constant-expression ([expr.const]), and evaluates to true."
> wording isn't clear to me what it implies for manifest constant evaluation
> of the expression, especially given the
> int x = 42;
> while (std::is_constant_evaluated() || --x) ;
> example in the rationale.
>
> The patch assumes that the condition expression aren't manifestly constant
> evaluated.  If it would be supposed to be manifestly constant evaluated,
> then I think the DR would significantly change behavior of existing programs
> and have really weird effects.  Before the DR has been voted in, I think
> void foo (int x)
> {
>   if (x == 0)
> while (std::is_constant_evaluated())
>   ;
>   else
> while (!std::is_constant_evaluated())
>   ;
> }
> would have well defined behavior of zero loop body iterations if x == 0 and
> undefined behavior otherwise.  If the condition expression is manifestly
> constant evaluated if it evaluates to true, and otherwise can be
> non-constant or not manifestly constant evaluated otherwise, then the
> behavior would be that for x == 0 it is well defined trvial infinite loop,
> while for x != 0 it would keep to be undefined behavior (infinite loop,
> as !std::is_constant_evaluated() is false when manifestly constant evaluated
> and if we keep the condition as is, evaluates then to true.  I think it
> would be fairly strange if both loops are infinite even when their condition
> are negated.  Similar for anything that is dependent on if consteval or
> std::is_constant_evaluated() inside of it.
>
> So, the patch below attempts to discover trivially empty iteration
> statements at cp_fold time if it is the final mce_false folding,
> attempts to maybe_constant_value with mce_false evaluate the conditions
> and replaces it with the returned value if constant non-zero.
>
> The testcases then try to check if the FE changed the calls in the
> conditions into constants.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I'll note that -ffinite-loops is reflected into the IL (the loop structure) at
CFG build time in the following simple way:

  /* Push the global flag_finite_loops state down to individual loops.  */
  loop->finite_p = flag_finite_loops;

and the "has an exit" is checked when we evalutate a loop->finite_p loop.
The above flag evaluation could be made a langhook as it's suitably
early, but I'm not sure whether that would help.   The flag is evaluated
when we evaluate the loop ANNOTATE_EXPRs, so annotating finite
loops with a new annotate might be possible as well.

Just in case making the control expression constant to the middle-end
doesn't scale.

Richard.

> Or is it really supposed to be mce_true with the above described weird
> behavior?  If so, I think the standard at least should mention it in Annex C
> (though, where when it is a DR?).
>
> 2024-04-02  Jakub Jelinek  
>
> PR c++/114462
> * cp-gimplify.cc (cp_fold): Implement C++26 P2809R3 - Trivial infinite
> loops are not Undefined Behavior.  For trivially empty WHILE_STMT,
> DO_STMT or FOR_STMT iteration statements check if the condition is
> constant expression which evaluates to true and in that case replace
> the condition with true.
>
> * g++.dg/cpp26/trivial-infinite-loop1.C: New test.
> * g++.dg/cpp26/trivial-infinite-loop2.C: New test.
> * g++.dg/cpp26/trivial-infinite-loop3.C: New test.
>
> --- gcc/cp/cp-gimplify.cc.jj2024-03-23 11:17:06.958445857 +0100
> +++ gcc/cp/cp-gimplify.cc   2024-04-02 11:27:56.069170914 +0200
> @@ -3527,6 +3527,78 @@ cp_fold (tree x, fold_flags_t flags)
>x = 

[PATCH] c++: Implement C++26 P2809R3 - Trivial infinite loops are not Undefined Behavior

2024-04-03 Thread Jakub Jelinek
Hi!

The following patch attempts to implement P2809R3, which has been voted
in as a DR.

The middle-end has its behavior documented:
'-ffinite-loops'
 Assume that a loop with an exit will eventually take the exit and
 not loop indefinitely.  This allows the compiler to remove loops
 that otherwise have no side-effects, not considering eventual
 endless looping as such.

 This option is enabled by default at '-O2' for C++ with -std=c++11
 or higher.

So, the following patch attempts to detect trivial infinite loops and
turn their conditions into INTEGER_CSTs so that they don't really have
exits in the middle-end and so regardless of -ffinite-loops or
-fno-finite-loops they are handled as infinite loops by the middle-end.
Otherwise, if the condition would be a large expression calling various
constexpr functions, I'd be afraid we could e.g. just inline some of them
and not all of them and the middle-end could still see tests in the
condition and with -ffinite-loops optimize it by assuming that such loops
need to be finite.

The "A trivial infinite loop is a trivially empty iteration statement for
which the converted controlling expression is a constant expression, when
interpreted as a constant-expression ([expr.const]), and evaluates to true."
wording isn't clear to me what it implies for manifest constant evaluation
of the expression, especially given the
int x = 42;
while (std::is_constant_evaluated() || --x) ;
example in the rationale.

The patch assumes that the condition expression aren't manifestly constant
evaluated.  If it would be supposed to be manifestly constant evaluated,
then I think the DR would significantly change behavior of existing programs
and have really weird effects.  Before the DR has been voted in, I think
void foo (int x)
{
  if (x == 0)
while (std::is_constant_evaluated())
  ;
  else
while (!std::is_constant_evaluated())
  ;
}
would have well defined behavior of zero loop body iterations if x == 0 and
undefined behavior otherwise.  If the condition expression is manifestly
constant evaluated if it evaluates to true, and otherwise can be
non-constant or not manifestly constant evaluated otherwise, then the
behavior would be that for x == 0 it is well defined trvial infinite loop,
while for x != 0 it would keep to be undefined behavior (infinite loop,
as !std::is_constant_evaluated() is false when manifestly constant evaluated
and if we keep the condition as is, evaluates then to true.  I think it
would be fairly strange if both loops are infinite even when their condition
are negated.  Similar for anything that is dependent on if consteval or
std::is_constant_evaluated() inside of it.

So, the patch below attempts to discover trivially empty iteration
statements at cp_fold time if it is the final mce_false folding,
attempts to maybe_constant_value with mce_false evaluate the conditions
and replaces it with the returned value if constant non-zero.

The testcases then try to check if the FE changed the calls in the
conditions into constants.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Or is it really supposed to be mce_true with the above described weird
behavior?  If so, I think the standard at least should mention it in Annex C
(though, where when it is a DR?).

2024-04-02  Jakub Jelinek  

PR c++/114462
* cp-gimplify.cc (cp_fold): Implement C++26 P2809R3 - Trivial infinite
loops are not Undefined Behavior.  For trivially empty WHILE_STMT,
DO_STMT or FOR_STMT iteration statements check if the condition is
constant expression which evaluates to true and in that case replace
the condition with true.

* g++.dg/cpp26/trivial-infinite-loop1.C: New test.
* g++.dg/cpp26/trivial-infinite-loop2.C: New test.
* g++.dg/cpp26/trivial-infinite-loop3.C: New test.

--- gcc/cp/cp-gimplify.cc.jj2024-03-23 11:17:06.958445857 +0100
+++ gcc/cp/cp-gimplify.cc   2024-04-02 11:27:56.069170914 +0200
@@ -3527,6 +3527,78 @@ cp_fold (tree x, fold_flags_t flags)
   x = evaluate_requires_expr (x);
   break;
 
+case WHILE_STMT:
+  /* For trivially empty iteration statements check if the condition
+is constant expression which evaluates to true and in that case
+change the condition to the constant, so that the middle-end treats
+it as an infinite loop regardless of -ffinite-loops.  */
+  if ((flags & ff_mce_false) && !integer_nonzerop (WHILE_COND (x)))
+   {
+ tree body = WHILE_BODY (x);
+ if (expr_first (body) == NULL_TREE)
+   {
+ r = maybe_constant_value (WHILE_COND (x), /*decl=*/NULL_TREE,
+   mce_false);
+ if (integer_nonzerop (r))
+   {
+ x = copy_node (x);
+ WHILE_COND (x) = r;
+ break;
+   }
+   }
+   }
+  return org_x;
+
+case DO_STMT:

Re: Re:[PATCH v2 1/1] [RISC-V] Add support for _Bfloat16

2024-04-03 Thread Xiao Zeng
2024-04-03 11:19  Jin Ma  wrote:
>
 
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/bf16_arithmetic.c: New test.
>> * gcc.target/riscv/bf16_call.c: New test.
>> * gcc.target/riscv/bf16_comparison.c: New test.
>> * gcc.target/riscv/bf16_float_libcall_convert.c: New test.
>> * gcc.target/riscv/bf16_integer_libcall_convert.c: New test.
>
>  Hi, I have test this patch and it is very good. I think we need to add some
>runable tests to ensure that the results are right for various types of
>conversions, operations, and libfuncs. 
Yes, we must ensure that running tests is also feasible.

A great testcase has already been provided in the gcc test suite: 
gcc/testsuite/g++.dg/cpp23/ext-floating14.C

So, I didn't add any test cases to run the tests.
>
>BR,
>Jin
 
Thanks
Xiao Zeng